Re: Hive on Spark

2015-09-03 Thread Patrick McAnneny
What is the benefit of Hive on Spark if you cannot pre-load data into
memory that you know will be queried?

On Mon, Aug 31, 2015 at 4:25 PM, Xuefu Zhang  wrote:

> What you described isn't part of the functionality of Hive on Spark.
> Rather, Spark is used here as a general-purpose engine similar to MR but
> without intermediate stages. It's batch oriented.
>
> Keeping 100T of data in memory is hardly beneficial unless you know that the
> dataset is going to be used in subsequent queries.
>
> For loading data in memory and providing near real-time response, you
> might want to look at some memory-based DBs.
>
> Thanks,
> Xuefu
>
> On Thu, Aug 27, 2015 at 9:11 AM, Patrick McAnneny <
> patrick.mcann...@leadkarma.com> wrote:
>
>> Once I get "hive.execution.engine=spark" working, how would I go about
>> loading portions of my data into memory? Let's say I have a 100TB database
>> and want to load all of last week's data into Spark memory. Is this possible
>> or even beneficial? Or am I thinking about Hive on Spark in the wrong way?
>>
>> I also assume Hive on Spark could get me near-real-time capabilities
>> for large queries. Is this true?
>>
>
>


Table level stats are not shown after insert starting in Hive 0.13?

2015-09-03 Thread Jim Green
*Hive 0.12:*
After insert SQL:
Partition default.mytablepar{id=111} stats: [num_files: 1, num_rows: 0,
total_size: 4, raw_data_size: 0]
Table default.mytablepar stats: [num_partitions: 1, num_files: 1, num_rows:
0, total_size: 4, raw_data_size: 0]

*Hive 0.13:*
After insert SQL:
Partition default.mytablepar{id=111} stats: [numFiles=2, numRows=1,
totalSize=9, rawDataSize=3]


What is the rationale behind the change? And how can we enable table-level
stats to be shown after insert?
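
As a rough workaround sketch (not a confirmed way to bring back the missing
table-level line), the stats can be recomputed and inspected explicitly, using
the table and partition from the output above:

  ANALYZE TABLE default.mytablepar PARTITION (id=111) COMPUTE STATISTICS;
  -- table-level parameters (numFiles, numRows, totalSize, ...) are listed
  -- under "Table Parameters" when present
  DESCRIBE FORMATTED default.mytablepar;
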
Thanks.

-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


Re: Request for write access to the Hive wiki

2015-09-03 Thread Lefty Leverenz
You've got it.  Welcome to the Hive wiki team, Aswathy!

-- Lefty


On Thu, Sep 3, 2015 at 1:21 PM, Aswathy C.S <2aswa...@gmail.com> wrote:

> Hi Lefty,
>
> I have created a Confluence account with the same username: asreekumar. I had
> mistaken the Confluence account for the Apache account. Thanks for the signup
> link. Hope you can help me with write access now.
>
> Aswathy
>
> On Thu, Sep 3, 2015 at 1:30 AM, Lefty Leverenz 
> wrote:
>
>> Aswathy, Confluence doesn't recognize that username (although JIRA
>> does), nor does it recognize your actual name.  Did you create the
>> Confluence account here?
>>
>> -- Lefty
>>
>>
>> On Wed, Sep 2, 2015 at 6:12 PM, Aswathy C.S <2aswa...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I would like to get write access to the Hive wiki. My Confluence username:
>>> asreekumar.
>>>
>>> thanks
>>> Aswathy
>>>
>>
>>
>


Subquery in select statement

2015-09-03 Thread Daniel Lopes
Hi,

Is there some way I can do something like this?

SELECT
   tb1.id,
   (SELECT tb3.field
    FROM database.table2 tb2
    JOIN database.table3 tb3 ON (tb3.id = tb2.table3_id)
    ORDER BY tb3.date DESC
    LIMIT 1) AS tb3_field
FROM database.table1 tb1
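
If that form is not accepted, one possible rewrite (only a sketch; it assumes
the intent is a single global "latest" value, since the subquery does not
reference tb1) is to move the subquery into the FROM clause and cross-join its
single-row result:

SELECT
   tb1.id,
   latest.tb3_field
FROM database.table1 tb1
CROSS JOIN (
   SELECT tb3.field AS tb3_field
   FROM database.table2 tb2
   JOIN database.table3 tb3 ON (tb3.id = tb2.table3_id)
   ORDER BY tb3.date DESC
   LIMIT 1
) latest;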


Best,

*Daniel Lopes, B.Eng*
Data Scientist - BankFacil
CREA/SP 5069410560

Mob +55 (18) 99764-2733 
Ph +55 (11) 3522-8009
http://about.me/dannyeuu

Av. Nova Independência, 956, São Paulo, SP
Bairro Brooklin Paulista
CEP 04570-001
https://www.bankfacil.com.br


Re: Request for write access to the Hive wiki

2015-09-03 Thread Aswathy C.S
Hi Lefty,

I have created a Confluence account with the same username: asreekumar. I had
mistaken the Confluence account for the Apache account. Thanks for the signup
link. Hope you can help me with write access now.

Aswathy

On Thu, Sep 3, 2015 at 1:30 AM, Lefty Leverenz 
wrote:

> Aswathy, Confluence doesn't recognize that username (although JIRA does),
> nor does it recognize your actual name.  Did you create the Confluence
> account here?
>
> -- Lefty
>
>
> On Wed, Sep 2, 2015 at 6:12 PM, Aswathy C.S <2aswa...@gmail.com> wrote:
>
>> Hi,
>>
>> I would like to get write access to the Hive wiki. My Confluence username:
>> asreekumar.
>>
>> thanks
>> Aswathy
>>
>
>


Re: ORC NPE while writing stats

2015-09-03 Thread Prasanth Jayachandran

> On Sep 2, 2015, at 10:57 PM, David Capwell  wrote:
> 
> So, very quickly looked at the JIRA and I had the following question;
> if you have a pool per thread rather than global, then assuming 50%
> heap will cause writer to OOM with multiple threads, which is
> different than older (0.14) ORC, correct?
> 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcConf.java#L83
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java#L94
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java#L226
> 
> So with orc.memory.pool=0.5, this value only seems to make sense if
> single threaded, so if you are writing with multiple threads, then I
> assume the value should be (0.5 / #threads), so if 50 threads then
> 0.01 should be the value?

Yes, you are correct. Since Hive's operator pipeline is single threaded, there
were uncontested locks causing a slowdown. Hence the change.
I will create a JIRA to update the docs and the config description to reflect
this. If multiple threads are writing, then you might need to share the heap
among the multiple writers.

> If this is true, I can't find any documentation about this, all docs
> make it sound global.
> 

Noted. Since this is an unreleased version, I will create a JIRA to make sure
this gets reflected in the docs.

> On Wed, Sep 2, 2015 at 7:34 PM, David Capwell  wrote:
>> Thanks for the jira, will see if that works for us.
>> 
>> On Sep 2, 2015 7:11 PM, "Prasanth Jayachandran"
>>  wrote:
>>> 
>>> Memory manager is made thread local
>>> https://issues.apache.org/jira/browse/HIVE-10191
>>> 
>>> Can you try the patch from HIVE-10191 and see if that helps?
>>> 
>>> On Sep 2, 2015, at 8:58 PM, David Capwell  wrote:
>>> 
>>> I'll try that out and see if it goes away (not seen this in the past 24
>>> hours, no code change).
>>> 
>>> Doing this now means that I can't share the memory, so will prob go with a
>>> thread local and allocate fixed sizes to the pool per thread (50% heap / 50
>>> threads).  Will most likely be awhile before I can report back (unless it
>>> fails fast in testing)
>>> 
>>> On Sep 2, 2015 2:11 PM, "Owen O'Malley"  wrote:
 
 (Dropping dev)
 
 Well, that explains the non-determinism, because the MemoryManager will
 be shared across threads and thus the stripes will get flushed at
 effectively random times.
 
 Can you try giving each writer a unique MemoryManager? You'll need to put
 a class into the org.apache.hadoop.hive.ql.io.orc package to get access to
 the necessary class (MemoryManager) and method
 (OrcFile.WriterOptions.memory). We may be missing a synchronization on the
 MemoryManager somewhere and thus be getting a race condition.
 
 Thanks,
   Owen
 
 On Wed, Sep 2, 2015 at 12:57 PM, David Capwell 
 wrote:
> 
> We have multiple threads writing, but each thread works on one file, so
> orc writer is only touched by one thread (never cross threads)
> 
> On Sep 2, 2015 11:18 AM, "Owen O'Malley"  wrote:
>> 
>> I don't see how it would get there. That implies that minimum was null,
>> but the count was non-zero.
>> 
>> The ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:
>> 
>> @Override
>> OrcProto.ColumnStatistics.Builder serialize() {
>>  OrcProto.ColumnStatistics.Builder result = super.serialize();
>>  OrcProto.StringStatistics.Builder str =
>>OrcProto.StringStatistics.newBuilder();
>>  if (getNumberOfValues() != 0) {
>>str.setMinimum(getMinimum());
>>str.setMaximum(getMaximum());
>>str.setSum(sum);
>>  }
>>  result.setStringStatistics(str);
>>  return result;
>> }
>> 
>> and thus shouldn't call down to setMinimum unless it had at least some
>> non-null values in the column.
>> 
>> Do you have multiple threads working? There isn't anything that should
>> be introducing non-determinism so for the same input it would fail at the
>> same point.
>> 
>> .. Owen
>> 
>> 
>> 
>> 
>> On Tue, Sep 1, 2015 at 10:51 PM, David Capwell 
>> wrote:
>>> 
>>> We are writing ORC files in our application for hive to consume.
>>> Given enough time, we have noticed that writing causes an NPE when
>>> working with a string column's stats.  Not sure what's causing it on
>>> our side yet since replaying the same data is just fine, it seems more
>>> like this just happens over time (different data sources will hit this
>>> around the same time in the same JVM).
>>> 
>>> Here is the code in question, and below is the exception:
>>> 
>>> final Writer writer = 

Re: ORC NPE while writing stats

2015-09-03 Thread David Capwell
Thanks, that should help moving forward
On Sep 3, 2015 10:38 AM, "Prasanth Jayachandran" <
pjayachand...@hortonworks.com> wrote:

>
> > On Sep 2, 2015, at 10:57 PM, David Capwell  wrote:
> >
> > So, very quickly looked at the JIRA and I had the following question;
> > if you have a pool per thread rather than global, then assuming 50%
> > heap will cause writer to OOM with multiple threads, which is
> > different than older (0.14) ORC, correct?
> >
> >
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcConf.java#L83
> >
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/MemoryManager.java#L94
> >
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java#L226
> >
> > So with orc.memory.pool=0.5, this value only seems to make sense if
> > single threaded, so if you are writing with multiple threads, then I
> > assume the value should be (0.5 / #threads), so if 50 threads then
> > 0.01 should be the value?
>
> Yes, you are correct. Since Hive's operator pipeline is single threaded,
> there were uncontested locks causing a slowdown. Hence the change.
> I will create a JIRA to update the docs and the config description to reflect
> this. If multiple threads are writing, then you might need to share the heap
> among the multiple writers.
>
> > If this is true, I can't find any documentation about this, all docs
> > make it sound global.
> >
>
> Noted. Since this is an unreleased version, I will create a JIRA to make sure
> this gets reflected in the docs.
>
> > On Wed, Sep 2, 2015 at 7:34 PM, David Capwell 
> wrote:
> >> Thanks for the jira, will see if that works for us.
> >>
> >> On Sep 2, 2015 7:11 PM, "Prasanth Jayachandran"
> >>  wrote:
> >>>
> >>> Memory manager is made thread local
> >>> https://issues.apache.org/jira/browse/HIVE-10191
> >>>
> >>> Can you try the patch from HIVE-10191 and see if that helps?
> >>>
> >>> On Sep 2, 2015, at 8:58 PM, David Capwell  wrote:
> >>>
> >>> I'll try that out and see if it goes away (not seen this in the past 24
> >>> hours, no code change).
> >>>
> >>> Doing this now means that I can't share the memory, so will prob go
> with a
> >>> thread local and allocate fixed sizes to the pool per thread (50% heap
> / 50
> >>> threads).  Will most likely be awhile before I can report back (unless
> it
> >>> fails fast in testing)
> >>>
> >>> On Sep 2, 2015 2:11 PM, "Owen O'Malley"  wrote:
> 
>  (Dropping dev)
> 
>  Well, that explains the non-determinism, because the MemoryManager
> will
>  be shared across threads and thus the stripes will get flushed at
>  effectively random times.
> 
>  Can you try giving each writer a unique MemoryManager? You'll need to
> put
>  a class into the org.apache.hadoop.hive.ql.io.orc package to get
> access to
>  the necessary class (MemoryManager) and method
>  (OrcFile.WriterOptions.memory). We may be missing a synchronization
> on the
>  MemoryManager somewhere and thus be getting a race condition.
> 
>  Thanks,
>    Owen
> 
>  On Wed, Sep 2, 2015 at 12:57 PM, David Capwell 
>  wrote:
> >
> > We have multiple threads writing, but each thread works on one file,
> so
> > orc writer is only touched by one thread (never cross threads)
> >
> > On Sep 2, 2015 11:18 AM, "Owen O'Malley"  wrote:
> >>
> >> I don't see how it would get there. That implies that minimum was
> null,
> >> but the count was non-zero.
> >>
> >> The ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:
> >>
> >> @Override
> >> OrcProto.ColumnStatistics.Builder serialize() {
> >>  OrcProto.ColumnStatistics.Builder result = super.serialize();
> >>  OrcProto.StringStatistics.Builder str =
> >>OrcProto.StringStatistics.newBuilder();
> >>  if (getNumberOfValues() != 0) {
> >>str.setMinimum(getMinimum());
> >>str.setMaximum(getMaximum());
> >>str.setSum(sum);
> >>  }
> >>  result.setStringStatistics(str);
> >>  return result;
> >> }
> >>
> >> and thus shouldn't call down to setMinimum unless it had at least
> some
> >> non-null values in the column.
> >>
> >> Do you have multiple threads working? There isn't anything that
> should
> >> be introducing non-determinism so for the same input it would fail
> at the
> >> same point.
> >>
> >> .. Owen
> >>
> >>
> >>
> >>
> >> On Tue, Sep 1, 2015 at 10:51 PM, David Capwell 
> >> wrote:
> >>>
> >>> We are writing ORC files in our application for hive to consume.
> >>> Given enough time, we have noticed that writing causes an NPE when
> >>> working with a string column's stats.  Not sure what's 

Re: Disabling local mode optimization

2015-09-03 Thread sreebalineni .
Hi,

Shouldn't you be setting it to true? By default it is disabled, i.e. false.

Hive analyzes the size of each map-reduce job in a query and may run it
locally if the following thresholds are satisfied:

   - The total input size of the job is lower than:
   hive.exec.mode.local.auto.inputbytes.max (128MB by default)
   - The total number of map-tasks is less than:
   hive.exec.mode.local.auto.tasks.max (4 by default)
   - The total number of reduce tasks required is 1 or 0.

So for queries over small data sets, or for queries with multiple
map-reduce jobs where the input to subsequent jobs is substantially smaller
(because of reduction/filtering in the prior job), jobs may be run locally.

So we may need to check the size of your input. Which version of Hive are
you using? This feature works only from Hive 0.7 onwards.
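
As a quick sanity check (just a sketch using the property names above), the
current values can be printed from the session, and the flag forced off:

  -- print the current values of the thresholds listed above
  set hive.exec.mode.local.auto;
  set hive.exec.mode.local.auto.inputbytes.max;
  set hive.exec.mode.local.auto.tasks.max;

  -- and keep jobs on the cluster for this session
  set hive.exec.mode.local.auto=false;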

On Wed, Sep 2, 2015 at 4:46 PM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> I would like to disable the optimization where a query that just selects
> data is running without mapreduce (local mode).
>
> hive.exec.mode.local.auto is set to false but hive still runs in local mode 
> for some queries.
>
>
> How can I disable local mode completely?
>
>
> Thank you.
>
> Daniel
>
>


Re: Request for write access to the Hive wiki

2015-09-03 Thread Lefty Leverenz
Aswathy, Confluence doesn't recognize that username (although JIRA does),
nor does it recognize your actual name.  Did you create the Confluence
account here?

-- Lefty


On Wed, Sep 2, 2015 at 6:12 PM, Aswathy C.S <2aswa...@gmail.com> wrote:

> Hi,
>
> I would like to get write access to the Hive wiki. My Confluence username:
> asreekumar.
>
> thanks
> Aswathy
>


Re: unsubscribe

2015-09-03 Thread Lefty Leverenz
Sasha, to unsubscribe please send a message to
user-unsubscr...@hive.apache.org as described here: Mailing Lists.  Thanks.

-- Lefty


On Thu, Sep 3, 2015 at 2:35 AM, Sasha Ostrikov  wrote:

> unsubscribe
>


unsubscribe

2015-09-03 Thread Sasha Ostrikov
unsubscribe


Re: Disabling local mode optimization

2015-09-03 Thread Daniel Haviv
Exactly the info I needed.
Thanks

Daniel

> On 3 בספט׳ 2015, at 09:02, sreebalineni .  wrote:
> 
> Hi,
> 
> Shouldn't you be setting it to true? By default it is disabled, i.e. false.
> Hive analyzes the size of each map-reduce job in a query and may run it 
> locally if the following thresholds are satisfied:
> The total input size of the job is lower than: 
> hive.exec.mode.local.auto.inputbytes.max (128MB by default)
> The total number of map-tasks is less than: 
> hive.exec.mode.local.auto.tasks.max (4 by default)
> The total number of reduce tasks required is 1 or 0.
> So for queries over small data sets, or for queries with multiple map-reduce 
> jobs where the input to subsequent jobs is substantially smaller (because of 
> reduction/filtering in the prior job), jobs may be run locally.
> So we may need to check the size of your input. Which version of Hive are you
> using? This feature works only from Hive 0.7 onwards.
> 
>> On Wed, Sep 2, 2015 at 4:46 PM, Daniel Haviv 
>>  wrote:
>> Hi,
>> I would like to disable the optimization where a query that just selects 
>> data is running without mapreduce (local mode).
>> hive.exec.mode.local.auto is set to false but hive still runs in local mode 
>> for some queries.
>> 
>> How can I disable local mode completely?
>> 
>> Thank you.
>> Daniel
> 


Error upon XML serialization.

2015-09-03 Thread Raajay
I attempted to serialize a QueryPlan (ql/QueryPlan.java) using the
Utilities.serializePlan() function and encountered errors (pasted below); I had
"hive.plan.serialization.format=javaXML". No such exception was thrown when
the serialization format was set to "kryo"; however, I was not successful in
deserializing the object after writing the serialized form to disk. So:

1. Does it make sense to serialize QueryPlan?
2. If yes, what are the correct configurations?
3. If not, which is the ideal data structure to serialize after the query
compilation stage?

Thanks,
Raajay
PS: Apologies for including the long log output, but perhaps it is helpful.







15/09/03 19:21:51 [main]: INFO exec.Utilities: Serializing QueryPlan via
javaXML
15/09/03 19:21:51 [main]: INFO exec.Utilities: Raajay: Serializing using
JAVA-XML
15/09/03 19:21:51 [main]: WARN exec.Utilities:
java.lang.InstantiationException: org.apache.hadoop.hive.ql.plan.TezWork
at java.lang.Class.newInstance(Class.java:427)
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at java.beans.Statement.invokeInternal(Statement.java:292)
at java.beans.Statement.access$000(Statement.java:58)
at java.beans.Statement$2.run(Statement.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at java.beans.Statement.invoke(Statement.java:182)
at java.beans.Expression.getValue(Expression.java:155)
at java.beans.Encoder.getValue(Encoder.java:105)
at java.beans.Encoder.get(Encoder.java:252)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:112)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at
java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:196)
at
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:258)
at
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:406)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeObject1(Encoder.java:258)
at java.beans.Encoder.cloneStatement(Encoder.java:271)
at java.beans.Encoder.writeStatement(Encoder.java:301)
at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
at
java.beans.DefaultPersistenceDelegate.invokeStatement(DefaultPersistenceDelegate.java:219)
at
java.beans.MetaData$java_util_List_PersistenceDelegate.initialize(MetaData.java:655)
at java.beans.PersistenceDelegate.initialize(PersistenceDelegate.java:214)
at
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:404)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at
java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:196)
at
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:258)
at
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:406)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at 

Re: Table level stats are not shown after insert starting in Hive 0.13?

2015-09-03 Thread Jim Green
Adding Dev user list.
Could somebody help take a look?

On Thu, Sep 3, 2015 at 12:25 PM, Jim Green  wrote:

> Also tried Hive 1.0, and the result is the same as Hive 0.13.
> Is there any reason why we do not print the table-level stats for a
> partitioned table?
>
> On Thu, Sep 3, 2015 at 10:41 AM, Jim Green  wrote:
>
>> *Hive 0.12:*
>> After insert SQL:
>> Partition default.mytablepar{id=111} stats: [num_files: 1, num_rows: 0,
>> total_size: 4, raw_data_size: 0]
>> Table default.mytablepar stats: [num_partitions: 1, num_files: 1,
>> num_rows: 0, total_size: 4, raw_data_size: 0]
>>
>> *Hive 0.13:*
>> After insert SQL:
>> Partition default.mytablepar{id=111} stats: [numFiles=2, numRows=1,
>> totalSize=9, rawDataSize=3]
>>
>>
>> What is the rationale behind the change? And how can we enable table-level
>> stats to be shown after insert?
>> Thanks.
>>
>> --
>> Thanks,
>> www.openkb.info
>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>>
>
>
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>



-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


Limiting with index creation on hive empty tables with tez execution engine

2015-09-03 Thread venkata srinivasarao kolla
Hi Team,

In our environment we have set the Hive execution engine to Tez and tried to
create/rebuild an index on empty tables. We found that index creation does
not happen; we get the error below:
 FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.tez.TezTask.

It seems to be a limitation of index creation on an empty table. When we
populate some data into the table and then try to generate the index, we do
not run into this problem.
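
To illustrate the ordering that avoids the error (a sketch only; the table
name and data path below are placeholders, and the compact index handler is
used just as an example):

  CREATE TABLE t_demo (c1 STRING, c2 INT);
  CREATE INDEX t_demo_idx ON TABLE t_demo (c2)
    AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    WITH DEFERRED REBUILD;
  -- rebuilding here, while t_demo is still empty, is what fails with the
  -- TezTask error above
  LOAD DATA LOCAL INPATH '/path/to/data' INTO TABLE t_demo;
  ALTER INDEX t_demo_idx ON t_demo REBUILD;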

Is there an existing HIVE issue for this?


Regards,
kolla venkata srinivasa Rao


Index creation is failing from beeline when execution engine is set to Tez

2015-09-03 Thread venkata srinivasarao kolla
Hi Hive Team,

In Hive, with the execution engine set to Tez, when we tried to rebuild an
index on a table (which is not empty) using Beeline, index creation did not
happen. No error was thrown to the console either, but in the corresponding
Tez job logs we found the error below:

2015-08-24 18:08:43,999 WARN [AMShutdownThread]
org.apache.tez.dag.history.recovery.RecoveryService: Error when
closing summary stream
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on 
/tmp/hive-alti-test-01/tez_session_dir/573df26e-4a22-4bf5-ad5c-a0d72cdacec6/application_1439396922313_1732/recovery/1/application_1439396922313_1732.summary:
File does not exist. Holder DFSClient_NONMAPREDUCE-117450461_1 does
not have any open files. at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2956)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3027)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3007)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:641)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:484)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

Below is the set of commands we have used:

CREATE TABLE table02 (column1 String, column2 string, column3 int, column4 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH 'posts_us' OVERWRITE INTO TABLE table02;

CREATE INDEX table02_index ON TABLE table02 (column3)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX table02_index ON table02 REBUILD;

Has anyone seen this problem?


Regards,
Srinivas.