Re: Derby version used by Hive

2015-09-28 Thread kulkarni.swar...@gmail.com
Richard,

A quick eye-balling of the code doesn't show anything that could
potentially be a blocker for this upgrade. Also +1 on staying on the latest
and greatest. Please feel free to open up a JIRA and submit the patch.

Also, just out of curiosity, what are you actually using a Derby-backed store
for?

On Mon, Sep 28, 2015 at 11:02 AM, Richard Hillegas 
wrote:

>
>
> I haven't received a response to the following message, which I posted last
> week. Maybe my message rambled too much. Here is an attempt to pose my
> question more succinctly:
>
> Q: Does anyone know of any reason why we can't upgrade Hive's Derby version
> to 10.12.1.1, the new version being vetted by the Derby community right
> now?
>
> Thanks,
> -Rick
>
> > I am following the Hive build instructions here:
> >
>
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
> > .
> >
> > I noticed that Hive development seems to be using an old version of
> Derby:
> > 10.10.2.0. Is there some defect in the most recent Derby version
> > (10.11.1.1) which prevents Hive from upgrading to 10.11.1.1? The only
> > Hive-tagged Derby bug which I can find is
> > https://issues.apache.org/jira/browse/DERBY-6358. That issue doesn't
> seem
> > to be version-specific and it mentions a resolved Hive issue:
> > https://issues.apache.org/jira/browse/HIVE-8739.
> >
> > Staying with 10.10.2.0 makes sense if you need to run on some ancient
> JVMs:
> > Java SE 5 or Java ME CDC/Foundation Profile 1.1. Hadoop, however,
> requires
> > at least Java 6 according to
> > https://wiki.apache.org/hadoop/HadoopJavaVersions.
> >
> > Note that the Derby community expects to release version 10.12.1.1 soon:
> > https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease. This might be
> a
> > good opportunity for Hive to upgrade to a more capable version of Derby.
> >
> > I mention this because the Derby version used by Hive ends up on the
> > classpath used by downstream projects (like Spark). That makes it awkward
> > for downstream projects to use more current Derby versions. Do you know
> of
> > any reason that downstream projects shouldn't override the Derby version
> > currently preferred by Hive?
> >
> > Thanks,
> > -Rick
>



-- 
Swarnim


Re: Avro column type in Hive

2015-09-28 Thread kulkarni.swar...@gmail.com
Sergey,

Is your table a partitioned or a non-partitioned one? I have usually seen
this problem manifest itself for partitioned tables and that is mostly
where the pruning bites. So if you now try to add a partition to this
table, you might see an exception like:

java.sql.BatchUpdateException: Data truncation: Data too long for column
'TYPE_NAME' at row 1

The "TYPE_NAME" is not actually a definition of the Avro schema.  Instead,
it is a definition of the type structure in Hive terms.  I assume it is
used for things such as validating the query before it is executed, etc.
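If you do hit that truncation when adding partitions, one workaround I have seen discussed is widening the offending metastore column. This is purely a hedged sketch against a MySQL-backed metastore; verify the table and column names against your metastore schema version, and back up the metastore first:

```sql
-- Hypothetical workaround for a MySQL-backed metastore, where
-- TYPE_NAME is VARCHAR(4000) in the stock schema. Back up the
-- metastore and verify your schema version before running this.
ALTER TABLE COLUMNS_V2 MODIFY TYPE_NAME MEDIUMTEXT;
```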

On Mon, Sep 28, 2015 at 7:38 PM, Chaoyu Tang  wrote:

> Yes, when you described the avro table, what you got back was actually from
> your avro schema instead of the database table. The avro table is NOT
> considered a metastore-backed SerDe. Its columns are populated to the DB
> (e.g. HIVE-6308
> ) mainly for column
> statistics purposes, which obviously is not applicable to your case, where
> the type name is > 100kb.
>
> Chaoyu
>
> On Mon, Sep 28, 2015 at 8:12 PM, Sergey Shelukhin 
> wrote:
>
> > Hi.
> > I noticed that when I create an Avro table using a very large schema
> file,
> > mysql metastore silently truncates the TYPE_NAME in COLUMNS_V2 table to
> > the size of varchar (4000); however, when I do describe on the table, it
> > still displays the whole type name (around 100Kb long) that I presume it
> > gets from deserializer.
> > Is the value in TYPE_NAME used for anything for Avro tables?
> >
> >
>



-- 
Swarnim


Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

2015-09-17 Thread kulkarni.swar...@gmail.com
Congratulations! Well deserved!

On Thu, Sep 17, 2015 at 12:03 AM, Vikram Dixit K 
wrote:

> Congrats Ashutosh!
>
> On Wed, Sep 16, 2015 at 9:01 PM, Chetna C  wrote:
>
>> Congrats Ashutosh !
>>
>> Thanks,
>> Chetna Chaudhari
>>
>> On 17 September 2015 at 06:53, Navis Ryu  wrote:
>>
>> > Congratulations!
>> >
>> > 2015-09-17 9:35 GMT+09:00 Xu, Cheng A :
>> > > Congratulations, Ashutosh!
>> > >
>> > > -Original Message-
>> > > From: Mohammad Islam [mailto:misla...@yahoo.com.INVALID]
>> > > Sent: Thursday, September 17, 2015 8:23 AM
>> > > To: u...@hive.apache.org; Hive
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > >
>> > > Congratulations Asutosh!
>> > >
>> > >
>> > >  On Wednesday, September 16, 2015 4:51 PM, Bright Ling <
>> > brig...@hostworks.com.au> wrote:
>> > >
>> > >
>> > > Congratulations Asutosh!
>> > >
>> > > From: Sathi Chowdhury [mailto:sathi.chowdh...@lithium.com]
>> > > Sent: Thursday, 17 September 2015 8:04 AM
>> > > To: u...@hive.apache.org
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats Asutosh!
>> > >
>> > > From: Sergey Shelukhin
>> > > Reply-To: "u...@hive.apache.org"
>> > > Date: Wednesday, September 16, 2015 at 2:31 PM
>> > > To: "u...@hive.apache.org"
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats!
>> > >
>> > > From: Alpesh Patel 
>> > > Reply-To: "u...@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 13:24
>> > > To: "u...@hive.apache.org" 
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congratulations Ashutosh
>> >
>> > On Wed, Sep 16, 2015 at 1:23 PM, Pengcheng Xiong  wrote:
>> >
>> > Congratulations Ashutosh!
>> >
>> > On Wed, Sep 16, 2015 at 1:17 PM, John Pullokkaran  wrote:
>> >
>> > Congrats Ashutosh!
>> >
>> > From: Vaibhav Gumashta <
>> > vgumas...@hortonworks.com>
>> > > Reply-To: "u...@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 1:01 PM
>> > > To: "u...@hive.apache.org" , "
>> dev@hive.apache.org"
>> > 
>> > > Cc: Ashutosh Chauhan 
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats Ashutosh! —Vaibhav
>> >
>> > From: Prasanth Jayachandran <
>> > pjayachand...@hortonworks.com>
>> > > Reply-To: "u...@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 12:50 PM
>> > > To: "dev@hive.apache.org" , "
>> u...@hive.apache.org"
>> > 
>> > > Cc: "dev@hive.apache.org" , Ashutosh Chauhan <
>> > hashut...@apache.org>
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congratulations Ashutosh!
>> > >
>> > >  On Wed, Sep 16, 2015 at 12:48 PM -0700, "Xuefu Zhang" <
>> > xzh...@cloudera.com> wrote:
>> > >
>> > > Congratulations, Ashutosh! Well-deserved.
>> > >
>> > > Thanks to Carl also for the hard work in the past few years!
>> > >
>> > > --Xuefu
>> > >
>> > > On Wed, Sep 16, 2015 at 12:39 PM, Carl Steinbach 
>> wrote:
>> > >
>> > >> I am very happy to announce that Ashutosh Chauhan is taking over as
>> > >> the new VP of the Apache Hive project. Ashutosh has been a longtime
>> > >> contributor to Hive and has played a pivotal role 

Patches needing review

2015-09-10 Thread kulkarni.swar...@gmail.com
Hello all,

I have a couple of patches that have been submitted and out for review for
some time. If I can get some help getting them reviewed and merged, I would
greatly appreciate it!

HIVE-11691 (Wiki update for developer debugging. Already one +1 from Lefty)
HIVE-11647 (HBase dependency bump to 1.1.1)
HIVE-11609 (10-100x perf improvement on HBase comp key queries)
HIVE-11590 (Log updates to AvroSerDe. Already one +1)
HIVE-11560 (Fixing a passivity issue introduced by HIVE-8898)
HIVE-10708 (Support to proactively check for avro reader/writer schema
compatibility)

Thanks again for the help,
Swarnim


Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
Sanjeev,

I am going off this exception in the stack trace that you posted:

"at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)"

which definitely indicates that it's not very happy memory-wise. I would
definitely recommend bumping up the memory and seeing if it helps. If not,
we can debug further from there.
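For reference, bumping the HS2 heap is usually done through hive-env.sh. The sketch below is an assumption to verify against your install, since the exact variable names can vary by Hive version and distribution:

```shell
# Sketch for conf/hive-env.sh -- verify variable names for your version.
# Raise the HiveServer2 heap (here to 16 GB) and keep a heap dump on OOM
# so the retained objects can be inspected later.
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_HEAPSIZE=16384
  export HADOOP_OPTS="$HADOOP_OPTS -XX:+HeapDumpOnOutOfMemoryError \
    -XX:HeapDumpPath=/tmp/hs2_oom.hprof"
fi
```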

On Tue, Sep 8, 2015 at 12:17 PM, Sanjeev Verma <sanjeev.verm...@gmail.com>
wrote:

> What does this exception imply here? How do I identify the problem here?
> Thanks
>
> On Tue, Sep 8, 2015 at 10:44 PM, Sanjeev Verma <sanjeev.verm...@gmail.com>
> wrote:
>
>> We have 8GB HS2 java heap, we have not tried any bumping.
>>
>> On Tue, Sep 8, 2015 at 8:14 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> How much memory have you currently provided to HS2? Have you tried
>>> bumping that up?
>>>
>>> On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma <sanjeev.verm...@gmail.com
>>> > wrote:
>>>
>>>> *I am getting the following exception when the HS2 is crashing, any
>>>> idea why it is happening*
>>>>
>>>> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
>>>> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
>>>> at java.util.Arrays.copyOf(Arrays.java:2271)
>>>> Local Variable: byte[]#1
>>>> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
>>>> Stream.java:93)
>>>> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
>>>> Local Variable: byte[]#5378
>>>> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
>>>> ort.java:446)
>>>> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
>>>> ServerTransport.java:41)
>>>> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
>>>> rotocol.java:163)
>>>> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
>>>> ryProtocol.java:186)
>>>> Local Variable: byte[]#2
>>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>>> mnStandardScheme.write(TStringColumn.java:490)
>>>> Local Variable: java.util.ArrayList$Itr#1
>>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>>> mnStandardScheme.write(TStringColumn.java:433)
>>>> Local Variable: org.apache.hive.service.cli.th
>>>> rift.TStringColumn$TStringColumnStandardScheme#1
>>>> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
>>>> ngColumn.java:371)
>>>> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
>>>> teValue(TColumn.java:381)
>>>> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
>>>> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
>>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
>>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
>>>> at org.apache.thrift.TUnion.write(TUnion.java:152)
>>>>
>>>>
>>>>
>>>> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
>>>> kulkarni.swar...@gmail.com> wrote:
>>>>
>>>>> Sanjeev,
>>>>>
>>>>> One possibility is that you are running into[1] which affects hive
>>>>> 0.13. Is it possible for you to apply the patch on [1] and see if it fixes
>>>>> your problem?
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>>>>
>>>>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma <
>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>
>>>>>> We are using hive-0.13 with hadoop1.
>>>>>>
>>>>>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>>>>>> kulkarni.swar...@gmail.com> wrote:
>>>>>>
>>>>>>> Sanjeev,
>>>>>>>
>>>>>>> Can you tell me more details about your hive version/hadoop version
>>>>>>> etc.
>>>>>>>
>>>>>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Can somebody give me some pointers to look at?
>>>>>>>>
>>>>>>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>>>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> We are experiencing a strange problem with the hiveserver2: in one
>>>>>>>>> of the jobs it gets the GC limit exceeded from a mapred task and
>>>>>>>>> hangs even with enough heap available. We are not able to identify
>>>>>>>>> what is causing this issue.
>>>>>>>>> Could anybody help me identify the issue and let me know what
>>>>>>>>> pointers I need to look at.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Swarnim
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Swarnim
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>


-- 
Swarnim


Re: [DISCUSS] github integration

2015-09-08 Thread kulkarni.swar...@gmail.com
I personally am a big fan of pull requests, which is primarily the reason
for a similar proposal that I made almost a year and a half ago [1] :). I
think the consensus we reached at the time was to move the primary source
code from svn to git (which we did) but to still use patches submitted to
JIRAs, both to maintain a permalink to the changes and because it's a
little harder to treat a pull request as a patch.

[1] http://qnalist.com/questions/4754349/proposal-to-switch-to-pull-requests

On Tue, Sep 8, 2015 at 5:53 PM, Owen O'Malley  wrote:

> All,
>I think we should use the github integrations that Apache infra has
> introduced. You can read about it here:
>
>
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
>
> The big win from my point of view is that you can use github pull requests
> for doing reviews. All of the traffic from the pull request is sent to
> Apache email lists and vice versa.
>
> Thoughts?
>
>Owen
>



-- 
Swarnim


Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
How much memory have you currently provided to HS2? Have you tried bumping
that up?

On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma <sanjeev.verm...@gmail.com>
wrote:

> *I am getting the following exception when the HS2 is crashing, any idea
> why it is happening*
>
> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
> at java.util.Arrays.copyOf(Arrays.java:2271)
> Local Variable: byte[]#1
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
> Stream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
> Local Variable: byte[]#5378
> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
> ort.java:446)
> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
> ServerTransport.java:41)
> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
> rotocol.java:163)
> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
> ryProtocol.java:186)
> Local Variable: byte[]#2
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:490)
> Local Variable: java.util.ArrayList$Itr#1
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:433)
> Local Variable: org.apache.hive.service.cli.th
> rift.TStringColumn$TStringColumnStandardScheme#1
> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
> ngColumn.java:371)
> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
> teValue(TColumn.java:381)
> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
> at org.apache.thrift.TUnion.write(TUnion.java:152)
>
>
>
> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Sanjeev,
>>
>> One possibility is that you are running into[1] which affects hive 0.13.
>> Is it possible for you to apply the patch on [1] and see if it fixes your
>> problem?
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>
>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma <sanjeev.verm...@gmail.com
>> > wrote:
>>
>>> We are using hive-0.13 with hadoop1.
>>>
>>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> Sanjeev,
>>>>
>>>> Can you tell me more details about your hive version/hadoop version etc.
>>>>
>>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>>> sanjeev.verm...@gmail.com> wrote:
>>>>
>>>>> Can somebody give me some pointers to look at?
>>>>>
>>>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>> We are experiencing a strange problem with the hiveserver2: in one of
>>>>>> the jobs it gets the GC limit exceeded from a mapred task and hangs
>>>>>> even with enough heap available. We are not able to identify what is
>>>>>> causing this issue.
>>>>>> Could anybody help me identify the issue and let me know what
>>>>>> pointers I need to look at.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Swarnim
>>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: [ANNOUNCE] New Hive Committer - Lars Francke

2015-09-07 Thread kulkarni.swar...@gmail.com
Congrats!

On Mon, Sep 7, 2015 at 3:54 AM, Carl Steinbach  wrote:

> The Apache Hive PMC has voted to make Lars Francke a committer on the
> Apache Hive Project.
>
> Please join me in congratulating Lars!
>
> Thanks.
>
> - Carl
>
>


-- 
Swarnim


Patches needing review love

2015-08-21 Thread kulkarni.swar...@gmail.com
Hey all,

I have a couple of patches currently in review state that are either ready
to merge or need review. If someone can help me out with these, I would
really appreciate it.

HIVE-11513 (Ready to merge)
HIVE-5277 (Ready to merge)
HIVE-11559 (Needs review)
HIVE-11469 (Needs review)

Swarnim


Re: hiveserver2 hangs

2015-08-20 Thread kulkarni.swar...@gmail.com
Sanjeev,

One possibility is that you are running into[1] which affects hive 0.13. Is
it possible for you to apply the patch on [1] and see if it fixes your
problem?

[1] https://issues.apache.org/jira/browse/HIVE-10410

On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma sanjeev.verm...@gmail.com
wrote:

 We are using hive-0.13 with hadoop1.

 On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

 Sanjeev,

 Can you tell me more details about your hive version/hadoop version etc.

 On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma sanjeev.verm...@gmail.com
  wrote:

 Can somebody give me some pointers to look at?

 On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma 
 sanjeev.verm...@gmail.com wrote:

 Hi
 We are experiencing a strange problem with the hiveserver2: in one of
 the jobs it gets the GC limit exceeded from a mapred task and hangs even
 with enough heap available. We are not able to identify what is causing
 this issue.
 Could anybody help me identify the issue and let me know what pointers
 I need to look at.

 Thanks





 --
 Swarnim





-- 
Swarnim


Re: hiveserver2 hangs

2015-08-20 Thread kulkarni.swar...@gmail.com
Sanjeev,

Can you tell me more details about your hive version/hadoop version etc.

On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma sanjeev.verm...@gmail.com
wrote:

 Can somebody give me some pointers to look at?

 On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma sanjeev.verm...@gmail.com
 wrote:

 Hi
 We are experiencing a strange problem with the hiveserver2: in one of the
 jobs it gets the GC limit exceeded from a mapred task and hangs even with
 enough heap available. We are not able to identify what is causing this
 issue.
 Could anybody help me identify the issue and let me know what pointers I
 need to look at.

 Thanks





-- 
Swarnim


Re: [DISCUSS] Hive and HBase dependency

2015-08-14 Thread kulkarni.swar...@gmail.com
Yeah, I don't think HBase 1.0 vs 1.1 should really make a difference. It's
just that issues like [1] had me concerned that something within the
existing codebase might not work really well with 1.0.

[1] https://issues.apache.org/jira/browse/HIVE-10990

On Fri, Aug 14, 2015 at 1:47 PM, Alan Gates alanfga...@gmail.com wrote:

 My hope is to have it ready to merge into master soon (like in a few
 weeks). I don't think it will affect anything in the hive hbase integration
 other than we need to make sure we can work with the same version of
 hbase.  If we needed to move back to HBase 1.0 for that I think that would
 be ok.

 Alan.

 kulkarni.swar...@gmail.com
 August 14, 2015 at 11:12
 Thanks Alan. I created [1] to revert the non-passive changes from 1.x.

 Out of curiosity, what are your plans on merging the metastore branch to
 master? It seems like some coordination might be needed as some of the
 stuff in the hive hbase integration might need some massaging before that
 is done.

 [1] https://issues.apache.org/jira/browse/HIVE-11559




 --
 Swarnim
 Alan Gates alanfga...@gmail.com
 August 13, 2015 at 10:52
 On the hbase-metastore branch I've actually already moved to HBase 1.1.
 I'm +1 for moving to 1.1 or 1.0 on master and staying at 0.98 on branch-1.

 Alan.

 kulkarni.swar...@gmail.com
 August 12, 2015 at 8:43
 Hi all,

 It seems like our current dependency on HBase is a little fuzzy, to say
 the least. And with more features relying on HBase (HBase integration,
 HBase metastore, etc.), I think it would be worth giving some thought to
 how we want to manage this dependency. I have also seen regressions [1][2]
 come up recently because this dependency is not managed properly. Plus, we
 need to think about moving to HBase 1.0 soon as well, to take advantage of
 the backwards-compatibility guarantees that HBase is providing.

 Our current HBase dependency is 0.98.9. Also, with our current bifurcation
 of branches into a 1.x branch for stability and 2.x for bleeding edge, I
 propose that we keep the version at 0.98.9 on the 1.x branch and move to
 HBase 1.0 in our 2.0 branch. That way we can start taking advantage of the
 latest updates to the HBase API in our 2.x branch and still keep 1.x
 backwards compatible by avoiding a direct jump to HBase 1.0. If we decide
 to go this route, we might need to revert some of the compatibility-
 breaking changes [2] that sneaked into 1.x and move them over to 2.x.

 Thoughts?

 Thanks,
 Swarnim


 [1] https://issues.apache.org/jira/browse/HIVE-10990
 [2] https://issues.apache.org/jira/browse/HIVE-8898




-- 
Swarnim


Re: [DISCUSS] Hive and HBase dependency

2015-08-14 Thread kulkarni.swar...@gmail.com
Thanks Alan. I created [1] to revert the non-passive changes from 1.x.

Out of curiosity, what are your plans on merging the metastore branch to
master? It seems like some coordination might be needed as some of the
stuff in the hive hbase integration might need some massaging before that
is done.

[1] https://issues.apache.org/jira/browse/HIVE-11559

On Thu, Aug 13, 2015 at 12:52 PM, Alan Gates alanfga...@gmail.com wrote:

 On the hbase-metastore branch I've actually already moved to HBase 1.1.
 I'm +1 for moving to 1.1 or 1.0 on master and staying at 0.98 on branch-1.

 Alan.

 kulkarni.swar...@gmail.com
 August 12, 2015 at 8:43
 Hi all,

 It seems like our current dependency on HBase is a little fuzzy, to say
 the least. And with more features relying on HBase (HBase integration,
 HBase metastore, etc.), I think it would be worth giving some thought to
 how we want to manage this dependency. I have also seen regressions [1][2]
 come up recently because this dependency is not managed properly. Plus, we
 need to think about moving to HBase 1.0 soon as well, to take advantage of
 the backwards-compatibility guarantees that HBase is providing.

 Our current HBase dependency is 0.98.9. Also, with our current bifurcation
 of branches into a 1.x branch for stability and 2.x for bleeding edge, I
 propose that we keep the version at 0.98.9 on the 1.x branch and move to
 HBase 1.0 in our 2.0 branch. That way we can start taking advantage of the
 latest updates to the HBase API in our 2.x branch and still keep 1.x
 backwards compatible by avoiding a direct jump to HBase 1.0. If we decide
 to go this route, we might need to revert some of the compatibility-
 breaking changes [2] that sneaked into 1.x and move them over to 2.x.

 Thoughts?

 Thanks,
 Swarnim


 [1] https://issues.apache.org/jira/browse/HIVE-10990
 [2] https://issues.apache.org/jira/browse/HIVE-8898




-- 
Swarnim


[DISCUSS] Hive and HBase dependency

2015-08-12 Thread kulkarni.swar...@gmail.com
Hi all,

It seems like our current dependency on HBase is a little fuzzy, to say the
least. And with more features relying on HBase (HBase integration, HBase
metastore, etc.), I think it would be worth giving some thought to how we
want to manage this dependency. I have also seen regressions [1][2] come up
recently because this dependency is not managed properly. Plus, we need to
think about moving to HBase 1.0 soon as well, to take advantage of the
backwards-compatibility guarantees that HBase is providing.

Our current HBase dependency is 0.98.9. Also, with our current bifurcation
of branches into a 1.x branch for stability and 2.x for bleeding edge, I
propose that we keep the version at 0.98.9 on the 1.x branch and move to
HBase 1.0 in our 2.0 branch. That way we can start taking advantage of the
latest updates to the HBase API in our 2.x branch and still keep 1.x
backwards compatible by avoiding a direct jump to HBase 1.0. If we decide
to go this route, we might need to revert some of the compatibility-breaking
changes [2] that sneaked into 1.x and move them over to 2.x.

Thoughts?

Thanks,
Swarnim


[1] https://issues.apache.org/jira/browse/HIVE-10990
[2] https://issues.apache.org/jira/browse/HIVE-8898


Re: Hive column mapping to hbase

2015-08-05 Thread kulkarni.swar...@gmail.com
Sunile,

Starting with Hive 0.12, you can use prefixes to pull the columns matching
a given pattern within a column family [1]. So in your case, as long as you
have sensible prefixes (for example, if everything in July uses the form
july-DATE-speed), you can simply do something like WITH SERDEPROPERTIES
('hbase.columns.mapping' = ':key,columnfamily:july.*') and it will
automatically pull everything related to the july columns. That way you
don't have to define things statically.
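To make that concrete, a sketch of what such a table definition might look like (the table, family, and column names here are invented for illustration):

```sql
-- All HBase columns in family 'cf' whose qualifiers start with 'july'
-- are collected into one Hive map column, so a new qualifier such as
-- 'july9-speed' shows up in the map without any DDL change.
CREATE EXTERNAL TABLE sensor_readings (
  rowkey STRING,
  july_readings MAP<STRING, STRING>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:july.*')
TBLPROPERTIES ('hbase.table.name' = 'sensor_data');
```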

Hope that helps.

[1] https://issues.apache.org/jira/browse/HIVE-3725

On Tue, Aug 4, 2015 at 9:06 PM, Manjee, Sunile sunile.man...@teradata.com
wrote:


 I would appreciate any assistance.

 Hive forces me to predefine column mappings :  WITH SERDEPROPERTIES
 ('hbase.columns.mapping' = ':key,columnfamily:july8-speed')

 I need to create columns in hbase based on dates (which are values in my
 source) and append some other field like measurement. This will then make
 up my column name i.e. July8-speed. Predefining these seem senseless as I
 do not know which dates and/or measurements I will get from source data.
 Hive forces me to create a static mapping like I have shown above.

 Any assistance or insights would be appreciated.




-- 
Swarnim


Re: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan

2015-07-23 Thread kulkarni.swar...@gmail.com
Congrats Sushanth!

On Thu, Jul 23, 2015 at 3:40 PM, Eugene Koifman ekoif...@hortonworks.com
wrote:

 Congratulations!

 On 7/22/15, 9:45 AM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Sushanth Sowmyan has been elected to the
 Hive
 Project Management Committee. Please join me in congratulating Sushanth!
 
 Thanks.
 
 - Carl




-- 
Swarnim


Re: hbase column without prefix

2015-07-23 Thread kulkarni.swar...@gmail.com
Hey,

Just so that I understand your issue better, why do you think it should be

key: one, value: 0.5
key: two, value: 0.5

instead of

key: tag_one, value: 0.5
key: tag_two, value: 0.5

when you know that the prefix for your columns is "tag_"? Hive won't
really do anything but simply pull all the columns that start with the
given prefix and use them as the keys of your map, which is exactly what
you are seeing here.
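As a sketch of the behavior being described (table and column names invented for illustration):

```sql
-- A map column backed by the 'tag_' prefix. Given HBase cells
-- fam:tag_one=0.5 and fam:tag_two=0.5, Hive returns the map keys
-- as-is: 'tag_one' and 'tag_two' -- the prefix is not stripped.
CREATE EXTERNAL TABLE tagged_rows (
  rowkey STRING,
  tags MAP<STRING, DOUBLE>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,fam:tag_.*');
```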


On Wed, Jul 22, 2015 at 10:03 AM, Wojciech Indyk wojciechin...@gmail.com
wrote:

 Hi!
 I've created an issue https://issues.apache.org/jira/browse/HIVE-11329
 and need an advice is it a bug or should it be a new feature, e.g. a
 flag to enable somewhere in a table definition?
 I am eager to create a patch, however I need some help with design a
 work to do (e.g. which modules affect this thing).

 Kindly regards
 Wojciech Indyk




-- 
Swarnim


Re: hbase column without prefix

2015-07-23 Thread kulkarni.swar...@gmail.com
So let me ask you this. If we did not have the support for pulling data via
prefixes, there would be two options for us to pull this data. One, either
we provide just the column family name, like fam:, and let Hive pull
everything under that column family and stuff it in a map with the key
being the column name. Or, the other option would be to provide the column
names individually. In either case, the column prefixes would end up in the
Hive column name. My intent behind adding this support was to have a
shortcut that extended the existing support for pulling all columns via
family_name: to pulling just the columns that start with a given prefix.
Everything else should stay the same and consistent. That said, while I am
OK with adding a flag to hide the prefix in the column name, IMO it would
be confusing for someone to understand why in this particular case the
prefix is hidden but not in any other case.

Does that make sense?

On Thu, Jul 23, 2015 at 9:46 AM, Wojciech Indyk wojciechin...@gmail.com
wrote:

 Hello!

 Yes, but if I define a map prefix tag_ I don't want to receive the
 prefix for each element of the map. I know what the prefix for the map
 is. It is hard to join such data with other structures which don't
 have prefixes. All in all, it's easier to integrate data without
 prefixes. IMO prefixes are an artificial structure (like a 'super-column')
 to optimize queries and be able to store a map in hbase. That's why I
 want to cut the prefixes.

 What do you think about it? Does it make sense to you? Even if it's
 not a bug, it would be nice to have an option to hide prefixes in the
 keys of the map.

 Kindly regards
 Wojciech Indyk


 2015-07-23 16:32 GMT+02:00 kulkarni.swar...@gmail.com
 kulkarni.swar...@gmail.com:
  Hey,
 
  Just so that I understand your issue better, why do you think it should
 be
 
  key: one, value: 0.5
  key: two, value: 0.5
 
  instead of
 
  key: tag_one, value: 0.5
  key: tag_two, value: 0.5
 
  when you know that the prefixes for your columns are tag_. Hive won't
  really do anything but simply pull all the columns that start with the
  given prefix and add them to the key for your map which is exactly what
 you
  are seeing here.
 
 
  On Wed, Jul 22, 2015 at 10:03 AM, Wojciech Indyk 
 wojciechin...@gmail.com
  wrote:
 
  Hi!
  I've created an issue, https://issues.apache.org/jira/browse/HIVE-11329,
  and need advice: is it a bug, or should it be a new feature, e.g. a
  flag to enable somewhere in a table definition?
  I am eager to create a patch; however, I need some help with designing
  the work to do (e.g. which modules this change affects).
 
  Kindly regards
  Wojciech Indyk
 
 
 
 
  --
  Swarnim




-- 
Swarnim


Re: [DISCUSS] Supporting Hadoop-1 and experimental features

2015-05-22 Thread kulkarni.swar...@gmail.com
+1 on the new proposal. Feedback below:

 New features must be put into master.  Whether to put them into branch-1
is at the discretion of the developer.

How about we change this to: *All* features must be put into master.
Whether to put them into branch-1 is at the discretion of the *committer*.
The reason, I think, is that for us to sustain a happy and healthy
community going forward, it's imperative to make it easy not only for
users, but also for developers and committers, to contribute and commit
patches. As a Hive contributor, it would be hard for me to determine which
branch my code belongs in. Also, IMO (and I might be wrong), many
committers have their own areas of expertise, and it's very hard for them
to immediately determine which branch a patch should go to unless that is
documented very well somewhere. Putting all code into master would be an
easy approach to follow, and cherry-picking to other branches can be done
afterwards. So even if people forget to do that, we can always go back to
master and port the patches out to these branches. So we have a master
branch, branch-1 for stable code, branch-2 for experimental and
bleeding-edge code, and so on. Once branch-2 is stable, we deprecate
branch-1, create branch-3, and move on.

Another reason I say this is that, in my experience, a pretty significant
amount of work in Hive is still bug fixes, and I think that is what users
care most about (correctness above anything else). So with this approach,
it would be very obvious which branches to commit those fixes to.

On Fri, May 22, 2015 at 1:11 PM, Alan Gates alanfga...@gmail.com wrote:

 Thanks for your feedback Chris.  It sounds like there are a couple of
 reasonable concerns being voiced repeatedly:
 1) Fragmentation, the two branches will drift too far apart.
 2) Stagnation, branch-1 will effectively become a dead-end.

 So I modify the proposal as follows to deal with those:

 1) New features must be put into master.  Whether to put them into
 branch-1 is at the discretion of the developer.  The exception would be
 features that would not apply in master (e.g. say someone developed a way
 to double the speed of map reduce jobs Hive produces).  For example, I
 might choose to put the materialized view work I'm doing in both branch-1
 and master, but the HBase metastore work only in master.  This should avoid
 fragmentation by keeping branch-1 a subset of master.

 2) For the next 12 months we will port critical bug fixes (crashes,
 security issues, wrong results) to branch-1 as well as fixing them on
 master.  We might choose to lengthen this time depending on how stable
 master is and how fast the uptake is.  This avoids branch-1 being
 immediately abandoned by developers while users are still depending on it.

 Alan.

   Chris Drome cdr...@yahoo-inc.com.INVALID
  May 22, 2015 at 0:49
 I understand the motivation and benefits of creating a branch-2 where more
 disruptive work can go on without affecting branch-1. While not necessarily
 against this approach, from Yahoo's standpoint, I do have some questions
 (concerns).
 Upgrading to a new version of Hive requires a significant commitment of
 time and resources to stabilize and certify a build for deployment to our
 clusters. Given the size of our clusters and scale of datasets, we have to
 be particularly careful about adopting new functionality. However, at the
  same time, we are interested in testing and making available new
  features and functionality. That said, we would have to rely on branch-1
 for the immediate future.
 One concern is that branch-1 would be left to stagnate, at which point
 there would be no option but for users to move to branch-2 as branch-1
 would be effectively end-of-lifed. I'm not sure how long this would take,
 but it would eventually happen as a direct result of the very reason for
 creating branch-2.
 A related concern is how disruptive the code changes will be in branch-2.
  I imagine that changes early in branch-2 will be easy to backport to
  branch-1, while this effort will become more difficult, if not impractical,
  as time goes on. If the code bases diverge too much, then this could lead to
 more pressure for users of branch-1 to add features just to branch-1, which
 has been mentioned as undesirable. By the same token, backporting any code
 in branch-2 will require an increasing amount of effort, which contributors
 to branch-2 may not be interested in committing to.
 These questions affect us directly because, while we require a certain
 amount of stability, we also like to pull in new functionality that will be
 of value to our users. For example, our current 0.13 release is probably
 closer to 0.14 at this point. Given the lifespan of a release, it is often
 more palatable to backport features and bugfixes than to jump to a new
 version.

 The good thing about this proposal is the opportunity to evaluate and
  clean up a lot of the old code.
 Thanks,
 chris



 On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
 

Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang

2015-05-21 Thread kulkarni.swar...@gmail.com
Congrats Chaoyu!

On Thu, May 21, 2015 at 9:17 AM, Sergio Pena sergio.p...@cloudera.com
wrote:

 Congratulations Chaoyu !!!

 On Wed, May 20, 2015 at 5:29 PM, Carl Steinbach c...@apache.org wrote:

  The Apache Hive PMC has voted to make Chaoyu Tang a committer on the
 Apache
  Hive Project.
 
  Please join me in congratulating Chaoyu!
 
  Thanks.
 
  - Carl
 




-- 
Swarnim


Re: Questions related to HBase general use

2015-05-14 Thread kulkarni.swar...@gmail.com
+ hive-dev

Thanks for your question. We have recently been busy adding quite a few
features on top of the Hive/HBase integration to make it more stable and
easier to use. We also gave a talk very recently at HBaseCon 2015 showing
off the latest improvements; slides here[1]. Like Jerry mentioned, if you
run a regular query from Hive on an HBase table with billions of rows, it
is going to be slow, as it would trigger a full table scan. However, Hive
has smarts around filter pushdown, where the attributes in a WHERE clause
are pushed down and converted to scan ranges and filters to optimize the
scan. Plus, with the recent Hive on Spark uplift, I can see this
integration taking advantage of that as well.
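
To illustrate the pushdown point, a hedged sketch (the table and column
names are hypothetical): when the predicate is on the Hive column mapped to
the HBase row key, Hive can turn it into a bounded scan rather than a full
table scan.

```sql
-- Assuming "rowkey" is the Hive column mapped to :key in the table's
-- hbase.columns.mapping, this range predicate can be pushed down to
-- HBase as scan start/stop rows instead of scanning every row.
SELECT rowkey, metric_value
FROM hbase_metrics
WHERE rowkey >= 'user_1000' AND rowkey < 'user_2000';
```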

That said, we use this integration here daily over billions of rows to run
hundreds of queries without any issues. Since you mentioned that you are
already a big consumer of Hive, I would highly recommend giving this a
spin and reporting back with whatever issues you face so we can work on
making it even more stable.
this more stable.

Hope that helps.

Swarnim

[1]
https://docs.google.com/presentation/d/1K2A2NMsNbmKWuG02aUDxsLo0Lal0lhznYy8SB6HjC9U/edit#slide=id.p

On Wed, May 13, 2015 at 6:26 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 + Swarnim, who's expert on HBase/Hive integration.

 Yes, snapshots may be interesting for you. I believe Hive can access HBase
 timestamps, exposed as a virtual column. It's assumed across the whole
 row, however, not per cell.

 On Sun, May 10, 2015 at 9:14 PM, Jerry He jerry...@gmail.com wrote:

 Hi, Yong

 You have a good understanding of the benefit of HBase already.
 Generally speaking, HBase is suitable for real time read/write to your big
 data set.
 Regarding the HBase performance evaluation tool, the 'read' test use HBase
 'get'. For 1m rows, the test would issue 1m 'get' (and RPC) to the server.
 The 'scan' test scans the table and transfers the rows to the client in
 batches (e.g. 100 rows at a time), which will take shorter time for the
 whole test to complete for the same number of rows.
 The hive/hbase integration, as you said, needs more consideration.
 1) The performance.  Hive accesses HBase via the HBase client API, which
 involves going to the HBase server for all the data access. This will
 slow things down.
 There are a couple of things you can explore. e.g. Hive/HBase snapshot
 integration. This would provide direct access to HBase hfiles.
 2) In your email, you are interested in HBase's capability of storing
 multiple versions of data.  You need to consider if Hive supports this
 HBase feature, i.e., provide you access to multiple versions. As far as
 I can remember, it is not fully supported.

 Jerry


 On Thu, May 7, 2015 at 6:18 PM, java8964 java8...@hotmail.com wrote:

  Hi,
  I am kind of new to HBase. Currently our production run IBM BigInsight
 V3,
  comes with Hadoop 2.2 and HBase 0.96.0.
  We are mostly using HDFS and Hive/Pig for our BigData project, it works
  very good for our big datasets. Right now, we have a one dataset needs
 to
  be loaded from Mysql, about 100G, and will have about Gs change daily.
 This
  is a very important slow change dimension data, we like to sync between
  Mysql and BigData platform.
  I am thinking of using HBase to store it, instead of refreshing the
 whole
  dataset in HDFS, due to:
  1) HBase makes the merge the change very easy.2) HBase could store all
 the
  changes in the history, as a function out of box. We will replicate all
 the
  changes from the binlog level from Mysql, and we could keep all changes
 in
  HBase (or long history), then it can give us some insight that cannot be
  done easily in HDFS.3) HBase could give us the benefit to access the
 data
  by key fast, for some cases.4) HBase is available out of box.
  What I am not sure is the Hive/HBase integration. Hive is the top tool
 in
  our environment. If one dataset stored in Hbase (even only about 100G as
  now), the join between it with the other Big datasets in HDFS worries
 me. I
  read quite some information about Hive/HBase integration, and feel that
 it
  is not really mature, as not too many usage cases I can find online,
  especially on performance. There are quite some JIRAs related to make
 Hive
  utilize the HBase for performance in MR job are still pending.
  I want to know other people experience to use HBase in this way. I
  understand HBase is not designed as a storage system for Data Warehouse
  component or analytics engine. But the benefits to use HBase in this
 case
  still attractive me. If my use cases of HBase is mostly read or full
 scan
  the data, how bad it is compared to HDFS in the same cluster? 3x? 5x?
  To help me understand the read throughput of HBase, I use the HBase
  performance evaluation tool, but the output is quite confusing. I have 2
  clusters, one is with 5 nodes with 3 slaves all running on VM (Each with
  24G + 4 cores, so cluster has 12 mappers + 6 reducers), another is real
  cluster with 5 nodes with 3 slaves with 64G + 24 cores and with (48
 mapper
  

Re: JIRA notifications

2015-05-14 Thread kulkarni.swar...@gmail.com
Also, not sure if it's related, but RB seems to have been pretty sluggish
lately for me too. It takes forever for a patch to be submitted and a
review request created (the latest one has been running for the past 30
minutes with no output).

On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 By the way, we still need to add iss...@hive.apache.org to the
 website's Mailing
 Lists http://hive.apache.org/mailing_lists.html page -- see HIVE-10124
 https://issues.apache.org/jira/browse/HIVE-10124.

 -- Lefty

 On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz leftylever...@gmail.com
 wrote:

  But some notifications and comments aren't making it onto any Hive
 mailing
  list -- see INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221
 (please
  add your own comments and examples).  This means the mail archives don't
  have a complete record of JIRA activity.
 
  -- Lefty
 
  On Wed, May 13, 2015 at 10:03 AM, Thejas Nair thejas.n...@gmail.com
  wrote:
 
  comments now added go to iss...@hive.apache.org .
  emails for JIRAs created should still go to dev@
 
 
  On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com
  kulkarni.swar...@gmail.com wrote:
   I noticed that I haven't been getting notifications(or they are really
   delayed) on any of the new JIRAs created/ comments added. Anyone else
   noticing similar issues as well?
  
   --
   Swarnim
 
 
 




-- 
Swarnim


[DISCUSS] Hive API passivity

2015-05-14 Thread kulkarni.swar...@gmail.com
While reviewing some of the recent patches, I came across a few with
non-passive changes and/or discussion around them. I was wondering what
kind of passivity guarantees we should provide to our consumers. I
understand that the Hive API is probably not as widely used as some of its
peers in the ecosystem, like HBase. But should that be something we start
thinking about, especially around user-facing interfaces like UDFs,
SerDes, StorageHandlers, etc.? More so given that we are 1.0 now?
IMO we should avoid making any such changes, and if we have to make them,
do so with a major version bump for the next release.

Thoughts?

-- 
Swarnim


Re: JIRA notifications

2015-05-14 Thread kulkarni.swar...@gmail.com
Yeah, I was having issues with both the manual method and with rbt. But it
seems like things are back to normal now.

Thanks guys!
On May 14, 2015 12:51 PM, Alexander Pivovarov apivova...@gmail.com
wrote:

 You can use the following command to create new review. It takes about 3-5
 sec
 $ rbt post -g yes

 To update the review you can run.
 $ rbt post -u -g yes

 On Thu, May 14, 2015 at 10:48 AM, Prasanth Jayachandran 
 pjayachand...@hortonworks.com wrote:

  @Swarnim..
  Generating a patch with git diff needs to include the full index for it to
  be uploaded to Review Board: "git diff --full-index".
  https://code.google.com/p/reviewboard/issues/detail?id=3115
 
  - Prasanth
 
   On May 14, 2015, at 9:14 AM, Thejas Nair thejas.n...@gmail.com
 wrote:
  
   Now that we have moved to git, you can try using github pull request
  instead.
   It also  integrates with jira.
   More git instructions - http://accumulo.apache.org/git.html
  
  
   On Thu, May 14, 2015 at 8:01 AM, kulkarni.swar...@gmail.com
   kulkarni.swar...@gmail.com wrote:
   Also not sure if it's related but seems like RB has been pretty
 sluggish
   lately too for me. It takes forever for a patch to submitted and a
  review
   request created(the latest one is still running for past 30 minutes
  with no
   output)
  
   On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz 
  leftylever...@gmail.com
   wrote:
  
   By the way, we still need to add iss...@hive.apache.org to the
   website's Mailing
   Lists http://hive.apache.org/mailing_lists.html page -- see
  HIVE-10124
   https://issues.apache.org/jira/browse/HIVE-10124.
  
   -- Lefty
  
   On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz 
  leftylever...@gmail.com
   wrote:
  
   But some notifications and comments aren't making it onto any Hive
   mailing
   list -- see INFRA-9221 
  https://issues.apache.org/jira/browse/INFRA-9221
   (please
   add your own comments and examples).  This means the mail archives
  don't
   have a complete record of JIRA activity.
  
   -- Lefty
  
   On Wed, May 13, 2015 at 10:03 AM, Thejas Nair 
 thejas.n...@gmail.com
   wrote:
  
   comments now added go to iss...@hive.apache.org .
   emails for JIRAs created should still go to dev@
  
  
   On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com
   kulkarni.swar...@gmail.com wrote:
   I noticed that I haven't been getting notifications(or they are
  really
   delayed) on any of the new JIRAs created/ comments added. Anyone
  else
   noticing similar issues as well?
  
   --
   Swarnim
  
  
  
  
  
  
  
   --
   Swarnim
 
 



JIRA notifications

2015-05-13 Thread kulkarni.swar...@gmail.com
I noticed that I haven't been getting notifications (or they are really
delayed) for any of the new JIRAs created or comments added. Is anyone
else noticing similar issues as well?

-- 
Swarnim


[DISCUSS] Hive/HBase Integration

2015-05-09 Thread kulkarni.swar...@gmail.com
Hello all,

So last week, Brock Noland, Nick Dimiduk, and I got a chance to present
some of the work we have been doing in the Hive/HBase integration space at
HBaseCon 2015 (slides here[1] for anyone interested). One of the
interesting things we noted was that even though this was an HBase
conference, *SQL on HBase* was by far the most popular theme, with talks
on Apache Phoenix, Trafodion, Apache Kylin, and Apache Drill, and a
SQL-on-HBase panel comparing these and other technologies.

I personally feel that with the existing work we have come a long way, but
we still have more to do, and this integration needs more love to become a
top-notch feature of Hive. However, I was curious to know what the
community thinks about it, and where they see this integration standing in
the time to come compared with all the other upcoming technologies.

Thanks,
Swarnim

[1]
https://docs.google.com/presentation/d/1K2A2NMsNbmKWuG02aUDxsLo0Lal0lhznYy8SB6HjC9U/edit#slide=id.p


Re: [ANNOUNCE] New Hive Committer - Alex Pivovarov

2015-05-04 Thread kulkarni.swar...@gmail.com
Congratulations Alex!!

On Thu, Apr 30, 2015 at 2:49 PM, Sergey Shelukhin ser...@hortonworks.com
wrote:

 Congratulations!

 On 15/4/29, 17:57, Jimmy Xiang jxi...@cloudera.com wrote:

 Congrats!!
 
 On Wed, Apr 29, 2015 at 5:48 PM, Xu, Cheng A cheng.a...@intel.com
 wrote:
 
  Congratulations Alex!
 
  -Original Message-
  From: Lefty Leverenz [mailto:leftylever...@gmail.com]
  Sent: Thursday, April 30, 2015 8:46 AM
  To: dev@hive.apache.org
  Subject: Re: [ANNOUNCE] New Hive Committer - Alex Pivovarov
 
  Congratulations Alex!
 
  -- Lefty
 
  On Wed, Apr 29, 2015 at 8:41 PM, Vaibhav Gumashta 
  vgumas...@hortonworks.com
   wrote:
 
   Congrats Alex!
  
  
  
  
  
   On Wed, Apr 29, 2015 at 5:26 PM -0700, Alexander Pivovarov 
   apivova...@gmail.commailto:apivova...@gmail.com wrote:
  
   Thank you Everyone!
   Do you know where I can get my lightsaber?
  
   On Wed, Apr 29, 2015 at 1:19 PM, Thejas Nair thejas.n...@gmail.com
   wrote:
  
Congrats Alex!
   
On Wed, Apr 29, 2015 at 12:37 PM, Jason Dere jd...@hortonworks.com
 
wrote:
 Congrats Alex!

 On Apr 29, 2015, at 12:35 PM, Chao Sun c...@cloudera.com
  wrote:

 Congrats Alex! Well done!

 On Wed, Apr 29, 2015 at 12:32 PM, Prasanth Jayachandran 
 pjayachand...@hortonworks.com wrote:

 Congratulations Alex!

 On Apr 29, 2015, at 12:17 PM, Eugene Koifman 
ekoif...@hortonworks.com
 wrote:

 Congratulations!

 On 4/29/15, 12:14 PM, Carl Steinbach c...@apache.org wrote:

 The Apache Hive PMC has voted to make Alex Pivovarov a
 committer on
the
 Apache Hive Project.

 Please join me in congratulating Alex!

 Thanks.

 - Carl





 --
 Best,
 Chao

   
  
 




-- 
Swarnim


Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-15 Thread kulkarni.swar...@gmail.com
Congratulations!!

On Wed, Apr 15, 2015 at 10:57 AM, Viraj Bhat vi...@yahoo-inc.com.invalid
wrote:

 Mithun Congrats!!
 Viraj

   From: Carl Steinbach c...@apache.org
  To: dev@hive.apache.org; u...@hive.apache.org; mit...@apache.org
  Sent: Tuesday, April 14, 2015 2:54 PM
  Subject: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

 The Apache Hive PMC has voted to make Mithun Radhakrishnan a committer on
 the Apache Hive Project.
 Please join me in congratulating Mithun.
 Thanks.
 - Carl







-- 
Swarnim


Re: Can anyone review dayofyear UDF (HIVE-3378)?

2015-04-09 Thread kulkarni.swar...@gmail.com
Alexander,

I reviewed your code and left a few suggestions on how to possibly simplify
it (if I understood your implementation correctly). Let me know if they
don't make sense to you.

On Wed, Apr 8, 2015 at 12:34 PM, Alexander Pivovarov apivova...@gmail.com
wrote:

 https://issues.apache.org/jira/browse/HIVE-3378

 https://reviews.apache.org/r/32732/




-- 
Swarnim


Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin

2015-02-27 Thread kulkarni.swar...@gmail.com
Congratulations Sergey! Well deserved!

On Fri, Feb 27, 2015 at 1:51 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Congratulations and keep up the great work!

 +Vinod

 On Feb 25, 2015, at 8:43 AM, Carl Steinbach c...@apache.org wrote:

  I am pleased to announce that Sergey Shelukhin has been elected to the
 Hive Project Management Committee. Please join me in congratulating Sergey!
 
  Thanks.
 
  - Carl
 




-- 
Swarnim


Re: [ANNOUNCE] New Hive Committer - Eugene Koifman

2014-09-15 Thread kulkarni.swar...@gmail.com
Congratulations! Nice Job!

On Mon, Sep 15, 2014 at 2:54 AM, Damien Carol dca...@blitzbs.com wrote:

  Congratulations, Eugene.

  Damien CAROL

- tél : +33 (0)4 74 96 88 14
- fax : +33 (0)4 74 96 31 88
- email : dca...@blitzbs.com

 BLITZ BUSINESS SERVICE
  Le 14/09/2014 09:23, Thejas Nair a écrit :

 Congrats Eugene!


 On Sat, Sep 13, 2014 at 8:26 AM, Ted Yu yuzhih...@gmail.com 
 yuzhih...@gmail.com wrote:

  Congratulations, Eugene.





-- 
Swarnim


Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread kulkarni.swar...@gmail.com
Congratulations guys!


On Mon, Jun 23, 2014 at 2:09 AM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Bravo, Szehon and Gopal!

 -- Lefty


 On Mon, Jun 23, 2014 at 12:53 AM, Gopal V gop...@apache.org wrote:

  On 6/22/14, 8:42 PM, Carl Steinbach wrote:
 
  The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho
  committers on the Apache Hive Project.
 
 
  Thanks everyone! And congrats Szehon!
 
  Cheers,
  Gopal
 




-- 
Swarnim


Re: Documentation Policy

2014-06-14 Thread kulkarni.swar...@gmail.com
A few more from older releases:

*0.10*:
https://issues.apache.org/jira/browse/HIVE-2397?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC10%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

*0.11:*
https://issues.apache.org/jira/browse/HIVE-3073?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC11%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

*0.12:*
https://issues.apache.org/jira/browse/HIVE-5161?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC12%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

Should we create JIRAs for these so that the work to be done on them does
not get lost?



On Fri, Jun 13, 2014 at 5:59 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Agreed, deleting TODOC## simplifies the labels field, so we should just use
 comments to keep track of docs done.

 Besides, doc tasks can get complicated -- my gmail inbox has a few messages
 with simultaneous done and to-do labels -- so comments are best for
 tracking progress.  Also, as Szehon noticed, links in the comments make it
 easy to find the docs.

 +1 on (a):  delete TODOCs when done; don't add any new labels.

 -- Lefty


 On Fri, Jun 13, 2014 at 1:31 PM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

  +1 on deleting the TODOC tag as I think it's assumed by default that once
  an enhancement is done, it will be doc'ed. We may consider adding an
  additional docdone tag but I think we can instead just wait for a +1
 from
  the contributor that the documentation is satisfactory (and assume a
  implicit +1 for no reply) before deleting the TODOC tag.
 
 
  On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho sze...@cloudera.com wrote:
 
   Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and
 confuses
   the state of a JIRA, so its probably best to remove it.
  
   The idea of docdone is to query what docs got produced and needs
  review?
It might be nice to have a tag for that, to easily signal to
 contributor
   or interested parties to take a look.
  
   On a side note, I already find very helpful your JIRA comments with
 links
   to doc-wikis, both to inform the contributor and just as reference for
   others.  Thanks again for the great work.
  
  
   On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz 
 leftylever...@gmail.com
  
   wrote:
  
One more question:  what should we do after the documentation is done
   for a
JIRA ticket?
   
(a) Just remove the TODOC## label.
(b) Replace TODOC## with docdone (no caps, no version number).
(c) Add a docdone label but keep TODOC##.
(d) Something else.
   
   
-- Lefty
   
   
On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com
   wrote:
   
 Thank you guys! This is great work.


 On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

  Going through the issues, I think overall Lefty did an awesome
 job
 catching
  and documenting most of them in time. Following are some of the
  0.13
and
  0.14 ones which I found which either do not have documentation or
   have
  outdated one and probably need one to be consumeable.
 Contributors,
feel
  free to remove the label if you disagree.
 
  *TODOC13:*
 
 

   
  
 
 https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
 
  *TODOC14:*
 
 

   
  
 
 https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
 
  I'll continue digging through the queue going backwards to 0.12
 and
0.11
  and see if I find similar stuff there as well.
 
 
 
  On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com 
  kulkarni.swar...@gmail.com wrote:
 
Feel free to label such jiras with this keyword and ask the
  contributors
   for more information if you need any.
  
   Cool. I'll start chugging through the queue today adding labels
  as
apt.
  
  
   On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair 
   the...@hortonworks.com

   wrote:
  
Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
   Sounds good to me.
  
   --
   CONFIDENTIALITY NOTICE
   NOTICE: This message is intended for the use of the individual
  or
 entity
   to
   which it is addressed and may contain information that is
 confidential,
   privileged and exempt from disclosure under applicable law. If
  the
  reader
   of this message is not the intended recipient, you are hereby
notified
   that
   any printing, copying, dissemination, distribution, disclosure
  or
   forwarding of this communication is strictly prohibited. If
 you
   have

Re: Documentation Policy

2014-06-13 Thread kulkarni.swar...@gmail.com
+1 on deleting the TODOC tag, as I think it's assumed by default that once
an enhancement is done, it will be documented. We may consider adding an
additional docdone tag, but I think we can instead just wait for a +1 from
the contributor that the documentation is satisfactory (and assume an
implicit +1 on no reply) before deleting the TODOC tag.


On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho sze...@cloudera.com wrote:

 Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and confuses
 the state of a JIRA, so its probably best to remove it.

 The idea of docdone is to query what docs got produced and needs review?
  It might be nice to have a tag for that, to easily signal to contributor
 or interested parties to take a look.

 On a side note, I already find very helpful your JIRA comments with links
 to doc-wikis, both to inform the contributor and just as reference for
 others.  Thanks again for the great work.


 On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz leftylever...@gmail.com
 wrote:

  One more question:  what should we do after the documentation is done
 for a
  JIRA ticket?
 
  (a) Just remove the TODOC## label.
  (b) Replace TODOC## with docdone (no caps, no version number).
  (c) Add a docdone label but keep TODOC##.
  (d) Something else.
 
 
  -- Lefty
 
 
  On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com
 wrote:
 
   Thank you guys! This is great work.
  
  
   On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com 
   kulkarni.swar...@gmail.com wrote:
  
Going through the issues, I think overall Lefty did an awesome job
   catching
and documenting most of them in time. Following are some of the 0.13
  and
0.14 ones which I found which either do not have documentation or
 have
outdated one and probably need one to be consumeable. Contributors,
  feel
free to remove the label if you disagree.
   
*TODOC13:*
   
   
  
 
 https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
   
*TODOC14:*
   
   
  
 
 https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
   
I'll continue digging through the queue going backwards to 0.12 and
  0.11
and see if I find similar stuff there as well.
   
   
   
On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:
   
  Feel free to label such jiras with this keyword and ask the
contributors
 for more information if you need any.

 Cool. I'll start chugging through the queue today adding labels as
  apt.


 On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair 
 the...@hortonworks.com
  
 wrote:

  Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
 Sounds good to me.





 --
 Swarnim

   
   
   
--
Swarnim
   
  
 




-- 
Swarnim


Re: Documentation Policy

2014-06-11 Thread kulkarni.swar...@gmail.com
 Feel free to label such jiras with this keyword and ask the contributors
for more information if you need any.

Cool. I'll start chugging through the queue today, adding labels as appropriate.


On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote:

  Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
 Sounds good to me.





-- 
Swarnim


Re: Documentation Policy

2014-06-11 Thread kulkarni.swar...@gmail.com
Going through the issues, I think overall Lefty did an awesome job of
catching and documenting most of them in time. Following are some of the
0.13 and 0.14 issues I found that either have no documentation or have
outdated documentation, and probably need it to be consumable.
Contributors, feel free to remove the label if you disagree.

*TODOC13:*
https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)

*TODOC14:*
https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)

I'll continue digging through the queue going backwards to 0.12 and 0.11
and see if I find similar stuff there as well.



On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:

  Feel free to label such jiras with this keyword and ask the contributors
 for more information if you need any.

 Cool. I'll start chugging through the queue today adding labels as apt.


 On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com
 wrote:

  Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
 Sounds good to me.





 --
 Swarnim




-- 
Swarnim


Re: Documentation Policy

2014-06-10 Thread kulkarni.swar...@gmail.com
 Writing documentation sooner rather than later is likely to increase the
chances of things getting documented.

Big +1 on this. I think missing documentation is one of our major
technical debts (I personally have quite a bit for the patches I
contributed). IMHO committers may choose to reject patches that don't have
usage documentation if they include significant work which practically
cannot be consumed without proper documentation.

Slightly tangential, but how do we choose to add this to some of the
already resolved JIRAs that are missing documentation? I can volunteer to
dig through our JIRA queue and find some of these, but I would probably
need some help from the contributors on these to be sure that they are
doc'ed properly. :)


On Tue, Jun 10, 2014 at 5:33 PM, Thejas Nair the...@hortonworks.com wrote:

  Also, I don't think we need to wait for end of the release cycle to start
  documenting features for the next release.
 
 
  Agreed, but I think we should wait until the next release is less than
 two
  months away.  What do other people think?

 We have been releasing almost every 3-4 months. So that is the longest
 un-released version documentation would be in the docs.
 Writing documentation sooner rather than later is likely to increase
 the chances of things getting documented. It is easier to get details
 from developers while the details are still fresh in their minds. It
 would also even out the load on documentation volunteers over time.





-- 
Swarnim


Re: Fixing Trunk tests and getting stable nightly build on b.a.o

2014-05-30 Thread kulkarni.swar...@gmail.com
Hi Lewis,

Are there any specific tests that you are seeing trouble with? If so,
please feel free to log appropriate JIRAs to get them fixed (and submit
patches ;) )

There is a developer guide[1] that explains in quite some detail how to run
the tests.

 Are there any suggested EXTRA_PARAMETERS to be passing in to the JVM when
 invoking a build?

I think it depends. If you are running the full suite, most probably yes.
You might need to do something like: export MAVEN_OPTS="-Xmx2g
-XX:MaxPermSize=256M".

Hope that helps.

[1] https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ


On Thu, May 29, 2014 at 6:28 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Folks,
 Is there any interest in getting a stable build on builds.apache.org?
 I just checked out trunk code and it appears broken out of the box...
 It also eats up memory like mad.
 Are there any suggested EXTRA_PARAMETERS to be passing in to the JVM when
 invoking a build?
 Thanks
 Lewis

 --
 *Lewis*




-- 
Swarnim


Re: Bumping a few JIRAs

2014-03-20 Thread kulkarni.swar...@gmail.com
I am also definitely willing to help out with reviewing the JIRAs. Just my
+1 won't matter much as it is non-binding. :)


On Thu, Mar 20, 2014 at 1:21 PM, Lefty Leverenz leftylever...@gmail.com wrote:

 I gave HIVE-6331 (https://issues.apache.org/jira/browse/HIVE-6331) a +1
 and asked for a trivial fix in HIVE-5652
 (https://issues.apache.org/jira/browse/HIVE-5652); then I'll give it a +1
 also.

 But I can't vouch for the technical information except as far as common
 sense takes me.  So can someone else take a look too?

 -- Lefty


 On Thu, Mar 20, 2014 at 12:46 PM, Xuefu Zhang xzh...@cloudera.com wrote:

  Thanks for reaching out. Since the first two were initially reviewed by
  Lefty, it's better to get her +1 in order to be committed if she's
  available.
 
  I can take a look at HIVE-6510.
 
  Thanks,
  Xuefu
 
 
  On Thu, Mar 20, 2014 at 9:32 AM, Lars Francke lars.fran...@gmail.com
  wrote:
 
   Hi,
  
   I have submitted a couple of minor JIRAs with patches but nothing has
   happened for months. I'd like to get these in if possible.
  
   Is there anything I can do to help that process?
  
   https://issues.apache.org/jira/browse/HIVE-5652
   https://issues.apache.org/jira/browse/HIVE-6331
   https://issues.apache.org/jira/browse/HIVE-6510
  
   Thanks for your help.
  
   Cheers,
   Lars
  
 




-- 
Swarnim


Re: Bumping a few JIRAs

2014-03-20 Thread kulkarni.swar...@gmail.com
Left few minor comments on the JIRAs


On Thu, Mar 20, 2014 at 2:42 PM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:

 I am also definitely willing to help out with reviewing the JIRAs. Just my
 +1 won't matter much as it is non-binding. :)


 On Thu, Mar 20, 2014 at 1:21 PM, Lefty Leverenz 
 leftylever...@gmail.com wrote:

 I gave HIVE-6331 (https://issues.apache.org/jira/browse/HIVE-6331) a +1
 and asked for a trivial fix in HIVE-5652
 (https://issues.apache.org/jira/browse/HIVE-5652); then I'll give it a +1
 also.

 But I can't vouch for the technical information except as far as common
 sense takes me.  So can someone else take a look too?

 -- Lefty


 On Thu, Mar 20, 2014 at 12:46 PM, Xuefu Zhang xzh...@cloudera.com
 wrote:

  Thanks for reaching out. Since the first two were initially reviewed by
  Lefty, it's better to get her +1 in order to be committed if she's
  available.
 
  I can take a look at HIVE-6510.
 
  Thanks,
  Xuefu
 
 
  On Thu, Mar 20, 2014 at 9:32 AM, Lars Francke lars.fran...@gmail.com
  wrote:
 
   Hi,
  
   I have submitted a couple of minor JIRAs with patches but nothing has
   happened for months. I'd like to get these in if possible.
  
   Is there anything I can do to help that process?
  
   https://issues.apache.org/jira/browse/HIVE-5652
   https://issues.apache.org/jira/browse/HIVE-6331
   https://issues.apache.org/jira/browse/HIVE-6510
  
   Thanks for your help.
  
   Cheers,
   Lars
  
 




 --
 Swarnim




-- 
Swarnim


Re: Proposal to switch to pull requests

2014-03-07 Thread kulkarni.swar...@gmail.com
+1


On Fri, Mar 7, 2014 at 1:05 AM, Thejas Nair the...@hortonworks.com wrote:

 Should we start with moving our primary source code repository from
 svn to git ? I feel git is more powerful and easy to use (once you go
 past the learning curve!).


 On Wed, Mar 5, 2014 at 7:39 AM, Brock Noland br...@cloudera.com wrote:
  Personally I prefer the Github workflow, but I believe there have been
  some challenges with that since the source for apache projects must be
  stored in apache source control (git or svn).
 
  Relevant:
 https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
 
  On Wed, Mar 5, 2014 at 9:19 AM, kulkarni.swar...@gmail.com
  kulkarni.swar...@gmail.com wrote:
  Hello,
 
  Since we have a nice mirrored git repository for hive[1], any specific
  reason why we can't switch to doing pull requests instead of patches?
 IMHO
  pull requests are awesome for peer review plus it is also very easy to
 keep
  track of JIRAs with open pull requests instead of looking for JIRAs in a
  Patch Available state. Also since they get updated automatically, it
 is
  also very easy to see if a review comment made by a reviewer was
 addressed
  properly or not.
 
  Thoughts?
 
  Thanks,
 
  [1] https://github.com/apache/hive
 
  --
  Swarnim
 
 
 
  --
  Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org





-- 
Swarnim


Proposal to switch to pull requests

2014-03-05 Thread kulkarni.swar...@gmail.com
Hello,

Since we have a nice mirrored git repository for Hive[1], is there any
specific reason why we can't switch to doing pull requests instead of
patches? IMHO pull requests are awesome for peer review, plus it is very
easy to keep track of JIRAs with open pull requests instead of looking for
JIRAs in a Patch Available state. Also, since pull requests get updated
automatically, it is very easy to see whether a review comment made by a
reviewer was addressed properly or not.

Thoughts?

Thanks,

[1] https://github.com/apache/hive

-- 
Swarnim


Re: [Discuss] project chop up

2013-08-07 Thread kulkarni.swar...@gmail.com
 I'd like to propose we move towards Maven.

Big +1 on this. Most of the major Apache projects (Hadoop, HBase, Avro, etc.)
are Maven based.

Also, I can't agree more that the current build system is frustrating, to say
the least. Another issue I have with the existing Ant-based system is that
there are no checkpointing capabilities[1]. So if a 6-hour build fails after
5 hours 30 minutes, most things, even though built successfully, have to be
rebuilt, which is very time consuming. Maven reactors have built-in support
for a lot of this.

[1] https://issues.apache.org/jira/browse/HIVE-3449.


On Wed, Aug 7, 2013 at 2:06 PM, Brock Noland br...@cloudera.com wrote:

 Thus far there hasn't been any dissent to managing our modules with maven.
  In addition there have been several comments positive on a move towards
 maven. I'd like to add Ivy seems to have issues managing multiple versions
 of libraries. For example, in HIVE-3632 the Ivy cache had to be cleared when
 testing patches that installed the new version of DataNucleus. I have had
 the same issue on HIVE-4388. Requiring the deletion of the Ivy cache
 is extremely painful for developers that don't have access to high
 bandwidth connections or live in areas far from California where most of
 these jars are hosted.

 I'd like to propose we move towards Maven.


 On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam misla...@yahoo.com
 wrote:

 
 
  Yes hive build and test cases got convoluted as the project scope
  gradually increased. This is the time to take action!
 
  Based on my other Apache experiences, I prefer the option #3 Breakup the
  projects within our own source tree. Make multiple modules or
  sub-projects. By default, only key modules will be built.
 
  Maven could be a possible candidate.
 
  Regards,
  Mohammad
 
 
 
  
   From: Edward Capriolo edlinuxg...@gmail.com
  To: dev@hive.apache.org dev@hive.apache.org
  Sent: Saturday, July 27, 2013 7:03 AM
  Subject: Re: [Discuss] project chop up
 
 
  Or feel free to suggest different approach. I am used to managing
 software
  as multi-module maven projects.
  From a development standpoint if I was working on beeline, it would be
 nice
  to only require some of the sub-projects to be open in my IDE to do that.
  Also managing everything globally is not ideal.
 
  Hive's project layout, build, and test infrastructure is just funky. It
 has
  to do a few interesting things (shims, testing), but I do not think what
 we
  are doing justifies the massive ant build system we have. Ant is so ten
  years ago.
 
 
 
  On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates ga...@hortonworks.com
  wrote:
 
   But I assume they'd still be a part of targets like package, tar, and
   binary?  Making them compile and test separately and explicitly load
 the
   core Hive jars from maven/ivy seems reasonable.
  
   Alan.
  
   On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
  
Hi,
   
I think thats part of it but I'd like to decouple the downstream
  projects
even further so that the only connection is the dependency on the
 hive
   jars.
   
Brock
On Jul 26, 2013 10:10 PM, Alan Gates ga...@hortonworks.com
 wrote:
   
I'm not sure how this is different from what hcat does today.  It
  needs
Hive's jars to compile, so it's one of the last things in the
 compile
   step.
Would moving the other modules you note to be in the same category
 be
enough?  Did you want to also make it so that the default ant target
doesn't compile those?
   
Alan.
   
On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
   
 My mistake on saying hcat was a fork of the metastore. I had a brain fart
 for a moment.
   
One way we could do this is create a folder called downstream. In
 our
release step we can execute the downstream builds and then copy the
   files
we need back. So nothing downstream will be on the classpath of the
   main
project.
   
This could help us breakup ql as well. Things like exotic file
  formats
   ,
and things that are pluggable like zk locking can go here. That
 might
   be
overkill.
   
For now we can focus on building downstream and hivethrift1might be
  the
first thing to try to downstream.
   
   
On Friday, July 26, 2013, Thejas Nair the...@hortonworks.com
  wrote:
+1 to the idea of making the build of core hive and other
 downstream
components independent.
   
bq.  I was under the impression that Hcat and hive-metastore was
supposed to merge up somehow.
   
The metastore code was never forked. Hcat was just using
hive-metastore and making the metadata available to rest of hadoop
(pig, java MR..).
A lot of the changes that were driven by hcat goals were being
 made
  in
hive-metastore. You can think of hcat as set of libraries that let
  pig
and java MR use hive metastore. Since hcat is closely tied to
hive-metastore, it makes sense to have them in same project.
   
   
 

Re: Access to trigger jobs on jenkins

2013-08-05 Thread kulkarni.swar...@gmail.com
Hi Brock,

Yes, I was looking to trigger the pre-commit builds without having to
check in a new patch every time to auto-trigger them. I assumed they were
similar to the *regular* builds?


On Mon, Aug 5, 2013 at 7:43 AM, Brock Noland br...@cloudera.com wrote:

 Hi,

 Are you looking to trigger the pre-commit builds?

 Unfortunately to trigger *regular* builds you'd need an Apache username
 according the Apache Infra Jenkins http://wiki.apache.org/general/Jenkins
 page.

 Brock


 On Sun, Aug 4, 2013 at 1:37 PM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

  Hello,
 
  I was wondering if it is possible to get access to be able to trigger
 jobs
  on the jenkins server? Or is that access limited to committers?
 
  Thanks,
 
  --
  Swarnim
 



 --
 Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org




-- 
Swarnim


Access to trigger jobs on jenkins

2013-08-04 Thread kulkarni.swar...@gmail.com
Hello,

I was wondering if it is possible to get access to be able to trigger jobs
on the jenkins server? Or is that access limited to committers?

Thanks,

-- 
Swarnim


Re: is there set of queries, which can be used to benchmark the hive performance?

2013-04-16 Thread kulkarni.swar...@gmail.com
Hi Rob,

HiBench[1] is the one I have seen most commonly used.

[1] https://github.com/intel-hadoop/HiBench/tree/master/hivebench


On Tue, Apr 16, 2013 at 6:42 PM, ur lops urlop...@gmail.com wrote:

 I am looking to benchmark my database with Hive, but before I do that,
 I want to run a set of tests to benchmark Hive itself. Does something
 exist in Hive similar to Pig's GridMix?
 Thanks in advance,
 Rob.




-- 
Swarnim


Preferred way to run unit tests

2013-04-12 Thread kulkarni.swar...@gmail.com
Hello,

I have been trying to run the unit tests for the last Hive release (0.10).
For me they have been taking in excess of 10 hours to run (not to mention
the occasional failures with some of the flaky tests).

Currently I am just doing an "ant clean package test". Is there a better way
to run these? Also, is it possible for the build to ignore any test failures
and complete?

Thanks for any help.

-- 
Swarnim


Re: Review Requests

2013-02-20 Thread kulkarni.swar...@gmail.com
Would someone have a chance to take a quick look at these review
requests[1][2]?

[1] https://reviews.apache.org/r/9275/
[2] https://reviews.apache.org/r/9276/

Thanks,


On Tue, Feb 5, 2013 at 10:00 AM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:

 Thanks Mark. Appreciate that. I'll take a look.


 On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover 
 grover.markgro...@gmail.com wrote:

 Swarnim,
 I left some comments on  reviewboard.

 On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

  Hello,
 
  I opened up two reviews for small issues, HIVE-3553[1] and
 HIVE-3725[2]. If
  you guys get a chance to review and provide feedback on it, I will
 really
  appreciate.
 
  Thanks,
 
  [1] https://reviews.apache.org/r/9275/
  [2] https://reviews.apache.org/r/9276/
 
  --
  Swarnim
 




 --
 Swarnim




-- 
Swarnim


Re: Review Requests

2013-02-05 Thread kulkarni.swar...@gmail.com
Thanks Mark. Appreciate that. I'll take a look.


On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover grover.markgro...@gmail.com wrote:

 Swarnim,
 I left some comments on  reviewboard.

 On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

  Hello,
 
  I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2].
 If
  you guys get a chance to review and provide feedback on it, I will really
  appreciate.
 
  Thanks,
 
  [1] https://reviews.apache.org/r/9275/
  [2] https://reviews.apache.org/r/9276/
 
  --
  Swarnim
 




-- 
Swarnim


Review Requests

2013-02-04 Thread kulkarni.swar...@gmail.com
Hello,

I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2]. If
you get a chance to review them and provide feedback, I would really
appreciate it.

Thanks,

[1] https://reviews.apache.org/r/9275/
[2] https://reviews.apache.org/r/9276/

-- 
Swarnim


Re: hive 0.10 release

2012-11-19 Thread kulkarni.swar...@gmail.com
There are a couple of enhancements that I have been working on, mainly
related to the Hive/HBase integration. It would be awesome if it is at all
possible to include them in this release. None of them should really be
high risk. I have patches submitted for a few of them and will try to get
the others submitted in the next couple of days. Is there a specific
deadline that I should be aware of?

[1] https://issues.apache.org/jira/browse/HIVE-2599 (Patch Available)
[2] https://issues.apache.org/jira/browse/HIVE-3553 (Patch Available)
[3] https://issues.apache.org/jira/browse/HIVE-3211
[4] https://issues.apache.org/jira/browse/HIVE-3555
[5] https://issues.apache.org/jira/browse/HIVE-3725


On Mon, Nov 19, 2012 at 4:55 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Another quick update. I have created a hive-0.10 branch. At this point,
 HIVE-3678 is a blocker for the 0.10 release. There are a few other
 nice-to-haves, which were listed in my previous email. I will be happy to
 merge new patches between now and the RC if folks request it and they are
 low risk.

 Thanks,
 Ashutosh
 On Thu, Nov 15, 2012 at 2:29 PM, Ashutosh Chauhan hashut...@apache.org
 wrote:

  Good progress. Looks like folks are on board. I propose to cut the branch
  in the next couple of days. There are a few JIRAs which are patch ready
  that I want to get into the hive-0.10 release, including HIVE-3255,
  HIVE-2517, HIVE-3400, and HIVE-3678.
  Ed has already made a request for HIVE-3083. If folks have other patches
  they want to see in 0.10, please chime in.
  Also, request to other committers to help in review patches. There are
  quite a few in Patch Available state.
 
  Thanks,
  Ashutosh
 
 
  On Thu, Nov 8, 2012 at 3:22 PM, Owen O'Malley omal...@apache.org
 wrote:
 
  +1
 
 
  On Thu, Nov 8, 2012 at 3:18 PM, Carl Steinbach c...@cloudera.com
 wrote:
 
   +1
  
   On Wed, Nov 7, 2012 at 11:23 PM, Alexander Lorenz 
 wget.n...@gmail.com
   wrote:
  
+1, good karma
   
On Nov 8, 2012, at 4:58 AM, Namit Jain nj...@fb.com wrote:
   
 +1 to the idea

 On 11/8/12 6:33 AM, Edward Capriolo edlinuxg...@gmail.com
  wrote:

 That sounds good. I think this issue needs to be solved, as well as
 anything else that produces a bogus query result.

 https://issues.apache.org/jira/browse/HIVE-3083

 Edward

 On Wed, Nov 7, 2012 at 7:50 PM, Ashutosh Chauhan 
   hashut...@apache.org
 wrote:
 Hi,

 It's been a while since our last release, more than six months ago. All
 this while, a lot of action has happened, with various cool features
 landing in trunk. Additionally, I am looking forward to HiveServer2
 landing in trunk. So, I propose that we cut the branch for 0.10 soon
 afterwards and then release it. Thoughts?

 Thanks,
 Ashutosh

   
--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
   
   
  
 
 
 




-- 
Swarnim


Hive JIRA slow/dead

2012-10-08 Thread kulkarni.swar...@gmail.com
The Hive JIRA has been extremely slow to respond since this morning.
Is there any way to maybe cycle the instance to fix the issue?

Thanks,

-- 
Swarnim


Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread kulkarni.swar...@gmail.com
Have you tried using EXPLAIN[1] on your query? I usually use it to get a
better understanding of what my query is actually doing, and for debugging
at other times.

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

On Tue, Aug 7, 2012 at 12:20 PM, Raihan Jamal jamalrai...@gmail.com wrote:

 Hi Jan,


 I figured that out; it is working fine for me now. The only question I
 have is, if I am doing something like this:

 SELECT * FROM REALTIME where dt = yesterdaydate('yyyyMMdd') LIMIT 10;

 then the above query will be evaluated as below, right?

 SELECT * FROM REALTIME where dt = '20120806' LIMIT 10;

 So that means it will look for data in the corresponding dt partition
 (20120806) only, right, as the above table is partitioned on the dt
 column? And it will not scan the whole table, right?

 Raihan Jamal



 On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár dolik@gmail.com wrote:

 Hi Jamal,

 Check if the function really returns what it should and that your data
 are really in yyyyMMdd format. You can do this with a simple query like
 this:

 SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;

 I don't see anything wrong with the function itself, it works well for me
 (although I tested it in hive 0.7.1). The only thing I would change about
 it would be to optimize it by calling 'new' only at the time of
 construction and reusing the object when the function is called, but that
 should not affect the functionality at all.

 Best regards,
 Jan




 On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal jamalrai...@gmail.com wrote:

 Problem

 I created the user-defined function below to get yesterday's date in
 the format I want, as I will be passing the format into the method
 from the query:



 public final class YesterdayDate extends UDF {

     public String evaluate(final String format) {
         DateFormat dateFormat = new SimpleDateFormat(format);
         Calendar cal = Calendar.getInstance();
         cal.add(Calendar.DATE, -1);
         return dateFormat.format(cal.getTime());
     }
 }





 So whenever I try to run the query below, after adding the jar to the
 classpath and creating the temporary function yesterdaydate, I always get
 zero results back:

 hive> create temporary function yesterdaydate as
 'com.example.hive.udf.YesterdayDate';
 OK
 Time taken: 0.512 seconds



 Below is the query I am running:

 hive> SELECT * FROM REALTIME where dt = yesterdaydate('yyyyMMdd') LIMIT 10;
 OK

 And I always get zero results back, but the data is there in that table
 for Aug 5th.

 What am I doing wrong? Any suggestions will be appreciated.

 NOTE: As I am working with Hive 0.6, which doesn't support variable
 substitution, I cannot use hiveconf here; the above table is partitioned
 on the dt (date) column.






-- 
Swarnim
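Jan's optimization in the thread above (constructing the SimpleDateFormat once and reusing it across calls) can be sketched in plain Java. This is only a sketch, not the poster's actual code: the Hive UDF base class (org.apache.hadoop.hive.ql.exec.UDF) is omitted so the snippet stands alone, and the class and field names are illustrative.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

// Sketch of Jan's suggestion: reuse one SimpleDateFormat instead of
// allocating a new one on every evaluate() call. The real UDF would
// extend org.apache.hadoop.hive.ql.exec.UDF; that dependency is omitted
// here so the sketch is self-contained.
public class YesterdayDateSketch {
    private SimpleDateFormat cached;  // formatter reused between calls
    private String cachedPattern;     // pattern the cached formatter was built for

    public String evaluate(final String pattern) {
        // 'new' runs only on the first call or when the pattern changes
        if (cached == null || !cachedPattern.equals(pattern)) {
            cached = new SimpleDateFormat(pattern);
            cachedPattern = pattern;
        }
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1);
        return cached.format(cal.getTime());
    }

    public static void main(String[] args) {
        YesterdayDateSketch udf = new YesterdayDateSketch();
        // prints yesterday's date as an 8-digit yyyyMMdd string
        System.out.println(udf.evaluate("yyyyMMdd"));
    }
}
```

Caching like this assumes the instance is not shared across threads, since SimpleDateFormat is not thread-safe.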


Re: Some Weird Behavior

2012-08-07 Thread kulkarni.swar...@gmail.com
In that case you might want to try count(1) instead of count(*) and see
if that makes any difference. [1]

[1] https://issues.apache.org/jira/browse/HIVE-287

On Tue, Aug 7, 2012 at 1:07 PM, Techy Teck comptechge...@gmail.com wrote:

 I am running Hive 0.6.





 On Tue, Aug 7, 2012 at 11:04 AM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

 What is the hive version that you are using?


 On Tue, Aug 7, 2012 at 12:57 PM, Techy Teck comptechge...@gmail.com wrote:

 I am not sure about the data, but when we do

 SELECT count(*) from data_realtime where dt='20120730' and uid is null

 I get the count

 but If I do-

 SELECT * from data_realtime where dt='20120730' and uid is null

 I get zero records back. But if all those records have a NULL uid, then I
 should be getting those records back, right?


 But I am not getting anything back and that is the reason it is making me
 more confuse.






 On Tue, Aug 7, 2012 at 10:31 AM, Yue Guan pipeha...@gmail.com wrote:

  Just in case, all Record is null when uid is null?
 
  On Tue, Aug 7, 2012 at 1:14 PM, Techy Teck comptechge...@gmail.com
  wrote:
   SELECT count(*) from data_realtime where dt='20120730' and uid is
 null
  
  
  
   I get the count as 1509
  
  
  
   So that means If I will be doing
  
  
  
   SELECT * from data_realtime where dt='20120730' and uid is null
  
  
  
   I should be seeing those records in which uid is null? right?
  
   But I get zero record back with the above query. Why is it so? Its
 very
   strange and why is it happening like this. Something wrong with the
 Hive?
  
  
  
   Can anyone suggest me what is happening?
  
  
  
  
 




 --
 Swarnim





-- 
Swarnim


Re: Logging info is not present in console output

2012-08-07 Thread kulkarni.swar...@gmail.com
Are you running via console? The default logging level is WARN.

$HIVE_HOME/bin/hive -hiveconf hive.root.logger=INFO,console


This should print the INFO messages onto the console.


On Tue, Aug 7, 2012 at 4:07 PM, Ablimit Aji abli...@gmail.com wrote:

 Hi,

 I have put some LOG.info() statements inside the join operator and I'm not
 seeing them by running a join statement.
 How can I see it? or is there any better way of debugging ?

 Thanks,
 Ablimit




-- 
Swarnim


Re: Casting exception while converting from LazyDouble to LazyString

2012-07-10 Thread kulkarni.swar...@gmail.com
Hi Kanna,

This might just mean that your query declares a STRING type for a field
that is actually a DOUBLE.

On Tue, Jul 10, 2012 at 3:05 PM, Kanna Karanam kanna...@microsoft.com wrote:

 Has anyone seen this error before? Am I missing anything here?

 2012-07-10 11:11:02,203 INFO org.apache.hadoop.mapred.TaskInProgress:
 Error from attempt_201207091248_0107_m_00_0: java.lang.RuntimeException:
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
 processing row {name:zach johnson,age:77,gpa:3.27}
     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1124)
     at org.apache.hadoop.mapred.Child.main(Child.java:265)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
 Error while processing row {name:zach johnson,age:77,gpa:3.27}
     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
     ... 8 more
 Caused by: java.lang.ClassCastException:
 org.apache.hadoop.hive.serde2.lazy.LazyDouble cannot be cast to
 org.apache.hadoop.hive.serde2.lazy.LazyString
     at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:47)
     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:351)
     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:255)
     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:202)
     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:236)
     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)

 Thanks,
 Kanna




-- 
Swarnim


Re: Developing Hive UDF in eclipse

2012-06-05 Thread kulkarni.swar...@gmail.com
Did you try this[1]? It got me most of the way through the process.

[1] https://cwiki.apache.org/Hive/gettingstarted-eclipsesetup.html

On Tue, Jun 5, 2012 at 8:49 AM, Arun Prakash ckarunprak...@gmail.com wrote:

 Hi Friends,
 I tried to develop a UDF for Hive but I am getting a package import error
 in Eclipse:

 import org.apache.hadoop.hive.ql.exec.UDF;

 How do I import the Hive packages in Eclipse?


 Any inputs much appreciated.



 Best Regards
  Arun Prakash C.K

 Keep On Sharing Your Knowledge with Others




-- 
Swarnim


getStructFieldData method on StructObjectInspector

2012-05-25 Thread kulkarni.swar...@gmail.com
I am trying to write a custom ObjectInspector extending the
StructObjectInspector and got a little confused about the use of the
getStructFieldData method on the inspector. Looking at the definition of
the method:

public Object getStructFieldData(Object data, StructField fieldRef);

I understand that this method is used to retrieve the given field from the
buffer. However, what I don't understand is what it is expected to return.
I looked around the tests and related code, and most of the returned
objects were either a LazyPrimitive or a LazyNonPrimitive, but I couldn't
find anything that enforces this (especially given that the return type is
a plain Object)! Does this mean that I am free to return even my own custom
object from this method? If so, what guarantees that it will be interpreted
correctly down the pipeline?

Thanks,
-- 
Swarnim
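For what it's worth, the contract asked about above can be illustrated with a toy model (this is not the real Hive API; the class and method names below are invented for illustration): the inspector alone knows how the raw row Object is laid out, so getStructFieldData may return any object it likes, as long as the ObjectInspector it advertises for that field can decode that same object further down the pipeline.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (not the real Hive API): a struct "inspector" that knows its
// row objects are Lists and hands back whatever it stored for a field.
// Downstream code never looks at the raw Object directly; it always goes
// through the inspector, which is why any return type works as long as
// the field's inspector can interpret it.
public class ToyStructInspector {
    private final List<String> fieldNames;

    public ToyStructInspector(List<String> fieldNames) {
        this.fieldNames = fieldNames;
    }

    // Analogous to getStructFieldData(Object data, StructField fieldRef):
    // the caller passes the opaque row; only this inspector knows it is a List.
    public Object getStructFieldData(Object row, String fieldName) {
        List<?> struct = (List<?>) row;
        return struct.get(fieldNames.indexOf(fieldName));
    }

    public static void main(String[] args) {
        ToyStructInspector oi = new ToyStructInspector(Arrays.asList("name", "age"));
        Object row = Arrays.asList("zach johnson", 77);  // opaque to everyone but oi
        System.out.println(oi.getStructFieldData(row, "age"));  // prints 77
    }
}
```

This mirrors why the Lazy* objects work downstream: consumers never touch the raw Object directly but always go through the inspector that produced it.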