Making Thrift work with Hive in client-server mode

2011-02-03 Thread Jay Ramadorai
Can someone explain how the Thriftserver finds the Hive metastore?

I am running with all non-default values and need to know how to connect to 
Thrift so it finds Hive with the right metastore.

I am running Derby in server mode on a non-default port. And my metastore name 
is non-default. And I want to run my Thrift server on a non-default port.

My hive-site looks like this:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://myhost:/MYmetastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

I start Derby up as follows:
cd $DERBY_HOME/data
nohup $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 -p 
I am able to connect from the Hive CLI just fine and able to create, drop, 
select from tables in the right metastore.

Now I start my Thrift server as follows:
HIVE_PORT=11000
export HIVE_PORT
nohup hive  --service hiveserver 

Thrift server starts up fine and attaches to port 11000

Now I try to run the Hive server test: 
ant test -Dtestcase=TestJdbcDriver -Dstandalone=true
...and of course it says Tests Failed, with no further specific detail.

The test Java program (http://wiki.apache.org/hadoop/Hive/HiveClient) tries to 
connect as:
DriverManager.getConnection("jdbc:hive://localhost:1/default", "", "")
My question is: besides changing it to 
DriverManager.getConnection("jdbc:hive://myhost:11000/default", "", "")
what else do I need to do?
What does the "default" in the connect string signify? Should that be my 
metastore name? There is also a DATABASE in Hive called default, so I am not 
so sure that I should change this.
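(A rough sketch of the connect-string anatomy may help here: in a HiveServer JDBC URL the trailing path segment names a Hive database such as "default", while the metastore is located by the server itself from its hive-site.xml, not by the JDBC client. The helper below is illustrative only; "myhost" and port 11000 are just the example values from above.)

```python
# Sketch: anatomy of a HiveServer (HiveServer1) JDBC URL,
#   jdbc:hive://<host>:<port>/<database>
# The last segment is a Hive *database* name (e.g. "default"),
# not the metastore name. Host and port below are placeholders.
from urllib.parse import urlparse

def split_hive_jdbc_url(url):
    """Split 'jdbc:hive://host:port/db' into (host, port, database)."""
    assert url.startswith("jdbc:")
    parsed = urlparse(url[len("jdbc:"):])  # parse the hive://... remainder
    return parsed.hostname, parsed.port, parsed.path.lstrip("/")

host, port, db = split_hive_jdbc_url("jdbc:hive://myhost:11000/default")
print(host, port, db)  # myhost 11000 default
```

So changing only host and port in the test program's URL keeps the same "default" database.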

Bottom line: how is the Thrift server supposed to find the metastore, and how 
should I connect to the Thrift server from a JDBC client?
Thanks
JayR





Re: Making Thrift work with Hive in client-server mode

2011-02-03 Thread Jay Ramadorai
Sorry. I had an error in my message below. I start up Derby on the same port 
that is specified in hive-site. So my derby start looks like:
 
 nohup $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 -p  
(not )
 
BTW, all the ports shown here are examples only.

On Feb 3, 2011, at 9:22 AM, Jay Ramadorai wrote:

 Can someone explain how the Thriftserver finds the Hive metastore?
 
 I am running with all non-default values and need to know how to connect to 
 Thrift so it finds Hive with the right metastore.
 
 I am running Derby in server mode on a non-default port. And my metastore 
 name is non-default. And I want to run my Thrift server on a non-default port.
 
 My hive-site looks like this:
 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby://myhost:/MYmetastore_db;create=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
 </property>
 <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>org.apache.derby.jdbc.ClientDriver</value>
   <description>Driver class name for a JDBC metastore</description>
 </property>
 
 I start derby up as follows :
 cd $DERBY_HOME/data
 nohup $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 -p  
 -
 I am able to connect from the Hive CLI just fine and able to create, drop, 
 select from tables in the right metastore.
 
 Now I start my Thrift server as follows:
 HIVE_PORT=11000
 export HIVE_PORT
 nohup hive  --service hiveserver 
 
 Thrift server starts up fine and attaches to port 11000
 
 Now I try to run the Hive server test: 
 ant test -Dtestcase=TestJdbcDriver -Dstandalone=true
 ...and of course it says Tests Failed, with no further specific detail.
 
 The test Java program (http://wiki.apache.org/hadoop/Hive/HiveClient) tries 
 to connect as:
 DriverManager.getConnection("jdbc:hive://localhost:1/default", "", "")
 My question is: besides changing it to 
 DriverManager.getConnection("jdbc:hive://myhost:11000/default", "", "")
 what else do I need to do?
 What does the "default" in the connect string signify? Should that be my 
 metastore name? There is also a DATABASE in Hive called default, so I am 
 not so sure that I should change this.
 
 Bottom line: how is the Thrift server supposed to find the metastore, and how 
 should I connect to the Thrift server from a JDBC client?
 Thanks
 JayR
 
 
 



Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread Edward Capriolo
On Thu, Feb 3, 2011 at 12:16 AM, Alan Gates ga...@yahoo-inc.com wrote:
 Edward,

 I understand your concern with having a copy of the metastore code in Howl.
  However, let's separate code from governance.  The reason Howl has a copy
 of Hive's metastore is not because we're proposing it for the Incubator, it
 is because in the course of developing it over the last six months we've
 found that Howl development needs to move much faster than Hive development
 can.  This is appropriate, since Hive is a mature product and has at least
 one large customer that runs code in production very soon after it is
 checked in.  Thus the Hive community is rightly cautious about checking in
 changes to the metastore.  Howl, on the other hand, is new and innovating
 quickly, so it likes to get things checked in quickly.  Over the last six
 months every patch Howl has made to the Hive metastore code has made it back
 into Hive code.  But it generally takes a few weeks or more to get in.

 Whether Howl is a Hive subproject or an Incubator project it faces the same
 dilemma. The only other alternative that was suggested was to have Howl
 extern the metastore code from Hive and keep its patches in its build and
 apply them at build time.  But this is very fragile, since any changes in
 the Hive metastore code could invalidate all those patches.  We know that
 this is not sustainable in the long run, which is why the proposal calls out
 the need to resolve this one way or another as the project matures.

 As far as reaching an end state where Hive and Howl are not compatible, we
 would view that as a failure for Howl.  The goal for Howl is to be a
 metastore for Pig, MapReduce, and Hive, not just 2 out of 3.  So we have a
 strong motivation to maintain that compatibility.

 In terms of governance, given that we have significant contributions coming
 from members of the Pig team, the Hive team, and the core Hadoop team it
 seemed that giving Howl its own space in the Incubator made more sense than
 adding it as a subproject of any one of those teams.

 Alan.

 On Feb 2, 2011, at 3:11 PM, Edward Capriolo wrote:

 On Wed, Feb 2, 2011 at 5:08 PM, Jeff Hammerbacher ham...@cloudera.com
 wrote:

 Awesome! Huge +1.

 On Wed, Feb 2, 2011 at 1:18 PM, Alan Gates ga...@yahoo-inc.com wrote:

 Howl is a table management system built to provide metadata and storage
 management across data processing tools in Hadoop (Pig, Hive, MapReduce,
 ...).  You can learn more details at http://wiki.apache.org/pig/Howl.
  For
 the last six months the code has been hosted at github.  The Howl team
 would
 like to move the project into the Apache Incubator.  You can see the
 proposal for the project at
 http://wiki.apache.org/incubator/HowlProposal.

 In order to be accepted as an Incubator project Howl needs a Sponsoring
 project.  I propose that we, the Pig project, sponsor Howl.  By
 sponsoring
 Howl we are saying that we believe it is a good fit for the ASF and that
 we
 will assist the Howl project to succeed.  You can read full details of
 sponsoring a project at

 http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor
 .

 Our bylaws don't explicitly cover such a vote, but I think lazy majority
 should be reasonable.  All votes are welcome, PMC member votes will be
 binding.

 Clearly I'm +1.

 Alan.



 I do think it is a great idea that Hive, Pig, and MapReduce share a
 metastore. However I am not sure I agree with the approach. IMHO Howl
 should be a Hive subproject.

 "The initial release of Howl will allow interoperability of data
 between Pig, Map Reduce, and Hive"
 I believe the initial release of Howl should support Hive;
 at that point Hive should remove the /metastore code from inside Hive
 and depend on Howl.

 I say this because Hive is very actively reworking the metastore right
 now for security, a new type of views, and indexes. I feel that if the
 metastore branches from Hive as Howl, getting the two entities back
 together will be difficult. My fear is having 99% of the same code base
 shared between Hive and Howl but not having compatibility between the
 two.



Alan,

I see your points. I agree with you and I am +1.

(incubator/subproject is not important to me)

You mentioned that Hive is cautious about checking changes into the
metastore. I would not say we (Hive) are cautious. Hive is getting
pulled in many directions by many people (this is a good thing), but
the people who can technically review patches may at times be
burdened by the volume of them.

Ideally, I would think Hive committers are going to be active (and
probably have commit access) on Howl. Or is it going to be Howl's
burden to track Pig and Hive until Hive drops /metastore and begins
using Howl? I am just curious what you think the timeline looks like
(i.e. how long Howl will be in the incubator for) (rough guess of
course).

Thank you,
Edward


Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread Ashutosh Chauhan
+1

On Wed, Feb 2, 2011 at 13:18, Alan Gates ga...@yahoo-inc.com wrote:
 Howl is a table management system built to provide metadata and storage
 management across data processing tools in Hadoop (Pig, Hive, MapReduce,
 ...).  You can learn more details at http://wiki.apache.org/pig/Howl.  For
 the last six months the code has been hosted at github.  The Howl team would
 like to move the project into the Apache Incubator.  You can see the
 proposal for the project at http://wiki.apache.org/incubator/HowlProposal.

 In order to be accepted as an Incubator project Howl needs a Sponsoring
 project.  I propose that we, the Pig project, sponsor Howl.  By sponsoring
 Howl we are saying that we believe it is a good fit for the ASF and that we
 will assist the Howl project to succeed.  You can read full details of
 sponsoring a project at
 http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor.

 Our bylaws don't explicitly cover such a vote, but I think lazy majority
 should be reasonable.  All votes are welcome, PMC member votes will be
 binding.

 Clearly I'm +1.

 Alan.



RE: Please read if you plan to use Hive 0.7.0 on Hadoop 0.20.0

2011-02-03 Thread Severance, Steve
We are not using 0.20 at eBay so we are fine with this.

Steve

From: Ajo Fod [mailto:ajo@gmail.com]
Sent: Monday, January 31, 2011 9:49 PM
To: user@hive.apache.org
Subject: Re: Please read if you plan to use Hive 0.7.0 on Hadoop 0.20.0

I am new to hive and hadoop and I got the packaged version from Cloudera. So, 
personally, I'd be happy if the new package is mutually consistent.

-Ajo
On Mon, Jan 31, 2011 at 5:14 PM, Carl Steinbach 
c...@cloudera.commailto:c...@cloudera.com wrote:
Hi,

I'm trying to get an idea of how many people plan on running Hive
0.7.0 on top of Hadoop 0.20.0 (as opposed to 0.20.1 or 0.20.2),
and are in a position where they can't upgrade to one of the more
recent releases of the 0.20.x branch. I'm asking because there is
a ticket open (HIVE-1817) that blocks the 0.7.0 release, and the
simplest, lowest-risk fix for this ticket is to remove support
for Hadoop 0.20.0. If we go with this solution it means that in
order to use Hive 0.7.0 you will have to use Hadoop 0.20.1 or
0.20.2. The only advantage of the more complicated solution is
that we will be able to retain support for Hadoop 0.20.0, but
most likely at the expense of delaying the release and possibly
introducing new bugs.

Thanks.

Carl




Hive queries consuming 100% cpu

2011-02-03 Thread Vijay
Hi,

The simplest of Hive queries seem to be consuming 100% CPU. This is
with a small 4-node cluster. The machines are pretty beefy (16 cores
per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for
mapred.child.java.opts, etc.). One example is a simple query like
"select count(1) from events" (the events table has daily partitions
of log files in gzipped file format). While this is probably too
generic a question and there is a bunch of investigation we need to
do, are there any specific areas for me to look at? Has anyone seen
anything like this before? Also, are there any tools or easy options
to profile Hive query execution?

Thanks in advance,
Vijay


Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread Jeff Hammerbacher
Hey,


 If we do go ahead with pulling the metastore out of Hive, it might make
 most sense for Howl to become its own TLP rather than a subproject.


Yes, I did not read the proposal closely enough. I think an end state as a
TLP makes more sense for Howl than as a Pig subproject. I'd really love to
see Howl replace the metastore in Hive and it would be more natural to do so
as a TLP than as a Pig subproject--especially since the current Howl
repository is literally a fork of Hive.


 In the incubator proposal, we have mentioned these issues, but we've
 attempted to avoid prejudicing any decision.  Instead, we'd like to assess
 the pros and cons (including effort required and impact expected) for both
 approaches as part of the incubation process.


Glad the issues are being considered.

Later,
Jeff


Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread yongqiang he
I am interested in some numbers around the lines of code changes (or
files of changes) that are in Howl but not in Hive.
Can anyone give some information here?

Thanks
Yongqiang
On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher ham...@cloudera.com wrote:
 Hey,


 If we do go ahead with pulling the metastore out of Hive, it might make
 most sense for Howl to become its own TLP rather than a subproject.

 Yes, I did not read the proposal closely enough. I think an end state as a
 TLP makes more sense for Howl than as a Pig subproject. I'd really love to
 see Howl replace the metastore in Hive and it would be more natural to do so
 as a TLP than as a Pig subproject--especially since the current Howl
 repository is literally a fork of Hive.


 In the incubator proposal, we have mentioned these issues, but we've
 attempted to avoid prejudicing any decision.  Instead, we'd like to assess
 the pros and cons (including effort required and impact expected) for both
 approaches as part of the incubation process.

 Glad the issues are being considered.
 Later,
 Jeff


Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread Ashutosh Chauhan
There are none as of today. In the past, whenever we have had to make
changes, we do them in a separate branch in Howl, and once those get
committed to the hive repo, we pull them over into our trunk and drop
the branch.

Ashutosh
On Thu, Feb 3, 2011 at 13:41, yongqiang he heyongqiang...@gmail.com wrote:
 I am interested in some numbers around the lines of code changes (or
 files of changes) which are in Howl but not in Hive?
 Can anyone give some information here?

 Thanks
 Yongqiang
 On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher ham...@cloudera.com wrote:
 Hey,


 If we do go ahead with pulling the metastore out of Hive, it might make
 most sense for Howl to become its own TLP rather than a subproject.

 Yes, I did not read the proposal closely enough. I think an end state as a
 TLP makes more sense for Howl than as a Pig subproject. I'd really love to
 see Howl replace the metastore in Hive and it would be more natural to do so
 as a TLP than as a Pig subproject--especially since the current Howl
 repository is literally a fork of Hive.


 In the incubator proposal, we have mentioned these issues, but we've
 attempted to avoid prejudicing any decision.  Instead, we'd like to assess
 the pros and cons (including effort required and impact expected) for both
 approaches as part of the incubation process.

 Glad the issues are being considered.
 Later,
 Jeff



Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread John Sichi
But Howl does layer on some additional code, right?

https://github.com/yahoo/howl/tree/howl/howl

JVS

On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote:

 There are none as of today. In the past, whenever we had to have
 changes, we do it in a separate branch in Howl and once those get
 committed to hive repo, we pull it over in our trunk and drop the
 branch.
 
 Ashutosh
 On Thu, Feb 3, 2011 at 13:41, yongqiang he heyongqiang...@gmail.com wrote:
 I am interested in some numbers around the lines of code changes (or
 files of changes) which are in Howl but not in Hive?
 Can anyone give some information here?
 
 Thanks
 Yongqiang
 On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher ham...@cloudera.com 
 wrote:
 Hey,
 
 
 If we do go ahead with pulling the metastore out of Hive, it might make
 most sense for Howl to become its own TLP rather than a subproject.
 
 Yes, I did not read the proposal closely enough. I think an end state as a
 TLP makes more sense for Howl than as a Pig subproject. I'd really love to
 see Howl replace the metastore in Hive and it would be more natural to do so
 as a TLP than as a Pig subproject--especially since the current Howl
 repository is literally a fork of Hive.
 
 
 In the incubator proposal, we have mentioned these issues, but we've
 attempted to avoid prejudicing any decision.  Instead, we'd like to assess
 the pros and cons (including effort required and impact expected) for both
 approaches as part of the incubation process.
 
 Glad the issues are being considered.
 Later,
 Jeff
 



Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread Ashutosh Chauhan
What I am referring to is the metastore/ dir of Hive, the part of the
Hive code which Howl cares about most. The other Howl code is for
additional functionality that Howl provides (none of which lives in the
metastore/ dir); it lives in the howl/ dir. There are a few build file
changes, but they are trivial.

Ashutosh
On Thu, Feb 3, 2011 at 14:49, John Sichi jsi...@fb.com wrote:
 But Howl does layer on some additional code, right?

 https://github.com/yahoo/howl/tree/howl/howl

 JVS

 On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote:

 There are none as of today. In the past, whenever we had to have
 changes, we do it in a separate branch in Howl and once those get
 committed to hive repo, we pull it over in our trunk and drop the
 branch.

 Ashutosh
 On Thu, Feb 3, 2011 at 13:41, yongqiang he heyongqiang...@gmail.com wrote:
 I am interested in some numbers around the lines of code changes (or
 files of changes) which are in Howl but not in Hive?
 Can anyone give some information here?

 Thanks
 Yongqiang
 On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher ham...@cloudera.com 
 wrote:
 Hey,


 If we do go ahead with pulling the metastore out of Hive, it might make
 most sense for Howl to become its own TLP rather than a subproject.

 Yes, I did not read the proposal closely enough. I think an end state as a
 TLP makes more sense for Howl than as a Pig subproject. I'd really love to
 see Howl replace the metastore in Hive and it would be more natural to do 
 so
 as a TLP than as a Pig subproject--especially since the current Howl
 repository is literally a fork of Hive.


 In the incubator proposal, we have mentioned these issues, but we've
 attempted to avoid prejudicing any decision.  Instead, we'd like to assess
 the pros and cons (including effort required and impact expected) for both
 approaches as part of the incubation process.

 Glad the issues are being considered.
 Later,
 Jeff





Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread John Sichi
I forgot about the serde dependencies...can you add those to the Initial Source 
note in [[HowlProposal]] just for completeness?

JVS

On Feb 3, 2011, at 3:11 PM, Alan Gates wrote:

 Yes, it adds Input and Output formats for MapReduce and load and store 
 functions for Pig.  In the future we expect it will continue to add more 
 layers.
 
 Alan.
 
 On Feb 3, 2011, at 2:49 PM, John Sichi wrote:
 
 But Howl does layer on some additional code, right?
 
 https://github.com/yahoo/howl/tree/howl/howl
 
 JVS
 
 On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote:
 
 There are none as of today. In the past, whenever we had to have
 changes, we do it in a separate branch in Howl and once those get
 committed to hive repo, we pull it over in our trunk and drop the
 branch.
 
 Ashutosh
 On Thu, Feb 3, 2011 at 13:41, yongqiang he heyongqiang...@gmail.com wrote:
 I am interested in some numbers around the lines of code changes (or
 files of changes) which are in Howl but not in Hive?
 Can anyone give some information here?
 
 Thanks
 Yongqiang
 On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher ham...@cloudera.com 
 wrote:
 Hey,
 
 
 If we do go ahead with pulling the metastore out of Hive, it might make
 most sense for Howl to become its own TLP rather than a subproject.
 
 Yes, I did not read the proposal closely enough. I think an end state as a
 TLP makes more sense for Howl than as a Pig subproject. I'd really love to
 see Howl replace the metastore in Hive and it would be more natural to do 
 so
 as a TLP than as a Pig subproject--especially since the current Howl
 repository is literally a fork of Hive.
 
 
 In the incubator proposal, we have mentioned these issues, but we've
 attempted to avoid prejudicing any decision.  Instead, we'd like to 
 assess
 the pros and cons (including effort required and impact expected) for 
 both
 approaches as part of the incubation process.
 
 Glad the issues are being considered.
 Later,
 Jeff
 
 
 



Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread Alex Boisvert
Hi John,

Just to clarify where I was going with my line of questioning. There's no
Apache policy that prevents dependencies on an incubator project, whether it's
releases, snapshots, or even home-made, hacked-together packaging of an
incubator project. It's been done before, and as long as the incubator
code's IP has been cleared and the packaging isn't represented as an
official release when it isn't one, there's nothing wrong with doing that.

Now, whether the project chooses to use and release with an incubator
dependency is a matter of judgment (and ultimately a vote by committers if
there is no consensus). I just wanted to make sure there were no incorrect
assumptions made.

alex


On Thu, Feb 3, 2011 at 4:07 PM, John Sichi jsi...@fb.com wrote:

 I was going off of what I read in HADOOP-3676 (which lacks a reference as
 well).  But I guess if a release can be made from the incubator, then it's
 not a blocker.

 JVS

 On Feb 3, 2011, at 3:29 PM, Alex Boisvert wrote:

  On Thu, Feb 3, 2011 at 11:38 AM, John Sichi jsi...@fb.com wrote:
  Besides the fact that the refactoring required is significant, I don't
 think this is possible to do quickly since:
 
  1) Hive (unlike Pig) requires a metastore
 
  2) Hive releases can't depend on an incubator project
 
  I'm not sure what you mean by can't depend on an incubator project
 here.  AFAIK, there is no policy at Apache that projects should not depend
 on incubator projects.  Can you clarify what you mean and why you think such
 a restriction exists?
 
  alex
 




Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread Alan Gates
Are you referring to the serde jar or any particular serdes we are
making use of?


Alan.

On Feb 3, 2011, at 4:30 PM, John Sichi wrote:

I forgot about the serde dependencies...can you add those to the  
Initial Source note in [[HowlProposal]] just for completeness?


JVS

On Feb 3, 2011, at 3:11 PM, Alan Gates wrote:

Yes, it adds Input and Output formats for MapReduce and load and store
functions for Pig.  In the future we expect it will continue to add
more layers.


Alan.

On Feb 3, 2011, at 2:49 PM, John Sichi wrote:


But Howl does layer on some additional code, right?

https://github.com/yahoo/howl/tree/howl/howl

JVS

On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote:


There are none as of today. In the past, whenever we had to have
changes, we do it in a separate branch in Howl and once those get
committed to hive repo, we pull it over in our trunk and drop the
branch.

Ashutosh
On Thu, Feb 3, 2011 at 13:41, yongqiang he heyongqiang...@gmail.com 
 wrote:
I am interested in some numbers around the lines of code changes  
(or

files of changes) which are in Howl but not in Hive?
Can anyone give some information here?

Thanks
Yongqiang
On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher ham...@cloudera.com 
 wrote:

Hey,



If we do go ahead with pulling the metastore out of Hive, it  
might make
most sense for Howl to become its own TLP rather than a  
subproject.


Yes, I did not read the proposal closely enough. I think an end  
state as a
TLP makes more sense for Howl than as a Pig subproject. I'd  
really love to
see Howl replace the metastore in Hive and it would be more  
natural to do so
as a TLP than as a Pig subproject--especially since the current  
Howl

repository is literally a fork of Hive.



In the incubator proposal, we have mentioned these issues, but  
we've
attempted to avoid prejudicing any decision.  Instead, we'd  
like to assess
the pros and cons (including effort required and impact  
expected) for both

approaches as part of the incubation process.


Glad the issues are being considered.
Later,
Jeff












Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread John Sichi
On Feb 3, 2011, at 5:09 PM, Alan Gates wrote:

 Are you referring to the serde jar or any particular serdes we are making 
 use of?


Both (see below).

JVS



[jsichi@dev1066 ~/open/howl/howl/howl/src/java/org/apache/hadoop/hive/howl] ls
cli/  common/  data/  mapreduce/  pig/  rcfile/
[jsichi@dev1066 ~/open/howl/howl/howl/src/java/org/apache/hadoop/hive/howl] grep serde */*
common/HowlUtil.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
common/HowlUtil.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde.Constants;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.SerDe;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.SerDeException;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarStruct;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructField;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
rcfile/RCFileInputDriver.java:  private SerDe serde;
rcfile/RCFileInputDriver.java:  struct = (ColumnarStruct)serde.deserialize(bytesRefArray);
rcfile/RCFileInputDriver.java:  serde = new ColumnarSerDe();
rcfile/RCFileInputDriver.java:  serde.initialize(context.getConfiguration(), howlProperties);
rcfile/RCFileInputDriver.java:  oi = (StructObjectInspector) serde.getObjectInspector();
rcfile/RCFileMapReduceInputFormat.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileMapReduceOutputFormat.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileMapReduceRecordReader.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde.Constants;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.SerDe;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.SerDeException;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
rcfile/RCFileOutputDriver.java:  /** The serde for serializing the HowlRecord to bytes writable */
rcfile/RCFileOutputDriver.java:  private SerDe serde;
rcfile/RCFileOutputDriver.java:  return serde.serialize(value.getAll(), objectInspector);
rcfile/RCFileOutputDriver.java:  serde = new ColumnarSerDe();
rcfile/RCFileOutputDriver.java:  serde.initialize(context.getConfiguration(), howlProperties);



Howl, howl, howl, howl! O! you are men of stones:
Had I your tongues and eyes, I'd use them so
That heaven's vaults should crack



Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-03 Thread John Sichi
Got it, thanks for the correction.

JVS

On Feb 3, 2011, at 4:56 PM, Alex Boisvert wrote:

 Hi John,
 
 Just to clarify where I was going with my line of questioning. There's no 
 Apache policy that prevents dependencies on an incubator project, whether it's 
 releases, snapshots, or even home-made, hacked-together packaging of an 
 incubator project. It's been done before, and as long as the incubator 
 code's IP has been cleared and the packaging isn't represented as an official 
 release when it isn't one, there's nothing wrong with doing that.
 
 Now, whether the project chooses to use and release with an incubator 
 dependency is a matter of judgment (and ultimately a vote by committers if 
 there is no consensus). I just wanted to make sure there were no incorrect 
 assumptions made.
 
 alex
 
 
 On Thu, Feb 3, 2011 at 4:07 PM, John Sichi jsi...@fb.com wrote:
 I was going off of what I read in HADOOP-3676 (which lacks a reference as 
 well).  But I guess if a release can be made from the incubator, then it's 
 not a blocker.
 
 JVS
 
 On Feb 3, 2011, at 3:29 PM, Alex Boisvert wrote:
 
  On Thu, Feb 3, 2011 at 11:38 AM, John Sichi jsi...@fb.com wrote:
  Besides the fact that the refactoring required is significant, I don't 
  think this is possible to do quickly since:
 
  1) Hive (unlike Pig) requires a metastore
 
  2) Hive releases can't depend on an incubator project
 
  I'm not sure what you mean by can't depend on an incubator project here.  
  AFAIK, there is no policy at Apache that projects should not depend on 
  incubator projects.  Can you clarify what you mean and why you think such a 
  restriction exists?
 
  alex
 
 
 



if query in hive

2011-02-03 Thread Amlan Mandal
Actually I need to port some SQL queries to Hive QL.

Let's say I have a hive table t which has columns mobile_no, cookie, ip,
access_id.

Let's say I want to count unique users. My definition of a unique user: all
unique mobile numbers, plus all unique cookies (where the mobile number is
not present), plus all unique IPs (where both mobile number and cookie are
not present).

For example:

mobile_no, cookie, ip , access_id
'9741112345', '', '1.2.3.4', 1 // may be from sms so cookie is not present
'9741112346', '', '1.2.3.4', 2
'', 'aa', '1.2.3.4', 3
'', 'bb', '1.2.3.4', 4
'','', '1.2.3.5',5
'','','1.2.3.4',6

There are 6 unique users .

In MySQL we can handle it like:

select count(distinct if(mobile_no != '', mobile_no, if(cookie != '',
cookie, ip))) from t;

Is it possible to do the same thing in Hive in one query itself?
To be more specific, can I do IF (control functions) in Hive?


Amlan


Re: if query in hive

2011-02-03 Thread Viral Bajaria
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF

Check the conditional functions in the link above; it has the IF and CASE
statement definitions. I am guessing some of them might not work with older
versions of Hive, but I'm not too sure.
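(For illustration, the coalescing logic of that query, take mobile_no if present, else cookie, else ip, then count distinct keys, can be sketched outside Hive; the commented HiveQL shows the same shape using the IF UDF. The table name and rows come from the question above; this sketch has not been run against any particular Hive version.)

```python
# Sketch of the unique-user definition from the question:
# key = mobile_no if present, else cookie if present, else ip;
# the answer is the number of distinct keys. In HiveQL this is roughly:
#   SELECT COUNT(DISTINCT IF(mobile_no != '', mobile_no,
#                            IF(cookie != '', cookie, ip)))
#   FROM t;
rows = [  # (mobile_no, cookie, ip, access_id)
    ('9741112345', '',   '1.2.3.4', 1),
    ('9741112346', '',   '1.2.3.4', 2),
    ('',           'aa', '1.2.3.4', 3),
    ('',           'bb', '1.2.3.4', 4),
    ('',           '',   '1.2.3.5', 5),
    ('',           '',   '1.2.3.4', 6),
]

def user_key(mobile_no, cookie, ip):
    # Mirrors IF(mobile_no != '', mobile_no, IF(cookie != '', cookie, ip))
    return mobile_no or cookie or ip

unique_users = len({user_key(m, c, i) for m, c, i, _ in rows})
print(unique_users)  # 6
```

On this sample data the key set is the two mobile numbers, the two cookies, and the two IPs without either, matching the expected 6 unique users.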

On Thu, Feb 3, 2011 at 11:38 PM, Amlan Mandal am...@fourint.com wrote:

 Actually I need to port some SQL queries to hive QL.

 Lets say I have hive table t which has columns mobile_no, cookie, ip,
 access_id.

 Lets say I want to count unique users. My definition of of unique user =
 all unique mobile numbers + all unique cookie (if for them mobile number not
 present) + all unique ip ( where both mobile number and cookie is not
 present)

 For example:

 mobile_no, cookie, ip , access_id
 '9741112345', '', '1.2.3.4', 1 // may be from sms so cookie is not present
 '9741112346', '', '1.2.3.4', 2
 '', 'aa', '1.2.3.4', 3
 '', 'bb', '1.2.3.4', 4
 '','', '1.2.3.5',5
 '','','1.2.3.4',6

 There are 6 unique users .

 In MySQL we can handle it like:

 select count(distinct if(mobile_no != '', mobile_no, if(cookie != '',
 cookie, ip))) from t.

 Is it possible to do the same thing in Hive in one query itself?
 To be more specific can I do IF (control functions) in Hive?


 Amlan