File Path and Partition names

2012-10-02 Thread carla.staeben
Quick question about using hive to create new hdfs file paths.

Generally speaking, we like to keep our data files with a path similar to

Dataset/year/month/day/hour

I need to create a new table in hive and populate it with data from a different 
dataset, using a HiveQL query.  If I do this:
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (
  field1 string,
  field2 string,
  field3 string
)
PARTITIONED BY (reg_yr string, reg_mon string, reg_day string, reg_hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

When I then do an insert overwrite into it, I end up with this path in hdfs:

Dataset/reg_yr=2012/reg_mon=10/reg_day=02/reg_hour=07

Is there an *easy* way to remove the partition name from the creation of the 
hdfs path?
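
(For reference, an insert along these lines produces that layout -- a sketch only; the
source table, its columns, and the dynamic-partition settings are assumptions:)

-- Sketch of a dynamic-partition insert that yields key=value directories
-- (source table and column names are hypothetical).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE new_table PARTITION (reg_yr, reg_mon, reg_day, reg_hour)
SELECT field1, field2, field3, yr, mon, day, hr
FROM source_dataset;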

Thanks
Carla


Re: File Path and Partition names

2012-10-02 Thread Bejoy KS
Hi Carla

If you'd like a custom directory structure for your partitions, you can create
directories of your choice in HDFS and load data into them (if the data comes from
another Hive table, you can use 'Insert Overwrite Directory ...' to populate an HDFS
dir). Then register each directory as a new partition of the required table using

'Alter Table Add Partition ...'

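For example (a sketch only; the directory, table, and partition values below are
illustrative):

-- Populate a directory of your choice from another Hive table.
-- Note: INSERT OVERWRITE DIRECTORY writes with Hive's default field delimiter,
-- so the target table's delimiter must match what lands in the files.
INSERT OVERWRITE DIRECTORY '/data/Dataset/2012/10/02/07'
SELECT field1, field2, field3
FROM source_table
WHERE yr = '2012' AND mon = '10' AND day = '02' AND hr = '07';

-- Register that directory as a partition of the target table.
ALTER TABLE new_table ADD IF NOT EXISTS
PARTITION (reg_yr = '2012', reg_mon = '10', reg_day = '02', reg_hour = '07')
LOCATION '/data/Dataset/2012/10/02/07';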

Regards
Bejoy KS

Sent from handheld, please excuse typos.


RE: File Path and Partition names

2012-10-02 Thread carla.staeben
Thanks Bejoy, I was kind of hoping to avoid all of the 'extra' work...it would 
be nice if hive didn't include the partition name in the path creation...I was 
hoping that there was a 'set' parameter/config I was missing.

Thanks
Carla


Re: File Path and Partition names

2012-10-02 Thread Doug Houck
Hi Carla, I assume you are using dynamic partitioning for this, correct?

Assuming so, I have the same question and am trying to figure it out, and will 
let you know if I do.

If you are using static partitions, you just need to specify the location on 
the 'alter table' command when the partition(s) is/are added...

alter table my_table add if not exists partition (year='2012', month='10', day='02')
location '2012/10/02';

Again, I have not yet figured out if I can get this to occur with dynamic 
partitions.
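
For completeness, once a partition has been registered with a custom location as
above, a static insert into it (a sketch; the source table and columns are
assumptions) writes its data under that directory rather than a year=/month=/day=
path:

-- Hypothetical static-partition insert; data lands in the LOCATION registered above.
INSERT OVERWRITE TABLE my_table PARTITION (year='2012', month='10', day='02')
SELECT field1, field2, field3
FROM source_table
WHERE yr = '2012' AND mon = '10' AND day = '02';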




RE: File Path and Partition names

2012-10-02 Thread carla.staeben
Yep, dynamic.

Let me know if you figure something out. I'd hate to have to go through all of
the trouble to ETL the data and then create tables on top with the alter table
command. Such a waste of time and effort...

Carla


Hive and RESTFul with RESTEasy (jax-rs)

2012-10-02 Thread Zebeljan, Nebojsa
Hi,
I'm very new to Hive and I need to work out how to fire Hive SQL queries via the
RESTEasy framework and stream the query result back to the client as a JSON string.

I wonder if there is any approach or best practice for achieving this with Hive
and a RESTful service.

Thanks in advance!

Regards,
Nebo


RE: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Connell, Chuck
Try the easy way... Cloudera CDH4 running on CentOS 5.8. Can install everything
on one machine.

Chuck


From: Anthony Ikeda [mailto:anthony.ikeda@gmail.com]
Sent: Tuesday, October 02, 2012 1:23 PM
To: user@hive.apache.org
Subject: Hive does not run - Typical NoSuchFieldError

I've tried different attempts to get Hive running (Riptano GitHub and the SVN 
trunk) but nothing seems to work.

The ql module seems to be the issue. I've noticed many posts about replacing
the antlr jars for compilation and running, but none of these versions
(3.0.1-3.4.1) work.

I've also tried downloading the bundle from hive.apache.org and this still gives
me the same error.

Env:
Mac OS X
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)

I'm going to try 32-bit mode to see if this is the problem otherwise are there 
any other suggestions?

Anthony



Re: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Anthony Ikeda
Unfortunately not an option. This is an internal application and if I can't
get it running locally, then it's no longer a tech option.

I know I've had this working in the past but it seems that when the
HiveLexer is generated in the ql project, the type field definition is
not created - each time a command is resolved the member variable type is
meant to be set but none of the parent classes define this field.

I'll try and understand Antlr a little more to see if there is meant to be
a more explicit declaration to be made but I'm giving up if I can't resolve
this today.

Anthony




Re: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Edward Capriolo
You are in a heap of trouble. The problem is that Cassandra and Hive use
different versions of ANTLR, and when you get two versions of ANTLR on a single
Java classpath, well, you get your result.

I have talked to a few people about this, and the only way to handle it is with
tools like jarjar that edit class files so there are no name collisions. The
other option is trying to build one tool with the other tool's ANTLR.

Edward



Re: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Anthony Ikeda
Yeah I get this. I've tried the different branches in the GitHub repository
(cassandra-0.7, cassandra-1.0, etc.) but all seem to yield the same issue.
Even the 0.9.0 tar.gz download exhibits these issues, and I don't think that
one is actually using Cassandra jar files. I might also look at the version of
Hadoop I have set - currently I have hadoop-0.20.205.0 - but I'm trying to
locate the base antlr Lexer class that defines the type field, and so far no
luck - it doesn't exist.

As in, the Hive.g file has no top-level variable for type.

I could fall back to CQL but there are a lot of features (e.g. JSON
parsing) in Hive that I was hoping to make use of.

I'll also see if I can build just the 'ql' project independently.

Anthony




RE: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Connell, Chuck
Seems easier to create a new VM, install CentOS and CDH4 on it, and you are off 
and running. This setup runs pretty much perfectly on the first try. I have 
built 5-6 of them.

Why do you have to do it on a Mac?

Chuck



Re: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Edward Capriolo
If you are trying to build the Brisk versions of Hive with Cassandra support,
choosing a VM and CDH is not going to help you with this issue.

Edward

On Tue, Oct 2, 2012 at 2:31 PM, Anthony Ikeda
anthony.ikeda@gmail.com wrote:
 Yeah I think the Vm is the next option. We run RedHat, I'll try that first.

 Running on Mac was just to spike the tech, if running a VM means better
 compatibility then I guess I'll take that route instead.

 Thanks Chuck.


Re: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Anthony Ikeda
So is the Hive with Cassandra Data Handler officially not working? I.e. the
riptano git repository branch cassandra-1.0?

Sent from my [6th Gen] iPhone


Re: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Edward Capriolo
If you're working on the datastax/riptano branch, you probably should take
this up on one of their forums.

Edward


Re: best way to load millions of gzip files in hdfs to one table in hive?

2012-10-02 Thread Alexander Pivovarov
Options:
1. create the table and put the files under the table dir

2. create an external table and point it at the files' dir (see the sketch below)

3. if the files are small then I recommend creating a new set of files using a
simple MR program and specifying the number of reduce tasks. The goal is to make
the file size close to the HDFS block size (it saves NN memory and reads will be
faster).

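A sketch of option 2 (the table name, columns, delimiter, and path are assumptions;
Hive reads .gz text files transparently):

-- Hypothetical external table over an existing directory of gzipped text files.
CREATE EXTERNAL TABLE IF NOT EXISTS gz_events (
  field1 string,
  field2 string,
  field3 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/gz_events';
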

On Tue, Oct 2, 2012 at 3:53 PM, zuohua zhang zuo...@gmail.com wrote:

 I have millions of gzip files in hdfs (with the same fields), and would like
 to load them into one table in hive with a specified schema.
 What is the most efficient way to do that?
 Given that my data is only in hdfs, and also gzipped, does that mean I
 could just simply set up the table somehow, bypassing some unnecessary
 overhead of the typical approach?

 Thanks!



Re: Hive does not run - Typical NoSuchFieldError

2012-10-02 Thread Anthony Ikeda
Okay, thanks Edward!





unsubscribe

2012-10-02 Thread Nic Chidu




Re: best way to load millions of gzip files in hdfs to one table in hive?

2012-10-02 Thread Edward Capriolo
You may want to use:

https://github.com/edwardcapriolo/filecrush

We use this to deal with pathological cases, although the best idea is
to avoid big files altogether.

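A Hive-only alternative for consolidating lots of small files (a sketch; the merge
settings are standard Hive options, while the table names and target size are
assumptions) is to rewrite the data through a query with small-file merging enabled:

-- Rewrite the data so Hive merges small output files into larger ones.
SET hive.merge.mapfiles=true;            -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;         -- merge outputs of map-reduce jobs
SET hive.merge.size.per.task=256000000;  -- rough target size per merged file, in bytes

INSERT OVERWRITE TABLE consolidated_events
SELECT * FROM gz_events;
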
Edward



Re: best way to load millions of gzip files in hdfs to one table in hive?

2012-10-02 Thread Abhishek
Hi Edward,

I am kind of interested in this. For crush to work, do we need to install
anything?

How can it be used on a cluster?

Regards
Abhi

Sent from my iPhone


Re: Hive and RESTFul with RESTEasy (jax-rs)

2012-10-02 Thread MiaoMiao
I don't know of any besides JDBC or Thrift.



Unsubscribe

2012-10-02 Thread Prasanna Kumar Jalakam

  
  
Unsubscribe

--
Thanks & Regards,
Prasanna.J
R & D Networks
IMImobile Pvt. Ltd.
Plot 770, Rd. 44, Jubilee Hills
Hyderabad - 500033
PH: +91 40 2355 5945 Ext: 230
www.imimobile.com