File Path and Partition names
Quick question about using Hive to create new HDFS file paths. Generally speaking, we like to keep our data files with a path similar to Dataset/year/month/day/hour. I need to create a new table in Hive and populate it with data from a different dataset, using a HiveQL query. If I do this:

CREATE EXTERNAL TABLE IF NOT EXISTS new_table (
  field1 string,
  field2 string,
  field3 string
)
PARTITIONED BY (reg_yr string, reg_mon string, reg_day string, reg_hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

and then do an INSERT OVERWRITE into it, I end up with this path in HDFS:

Dataset/reg_year=2012/reg_mon=10/reg_day=02/reg_hour=07

Is there an *easy* way to remove the partition name from the creation of the HDFS path?

Thanks
Carla
Re: File Path and Partition names
Hi Carla

If you would like a custom directory structure for your partitions, you can create directories of your choice in HDFS and load data into them (if the data comes from another Hive table, you can use 'INSERT OVERWRITE DIRECTORY ...' to populate an HDFS directory). Then you need to register each directory as a new partition on the required table using 'ALTER TABLE ... ADD PARTITION ...'.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
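A minimal sketch of the two-step workflow Bejoy describes. The paths, partition values, and the source table name here are illustrative assumptions, not taken from the thread; note also that in Hive of this era, INSERT OVERWRITE DIRECTORY writes with Hive's default delimiters, which may not match the table's declared '\t' delimiter.

```sql
-- Step 1: write query results into a directory you name yourself
INSERT OVERWRITE DIRECTORY '/data/Dataset/2012/10/02/07'
SELECT field1, field2, field3
FROM source_table
WHERE yr = '2012' AND mon = '10' AND day = '02' AND hr = '07';

-- Step 2: register that directory as a partition of the external table
ALTER TABLE new_table ADD IF NOT EXISTS
  PARTITION (reg_yr='2012', reg_mon='10', reg_day='02', reg_hour='07')
  LOCATION '/data/Dataset/2012/10/02/07';
```

The trade-off is exactly the "extra work" discussed below: one ALTER TABLE per partition, rather than a single dynamic-partition insert.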
RE: File Path and Partition names
Thanks Bejoy, I was kind of hoping to avoid all of the 'extra' work... it would be nice if Hive didn't include the partition name in the path creation. I was hoping that there was a 'set' parameter/config I was missing.

Thanks
Carla
Re: File Path and Partition names
Hi Carla,

I assume you are using dynamic partitioning for this, correct? Assuming so, I have the same question and am trying to figure it out, and will let you know if I do. If you are using static partitions, you just need to specify the location on the 'ALTER TABLE' command when the partition(s) is/are added:

ALTER TABLE my_table ADD IF NOT EXISTS
  PARTITION (year=2012, month=10, day=02)
  LOCATION '2012/10/02';

Again, I have not yet figured out whether I can get this to occur with dynamic partitions.
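For contrast with the static-partition approach above, the dynamic-partition insert that produces the name=value directory layout looks roughly like this. The two SET properties are real Hive settings; the source table and its partition-value columns are hypothetical:

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partition values come from the trailing SELECT columns
INSERT OVERWRITE TABLE new_table
  PARTITION (reg_yr, reg_mon, reg_day, reg_hour)
SELECT field1, field2, field3, yr, mon, day, hr
FROM other_dataset;
-- Hive creates .../reg_yr=2012/reg_mon=10/... directories itself;
-- there is no configuration here to drop the "name=" prefix.
```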
RE: File Path and Partition names
Yep, dynamic. Let me know if you figure something out. I'd hate to have to go through all of the trouble to ETL the data and then create tables on top with the ALTER TABLE command. Such a waste of time and effort...

Carla
Hive and RESTFul with RESTEasy (jax-rs)
Hi, I'm very new to Hive and I need to work out how to fire Hive SQL queries via the RESTEasy framework and stream the query results back to the client as a JSON string. I wonder if there is an approach or best practice for achieving this with Hive and a RESTful service. Thanks in advance! Regards, Nebo
RE: Hive does not run - Typical NoSuchFieldError
Try the easy way... Cloudera CDH4 running on CentOS 5.8. Can install everything on one machine.

Chuck

From: Anthony Ikeda [mailto:anthony.ikeda@gmail.com]
Sent: Tuesday, October 02, 2012 1:23 PM
To: user@hive.apache.org
Subject: Hive does not run - Typical NoSuchFieldError

I've tried different attempts to get Hive running (Riptano GitHub and the SVN trunk) but nothing seems to work. The ql module seems to be the issue. I've noticed many posts about replacing the antlr jars for compilation and running, but none of these versions (3.0.1-3.4.1) work. I've also tried downloading the bundle from hive.apache.org and this still gives me the same error.

Env: Mac OS X
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)

I'm going to try 32-bit mode to see if this is the problem, otherwise are there any other suggestions?

Anthony
Re: Hive does not run - Typical NoSuchFieldError
Unfortunately not an option. This is an internal application, and if I can't get it running locally then it's no longer a tech option.

I know I've had this working in the past, but it seems that when the HiveLexer is generated in the ql project, the type field definition is not created - each time a command is resolved the member variable type is meant to be set, but none of the parent classes define this field. I'll try to understand ANTLR a little more to see if there is a more explicit declaration to be made, but I'm giving up if I can't resolve this today.

Anthony
Re: Hive does not run - Typical NoSuchFieldError
You are in a heap of trouble. The problem is that Cassandra and Hive use different versions of ANTLR, and when you get two versions of antlr on a single Java classpath, well, you get your result.

I have talked to a few people about this, and the only way to handle it is tools like jarjar that edit class files so there are no name collisions. The other option is trying to build one tool with the other tool's antlr.

Edward
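A rough sketch of the jarjar approach Edward mentions: relocate one copy of the ANTLR classes into a private package so the two versions no longer collide on the classpath. The rule patterns and jar file names below are illustrative assumptions, not taken from the thread:

```
# rules.txt - relocate ANTLR packages inside one of the jars
rule org.antlr.** shaded.org.antlr.@1
rule antlr.** shaded.antlr.@1

# rewrite the jar with jarjar's "process" command
java -jar jarjar.jar process rules.txt cassandra-original.jar cassandra-shaded.jar
```

The shaded jar then replaces the original on the classpath, leaving Hive's own ANTLR version as the only one under the org.antlr package names.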
Re: Hive does not run - Typical NoSuchFieldError
Yeah, I get this. I've tried the different branches in the GitHub repository (cassandra-0.7, cassandra-1.0, etc.) but all seem to yield the same issue. Even the 0.9.0 tar.gz download is exhibiting these issues, and I don't think that is actually using Cassandra jar files.

I might also look at the version of Hadoop I have set - currently I have hadoop-0.20.205.0 - but I'm trying to locate the base antlr Lexer class that defines the type field, and so far no luck - it doesn't exist. As in, the Hive.g file has no top-level variable for type.

I could fall back to CQL, but there are a lot of features (e.g. JSON parsing) in Hive that I was hoping to make use of. I'll also see if I can build just the 'ql' project independently.

Anthony
RE: Hive does not run - Typical NoSuchFieldError
Seems easier to create a new VM, install CentOS and CDH4 on it, and you are off and running. This setup runs pretty much perfectly on the first try. I have built 5-6 of them. Why do you have to do it on a Mac?

Chuck
Re: Hive does not run - Typical NoSuchFieldError
If you are trying to build the brisk versions of Hive with Cassandra support, choosing a VM and CDH are not going to help you with this issue.

Edward

On Tue, Oct 2, 2012 at 2:31 PM, Anthony Ikeda anthony.ikeda@gmail.com wrote:

Yeah, I think the VM is the next option. We run RedHat, I'll try that first. Running on Mac was just to spike the tech; if running a VM means better compatibility then I guess I'll take that route instead. Thanks Chuck.
Re: Hive does not run - Typical NoSuchFieldError
So is the Hive with Cassandra Data Handler officially not working? I.e. the riptano git repository branch cassandra-1.0?

Sent from my [6th Gen] iPhone
Re: Hive does not run - Typical NoSuchFieldError
Re: Hive does not run - Typical NoSuchFieldError

If you're working on the datastax/riptano branch you probably should take this up on one of their forums.

Edward

On Tue, Oct 2, 2012 at 3:35 PM, Anthony Ikeda anthony.ikeda@gmail.com wrote:

So is Hive with the Cassandra data handler officially not working? I.e. the riptano git repository branch cassandra-1.0

Sent from my [6th Gen] iPhone

On 02/10/2012, at 12:08, Edward Capriolo edlinuxg...@gmail.com wrote:

If you are trying to build the brisk versions of Hive with Cassandra support, choosing a VM and CDH are not going to help you with this issue.

Edward

On Tue, Oct 2, 2012 at 2:31 PM, Anthony Ikeda anthony.ikeda@gmail.com wrote:

Yeah, I think the VM is the next option. We run RedHat, so I'll try that first. Running on a Mac was just to spike the tech; if running a VM means better compatibility then I guess I'll take that route instead. Thanks Chuck.

On Tue, Oct 2, 2012 at 11:19 AM, Connell, Chuck chuck.conn...@nuance.com wrote:

It seems easier to create a new VM, install CentOS and CDH4 on it, and you are off and running. This setup runs pretty much perfectly on the first try. I have built 5-6 of them. Why do you have to do it on a Mac?

Chuck

From: Anthony Ikeda [mailto:anthony.ikeda@gmail.com]
Sent: Tuesday, October 02, 2012 2:12 PM
To: user@hive.apache.org
Subject: Re: Hive does not run - Typical NoSuchFieldError

Yeah, I get this. I've tried the different branches in the GitHub repository (cassandra-0.7, cassandra-1.0, etc.) but all seem to yield the same issue. Even the 0.9.0 tar.gz download is exhibiting these issues, and I don't think that is actually using the Cassandra jar files. I might also look at the version of Hadoop I have set - currently I have hadoop-0.20.205.0 - but I'm trying to locate the base ANTLR Lexer class that defines the type field, and so far no luck - it doesn't exist. As in, in the Hive.g file there is no top-level variable for type.

I could fall back to CQL, but there are a lot of features (e.g. JSON parsing) in Hive that I was hoping to make use of. I'll also see if I can build just the 'ql' project independently.

Anthony

On Tue, Oct 2, 2012 at 11:04 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

You are in a heap of trouble. The problem is that Cassandra and Hive use different versions of ANTLR, and when you get two versions of ANTLR on a single Java classpath, well, you get your result. I have talked to a few people about this, and the only way to handle it is tools like jarjar that edit class files so there are no name collisions. The other option is trying to build one tool with the other tool's ANTLR.

Edward

On Tue, Oct 2, 2012 at 1:57 PM, Anthony Ikeda anthony.ikeda@gmail.com wrote:

Unfortunately not an option. This is an internal application, and if I can't get it running locally, then it's no longer a tech option. I know I've had this working in the past, but it seems that when the HiveLexer is generated in the ql project, the type field definition is not created - each time a command is resolved the member variable type is meant to be set, but none of the parent classes define this field. I'll try to understand ANTLR a little more to see if there is meant to be a more explicit declaration, but I'm giving up if I can't resolve this today.

Anthony

On Tue, Oct 2, 2012 at 10:26 AM, Connell, Chuck chuck.conn...@nuance.com wrote:

Try the easy way… Cloudera CDH4 running on CentOS 5.8. You can install everything on one machine.

Chuck

From: Anthony Ikeda [mailto:anthony.ikeda@gmail.com]
Sent: Tuesday, October 02, 2012 1:23 PM
To: user@hive.apache.org
Subject: Hive does not run - Typical NoSuchFieldError

I've tried different attempts to get Hive running (Riptano GitHub and the SVN trunk) but nothing seems to work. The ql module seems to be the issue. I've noticed many posts about replacing the ANTLR jars for compilation and running, but none of these versions (3.0.1-3.4.1) work. I've also tried downloading the bundle from hive.apache.org and this still gives me the same error.

Env:
Mac OS X
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)

I'm going to try 32-bit mode to see if this is the problem; otherwise, are there any other suggestions?

Anthony
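For reference, Edward's jarjar suggestion above would amount to repackaging one copy of ANTLR under a different namespace so the two versions can coexist on one classpath. A hypothetical jarjar rules file for shading the Cassandra side's ANTLR might look like this (the target package name is illustrative):

```
rule org.antlr.** shaded.org.antlr.@1
rule antlr.** shaded.antlr.@1
```

Each rule rewrites every class under the matched package into the shaded package, so the JVM no longer sees two conflicting copies of the same class names.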
Re: best way to load millions of gzip files in hdfs to one table in hive?
Options:
1. Create the table and put the files under the table dir.
2. Create an external table and point it at the files' dir.
3. If the files are small, then I recommend creating a new set of files using a simple MR program and specifying the number of reduce tasks. The goal is to make file sizes close to the HDFS block size (it saves NameNode memory and reads will be faster).

On Tue, Oct 2, 2012 at 3:53 PM, zuohua zhang zuo...@gmail.com wrote:

I have millions of gzip files in hdfs (with the same fields) and would like to load them into one table in Hive with a specified schema. What is the most efficient way to do that? Given that my data is only in hdfs, and also gzipped, does that mean I could simply set up the table somehow, bypassing some unnecessary overhead of the typical approach? Thanks!
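Option 2 above can be sketched in HiveQL as follows (path, table, and column names are hypothetical; adjust to the actual files). Hive reads .gz text files transparently, so no separate decompression step is needed:

```
-- Hypothetical schema and location; no data is moved or copied.
CREATE EXTERNAL TABLE IF NOT EXISTS gz_logs (
  field1 STRING,
  field2 STRING,
  field3 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/gz_logs';  -- existing HDFS dir holding the .gz files
```

Because the table is EXTERNAL, dropping it later leaves the underlying gzip files in place.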
Re: Hive does not run - Typical NoSuchFieldError
Okay, thanks Edward!

On Tue, Oct 2, 2012 at 12:38 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

If you're working on the datastax/riptano branch you probably should take this up on one of their forums.

Edward
Re: best way to load millions of gzip files in hdfs to one table in hive?
You may want to use: https://github.com/edwardcapriolo/filecrush

We use this to deal with pathological cases, although the best idea is to avoid small files altogether.

Edward
Re: best way to load millions of gzip files in hdfs to one table in hive?
Hi Edward, I am kind of interested in this. For crush to work, do we need to install anything? How can it be used in a cluster?

Regards
Abhi

Sent from my iPhone

On Oct 2, 2012, at 5:45 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

You may want to use: https://github.com/edwardcapriolo/filecrush

We use this to deal with pathological cases, although the best idea is to avoid small files altogether.

Edward
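As an alternative to an external tool, small files can also be merged from within Hive itself by copying the data into a new table and letting Hive's merge settings consolidate the output. A sketch under assumed table names (gz_logs, gz_logs_merged are hypothetical):

```
-- Let Hive merge small output files when writing the new table.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.size.per.task=256000000;  -- target merged file size in bytes

INSERT OVERWRITE TABLE gz_logs_merged
SELECT * FROM gz_logs;
```

This costs one full read and rewrite of the data, but afterwards the table is backed by far fewer, larger files, which is easier on the NameNode.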
Re: Hive and RESTFul with RESTEasy (jax-rs)
I don't know of any way besides JDBC or Thrift.

On Tue, Oct 2, 2012 at 11:24 PM, Zebeljan, Nebojsa nebojsa.zebel...@adtech.com wrote:

Hi, I'm very new to Hive and I need to work out how to fire Hive SQL queries via the RESTEasy framework and stream the query result back as a JSON string to the client. I wonder if there is an approach or best practice for achieving this with Hive and a RESTful service. Thanks in advance!

Regards,
Nebo
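To flesh out the JDBC route: a JAX-RS resource would run the query through the Hive JDBC driver (org.apache.hadoop.hive.jdbc.HiveDriver in the Hive 0.9 era, with a URL of the form jdbc:hive://host:10000/default) and serialize the result rows to JSON for the response body. The driver setup and JAX-RS wiring are assumptions and omitted here; a stdlib-only sketch of the serialization step, with rows modeled as column-name-to-value maps, might look like this:

```java
import java.util.*;

public class JsonRows {

    // Convert query rows (column name -> value) into a JSON array string.
    // Strings are quoted and escaped; numbers and booleans pass through; null -> JSON null.
    static String rowsToJson(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append("{");
            int j = 0;
            for (Map.Entry<String, Object> e : rows.get(i).entrySet()) {
                if (j++ > 0) sb.append(",");
                sb.append("\"").append(escape(e.getKey())).append("\":");
                Object v = e.getValue();
                if (v == null) sb.append("null");
                else if (v instanceof Number || v instanceof Boolean) sb.append(v);
                else sb.append("\"").append(escape(v.toString())).append("\"");
            }
            sb.append("}");
        }
        return sb.append("]").toString();
    }

    // Minimal JSON string escaping (backslash and double quote).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<String, Object>();
        row.put("field1", "abc");
        row.put("count", 3);
        System.out.println(rowsToJson(Collections.singletonList(row)));
        // prints [{"field1":"abc","count":3}]
    }
}
```

In a real resource method you would walk the java.sql.ResultSet from the Hive query, copy each row into such a map, and return the resulting string with media type application/json.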