Re: select count(*) from table;
If you have enabled statistics-based optimization, the count comes from the stored statistics. If the underlying file format keeps in-file statistics (like ORC), it comes from there. If it is a plain-vanilla text file format, Hive needs to run a job to compute the count, which takes the longest of all.

On Tue, Mar 22, 2016 at 12:44 PM, Amey Barve <ameybarv...@gmail.com> wrote:

select count(*) from table;

How does Hive evaluate count(*) on a table?

Does it return the count by actually querying the table, or directly return the count by consulting some statistics locally?

For Hive's Text format it takes a few seconds, while Hive's ORC format takes a fraction of a second.

Regards,
Amey

-- Nitin Pawar
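For reference, a minimal sketch of the statistics path described above (the table name is illustrative, and the setting requires a Hive version that supports hive.compute.query.using.stats):

    -- Compute and store table statistics in the metastore.
    ANALYZE TABLE my_table COMPUTE STATISTICS;

    -- Let Hive answer simple aggregates such as count(*) from stored
    -- statistics instead of launching a job.
    SET hive.compute.query.using.stats=true;

    SELECT count(*) FROM my_table;

When the statistics are current, the count comes straight from the metastore; otherwise Hive falls back to scanning the data, which for text tables means a full job.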
Re: Importing Oracle data into Hive
Check Sqoop.

On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

Hi,

What is the easiest method of importing data from an Oracle 11g table to Hive, please? This will be a weekly periodic job. The source table has 20 million rows.

I am running Hive 1.2.1.

regards

-- Nitin Pawar
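For reference, a sketch of such a weekly import with Sqoop (the JDBC URL, credentials, table names, and mapper count are placeholders to adapt to your environment):

    sqoop import \
      --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
      --username scott \
      --password-file /user/etl/oracle.password \
      --table SALES \
      --hive-import \
      --hive-table sales \
      --num-mappers 8

Scheduling the command from cron or an Oozie coordinator covers the weekly cadence; at 20 million rows, a handful of parallel mappers is usually enough.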
Re: How to load XML file in Hive table
Take a look at this: https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources

On Mon, Jan 11, 2016 at 9:30 AM, nitinpathakala . <nitinpathak...@gmail.com> wrote:

Hello,

Any ideas on this?

Thanks,
Nitin

On Thu, Jan 7, 2016 at 6:06 PM, nitinpathakala . <nitinpathak...@gmail.com> wrote:

Hello,

We have a requirement to load data from an XML file into Hive tables. The XML tags would be the columns and the values will be the data for those columns. Any pointers will be really helpful.

Thanks,
Nitin

-- Nitin Pawar
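For reference, a sketch in the style of that SerDe's wiki (the class names are the ones used by the Hive-XML-SerDe project; the table layout, XPath expressions, and start/end tags are illustrative and must match your actual XML):

    CREATE TABLE xml_records (id STRING, name STRING)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
      "column.xpath.id"="/record/id/text()",
      "column.xpath.name"="/record/name/text()"
    )
    STORED AS
      INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES (
      "xmlinput.start"="<record",
      "xmlinput.end"="</record>"
    );

Each <record>...</record> element then becomes one row, with the XPath expressions pulling out the column values.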
Re: Hive Query failing !!!
OK, sorry, my bad. I had overlooked that your query does its joins via the WHERE clause.

On Tue, Sep 22, 2015 at 12:20 PM, @Sanjiv Singh <sanjiv.is...@gmail.com> wrote:

Nitin,

The following setting was already there in Hive:
set hive.exec.mode.local.auto=false;

Surprisingly, when I made the following setting, it started working:
set hive.auto.convert.join=true;

Can you please help me understand what happened?

Regards
Sanjiv Singh
Mob : +091 9990-447-339

On Tue, Sep 22, 2015 at 11:41 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Can you try setting these:
set hive.exec.mode.local.auto=false;

On Tue, Sep 22, 2015 at 11:25 AM, @Sanjiv Singh <sanjiv.is...@gmail.com> wrote:

Hi Folks,

I am running the given Hive query. It gives an error while executing. Please help me get out of it and understand the possible reason for the error.

Hive query:

SELECT *
FROM store_sales, date_dim, store, household_demographics, customer_address
WHERE store_sales.ss_sold_date_sk = date_dim.d_date_sk
  AND store_sales.ss_store_sk = store.s_store_sk
  AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  AND store_sales.ss_addr_sk = customer_address.ca_address_sk
  AND (date_dim.d_dom BETWEEN 1 AND 2)
  AND (household_demographics.hd_dep_count = 3 OR household_demographics.hd_vehicle_count = -1)
  AND date_dim.d_year IN (1998, 1998 + 1, 1998 + 2)
  AND store.s_city IN ('Midway', 'Fairview');

Note:
All tables [store_sales, date_dim, store, household_demographics, customer_address] are in ORC format.
Hive version: 1.0.0

Additional notes:
I also checked Hive EXPLAIN for the same query. It fails at the last stage, where it joins the intermediate result to customer_address.
I also checked for null values on store_sales.ss_addr_sk and customer_address.ca_address_sk; that is not the case.
I also changed the Hive log level to DEBUG; there is nothing specific in the log file regarding the error.

I really want to understand why the Hive query is failing, how it can be resolved, and where to look. Any help is highly appreciated.

At the Hive console:

Launching Job 4 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509)
        ...
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.lang.NullPointerException(null)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-5: Map: 2  Reduce: 1  Cumulative CPU: 4.08 sec  HDFS Read: 746  HDFS Write: 96  SUCCESS
Stage-Stage-3: Map: 2  Reduce: 1  Cumulative CPU: 3.32 sec  HDFS Read: 889  HDFS Write: 96  SUCCESS
Stage-Stage-1: Map: 2  Reduce: 1  Cumulative CPU: 3.21 sec  HDFS Read: 889  HDFS Write: 96  SUCCESS
Re: Hive Query failing !!!
Client$1.run(JobClient.java:557)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Regards
Sanjiv Singh
Mob : +091 9990-447-339

-- Nitin Pawar
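For reference, a sketch of the two points raised in this thread, using the query from the report: writing the joins explicitly instead of through the WHERE clause, and the map-join setting that made the query pass (whether map-join conversion is appropriate depends on the dimension table sizes):

    SET hive.auto.convert.join=true;  -- small tables are joined map-side, in memory

    SELECT *
    FROM store_sales ss
    JOIN date_dim dd               ON ss.ss_sold_date_sk = dd.d_date_sk
    JOIN store s                   ON ss.ss_store_sk = s.s_store_sk
    JOIN household_demographics hd ON ss.ss_hdemo_sk = hd.hd_demo_sk
    JOIN customer_address ca       ON ss.ss_addr_sk = ca.ca_address_sk
    WHERE (dd.d_dom BETWEEN 1 AND 2)
      AND (hd.hd_dep_count = 3 OR hd.hd_vehicle_count = -1)
      AND dd.d_year IN (1998, 1999, 2000)
      AND s.s_city IN ('Midway', 'Fairview');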
Re: Tables missing on the file system
Are you loading data into the partitioned test_table after creating the table and before repairing it using MSCK?

On Tue, Sep 15, 2015 at 3:51 PM, ravi teja <raviort...@gmail.com> wrote:

The location is present in the filesystem.

Thanks,
Ravi

On Tue, Sep 15, 2015 at 12:16 PM, Chetna C <chetna@gmail.com> wrote:

Hi Ravi,
Please make sure the location mentioned while creating the table exists at the time of 'MSCK REPAIR'. This error occurs if the location does not exist on the filesystem.

Thanks,
Chetna Chaudhari

On 15 September 2015 at 12:03, ravi teja <raviort...@gmail.com> wrote:

Hi,
I am getting this exception when I repair a table. Not sure what this means; I didn't get any info while searching either.

Can someone guide me on what this means?

CREATE EXTERNAL TABLE IF NOT EXISTS test_table
OK
Time taken: 0.124 seconds

MSCK REPAIR TABLE test_table
OK
Tables missing on filesystem: test_table
Time taken: 0.691 seconds, Fetched: 1 row(s)

Thanks,
Ravi

-- Nitin Pawar
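For reference, a minimal sketch of the sequence under discussion (the schema, partition column, and path are illustrative):

    CREATE EXTERNAL TABLE IF NOT EXISTS test_table (id INT)
    PARTITIONED BY (dt STRING)
    LOCATION '/data/test_table';

    -- After data lands in partition directories under the table location
    -- (e.g. /data/test_table/dt=2015-09-15), register them:
    MSCK REPAIR TABLE test_table;

MSCK compares the metastore against the filesystem in both directions, which is why it can also report "Tables missing on filesystem" when the table location itself is absent.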
Re: Loading multiple file format in hive
You are talking about a 15-minute delay for the conversion job, so you have two options (option 1 is sketched just after this thread):
1) Redesign your table so that it has two partitions with two file formats. You load data from one into the other and then clear the staging partition; if you query the data without a partition filter, it will read both file formats and serve the data.
2) Accept a 15-minute delay in reporting and show the data only from the Parquet format.

On Tue, Aug 25, 2015 at 12:06 PM, Jeetendra G <jeetendr...@housing.com> wrote:

If I write to a staging area and then run a job to convert this data to Parquet, won't there be a delay of that much time? I mean, this data won't be available to Hive until it is converted to Parquet and written to the Hive location?

On Tue, Aug 25, 2015 at 11:53 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Is it possible for you to write the data into a staging area, run a job on that, and then convert it into the Parquet table? So you are looking to have two tables: one temp table holding data for up to 15 minutes, and then your job loads this temp data into your Parquet-backed table.

Sorry for my misunderstanding. You can, though, set the file format at each partition level, but then you need to entirely redesign your table to have a staging partition and a real-data partition.

On Tue, Aug 25, 2015 at 11:46 AM, Jeetendra G <jeetendr...@housing.com> wrote:

Thanks Nitin for the reply. I have data coming from RabbitMQ and a Spark Streaming API which takes these events and dumps them into HDFS. I can't really convert the data events to a format like Parquet/ORC because I don't have the schema here. Once I dump to HDFS, I am writing one job which reads this data and converts it into Parquet. By that time I will have some raw events, right?

On Tue, Aug 25, 2015 at 11:35 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

File format in Hive is a table-level property. I am not sure why you would load data at a 15-minute interval into your actual table instead of a staging table and do the conversion there, or have the raw file in the format you want and load it directly into the table.

On Tue, Aug 25, 2015 at 11:27 AM, Jeetendra G <jeetendr...@housing.com> wrote:

I tried searching how to set multiple formats with multiple partitions, but could not find much detail. Can you please share some good material around this if you have any?

On Mon, Aug 24, 2015 at 10:49 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:

Hi,
You can set a different file format per partition. You can't mix files in the same directory (you could theoretically write some kind of custom SerDe).

Daniel.

On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G <jeetendr...@housing.com> wrote:

Can anyone put some light on this please?

On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G <jeetendr...@housing.com> wrote:

Hi All,

I have a directory where I have JSON-formatted and Parquet files in the same folder. Can Hive load these? I am getting JSON data and storing it in HDFS; later I am running a job to convert the JSON to Parquet (every 15 minutes), so we will have 15 minutes of JSON data. Can I provide multiple SerDes in Hive?

regards
Jeetendra

-- Nitin Pawar
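For reference, a hedged sketch of the per-partition file format idea from option 1 (names are illustrative; whether your queries read both formats cleanly depends on the SerDes involved and needs testing):

    CREATE TABLE events (payload STRING)
    PARTITIONED BY (stage STRING)
    STORED AS PARQUET;

    -- The staging partition holds raw text until the conversion job runs.
    ALTER TABLE events ADD PARTITION (stage='raw');
    ALTER TABLE events PARTITION (stage='raw') SET FILEFORMAT TEXTFILE;

    -- Every 15 minutes: rewrite the staged rows into the Parquet partition...
    INSERT INTO TABLE events PARTITION (stage='final')
    SELECT payload FROM events WHERE stage='raw';

    -- ...and clear the staging partition.
    ALTER TABLE events DROP PARTITION (stage='raw');

The staging partition is then re-created (and re-marked TEXTFILE) for the next cycle; queries without a stage filter see both the raw and the converted rows.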
Re: Loading multiple file format in hive
File format in Hive is a table-level property. I am not sure why you would load data at a 15-minute interval into your actual table instead of a staging table and do the conversion there, or have the raw file in the format you want and load it directly into the table.

On Tue, Aug 25, 2015 at 11:27 AM, Jeetendra G <jeetendr...@housing.com> wrote:

I tried searching how to set multiple formats with multiple partitions, but could not find much detail. Can you please share some good material around this if you have any?

On Mon, Aug 24, 2015 at 10:49 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:

Hi,
You can set a different file format per partition. You can't mix files in the same directory (you could theoretically write some kind of custom SerDe).

Daniel.

On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G <jeetendr...@housing.com> wrote:

Can anyone put some light on this please?

On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G <jeetendr...@housing.com> wrote:

Hi All,

I have a directory where I have JSON-formatted and Parquet files in the same folder. Can Hive load these? I am getting JSON data and storing it in HDFS; later I am running a job to convert the JSON to Parquet (every 15 minutes), so we will have 15 minutes of JSON data. Can I provide multiple SerDes in Hive?

regards
Jeetendra

-- Nitin Pawar
Re: Loading multiple file format in hive
Is it possible for you to write the data into a staging area, run a job on that, and then convert it into the Parquet table? So you are looking to have two tables: one temp table holding data for up to 15 minutes, and then your job loads this temp data into your Parquet-backed table.

Sorry for my misunderstanding. You can, though, set the file format at each partition level, but then you need to entirely redesign your table to have a staging partition and a real-data partition.

On Tue, Aug 25, 2015 at 11:46 AM, Jeetendra G <jeetendr...@housing.com> wrote:

Thanks Nitin for the reply. I have data coming from RabbitMQ and a Spark Streaming API which takes these events and dumps them into HDFS. I can't really convert the data events to a format like Parquet/ORC because I don't have the schema here. Once I dump to HDFS, I am writing one job which reads this data and converts it into Parquet. By that time I will have some raw events, right?

On Tue, Aug 25, 2015 at 11:35 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

File format in Hive is a table-level property. I am not sure why you would load data at a 15-minute interval into your actual table instead of a staging table and do the conversion there, or have the raw file in the format you want and load it directly into the table.

On Tue, Aug 25, 2015 at 11:27 AM, Jeetendra G <jeetendr...@housing.com> wrote:

I tried searching how to set multiple formats with multiple partitions, but could not find much detail. Can you please share some good material around this if you have any?

On Mon, Aug 24, 2015 at 10:49 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:

Hi,
You can set a different file format per partition. You can't mix files in the same directory (you could theoretically write some kind of custom SerDe).

Daniel.

On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G <jeetendr...@housing.com> wrote:

Can anyone put some light on this please?

On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G <jeetendr...@housing.com> wrote:

Hi All,

I have a directory where I have JSON-formatted and Parquet files in the same folder. Can Hive load these? I am getting JSON data and storing it in HDFS; later I am running a job to convert the JSON to Parquet (every 15 minutes), so we will have 15 minutes of JSON data. Can I provide multiple SerDes in Hive?

regards
Jeetendra

-- Nitin Pawar
Re: query behaviors with subquery in clause
Any help, guys?

On Thu, Aug 13, 2015 at 2:52 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Hi,

Right now Hive does not support the equality clause in subqueries. For example:
select * from A where date = (select max(date) from B)

It does, however, support the IN clause:
select * from A where date in (select max(date) from B)

Table A is partitioned by the date column, so I was hoping that when I apply the IN clause it would look only at that partition, but it is reading the entire table:

select * from A where date='2015-08-09' ... reads one partition
select * from A where date in ('2015-08-09') ... reads one partition
select * from A where date in (select max(date) from B) ... reads all partitions from A

Am I missing anything, or am I doing something wrong?

-- Nitin Pawar
Re: query behaviors with subquery in clause
Thanks, Noam. As we are doing this via Oozie, it will be either an EL action or something else. I will just work around it with a temp table and join the temp table on the date column.

On Thu, Aug 20, 2015 at 5:27 PM, Noam Hasson <noam.has...@kenshoo.com> wrote:

I have observed in other situations that whenever you run queries where you don't specify the partitions statically, Hive doesn't pre-compute which ones to take, so it will scan the whole table. I would suggest computing the max date in code, as a separate query.

On Thu, Aug 20, 2015 at 12:16 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Any help, guys?

On Thu, Aug 13, 2015 at 2:52 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Hi,

Right now Hive does not support the equality clause in subqueries. For example:
select * from A where date = (select max(date) from B)

It does, however, support the IN clause:
select * from A where date in (select max(date) from B)

Table A is partitioned by the date column, so I was hoping that when I apply the IN clause it would look only at that partition, but it is reading the entire table:

select * from A where date='2015-08-09' ... reads one partition
select * from A where date in ('2015-08-09') ... reads one partition
select * from A where date in (select max(date) from B) ... reads all partitions from A

Am I missing anything, or am I doing something wrong?

-- Nitin Pawar
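For reference, a hedged sketch of the temp-table workaround mentioned above (table names are illustrative; A is partitioned by date):

    -- Materialize the max date once...
    DROP TABLE IF EXISTS max_date_tmp;
    CREATE TABLE max_date_tmp AS SELECT max(date) AS date FROM B;

    -- ...then join against it.
    SELECT a.* FROM A a JOIN max_date_tmp m ON (a.date = m.date);

Note that partition pruning through a join still depends on the Hive version and settings; the fully reliable route is to read the max date out first (for example in the Oozie workflow) and substitute it into the query as a literal, which prunes exactly one partition.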
Re: [blocker] ArrayIndexoutofbound in a hive query
Sorry, but I could not find the following info:
1) Are you using Tez as the execution engine? If yes, make sure it is not a snapshot version.
2) Are you using the ORC file format? If yes, then set the flag to ignore corrupt data.
3) Are there nulls in your join condition columns?

If possible, share the query and the underlying file formats with some sample data.

On Fri, Jul 31, 2015 at 12:14 PM, ravi teja <raviort...@gmail.com> wrote:

Hi,

We are facing an issue with our Hive query, with an ArrayIndexOutOfBounds exception. I have tried googling it and I see many users facing the same error, but no solution yet. This is a blocker for our production and we really need help on this.

We are using Hive version: 1.3.0. Our query is doing multiple joins (right and left).

Diagnostic Messages for this Task:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
        ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:403)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
        ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.hadoop.io.Text.set(Text.java:225)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:267)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:204)
        at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:558)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:383)
        ... 13 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Thanks,
Ravi

-- Nitin Pawar
Re: [blocker] ArrayIndexoutofbound in a hive query
Is there a different output format, or is the output table bucketed? Can you try putting a NOT NULL condition on the join columns?

On Fri, Jul 31, 2015 at 12:45 PM, ravi teja <raviort...@gmail.com> wrote:

Hi Nitin,

Thanks for replying. The SELECT query runs like a charm; the problem occurs only when inserting into a table. Please find the answers inline.

Thanks,
Ravi

On Fri, Jul 31, 2015 at 12:34 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Sorry, but I could not find the following info:
1) Are you using Tez as the execution engine? If yes, make sure it is not a snapshot version.
NO
2) Are you using the ORC file format? If yes, then set the flag to ignore corrupt data.
NO, it's the Text file format
3) Are there nulls in your join condition columns?
Yes, there might be some
If possible, share the query and the underlying file formats with some sample data.
I can't really share the query.

On Fri, Jul 31, 2015 at 12:14 PM, ravi teja <raviort...@gmail.com> wrote:

Hi,

We are facing an issue with our Hive query, with an ArrayIndexOutOfBounds exception. I have tried googling it and I see many users facing the same error, but no solution yet. This is a blocker for our production and we really need help on this.

We are using Hive version: 1.3.0. Our query is doing multiple joins (right and left).

Diagnostic Messages for this Task:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
        ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:403)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
        ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.hadoop.io.Text.set(Text.java:225)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:267)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:204)
        at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:558)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:383)
        ... 13 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Thanks,
Ravi

-- Nitin Pawar
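For reference, a hedged sketch of the NOT NULL suggestion (table and key names are illustrative):

    SELECT t1.*, t2.*
    FROM t1 JOIN t2 ON (t1.k = t2.k)
    WHERE t1.k IS NOT NULL
      AND t2.k IS NOT NULL;

Filtering nulls out of the join keys rules them out as the trigger for the exception, and also avoids the skew that a large null key group creates in the shuffle.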
Re: Regarding query in HiveResultSet
Then why not just use the max function?

select max(a) from (select sum(a) as a, b from t group by b) n

On Fri, Jul 31, 2015 at 12:48 PM, Renuka Be <renunalin...@gmail.com> wrote:

Hi Nitin,

I am using a Hive query.

Regards,
Renuka N.

On Fri, Jul 31, 2015 at 2:42 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Are you writing Java code against Hive, or are you writing a Hive query?

On Fri, Jul 31, 2015 at 11:13 AM, Renuka Be <renunalin...@gmail.com> wrote:

Hi Folks,

I want to find the max value from the HiveResult. There is an option listed in the HiveResultSet properties, HiveResultSet.Max(). When I use 'HiveResultSet.Max()' it throws an exception:

Error: At least one object must implement IComparable.

Is there any way to find the min and max from the HiveResultSet?

Thanks,
Renuka N.

-- Nitin Pawar
Re: Regarding query in HiveResultSet
Are you writing Java code against Hive, or are you writing a Hive query?

On Fri, Jul 31, 2015 at 11:13 AM, Renuka Be <renunalin...@gmail.com> wrote:

Hi Folks,

I want to find the max value from the HiveResult. There is an option listed in the HiveResultSet properties, HiveResultSet.Max(). When I use 'HiveResultSet.Max()' it throws an exception:

Error: At least one object must implement IComparable.

Is there any way to find the min and max from the HiveResultSet?

Thanks,
Renuka N.

-- Nitin Pawar
Re: Regarding query in HiveResultSet
Why don't you get those values as part of the query results instead of iterating through everything on the C# side? Your query can directly provide the min and max. Is there something specific blocking you from getting them from Hive, forcing you to do it on the application side?

On Fri, Jul 31, 2015 at 4:14 PM, Renuka Be <renunalin...@gmail.com> wrote:

I have used a Hive query to get column values, which returns a HiveResultSet. I need to find the min and max values in the HiveResultSet at the code level. Is there any possibility? I am using C#.

-Renuka N

On Fri, Jul 31, 2015 at 3:29 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Then why not just use the max function?

select max(a) from (select sum(a) as a, b from t group by b) n

On Fri, Jul 31, 2015 at 12:48 PM, Renuka Be <renunalin...@gmail.com> wrote:

Hi Nitin,

I am using a Hive query.

Regards,
Renuka N.

On Fri, Jul 31, 2015 at 2:42 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Are you writing Java code against Hive, or are you writing a Hive query?

On Fri, Jul 31, 2015 at 11:13 AM, Renuka Be <renunalin...@gmail.com> wrote:

Hi Folks,

I want to find the max value from the HiveResult. There is an option listed in the HiveResultSet properties, HiveResultSet.Max(). When I use 'HiveResultSet.Max()' it throws an exception:

Error: At least one object must implement IComparable.

Is there any way to find the min and max from the HiveResultSet?

Thanks,
Renuka N.

-- Nitin Pawar
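For reference, a hedged sketch of pushing both aggregates into the query (the inner query is the one from earlier in this thread; names are illustrative):

    SELECT min(a) AS min_a, max(a) AS max_a
    FROM (SELECT sum(a) AS a, b FROM t GROUP BY b) n;

The result set then carries a single row with both values, so the C# client reads two columns instead of comparing rows itself.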
Re: character '' not supported here
I could not solve the problem, so I had to recreate the table from another temp table. I think it's an issue with the ORC file format; maybe we can post to dev@ or wait for some dev to respond.

On Mon, Jul 20, 2015 at 1:51 PM, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I created a Hive table stored as ORC file (partitioned and compressed by ZLIB) from the Hive CLI, and added data into this table with a Spark application. After adding, I was able to query the data and everything looked fine. Then I concatenated the table from the Hive CLI. After that I am not able to query data any more (like select count(*) from Table); I just get the error "line 1:1 character '' not supported here", no matter whether the engine is Tez or MR. How did you solve the problem in your case?

BR,
Patcharee

On 18 July 2015 21:26, Nitin Pawar wrote:

Can you tell exactly what steps you did? Also, did you try running the query with MR instead of Tez? I am not sure this is an issue with the ORC file format; I once faced issues with ALTER TABLE on an ORC-backed table when adding a new column.

On Sun, Jul 19, 2015 at 12:05 AM, pth001 <patcharee.thong...@uni.no> wrote:

Hi,

The query result:

11236119012.64043-5.9708868.5592070.0 0.00.0-19.6869931308.804799848.00.0061966440.0 0.0301.274750.382470460.0NULL1120081
11236122012.513598-6.36717137.39279460.0 0.00.0-22.3003921441.054799848.00.00508465060.0 0.0112.207870.304595230.0NULL1120081
5122503682415.1955.1722354.9027147 -0.0244086120.023590.553-38.96928-1130.046974660.54 2.5969802E-49.706164E-1123054.2680.00.241967370.0 NULL1120081
9121449412.25196412.081688-9.594620.0 0.00.0-25.93576258.6562599848.00.00217082170.0 0.01.29632131.15602660.0NULL1120081
9121458412.3020987.752461-12.1834630.0 0.00.0-24.983763351.195399848.00.00237235990.0 0.01.41373750.992398860.0NULL1120081

I stored the table in ORC format, partitioned and compressed by ZLIB. The problem happened just after I concatenated the table.

BR,
Patcharee

On 18/07/15 12:46, Nitin Pawar wrote:

select * without a WHERE clause will work because it does not involve file processing. I suspect the problem is with the field delimiter, which is why I asked for records, so that we can see what the data in each column looks like. Are you using a CSV file with columns delimited by some character, with numeric data in quotes?

On Sat, Jul 18, 2015 at 3:58 PM, patcharee <patcharee.thong...@uni.no> wrote:

This "select * from table limit 5;" works, but not the others. So?

Patcharee

On 18 July 2015 12:08, Nitin Pawar wrote:

Can you do: select * from table limit 5;

On Sat, Jul 18, 2015 at 3:35 PM, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I am using Hive 0.14 with the Tez engine. I found a weird problem. Any suggestions?

hive> select count(*) from 4D;
line 1:1 character '' not supported here
line 1:2 character '' not supported here
line 1:3 character '' not supported here
line 1:4 character '' not supported here
line 1:5 character '' not supported here
line 1:6 character '' not supported here
line 1:7 character '' not supported here
line 1:8 character '' not supported here
line 1:9 character '' not supported here
...
...
line 1:131 character '' not supported here
line 1:132 character '' not supported here
line 1:133 character '' not supported here
line 1:134 character '' not supported here
line 1:135 character '' not supported here
line 1:136 character '' not supported here
line 1:137 character '' not supported here
line 1:138 character '' not supported here
line 1:139 character '' not supported here
line 1:140 character '' not supported here
line 1:141 character '' not supported here
line 1:142 character '' not supported here
line 1:143 character '' not supported here
line 1:144 character '' not supported here
line 1:145 character '' not supported here
line 1:146 character '' not supported here

BR,
Patcharee

-- Nitin Pawar
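For reference, a hedged sketch of the recreate-from-temp-table workaround described above, assuming the data is still readable (the table and partition columns are the ones from this thread; a plain CTAS cannot create a partitioned table, hence CREATE TABLE LIKE plus a dynamic-partition insert):

    SET hive.execution.engine=mr;                 -- avoid the failing Tez path
    SET hive.exec.dynamic.partition.mode=nonstrict;

    CREATE TABLE 4dim_new LIKE 4dim;              -- same schema, partitioning, ORC format

    INSERT OVERWRITE TABLE 4dim_new PARTITION (zone, z, year, month)
    SELECT * FROM 4dim;

    -- After verifying the copy:
    ALTER TABLE 4dim RENAME TO 4dim_broken;
    ALTER TABLE 4dim_new RENAME TO 4dim;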
Re: character '' not supported here
Can you do: select * from table limit 5;

On Sat, Jul 18, 2015 at 3:35 PM, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I am using Hive 0.14 with the Tez engine. I found a weird problem. Any suggestions?

hive> select count(*) from 4D;
line 1:1 character '' not supported here
line 1:2 character '' not supported here
line 1:3 character '' not supported here
line 1:4 character '' not supported here
line 1:5 character '' not supported here
line 1:6 character '' not supported here
line 1:7 character '' not supported here
line 1:8 character '' not supported here
line 1:9 character '' not supported here
...
...
line 1:131 character '' not supported here
line 1:132 character '' not supported here
line 1:133 character '' not supported here
line 1:134 character '' not supported here
line 1:135 character '' not supported here
line 1:136 character '' not supported here
line 1:137 character '' not supported here
line 1:138 character '' not supported here
line 1:139 character '' not supported here
line 1:140 character '' not supported here
line 1:141 character '' not supported here
line 1:142 character '' not supported here
line 1:143 character '' not supported here
line 1:144 character '' not supported here
line 1:145 character '' not supported here
line 1:146 character '' not supported here

BR,
Patcharee

-- Nitin Pawar
Re: character '' not supported here
select * without a WHERE clause will work because it does not involve file processing. I suspect the problem is with the field delimiter, which is why I asked for records, so that we can see what the data in each column looks like. Are you using a CSV file with columns delimited by some character, with numeric data in quotes?

On Sat, Jul 18, 2015 at 3:58 PM, patcharee <patcharee.thong...@uni.no> wrote:

This "select * from table limit 5;" works, but not the others. So?

Patcharee

On 18 July 2015 12:08, Nitin Pawar wrote:

Can you do: select * from table limit 5;

On Sat, Jul 18, 2015 at 3:35 PM, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I am using Hive 0.14 with the Tez engine. I found a weird problem. Any suggestions?

hive> select count(*) from 4D;
line 1:1 character '' not supported here
line 1:2 character '' not supported here
line 1:3 character '' not supported here
line 1:4 character '' not supported here
line 1:5 character '' not supported here
line 1:6 character '' not supported here
line 1:7 character '' not supported here
line 1:8 character '' not supported here
line 1:9 character '' not supported here
...
...
line 1:131 character '' not supported here
line 1:132 character '' not supported here
line 1:133 character '' not supported here
line 1:134 character '' not supported here
line 1:135 character '' not supported here
line 1:136 character '' not supported here
line 1:137 character '' not supported here
line 1:138 character '' not supported here
line 1:139 character '' not supported here
line 1:140 character '' not supported here
line 1:141 character '' not supported here
line 1:142 character '' not supported here
line 1:143 character '' not supported here
line 1:144 character '' not supported here
line 1:145 character '' not supported here
line 1:146 character '' not supported here

BR,
Patcharee

-- Nitin Pawar
Re: character '' not supported here
Can you tell exactly what steps you did? Also, did you try running the query with MR instead of Tez? I am not sure this is an issue with the ORC file format; I once faced issues with ALTER TABLE on an ORC-backed table when adding a new column.

On Sun, Jul 19, 2015 at 12:05 AM, pth001 <patcharee.thong...@uni.no> wrote:

Hi,

The query result:

11236119012.64043-5.9708868.5592070.0 0.00.0-19.6869931308.804799848.00.0061966440.0 0.0301.274750.382470460.0NULL1120081
11236122012.513598-6.36717137.39279460.0 0.00.0-22.3003921441.054799848.00.00508465060.0 0.0112.207870.304595230.0NULL1120081
5122503682415.1955.1722354.9027147 -0.0244086120.023590.553-38.96928-1130.046974660.54 2.5969802E-49.706164E-1123054.2680.00.241967370.0 NULL1120081
9121449412.25196412.081688-9.594620.0 0.00.0-25.93576258.6562599848.00.00217082170.0 0.01.29632131.15602660.0NULL1120081
9121458412.3020987.752461-12.1834630.0 0.00.0-24.983763351.195399848.00.00237235990.0 0.01.41373750.992398860.0NULL1120081

I stored the table in ORC format, partitioned and compressed by ZLIB. The problem happened just after I concatenated the table.

BR,
Patcharee

On 18/07/15 12:46, Nitin Pawar wrote:

select * without a WHERE clause will work because it does not involve file processing. I suspect the problem is with the field delimiter, which is why I asked for records, so that we can see what the data in each column looks like. Are you using a CSV file with columns delimited by some character, with numeric data in quotes?

On Sat, Jul 18, 2015 at 3:58 PM, patcharee <patcharee.thong...@uni.no> wrote:

This "select * from table limit 5;" works, but not the others. So?

Patcharee

On 18 July 2015 12:08, Nitin Pawar wrote:

Can you do: select * from table limit 5;

On Sat, Jul 18, 2015 at 3:35 PM, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I am using Hive 0.14 with the Tez engine. I found a weird problem. Any suggestions?

hive> select count(*) from 4D;
line 1:1 character '' not supported here
line 1:2 character '' not supported here
line 1:3 character '' not supported here
line 1:4 character '' not supported here
line 1:5 character '' not supported here
line 1:6 character '' not supported here
line 1:7 character '' not supported here
line 1:8 character '' not supported here
line 1:9 character '' not supported here
...
...
line 1:131 character '' not supported here
line 1:132 character '' not supported here
line 1:133 character '' not supported here
line 1:134 character '' not supported here
line 1:135 character '' not supported here
line 1:136 character '' not supported here
line 1:137 character '' not supported here
line 1:138 character '' not supported here
line 1:139 character '' not supported here
line 1:140 character '' not supported here
line 1:141 character '' not supported here
line 1:142 character '' not supported here
line 1:143 character '' not supported here
line 1:144 character '' not supported here
line 1:145 character '' not supported here
line 1:146 character '' not supported here

BR,
Patcharee

-- Nitin Pawar
Re: Hive Query Error
Can you check your config? The host appears twice: "01hw357381.tcsgegdc.com: 01hw357381.tcsgegdc.com". It should be hostname:port. Also, once you have corrected this, do an nslookup on the host to make sure it can be resolved by the Hive client.

On Thu, Jul 9, 2015 at 7:19 PM, Ajeet O <ajee...@tcs.com> wrote:

Hi All,

I have installed Hadoop 2.0 and Hive 0.12 on CentOS 7. When I run the query "select count(*) from u_data;" in Hive, it gives the following errors. However, I can run "select * from u_data;". Please help.

hive> select count(*) from u_data;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.net.UnknownHostException: 01hw357381.tcsgegdc.com: 01hw357381.tcsgegdc.com: unknown error
        at java.net.InetAddress.getLocalHost(InetAddress.java:1484)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:439)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.UnknownHostException: 01hw357381.tcsgegdc.com: unknown error
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1479)
        ... 34 more
Job Submission failed with exception 'java.net.UnknownHostException(01hw357381.tcsgegdc.com: 01hw357381.tcsgegdc.com: unknown error)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Thanks
Ajeet

-- Nitin Pawar
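For reference, a hedged sketch of checking and fixing local hostname resolution, which is what java.net.UnknownHostException during job submission usually points at (the IP address is a placeholder for the machine's real one; editing /etc/hosts requires root):

    # Does the local hostname resolve?
    nslookup 01hw357381.tcsgegdc.com

    # If not, map it in /etc/hosts using the host's actual IP:
    echo "10.0.0.5  01hw357381.tcsgegdc.com" >> /etc/hosts

    # Confirm:
    ping -c 1 01hw357381.tcsgegdc.com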
Re: fails to alter table concatenate
Can you try doing the same after changing the query engine from Tez to MR? I am not sure whether it's a Hive bug or a Tez bug.

On Tue, Jun 30, 2015 at 1:46 PM, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I am using Hive 0.14. ALTER TABLE ... CONCATENATE fails occasionally (see the exception below). It is strange that it fails from time to time, unpredictably. Is there any suggestion/clue?

hive> alter table 4dim partition(zone=2,z=15,year=2005,month=4) CONCATENATE;

VERTICES    STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
File Merge  FAILED     -1          0        0       -1       0       0
VERTICES: 00/01 [--] 0% ELAPSED TIME: 1435651968.00 s

Status: Failed
Vertex failed, vertexName=File Merge, vertexId=vertex_1435307579867_0041_1_00, diagnostics=[Vertex vertex_1435307579867_0041_1_00 [File Merge] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: [hdfs://service-10-0.local:8020/apps/hive/warehouse/wrf_tables/4dim/zone=2/z=15/year=2005/month=4] initializer failed, vertex=vertex_1435307579867_0041_1_00 [File Merge], java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:452)
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:441)
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295)
        at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:124)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.DDLTask

BR,
Patcharee

-- Nitin Pawar
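For reference, a hedged sketch of the suggested engine switch, using the partition from the report:

    -- Fall back from Tez to classic MapReduce for this session...
    SET hive.execution.engine=mr;

    -- ...and retry the concatenation.
    ALTER TABLE 4dim PARTITION (zone=2, z=15, year=2005, month=4) CONCATENATE;

If the statement succeeds on MR but fails on Tez, that points at the Tez split-generation path in the stack trace rather than at the table data itself.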
Re: Show table in Spark
Please check on the Spark user list; I don't think this is related to Hive.

On Tue, Jun 30, 2015 at 4:42 PM, Vinod Kuamr <vinod.rajan1...@yahoo.com> wrote:

Hi Folks,

Can anyone please let me know how to show the content of a DataFrame in Spark? When I use dt.show() (here df is the DataFrame) I am getting the following result:

[image: Inline image]

I am using Scala version 1.3.1 on Windows 8.

Thanks in advance,
Vinod

-- Nitin Pawar
Re: Hive and elasticsearch-hadoop-2.1.0 : NoClassDefFoundError
By any chance, did you build Hive yourself?

On Mon, Jun 29, 2015 at 7:11 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Additional info: it works when I manually add the jar with ADD JAR:

hive> ADD JAR '/usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar'

I'm quite new to Hive and Hadoop in general. This is my first post on this mailing list, so please excuse me if the following question has been asked and answered over and over again. Perhaps I'm a bit naive, but I thought that Hive custom SerDes/UD*Fs were able to access everything already on the Hive classpath. Was it just a dream?

I would greatly appreciate some pointers; thanks to anyone who might be able to help!

Best regards,
Erwan

On Mon, Jun 29, 2015 at 2:30 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Hello,

I'm running Hortonworks HDP 2.2.6, Hive 0.14, alongside an Elasticsearch cluster. For some reason Hive can't seem to connect to my ES cluster using the ES SerDe (I'm using elasticsearch-hadoop-2.1.0.jar). When the time comes for my job to insert the query result into an ES resource, I get this stack trace:

NoClassDefFoundError: org/apache/commons/httpclient/URIException
        at org.elasticsearch.hadoop.hive.HiveUtils.structObjectInspector(HiveUtils.java:57)
        at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:82)
        at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:97)

I'm quite puzzled because commons-httpclient is supposed to be on the hive-client classpath:

# ls -l /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
-rw-r--r-- 1 root root 279781 Mar 31 20:26 /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
# ls -l /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar
-rw-r--r-- 1 root root 279781 Mar 31 20:26 /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar

What am I missing?

Thanks a lot for your help,
Kind regards,
Erwan

-- Nitin Pawar
Re: Hive and elasticsearch-hadoop-2.1.0 : NoClassDefFoundError
I am using 2.2.4-2.2 but did not get any error. Can you check which services are installed on the node where the Hive client is running?

On Mon, Jun 29, 2015 at 7:18 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Hi Nitin,

No, I didn't do such a thing. I'm using the stock 0.14 version from HDP 2.2.4 (I said 2.2.6 earlier but that was wrong):

# hive --version
Hive 0.14.0.2.2.4.2-2
Subversion git://ip-10-0-0-5.ec2.internal/grid/0/jenkins/workspace/HDP-2.2.4.1-centos6/bigtop/build/hive/rpm/BUILD/hive-0.14.0.2.2.4.2 -r 115d99896f5a4a81e7d91e052e8d38d7436b78d4
Compiled by jenkins on Tue Mar 31 16:26:33 EDT 2015
From source with checksum 1f34a1d4e566c3e801582862ed85ee93

Thanks for taking the time.

Kind regards,
Erwan

On Mon, Jun 29, 2015 at 3:44 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

By any chance, did you build Hive yourself?

On Mon, Jun 29, 2015 at 7:11 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Additional info: it works when I manually add the jar with ADD JAR:

hive> ADD JAR '/usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar'

I'm quite new to Hive and Hadoop in general. This is my first post on this mailing list, so please excuse me if the following question has been asked and answered over and over again. Perhaps I'm a bit naive, but I thought that Hive custom SerDes/UD*Fs were able to access everything already on the Hive classpath. Was it just a dream?

I would greatly appreciate some pointers; thanks to anyone who might be able to help!

Best regards,
Erwan

On Mon, Jun 29, 2015 at 2:30 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Hello,

I'm running Hortonworks HDP 2.2.6, Hive 0.14, alongside an Elasticsearch cluster. For some reason Hive can't seem to connect to my ES cluster using the ES SerDe (I'm using elasticsearch-hadoop-2.1.0.jar). When the time comes for my job to insert the query result into an ES resource, I get this stack trace:

NoClassDefFoundError: org/apache/commons/httpclient/URIException
        at org.elasticsearch.hadoop.hive.HiveUtils.structObjectInspector(HiveUtils.java:57)
        at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:82)
        at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:97)

I'm quite puzzled because commons-httpclient is supposed to be on the hive-client classpath:

# ls -l /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
-rw-r--r-- 1 root root 279781 Mar 31 20:26 /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
# ls -l /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar
-rw-r--r-- 1 root root 279781 Mar 31 20:26 /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar

What am I missing?

Thanks a lot for your help,
Kind regards,
Erwan

-- Nitin Pawar
Re: Hive and elasticsearch-hadoop-2.1.0 : NoClassDefFoundError
Great, it helped.

On Mon, Jun 29, 2015 at 7:29 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

[continued] A dependency for a custom UDF seems not to be properly shaded, as I could see in an excerpt of the Maven build output:

[INFO] Including org.apache.httpcomponents:httpclient:jar:4.1.2 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpcore:jar:4.1.2 in the shaded jar.

I'm going to look into this. Thanks a lot for confirming things worked as I expected on your end!

Regards,
Erwan

On Mon, Jun 29, 2015 at 3:55 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Hi Nitin,

Digging a bit, I discovered that the error is probably on our end:

On Mon, Jun 29, 2015 at 3:54 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

I am using 2.2.4-2.2 but did not get any error. Can you check which services are installed on the node where the Hive client is running?

On Mon, Jun 29, 2015 at 7:18 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Hi Nitin,

No, I didn't do such a thing. I'm using the stock 0.14 version from HDP 2.2.4 (I said 2.2.6 earlier but that was wrong):

# hive --version
Hive 0.14.0.2.2.4.2-2
Subversion git://ip-10-0-0-5.ec2.internal/grid/0/jenkins/workspace/HDP-2.2.4.1-centos6/bigtop/build/hive/rpm/BUILD/hive-0.14.0.2.2.4.2 -r 115d99896f5a4a81e7d91e052e8d38d7436b78d4
Compiled by jenkins on Tue Mar 31 16:26:33 EDT 2015
From source with checksum 1f34a1d4e566c3e801582862ed85ee93

Thanks for taking the time.

Kind regards,
Erwan

On Mon, Jun 29, 2015 at 3:44 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

By any chance, did you build Hive yourself?

On Mon, Jun 29, 2015 at 7:11 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Additional info: it works when I manually add the jar with ADD JAR:

hive> ADD JAR '/usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar'

I'm quite new to Hive and Hadoop in general. This is my first post on this mailing list, so please excuse me if the following question has been asked and answered over and over again. Perhaps I'm a bit naive, but I thought that Hive custom SerDes/UD*Fs were able to access everything already on the Hive classpath. Was it just a dream?

I would greatly appreciate some pointers; thanks to anyone who might be able to help!

Best regards,
Erwan

On Mon, Jun 29, 2015 at 2:30 PM, Erwan Queffélec <erwan.queffe...@gmail.com> wrote:

Hello,

I'm running Hortonworks HDP 2.2.6, Hive 0.14, alongside an Elasticsearch cluster. For some reason Hive can't seem to connect to my ES cluster using the ES SerDe (I'm using elasticsearch-hadoop-2.1.0.jar). When the time comes for my job to insert the query result into an ES resource, I get this stack trace:

NoClassDefFoundError: org/apache/commons/httpclient/URIException
        at org.elasticsearch.hadoop.hive.HiveUtils.structObjectInspector(HiveUtils.java:57)
        at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:82)
        at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:97)

I'm quite puzzled because commons-httpclient is supposed to be on the hive-client classpath:

# ls -l /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
-rw-r--r-- 1 root root 279781 Mar 31 20:26 /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
# ls -l /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar
-rw-r--r-- 1 root root 279781 Mar 31 20:26 /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar

What am I missing?

Thanks a lot for your help,
Kind regards,
Erwan

-- Nitin Pawar
Re: Left function
Try using the substr function.

On Tue, Jun 16, 2015 at 3:03 PM, Ravisankar Mani <rrav...@gmail.com> wrote:

Hi everyone,

How do I get the leftmost 'length' characters from a string in Hive? MySQL and other SQL dialects have a specific function, LEFT(string, length). Could you please help with any other way to achieve this scenario?

Regards,
Ravisankar

-- Nitin Pawar
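For reference, Hive's substr is 1-indexed, so the MySQL LEFT(string, length) call maps directly onto it (column and length are illustrative):

    -- LEFT(name, 5) in MySQL is equivalent to:
    SELECT substr(name, 1, 5) FROM customers;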
Re: difference between add file from a local disk and hdfs file
Answering my own question: either way, the file was available via the distributed cache. It was a spelling mistake in my code; correcting it solved the problem.

On Sun, May 17, 2015 at 2:46 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Hi,

I am trying to access a lookup file from a UDF. There are two ways I add the lookup file to the distributed cache.

Option 1: loading the file from local disk into the distributed cache (this is for the Hive CLI):
add file tmp.txt;

Option 2: adding a file from HDFS into the distributed cache, so that Oozie can do it too:
add file hdfs:///user/admin/tmp.txt;

I want to use a file from HDFS in the distributed cache so that I can use it in a Hive UDF. The problem is: when I load a file using option 1, it is available to the UDF (works fine):

hive> add file format.txt;
Added resources: [format.txt]
hive> list files;
format.txt

But when I load the file from HDFS, it moves into a tmp folder, and I am not sure whether the path stays the same all the time:

hive> add file hdfs:user/admin/tmp.txt;
converting to local hdfs:///user/admin/tmp.txt
Added resources: [hdfs:tmp.txt]
hive> list files;
/tmp/006ab981-ddac-4bcb-bee1-7d8ed9a271a0_resources/tmp.txt

Question: how do I get the file at the same location (like option 1) every time? With option 2 I keep getting the error "tmp.txt does not exist" when I initialize the UDF.

Thanks

-- Nitin Pawar
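For reference, a hedged sketch of the pattern that works for both variants: resources added to the distributed cache are exposed to tasks under their base name, so the UDF should open the file by that name rather than by a full path (the file names are the ones from this thread):

    -- Local file: listed under its original name.
    ADD FILE format.txt;

    -- HDFS file: localized into a session tmp directory, but the base name
    -- (tmp.txt) is still what the UDF should open at runtime.
    ADD FILE hdfs:///user/admin/tmp.txt;
    LIST FILES;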
difference between add file from a local disk and hdfs file
Hi, I am trying to access a lookup file from a UDF. There are two ways I add the lookup file to the distributed cache. Option 1: loading the file from local disk into the distributed cache (this is for the hive cli): add file tmp.txt; Option 2: add a file from hdfs to the distributed cache so that oozie can do it too: add file hdfs:///user/admin/tmp.txt; I want to use a file from hdfs in the distributed cache so that I can use it in a hive UDF. The problem is: when I load a file using option 1, it is available to the UDF (works fine): hive> add file format.txt; Added resources: [format.txt] hive> list files; format.txt But when I load the file from hdfs, it moves into a tmp folder and I am not sure if the path remains the same all the time: hive> add file hdfs:user/admin/tmp.txt ; converting to local hdfs:///user/admin/tmp.txt Added resources: [hdfs:tmp.txt] hive> list files; /tmp/006ab981-ddac-4bcb-bee1-7d8ed9a271a0_resources/tmp.txt Question: how do I get the file at the same location (like option 1) every time? Because with option 2 I keep getting the error "tmp.txt does not exist" when I initialize the UDF. thanks -- Nitin Pawar
Re: user matching query does not exist
this is related to Django; see this on how to clear sessions from django http://www.opencsw.org/community/questions/289/how-to-clear-the-django-session-cache On Fri, May 15, 2015 at 12:24 PM, amit kumar ak3...@gmail.com wrote: Yes, it is happening for hue only. Can you please suggest how I can clean up the hue sessions from the server? The query succeeds in the hive command line. On Fri, May 15, 2015 at 11:52 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Is this happening for Hue? If yes, maybe you can try cleaning up hue sessions from the server. (this may clean all users' active sessions from hue so be careful while doing it) On Fri, May 15, 2015 at 11:31 AM, amit kumar ak3...@gmail.com wrote: I am using CDH 5.2.1; any pointers will be of immense help. Thanks On Fri, May 15, 2015 at 9:43 AM, amit kumar ak3...@gmail.com wrote: Hi, After re-creating my account in Hue, I receive “User matching query does not exist” when attempting to perform a hive query. The query succeeds in the hive command line. Please suggest on this, thank you Amit -- Nitin Pawar -- Nitin Pawar
Re: user matching query does not exist
Is this happening for Hue? If yes, maybe you can try cleaning up hue sessions from the server. (this may clean all users' active sessions from hue so be careful while doing it) On Fri, May 15, 2015 at 11:31 AM, amit kumar ak3...@gmail.com wrote: I am using CDH 5.2.1; any pointers will be of immense help. Thanks On Fri, May 15, 2015 at 9:43 AM, amit kumar ak3...@gmail.com wrote: Hi, After re-creating my account in Hue, I receive “User matching query does not exist” when attempting to perform a hive query. The query succeeds in the hive command line. Please suggest on this, thank you Amit -- Nitin Pawar
Re: Stopping HiveServer2
How did you start it? On Wed, Apr 29, 2015 at 4:26 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, How can I stop hiveserver2? I am not able to find the command. Thanks -- Nitin Pawar
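For context, a sketch only: HiveServer2 of this vintage ships no dedicated stop command, so if it was started from the shell (foreground or nohup), stopping it means stopping the JVM:

  # find the HiveServer2 process and stop it
  ps aux | grep -i '[h]iveserver2'
  kill <pid>    # <pid> is the process id printed above

If it was started through a management tool (Ambari, Cloudera Manager, an init script), stop it the same way it was started.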
Re: Clear up Hive scratch directory
Thanks Martin. Can you also mention the steps you took to reclaim the HDFS space from the temporary data? On Fri, Apr 24, 2015 at 12:21 PM, Martin Benson martin.ben...@jaywing.com wrote: Hi All, I just wanted to feed back that it does appear to be safe - I emptied the directory manually, without adverse consequences. Thanks, Martin. -- From: Martin Benson martin.ben...@jaywing.com Sent: 20/04/2015 18:06 To: user@hive.apache.org Subject: Clear up Hive scratch directory Hi, One of my users tried to run a HUGE join, which failed due to a lack of space in HDFS. This has resulted in a large amount of data remaining in the Hive scratch directory which I need to clear down. I've tried setting hive.start.cleanup.scratchdir to true and restarting Hive, but it didn't tidy it up. So, I'm wondering if it is safe to just delete the content of the directory in HDFS (while Hive is stopped). Could anyone advise please? Many thanks, Martin. -- Nitin Pawar
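A sketch of the manual cleanup Martin describes, assuming the default scratch location from hive.exec.scratchdir (often /tmp/hive or /tmp/hive-<user>) and that Hive is stopped first:

  hadoop fs -ls /tmp/hive           # confirm what is sitting in the scratch dir
  hadoop fs -rm -r '/tmp/hive/*'    # remove the leftover intermediate data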
Re: Discrepancy in String matching between Teradata and HIVE
Hive does not manipulate data on its own; if your processing logic needs trimming of spaces then you can provide that in the query. On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh sanjiv.is...@gmail.com wrote: Hi All, I am getting into Hive and learning hive. I have a customer table in teradata and used sqoop to extract the complete table into hive, which worked fine. See below the customer table both in Teradata and HIVE. *In Teradata :* select TOP 4 id,name,''||status||'' from customer;
3172460 Customer#003172460 BUILDING
3017726 Customer#003017726 BUILDING
2817987 Customer#002817987 COMPLETE
2817984 Customer#002817984 BUILDING
*In HIVE :* select id,name,CONCAT ('' , status , '') from customer LIMIT 4;
3172460 Customer#003172460 BUILDING
3017726 Customer#003017726 BUILDING
2817987 Customer#002817987 COMPLETE
2817984 Customer#002817984 BUILDING
When I tried to fetch records from the table customer with matching on a String-type column, I got different results for the same query in different environments. See below the query results. *In Teradata :* select TOP 2 id,name,''||status||'' from customer WHERE status = 'BUILDING';
3172460 Customer#003172460 BUILDING
3017726 Customer#003017726 BUILDING
*In HIVE :* select id,name,CONCAT ('' , status , '') from customer WHERE status = 'BUILDING' LIMIT 2; ***No Result*** It seems that teradata is doing some sort of trimming before actually comparing string values, but Hive is matching the strings as-is. Not sure if this is expected behaviour or a bug, or whether it can be raised as an enhancement. I see a possible solution below: - Convert into a like operator expression with a wildcard character before and after. Looking forward to your response on this. How can it be handled/achieved in hive? Regards Sanjiv Singh Mob : +091 9990-447-339 -- Nitin Pawar
Re: Discrepancy in String matching between Teradata and HIVE
Hive is not fully ANSI SQL compliant. In Hive, string comparisons work just like they would in Java, so in Hive 'BUILDING' = 'BUILDING' but 'BUILDING' != 'BUILDING ' (extra space added). On Fri, Mar 27, 2015 at 2:11 PM, @Sanjiv Singh sanjiv.is...@gmail.com wrote: Hi, I can use the rtrim function, i.e.: select id,name,CONCAT ('' , status , '') from customer WHERE rtrim(status) = 'BUILDING' LIMIT 2; But the question remains: what standard does Hive use for string comparison? According to ANSI/ISO SQL-92, 'BUILDING' == 'BUILDING '. Here is a link http://support.microsoft.com/en-us/kb/316626 to an article about it. Regards Sanjiv Singh Mob : +091 9990-447-339 On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Hive does not manipulate data on its own; if your processing logic needs trimming of spaces then you can provide that in the query. On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh sanjiv.is...@gmail.com wrote: Hi All, I am getting into Hive and learning hive. I have a customer table in teradata and used sqoop to extract the complete table into hive, which worked fine. See below the customer table both in Teradata and HIVE. *In Teradata :* select TOP 4 id,name,''||status||'' from customer;
3172460 Customer#003172460 BUILDING
3017726 Customer#003017726 BUILDING
2817987 Customer#002817987 COMPLETE
2817984 Customer#002817984 BUILDING
*In HIVE :* select id,name,CONCAT ('' , status , '') from customer LIMIT 4;
3172460 Customer#003172460 BUILDING
3017726 Customer#003017726 BUILDING
2817987 Customer#002817987 COMPLETE
2817984 Customer#002817984 BUILDING
When I tried to fetch records from the table customer with matching on a String-type column, I got different results for the same query in different environments. See below the query results. *In Teradata :* select TOP 2 id,name,''||status||'' from customer WHERE status = 'BUILDING';
3172460 Customer#003172460 BUILDING
3017726 Customer#003017726 BUILDING
*In HIVE :* select id,name,CONCAT ('' , status , '') from customer WHERE status = 'BUILDING' LIMIT 2; ***No Result*** It seems that teradata is doing some sort of trimming before actually comparing string values, but Hive is matching the strings as-is. Not sure if this is expected behaviour or a bug, or whether it can be raised as an enhancement. I see a possible solution below: - Convert into a like operator expression with a wildcard character before and after. Looking forward to your response on this. How can it be handled/achieved in hive? Regards Sanjiv Singh Mob : +091 9990-447-339 -- Nitin Pawar -- Nitin Pawar
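A two-line HiveQL illustration of the Java-style comparison described above (the bare SELECT without FROM assumes Hive 0.13+):

  SELECT 'BUILDING' = 'BUILDING ';         -- false: the trailing blank is significant
  SELECT 'BUILDING' = rtrim('BUILDING ');  -- true: trim before comparing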
Re: Re: how to set column level privileges
Column level security in hive was added in HIVE-5837 https://issues.apache.org/jira/browse/HIVE-5837 It has the PDF link for your reading. https://cwiki.apache.org/confluence/display/Hive/AuthDev talks about setting column level permissions On Thu, Mar 26, 2015 at 4:39 PM, Allen bjallenw...@sina.com wrote: Thanks for your reply. If we handle the privileges by creating views, it will lead to lots of views in our database. I found there is a table named TBL_COL_PRIV in the hive metastore database; maybe this table is related to column privileges, but it is never used in hive. Anybody know why? ----- Original Message ----- From: Daniel Haviv daniel.ha...@veracity-group.com To: user@hive.apache.org user@hive.apache.org Subject: Re: how to set column level privileges Date: 2015-03-26 18:42 Create a view with the permitted columns and handle the privileges for it Daniel On 26 March 2015, at 12:40, Allen bjallenw...@sina.com wrote: hi, We use SQL standards based authorization for authorization in Hive 0.14. But it has no support for column level privileges. So, I want to know: is there any way to set column level privileges? Thanks! -- Nitin Pawar
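A sketch of the view-based workaround Daniel suggests, under SQL-standards-based authorization (the table, view, and role names are hypothetical, and the GRANT requires admin privileges):

  -- expose only the permitted columns through a view
  CREATE VIEW customer_public AS SELECT id, name FROM customer;
  CREATE ROLE analysts;
  GRANT SELECT ON TABLE customer_public TO ROLE analysts;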
Re: CREATE FUNCTION: How to automatically load extra jar file?
:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:740) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar )' Execution failed with exit status: 1 Obtaining error information Task failed! Task ID: Stage-1 Logs: /tmp/hadoop/hive.log FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask Step 5: (check the file) hive> dfs -ls /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar; ls: `/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar': No such file or directory Command failed with exit code = 1 Query returned non-zero code: 1, cause: null -- Nitin Pawar
Re: CREATE FUNCTION: How to automatically load extra jar file?
If you put a file inside /tmp then there is no guarantee it will live there forever, based on your cluster configuration. You may want to put it in a place where all users can access it, like making a folder and giving it read permission. On Wed, Dec 31, 2014 at 11:40 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks. Below are my steps: I did copy my JAR to HDFS and CREATE FUNCTION using the JAR in HDFS, however during my smoke test I got a FileNotFoundException. java.io.FileNotFoundException: File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar Step 1: (make sure the jar is in HDFS) hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar; -rw-r--r-- 3 hadoop hadoop 57388 2014-12-30 10:02 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar Step 2: (drop the function if it exists) hive> drop function sysdate; OK Time taken: 0.013 seconds Step 3: (create function using the jar in HDFS) hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar'; converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar Added /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar to class path Added resource: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar OK Time taken: 0.034 seconds Step 4: (test) hive> select sysdate(); Execution log at: /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log java.io.FileNotFoundException: File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar Please help! Arthur On 31 Dec, 2014, at 12:31 am, Nitin Pawar nitinpawar...@gmail.com wrote: Just copy-pasting Jason's reply from the other thread: If you have a recent version of Hive (0.13+), you could try registering your UDF as a permanent UDF which was added in HIVE-6047: 1) Copy your JAR somewhere on HDFS, say hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar. 2) In Hive, run CREATE FUNCTION zeroifnull AS 'com.test.udf.ZeroIfNullUDF' USING JAR 'hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar'; The function definition should be saved in the metastore and Hive should remember to pull the JAR from the location you specified in the CREATE FUNCTION call. On Tue, Dec 30, 2014 at 9:54 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Thank you. Will this work for *hiveserver2*? Arthur On 30 Dec, 2014, at 2:24 pm, vic0777 vic0...@163.com wrote: You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar. Then, the file is automatically loaded when Hive is started. Wantao At 2014-12-30 11:01:06, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I am using Hive 0.13.1 on Hadoop 2.4.1, and I need to automatically load an extra JAR file to hive for a UDF; below are my steps to create the UDF function. I have tried the following but still no luck getting through. Please help!!
Regards Arthur Step 1: (make sure the jar is in HDFS) hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar; -rw-r--r-- 3 hadoop hadoop 57388 2014-12-30 10:02 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar Step 2: (drop the function if it exists) hive> drop function sysdate; OK Time taken: 0.013 seconds Step 3: (create function using the jar in HDFS) hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar'; converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar Added /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar to class path Added resource: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar OK Time taken: 0.034 seconds Step 4: (test) hive> select sysdate(); Automatically selecting local only mode for query Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase-0.98.5-hadoop2/lib/phoenix-4.1.0-client-hadoop2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003
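Taking Nitin's advice above as a sketch: host the jar at a permanent, world-readable HDFS location rather than under /tmp, and register the permanent function from there (the path below is hypothetical):

  hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate'
        USING JAR 'hdfs:///apps/hive/udf/nexr-hive-udf-0.2-SNAPSHOT.jar';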
Re: Detailing on how UPDATE is performed in Hive
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions The entire implementation is under the JIRA here: https://issues.apache.org/jira/browse/HIVE-5317 On Thu, Nov 27, 2014 at 4:11 PM, unmesha sreeveni unmeshab...@gmail.com wrote: Hi friends, Where can I find details on how update is performed in Hive? 1. When an update is performed, will HDFS write that block elsewhere with the new value? 2. Is the old block unallocated and allowed for further writes? 3. Does this process create fragmentation? 4. When a partitioned table is updated, is the partition deleted and updated with the new value, or is the entire block deleted and written once again? Where would be a good place to gather this knowledge? -- *Thanks Regards * *Unmesha Sreeveni U.B* *Hadoop, Bigdata Developer* *Centre for Cyber Security | Amrita Vishwa Vidyapeetham* http://www.unmeshasreeveni.blogspot.in/ -- Nitin Pawar
Re: UPDATE in Hive -0.14.0
What's your create table DDL? On 24 Nov 2014 13:43, unmesha sreeveni unmeshab...@gmail.com wrote: Hi, I am using hive-0.14.0, which supports the UPDATE statement, but I am getting an error when I run this command: UPDATE Emp SET salary = 5 WHERE employeeid = 19; FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations. hive> Am I doing anything wrong? -- *Thanks Regards * *Unmesha Sreeveni U.B* *Hadoop, Bigdata Developer* *Centre for Cyber Security | Amrita Vishwa Vidyapeetham* http://www.unmeshasreeveni.blogspot.in/
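The [Error 10294] above points at the transaction manager rather than the table. A sketch of the client-side settings the Hive Transactions wiki lists for enabling ACID on 0.14 (not verified against this particular cluster):

  set hive.support.concurrency=true;
  set hive.enforce.bucketing=true;
  set hive.exec.dynamic.partition.mode=nonstrict;
  set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
  set hive.compactor.initiator.on=true;
  set hive.compactor.worker.threads=1;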
Re: from_unixtime() and epoch definition
Hi Maciek, Jason, Sorry, I could not find my old code, but I came up with a little code as far as I can remember. You can try the following jar https://github.com/nitinpawar/hive-udfs/tree/master/FromUnixtimeWithTZ/dist and let me know if this works for you guys. I can change it the way it needs to be. PS: I am not a java dev, so forgive anything bad I have done in there. On Thu, Nov 6, 2014 at 3:44 PM, Maciek mac...@sonra.io wrote: @Jason: re. Hive (…) just assumes things are in the system's local timezone, just to clarify - this is not true in the case of conversions (from_unixtime()) as it respects the local system TZ settings, hence the problem. TZ itself is a very hairy subject and would definitely be a big undertaking. Extending from_unixtime seems like the easiest solution for now. Happy to do an ER in JIRA but haven't done this before... @Nitin Would be very grateful if you're able to dig it out! Thanks! Best Regards On Thu, Nov 6, 2014 at 7:48 AM, Jason Dere jd...@hortonworks.com wrote: That would be great! On Nov 5, 2014, at 10:49 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Maybe a JIRA? I remember having my own UDF for doing this. If possible I will share the code On Thu, Nov 6, 2014 at 6:22 AM, Jason Dere jd...@hortonworks.com wrote: Hive should probably at least provide a timezone option to from_unixtime(). As you mentioned, Hive doesn't really do any timezone handling, just assumes things are in the system's local timezone. It will be a bit of a bigger project to add better time zone handling to Hive timestamps. On Nov 5, 2014, at 7:18 AM, Maciek mac...@sonra.io wrote: I see… and confirm, it's consistent with the Linux/Unix output I get: date -r 0 Thu 1 Jan 1970 01:00:00 IST date Wed 5 Nov 2014 14:49:52 GMT Did some digging and it actually makes sense. It turns out Ireland didn't observe daylight saving time in the years 1968-1972, as it was set permanently to GMT+1=IST. Anyway, back to Hive: I'm trying to convert unix_times to UTC (using the from_unixtime UDF) but due to this issue I'm getting different results on different servers (TZ settings). Is there any way to influence that behaviour without changing the timezone on the server? Oracle, for instance, offers a good few options to facilitate timezone conversion, among others: the 'AT TIME ZONE [GMT]' clause, ALTER SESSION SET TIME_ZONE [= 'GMT'], or the to_timestamp_tz() function. Currently it seems the only way to perform this conversion is to detect the server settings first (won't work at all for some cases, like through a JDBC connection, I think) and apply the shift during the process. It would be really nice if Hive offered some elegant way to support this. I'm thinking of a similar ALTER SESSION statement equivalent, maybe a SET parameter in hive, or an extra parameter for the from_unixtime() Hive function? On Mon, Nov 3, 2014 at 10:33 PM, Jason Dere jd...@hortonworks.com wrote: As Nitin mentions, the behavior is to return a string representing the timestamp of that moment in the current system time zone. What are the timezone settings on your machine? $ TZ=GMT date -r 0 Thu Jan 1 00:00:00 GMT 1970 $ TZ=UTC date -r 0 Thu Jan 1 00:00:00 UTC 1970 $ TZ=Europe/London date -r 0 Thu Jan 1 01:00:00 BST 1970 $ TZ=Europe/Dublin date -r 0 Thu Jan 1 01:00:00 IST 1970 On Nov 3, 2014, at 12:50 PM, Maciek mac...@sonra.io wrote: I'd consider this behaviour a bug and would like to raise it as such. Is there anyone to confirm it's the same on Hive 0.14? On Fri, Oct 31, 2014 at 3:41 PM, Maciek mac...@sonra.io wrote: Actually confirmed!
It's down to the timezone settings. I've temporarily moved the server/client settings to 'Atlantic/Reykjavik' (no change in time compared to what I was on (GMT)), but it's permanently UTC and as such doesn't observe daylight saving. I believe this shouldn't matter (see my points from the previous mail) but apparently there's an issue with it. Not sure how to deal with this situation (can't just change TZ settings everywhere because of Hive) and I don't want to hardcode anything. I'm on Hive 0.13. Does Hive 0.14 provide better support for TimeZones? On Fri, Oct 31, 2014 at 3:25 PM, Maciek mac...@sonra.io wrote: Thought about that myself based on my prior (bad) experience when I tried working with timezones in Hive (the functionality pretty much doesn't exist). That shouldn't be the case here though, here's why: in Oracle a [timestamp with timezone] can be adjusted when sent/displayed on the client based on the client's settings. This may also be relevant if the timestamp in question falls into the client's daylight saving time period. This behaviour would make sense to me, however: • these are server, not client, settings we're talking about here • the server and client do reside in the same timezone anyway, which is currently GMT [UTC] • while we observe daylight saving here [Dublin], the time in question (1970-01-01 00:00:00) is not in that period, and neither is the time
Re: Unix script for identifying current active namenode in a HA cluster
Looks good to me, thanks for the share. On Wed, Nov 5, 2014 at 5:15 PM, Devopam Mittra devo...@gmail.com wrote: hi Nitin, Thanks for the vital input around the Hadoop Home addition. At times such things totally go off the radar when you have customized your own environment. As suggested I have shared this on github: https://github.com/devopam/hadoopHA Apologies if there is any problem on github as I have limited familiarity with it :( regards Devopam On Wed, Nov 5, 2014 at 12:31 PM, Nitin Pawar nitinpawar...@gmail.com wrote: +1. If you can optionally add the hadoop home directory in the script and use that in the path, it can be used out of the box. Also, can you share this on github? On Wed, Nov 5, 2014 at 10:02 AM, Devopam Mittra devo...@gmail.com wrote: hi All, Please find attached a simple shell script to dynamically determine the active namenode in the HA Cluster and subsequently run the Hive job / query via Talend OS generated workflows. It was tried successfully on a HDP2.1 cluster with 2 nn, 7 dn running on CentOS 6.5. Each ETL job invokes this script first in our framework to derive the NN FQDN and then runs the hive jobs subsequently to avoid failures. Takes a max. of 2 secs to execute (small cost in our case, as compared to dealing with a failure and then recalculating the NN to resubmit the job). Sharing it with you in case you can leverage the same without spending effort to code it. Do share your feedback/ fixes if you spot any. -- Devopam Mittra Life and Relations are not binary -- Nitin Pawar -- Devopam Mittra Life and Relations are not binary -- Nitin Pawar
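For reference, the same determination can be sketched with the stock HA admin CLI (the service ids nn1/nn2 are hypothetical and come from the cluster's dfs.ha.namenodes.* setting):

  hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
  hdfs haadmin -getServiceState nn2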
Re: from_unixtime() and epoch definition
Maybe a JIRA? I remember having my own UDF for doing this. If possible I will share the code On Thu, Nov 6, 2014 at 6:22 AM, Jason Dere jd...@hortonworks.com wrote: Hive should probably at least provide a timezone option to from_unixtime(). As you mentioned, Hive doesn't really do any timezone handling, just assumes things are in the system's local timezone. It will be a bit of a bigger project to add better time zone handling to Hive timestamps. On Nov 5, 2014, at 7:18 AM, Maciek mac...@sonra.io wrote: I see… and confirm, it's consistent with the Linux/Unix output I get: date -r 0 Thu 1 Jan 1970 01:00:00 IST date Wed 5 Nov 2014 14:49:52 GMT Did some digging and it actually makes sense. It turns out Ireland didn't observe daylight saving time in the years 1968-1972, as it was set permanently to GMT+1=IST. Anyway, back to Hive: I'm trying to convert unix_times to UTC (using the from_unixtime UDF) but due to this issue I'm getting different results on different servers (TZ settings). Is there any way to influence that behaviour without changing the timezone on the server? Oracle, for instance, offers a good few options to facilitate timezone conversion, among others: the 'AT TIME ZONE [GMT]' clause, ALTER SESSION SET TIME_ZONE [= 'GMT'], or the to_timestamp_tz() function. Currently it seems the only way to perform this conversion is to detect the server settings first (won't work at all for some cases, like through a JDBC connection, I think) and apply the shift during the process. It would be really nice if Hive offered some elegant way to support this. I'm thinking of a similar ALTER SESSION statement equivalent, maybe a SET parameter in hive, or an extra parameter for the from_unixtime() Hive function? On Mon, Nov 3, 2014 at 10:33 PM, Jason Dere jd...@hortonworks.com wrote: As Nitin mentions, the behavior is to return a string representing the timestamp of that moment in the current system time zone. What are the timezone settings on your machine? $ TZ=GMT date -r 0 Thu Jan 1 00:00:00 GMT 1970 $ TZ=UTC date -r 0 Thu Jan 1 00:00:00 UTC 1970 $ TZ=Europe/London date -r 0 Thu Jan 1 01:00:00 BST 1970 $ TZ=Europe/Dublin date -r 0 Thu Jan 1 01:00:00 IST 1970 On Nov 3, 2014, at 12:50 PM, Maciek mac...@sonra.io wrote: I'd consider this behaviour a bug and would like to raise it as such. Is there anyone to confirm it's the same on Hive 0.14? On Fri, Oct 31, 2014 at 3:41 PM, Maciek mac...@sonra.io wrote: Actually confirmed! It's down to the timezone settings. I've temporarily moved the server/client settings to 'Atlantic/Reykjavik' (no change in time compared to what I was on (GMT)), but it's permanently UTC and as such doesn't observe daylight saving. I believe this shouldn't matter (see my points from previous mail) but apparently there's an issue with it. Not sure how to deal with this situation (can't just change TZ settings everywhere because of Hive) and I don't want to hardcode anything. I'm on Hive 0.13. Does Hive 0.14 provide better support for TimeZones? On Fri, Oct 31, 2014 at 3:25 PM, Maciek mac...@sonra.io wrote: Thought about that myself based on my prior (bad) experience when I tried working with timezones in Hive (the functionality pretty much doesn't exist). That shouldn't be the case here though, here's why: in Oracle a [timestamp with timezone] can be adjusted when sent/displayed on the client based on the client's settings. This may also be relevant if the timestamp in question falls into the client's daylight saving time period.
This behaviour would make sense to me, however: • these are server, not client, settings we're talking about here • the server and client do reside in the same timezone anyway, which is currently GMT [UTC] • while we observe daylight saving here [Dublin], the time in question (1970-01-01 00:00:00) is not in that period, and neither is the time I'm sending the query (now). Based on all the above, I don't see the reason the time gets shifted by one hour, but I realise the issue might be down to the general problems in Hive's implementation of timezones… On Fri, Oct 31, 2014 at 12:26 PM, Nitin Pawar nitinpawar...@gmail.com wrote: In Hive, from_unixtime returns the time in the timezone the machine belongs to. From the documentation: from_unixtime(bigint unixtime[, string format]) : Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of 1970-01-01 00:00:00. If possible, can you also check by changing the timezone to UTC on your machine? On Fri, Oct 31, 2014 at 12:00 PM, Maciek mac...@sonra.io wrote: Any reason why select from_unixtime(0) t0 FROM … gives 1970-01-01 01:00:00 ? By all available definitions (epoch, from_unixtime etc..) I would expect it to be 1970-01-01 00:00:00…?
Re: Hive 0.14 configuration
Currently only the ORC file format supports AcidOutputFormat, so you may want to create a table with the ORC file format and see if you are able to do ACID operations. On Tue, Nov 4, 2014 at 1:14 PM, mahesh kumar sankarmahes...@gmail.com wrote: Hi Nitin, How do I create a table with AcidOutputFormat? Can you send me examples? Thanks Mahesh On Tue, Nov 4, 2014 at 12:21 PM, Nitin Pawar nitinpawar...@gmail.com wrote: As the error says, your table's file format has to be an AcidOutputFormat or the table needs to be bucketed to perform update operations. You may want to create a new table from your existing table with an AcidOutputFormat, insert the data from the current table into that table, and then try the update on the new table. On Tue, Nov 4, 2014 at 12:11 PM, mahesh kumar sankarmahes...@gmail.com wrote: Hi, Has anyone tried the hive 0.14 configuration? I built it using maven from github. Insert is working fine but when I use update/delete I get an error. First I created a table and inserted rows: CREATE TABLE new(id int ,name string)ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; insert into table new values ('1','Mahesh'); update new set name='Raj' where id=1; FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table default.new that does not use an AcidOutputFormat or is not bucketed. When I update the table I get the above error. Can you help me, guys? Thanks Mahesh.S -- Nitin Pawar -- Nitin Pawar
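A sketch of an update-capable table along the lines Nitin describes, on Hive 0.14 with the ACID settings from the transactions wiki already in place (the table name is hypothetical):

  CREATE TABLE new_acid (id int, name string)
  CLUSTERED BY (id) INTO 2 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional'='true');
  INSERT INTO TABLE new_acid VALUES (1, 'Mahesh');
  UPDATE new_acid SET name = 'Raj' WHERE id = 1;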
Re: Unix script for identifying current active namenode in a HA cluster
+1. If you can optionally add the hadoop home directory in the script and use that in the path, it can be used out of the box. Also, can you share this on github? On Wed, Nov 5, 2014 at 10:02 AM, Devopam Mittra devo...@gmail.com wrote: hi All, Please find attached a simple shell script to dynamically determine the active namenode in the HA Cluster and subsequently run the Hive job / query via Talend OS generated workflows. It was tried successfully on a HDP2.1 cluster with 2 nn, 7 dn running on CentOS 6.5. Each ETL job invokes this script first in our framework to derive the NN FQDN and then runs the hive jobs subsequently to avoid failures. Takes a max. of 2 secs to execute (small cost in our case, as compared to dealing with a failure and then recalculating the NN to resubmit the job). Sharing it with you in case you can leverage the same without spending effort to code it. Do share your feedback/ fixes if you spot any. -- Devopam Mittra Life and Relations are not binary -- Nitin Pawar
Re: Hive 0.14 configuration
As the error says, your table's file format has to be an AcidOutputFormat or the table needs to be bucketed to perform update operations. You may want to create a new table from your existing table with an AcidOutputFormat, insert the data from the current table into that table, and then try the update on the new table. On Tue, Nov 4, 2014 at 12:11 PM, mahesh kumar sankarmahes...@gmail.com wrote: Hi, Has anyone tried the hive 0.14 configuration? I built it using maven from github. Insert is working fine but when I use update/delete I get an error. First I created a table and inserted rows: CREATE TABLE new(id int ,name string)ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; insert into table new values ('1','Mahesh'); update new set name='Raj' where id=1; FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table default.new that does not use an AcidOutputFormat or is not bucketed. When I update the table I get the above error. Can you help me, guys? Thanks Mahesh.S -- Nitin Pawar
Re: from_unixtime() and epoch definition
Do you have a copy paste error? I see both values as same On Fri, Oct 31, 2014 at 5:30 PM, Maciek mac...@sonra.io wrote: Any reason why select from_unixtime(0) t0 FROM … gives 1970-01-01 01:00:00 ? By all available definitions (epoch, from_unixtime etc..) I would expect it to be 1970-01-01 01:00:00…? -- Nitin Pawar
Re: from_unixtime() and epoch definition
In Hive, from_unixtime returns the time in the timezone the machine belongs to. From the documentation: from_unixtime(bigint unixtime[, string format]) : Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of 1970-01-01 00:00:00. If possible, can you also check by changing the timezone to UTC on your machine? On Fri, Oct 31, 2014 at 5:32 PM, Maciek mac...@sonra.io wrote: meant 1970-01-01 00:00:00 of course… On Fri, Oct 31, 2014 at 12:00 PM, Maciek mac...@sonra.io wrote: Any reason why select from_unixtime(0) t0 FROM … gives 1970-01-01 01:00:00 ? By all available definitions (epoch, from_unixtime etc..) I would expect it to be 1970-01-01 00:00:00…? -- Kind Regards Maciek Kocon -- Nitin Pawar
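A sketch of a timezone-independent conversion with the built-in pair available since Hive 0.8: interpret the server-local string in the server's own zone and shift it to UTC (Europe/Dublin is assumed here; substitute the zone your server actually runs in):

  SELECT from_unixtime(0);                                     -- server-local, e.g. 1970-01-01 01:00:00
  SELECT to_utc_timestamp(from_unixtime(0), 'Europe/Dublin');  -- 1970-01-01 00:00:00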
Re: select * from table and select column from table in hive
What's your table create DDL? Is the data in CSV-like format? On 21 Oct 2014 00:26, Raj Hadoop hadoop...@yahoo.com wrote: I am able to see the data in the table for all the columns when I issue the following - SELECT * FROM t1 WHERE dt1='2013-11-20' But I am unable to see the column data when I issue the following - SELECT cust_num FROM t1 WHERE dt1='2013-11-20' The above shows null values. How should I debug this?
Re: Querying A table which JDBC
Can you share the hiveserver2 heap size and your table size? On Tue, Sep 23, 2014 at 11:31 PM, Shiang Luong shiang.lu...@openx.com wrote: Ritesh, thanks for your response. Where do I download and place the jars? Do you mean on the hive server itself? I believe the files are already there since I can query the same table via the command line. It feels like the serde is not being sent along with the query, or I need to get the jar sent out to the distributed cache? I even tried running: myStatment.execute("add JAR /usr/lib/hive/extra_libs/test.jar"); That didn't work. I'm not sure, just shooting out thoughts. Thanks, Shiang On Mon, Sep 22, 2014 at 10:52 PM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: try downloading the jar files and put them in the libraries folder On Tue, Sep 23, 2014 at 10:58 AM, Shiang Luong shiang.lu...@openx.com wrote: Hi All, I'm new to hive. I'm having some problems querying a hive table with JDBC. It fails when it is trying to run a map reduce job. It can't seem to find the serde jar file. When I query it through the command line it works fine. Anyone have any hints on how I can get it working with JDBC? Thanks in advance. Shiang -- Shiang Luong Software Engineer in Test | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x | m: +1 (626) 512-2165 | shiang.lu...@openx.com OpenX ranked No. 7 in Forbes’ America’s Most Promising Companies -- Nitin Pawar
Re: Handling updates to Bucketed Table
When you bucket the data in a partition, a file is created for each bucket. Now if you add more data to the same bucket, that file would need to be rebuilt. I would prefer a day-level partition under the month level, where I write the data once a day and bucket the data there. I am not sure hive supports appending to bucketed files yet; please wait for others to answer as well. On Thu, Sep 18, 2014 at 9:27 PM, Kumar V kumarbuyonl...@yahoo.com wrote: Hi, I would like to know how to handle frequent updates to bucketed tables. Is there a way to update without a rebuild? I have a monthly partition for a table with buckets, but I have to update the table every day. Is there a way to achieve this without a rebuild of this partition every day? Or is this a wrong use case for a bucketed table? This table is joined with another table, so I thought bucketing would speed up the queries. What are my options? Please let me know. Regards, Murali. -- Nitin Pawar
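A sketch of the layout suggested above: daily partitions with buckets on the join key, so each day's data is written once and never appended to (table and column names are hypothetical):

  CREATE TABLE fact_daily (k bigint, v string)
  PARTITIONED BY (month string, day string)
  CLUSTERED BY (k) INTO 32 BUCKETS
  STORED AS ORC;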
Re: Correlated Subqueries Workaround in Hive!
Have you taken a look at the lag and lead functions? On Mon, Sep 15, 2014 at 4:46 PM, Viral Parikh viral.j.par...@gmail.com wrote: To Whomsoever It May Concern, I posted this question last week but still haven't heard from anyone; I'd appreciate any reply. I've got a table that contains a LocationId field. In some cases, where a record shares the same foreign key, the LocationId might come through as -1. What I want to do in my select query is, in the case of this happening, return the previous location. Example data:
Record  FK   StartTime         EndTime           Location
1       110  2011/01/01 12.30  2011/01/01 6.10   456
2       110  2011/01/01 3.40   2011/01/01 4.00   -1
3       110  2011/01/02 1.00   2011/01/02 8.00   891
4       110  2011/01/02 5.00   2011/01/02 6.00   -1
5       110  2011/01/02 6.10   2011/01/02 6.30   -1
The -1 should come out as 456 for record 2, and 891 for records 4 and 5. Can someone help me do this with Hive syntax? I can do it using SQL syntax (as below) but since Hive doesn't support correlated subqueries in select clauses I am unable to get it.
SELECT T1.record, T1.fk, T1.start_time, T1.end_time,
       CASE WHEN T1.location != -1 THEN T1.location
            ELSE (SELECT TOP (1) T2.location
                  FROM #temp1 AS T2
                  WHERE T2.record < T1.record
                    AND T2.fk = T1.fk
                    AND T2.location != -1
                  ORDER BY T2.Record DESC)
       END
FROM #temp1 AS T1
Thank you for your help in advance! -- Nitin Pawar
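A sketch in Hive windowing syntax (0.11+) of what the lag/last_value route looks like; the table name t is hypothetical, and -1 is first mapped to NULL so that last_value(..., true) can skip it while carrying the last real location forward:

  SELECT record, fk, start_time, end_time,
         last_value(IF(location = -1, NULL, location), true)
           OVER (PARTITION BY fk ORDER BY record
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS location
  FROM t;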
Re: Correlated Subqueries Workaround in Hive!
Another way I can think of doing this: 1) ignore all the -1s and create a tmp table 2) I see there are a couple of timestamps 3) Order the table by timestamp 4) from this tmp table create another tmp table which says FK MinStartTime MaxEndTime Location 5) Now join this tmp table from step 4 with your raw data and put a where clause with the min and max times. I hope this is not confusing. On Mon, Sep 15, 2014 at 6:25 PM, Viral Parikh viral.j.par...@gmail.com wrote: thanks! is there any other way than writing a python UDF etc.? any way I can leverage hive joins to get this working? On Mon, Sep 15, 2014 at 6:56 AM, Sreenath sreenaths1...@gmail.com wrote: How about writing a python UDF that takes input line by line, saves the previous line's location, and replaces the location with that if it turns out to be '-1'? On 15 September 2014 17:01, Nitin Pawar nitinpawar...@gmail.com wrote: Have you taken a look at the lag and lead functions? On Mon, Sep 15, 2014 at 4:46 PM, Viral Parikh viral.j.par...@gmail.com wrote: To Whomsoever It May Concern, I posted this question last week but still haven't heard from anyone; I'd appreciate any reply. I've got a table that contains a LocationId field. In some cases, where a record shares the same foreign key, the LocationId might come through as -1. What I want to do in my select query is, in the case of this happening, return the previous location. Example data:
Record  FK   StartTime         EndTime           Location
1       110  2011/01/01 12.30  2011/01/01 6.10   456
2       110  2011/01/01 3.40   2011/01/01 4.00   -1
3       110  2011/01/02 1.00   2011/01/02 8.00   891
4       110  2011/01/02 5.00   2011/01/02 6.00   -1
5       110  2011/01/02 6.10   2011/01/02 6.30   -1
The -1 should come out as 456 for record 2, and 891 for records 4 and 5. Can someone help me do this with Hive syntax? I can do it using SQL syntax (as below) but since Hive doesn't support correlated subqueries in select clauses I am unable to get it.
SELECT T1.record, T1.fk, T1.start_time, T1.end_time,
       CASE WHEN T1.location != -1 THEN T1.location
            ELSE (SELECT TOP (1) T2.location
                  FROM #temp1 AS T2
                  WHERE T2.record < T1.record
                    AND T2.fk = T1.fk
                    AND T2.location != -1
                  ORDER BY T2.Record DESC)
       END
FROM #temp1 AS T1
Thank you for your help in advance! -- Nitin Pawar -- Sreenath S Kamath Bangalore Ph No:+91-9590989106 -- Nitin Pawar
Re: Dynamic Partitioning- Partition_Naming
Thanks for correcting me, Anusha. Here are the links you gave me: https://cwiki.apache.org/confluence/display/Hive/HCatalog+Config+Properties https://issues.apache.org/jira/secure/attachment/12622686/HIVE-6109.pdf On Tue, Sep 9, 2014 at 5:16 PM, Nitin Pawar nitinpawar...@gmail.com wrote: You cannot modify the paths of partitions created by dynamic partitioning or rename them. That's the default implementation: having column=value in the path as the partition. On Tue, Sep 9, 2014 at 5:18 AM, anusha Mangina anusha.mang...@gmail.com wrote: I need a table partitioned by country and then city. I created a table and INSERTed data from another table using dynamic partitioning. CREATE TABLE invoice_details_hive_partitioned(Invoice_Id double,Invoice_Date string,Invoice_Amount double,Paid_Date string)PARTITIONED BY(pay_country STRING,pay_location STRING); Everything worked fine. Partitions by default are named like pay_country=INDIA and pay_city=DELHI etc in ../hive/warehouse/invoice_details_hive_partitioned/pay_country=INDIA/pay_city=DELHI Can I get the partition name as just the column value, INDIA and DELHI, not including the column name, like /hive/warehouse/invoice_details_hive_partitioned/INDIA/DELHI? Thanks in Advance -- Nitin Pawar -- Nitin Pawar
Re: Dynamic Partitioning- Partition_Naming
You cannot modify the paths of partitions created by dynamic partitioning or rename them. That's the default implementation: having column=value in the path as the partition. On Tue, Sep 9, 2014 at 5:18 AM, anusha Mangina anusha.mang...@gmail.com wrote: I need a table partitioned by country and then city. I created a table and INSERTed data from another table using dynamic partitioning. CREATE TABLE invoice_details_hive_partitioned(Invoice_Id double,Invoice_Date string,Invoice_Amount double,Paid_Date string)PARTITIONED BY(pay_country STRING,pay_location STRING); Everything worked fine. Partitions by default are named like pay_country=INDIA and pay_city=DELHI etc in ../hive/warehouse/invoice_details_hive_partitioned/pay_country=INDIA/pay_city=DELHI Can I get the partition name as just the column value, INDIA and DELHI, not including the column name, like /hive/warehouse/invoice_details_hive_partitioned/INDIA/DELHI? Thanks in Advance -- Nitin Pawar
Re: Hive columns
If those are text files, you can create the table with a single column and then process them line by line. On Thu, Sep 4, 2014 at 6:13 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, Is it possible to create an external table without specifying the columns? In fact, I am creating an external table that points to a directory that contains 3 text files, and each text file has a different number of columns. Thanks -- Nitin Pawar
Re: Hive columns
It means you will need to define at least one column in hive, or build your own file format which can handle reading the files and giving the data back to hive. When I say at least one column: by default hive uses \n as the record terminator, which means you can define an entire row as a single column and then process it the way you want. This is just a suggestion, and it would be really tedious to keep the mapping. Instead I would suggest using pig to create proper tables from these files and then using hive to do the deeper analytics. On Thu, Sep 4, 2014 at 6:35 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Can you please specify what this means? *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Thursday, September 04, 2014 4:00 PM *To:* user@hive.apache.org *Subject:* Re: Hive columns If those are text files, you can create the table with a single column and then process them line by line. On Thu, Sep 4, 2014 at 6:13 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, Is it possible to create an external table without specifying the columns? In fact, I am creating an external table that points to a directory that contains 3 text files, and each text file has a different number of columns. Thanks -- Nitin Pawar -- Nitin Pawar
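A minimal sketch of the single-column table described above; the location is hypothetical, and each input line lands whole in one string column because the default field delimiter (\001) never appears in the data:

  CREATE EXTERNAL TABLE raw_lines (line string)
  LOCATION '/data/mixed_text_files';
  -- downstream, split each line however that file's own layout requires, e.g.:
  SELECT split(line, ',')[0] FROM raw_lines LIMIT 5;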
Re: Mysql - Hive Sync
have you looked at sqoop? On Wed, Sep 3, 2014 at 10:15 AM, Muthu Pandi muthu1...@gmail.com wrote: Dear All, I am developing a prototype for syncing tables from MySQL to Hive using Python and JDBC. Is it a good idea to use JDBC for this purpose? My use case will be generating sales reports using hive, with the data pulled from MySQL by the prototype tool. My data will be around 2GB/day. *Regards Muthupandi.K* -- Nitin Pawar
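A sketch of the Sqoop route for this use case (the host, database, and table names are hypothetical; -P prompts for the MySQL password):

  sqoop import \
    --connect jdbc:mysql://mysql-host/sales \
    --username etl -P \
    --table orders \
    --hive-import --hive-table orders

Scheduled once a day (cron, Oozie), this lands each day's MySQL data in the Hive table without hand-written JDBC plumbing.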
Re: how to create custom user defined data type in Hive
From the teradata documentation: a PERIOD column in Teradata can be any date or timestamp type. I think both of these are supported in hive-0.13; if not, as Peyman suggested, strings are best friends when we are not sure. On Tue, Aug 26, 2014 at 6:56 AM, reena upadhyay reena2...@gmail.com wrote: Hi, As long as the data type is ANSI compliant, its equivalent type is available in Hive. But there are a few data types that are database specific. Like there is a PERIOD data type in teradata; it is specific to teradata only. So how to map such columns in Hive? Thanks. On Tue, Aug 26, 2014 at 6:44 AM, Peyman Mohajerian mohaj...@gmail.com wrote: As far as I know you cannot do that, and most likely you don't need it. Here are sample mappings between the two systems:
Teradata                      Hive
DECIMAL(x,y)                  double
DATE, TIMESTAMP               timestamp
INTEGER, SMALLINT, BYTEINT    int
VARCHAR, CHAR                 string
DECIMAL(x,0)                  bigint
I would typically stage data in hadoop as all strings and then move it to hive managed/orc with the above mapping. On Mon, Aug 25, 2014 at 8:42 PM, reena upadhyay reena2...@gmail.com wrote: Hi, Is there any way to create a custom user defined data type in Hive? I want to move some table data from a teradata database to Hive. But in the teradata database tables, there are a few column data types that are not supported in Hive. So to map the source table columns to my destination table columns in Hive, I want to create my own data type in Hive. I know about writing UDFs in Hive but have no idea about creating a user defined data type in Hive. Any idea and example on the same would be of great help. Thanks. -- Nitin Pawar
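Since a Teradata PERIOD is just a pair of date/timestamp bounds, one hedged mapping is to land it as two columns (the names below are hypothetical):

  CREATE TABLE customer_hist (
    id bigint,
    valid_from timestamp,  -- PERIOD begin bound
    valid_to   timestamp   -- PERIOD end bound
  );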
Re: List of dates as arguments
With your shell script, calculate your start date and end date, then: hive $HIVEPARAMS -hiveconf startdate=$var1 -hiveconf enddate=$var2 Also set in .hiverc: set hive.variable.substitute=true; On Sun, Aug 24, 2014 at 10:19 AM, karthik Srivasthava karthiksrivasth...@gmail.com wrote: As my raw-data table is partitioned by date, I want to run a query every day to find the top 10 products in the last 15 days. How do I pass a list of dates dynamically as arguments in a hive query using hiveconf? -- Nitin Pawar
Re: List of dates as arguments
I am not sure if you can pass an array from the shell to Java; you may want to write your own custom UDF for that. If these are continuous dates, then you can use less-than/greater-than comparisons. On Sun, Aug 24, 2014 at 12:39 PM, karthik Srivasthava karthiksrivasth...@gmail.com wrote: Nitin, Teja, thank you. I need exactly what Teja suggested: the list of dates between the start date and the end date. On Sun, Aug 24, 2014 at 2:05 AM, Teja Kunapareddy tejakunapare...@gmail.com wrote: Thanks Nithin for your reply. I can get the start date and end date, but can I get all the dates within the START DATE AND END DATE? So that my query looks something like this: Select a, b, c from table_x where date in (${hiveconf:LIST_OF_DATES}) On 24 August 2014 01:18, Nitin Pawar nitinpawar...@gmail.com wrote: With your shell script, calculate your start date and end date, then: hive $HIVEPARAMS -hiveconf startdate=$var1 -hiveconf enddate=$var2 Also set in .hiverc: set hive.variable.substitute=true; On Sun, Aug 24, 2014 at 10:19 AM, karthik Srivasthava karthiksrivasth...@gmail.com wrote: As my raw-data table is partitioned by date, I want to run a query every day to find the top 10 products in the last 15 days. How do I pass a list of dates dynamically as arguments in a hive query using hiveconf? -- Nitin Pawar -- Nitin Pawar
Re: List of dates as arguments
Bala, I think they need an array substitution instead of a string as the hiveconf variable substitution. On Sun, Aug 24, 2014 at 11:55 PM, Bala Krishna Gangisetty b...@altiscale.com wrote: Here is my understanding of your requirements. Let me know if I am missing something. You, a) would like to run a query daily to find the top 10 products in the past 15 days b) would like to pass dates dynamically as arguments to the HIVE query Given requirement a), passing just two variables (startdate and enddate) to the HIVE query will suffice to achieve requirement b). Assuming the startdate and enddate variables are passed to the HIVE query, the query will look like below: SELECT * FROM table_name WHERE date_column BETWEEN ${hiveconf:startdate} AND ${hiveconf:enddate} Note, values for startdate and enddate must be enclosed in ' '. Hope this helps. --Bala G. On Sun, Aug 24, 2014 at 12:57 AM, Nitin Pawar nitinpawar...@gmail.com wrote: I am not sure if you can pass an array from the shell to Java; you may want to write your own custom UDF for that. If these are continuous dates, then you can use less-than/greater-than comparisons. On Sun, Aug 24, 2014 at 12:39 PM, karthik Srivasthava karthiksrivasth...@gmail.com wrote: Nitin, Teja, thank you. I need exactly what Teja suggested: the list of dates between the start date and the end date. On Sun, Aug 24, 2014 at 2:05 AM, Teja Kunapareddy tejakunapare...@gmail.com wrote: Thanks Nithin for your reply. I can get the start date and end date, but can I get all the dates within the START DATE AND END DATE? So that my query looks something like this: Select a, b, c from table_x where date in (${hiveconf:LIST_OF_DATES}) On 24 August 2014 01:18, Nitin Pawar nitinpawar...@gmail.com wrote: With your shell script, calculate your start date and end date, then: hive $HIVEPARAMS -hiveconf startdate=$var1 -hiveconf enddate=$var2 Also set in .hiverc: set hive.variable.substitute=true; On Sun, Aug 24, 2014 at 10:19 AM, karthik Srivasthava karthiksrivasth...@gmail.com wrote: As my raw-data table is partitioned by date, I want to run a query every day to find the top 10 products in the last 15 days. How do I pass a list of dates dynamically as arguments in a hive query using hiveconf? -- Nitin Pawar -- Nitin Pawar -- Nitin Pawar
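Putting the thread together as a runnable sketch (GNU date assumed; the table raw_data and its date partition column dt are hypothetical):

  START=$(date -d '15 days ago' +%Y-%m-%d)
  END=$(date +%Y-%m-%d)
  hive -hiveconf startdate="$START" -hiveconf enddate="$END" -e '
    SELECT product, count(*) AS c
    FROM raw_data
    WHERE dt BETWEEN "${hiveconf:startdate}" AND "${hiveconf:enddate}"
    GROUP BY product ORDER BY c DESC LIMIT 10;'

The single quotes around the -e string keep the shell from expanding ${hiveconf:...} before Hive sees it.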
Re: Passing variables using Hiveconf
This is one way: hive $HIVEPARAMS -hiveconf target=$var1 -hiveconf mapred.child.java.opts="-server -Xmx1200m -Djava.net.preferIPv4Stack=true" And you need to set this variable: set hive.variable.substitute=true; On Fri, Aug 22, 2014 at 9:24 PM, karthik Srivasthava karthiksrivasth...@gmail.com wrote: Hi, I am passing substitution variables using hiveconf in Hive, but I couldn't execute simple queries when trying to pass more than one parameter. It throws NoViableAltException - AtomExpression. Am I missing something? -- Nitin Pawar
Re: Load CSV files with embedded map and arrays to Hive
Hey, sorry, got stuck with work. I will take a look today. On Wed, Aug 20, 2014 at 5:43 PM, Sushant Prusty sushan...@gmx.com wrote: Hi Nitin, Hope you have received the dataset. If you have any further requirements, please feel free to contact me. Will appreciate your help. Regards, Sushant On Tuesday 19 August 2014 02:33 PM, Nitin Pawar wrote: can you give an example of your dataset? On Tue, Aug 19, 2014 at 2:31 PM, Sushant Prusty sushan...@gmx.com wrote: Please let me know how I can load a CSV file with embedded map and array data into Hive. Regards, Sushant -- Nitin Pawar -- Warm regards, Sushant Prusty -- Nitin Pawar
Re: Load CSV files with embedded map and arrays to Hive
Can you give an example of your dataset? On Tue, Aug 19, 2014 at 2:31 PM, Sushant Prusty sushan...@gmx.com wrote: Please let me know how I can load a CSV file with embedded map and array data into Hive. Regards, Sushant -- Nitin Pawar
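For a plain delimited text layout (true CSV with quoted fields needs a SerDe instead), a sketch of a table whose lines carry arrays and maps via secondary delimiters; the field/collection/key separators below are hypothetical choices:

  CREATE TABLE events (
    id    int,
    tags  array<string>,
    props map<string,string>
  )
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    COLLECTION ITEMS TERMINATED BY '|'
    MAP KEYS TERMINATED BY ':';

A matching input line would be: 1,red|blue,size:10|src:web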
Re: Cache tables in hive
Are you talking about the tables in a map join being loaded into the distributed cache? On Wed, Aug 13, 2014 at 6:01 PM, harish tangella harish.tange...@gmail.com wrote: Hi all, Requesting your help: what are cache tables in hive? Regards Harish -- Nitin Pawar
Re: Distributed data
What do you mean by the data is distributed on many computers? Are you saying the data is on an HDFS-like filesystem? On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, Using Hive, we know that we should specify the file path to read data from a specific location. If the data is distributed on many computers, how can we read it? Thanks -- Nitin Pawar
Re: Distributed data
If your Hadoop is set up with the same filesystem as HDFS, Hive will take care of it. If your HDFS is totally different from where the file resides, then you need to get the file from that filesystem and push it into Hive using LOAD. If that filesystem supports import/export with tools like Sqoop, then you can use those as well. On Tue, Aug 12, 2014 at 5:58 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Yes, I mean the data is on an HDFS-like filesystem *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Tuesday, August 12, 2014 3:26 PM *To:* user@hive.apache.org *Subject:* Re: Distributed data What do you mean by the data is distributed on many computers? Are you saying the data is on an HDFS-like filesystem? On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, Using Hive, we know that we should specify the file path to read data from a specific location. If the data is distributed on many computers, how can we read it? Thanks -- Nitin Pawar
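A minimal sketch of the two load paths described above; the paths and table name are illustrative:

-- file already on the cluster's HDFS: LOAD just moves it into the table's directory
LOAD DATA INPATH 'hdfs:///staging/sales.csv' INTO TABLE sales;
-- file on the local disk of the client machine: Hive copies it up first
LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales;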
Re: Hive: Centralized HDFS Caching
Please take a look at Hive with Tez as the execution engine on Hadoop 2.3; it may help you compare it with what you want to achieve. On Fri, Aug 1, 2014 at 4:13 PM, Uli Bethke uli.bet...@sonra.io wrote: Hi, in Hive can I make use of the centralized cache management introduced in Hadoop 2.3 ( http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html )? If it is not implemented yet, is this on the roadmap? My use case is that I want to pin a fact table that needs to be queried frequently into memory. Impala already supports this as per the Cloudera documentation http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_perf_hdfs_caching.html Thanks uli -- Nitin Pawar
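For reference, the HDFS side of centralized caching is driven by the cacheadmin tool; the pool name, path and replication below are illustrative, and whether a given Hive execution engine actually benefits from the cached replicas is version-dependent:

hdfs cacheadmin -addPool hot_tables -owner hive
hdfs cacheadmin -addDirective -path /user/hive/warehouse/fact_table -pool hot_tables -replication 2
hdfs cacheadmin -listDirectives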
Re: How can I know one table is a partitioned table in hive?
What are the options you have? Can you write Java code that interacts with HCatalog? Or you can do a describe on the table and check for partition column details in there. On Thu, Jul 31, 2014 at 1:11 PM, 张甲超 rebeyond1...@gmail.com wrote: Dear all, I want to know whether a table is a partitioned table in hive, and return the result to the shell. How can I do that? -- Nitin Pawar
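A small sketch of the describe-and-check route from the shell, assuming the Hive CLI is on the PATH (the table name is illustrative); partitioned tables carry a '# Partition Information' section in their DESCRIBE FORMATTED output, unpartitioned ones do not:

hive -e 'DESCRIBE FORMATTED my_table;' 2>/dev/null \
  | grep -q '# Partition Information' && echo partitioned || echo not_partitioned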
Re: Input
If you specified ; as your delimiter, then "abc" (quotation marks included) will be the complete string, not abc alone. Take a look at a CSV file format/SerDe if you want proper delimiter-and-quote handling. On Thu, Jul 31, 2014 at 3:44 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, I am using Hive and trying to read from a txt file. I have an input like the following: “string”;”string”;”integer”. First, I specified that the row fields are delimited by a semicolon. Is it possible to read the integer without the quotations? Thank you -- Nitin Pawar
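A hedged sketch of the SerDe route for quoted, semicolon-separated input; OpenCSVSerde ships with Hive 0.14+ (older installs need an external csv-serde jar), and it surfaces every column as STRING, so the integer has to be cast on read:

CREATE TABLE quoted_input (col1 STRING, col2 STRING, col3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ';', 'quoteChar' = '"')
STORED AS TEXTFILE;

SELECT CAST(col3 AS INT) FROM quoted_input;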
Re: SELECT specific data
You mean just by writing a query? Then I think no. But if you want to read only the first 3 columns of the data, then it would work with just a single table and a plain load of the data into it. On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, I am interested in selecting specific data from a source and loading it into a table. For example, if I have 5 columns in my dataset, I want to load 3 columns of it. Is it possible to do it without creating a second table? Thank you -- Nitin Pawar
Re: SELECT specific data
With Hive, without creating a table with the full data, you can do intermediate processing like selecting only a few columns and writing them into another table. If this is a one-time thing, then you can take a look at the awk or cut commands in Linux and generate those files directly. On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: I am only using Hive and Hadoop, nothing more. *From:* Devopam Mittra [mailto:devo...@gmail.com] *Sent:* Wednesday, July 30, 2014 12:15 PM *To:* user@hive.apache.org *Subject:* Re: SELECT specific data Are you using any tool to load data? If yes, then the ETL tool will provide you such options. If not, then please explore the unix file processing/external table route. On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, Thank you for your reply. Consider we have data divided into 5 columns (col1, col2, col3, col4, col5). So I can’t load directly col1, col3 and col5? If I can’t do it directly, can you provide me with an alternate solution? Thank you. *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Wednesday, July 30, 2014 11:37 AM *To:* user@hive.apache.org *Subject:* Re: SELECT specific data You mean just by writing a query? Then I think no. But if you want to read only the first 3 columns of the data, then it would work with just a single table and a plain load of the data into it. On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, I am interested in selecting specific data from a source and loading it into a table. For example, if I have 5 columns in my dataset, I want to load 3 columns of it. Is it possible to do it without creating a second table? Thank you -- Nitin Pawar -- Devopam Mittra Life and Relations are not binary -- Nitin Pawar
Re: SELECT specific data
Sorry, hit send too soon. I meant: without creating intermediate tables, in Hive you can process the file directly. On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar nitinpawar...@gmail.com wrote: With Hive, without creating a table with the full data, you can do intermediate processing like selecting only a few columns and writing them into another table. If this is a one-time thing, then you can take a look at the awk or cut commands in Linux and generate those files directly. On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: I am only using Hive and Hadoop, nothing more. *From:* Devopam Mittra [mailto:devo...@gmail.com] *Sent:* Wednesday, July 30, 2014 12:15 PM *To:* user@hive.apache.org *Subject:* Re: SELECT specific data Are you using any tool to load data? If yes, then the ETL tool will provide you such options. If not, then please explore the unix file processing/external table route. On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, Thank you for your reply. Consider we have data divided into 5 columns (col1, col2, col3, col4, col5). So I can’t load directly col1, col3 and col5? If I can’t do it directly, can you provide me with an alternate solution? Thank you. *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Wednesday, July 30, 2014 11:37 AM *To:* user@hive.apache.org *Subject:* Re: SELECT specific data You mean just by writing a query? Then I think no. But if you want to read only the first 3 columns of the data, then it would work with just a single table and a plain load of the data into it. On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, I am interested in selecting specific data from a source and loading it into a table. For example, if I have 5 columns in my dataset, I want to load 3 columns of it. Is it possible to do it without creating a second table? Thank you -- Nitin Pawar -- Devopam Mittra Life and Relations are not binary -- Nitin Pawar -- Nitin Pawar
Re: SELECT specific data
Please check the other mail I sent right after that; my bad, I hit the send button too soon without reading the mail. I will rephrase: in Hive, to process the data you need the table created and the data loaded into the table. You cannot process a file without loading it into a table. If you want to do that and do not want to create a temporary table in Hive with the full columns from the file, then the options available to you are: 1) simple unix tools like awk, sed or cut 2) write a Pig script 3) write your own MapReduce code On Wed, Jul 30, 2014 at 3:09 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: “With hive, without creating a table with full data, you can do intermediate processing like select only few columns and write into another table”. How can I do this process? Thank you a lot! *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Wednesday, July 30, 2014 12:37 PM *To:* user@hive.apache.org *Subject:* Re: SELECT specific data Sorry, hit send too soon. I meant: without creating intermediate tables, in Hive you can process the file directly. On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar nitinpawar...@gmail.com wrote: With Hive, without creating a table with the full data, you can do intermediate processing like selecting only a few columns and writing them into another table. If this is a one-time thing, then you can take a look at the awk or cut commands in Linux and generate those files directly. On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: I am only using Hive and Hadoop, nothing more. *From:* Devopam Mittra [mailto:devo...@gmail.com] *Sent:* Wednesday, July 30, 2014 12:15 PM *To:* user@hive.apache.org *Subject:* Re: SELECT specific data Are you using any tool to load data? If yes, then the ETL tool will provide you such options. If not, then please explore the unix file processing/external table route. On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, Thank you for your reply. Consider we have data divided into 5 columns (col1, col2, col3, col4, col5). So I can’t load directly col1, col3 and col5? If I can’t do it directly, can you provide me with an alternate solution? Thank you. *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Wednesday, July 30, 2014 11:37 AM *To:* user@hive.apache.org *Subject:* Re: SELECT specific data You mean just by writing a query? Then I think no. But if you want to read only the first 3 columns of the data, then it would work with just a single table and a plain load of the data into it. On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, I am interested in selecting specific data from a source and loading it into a table. For example, if I have 5 columns in my dataset, I want to load 3 columns of it. Is it possible to do it without creating a second table? Thank you -- Nitin Pawar -- Devopam Mittra Life and Relations are not binary -- Nitin Pawar -- Nitin Pawar
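Putting the advice in this thread together, a minimal sketch (paths and names illustrative): expose the raw file through an external table, then project only the wanted columns into the real table; dropping the external table afterwards removes just metadata, not the file:

CREATE EXTERNAL TABLE raw_5col (col1 STRING, col2 STRING, col3 STRING, col4 STRING, col5 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/incoming/';

CREATE TABLE picked AS SELECT col1, col3, col5 FROM raw_5col;

The pure-unix alternative mentioned above is the one-liner: cut -d',' -f1,3,5 input.csv > picked.csv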
Re: Hive Data
Hive reads files using the input format defined by the table schema. By default it reads TextFile, in which columns are separated by the CTRL+A key. If you have a CSV file, then you can use a CSV SerDe; there are lots of such file formats. What does your file look like? On Wed, Jul 30, 2014 at 5:54 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Hello, I am interested in testing Hive with a huge sample data set. Does Hive read all data types? Should the file be a table? Thank you -- Nitin Pawar
Re: Exception in Hive with SMB join and Parquet
:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:96) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:471) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561) ... 18 more Looks like it is trying to access the column with index 29, whereas there are only 5 non-null columns present in the row, which matches the ArrayList size. What could be going wrong here? Thanks Suma -- Nitin Pawar
Re: UDTF
Do you want to know how Hive initializes a UDTF, or how to build a UDTF? On Tue, Jul 29, 2014 at 1:30 AM, Doug Christie doug.chris...@sas.com wrote: Can anyone point me to the source code in hive where the calls to initialize, process and forward in a UDTF are made? Thanks. Doug -- Nitin Pawar
Re: Drop Partition by ID
You can try with a LIKE statement. On 21 Jul 2014 19:32, fab wol darkwoll...@gmail.com wrote: Hi everyone, I have the following problem: I have a partitioned managed table (the partition column is a string which represents a date, e.g. log_date=2014-07-15). Unfortunately there is one partition in there like this: log_date=2014-07-15-23%3A45%3A38 (copied from a show partitions statement). This partition most likely got created by a wrong script (which is fixed now). Now I want to delete this partition, but it doesn't work: - alter table ... drop partition (log_date='2014-07-15-23%3A45%3A38') gives no error, but the partition still exists afterwards - I tried escaping the %-signs with backslashes, but no luck with that - I deleted the directory in HDFS and ran msck repair table afterwards. It recognizes that the folder is missing but does not delete the metadata So what can I do to get rid of the metadata? My next guess would be to go directly to the metastore DB and delete the metadata there. But what exactly has to be deleted? I guess there are several dependencies. Other idea: is there a possibility in Hive to delete a partition by a unique ID or something like that? Or what is needed to delete the partition with the normal alter table drop partition command? Cheers Wolli
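A hedged sketch of two escape routes; comparator syntax in DROP PARTITION has existed since HIVE-2908, though whether two bounds on the same column parse depends on the version, so verify the stored value with SHOW PARTITIONS and test on a throwaway table first. Failing that, the metastore rows behind a partition live in the PARTITIONS, PARTITION_KEY_VALS and PARTITION_PARAMS tables (plus the SDS row PARTITIONS references), which is the answer to the "what exactly has to be deleted" question; back up the metastore DB before touching it:

ALTER TABLE my_table DROP PARTITION (log_date > '2014-07-15', log_date < '2014-07-16');

-- metastore route (MySQL), with <id> taken from the first query:
SELECT PART_ID, PART_NAME FROM PARTITIONS WHERE PART_NAME LIKE 'log_date=2014-07-15-23%';
DELETE FROM PARTITION_PARAMS   WHERE PART_ID = <id>;
DELETE FROM PARTITION_KEY_VALS WHERE PART_ID = <id>;
DELETE FROM PARTITIONS         WHERE PART_ID = <id>;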
Re: difference between partition by and distribute by in rank()
In general principle, distribute by ensures each of N reducers gets non-overlapping ranges of X, but doesn't sort the output of each reducer. You end up with N or more unsorted files with non-overlapping ranges. So this is more of a horizontal distribution of the data. In my view, partition by is more based on values, so it is a vertical distribution of the data. I may be wrong in understanding this. On Fri, Jul 11, 2014 at 1:38 PM, Eric Chu e...@rocketfuel.com wrote: Does anyone know what *rank() over(distribute by p_mfgr sort by p_name)* does exactly and how it's different from *rank() over(partition by p_mfgr order by p_name)*? Thanks, Eric -- Nitin Pawar
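A small worked contrast; note that inside an OVER clause many Hive versions treat DISTRIBUTE BY/SORT BY as synonyms for PARTITION BY/ORDER BY, so the question's two rank() forms should return the same ranks, and the horizontal-versus-vertical distinction sketched above really applies to DISTRIBUTE BY/SORT BY used at the query level, outside any window (the part table follows the question):

-- windowing form: rank restarts at 1 for each p_mfgr
SELECT p_mfgr, p_name, rank() OVER (PARTITION BY p_mfgr ORDER BY p_name) AS r FROM part;

-- query-level form: routes rows to reducers by p_mfgr and sorts within each reducer,
-- without defining window partitions at all
SELECT p_mfgr, p_name FROM part DISTRIBUTE BY p_mfgr SORT BY p_name;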
Re: Error while renaming Partitioned column name
What's your table DDL? On Wed, Jul 9, 2014 at 11:03 PM, Manish Kothari manish.koth...@vonage.com wrote: Thanks Dipesh. Here is what I tried: ALTER TABLE siplogs_partitioned PARTITION (pcol1='str_hour',pcol2='str_date') RENAME TO PARTITION (pcol1='call_hour',pcol2='call_date'); When I run the above command I am getting the error below: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. str_date not found in table's partition spec: {pcol1=str_hour, pcol2=str_date} Am I missing something here? Thanks, Manish *From:* D K [mailto:deepe...@gmail.com] *Sent:* Wednesday, July 09, 2014 12:38 PM *To:* user@hive.apache.org *Subject:* Re: Error while renaming Partitioned column name Here is an example: ALTER TABLE alter_rename_partition PARTITION (pCol1='old_part1', pcol2='old_part2') RENAME TO PARTITION (pCol1='new_part1', pcol2='new_part2'); On Wed, Jul 9, 2014 at 9:20 AM, Manish Kothari manish.koth...@vonage.com wrote: Hi, I have a table named siplogs_partitioned which is partitioned by the columns str_date (DATE) and str_hour (INT). I want to rename the partitioned columns to call_date and call_hour. I am using the command below to alter the partitioned column name: ALTER TABLE siplogs_partitioned PARTITION str_date RENAME TO PARTITION call_date; When I run the above command I am getting an error: FAILED: ParseException line 1:12 cannot recognize input near 'siplogs_partitioned' 'PARTITION' 'str_date' in alter table partition statement Is the “ALTER TABLE” usage correct for renaming the partitioned column names? Any pointer or help is appreciated. Thanks, Manish -- Nitin Pawar
Re: Hive metastore error
Is your hive metastore service running? On Thu, Jun 26, 2014 at 2:11 PM, Rishabh Bhardwaj rbnex...@yahoo.com wrote: Hi all, I have changed my hive metastore to mysql using the steps described here http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html Now when I run any hive command on the cli, like show databases or show tables, it gives me the following error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient This is related to the hive metastore only. Can anyone please help me out with this? Thanks, Rishabh -- Nitin Pawar
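A short diagnostic sketch for this class of failure, assuming the MySQL-backed setup from the linked guide; the host and database names below are placeholders for whatever hive-site.xml really contains:

# is a metastore service listening on the default port?
netstat -ltn | grep 9083
# if not, start one in the foreground to see errors directly
hive --service metastore

The usual hive-site.xml suspects are javax.jdo.option.ConnectionURL (e.g. jdbc:mysql://dbhost/metastore) and javax.jdo.option.ConnectionDriverName (com.mysql.jdbc.Driver); a missing mysql-connector-java jar in $HIVE_HOME/lib is a common cause of the "Unable to instantiate HiveMetaStoreClient" failure.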
edit permissions to wiki
Hi, can someone add me to the hive wiki editors? My user id is: nitinpawar432 -- Nitin Pawar
Re: how to load json with nested array into hive?
I think you can just take a look at the JSON SerDe; it does take care of nested JSON documents (though you will need to know the entire JSON structure upfront). Here is an example of using it: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/ On Mon, Jun 23, 2014 at 2:28 PM, Christian Link christian.l...@mdmp.com wrote: Hi Jerome, thanks...I've already found Brickhouse and the Hive UDFs, but it didn't help. Today I'll try again to process the json file after going through all my mails...maybe I'll find a solution. Best, Chris On Fri, Jun 20, 2014 at 7:16 PM, Jerome Banks jba...@tagged.com wrote: Christian, Sorry to spam this newsgroup, and this is not a commercial endorsement, but check out the Hive UDFs in the Brickhouse project ( http://github.com/klout/brickhouse ) ( http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/ ). You can convert arbitrarily complex Hive structures to and from JSON with its to_json and from_json UDFs. See the blog posting for an explanation. -- jerome On Fri, Jun 20, 2014 at 8:26 AM, Christian Link christian.l...@mdmp.com wrote: hi, I'm very, very new to Hadoop, Hive, etc. and I have to import data into hive tables. Environment: Amazon EMR, S3, etc. The input file is on S3 and I copied it into my HDFS. 1. flat table with one column and loaded data into it: CREATE TABLE mdmp_raw_data (json_record STRING); LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO TABLE `mdmp_raw_data`; That worked, and I can access some data, like this: SELECT d.carrier, d.language, d.country FROM mdmp_raw_data a LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'context') b AS requestTimestamp, context LATERAL VIEW json_tuple(b.context, 'locale') c AS locale LATERAL VIEW json_tuple(c.locale, 'carrier', 'language', 'country') d AS carrier, language, country LIMIT 1; Result: o2 - de Deutsch Deutschland I can also select the array at once: SELECT b.requestTimestamp, b.batch FROM mdmp_raw_data a LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'batch') b AS requestTimestamp, batch LIMIT 1; This will give me: [{timestamp:2014-06-19T14:25:18+02:00,requestId:2ca08247-5542-4cb4-be7e-4a8574fb77a8,sessionId:f29ec175ca6b7d10,event:TEST Doge Comments,userId:doge96514016ruffruff,action:track,context:{library:analytics-android,libraryVersion:0.6.13},properties:{comment:Much joy.}}, ...] This batch may contain n events with a structure like the above. I want to put all events in a table where each element is stored in a unique column: timestamp, requestId, sessionId, event, userId, action, context, properties 2. explode the batch I read a lot about SerDes, etc. - but I don't get it. I tried to create a table with an array and load the data into it - several errors. I used explode in a query but it doesn't accept batch as an array. I integrated several SerDes but get things like unknown function jspilt. I'm lost in too many documents, howtos, etc. and could use some advice... Thank you in advance! Best, Chris -- Nitin Pawar
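A hedged DDL sketch for the batch array in this thread, using the HCatalog JsonSerDe (org.apache.hive.hcatalog.data.JsonSerDe, available once the hive-hcatalog-core jar is added); the struct fields mirror the sample events and would have to match the real schema exactly:

ADD JAR /path/to/hive-hcatalog-core.jar;
CREATE EXTERNAL TABLE mdmp_events (
  requestTimestamp STRING,
  batch ARRAY<STRUCT<`timestamp`:STRING, requestId:STRING, sessionId:STRING,
                     event:STRING, userId:STRING, action:STRING>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/input-api/';

-- one output row per event in the array
SELECT e.event, e.userId
FROM mdmp_events m LATERAL VIEW explode(m.batch) t AS e;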
Re: hive variables
Perfect. On Sun, Jun 22, 2014 at 11:48 AM, Lefty Leverenz leftylever...@gmail.com wrote: Thanks Nitin, I've added that information to the wiki on the Variable Substitution page https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution#LanguageManualVariableSubstitution-SubstitutionDuringQueryConstruction . Please check my wording and let me know if revisions are needed. -- Lefty On Fri, Jun 20, 2014 at 5:17 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Hive variables are not replaced in the mapreduce jobs but when the query is constructed with the variable. If you are running two different hive sessions, the variables will not be mixed. If you are setting variables with the same name in the same hive session, then the last set value will be picked up. On Fri, Jun 20, 2014 at 2:44 PM, Bogala, Chandra Reddy chandra.bog...@gs.com wrote: How do hive variables work if I have multiple Hive jobs running simultaneously? Will they end up picking up values from each other? In automation I am constructing an HQL file by prepending it with some SET statements. I want to make sure that if I submit two jobs at the same time that use the same variable names, one job won't pick up values from the other job. Same question on stackoverflow: http://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts Thanks, Chandra -- Nitin Pawar -- Nitin Pawar
Re: hive variables
Hive variables are not replaced in the mapreduce jobs but when the query is constructed with the variable. If you are running two different hive sessions, the variables will not be mixed. If you are setting variables with the same name in the same hive session, then the last set value will be picked up. On Fri, Jun 20, 2014 at 2:44 PM, Bogala, Chandra Reddy chandra.bog...@gs.com wrote: How do hive variables work if I have multiple Hive jobs running simultaneously? Will they end up picking up values from each other? In automation I am constructing an HQL file by prepending it with some SET statements. I want to make sure that if I submit two jobs at the same time that use the same variable names, one job won't pick up values from the other job. Same question on stackoverflow: http://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts Thanks, Chandra -- Nitin Pawar
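A tiny sketch of the session scoping described above: each CLI process keeps its own variable namespace, so two concurrent jobs using the same variable name never interfere (names illustrative):

-- session 1
SET hivevar:day=2014-06-20;
SELECT count(*) FROM logs WHERE d = '${hivevar:day}';
-- session 2, a separate hive process: same variable name, different value, no crosstalk
SET hivevar:day=2014-06-19;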
Re: mismatched input 'EOF' expecting FROM near 'CURRENT_TIME' in from clause
Please take a look at Hive's query language support; it is SQL-like but not fully SQL-compliant. On Thu, Jun 19, 2014 at 7:19 PM, Clay McDonald stuart.mcdon...@bateswhite.com wrote: Why does this not work? hive SELECT CURRENT_TIME; MismatchedTokenException(-1!=107) at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617) at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1194) at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:31423) at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:29520) at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:29428) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:28968) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:28762) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1238) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1000) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:19 mismatched input 'EOF' expecting FROM near 'CURRENT_TIME' in from clause -- Nitin Pawar
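Hedged working equivalents, since CURRENT_TIME is not a HiveQL function: SELECT without a FROM clause needs Hive 0.13+, and the standard current_timestamp arrived only in later releases, so on older versions append FROM some_table LIMIT 1:

SELECT unix_timestamp();                 -- seconds since the epoch, as bigint
SELECT from_unixtime(unix_timestamp());  -- 'yyyy-MM-dd HH:mm:ss' string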
Re: simple insert query question
Remember that in Hive an insert operation is either 1) from a file or 2) from another table. Hive's underlying storage is HDFS, which is not meant for single-record operations (as of now; this will change once Hive starts supporting ACID actions in coming releases). So: 1) either create a sample file and load the data into the table using the file, or 2) create a dummy table and then write an "insert into table ... select ... from table2" kind of query. On Thu, Jun 19, 2014 at 7:26 PM, Clay McDonald stuart.mcdon...@bateswhite.com wrote: What about if I wanted to run this in hive: create table test_log (test_time timestamp, test_notes varchar(60)); insert into table test_log values(now(),'THIS IS A TEST'); *From:* Nishant Kelkar [mailto:nishant@gmail.com] *Sent:* Thursday, June 19, 2014 9:29 AM *To:* user@hive.apache.org; Clay McDonald *Subject:* Re: simple insert query question Hey Stuart, As far as I know, files in HDFS are immutable. So I would think that your query below would not have a direct Hive conversion. What you can do, though, is create a local text file and then create an EXTERNAL TABLE on top of that. Then, instead of your INSERT query, just use some linux command to append a line to the text file. It will automatically be reflected in your external Hive table! :) To understand what Hive external tables are and how to create them, I'd just go to the Hive wiki page. Good luck! Best, Nishant On Jun 19, 2014 6:17 AM, Clay McDonald stuart.mcdon...@bateswhite.com wrote: hi all, how do I write the following query to insert a note with the current system timestamp? I tried the following: INSERT INTO TEST_LOG VALUES (unix_timestamp(),'THIS IS A TEST.'); thanks, Clay -- Nitin Pawar
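A minimal sketch of option 2 above, faking a single-row insert through a dummy select; any existing non-empty table works as the row source, and depending on the version a cast to timestamp may be needed (names illustrative):

INSERT INTO TABLE test_log
SELECT from_unixtime(unix_timestamp()), 'THIS IS A TEST'
FROM some_existing_table LIMIT 1;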
Re: Storing and reading XML files in HIVE
See if this can help you: https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources On Fri, Jun 6, 2014 at 3:25 PM, Knowledge gatherer knowledge.gatherer@gmail.com wrote: You need to have a custom SerDe in Hive to read the XML files On Fri, Jun 6, 2014 at 2:58 PM, Yu Azuryy azuryy@gmail.com wrote: AFAIK, Hive doesn't provide an XMLInputFormat, so you have to write it yourself. On Fri, Jun 6, 2014 at 5:23 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Dear All, Request your help to guide how to store and read XML data in HIVE. While querying, it should look as if we are having a txt format file under HIVE (it is fine if we use a view to parse the XML and show it). Have gone through some sites but not able to figure it out correctly.. a few mention that we need to use some JARs to achieve it... Thanks in advance, Rams -- Nitin Pawar
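A hedged DDL sketch along the lines of that SerDe's wiki; the class names and properties follow the dvasilen/Hive-XML-SerDe documentation, while the jar path and the record layout are illustrative:

ADD JAR /path/to/hivexmlserde.jar;
CREATE TABLE xml_records (id STRING, name STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  'column.xpath.id' = '/record/@id',
  'column.xpath.name' = '/record/name/text()'
)
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ('xmlinput.start'='<record', 'xmlinput.end'='</record>');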
Re: Python version compatibility for hive 0.13
Do you mean the Python hiveserver client library? I would recommend you upgrade to Python 2.6 at the least. On Wed, May 21, 2014 at 9:54 PM, Hari Rajendhran hari.rajendh...@tcs.com wrote: Hi Team, Does Python 2.4.3 support apache hive 0.13 version? Best Regards Hari Krishnan Rajendhran Hadoop Admin DESS-ABIM, Chennai BIGDATA Galaxy Tata Consultancy Services Cell:- 9677985515 Mailto: hari.rajendh...@tcs.com Website: http://www.tcs.com Experience certainty. IT Services Business Solutions Consulting -- Nitin Pawar
Re: Connecting hive to SAP BO
Another option would be: add jar /path/to/serde/jar/file; On Tue, May 20, 2014 at 10:45 AM, Shengjun Xin s...@gopivotal.com wrote: hive --auxpath /path-to-/csvserde.jar On Tue, May 20, 2014 at 12:59 PM, Chhaya Vishwakarma chhaya.vishwaka...@lntinfotech.com wrote: Hi, I have connected SAP BO to Hive using an ODBC driver. I am able to see the database and table in hive, but when I fetch data from hive it gives an error: org.apache.hadoop.hive.serde2.SerDeException SerDe com.bizo.hive.serde.csv.CSVSerde does not exist Can anyone suggest where I should put the csv-serde jar in SAP BO? Regards, Chhaya Vishwakarma -- Regards Shengjun -- Nitin Pawar
Re: hive query to select top 10 product of each subcategory and select most recent product info
Maybe you can share your table DDL, your query, and what output you are looking for. On Fri, Apr 11, 2014 at 12:26 PM, Mohit Durgapal durgapalmo...@gmail.com wrote: I have a hive table partitioned by dates. It contains ecomm data in the format siteid,sitecatid,catid,subcatgid,pid,pname,pprice,pmrp,pdesc What I need to do is to run a query on the table above in hive for the top 10 products (count-wise) in each sub-category. What adds a bit more complexity is that I need all the information of the product. Now when I do a group by with only subcatg,pid, I can only select the same fields. But I want all the data for that product coming in the same row as subcatg, prodid - like prodname, proddesc, price, mrp, imageurl. And since some information like the price and proddesc of a product keeps changing, I want to pick the latest column values (according to a date field) for a pid if we are able to do a group by on subcatg,pid. I am not able to find a solution to my problem in hive. Any help would be much appreciated. Regards Mohit -- Nitin Pawar
Re: hive query to select top 10 product of each subcategory and select most recent product info
Would it be a good idea to just get the top 10 ranked products by whatever your ranking is based on, and then join that with the product metadata (self join or any other way)? On Fri, Apr 11, 2014 at 1:52 PM, Mohit Durgapal durgapalmo...@gmail.com wrote: Hi Nitin, The DDL is as follows: CREATE EXTERNAL TABLE user_logs( users_iduuid string, siteid int, site_catid int, stext string, catg int, // CATEGORY scatg int, // SUBCATEGORY catgname string, scatgname string, brand string, // PRODUCT BRAND NAME prrange string, curr int, pname string, // product name pid int, // product ID price string, // Product Price prodnbr int, mrp string, // MRP prURL string, // Product url prIMGURL string, // Product Image URL opr string, oid string, txsucc string, last_updated string // timestamp ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' I am looking for an output where I have the top 10 products from each subcategory (on the basis of count) with all their information like product name, price, url, imgurl. And there will be multiple entries for the same products (pids) within the same subcategory; in that case I have to pick the product info that is latest (by the last_updated field). I have written a query, but it considers multiple entries of a product as different products if the price or any other info changes for that product: select siteid,site_catid,catg,scatg,COLLECT_SET(PRODDESC) from ( select PRODDESC,displays,siteid,site_catid,catg,scatg,rank(siteid,site_catid,catg,scatg) as row_number from ( select count(*) as displays,siteid,site_catid,catg,scatg,CONCAT('{','pname:',pname,',price:',price,',','mrp:',mrp,',curr:',curr,',pid:',pid,'}') as PRODDESC from user_logs group by siteid,site_catid,catg,scatg,pid,pname,price,mrp,curr order by siteid,site_catid,catg,scatg,displays desc ) A ) B WHERE row_number <= 10 group by siteid,site_catid,catg,scatg order by siteid,site_catid,catg,scatg desc; The rank() method simply helps in fetching the top 10 within a subcategory: every time it encounters the same combination of siteid,site_catid,catg,scatg it increments row_number, going up to 10. The problem above is that I am forced to put product info such as pname,price,mrp in the group by clause, otherwise I will not be able to get that information in the select. Therefore, even if someone changes just the price of a product (this happens very frequently) it is considered a different product by the above query. And that is something I don't want. I hope I have made it a little clearer? Thanks for your reply :) On Fri, Apr 11, 2014 at 12:45 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Maybe you can share your table DDL, your query, and what output you are looking for. On Fri, Apr 11, 2014 at 12:26 PM, Mohit Durgapal durgapalmo...@gmail.com wrote: I have a hive table partitioned by dates. It contains ecomm data in the format siteid,sitecatid,catid,subcatgid,pid,pname,pprice,pmrp,pdesc What I need to do is to run a query on the table above in hive for the top 10 products (count-wise) in each sub-category. What adds a bit more complexity is that I need all the information of the product. Now when I do a group by with only subcatg,pid, I can only select the same fields. But I want all the data for that product coming in the same row as subcatg, prodid - like prodname, proddesc, price, mrp, imageurl. And since some information like the price and proddesc of a product keeps changing, I want to pick the latest column values (according to a date field) for a pid if we are able to do a group by on subcatg,pid. I am not able to find a solution to my problem in hive. Any help would be much appreciated.
Regards Mohit -- Nitin Pawar -- Nitin Pawar
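A hedged sketch of the rank-then-join idea using Hive 0.11+ windowing instead of a custom rank UDF; the struct-max trick relies on struct comparison being field-by-field, leftmost first (supported on reasonably recent Hive versions), so putting last_updated first makes max() return the newest row's values. Column names follow the DDL in the thread:

SELECT scatg, pid, latest.pname, latest.price, latest.mrp, cnt
FROM (
  SELECT scatg, pid, cnt, latest,
         rank() OVER (PARTITION BY scatg ORDER BY cnt DESC) AS rnk
  FROM (
    SELECT scatg, pid, count(*) AS cnt,
           max(named_struct('ts', last_updated, 'pname', pname,
                            'price', price, 'mrp', mrp)) AS latest
    FROM user_logs
    GROUP BY scatg, pid
  ) agg
) ranked
WHERE rnk <= 10;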
Re: HIVE UDF Error
Can you put the first few lines of your code here, or upload the code to github and share the link? On Wed, Apr 9, 2014 at 11:59 AM, Rishabh Bhardwaj rbnex...@yahoo.com wrote: Hi all, I have done the following steps to create a UDF in hive but am getting an error. Please help me. 1. Created the udf as described here: http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html . 2. Compiled it successfully. 3. Copied the class file to a directory hiveudfs. 4. Added it to a jar with this command: jar -cf hiveudfs.jar hiveudfs/SimpleUDFExample.class 5. Imported the jar into hive. add jar hiveudfs.jar; (Added Successfully) create temporary function helloworld as 'hiveudfs.SimpleUDFExample'; At this I am getting the following error, hive create temporary function helloworld as 'hiveudfs.SimpleUDFExample'; java.lang.NoClassDefFoundError: hiveudfs/SimpleUDFExample (wrong name: SimpleUDFExample) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:791) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:266) at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105) at org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75) at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1353) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1137) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:867) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask Thanks, Rishabh. -- Nitin Pawar
Re: HIVE UDF Error
In your code the package declaration is missing. What you need to do is define a package, something like: package org.apache.hadoop.hive.ql.udf; Then your add-function definition becomes: CREATE TEMPORARY FUNCTION function_name AS 'org.apache.hadoop.hive.ql.udf.ClassName'; Feel free to use any package name you wish, but make sure the same name is reflected in the jar. Also, to build, compile and package hive udfs, use the shell script here if you are on linux: http://yaboolog.blogspot.in/2011/06/compiling-original-hive-udf.html On Wed, Apr 9, 2014 at 12:12 PM, Rishabh Bhardwaj rbnex...@yahoo.com wrote: Hi Nitin, Thanks for the concern. Here is the code of the UDF: import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.Text; @Description( name=SimpleUDFExample, value=returns 'hello x', where x is whatever you give it (STRING), extended=SELECT simpleudfexample('world') from foo limit 1; ) class SimpleUDFExample extends UDF { public Text evaluate(Text input) { if(input == null) return null; return new Text(Hello + input.toString()); } } From google I came across a blog. I have taken this from here (git link: https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/SimpleUDFExample.java ). On Wednesday, 9 April 2014 12:08 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Can you put the first few lines of your code here, or upload the code to github and share the link? On Wed, Apr 9, 2014 at 11:59 AM, Rishabh Bhardwaj rbnex...@yahoo.com wrote: Hi all, I have done the following steps to create a UDF in hive but am getting an error. Please help me. 1. Created the udf as described here: http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html . 2. Compiled it successfully. 3. Copied the class file to a directory hiveudfs. 4. Added it to a jar with this command: jar -cf hiveudfs.jar hiveudfs/SimpleUDFExample.class 5. Imported the jar into hive.
add jar hiveudfs.jar; (Added Successfully) create temporary function helloworld as 'hiveudfs.SimpleUDFExample'; At this I am getting the following error, hive create temporary function helloworld as 'hiveudfs.SimpleUDFExample'; java.lang.NoClassDefFoundError: hiveudfs/SimpleUDFExample (wrong name: SimpleUDFExample) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:791) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:266) at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105) at org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75) at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1353) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1137) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:867) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask Thanks, Rishabh. -- Nitin Pawar -- Nitin Pawar
Re: HIVE UDF Error
Follow the steps exactly as given in the link I shared; it works. Somehow your package is getting messed up and Hive is not able to find the class. On Wed, Apr 9, 2014 at 12:27 PM, Rishabh Bhardwaj rbnex...@yahoo.com wrote: I added package rishabh.udf.hive; in the above code and repeated the steps, but now I am getting the following error: hive create temporary function helloworld as 'rishabh.udf.hive.SimpleUDFExample'; FAILED: Class rishabh.udf.hive.SimpleUDFExample not found FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask The SimpleUDFExample.class file is in the hiveudfs.jar file. On Wednesday, 9 April 2014 12:20 PM, Nitin Pawar nitinpawar...@gmail.com wrote: In your code the package declaration is missing. What you need to do is define a package, something like: package org.apache.hadoop.hive.ql.udf; Then your add-function definition becomes: CREATE TEMPORARY FUNCTION function_name AS 'org.apache.hadoop.hive.ql.udf.ClassName'; Feel free to use any package name you wish, but make sure the same name is reflected in the jar. Also, to build, compile and package hive udfs, use the shell script here if you are on linux: http://yaboolog.blogspot.in/2011/06/compiling-original-hive-udf.html On Wed, Apr 9, 2014 at 12:12 PM, Rishabh Bhardwaj rbnex...@yahoo.com wrote: Hi Nitin, Thanks for the concern. Here is the code of the UDF: import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.Text; @Description( name=SimpleUDFExample, value=returns 'hello x', where x is whatever you give it (STRING), extended=SELECT simpleudfexample('world') from foo limit 1; ) class SimpleUDFExample extends UDF { public Text evaluate(Text input) { if(input == null) return null; return new Text(Hello + input.toString()); } } From google I came across a blog. I have taken this from here (git link: https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/SimpleUDFExample.java ). On Wednesday, 9 April 2014 12:08 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Can you put the first few lines of your code here, or upload the code to github and share the link? On Wed, Apr 9, 2014 at 11:59 AM, Rishabh Bhardwaj rbnex...@yahoo.com wrote: Hi all, I have done the following steps to create a UDF in hive but am getting an error. Please help me. 1. Created the udf as described here: http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html . 2. Compiled it successfully. 3. Copied the class file to a directory hiveudfs. 4. Added it to a jar with this command: jar -cf hiveudfs.jar hiveudfs/SimpleUDFExample.class 5. Imported the jar into hive.
add jar hiveudfs.jar; (Added Successfully) create temporary function helloworld as 'hiveudfs.SimpleUDFExample'; At this I am getting the following error, hive create temporary function helloworld as 'hiveudfs.SimpleUDFExample'; java.lang.NoClassDefFoundError: hiveudfs/SimpleUDFExample (wrong name: SimpleUDFExample) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:791) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:266) at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105) at org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75) at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1353) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1137) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:867) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613
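A small shell sketch of the packaging pitfall in this thread: the directory path inside the jar must match the package declaration exactly, which is what goes wrong when the class is jarred from the wrong directory (jar versions and paths below are illustrative):

# source declares: package rishabh.udf.hive;
mkdir -p build
javac -classpath "$HIVE_HOME/lib/hive-exec-0.13.0.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.0.jar" \
      -d build SimpleUDFExample.java
# -d lays out build/rishabh/udf/hive/SimpleUDFExample.class automatically
jar -cf hiveudfs.jar -C build .
jar -tf hiveudfs.jar   # verify the entry reads rishabh/udf/hive/SimpleUDFExample.class
# then in hive: ADD JAR hiveudfs.jar;
#               CREATE TEMPORARY FUNCTION helloworld AS 'rishabh.udf.hive.SimpleUDFExample';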
Re: Can I update just one row in Hive table using Hive INSERT OVERWRITE
For a non-partitioned table, the answer in one word: no. The detailed answer: this feature is still being built as part of https://issues.apache.org/jira/browse/HIVE-5317 On Sat, Apr 5, 2014 at 2:28 AM, Raj Hadoop hadoop...@yahoo.com wrote: Can I update (delete and insert kind of) just one row, keeping the remaining rows intact, in a Hive table using Hive INSERT OVERWRITE? There is no partition in the Hive table. INSERT OVERWRITE TABLE tablename SELECT col1,col2,col3 from tabx where col2='abc'; Does the above work? Please advise. -- Nitin Pawar
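The standard pre-ACID workaround, sketched out: rewrite the whole table while changing only the matching row with a CASE expression; correct, but it rereads and rewrites everything (names and values illustrative):

INSERT OVERWRITE TABLE tablename
SELECT col1,
       CASE WHEN col2 = 'abc' THEN 'new_value' ELSE col2 END AS col2,
       col3
FROM tablename;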
Re: READING FILE FROM MONGO DB
You can always write a custom UDF for your needs. On Tue, Apr 1, 2014 at 1:35 PM, Swagatika Tripathy swagatikat...@gmail.com wrote: Do we have a for-loop concept in hive to iterate through the array elements and display them? We need an alternative to the explode method. Well, you can use a JSON SerDe for this Sent from my iPhone On Mar 26, 2014, at 8:40 PM, Swagatika Tripathy swagatikat...@gmail.com wrote: Hi, The use case is we have some unstructured data fetched from Mongo DB and stored in a particular location. Our task is to load those data into our staging and core hive tables in the form of rows and columns, e.g. if the data is in key-value pairs like: { Id: bigint(12346), Name:string(ABC), Subjects: {Subject enrolled: Subjects: [eng ,math] } {Game enrolled: [Football,cricket] } This is just a very simple example for reference, but we have a complex JSON format with a huge amount of data. So, in this case how can we load it into hive tables and hdfs? On Mar 26, 2014 10:59 PM, shouvanik.hal...@accenture.com wrote: Are you swagatika mohanty? Thanks, Shouvanik -Original Message- From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com] Sent: Wednesday, March 26, 2014 10:03 AM To: user@hive.apache.org Subject: Re: READING FILE FROM MONGO DB Hi Swagatika You can create external tables to Mongo and can process them using hive. New mongo connectors have added support for hive. Did you try that? Sent from my iPhone On Mar 26, 2014, at 9:59 AM, Swagatika Tripathy swagatikat...@gmail.com wrote: Hi, We have some files stored in MongoDB, mostly in key-value format. We need to parse those files and store them in Hive tables. Any inputs on this will be appreciated. Thanks, Swagatika -- Nitin Pawar
Re: pig,hive install over hadoop
Pig and Hive do not come in a bare-minimum version; it's the complete Pig or Hive package. You can use an existing hadoop cluster with pig and hive. If you do not need persistent storage for hive tables, then you don't need to configure much: search for hive with derby and that should get you started. On the pig side, just downloading the binaries is good enough; you can point it to your HADOOP_HOME and it should work fine. On Tue, Apr 1, 2014 at 3:34 PM, Rahul Singh smart.rahul.i...@gmail.com wrote: Hi, I have installed and configured hadoop. Now, I want to install hive and pig. As per my understanding, pig and hive internally use hadoop, so is there a way I can just install a bare-minimum hive or pig and take advantage of the already installed hadoop, or do I need to separately install and configure complete hive and pig? Thanks, -Rahul Singh -- Nitin Pawar
Re: MSCK REPAIR TABLE
Can you grab more logs from the hiveserver2 log file? On Thu, Mar 27, 2014 at 2:31 PM, fab wol darkwoll...@gmail.com wrote: Hey everyone, I have a table with currently 5541 partitions. Daily, 14 partitions are added. I will switch the metastore update from msck repair table to alter table add partition, since it performs better, but sometimes this might fail, and then I need the msck repair table command. But unfortunately it's not working anymore with this table size, it seems: 0: jdbc:hive2://clusterXYZ- use DB_NAME; No rows affected (1.082 seconds) 0: jdbc:hive2://clusterXYZ- set hive.metastore.client.socket.timeout=6000; No rows affected (0.029 seconds) 0: jdbc:hive2://clusterXYZ- MSCK REPAIR TABLE TABLENAME; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Has anyone had luck getting this to work? As you can see, I already raised the Thrift timeout, but this error happens even before the time runs out ... Cheers Wolli
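For reference, the add-partition route the poster is switching to, sketched out; it touches only the named partitions, so its cost does not grow with the 5541 existing ones, and IF NOT EXISTS makes the daily job safe to re-run (names and paths illustrative):

ALTER TABLE tablename ADD IF NOT EXISTS
  PARTITION (log_date='2014-03-27', hr=0) LOCATION '/data/tablename/2014-03-27/00'
  PARTITION (log_date='2014-03-27', hr=1) LOCATION '/data/tablename/2014-03-27/01';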