Getting access to hadoop output from Hive JDBC session
Hello, I am switching from Hive 0.9 to Hive 0.12 and decided to start using the Hive metastore server mode. As it turns out, the Hive1 JDBC driver, connected as jdbc:hive://, only works via direct access to the metastore database. The Hive2 driver, connected as jdbc:hive2://, does work with the remote Hive metastore server, but there is another serious difference in behavior. When I was using the Hive1 driver I saw the Hadoop output: the Hive job ID and the usual Hadoop progress showing the percentages of map and reduce done. The Hive2 driver silently waits for the map/reduce to complete and just produces the result. As far as I can see, both Hive itself and Beeline are able to get the same Hadoop output I was getting with the Hive1 driver, so it should be possible somehow, but it isn't clear how they do it. Can someone suggest a way to get the Hadoop output with the Hive2 JDBC driver? Thanks for any help! - Alex
Re: Handling blob in hive
You can store BLOB data as a string in Hive. Thanks, Saurabh. Sent from my iPhone, please avoid typos. On 08-Aug-2014, at 9:10 am, Chhaya Vishwakarma chhaya.vishwaka...@lntinfotech.com wrote: Hi, I want to store and retrieve BLOBs in Hive. Is it possible to store a BLOB in Hive? If it is not supported, what alternatives can I go with? The BLOB may also reside inside a relational DB. I did some research but have not found a relevant solution. Regards, Chhaya Vishwakarma. The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail, from using or disseminating the information, and must notify the sender and delete it from their system. LT Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in, this e-mail.
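A follow-up sketch for the archives (the table and column names below are invented for illustration, not from the thread): besides storing base64 text in a STRING column as suggested above, Hive also has a BINARY column type (available since Hive 0.8) that can hold raw bytes:

```sql
-- Hypothetical tables for illustration only.
-- Option 1: export the blob from the source DB as base64 text, store it as STRING.
CREATE TABLE blob_as_string (
  id INT,
  payload_b64 STRING   -- base64-encoded bytes, decoded by the consumer
);

-- Option 2: use Hive's BINARY type to hold the raw bytes directly.
CREATE TABLE blob_as_binary (
  id INT,
  payload BINARY
);
```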
hive query with in statement
Hi all, I have a problem with the IN statement in HiveQL. My table is cdr, with a column calldate of type date. The first query returns successfully: select * from cdr where calldate = '2014-05-02'; But when I query with an IN statement, select * from cdr where calldate in ('2014-08-11','2014-05-02'); it returns the exception below: Error: Error while processing statement: FAILED: SemanticException [Error 10014]: Line 1:38 Wrong arguments ''20014-03-02'': The arguments for IN should be the same type! Types are: {date IN (string, string)} (state=42000,code=10014) How can I handle this? Thanks. Hive version 0.12
Distributed data
Hello, Using Hive, we know that we should specify the file path to read data from a specific location. If the data is distributed across many computers, how can we read it? Thanks *** This e-mail contains information for the intended recipient only. It may contain proprietary material or confidential information. If you are not the intended recipient you are not authorised to distribute, copy or use this e-mail or any attachment to it. Murex cannot guarantee that it is virus free and accepts no responsibility for any loss or damage arising from its use. If you have received this e-mail in error please notify the sender immediately and delete the original email received, any attachments and all copies from your system.
Re: Distributed data
What do you mean by the data being distributed on many computers? Are you saying the data is on an HDFS-like filesystem? On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: If the data is distributed on many computers, how can we read it? -- Nitin Pawar
RE: Distributed data
Yes, I mean the data is on an HDFS-like filesystem. From: Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Tuesday, August 12, 2014 3:26 PM To: user@hive.apache.org Subject: Re: Distributed data What do you mean by the data being distributed on many computers? Are you saying the data is on an HDFS-like filesystem?
Re: Distributed data
If your Hadoop is set up with the same filesystem as HDFS, Hive will take care of it. If your HDFS is totally different from where the file resides, then you need to get the file from that filesystem and push it to Hive using LOAD. If that filesystem supports import/export with tools like Sqoop, then you can use those as well. On Tue, Aug 12, 2014 at 5:58 PM, CHEBARO Abdallah abdallah.cheb...@murex.com wrote: Yes, I mean the data is on an HDFS-like filesystem. -- Nitin Pawar
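To make the advice above concrete, a minimal sketch (the paths, table names, and columns here are invented for illustration): once the data sits on HDFS, Hive reads the distributed blocks through Hadoop, so you only ever name a path, never individual machines:

```sql
-- Hypothetical names/paths. Point an external table at an existing HDFS directory:
CREATE EXTERNAL TABLE cdr_ext (caller STRING, callee STRING, calldate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/cdr/';

-- Or move an HDFS file into a managed table's warehouse directory:
LOAD DATA INPATH '/staging/cdr/part-00000' INTO TABLE cdr_managed;
```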
RE: Distributed data
First of all, thank you, the information is very helpful. Can you please provide me with more details about "If your hadoop is setup with same filesystem as hdfs, hive will take care of it"? Thanks From: Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Tuesday, August 12, 2014 3:50 PM To: user@hive.apache.org Subject: Re: Distributed data
RE: Distributed data
Hello, Please explain to me: "If your hadoop is setup with same filesystem as hdfs, hive will take care of it". From: Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Tuesday, August 12, 2014 3:50 PM To: user@hive.apache.org Subject: Re: Distributed data
Re: hive auto join conversion
Yeah, I was trying the same thing, though it's a little bit ugly. My query needs to LJ/J with multiple tables. When there are 1 or 2 LJ/Js, rewriting works, but when there are 3 tables, I got the same exception, triggered by the following bug: https://issues.apache.org/jira/browse/HIVE-5891 Chen

On Wed, Jul 30, 2014 at 10:07 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Would manually rewriting the query from (T1 union all T2) LOJ S to the equivalent (T1 LOJ S) union all (T2 LOJ S) help work around this issue?

On Wed, Jul 30, 2014 at 6:19 PM, Chen Song chen.song...@gmail.com wrote: I tried that and I got the following error. FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint. I then tried turning off auto join conversion: set hive.auto.convert.join=false But no luck, same error. It looks like a known issue: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_releasenotes_hdp_2.0/content/ch_relnotes-hdp2.0.0.2-5-2.html Chen

On Wed, Jul 30, 2014 at 9:10 PM, Navis류승우 navis@nexr.com wrote: Could you do it with hive.ignore.mapjoin.hint=false? The mapjoin hint is ignored from Hive 0.11.0 onward by default (see https://issues.apache.org/jira/browse/HIVE-4042) Thanks, Navis

2014-07-31 10:04 GMT+09:00 Chen Song chen.song...@gmail.com: I am using CDH5 with Hive 0.12. We have some Hive jobs migrated from Hive 0.10, written like below: select /*+ MAPJOIN(sup) */ c1, c2, sup.c from ( select key, c1, c2 from table1 union all select key, c1, c2 from table2 ) table left outer join sup on (table.c1 = sup.key) distribute by c1

In Hive 0.10 (CDH4), Hive translates the left outer join into a map join (a map-only job), followed by a regular MR job for the distribute by. In Hive 0.12 (CDH5), Hive is not able to convert the join into a map join. Instead it launches a common map-reduce job for the join, followed by another MR job for the distribute by. However, when I take out the union all operator, Hive seems able to create a single MR job, with the map join in the map phase and a reduce for the distribute by. I read a bit on https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins and found that there are some restrictions on map-side joins starting with Hive 0.11. The following are not supported: - Union Followed by a MapJoin - Lateral View Followed by a MapJoin - Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By) Followed by MapJoin - MapJoin Followed by Union - MapJoin Followed by Join - MapJoin Followed by MapJoin

So if one side of the join (the big side) is a union of some tables and the other side is a small table, Hive would not be able to do a map join at all? Is that correct? If correct, what should I do to make the job backward compatible? -- Chen Song -- Thanks, Eugene CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. -- Chen Song
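For reference, the rewrite Eugene proposes would look roughly like this against the query in the thread (a sketch only; as noted in the thread it can still hit HIVE-5891 once three or more tables are involved):

```sql
-- Push the join into each union branch so each branch is MapJoin-eligible on its own:
select c1, c2, c
from (
  select /*+ MAPJOIN(sup) */ t1.c1, t1.c2, sup.c
  from table1 t1 left outer join sup on (t1.c1 = sup.key)
  union all
  select /*+ MAPJOIN(sup) */ t2.c1, t2.c2, sup.c
  from table2 t2 left outer join sup on (t2.c1 = sup.key)
) u
distribute by c1
```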
Re: ulimit for Hive
+ Hive user mailing list. It should be a better place for your questions. On Mon, Aug 11, 2014 at 3:17 PM, Ana Gillan ana.gil...@gmail.com wrote: Hi, I've been reading a lot of posts about needing to set a high ulimit for file descriptors in Hadoop, and I think it's probably the cause of a lot of the errors I've been having when trying to run queries on larger data sets in Hive. However, I'm really confused about how and where to set the limit, so I have a number of questions:
1. How high is it recommended to set the ulimit?
2. What is the difference between soft and hard limits? Which one needs to be set to the value from question 1?
3. For which user(s) do I set the ulimit? If I am running the Hive query with my login, do I set my own ulimit to the high value?
4. Do I need to set this limit for these users on all the machines in the cluster? (We have one master node and 6 slave nodes.)
5. Do I need to restart anything after configuring the ulimit?
Thanks in advance, Ana -- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
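Not from the thread, but as a starting sketch for questions 1-5 (the values and user name below are illustrative assumptions, not recommendations): the limits usually go in /etc/security/limits.conf on every node; the soft limit is what a process starts with and the hard limit is the ceiling it may raise it to; the limit must apply to the user that actually runs the Hadoop/Hive processes; and a fresh login plus a daemon restart is typically needed before it takes effect:

```
# /etc/security/limits.conf -- illustrative values, assuming daemons run as user 'hdfs'
hdfs  soft  nofile  32768
hdfs  hard  nofile  65536
```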
ArrayWritableGroupConverter
Hello. (First off, sorry if I accidentally posted to the wrong mailing list before - dev - and you are getting this again.) Regarding the ArrayWritableGroupConverter class: I was just wondering why the field count has to be either 1 or 2? I'm trying to read a column where the number of fields is 3, and I'm getting an invalid parquet hive schema error (in Hive 0.12) when I try to do so. It looks like it links back to here: https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java Thanks, -Raymond
Question about use of Table Lock Manager for clients
Hive documentation http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_5.html describes the use of the Table Lock Manager for HiveServer2. What isn't clear there is whether the lock manager should be enabled only on the host running HiveServer2 or on Hive clients as well? - Alex
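For context, the settings that documentation refers to live in hive-site.xml (the quorum hostnames below are placeholders); the open question is on which hosts this block must be present:

```xml
<!-- hive-site.xml: enable the ZooKeeper-backed table lock manager -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```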
Re: Getting access to hadoop output from Hive JDBC session
If you were using a remote HS2, then you need a getLog API call, which is a work in progress in one of the JIRAs: HIVE-4629 https://issues.apache.org/jira/browse/HIVE-4629 (HS2 should support an API to retrieve query logs). -- Lefty

On Tue, Aug 12, 2014 at 10:45 PM, Thejas Nair the...@hortonworks.com wrote: If you are running HS2 in embedded mode, I think you should be able to get the hadoop output by setting the log4j settings appropriately. If you were using a remote HS2, then you need a getLog api call, which is a work in progress in one of the jiras.

On Tue, Aug 12, 2014 at 6:17 PM, Alexander Kolbasov ak...@conviva.com wrote: Cross-posted from user@hive.apache.org
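For the embedded-mode case mentioned above, a sketch of the log4j tweak (assuming the stock conf/hive-log4j.properties layout shipped with Hive; adjust for your installation):

```
# conf/hive-log4j.properties -- send Hive's job progress output to the console
hive.root.logger=INFO,console
```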
Re: hive query with in statement
Hi, Hive's support for the IN clause is limited; you might want to check out http://stackoverflow.com/questions/7677333/how-to-write-subquery-and-use-in-clause-in-hive On 12 August 2014 17:07, ilhami Kalkan ilhami1...@hotmail.com wrote: Hi all, I have a problem with the IN statement in HiveQL. -- Sreenath S Kamath Bangalore Ph No:+91-9590989106
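That said, the error message in the original question shows the IN clause being type-checked rather than rejected outright, so one workaround worth trying (a sketch, not verified on Hive 0.12) is to make both sides of IN the same type:

```sql
-- Cast the string literals to DATE to match the column:
select * from cdr
where calldate in (cast('2014-08-11' as date), cast('2014-05-02' as date));

-- Or compare as strings instead:
select * from cdr
where cast(calldate as string) in ('2014-08-11', '2014-05-02');
```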