Getting access to hadoop output from Hive JDBC session

2014-08-12 Thread Alexander Kolbasov
Hello,

I am switching from Hive 0.9 to Hive 0.12 and decided to start using the Hive 
metastore server mode. As it turns out, the Hive1 JDBC driver connected as 
jdbc:hive:// only works via direct access to the metastore database. The 
Hive2 driver connected as jdbc:hive2:// does work with the remote Hive 
metastore server, but there is another serious difference in behavior. When I 
was using the Hive1 driver I saw the Hadoop output - the Hive job ID 
and the usual Hadoop progress showing the percentages of map and reduce done. The 
Hive2 driver silently waited for map/reduce to complete and just produced the 
result.

As far as I can see, both Hive itself and Beeline are able to get the same Hadoop 
output as I was getting with the Hive1 driver, so it should be possible somehow, but 
it isn't clear how they do this. Can someone suggest a way to get Hadoop 
output with the Hive2 JDBC driver?

Thanks for any help!

- Alex





Re: Handling blob in hive

2014-08-12 Thread Db-Blog
You can store the BLOB data type as a string in Hive. 
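As a hedged sketch of that idea (table and column names below are made up for illustration; Hive has also had a BINARY type since 0.8, which can hold raw blob bytes directly):

```sql
-- Hypothetical sketch: store a BLOB either directly in Hive's BINARY
-- type (available since Hive 0.8) ...
CREATE TABLE documents (
  doc_id  STRING,
  payload BINARY    -- raw blob bytes
);

-- ... or base64-encoded in a STRING column, decoding on read
-- (assumes your Hive version provides the unbase64() UDF):
SELECT doc_id, unbase64(payload_b64)
FROM documents_b64;
```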

Thanks,
Saurabh

Sent from my iPhone, please avoid typos.

 On 08-Aug-2014, at 9:10 am, Chhaya Vishwakarma 
 chhaya.vishwaka...@lntinfotech.com wrote:
 
 Hi,
  
 I want to store and retrieve BLOBs in Hive. Is it possible to store a BLOB in 
 Hive?
 If it is not supported, what alternatives can I go with?
 The BLOB may also reside inside a relational DB.
 I did some research but could not find a relevant solution.
  
 Regards,
 Chhaya Vishwakarma
  
 
 The contents of this e-mail and any attachment(s) may contain confidential or 
 privileged information for the intended recipient(s). Unintended recipients 
 are prohibited from taking action on the basis of information in this e-mail 
 and using or disseminating the information, and must notify the sender and 
 delete it from their system. L&T Infotech will not accept responsibility or 
 liability for the accuracy or completeness of, or the presence of any virus 
 or disabling code in, this e-mail.


hive query with in statement

2014-08-12 Thread ilhami Kalkan

Hi all,
I have a problem with the IN statement in HiveQL. My table is cdr, with a column 
calldate whose type is date. The first query returns successfully:

select * from cdr where calldate = '2014-05-02';

But when I query with an IN statement,

select * from cdr where calldate in ( '2014-08-11','2014-05-02');

it returns the exception below:

Error: Error while processing statement: FAILED: SemanticException 
[Error 10014]: Line 1:38 Wrong arguments ''20014-03-02'': The arguments 
for IN should be the same type! Types are: {date IN (string, string)} 
(state=42000,code=10014)


How can I handle this?
Thanks.

Hive version 0.12
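One workaround, sketched here for Hive 0.12 (which introduced the DATE type), is to make both sides of IN the same type, either by casting the string literals to DATE or by comparing as strings:

```sql
-- Cast the literals so both sides of IN are DATE:
SELECT * FROM cdr
WHERE calldate IN (CAST('2014-08-11' AS DATE), CAST('2014-05-02' AS DATE));

-- Or compare as strings instead:
SELECT * FROM cdr
WHERE CAST(calldate AS STRING) IN ('2014-08-11', '2014-05-02');
```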






Distributed data

2014-08-12 Thread CHEBARO Abdallah
Hello,

Using Hive, we know that we should specify a file path to read data from a 
specific location. If the data is distributed across many computers, how can we 
read it?

Thanks
***

This e-mail contains information for the intended recipient only. It may 
contain proprietary material or confidential information. If you are not the 
intended recipient you are not authorised to distribute, copy or use this 
e-mail or any attachment to it. Murex cannot guarantee that it is virus free 
and accepts no responsibility for any loss or damage arising from its use. If 
you have received this e-mail in error please notify immediately the sender and 
delete the original email received, any attachments and all copies from your 
system.


Re: Distributed data

2014-08-12 Thread Nitin Pawar
What do you mean by the data being distributed on many computers?

Are you saying the data is on an HDFS-like filesystem?


On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Hello,



 Using Hive, we know that we should specify the file path to read data from
 a specific location. If the data is distributed on many computers, how can
 we read it?



 Thanks





-- 
Nitin Pawar


RE: Distributed data

2014-08-12 Thread CHEBARO Abdallah
Yes, I mean the data is on an HDFS-like filesystem.

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Tuesday, August 12, 2014 3:26 PM
To: user@hive.apache.org
Subject: Re: Distributed data

what do you mean the data is distributed on many computers?

are you saying the data is on hdfs like filesystem ?

On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:
Hello,

Using Hive, we know that we should specify the file path to read data from a 
specific location. If the data is distributed on many computers, how can we 
read it?

Thanks




--
Nitin Pawar



Re: Distributed data

2014-08-12 Thread Nitin Pawar
If your Hadoop is set up with the same filesystem as HDFS, Hive will take care
of it.

If your HDFS is entirely separate from where the file resides, then you
need to get the file from that filesystem and then push it into Hive using
LOAD DATA.

If that filesystem supports import/export with tools like Sqoop, then you
can use them as well.
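For instance, a hedged sketch of both cases (paths, table, and column names are hypothetical): when the data already sits in HDFS, a single table definition covers files distributed across the cluster's datanodes; when it lives on a local filesystem, it has to be loaded in first.

```sql
-- Data already distributed in HDFS: Hive reads it via the table LOCATION.
CREATE EXTERNAL TABLE web_logs (
  ts  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs:///data/web_logs';

-- Data on the local filesystem instead: push it into the table.
LOAD DATA LOCAL INPATH '/tmp/web_logs.tsv' INTO TABLE web_logs;
```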




On Tue, Aug 12, 2014 at 5:58 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Yes I mean the data is on hdfs like filesystem



 *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
 *Sent:* Tuesday, August 12, 2014 3:26 PM
 *To:* user@hive.apache.org
 *Subject:* Re: Distributed data



 what do you mean the data is distributed on many computers?



 are you saying the data is on hdfs like filesystem ?



 On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 Using Hive, we know that we should specify the file path to read data from
 a specific location. If the data is distributed on many computers, how can
 we read it?



 Thanks






 --
 Nitin Pawar







-- 
Nitin Pawar


RE: Distributed data

2014-08-12 Thread CHEBARO Abdallah
First of all, thank you, the information is very helpful.

Can you please provide me more details about “If your hadoop is setup with same 
filesystem as hdfs, hive will take care of it “ ?

Thanks

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Tuesday, August 12, 2014 3:50 PM
To: user@hive.apache.org
Subject: Re: Distributed data

If your hadoop is setup with same filesystem as hdfs, hive will take care of it

If your hdfs is totally different than where the file resides, then you need to 
get the file from that filesystem and then push it to hive using load

if that filesystem supports import/export with tools like sqoop then you can 
use them as well



On Tue, Aug 12, 2014 at 5:58 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:
Yes I mean the data is on hdfs like filesystem

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Tuesday, August 12, 2014 3:26 PM
To: user@hive.apache.org
Subject: Re: Distributed data

what do you mean the data is distributed on many computers?

are you saying the data is on hdfs like filesystem ?

On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:
Hello,

Using Hive, we know that we should specify the file path to read data from a 
specific location. If the data is distributed on many computers, how can we 
read it?

Thanks




--
Nitin Pawar





--
Nitin Pawar


RE: Distributed data

2014-08-12 Thread CHEBARO Abdallah
Hello,

Please explain to me : “If your hadoop is setup with same filesystem as hdfs, 
hive will take care of it “

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Tuesday, August 12, 2014 3:50 PM
To: user@hive.apache.org
Subject: Re: Distributed data

If your hadoop is setup with same filesystem as hdfs, hive will take care of it

If your hdfs is totally different than where the file resides, then you need to 
get the file from that filesystem and then push it to hive using load

if that filesystem supports import/export with tools like sqoop then you can 
use them as well



On Tue, Aug 12, 2014 at 5:58 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:
Yes I mean the data is on hdfs like filesystem

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Tuesday, August 12, 2014 3:26 PM
To: user@hive.apache.org
Subject: Re: Distributed data

what do you mean the data is distributed on many computers?

are you saying the data is on hdfs like filesystem ?

On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:
Hello,

Using Hive, we know that we should specify the file path to read data from a 
specific location. If the data is distributed on many computers, how can we 
read it?

Thanks




--
Nitin Pawar





--
Nitin Pawar


Re: hive auto join conversion

2014-08-12 Thread Chen Song
Yeah, I was trying the same thing, though it is a little bit ugly.

My query needs to LJ/J with multiple tables. When there are 1 or 2 LJ/Js,
rewriting works, but when there are 3 or more tables, I got the same exception,
triggered by the following bug.

https://issues.apache.org/jira/browse/HIVE-5891

Chen


On Wed, Jul 30, 2014 at 10:07 PM, Eugene Koifman ekoif...@hortonworks.com
wrote:

 would manually rewriting the query from (T1 union all T2) LOJ S to
 equivalent (T1 LOJ S) union all (T2 LOJ S) help work around this issue?
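A sketch of that rewrite, applied to the query discussed later in this thread (untested, and the MAPJOIN hints may still be ignored by default per HIVE-4042):

```sql
-- Rewrite (T1 UNION ALL T2) LOJ S as (T1 LOJ S) UNION ALL (T2 LOJ S),
-- so each branch is a plain join that may qualify for a map join:
SELECT c1, c2, c
FROM (
  SELECT /*+ MAPJOIN(sup) */ t.c1, t.c2, sup.c AS c
  FROM table1 t LEFT OUTER JOIN sup ON (t.c1 = sup.key)
  UNION ALL
  SELECT /*+ MAPJOIN(sup) */ t.c1, t.c2, sup.c AS c
  FROM table2 t LEFT OUTER JOIN sup ON (t.c1 = sup.key)
) u
DISTRIBUTE BY c1;
```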


 On Wed, Jul 30, 2014 at 6:19 PM, Chen Song chen.song...@gmail.com wrote:

 I tried that and I got the following error.

 FAILED: SemanticException [Error 10227]: Not all clauses are supported
 with mapjoin hint. Please remove mapjoin hint.

 I then tried turning off auto join conversion.

 set hive.auto.convert.join=false

 But no luck, same error.

 Looks like it is a known issue,


 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_releasenotes_hdp_2.0/content/ch_relnotes-hdp2.0.0.2-5-2.html

 Chen




 On Wed, Jul 30, 2014 at 9:10 PM, Navis류승우 navis@nexr.com wrote:

 Could you do it with hive.ignore.mapjoin.hint=false? Mapjoin hint is
 ignored from hive-0.11.0 by default (see
 https://issues.apache.org/jira/browse/HIVE-4042)
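As a sketch, the settings Navis mentions (setting names per HIVE-4042; exact behavior may vary by Hive version):

```sql
-- Honor /*+ MAPJOIN(...) */ hints again (ignored by default since
-- Hive 0.11, per HIVE-4042):
set hive.ignore.mapjoin.hint=false;

-- Alternatively, rely on automatic map-join conversion:
set hive.auto.convert.join=true;
```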

 Thanks,
 Navis


 2014-07-31 10:04 GMT+09:00 Chen Song chen.song...@gmail.com:

 I am using cdh5 with hive 0.12. We have some hive jobs migrated from
 hive 0.10 and they are written like below:

 select /*+ MAPJOIN(sup) */ c1, c2, sup.c
 from
 (
 select key, c1, c2 from table1
 union all
 select key, c1, c2 from table2
 ) table
 left outer join
 sup
 on (table.c1 = sup.key)
 distribute by c1

 In Hive 0.10 (CDH4), Hive translates the left outer join into a map
 join (map only job), followed by a regular MR job for distribute by.

 In Hive 0.12 (CDH5), Hive is not able to convert the join into a map
 join. Instead it launches a common map reduce for the join, followed by
 another mr for distribute by. However, when I take out the union all
 operator, Hive seems to be able to create a single MR job, with map join on
 map phase, and reduce for distribute by.

 I read a bit on
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
 and found out that there are some restrictions on map side join
 starting Hive 0.11. The following are not supported.


- Union Followed by a MapJoin
- Lateral View Followed by a MapJoin
- Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By)
Followed by MapJoin
- MapJoin Followed by Union
- MapJoin Followed by Join
- MapJoin Followed by MapJoin


 So if one side of the table (big side) is a union of some tables and
 the other side is a small table, Hive would not be able to do a map join at
 all? Is that correct?

 If correct, what should I do to make the job backward compatible?

 --
 Chen Song





 --
 Chen Song




 --

 Thanks,
 Eugene

 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity
 to which it is addressed and may contain information that is confidential,
 privileged and exempt from disclosure under applicable law. If the reader
 of this message is not the intended recipient, you are hereby notified that
 any printing, copying, dissemination, distribution, disclosure or
 forwarding of this communication is strictly prohibited. If you have
 received this communication in error, please contact the sender immediately
 and delete it from your system. Thank You.




-- 
Chen Song


Re: ulimit for Hive

2014-08-12 Thread Zhijie Shen
+ Hive user mailing list

It should be a better place for your questions.


On Mon, Aug 11, 2014 at 3:17 PM, Ana Gillan ana.gil...@gmail.com wrote:

 Hi,

 I’ve been reading a lot of posts about needing to set a high ulimit for
 file descriptors in Hadoop and I think it’s probably the cause of a lot of
 the errors I’ve been having when trying to run queries on larger data sets
 in Hive. However, I’m really confused about how and where to set the limit,
 so I have a number of questions:

1. How high is it recommended to set the ulimit?
2. What is the difference between soft and hard limits? Which one
needs to be set to the value from question 1?
3. For which user(s) do I set the ulimit? If I am running the Hive
query with my login, do I set my own ulimit to the high value?
4. Do I need to set this limit for these users on all the machines in
the cluster? (we have one master node and 6 slave nodes)
5. Do I need to restart anything after configuring the ulimit?

 Thanks in advance,
 Ana




-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



ArrayWritableGroupConverter

2014-08-12 Thread Raymond Lau
Hello.  (First off, sorry if I accidentally posted to the wrong mailing
list before - dev - and you are getting this again)

Regarding the ArrayWritableGroupConverter class: I was just wondering why
the field count has to be either 1 or 2?  I'm trying to read a column
where the number of fields is 3, and I'm getting an invalid parquet hive
schema error (in Hive 0.12) when I try to do so.  It looks like it links
back to here.

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java


Thanks,
-Raymond


Question about use of Table Lock Manager for clients

2014-08-12 Thread Alexander Kolbasov
Hive documentation 
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_5.html 
describes the use of the Table Lock Manager for HiveServer2. What isn't clear there is whether the lock
manager should be enabled only on the host running HiveServer2 or on Hive
clients as well?

- Alex



Re: Getting access to hadoop output from Hive JDBC session

2014-08-12 Thread Lefty Leverenz

 If you were using a remote HS2, then you need a getLog api call, which is a
 work in progress in one of the jiras.


HIVE-4629 https://issues.apache.org/jira/browse/HIVE-4629 HS2 should
support an API to retrieve query logs

-- Lefty


On Tue, Aug 12, 2014 at 10:45 PM, Thejas Nair the...@hortonworks.com
wrote:

 If you are running HS2 in embedded mode, I think you should be able to get the
 Hadoop output by setting the log4j settings appropriately.
 If you were using a remote HS2, then you need a getLog API call, which is a
 work in progress in one of the JIRAs.



 On Tue, Aug 12, 2014 at 6:17 PM, Alexander Kolbasov ak...@conviva.com
 wrote:

  Cross-posted from user@hive.apache.org
 
  Hello,
 
  I am switching from Hive 0.9 to Hive 0.12 and decided to start using Hive
  metadata server mode. As it turns out, Hive1 JDBC driver connected as
  jdbc:hive:// only works via direct access to the metastore database.
 The
  Hive2 driver connected as jdbc:hive2:// does work with the remote Hive
  metastore server, but there is another serious difference in behavior.
  When I was using Hive1 driver I saw Hadoop output - the information about
  Hive job ID and the usual Hadoop output showing percentages of map and
  reduce done. The Hive2 driver silently waited for map/reduce to complete
  and just produced the result.
 
  As I can see, both Hive itself and beeline are able to get the same
 Hadoop
  output as I was getting with Hive1 driver, so it should be somehow
  possible but it isn't clear how they do this. Can someone suggest the way
  to get Hadoop output with Hive2 JDBC driver?
 
  Thanks for any help!
 
  - Alex
 
 
 




Re: hive query with in statement

2014-08-12 Thread Sreenath
Hi,

Hive's IN clause requires all its arguments to be the same type, so the string
literals here clash with the DATE column. You might want to check out
http://stackoverflow.com/questions/7677333/how-to-write-subquery-and-use-in-clause-in-hive


On 12 August 2014 17:07, ilhami Kalkan ilhami1...@hotmail.com wrote:

 Hi all,
 I have a problem with IN statement in HiveQL. My table cdr, column
 calldate which type is date. First query is successfully return:
 select * from cdr where calldate = '2014-05-02';

 But when query with IN statement,

 select * from cdr where calldate in ( '2014-08-11','2014-05-02');

 it returns below exception:

 Error: Error while processing statement: FAILED: SemanticException
 [Error 10014]: Line 1:38 Wrong arguments ''20014-03-02'': The arguments
 for IN should be the same type! Types are: {date IN (string, string)}
 (state=42000,code=10014)

 How can I handle this?
 Thanks.

 Hive version 0.12










-- 
Sreenath S Kamath
Bangalore
Ph No:+91-9590989106