Re: select count(*) from table;

2016-03-22 Thread Nitin Pawar
If you have enabled statistics-based optimization, the count will come
from the table statistics.
If the underlying file format keeps in-file statistics (like ORC), it
will come from there.
If it is a plain vanilla text file format, Hive needs to run a job to
compute the count, so that is the slowest of the three.
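For example, a sketch (the table name t is a placeholder; ANALYZE TABLE
and hive.compute.query.using.stats are standard Hive features):

  -- compute and store table-level statistics in the metastore
  ANALYZE TABLE t COMPUTE STATISTICS;
  -- let count(*) be answered from those statistics instead of a job
  set hive.compute.query.using.stats=true;
  select count(*) from t;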

On Tue, Mar 22, 2016 at 12:44 PM, Amey Barve <ameybarv...@gmail.com> wrote:

> select count(*) from table;
>
> How does hive evaluate count(*) on a table?
>
> Does it return the count by actually querying the table, or does it return
> the count directly by consulting some locally stored statistics?
>
> For Hive's text format it takes a few seconds, while Hive's ORC format
> takes a fraction of a second.
>
> Regards,
> Amey
>



-- 
Nitin Pawar


Re: Importing Oracle data into Hive

2016-01-31 Thread Nitin Pawar
Check Sqoop.
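A minimal sketch (the connection string, credentials, and table names
below are placeholders, not details from this thread):

  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott -P \
    --table SALES \
    --hive-import --hive-table sales \
    --num-mappers 4

For the weekly run, an incremental import (--incremental append or
--incremental lastmodified with --check-column/--last-value) avoids
re-pulling all 20 million rows every time.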

On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

>   Hi,
>
> What is the easiest method of importing data from an Oracle 11g table to
> Hive please? This will be a weekly periodic job. The source table has 20
> million rows.
>
> I am running Hive 1.2.1
>
> regards
>
>
>


-- 
Nitin Pawar


Re: How to load XML file in Hive table

2016-01-10 Thread Nitin Pawar
take a look at this
https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources
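A sketch based on that wiki (the table, columns, XPath expressions, and
tags below are illustrative placeholders):

  CREATE TABLE xml_customers (customer_id STRING, name STRING)
  ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
  WITH SERDEPROPERTIES (
    "column.xpath.customer_id" = "/customer/@id",
    "column.xpath.name" = "/customer/name/text()"
  )
  STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
  TBLPROPERTIES (
    "xmlinput.start" = "<customer",
    "xmlinput.end" = "</customer>"
  );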

On Mon, Jan 11, 2016 at 9:30 AM, nitinpathakala . <nitinpathak...@gmail.com>
wrote:

> Hello,
>
> Any ideas on this .
>
> Thanks,
> Nitin
>
> On Thu, Jan 7, 2016 at 6:06 PM, nitinpathakala . <nitinpathak...@gmail.com
> > wrote:
>
>> Hello,
>>
>> We have a requirement to load data from xml file to Hive tables.
>> The XML tags would be the columns and the values will be the data for
>> those columns.
>> Any pointers will be really helpful.
>>
>> Thanks,
>> Nitin
>>
>
>


-- 
Nitin Pawar


Re: Hive Query failing !!!

2015-09-22 Thread Nitin Pawar
OK, sorry, my bad.
I had overlooked that your query is doing the joins via the WHERE clause.
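On the why: hive.auto.convert.join=true lets Hive convert a shuffle join
into a map-side join when one side is small enough to broadcast, which
changes the generated plan and apparently sidesteps whatever the
NullPointerException was hitting here. A sketch of the relevant settings
(the size threshold below is illustrative, not from this thread):

  set hive.auto.convert.join=true;
  -- tables below this size (bytes) are loaded as in-memory hash tables
  set hive.auto.convert.join.noconditionaltask.size=10000000;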


On Tue, Sep 22, 2015 at 12:20 PM, @Sanjiv Singh <sanjiv.is...@gmail.com>
wrote:

> Nitin,
>
> Following setting already there at HIVE.
> set hive.exec.mode.local.auto=false;
>
> Surprisingly, when I applied the following setting, it started working:
> set hive.auto.convert.join=true;
>
> Can you please help me understand what happened?
>
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Tue, Sep 22, 2015 at 11:41 AM, Nitin Pawar <nitinpawar...@gmail.com>
> wrote:
>
>> Can you try setting these
>> set hive.exec.mode.local.auto=false;
>>
>>
>> On Tue, Sep 22, 2015 at 11:25 AM, @Sanjiv Singh <sanjiv.is...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> *Hi Folks,*
>>>
>>>
>>> *I am running the given Hive query. It gives an error while executing.
>>> Please help me get past it and understand the possible reason for the error.*
>>>
>>> *Hive Query :*
>>>
>>> SELECT *
>>> FROM  store_sales ,  date_dim ,  store ,
>>> household_demographics ,  customer_address
>>> WHERE store_sales.ss_sold_date_sk = date_dim.d_date_sk AND
>>> store_sales.ss_store_sk = store.s_store_sk
>>> AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk  AND
>>> store_sales.ss_addr_sk = customer_address.ca_address_sk
>>> AND  ( date_dim.d_dom BETWEEN 1 AND 2 )
>>> AND (household_demographics.hd_dep_count = 3 OR
>>> household_demographics.hd_vehicle_count = -1 )
>>> AND date_dim.d_year  IN (1998, 1998 + 1 , 1998 + 2 )  AND store.s_city
>>> IN ('Midway','Fairview')  ;
>>>
>>>
>>> *Note : *
>>> All tables [store_sales ,  date_dim ,  store ,
>>> household_demographics ,  customer_address] are in ORC format.
>>> hive version  : 1.0.0
>>>
>>>
>>> *Additional note :*
>>> I also checked hive EXPLAIN for the same query. It is failing at the last
>>> stage, where it joins the intermediate result to customer_address.
>>> I also checked for null values in store_sales.ss_addr_sk and
>>> customer_address.ca_address_sk, which is not the case.
>>> I also changed the Hive log level to DEBUG; there is nothing specific in
>>> the log file regarding the error.
>>>
>>> I really want to understand why the Hive query is failing, how it can be
>>> resolved, and where to look. Any help is highly appreciated.
>>>
>>>
>>> *At Hive console :*
>>>
>>> Launching Job 4 out of 4
>>> Number of reduce tasks not specified. Estimated from input data size: 1
>>> In order to change the average load for a reducer (in bytes):
>>>   set hive.exec.reducers.bytes.per.reducer=<number>
>>> In order to limit the maximum number of reducers:
>>>   set hive.exec.reducers.max=<number>
>>> In order to set a constant number of reducers:
>>>   set mapreduce.job.reduces=<number>
>>> java.lang.NullPointerException
>>> at
>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
>>> at
>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272)
>>> at
>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509)
>>> ...
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>> Job Submission failed with exception
>>> 'java.lang.NullPointerException(null)'
>>> FAILED: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>>> MapReduce Jobs Launched:
>>> Stage-Stage-5: Map: 2  Reduce: 1   Cumulative CPU: 4.08 sec   HDFS Read:
>>> 746 HDFS Write: 96 SUCCESS
>>> Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 3.32 sec   HDFS Read:
>>> 889 HDFS Write: 96 SUCCESS
>>> Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 3.21 sec   HDFS Read:
>>> 889 HDFS Write: 96 SUCCESS
>>>
>>>
>>>
>>>
>>> *Hive erro

Re: Hive Query failing !!!

2015-09-22 Thread Nitin Pawar
Client$1.run(JobClient.java:557)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> at
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
> at
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>



-- 
Nitin Pawar


Re: Tables missing on the file system

2015-09-15 Thread Nitin Pawar
Are you loading data into the partitioned test_table after creating the
table and before repairing it using MSCK?
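For reference, the flow MSCK expects (a sketch with placeholder column
names and paths):

  CREATE EXTERNAL TABLE test_table (id INT)
    PARTITIONED BY (ds STRING)
    LOCATION '/data/test_table';
  -- partition directories such as /data/test_table/ds=2015-09-15/
  -- must already exist on the filesystem before the repair
  MSCK REPAIR TABLE test_table;
  SHOW PARTITIONS test_table;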

On Tue, Sep 15, 2015 at 3:51 PM, ravi teja <raviort...@gmail.com> wrote:

> The location is present in the filesystem.
>
> Thanks,
> Ravi
>
> On Tue, Sep 15, 2015 at 12:16 PM, Chetna C <chetna@gmail.com> wrote:
>
>> Hi Ravi,
>> Please make sure the location mentioned while creating the table exists
>> at the time of *'MSCK REPAIR'*. This error occurs if the location does
>> not exist on the filesystem.
>>
>> Thanks,
>> Chetna Chaudhari
>>
>> On 15 September 2015 at 12:03, ravi teja <raviort...@gmail.com> wrote:
>>
>>> Hi,
>>> I am getting this exception when I repair a table.
>>> I am not sure what it means, and I didn't get any info while searching
>>> either.
>>>
>>> Can someone guide , what this means?
>>>
>>>
>>>  CREATE EXTERNAL TABLE IF NOT EXISTS  test_table 
>>>  OK
>>>  Time taken: 0.124 seconds
>>>
>>>  MSCK REPAIR TABLE test_table
>>>  OK
>>>  Tables missing on filesystem:  test_table
>>>
>>>  Time taken: 0.691 seconds, Fetched: 1 row(s)
>>>
>>>
>>> Thanks,
>>> Ravi
>>>
>>>
>>
>


-- 
Nitin Pawar


Re: Loading multiple file format in hive

2015-08-25 Thread Nitin Pawar
You are talking about a 15 minute delay for the conversion job, so you
have two options:
1) redesign your table so that you have two partitions with two file
formats; you load data from one into the other and then clear the staging
partition, so a query without a partition filter reads both file formats
and serves all the data (see the sketch below)
2) accept a 15 minute delay in reporting and show the data only from the
parquet partitions
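A sketch of option 1 (table and partition names are placeholders; ALTER
TABLE ... PARTITION ... SET FILEFORMAT is standard Hive DDL, though a raw
JSON partition would also need a suitable SerDe):

  CREATE TABLE events (payload STRING)
    PARTITIONED BY (stage STRING)
    STORED AS PARQUET;
  -- the staging partition keeps the raw text/JSON as it arrives
  ALTER TABLE events ADD PARTITION (stage='raw');
  ALTER TABLE events PARTITION (stage='raw') SET FILEFORMAT TEXTFILE;
  -- the 15-minute job rewrites stage='raw' into a parquet partition
  -- (e.g. stage='final') and then clears the staging partition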

On Tue, Aug 25, 2015 at 12:06 PM, Jeetendra G jeetendr...@housing.com
wrote:

 If I write to a staging area and then run a job to convert this data to
 parquet, won't there be a delay of that much time? I mean, this data won't
 be available to Hive until it is converted to parquet and written to the
 Hive location?




 On Tue, Aug 25, 2015 at 11:53 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Is it possible for you to write the data into a staging area, run a job
 on that, and then convert it into the parquet table?
 So you are looking at having two tables: one temp table holding data for
 up to 15 minutes, and then your job loads this temp data into your
 parquet backed table.

 Sorry for my misunderstanding; you can, though, set the file format at
 the partition level, but then you need to entirely redesign your table to
 have a staging partition and a real data partition.

 On Tue, Aug 25, 2015 at 11:46 AM, Jeetendra G jeetendr...@housing.com
 wrote:

 Thanks Nitin for the reply.

 I have data coming from RabbitMQ, and I have a Spark Streaming job which
 takes these events and dumps them into HDFS.
 I can't really convert the events to a format like parquet/orc at that
 point because I don't have the schema there.
 Once I dump to HDFS, I am writing one job which reads this data and
 converts it into Parquet.
 By that time I will have some raw events, right?




 On Tue, Aug 25, 2015 at 11:35 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 File format in Hive is a table-level property.
 I am not sure why you would load data at 15 minute intervals into your
 actual table instead of a staging table and do the conversion there, or
 produce the raw file in the format you want and load it directly into the
 table.

 On Tue, Aug 25, 2015 at 11:27 AM, Jeetendra G jeetendr...@housing.com
 wrote:

 I tried searching for how to set multiple formats with multiple
 partitions, but could not find much detail.
 Can you please share some good material on this if you have any?

 On Mon, Aug 24, 2015 at 10:49 PM, Daniel Haviv 
 daniel.ha...@veracity-group.com wrote:

 Hi,
 You can set a different file format per partition.
 You can't mix files in the same directory (You could theoretically
 write some kind of custom SerDe).

 Daniel.



 On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G jeetendr...@housing.com
  wrote:

 Can anyone shed some light on this please?

 On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G 
 jeetendr...@housing.com wrote:

 HI All,

 I have a directory where I have JSON formatted and parquet files in the
 same folder. Can Hive load these?

 I am getting JSON data and storing it in HDFS. Later I run a job to
 convert the JSON to Parquet (every 15 mins), so we will have up to 15
 minutes of JSON data.

 Can I provide multiple SerDes in Hive?

 regards
 Jeetendra







 --
 Nitin Pawar





 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Loading multiple file format in hive

2015-08-25 Thread Nitin Pawar
File format in Hive is a table-level property.
I am not sure why you would load data at 15 minute intervals into your
actual table instead of a staging table and do the conversion there, or
produce the raw file in the format you want and load it directly into the
table.

On Tue, Aug 25, 2015 at 11:27 AM, Jeetendra G jeetendr...@housing.com
wrote:

 I tried searching for how to set multiple formats with multiple
 partitions, but could not find much detail.
 Can you please share some good material on this if you have any?

 On Mon, Aug 24, 2015 at 10:49 PM, Daniel Haviv 
 daniel.ha...@veracity-group.com wrote:

 Hi,
 You can set a different file format per partition.
 You can't mix files in the same directory (You could theoretically write
 some kind of custom SerDe).

 Daniel.



 On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G jeetendr...@housing.com
 wrote:

 Can anyone shed some light on this please?

 On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G jeetendr...@housing.com
 wrote:

 HI All,

 I have a directory where I have JSON formatted and parquet files in the
 same folder. Can Hive load these?

 I am getting JSON data and storing it in HDFS. Later I run a job to
 convert the JSON to Parquet (every 15 mins), so we will have up to 15
 minutes of JSON data.

 Can I provide multiple SerDes in Hive?

 regards
 Jeetendra







-- 
Nitin Pawar


Re: Loading multiple file format in hive

2015-08-25 Thread Nitin Pawar
Is it possible for you to write the data into a staging area, run a job on
that, and then convert it into the parquet table?
So you are looking at having two tables: one temp table holding data for
up to 15 minutes, and then your job loads this temp data into your parquet
backed table.

Sorry for my misunderstanding; you can, though, set the file format at the
partition level, but then you need to entirely redesign your table to have
a staging partition and a real data partition.

On Tue, Aug 25, 2015 at 11:46 AM, Jeetendra G jeetendr...@housing.com
wrote:

 Thanks Nitin for the reply.

 I have data coming from RabbitMQ, and I have a Spark Streaming job which
 takes these events and dumps them into HDFS.
 I can't really convert the events to a format like parquet/orc at that
 point because I don't have the schema there.
 Once I dump to HDFS, I am writing one job which reads this data and
 converts it into Parquet.
 By that time I will have some raw events, right?




 On Tue, Aug 25, 2015 at 11:35 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 File format in Hive is a table-level property.
 I am not sure why you would load data at 15 minute intervals into your
 actual table instead of a staging table and do the conversion there, or
 produce the raw file in the format you want and load it directly into the
 table.

 On Tue, Aug 25, 2015 at 11:27 AM, Jeetendra G jeetendr...@housing.com
 wrote:

 I tried searching for how to set multiple formats with multiple
 partitions, but could not find much detail.
 Can you please share some good material on this if you have any?

 On Mon, Aug 24, 2015 at 10:49 PM, Daniel Haviv 
 daniel.ha...@veracity-group.com wrote:

 Hi,
 You can set a different file format per partition.
 You can't mix files in the same directory (You could theoretically
 write some kind of custom SerDe).

 Daniel.



 On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G jeetendr...@housing.com
 wrote:

 Can anyone shed some light on this please?

 On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G jeetendr...@housing.com
  wrote:

 HI All,

 I have a directory where I have JSON formatted and parquet files in the
 same folder. Can Hive load these?

 I am getting JSON data and storing it in HDFS. Later I run a job to
 convert the JSON to Parquet (every 15 mins), so we will have up to 15
 minutes of JSON data.

 Can I provide multiple SerDes in Hive?

 regards
 Jeetendra







 --
 Nitin Pawar





-- 
Nitin Pawar


Re: query behaviors with subquery in clause

2015-08-20 Thread Nitin Pawar
Any help, guys?

On Thu, Aug 13, 2015 at 2:52 PM, Nitin Pawar nitinpawar...@gmail.com
wrote:

 Hi,

 Right now Hive does not support the equality clause in sub-queries,
 for example: select * from A where date = (select max(date) from B)

 It does, though, support the IN clause:
 select * from A where date in (select max(date) from B)

 Table A is partitioned by the date column, so I was hoping that when I
 apply the IN clause it would look only at that partition, but it is
 reading the entire table:

 select * from A where date='2015-08-09' ... reads one partition
 select * from A where date in ('2015-08-09') ... reads one partitions
 select * from A where date in (select max(date) from B) ... reads all
 partitions from A

 Am I missing anything here, or am I doing something wrong?

 --
 Nitin Pawar




-- 
Nitin Pawar


Re: query behaviors with subquery in clause

2015-08-20 Thread Nitin Pawar
Thanks Noam.
As we are doing this via Oozie, it will be either an EL action or
something else.

I will just work around it with a temp table and join on the temp table's
date column.
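A sketch of that workaround (using the table names from the example quoted
below):

  -- materialize the max date once
  CREATE TABLE tmp_max_date AS SELECT max(date) AS date FROM B;
  -- join on it instead of using the IN subquery
  SELECT A.* FROM A JOIN tmp_max_date t ON (A.date = t.date);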

On Thu, Aug 20, 2015 at 5:27 PM, Noam Hasson noam.has...@kenshoo.com
wrote:

 I observed in another situation that whenever you run queries where you
 don't specify the partitions statically, Hive doesn't pre-compute which
 ones to read, so it will scan the whole table.

 I would suggest computing the max date in code, in a separate query.


 On Thu, Aug 20, 2015 at 12:16 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Any help, guys?

 On Thu, Aug 13, 2015 at 2:52 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Hi,

 Right now Hive does not support the equality clause in sub-queries,
 for example: select * from A where date = (select max(date) from B)

 It does, though, support the IN clause:
 select * from A where date in (select max(date) from B)

 Table A is partitioned by the date column, so I was hoping that when I
 apply the IN clause it would look only at that partition, but it is
 reading the entire table:

 select * from A where date='2015-08-09' ... reads one partition
 select * from A where date in ('2015-08-09') ... reads one partitions
 select * from A where date in (select max(date) from B) ... reads all
 partitions from A

 Am I missing anything here, or am I doing something wrong?

 --
 Nitin Pawar




 --
 Nitin Pawar







-- 
Nitin Pawar


Re: [blocker] ArrayIndexoutofbound in a hive query

2015-07-31 Thread Nitin Pawar
Sorry, but I could not find the following info:
1) Are you using Tez as the execution engine? If yes, make sure it is not
a snapshot version.
2) Are you using the ORC file format? If yes, set the flag to ignore
corrupt data.
3) Are there nulls in your join condition columns?
If possible, share the query and the underlying file formats with some
sample data.

On Fri, Jul 31, 2015 at 12:14 PM, ravi teja raviort...@gmail.com wrote:

 Hi,

 We are facing an issue with our Hive query: an ArrayIndexOutOfBounds
 exception.
 I have tried googling and I see many users facing the same error, but no
 solution yet. This is a blocker for our production and we really need
 help on this.

 We are using Hive version : 1.3.0.

 Our query is doing multiple joins (right and left).


 *Diagnostic Messages for this Task:*
 Error: java.lang.RuntimeException:
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
 processing row
 {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS
 Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
 Error while processing row
 {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS
 Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
 ... 8 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
 java.lang.ArrayIndexOutOfBoundsException
 at
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:403)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
 at
 org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
 at
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
 ... 9 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at org.apache.hadoop.io.Text.set(Text.java:225)
 at
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
 at
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:267)
 at
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:204)
 at
 org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
 at
 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
 at
 org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
 at
 org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
 at
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:558)
 at
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:383)
 ... 13 more


 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask



 Thanks,
 Ravi




-- 
Nitin Pawar


Re: [blocker] ArrayIndexoutofbound in a hive query

2015-07-31 Thread Nitin Pawar
Is there a different output format, or is the output table bucketed?
Can you try putting a NOT NULL condition on the join columns?
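Something along these lines (a sketch; the table and column names are
placeholders, not from this thread):

  INSERT OVERWRITE TABLE target
  SELECT a.*, b.v
  FROM a JOIN b ON (a.k = b.k)
  WHERE a.k IS NOT NULL AND b.k IS NOT NULL;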

On Fri, Jul 31, 2015 at 12:45 PM, ravi teja raviort...@gmail.com wrote:

 Hi Nitin,
 Thanks for replying.
 The SELECT query runs like a charm; the problem occurs only when
 inserting into a table.

 Please find the answers inline.


 Thanks,
 Ravi

 On Fri, Jul 31, 2015 at 12:34 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Sorry, but I could not find the following info:
 1) Are you using Tez as the execution engine? If yes, make sure it is not
 a snapshot version.  *NO*
 2) Are you using the ORC file format? If yes, set the flag to ignore
 corrupt data.  *NO, it's the text file format*
 3) Are there nulls in your join condition columns?  *Yes, there might be
 some*
 If possible, share the query and the underlying file formats with some
 sample data.  *I can't really share the query.*

 On Fri, Jul 31, 2015 at 12:14 PM, ravi teja raviort...@gmail.com wrote:

 Hi,

 We are facing an issue with our Hive query: an ArrayIndexOutOfBounds
 exception.
 I have tried googling and I see many users facing the same error, but
 no solution yet. This is a blocker for our production and we really
 need help on this.

 We are using Hive version : 1.3.0.

 Our query is doing multiple joins (right and left).


 *Diagnostic Messages for this Task:*
 Error: java.lang.RuntimeException:
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
 processing row
 {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS
 Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
 Runtime Error while processing row
 {_col0:48436215,_col1:87269315,_col2:\u,_col3:Customer,_col4:null,_col5:null,_col6:CSS
 Email,_col7:,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null}
 at
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
 ... 8 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
 java.lang.ArrayIndexOutOfBoundsException
 at
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:403)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
 at
 org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
 at
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
 at
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
 ... 9 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at org.apache.hadoop.io.Text.set(Text.java:225)
 at
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
 at
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:267)
 at
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:204)
 at
 org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
 at
 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
 at
 org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
 at
 org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
 at
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:558)
 at
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:383)
 ... 13 more


 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask



 Thanks,
 Ravi




 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Regarding query in HiveResultSet

2015-07-31 Thread Nitin Pawar
Then why not just use the max function?

select max(a) from (select sum(a) as a, b from t group by b) n

On Fri, Jul 31, 2015 at 12:48 PM, Renuka Be renunalin...@gmail.com wrote:

 Hi Nitin,

 I am using hive query.

 Regards,
 Renuka N.

 On Fri, Jul 31, 2015 at 2:42 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

Are you writing Java code against Hive, or are you writing a Hive query?

 On Fri, Jul 31, 2015 at 11:13 AM, Renuka Be renunalin...@gmail.com
 wrote:

 Hi Folks,

 I want to find the max value from the Hive result. There is an option
 listed in the HiveResultSet properties: HiveResultSet.Max(). When I use
 'HiveResultSet.Max()' it throws an exception:

 Error: At least one object must implement IComparable.

 Is there any way to find the min and max from the HiveResultSet?

 Thanks,
 Renuka N.




 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Regarding query in HiveResultSet

2015-07-31 Thread Nitin Pawar
Are you writing Java code against Hive, or are you writing a Hive query?

On Fri, Jul 31, 2015 at 11:13 AM, Renuka Be renunalin...@gmail.com wrote:

 Hi Folks,

 I want to find the max value from the Hive result. There is an option
 listed in the HiveResultSet properties: HiveResultSet.Max(). When I use
 'HiveResultSet.Max()' it throws an exception:

 Error: At least one object must implement IComparable.

 Is there any way to find the min and max from the HiveResultSet?

 Thanks,
 Renuka N.




-- 
Nitin Pawar


Re: Regarding query in HiveResultSet

2015-07-31 Thread Nitin Pawar
Why don't you get those as part of the query results instead of iterating
through everything on the C# side? Your query can directly provide the min
and max.

Is there something specific blocking you from getting them from Hive
rather than doing it on the application side?
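For example, following the aggregation from the earlier message (quoted
below):

  select min(a), max(a) from (select sum(a) as a, b from t group by b) n;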

On Fri, Jul 31, 2015 at 4:14 PM, Renuka Be renunalin...@gmail.com wrote:

 I have used a Hive query to get column values, which returns a
 HiveResultSet. I need to find the min and max values in the HiveResultSet
 at the code level. Is there any possibility? I am using C#.

 -Renuka N


 On Fri, Jul 31, 2015 at 3:29 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Then why not just use the max function?

 select max(a) from (select sum(a) as a, b from t group by b) n

 On Fri, Jul 31, 2015 at 12:48 PM, Renuka Be renunalin...@gmail.com
 wrote:

 Hi Nitin,

 I am using hive query.

 Regards,
 Renuka N.

 On Fri, Jul 31, 2015 at 2:42 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Are you writing Java code against Hive, or are you writing a Hive
 query?

 On Fri, Jul 31, 2015 at 11:13 AM, Renuka Be renunalin...@gmail.com
 wrote:

 Hi Folks,

 I want to find the max value from the Hive result. There is an option
 listed in the HiveResultSet properties: HiveResultSet.Max(). When I use
 'HiveResultSet.Max()' it throws an exception:

 Error: At least one object must implement IComparable.

 Is there any way to find the min and max from the HiveResultSet?

 Thanks,
 Renuka N.




 --
 Nitin Pawar





 --
 Nitin Pawar





-- 
Nitin Pawar


Re: character '' not supported here

2015-07-20 Thread Nitin Pawar
I could not solve the problem, so I had to recreate the table from another
temp table.

I think it is an issue with the ORC file format; maybe we can post to dev@
or wait for some dev to respond.
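The recreate step looked roughly like this (a sketch; t is a placeholder
table name, and this assumes the damaged table is still readable with a
plain select):

  -- copy whatever is still readable into a fresh ORC table
  CREATE TABLE t_fixed STORED AS ORC AS SELECT * FROM t;
  DROP TABLE t;
  ALTER TABLE t_fixed RENAME TO t;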

On Mon, Jul 20, 2015 at 1:51 PM, patcharee patcharee.thong...@uni.no
wrote:

  Hi,

 I created a hive table stored as orc file (partitioned and compressed by
 ZLIB) from Hive CLI, added data into this table by a Spark application.
  After adding, I was able to query the data and everything looked fine. Then
  I concatenated the table from the Hive CLI. After that I am no longer able
  to query data (e.g. select count(*) from the table); I just get the error
  line 1:1 character '' not supported here, no matter whether the engine is
  Tez or MR.

 How can you solve the problem in your case?

 BR,
 Patcharee



 On 18. juli 2015 21:26, Nitin Pawar wrote:

   Can you tell us exactly what steps you did?
   Also, did you try running the query with MR instead of Tez?
   I am not sure this is an issue with ORC file formats; I once faced issues
  on alter table for an ORC-backed table when adding a new column.

 On Sun, Jul 19, 2015 at 12:05 AM, pth001 patcharee.thong...@uni.no
 wrote:

  Hi,

 The query result

 11236119012.64043-5.9708868.5592070.0
 0.00.0-19.6869931308.804799848.00.0061966440.0
 0.0301.274750.382470460.0NULL1120081
 11236122012.513598-6.36717137.39279460.0
 0.00.0-22.3003921441.054799848.00.00508465060.0
 0.0112.207870.304595230.0NULL1120081
 5122503682415.1955.1722354.9027147
 -0.0244086120.023590.553-38.96928-1130.046974660.54
 2.5969802E-49.706164E-1123054.2680.00.241967370.0
 NULL1120081
 9121449412.25196412.081688-9.594620.0
 0.00.0-25.93576258.6562599848.00.00217082170.0
 0.01.29632131.15602660.0NULL1120081
 9121458412.3020987.752461-12.1834630.0
 0.00.0-24.983763351.195399848.00.00237235990.0
 0.01.41373750.992398860.0NULL1120081

  I stored the table in ORC format, partitioned and compressed with ZLIB.
  The problem happened just after I concatenated the table.

 BR,
 Patcharee

 On 18/07/15 12:46, Nitin Pawar wrote:

   select * without a where clause will work because it does not involve
  file processing.
   I suspect the problem is with the field delimiter, so I asked for records
  so that we can see what the data in each column looks like.

   Are you using a CSV file with columns delimited by some char, with
  numeric data in quotes?

 On Sat, Jul 18, 2015 at 3:58 PM, patcharee patcharee.thong...@uni.no
 wrote:

   This select * from table limit 5; works, but other queries do not. So?

 Patcharee


 On 18. juli 2015 12:08, Nitin Pawar wrote:

 can you do select * from table limit 5;

 On Sat, Jul 18, 2015 at 3:35 PM, patcharee patcharee.thong...@uni.no
 wrote:

 Hi,

 I am using hive 0.14 with Tez engine. Found a weird problem. Any
 suggestions?

 hive> select count(*) from 4D;
 line 1:1 character '' not supported here
 line 1:2 character '' not supported here
 line 1:3 character '' not supported here
 line 1:4 character '' not supported here
 line 1:5 character '' not supported here
 line 1:6 character '' not supported here
 line 1:7 character '' not supported here
 line 1:8 character '' not supported here
 line 1:9 character '' not supported here
 ...
 ...
 line 1:131 character '' not supported here
 line 1:132 character '' not supported here
 line 1:133 character '' not supported here
 line 1:134 character '' not supported here
 line 1:135 character '' not supported here
 line 1:136 character '' not supported here
 line 1:137 character '' not supported here
 line 1:138 character '' not supported here
 line 1:139 character '' not supported here
 line 1:140 character '' not supported here
 line 1:141 character '' not supported here
 line 1:142 character '' not supported here
 line 1:143 character '' not supported here
 line 1:144 character '' not supported here
 line 1:145 character '' not supported here
 line 1:146 character '' not supported here

 BR,
 Patcharee





 --
 Nitin Pawar





 --
 Nitin Pawar





 --
 Nitin Pawar





-- 
Nitin Pawar


Re: character '' not supported here

2015-07-18 Thread Nitin Pawar
can you do select * from table limit 5;

On Sat, Jul 18, 2015 at 3:35 PM, patcharee patcharee.thong...@uni.no
wrote:

 Hi,

 I am using hive 0.14 with Tez engine. Found a weird problem. Any
 suggestions?

 hive> select count(*) from 4D;
 line 1:1 character '' not supported here
 line 1:2 character '' not supported here
 line 1:3 character '' not supported here
 line 1:4 character '' not supported here
 line 1:5 character '' not supported here
 line 1:6 character '' not supported here
 line 1:7 character '' not supported here
 line 1:8 character '' not supported here
 line 1:9 character '' not supported here
 ...
 ...
 line 1:131 character '' not supported here
 line 1:132 character '' not supported here
 line 1:133 character '' not supported here
 line 1:134 character '' not supported here
 line 1:135 character '' not supported here
 line 1:136 character '' not supported here
 line 1:137 character '' not supported here
 line 1:138 character '' not supported here
 line 1:139 character '' not supported here
 line 1:140 character '' not supported here
 line 1:141 character '' not supported here
 line 1:142 character '' not supported here
 line 1:143 character '' not supported here
 line 1:144 character '' not supported here
 line 1:145 character '' not supported here
 line 1:146 character '' not supported here

 BR,
 Patcharee





-- 
Nitin Pawar


Re: character '' not supported here

2015-07-18 Thread Nitin Pawar
select * without a where clause will work because it does not involve file
processing.
I suspect the problem is with the field delimiter, so I asked for records
so that we can see what the data in each column looks like.

Are you using a CSV file with columns delimited by some char, with numeric
data in quotes?

On Sat, Jul 18, 2015 at 3:58 PM, patcharee patcharee.thong...@uni.no
wrote:

  This select * from table limit 5; works, but other queries do not. So?

 Patcharee


 On 18. juli 2015 12:08, Nitin Pawar wrote:

 can you do select * from table limit 5;

 On Sat, Jul 18, 2015 at 3:35 PM, patcharee patcharee.thong...@uni.no
 wrote:

 Hi,

 I am using hive 0.14 with Tez engine. Found a weird problem. Any
 suggestions?

 hive> select count(*) from 4D;
 line 1:1 character '' not supported here
 line 1:2 character '' not supported here
 line 1:3 character '' not supported here
 line 1:4 character '' not supported here
 line 1:5 character '' not supported here
 line 1:6 character '' not supported here
 line 1:7 character '' not supported here
 line 1:8 character '' not supported here
 line 1:9 character '' not supported here
 ...
 ...
 line 1:131 character '' not supported here
 line 1:132 character '' not supported here
 line 1:133 character '' not supported here
 line 1:134 character '' not supported here
 line 1:135 character '' not supported here
 line 1:136 character '' not supported here
 line 1:137 character '' not supported here
 line 1:138 character '' not supported here
 line 1:139 character '' not supported here
 line 1:140 character '' not supported here
 line 1:141 character '' not supported here
 line 1:142 character '' not supported here
 line 1:143 character '' not supported here
 line 1:144 character '' not supported here
 line 1:145 character '' not supported here
 line 1:146 character '' not supported here

 BR,
 Patcharee





 --
 Nitin Pawar





-- 
Nitin Pawar


Re: character '' not supported here

2015-07-18 Thread Nitin Pawar
Can you tell us exactly what steps you did?
Also, did you try running the query with MR instead of Tez?
I am not sure this is an issue with ORC file formats; I once faced issues
on alter table for an ORC-backed table when adding a new column.

On Sun, Jul 19, 2015 at 12:05 AM, pth001 patcharee.thong...@uni.no wrote:

  Hi,

 The query result

 11236119012.64043-5.9708868.5592070.0
 0.00.0-19.6869931308.804799848.00.0061966440.0
 0.0301.274750.382470460.0NULL1120081
 11236122012.513598-6.36717137.39279460.0
 0.00.0-22.3003921441.054799848.00.00508465060.0
 0.0112.207870.304595230.0NULL1120081
 5122503682415.1955.1722354.9027147
 -0.0244086120.023590.553-38.96928-1130.046974660.54
 2.5969802E-49.706164E-1123054.2680.00.241967370.0
 NULL1120081
 9121449412.25196412.081688-9.594620.0
 0.00.0-25.93576258.6562599848.00.00217082170.0
 0.01.29632131.15602660.0NULL1120081
 9121458412.3020987.752461-12.1834630.0
 0.00.0-24.983763351.195399848.00.00237235990.0
 0.01.41373750.992398860.0NULL1120081

  I stored the table in ORC format, partitioned and compressed with ZLIB.
  The problem happened just after I concatenated the table.

 BR,
 Patcharee

 On 18/07/15 12:46, Nitin Pawar wrote:

   select * without a where clause will work because it does not involve
  file processing.
   I suspect the problem is with the field delimiter, so I asked for records
  so that we can see what the data in each column looks like.

   Are you using a CSV file with columns delimited by some char, with
  numeric data in quotes?

 On Sat, Jul 18, 2015 at 3:58 PM, patcharee patcharee.thong...@uni.no
 wrote:

   This select * from table limit 5; works, but other queries do not. So?

 Patcharee


 On 18. juli 2015 12:08, Nitin Pawar wrote:

 can you do select * from table limit 5;

 On Sat, Jul 18, 2015 at 3:35 PM, patcharee patcharee.thong...@uni.no
 wrote:

 Hi,

 I am using hive 0.14 with Tez engine. Found a weird problem. Any
 suggestions?

  hive> select count(*) from 4D;
 line 1:1 character '' not supported here
 line 1:2 character '' not supported here
 line 1:3 character '' not supported here
 line 1:4 character '' not supported here
 line 1:5 character '' not supported here
 line 1:6 character '' not supported here
 line 1:7 character '' not supported here
 line 1:8 character '' not supported here
 line 1:9 character '' not supported here
 ...
 ...
 line 1:131 character '' not supported here
 line 1:132 character '' not supported here
 line 1:133 character '' not supported here
 line 1:134 character '' not supported here
 line 1:135 character '' not supported here
 line 1:136 character '' not supported here
 line 1:137 character '' not supported here
 line 1:138 character '' not supported here
 line 1:139 character '' not supported here
 line 1:140 character '' not supported here
 line 1:141 character '' not supported here
 line 1:142 character '' not supported here
 line 1:143 character '' not supported here
 line 1:144 character '' not supported here
 line 1:145 character '' not supported here
 line 1:146 character '' not supported here

 BR,
 Patcharee





 --
 Nitin Pawar





 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Hive Query Error

2015-07-09 Thread Nitin Pawar
Can you check your config? The host appears twice:
01hw357381.tcsgegdc.com: 01hw357381.tcsgegdc.com
It should be hostname:port.

Also, once you correct this, do an nslookup on the host to make sure it
resolves from the Hive client.

On Thu, Jul 9, 2015 at 7:19 PM, Ajeet O ajee...@tcs.com wrote:

 Hi all, I have installed Hadoop 2.0 and Hive 0.12 on CentOS 7.

 When I run the query select count(*) from u_data; in Hive, it gives the
 following errors. However, I can run select * from u_data; without any
 problem. Please help.

 hive> select count(*) from u_data;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 java.net.UnknownHostException: 01hw357381.tcsgegdc.com:
 01hw357381.tcsgegdc.com: unknown error
 at java.net.InetAddress.getLocalHost(InetAddress.java:1484)
 at
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:439)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
 at
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at
 org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: java.net.UnknownHostException: 01hw357381.tcsgegdc.com:
 unknown error
 at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
 at
 java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
 at java.net.InetAddress.getLocalHost(InetAddress.java:1479)
 ... 34 more
 Job Submission failed with exception 'java.net.UnknownHostException(
 01hw357381.tcsgegdc.com: 01hw357381.tcsgegdc.com: unknown error)'
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask

 Thanks
 Ajeet





-- 
Nitin Pawar


Re: fails to alter table concatenate

2015-06-30 Thread Nitin Pawar
Can you try doing the same after changing the query engine from Tez to MR?
I am not sure if it is a Hive bug or a Tez bug.
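For example (hive.execution.engine is a standard setting; the partition
spec is taken from the message quoted below):

  set hive.execution.engine=mr;
  alter table 4dim partition(zone=2,z=15,year=2005,month=4) CONCATENATE;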

On Tue, Jun 30, 2015 at 1:46 PM, patcharee patcharee.thong...@uni.no
wrote:

 Hi,

 I am using hive 0.14. It fails to alter table concatenate occasionally
 (see the exception below). It is strange that it fails from time to time,
 unpredictably. Is there any suggestion/clue?

 hive> alter table 4dim partition(zone=2,z=15,year=2005,month=4)
 CONCATENATE;

 
 VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING FAILED
 KILLED

 
 File MergeFAILED -1  00 -1   0   0

 
 VERTICES: 00/01  [--] 0%ELAPSED TIME:
 1435651968.00 s

 
 Status: Failed
 Vertex failed, vertexName=File Merge,
 vertexId=vertex_1435307579867_0041_1_00, diagnostics=[Vertex
 vertex_1435307579867_0041_1_00 [File Merge] killed/failed due
 to:ROOT_INPUT_INIT_FAILURE, Vertex Input:
 [hdfs://service-10-0.local:8020/apps/hive/warehouse/wrf_tables/4dim/zone=2/z=15/year=2005/month=4]
 initializer failed, vertex=vertex_1435307579867_0041_1_00 [File Merge],
 java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
  at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:452)
 at
 org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:441)
 at
 org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295)
 at
 org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:124)
 at
 org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
 at
 org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at
 org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
 at
 org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 ]
 DAG failed due to vertex failure. failedVertices:1 killedVertices:0
 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.DDLTask

 BR,
 Patcharee




-- 
Nitin Pawar


Re: Show table in Spark

2015-06-30 Thread Nitin Pawar
Please check on the Spark user list; I don't think this is related to
Hive.

On Tue, Jun 30, 2015 at 4:42 PM, Vinod Kuamr vinod.rajan1...@yahoo.com
wrote:

 Hi Folks,

 Can anyone please let me know how to show the content of a dataframe in
 Spark?

 When I use *df.show()* (here df is a dataframe) I get the following
 result:

 [image: Inline image]


 I am using Spark 1.3.1 on Windows 8.

 Thanks in advance,
 Vinod




-- 
Nitin Pawar


Re: Hive and elasticsearch-hadoop-2.1.0 : NoClassDefFoundError

2015-06-29 Thread Nitin Pawar
By any chance, did you build Hive yourself?

On Mon, Jun 29, 2015 at 7:11 PM, Erwan Queffélec erwan.queffe...@gmail.com
wrote:

 Additional info: it works when I manually add the jar with ADD JAR <file>;

 hive> ADD JAR
 '/usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar'

 I'm quite new to Hive and Hadoop in general. This is my first post on this
 mailing list, so please excuse me if the following question has been asked
 and answered over and over again:

 Perhaps I'm a bit naïve, but I thought that Hive custom SerDes/UD*Fs were
 able to access everything already on the Hive classpath. Was it just a
 dream?

 I would greatly appreciate some pointers, thanks to anyone who might be
 able to help !

 Best regards,

 Erwan



 On Mon, Jun 29, 2015 at 2:30 PM, Erwan Queffélec 
 erwan.queffe...@gmail.com wrote:

 Hello,

 I'm running HortonWorks HDP 2.2.6, hive 0.14 alongside an ElasticSearch
 cluster

 For some reason Hive can't seem to connect to my ES cluster using the ES
 SerDe (I'm using elasticsearch-hadoop-2.1.0.jar). When the time comes for
 my job to insert my query result into an ES resource, I get this
 stacktrace:

 NoClassDefFoundError: org/apache/commons/httpclient/URIException
  at
 org.elasticsearch.hadoop.hive.HiveUtils.structObjectInspector(HiveUtils.java:57)
  at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:82)
  at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:97)

 I'm quite puzzled because commons-httpclient is supposed to be on the
 hive-client classpath :

 # ls -l /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
 -rw-r--r-- 1 root root 279781 Mar 31 20:26
 /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
 # ls -l /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar
 -rw-r--r-- 1 root root 279781 Mar 31 20:26
 /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar

 What am I missing ?

 Thanks a lot for your help,

 Kind regards,

 Erwan





-- 
Nitin Pawar


Re: Hive and elasticsearch-hadoop-2.1.0 : NoClassDefFoundError

2015-06-29 Thread Nitin Pawar
I am using 2.2.4-2.2 but did not get any error.

Can you check which services are installed on the node where the hive
client is running?

On Mon, Jun 29, 2015 at 7:18 PM, Erwan Queffélec erwan.queffe...@gmail.com
wrote:

 Hi Nitin,

 No, I didn't do such a thing. I'm using the stock 0.14 version from HDP
 2.2.4 (said 2.2.6 earlier but that was wrong)

 # hive --version
 Hive 0.14.0.2.2.4.2-2
 Subversion
 git://ip-10-0-0-5.ec2.internal/grid/0/jenkins/workspace/HDP-2.2.4.1-centos6/bigtop/build/hive/rpm/BUILD/hive-0.14.0.2.2.4.2
 -r 115d99896f5a4a81e7d91e052e8d38d7436b78d4
 Compiled by jenkins on Tue Mar 31 16:26:33 EDT 2015
 From source with checksum 1f34a1d4e566c3e801582862ed85ee93

 Thanks for taking the time.

 Kind regards,

 Erwan

 On Mon, Jun 29, 2015 at 3:44 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 by any chance you built hive yourself  ?

 On Mon, Jun 29, 2015 at 7:11 PM, Erwan Queffélec 
 erwan.queffe...@gmail.com wrote:

 Additional info: it works when I manually add the jar with ADD JAR
 <file>;

 hive> ADD JAR
 '/usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar'

 I'm quite new to Hive and Hadoop in general. This is my first post on
 this mailing list, so please excuse me if the following question has been
 asked and answered over and over again:

 Perhaps I'm a bit naïve, but I thought that Hive custom SerDes/UD*Fs were
 able to access everything already on the Hive classpath. Was it just a
 dream?

 I would greatly appreciate some pointers, thanks to anyone who might be
 able to help !

 Best regards,

 Erwan



 On Mon, Jun 29, 2015 at 2:30 PM, Erwan Queffélec 
 erwan.queffe...@gmail.com wrote:

 Hello,

 I'm running HortonWorks HDP 2.2.6, hive 0.14 alongside an ElasticSearch
 cluster

 For some reason Hive can't seem to connect to my ES cluster using the
 ES SerDe (I'm using elasticsearch-hadoop-2.1.0.jar). When the time comes
 for my job to insert my query result into an ES resource, I get this
 stacktrace:

 NoClassDefFoundError: org/apache/commons/httpclient/URIException
  at
 org.elasticsearch.hadoop.hive.HiveUtils.structObjectInspector(HiveUtils.java:57)
  at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:82)
  at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:97)

 I'm quite puzzled because commons-httpclient is supposed to be on the
 hive-client classpath :

 # ls -l /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
 -rw-r--r-- 1 root root 279781 Mar 31 20:26
 /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
 # ls -l /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar
 -rw-r--r-- 1 root root 279781 Mar 31 20:26
 /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar

 What am I missing ?

 Thanks a lot for your help,

 Kind regards,

 Erwan





 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Hive and elasticsearch-hadoop-2.1.0 : NoClassDefFoundError

2015-06-29 Thread Nitin Pawar
Great that it helped.

On Mon, Jun 29, 2015 at 7:29 PM, Erwan Queffélec erwan.queffe...@gmail.com
wrote:

 [continued]
 A dependency of a custom UDF does not seem to be properly shaded, as I
 could see in an excerpt of the Maven build output:
 [INFO] Including org.apache.httpcomponents:httpclient:jar:4.1.2 in the
 shaded jar.
 [INFO] Including org.apache.httpcomponents:httpcore:jar:4.1.2 in the
 shaded jar.

 I'm going to look into this. Thanks a lot for confirming things worked as
 I expected on your end!

 Regards,

 Erwan

 On Mon, Jun 29, 2015 at 3:55 PM, Erwan Queffélec 
 erwan.queffe...@gmail.com wrote:

 Hi Nitin,

 Digging in a bit, I discovered that the error is probably on our end:



 On Mon, Jun 29, 2015 at 3:54 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 I am using 2.2.4-2.2 but did not get any error.

 can you check what all services are installed on the node where hive
 client is running

 On Mon, Jun 29, 2015 at 7:18 PM, Erwan Queffélec 
 erwan.queffe...@gmail.com wrote:

 Hi Nitin,

 No, I didn't do such a thing. I'm using the stock 0.14 version from HDP
 2.2.4 (said 2.2.6 earlier but that was wrong)

 # hive --version
 Hive 0.14.0.2.2.4.2-2
 Subversion
 git://ip-10-0-0-5.ec2.internal/grid/0/jenkins/workspace/HDP-2.2.4.1-centos6/bigtop/build/hive/rpm/BUILD/hive-0.14.0.2.2.4.2
 -r 115d99896f5a4a81e7d91e052e8d38d7436b78d4
 Compiled by jenkins on Tue Mar 31 16:26:33 EDT 2015
 From source with checksum 1f34a1d4e566c3e801582862ed85ee93

 Thanks for taking the time.

 Kind regards,

 Erwan

 On Mon, Jun 29, 2015 at 3:44 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 by any chance you built hive yourself  ?

 On Mon, Jun 29, 2015 at 7:11 PM, Erwan Queffélec 
 erwan.queffe...@gmail.com wrote:

 Additional info: it works when I manually add the jar with ADD JAR
 <file>;

 hive> ADD JAR
 '/usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar'

 I'm quite new to Hive and Hadoop in general. This is my first post on
 this mailing list, so please excuse me if the following question has been
 asked and answered over and over again:

 Perhaps I'm a bit naïve, but I thought that Hive custom SerDes/UD*Fs
 were able to access everything already on the Hive classpath. Was it
 just a dream?

 I would greatly appreciate some pointers, thanks to anyone who might
 be able to help !

 Best regards,

 Erwan



 On Mon, Jun 29, 2015 at 2:30 PM, Erwan Queffélec 
 erwan.queffe...@gmail.com wrote:

 Hello,

 I'm running HortonWorks HDP 2.2.6, hive 0.14 alongside an
 ElasticSearch cluster

 For some reason Hive can't seem to connect to my ES cluster using
 the ES SerDe (I'm using elasticsearch-hadoop-2.1.0.jar). When the time comes
 for my job to insert my query result into an ES resource, I get this
 stacktrace:

 NoClassDefFoundError: org/apache/commons/httpclient/URIException
  at
 org.elasticsearch.hadoop.hive.HiveUtils.structObjectInspector(HiveUtils.java:57)
  at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:82)
  at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:97)

 I'm quite puzzled because commons-httpclient is supposed to be on
 the hive-client classpath :

 # ls -l
 /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
 -rw-r--r-- 1 root root 279781 Mar 31 20:26
 /usr/hdp/current/hive-server2/lib/commons-httpclient-3.0.1.jar
 # ls -l /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar
 -rw-r--r-- 1 root root 279781 Mar 31 20:26
 /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar

 What am I missing ?

 Thanks a lot for your help,

 Kind regards,

 Erwan





 --
 Nitin Pawar





 --
 Nitin Pawar






-- 
Nitin Pawar


Re: Left function

2015-06-16 Thread Nitin Pawar
try using substr function
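
A minimal sketch of the LEFT(string, length) equivalent (a FROM-less SELECT
needs Hive 0.13+):

SELECT substr('Ravisankar', 1, 4);   -- 'Ravi', same as LEFT('Ravisankar', 4)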

On Tue, Jun 16, 2015 at 3:03 PM, Ravisankar Mani rrav...@gmail.com wrote:

 Hi every one,


 how to get the leftmost N characters from a string in hive?

 In Mysql or sql there is a specific function:

 LEFT(string, length)

 Could you please suggest any other way to achieve this scenario?


 Regards
 Ravisankar




-- 
Nitin Pawar


Re: difference between add file from a local disk and hdfs file

2015-05-16 Thread Nitin Pawar
Answering my own question:

either way the file was available via the distributed cache.
it was a spelling mistake in my code; correcting it solved the
problem

On Sun, May 17, 2015 at 2:46 AM, Nitin Pawar nitinpawar...@gmail.com
wrote:

 Hi,

 I am trying to access a lookup file from a udf.
 There are two ways I can add a lookup file to the distributed cache

 option1: loading file from local disk to distributed cache this is for
 hive cli
 add file tmp.txt;


 option2: add a file from hdfs to distributed cache so that oozie can do it
 too
 add file hdfs:///user/admin/tmp.txt;


 i want to use a file from hdfs into distributed cache so that I can use it
 a hive udf.

 Problem is
 when I load a file using option 1, it is available to the udf (works fine)
 hive> add file format.txt;
 Added resources: [format.txt]
 hive> list files;
 format.txt


 But when I load the file from hdfs, it moves into a tmp folder and I am not
 sure if the path remains the same all the time
 hive> add file hdfs:///user/admin/tmp.txt;
 converting to local hdfs:///user/admin/tmp.txt
 Added resources: [hdfs:tmp.txt]
 hive> list files;

 /tmp/006ab981-ddac-4bcb-bee1-7d8ed9a271a0_resources/tmp.txt

 Question: how do I get the file at the same location (like option 1, at all
 times)? Because from option 2 I keep getting the error "tmp.txt does not
 exist" when I initialize the udf

 thanks
 --
 Nitin Pawar




-- 
Nitin Pawar


difference between add file from a local disk and hdfs file

2015-05-16 Thread Nitin Pawar
Hi,

I am trying to access a lookup file from a udf.
There are two ways I can add a lookup file to the distributed cache

option1: loading file from local disk to distributed cache this is for hive
cli
add file tmp.txt;


option2: add a file from hdfs to distributed cache so that oozie can do it
too
add file hdfs:///user/admin/tmp.txt;


i want to use a file from hdfs into distributed cache so that I can use it
a hive udf.

Problem is
when I load a file using option 1, it is available to the udf (works fine)
hive> add file format.txt;
Added resources: [format.txt]
hive> list files;
format.txt


But when I load the file from hdfs, it moves into a tmp folder and I am not
sure if the path remains the same all the time
hive> add file hdfs:///user/admin/tmp.txt;
converting to local hdfs:///user/admin/tmp.txt
Added resources: [hdfs:tmp.txt]
hive> list files;

/tmp/006ab981-ddac-4bcb-bee1-7d8ed9a271a0_resources/tmp.txt

Question: how do I get the file at the same location (like option 1, at all
times)? Because from option 2 I keep getting the error "tmp.txt does not
exist" when I initialize the udf

thanks
-- 
Nitin Pawar


Re: user matching query does not exist

2015-05-15 Thread Nitin Pawar
this is related to django
see this on how to clear sessions from django
http://www.opencsw.org/community/questions/289/how-to-clear-the-django-session-cache
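
Per that link, assuming Hue keeps its sessions in the backend database
(Django's default), clearing them boils down to deleting the django_session
rows. Run this against Hue's database, not Hive; it ends every active session:

-- against Hue's backend DB (NOT Hive); this logs out ALL users
DELETE FROM django_session;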

On Fri, May 15, 2015 at 12:24 PM, amit kumar ak3...@gmail.com wrote:

 Yes it is happening for hue only. Can you please suggest how I can clean up
 hue sessions from the server?

 The query is succeed in hive command line.

 On Fri, May 15, 2015 at 11:52 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Is this happening for Hue?

 If yes, maybe you can try cleaning up hue sessions from the server. (this
 may clean all users' active sessions from hue so be careful while doing it)



 On Fri, May 15, 2015 at 11:31 AM, amit kumar ak3...@gmail.com wrote:

 i am using CDH 5.2.1,

 Any pointers will be of immense help.



 Thanks



 On Fri, May 15, 2015 at 9:43 AM, amit kumar ak3...@gmail.com wrote:

 Hi,

 After re-creating my account in Hue, I receive “User matching query does
 not exist” when attempting to perform a hive query.

 The query is succeed in hive command line.

 Please suggest on this,

 Thank you
 Amit





 --
 Nitin Pawar





-- 
Nitin Pawar


Re: user matching query does not exist

2015-05-15 Thread Nitin Pawar
Is this happening for Hue?

If yes, maybe you can try cleaning up hue sessions from the server. (this may
clean all users' active sessions from hue so be careful while doing it)



On Fri, May 15, 2015 at 11:31 AM, amit kumar ak3...@gmail.com wrote:

 i am using CDH 5.2.1,

 Any pointers will be of immense help.



 Thanks



 On Fri, May 15, 2015 at 9:43 AM, amit kumar ak3...@gmail.com wrote:

 Hi,

 After re-creating my account in Hue, I receive “User matching query does
 not exist” when attempting to perform a hive query.

 The query is succeed in hive command line.

 Please suggest on this,

 Thank you
 Amit





-- 
Nitin Pawar


Re: Stopping HiveServer2

2015-04-29 Thread Nitin Pawar
how did you start it ?


On Wed, Apr 29, 2015 at 4:26 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Hello,



 How can I stop hiveserver2? I am not able to find the command.



 Thanks

 ***

 This e-mail contains information for the intended recipient only. It may
 contain proprietary material or confidential information. If you are not
 the intended recipient you are not authorised to distribute, copy or use
 this e-mail or any attachment to it. Murex cannot guarantee that it is
 virus free and accepts no responsibility for any loss or damage arising
 from its use. If you have received this e-mail in error please notify
 immediately the sender and delete the original email received, any
 attachments and all copies from your system.




-- 
Nitin Pawar


Re: Clear up Hive scratch directory

2015-04-24 Thread Nitin Pawar
Thanks Martin

Can you also mention the steps you took to reclaim the hdfs space from the
temporary data?

On Fri, Apr 24, 2015 at 12:21 PM, Martin Benson martin.ben...@jaywing.com
wrote:

  Hi All,

 I just wanted to feedback that it does appear to be safe - I emptied the
 directory manually, without adverse consequences.

 Thanks,

 Martin.
  --
 From: Martin Benson martin.ben...@jaywing.com
 Sent: ‎20/‎04/‎2015 18:06
 To: user@hive.apache.org
 Subject: Clear up Hive scratch directory

   Hi,

 One of my users tried to run an HUGE join, which failed due to a lack of
 space in HDFS. This has resulted in a large amount of data remaining in the
 Hive scratch directory which I need to clear down. I've tried setting
 hive.start.cleanup.scratchdir to true and restarting Hive, but it didn't
 tidy it up. So, I'm wondering if it is safe to just delete the content of
 the directory in HDFS (while Hive is stopped). Could anyone advise please?

 Many thanks,

 Martin.


  Registered in England and Wales at Players House, 300 Attercliffe
 Common, Sheffield, S9 2AG. Company number 05935923.

 This email and its attachments are confidential and are intended solely
 for the use of the addressed recipient.
 Any views or opinions expressed are those of the author and do not
 necessarily represent Jaywing. If you are not
 the intended recipient, you must not forward or show this to anyone or
 take any action based upon it.
 Please contact the sender if you received this in error.






-- 
Nitin Pawar


Re: Discrepancy in String matching between Teradata and HIVE

2015-03-27 Thread Nitin Pawar
Hive does not manipulate data on its own; if your processing logic needs
trailing spaces trimmed, you can provide that in the query.



On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh sanjiv.is...@gmail.com
wrote:


   Hi All,

 I am getting into Hive and learning hive. I have customer table in
 teradata , used sqoop to extract complete table in hive which worked fine.

 See below customer table both in Teradata and HIVE.

 *In Teradata :*

 select TOP 4 id,name,''||status||'' from customer;

 3172460 Customer#003172460  BUILDING  
 3017726 Customer#003017726  BUILDING  
 2817987 Customer#002817987  COMPLETE  
 2817984 Customer#002817984  BUILDING  

 *In HIVE :*

 select id,name,CONCAT ('' , status , '') from customer LIMIT 4;

 3172460 Customer#003172460  BUILDING  
 3017726 Customer#003017726  BUILDING  
 2817987 Customer#002817987  COMPLETE  
 2817984 Customer#002817984  BUILDING  

 When I tried to fetch records from table customer with column matching
 which is of String type. I am getting different result for same query in
 different environment.

 See below query results..

 *In Teradata :*

 select TOP 2 id,name,''||status||'' from customer WHERE status = 
 'BUILDING';

 3172460 Customer#003172460  BUILDING  
 3017726 Customer#003017726  BUILDING  

 *In HIVE :*

 select id,name,CONCAT ('' , status , '') from customer WHERE status = 
 'BUILDING' LIMIT 2;

 ***No Result***

 It seems that teradata is doing trimming short of thing before actually
 comparing stating values. But Hive is matching strings as it is.

 Not sure, It is expected behaviour or bug or can be raised as enhancement.

 I see below possible solution:

- Convert into like operator expression with wildcard character before
and after

 Looking forward for your response on this. How can it be handled/achieved
 in hive.

 Regards
 Sanjiv Singh
 Mob :  +091 9990-447-339




-- 
Nitin Pawar


Re: Discrepancy in String matching between Teradata and HIVE

2015-03-27 Thread Nitin Pawar
Hive is only partially SQL-standard compliant.

In hive, string comparisons work just like they would in java, so:

'BUILDING' = 'BUILDING'
'BUILDING ' != 'BUILDING'  (extra space added)
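
A quick way to see this (assuming Hive 0.13+, where a FROM-less SELECT is
allowed):

SELECT 'BUILDING' = 'BUILDING ';          -- false: the trailing space matters
SELECT 'BUILDING' = rtrim('BUILDING ');   -- true once the padding is trimmed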

On Fri, Mar 27, 2015 at 2:11 PM, @Sanjiv Singh sanjiv.is...@gmail.com
wrote:

 Hi,

 I can use rtrim function, i.e:

 select id,name,CONCAT ('' , status , '') from customer WHERE rtrim(status) 
 = 'BUILDING' LIMIT 2;

 But the question remains: what standard does Hive use for string comparison?
 According to ANSI/ISO SQL-92, 'BUILDING' == 'BUILDING '. Here is a link
 http://support.microsoft.com/en-us/kb/316626 for an article about it.

 Regards
 Sanjiv Singh
 Mob :  +091 9990-447-339

 On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Hive does not manipulate data on its own; if your processing logic needs
 trailing spaces trimmed, you can provide that in the query.



 On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh sanjiv.is...@gmail.com
 wrote:


   Hi All,

 I am getting into Hive and learning hive. I have customer table in
 teradata , used sqoop to extract complete table in hive which worked fine.

 See below customer table both in Teradata and HIVE.

 *In Teradata :*

 select TOP 4 id,name,''||status||'' from customer;

 3172460 Customer#003172460  BUILDING  
 3017726 Customer#003017726  BUILDING  
 2817987 Customer#002817987  COMPLETE  
 2817984 Customer#002817984  BUILDING  

 *In HIVE :*

 select id,name,CONCAT ('' , status , '') from customer LIMIT 4;

 3172460 Customer#003172460  BUILDING  
 3017726 Customer#003017726  BUILDING  
 2817987 Customer#002817987  COMPLETE  
 2817984 Customer#002817984  BUILDING  

 When I tried to fetch records from table customer with column matching
 which is of String type. I am getting different result for same query in
 different environment.

 See below query results..

 *In Teradata :*

 select TOP 2 id,name,''||status||'' from customer WHERE status = 
 'BUILDING';

 3172460 Customer#003172460  BUILDING  
 3017726 Customer#003017726  BUILDING  

 *In HIVE :*

 select id,name,CONCAT ('' , status , '') from customer WHERE status = 
 'BUILDING' LIMIT 2;

 ***No Result***

 It seems that teradata is doing trimming short of thing before actually
 comparing stating values. But Hive is matching strings as it is.

 Not sure, It is expected behaviour or bug or can be raised as
 enhancement.

 I see below possible solution:

- Convert into like operator expression with wildcard character
before and after

 Looking forward for your response on this. How can it be
 handled/achieved in hive.

 Regards
 Sanjiv Singh
 Mob :  +091 9990-447-339




 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Re: how to set column level privileges

2015-03-26 Thread Nitin Pawar
Column level security in hive was added at HIVE-5837
https://issues.apache.org/jira/browse/HIVE-5837

It has the PDF link for your readings.

https://cwiki.apache.org/confluence/display/Hive/AuthDev talks about
setting column level permissions
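
Until column-level grants are available, the view-based workaround Daniel
suggests below can emulate them; a minimal sketch (table, view, and user names
are hypothetical):

-- expose only the permitted columns through a view
CREATE VIEW customer_public AS SELECT id, name FROM customer;
GRANT SELECT ON customer_public TO USER analyst;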

On Thu, Mar 26, 2015 at 4:39 PM, Allen bjallenw...@sina.com wrote:

 Thanks for your replay.

 If we handle the privileges by creating views, it will lead to lots of
 views in our database.

 I found there is a table named TBL_COL_PRIV in the hive metastore database;
 maybe this table is related to column privileges, but it is never used in
 hive. Does anybody know why?




 - Original Message -
 From: Daniel Haviv daniel.ha...@veracity-group.com
 To: user@hive.apache.org user@hive.apache.org
 Subject: Re: how to set column level privileges
 Date: 2015-03-26 18:42

 Create a view with the permitted columns and handle the privileges for it

 Daniel

 On 26 March 2015, at 12:40, Allen bjallenw...@sina.com wrote:

 hi,

 We use SQL standards based authorization for authorization in Hive 0.14.
 But it  has not support for column level privileges.

 So, I want to know Is there anyway to set column level privileges?

  Thanks!





-- 
Nitin Pawar


Re: CREATE FUNCTION: How to automatically load extra jar file?

2014-12-30 Thread Nitin Pawar
:1282)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
 at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:740)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Job Submission failed with exception 'java.io.FileNotFoundException(File
 does not exist:
 hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
 )'
 Execution failed with exit status: 1
 Obtaining error information
 Task failed!
 Task ID:
   Stage-1
 Logs:
 /tmp/hadoop/hive.log
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask


 Step 5: (check the file)
 hive dfs -ls
 /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar;
 ls: 
 `/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar':
 No such file or directory
 Command failed with exit code = 1
 Query returned non-zero code: 1, cause: null













-- 
Nitin Pawar


Re: CREATE FUNCTION: How to automatically load extra jar file?

2014-12-30 Thread Nitin Pawar
If you put a file inside /tmp then there is no guarantee it will live there
forever, depending on your cluster configuration.

You may want to put it in a place where all users can access it, e.g. by
making a folder and giving it read permission
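
For example, a sketch using a permanent, world-readable HDFS directory
(/apps/hive/udf is a hypothetical path; any stable location outside /tmp
works):

hive> dfs -mkdir -p /apps/hive/udf;
hive> dfs -put nexr-hive-udf-0.2-SNAPSHOT.jar /apps/hive/udf/;
hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate'
      USING JAR 'hdfs:///apps/hive/udf/nexr-hive-udf-0.2-SNAPSHOT.jar';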

On Wed, Dec 31, 2014 at 11:40 AM, arthur.hk.c...@gmail.com 
arthur.hk.c...@gmail.com wrote:


 Hi,

 Thanks.

 Below are my steps, I did copy my JAR to HDFS and CREATE FUNCTION  using
 the JAR in HDFS, however during my smoke test, I got FileNotFoundException.

 java.io.FileNotFoundException: File does not exist:
 hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar




 Step 1:   (make sure the jar in in HDFS)
 hive dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
 -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02
 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar

 Step 2: (drop if function exists)
 hive drop function sysdate;

 OK
 Time taken: 0.013 seconds

 Step 3: (create function using the jar in HDFS)
 hive CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate'
 using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
 converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
 Added
 /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
 to class path
 Added resource:
 /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
 OK
 Time taken: 0.034 seconds

 Step 4: (test)
 hive select sysdate();

 Execution log at:
 /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
 java.io.FileNotFoundException: File does not exist:
 hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar



 Please help!

 Arthur



 On 31 Dec, 2014, at 12:31 am, Nitin Pawar nitinpawar...@gmail.com wrote:

 just copy pasting Jason's reply to other thread

 If you have a recent version of Hive (0.13+), you could try registering
 your UDF as a permanent UDF which was added in HIVE-6047:

 1) Copy your JAR somewhere on HDFS, say
 hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar.
 2) In Hive, run CREATE FUNCTION zeroifnull AS
 'com.test.udf.ZeroIfNullUDF' USING JAR '
 hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar';

 The function definition should be saved in the metastore and Hive should
 remember to pull the JAR from the location you specified in the CREATE
 FUNCTION call.

 On Tue, Dec 30, 2014 at 9:54 PM, arthur.hk.c...@gmail.com 
 arthur.hk.c...@gmail.com wrote:

 Thank you.

 Will this work for *hiveserver2 *?


 Arthur

 On 30 Dec, 2014, at 2:24 pm, vic0777 vic0...@163.com wrote:


 You can put it into $HOME/.hiverc like this: ADD JAR
 full_path_of_the_jar. Then, the file is automatically loaded when Hive is
 started.

 Wantao




 At 2014-12-30 11:01:06, arthur.hk.c...@gmail.com 
 arthur.hk.c...@gmail.com wrote:

 Hi,

 I am using Hive 0.13.1 on Hadoop 2.4.1, I need to automatically load an
 extra JAR file to hive for UDF, below are my steps to create the UDF
 function. I have tried the following but still no luck to get thru.

 Please help!!

 Regards
 Arthur


 Step 1:   (make sure the jar in in HDFS)
 hive dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
 -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02
 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar

 Step 2: (drop if function exists)
 hive drop function sysdate;

 OK
 Time taken: 0.013 seconds

 Step 3: (create function using the jar in HDFS)
 hive CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate'
 using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
 converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
 Added
 /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
 to class path
 Added resource:
 /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
 OK
 Time taken: 0.034 seconds

 Step 4: (test)
 hive select sysdate();


 Automatically selecting local only mode for query
 Total jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in
 [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in
 [jar:file:/hadoop/hbase-0.98.5-hadoop2/lib/phoenix-4.1.0-client-hadoop2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 14/12/30 10:17:06 WARN conf.Configuration:
 file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an
 attempt to override final parameter:
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 14/12/30 10:17:06 WARN conf.Configuration:
 file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003

Re: Detailing on how UPDATE is performed in Hive

2014-11-27 Thread Nitin Pawar
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

entire implementation is under jira here
https://issues.apache.org/jira/browse/HIVE-5317

On Thu, Nov 27, 2014 at 4:11 PM, unmesha sreeveni unmeshab...@gmail.com
wrote:

 Hi friends
    Where can I find details on how update is performed in Hive?

  1. When an update is performed, will HDFS write that block elsewhere with
  the new value?
  2. Will the old block be unallocated and allowed for further writes?
  3. Will this process create fragmentation?
  4. While creating a partitioned table, when an update is performed, is the
  partition deleted and updated with the new value, or is the entire block
  deleted and written once again?

  Where would be a good place to gather this knowledge?

 --
 *Thanks & Regards*


 *Unmesha Sreeveni U.B*
 *Hadoop, Bigdata Developer*
 *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
 http://www.unmeshasreeveni.blogspot.in/





-- 
Nitin Pawar


Re: UPDATE in Hive -0.14.0

2014-11-24 Thread Nitin Pawar
whats your create table DDL?
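Note that besides the table DDL (it must be an ORC, bucketed, transactional
table), the session also needs a transaction manager that supports ACID; a
sketch of the relevant settings per the Hive Transactions wiki (exact values
are cluster-specific):

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.txn.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
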
On 24 Nov 2014 13:43, unmesha sreeveni unmeshab...@gmail.com wrote:

 Hi

 I am using hive 0.14.0, which supports the UPDATE statement

 but I am getting an error once I did this Command
 UPDATE Emp SET salary = 5 WHERE employeeid = 19;

 FAILED: SemanticException [Error 10294]: Attempt to do update or delete
 using transaction manager that does not support these operations.
 hive


 Am I doing anything wrong?

 --
 *Thanks & Regards*


 *Unmesha Sreeveni U.B*
 *Hadoop, Bigdata Developer*
 *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
 http://www.unmeshasreeveni.blogspot.in/





Re: from_unixtime() and epoch definition

2014-11-06 Thread Nitin Pawar
Hi Maciek, Jason

Sorry I could not find my old code but I came up with a little code as much
as I can remember.
you can try the following jar
https://github.com/nitinpawar/hive-udfs/tree/master/FromUnixtimeWithTZ/dist

and let me know if this works for you guys.
I can change it the way it needs to be

PS: I am not a java dev so forgive anything bad I have done in there
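
Alternatively, for the plain epoch-to-UTC case, the built-in
to_utc_timestamp() (in Hive since 0.8) can undo the server-zone shift,
assuming you know the server's local zone:

SELECT to_utc_timestamp(from_unixtime(0), 'Europe/Dublin');
-- 1970-01-01 00:00:00 when the server's local zone is Europe/Dublin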

On Thu, Nov 6, 2014 at 3:44 PM, Maciek mac...@sonra.io wrote:

 @Jason:
 re. Hive (…) just assumes things are in the system's local timezone,
 just to clarify - this is not true in case of conversions (from_unixtime())
 as it respects the local system TZ settings hence the problem.
 TZ itself is a very hairy subject and would definitely be a big
 undertaking. Extending from_unixtime seems like easiest solution for now.
 Happy to do ER in JIRA but haven't done this for before...

 @Nitin
 Would be very grateful if you're able to dig it out! Thanks!

 Best Regards


 On Thu, Nov 6, 2014 at 7:48 AM, Jason Dere jd...@hortonworks.com wrote:

 That would be great!

 On Nov 5, 2014, at 10:49 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 May be a JIRA ?

 I remember having my own UDF for doing this. If possible I will share the
 code

 On Thu, Nov 6, 2014 at 6:22 AM, Jason Dere jd...@hortonworks.com wrote:

 Hive should probably at least provide a timezone option to
 from_unixtime().
 As you mentioned, Hive doesn't really do any timezone handling, just
 assumes things are in the system's local timezone. It will be a bit of a
 bigger project to add better time zone handling to Hive timestamps.


 On Nov 5, 2014, at 7:18 AM, Maciek mac...@sonra.io wrote:

 I see… and confirm, it's consistent with Linux/Unix output I get:
 date -r 0
 Thu  1 Jan 1970 01:00:00 IST

 date
 Wed  5 Nov 2014 14:49:52 GMT
 I did some digging and it actually makes sense. It turns out Ireland didn't
 observe daylight saving time in the years 1968-1972, as it was set
 permanently to GMT+1 (IST).

 Anyway, back to Hive
 I'm trying to convert unix_times to UTC (using from_unixtime UDF )but
 due to the issue it I'm getting different results on different servers (TZ
 settings)
 Is there any way influence that behaviour without changing timezone on
 the server?

 Oracle for that instance offers a good few options to facilitate
 timezone conversion, among the others:
 'AT TIME ZONE [GMT]' clause
 ALTER SESSION SET TIME_ZONE [= 'GMT']
 or
 to_timestamp_tz() function

 Currently it seems, the only way to perform this conversion is to detect
 server settings first (won't work at all for some cases like though JDBC
 connection I think) and apply the shift during the process.

 Would be really nice if Hive offers some elegant way to support this.
 I'm thinking of similar ALTER SESSION statement equivalent, maybe
 parameter SET in hive or extra parameter for the from_unixtime() Hive
 function?

 On Mon, Nov 3, 2014 at 10:33 PM, Jason Dere jd...@hortonworks.com
 wrote:


 As Nitin mentions, the behavior is to a string representing the
 timestamp of that moment in the current system time zone.  What are the
 timezone settings on your machine?

 $ TZ=GMT date -r 0
 Thu Jan  1 00:00:00 GMT 1970

 $ TZ=UTC date -r 0
 Thu Jan  1 00:00:00 UTC 1970

 $ TZ=Europe/London date -r 0
 Thu Jan  1 01:00:00 BST 1970

 $ TZ=Europe/Dublin date -r 0
 Thu Jan  1 01:00:00 IST 1970

 On Nov 3, 2014, at 12:50 PM, Maciek mac...@sonra.io wrote:

 I'd consider this behaviour as a bug and would like to raise it as such.
 Is there anyone to confirm it's the same on Hive 0.14?

 On Fri, Oct 31, 2014 at 3:41 PM, Maciek mac...@sonra.io wrote:

 Actually confirmed! It's down to the timezone settings
 I've moved temporarily server/client settings to 'Atlantic/Reykjavik'
 (no change in time comparing to what I was on (GMT), but it's permanent 
 UTC
 and as such doesn't observe daylight saving.
 I believe this shouldn't matter (see my points from previous mail) but
 apparently there's an issue with it.
 Not sure how to deal with this situation (can't just change TZ
 settings everywhere because of Hive) and don't want to hardcode anything.
 I'm on Hive 0.13.
 Does Hive 0.14 provide better support for TimeZones?


 On Fri, Oct 31, 2014 at 3:25 PM, Maciek mac...@sonra.io wrote:

 Thought about that myself based on my prior (bad) experience when
 tried to working with timezones in Hive (functionality pretty much 
 doesn't
 exists)
 That shouldn't be the case here though, here's why:
 in Oracle [timestamp with timezone] can be adjusted when
 sent/displayed on the client based on client's settings. This may be also
 relevant if the timestamp in question would fall onto client's daily 
 saving
 time period. This behaviour would make sense to me, however:

 • this is server, not client settings we're talking about here
 • the server and client do reside in the same timezone anyway, which
 is currently GMT [UTC]

 • while we observe the daily saving here [Dublin] the time in
 question (1970-01-01 00:00:00) is not in that period, neither the time

Re: Unix script for identifying current active namenode in a HA cluster

2014-11-05 Thread Nitin Pawar
looks good to me

thanks for the share

On Wed, Nov 5, 2014 at 5:15 PM, Devopam Mittra devo...@gmail.com wrote:

 hi Nitin,
 Thanks for the vital input around Hadoop Home addition. At times such
 things totally go off the radar when you have customized your own
 environment.

 As suggested I have shared this on github :
 https://github.com/devopam/hadoopHA
 apologies if there is any problem on github as I have limited familiarity
 with it :(


 regards
 Devopam



 On Wed, Nov 5, 2014 at 12:31 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 +1
 If you can optionally add hadoop home directory in the script and use
 that in path, it can be used out of the box.

 Also can you share this on github

 On Wed, Nov 5, 2014 at 10:02 AM, Devopam Mittra devo...@gmail.com
 wrote:

 hi All,
 Please find attached a simple shell script to dynamically determine the
 active namenode in the HA Cluster and subsequently run the Hive job / query
 via Talend OS generated workflows.

 It was tried successfully on a HDP2.1 cluster with 2 nn, 7 dn running on
 CentOS 6.5.
 Each ETL job invokes this script first in our framework to derive the NN
 FQDN and then run the hive jobs subsequently to avoid failures.
 Takes a max. of 2 secs to execute (small cost in our case, as compared
 to dealing with a failure and then recalculating the NN to resubmit the
 job).

 Sharing it with you in case you can leverage the same without spending
 effort to code it.

 Do share your feedback/ fixes if you spot any.

 --
 Devopam Mittra
 Life and Relations are not binary




 --
 Nitin Pawar




 --
 Devopam Mittra
 Life and Relations are not binary




-- 
Nitin Pawar


Re: from_unixtime() and epoch definition

2014-11-05 Thread Nitin Pawar
May be a JIRA ?

I remember having my own UDF for doing this. If possible I will share the
code

On Thu, Nov 6, 2014 at 6:22 AM, Jason Dere jd...@hortonworks.com wrote:

 Hive should probably at least provide a timezone option to
 from_unixtime().
 As you mentioned, Hive doesn't really do any timezone handling, just
 assumes things are in the system's local timezone. It will be a bit of a
 bigger project to add better time zone handling to Hive timestamps.


 On Nov 5, 2014, at 7:18 AM, Maciek mac...@sonra.io wrote:

 I see… and confirm, it's consistent with Linux/Unix output I get:
 date -r 0
 Thu  1 Jan 1970 01:00:00 IST

 date
 Wed  5 Nov 2014 14:49:52 GMT
I did some digging and it actually makes sense. It turns out Ireland didn't
observe daylight saving time in the years 1968-1972, as it was set permanently
to GMT+1 (IST).

 Anyway, back to Hive
 I'm trying to convert unix_times to UTC (using from_unixtime UDF )but due
 to the issue it I'm getting different results on different servers (TZ
 settings)
 Is there any way influence that behaviour without changing timezone on the
 server?

 Oracle for that instance offers a good few options to facilitate timezone
 conversion, among the others:
 'AT TIME ZONE [GMT]' clause
 ALTER SESSION SET TIME_ZONE [= 'GMT']
 or
 to_timestamp_tz() function

 Currently it seems, the only way to perform this conversion is to detect
 server settings first (won't work at all for some cases like though JDBC
 connection I think) and apply the shift during the process.

 Would be really nice if Hive offers some elegant way to support this.
 I'm thinking of similar ALTER SESSION statement equivalent, maybe
 parameter SET in hive or extra parameter for the from_unixtime() Hive
 function?

 On Mon, Nov 3, 2014 at 10:33 PM, Jason Dere jd...@hortonworks.com wrote:


 As Nitin mentions, the behavior is to a string representing the
 timestamp of that moment in the current system time zone.  What are the
 timezone settings on your machine?

 $ TZ=GMT date -r 0
 Thu Jan  1 00:00:00 GMT 1970

 $ TZ=UTC date -r 0
 Thu Jan  1 00:00:00 UTC 1970

 $ TZ=Europe/London date -r 0
 Thu Jan  1 01:00:00 BST 1970

 $ TZ=Europe/Dublin date -r 0
 Thu Jan  1 01:00:00 IST 1970

 On Nov 3, 2014, at 12:50 PM, Maciek mac...@sonra.io wrote:

 I'd consider this behaviour as a bug and would like to raise it as such.
 Is there anyone to confirm it's the same on Hive 0.14?

 On Fri, Oct 31, 2014 at 3:41 PM, Maciek mac...@sonra.io wrote:

 Actually confirmed! It's down to the timezone settings
 I've moved temporarily server/client settings to 'Atlantic/Reykjavik'
 (no change in time comparing to what I was on (GMT), but it's permanent UTC
 and as such doesn't observe daylight saving.
 I believe this shouldn't matter (see my points from previous mail) but
 apparently there's an issue with it.
 Not sure how to deal with this situation (can't just change TZ settings
 everywhere because of Hive) and don't want to hardcode anything.
 I'm on Hive 0.13.
 Does Hive 0.14 provide better support for TimeZones?


 On Fri, Oct 31, 2014 at 3:25 PM, Maciek mac...@sonra.io wrote:

 Thought about that myself based on my prior (bad) experience when tried
 to working with timezones in Hive (functionality pretty much doesn't 
 exists)
 That shouldn't be the case here though, here's why:
 in Oracle [timestamp with timezone] can be adjusted when sent/displayed
 on the client based on client's settings. This may be also relevant if the
 timestamp in question would fall onto client's daily saving time period.
 This behaviour would make sense to me, however:

 • this is server, not client settings we're talking about here
 • the server and client do reside in the same timezone anyway, which is
 currently GMT [UTC]

 • while we observe the daily saving here [Dublin] the time in question
 (1970-01-01 00:00:00) is not in that period, neither the time I'm sending
 the query (now).



 Based on all above, I don't see the reason the time gets shifted by one
 hour, but I realise the issue might be down to the general problems in
 Hive' implementation of timezones…

 On Fri, Oct 31, 2014 at 12:26 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 In hive from_unixtime is returned from the timezone which you belong to
 From document : from_unixtime(bigint unixtime[, string format]) :
 Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) 
 to
 a string representing the timestamp of that moment in the current system
 time zone in the format of 1970-01-01 00:00:00.

 if possible can you also check by changing the timezone to UTC on your
 machine?


 On Fri, Oct 31, 2014 at 12:00 PM, Maciek mac...@sonra.io wrote:

 Any reason why

 select from_unixtime(0) t0 FROM …
 gives

 1970-01-01 01:00:00
 ?

 By all available definitions (epoch, from_unixtime etc..) I would
 expect it to be 1970-01-01 00:00:00…?



 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity
 to which it is addressed and may contain

Re: Hive 0.14 configuration

2014-11-04 Thread Nitin Pawar
Currently only the ORC file format supports AcidOutputFormat.

So you may want to create a table stored as ORC and see if you are able to do
acid operations.
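
A minimal sketch (assuming ACID is already enabled on the cluster; the table
name is hypothetical):

CREATE TABLE new_acid (id INT, name STRING)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE new_acid VALUES (1, 'Mahesh');
UPDATE new_acid SET name = 'Raj' WHERE id = 1;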



On Tue, Nov 4, 2014 at 1:14 PM, mahesh kumar sankarmahes...@gmail.com
wrote:

 Hi Nitin,

  how do I create a table with AcidOutputFormat? Can you send me
 examples?

 Thanks
 Mahesh

 On Tue, Nov 4, 2014 at 12:21 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 As the error says, your table's file format has to be an AcidOutputFormat and
 the table needs to be bucketed to perform an update operation.

 You may want to create a new table with an AcidOutputFormat, insert the data
 from your current table into it, and then
 try the update operation on the new table

 On Tue, Nov 4, 2014 at 12:11 PM, mahesh kumar sankarmahes...@gmail.com
 wrote:

 Hi ,
   Has anyone tried the hive 0.14 configuration? I built it using maven
 from github.
 Insert is working fine, but when I use update/delete I got the
 error below. First I created a table and inserted rows.

 CREATE  TABLE new(id int ,name string)ROW FORMAT DELIMITED FIELDS
 TERMINATED BY ',';
  insert into table new values ('1','Mahesh');

 update new set name='Raj' where id=1;

 FAILED: SemanticException [Error 10297]: Attempt to do update or delete
 on table default.new that does not use an AcidOutputFormat or is not
 bucketed.

 When i update the table i got the above error.

 Can you help me guys.

 Thanks

 Mahesh.S






 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Unix script for identifying current active namenode in a HA cluster

2014-11-04 Thread Nitin Pawar
+1
If you can optionally add hadoop home directory in the script and use that
in path, it can be used out of the box.

Also can you share this on github

On Wed, Nov 5, 2014 at 10:02 AM, Devopam Mittra devo...@gmail.com wrote:

 hi All,
 Please find attached a simple shell script to dynamically determine the
 active namenode in the HA Cluster and subsequently run the Hive job / query
 via Talend OS generated workflows.

 It was tried successfully on a HDP2.1 cluster with 2 nn, 7 dn running on
 CentOS 6.5.
 Each ETL job invokes this script first in our framework to derive the NN
 FQDN and then run the hive jobs subsequently to avoid failures.
 Takes a max. of 2 secs to execute (small cost in our case, as compared to
 dealing with a failure and then recalculating the NN to resubmit the job).

 Sharing it with you in case you can leverage the same without spending
 effort to code it.

 Do share your feedback/ fixes if you spot any.

 --
 Devopam Mittra
 Life and Relations are not binary




-- 
Nitin Pawar


Re: Hive 0.14 configuration

2014-11-03 Thread Nitin Pawar
As the error says, your table's file format has to be an AcidOutputFormat and
the table needs to be bucketed to perform an update operation.

You may want to create a new table with an AcidOutputFormat, insert the data
from your current table into it, and then
try the update operation on the new table

On Tue, Nov 4, 2014 at 12:11 PM, mahesh kumar sankarmahes...@gmail.com
wrote:

 Hi ,
   Has anyone tried the hive 0.14 configuration? I built it using maven from
 github.
 Insert is working fine, but when I use update/delete I got the error below.
 First I created a table and inserted rows.

 CREATE  TABLE new(id int ,name string)ROW FORMAT DELIMITED FIELDS
 TERMINATED BY ',';
  insert into table new values ('1','Mahesh');

 update new set name='Raj' where id=1;

 FAILED: SemanticException [Error 10297]: Attempt to do update or delete on
 table default.new that does not use an AcidOutputFormat or is not bucketed.

 When i update the table i got the above error.

 Can you help me guys.

 Thanks

 Mahesh.S






-- 
Nitin Pawar


Re: from_unixtime() and epoch definition

2014-10-31 Thread Nitin Pawar
Do you have a copy-paste error?

I see both values as the same

On Fri, Oct 31, 2014 at 5:30 PM, Maciek mac...@sonra.io wrote:

 Any reason why

 select from_unixtime(0) t0 FROM …

 gives

 1970-01-01 01:00:00

 ?

 By all available definitions (epoch, from_unixtime etc..) I would expect
 it to be 1970-01-01 01:00:00…?




-- 
Nitin Pawar


Re: from_unixtime() and epoch definition

2014-10-31 Thread Nitin Pawar
In hive from_unixtime is returned from the timezone which you belong to
From document : from_unixtime(bigint unixtime[, string format]) : Converts
the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone
in the format of 1970-01-01 00:00:00.

if possible can you also check by changing the timezone to UTC on your
machine?

On Fri, Oct 31, 2014 at 5:32 PM, Maciek mac...@sonra.io wrote:

 meant 1970-01-01 00:00:00 of course…

 On Fri, Oct 31, 2014 at 12:00 PM, Maciek mac...@sonra.io wrote:

 Any reason why

 select from_unixtime(0) t0 FROM …

 gives

 1970-01-01 01:00:00

 ?

 By all available definitions (epoch, from_unixtime etc..) I would expect
 it to be 1970-01-01 00:00:00…?




 --
 Kind Regards
 Maciek Kocon




-- 
Nitin Pawar


Re: select * from table and select column from table in hive

2014-10-20 Thread Nitin Pawar
What's your table CREATE DDL?

Is the data in a csv-like format?
On 21 Oct 2014 00:26, Raj Hadoop hadoop...@yahoo.com wrote:

 I am able to see the data in the table for all the columns when I issue
 the following -

 SELECT * FROM t1 WHERE dt1='2013-11-20'


 But I am unable to see the column data when i issue the following -

 SELECT cust_num FROM t1 WHERE dt1='2013-11-20'

 The above shows null values.

 How should I debug this ?




Re: Querying A table which JDBC

2014-09-23 Thread Nitin Pawar
can you share hiveserver2 heap size and your table size ?

On Tue, Sep 23, 2014 at 11:31 PM, Shiang Luong shiang.lu...@openx.com
wrote:

 Ritesh thanks for your response.

 Where do I download and place the jars?
 Do you mean on the hive server itself?  I believe the files are already
 there since I can query the same table via command line.
 It feels like the serde is not being sent along with the query? or I need
 to get the jar sent out to the distributed cache?
 I even tried running:

 myStatment.execute("add JAR /usr/lib/hive/extra_libs/test.jar");

 That didn't work.  I'm not sure just shooting out thoughts.

 Thanks,

 Shiang

 On Mon, Sep 22, 2014 at 10:52 PM, Ritesh Kumar Singh 
 riteshoneinamill...@gmail.com wrote:

 try downloading the jar files and put it in the libraries folder

 On Tue, Sep 23, 2014 at 10:58 AM, Shiang Luong shiang.lu...@openx.com
 wrote:

 Hi All,

 I'm new to hive.  I'm having some problems querying a hive table with
 JDBC.  It fails when it is trying to run a map reduce job.  It can't seem
 to find the serde jar file.  When I query it through the command line it
 works fine.  Anyone have any hints on how I can get it working with JDBC?

 Thanks in advance.

 Shiang





 --
 Shiang Luong
 Software Engineer in Test | OpenX
 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101
 o: +1 (626) 466-1141 x | m: +1 (626) 512-2165 | shiang.lu...@openx.com
 OpenX ranked No. 7 in Forbes’ America’s Most Promising Companies




-- 
Nitin Pawar


Re: Handling updates to Bucketed Table

2014-09-18 Thread Nitin Pawar
When you bucket the data in a partition,
a file is created for each of your bucketing keys.

Now if you add more data to the same bucket, that file would need
to be rebuilt.

I would prefer a day-level partition under the month level, where I write the
data once a day and bucket the data there.
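
A sketch of that layout (table, columns, and bucket count are hypothetical;
staging_events stands in for the daily input):

CREATE TABLE sales_events (id BIGINT, amount DOUBLE)
PARTITIONED BY (month STRING, day STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS ORC;

-- with hive.enforce.bucketing=true so rows are hashed into the buckets
INSERT OVERWRITE TABLE sales_events PARTITION (month='2014-09', day='18')
SELECT id, amount FROM staging_events;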


I am not sure hive supports append to bucketed files yet.
please wait for others to answer as well

On Thu, Sep 18, 2014 at 9:27 PM, Kumar V kumarbuyonl...@yahoo.com wrote:

 Hi,
 I would like to know how to handle frequent updates to bucketed
 tables.  Is there a way to update without a rebuild ?
 I have a monthly partition for a table with buckets.  But I have to update
 the table every day.  Is there a way to achieve this without a rebuild of
 this partition every day ?  Or, is this a wrong use case for a bucketed
 table ?
 This table is joined with another table.  So, I thought bucketing will
 speed up the queries.  What are my options ?

 Please let me know.

 Regards,
 Murali.




-- 
Nitin Pawar


Re: Correlated Subqueries Workaround in Hive!

2014-09-15 Thread Nitin Pawar
have you taken a look at lag and lead functions ?
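
For this fill-the-previous-value pattern specifically, a windowing sketch
(assuming Hive 0.11+ windowing; last_value's optional second argument skips
NULLs, and table/column names mirror the example below):

SELECT record, fk, start_time, end_time,
       last_value(IF(location != -1, location, NULL), true)
         OVER (PARTITION BY fk ORDER BY record) AS location
FROM temp1;

Plain lag() only reaches one row back, so it would miss runs of consecutive
-1 rows (records 4 and 5 below).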

On Mon, Sep 15, 2014 at 4:46 PM, Viral Parikh viral.j.par...@gmail.com
wrote:

 To Whomsoever It May Concern,

 I posted this question last week but still haven't heard from anyone; I'd
 appreciate any reply.

 I've got a table that contains a LocationId field. In some cases, where a
 record shares the same foreign key, the LocationId might come through as -1.

 What I want to do in my select query, in the case of this happening, is
 return the previous location.

 Example data:

 Record  FK   StartTime         EndTime           Location
 1       110  2011/01/01 12.30  2011/01/01 6.10   456
 2       110  2011/01/01 3.40   2011/01/01 4.00   -1
 3       110  2011/01/02 1.00   2011/01/02 8.00   891
 4       110  2011/01/02 5.00   2011/01/02 6.00   -1
 5       110  2011/01/02 6.10   2011/01/02 6.30   -1

 The -1 should come out as 456 for record 2, and 891 for record 4 and 5

 Can someone help me do this with Hive syntax?

 I can do it using SQL syntax (as below) but since Hive doesnt support
 correlated subqueries in select clauses and so I am unable to get it.

 SELECT  T1.record,
 T1.fk,
 T1.start_time,
 T1.end_time,
 CASE WHEN T1.location != -1 THEN Location
 ELSE
 (
 SELECT  TOP (1)
 T2.location
 FROM#temp1 AS T2
 WHERE   T2.record < T1.record
 AND T2.fk = T1.fk
 AND T2.location != -1
 ORDER   BY T2.Record DESC
 )
 END
 FROM#temp1 AS T1

 Thank you for your help in advance!




-- 
Nitin Pawar


Re: Correlated Subqueries Workaround in Hive!

2014-09-15 Thread Nitin Pawar
Another way I can think of is:

1) ignore all -1 rows and create a tmp table
2) I see there are a couple of timestamps
3) Order the table by timestamp
4) from this tmp table create another tmp table which says FK MinStartTime
MaxEndTime Location
5) Now join this tmp table from step 4 with your raw data and put a where
clause with the min and max times

I hope this is not confusing

On Mon, Sep 15, 2014 at 6:25 PM, Viral Parikh viral.j.par...@gmail.com
wrote:

 thanks!

 is there any other way than writing python UDF etc.

 any way i can leverage hive joins to get this working?

 On Mon, Sep 15, 2014 at 6:56 AM, Sreenath sreenaths1...@gmail.com wrote:

 How about writing a python UDF that takes input line by line,
 saves the previous line's location, and replaces the location with that
 if it turns out to be '-1'?

 On 15 September 2014 17:01, Nitin Pawar nitinpawar...@gmail.com wrote:

 have you taken a look at lag and lead functions ?

 On Mon, Sep 15, 2014 at 4:46 PM, Viral Parikh viral.j.par...@gmail.com
 wrote:

 To Whomsoever It May Concern,

 I posted this question last week but still haven't heard from anyone;
 I'd appreciate any reply.

 I've got a table that contains a LocationId field. In some cases, where
 a record shares the same foreign key, the LocationId might come through as
 -1.

 What I want to do in my select query, in the case of this
 happening, is return the previous location.

 Example data:

 Record  FK   StartTime         EndTime           Location
 1       110  2011/01/01 12.30  2011/01/01 6.10   456
 2       110  2011/01/01 3.40   2011/01/01 4.00   -1
 3       110  2011/01/02 1.00   2011/01/02 8.00   891
 4       110  2011/01/02 5.00   2011/01/02 6.00   -1
 5       110  2011/01/02 6.10   2011/01/02 6.30   -1

 The -1 should come out as 456 for record 2, and 891 for record 4 and 5

 Can someone help me do this with Hive syntax?

 I can do it using SQL syntax (as below) but since Hive doesnt support
 correlated subqueries in select clauses and so I am unable to get it.

 SELECT  T1.record,
 T1.fk,
 T1.start_time,
 T1.end_time,
 CASE WHEN T1.location != -1 THEN Location
 ELSE
 (
 SELECT  TOP (1)
 T2.location
 FROM#temp1 AS T2
  WHERE   T2.record < T1.record
 AND T2.fk = T1.fk
 AND T2.location != -1
 ORDER   BY T2.Record DESC
 )
  END
  FROM#temp1 AS T1

 Thank you for your help in advance!




 --
 Nitin Pawar




 --
 Sreenath S Kamath
 Bangalore
 Ph No:+91-9590989106





-- 
Nitin Pawar


Re: Dynamic Partitioning- Partition_Naming

2014-09-10 Thread Nitin Pawar
Thanks for correcting me Anusha,

Here are the links you gave me

https://cwiki.apache.org/confluence/display/Hive/HCatalog+Config+Properties

https://issues.apache.org/jira/secure/attachment/12622686/HIVE-6109.pdf

On Tue, Sep 9, 2014 at 5:16 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 You cannot modify the paths of partitions created by dynamic
 partitioning, or rename them.
 That's the default implementation: a partition always appears as
 column=value in the path.


 On Tue, Sep 9, 2014 at 5:18 AM, anusha Mangina anusha.mang...@gmail.com
 wrote:


 I need a table partitioned by country and then city . I created a table
 and INSERTed data  from another table using dynamic partition.

 CREATE TABLE invoice_details_hive_partitioned (Invoice_Id
 double, Invoice_Date string, Invoice_Amount double, Paid_Date
 string) PARTITIONED BY (pay_country STRING, pay_location STRING);

 Everything worked fine.



 Partitions by default are named like pay_country=INDIA and
  pay_city=DELHI etc in

 ../hive/warehouse/invoice_details_hive_partitioned/pay_country=INDIA/pay_city=DELHI


 can I get partition name as Just Column Value  INDIA and DELHI ...not
 including column name ...like  /hive/warehouse/invoice_details_hive
 _partitioned/INDIA/DELHI?

 Thanks in Advance





 --
 Nitin Pawar




-- 
Nitin Pawar


Re: Dynamic Partitioning- Partition_Naming

2014-09-09 Thread Nitin Pawar
You cannot modify the paths of partitions created by dynamic
partitioning, or rename them.
That's the default implementation: a partition always appears as
column=value in the path.


On Tue, Sep 9, 2014 at 5:18 AM, anusha Mangina anusha.mang...@gmail.com
wrote:


 I need a table partitioned by country and then city . I created a table
 and INSERTed data  from another table using dynamic partition.

 CREATE TABLE invoice_details_hive_partitioned (Invoice_Id
 double, Invoice_Date string, Invoice_Amount double, Paid_Date
 string) PARTITIONED BY (pay_country STRING, pay_location STRING);

 Everything worked fine.



 Partitions by default are named like pay_country=INDIA and
  pay_city=DELHI etc in

 ../hive/warehouse/invoice_details_hive_partitioned/pay_country=INDIA/pay_city=DELHI


 can I get partition name as Just Column Value  INDIA and DELHI ...not
 including column name ...like  /hive/warehouse/invoice_details_hive
 _partitioned/INDIA/DELHI?

 Thanks in Advance





-- 
Nitin Pawar


Re: Hive columns

2014-09-04 Thread Nitin Pawar
If those are text files you can create the table with a single column and
then process them line by line.
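
A sketch of that approach (the path and names are hypothetical):

CREATE EXTERNAL TABLE raw_lines (line STRING)
LOCATION '/data/incoming/mixed_files';

-- then pull fields out of each line as needed, e.g. the first
-- comma-separated field:
SELECT split(line, ',')[0] FROM raw_lines;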


On Thu, Sep 4, 2014 at 6:13 PM, CHEBARO Abdallah abdallah.cheb...@murex.com
 wrote:

  Hello,



 Is it possible to create an external table without specifying the columns?



 In fact, I am creating an external table that points to a directory that
 contains 3 text file, and each text file has different number of columns.



 Thanks

 ***

 This e-mail contains information for the intended recipient only. It may
 contain proprietary material or confidential information. If you are not
 the intended recipient you are not authorised to distribute, copy or use
 this e-mail or any attachment to it. Murex cannot guarantee that it is
 virus free and accepts no responsibility for any loss or damage arising
 from its use. If you have received this e-mail in error please notify
 immediately the sender and delete the original email received, any
 attachments and all copies from your system.




-- 
Nitin Pawar


Re: Hive columns

2014-09-04 Thread Nitin Pawar
it means you will need to define at least one column in hive, or build your own
fileformat which can handle reading the files and giving the data back to hive

when I say at least one column: by default hive uses \n as the record
terminator, which means you can define an entire row as a single column and
then process it the way you want
this is just a suggestion, and it would be really tedious to maintain the
mapping.

Instead I would suggest using pig to create proper tables from these files
and then using hive to do the deeper analytics


On Thu, Sep 4, 2014 at 6:35 PM, CHEBARO Abdallah abdallah.cheb...@murex.com
 wrote:

  Can you please specify what this means?



 *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
 *Sent:* Thursday, September 04, 2014 4:00 PM
 *To:* user@hive.apache.org
 *Subject:* Re: Hive columns



 If those are text files you can create the table with single column and
 then process them line by line



 On Thu, Sep 4, 2014 at 6:13 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 Is it possible to create an external table without specifying the columns?



 In fact, I am creating an external table that points to a directory that
 contains 3 text file, and each text file has different number of columns.



 Thanks

 ***

 This e-mail contains information for the intended recipient only. It may
 contain proprietary material or confidential information. If you are not
 the intended recipient you are not authorised to distribute, copy or use
 this e-mail or any attachment to it. Murex cannot guarantee that it is
 virus free and accepts no responsibility for any loss or damage arising
 from its use. If you have received this e-mail in error please notify
 immediately the sender and delete the original email received, any
 attachments and all copies from your system.





 --
 Nitin Pawar

 ***

 This e-mail contains information for the intended recipient only. It may
 contain proprietary material or confidential information. If you are not
 the intended recipient you are not authorised to distribute, copy or use
 this e-mail or any attachment to it. Murex cannot guarantee that it is
 virus free and accepts no responsibility for any loss or damage arising
 from its use. If you have received this e-mail in error please notify
 immediately the sender and delete the original email received, any
 attachments and all copies from your system.




-- 
Nitin Pawar


Re: Mysql - Hive Sync

2014-09-02 Thread Nitin Pawar
have you looked at sqoop?


On Wed, Sep 3, 2014 at 10:15 AM, Muthu Pandi muthu1...@gmail.com wrote:

 Dear All

  I am developing a prototype for syncing tables from mysql to Hive using
 python and JDBC. Is it a good idea to use JDBC for this purpose?

 My use case will be generating the sales report using hive, with data pulled
 from mysql using the prototype tool. My data will be around 2GB/day.



 *Regards Muthupandi.K*





-- 
Nitin Pawar


Re: how to create custom user defined data type in Hive

2014-08-25 Thread Nitin Pawar
from teradata documentation
 A PERIOD column in Teradata can be any date or timestamp type

I think both of these are supported in hive-0.13; if not, as Peyman
suggested, strings are best friends when we are not sure.
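
Since a Teradata PERIOD value is a pair of begin/end bounds, one option is to
split it into two columns on the Hive side; a sketch (table and column names
are hypothetical):

CREATE TABLE customer_stg (
  id             INT,
  validity_begin TIMESTAMP,  -- PERIOD begin bound
  validity_end   TIMESTAMP   -- PERIOD end bound
);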


On Tue, Aug 26, 2014 at 6:56 AM, reena upadhyay reena2...@gmail.com wrote:

 Hi,

 As long as the data type is ANSI complaint, its equivalent type is
 available in Hive. But there are few data types that are database specific.
 Like there is a PERIOD data type in teradata, it is specific to teradata
 only, So how to map such columns in Hive?

 Thanks.


 On Tue, Aug 26, 2014 at 6:44 AM, Peyman Mohajerian mohaj...@gmail.com
 wrote:

 As far as i know you cannot do that and most likely you don't need it,
 here are sample mappings between the two systems:
  Teradata                    Hive
  DECIMAL(x,y)                double
  DATE, TIMESTAMP             timestamp
  INTEGER, SMALLINT, BYTEINT  int
  VARCHAR, CHAR               string
  DECIMAL(x,0)                bigint


 I would typically stage data in hadoop as all string and then move it to
 hive managed/orc with the above mapping.




 On Mon, Aug 25, 2014 at 8:42 PM, reena upadhyay reena2...@gmail.com
 wrote:

 Hi,

 Is there any way to create custom user defined data type in Hive? I want
 to move some table data from teradata database to Hive. But in teradata
 database tables, there are few columns data type that are not supported in
 Hive. So to map the source table columns to my destination table columns in
 Hive, I want to create my own data type in Hive.

 I know about writing UDF's in Hive but have no idea about creating user
 defined data types in Hive. Any idea and example on the same would be of
 great help.

 Thanks.






-- 
Nitin Pawar


Re: List of dates as arguments

2014-08-24 Thread Nitin Pawar
with your shell script calculate your start date and end date
hive $HIVEPARAMS -hiveconf startdate=$var1  -hiveconf enddate=$var2

also set in ..hiverc
set hive.variable.substitute=true;
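
The query side then references the two variables; a sketch of the top-10
query (table and column names are hypothetical):

SELECT product, count(*) AS cnt
FROM raw_data
WHERE dt BETWEEN '${hiveconf:startdate}' AND '${hiveconf:enddate}'
GROUP BY product
ORDER BY cnt DESC
LIMIT 10;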


On Sun, Aug 24, 2014 at 10:19 AM, karthik Srivasthava 
karthiksrivasth...@gmail.com wrote:

 As my raw-data table is partitioned by date.. i want to get data to run a
 query every days to find top 10 products in last 15 days .

 How to pass list of dates dynamically as arguments in hive query using
 hiveconf?





-- 
Nitin Pawar


Re: List of dates as arguments

2014-08-24 Thread Nitin Pawar
I am not sure you can pass an array from shell to Java; you may want
to write your own custom UDF for that.

If these are continuous dates, then you can use less-than / greater-than
comparisons instead.


On Sun, Aug 24, 2014 at 12:39 PM, karthik Srivasthava 
karthiksrivasth...@gmail.com wrote:

 Nitin,
 Teja

 Thank you.. I exactly need what Teja suggested... i need list of dates
 between start date and end date


 On Sun, Aug 24, 2014 at 2:05 AM, Teja Kunapareddy 
 tejakunapare...@gmail.com wrote:

 Thanks Nitin for your reply. I can get the start date and end date. But
 can I get all the dates within START DATE AND END DATE? So that my
 query looks something like this:

 Select  a, b, c from table_x where date in  (${hiveconf:LIST_OF DATES})



 On 24 August 2014 01:18, Nitin Pawar nitinpawar...@gmail.com wrote:

 with your shell script calculate your start date and end date
 hive $HIVEPARAMS -hiveconf startdate=$var1  -hiveconf enddate=$var2

 also set this in .hiverc
 set hive.variable.substitute=true;


 On Sun, Aug 24, 2014 at 10:19 AM, karthik Srivasthava 
 karthiksrivasth...@gmail.com wrote:

 As my raw-data table is partitioned by date.. i want to get data to run
 a query every days to find top 10 products in last 15 days .

 How to pass list of dates dynamically as arguments in hive query using
 hiveconf?





 --
 Nitin Pawar






-- 
Nitin Pawar


Re: List of dates as arguments

2014-08-24 Thread Nitin Pawar
Bala,

I think they need an array substitution instead of a string for the hiveconf
variable substitution


On Sun, Aug 24, 2014 at 11:55 PM, Bala Krishna Gangisetty 
b...@altiscale.com wrote:

 Here is my understanding on your requirements. Let me know if I am missing
 something. You,

 a) would like to run a query daily to find top 10 products in the past 15
 days
 b) would like to pass dates dynamically as arguments to HIVE query

 Given the requirement a), passing just two variables(startdate and
 enddate) to HIVE query will suffice to achieve the requirement b).

 Assuming startdate and enddate variables are passed to HIVE query, the
 query will look like below.

 SELECT * FROM *table_name* WHERE *date_column* BETWEEN
 *${hiveconf:startdate}* AND  *${hiveconf:enddate}*

 Note, values for startdate and enddate must be enclosed in ' '.

 Hope this helps.

 --Bala G.


 On Sun, Aug 24, 2014 at 12:57 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 I am not sure if you can transform array from shell to java, you may want
 to write your own custom UDF for that

 if these are continuous dates, then you can have less than greater than
 comparison


 On Sun, Aug 24, 2014 at 12:39 PM, karthik Srivasthava 
 karthiksrivasth...@gmail.com wrote:

 Nitin,
 Teja

 Thank you.. I exactly need what Teja suggested... i need list of dates
 between start date and end date


 On Sun, Aug 24, 2014 at 2:05 AM, Teja Kunapareddy 
 tejakunapare...@gmail.com wrote:

 Thanks Nithin For your reply.. I can get start date and end date,. But
 can i get all the dates with in START DATE AND END DATE.??? . so that my
 query looks something like this

 Select  a, b, c from table_x where date in  (${hiveconf:LIST_OF
 DATES})


 On 24 August 2014 01:18, Nitin Pawar nitinpawar...@gmail.com wrote:

 with your shell script calculate your start date and end date
 hive $HIVEPARAMS -hiveconf startdate=$var1  -hiveconf
 enddate=$var2

 also set this in .hiverc
 set hive.variable.substitute=true;


 On Sun, Aug 24, 2014 at 10:19 AM, karthik Srivasthava 
 karthiksrivasth...@gmail.com wrote:

 As my raw-data table is partitioned by date.. i want to get data to
 run a query every days to find top 10 products in last 15 days .

 How to pass list of dates dynamically as arguments in hive query
 using hiveconf?





 --
 Nitin Pawar






 --
 Nitin Pawar





-- 
Nitin Pawar


Re: Passing variables using Hiveconf

2014-08-22 Thread Nitin Pawar
this is one way

hive $HIVEPARAMS -hiveconf target=$var1 \
  -hiveconf mapred.child.java.opts="-server -Xmx1200m -Djava.net.preferIPv4Stack=true"

and you need to set this variable

set hive.variable.substitute=true;
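
The same pattern extends to more than one parameter; one common cause of the
NoViableAltException is a substituted string value left unquoted in the query.
A sketch (names and values are placeholders):

hive -hiveconf dt=2014-08-22 -hiveconf src=mobile -f report.hql

-- in report.hql, note the quotes around each substituted string:
SELECT * FROM events
WHERE dt = '${hiveconf:dt}' AND source = '${hiveconf:src}';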



On Fri, Aug 22, 2014 at 9:24 PM, karthik Srivasthava 
karthiksrivasth...@gmail.com wrote:

 Hi,

 I am passing substitution variable using hiveconf in Hive..

 But I couldn't execute simple queries when I am trying to pass more than
 one parameter. It throws NoViableAltException - AtomExpression. Am I
 missing something?




-- 
Nitin Pawar


Re: Load CSV files with embedded map and arrays to Hive

2014-08-21 Thread Nitin Pawar
Hey sorry .. got stuck with work.
I will take a look today


On Wed, Aug 20, 2014 at 5:43 PM, Sushant Prusty sushan...@gmx.com wrote:

  Hi Nitin,
 Hope you have received the dataset. If you have any further requirements,
 please feel free to contact me. I will appreciate your help.

 Regards,
 Sushant
  On Tuesday 19 August 2014 02:33 PM, Nitin Pawar wrote:

 can you give an example of your dataset?


 On Tue, Aug 19, 2014 at 2:31 PM, Sushant Prusty sushan...@gmx.com wrote:

 Pl let me know how I can load a CSV file with embedded map and arrays
 data into Hive.

 Regards,
 Sushant




  --
 Nitin Pawar


 --
 Warm regards,

 Sushant Prusty




-- 
Nitin Pawar


Re: Load CSV files with embedded map and arrays to Hive

2014-08-19 Thread Nitin Pawar
can you give an example of your dataset?
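
In the meantime, if the embedded collections carry their own delimiters, the
plain delimited route usually works; a sketch (the delimiters, names and
sample line below are assumptions, not your data):

CREATE TABLE users (
  name  STRING,
  tags  ARRAY<STRING>,
  props MAP<STRING,STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':';

LOAD DATA LOCAL INPATH '/tmp/users.csv' INTO TABLE users;

-- matching data line:  alice,a|b|c,k1:v1|k2:v2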


On Tue, Aug 19, 2014 at 2:31 PM, Sushant Prusty sushan...@gmx.com wrote:

 Pl let me know how I can load a CSV file with embedded map and arrays data
 into Hive.

 Regards,
 Sushant




-- 
Nitin Pawar


Re: Cache tables in hive

2014-08-13 Thread Nitin Pawar
are you talking about the small tables in a map-join being loaded into the
distributed cache?
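
If so, that is the map-join path; a minimal sketch of triggering it (table
names are placeholders):

set hive.auto.convert.join=true;   -- let hive pick small tables automatically

-- or hint it explicitly:
SELECT /*+ MAPJOIN(d) */ f.*, d.name
FROM fact f JOIN dim d ON (f.k = d.k);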


On Wed, Aug 13, 2014 at 6:01 PM, harish tangella harish.tange...@gmail.com
wrote:

 Hi all,

 Request you to help

What are cache tables in hive

 Regards
 Harish







-- 
Nitin Pawar


Re: Distributed data

2014-08-12 Thread Nitin Pawar
what do you mean the data is distributed on many computers?

are you saying the data is on an HDFS-like filesystem?


On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Hello,



 Using Hive, we know that we should specify the file path to read data from
 a specific location. If the data is distributed on many computers, how can
 we read it?



 Thanks





-- 
Nitin Pawar


Re: Distributed data

2014-08-12 Thread Nitin Pawar
If your Hadoop is set up with the same filesystem as HDFS, Hive will take
care of it.

If your HDFS is totally different from where the file resides, then you
need to get the file from that filesystem and then push it into Hive using
LOAD.

If that filesystem supports import/export with tools like Sqoop, then you
can use them as well.
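
As a sketch, the two common cases look like this (paths, table and column
names are placeholders):

-- data already on the cluster's HDFS: just point a table at it
CREATE EXTERNAL TABLE trades (ts STRING, sym STRING, px DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/trades/';

-- data sitting on the client machine's local filesystem: push it in
LOAD DATA LOCAL INPATH '/home/user/trades.csv' INTO TABLE trades;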




On Tue, Aug 12, 2014 at 5:58 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Yes I mean the data is on hdfs like filesystem



 *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
 *Sent:* Tuesday, August 12, 2014 3:26 PM
 *To:* user@hive.apache.org
 *Subject:* Re: Distributed data



 what do you mean the data is distributed on many computers?



 are you saying the data is on hdfs like filesystem ?



 On Tue, Aug 12, 2014 at 5:51 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 Using Hive, we know that we should specify the file path to read data from
 a specific location. If the data is distributed on many computers, how can
 we read it?



 Thanks






 --
 Nitin Pawar







-- 
Nitin Pawar


Re: Hive: Centralized HDFS Caching

2014-08-01 Thread Nitin Pawar
Please take a look at Hive with Tez as the execution engine on Hadoop 2.3.

It may help you compare it with what you want to achieve.
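
Switching engines is a per-session setting; a sketch (assumes Tez is
installed on the cluster):

set hive.execution.engine=tez;
-- run the query, then compare timings against the default engine:
set hive.execution.engine=mr;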


On Fri, Aug 1, 2014 at 4:13 PM, Uli Bethke uli.bet...@sonra.io wrote:

   Hi.

  in Hive can I make use of the centralized cache management introduced in
 Hadoop 2.3 (
 http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html)?
 If not implemented yet, is this on the roadmap?

  My use case is that I want to pin a fact table that needs to be queried
 frequently into memory.

  Impala already supports this as per the Cloudera documentation
 http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_perf_hdfs_caching.html

  Thanks
  uli




-- 
Nitin Pawar


Re: How can I know one table is a partitioned table in hive?

2014-07-31 Thread Nitin Pawar
What are the options you have?
Can you write Java code which can interact with HCatalog?
Or you can do a DESCRIBE on the table and check for partition column details
in there.
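
For the shell route, a sketch (database and table names are placeholders; the
grep target is the section header DESCRIBE FORMATTED prints for partitioned
tables):

if hive -S -e "DESCRIBE FORMATTED mydb.mytable" | grep -q "# Partition Information"
then
  echo "partitioned"
else
  echo "not partitioned"
fi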


On Thu, Jul 31, 2014 at 1:11 PM, 张甲超 rebeyond1...@gmail.com wrote:

 dear all,
 I want to know whether a table is a partitioned table in Hive, and
 return the result to the shell.
 How can I do this?




-- 
Nitin Pawar


Re: Input

2014-07-31 Thread Nitin Pawar
If you specified ; as your delimiter, then "abc" (with the quotation marks)
will be the complete string, not abc alone.

Take a look at the CSV file format (a CSV SerDe) if you want proper
quote-aware delimited parsing.
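
A sketch with the commonly used open-source CSV SerDe (jar and class names as
in that project; verify them against the version you download):

add jar csv-serde-1.1.2.jar;

CREATE TABLE quoted_input (c1 STRING, c2 STRING, c3 STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ";", "quoteChar" = "\"");

-- this SerDe hands every column back as STRING, so cast the third field:
SELECT c1, c2, CAST(c3 AS INT) FROM quoted_input;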


On Thu, Jul 31, 2014 at 3:44 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Hello,



 I am using Hive and trying to read from a txt file.



 I have an input like the following: “string”;”string”;”integer”.



 First, I specified that the row fields are delimited by a semicolon. Is
 it possible to read the integer without the quotation marks?



 Thank you





-- 
Nitin Pawar


Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
You mean just by writing a query? Then I think no.

But if you want to read only the first 3 columns of the data, then it would
work with just a single table and a LOAD DATA into it.


On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Hello,



 I am interested in selecting specific data from a source and loading it to
 a table. For example, if I have 5 columns in my dataset, I want to load 3
 columns of it. Is it possible to do it without creating a second table?



 Thank you





-- 
Nitin Pawar


Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
With Hive, without creating a table with the full data, you can do
intermediate processing like selecting only a few columns and writing them
into another table.

If this is something one-time, then you can take a look at awk or cut
commands in Linux and generate those files only.
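
The two-table route is just an INSERT ... SELECT; a sketch (assumes
narrow_table already exists with the three columns you want):

INSERT OVERWRITE TABLE narrow_table
SELECT col1, col3, col5 FROM full_table;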


On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  I am only using Hive and hadoop, nothing more.



 *From:* Devopam Mittra [mailto:devo...@gmail.com]
 *Sent:* Wednesday, July 30, 2014 12:15 PM

 *To:* user@hive.apache.org
 *Subject:* Re: SELECT specific data



 Are you using any tool to load data ? If yes, then the ETL tool will
 provide you such options.

 If not, then please explore unix file processing/external table route.



 On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 Thank you for your reply.



 Consider we have data divided into 5 columns (col1, col2, col3, col4,
 col5).

 So I can’t load directly col1, col3 and col5?

 If I can’t do it directly, can you provide me with an alternate solution?



 Thank you.



 *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
 *Sent:* Wednesday, July 30, 2014 11:37 AM
 *To:* user@hive.apache.org
 *Subject:* Re: SELECT specific data



 you mean just by writing query then I think no.



 But if you want to read only the first 3 columns of the data then it would
 work with just a single table and a LOAD DATA into it



 On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 I am interested in selecting specific data from a source and loading it to
 a table. For example, if I have 5 columns in my dataset, I want to load 3
 columns of it. Is it possible to do it without create a second table?



 Thank you






 --
 Nitin Pawar






 --
 Devopam Mittra
 Life and Relations are not binary





-- 
Nitin Pawar


Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
sorry hit send too soon ..
I mean without creating intermediate tables, in hive you can process the
file directly


On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar nitinpawar...@gmail.com
wrote:

 With hive, without creating a table with full data, you can do
 intermediate processing like select only few columns and write into another
 table,

 If this is something one time then you can take a look at awk or cut
 commands in linux and generate those files only.


 On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

  I am only using Hive and hadoop, nothing more.



 *From:* Devopam Mittra [mailto:devo...@gmail.com]
 *Sent:* Wednesday, July 30, 2014 12:15 PM

 *To:* user@hive.apache.org
 *Subject:* Re: SELECT specific data



 Are you using any tool to load data ? If yes, then the ETL tool will
 provide you such options.

 If not, then please explore unix file processing/external table route.



 On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 Thank you for your reply.



 Consider we have data divided into 5 columns (col1, col2, col3, col4,
 col5).

 So I can’t load directly col1, col3 and col5?

 If I can’t do it directly, can you provide me with an alternate solution?



 Thank you.



 *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
 *Sent:* Wednesday, July 30, 2014 11:37 AM
 *To:* user@hive.apache.org
 *Subject:* Re: SELECT specific data



 you mean just by writing query then I think no.



 But if you want to read only the first 3 columns of the data then it would
 work with just a single table and a LOAD DATA into it



 On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 I am interested in selecting specific data from a source and loading it
 to a table. For example, if I have 5 columns in my dataset, I want to load
 3 columns of it. Is it possible to do it without create a second table?



 Thank you






 --
 Nitin Pawar






 --
 Devopam Mittra
 Life and Relations are not binary





 --
 Nitin Pawar




-- 
Nitin Pawar


Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
Please check another mail I sent right after that;
my bad, I hit the send button too soon without reading the mail.

I will rephrase:

In Hive, to process the data you will need the table created and the data
loaded into the table.
You cannot process a file without loading it into a table.

If you want to do that and do not want to create a temporary table in Hive
with the full columns from the file, then the options available to you are:
1) simple unix tools like awk or sed or cut (a sketch follows below)
2) write a pig script
3) write your own mapreduce code
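
For option 1, a sketch that keeps columns 1, 3 and 5 of a comma-delimited
file before loading it:

cut -d',' -f1,3,5 input.csv > input_3cols.csv
# or, equivalently, with awk:
awk -F',' 'BEGIN{OFS=","} {print $1,$3,$5}' input.csv > input_3cols.csv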



On Wed, Jul 30, 2014 at 3:09 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  “With hive, without creating a table with full data, you can do
 intermediate processing like select only few columns and write into another
 table”. How can I do this process?



 Thank you alot!



 *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
 *Sent:* Wednesday, July 30, 2014 12:37 PM

 *To:* user@hive.apache.org
 *Subject:* Re: SELECT specific data



 sorry hit send too soon ..

 I mean without creating intermediate tables, in hive you can process the
 file directly



 On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 With hive, without creating a table with full data, you can do
 intermediate processing like select only few columns and write into another
 table,



 If this is something one time then you can take a look at awk or cut
 commands in linux and generate those files only.



 On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 I am only using Hive and hadoop, nothing more.



 *From:* Devopam Mittra [mailto:devo...@gmail.com]
 *Sent:* Wednesday, July 30, 2014 12:15 PM


 *To:* user@hive.apache.org
 *Subject:* Re: SELECT specific data



 Are you using any tool to load data ? If yes, then the ETL tool will
 provide you such options.

 If not, then please explore unix file processing/external table route.



 On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 Thank you for your reply.



 Consider we have data divided into 5 columns (col1, col2, col3, col4,
 col5).

 So I can’t load directly col1, col3 and col5?

 If I can’t do it directly, can you provide me with an alternate solution?



 Thank you.



 *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
 *Sent:* Wednesday, July 30, 2014 11:37 AM
 *To:* user@hive.apache.org
 *Subject:* Re: SELECT specific data



 you mean just by writing query then I think no.



 But if you want to read only the first 3 columns of the data then it would
 work with just a single table and a LOAD DATA into it



 On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah 
 abdallah.cheb...@murex.com wrote:

 Hello,



 I am interested in selecting specific data from a source and loading it to
 a table. For example, if I have 5 columns in my dataset, I want to load 3
 columns of it. Is it possible to do it without create a second table?



 Thank you






 --
 Nitin Pawar






 --
 Devopam Mittra
 Life and Relations are not binary






 --
 Nitin Pawar





 --
 Nitin Pawar


Re: Hive Data

2014-07-30 Thread Nitin Pawar
Hive reads files by the input format defined in the table schema.

By default it reads TextFile, in which columns are separated by the CTRL+A
character.

If you have a CSV file then you can use a CSV SerDe;
there are lots of such file formats.

What does your file look like?



On Wed, Jul 30, 2014 at 5:54 PM, CHEBARO Abdallah 
abdallah.cheb...@murex.com wrote:

  Hello,



 I am interested in testing Hive with a huge sample data. Does Hive read
 all data types? Should the file be a table?



 Thank you





-- 
Nitin Pawar


Re: Exception in Hive with SMB join and Parquet

2014-07-30 Thread Nitin Pawar
:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at 
 org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:96)
   at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
   at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79)
   at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66)
   at 
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:471)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
   ... 18 more

 Looks like it is trying to access the column with index 29, whereas there 
 are only 5 non-null columns present in the row - which matches the 
 ArrayList size.

 What could be going wrong here?


 Thanks

 Suma






-- 
Nitin Pawar


Re: UDTF

2014-07-28 Thread Nitin Pawar
Do you want to know how Hive initializes a UDTF, or how to build a UDTF? (If
the former, org.apache.hadoop.hive.ql.exec.UDTFOperator is probably the place
to start looking.)


On Tue, Jul 29, 2014 at 1:30 AM, Doug Christie doug.chris...@sas.com
wrote:

  Can anyone point me to the source code in hive where the calls to
 initialize, process and forward in a UDTF are made? Thanks.



 Doug






-- 
Nitin Pawar


Re: Drop Partition by ID

2014-07-21 Thread Nitin Pawar
you can try with like statement
On 21 Jul 2014 19:32, fab wol darkwoll...@gmail.com wrote:

 Hi everyone,

 I have the following problem: I have a partitioned managed table (the
 partition column is a string which represents a date, e.g.
 log_date=2014-07-15). Unfortunately there is one partition in there like
 this: log_date=2014-07-15-23%3A45%3A38 (copied from the show partitions
 stmt). This partition most likely got created by a wrong script (which is
 fixed).

 Now i want to delete this partition, but it doesn't work:

- alter table ... drop partition
(log_date='2014-07-15-23%3A45%3A38') gives no error, but the partition
still exists afterwards
- I tried escaping the %-signs with backslashes but no luck with that
- I delete the directory in the HDFS and run msck repair table
afterwards. It recognizes that the folder is missing but is not deleting
the metadata

 So what can I do to get rid of the metadata? My next guess would be to go
 directly to the metastore DB and delete the metadata there. But what
 exactly has to be deleted? I guess there are several dependencies.

 Other idea: is there a possibility in Hive to delete a partition by a
 unique ID or something like that?

 Or what is needed to delete the table with the normal alter table drop
 partition command?

 Cheers
 Wolli



Re: difference between partition by and distribute by in rank()

2014-07-11 Thread Nitin Pawar
In general principle,
distribute by ensures each of N reducers gets non-overlapping ranges of X,
but doesn't sort the output of each reducer. You end up with N or more
unsorted files with non-overlapping ranges. So this is more of a horizontal
distribution of data.

In my view,
Partition by is more based on values so its vertical distribution of data.

I may be wrong in understanding this
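
One way to check on your own data is to run both forms side by side and
compare the output; a sketch against the part table from the Hive windowing
tests:

SELECT p_mfgr, p_name,
       rank() OVER (PARTITION BY p_mfgr ORDER BY p_name)  AS r_partition,
       rank() OVER (DISTRIBUTE BY p_mfgr SORT BY p_name)  AS r_distribute
FROM part;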




On Fri, Jul 11, 2014 at 1:38 PM, Eric Chu e...@rocketfuel.com wrote:

 Does anyone know what

 *rank() over(distribute by p_mfgr sort by p_name) *

 does exactly and how it's different from

 *rank() over(partition by p_mfgr order by p_name)*?

 Thanks,

 Eric




-- 
Nitin Pawar


Re: Error while renaming Partitioned column name

2014-07-09 Thread Nitin Pawar
whats your table DDL?


On Wed, Jul 9, 2014 at 11:03 PM, Manish Kothari manish.koth...@vonage.com
wrote:

  Thanks Dipesh.



 Here is what I tried : -



 ALTER TABLE siplogs_partitioned PARTITION
 (pcol1='str_hour',pcol2='str_date') RENAME TO PARTITION
 (pcol1='call_hour',pcol2='call_date');



 When I run the above command I am getting the error below : -



 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.DDLTask. str_date not found in table's
 partition spec: {pcol1=str_hour, pcol2=str_date}



 Am I missing something here?



 Thanks,

 Manish





 *From:* D K [mailto:deepe...@gmail.com]
 *Sent:* Wednesday, July 09, 2014 12:38 PM
 *To:* user@hive.apache.org
 *Subject:* Re: Error while renaming Partitioned column name



 Here is an example:
 ALTER TABLE alter_rename_partition PARTITION (pCol1='old_part1',
 pcol2='old_part2') RENAME TO PARTITION (pCol1='new_part1',
 pcol2='new_part2');



 On Wed, Jul 9, 2014 at 9:20 AM, Manish Kothari manish.koth...@vonage.com
 wrote:

 Hi,



 I have a table named siplogs_partitioned which is partitioned by columns
 str_date (DATE) and str_hour (INT). I want to rename the partitioned columns
 to call_date and call_hour.



 I am using the below command to alter the partitioned column name: -



 ALTER TABLE siplogs_partitioned PARTITION str_date RENAME TO PARTITION
 call_date;



 When I run the above command I am getting an error : -



 FAILED: ParseException line 1:12 cannot recognize input near
 'siplogs_partitioned' 'PARTITION' 'str_date' in alter table partition
 statement



 Is the “ALTER TABLE” usage correct to rename the partitioned column names?



 Any pointer or help is appreciated.



 Thanks,

 Manish






-- 
Nitin Pawar


Re: Hive metastore error

2014-06-26 Thread Nitin Pawar
Is your Hive metastore service running?
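
A quick way to check and (re)start it (9083 is the default metastore thrift
port; adjust if you changed it):

netstat -ln | grep 9083        # is anything listening on the metastore port?
hive --service metastore &     # start it in the background if not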


On Thu, Jun 26, 2014 at 2:11 PM, Rishabh Bhardwaj rbnex...@yahoo.com
wrote:

 HI all,
 I have changed my hive metastore to mysql using the steps described here

 http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html

 Now when I am running any hive command on cli like show databases or show
 tables , It gives me the following error:

 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable
 to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient


 This is related to hive metastore only.
 Can anyone please help me out with this.

 Thanks,
 Rishabh




-- 
Nitin Pawar


edit permissions to wiki

2014-06-24 Thread Nitin Pawar
Hi,

can someone add me to hive wiki editors?

My userid is : nitinpawar432

-- 
Nitin Pawar


Re: how to load json with nested array into hive?

2014-06-23 Thread Nitin Pawar
I think you can just take a look at the JSON SerDe.

It does take care of nested JSON documents (though you will need to know the
entire JSON structure upfront).

Here is an example of using it:
http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
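
A sketch for your batch payload with that SerDe (class name as in the linked
post; the struct below lists only a few of your fields, and the jar name is a
placeholder - per that SerDe's docs a mapping property is used because
timestamp is a reserved word):

add jar hive-json-serde.jar;

CREATE EXTERNAL TABLE mdmp_events (
  requestTimestamp STRING,
  batch ARRAY<STRUCT<ts:STRING, requestId:STRING, sessionId:STRING,
                     event:STRING, userId:STRING, action:STRING>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.ts" = "timestamp")
LOCATION '/input-api/';

-- one row per event in the batch:
SELECT e.event, e.userId, e.ts
FROM mdmp_events LATERAL VIEW explode(batch) b AS e;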




On Mon, Jun 23, 2014 at 2:28 PM, Christian Link christian.l...@mdmp.com
wrote:

 Hi Jerome,

 thanks...I've already found Brickhouse and the Hive UDFs, but it didn't
 help.

 Today I'll try again to process the json file after going through all my
 mails...maybe I'll find a solution.

 Best,
 Chris


 On Fri, Jun 20, 2014 at 7:16 PM, Jerome Banks jba...@tagged.com wrote:

 Christian,
Sorry to spam this newsgroup, and this is not a commercial
 endorsement, but check out the Hive UDFs in the Brickhouse project (
 http://github.com/klout/brickhouse ) (
 http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/
 )

 You can convert arbitrarily complex Hive structures to and from JSON with
 its to_json and from_json UDFs. See the blog posting for an explanation.

 -- jerome


 On Fri, Jun 20, 2014 at 8:26 AM, Christian Link christian.l...@mdmp.com
 wrote:

 hi,

 I'm very, very new to Hadoop, Hive, etc. and I have to import data into
 hive tables.

 Environment: Amazon EMR, S3, etc.

 The input file is on S3 and I copied it into my HDFS.

 1. flat table with one column and loaded data into it:

   CREATE TABLE mdmp_raw_data (json_record STRING);
   LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO
 TABLE `mdmp_raw_data`;
 That worked, I can access some data, like this:

 SELECT d.carrier, d.language, d.country
   FROM mdmp_raw_data a LATERAL VIEW json_tuple(a.data,
 'requestTimestamp', 'context') b AS requestTimestamp, context
   LATERAL VIEW json_tuple(b.context, 'locale') c AS locale
   LATERAL VIEW json_tuple(c.locale, 'carrier', 'language', 'country') d
 AS carrier, language, country
   LIMIT 1;

 Result: o2 - de Deutsch Deutschland

 I can also select the array at once:

 SELECT b.requestTimestamp, b.batch
   FROM mdmp_raw_data a
   LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'batch') b AS
 requestTimestamp, batch
   LIMIT 1;
 This will give me:

  
 [{"timestamp":"2014-06-19T14:25:18+02:00","requestId":"2ca08247-5542-4cb4-be7e-4a8574fb77a8","sessionId":"f29ec175ca6b7d10","event":"TEST Doge Comments","userId":"doge96514016ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Much joy."}}, ...]

 This batch may contain n events will a structure like above.

 I want to put all events in a table where each element will be stored
 in a unique column: timestamp, requestId, sessionId, event, userId, action,
 context, properties

 2. Explode the batch: I read a lot about SerDe, etc. - but I don't get
 it.

 - I tried to create a table with an array and load the data into it -
 several errors
 - used explode in the query but it doesn't accept batch as an array
 - integrated several SerDes but get things like unknown function jspilt
 - I'm lost in too many documents, howtos, etc. and could use some
 advice...

 Thank you in advance!

 Best, Chris






-- 
Nitin Pawar


Re: hive variables

2014-06-22 Thread Nitin Pawar
perfect


On Sun, Jun 22, 2014 at 11:48 AM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Thanks Nitin, I've added that information to the wiki on the Variable
 Substitution page
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution#LanguageManualVariableSubstitution-SubstitutionDuringQueryConstruction
 .

 Please check my wording and let me know if revisions are needed.

 -- Lefty


 On Fri, Jun 20, 2014 at 5:17 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 hive variables are not replaced on mapreduce jobs but when the query is
 constructed with the variable.

 if you are running two different hive sessions, the variables will not be
 mixed.

 If you are setting variables with same name in same hive session then the
 last set value will be picked


 On Fri, Jun 20, 2014 at 2:44 PM, Bogala, Chandra Reddy 
 chandra.bog...@gs.com wrote:

 How do Hive variables work if I have multiple Hive jobs running
 simultaneously? Will they end up picking up values from each other?

 In automation I am constructing an HQL file by prepending it with some
 SET statements. I want to make sure if I submit two jobs at the same time
 that use the same variable names, one job won't pick up values from the
 other job.



 Same question from stakeoverflow:
 http://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts



 Thanks,

 Chandra






 --
 Nitin Pawar





-- 
Nitin Pawar


Re: hive variables

2014-06-20 Thread Nitin Pawar
Hive variables are not replaced in the mapreduce jobs, but when the query is
constructed with the variable.

If you are running two different hive sessions, the variables will not be
mixed.

If you are setting variables with the same name in the same hive session,
then the last set value will be picked.


On Fri, Jun 20, 2014 at 2:44 PM, Bogala, Chandra Reddy 
chandra.bog...@gs.com wrote:

 How do Hive variables work if I have multiple Hive jobs running
 simultaneously? Will they end up picking up values from each other?

 In automation I am constructing an HQL file by prepending it with some SET
 statements. I want to make sure if I submit two jobs at the same time that
 use the same variable names, one job won't pick up values from the other
 job.



 Same question from stakeoverflow:
 http://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts



 Thanks,

 Chandra






-- 
Nitin Pawar


Re: mismatched input 'EOF' expecting FROM near 'CURRENT_TIME' in from clause

2014-06-19 Thread Nitin Pawar
Please take a look at hive's query language support.

It is SQL-like, but not fully SQL compliant.
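
As far as I know CURRENT_TIME is not a Hive built-in, and this Hive version
also insists on a FROM clause; the usual workaround is unix_timestamp()
against any existing table (a sketch; some_table is a placeholder for any
table with rows):

SELECT from_unixtime(unix_timestamp()) FROM some_table LIMIT 1;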


On Thu, Jun 19, 2014 at 7:19 PM, Clay McDonald 
stuart.mcdon...@bateswhite.com wrote:

 Why does this not work?

 hive SELECT CURRENT_TIME;
 MismatchedTokenException(-1!=107)
 at
 org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
 at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1194)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:31423)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:29520)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:29428)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:28968)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:28762)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1238)
 at
 org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
 at
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1000)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at
 org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 FAILED: ParseException line 1:19 mismatched input 'EOF' expecting FROM
 near 'CURRENT_TIME' in from clause




-- 
Nitin Pawar


Re: simple insert query question

2014-06-19 Thread Nitin Pawar
remember, in hive an insert operation is either

1) from a file
2) from another table

hive's underlying storage is hdfs, which is not meant for single-record
kind of stuff (as of now; this will change once hive starts supporting ACID
actions in coming releases)

so:
1) either create a sample file and load the data into the table from the file
2) or create a dummy table and then write an insert into table ... select
from table2 kind of query
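
A sketch of option 2 (here dual is a placeholder for any existing table with
at least one row):

CREATE TABLE test_log (test_time STRING, test_notes STRING);

INSERT INTO TABLE test_log
SELECT from_unixtime(unix_timestamp()), 'THIS IS A TEST' FROM dual LIMIT 1;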


On Thu, Jun 19, 2014 at 7:26 PM, Clay McDonald 
stuart.mcdon...@bateswhite.com wrote:

  What about if I wanted to run this in hive,



 create table test_log (test_time timestamp, test_notes varchar(60));



 insert into table test_log values(now(),'THIS IS A TEST');





 *From:* Nishant Kelkar [mailto:nishant@gmail.com]
 *Sent:* Thursday, June 19, 2014 9:29 AM
 *To:* user@hive.apache.org; Clay McDonald
 *Subject:* Re: simple insert query question



 Hey Stuart,

 As far as I know, files in HDFS are immutable. So I would think that your
 query below would not have a direct Hive conversion.

 What you can do though, is create a local text file and then create an
 EXTERNAL TABLE on top of that. Then, instead of your INSERT query, just use
 some linux command to append a line to the text file. It will automatically
 reflect in your external Hive table! :)

 To understand what Hive external tables are and how to create them, I'd
 just go on the Hive wiki page.

 Good luck!

 Best,
 Nishant

 On Jun 19, 2014 6:17 AM, Clay McDonald stuart.mcdon...@bateswhite.com
 wrote:

  hi all,

 how do I write the following query to insert a note with a current system
 timestamp?

 I tried the following;


 INSERT INTO TEST_LOG VALUES (unix_timestamp(),'THIS IS A TEST.');

 thanks, Clay




-- 
Nitin Pawar


Re: Storing and reading XML files in HIVE

2014-06-06 Thread Nitin Pawar
see if this can help you

https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources
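
A sketch using that SerDe (class and property names as on the linked wiki;
the element and column names here are made up for illustration):

CREATE TABLE xml_books (title STRING, author STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.title"  = "/book/title/text()",
  "column.xpath.author" = "/book/author/text()"
)
STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start" = "<book", "xmlinput.end" = "</book>");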


On Fri, Jun 6, 2014 at 3:25 PM, Knowledge gatherer 
knowledge.gatherer@gmail.com wrote:

 You need to have a custom SerDe in Hive to read the XML files.


 On Fri, Jun 6, 2014 at 2:58 PM, Yu Azuryy azuryy@gmail.com wrote:

 AFAIK, Hive doesn't provide XMLInputFormat , so you had to write it by
 yourself.


 On Fri, Jun 6, 2014 at 5:23 PM, Ramasubramanian Narayanan 
 ramasubramanian.naraya...@gmail.com wrote:

 Dear All,

 Request your help to guide how to store and read XML data in HIVE.

 while querying it should look as if we are having txt format file under
 HIVE (it is fine if we use view to parse the XML and show).

 Have gone through some sites but not able to figure out correctly.. few
 are mentioning that we need use some JAR's to achieve it...


 Thanks in advance,
 Rams






-- 
Nitin Pawar


Re: Python version compatibility for hive 0.13

2014-05-21 Thread Nitin Pawar
do you mean the python hiveserver client library?

I would recommend upgrading to at least Python 2.6


On Wed, May 21, 2014 at 9:54 PM, Hari Rajendhran hari.rajendh...@tcs.comwrote:

 Hi Team,

 Does Python 2.4.3 supports apache hive 0.13 version ?



 Best Regards
 Hari Krishnan Rajendhran
 Hadoop Admin
 DESS-ABIM ,Chennai BIGDATA Galaxy
 Tata Consultancy Services
 Cell:- 9677985515
 Mailto: hari.rajendh...@tcs.com
 Website: http://www.tcs.com
 




-- 
Nitin Pawar


Re: Connecting hive to SAP BO

2014-05-19 Thread Nitin Pawar
another option would be add jar /path/to/serde/jar/file;
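
Note that add jar only affects the session issuing it; since BO talks to
HiveServer over ODBC, the jar also has to be visible on the server side, e.g.
via hive.aux.jars.path in hive-site.xml (the path below is a placeholder):

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/lib/csv-serde.jar</value>
</property>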


On Tue, May 20, 2014 at 10:45 AM, Shengjun Xin s...@gopivotal.com wrote:

 hive –auxpath /path-to-/csvserde.jar


 On Tue, May 20, 2014 at 12:59 PM, Chhaya Vishwakarma 
 chhaya.vishwaka...@lntinfotech.com wrote:



 Hi,

 I have connected SAP BO to Hive using an ODBC driver. I am able to see the
 database and table in Hive, but when I fetch data from Hive it gives an
 error as follows:

 org.apache.hadoop.hive.serde2.SerDeException: SerDe
 com.bizo.hive.serde.csv.CSVSerde does not exist

 Can anyone suggest where I should put the csvserde jar for SAP BO?





 Regards,

 Chhaya Vishwakarma



 --
 The contents of this e-mail and any attachment(s) may contain
 confidential or privileged information for the intended recipient(s).
 Unintended recipients are prohibited from taking action on the basis of
 information in this e-mail and using or disseminating the information, and
 must notify the sender and delete it from their system. LT Infotech will
 not accept responsibility or liability for the accuracy or completeness of,
 or the presence of any virus or disabling code in this e-mail




 --
 Regards
 Shengjun




-- 
Nitin Pawar


Re: hive query to select top 10 product of each subcategory and select most recent product info

2014-04-11 Thread Nitin Pawar
maybe you can share your table ddl, your query and what output you are
looking for


On Fri, Apr 11, 2014 at 12:26 PM, Mohit Durgapal durgapalmo...@gmail.comwrote:

 I have a hive table partitioned by dates. It contains ecomm data in the
 format siteid,sitecatid,catid,subcatgid,pid,pname,pprice,pmrp,pdesc



 What I need to do is to run a query on table above in hive for top 10
 products(count wise) in each sub category. What adds a bit more complexity
 is that I need all the information of the product. Now when I do group by
 with only subcatg,pid, I can only select the same fields. But I want all
 the data for that product coming in the same row as subcatg  prodid like
 prodname, proddesc,price, mrp,imageurl. And since some information like
 price  proddesc of a product keep on changing I want to pick the latest
 column values(according to a date field) for a pid if we are able to do a
 group by on subcatg,pid.


 I am not able to find a solution to my problem in hive. Any help would be
 much appreciated.


 Regards
 Mohit




-- 
Nitin Pawar


Re: hive query to select top 10 product of each subcategory and select most recent product info

2014-04-11 Thread Nitin Pawar
Would it be a good idea to just get the top 10 ranked products by whatever
your ranking is based on, and then join them with their metadata (a self
join or any other way)?
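
A sketch of that shape, reusing the column names from your DDL (the
top-10-per-subcategory cut still needs your rank() trick on the counted
subquery; this only shows the join back to the latest metadata):

SELECT c.scatg, c.pid, c.displays, l.pname, l.price, l.mrp, l.prURL
FROM (SELECT scatg, pid, COUNT(*) AS displays
      FROM user_logs
      GROUP BY scatg, pid) c
JOIN (SELECT a.*
      FROM user_logs a
      JOIN (SELECT pid, MAX(last_updated) AS mx
            FROM user_logs
            GROUP BY pid) m
      ON (a.pid = m.pid AND a.last_updated = m.mx)) l
ON (c.pid = l.pid);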


On Fri, Apr 11, 2014 at 1:52 PM, Mohit Durgapal durgapalmo...@gmail.comwrote:

 Hi Nitin,

 The ddl is as follows:

 CREATE EXTERNAL TABLE user_logs(
 users_iduuidstring,
 siteid  int,
 site_catid  int,
 stext   string,
 catgint,   // CATEGORY
 scatg   int, // SUBCATEGORY
 catgnamestring,
 scatgname   string,
 brand   string,// PRODUCT BRAND NAME
 prrange string,
 currint,
 pname   string, // product name
 pid int,  // product ID
 price   string,  //Product Price
 prodnbr int,
 mrp string,  //MRP
 prURL string, //Product url
 prIMGURL string, //Product Image URL
 opr string,
 oid string,
 txsucc  string,
 last_updatedstring //timestamp
 )

 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','


 I am looking for an output where I have the top 10 products from each
 subcategory (on the basis of count) with all their information like product
 name, price, url, imgurl. And there will be multiple entries for the same
 products (pids) within the same subcategory; in that case I have to pick
 the product info that is latest (by the last_updated field).


 I have written a query but it is considering multiple entries of a
 product as different products if the price or any other info changes for
 that product.



 select siteid,site_catid,catg,scatg,COLLECT_SET(PRODDESC) from
 (
   select
 PRODDESC,displays,siteid,site_catid,catg,scatg,rank(siteid,site_catid,catg,scatg)
 as row_number from
   (
   select count(*) as
 displays,siteid,site_catid,catg,scatg,CONCAT('{','pname:',pname,',price:',price,',','mrp:',mrp,',curr:',curr,',pid:',pid,'}')
 as PRODDESC from
   user_logs group by siteid,site_catid,catg,scatg,pid,pname,price,mrp,curr
 order by siteid,site_catid,catg,scatg,displays desc
   ) A
   ) B
 WHERE row_number < 10
 group by siteid,site_catid,catg,scatg
 order by siteid,site_catid,catg,scatg desc;

 The rank() method simply helps in fetching the top 10 within a subcategory.
 Every time it encounters the same combination of
 siteid,site_catid,catg,scatg it increments row_number, going up to 10.

 The problem above is that I am forced to put product info such as
 pname, price, mrp in the group by clause, otherwise I will not be able
 to get that information in the select. Therefore, even if someone changes
 just the price of a product (this happens very frequently) it is considered
 a different product by the above query. And that is something I don't want.

 I hope I have made it a little more clear?  Thanks for your reply :)



 On Fri, Apr 11, 2014 at 12:45 PM, Nitin Pawar nitinpawar...@gmail.comwrote:

 maybe you can share your table ddl, your query and what output you are
 looking for


 On Fri, Apr 11, 2014 at 12:26 PM, Mohit Durgapal durgapalmo...@gmail.com
  wrote:

 I have a hive table partitioned by dates. It contains ecomm data in the
 format siteid,sitecatid,catid,subcatgid,pid,pname,pprice,pmrp,pdesc



 What I need to do is to run a query on table above in hive for top 10
 products(count wise) in each sub category. What adds a bit more complexity
 is that I need all the information of the product. Now when I do group by
 with only subcatg,pid, I can only select the same fields. But I want all
 the data for that product coming in the same row as subcatg  prodid like
 prodname, proddesc,price, mrp,imageurl. And since some information like
 price  proddesc of a product keep on changing I want to pick the latest
 column values(according to a date field) for a pid if we are able to do a
 group by on subcatg,pid.


 I am not able to find a solution to my problem in hive. Any help would
 be much appreciated.


 Regards
 Mohit




 --
 Nitin Pawar





-- 
Nitin Pawar


Re: HIVE UDF Error

2014-04-09 Thread Nitin Pawar
Can you put first few lines of your code here or upload code on github and
share the link?




On Wed, Apr 9, 2014 at 11:59 AM, Rishabh Bhardwaj rbnex...@yahoo.comwrote:

 Hi all,
 I have done the following steps to create a UDF in hive but getting
 error.Please help me.
 1. Created the UDF as described here:
 http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html
 2. Compiled it successfully.
 3. Copy the class file to a directory hiveudfs.
 4. Added it to a jar with this command: jar -cf hiveudfs.jar
 hiveudfs/SimpleUDFExample.class
 5. Import the jar into hive. add jar hiveudfs.jar;  (Added Successfully)
 create temporary function helloworld as 'hiveudfs.SimpleUDFExample';
 At this I am getting the following error,
 hive create temporary function helloworld as 'hiveudfs.SimpleUDFExample';
 java.lang.NoClassDefFoundError: hiveudfs/SimpleUDFExample (wrong name:
 SimpleUDFExample)
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:266)
 at
 org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105)
 at
 org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
 at
 org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1353)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1137)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:867)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 FAILED: Execution Error, return code -101 from
 org.apache.hadoop.hive.ql.exec.FunctionTask
 Thanks,
 Rishabh.




-- 
Nitin Pawar


Re: HIVE UDF Error

2014-04-09 Thread Nitin Pawar
Your class is missing a package declaration in your code.

what you need to do is
define a package, something like

package org.apache.hadoop.hive.ql.udf;

then your function definition becomes

CREATE TEMPORARY FUNCTION function_name AS
'org.apache.hadoop.hive.ql.udf.ClassName';

feel free to use any package name you wish, but make sure it is reflected
the same in the jar layout and the CREATE FUNCTION statement

also, to build, compile and package hive udfs,
use the shell script if you are on linux:

http://yaboolog.blogspot.in/2011/06/compiling-original-hive-udf.html
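
Concretely, for the class quoted below, a sketch of the fix (the package name
matches the jar directory, and the class is made public so Hive can
instantiate it; classpath details depend on your install):

package hiveudfs;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class SimpleUDFExample extends UDF {
  public Text evaluate(Text input) {
    if (input == null) return null;
    return new Text("Hello " + input.toString());
  }
}

# compile and package so the directory matches the package name:
javac -cp "$(hadoop classpath):$HIVE_HOME/lib/hive-exec.jar" hiveudfs/SimpleUDFExample.java
jar cf hiveudfs.jar hiveudfs/SimpleUDFExample.class

-- then in hive:
add jar hiveudfs.jar;
create temporary function helloworld as 'hiveudfs.SimpleUDFExample';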



On Wed, Apr 9, 2014 at 12:12 PM, Rishabh Bhardwaj rbnex...@yahoo.comwrote:

 Hi Nitin,
 Thanks for the concern.
 Here is the code of the UDF,
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.Text;


 @Description(
   name=SimpleUDFExample,
   value=returns 'hello x', where x is whatever you give it (STRING),
   extended=SELECT simpleudfexample('world') from foo limit 1;
   )
 class SimpleUDFExample extends UDF {

   public Text evaluate(Text input) {
 if(input == null) return null;
 return new Text(Hello  + input.toString());
   }
 }
 From google I came across a blog.
 I have taken this from here (git link):
 https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/SimpleUDFExample.java

   On Wednesday, 9 April 2014 12:08 PM, Nitin Pawar 
 nitinpawar...@gmail.com wrote:
  Can you put first few lines of your code here or upload code on github
 and share the link?




 On Wed, Apr 9, 2014 at 11:59 AM, Rishabh Bhardwaj rbnex...@yahoo.comwrote:

 Hi all,
 I have done the following steps to create a UDF in hive but getting
 error.Please help me.
 1. Created the UDF as described here:
 http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html
 2. Compiled it successfully.
 3. Copy the class file to a directory hiveudfs.
 4. Added it to a jar with this command: jar -cf hiveudfs.jar
 hiveudfs/SimpleUDFExample.class
 5. Import the jar into hive. add jar hiveudfs.jar;  (Added Successfully)
 create temporary function helloworld as 'hiveudfs.SimpleUDFExample';
 At this I am getting the following error,
 hive create temporary function helloworld as 'hiveudfs.SimpleUDFExample';
 java.lang.NoClassDefFoundError: hiveudfs/SimpleUDFExample (wrong name:
 SimpleUDFExample)
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:266)
 at
 org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105)
 at
 org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
 at
 org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1353)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1137)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:867)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 FAILED: Execution Error, return code -101 from
 org.apache.hadoop.hive.ql.exec.FunctionTask
 Thanks,
 Rishabh.




 --
 Nitin Pawar





-- 
Nitin Pawar


Re: HIVE UDF Error

2014-04-09 Thread Nitin Pawar
Follow the steps exactly as they are in the link I shared .. it works

Somehow your package is getting messed up and Hive is not able to find the
class
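
As a concrete check, the commands below are an illustrative sketch (assuming the package from the mail quoted below and a source file in the current directory). The class file must sit in a directory tree that mirrors the package before it is jarred:

mkdir -p classes
javac -cp "$(hadoop classpath):$HIVE_HOME/lib/*" -d classes SimpleUDFExample.java
jar -cf hiveudfs.jar -C classes .

# The listing must show rishabh/udf/hive/SimpleUDFExample.class,
# not SimpleUDFExample.class at the root of the jar.
jar -tf hiveudfs.jar

If jar -tf shows the class at the jar root, Hive fails with exactly the "not found" error quoted below.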


 On Wed, Apr 9, 2014 at 12:27 PM, Rishabh Bhardwaj rbnex...@yahoo.com wrote:

 I added,
 package rishabh.udf.hive;
 in the above code.
 and repeated the steps.
 But now I am getting the following error,

 hive> create temporary function helloworld as
 'rishabh.udf.hive.SimpleUDFExample';
 FAILED: Class rishabh.udf.hive.SimpleUDFExample not found
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.FunctionTask

 The SimpleUDFExample.class file is in the hiveudfs.jar file.



Re: Can I update just one row in Hive table using Hive INSERT OVERWRITE

2014-04-04 Thread Nitin Pawar
For non-partitioned columns, the answer in one word: NO.

Detailed answer: this feature is still being built as part of
https://issues.apache.org/jira/browse/HIVE-5317
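
Until that feature lands, the common workaround is to rewrite the whole table, carrying unchanged rows through and substituting new values only where the predicate matches. A sketch, using the table and columns from the mail quoted below, with 'xyz' as a purely illustrative replacement value:

-- Every row is rewritten: matching rows get the new value,
-- all other rows pass through unchanged.
INSERT OVERWRITE TABLE tablename
SELECT col1,
       CASE WHEN col2 = 'abc' THEN 'xyz' ELSE col2 END AS col2,
       col3
FROM tablename;

Note that the query quoted below would keep only the rows where col2 = 'abc' and silently drop every other row, since INSERT OVERWRITE replaces the table contents with exactly what the SELECT returns.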


On Sat, Apr 5, 2014 at 2:28 AM, Raj Hadoop hadoop...@yahoo.com wrote:


 Can I update (delete-and-insert kind of) just one row, keeping the
 remaining rows intact, in a Hive table using Hive INSERT OVERWRITE? There
 is no partition in the Hive table.


 INSERT OVERWRITE TABLE tablename SELECT col1,col2,col3 from tabx where
 col2='abc';

 Does the above work? Please advise.





-- 
Nitin Pawar


Re: READING FILE FROM MONGO DB

2014-04-01 Thread Nitin Pawar
You can always write a custom UDF for your needs.
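
For the iterate-over-an-array question quoted below, the loop can live inside a UDF instead of using explode. A minimal illustrative sketch (the class name and the joining behavior are assumptions, not from this thread; for plain string joining the built-in concat_ws already does the same thing):

import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Loops over an array<string> column inside the UDF and joins its
// elements with a separator -- the "for loop" runs in Java instead of
// exploding the array into rows.
public class JoinArrayUDF extends UDF {
  public Text evaluate(List<Text> items, String sep) {
    if (items == null) return null;
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < items.size(); i++) {
      if (i > 0) sb.append(sep);
      if (items.get(i) != null) sb.append(items.get(i).toString());
    }
    return new Text(sb.toString());
  }
}

Registered the usual way (CREATE TEMPORARY FUNCTION join_array AS '...';), it would be called as SELECT join_array(subjects, ',') FROM tablename;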


On Tue, Apr 1, 2014 at 1:35 PM, Swagatika Tripathy
swagatikat...@gmail.comwrote:

 Do we have a for loop concept in Hive to iterate through the array
 elements and display them? We need an alternative to the explode method.
 Well, you can use a JSON SerDe for this.

 Sent from my iPhone

 On Mar 26, 2014, at 8:40 PM, Swagatika Tripathy swagatikat...@gmail.com
 wrote:

 Hi,
 The use case is we have some unstructured data fetched from MongoDB and
 stored in a particular location. Our task is to load that data into our
 staging and core Hive tables in the form of rows and columns, e.g. if the
 data is in key-value pairs like:
 {
 Id: bigint(12346),
 Name:string(ABC),
 Subjects:
 {Subject enrolled:
 Subjects:
 [eng ,math]
 }
 {Game enrolled:
 [Football,cricket]
 }
 This is just a very simple example for reference, but we have a complex
 JSON format with a huge amount of data.

 So, in this case how can we load it into Hive tables and HDFS?
  On Mar 26, 2014 10:59 PM, shouvanik.hal...@accenture.com wrote:

 Are you swagatika mohanty?






 Thanks,
 Shouvanik


 -Original Message-
 From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com]
 Sent: Wednesday, March 26, 2014 10:03 AM
 To: user@hive.apache.org
 Subject: Re: READING FILE FROM MONGO DB

 Hi Swagatika
 You can create external tables over Mongo and process them using Hive.
 The new Mongo connectors have added support for Hive. Did you try that?

 Sent from my iPhone

  On Mar 26, 2014, at 9:59 AM, Swagatika Tripathy 
 swagatikat...@gmail.com wrote:
 
  Hi,
  We have some files stored in MongoDB, mostly in key-value format. We
 need to parse those files and store them into Hive tables.
 
  Any inputs on this will be appreciated.
 
  Thanks,
  Swagatika
 


 





-- 
Nitin Pawar


Re: pig,hive install over hadoop

2014-04-01 Thread Nitin Pawar
Pig and Hive do not come in a bare-minimum version; it is the complete Pig
or Hive package.

You can use an existing Hadoop cluster with Pig and Hive.
If you do not need persistent storage for Hive tables, then you don't need
to configure much.

Search for "hive with derby" and that should get you started.
On the Pig side, just downloading the binaries is good enough. You can
point Pig at your HADOOP_HOME and it should work fine.
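
A rough sketch of that minimal setup; every path and version number below is an illustrative assumption:

# Hive: unpack and run; with no metastore configured it falls back to an
# embedded Derby database created in the current working directory.
export HADOOP_HOME=/usr/local/hadoop        # your existing Hadoop install
tar -xzf apache-hive-1.2.1-bin.tar.gz
export HIVE_HOME=$PWD/apache-hive-1.2.1-bin
$HIVE_HOME/bin/hive

# Pig: the binaries plus HADOOP_HOME are enough; jobs run on the same
# cluster in the default mapreduce mode.
tar -xzf pig-0.14.0.tar.gz
export PIG_HOME=$PWD/pig-0.14.0
$PIG_HOME/bin/pig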


On Tue, Apr 1, 2014 at 3:34 PM, Rahul Singh smart.rahul.i...@gmail.comwrote:

 Hi,
I have installed and configured hadoop. Now, I want to install hive and
 pig, As per my understanding pig and hive internally uses hadoop. So is
 there a way i can just install bare minimum hive or pig and take advantage
 of already installed hadoop or i need to separately install and configure
 complete hive and pig.

 Thanks,
 -Rahul Singh




-- 
Nitin Pawar


Re: MSCK REPAIR TABLE

2014-03-27 Thread Nitin Pawar
Can you grab more logs from the HiveServer2 log file?
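
As an aside on the alter table route mentioned below (partition column names and values here are purely illustrative), one statement can add many partitions, and IF NOT EXISTS makes it safe to re-run after a partial failure:

ALTER TABLE tablename ADD IF NOT EXISTS
  PARTITION (dt='2014-03-27', part='01')
  PARTITION (dt='2014-03-27', part='02')
  PARTITION (dt='2014-03-27', part='03');

So the daily 14-partition update can stay one cheap statement even when MSCK REPAIR TABLE struggles at this partition count.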


On Thu, Mar 27, 2014 at 2:31 PM, fab wol darkwoll...@gmail.com wrote:

 Hey everyone,

 I have a table with currently 5541 partitions. Daily there are 14
 partitions added. I will switch the metastore update from msck repair
 table to alter table add partition, since it performs better, but
 sometimes this might fail, and then I need the msck repair table command.
 But unfortunately it does not seem to work anymore at this table size:

 0: jdbc:hive2://clusterXYZ> use DB_NAME;
 No rows affected (1.082 seconds)
 0: jdbc:hive2://clusterXYZ> set hive.metastore.client.socket.timeout=6000;
 No rows affected (0.029 seconds)
 0: jdbc:hive2://clusterXYZ> MSCK REPAIR TABLE TABLENAME;
 Error: Error while processing statement: FAILED: Execution Error, return
 code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 Error: Error while processing statement: FAILED: Execution Error, return
 code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)

 Has anyone had luck getting this to work? As you can see, I already raised
 the time until the Thrift timeout kicks in, but this error happens even
 before that time runs out ...

 Cheers
 Wolli




-- 
Nitin Pawar

