Re: ORC algorithm skeleton

2016-07-26 Thread praveenesh kumar
Also - http://orc.apache.org/docs/spec-intro.html

On Tue, Jul 26, 2016 at 1:49 PM, praveenesh kumar <praveen...@gmail.com>
wrote:

> This might help  - https://issues.apache.org/jira/browse/HIVE-3874
>
> Regards
> Prav
>
> On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <
> mario.amatu...@vodafone.com> wrote:
>
>> Hello everyone,
>>
>> Does anyone have any idea about the ORC algorithm skeleton? How does it work?
>>
>>
>


Re: ORC algorithm skeleton

2016-07-26 Thread praveenesh kumar
This might help  - https://issues.apache.org/jira/browse/HIVE-3874

Regards
Prav

On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <
mario.amatu...@vodafone.com> wrote:

> Hello everyone,
>
> Does anyone have any idea about the ORC algorithm skeleton? How does it work?
>
>


Capturing HIVE CLI errors

2013-10-08 Thread praveenesh kumar
Hi guys,

I am trying to write client code for the Hive CLI via the ODBC connector, and
I want to add query-validation logic on my client side. I was wondering
whether there is a way to capture Hive query syntax errors that I can use for
validation at my client end.

I don't want to write my own validation code for checking Hive queries. Is it
possible to catch the errors that we usually get at the Hive CLI and surface
them at the client end?
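One approach (a sketch only, not tested against a live cluster) is to let Hive itself do the validation: run `EXPLAIN <query>` through the CLI and capture stderr. Syntax and semantic errors come back with a non-zero exit code as lines starting with `FAILED:` (the exact format is an assumption and may vary by Hive version). The CLI runner is injectable so the parsing logic can be exercised without a Hive installation:

```python
import subprocess

def run_hive_cli(args):
    """Real runner: invokes the Hive CLI. Assumes 'hive' is on PATH."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stderr

def validate_hive_query(query, run_cli=run_hive_cli):
    """Validate a query by EXPLAINing it instead of executing it.
    Returns (ok, error_message)."""
    rc, stderr = run_cli(["hive", "-e", "EXPLAIN " + query])
    if rc == 0:
        return True, ""
    # Keep only the error lines (e.g. "FAILED: ParseException ...");
    # fall back to the full stderr if none match.
    failed = [l for l in stderr.splitlines() if l.startswith("FAILED")]
    return False, "\n".join(failed) if failed else stderr.strip()
```

The same idea should carry over to an ODBC client: submit the EXPLAIN form of the query first and treat any error as a validation failure, without paying for a full query run.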

Regards
Praveenesh


Re: RHive for R

2013-02-24 Thread praveenesh kumar
Can you please elaborate on the issues you are facing with the
configuration?

Regards
Praveenesh

On Sun, Feb 24, 2013 at 4:34 AM, Daniel Mason dma...@bellycard.com wrote:
 Anyone use the RHive connector for R?  I'm having trouble with the 
 configuration steps on Github and was looking for any additional 
 documentation that would be available.

 Thanks.


Re: How to load csv data into HIVE

2012-09-08 Thread praveenesh kumar
You can use Hadoop streaming; that would be much faster. Just run your
cleaning shell-script logic in the map phase and it will be done in just a
few minutes. That also keeps the data in HDFS.
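As a sketch of that idea (assuming the records look like the pipe-terminated, quoted CSV sample later in this thread), the map phase can be a tiny script fed one line at a time on stdin, written here in Python for illustration:

```python
#!/usr/bin/env python
# Hypothetical streaming mapper: strips double quotes and the
# trailing '|' record terminator, emitting clean CSV lines.
import sys

def clean(line):
    return line.strip().rstrip("|").replace('"', "")

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            print(clean(line))
```

Each mapper handles only its own input split, which is where the speed-up over a single sequential shell script on a 12GB file comes from.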

Regards,
Praveenesh

On Fri, Sep 7, 2012 at 8:37 PM, Sandeep Reddy P sandeepreddy.3...@gmail.com
 wrote:

 Hi,
 Thank you all for your help. I'll try both ways and i'll get back to you.


 On Fri, Sep 7, 2012 at 11:02 AM, Mohammad Tariq donta...@gmail.com wrote:

 I said this assuming that a Hadoop cluster is available since Sandeep is
 planning to use Hive. If that is the case then MapReduce would be faster
 for such large files.

 Regards,
 Mohammad Tariq



 On Fri, Sep 7, 2012 at 8:27 PM, Connell, Chuck 
 chuck.conn...@nuance.comwrote:

  I cannot promise which is faster. A lot depends on how clever your
 scripts are.


 *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
 *Sent:* Friday, September 07, 2012 10:42 AM
 *To:* user@hive.apache.org
 *Subject:* Re: How to load csv data into HIVE


 Hi,
 I wrote a shell script to clean the csv data, but when I run it on a 12GB
 csv it takes a long time. If I run a Python script instead, will that be
 faster?

 On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck 
 chuck.conn...@nuance.com wrote:

 How about a Python script that changes it into plain tab-separated text?
 So it would look like this…

  

 174969274<tab>14-mar-2006<tab>3522876<tab><tab>14-mar-2006<tab>50308<tab>65<tab>1<newline>
 etc…

  

 Tab-separated with newlines is easy to read and works perfectly on
 import.

  

 Chuck Connell

 Nuance R&D Data Team

 Burlington, MA

 781-565-4611

  

 *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
 *Subject:* How to load csv data into HIVE

  

 Hi,
 Here is the sample data
 174969274,14-mar-2006,

 3522876,,14-mar-2006,50308,65,1|

 174969275,19-jul-2006,3523154,,19-jul-2006,50308,65,1|

 174969276,31-dec-2005,3530333,,31-dec-2005,50308,65,1|

 174969277,14-apr-2005,3531470,,14-apr-2005,50308,65,1|

 How do I load this kind of data into Hive?
 I'm using a shell script to get rid of the double quotes and the '|'
 record terminators, but it takes a very long time on each csv, and the
 files are 12GB each. What is the best way to do this?

  




 --
 Thanks,
 sandeep





 --
 Thanks,
 sandeep




Re: How to load csv data into HIVE

2012-09-08 Thread praveenesh kumar
Yup, Bejoy is correct :-) Just use Hadoop streaming for what it does best
--- cleaning, transformations and validations --- in just a few simple
steps.
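To make this concrete for the question quoted below ("Hive wants to load files, not individual lines/records"): the streaming job writes its cleaned output as ordinary files into an HDFS directory, and Hive then loads that directory as a whole. A hypothetical mapper that also converts the records to tab-separated form (paths and names here are made up for illustration):

```python
#!/usr/bin/env python
# Hypothetical streaming mapper: pipe-terminated, quoted CSV in,
# plain tab-separated lines out, readable by Hive's default SerDe.
import sys

def to_tsv(record):
    record = record.strip().rstrip("|").replace('"', "")
    return "\t".join(record.split(","))

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            print(to_tsv(line))
```

The job would be launched with Hadoop streaming (`-mapper to_tsv.py`, no reducer, `-output /some/staging/dir`), after which Hive picks up the whole directory with `LOAD DATA INPATH '/some/staging/dir' INTO TABLE mytable;`. So Hive still loads files; streaming only prepares them in parallel.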

Regards,
Praveenesh

On Sat, Sep 8, 2012 at 6:03 PM, Bejoy KS bejoy...@yahoo.com wrote:

 Hi Chuck

 I believe Praveenesh was adding his thoughts to the discussion on
 preprocessing the data using MapReduce itself. If you go with Hadoop
 streaming you can use the Python script as the mapper, and that will do
 the preprocessing in parallel over a large volume of data. The
 preprocessed data can then be loaded into a Hive table.


 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.
 --
 *From: * Connell, Chuck chuck.conn...@nuance.com
 *Date: *Sat, 8 Sep 2012 12:18:33 +
 *To: *user@hive.apache.orguser@hive.apache.org
 *ReplyTo: * user@hive.apache.org
 *Subject: *RE: How to load csv data into HIVE

 I would like to hear more about this Hadoop streaming to Hive idea. I
 have used streaming jobs as mappers, with a Python script as map.py. Are
 you saying that such a streaming mapper can load its output into Hive?
 Can you send some example code? Hive wants to load files, not individual
 lines/records. How would you do this?

 Thanks very much,
 Chuck


  --
 *From:* praveenesh kumar [praveen...@gmail.com]
 *Sent:* Saturday, September 08, 2012 7:54 AM
 *To:* user@hive.apache.org
 *Subject:* Re: How to load csv data into HIVE

  You can use hadoop streaming that would be much faster... Just run your
 cleaning shell script logic in map phase and it will be done in just few
 minutes. That will keep the data in HDFS.

 Regards,
 Praveenesh






Starting hive thrift server as daemon process ?

2012-06-13 Thread praveenesh kumar
Hi Hive users,

I was just wondering why the Hive Thrift server does not run as a daemon
process by default. I also don't see any option to start it as a daemon
process if I want to.
I log in remotely to the Hadoop cluster, and every time my session
disconnects or closes, the Thrift server goes down with it.
I can start it as a background process with nohup or with '&' in Linux,
but I am wondering: is there a reason this feature is not present in Hive?
Would anything go wrong if the Hive Thrift server ran as a daemon process?

Regards,
Praveenesh


Hive UDF error : numberformat exception (String to Integer) conversion

2012-05-30 Thread praveenesh kumar
Hello Hive Users,

There is a strange situation I am facing.

I have a string column in my Hive table (it holds IP addresses). I am
writing a UDF that takes this string column and converts it into a long
value. It's a simple UDF. Following is my code:

package com.practice.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class IPtoINT extends UDF {
    // Convert a dotted-quad IP string to its long value.
    public LongWritable evaluate(Text addr) {
        String[] addrArray = addr.toString().split("\\.");
        long num = 0;
        for (int i = 0; i < addrArray.length; i++) {
            int power = 3 - i;
            num += (Integer.parseInt(addrArray[i]) % 256) * Math.pow(256, power);
        }
        return new LongWritable(num);
    }
}

After creating jar, I am running the following commands:

$ hive
hive> add jar /home/hadoop/Desktop/HiveData/IPtoINT.jar;
hive> create temporary function ip2int as 'com.practice.hive.udf.IPtoINT';
hive> select ip2int(ip1) from sample_data;

But running the above, is giving me the following error:

java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
{ip1:1.0.144.36,ip2:16814116,country:Thailand,key:null}
at
org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native
Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
Error while processing row
{ip1:1.0.144.36,ip2:16814116,country:Thailand,key:null}
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at
org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
execute method public org.apache.hadoop.io.Text
com.musigma.hive.udf.ip2int.evaluate(org.apache.hadoop.io.Text)  on object
com.musigma.hive.udf.ip2int@19a4d79 of class com.musigma.hive.udf.ip2int
with arguments {1.0.144.36:org.apache.hadoop.io.Text} of size 1
at
org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:848)
at
org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
at
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
at
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:824)
... 18 more
Caused by: java.lang.NumberFormatException: For input string: "1"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.<init>(Long.java:702)
at com.musigma.hive.udf.ip2int.evaluate(ip2int.java:11)
... 23 more


If I run the UDF as --- select ip2int('102.134.123.1') from sample_data; ---
it does not give any error.
The strange thing is that it is a NumberFormatException: I am not able to
parse the string into an int in my Java code. Very strange issue.

Can someone please tell me what stupid mistake I am making?

Is there any other UDF that does string to int/long conversion?
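For reference, the arithmetic the UDF is meant to perform can be checked outside Hive; here is a quick Python rendering of the same dotted-quad-to-long conversion (illustration only, not part of the original post):

```python
def ip_to_long(addr):
    """Convert a dotted-quad IP string to its long value,
    mirroring the UDF: each octet weighted by a power of 256."""
    total = 0
    for octet in addr.split("."):
        total = total * 256 + int(octet) % 256
    return total
```

For the failing row above, `ip_to_long('1.0.144.36')` gives 16814116, which matches the ip2 column shown in the error message, so the intended math is right and the failure is somewhere in the parsing path.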

Regards,
Praveenesh


Re: Hive UDF error : numberformat exception (String to Integer) conversion

2012-05-30 Thread praveenesh kumar
I have done both things.
There is no null issue here; I checked for nulls as well, sorry that is not
shown in the code.
I have also written a main function and called my evaluate function from it.
If I pass a string directly, it works fine.

The problem is the NumberFormatException that Integer.parseInt is throwing,
and I don't know why. I am converting Hadoop's Text object to a String,
splitting it into a String array, and parsing each element, and that is
where it fails. What could be the reason? I know this is not really
Hive-related; it is a Java mistake. Please help me resolve this issue. It
sounds embarrassing, but I cannot see the mistake I am making.

Regards,
Praveenesh



On Wed, May 30, 2012 at 8:17 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 I won't tell you the error, but I would recommend writing a main function
 in your udf and trying it with the sample inputs you are expecting in
 your query.

 You will then see the error you are making.


 On Wed, May 30, 2012 at 8:14 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 You should try/catch and return NULL on bad data. The issue is that if
 you have a single bad row, the UDF will throw an exception up the chain.
 It will try again, fail again, and ultimately the job will fail.
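Edward's suggestion, sketched in Python for brevity (the real fix would wrap the Java evaluate body in try/catch and return null on failure):

```python
def ip_to_long_safe(addr):
    """Tolerant conversion: returns None (NULL to Hive) instead of
    raising on malformed input, so one bad row cannot kill the job."""
    try:
        parts = addr.split(".")
        if len(parts) != 4:
            return None
        total = 0
        for octet in parts:
            total = total * 256 + int(octet) % 256
        return total
    except (ValueError, AttributeError):
        return None
```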


Hive starting error

2011-12-30 Thread praveenesh kumar
Hi,

I am using Hive 0.7.1 on Hadoop 0.20.205.
While running hive, it gives me the following error:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.security.UserGroupInformation.login(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
at
org.apache.hadoop.hive.shims.Hadoop20Shims.getUGIForConf(Hadoop20Shims.java:448)
at
org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator.setConf(HadoopDefaultAuthenticator.java:51)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at
org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthenticator(HiveUtils.java:222)
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:219)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:417)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Any Idea on how to resolve this issue ?

Thanks,
Praveenesh


Re: Hive starting error

2011-12-30 Thread praveenesh kumar
How can I resolve this?

Thanks,
Praveenesh

On Fri, Dec 30, 2011 at 3:21 PM, alo alt wget.n...@googlemail.com wrote:

 Hi,

 I think that's a typo; I was grepping 0.20.2-737 and did not find
 Lorg/apache/hadoop/security/UserGroupInformation. It should be
 org/apache/hadoop/security/UserGroupInformation. Please correct me if I
 am wrong.

 - Alex




 --
 Alexander Lorenz
 http://mapredit.blogspot.com

 Think of the environment: please don't print this email unless you
 really need to.