Re: ORC algorithm skeleton
Also - http://orc.apache.org/docs/spec-intro.html

On Tue, Jul 26, 2016 at 1:49 PM, praveenesh kumar <praveen...@gmail.com> wrote:
> This might help - https://issues.apache.org/jira/browse/HIVE-3874
>
> Regards
> Prav
>
> On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <mario.amatu...@vodafone.com> wrote:
>> Hello everyone,
>>
>> Anyone got any idea about the ORC algorithm skeleton? How does it work?
Re: ORC algorithm skeleton
This might help - https://issues.apache.org/jira/browse/HIVE-3874

Regards
Prav

On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <mario.amatu...@vodafone.com> wrote:
> Hello everyone,
>
> Anyone got any idea about the ORC algorithm skeleton? How does it work?
Capturing HIVE CLI errors
Hi guys,

I am trying to write client code for the Hive CLI via the ODBC connector, and I want to add query validation on my client side. Is there a way to capture the Hive query syntax errors so that I can validate queries at the client end? I don't want to write my own validation code for checking Hive queries. In other words, is it possible to catch the errors that we usually see in the Hive CLI and surface them at the client end?

Regards
Praveenesh
Re: RHive for R
Can you please elaborate on the issues you are facing with the configuration?

Regards
Praveenesh

On Sun, Feb 24, 2013 at 4:34 AM, Daniel Mason dma...@bellycard.com wrote:
> Anyone use the RHive connector for R? I'm having trouble with the configuration steps on Github and was looking for any additional documentation that would be available. Thanks.
Re: How to load csv data into HIVE
You can use Hadoop streaming; that would be much faster... Just run your cleaning shell script logic in the map phase and it will be done in just a few minutes. That will also keep the data in HDFS.

Regards,
Praveenesh

On Fri, Sep 7, 2012 at 8:37 PM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

Hi,
Thank you all for your help. I'll try both ways and I'll get back to you.

On Fri, Sep 7, 2012 at 11:02 AM, Mohammad Tariq donta...@gmail.com wrote:

I said this assuming that a Hadoop cluster is available, since Sandeep is planning to use Hive. If that is the case then MapReduce would be faster for such large files.

Regards,
Mohammad Tariq

On Fri, Sep 7, 2012 at 8:27 PM, Connell, Chuck chuck.conn...@nuance.com wrote:

I cannot promise which is faster. A lot depends on how clever your scripts are.

From: Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
Sent: Friday, September 07, 2012 10:42 AM
To: user@hive.apache.org
Subject: Re: How to load csv data into HIVE

Hi,
I wrote a shell script to clean the csv data, but when I run that script on a 12GB csv it takes a long time. If I run a Python script instead, will that be faster?

On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck chuck.conn...@nuance.com wrote:

How about a Python script that changes it into plain tab-separated text? So it would look like this (with "tab" and "newline" standing in for the actual characters):

174969274 tab 14-mar-2006 tab 3522876 tab tab 14-mar-2006 tab 50308 tab 65 tab 1 newline
etc...

Tab-separated with newlines is easy to read and works perfectly on import.

Chuck Connell
Nuance R&D Data Team
Burlington, MA
781-565-4611

From: Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
Subject: How to load csv data into HIVE

Hi,
Here is the sample data:

174969274,14-mar-2006,3522876,,14-mar-2006,50308,65,1|
174969275,19-jul-2006,3523154,,19-jul-2006,50308,65,1|
174969276,31-dec-2005,3530333,,31-dec-2005,50308,65,1|
174969277,14-apr-2005,3531470,,14-apr-2005,50308,65,1|

How do I load this kind of data into Hive? I'm using a shell script to get rid of the double quotes and the '|', but it takes a very long time to work on each csv, and they are 12GB each. What is the best way to do this?

--
Thanks,
sandeep
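For illustration, a minimal sketch of the Hive-side import Chuck describes, assuming the file has already been cleaned into tab-separated text and sits on HDFS. The table name, column names/types, and HDFS path below are illustrative, chosen to match the eight fields in the sample rows:

    -- hypothetical table matching the eight fields in the sample rows
    CREATE TABLE sample_data (
      record_id   BIGINT,
      event_date  STRING,
      account_id  BIGINT,
      ref_id      BIGINT,
      alt_date    STRING,
      code_a      INT,
      code_b      INT,
      flag        INT
    )
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

    -- LOAD DATA INPATH moves (not copies) the HDFS file into the table's warehouse directory
    LOAD DATA INPATH '/user/hadoop/cleaned/sample.tsv' INTO TABLE sample_data;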
Re: How to load csv data into HIVE
Yup, Bejoy is correct :-) Just use Hadoop streaming for what it does best: cleaning, transformations and validations, in just a few simple steps.

Regards,
Praveenesh

On Sat, Sep 8, 2012 at 6:03 PM, Bejoy KS bejoy...@yahoo.com wrote:

Hi Chuck,
I believe Praveenesh was adding his thought to the discussion on preprocessing the data using MapReduce itself. If you go with Hadoop streaming you can use the Python script in the mapper, and that will do the preprocessing in parallel on a large volume of data. Then this preprocessed data can be loaded into the Hive table.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

From: Connell, Chuck chuck.conn...@nuance.com
Date: Sat, 8 Sep 2012 12:18:33 +0000
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: RE: How to load csv data into HIVE

I would like to hear more about this "hadoop streaming to Hive" idea. I have used streaming jobs as mappers, with a Python script as map.py. Are you saying that such a streaming mapper can load its output into Hive? Can you send some example code? Hive wants to load files, not individual lines/records. How would you do this?

Thanks very much,
Chuck
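Since Chuck asked for example code: a hedged sketch of the streaming-based preprocessing being discussed, written here with a small shell mapper (a Python map.py could be shipped and used in exactly the same way via -file/-mapper). The streaming jar location, HDFS paths, and the exact sed expressions are illustrative. The mapper script, clean.sh:

    #!/bin/sh
    # Runs once per map task over its input split:
    # strip double quotes, drop the trailing '|', turn commas into tabs
    sed -e 's/"//g' -e 's/|$//' -e 's/,/\t/g'

And the job submission:

    # map-only streaming job that writes the cleaned, tab-separated data back to HDFS
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -D mapred.reduce.tasks=0 \
      -input /user/hadoop/raw_csv \
      -output /user/hadoop/cleaned \
      -file clean.sh \
      -mapper clean.sh

So the mapper does not load Hive directly; it just produces cleaned files under /user/hadoop/cleaned, which can then be imported with LOAD DATA INPATH as in the earlier thread.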
Starting hive thrift server as daemon process ?
Hi Hive users,

I was just wondering why the Hive Thrift server does not run as a daemon process by default. I am also not seeing any option to start it as a daemon process if I want to. I log in to the Hadoop cluster remotely, and every time my session disconnects or closes, the Thrift server goes down with it. I can, however, start it as a background process using nohup (or similar) in Linux - but I am just thinking: is there a reason why this feature is not present in Hive? What would happen if I ran the Hive Thrift server as a daemon process?

Regards,
Praveenesh
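For reference, a minimal sketch of the nohup workaround mentioned above, assuming the HiveServer Thrift service that ships with this Hive release; the log path is illustrative:

    # start the Thrift service detached from the login shell so it survives logout
    nohup hive --service hiveserver > /tmp/hiveserver.log 2>&1 &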
Hive UDF error : numberformat exception (String to Integer) conversion
Hello Hive Users,

There is a strange situation I am facing. I have a string column in my Hive table (it is an IP address). I am writing a UDF that takes this string column and converts it into a long value. It is a simple UDF. Following is my code:

    package com.practice.hive.udf;

    public class IPtoINT extends UDF {
        public static LongWritable execute(Text addr) {
            String[] addrArray = addr.toString().split("\\.");
            long num = 0;
            for (int i = 0; i < addrArray.length; i++) {
                int power = 3 - i;
                num += ((Integer.parseInt(addrArray[i]) % 256 * Math.pow(256, power)));
            }
            return new LongWritable(num);
        }
    }

After creating the jar, I am running the following commands:

    $ hive
    hive> add jar /home/hadoop/Desktop/HiveData/IPtoINT.jar;
    hive> create temporary function ip2int as 'com.practice.hive.udf.IPtoINT';
    hive> select ip2int(ip1) from sample_data;

But running the above gives me the following error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"ip1":"1.0.144.36","ip2":16814116,"country":"Thailand","key":null}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"ip1":"1.0.144.36","ip2":16814116,"country":"Thailand","key":null}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text com.musigma.hive.udf.ip2int.evaluate(org.apache.hadoop.io.Text) on object com.musigma.hive.udf.ip2int@19a4d79 of class com.musigma.hive.udf.ip2int with arguments {1.0.144.36:org.apache.hadoop.io.Text} of size 1
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:848)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
    ... 9 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:824)
    ... 18 more
Caused by: java.lang.NumberFormatException: For input string: "1"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.<init>(Long.java:702)
    at com.musigma.hive.udf.ip2int.evaluate(ip2int.java:11)
    ... 23 more

If I run the Hive UDF like

    select ip2int('102.134.123.1') from sample_data;

it does not give any error. The strange thing is that it is a NumberFormatException: I am not able to parse the string into an int in my Java code. Very strange issue. Can someone please tell me what stupid mistake I am making? Is there any other UDF that does string to int/long conversion?

Regards,
Praveenesh
Re: Hive UDF error : numberformat exception (String to Integer) conversion
I have done both things. There is no null issue here; I checked for nulls as well, sorry that was not shown in the code. I have also written a main function and called my evaluate function from it. If I pass a string directly, it works fine. The problem is the NumberFormatException. Integer.parseInt is throwing it... I don't know why... I am converting Hadoop's Text object to a String, splitting it into a String array, and parsing the strings inside it is what gives me problems. What could be the reason? I know it is not really Hive related; it is just a Java mistake. Please help me out in resolving this issue. Sounds embarrassing, but I don't know why I am not able to see the mistake I am making.

Regards,
Praveenesh

On Wed, May 30, 2012 at 8:17 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

I won't tell you the error, but I would recommend writing a main function in your UDF and trying it with the kind of sample inputs you are expecting in your query. You will then see the mistake you are making.

On Wed, May 30, 2012 at 8:14 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

You should try/catch and return NULL on bad data. The issue is that if you have a single bad row, the UDF will throw an exception up the chain. It will try again, it will fail again, and ultimately the job will fail.
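Putting Edward's and Nitin's suggestions together, a hedged sketch of what the UDF from the original post could look like: an instance evaluate() method instead of the static execute(), a null guard, and a try/catch that returns NULL on a malformed row instead of letting the exception fail the job. The bit-shift arithmetic computes the same value as the original Math.pow version; the package and class names follow the original post.

    package com.practice.hive.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;

    public class IPtoINT extends UDF {
        // Hive looks up and calls evaluate() on an instance of the UDF class
        public LongWritable evaluate(Text addr) {
            if (addr == null) {
                return null;
            }
            try {
                String[] octets = addr.toString().trim().split("\\.");
                if (octets.length != 4) {
                    return null;   // not a dotted-quad IPv4 address
                }
                long num = 0;
                for (int i = 0; i < octets.length; i++) {
                    // a.b.c.d -> a*2^24 + b*2^16 + c*2^8 + d
                    num = (num << 8) | (Long.parseLong(octets[i]) & 0xFF);
                }
                return new LongWritable(num);
            } catch (NumberFormatException e) {
                // bad row: return NULL rather than throwing up the operator chain
                return null;
            }
        }
    }

The add jar / create temporary function / select steps from the original post stay the same.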
Hive starting error
Hi,

I am using Hive 0.7.1 on Hadoop 0.20.205. While running Hive, it gives me the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.login(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
    at org.apache.hadoop.hive.shims.Hadoop20Shims.getUGIForConf(Hadoop20Shims.java:448)
    at org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator.setConf(HadoopDefaultAuthenticator.java:51)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthenticator(HiveUtils.java:222)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:219)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:417)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Any idea on how to resolve this issue?

Thanks,
Praveenesh
Re: Hive starting error
How do I resolve this?

Thanks,
Praveenesh

On Fri, Dec 30, 2011 at 3:21 PM, alo alt wget.n...@googlemail.com wrote:

Hi,

I think that's a typo. I was grepping 0.20.2-737 and did not find Lorg/apache/hadoop/security/UserGroupInformation; it should be org/apache/hadoop/security/UserGroupInformation. Please correct me if I'm wrong.

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you really need to.
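Not a fix, but one way to confirm the mismatch behind a NoSuchMethodError like this is to ask javap whether the login(Configuration) method that Hive's Hadoop20Shims expects is actually present in the Hadoop jar being run against. The jar name below is illustrative and depends on the Hadoop build:

    # list the login* methods of UserGroupInformation in the running hadoop-core jar
    javap -classpath $HADOOP_HOME/hadoop-core-0.20.205.0.jar \
        org.apache.hadoop.security.UserGroupInformation | grep -i login

    # If no login(Configuration) overload is printed, the Hive build's shims and the
    # running Hadoop version disagree, which is what the NoSuchMethodError reports.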