[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: udf_cosine_similarity-v01.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
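For a sanity check of the example in the description, set-based cosine similarity is |A ∩ B| / sqrt(|A| · |B|). For 'Test String1' vs. 'Test String2' the token sets share one of two tokens each, giving 1 / sqrt(2 · 2) = 0.5, which agrees with the 0.5f above. The following Java sketch is only an illustration under stated assumptions: whitespace tokenization, set (not multiset) semantics, and 0 for inputs with no tokens. CosineSimilaritySketch is a hypothetical name, not the UDF from the attached patches.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: not the UDF from the attached patches.
class CosineSimilaritySketch {

    // Splits on runs of whitespace and keeps the distinct tokens (set semantics).
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // |A intersect B| / sqrt(|A| * |B|); returns 0 when either string has no tokens.
    static float cosine(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0f;
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        // java.lang.Math is sufficient here; no commons-math3 FastMath needed.
        return (float) (common.size() / Math.sqrt((double) ta.size() * tb.size()));
    }

    public static void main(String[] args) {
        System.out.println(cosine("Test String1", "Test String2")); // prints 0.5
    }
}
```

Note that the "(2 - 1) / 2" shorthand in the description only coincides with the cosine value here because both strings have the same number of tokens; with unequal token counts the square root in the denominator matters.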
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Assignee: Owen O'Malley (was: Nishant Kelkar) In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-11137.1.patch Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies.
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Attachment: (was: HIVE-11137.1.patch) In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies.
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.1.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.3.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.2.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610813#comment-14610813 ] Nishant Kelkar commented on HIVE-11137: --- Is this an unrelated test failure? In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Nishant Kelkar Attachments: HIVE-11137.1.patch Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies.
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610816#comment-14610816 ] Nishant Kelkar commented on HIVE-11137: --- From the hive.log, I see the following two issues: {code} 2015-07-01 11:13:28,877 ERROR [Thread-17]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(101)) - Error starting HiveServer2: could not start ThriftBinaryCLIService org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:1. at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:109) at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:91) at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:87) at org.apache.hive.service.auth.HiveAuthFactory.getServerSocket(HiveAuthFactory.java:241) at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:66) at java.lang.Thread.run(Thread.java:744) {code} and {code} 2015-07-01 11:13:18,009 DEBUG [main]: util.Shell (Shell.java:checkHadoopHome(320)) - Failed to detect a valid hadoop home directory java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302) at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2375) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:366) at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105) at org.apache.hive.service.auth.TestCustomAuthentication.setUp(TestCustomAuthentication.java:45) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {code} In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Nishant Kelkar Attachments: HIVE-11137.1.patch Currently the DateWritable class uses 
LazyBinaryUtils, which has a lot of dependencies.
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609881#comment-14609881 ] Nishant Kelkar commented on HIVE-11137: --- BTW, let me know if submitting a patch != taking ownership of the task in general. That way, I can hand it back to you (still learning all the rules here). Thank you! In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Nishant Kelkar Attachments: HIVE-11137.1.patch Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies.
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605940#comment-14605940 ] Nishant Kelkar commented on HIVE-11137: --- LazyBinaryUtils is used only for readVInt() and writeVInt() (writeVInt() delegates to writeVLong()). Relevant sections of code from LazyBinaryUtils:
{code}
private static ThreadLocal<byte[]> vLongBytesThreadLocal = new ThreadLocal<byte[]>() {
  @Override
  public byte[] initialValue() {
    return new byte[9];
  }
};

public static void writeVLong(RandomAccessOutput byteStream, long l) {
  byte[] vLongBytes = vLongBytesThreadLocal.get();
  int len = LazyBinaryUtils.writeVLongToByteArray(vLongBytes, l);
  byteStream.write(vLongBytes, 0, len);
}
{code}
{code}
/**
 * Reads a zero-compressed encoded int from a byte array and returns it.
 *
 * @param bytes
 *          the byte array
 * @param offset
 *          offset of the array to read from
 * @param vInt
 *          storing the deserialized int and its size in byte
 */
public static void readVInt(byte[] bytes, int offset, VInt vInt) {
  byte firstByte = bytes[offset];
  vInt.length = (byte) WritableUtils.decodeVIntSize(firstByte);
  if (vInt.length == 1) {
    vInt.value = firstByte;
    return;
  }
  int i = 0;
  for (int idx = 0; idx < vInt.length - 1; idx++) {
    byte b = bytes[offset + 1 + idx];
    i = i << 8;
    i = i | (b & 0xFF);
  }
  vInt.value = (WritableUtils.isNegativeVInt(firstByte) ? (i ^ -1) : i);
}
{code}
I could contribute a patch towards this task [~owen.omalley] (I'm a beginner contributor in Hive, looking around for work :)). Thanks and let me know! In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies.
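Since HIVE-11137 is about dropping the LazyBinaryUtils dependency from DateWritable, one option is a small self-contained codec for the same zero-compressed format. The sketch below is hypothetical (VIntCodec is not an existing Hive or Hadoop class). It is written to follow the layout produced by Hadoop's WritableUtils.writeVLong and decoded by the readVInt() quoted above: values in [-112, 127] fit in a single byte; anything else gets a first byte encoding sign and payload length, followed by big-endian payload bytes. It should still be checked byte-for-byte against WritableUtils before anyone relies on it.

```java
// Hypothetical helper: VIntCodec is not a Hive or Hadoop class. It aims to
// reproduce the zero-compressed format of Hadoop's WritableUtils.writeVLong,
// which LazyBinaryUtils.readVInt decodes, without depending on either class.
final class VIntCodec {

    private VIntCodec() {}

    // Encodes l into buf starting at index 0; returns the number of bytes written (1 to 9).
    static int writeVLong(byte[] buf, long l) {
        if (l >= -112 && l <= 127) {
            buf[0] = (byte) l; // small values fit entirely in the first byte
            return 1;
        }
        int len = -112;
        if (l < 0) {
            l ^= -1L; // one's complement; the sign is carried by the first byte
            len = -120;
        }
        for (long tmp = l; tmp != 0; tmp >>= 8) {
            len--; // one more payload byte needed
        }
        buf[0] = (byte) len;
        int n = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = n; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            buf[1 + n - idx] = (byte) ((l >> shift) & 0xFF); // most significant byte first
        }
        return 1 + n;
    }

    // Total encoded size implied by the first byte (same rule as WritableUtils.decodeVIntSize).
    static int decodeSize(byte first) {
        if (first >= -112) {
            return 1;
        }
        return (first < -120) ? -119 - first : -111 - first;
    }

    // Same rule as WritableUtils.isNegativeVInt.
    static boolean isNegative(byte first) {
        return first < -120 || (first >= -112 && first < 0);
    }

    // Decodes an int-range value starting at offset, mirroring LazyBinaryUtils.readVInt.
    static int readVInt(byte[] bytes, int offset) {
        byte first = bytes[offset];
        int size = decodeSize(first);
        if (size == 1) {
            return first;
        }
        int i = 0;
        for (int idx = 0; idx < size - 1; idx++) {
            i = (i << 8) | (bytes[offset + 1 + idx] & 0xFF);
        }
        return isNegative(first) ? ~i : i;
    }
}
```

With a helper like this, DateWritable could encode its day count with writeVLong into a 9-byte scratch buffer (the same size LazyBinaryUtils allocates) and decode it with readVInt. Whether the attached HIVE-11137.1.patch takes this route is not shown in this thread.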
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: HIVE-9557.3.patch Attaching revision #3 of the patch, which removes a hidden dependency on FastMath from commons-math3 (it is pulled in transitively through the org.apache.spark:spark-core_2.10 dependency). It uses java.lang.Math instead. create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: HIVE-9557.2.patch create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: HIVE-9557.1.patch Attached the first revision of the cosine similarity UDF. create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604068#comment-14604068 ] Nishant Kelkar commented on HIVE-9557: -- Figured out the issue: I set a dummy HADOOP_HOME variable pointing to HIVE_HOME. I also removed the commented-out queries from the udf_cosine_similarity.q clientpositive file. I'll upload a patch with an RB link soon. create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604349#comment-14604349 ] Nishant Kelkar commented on HIVE-9557: -- Done. Could you please test for access now? create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604335#comment-14604335 ] Nishant Kelkar commented on HIVE-9557: -- Hey Alexander, Hmmm, in the review settings, I've added the group 'hive' and the user 'apivovarov'. I used rbt to create and upload the ticket to the Apache server. create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603987#comment-14603987 ] Nishant Kelkar commented on HIVE-9557: -- Hi [~apivovarov], I followed your instructions, and everything went fine until the step where I ran the TestCliDriver with 'mvn test'. I got the following exception in ./itests/qtest/tmp/log/hive.log: {code} 2015-06-26 22:25:47,656 DEBUG [main]: util.Shell (Shell.java:checkHadoopHome(320)) - Failed to detect a valid hadoop home directory java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set. at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302) at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2371) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:366) at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105) at org.apache.hadoop.hive.ql.QTestUtil.<init>(QTestUtil.java:354) at org.apache.hadoop.hive.cli.TestCliDriver.<clinit>(TestCliDriver.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:35) at org.junit.internal.runners.SuiteMethod.<init>(SuiteMethod.java:24) at org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:11) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59) at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59) at 
org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:26) at
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) 2015-06-26 22:25:47,669 DEBUG [main]: util.Shell (Shell.java:isSetsidSupported(392)) - setsid is not available on this machine. So not using it. 2015-06-26 22:25:47,669 DEBUG [main]: util.Shell (Shell.java:isSetsidSupported(396)) - setsid exited with exit code 0 2015-06-26 22:25:48,408 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist 2015-06-26 22:25:48,409 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.ql.log.PerfLogger.level does not exist 2015-06-26 22:25:48,409 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist 2015-06-26 22:25:48,409 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist 2015-06-26 22:25:48,410 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.metastore.metadb.dir does not exist 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 2015-06-26 22:25:48,477 INFO [main]: 
server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:host.name=localhost 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:host.name=localhost 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:java.version=1.7.0_67 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:java.version=1.7.0_67 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:java.vendor=Oracle Corporation 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) -
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603988#comment-14603988 ] Nishant Kelkar commented on HIVE-9557: -- The TestCliDriver tests actually fail with the following error: {code} --- T E S T S --- Running org.apache.hadoop.hive.cli.TestCliDriver Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.797 sec <<< FAILURE! - in org.apache.hadoop.hive.cli.TestCliDriver testCliDriver_udf_cosine_similarity(org.apache.hadoop.hive.cli.TestCliDriver) Time elapsed: 0.346 sec <<< FAILURE! junit.framework.AssertionFailedError: Unexpected exception junit.framework.AssertionFailedError: Client Execution failed with error code = 10014 running select cosine_similarity('kitten', 'sitting', ' '), cosine_similarity('sitting kitten', 'kitten sitting', ' '), cosine_similarity('sitting kitten', 'sitting kittens', ' '), cosine_similarity('two#delimiters,here', 'two#delimiters#,here,too', '#,'), cosine_similarity('test string', '', ' '), cosine_similarity(cast(null as string), 'test string', ' '), cosine_similarity('test string', cast(null as string), ','), cosine_similarity(cast(null as string), cast(null as string), ' '), cosine_similarity('a string', 'another string', '') See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
at junit.framework.Assert.fail(Assert.java:57) at org.apache.hadoop.hive.ql.QTestUtil.failed(QTestUtil.java:1984) at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:152) at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_cosine_similarity(TestCliDriver.java:134) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {code} create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: udf_cosine_similarity-v01.patch algo 
description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
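The failing query exercises a three-argument form, cosine_similarity(str1, str2, delimiters), including a multi-character delimiter string ('#,'), an empty input, NULL inputs, and an empty delimiter. The patch's exact semantics are not stated in this thread, so the following Java sketch is only one plausible reading, under these assumptions: each character of the third argument is a separate delimiter (as with java.util.StringTokenizer), tokens are compared as sets, any NULL argument yields NULL, and an input with no tokens scores 0. DelimitedCosine is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical sketch of delimiter-aware cosine similarity; not the patch's code.
final class DelimitedCosine {

    private DelimitedCosine() {}

    // Every character in delims acts as a separator; an empty delims keeps the whole string as one token.
    static Set<String> tokens(String s, String delims) {
        Set<String> out = new HashSet<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Returns null if any argument is null (SQL-style NULL propagation), 0.0 if either side has no tokens.
    static Double similarity(String a, String b, String delims) {
        if (a == null || b == null || delims == null) {
            return null;
        }
        Set<String> ta = tokens(a, delims);
        Set<String> tb = tokens(b, delims);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return common.size() / Math.sqrt((double) ta.size() * tb.size());
    }
}
```

Under these assumptions the first four calls in the query would return 0.0, 1.0, 0.5 and about 0.866, and the empty delimiter in the last call makes each whole string a single token, so that call returns 0.0.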
[jira] [Commented] (HIVE-11114) Documentation of Pentaho Missing from Maven Central
[ https://issues.apache.org/jira/browse/HIVE-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601720#comment-14601720 ] Nishant Kelkar commented on HIVE-11114: --- [~leftylev] tagging you here for more help/info. Documentation of Pentaho Missing from Maven Central --- Key: HIVE-11114 URL: https://issues.apache.org/jira/browse/HIVE-11114 Project: Hive Issue Type: Task Reporter: Nishant Kelkar Assignee: Nishant Kelkar Priority: Minor I recently cloned the Hive Git repository. When I went into the hive/ql sub-project and issued the command 'mvn clean compile -Phadoop-1', I got the following build error: [ERROR] Failed to execute goal on project hive-exec: Could not resolve dependencies for project org.apache.hive:hive-exec:jar:2.0.0-SNAPSHOT: Could not find artifact org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in US (http://repo.maven.apache.org/maven2) - [Help 1] This is because the pentaho-aggdesigner-algorithm dependency is not available from Maven Central; it is, however, available from Conjars. As a quick fix, I downloaded the jar from the Conjars repo and manually installed it into my local Maven repository by following the instructions here: http://www.mkyong.com/maven/how-to-include-library-manully-into-maven-local-repository/ However, I feel this dependency should be available on Maven Central (I'm not sure where, or with whom, to file that ticket, but Hive is my use case, so any pointers are greatly appreciated). This ticket tracks the task of documenting this fact on the Hive wiki as an additional Note.
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598936#comment-14598936 ] Nishant Kelkar commented on HIVE-9557: -- [~apivovarov]: The reference implementation link you've provided seems to be broken. Did you mean to point here? -- https://github.com/Simmetrics/simmetrics/blob/master/simmetrics-core/src/main/java/org/simmetrics/metrics/CosineSimilarity.java create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-11091) Unable to load data into hive table using Load data local inapth command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599283#comment-14599283 ] Nishant Kelkar commented on HIVE-11091: --- I did a diff of Hive 0.11 vs. Hive 0.14 for the piece of code within MoveTask that is causing this error:

Hive-0.11:
{code}
Table table = db.getTable(tbd.getTable().getTableName());
if (work.getCheckFileFormat()) {
  // Get all files from the src directory
  FileStatus[] dirs;
  ArrayList<FileStatus> files;
  FileSystem fs;
  try {
    fs = FileSystem.get(table.getDataLocation(), conf);
    dirs = fs.globStatus(new Path(tbd.getSourceDir()));
    files = new ArrayList<FileStatus>();
    for (int i = 0; (dirs != null && i < dirs.length); i++) {
      files.addAll(Arrays.asList(fs.listStatus(dirs[i].getPath())));
      // We only check one file, so exit the loop when we have at least
      // one.
      if (files.size() > 0) {
        break;
      }
    }
  } catch (IOException e) {
    throw new HiveException(
        "addFiles: filesystem error in check phase", e);
  }
  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVECHECKFILEFORMAT)) {
    // Check if the file format of the file matches that of the table.
    boolean flag = HiveFileFormatUtils.checkInputFormat(
        fs, conf, tbd.getTable().getInputFileFormatClass(), files);
    if (!flag) {
      throw new HiveException(
          "Wrong file format. Please check the file's format.");
    }
  }
}
{code}

Hive-0.14:
{code}
Table table = db.getTable(tbd.getTable().getTableName());
if (work.getCheckFileFormat()) {
  // Get all files from the src directory
  FileStatus[] dirs;
  ArrayList<FileStatus> files;
  FileSystem srcFs; // source filesystem
  try {
    srcFs = tbd.getSourcePath().getFileSystem(conf);
    dirs = srcFs.globStatus(tbd.getSourcePath());
    files = new ArrayList<FileStatus>();
    for (int i = 0; (dirs != null && i < dirs.length); i++) {
      files.addAll(Arrays.asList(srcFs.listStatus(dirs[i].getPath(),
          FileUtils.HIDDEN_FILES_PATH_FILTER)));
      // We only check one file, so exit the loop when we have at least
      // one.
      if (files.size() > 0) {
        break;
      }
    }
  } catch (IOException e) {
    throw new HiveException(
        "addFiles: filesystem error in check phase", e);
  }
  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVECHECKFILEFORMAT)) {
    // Check if the file format of the file matches that of the table.
    boolean flag = HiveFileFormatUtils.checkInputFormat(
        srcFs, conf, tbd.getTable().getInputFileFormatClass(), files);
    if (!flag) {
      throw new HiveException(
          "Wrong file format. Please check the file's format.");
    }
  }
}
{code}

Unable to load data into hive table using Load data local inapth command from unix named pipe --- Key: HIVE-11091 URL: https://issues.apache.org/jira/browse/HIVE-11091 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Environment: Unix,MacOS Reporter: Manoranjan Sahoo Priority: Blocker

Unable to load data into hive table from unix named pipe in Hive 0.14.0. Please find below the execution details in env (Hadoop 2.6.0 + Hive 0.14.0):

$ mkfifo /tmp/test.txt
$ hive
hive> create table test(id bigint,name string);
OK
Time taken: 1.018 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
Loading data to table default.test
Failed with exception addFiles: filesystem error in check phase
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

But in Hadoop 1.3 and hive 0.11.0 it works fine:

hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
Copying data from file:/tmp/test.txt
Copying file: file:/tmp/test.txt
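The report creates the LOAD DATA source with mkfifo, so a quick local sanity check is to confirm what kind of file /tmp/test.txt actually is. The minimal Java sketch below does this with java.nio's BasicFileAttributes. It reads the file's attributes rather than opening it, so it should not block on a FIFO that has no writer. The class and method names (LocalPathKind, classify) are hypothetical. This is a diagnostic aid only; nothing in the thread establishes it as the fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical diagnostic helper (not part of Hive): classifies a local path
// by reading its attributes instead of opening it.
final class LocalPathKind {

    static String classify(Path p) throws IOException {
        // NOFOLLOW_LINKS reports a symlink as a symlink instead of its target.
        BasicFileAttributes attrs = Files.readAttributes(
                p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        if (attrs.isRegularFile()) {
            return "regular";
        }
        if (attrs.isDirectory()) {
            return "directory";
        }
        if (attrs.isSymbolicLink()) {
            return "symlink";
        }
        // On Unix, FIFOs (named pipes), sockets and device files end up here.
        return "other";
    }
}
```

On a Unix machine, `LocalPathKind.classify(Paths.get("/tmp/test.txt"))` should return "other" for the pipe created in the report. For an ordinary text file, it returns "regular".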
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: udf_cosine_similarity-v01.patch create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
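The worked example in the description can be reproduced with a small Java sketch. The sketch makes several assumptions:

* Strings are tokenized on whitespace and compared case-sensitively.
* Each string is treated as a set of distinct tokens, i.e. a binary vector, so the score is |A intersect B| / sqrt(|A| * |B|).

The class and method names are hypothetical. This is not the attached patch and not the SimMetrics code. For 'Test String1' vs 'Test String2', the only shared token is "Test", giving 1 / sqrt(2 * 2) = 0.5. That matches the expected 0.5f.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Hive's UDF: set-based cosine similarity over
// whitespace-separated tokens. Case-sensitive; repeated tokens count once.
final class CosineSimilaritySketch {

    // Split on runs of whitespace and drop empty tokens (e.g. from "" input).
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // |A intersect B| / sqrt(|A| * |B|); returns 0 when either side has no tokens.
    static float cosine(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0f;
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return (float) (common.size() / Math.sqrt((double) ta.size() * tb.size()));
    }
}
```

Computing sqrt(|A| * |B|) as a single square root keeps the example exact (sqrt(4) = 2.0). A real Hive UDF would also need a policy for NULL inputs, which this sketch leaves out.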
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599265#comment-14599265 ] Nishant Kelkar commented on HIVE-9557: -- Hey [~kinow] and [~apivovarov], I've added a patch for the cosine similarity metric UDF and some test cases. This is my first time submitting a patch, so I guess I'm allowed 1 chance at the following question? :) What are all the next steps in this process, once a patch has been uploaded? I could also add this correspondence in an email to d...@hive.apache.org, for everyone else's benefit. Thanks!
[jira] [Commented] (HIVE-11091) Unable to load data into hive table using Load data local inapth command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599301#comment-14599301 ] Nishant Kelkar commented on HIVE-11091: --- The only significant change I see in the above code snippets is:
{code}
srcFs = tbd.getSourcePath().getFileSystem(conf);
dirs = srcFs.globStatus(tbd.getSourcePath());
{code}
i.e., the way in which we get the file system handle and the list of directories/files matching the provided path.
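The two versions differ in how the filesystem handle is chosen:

* Hive 0.11 always takes the handle from the table's data location, via `FileSystem.get(table.getDataLocation(), conf)`.
* Hive 0.14 uses `tbd.getSourcePath().getFileSystem(conf)`. Hadoop resolves that from the source path's own URI scheme and falls back to the default filesystem (`fs.defaultFS`) when the path carries no scheme.

The Java sketch below imitates only that scheme-selection rule, using java.net.URI. `fsScheme` is a hypothetical helper, not a Hadoop API. The `hdfs://nn:8020` values are illustrative. The `file:/tmp/test.txt` value assumes the LOCAL source path is qualified with the `file:` scheme, as the 0.11 "Copying data from file:/tmp/test.txt" output suggests.

```java
import java.net.URI;

// Hypothetical helper (not a Hadoop API): mimics which filesystem scheme a
// path resolves to, given a default filesystem URI.
final class FsSchemeSketch {

    // A path's own scheme wins; a scheme-less path falls back to the default FS.
    static String fsScheme(String path, String defaultFs) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : URI.create(defaultFs).getScheme();
    }
}
```

Under these assumptions, the 0.11 check phase would list files through the table location's filesystem: `fsScheme("hdfs://nn:8020/user/hive/warehouse/test", "hdfs://nn:8020")` gives "hdfs". The 0.14 check phase lists them through the source's own filesystem: `fsScheme("file:/tmp/test.txt", "hdfs://nn:8020")` gives "file". That second case is the local filesystem where the named pipe lives.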
[jira] [Assigned] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar reassigned HIVE-9557: Assignee: Nishant Kelkar (was: Alexander Pivovarov)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600145#comment-14600145 ] Nishant Kelkar commented on HIVE-9557: -- [~apivovarov], I had a question: when I prepare a clientpositive/udf_cosine_similarity.q and a clientnegative/udf_cosine_similarity.q, how do I run these? Also, how do I create the q.out file?
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599747#comment-14599747 ] Nishant Kelkar commented on HIVE-9557: -- Thanks for the pointers! I'll modify the patch per your instructions and re-upload. Thanks for working with me through my first patch! :)