[jira] [Commented] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository
[ https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317581#comment-14317581 ] Edward Capriolo commented on HIVE-9664: --- Just so you know, you can use Groovy for writing UDFs currently, and Groovy has some @Grab support for chasing down dependencies. Check Hive's COMPILE syntax. Hive add jar command should be able to download and add jars from a repository Key: HIVE-9664 URL: https://issues.apache.org/jira/browse/HIVE-9664 Project: Hive Issue Type: Improvement Reporter: Anant Nag Labels: hive Currently Hive's add jar command takes a local path to the dependency jar. This clutters the local file-system, as users may forget to remove this jar later. It would be nice if Hive supported a Gradle-like notation to download the jar from a repository. Example: add jar org:module:version It should also be backward compatible and should take a jar from the local file-system as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
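The COMPILE syntax referenced above looks roughly like the following (a hedged sketch: the inline Groovy class and the @Grab coordinates are illustrative, and whether Groovy's grape resolution works cleanly inside Hive's classloader is not verified here):
{code}
COMPILE `
  @Grab(group = 'org.apache.commons', module = 'commons-lang3', version = '3.1')
  import org.apache.commons.lang3.StringUtils
  import org.apache.hadoop.hive.ql.exec.UDF

  public class Reverser extends UDF {
    public String evaluate(String s) {
      // Return the reversed string, passing nulls through untouched.
      return s == null ? null : StringUtils.reverse(s)
    }
  }
` AS GROOVY NAMED Reverser.groovy;
CREATE TEMPORARY FUNCTION reverser AS 'Reverser';
SELECT reverser(key) FROM src LIMIT 1;
{code}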
[jira] [Commented] (HIVE-8614) Upgrade hive to use tez version 0.5.2-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312431#comment-14312431 ] Edward Capriolo commented on HIVE-8614: --- This is bad. Hive should not depend on SNAPSHOT releases. Why do we keep doing this? Upgrade hive to use tez version 0.5.2-SNAPSHOT -- Key: HIVE-8614 URL: https://issues.apache.org/jira/browse/HIVE-8614 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8614.1.patch, HIVE-8614.2.patch, HIVE-8614.3.patch, HIVE-8614.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218906#comment-14218906 ] Edward Capriolo commented on HIVE-8848: --- Nulls are supposed to be stored as a literal \N. data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or x00 characters, we get a bunch of annoying logging from LazyPrimitive that data is not in INT format and was converted to null, with data being null (string saying null, I assume from the code). Either load should load them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-8848: -- Status: Open (was: Patch Available) You can re-open if I am wrong, but in TextInputFormat null is '\N'. I think this is defined in LazySimpleSerDe. data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or x00 characters, we get a bunch of annoying logging from LazyPrimitive that data is not in INT format and was converted to null, with data being null (string saying null, I assume from the code). Either load should load them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
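To illustrate the convention under discussion: LazySimpleSerDe reads and writes SQL NULL in text tables as the two-character sequence \N by default, and the marker is configurable per table via a serde property (the table below is a made-up example):
{code}
CREATE TABLE t (a INT, b STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
-- In the backing text file a NULL column is the literal two characters \N, e.g.:
--   1<TAB>\N
-- A different marker can be chosen per table:
ALTER TABLE t SET SERDEPROPERTIES ('serialization.null.format' = 'NULL');
{code}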
[jira] [Comment Edited] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218908#comment-14218908 ] Edward Capriolo edited comment on HIVE-8848 at 11/20/14 2:27 AM: - You can re-submit and merge if I am wrong, but in TextInputFormat null is '\N'. I think this is defined in LazySimpleSerDe. was (Author: appodictic): You can re-open if I am wrong, but in TextInputFormat null is '\N'. I think this is defined in LazySimpleSerDe. data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or x00 characters, we get a bunch of annoying logging from LazyPrimitive that data is not in INT format and was converted to null, with data being null (string saying null, I assume from the code). Either load should load them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207314#comment-14207314 ] Edward Capriolo commented on HIVE-5538: --- I do not like the idea of turning on vectorization by default until we have a way to test both code paths, and am -1 until this is addressed. Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Matt McCline Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch, HIVE-5538.61.patch, HIVE-5538.62.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098623#comment-14098623 ] Edward Capriolo commented on HIVE-1434: --- There is going to be no jira. I am doing the code here https://github.com/edwardcapriolo/hive-cassandra-ng/blob/master/src/main/java/io/teknek/hive/cassandra/CassandraSerde.java Please do not share this link. I have not had time to commit the license file yet, and I would not want it to end up in 50 other people's GitHub repos again. Cassandra Storage Handler - Key: HIVE-1434 URL: https://issues.apache.org/jira/browse/HIVE-1434 Project: Hive Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-1434-r1182878.patch, cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, hive-cassandra.2011-02-25.txt, hive.diff Add a cassandra storage handler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088074#comment-14088074 ] Edward Capriolo commented on HIVE-5538: --- {quote}If so, one possibility is to turn it on only for unit tests {quote} I would not suggest this. We would be saying, "Hive 0.15 is tested and ready for release!" A user would download and use Hive 0.15, and if they found a bug, the reason would be that we are not actually testing the code we shipped. Unless we plan on removing the non-vectorized code path we have to test it. To do that we need the answer to some important questions: * Is vectorized execution ALWAYS better/faster? * Is vectorized execution capable of EVERYTHING the non-vectorized path can do? Until we can answer yes to both of the above points, we cannot remove the non-vectorized code paths. Until we remove the non-vectorized code paths we have to test them. As I said above, I think we need a stanza at the top of the Q files that defines permutations of testing parameters. --testwith vectorized+mr, vectorized+tez, !vectorized+mr --testwith (hive.local.mode=true hive.local.mode=false) etc. I think that is the only way to keep the project sane. Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.2#6252)
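For concreteness, the stanza proposed in the comment above might sit at the top of a .q file like this (proposed syntax only; no such directive exists in the test driver today, and the parameter spellings are illustrative):
{noformat}
--testwith vectorized+mr, vectorized+tez, !vectorized+mr
--testwith hive.local.mode=true, hive.local.mode=false
SELECT key, count(*) FROM src GROUP BY key;
{noformat}
The test driver would then run the same query once per permutation and compare all results against a single expected output.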
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062176#comment-14062176 ] Edward Capriolo commented on HIVE-5538: --- I would suggest that we handle this by putting lines at the top of the .Q files that specify the permutations in which the query needs to be tested, maybe like: --testwith vectorized+mr, vectorized+tez, !vectorized+mr Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062456#comment-14062456 ] Edward Capriolo commented on HIVE-5538: --- This is especially relevant since we are also developing Spark support, giving us another testing permutation :( Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7025) Support retention on hive tables
[ https://issues.apache.org/jira/browse/HIVE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054910#comment-14054910 ] Edward Capriolo commented on HIVE-7025: --- I like this. I coded something like this by hand at my last job. I did run into issues where tables with a huge number of partitions caused memory problems, and I had to page/limit the number of objects I would access in one go. Anyway, your test case does not include a partitioned table. I am assuming the retention is set on the table but you are using the create time of the partition? It might be good to include that in your unit test. Support retention on hive tables Key: HIVE-7025 URL: https://issues.apache.org/jira/browse/HIVE-7025 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-7025.1.patch.txt, HIVE-7025.2.patch.txt, HIVE-7025.3.patch.txt Add self destruction properties for temporary tables. -- This message was sent by Atlassian JIRA (v6.2#6252)
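A partitioned-table test case along the lines requested above might look like this (the 'retention' property name and the seconds unit are assumptions based on the existing metastore table field; the exact key the patch reads may differ):
{code}
CREATE TABLE tmp_events (id INT, payload STRING)
  PARTITIONED BY (dt STRING)
  TBLPROPERTIES ('retention' = '86400');  -- assumed: seconds before self-destruction

-- Eligibility would presumably be judged per partition by its create time,
-- so a test should cover partitions both inside and outside the window:
ALTER TABLE tmp_events ADD PARTITION (dt = '2014-05-01');
ALTER TABLE tmp_events ADD PARTITION (dt = '2014-07-01');
{code}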
[jira] [Commented] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table
[ https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040802#comment-14040802 ] Edward Capriolo commented on HIVE-3392: --- Please feel free to take over the review. I will not have any time at the moment. Thanks! Hive unnecessarily validates table SerDes when dropping a table --- Key: HIVE-3392 URL: https://issues.apache.org/jira/browse/HIVE-3392 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Reporter: Jonathan Natkins Assignee: Navis Labels: patch Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, HIVE-3392.Test Case - with_trunk_version.txt
natty@hadoop1:~$ hive
hive> add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
hive> create table test (a int) row format serde 'hive.serde.JSONSerDe';
OK
Time taken: 2.399 seconds
natty@hadoop1:~$ hive
hive> drop table test;
FAILED: Hive Internal Error: java.lang.RuntimeException(MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist))
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:262)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:943)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:700)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:210)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe com.cloudera.hive.serde.JSONSerDe does not exist)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:211)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
... 20 more
hive> add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
hive> drop table test;
OK
Time taken: 0.658 seconds
hive>
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029325#comment-14029325 ] Edward Capriolo commented on HIVE-5857: ---
{code}
} catch (FileNotFoundException fnf) {
  // happens. e.g.: no reduce work.
  LOG.debug("No plan file found: " + path);
  return null;
}
...
{code}
Can we remove this code? This bothers me. It is not self-documenting at all. Can we use if statements to determine when the file should be there and when it should not? Something like: if (job.hasNoReduceWork()) { return null; } else { throw new RuntimeException("work should be found but was not: " + expectedPathToFile); } Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch, HIVE-5857.3.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. The NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to FileNotFoundException.
{code}
// org.apache.hadoop.hive.ql.exec.Utilities
public static BaseWork getBaseWork(Configuration conf, String name) {
  ...
  try {
    ...
    if (gWork == null) {
      Path localPath;
      if (ShimLoader.getHadoopShims().isLocalMode(conf)) {
        localPath = path;
      } else {
        localPath = new Path(name);
      }
      InputStream in = new FileInputStream(localPath.toUri().getPath());
      BaseWork ret = deserializePlan(in);
    }
    return gWork;
  } catch (FileNotFoundException fnf) {
    // happens. e.g.: no reduce work.
    LOG.debug("No plan file found: " + path);
    return null;
  }
  ...
}
{code}
It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true, because immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from yarn to local). On the other hand, map tasks run successfully, because their configuration is not changed and still remains yarn.
{code}
// org.apache.hadoop.mapred.LocalContainerLauncher
private void runSubtask(..) {
  ...
  conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
  conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle
  ReduceTask reduce = (ReduceTask) task;
  reduce.setConf(conf);
  reduce.run(conf, umbilical);
}
{code}
A super quick fix could be just an additional if-branch, where we check if we run a reduce task in uber mode, and then look for the plan file in a different location.
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more
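A minimal sketch of the explicit-branch refactor suggested in the comment above (hasReduceWork() is a hypothetical helper, and the path parameter is assumed to be resolved by the caller; the real check would consult the job configuration inside Utilities):
{code}
// Hedged sketch, not the committed fix: surface the "no reduce work" case as
// an explicit branch instead of swallowing FileNotFoundException.
private static BaseWork getBaseWorkExplicit(Configuration conf, String name, Path path)
    throws IOException {
  if (name.endsWith("reduce.xml") && !hasReduceWork(conf)) {
    // Expected case: a map-only job has no reduce plan to load.
    return null;
  }
  FileSystem fs = path.getFileSystem(conf);
  if (!fs.exists(path)) {
    // Anything else missing is a genuine error, so fail loudly.
    throw new RuntimeException("Plan file should exist but was not found: " + path);
  }
  InputStream in = new FileInputStream(path.toUri().getPath());
  try {
    return deserializePlan(in);
  } finally {
    in.close();
  }
}
{code}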
[jira] [Resolved] (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-1434. --- Resolution: Won't Fix Cassandra Storage Handler - Key: HIVE-1434 URL: https://issues.apache.org/jira/browse/HIVE-1434 Project: Hive Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-1434-r1182878.patch, cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, hive-cassandra.2011-02-25.txt, hive.diff Add a cassandra storage handler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025539#comment-14025539 ] Edward Capriolo commented on HIVE-1434: --- This feature is a complete and utter failure. It was never committed to Hive. It was never committed to Cassandra. I find ~40 forks of the code that are likely derivative works that make no reference to me or Hive, and all types of people are now asserting copyright over it. I am closing this issue and making a clean-room implementation of a new handler. Cassandra Storage Handler - Key: HIVE-1434 URL: https://issues.apache.org/jira/browse/HIVE-1434 Project: Hive Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-1434-r1182878.patch, cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, hive-cassandra.2011-02-25.txt, hive.diff Add a cassandra storage handler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7115) Support a mechanism for running hive locally that doesnt require having a hadoop executable.
[ https://issues.apache.org/jira/browse/HIVE-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025767#comment-14025767 ] Edward Capriolo commented on HIVE-7115: --- That would be really nice, especially if it could be extended to dependent projects. https://github.com/edwardcapriolo/hive_test requires lots of trickery to launch a Hive process. Support a mechanism for running hive locally that doesnt require having a hadoop executable. Key: HIVE-7115 URL: https://issues.apache.org/jira/browse/HIVE-7115 Project: Hive Issue Type: Improvement Components: Testing Infrastructure, Tests Reporter: jay vyas Mapreduce has a local mode by default, and likewise, tools such as pig and SOLR do as well; maybe we can have a first-class local mode for Hive also. For local integration testing of a hadoop app, it would be nice if we could fire up a local hive instance which didn't require bin/hadoop for running local jobs. This would allow us to maintain polyglot hadoop applications much easier by incorporating hive into the integration tests. For example: {noformat} LocalHiveInstance hive = new LocalHiveInstance(); hive.set("course", "crochet"); hive.runScript("hive_flow.ql"); {noformat} Would essentially run a local hive query which mirrors {noformat} hive -f hive_flow.ql -hiveconf course=crochet {noformat} It seems like there might be a simple way to do this, at least for small data sets, by putting some kind of alternative (i.e., in-memory) execution environment under Hive, if one is not already underway? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016115#comment-14016115 ] Edward Capriolo commented on HIVE-5538: --- To be clear, we need a long-term solution to rigorously test both code paths. Defaulting vectorization on could lead to rot in the non-vectorized code paths. Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016114#comment-14016114 ] Edward Capriolo commented on HIVE-5538: --- Do we think we are rushing this? Besides these test errors, a vectorization UDF bug was reported on the mailing list this week. Is it prudent to switch this? If we switch this, how will the original code path be tested? Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7121) Use murmur hash to distribute HiveKey
[ https://issues.apache.org/jira/browse/HIVE-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008377#comment-14008377 ] Edward Capriolo commented on HIVE-7121: --- Does this affect bucketed tables? I think it does, and then we cannot just change the hash code, because that would break assumptions about what is in the bucket. I.e., I create a bucket in Hive 12, and in Hive 13 different data would be in the bucket. I think this is why: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_quotedid_smb These tests are failing. If this is the case we need a way of recording the hashcode in the metadata for the table. Use murmur hash to distribute HiveKey - Key: HIVE-7121 URL: https://issues.apache.org/jira/browse/HIVE-7121 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-7121.1.patch, HIVE-7121.WIP.patch The current hashCode implementation produces poor parallelism when dealing with single integers or doubles. And for partitioned inserts into a 1 bucket table, there is a significant hotspot on Reducer #31. Removing the magic number 31 and using a more normal hash algorithm would help fix these hotspots. -- This message was sent by Atlassian JIRA (v6.2#6252)
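To make the hotspot concrete, here is a small standalone demo (a hedged sketch: murmurMix is a murmur3-style finalizer used as a stand-in for whatever hash the patch actually adopts; the 31-based combiner reduces to the identity for a single field):
{code}
import java.util.Arrays;

public class HashSkewDemo {
  // For a single int field, the 31-multiplier scheme is just the value itself,
  // so the reducer index is value % numReducers and regular keys pile up.
  static int hive31(int value) {
    return 31 * 0 + value;
  }

  // A murmur3-style finalizer: mixes the bits so nearby keys spread out.
  static int murmurMix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  public static void main(String[] args) {
    int reducers = 32;
    int[] old = new int[reducers], mixed = new int[reducers];
    // Keys that are all multiples of 32 -- a common real-world pattern.
    for (int key = 0; key < 32 * 1000; key += 32) {
      old[(hive31(key) & Integer.MAX_VALUE) % reducers]++;
      mixed[(murmurMix(key) & Integer.MAX_VALUE) % reducers]++;
    }
    System.out.println("31-based:   " + Arrays.toString(old));   // everything in one bucket
    System.out.println("murmur-ish: " + Arrays.toString(mixed)); // roughly even spread
  }
}
{code}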
[jira] [Commented] (HIVE-7025) TTL on hive tables
[ https://issues.apache.org/jira/browse/HIVE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992808#comment-13992808 ] Edward Capriolo commented on HIVE-7025: --- We do something similar; however, we also have the ability to delete partitions over a certain age. Hive already has a property inside every table called retention that we could consider using. This code is a good first step, but I have one question. Isn't this code rather racy? If we have multiple CLIs running threads they could all be simultaneously deleting tables, and a CLI on a system with a misconfigured clock could potentially delete all the tables. I think if we do this it should be a stand-alone piece. TTL on hive tables -- Key: HIVE-7025 URL: https://issues.apache.org/jira/browse/HIVE-7025 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-7025.1.patch.txt Add self destruction properties for temporary tables. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6469) skipTrash option in hive command line
[ https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977685#comment-13977685 ] Edward Capriolo commented on HIVE-6469: --- If a user is willing to commit an optional syntax that does not cause a language ambiguity, I think we should allow the user to add the feature. Rationale: currently dfs -rm allows an optional -skipTrash. Normal users are able to control whether a delete skips trash, regardless of how admins set the trash feature. A natural extension is to extend this functionality to drop table. skipTrash option in hive command line - Key: HIVE-6469 URL: https://issues.apache.org/jira/browse/HIVE-6469 Project: Hive Issue Type: New Feature Components: CLI Affects Versions: 0.12.0 Reporter: Jayesh Fix For: 0.12.1 Attachments: HIVE-6469.patch hive drop table command deletes the data from HDFS warehouse and puts it into Trash. Currently there is no way to provide a flag to tell the warehouse to skip trash while deleting table data. This ticket is to add a skipTrash feature to the hive command-line, that looks as follows: hive -e "drop table skipTrash testTable" This would be a good feature to add, so that the user can specify when not to put data into the trash directory, and thus not fill HDFS space, instead of relying on trash interval and policy configuration to take care of the disk-filling issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
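For illustration, the existing HDFS-level option next to the syntax this ticket proposes (the DROP TABLE form is the proposal only, not current Hive grammar; the table and path are made up):
{noformat}
hadoop fs -rm -r -skipTrash /user/hive/warehouse/testtable    # existing per-delete trash bypass
hive -e "drop table skipTrash testTable"                      # proposed syntax from this ticket
{noformat}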
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973210#comment-13973210 ] Edward Capriolo commented on HIVE-1608: --- It is not much. SequenceFile + none (codec) only adds some block information around the text. I still think SequenceFile by default is a good idea. It makes it easier to add compression later without sacrificing split-ability. use sequencefile as the default for storing intermediate results Key: HIVE-1608 URL: https://issues.apache.org/jira/browse/HIVE-1608 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-1608.patch The only argument for having a text file for storing intermediate results seems to be better debuggability. But, tailing a sequence file is possible, and it should be more space efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
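A quick way to see how little the uncompressed SequenceFile wrapper adds is to write the same rows both ways and compare sizes (a standalone sketch against the classic Hadoop APIs; exact byte counts will vary with record sizes and sync-marker frequency):
{code}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqVsTextSize {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    Path seq = new Path("/tmp/demo.seq");
    Path txt = new Path("/tmp/demo.txt");

    // Uncompressed SequenceFile -- the "none" codec case discussed above.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, seq, BytesWritable.class, Text.class,
        SequenceFile.CompressionType.NONE);
    OutputStream out = fs.create(txt);
    for (int i = 0; i < 100000; i++) {
      String row = "row-" + i;
      writer.append(new BytesWritable(), new Text(row));
      out.write((row + "\n").getBytes("UTF-8"));
    }
    writer.close();
    out.close();
    System.out.println("sequence: " + fs.getFileStatus(seq).getLen());
    System.out.println("text:     " + fs.getFileStatus(txt).getLen());
  }
}
{code}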
[jira] [Resolved] (HIVE-6212) Using Presto-0.56 for sql query,but HiveServer the console print java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/HIVE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-6212. --- Resolution: Won't Fix Contact presto developers. Using Presto-0.56 for sql query,but HiveServer the console print java.lang.OutOfMemoryError: Java heap space Key: HIVE-6212 URL: https://issues.apache.org/jira/browse/HIVE-6212 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Environment: HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56 Reporter: apachehadoop Fix For: 0.11.0 Hi friends: Now I can't open the page https://groups.google.com/forum/#!forum/presto-users ,so show my question here. I have started hiveserver and started presto-server on a machine with commands below: hive --service hiveserver -p 9083 ./launcher run When I use the presto-client-cli command ./presto --server localhost:9083 --catalog hive --schema default ,the console shows presto:default,input the command as show tables the console prints Error running command: java.nio.channels.ClosedChannelException, and the hiveserver console print as below: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Exception in thread pool-1-thread-1 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) my configuration file below: node.properties node.environment=production node.id=cc4a1bbf-5b98-4935-9fde-2cf1c98e8774 node.data-dir=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/data config.properties coordinator=true datasources=jmx http-server.http.port=8080 presto-metastore.db.type=h2 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/db/MetaStore task.max-memory=1GB discovery-server.enabled=true discovery.uri=http://slave4:8080 jvm.config -server -Xmx16G -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -XX:PermSize=150M -XX:MaxPermSize=150M -XX:ReservedCodeCacheSize=150M -Xbootclasspath/p:/home/hadoop/cloudera-5.0.0/presto-0.56/presto-server-0.56/lib/floatingdecimal-0.1.jar log.properties com.facebook.presto=DEBUG catalog/hive.properties connector.name=hive-cdh4 hive.metastore.uri=thrift://slave4:9083 HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56 Last I had increased the Java heap size for the Hive metastore,but it still given me the same error informations ,please help me to check if that is a bug of CDH5.Now I have no idea,god ! please help me ,thanks. ** ** Add some informations as below: Help,help,help! I have test prest-server-0.55 and 0.56 and 0.57 on CDH4 +hive-0.10 or hive-0.11,but it still shown error informations above. 
ON coordinator machine the directory etc and configuration files as below: =coordinator config.properties: coordinator=true datasources=jmx http-server.http.port=8080 presto-metastore.db.type=h2 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.55/presto/db/MetaStore task.max-memory=1GB discovery-server.enabled=true discovery.uri=http://name:8080 --jvm.config: -server -Xmx4G -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -XX:PermSize=150M -XX:MaxPermSize=150M -XX:ReservedCodeCacheSize=150M -Xbootclasspath/p:/home/hadoop/cloudera-5.0.0/presto-0.55/presto-server-0.55/lib/floatingdecimal-0.1.jar
[jira] [Commented] (HIVE-6212) Using Presto-0.56 for sql query,but HiveServer the console print java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/HIVE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967874#comment-13967874 ] Edward Capriolo commented on HIVE-6212: --- We dont support presto. Using Presto-0.56 for sql query,but HiveServer the console print java.lang.OutOfMemoryError: Java heap space Key: HIVE-6212 URL: https://issues.apache.org/jira/browse/HIVE-6212 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Environment: HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56 Reporter: apachehadoop Fix For: 0.11.0 Hi friends: Now I can't open the page https://groups.google.com/forum/#!forum/presto-users ,so show my question here. I have started hiveserver and started presto-server on a machine with commands below: hive --service hiveserver -p 9083 ./launcher run When I use the presto-client-cli command ./presto --server localhost:9083 --catalog hive --schema default ,the console shows presto:default,input the command as show tables the console prints Error running command: java.nio.channels.ClosedChannelException, and the hiveserver console print as below: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Exception in thread pool-1-thread-1 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) my configuration file below: node.properties node.environment=production node.id=cc4a1bbf-5b98-4935-9fde-2cf1c98e8774 node.data-dir=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/data config.properties coordinator=true datasources=jmx http-server.http.port=8080 presto-metastore.db.type=h2 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/db/MetaStore task.max-memory=1GB discovery-server.enabled=true discovery.uri=http://slave4:8080 jvm.config -server -Xmx16G -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -XX:PermSize=150M -XX:MaxPermSize=150M -XX:ReservedCodeCacheSize=150M -Xbootclasspath/p:/home/hadoop/cloudera-5.0.0/presto-0.56/presto-server-0.56/lib/floatingdecimal-0.1.jar log.properties com.facebook.presto=DEBUG catalog/hive.properties connector.name=hive-cdh4 hive.metastore.uri=thrift://slave4:9083 HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56 Last I had increased the Java heap size for the Hive metastore,but it still given me the same error informations ,please help me to check if that is a bug of CDH5.Now I have no idea,god ! please help me ,thanks. ** ** Add some informations as below: Help,help,help! I have test prest-server-0.55 and 0.56 and 0.57 on CDH4 +hive-0.10 or hive-0.11,but it still shown error informations above. 
ON coordinator machine the directory etc and configuration files as below: =coordinator config.properties: coordinator=true datasources=jmx http-server.http.port=8080 presto-metastore.db.type=h2 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.55/presto/db/MetaStore task.max-memory=1GB discovery-server.enabled=true discovery.uri=http://name:8080 --jvm.config: -server -Xmx4G -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -XX:PermSize=150M -XX:MaxPermSize=150M -XX:ReservedCodeCacheSize=150M
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961224#comment-13961224 ] Edward Capriolo commented on HIVE-1608: --- If the sequence file is not compressed it is actually larger than the text file... use sequencefile as the default for storing intermediate results Key: HIVE-1608 URL: https://issues.apache.org/jira/browse/HIVE-1608 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-1608.patch The only argument for having a text file for storing intermediate results seems to be better debuggability. But, tailing a sequence file is possible, and it should be more space efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6570) Hive variable substitution does not work with the source command
[ https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13951675#comment-13951675 ] Edward Capriolo commented on HIVE-6570: --- No major concern; the release note is enough information. Sorry I was not paying attention to this thread. Please proceed. Hive variable substitution does not work with the source command -- Key: HIVE-6570 URL: https://issues.apache.org/jira/browse/HIVE-6570 Project: Hive Issue Type: Bug Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6570.1.patch The following does not work: {code} source ${hivevar:test-dir}/test.q; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6570) Hive variable substitution does not work with the source command
[ https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932506#comment-13932506 ] Edward Capriolo commented on HIVE-6570: --- We should make a release note: if someone has $ in their file, Hive might now try to interpret it. Hive variable substitution does not work with the source command -- Key: HIVE-6570 URL: https://issues.apache.org/jira/browse/HIVE-6570 Project: Hive Issue Type: Bug Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6570.1.patch The following does not work: {code} source ${hivevar:test-dir}/test.q; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
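For the release note, the behavior change looks roughly like this (paths and variable names are made up for illustration):
{noformat}
hive> set hivevar:test-dir=/tmp/queries;
hive> source ${hivevar:test-dir}/test.q;   -- previously failed; would now expand to: source /tmp/queries/test.q;
{noformat}
The flip side called out in the comment above: a source command whose text contains a literal ${...} that previously passed through unchanged would now be run through variable substitution.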
[jira] [Commented] (HIVE-6311) Design a new logo?
[ https://issues.apache.org/jira/browse/HIVE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882432#comment-13882432 ] Edward Capriolo commented on HIVE-6311: --- I really like the hive logo. It surely can be re-drawn in high res, etc., but fundamentally I like the elephant/bee hybrid. Design a new logo? -- Key: HIVE-6311 URL: https://issues.apache.org/jira/browse/HIVE-6311 Project: Hive Issue Type: Task Reporter: Brock Noland I have heard some folks saying we should create a new logo, so I am creating a jira for their comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868485#comment-13868485 ] Edward Capriolo commented on HIVE-6167: --- In my opinion we must keep the current syntax working as is. Current users of Hive do not want their scripts to break just to match a standard. If we wish to add new syntax that matches a given standard, that makes sense. I do not think the current standard forbids keeping our current syntax and functionality. Also, realistically, we have to be practical. Users have sessions; most users are not going to care what database/schema a function is associated with. Most are going to want global functions. Most people are not going to have so many functions that a conflict would ever arise. Let's not make and solve problems we really don't have. Allow user-defined functions to be qualified with database name --- Key: HIVE-6167 URL: https://issues.apache.org/jira/browse/HIVE-6167 Project: Hive Issue Type: Sub-task Components: UDF Reporter: Jason Dere Assignee: Jason Dere Function names in Hive are currently unqualified and there is a single namespace for all function names. This task would allow users to define temporary UDFs (and eventually permanent UDFs) with a database name, such as: CREATE TEMPORARY FUNCTION userdb.myfunc 'myudfclass'; -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6171) Use Paths consistently - V
[ https://issues.apache.org/jira/browse/HIVE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866153#comment-13866153 ] Edward Capriolo commented on HIVE-6171: --- If I had to be picky, I see methods named somethingURI() in the patch. The convention now is somethingUri. Not critical or required by any stretch. Use Paths consistently - V -- Key: HIVE-6171 URL: https://issues.apache.org/jira/browse/HIVE-6171 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6171.patch Next in series for consistent usage of Paths in Hive. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6047) Permanent UDFs in Hive
[ https://issues.apache.org/jira/browse/HIVE-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864890#comment-13864890 ] Edward Capriolo commented on HIVE-6047: --- Theoretically you could compile anything, even input formats or serdes, but I do not imagine anyone using it that way. Permanent UDFs in Hive -- Key: HIVE-6047 URL: https://issues.apache.org/jira/browse/HIVE-6047 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: PermanentFunctionsinHive.pdf Currently Hive only supports temporary UDFs which must be re-registered when starting up a Hive session. Provide some support to register permanent UDFs with Hive. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6100) Introduce basic set operations as UDFs
[ https://issues.apache.org/jira/browse/HIVE-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863147#comment-13863147 ] Edward Capriolo commented on HIVE-6100: --- Having UDFs would still be useful. I use a lot of nested structures. We end up doing really complicated and kinda slow lateral view / join queries to do set operations sometimes. Having UDFs that did things on complex types could help in many situations. Introduce basic set operations as UDFs -- Key: HIVE-6100 URL: https://issues.apache.org/jira/browse/HIVE-6100 Project: Hive Issue Type: New Feature Components: UDF Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Introduce basic set operations: 1. Intersection: The intersection of A and B, denoted by A ∩ B, is the set of all things that are members of both A and B. select set_intersection(arr_a, arr_b) from dual 2. Union: The union of A and B, denoted by A ∪ B, is the set of all things that are members of either A or B. select set_union(arr_a, arr_b) from dual 3. Symmetric difference: the symmetric difference of two sets is the set of elements which are in either of the sets and not in their intersection. select set_symdiff(arr_a, arr_b) from dual -- This message was sent by Atlassian JIRA (v6.1.5#6160)
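As a concrete example of a set operation over complex types, a set_intersection UDF along the lines of the proposal might look like this (a hedged sketch against the GenericUDF API; it assumes array elements whose Java representations implement equals/hashCode sensibly, e.g. primitives, and returns elements in the left argument's representation):
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class GenericUDFSetIntersection extends GenericUDF {
  private ListObjectInspector leftOI;
  private ListObjectInspector rightOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    if (args.length != 2 || !(args[0] instanceof ListObjectInspector)
        || !(args[1] instanceof ListObjectInspector)) {
      throw new UDFArgumentException("set_intersection expects two array arguments");
    }
    leftOI = (ListObjectInspector) args[0];
    rightOI = (ListObjectInspector) args[1];
    // Result is an array of the left argument's element type.
    return ObjectInspectorFactory.getStandardListObjectInspector(
        leftOI.getListElementObjectInspector());
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    List<?> left = leftOI.getList(args[0].get());
    List<?> right = rightOI.getList(args[1].get());
    if (left == null || right == null) {
      return null;
    }
    // Hash the right side once, then keep left elements that also appear there.
    Set<Object> rightSet = new HashSet<Object>(right);
    List<Object> result = new ArrayList<Object>();
    for (Object o : left) {
      if (rightSet.contains(o)) {
        result.add(o);
      }
    }
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "set_intersection(" + children[0] + ", " + children[1] + ")";
  }
}
{code}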
[jira] [Commented] (HIVE-6100) Introduce basic set operations as UDFs
[ https://issues.apache.org/jira/browse/HIVE-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863402#comment-13863402 ] Edward Capriolo commented on HIVE-6100: --- I think Alan and I are speaking of two different things, both of which are valid. From the title of the Jira I was assuming the user meant this: {code} create table a (list<int> x, list<int> y) select union(x, y) {code} But what Alan is discussing is perfectly valid as well. Introduce basic set operations as UDFs -- Key: HIVE-6100 URL: https://issues.apache.org/jira/browse/HIVE-6100 Project: Hive Issue Type: New Feature Components: UDF Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Introduce basic set operations: 1. Intersection: The intersection of A and B, denoted by A ∩ B, is the set of all things that are members of both A and B. select set_intersection(arr_a, arr_b) from dual 2. Union: The union of A and B, denoted by A ∪ B, is the set of all things that are members of either A or B. select set_union(arr_a, arr_b) from dual 3. Symmetric difference: the symmetric difference of two sets is the set of elements which are in either of the sets and not in their intersection. select set_symdiff(arr_a, arr_b) from dual -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6047) Permanent UDFs in Hive
[ https://issues.apache.org/jira/browse/HIVE-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863870#comment-13863870 ] Edward Capriolo commented on HIVE-6047: --- We just added the ability to write UDFs in Groovy. Can those be persisted as well? It would be easier to save the Groovy source string rather than the compiled classes. Permanent UDFs in Hive -- Key: HIVE-6047 URL: https://issues.apache.org/jira/browse/HIVE-6047 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: PermanentFunctionsinHive.pdf Currently Hive only supports temporary UDFs which must be re-registered when starting up a Hive session. Provide some support to register permanent UDFs with Hive. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841601#comment-13841601 ] Edward Capriolo commented on HIVE-5783: --- Why does support need to be built directly into the semantic analyzer? I think input formats/serdes should be decoupled from the Hive code as much as possible. Hard-coding like this makes it hard to evolve support. I *think* you should only be adding the libs as a dependency to the pom files and building some tests. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.11.0 Attachments: hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1#6144)
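To illustrate the decoupled route: a storage format can already be wired in through the generic table clauses, with no semantic-analyzer changes (the jar name and class names below are the ones the parquet-hive bundle is commonly known by; treat them as illustrative rather than authoritative):
{code}
add jar parquet-hive-bundle.jar;
CREATE TABLE parquet_t (id INT, name STRING)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS
    INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
{code}
Dedicated STORED AS PARQUET syntax would then be sugar over exactly this registration, which is the coupling question raised in the comment above.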
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841674#comment-13841674 ] Edward Capriolo commented on HIVE-5783: --- {quote} regarding the support being built into the semantic analyzer, I mimicked what was done for ORC support{quote} I think that was done before Maven. I am sure there is a reason why RCFILE, ORCFILE, and this add their own syntax, but this is something we might not want to copy-and-paste repeat just because the last person did it that way. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.11.0 Attachments: hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841693#comment-13841693 ] Edward Capriolo edited comment on HIVE-5783 at 12/6/13 8:55 PM: {quote} I would normally agree with this, but I suppose I was trying to make as minor a change as possible. {quote} Right, I am not demanding that we do it one way or the other, just pointing out that we should not build tech debt. Hive does not have a dedicated cleanup crew to handle all the non-sexy features :) was (Author: appodictic): {quote} I would normally agree with this, but I suppose I was trying to make as minor a change as possible. {quote} Right, I am not demanding that we do it one way or the other, just pointing out that we should not build tech debt. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.11.0 Attachments: hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841693#comment-13841693 ] Edward Capriolo commented on HIVE-5783: --- {quote} I would normally agree with this, but I suppose I was trying to make as minor a change as possible. {quote} Right, I am not demanding that we do it one way or the other, just pointing out that we should not build tech debt. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.11.0 Attachments: hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5875) task : collect list of hive configuration params whose default should change
[ https://issues.apache.org/jira/browse/HIVE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830393#comment-13830393 ] Edward Capriolo commented on HIVE-5875: ---
hive.mapred.mode=strict
hive.cli.print.header=true
autocreate.schema=false
task : collect list of hive configuration params whose default should change Key: HIVE-5875 URL: https://issues.apache.org/jira/browse/HIVE-5875 Project: Hive Issue Type: Task Reporter: Thejas M Nair Assignee: Thejas M Nair The immediate motivation for this was the ticket HIVE-4485. Beeline prints NULLs as empty strings. This is not a desirable behavior, but if we fix it, it breaks backward compatibility. We should not be burdening all users with mistakes of the past, especially users who are new to hive. As hadoop and hive adoption increases, the proportion of 'new' users will continue to increase. We need a way to let users choose between backward compatible behavior and more sensible behavior. How this is implemented can be discussed in a separate jira. The purpose of this *Task* jira is just to collect a list of config flags whose current default is not the desirable one. -- This message was sent by Atlassian JIRA (v6.1#6144)
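For context, the first two entries above are ordinary Hive properties that can already be overridden per-session or in hive-site.xml; a minimal sketch of what the suggested defaults look like in practice (the schema-autocreation flag above is shorthand in the comment, so it is omitted here):
{code}
SET hive.mapred.mode=strict;       -- reject risky queries such as cartesian joins and unfiltered scans of partitioned tables
SET hive.cli.print.header=true;    -- print column headers in CLI output
{code}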
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825611#comment-13825611 ] Edward Capriolo commented on HIVE-5317: --- I have two fundamental problems with this concept. {quote} The only requirement is that the file format must be able to support a rowid. With things like text and sequence file this can be done via a byte offset. {quote} This is a good reason not to do this. Things that only work for some formats create fragmentation. What about formats that do not have a row id? What if the user is already using the key for something else, like data? {quote} Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. {quote} What this ticket describes seems like a bad use case for hive. Why would the user not simply create a new table partitioned by hour? What is the need to transactionally in-place update a table? It seems like the better solution would be for the user to log these updates themselves and then export the table with a tool like Sqoop periodically. I see this as a really complicated piece of work for a narrow use case, and I have a very difficult time believing adding transactions to hive to support this is the right answer. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825619#comment-13825619 ] Edward Capriolo commented on HIVE-5317: --- By the way, I do work like this very often, and having tables that update periodically causes a lot of problems. The first is when you have to re-compute a result 4 days later. You do not want a fresh up-to-date table, you want the table as it existed 4 days ago. When you want to troubleshoot a result you do not want your intermediate tables trampled over. When you want to rebuild a month's worth of results you want to launch 31 jobs in parallel, not 31 jobs in series. In fact, in Programming Hive I suggest ALWAYS partitioning these dimension tables by time and NOT doing what this ticket describes, for the reasons above (and more). Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.1#6144)
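A minimal HiveQL sketch of the partition-by-time pattern argued for above; the table and column names are illustrative, not from this ticket:
{code}
-- each hourly/daily export lands in its own immutable partition
CREATE TABLE customer_dim (customer_id BIGINT, name STRING)
PARTITIONED BY (dt STRING);

INSERT OVERWRITE TABLE customer_dim PARTITION (dt = '2013-11-18')
SELECT customer_id, name FROM customer_dim_staging;

-- last week's snapshot stays queryable for reruns and troubleshooting
SELECT * FROM customer_dim WHERE dt = '2013-11-11';
{code}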
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826191#comment-13826191 ] Edward Capriolo commented on HIVE-5317: --- {quote} Ed, If you don't use the insert, update, and delete commands, they won't impact your use of Hive. On the other hand, there are a wide number of users who need ACID and updates. {quote} Why don't those users just use an ACID database? {quote} The dimension tables have primary keys and are typically bucketed and sorted on those keys. {quote} All the use cases defined seem to be exactly what hive is not built for. 1) Hive does not do much/any optimization of a table when it is sorted. 2) Hive tables do not have primary keys. 3) Hive is not made to play with tables of only a few rows. It seems like the idea is to turn hive and the hive metastore into a one-shot database for processes that can easily be done differently. {quote} Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance. {quote} 1. Sqoop export to the RDBMS. 2. Run the query on the RDBMS. 3. Write back to hive. I am not ready to vote -1, but I am struggling to understand why anyone would want to use hive to solve the use cases described. This seems like a square-peg-in-a-round-hole solution. It feels like something that belongs outside of hive. It feels a lot like this: http://db.cs.yale.edu/hadoopdb/hadoopdb.html Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826207#comment-13826207 ] Edward Capriolo commented on HIVE-5317: --- {quote} In theory the base can be in any format, but ORC will be required for v1 {quote} This is exactly what I talk about when I talk about fragmentation. Hive cannot be a system where features only work when using a specific input format. The feature must be applicable to more than just a single file format. Tagging other file formats as LATER bothers me. Wouldn't the community get more utility if something that worked against TextFormat were written first, then later against other formats? I know about the Stinger initiative, but developing features that only work with specific input formats does not seem like the correct course of action. It goes against our core design principles: https://cwiki.apache.org/confluence/display/Hive/Home Hive does not mandate read or written data be in the Hive format---there is no such thing. Hive works equally well on Thrift, control delimited, or your specialized data formats. Please see File Format and SerDe in the Developer Guide for details. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13818289#comment-13818289 ] Edward Capriolo commented on HIVE-5731: --- {quote} GenericUDF class is the latest and recommended base class for any UDFs. This JIRA is to change the current UDFDate* classes to extend from GenericUDF. {quote} Has anyone done a performance evaluation of the speed of a UDF vs a GenericUDF? I understand the motivation in the vectorized case, but are users of the non-vectorized case getting less performance? If I knew the performance difference was negligible I would not care, but I have not seen any numbers and I am wondering if we have considered the implications of this. Use new GenericUDF instead of basic UDF for UDFDate* classes - Key: HIVE-5731 URL: https://issues.apache.org/jira/browse/HIVE-5731 Project: Hive Issue Type: Improvement Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, HIVE-5731.4.patch GenericUDF class is the latest and recommended base class for any UDFs. This JIRA is to change the current UDFDate* classes to extend from GenericUDF. The general benefit of GenericUDF is described in the comments as: The GenericUDF is superior to normal UDFs in the following ways: 1. It can accept arguments of complex types, and return complex types. 2. It can accept variable length of arguments. 3. It can accept an infinite number of function signatures - for example, it's easy to write a GenericUDF that accepts array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It can do short-circuit evaluations using DeferredObject. -- This message was sent by Atlassian JIRA (v6.1#6144)
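For readers comparing the two APIs: a classic UDF's evaluate() is located and invoked reflectively (through GenericUDFBridge), while a GenericUDF resolves its type plumbing once in initialize() and then works directly against ObjectInspectors, which is the plausible source of any per-row overhead being asked about. A minimal sketch of the same function written both ways (class and function names are illustrative):
{code}
// Classic UDF: evaluate() is found and called via reflection at runtime.
import org.apache.hadoop.hive.ql.exec.UDF;

public class AddOne extends UDF {
  public Integer evaluate(Integer x) {
    return x == null ? null : x + 1;
  }
}
{code}
{code}
// GenericUDF: type handling is resolved once in initialize(), no reflective dispatch per row.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.IntObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class GenericAddOne extends GenericUDF {
  private IntObjectInspector inputOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    inputOI = (IntObjectInspector) args[0];   // assumes a single int argument for brevity
    return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    Object o = args[0].get();                 // lazily materialize the argument
    return o == null ? null : inputOI.get(o) + 1;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "add_one(" + children[0] + ")";
  }
}
{code}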
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13817813#comment-13817813 ] Edward Capriolo commented on HIVE-5107: --- Generally it is better to put unit tests closest to the code they are testing. This makes it easier to determine test coverage. Integration tests usually involve testing across modules. Ideally we want tests to be localized. Someone working in hive-avro should not have to run tests unrelated to avro to add a feature. I think that is what we are aiming for: clean separation and easier testing without a full run. Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will put the entire thing on GitHub for review. Then we can talk about switching the project somehow. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5602) Micro optimize select operator
[ https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808331#comment-13808331 ] Edward Capriolo commented on HIVE-5602: --- Thanks for looking. Micro optimize select operator -- Key: HIVE-5602 URL: https://issues.apache.org/jira/browse/HIVE-5602 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5602.2.patch.txt, HIVE-5602.patch.1.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5602) Micro optimize select operator
[ https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5602: -- Attachment: HIVE-5602.2.patch.txt Micro optimize select operator -- Key: HIVE-5602 URL: https://issues.apache.org/jira/browse/HIVE-5602 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5602.2.patch.txt, HIVE-5602.patch.1.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5643) ZooKeeperHiveLockManager.getQuorumServers incorrectly appends the custom zk port to quorum hosts
[ https://issues.apache.org/jira/browse/HIVE-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806347#comment-13806347 ] Edward Capriolo commented on HIVE-5643: --- +1 ZooKeeperHiveLockManager.getQuorumServers incorrectly appends the custom zk port to quorum hosts Key: HIVE-5643 URL: https://issues.apache.org/jira/browse/HIVE-5643 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.12.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.13.0 Attachments: HIVE-5643.1.patch.txt ZooKeeperHiveLockManager calls the below method to construct the connection string for the ZooKeeper connection.
{code}
private static String getQuorumServers(HiveConf conf) {
  String hosts = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_QUORUM);
  String port = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_CLIENT_PORT);
  return hosts + ":" + port;
}
{code}
For example: HIVE_ZOOKEEPER_QUORUM=node1, node2, node3 HIVE_ZOOKEEPER_CLIENT_PORT= The connection string given to the ZooKeeper object is node1, node2, node3:. ZooKeeper considers the default port to be 2181 for hostnames that don't have any port. This works fine as long as HIVE_ZOOKEEPER_CLIENT_PORT is 2181. If it is different, the ZooKeeper client object tries to connect to node1 and node2 on port 2181, which always fails. So it has only one choice, the last host, which receives all the load from Hive. -- This message was sent by Atlassian JIRA (v6.1#6144)
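The shape of the fix follows from the description: append the configured port to each host individually, and only when the host does not already carry one. A sketch of that shape (illustrative; the committed patch may differ in detail):
{code}
private static String getQuorumServers(HiveConf conf) {
  String[] hosts = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_QUORUM).split(",");
  String port = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_CLIENT_PORT);
  StringBuilder quorum = new StringBuilder();
  for (int i = 0; i < hosts.length; i++) {
    quorum.append(hosts[i].trim());
    if (!hosts[i].contains(":")) {
      quorum.append(":").append(port);   // give every host an explicit port
    }
    if (i != hosts.length - 1) {
      quorum.append(",");
    }
  }
  return quorum.toString();
}
{code}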
[jira] [Commented] (HIVE-5610) Merge maven branch into trunk
[ https://issues.apache.org/jira/browse/HIVE-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806176#comment-13806176 ] Edward Capriolo commented on HIVE-5610: --- [~brocknoland] All looks good to me. +1 Let's prepare a wiki doc on maven and document the simple changes: building, testing, etc. Then we can pull the trigger on this change. Merge maven branch into trunk - Key: HIVE-5610 URL: https://issues.apache.org/jira/browse/HIVE-5610 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Brock Noland With HIVE-5566 nearing completion we will be nearly ready to merge the maven branch to trunk. The following tasks will be done post-merge: * HIVE-5611 - Add assembly (i.e.) tar creation to pom * HIVE-5612 - Add ability to re-generate generated code stored in source control The merge process will be as follows: 1) svn merge ^/hive/branches/maven 2) Commit result 3) Modify the following line in maven-rollforward.sh:
{noformat}
mv $source $target
{noformat}
to
{noformat}
svn mv $source $target
{noformat}
4) Execute maven-rollforward.sh 5) Commit result 6) Update trunk-mr1.properties and trunk-mr2.properties on the ptesting host, adding the following:
{noformat}
mavenEnvOpts = -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128
testCasePropertyName = test
buildTool = maven
unitTests.directories = ./
{noformat}
Notes: * To build everything you must:
{noformat}
$ mvn clean install -DskipTests
$ cd itests
$ mvn clean install -DskipTests
{noformat}
because itests (any test that has cyclical dependencies or requires that the packages be built) is not part of the root reactor build. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5655) Hive incorrectly handles divide-by-zero case
[ https://issues.apache.org/jira/browse/HIVE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806197#comment-13806197 ] Edward Capriolo commented on HIVE-5655: --- +1. [~xuefuz] We (I) recently committed a new system that runs UDF tests through the operator chain. Maybe you want to base your JUnit test on that; see ./ql/src/test/org/apache/hadoop/hive/ql/testutil/BaseScalarUdfTest.java Hive incorrectly handles divide-by-zero case --- Key: HIVE-5655 URL: https://issues.apache.org/jira/browse/HIVE-5655 Project: Hive Issue Type: Improvement Components: Types Affects Versions: 0.10.0, 0.11.0, 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5655.1.patch, HIVE-5655.patch Unlike other databases, Hive currently has only one mode (the default mode) regarding error handling, in which a NULL value is returned. However, in the case of divide-by-zero, Hive demonstrates a different behavior.
{code}
hive> select 5/0 from tmp2 limit 1;
Total MapReduce jobs = 1
...
Total MapReduce CPU Time Spent: 860 msec
OK
Infinity
{code}
The correct behaviour would be for Hive to return NULL instead, in order to be consistent w.r.t. error handling. (BTW, the same situation is handled correctly for the decimal type.) MySQL has server modes that control the behaviour. By default, NULL is returned. For instance,
{code}
mysql> select 3/0 from dual;
+------+
| 3/0  |
+------+
| NULL |
+------+
1 row in set (0.00 sec)
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5613) Subquery support: disallow nesting of SubQueries
[ https://issues.apache.org/jira/browse/HIVE-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802252#comment-13802252 ] Edward Capriolo commented on HIVE-5613: --- I do not understand this issue from the description. Are we discussing disallowing subqueries that already work? Or are we discussing more stringent syntax checking? Subquery support: disallow nesting of SubQueries Key: HIVE-5613 URL: https://issues.apache.org/jira/browse/HIVE-5613 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-4965) Add support so that PTFs can stream their output; Windowing PTF should do this
[ https://issues.apache.org/jira/browse/HIVE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802544#comment-13802544 ] Edward Capriolo commented on HIVE-4965: --- HIVE-4965.D12615.1.patch: There are several lint errors in this patch:
+while(pItr.hasNext())
+{
int i=0; i=0;
+for(i=0; i < iPart.getOutputOI().getAllStructFieldRefs().size(); i++) {
int i =0;
Add support so that PTFs can stream their output; Windowing PTF should do this -- Key: HIVE-4965 URL: https://issues.apache.org/jira/browse/HIVE-4965 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4965.D12033.1.patch, HIVE-4965.D12615.1.patch There is no need to create an output PTF Partition for the last PTF in a chain. For the Windowing PTF this should give a perf boost; we avoid creating temporary results for each UDAF and avoid populating an output Partition. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5602) Micro optimize select operator
Edward Capriolo created HIVE-5602: - Summary: Micro optimize select operator Key: HIVE-5602 URL: https://issues.apache.org/jira/browse/HIVE-5602 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Priority: Minor -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5602) Micro optimize select operator
[ https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5602: -- Attachment: HIVE-5602.patch.1.txt Micro optimize select operator -- Key: HIVE-5602 URL: https://issues.apache.org/jira/browse/HIVE-5602 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Priority: Minor Attachments: HIVE-5602.patch.1.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5602) Micro optimize select operator
[ https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5602: -- Assignee: Edward Capriolo Status: Patch Available (was: Open) Micro optimize select operator -- Key: HIVE-5602 URL: https://issues.apache.org/jira/browse/HIVE-5602 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5602.patch.1.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5602) Micro optimize select operator
[ https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801368#comment-13801368 ] Edward Capriolo commented on HIVE-5602: --- The SELECT operator does a try/catch inside the per-column for loop when it does not need to. Additionally, we make a function call per row to check conf.isSelectComputeNoStart(). I micro-benchmarked before and after the change and saw a small gain; please review.
{noformat}
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 1000 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:29:30 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:29:31 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:29:33 INFO exec.FilterOperator: 0 forwarding 200 rows
13/10/21 20:29:34 INFO exec.FilterOperator: 0 forwarding 300 rows
13/10/21 20:29:36 INFO exec.FilterOperator: 0 forwarding 400 rows
13/10/21 20:29:38 INFO exec.FilterOperator: 0 forwarding 500 rows
13/10/21 20:29:40 INFO exec.FilterOperator: 0 forwarding 600 rows
13/10/21 20:29:41 INFO exec.FilterOperator: 0 forwarding 700 rows
13/10/21 20:29:43 INFO exec.FilterOperator: 0 forwarding 800 rows
13/10/21 20:29:45 INFO exec.FilterOperator: 0 forwarding 900 rows
13/10/21 20:29:46 INFO exec.FilterOperator: 0 forwarding 1000 rows
13/10/21 20:31:36 INFO exec.FilterOperator: Initialization Done 0 FIL
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 1000 rows
13/10/21 20:31:37 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:31:37 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:31:38 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:31:40 INFO exec.FilterOperator: 0 forwarding 200 rows
13/10/21 20:31:41 INFO exec.FilterOperator: 0 forwarding 300 rows
13/10/21 20:31:43 INFO exec.FilterOperator: 0 forwarding 400 rows
13/10/21 20:31:45 INFO exec.FilterOperator: 0 forwarding 500 rows
13/10/21 20:31:46 INFO exec.FilterOperator: 0 forwarding 600 rows
13/10/21 20:31:48 INFO exec.FilterOperator: 0 forwarding 700 rows
13/10/21 20:31:49 INFO exec.FilterOperator: 0 forwarding 800 rows
13/10/21 20:31:51 INFO exec.FilterOperator: 0 forwarding 900 rows
13/10/21 20:31:53 INFO exec.FilterOperator: 0 forwarding 1000 rows
{noformat}
Micro optimize select operator -- Key: HIVE-5602 URL: https://issues.apache.org/jira/browse/HIVE-5602 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5602.patch.1.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
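A sketch of what the described change looks like inside the operator; the field and method names are illustrative, not the literal patch: the flag is cached once at operator initialization, and the try/catch sits outside the per-column loop.
{code}
// cached once in initializeOp() instead of re-checking the conf per row:
private transient boolean noCompute;

public void processOp(Object row, int tag) throws HiveException {
  if (noCompute) {                            // plain field read on the per-row hot path
    forward(row, inputObjInspectors[tag]);
    return;
  }
  try {
    for (int i = 0; i < eval.length; i++) {   // no try/catch inside the loop body
      output[i] = eval[i].evaluate(row);
    }
  } catch (RuntimeException e) {
    throw new HiveException("Error evaluating row", e);
  }
  forward(output, outputObjInspector);
}
{code}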
[jira] [Commented] (HIVE-5592) Add an option to convert enum as struct<value:int> as of Hive 0.8
[ https://issues.apache.org/jira/browse/HIVE-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801415#comment-13801415 ] Edward Capriolo commented on HIVE-5592: --- If this is true, we need to fix this ASAP. Add an option to convert enum as struct<value:int> as of Hive 0.8 - Key: HIVE-5592 URL: https://issues.apache.org/jira/browse/HIVE-5592 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0, 0.12.0 Reporter: Jie Li HIVE-3323 introduced the incompatible change: Hive handling of enum types has been changed to always return the string value rather than struct<value:int>. But it didn't add the option hive.data.convert.enum.to.string as planned and thus broke all Enum usage prior to 0.10. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5600) Fix PTest2 Maven support
[ https://issues.apache.org/jira/browse/HIVE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801487#comment-13801487 ] Edward Capriolo commented on HIVE-5600: --- +1 Fix PTest2 Maven support Key: HIVE-5600 URL: https://issues.apache.org/jira/browse/HIVE-5600 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-5600.patch At present we don't download all the dependencies required in the source prep phase therefore tests fail when the maven repo has been cleared. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5563) Skip reading columns in ORC for count(*)
[ https://issues.apache.org/jira/browse/HIVE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797028#comment-13797028 ] Edward Capriolo commented on HIVE-5563: --- Just a note/question: how is RCFile affected by these changes? Do we have API fragmentation going on, or are both formats affected? I am not seeing any end-to-end test in HIVE-5546; what are we doing to prevent code rot, and to ensure this mistake does not happen again? Skip reading columns in ORC for count(*) Key: HIVE-5563 URL: https://issues.apache.org/jira/browse/HIVE-5563 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley With HIVE-4113, the semantics of ColumnProjectionUtils.getReadColumnIds were fixed so that an empty list means no columns instead of all columns. (Except the caveat of the override of ColumnProjectionUtils.isReadAllColumns.) However, ORC's reader wasn't updated, so it still reads all columns. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-4175) Injection of emptyFile into input splits for empty partitions causes Deserializer to fail
[ https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797033#comment-13797033 ] Edward Capriolo commented on HIVE-4175: --- I bet this is something hive's combine input format is doing. I have noticed random issues around empty partitions before that were recently fixed. Also note that protobuf was recently updated from 2.4 to 2.5. Injection of emptyFile into input splits for empty partitions causes Deserializer to fail - Key: HIVE-4175 URL: https://issues.apache.org/jira/browse/HIVE-4175 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Environment: CDH4.2, using MR1 Reporter: James Kebinger Priority: Minor My deserializer is expecting to receive one of 2 different subclasses of Writable, but in certain circumstances it receives an empty instance of org.apache.hadoop.io.Text. This only happens for task attempts where I observe the file called emptyFile in the list of input splits. I'm doing queries over an external year/month/day partitioned table that I have eagerly created partitions for, so as of today for example, I may do a query where year = 2013 and month = 3 which includes empty partitions. In the course of investigation I downloaded the sequence files to confirm they were ok. Once I realized that processing of empty partitions was to blame, I was able to work around the issue by bounding my queries to populated partitions. Can the need for the emptyFile be eliminated in the case where there's already a bunch of splits being processed? Failing that, can the mapper detect that the current input is from emptyFile and not call the deserializer? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5567) Add better protection code for SARGs
[ https://issues.apache.org/jira/browse/HIVE-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797194#comment-13797194 ] Edward Capriolo commented on HIVE-5567: --- Is there a reason that decimal cannot be supported, or is the support for decimal incomplete? If SARG can support decimal we might be better off not adding protection; instead we should ensure that our unit tests cover all types. Add better protection code for SARGs Key: HIVE-5567 URL: https://issues.apache.org/jira/browse/HIVE-5567 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Currently, the SARG parser gets an NPE when the push down predicate uses a type like decimal that isn't supported. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5567) Add better protection code for SARGs
[ https://issues.apache.org/jira/browse/HIVE-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797201#comment-13797201 ] Edward Capriolo commented on HIVE-5567: --- In other words, we do not want to create fragmentation. If certain types cannot work with predicate pushdown, that is a problem we should address. Add better protection code for SARGs Key: HIVE-5567 URL: https://issues.apache.org/jira/browse/HIVE-5567 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Currently, the SARG parser gets an NPE when the push down predicate uses a type like decimal that isn't supported. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-2419) CREATE TABLE AS SELECT should create warehouse directory
[ https://issues.apache.org/jira/browse/HIVE-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795383#comment-13795383 ] Edward Capriolo commented on HIVE-2419: --- What if /user/hive does not exist? What if /user does not exist? Maybe it is better to let people make the directories themselves, or simply have a pre-flight startup check in init scripts or a Java main. CREATE TABLE AS SELECT should create warehouse directory Key: HIVE-2419 URL: https://issues.apache.org/jira/browse/HIVE-2419 Project: Hive Issue Type: Bug Reporter: David Phillips Attachments: HIVE-2419.1.patch If you run a CTAS statement on a fresh Hive install without a warehouse directory (as is the case with Amazon EMR), it runs the query but errors out at the end: {quote} hive> create table foo as select * from t_message limit 1; Total MapReduce jobs = 1 Launching Job 1 out of 1 ... Ended Job = job_201108301753_0001 Moving data to: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo Failed with exception Unable to rename: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/var/lib/hive_07_1/tmp/scratch/hive_2011-08-30_18-04-36_809_6130923980133666976/-ext-10001 to: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask {quote} This is different behavior from a simple CREATE TABLE, which creates the warehouse directory. -- This message was sent by Atlassian JIRA (v6.1#6144)
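A sketch of the pre-flight idea, run once from the Hive CLI or an init script (this assumes a Hadoop fs shell whose -mkdir supports -p; the path is the conventional default warehouse location):
{code}
dfs -mkdir -p /user/hive/warehouse;
dfs -chmod g+w /user/hive/warehouse;
{code}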
[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array
[ https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4943: -- Fix Version/s: 0.13.0 An explode function that includes the item's position in the array -- Key: HIVE-4943 URL: https://issues.apache.org/jira/browse/HIVE-4943 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.11.0 Reporter: Niko Stahl Labels: patch Fix For: 0.13.0 Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch, HIVE-4943.3.patch Original Estimate: 8h Remaining Estimate: 8h A function that explodes an array and includes an output column with the position of each item in the original array. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array
[ https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4943: -- Resolution: Fixed Status: Resolved (was: Patch Available) Resolved. Thanks Niko. Next time tag me as a watcher or make more noise if the patch takes so long. An explode function that includes the item's position in the array -- Key: HIVE-4943 URL: https://issues.apache.org/jira/browse/HIVE-4943 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.11.0 Reporter: Niko Stahl Labels: patch Fix For: 0.13.0 Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch, HIVE-4943.3.patch Original Estimate: 8h Remaining Estimate: 8h A function that explodes an array and includes an output column with the position of each item in the original array. -- This message was sent by Atlassian JIRA (v6.1#6144)
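For reference, the function contributed here shipped as the posexplode() UDTF; a minimal usage sketch against an illustrative table orders(id INT, items ARRAY<STRING>):
{code}
SELECT id, pos, item
FROM orders
LATERAL VIEW posexplode(items) t AS pos, item;
-- pos is the 0-based position of item in the original array
{code}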
[jira] [Commented] (HIVE-4943) An explode function that includes the item's position in the array
[ https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793375#comment-13793375 ] Edward Capriolo commented on HIVE-4943: --- +1. Let me re-upload the patch; after it retests I will commit. An explode function that includes the item's position in the array -- Key: HIVE-4943 URL: https://issues.apache.org/jira/browse/HIVE-4943 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.11.0 Reporter: Niko Stahl Labels: patch Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch Original Estimate: 8h Remaining Estimate: 8h A function that explodes an array and includes an output column with the position of each item in the original array. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array
[ https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4943: -- Attachment: HIVE-4943.3.patch An explode function that includes the item's position in the array -- Key: HIVE-4943 URL: https://issues.apache.org/jira/browse/HIVE-4943 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.11.0 Reporter: Niko Stahl Labels: patch Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch, HIVE-4943.3.patch Original Estimate: 8h Remaining Estimate: 8h A function that explodes an array and includes an output column with the position of each item in the original array. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5252) Add ql syntax for inline java code creation
[ https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792672#comment-13792672 ] Edward Capriolo commented on HIVE-5252: --- NP it can wait a day. Add ql syntax for inline java code creation --- Key: HIVE-5252 URL: https://issues.apache.org/jira/browse/HIVE-5252 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5252.1.patch.txt, HIVE-5252.2.patch.txt Something to the effect of compile 'my code here' using 'groovycompiler'. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5518) ADD JAR should add entries to local classpath
[ https://issues.apache.org/jira/browse/HIVE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792844#comment-13792844 ] Edward Capriolo commented on HIVE-5518: --- Anecdotally: anything required as part of an input format needs to be on the aux path, because those jars are needed to read the data, whereas UDF jars need not be on the aux path, as they are used inside operators. It would be great if we could unify these concepts without making the classpath needed to launch every job very large. ADD JAR should add entries to local classpath - Key: HIVE-5518 URL: https://issues.apache.org/jira/browse/HIVE-5518 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.12.0 Reporter: Nick Dimiduk Jars referenced in {{ADD JAR}} statements are not made available on the immediate classpath. That means they're useless for scripts which need to initialize external output formats for job submission (i.e., the hbase storage handler). Is this expected behavior? For example, the table 'pagecounts_hbase' is an hbase table defined using the HBaseStorageHandler.
{noformat}
$ cat foo.hql
ADD FILE /etc/hbase/conf/hbase-site.xml;
ADD JAR /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar;
ADD JAR /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-68-hadoop2.jar;
ADD JAR /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-68-hadoop2.jar;
ADD JAR /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-68-hadoop2.jar;
FROM pgc INSERT INTO TABLE pagecounts_hbase SELECT pgc.* WHERE rowkey LIKE 'en/q%' LIMIT 10;
$ hive -f foo.hql
...
Added resource: /etc/hbase/conf/hbase-site.xml
Added /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar to class path
Added resource: /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar
...
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:266)
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:305)
at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:98)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:989)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:892)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:730)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:707)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1196)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1053)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8342)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at
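To illustrate the split with hypothetical names: a jar that only supplies a UDF works through ADD JAR because the class is loaded inside operators, while an input format or storage handler jar must be visible at job launch, e.g. via hive --auxpath /path/to/format.jar or HIVE_AUX_JARS_PATH. A UDF-only session sketch:
{code}
-- sufficient for a UDF (jar path and class name are hypothetical):
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.MyLower';
SELECT my_lower(name) FROM t;
{code}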
[jira] [Commented] (HIVE-5494) Vectorization throws exception with nested UDF.
[ https://issues.apache.org/jira/browse/HIVE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792856#comment-13792856 ] Edward Capriolo commented on HIVE-5494: --- Looks good. Thank you for adding the end-to-end test. Vectorization throws exception with nested UDF. --- Key: HIVE-5494 URL: https://issues.apache.org/jira/browse/HIVE-5494 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5494.1.patch, HIVE-5494.2.patch
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFAbs, is not supported
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:465)
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:274)
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getAggregatorExpression(VectorizationContext.java:1512)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.<init>(VectorGroupByOperator.java:133)
... 41 more
FAILED: RuntimeException java.lang.reflect.InvocationTargetException
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5430) NOT expression doesn't handle nulls correctly.
[ https://issues.apache.org/jira/browse/HIVE-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792970#comment-13792970 ] Edward Capriolo commented on HIVE-5430: --- I agree with that. I thought we had a similar issue open that would use standard UDFs inside the vectorized ones. I do not agree with calling them legacy though. We should pick a better nomenclature, possibly non-vectorized or something. NOT expression doesn't handle nulls correctly. -- Key: HIVE-5430 URL: https://issues.apache.org/jira/browse/HIVE-5430 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5430.1.patch, HIVE-5430.2.patch, HIVE-5430.3.patch, HIVE-5430.4.patch NOT expression doesn't handle nulls correctly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS
[ https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793064#comment-13793064 ] Edward Capriolo commented on HIVE-5423: --- It would be really good to get a +1 on this. Then I can begin the process of removing many rather slow .q tests. Speed up testing of scalar UDFS --- Key: HIVE-5423 URL: https://issues.apache.org/jira/browse/HIVE-5423 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, HIVE-5423.6.patch.txt, HIVE-5423.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5494) Vectorization throws exception with nested UDF.
[ https://issues.apache.org/jira/browse/HIVE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791617#comment-13791617 ] Edward Capriolo commented on HIVE-5494: --- Q1. Should we be testing with null values? Q2. Should we be testing results? This test only shows that we are no longer throwing an exception at this point, but we are not showing the feature works in any meaningful way. After this test can't we just end up with another exception later in the code? Vectorization throws exception with nested UDF. --- Key: HIVE-5494 URL: https://issues.apache.org/jira/browse/HIVE-5494 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5494.1.patch
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFAbs, is not supported
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:465)
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:274)
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getAggregatorExpression(VectorizationContext.java:1512)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.<init>(VectorGroupByOperator.java:133)
... 41 more
FAILED: RuntimeException java.lang.reflect.InvocationTargetException
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (HIVE-5494) Vectorization throws exception with nested UDF.
[ https://issues.apache.org/jira/browse/HIVE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791617#comment-13791617 ] Edward Capriolo edited comment on HIVE-5494 at 10/10/13 4:06 PM: - Q1. Should we be testing with null values? Q2. Should we be testing results? This test only shows that we are no longer throwing an exception at this point, but we are not showing the feature works in any meaningful way. After this test can't we just end up with another exception later in the code? I think we need an end-to-end test here: a table with a column a holding the values 5, NULL, 1, where select sum(abs(a)) from table returns 6. was (Author: appodictic): Q1. Should we be testing with null values? Q2. Should we be testing results? This test only shows that we are no longer throwing an exception at this point, but we are not showing the feature works in any meaningful way. After this test can't we just end up with another exception later in the code? Vectorization throws exception with nested UDF. --- Key: HIVE-5494 URL: https://issues.apache.org/jira/browse/HIVE-5494 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5494.1.patch
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFAbs, is not supported
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:465)
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:274)
at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getAggregatorExpression(VectorizationContext.java:1512)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.<init>(VectorGroupByOperator.java:133)
... 41 more
FAILED: RuntimeException java.lang.reflect.InvocationTargetException
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5518) ADD JAR should add entries to local classpath
[ https://issues.apache.org/jira/browse/HIVE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792331#comment-13792331 ] Edward Capriolo commented on HIVE-5518: --- Let's look into this. I do not see a reason why the auxpath and add jar list cannot be combined. It sure would make many things easier. ADD JAR should add entries to local classpath - Key: HIVE-5518 URL: https://issues.apache.org/jira/browse/HIVE-5518 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.12.0 Reporter: Nick Dimiduk Jars referenced in {{ADD JAR}} statements are not made available on the immediate classpath. That means they're useless for scripts which need to initialize external output formats for job submission (i.e., the hbase storage handler). Is this expected behavior? For example, the table 'pagecounts_hbase' is an hbase table defined using the HBaseStorageHandler.
{noformat}
$ cat foo.hql
ADD FILE /etc/hbase/conf/hbase-site.xml;
ADD JAR /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar;
ADD JAR /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-68-hadoop2.jar;
ADD JAR /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-68-hadoop2.jar;
ADD JAR /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-68-hadoop2.jar;
FROM pgc INSERT INTO TABLE pagecounts_hbase SELECT pgc.* WHERE rowkey LIKE 'en/q%' LIMIT 10;
$ hive -f foo.hql
...
Added resource: /etc/hbase/conf/hbase-site.xml
Added /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar to class path
Added resource: /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar
...
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:266)
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:305)
at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:98)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:989)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:892)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:730)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:707)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1196)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1053)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8342)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
at
[jira] [Commented] (HIVE-5252) Add ql syntax for inline java code creation
[ https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790531#comment-13790531 ] Edward Capriolo commented on HIVE-5252: --- Groovyc (the groovy compiler) requires ant. Ant is on our classpath for development, but we need to add it as a ql dependency because otherwise it does not get added to hive/lib in the package. Add ql syntax for inline java code creation --- Key: HIVE-5252 URL: https://issues.apache.org/jira/browse/HIVE-5252 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5252.1.patch.txt Something to the effect of compile 'my code here' using 'groovycompiler'. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5252) Add ql syntax for inline java code creation
[ https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5252: -- Attachment: HIVE-5252.2.patch.txt Add ql syntax for inline java code creation --- Key: HIVE-5252 URL: https://issues.apache.org/jira/browse/HIVE-5252 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5252.1.patch.txt, HIVE-5252.2.patch.txt Something to the effect of compile 'my code here' using 'groovycompiler'. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5491) Some lazy DeferredObjects inspectors are fat
Edward Capriolo created HIVE-5491: - Summary: Some lazy DeferredObjects inspectors are fat Key: HIVE-5491 URL: https://issues.apache.org/jira/browse/HIVE-5491 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Priority: Minor I was looking at some of the implementations of DeferredObject. I found that some carry two extra properties: boolean eager; boolean eval; where eval is used to track whether the object is initialized. My thinking is that these extra properties make the objects fat, and if removed it would allow us to fit more lazy objects in the same memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
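A standalone sketch of the slimming idea (not Hive's actual classes): drop the two boolean flags and let a sentinel value mark "not yet evaluated", so each lazy wrapper carries a single reference field instead of a reference plus two booleans.
{code}
final class LeanDeferred {
  private static final Object UNEVALUATED = new Object(); // sentinel, since null is a legal value
  private final java.util.concurrent.Callable<Object> source; // stands in for the real evaluator
  private Object value = UNEVALUATED;

  LeanDeferred(java.util.concurrent.Callable<Object> source) {
    this.source = source;
  }

  Object get() throws Exception {
    if (value == UNEVALUATED) {   // evaluate at most once; no separate 'eval' flag needed
      value = source.call();
    }
    return value;
  }
}
{code}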
[jira] [Updated] (HIVE-5491) Some lazy DeferredObjects inspectors are fat
[ https://issues.apache.org/jira/browse/HIVE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5491: -- Attachment: HIVE-5491.1.patch.txt Hive will not tolerate fat lazy code! jk Some lazy DeferredObjects inspectors are fat -- Key: HIVE-5491 URL: https://issues.apache.org/jira/browse/HIVE-5491 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Priority: Minor Attachments: HIVE-5491.1.patch.txt I was looking at some of the implementations of DeferredObject. I found that some carry two extra properties: boolean eager; boolean eval; where eval tracks whether the object has been initialized. My thinking is that these extra properties make the objects fat, and removing them would let us fit more lazy objects in the same memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5491) Some lazy DeferredObjects inspectors are fat
[ https://issues.apache.org/jira/browse/HIVE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5491: -- Assignee: Edward Capriolo Status: Patch Available (was: Open) Some lazy DeferredObjects inspectors are fat -- Key: HIVE-5491 URL: https://issues.apache.org/jira/browse/HIVE-5491 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5491.1.patch.txt I was looking at some of the implementations of DeferredObject. I found that some carry two extra properties: boolean eager; boolean eval; where eval tracks whether the object has been initialized. My thinking is that these extra properties make the objects fat, and removing them would let us fit more lazy objects in the same memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5497) Hive trunk broken against hadoop 0.20.2
Edward Capriolo created HIVE-5497: - Summary: Hive trunk broken against hadoop 0.20.2 Key: HIVE-5497 URL: https://issues.apache.org/jira/browse/HIVE-5497 Project: Hive Issue Type: Bug Reporter: Edward Capriolo Priority: Blocker
{noformat}
ommon-0.13.0-SNAPSHOT.jar!/hive-log4j.properties
hive> compile `import org.apache.hadoop.hive.ql.exec.UDF \; public class Pyth extends UDF { public double evaluate(double a, double b){ return Math.sqrt((a*a) + (b*b)) \; } } ` AS GROOVY NAMED Pyth.groovy;
Added /tmp/0_1381290655403.jar to class path
Added resource: /tmp/0_1381290655403.jar
hive> create temporary function Pyth as 'Pyth';
OK
Time taken: 0.445 seconds
hive> select Pyth(a,b) from a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Exception in thread "main" java.lang.UnsupportedOperationException: Kerberos not supported in current hadoop version
 at org.apache.hadoop.hive.shims.Hadoop20Shims.getTokenFileLocEnvName(Hadoop20Shims.java:775)
 at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:653)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Execution failed with exit status: 1
Obtaining error information
Task failed!
Task ID: Stage-1
Logs: /tmp/edward/hive.log
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
hive>
{noformat}
-- This message was sent by Atlassian JIRA (v6.1#6144)
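The trace pins the failure to the shim layer. A minimal sketch of the pattern involved (the method name is taken from the stack trace; the real HadoopShims interface is much larger): the 0.20 shim throws where returning a harmless default would keep non-Kerberos clusters working.
{noformat}
// Simplified; only the method from the trace is modeled here.
interface HadoopShims {
    String getTokenFileLocEnvName();
}

class Hadoop20Shims implements HadoopShims {
    @Override
    public String getTokenFileLocEnvName() {
        // This hard failure is what surfaces as the blocker above.
        throw new UnsupportedOperationException(
            "Kerberos not supported in current hadoop version");
    }
}

public class ShimSketch {
    public static void main(String[] args) {
        HadoopShims shims = new Hadoop20Shims();
        // Reproduces the failure mode seen in ExecDriver.main above.
        System.out.println(shims.getTokenFileLocEnvName());
    }
}
{noformat}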
[jira] [Updated] (HIVE-5252) Add ql syntax for inline java code creation
[ https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5252: -- Attachment: HIVE-5252.1.patch.txt Add ql syntax for inline java code creation --- Key: HIVE-5252 URL: https://issues.apache.org/jira/browse/HIVE-5252 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5252.1.patch.txt Something to the effect of compile 'my code here' using 'groovycompiler'. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5252) Add ql syntax for inline java code creation
[ https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5252: -- Status: Patch Available (was: Open) Add ql syntax for inline java code creation --- Key: HIVE-5252 URL: https://issues.apache.org/jira/browse/HIVE-5252 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5252.1.patch.txt Something to the effect of compile 'my code here' using 'groovycompiler'. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)
[ https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788196#comment-13788196 ] Edward Capriolo commented on HIVE-5460: --- This should be ready for review. invalid offsets in lag lead should return an exception (per ISO-SQL) - Key: HIVE-5460 URL: https://issues.apache.org/jira/browse/HIVE-5460 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5460.1.patch.txt ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are provided to the functions. i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 ) over ( order by tint.rnum) from tint tint Instead of a meaningful error (as other vendors will emit) you get Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask SQLState: 08S01 ErrorCode: 2 -- This message was sent by Atlassian JIRA (v6.1#6144)
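The shape of the fix is easy to illustrate. A hedged sketch (the helper name is hypothetical; the actual patch lives in Hive's windowing code): validate the offset argument up front and raise a meaningful exception, instead of letting the job die deep inside MapReduce with return code 2.
{noformat}
public class LagOffsetCheck {
    // Hypothetical helper, not Hive's actual code: fail fast with a clear
    // message when the offset is invalid per ISO-SQL.
    static int validateOffset(String fn, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException(fn
                + " offset must be non-negative per ISO-SQL; got: " + offset);
        }
        return offset;
    }

    public static void main(String[] args) {
        System.out.println(validateOffset("lag", 2));    // fine
        System.out.println(validateOffset("lag", -100)); // throws with a clear message
    }
}
{noformat}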
[jira] [Commented] (HIVE-5253) Create component to compile and jar dynamic code
[ https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788208#comment-13788208 ] Edward Capriolo commented on HIVE-5253: --- With no -1's registered, there is no blocker. This issue is already a month old; if someone wanted to have a debate over it, the time was a month ago. We should not block features for a month over random security debates. There is already a Pandora's box of 'public static HashMap's, ThreadLocal variables, and other things that people can fix if they REALLY want to talk about security. Create component to compile and jar dynamic code Key: HIVE-5253 URL: https://issues.apache.org/jira/browse/HIVE-5253 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5253.10.patch.txt, HIVE-5253.11.patch.txt, HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.8.patch.txt, HIVE-5253.9.patch.txt, HIVE-5253.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)
[ https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788211#comment-13788211 ] Edward Capriolo commented on HIVE-5460: --- 99% of the windowing merge violated our code conventions. I am slowly fixing these issues as we find bugs and make other tweaks in the code. You know, next time someone does a code pie chart about a release, I want to have the most lines of code :) invalid offsets in lag lead should return an exception (per ISO-SQL) - Key: HIVE-5460 URL: https://issues.apache.org/jira/browse/HIVE-5460 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5460.1.patch.txt ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are provided to the functions. i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 ) over ( order by tint.rnum) from tint tint Instead of a meaningful error (as other vendors will emit) you get Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask SQLState: 08S01 ErrorCode: 2 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS
[ https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787634#comment-13787634 ] Edward Capriolo commented on HIVE-5423: --- It seems our JUnit test setup can't figure out not to run the base class. I will rename it from TestBase to BaseTest, and we will see if Jenkins is happier. Speed up testing of scalar UDFS --- Key: HIVE-5423 URL: https://issues.apache.org/jira/browse/HIVE-5423 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, HIVE-5423.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5423) Speed up testing of scalar UDFS
[ https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5423: -- Attachment: HIVE-5423.6.patch.txt Renamed base class so hopefully we can keep it abstract. Speed up testing of scalar UDFS --- Key: HIVE-5423 URL: https://issues.apache.org/jira/browse/HIVE-5423 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, HIVE-5423.6.patch.txt, HIVE-5423.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
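The mechanics of the rename, sketched below (class names hypothetical; the assumption is that the build's test selector matches file names against a Test* pattern, which would explain why an abstract TestBase was picked up while BaseTest would not be):
{noformat}
import org.junit.Assert;
import org.junit.Test;

// Named so a Test* include pattern does NOT match it; only concrete
// subclasses get run.
abstract class BaseUdfTest {
    abstract String evaluate(String in);

    void checkEval(String in, String expected) {
        Assert.assertEquals(expected, evaluate(in));
    }
}

// Concrete subclass; its name matches Test*, so the runner picks it up.
public class TestUpperUdf extends BaseUdfTest {
    @Override
    String evaluate(String in) {
        return in.toUpperCase();
    }

    @Test
    public void upperCases() {
        checkEval("hive", "HIVE");
    }
}
{noformat}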
[jira] [Commented] (HIVE-5464) allow OR conditions in table join
[ https://issues.apache.org/jira/browse/HIVE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787652#comment-13787652 ] Edward Capriolo commented on HIVE-5464: --- Hive only supports equi-joins like this. You can still accomplish this query using a cartesian product. MapReduce cannot easily make this query efficient; are you planning to work on this? We have to think carefully about whether we want this. There is a big danger in adding things to Hive that cannot be done efficiently in MapReduce. allow OR conditions in table join - Key: HIVE-5464 URL: https://issues.apache.org/jira/browse/HIVE-5464 Project: Hive Issue Type: Improvement Affects Versions: 0.11.0 Reporter: N Campbell select tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 tjoin1 inner join tjoin2 tjoin2 on ( tjoin1.c1 = 10 or tjoin1.c1=20 ) Query returned non-zero code: 10019, cause: FAILED: SemanticException [Error 10019]: Line 1:96 OR not supported in JOIN currently '20' -- This message was sent by Atlassian JIRA (v6.1#6144)
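A toy sketch of why the restriction exists in a MapReduce engine (nothing below is Hive code; it only shows the shuffle-key rule that an OR predicate breaks):
{noformat}
import java.util.Objects;

public class EquiJoinRouting {
    // The core routing rule of a shuffle join: both sides of "a.k = b.k"
    // hash to the same reducer, so matching rows meet there.
    static int reducerFor(Object joinKey, int numReducers) {
        return Math.floorMod(Objects.hashCode(joinKey), numReducers);
    }

    public static void main(String[] args) {
        int reducers = 4;
        System.out.println(reducerFor(10, reducers)); // left-side row, k = 10
        System.out.println(reducerFor(10, reducers)); // right-side row, same reducer
        // With "k = 10 OR k = 20" a row has no single key to hash on, so there
        // is no routing rule; every pair must be compared (cartesian product).
    }
}
{noformat}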
[jira] [Assigned] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)
[ https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo reassigned HIVE-5460: - Assignee: Edward Capriolo invalid offsets in lag lead should return an exception (per ISO-SQL) - Key: HIVE-5460 URL: https://issues.apache.org/jira/browse/HIVE-5460 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Edward Capriolo Priority: Minor ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are provided to the functions. i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 ) over ( order by tint.rnum) from tint tint Instead of a meaningful error (as other vendors will emit) you get Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask SQLState: 08S01 ErrorCode: 2 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)
[ https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787665#comment-13787665 ] Edward Capriolo commented on HIVE-5460: --- Can you provide a link to the definition of what it should do? invalid offsets in lag lead should return an exception (per ISO-SQL) - Key: HIVE-5460 URL: https://issues.apache.org/jira/browse/HIVE-5460 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Edward Capriolo Priority: Minor ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are provided to the functions. i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 ) over ( order by tint.rnum) from tint tint Instead of a meaningful error (as other vendors will emit) you get Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask SQLState: 08S01 ErrorCode: 2 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS
[ https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787737#comment-13787737 ] Edward Capriolo commented on HIVE-5423: --- Ok looks good! yay! Speed up testing of scalar UDFS --- Key: HIVE-5423 URL: https://issues.apache.org/jira/browse/HIVE-5423 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, HIVE-5423.6.patch.txt, HIVE-5423.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)
[ https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5460: -- Status: Patch Available (was: Open) This should now run through the tests. invalid offsets in lag lead should return an exception (per ISO-SQL) - Key: HIVE-5460 URL: https://issues.apache.org/jira/browse/HIVE-5460 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5460.1.patch.txt ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are provided to the functions. i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 ) over ( order by tint.rnum) from tint tint Instead of a meaningful error (as other vendors will emit) you get Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask SQLState: 08S01 ErrorCode: 2 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)
[ https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5460: -- Attachment: HIVE-5460.1.patch.txt invalid offsets in lag lead should return an exception (per ISO-SQL) - Key: HIVE-5460 URL: https://issues.apache.org/jira/browse/HIVE-5460 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Edward Capriolo Priority: Minor Attachments: HIVE-5460.1.patch.txt ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are provided to the functions. i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 ) over ( order by tint.rnum) from tint tint Instead of a meaningful error (as other vendors will emit) you get Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask SQLState: 08S01 ErrorCode: 2 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5400) Allow admins to disable compile and other commands
[ https://issues.apache.org/jira/browse/HIVE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5400: -- Fix Version/s: 0.13.0 Assignee: Brock Noland (was: Edward Capriolo) Committed. Thanks, Brock. Allow admins to disable compile and other commands -- Key: HIVE-5400 URL: https://issues.apache.org/jira/browse/HIVE-5400 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.13.0 Attachments: HIVE-5400.patch, HIVE-5400.patch, HIVE-5400.patch From here: https://issues.apache.org/jira/browse/HIVE-5253?focusedCommentId=13782220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13782220 I think we should afford admins who want to disable this functionality the ability to do so. Since such admins might want to disable other commands such as add or dfs, it wouldn't be much trouble to allow them to do this as well. For example, we could have a configuration option hive.available.commands (or similar) which specified add,set,delete,reset, etc. by default. Then check this value in CommandProcessorFactory. It would probably make sense to add this property to the restrict list. -- This message was sent by Atlassian JIRA (v6.1#6144)
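A minimal sketch of the allowlist idea described in the issue (the property value format follows the discussion above; the real check in Hive's CommandProcessorFactory differs in detail):
{noformat}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CommandAllowlist {
    private final Set<String> available;

    // confValue would come from something like hive.available.commands,
    // e.g. "add,set,delete,reset".
    CommandAllowlist(String confValue) {
        available = new HashSet<>(
            Arrays.asList(confValue.toLowerCase(Locale.ROOT).split(",")));
    }

    void check(String command) {
        if (!available.contains(command.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException(
                "Command '" + command + "' has been disabled by the admin");
        }
    }

    public static void main(String[] args) {
        CommandAllowlist acl = new CommandAllowlist("add,set,delete,reset");
        acl.check("set");      // allowed
        acl.check("compile");  // throws: admin disabled it
    }
}
{noformat}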
[jira] [Updated] (HIVE-5400) Allow admins to disable compile and other commands
[ https://issues.apache.org/jira/browse/HIVE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5400: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks again, Brock. Allow admins to disable compile and other commands -- Key: HIVE-5400 URL: https://issues.apache.org/jira/browse/HIVE-5400 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.13.0 Attachments: HIVE-5400.patch, HIVE-5400.patch, HIVE-5400.patch From here: https://issues.apache.org/jira/browse/HIVE-5253?focusedCommentId=13782220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13782220 I think we should afford admins who want to disable this functionality the ability to do so. Since such admins might want to disable other commands such as add or dfs, it wouldn't be much trouble to allow them to do this as well. For example, we could have a configuration option hive.available.commands (or similar) which specified add,set,delete,reset, etc. by default. Then check this value in CommandProcessorFactory. It would probably make sense to add this property to the restrict list. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS
[ https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787445#comment-13787445 ] Edward Capriolo commented on HIVE-5423: --- This version is ready for review. Removed excess files, renamed as Brock suggested, and moved files as Mark suggested. Speed up testing of scalar UDFS --- Key: HIVE-5423 URL: https://issues.apache.org/jira/browse/HIVE-5423 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, HIVE-5423.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5423) Speed up testing of scalar UDFS
[ https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5423: -- Attachment: HIVE-5423.5.patch.txt Speed up testing of scalar UDFS --- Key: HIVE-5423 URL: https://issues.apache.org/jira/browse/HIVE-5423 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, HIVE-5423.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5334) Milestone 3: Some tests pass under maven
[ https://issues.apache.org/jira/browse/HIVE-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786758#comment-13786758 ] Edward Capriolo commented on HIVE-5334: --- Looks fine Milestone 3: Some tests pass under maven Key: HIVE-5334 URL: https://issues.apache.org/jira/browse/HIVE-5334 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-5334.patch, HIVE-5334.patch This milestone is that some tests pass and therefore we have the basic unit test environment setup. We'll hunt down the rest of the failing tests in future jiras. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5087) Rename npath UDF to matchpath
[ https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785282#comment-13785282 ] Edward Capriolo commented on HIVE-5087: --- I am back to the opinion that we should just remove this UDF. You could make a sequel to 'Office Space' based on the story behind this UDF: 'Yeah... I'm going to need you to come in on Saturday and rename this UDF.' 'Yeah... I'm going to need you to come in on Sunday, because it's Saturday and I don't know the name yet.' 'Yeah... I'm going to need you to come in next Saturday, because we are not sure if we should rename it yet.' It would be a blockbuster for sure. Rename npath UDF to matchpath - Key: HIVE-5087 URL: https://issues.apache.org/jira/browse/HIVE-5087 Project: Hive Issue Type: Bug Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5087.1.patch.txt, HIVE-5087.99.patch.txt, HIVE-5087-matchpath.1.patch.txt, HIVE-5087.patch.txt, HIVE-5087.patch.txt, regex_path.diff -- This message was sent by Atlassian JIRA (v6.1#6144)