[jira] [Commented] (HIVE-8126) Standalone hive-jdbc jar is not packaged in the Hive distribution
[ https://issues.apache.org/jira/browse/HIVE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135017#comment-14135017 ] Ashutosh Chauhan commented on HIVE-8126: +1 Standalone hive-jdbc jar is not packaged in the Hive distribution - Key: HIVE-8126 URL: https://issues.apache.org/jira/browse/HIVE-8126 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 0.14.0 Attachments: HIVE-8126.1.patch With HIVE-538 we started creating the hive-jdbc-*-standalone.jar, but the packaging/distribution does not contain the standalone jdbc jar. I would have expected it to be located under the lib folder of the distribution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25329: HIVE-7932: It may cause NP exception when add accessed columns to ReadEntity
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25329/ --- (Updated Sept. 16, 2014, 6:08 a.m.) Review request for hive, Brock Noland, Prasad Mujumdar, and Szehon Ho. Changes --- Fixed some formatting issues; it seems something went wrong when the patch was applied. Repository: hive-git Description --- When I execute a query with a view join, the view's type is table, but tableToColumnAccessMap does not store the view's name, so it throws a NullPointerException. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 392f7ce ql/src/test/org/apache/hadoop/hive/ql/parse/TestColumnAccess.java PRE-CREATION Diff: https://reviews.apache.org/r/25329/diff/ Testing --- Thanks, Xiaomeng Huang
[jira] [Commented] (HIVE-8107) Bad error message for non-existent table in update and delete
[ https://issues.apache.org/jira/browse/HIVE-8107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135025#comment-14135025 ] Hive QA commented on HIVE-8107: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668838/HIVE-8107.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestParse.testParse_union {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/814/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/814/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-814/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12668838 Bad error message for non-existent table in update and delete - Key: HIVE-8107 URL: https://issues.apache.org/jira/browse/HIVE-8107 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8107.patch update no_such_table set x = 3; produces an error message like: {noformat} 2014-09-12 19:45:00,138 ERROR [main]: ql.Driver (SessionState.java:printError(824)) - FAILED: SemanticException [Error 10290]: Encountered parse error while parsing rewritten update or delete query org.apache.hadoop.hive.ql.parse.SemanticException: Encountered parse error while parsing rewritten update or delete query at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:130) at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeDelete(UpdateDeleteSemanticAnalyzer.java:97) at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeInternal(UpdateDeleteSemanticAnalyzer.java:66) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:217) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:406) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:302) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1051) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1121) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:988) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:978) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:344) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:441) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:457) at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found no_such_table at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1008) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:978) at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:128) ... 24 more {noformat} It should give something much cleaner, or at least push the Table not found message to the top rather than bury it in an exception stack. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
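A minimal illustration of the suggested cleanup, walking the cause chain to surface the innermost message; the helper name and messages here are invented for illustration, not the actual Hive fix:

```java
// Hypothetical sketch (not the Hive patch): report the innermost cause's
// message ("Table not found ...") instead of burying it in a wrapper.
public class ErrorUnwrap {
    static String rootCauseMessage(Throwable t) {
        // walk down the cause chain to the root
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t.getMessage();
    }

    public static void main(String[] args) {
        Exception root = new RuntimeException("Table not found no_such_table");
        Exception wrapped = new RuntimeException(
            "Encountered parse error while parsing rewritten update or delete query", root);
        System.out.println(rootCauseMessage(wrapped)); // Table not found no_such_table
    }
}
```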
[jira] [Assigned] (HIVE-860) Persistent distributed cache
[ https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-860: - Assignee: Ferdinand Xu (was: Brock Noland) Persistent distributed cache Key: HIVE-860 URL: https://issues.apache.org/jira/browse/HIVE-860 Project: Hive Issue Type: Improvement Affects Versions: 0.12.0 Reporter: Zheng Shao Assignee: Ferdinand Xu Fix For: 0.14.0 Attachments: HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch DistributedCache is shared across multiple jobs if the hdfs file name is the same. We need to make sure Hive puts the same file into the same location every time and does not overwrite it if the file content is the same. We can achieve 2 different results: A1. Files added with the same name, timestamp, and md5 in the same session will have a single copy in distributed cache. A2. Files added with the same name, timestamp, and md5 will have a single copy in distributed cache. A2 has a bigger benefit in sharing but may raise a question on when Hive should clean it up in hdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
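A hedged sketch of what option A2's content-addressed placement could look like; the path layout and helper name are invented for illustration and are not Hive's actual implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical sketch of A2: derive a stable cache location from the file's
// md5 so identical content maps to one path and is never overwritten.
public class CachePath {
    static String cachePath(String name, byte[] content) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff)); // unsigned hex byte
        }
        // invented layout: <cache root>/<md5>/<original file name>
        return "/tmp/hive-dist-cache/" + hex + "/" + name;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cachePath("udf.jar", "hello".getBytes(StandardCharsets.UTF_8)));
        // /tmp/hive-dist-cache/5d41402abc4b2a76b9719d911017c592/udf.jar
    }
}
```

Because the path is a pure function of the content, re-adding the same file in any session resolves to the same location, which is the sharing behavior A2 describes.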
[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list
[ https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5744: Assignee: Navis Status: Patch Available (was: Open) Implement support for BETWEEN in SELECT list Key: HIVE-5744 URL: https://issues.apache.org/jira/browse/HIVE-5744 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Navis Attachments: HIVE-4160.1.patch.txt Queries like SELECT col1 BETWEEN 0 and 10 from T; fail in vectorized mode. Support needs to be implemented for a BETWEEN expression in the SELECT list, comparable to how it was added for comparison operators (<, <=, ...). These were done by adding new templates that return a value for a comparison instead of applying a filter. See ColumnCompareScalar.txt under ql/src/gen for an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list
[ https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5744: Attachment: HIVE-4160.1.patch.txt Implement support for BETWEEN in SELECT list Key: HIVE-5744 URL: https://issues.apache.org/jira/browse/HIVE-5744 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Attachments: HIVE-4160.1.patch.txt Queries like SELECT col1 BETWEEN 0 and 10 from T; fail in vectorized mode. Support needs to be implemented for a BETWEEN expression in the SELECT list, comparable to how it was added for comparison operators (<, <=, ...). These were done by adding new templates that return a value for a comparison instead of applying a filter. See ColumnCompareScalar.txt under ql/src/gen for an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list
[ https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5744: Attachment: HIVE-5744.1.patch.txt Implement support for BETWEEN in SELECT list Key: HIVE-5744 URL: https://issues.apache.org/jira/browse/HIVE-5744 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Navis Attachments: HIVE-5744.1.patch.txt Queries like SELECT col1 BETWEEN 0 and 10 from T; fail in vectorized mode. Support needs to be implemented for a BETWEEN expression in the SELECT list, comparable to how it was added for comparison operators (<, <=, ...). These were done by adding new templates that return a value for a comparison instead of applying a filter. See ColumnCompareScalar.txt under ql/src/gen for an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
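For illustration, the difference between the existing filter templates and the value-returning templates the issue asks for can be sketched over a column batch as follows; class and method names are invented, not Hive's generated vectorized classes:

```java
// Illustrative sketch: filter vs. projection semantics for BETWEEN on a
// batch of long values (invented names, not Hive's generated code).
public class BetweenSketch {
    // Filter style (already supported): collect qualifying row indices.
    static int filterLongBetween(long[] col, int n, long lo, long hi, int[] sel) {
        int k = 0;
        for (int i = 0; i < n; i++) {
            if (col[i] >= lo && col[i] <= hi) sel[k++] = i;
        }
        return k; // number of selected rows
    }

    // Projection style (what SELECT-list BETWEEN needs): one value per row.
    static void projectLongBetween(long[] col, int n, long lo, long hi, long[] out) {
        for (int i = 0; i < n; i++) {
            out[i] = (col[i] >= lo && col[i] <= hi) ? 1 : 0;
        }
    }

    public static void main(String[] args) {
        long[] col = {-5, 0, 7, 10, 11};
        long[] out = new long[5];
        projectLongBetween(col, 5, 0, 10, out);
        System.out.println(java.util.Arrays.toString(out)); // [0, 1, 1, 1, 0]
    }
}
```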
[jira] [Commented] (HIVE-8102) Partitions of type 'date' behave incorrectly with daylight saving time.
[ https://issues.apache.org/jira/browse/HIVE-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135083#comment-14135083 ] Eli Acherkan commented on HIVE-8102: Thanks [~jdere]! The patch appears to work well for us. (Haven't tested on other timezones.) Partitions of type 'date' behave incorrectly with daylight saving time. --- Key: HIVE-8102 URL: https://issues.apache.org/jira/browse/HIVE-8102 Project: Hive Issue Type: Bug Components: Database/Schema, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Eli Acherkan Attachments: HIVE-8102.1.patch At 2AM on March 28th 2014, Israel went from standard time (GMT+2) to daylight saving time (GMT+3). The server's timezone is Asia/Jerusalem. When creating a partition whose key is 2014-03-28, Hive creates a partition for 2014-03-27 instead: hive (default)> create table test (a int) partitioned by (`b_prt` date); OK Time taken: 0.092 seconds hive (default)> alter table test add partition (b_prt='2014-03-28'); OK Time taken: 0.187 seconds hive (default)> show partitions test; OK partition b_prt=2014-03-27 Time taken: 0.134 seconds, Fetched: 1 row(s) It seems that the root cause is the behavior of DateWritable.daysToMillis/dateToDays. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
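The off-by-one-day behavior described above can be reproduced with a naive days-since-epoch conversion; this is an illustrative sketch of the failure mode, not DateWritable's actual code:

```java
import java.sql.Date;
import java.util.TimeZone;

// Sketch: truncating local-midnight millis to whole UTC days loses a day
// when the server timezone is east of UTC, because local midnight falls
// on the previous UTC day.
public class DstDemo {
    static final long MILLIS_PER_DAY = 86_400_000L;

    public static void main(String[] args) {
        // Assumption: server timezone is Asia/Jerusalem, as in the report.
        TimeZone.setDefault(TimeZone.getTimeZone("Asia/Jerusalem"));

        // java.sql.Date.valueOf interprets the string in the local zone:
        // 2014-03-28 00:00 at UTC+2 is 2014-03-27 22:00 UTC.
        long localMidnight = Date.valueOf("2014-03-28").getTime();

        // Integer division truncates to the previous UTC day.
        long days = localMidnight / MILLIS_PER_DAY;
        System.out.println(new Date(days * MILLIS_PER_DAY)); // 2014-03-27
    }
}
```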
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135101#comment-14135101 ] Hive QA commented on HIVE-8038: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668843/HIVE-8038.2.patch {color:green}SUCCESS:{color} +1 6276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/815/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/815/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-815/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668843 Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.patch What is the Current Logic == 1.get the file blocks from FileSystem.getFileBlockLocations() which returns an array of BlockLocation 2.In SplitGenerator.createSplit(), check if split only spans one block or multiple blocks. 3.If split spans just one block, then using the array index (index = offset/blockSize), get the corresponding host having the blockLocation 4.If the split spans multiple blocks, then get all hosts that have at least 80% of the max of total data in split hosted by any host. 
5.add the split to a list of splits Issue with Current Logic = Dependency on FileSystem API's logic for block location calculations. It returns an array, and we need to rely on FileSystem to make all blocks the same size if we want to directly access a block from the array. What is the Fix = 1a.get the file blocks from FileSystem.getFileBlockLocations() which returns an array of BlockLocation 1b.convert the array into a TreeMap<offset, BlockLocation> and return it through getLocationsWithOffSet() 2.In SplitGenerator.createSplit(), check if split only spans one block or multiple blocks. 3.If split spans just one block, then using TreeMap.floorEntry(key), get the highest entry smaller than offset for the split and get the corresponding host. 4a.If the split spans multiple blocks, get a submap, which contains all entries containing blockLocations from the offset to offset + length 4b.get all hosts that have at least 80% of the max of total data in split hosted by any host. 5.add the split to a list of splits What are the major changes in logic == 1. store BlockLocations in a Map instead of an array 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations() 3. one block case is checked by if(offset + length <= start.getOffset() + start.getLength()) instead of if((offset % blockSize) + length <= blockSize) What is the effect on Complexity (Big O) = 1. We add an O(n) loop to build a TreeMap from an array, but it's a one-time cost and would not be called for each split 2. In the one-block case, we can get the block in O(logn) worst case, which was O(1) before 3. Getting the submap is O(logn) 4. In the multiple-block case, building the list of hosts is O(m), which was O(n), m < n, as previously we were iterating over all the block locations but now we are iterating only over blocks that belong to the range of offsets that we need. What are the benefits of the change == 1.
With this fix, we do not depend on the blockLocations returned by FileSystem to figure out the block corresponding to the offset and blockSize 2. Also, it is not necessary that block lengths are the same for all blocks across all FileSystems 3. Previously we were using blockSize for the one-block case and block.length for the multiple-block case, which is not the case now. We figure out the block depending upon the actual length and offset of the block -- This message was sent by Atlassian JIRA (v6.3.4#6332)
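The TreeMap-based lookup described in the fix can be sketched as follows; the offsets and lengths are made-up stand-ins for the BlockLocation data returned by FileSystem.getFileBlockLocations():

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified sketch of the proposed split/block lookup: block offset -> length
// (the real code maps offsets to BlockLocation objects).
public class SplitLookup {
    public static void main(String[] args) {
        TreeMap<Long, Long> blocks = new TreeMap<>();
        blocks.put(0L, 128L);
        blocks.put(128L, 128L);
        blocks.put(256L, 64L); // blocks need not all be the same size

        long offset = 200L, length = 30L;

        // One-block check: highest block starting at or below the split offset.
        Map.Entry<Long, Long> start = blocks.floorEntry(offset);
        boolean oneBlock = offset + length <= start.getKey() + start.getValue();
        System.out.println(oneBlock); // true: [200, 230) fits in block [128, 256)

        // Multi-block case: submap of all blocks overlapping [offset, offset + length).
        SortedMap<Long, Long> overlapping =
            blocks.subMap(blocks.floorKey(offset), true, offset + length, false);
        System.out.println(overlapping.keySet()); // [128]
    }
}
```

Both floorEntry and the submap view are O(log n), which matches the complexity discussion above.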
[jira] [Assigned] (HIVE-6705) hive jdbc can not used by jmeter, because of unsupported auto commit feature
[ https://issues.apache.org/jira/browse/HIVE-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-6705: --- Assignee: Navis hive jdbc can not used by jmeter, because of unsupported auto commit feature Key: HIVE-6705 URL: https://issues.apache.org/jira/browse/HIVE-6705 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Environment: CentOS_X86_64 JMeter 2.11 Reporter: Ben Assignee: Navis Attachments: HIVE-6705.1.patch.txt In Apache JMeter, the autocommit property is required, but in the Hive JDBC driver auto commit is an unsupported method. In /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java: {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub throw new {color:red} SQLException("Method not supported"); {color} } {quote} So, should we make a mock to support the auto commit property == false? {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub {color:red}if (autoCommit) {color} throw new SQLException("Method not supported"); else return; } {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 25688: hive jdbc can not used by jmeter, because of unsupported auto commit feature
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25688/ --- Review request for hive. Bugs: HIVE-6705 https://issues.apache.org/jira/browse/HIVE-6705 Repository: hive-git Description --- In Apache JMeter, the autocommit property is required, but in the Hive JDBC driver auto commit is an unsupported method. In /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java: {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub throw new {color:red} SQLException("Method not supported"); {color} } {quote} So, should we make a mock to support the auto commit property == false? {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub {color:red}if (autoCommit) {color} throw new SQLException("Method not supported"); else return; } {quote} Diffs - jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 59ce692 jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 Diff: https://reviews.apache.org/r/25688/diff/ Testing --- Thanks, Navis Ryu
[jira] [Updated] (HIVE-6705) hive jdbc can not used by jmeter, because of unsupported auto commit feature
[ https://issues.apache.org/jira/browse/HIVE-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6705: Attachment: HIVE-6705.2.patch.txt hive jdbc can not used by jmeter, because of unsupported auto commit feature Key: HIVE-6705 URL: https://issues.apache.org/jira/browse/HIVE-6705 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Environment: CentOS_X86_64 JMeter 2.11 Reporter: Ben Assignee: Navis Attachments: HIVE-6705.1.patch.txt, HIVE-6705.2.patch.txt In Apache JMeter, the autocommit property is required, but in the Hive JDBC driver auto commit is an unsupported method. In /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java: {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub throw new {color:red} SQLException("Method not supported"); {color} } {quote} So, should we make a mock to support the auto commit property == false? {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub {color:red}if (autoCommit) {color} throw new SQLException("Method not supported"); else return; } {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7996) Potential resource leak in HiveBurnInClient
[ https://issues.apache.org/jira/browse/HIVE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] skrho reassigned HIVE-7996: --- Assignee: skrho Potential resource leak in HiveBurnInClient --- Key: HIVE-7996 URL: https://issues.apache.org/jira/browse/HIVE-7996 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor In createTables() and runQueries(), Statement stmt is not closed upon return. In main(), Connection con is not closed upon exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7996) Potential resource leak in HiveBurnInClient
[ https://issues.apache.org/jira/browse/HIVE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135139#comment-14135139 ] skrho commented on HIVE-7996: - Hello Ted Yu~~ What is the name of the class to be fixed? Or where should I look in order to fix it? ^^ Potential resource leak in HiveBurnInClient --- Key: HIVE-7996 URL: https://issues.apache.org/jira/browse/HIVE-7996 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor In createTables() and runQueries(), Statement stmt is not closed upon return. In main(), Connection con is not closed upon exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135151#comment-14135151 ] Chengxiang Li commented on HIVE-8118: - Actually, we could generate a Spark graph with one map RDD followed by multiple reduce RDDs; it should not be related to SparkMapRecordHandler and SparkReduceRecorderHandler, since we could wrap each reduce-side child operator with a separate HiveReduceFunction at the SparkCompiler level. For a map RDD which is followed by two reduce RDDs and then connected to a union RDD, Spark would compute the map RDD twice unless the map RDD is cached. If the two reduces share the same shuffle dependency (which means they have the same map output partitions), the job could theoretically be optimized to compute the map RDD only once, but I think this should be a Spark framework level optimization. When two reduce RDDs don't share the same shuffle dependency, the map RDD would be computed twice anyway. For the multi-insert case, if we wrap all FileSinkOperators into one RDD, the parent of the FileSinkOperators would forward rows to each FileSinkOperator, so the data source for the insert would be generated only once. So I think we do not really need multiple result collectors for SparkMapRecorderHandler and SparkReduceRecordHandler. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch] Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to only one child. It's very common in multi-insert queries for a map/reduce task to have more than one child.
A query like the following has two map tasks as parents: {code} select name, sum(value) from dec group by name union all select name, value from dec order by name {code} It's possible in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should take this as a general case. Tez currently provides a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Likely this is a big change and subtasks are possible. With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8104) Insert statements against ACID tables NPE when vectorization is on
[ https://issues.apache.org/jira/browse/HIVE-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135169#comment-14135169 ] Hive QA commented on HIVE-8104: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668847/HIVE-8104.patch {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits org.apache.hive.hcatalog.streaming.TestStreaming.testMultipleTransactionBatchCommits org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbortAndCommit org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyAbort org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/816/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/816/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-816/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668847 Insert statements against ACID tables NPE when vectorization is on -- Key: HIVE-8104 URL: https://issues.apache.org/jira/browse/HIVE-8104 Project: Hive Issue Type: Bug Components: Query Processor, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8104.patch Doing an insert against a table that is using ACID format with the transaction manager set to DbTxnManager and vectorization turned on results in an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by
[ https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135176#comment-14135176 ] Zhichun Wu commented on HIVE-6883: -- @ [~prasanth_j] , this fix causes some problems when combining dynamic partitioning with group by. Consider the following case: {code} CREATE TABLE `t1`( `a` int,`b` string) PARTITIONED BY (`dt` string); create table src1 ( `key` string, `val` string ); explain insert overwrite table t1 partition(dt) select 1, 'hello', '20140901' from src1 group by key; {code} The key expressions of RS in Stage-2 are wrong. The part of the patch which uses the parent RS's keyCols needs more changes. {code} if (parentRSOpOrder != null && !parentRSOpOrder.isEmpty() && sortPositions.isEmpty()) { newKeyCols.addAll(parentRSOp.getConf().getKeyCols()); orderStr += parentRSOpOrder; } {code} Dynamic partitioning optimization does not honor sort order or order by --- Key: HIVE-6883 URL: https://issues.apache.org/jira/browse/HIVE-6883 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Critical Fix For: 0.14.0, 0.13.1 Attachments: HIVE-6883-branch-0.13.3.patch, HIVE-6883.1.patch, HIVE-6883.2.patch, HIVE-6883.3.patch HIVE-6455 patch does not honor the sort order of the output table or the order by of the select statement. The reason for the former is that numDistributionKey in ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns; because of this, RSOp sets the sort columns to null in Key. Since nulls are set in place of sort columns in Key, the sort columns in Value are not sorted. The other issue is that ORDER BY columns are not honored during insertion. For example {code} insert overwrite table over1k_part_orc partition(ds='foo', t) select si,i,b,f,t from over1k_orc where t is null or t=27 order by si; {code} the select query performs order by on column 'si' in the first MR job.
The following MR job (inserted by HIVE-6455) sorts the input data on dynamic partition column 't' without taking into account the already sorted 'si' column. This results in out-of-order insertion for the 'si' column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6090) Audit logs for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-6090: --- Attachment: HIVE-6090.1.WIP.patch Uploading a WIP patch that should apply cleanly. Will test against a live cluster (kerberos) and submit for precommit tests. Audit logs for HiveServer2 -- Key: HIVE-6090 URL: https://issues.apache.org/jira/browse/HIVE-6090 Project: Hive Issue Type: Improvement Components: Diagnosability, HiveServer2 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Attachments: HIVE-6090.1.WIP.patch, HIVE-6090.patch HiveMetastore has audit logs, and we would like to audit all queries or requests to HiveServer2 as well. This will help in understanding how the APIs were used, the queries submitted, users, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()
[ https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] skrho updated HIVE-7305: Attachment: HIVE-7305_001.patch I added null-check and size-check logic. Please review my patch~~ Return value from in.read() is ignored in SerializationUtils#readLongLE() - Key: HIVE-7305 URL: https://issues.apache.org/jira/browse/HIVE-7305 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7305_001.patch {code} long readLongLE(InputStream in) throws IOException { in.read(readBuffer, 0, 8); return (((readBuffer[0] & 0xff) << 0) + ((readBuffer[1] & 0xff) << 8) {code} Return value from read() may indicate fewer than 8 bytes read. The return value should be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()
[ https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] skrho updated HIVE-7305: Assignee: skrho Status: Patch Available (was: Open) Return value from in.read() is ignored in SerializationUtils#readLongLE() - Key: HIVE-7305 URL: https://issues.apache.org/jira/browse/HIVE-7305 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-7305_001.patch {code} long readLongLE(InputStream in) throws IOException { in.read(readBuffer, 0, 8); return (((readBuffer[0] & 0xff) << 0) + ((readBuffer[1] & 0xff) << 8) {code} Return value from read() may indicate fewer than 8 bytes read. The return value should be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
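The concern behind the issue, that InputStream.read() may return fewer than the requested bytes, can be illustrated with a read-fully loop; this is a sketch of the general pattern (java.io.DataInputStream.readFully offers the same guarantee), not the attached patch itself:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch: loop until the buffer is full, and fail loudly on a short stream,
// instead of ignoring read()'s return value.
public class ReadLE {
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) throw new EOFException("stream ended with " + len + " bytes missing");
            off += n;
            len -= n;
        }
    }

    static long readLongLE(InputStream in) throws IOException {
        byte[] b = new byte[8];
        readFully(in, b, 0, 8);
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[i] & 0xffL); // assemble little-endian
        }
        return v;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 0, 0, 0, 0, 0, 0}; // little-endian 0x0201 = 513
        System.out.println(readLongLE(new ByteArrayInputStream(data))); // 513
    }
}
```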
[jira] [Commented] (HIVE-6148) Support arbitrary structs stored in HBase
[ https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135336#comment-14135336 ] Hive QA commented on HIVE-6148: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668872/HIVE-6148.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-818/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668872 Support arbitrary structs stored in HBase - Key: HIVE-6148 URL: https://issues.apache.org/jira/browse/HIVE-6148 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Attachments: HIVE-6148.1.patch.txt We should add support to be able to query arbitrary structs stored in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7935) Support dynamic service discovery for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135420#comment-14135420 ] Hive QA commented on HIVE-7935: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668869/HIVE-7935.8.patch {color:green}SUCCESS:{color} +1 6276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/819/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/819/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-819/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668869 Support dynamic service discovery for HiveServer2 - Key: HIVE-7935 URL: https://issues.apache.org/jira/browse/HIVE-7935 Project: Hive Issue Type: New Feature Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7935.1.patch, HIVE-7935.2.patch, HIVE-7935.3.patch, HIVE-7935.4.patch, HIVE-7935.5.patch, HIVE-7935.6.patch, HIVE-7935.7.patch, HIVE-7935.8.patch To support Rolling Upgrade / HA, we need a mechanism by which a JDBC client can dynamically resolve an HiveServer2 to connect to. *High Level Design:* Whether, dynamic service discovery is supported or not, can be configured by setting HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY. ZooKeeper is used to support this. * When an instance of HiveServer2 comes up, it adds itself as a znode to ZooKeeper under a configurable namespace (HIVE_SERVER2_ZOOKEEPER_NAMESPACE). 
* A JDBC/ODBC client now specifies the ZooKeeper ensemble in its connection string, instead of pointing to a specific HiveServer2 instance. The JDBC driver, uses the ZooKeeper ensemble to pick an instance of HiveServer2 to connect for the entire session. * When an instance is removed from ZooKeeper, the existing client sessions continue till completion. When the last client session completes, the instance shuts down. * All new client connection pick one of the available HiveServer2 uris from ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
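As a rough illustration of the client-side selection step described above (the class, method names, and selection policy are assumptions for the sketch, not the actual JDBC driver code): once the driver has fetched the HiveServer2 URIs registered under the ZooKeeper namespace, it picks one and reuses it for the entire session.

```java
import java.util.Arrays;
import java.util.List;

public class ServerUriPicker {
    // Hypothetical helper: 'registeredUris' stands in for the znode children the
    // driver would read from the ZooKeeper ensemble. The chosen URI is kept for
    // the whole client session.
    static String pickInstance(List<String> registeredUris, long sessionSeed) {
        if (registeredUris.isEmpty()) {
            throw new IllegalStateException("no HiveServer2 instance registered");
        }
        // Deterministically spread sessions across the available instances.
        int idx = Math.floorMod((int) sessionSeed, registeredUris.size());
        return registeredUris.get(idx);
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList("hs2-a:10000", "hs2-b:10000");
        System.out.println(pickInstance(uris, 3)); // prints hs2-b:10000
    }
}
```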
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135415#comment-14135415 ] Xuefu Zhang commented on HIVE-8054: --- Thank you for the catch, [~leftylev]. Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch] -- Key: HIVE-8054 URL: https://issues.apache.org/jira/browse/HIVE-8054 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1, TODOC-SPARK Fix For: spark-branch Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, HIVE-8054.3-spark.patch Option hive.optimize.union.remove, introduced in HIVE-3276, removes union operators from the operator graph in certain cases as an optimization to reduce the number of MR jobs. While it makes sense in MR, this optimization is actually harmful to an execution engine such as Spark, which natively supports union without requiring additional jobs. This is because removing the union operator creates disjointed operator graphs, each generating a job, so this optimization requires more jobs to run the query, not to mention the additional complexity of handling linked FS descriptors. I propose that we disable this optimization when the execution engine is Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135517#comment-14135517 ] Xuefu Zhang commented on HIVE-8118: --- Hi [~chengxiang li], Thank you for your input. I'm not sure if I understand your thought correctly. Let me clarify the problem by giving a SparkWork like this: {code} MapWork1 - ReduceWork1 \- ReduceWork2 {code} It means that MapWork1 will generate different datasets to feed to ReduceWork1 and ReduceWork2. In the case of multi-insert, ReduceWork1 and ReduceWork2 will each have an FS operator. Inside MapWork1, there will be two operator branches consuming the same data and pushing different data sets to two RS operators. (ReduceWork1 and ReduceWork2 have different HiveReduceFunctions.) However, the current implementation only takes the first data set and feeds it to both reduce works. The same problem can also happen if MapWork1 were a reduce work following another ReduceWork or MapWork. With this problem, I'm not sure how we can get around without letting MapWork1 generate two output RDDs, one for each following reduce work. Potentially, we can duplicate MapWork1 and have the following diagram: {code} MapWork11 - ReduceWork1 MapWork12 - ReduceWork2 {code} where MapWork11 and MapWork12 consume the same input table (the input table as an RDD) and feed the first output RDD to ReduceWork1 and the second to ReduceWork2. This has its complexity, but more importantly, there will be wasted READ (unless Spark is smart enough to cache the input table, which is unlikely) and COMPUTATION (computing data twice). I feel that it's unlikely we'll get such optimizations from the Spark framework in the near term. Thus, I think we have to take into consideration that a map work or a reduce work might generate multiple RDDs, one feeding each of its children. 
Since SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data processing on the map and reduce side, they need a way to generate multiple outputs. Please correct me if I understood you wrong. Thanks. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch] Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to a single child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents: {code} select name, sum(value) from dec group by name union all select name, value from dec order by name {code} It's possible that in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should take this as a general case. Tez currently provides a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Likely this is a big change and subtasks are possible. With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
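To make the shape of the proposal concrete, here is a minimal, hypothetical sketch of a record handler initialized with one collector per child work. The class and method names are invented for illustration; in this simplified sketch every row is offered to every collector, whereas the real operator branches would route different data sets to different children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MultiCollectorHandler<T> {
    private final List<Consumer<T>> collectors = new ArrayList<>();

    // One collector would be registered per child work (e.g. per ReduceWork).
    void addCollector(Consumer<T> collector) {
        collectors.add(collector);
    }

    // Forward each processed row to every registered collector.
    void processRow(T row) {
        for (Consumer<T> c : collectors) {
            c.accept(row);
        }
    }

    public static void main(String[] args) {
        MultiCollectorHandler<String> handler = new MultiCollectorHandler<>();
        List<String> toReduceWork1 = new ArrayList<>();
        List<String> toReduceWork2 = new ArrayList<>();
        handler.addCollector(toReduceWork1::add);
        handler.addCollector(toReduceWork2::add);
        handler.processRow("row1");
        if (toReduceWork1.size() != 1 || toReduceWork2.size() != 1) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

The point of the shape is that neither child starves: with a single collector, only the first child would ever see output.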
[jira] [Updated] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7870: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Fixed via HIVE-8017. Insert overwrite table query does not generate correct task plan [Spark Branch] --- Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch, HIVE-7870.4-spark.patch, HIVE-7870.5-spark.patch Insert overwrite table query does not generate correct task plan when hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. {noformat} set hive.optimize.union.remove=true set hive.merge.sparkfiles=true insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} query result {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} expected result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Move work is not working properly and some data are missing during move. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135532#comment-14135532 ] Xuefu Zhang edited comment on HIVE-7870 at 9/16/14 2:36 PM: Fixed via HIVE-8054. was (Author: xuefuz): Fixed via HIVE-8017. Insert overwrite table query does not generate correct task plan [Spark Branch] --- Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch, HIVE-7870.4-spark.patch, HIVE-7870.5-spark.patch Insert overwrite table query does not generate correct task plan when hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. {noformat} set hive.optimize.union.remove=true set hive.merge.sparkfiles=true insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} query result {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} expected result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Move work is not working properly and some data are missing during move. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8061) improve the partition col stats update speed
[ https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135567#comment-14135567 ] Hive QA commented on HIVE-8061: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668871/HIVE-8061.4.patch {color:green}SUCCESS:{color} +1 6276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/820/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/820/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-820/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668871 improve the partition col stats update speed Key: HIVE-8061 URL: https://issues.apache.org/jira/browse/HIVE-8061 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Minor Attachments: HIVE-8061.1.patch, HIVE-8061.2.patch, HIVE-8061.3.patch, HIVE-8061.4.patch We previously worked towards faster stats updates for the columns of a table partition in HIVE-7736 and HIVE-7876. Although there is some improvement, it is only correct in the first run; duplicate column stats appear later (thanks to Eugene Koifman's comments). We fixed this in HIVE-7944 by reverting the patch. This JIRA ticket is another attempt to improve the speed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8128) Improve Parquet Vectorization
Brock Noland created HIVE-8128: -- Summary: Improve Parquet Vectorization Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland We'll want to finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8121) Create micro-benchmarks for ParquetSerde and evaluate performance
[ https://issues.apache.org/jira/browse/HIVE-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8121: --- Description: These benchmarks should not execute queries but test only the ParquetSerde code to ensure we are as efficient as possible. The output of this JIRA is: 1) Benchmark tool exists 2) We create new tasks under HIVE-8120 to track the improvements required was: These benchmarks should not execute queries but test only the ParquetSerde code to ensure we are as efficient as possible. Likely the first thing we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. The output of this JIRA is: 1) Benchmark tool exists 2) We create new tasks under HIVE-8120 to track the improvements required Create micro-benchmarks for ParquetSerde and evaluate performance - Key: HIVE-8121 URL: https://issues.apache.org/jira/browse/HIVE-8121 Project: Hive Issue Type: Sub-task Reporter: Brock Noland These benchmarks should not execute queries but test only the ParquetSerde code to ensure we are as efficient as possible. The output of this JIRA is: 1) Benchmark tool exists 2) We create new tasks under HIVE-8120 to track the improvements required -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8130) Support Date in Avro
Brock Noland created HIVE-8130: -- Summary: Support Date in Avro Key: HIVE-8130 URL: https://issues.apache.org/jira/browse/HIVE-8130 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8131) Support timestamp in Avro
Brock Noland created HIVE-8131: -- Summary: Support timestamp in Avro Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8132) Support avro ACID (bulk update)
Brock Noland created HIVE-8132: -- Summary: Support avro ACID (bulk update) Key: HIVE-8132 URL: https://issues.apache.org/jira/browse/HIVE-8132 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8120) Umbrella JIRA tracking Parquet improvements
[ https://issues.apache.org/jira/browse/HIVE-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135614#comment-14135614 ] Brock Noland commented on HIVE-8120: The view from my side is: * Perf (Benchmarks, vectorization) (P1) * Data types (P2) * Refactoring/cleanup (P2) * ACID (bulk update) (P3) Umbrella JIRA tracking Parquet improvements --- Key: HIVE-8120 URL: https://issues.apache.org/jira/browse/HIVE-8120 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8129) Umbrella JIRA to track Avro improvements
[ https://issues.apache.org/jira/browse/HIVE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135617#comment-14135617 ] Brock Noland commented on HIVE-8129: * Data types (P1) * ACID (bulk update) (P2) Umbrella JIRA to track Avro improvements Key: HIVE-8129 URL: https://issues.apache.org/jira/browse/HIVE-8129 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8120) Umbrella JIRA tracking Parquet improvements
[ https://issues.apache.org/jira/browse/HIVE-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8120: --- Summary: Umbrella JIRA tracking Parquet improvements (was: Umbrella JIRA tracking Parquet work) Umbrella JIRA tracking Parquet improvements --- Key: HIVE-8120 URL: https://issues.apache.org/jira/browse/HIVE-8120 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8129) Umbrella JIRA to track Avro improvements
Brock Noland created HIVE-8129: -- Summary: Umbrella JIRA to track Avro improvements Key: HIVE-8129 URL: https://issues.apache.org/jira/browse/HIVE-8129 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8133) Support Postgres via DirectSQL
Brock Noland created HIVE-8133: -- Summary: Support Postgres via DirectSQL Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8134) concurrency improvements
Brock Noland created HIVE-8134: -- Summary: concurrency improvements Key: HIVE-8134 URL: https://issues.apache.org/jira/browse/HIVE-8134 Project: Hive Issue Type: Improvement Reporter: Brock Noland The goal of this JIRA is to track supportability issues with concurrent users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8134) Umbrella JIRA to track concurrency improvements
[ https://issues.apache.org/jira/browse/HIVE-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8134: --- Summary: Umbrella JIRA to track concurrency improvements (was: concurrency improvements) Umbrella JIRA to track concurrency improvements --- Key: HIVE-8134 URL: https://issues.apache.org/jira/browse/HIVE-8134 Project: Hive Issue Type: Improvement Reporter: Brock Noland The goal of this JIRA is to track supportability issues with concurrent users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135639#comment-14135639 ] Damien Carol commented on HIVE-8133: [~brocknoland] The first step should be to enable Postgres as the Metastore back end BEFORE trying to do direct SQL. Currently the Metastore can't work on Postgres; see HIVE-7689. I'm trying to fix normal use of the Metastore with Postgres in HIVE-7689. I can take this ticket afterwards if possible. Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135517#comment-14135517 ] Xuefu Zhang edited comment on HIVE-8118 at 9/16/14 4:02 PM: Hi [~chengxiang li], Thank you for your input. I'm not sure if I understand your thought correctly. Let me clarify the problem by giving a SparkWork like this: {code} MapWork1 - ReduceWork1 \- ReduceWork2 {code} It means that MapWork1 will generate different datasets to feed to ReduceWork1 and ReduceWork2. In the case of multi-insert, ReduceWork1 and ReduceWork2 will each have an FS operator. Inside MapWork1, there will be two operator branches consuming the same data and pushing different data sets to two RS operators. (ReduceWork1 and ReduceWork2 have different HiveReduceFunctions.) However, the current implementation only takes the first data set and feeds it to both reduce works. The same problem can also happen if MapWork1 were a reduce work following another ReduceWork or MapWork. With this problem, I'm not sure how we can get around without letting MapWork1 generate two output RDDs, one for each following reduce work. Potentially, we can duplicate MapWork1 and have the following diagram: {code} MapWork11 - ReduceWork1 MapWork12 - ReduceWork2 {code} where MapWork11 and MapWork12 consume the same input table (the input table as an RDD) and feed the first output RDD to ReduceWork1 and the second to ReduceWork2. This has its complexity, but more importantly, there will be wasted READ (unless Spark is smart enough to cache the input table, which is unlikely) and COMPUTATION (computing data twice). I feel that it's unlikely we'll get such optimizations from the Spark framework in the near term. Thus, I think we have to take into consideration that a map work or a reduce work might generate multiple RDDs, one feeding each of its children. 
Since SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data processing on the map and reduce side, they need a way to generate multiple outputs. Please correct me if I understood you wrong. Thanks. was (Author: xuefuz): Hi [~chengxiang li], Thank you for your input. I'm not sure if I understand your thought correctly. Let me clarify the problem by giving a SparkWork like this: {code} MapWork1 - ReduceWork1 \- ReduceWork2 {code} It means that MapWork1 will generate different datasets to feed to ReduceWork1 and ReduceWork2. In the case of multi-insert, ReduceWork1 and ReduceWork2 will each have an FS operator. Inside MapWork1, there will be two operator branches consuming the same data and pushing different data sets to two RS operators. (ReduceWork1 and ReduceWork2 have different HiveReduceFunctions.) However, the current implementation only takes the first data set and feeds it to both reduce works. The same problem can also happen if MapWork1 were a reduce work following another ReduceWork or MapWork. With this problem, I'm not sure how we can get around without letting MapWork1 generate two output RDDs, one for each following reduce work. Potentially, we can duplicate MapWork1 and have the following diagram: {code} MapWork11 - ReduceWork1 MapWork12 - ReduceWork2 {code} where MapWork11 and MapWork12 consume the same input table (the input table as an RDD) and feed the first output RDD to ReduceWork1 and the second to ReduceWork2. This has its complexity, but more importantly, there will be wasted READ (unless Spark is smart enough to cache the input table, which is unlikely) and COMPUTATION (computing data twice). I feel that it's unlikely we'll get such optimizations from the Spark framework in the near term. Thus, I think we have to take into consideration that a map work or a reduce work might generate multiple RDDs, one feeding each of its children. 
Since SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data processing on the map and reduce side, they need a way to generate multiple outputs. Please correct me if I understood you wrong. Thanks. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch] Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to a single child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents:
[jira] [Assigned] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol reassigned HIVE-8133: -- Assignee: Damien Carol Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Damien Carol -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8136) Finer grained locking
Brock Noland created HIVE-8136: -- Summary: Finer grained locking Key: HIVE-8136 URL: https://issues.apache.org/jira/browse/HIVE-8136 Project: Hive Issue Type: Sub-task Reporter: Brock Noland When using ZK for concurrency control, some statements, such as setting a table's location, require an exclusive table lock to be atomic. This JIRA is to analyze the scope of statements like ALTER TABLE and see if we can reduce the locking required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8094) add LIKE keyword support for SHOW FUNCTIONS
[ https://issues.apache.org/jira/browse/HIVE-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135674#comment-14135674 ] Hive QA commented on HIVE-8094: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668887/HIVE-8094.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6276 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/821/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/821/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-821/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668887 add LIKE keyword support for SHOW FUNCTIONS --- Key: HIVE-8094 URL: https://issues.apache.org/jira/browse/HIVE-8094 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0, 0.13.1 Reporter: peter liu Assignee: peter liu Fix For: 0.14.0 Attachments: HIVE-8094.1.patch It would be nice to add LIKE keyword support for SHOW FUNCTIONS as below, keeping the patterns consistent with SHOW DATABASES and SHOW TABLES. bq. SHOW FUNCTIONS LIKE 'foo*'; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
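A rough sketch of the matching the proposal implies, assuming SHOW FUNCTIONS would treat '*' as a wildcard the way SHOW TABLES patterns do; the helper and class names are illustrative, not Hive's parser code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FunctionGlob {
    // Translate a SHOW TABLES-style pattern ('*' wildcard) into a regex
    // and keep only the matching function names.
    static List<String> showFunctionsLike(List<String> names, String pattern) {
        String regex = pattern.replace(".", "\\.").replace("*", ".*");
        return names.stream()
                    .filter(n -> n.matches(regex))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fns = Arrays.asList("foo", "foobar", "bar");
        System.out.println(showFunctionsLike(fns, "foo*")); // prints [foo, foobar]
    }
}
```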
[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation
[ https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135707#comment-14135707 ] Sergey Shelukhin commented on HIVE-8080: [~ashutoshc] [~jpullokkaran] ping? CBO: function name may not match UDF name during translation Key: HIVE-8080 URL: https://issues.apache.org/jira/browse/HIVE-8080 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch create_func1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Status: Patch Available (was: Reopened) Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Attachment: HIVE-7812.patch I fixed the problem that was causing trouble for the new Tez tests. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8055: --- Issue Type: Sub-task (was: Task) Parent: HIVE-7292 Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation
[ https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135740#comment-14135740 ] Laljo John Pullokkaran commented on HIVE-8080: -- Could you add a RB entry? Will be easier to read the patch. Thanks CBO: function name may not match UDF name during translation Key: HIVE-8080 URL: https://issues.apache.org/jira/browse/HIVE-8080 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch create_func1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation
[ https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135758#comment-14135758 ] Sergey Shelukhin commented on HIVE-8080: https://reviews.apache.org/r/25700/ CBO: function name may not match UDF name during translation Key: HIVE-8080 URL: https://issues.apache.org/jira/browse/HIVE-8080 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch create_func1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8138) Global Init file should allow specifying file name not only directory
Brock Noland created HIVE-8138: -- Summary: Global Init file should allow specifying file name not only directory Key: HIVE-8138 URL: https://issues.apache.org/jira/browse/HIVE-8138 Project: Hive Issue Type: Bug Reporter: Brock Noland HIVE-5160 allows you to specify a directory where a .hiverc file exists. However since .hiverc is a hidden file this can be confusing. The property should allow a path to a file or a directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8138) Global Init file should allow specifying file name not only directory
[ https://issues.apache.org/jira/browse/HIVE-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-8138: -- Assignee: Brock Noland Global Init file should allow specifying file name not only directory -- Key: HIVE-8138 URL: https://issues.apache.org/jira/browse/HIVE-8138 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland HIVE-5160 allows you to specify a directory where a .hiverc file exists. However since .hiverc is a hidden file this can be confusing. The property should allow a path to a file or a directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java https://reviews.apache.org/r/25700/#comment93239 Is the Hive token case-insensitive, or are all function names in lower case? ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java https://reviews.apache.org/r/25700/#comment93240 Are all functions qualified in Hive (w.r.t. the DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? - John Pullokkaran On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135768#comment-14135768 ] Gopal V commented on HIVE-8137: --- The right approach is to skip generating splits for such files. There is no reason to schedule this split or run a task at all. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Hive 13 does not handle reading a zero-size ORC file properly. An ORC file is supposed to have a PostScript, which the ReaderImpl class tries to read in order to initialize the footer. But if the file is empty or of zero size, it runs into an IndexOutOfBoundsException because ReaderImpl tries to read it in its constructor. Code snippet:
{code}
// get length of PostScript
int psLen = buffer.get(readSize - 1) & 0xff;
{code}
In the above code, readSize for an empty file is zero. I see that the ensureOrcFooter() method performs some sanity checks on the footer, so either we can move the above code snippet into ensureOrcFooter() and throw a malformed-ORC-file exception, or we can create a dummy Reader that does not initialize the footer and whose hasNext() returns false on the first call. Basically, I would like to know the correct way to handle an empty ORC file in a mapred job: should we ignore it, or should we throw an exception saying the ORC file is malformed? Please let me know your thoughts on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
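The zero-size check discussed in the description can be sketched as follows. This is a hypothetical helper, not the actual ReaderImpl code; it only illustrates guarding the last-byte read before it can throw:

```java
import java.nio.ByteBuffer;

public class OrcPostScriptGuard {
    // Hypothetical sketch: return the PostScript length stored in the last
    // byte of the tail buffer, or -1 for an empty/zero-size file instead of
    // letting buffer.get(-1) throw IndexOutOfBoundsException.
    static int postScriptLength(ByteBuffer buffer, int readSize) {
        if (readSize <= 0) {
            return -1; // zero-size file: no PostScript to read
        }
        return buffer.get(readSize - 1) & 0xff;
    }
}
```

A caller could then treat -1 as "empty file, emit no rows" or turn it into a malformed-file exception, matching the two options debated in the ticket.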
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135778#comment-14135778 ] Pankit Thapar commented on HIVE-8137: - The issue is that Hadoop might create a split when a CombineInputFormat is used; Hadoop specifically creates empty splits. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in Hive (w.r.t. the DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? Also, could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q - John --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-8097) Vectorized Reduce-Side [SMB] MapJoin operator fails
[ https://issues.apache.org/jira/browse/HIVE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8097: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~mmccline]! Vectorized Reduce-Side [SMB] MapJoin operator fails --- Key: HIVE-8097 URL: https://issues.apache.org/jira/browse/HIVE-8097 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8097.01.patch, HIVE-8097.02.patch, HIVE-8097.03.patch Fails attempting to getScratchColumnVectorTypes since mapWork is null on reduce-side. Fix by calling that method using reduceWork object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135790#comment-14135790 ] Gopal V commented on HIVE-8137: --- Hive's CombineInputFormat has pending changes to fix this (HIVE-6554), but obviously that does not apply to MR's combine implementation. The Tez one actually works as expected in this case, because it combines InputSplits instead of combining arbitrary FileSplits. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8074) Merge spark into trunk 9/12/2014
[ https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8074: --- Attachment: (was: HIVE-8074.1-spark.patch) Merge spark into trunk 9/12/2014 Key: HIVE-8074 URL: https://issues.apache.org/jira/browse/HIVE-8074 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135821#comment-14135821 ] Pankit Thapar commented on HIVE-8137: - I ran an insert overwrite query from an empty table into an ORC table. That triggered Hadoop's CombineFileInputFormat, which does not check whether the split is empty. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7588) Using type variable in UDF
[ https://issues.apache.org/jira/browse/HIVE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135827#comment-14135827 ] Hive QA commented on HIVE-7588: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668941/HIVE-7588.4.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/823/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/823/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-823/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668941 Using type variable in UDF -- Key: HIVE-7588 URL: https://issues.apache.org/jira/browse/HIVE-7588 Project: Hive Issue Type: Improvement Components: UDF Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7588.1.patch.txt, HIVE-7588.2.patch.txt, HIVE-7588.3.patch.txt, HIVE-7588.4.patch.txt From http://www.mail-archive.com/user@hive.apache.org/msg12307.html Support type variables in UDF:
{code}
public <T> T evaluate(final T s, final String column_name, final int bitmap) throws Exception {
  if (s instanceof Double)
    return (T) new Double(-1.0);
  else if (s instanceof Integer)
    return (T) new Integer(-1);
  …
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
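The snippet from the description, completed into compilable Java (simplified to a single parameter; the unchecked cast-based dispatch is as in the original sketch, not a definitive UDF implementation):

```java
public class GenericEval {
    // Generic evaluate from the JIRA description: return a sentinel -1
    // whose boxed type matches the runtime type of the argument. The
    // unchecked casts are safe only because the returned value's class
    // matches the class tested by instanceof.
    @SuppressWarnings("unchecked")
    static <T> T evaluate(final T s) {
        if (s instanceof Double) {
            return (T) Double.valueOf(-1.0);
        } else if (s instanceof Integer) {
            return (T) Integer.valueOf(-1);
        }
        return s; // other types: pass through unchanged
    }
}
```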
[jira] [Updated] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-8038: -- Attachment: HIVE-8038.3.patch +1 - Patch looks good. For commit: .3.patch removed a whitespace change and fixed a javadoc.
{code}
 context.splits.add(new OrcSplit(file.getPath(), offset, length,
-hosts, fileMetaInfo, isOriginal, hasBase, deltas));
+hosts, fileMetaInfo, isOriginal, hasBase, deltas));
 }
...
- * @return TreeMap<Offset, BlockLocation>
+ * @return TreeMap<Long, BlockLocation>
{code}
Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch
What is the Current Logic:
1. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
2. In SplitGenerator.createSplit(), check whether the split spans one block or multiple blocks.
3. If the split spans just one block, then using the array index (index = offset / blockSize), get the corresponding host having the BlockLocation.
4. If the split spans multiple blocks, get all hosts that have at least 80% of the max of total data in the split hosted by any host.
5. Add the split to a list of splits.
Issue with Current Logic:
Dependency on the FileSystem API's logic for block location calculations. It returns an array, and we need to rely on the FileSystem making all blocks the same size if we want to index a block in the array directly.
What is the Fix:
1a. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
1b. Convert the array into a tree map <offset, BlockLocation> and return it through getLocationsWithOffSet().
2. In SplitGenerator.createSplit(), check whether the split spans one block or multiple blocks.
3. If the split spans just one block, then using TreeMap.floorEntry(key), get the highest entry not greater than the split's offset and get the corresponding host.
4a. If the split spans multiple blocks, get a submap containing all entries with BlockLocations from offset to offset + length.
4b. Get all hosts that have at least 80% of the max of total data in the split hosted by any host.
5. Add the split to a list of splits.
What are the major changes in logic:
1. Store BlockLocations in a map instead of an array.
2. Call SHIMS.getLocationsWithOffSet() instead of getLocations().
3. The one-block case is checked by if (offset + length <= start.getOffset() + start.getLength()) instead of if ((offset % blockSize) + length <= blockSize).
What is the effect on Complexity (Big O):
1. We add an O(n) loop to build a TreeMap from the array, but it is a one-time cost and is not incurred for each split.
2. In the one-block case, we get the block in O(log n) worst case, which was O(1) before.
3. Getting the submap is O(log n).
4. In the multiple-block case, building the list of hosts is O(m), which was O(n), with m < n, as previously we were iterating over all the block locations but now we iterate only over blocks that belong to the range of offsets we need.
What are the benefits of the change:
1. With this fix, we do not depend on the BlockLocations returned by the FileSystem to figure out the block corresponding to the offset and blockSize.
2. Also, it is not necessary that block lengths be the same for all blocks on all FileSystems.
3. Previously we were using blockSize for the one-block case and block.length for the multiple-block case, which is no longer the case. We figure out the block based on the actual length and offset of the block.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
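The floorEntry/subMap lookups from steps 3 and 4a above can be sketched with a plain TreeMap. This is a self-contained illustration, not the actual Hive shim code: block lengths stand in for the real BlockLocation value type, and the method names are hypothetical.

```java
import java.util.NavigableMap;

public class SplitLookup {
    // Find the block containing `offset`: the highest key <= offset,
    // an O(log n) TreeMap.floorEntry lookup.
    static long blockStartFor(NavigableMap<Long, Long> blocks, long offset) {
        return blocks.floorEntry(offset).getKey();
    }

    // All blocks overlapping [offset, offset + length): building the
    // submap is O(log n); iterating it is O(m) for m overlapping blocks.
    static NavigableMap<Long, Long> blocksInRange(NavigableMap<Long, Long> blocks,
                                                  long offset, long length) {
        long first = blocks.floorKey(offset); // block containing the start offset
        return blocks.subMap(first, true, offset + length, false);
    }
}
```

Note that this works even when blocks have unequal lengths, which is exactly the benefit claimed over the `offset / blockSize` array indexing.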
[jira] [Assigned] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()
[ https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-8090: - Assignee: Gopal V Potential null pointer reference in WriterImpl#StreamFactory#createStream() --- Key: HIVE-8090 URL: https://issues.apache.org/jira/browse/HIVE-8090 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Reporter: Ted Yu Assignee: Gopal V Priority: Minor Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, HIVE-8090.4.patch
{code}
switch (kind) {
  ...
  default:
    modifiers = null;
    break;
}
BufferedStream result = streams.get(name);
if (result == null) {
  result = new BufferedStream(name.toString(), bufferSize,
      codec == null ? codec : codec.modify(modifiers));
{code}
In case modifiers is null and codec is a ZlibCodec, there would be an NPE in ZlibCodec#modify(EnumSet<Modifier> modifiers):
{code}
for (Modifier m : modifiers) {
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
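A minimal illustration of the failure mode and the obvious guard, using a toy enum rather than the real CompressionCodec.Modifier: the for-each loop over a null EnumSet throws NullPointerException, so the null case must be handled before the loop (or before calling modify at all).

```java
import java.util.EnumSet;

public class ModifierGuard {
    enum Modifier { FAST, TEXT } // toy stand-in for the real Modifier enum

    // Iterating a null EnumSet throws NPE, as in ZlibCodec#modify;
    // a null check before the loop avoids it.
    static int countModifiers(EnumSet<Modifier> modifiers) {
        if (modifiers == null) {
            return 0; // nothing to apply
        }
        int n = 0;
        for (Modifier m : modifiers) {
            n++;
        }
        return n;
    }
}
```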
[jira] [Updated] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()
[ https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-8090: -- Component/s: File Formats Potential null pointer reference in WriterImpl#StreamFactory#createStream() --- Key: HIVE-8090 URL: https://issues.apache.org/jira/browse/HIVE-8090 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Reporter: Ted Yu Assignee: Gopal V Priority: Minor Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, HIVE-8090.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()
[ https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135847#comment-14135847 ] Gopal V commented on HIVE-8090: --- Test failures look unrelated - +1. Assigned to myself till [~rpalamut] gets contributor access. Potential null pointer reference in WriterImpl#StreamFactory#createStream() --- Key: HIVE-8090 URL: https://issues.apache.org/jira/browse/HIVE-8090 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Reporter: Ted Yu Assignee: Gopal V Priority: Minor Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, HIVE-8090.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135848#comment-14135848 ] Pankit Thapar commented on HIVE-8038: - Is .3.patch committed to trunk? Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135849#comment-14135849 ] Xuefu Zhang commented on HIVE-8118: --- [~chengxiang li] and I had an offline discussion; there was just a little bit of confusion in understanding the problem, and now we are on the same page. To summarize, the problem arises when a map work or reduce work is connected to multiple reduce works. Currently a map work or reduce work is wired with only one collector, which collects all data regardless of the branch. That data set then feeds all subsequent child reduce works. I also noted that Tez provides a <name, output collector> map to its record handlers. However, we may not be able to do that, due to the limitations of Spark's RDD transformation APIs. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors [Spark Branch] - Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to a single child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents:
{code}
select name, sum(value) from dec group by name
union all
select name, value from dec order by name
{code}
It's possible that in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should treat this as the general case. Tez currently provides a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Likely this is a big change, and subtasks are possible.
With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
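The Tez-style <name, collector> map described above can be sketched independently of Hive's classes. Names here are hypothetical, and plain lists stand in for real output collectors; the point is that each child reduce work gets its own collector, so a branch feeds only its own child rather than all of them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NamedCollectors<T> {
    // One named collector per child work, as in the Tez design mentioned
    // above, instead of a single collector shared by all branches.
    private final Map<String, List<T>> collectors = new HashMap<>();

    public void register(String childName) {
        collectors.put(childName, new ArrayList<>());
    }

    // A branch emits only to its own child's collector.
    public void collect(String childName, T row) {
        collectors.get(childName).add(row);
    }

    public List<T> outputOf(String childName) {
        return collectors.get(childName);
    }
}
```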
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135857#comment-14135857 ] Gopal V commented on HIVE-8038: --- No, there is a 24-hour waiting period after the +1. I will resolve the ticket once it is committed. Leave comments if you need to. Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6554) CombineHiveInputFormat should use the underlying InputSplits
[ https://issues.apache.org/jira/browse/HIVE-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135863#comment-14135863 ] Pankit Thapar commented on HIVE-6554: - Is there any update on this? CombineHiveInputFormat should use the underlying InputSplits Key: HIVE-6554 URL: https://issues.apache.org/jira/browse/HIVE-6554 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Currently CombineHiveInputFormat generates FileSplits without using the underlying InputFormat. This leads to a problem when an InputFormat needs an InputSplit that isn't exactly a FileSplit, because CombineHiveInputSplit always generates FileSplits and then calls the underlying InputFormat's getRecordReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8106) Enable vectorization for spark [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-8106: --- Status: Open (was: Patch Available) Patch needs to be reworked. Enable vectorization for spark [spark branch] - Key: HIVE-8106 URL: https://issues.apache.org/jira/browse/HIVE-8106 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-8106-spark.patch Enable the vectorization optimization on spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135879#comment-14135879 ] Sergey Shelukhin commented on HIVE-7926: Just pushed some early prototype code for the storage layer into the development branch long-lived daemons for query fragment execution, I/O and caching Key: HIVE-7926 URL: https://issues.apache.org/jira/browse/HIVE-7926 Project: Hive Issue Type: New Feature Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: LLAPdesigndocument.pdf We are proposing a new execution model for Hive that is a combination of existing process-based tasks and long-lived daemons running on worker nodes. These nodes can take care of efficient I/O, caching and query fragment execution, while heavy lifting like most joins, ordering, etc. can be handled by tasks. The proposed model is not a 2-system solution for small and large queries; nor is it a separate execution engine like MR or Tez. It can be used by any Hive execution engine if support is added; in the future even external products (e.g. Pig) can use it. The document with the high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6936) Provide table properties to InputFormats
[ https://issues.apache.org/jira/browse/HIVE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-6936: Attachment: HIVE-6936.patch Resubmitting patch to jenkins. Provide table properties to InputFormats Key: HIVE-6936 URL: https://issues.apache.org/jira/browse/HIVE-6936 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch Some advanced file formats need the table properties made available to them. Additionally, it would be convenient to provide a unique id for fetch operators and the complete list of directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 644 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line644 Is the hive token case insensitive, or are all function names in lower case? see get... and register... methods in FunctionRegistry; when storing or retrieving, they are all made lower case On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in hive (w.r.t. DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? John Pullokkaran wrote: Also could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q this is covered by the 2nd part of the condition (if the function is located w/o a qualified name, just the name is returned) Will run the tests - Sergey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran.
Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
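Sergey's answer above — the get.../register... methods in FunctionRegistry lower-case names on both store and retrieve — can be sketched as follows (a hypothetical stand-in class, not the actual FunctionRegistry API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of case-insensitive function lookup: names are
// lower-cased both when registered and when retrieved, so "toLower",
// "TOLOWER" and "tolower" all resolve to the same entry.
class FunctionNames {
    private final Map<String, String> registry = new HashMap<>();

    void register(String name, String implClass) {
        registry.put(name.toLowerCase(), implClass);
    }

    String lookup(String name) {
        return registry.get(name.toLowerCase());
    }
}
```

Under this convention the case of the token in the query never matters, because every path through the registry normalizes the name first.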
[jira] [Created] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
Prasad Mujumdar created HIVE-8139: - Summary: Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8139: -- Attachment: HIVE-8139.1.patch Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8139: -- Status: Patch Available (was: Open) Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25595: HIVE-8083: Authorization DDLs should not enforce hive identifier syntax for user or group names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25595/ --- (Updated Sept. 16, 2014, 6:29 p.m.) Review request for hive and Brock Noland. Changes --- Rebased with latest Bugs: HIVE-8083 https://issues.apache.org/jira/browse/HIVE-8083 Repository: hive-git Description --- The compiler expects principals (user, group and role) as hive identifiers for authorization DDLs. The user and group are entities that belong to external namespace and we can't expect those to follow hive identifier syntax rules. For example, a userid or group can contain '-' which is not allowed by compiler. The patch is to allow string literal for user and group names. The quoted identifier support perhaps can be made to work with this. However IMO this syntax should be supported regardless of quoted identifier support (which is an optional configuration) Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 25cd3a5 ql/src/test/queries/clientpositive/authorization_non_id.q PRE-CREATION ql/src/test/results/clientpositive/authorization_non_id.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25595/diff/ Testing --- Added test case to verify various auth DDLs with new syntax. Thanks, Prasad Mujumdar
[jira] [Updated] (HIVE-8083) Authorization DDLs should not enforce hive identifier syntax for user or group
[ https://issues.apache.org/jira/browse/HIVE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8083: -- Attachment: HIVE-8083.2.patch Rebased with latest Authorization DDLs should not enforce hive identifier syntax for user or group -- Key: HIVE-8083 URL: https://issues.apache.org/jira/browse/HIVE-8083 Project: Hive Issue Type: Bug Components: SQL, SQLStandardAuthorization Affects Versions: 0.13.0, 0.13.1 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-8083.1.patch, HIVE-8083.2.patch The compiler expects principals (user, group and role) as hive identifiers for authorization DDLs. The user and group are entities that belong to external namespace and we can't expect those to follow hive identifier syntax rules. For example, a userid or group can contain '-' which is not allowed by compiler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-8055: -- Attachment: HIVE-8055-spark.patch HIVE-8054 disabled the union remove optimization feature on the spark execution engine, so that the linked FileSink descriptors do not need to be maintained. This patch cleans up the unnecessary code. Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-8055-spark.patch There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135918#comment-14135918 ] Brock Noland commented on HIVE-8139: +1 pending tests Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in hive (w.r.t. DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? John Pullokkaran wrote: Also could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q Sergey Shelukhin wrote: this is covered by the 2nd part of the condition (if the function is located w/o a qualified name, just the name is returned) Will run the tests Ran the tests; there are some out file changes, but they are the same as on the current cbo branch. - Sergey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
Review Request 25704: HIVE-8055:Code cleanup after HIVE-8054 [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25704/ --- Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-8055 https://issues.apache.org/jira/browse/HIVE-8055 Repository: hive-git Description --- HIVE-8054 disabled the union remove optimization feature on the spark execution engine, so that the linked FileSink descriptors do not need to be maintained. This patch cleans up the unnecessary code. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 3cdfc51 Diff: https://reviews.apache.org/r/25704/diff/ Testing --- Thanks, Na Yang
[jira] [Updated] (HIVE-8115) Hive select query hang when fields contain map
[ https://issues.apache.org/jira/browse/HIVE-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-8115: Attachment: HIVE-8115.1.patch made a patch to warn on empty keys or empty pairs. Can anyone do a quick review? Thanks! Hive select query hang when fields contain map -- Key: HIVE-8115 URL: https://issues.apache.org/jira/browse/HIVE-8115 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: HIVE-8115.1.patch, createTable.hql, data Attached the repro of the issue. When creating a table and loading the data attached, all hive queries hang, even just select * from the table. repro steps: 1. run createTable.hql 2. hadoop fs -put data /data 3. LOAD DATA INPATH '/data' OVERWRITE INTO TABLE testtable; 4. SELECT * FROM testtable; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
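The patch is described as warning on empty keys or empty pairs while parsing map-typed fields. A defensive parser in that spirit might look like the following (a hypothetical sketch, not the attached HIVE-8115.1.patch):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: parse "k1:v1,k2:v2" map text defensively,
// skipping (rather than choking on) empty pairs and empty keys.
// The real fix lives in Hive's lazy serde code; this only illustrates
// the skip-and-warn idea.
class MapFieldParser {
    static Map<String, String> parse(String field) {
        Map<String, String> result = new HashMap<>();
        for (String pair : field.split(",", -1)) {
            if (pair.isEmpty()) {
                continue; // empty pair: would log a warning and move on
            }
            int sep = pair.indexOf(':');
            if (sep <= 0) {
                continue; // empty or missing key: warn and skip
            }
            result.put(pair.substring(0, sep), pair.substring(sep + 1));
        }
        return result;
    }
}
```

The key property is that malformed input degrades to a warning instead of stalling the reader.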
[jira] [Commented] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135948#comment-14135948 ] Xuefu Zhang commented on HIVE-8055: --- Patch looks good. +1 pending on test. Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-8055-spark.patch There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8055: -- Status: Patch Available (was: Open) Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-8055-spark.patch There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8115) Hive select query hang when fields contain map
[ https://issues.apache.org/jira/browse/HIVE-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-8115: Status: Patch Available (was: Open) Hive select query hang when fields contain map -- Key: HIVE-8115 URL: https://issues.apache.org/jira/browse/HIVE-8115 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: HIVE-8115.1.patch, createTable.hql, data Attached the repro of the issue. When creating a table and loading the data attached, all hive queries hang, even just select * from the table. repro steps: 1. run createTable.hql 2. hadoop fs -put data /data 3. LOAD DATA INPATH '/data' OVERWRITE INTO TABLE testtable; 4. SELECT * FROM testtable; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135954#comment-14135954 ] Mithun Radhakrishnan commented on HIVE-7762: Hello, Suhas. Thanks for working on fixing this inconsistency. Generally, this is a good fix. I'd encourage you to add a test-case to TestHCatClient, to create a table with uppercase partition columns, and then querying it with a lowercase partition-spec. (Essentially, what you've included in your description.) Adding one won't be hard; you could just use one of the other tests for reference. Also, could I please bother you to verify the behaviour of {{HCatClient.getPartitions()}}, for case insensitivity? If it's broken too, I'd rather we fixed both here. I expect that this should be alright, since it goes through the {{listPartitionsByFilter()}} API, but it would be good to have confirmation. Mithun Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.2.patch, HIVE-7762.patch Hcatalog creates partitions in lower case, whereas getting partitions from hcatalog via webhcat client doesn't handle this. So the client starts throwing exceptions. 
Ex: CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/'; Then I try to get partitions by: {noformat} String inputTableName = "in_table"; String database = "default"; Map<String, String> partitionSpec = new HashMap<String, String>(); partitionSpec.put("Year", "2014"); partitionSpec.put("Month", "08"); partitionSpec.put("Date", "11"); partitionSpec.put("Hour", "00"); partitionSpec.put("Minute", "00"); HCatClient client = get(catalogUrl); HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec); {noformat} This throws up saying: {noformat} Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366) at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} The same code works if I do {noformat} partitionSpec.put("year", "2014"); partitionSpec.put("month", "08"); partitionSpec.put("date", "11"); partitionSpec.put("hour", "00"); partitionSpec.put("minute", "00"); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
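Until the fix lands, a client-side workaround matching the behaviour above is to lower-case the partition-spec keys before calling getPartition(); a minimal sketch, with the helper name assumed rather than taken from HCatalog:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: normalize partition-spec keys to lower case
// before handing them to HCatalog, since partition columns are stored
// in lower case on the metastore side.
class PartitionSpecs {
    static Map<String, String> lowerCaseKeys(Map<String, String> spec) {
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            normalized.put(e.getKey().toLowerCase(), e.getValue());
        }
        return normalized;
    }
}
```

With this, the first snippet's mixed-case spec ("Year", "Month", ...) reduces to the lower-case spec that is shown to work.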
[jira] [Updated] (HIVE-8106) Enable vectorization for spark [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-8106: --- Attachment: HIVE-8106.1-spark.patch Enable vectorization for spark [spark branch] - Key: HIVE-8106 URL: https://issues.apache.org/jira/browse/HIVE-8106 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-8106-spark.patch, HIVE-8106.1-spark.patch Enable the vectorization optimization on spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in hive (w.r.t. DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? John Pullokkaran wrote: Also could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q Sergey Shelukhin wrote: this is covered by the 2nd part of the condition (if the function is located w/o a qualified name, just the name is returned) Will run the tests Sergey Shelukhin wrote: Ran the tests; there are some out file changes, but they are the same as on the current cbo branch. It seems like the change would always use the qualified function name. If that's the case, would built-in functions work? For example, in a select statement could you always qualify functions with a db name? What about arithmetic expressions, conjunctive/disjunctive functions (and/or)? It seems like your change would qualify those functions with a DB name. What is it that I am missing? - John --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran.
Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135970#comment-14135970 ] Ashutosh Chauhan commented on HIVE-8139: I think HIVE-7145 is relevant. Consider that one too. Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8106) Enable vectorization for spark [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135995#comment-14135995 ] Xuefu Zhang commented on HIVE-8106: --- Hi [~chinnalalam], If the patch is ready, please click the Submit Patch button above to allow the test run. Thanks. Enable vectorization for spark [spark branch] - Key: HIVE-8106 URL: https://issues.apache.org/jira/browse/HIVE-8106 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-8106-spark.patch, HIVE-8106.1-spark.patch Enable the vectorization optimization on spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135991#comment-14135991 ] Brock Noland commented on HIVE-8139: Makes sense. I think we can move to 2.6 until we are able to remove commons-lang 2. Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5764) Stopping Metastore and HiveServer2 from command line
[ https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136012#comment-14136012 ] Xiaobing Zhou commented on HIVE-5764: - Can anyone do a review to make it go to trunk? Thanks! Stopping Metastore and HiveServer2 from command line Key: HIVE-5764 URL: https://issues.apache.org/jira/browse/HIVE-5764 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Vaibhav Gumashta Assignee: Xiaobing Zhou Labels: patch Fix For: 0.14.0 Attachments: HIVE-5764.patch Currently a user needs to kill the process. Ideally there should be something like: hive --service metastore stop hive --service hiveserver2 stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5764) Stopping Metastore and HiveServer2 from command line
[ https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-5764: Attachment: HIVE-5764.1.patch Stopping Metastore and HiveServer2 from command line Key: HIVE-5764 URL: https://issues.apache.org/jira/browse/HIVE-5764 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Vaibhav Gumashta Assignee: Xiaobing Zhou Labels: patch Fix For: 0.14.0 Attachments: HIVE-5764.1.patch, HIVE-5764.patch Currently a user needs to kill the process. Ideally there should be something like: hive --service metastore stop hive --service hiveserver2 stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5764) Stopping Metastore and HiveServer2 from command line
[ https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-5764: Attachment: (was: HIVE-5764.patch) Stopping Metastore and HiveServer2 from command line Key: HIVE-5764 URL: https://issues.apache.org/jira/browse/HIVE-5764 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Vaibhav Gumashta Assignee: Xiaobing Zhou Labels: patch Fix For: 0.14.0 Attachments: HIVE-5764.1.patch Currently a user needs to kill the process. Ideally there should be something like: hive --service metastore stop hive --service hiveserver2 stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8102) Partitions of type 'date' behave incorrectly with daylight saving time.
[ https://issues.apache.org/jira/browse/HIVE-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8102: - Status: Open (was: Patch Available) Just tried with a timezone with half-hour offsets that is ahead of UTC (Asia/Tehran), and this does not work; cancelling patch. Partitions of type 'date' behave incorrectly with daylight saving time. --- Key: HIVE-8102 URL: https://issues.apache.org/jira/browse/HIVE-8102 Project: Hive Issue Type: Bug Components: Database/Schema, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Eli Acherkan Attachments: HIVE-8102.1.patch At 2AM on March 28th 2014, Israel went from standard time (GMT+2) to daylight saving time (GMT+3). The server's timezone is Asia/Jerusalem. When creating a partition whose key is 2014-03-28, Hive creates a partition for 2014-03-27 instead: hive (default)> create table test (a int) partitioned by (`b_prt` date); OK Time taken: 0.092 seconds hive (default)> alter table test add partition (b_prt='2014-03-28'); OK Time taken: 0.187 seconds hive (default)> show partitions test; OK partition b_prt=2014-03-27 Time taken: 0.134 seconds, Fetched: 1 row(s) It seems that the root cause is the behavior of DateWritable.daysToMillis/dateToDays. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
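The suspected root cause can be illustrated outside Hive: for a zone ahead of UTC, local midnight falls on the previous UTC day, so floor-dividing its epoch millis by 86,400,000 yields the previous day's index. A minimal demo of that mechanism (illustrative names, not the DateWritable code itself):

```java
import java.util.Calendar;
import java.util.TimeZone;

// Illustration of the off-by-one: a date materialized as *local*
// midnight in a zone ahead of UTC sits before UTC midnight, so a
// days-since-epoch value computed by floor division lands one day early.
class DstDateDemo {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    static long localMidnightToDays(int year, int month, int day, String tz) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(tz));
        cal.clear();
        cal.set(year, month - 1, day); // midnight, local time in tz
        return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
    }
}
```

With the server in Asia/Jerusalem (ahead of UTC year-round, GMT+2/+3), 2014-03-28 maps to the day index of 2014-03-27, matching the misplaced partition above; the same happens for Asia/Tehran's half-hour offset.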
[jira] [Commented] (HIVE-7777) add CSV support for Serde
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136036#comment-14136036 ] Hive QA commented on HIVE-7777: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668948/HIVE-7777.3.patch {color:green}SUCCESS:{color} +1 6282 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/824/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/824/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-824/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668948 add CSV support for Serde - Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch, csv-serde-master.zip There is no official support for csvSerde for hive, while there is an open source project in github (https://github.com/ogrodnek/csv-serde). CSV is a frequently used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8074) Merge spark into trunk 9/12/2014
[ https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136041#comment-14136041 ] Brock Noland commented on HIVE-8074: The merge was really ugly due to the new statistics work for CBO which has been done on trunk. I have done the merge and will update the Spark test file outputs soon. Until then, most spark tests will fail. Sorry for the disruption. Merge spark into trunk 9/12/2014 Key: HIVE-8074 URL: https://issues.apache.org/jira/browse/HIVE-8074 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8140) Remove obsolete code from SparkWork [Spark Branch]
Xuefu Zhang created HIVE-8140: - Summary: Remove obsolete code from SparkWork [Spark Branch] Key: HIVE-8140 URL: https://issues.apache.org/jira/browse/HIVE-8140 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang There is old code in SparkWork about get/set map/reduce work. It's from POC code, which isn't applicable any more. We should remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)