[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753361#comment-13753361 ]

Thejas M Nair commented on HIVE-4617:

+1. Will commit if tests pass.

ExecuteStatementAsync call to run a query in non-blocking mode

Key: HIVE-4617
URL: https://issues.apache.org/jira/browse/HIVE-4617
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch

Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4601) WebHCat, Templeton need to support proxy users
[ https://issues.apache.org/jira/browse/HIVE-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753364#comment-13753364 ]

Thejas M Nair commented on HIVE-4601:

Verified that the hive package target and the hcatalog tests pass (this is a WebHCat-only change). I will commit this soon.

WebHCat, Templeton need to support proxy users

Key: HIVE-4601
URL: https://issues.apache.org/jira/browse/HIVE-4601
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.11.0
Reporter: Dilli Arumugam
Assignee: Eugene Koifman
Labels: proxy, templeton
Fix For: 0.12.0
Attachments: HIVE-4601.2.patch, HIVE-4601.3.patch, HIVE-4601.4.patch, HIVE-4601.5.patch, HIVE-4601.patch

We have a use case where a Gateway provides unified and controlled access to a secure Hadoop cluster. The Gateway itself authenticates to secure WebHDFS, Oozie and Templeton with SPNEGO. It authenticates the end user with HTTP Basic and asserts the end user's identity as the douser argument in calls to the downstream WebHDFS, Oozie and Templeton services. This works with WebHDFS and Oozie but not with Templeton, because Templeton does not support proxy users. Hence this request to add proxy-user support to Templeton.
[jira] [Commented] (HIVE-5128) Direct SQL for view is failing
[ https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753366#comment-13753366 ]

Hudson commented on HIVE-5128:

ABORTED: Integrated in Hive-trunk-hadoop2 #387 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/387/])
HIVE-5128 : Direct SQL for view is failing (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518258)
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java

Direct SQL for view is failing

Key: HIVE-5128
URL: https://issues.apache.org/jira/browse/HIVE-5128
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
Fix For: 0.12.0
Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch

I am not sure about this, but the following error is logged when dropping views (the call falls back to JPA and then works fine):
{noformat}
etastore.ObjectStore: Direct SQL failed, falling back to ORM
MetaException(message:Unexpected null for one of the IDs, SD null, column null, serde null)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
	...
{noformat}
Should direct SQL be disabled for views, or can this be fixed?
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753365#comment-13753365 ]

Hudson commented on HIVE-3562:

ABORTED: Integrated in Hive-trunk-hadoop2 #387 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/387/])
HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518234)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out

Some limit can be pushed down to map stage

Key: HIVE-3562
URL: https://issues.apache.org/jira/browse/HIVE-3562
Project: Hive
Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
Fix For: 0.12.0
Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, HIVE-3562.D5967.9.patch

A query with a limit clause (with a reasonably small number), for example
{noformat}
select * from src order by key limit 10;
{noformat}
produces the operator tree

TS-SEL-RS-EXT-LIMIT-FS

But LIMIT can be partially calculated in RS, which reduces the amount of data shuffled:

TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
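The limit pushdown described in HIVE-3562 can be illustrated with a small sketch. This is not Hive's TopNHash implementation; it is a minimal Python model, with hypothetical function names (`map_side_top_n`, `reduce_side_limit`) and made-up input splits, of why keeping only the top N rows per map task before the shuffle leaves the final ORDER BY ... LIMIT N result unchanged.

```python
import heapq


def map_side_top_n(rows, n):
    """Model of RS(TOP-N): each map task forwards only its n smallest keys."""
    return heapq.nsmallest(n, rows)


def reduce_side_limit(partial_results, n):
    """Model of EXT-LIMIT: merge the per-mapper results, sort, apply LIMIT n."""
    merged = [row for part in partial_results for row in part]
    return sorted(merged)[:n]


# Three hypothetical map tasks, each holding a slice of the table's keys.
splits = [[5, 1, 9, 7], [3, 8, 2], [6, 4, 0]]
limit = 3

# With pushdown, at most `limit` rows per map task reach the shuffle.
shuffled = [map_side_top_n(split, limit) for split in splits]
with_pushdown = reduce_side_limit(shuffled, limit)

# Without pushdown, every row is shuffled before the limit is applied.
without_pushdown = sorted(row for split in splits for row in split)[:limit]
```

Any row that is not among the N smallest of its own split cannot be among the N smallest overall, so dropping it on the map side is safe. The saving grows with the ratio of split size to N, which is presumably why the issue restricts the idea to limits "with a reasonably small number".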
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753379#comment-13753379 ]

Hive QA commented on HIVE-5091:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600486/HIVE-5091.D12249.3.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/555/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/555/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ORC files should have an option to pad stripes to the HDFS block boundaries

Key: HIVE-5091
URL: https://issues.apache.org/jira/browse/HIVE-5091
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, HIVE-5091.D12249.3.patch

With ORC stripes being large, if a stripe straddles an HDFS block, the locality of read is suboptimal. It would be good to add padding to ensure that stripes don't straddle HDFS blocks.
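The padding idea in HIVE-5091 comes down to simple arithmetic on file offsets. The sketch below is not the ORC writer's code; it is an illustrative Python function (the name `padding_before_stripe` and the sizes used are assumptions) that computes how many filler bytes would be needed so that the next stripe starts on a fresh HDFS block instead of straddling a boundary.

```python
MIB = 1024 * 1024


def padding_before_stripe(offset, stripe_size, block_size):
    """Return the number of padding bytes to write at `offset` so that a
    stripe of `stripe_size` bytes does not cross an HDFS block boundary."""
    if stripe_size > block_size:
        # A stripe larger than a block must straddle; padding cannot help.
        return 0
    remaining_in_block = block_size - (offset % block_size)
    if stripe_size <= remaining_in_block:
        # The stripe fits in what is left of the current block.
        return 0
    # Otherwise skip to the start of the next block.
    return remaining_in_block


# Hypothetical sizes: 128 MiB blocks, 64 MiB stripes, writer at 100 MiB.
pad = padding_before_stripe(100 * MIB, 64 * MIB, 128 * MIB)
```

The trade-off is wasted space: in this example 28 MiB of the block is left as padding in exchange for the stripe being readable from a single block.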
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753381#comment-13753381 ]

Carl Steinbach commented on HIVE-4617:

[~thejas] I found some minor issues and am adding comments to phabricator. Please do not commit this patch until Jaideep has had a chance to respond. Thanks.

ExecuteStatementAsync call to run a query in non-blocking mode

Key: HIVE-4617
URL: https://issues.apache.org/jira/browse/HIVE-4617
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch

Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4460:

Status: Open (was: Patch Available)

Publish HCatalog artifacts for Hadoop 2.x

Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are currently published only for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for every Hadoop version those projects support, so that automated builds targeting different Hadoop releases can succeed. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x releases.
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4460:

Attachment: HIVE-4460.Dtest.3.patch

HIVE-4460.Dtest.3.patch - copy of HIVE-4460.3.patch to get pre-commit tests running

Publish HCatalog artifacts for Hadoop 2.x

Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.Dtest.3.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are currently published only for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for every Hadoop version those projects support, so that automated builds targeting different Hadoop releases can succeed. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x releases.
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4460:

Status: Patch Available (was: Open)

Publish HCatalog artifacts for Hadoop 2.x

Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.Dtest.3.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are currently published only for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for every Hadoop version those projects support, so that automated builds targeting different Hadoop releases can succeed. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x releases.
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753389#comment-13753389 ]

Thejas M Nair commented on HIVE-4617:

[~cwsteinbach] Sure. Please note that the latest patch is in a new phabricator link - https://reviews.facebook.net/D12507. Vaibhav had some issues updating the earlier one.

ExecuteStatementAsync call to run a query in non-blocking mode

Key: HIVE-4617
URL: https://issues.apache.org/jira/browse/HIVE-4617
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch

Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753394#comment-13753394 ]

Thejas M Nair commented on HIVE-4617:

[~jaideepdhok] I hope you can also take a look at the revised patch that is based on your original patch. We can discuss how the GetQueryPlan API can be implemented in a way that makes it possible to guarantee backward compatibility, and any impact that would have on GetOperationStatus in HIVE-4569.

ExecuteStatementAsync call to run a query in non-blocking mode

Key: HIVE-4617
URL: https://issues.apache.org/jira/browse/HIVE-4617
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch

Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4460:

Status: Patch Available (was: Open)

Publish HCatalog artifacts for Hadoop 2.x

Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are currently published only for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for every Hadoop version those projects support, so that automated builds targeting different Hadoop releases can succeed. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x releases.
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4460:

Attachment: (was: HIVE-4460.Dtest.3.patch)

Publish HCatalog artifacts for Hadoop 2.x

Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are currently published only for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for every Hadoop version those projects support, so that automated builds targeting different Hadoop releases can succeed. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x releases.
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4460:

Status: Open (was: Patch Available)

Publish HCatalog artifacts for Hadoop 2.x

Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are currently published only for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for every Hadoop version those projects support, so that automated builds targeting different Hadoop releases can succeed. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x releases.
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4460:

Attachment: HIVE-4460.4.patch

HIVE-4460.4.patch - copy of HIVE-4460.3.patch to get pre-commit tests running. The filename of the previous file would not work with the pattern expected by the pre-commit test framework.

Publish HCatalog artifacts for Hadoop 2.x

Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are currently published only for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for every Hadoop version those projects support, so that automated builds targeting different Hadoop releases can succeed. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x releases.
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753422#comment-13753422 ]

Phabricator commented on HIVE-4617:

cwsteinbach has commented on the revision "HIVE-4617 [jira] ExecuteStatementAsync call to run a query in non-blocking mode".

INLINE COMMENTS

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:739
  Please add documentation for these new conf properties to hive-default.xml.template (and specify the units of 1).

service/if/TCLIService.thrift:41
  Please add HIVE_CLI_SERVICE_PROTOCOL_V2 along with a short comment explaining what's new with this protocol version.

service/if/TCLIService.thrift:455
  Bump this to HIVE_CLI_SERVICE_PROTOCOL_V2. Also, the client should probably check to make sure it's talking to a >=V2 server before trying to execute an asynchronous call.

service/if/TCLIService.thrift:474
  Ditto.

service/src/java/org/apache/hive/service/cli/CLIService.java:162
  There's currently a 1:1 correspondence between operation methods in CLIService and SessionManager. I think it's worth maintaining that relationship, so I would advocate adding SessionManager.executeStatementAsync() instead of overloading SessionManager.executeStatement().

service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:78
  This should be private.

service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:66
  I know that Java boolean variables default to false, but I think it would be good to set this explicitly anyway.

service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:133
  Server code should never print to stdout. Also, this is squelching the error instead of returning it to the client.

service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:138
  Unnecessary use of this.

service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java:181
  Let's add an async parameter to OperationManager.newExecuteStatementOperation() instead of calling instanceof and casting.

service/src/java/org/apache/hive/service/cli/session/SessionManager.java:57
  It would be useful to log the size of the threadpool (INFO level).

service/src/java/org/apache/hive/service/cli/session/SessionManager.java:79
  This log message should tell the user that HIVE_SERVER2_ASYNC_EXEC_SHUTDOWN_TIMEOUT=xx has been exceeded and background tasks are still running, and that it's going to exit anyway without doing a graceful task cleanup.

service/src/test/org/apache/hive/service/cli/CLIServiceTest.java:135
  Can you add a statement that fails (e.g. because of a syntax error) and verify that error information is correctly returned to the client?

service/src/test/org/apache/hive/service/cli/CLIServiceTest.java:151
  I think this test should verify that getOperationStatus returns OperationState.RUNNING at least once.

service/src/test/org/apache/hive/service/cli/CLIServiceTest.java:118
  It would be nice to add automated tests that cover version discrepancies between client and server, but that's probably too much work. Can you try testing this by hand and at least get a handle on what the behavior is? Users are definitely going to run into this, so it would be good to know what to expect before the first question appears on the user mailing list.

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:738
  Is 10 a good default value? A lot of people are probably going to hit this limit and wonder why their queries are blocking. I think this also implies that we should add OperationState.PENDING or OperationState.WAITING instead of returning OperationState.RUNNING.

REVISION DETAIL
  https://reviews.facebook.net/D12507

To: JIRA, vaibhavgumashta
Cc: cwsteinbach

ExecuteStatementAsync call to run a query in non-blocking mode

Key: HIVE-4617
URL: https://issues.apache.org/jira/browse/HIVE-4617
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch

Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
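The review comments on HIVE-4617 touch on three things that are easier to see in code: a bounded background thread pool, a status call that the client polls, and the suggestion to distinguish a queued operation (PENDING) from one that is actually executing (RUNNING). The sketch below is a toy model in Python, not HiveServer2 code; the class `AsyncStatementRunner`, its method names and the state strings are assumptions chosen to mirror the terms used in the review.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncStatementRunner:
    """Toy model of non-blocking statement execution with a bounded pool."""

    def __init__(self, max_workers=10):
        # The pool size plays the role of the default of 10 questioned above.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._operations = {}
        self._next_handle = 0

    def execute_statement_async(self, run_query):
        """Queue `run_query` (a zero-argument callable) and return at once."""
        handle = self._next_handle
        self._next_handle += 1
        self._operations[handle] = self._pool.submit(run_query)
        return handle

    def get_operation_status(self, handle):
        """Report PENDING, RUNNING, FINISHED or ERROR for an operation."""
        future = self._operations[handle]
        # Sample running() before done() so the answer is always a state
        # the operation really was in at some point during this call.
        was_running = future.running()
        if future.done():
            return "ERROR" if future.exception() is not None else "FINISHED"
        return "RUNNING" if was_running else "PENDING"

    def shutdown(self):
        """Wait for background work to finish, then release the threads."""
        self._pool.shutdown(wait=True)
```

With `max_workers=1`, a second statement submitted while the first is still executing reports PENDING rather than RUNNING, which is the behaviour the last inline comment argues for. A statement that raises reports ERROR instead of having its failure swallowed, matching the comment on SQLOperation.java:133.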
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753426#comment-13753426 ]

Carl Steinbach commented on HIVE-4617:

[~thejas] I added comments to phabricator. I'll leave it up to you to decide whether these issues should be addressed now or in a followup patch. Thanks.

ExecuteStatementAsync call to run a query in non-blocking mode

Key: HIVE-4617
URL: https://issues.apache.org/jira/browse/HIVE-4617
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch

Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
[jira] [Updated] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX
[ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-951:

Assignee: (was: Carl Steinbach)

Selectively include EXTERNAL TABLE source files via REGEX

Key: HIVE-951
URL: https://issues.apache.org/jira/browse/HIVE-951
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Carl Steinbach
Attachments: HIVE-951.patch

CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular expression.

CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and currently makes the assumption that all of the files located under the supplied path should be included in the new table. Users frequently encounter directories containing multiple datasets, or directories that contain data in heterogeneous schemas, and it's often impractical or impossible to adjust the layout of the directory to meet the requirements of CREATE EXTERNAL TABLE. A good example of this problem is creating an external table based on the contents of an S3 bucket.

One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE as follows:

CREATE EXTERNAL TABLE ... LOCATION path [file_regex] ...

For example:
{code:sql}
CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
STORED AS TEXTFILE
LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
{code}
Creates mytable1 which includes all files in s3://my.bucket/ with a filename matching 'folder/2009*.bz2'

{code:sql}
CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
STORED AS TEXTFILE
LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
{code}
Creates mytable2 including all files matching 'xyz*2009.bz2' located under hdfs://data/
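The file selection proposed in HIVE-951 can be modelled outside Hive in a few lines. The sketch below is an illustration only: `select_files` is a hypothetical helper, the paths are invented, and it assumes the regular expression is searched against the part of each path that follows the LOCATION prefix. The issue does not say whether Hive would use search or full-match semantics, so that choice is an assumption.

```python
import re


def select_files(paths, location, file_regex):
    """Return the paths under `location` whose relative name matches
    `file_regex` (searched anywhere in the name, not anchored at the start)."""
    pattern = re.compile(file_regex)
    selected = []
    for path in paths:
        if not path.startswith(location):
            continue  # outside the table's LOCATION
        relative_name = path[len(location):]
        if pattern.search(relative_name):
            selected.append(path)
    return selected


# Hypothetical bucket listing, using the pattern from the mytable1 example.
listing = [
    "s3://my.bucket/folder/2009-01.bz2",
    "s3://my.bucket/folder/2009/12/log.bz2",
    "s3://my.bucket/folder/2009-01.bz2.tmp",
    "s3://my.bucket/folder/2010-01.bz2",
    "s3://my.bucket/other/2009.bz2",
]
chosen = select_files(listing, "s3://my.bucket/", r"folder/2009.*\.bz2$")
```

Note that the `.*` in the pattern also crosses directory separators, so `folder/2009/12/log.bz2` is selected. Whether nested directories should match is a design question the proposal leaves open.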
[jira] [Commented] (HIVE-1367) cluster by multiple columns does not work if parenthesis is present
[ https://issues.apache.org/jira/browse/HIVE-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753451#comment-13753451 ]

efan lee commented on HIVE-1367:

It seems that the result of DISTRIBUTE BY is not deterministic, which causes the unit test to fail.

cluster by multiple columns does not work if parenthesis is present

Key: HIVE-1367
URL: https://issues.apache.org/jira/browse/HIVE-1367
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Zhenxiao Luo
Fix For: 0.10.0
Attachments: HIVE-1367.1.patch.txt

The following query:

select ... from src cluster by (key, value)

throws a compile error, whereas the query

select ... from src cluster by key, value

works fine.
[jira] [Commented] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX
[ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753454#comment-13753454 ] indrajit commented on HIVE-951: --- External tables really give you the power to use different tools on top of the table, so you get the chance to do data mining. They are really very fast and easy to create. Selectively include EXTERNAL TABLE source files via REGEX - Key: HIVE-951 URL: https://issues.apache.org/jira/browse/HIVE-951 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Attachments: HIVE-951.patch
CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular expression. CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and currently makes the assumption that all of the files located under the supplied path should be included in the new table. Users frequently encounter directories containing multiple datasets, or directories that contain data in heterogeneous schemas, and it's often impractical or impossible to adjust the layout of the directory to meet the requirements of CREATE EXTERNAL TABLE. A good example of this problem is creating an external table based on the contents of an S3 bucket. One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE as follows: CREATE EXTERNAL TABLE ... LOCATION path [file_regex] ... For example:
{code:sql}
CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string ) STORED AS TEXTFILE LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
{code}
Creates mytable1 which includes all files in s3://my.bucket/ with a filename matching 'folder/2009*.bz2'
{code:sql}
CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int ) STORED AS TEXTFILE LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
{code}
Creates mytable2 including all files matching 'xyz*2009.bz2' located under hdfs://data/
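The file selection proposed in HIVE-951 can be illustrated with a small sketch. This is not taken from HIVE-951.patch: the class and method names are hypothetical, and it assumes the regex is applied to the path relative to LOCATION with "find" semantics (the issue does not say whether a full match is required).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: picks the files an external table
// would include if LOCATION accepted an optional file regex.
final class ExternalTableFileFilter {
    static List<String> select(String location, List<String> paths, String fileRegex) {
        Pattern pattern = Pattern.compile(fileRegex);
        List<String> selected = new ArrayList<>();
        for (String path : paths) {
            if (!path.startsWith(location)) {
                continue; // ignore anything outside the table location
            }
            // Assumption: the regex is matched against the path relative to LOCATION.
            String relative = path.substring(location.length());
            if (pattern.matcher(relative).find()) {
                selected.add(path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "s3://my.bucket/folder/2009-01.bz2",
            "s3://my.bucket/folder/2010-01.bz2",
            "s3://my.bucket/other/2009-01.txt");
        // Same pattern as the mytable1 example in the issue description.
        System.out.println(select("s3://my.bucket/", listing, "folder/2009.*\\.bz2$"));
    }
}
```

With the sample listing, only the 2009 file under folder/ is selected; the other two paths fail either the prefix or the extension part of the pattern.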
[jira] [Created] (HIVE-5168) Extend Hive for spatial query support
Fusheng Wang created HIVE-5168: -- Summary: Extend Hive for spatial query support Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. 
In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf
[jira] [Commented] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753555#comment-13753555 ] Hive QA commented on HIVE-4460: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12600550/HIVE-4460.4.patch {color:green}SUCCESS:{color} +1 2902 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/557/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/557/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Publish HCatalog artifacts for Hadoop 2.x - Key: HIVE-4460 URL: https://issues.apache.org/jira/browse/HIVE-4460 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Environment: Hadoop 2.x Reporter: Venkat Ranganathan Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, HIVE-4460.patch Original Estimate: 72h Time Spent: 40h 40m Remaining Estimate: 31h 20m HCatalog artifacts are only published for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for each Hadoop version a product supports, so that automated builds that target different Hadoop releases can be built successfully. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds with Hadoop 1.x and 2.x releases.
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753554#comment-13753554 ] Xuefu Zhang commented on HIVE-4844: --- [~jdere] for 2 and 3, could you please exclude them? They will not get wasted. (I will eventually include the patch for HIVE-3976.) This will help rebase and review. Thanks a lot. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc.
[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data
[ https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753556#comment-13753556 ] Brock Noland commented on HIVE-4789: OK, cool. Do you have time to do that? If not I'd be willing to help out with that. FetchOperator fails on partitioned Avro data Key: HIVE-4789 URL: https://issues.apache.org/jira/browse/HIVE-4789 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.12.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Blocker Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt HIVE-3953 fixed using partitioned avro tables for anything that used the MapOperator, but those that rely on FetchOperator still fail with the same error. e.g.
{code}
SELECT * FROM partitioned_avro LIMIT 5;
SELECT * FROM partitioned_avro WHERE partition_col=value;
{code}
[jira] [Commented] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753645#comment-13753645 ] Hive QA commented on HIVE-4460: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12600550/HIVE-4460.4.patch {color:green}SUCCESS:{color} +1 2902 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/558/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/558/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Publish HCatalog artifacts for Hadoop 2.x - Key: HIVE-4460 URL: https://issues.apache.org/jira/browse/HIVE-4460 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Environment: Hadoop 2.x Reporter: Venkat Ranganathan Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, HIVE-4460.patch Original Estimate: 72h Time Spent: 40h 40m Remaining Estimate: 31h 20m HCatalog artifacts are only published for Hadoop 1.x. As more projects add HCatalog integration, HCatalog artifacts are needed for each Hadoop version a product supports, so that automated builds that target different Hadoop releases can be built successfully. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds with Hadoop 1.x and 2.x releases.
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753681#comment-13753681 ] Leo Romanoff commented on HIVE-1511: [~kamrul] I think I fixed the problem you reported. Your test seems to pass now on my side. I fixed the bug in Kryo (and it was a serious one related to usage of nested generic classes, e.g. Maps of Maps) and it is just committed into Kryo trunk. Simply update your Kryo 2.22-SNAPSHOT to make sure it uses the latest trunk and you should be fine. -Leo Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: generated_plan.xml, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes.
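For anyone who wants to reproduce the test case from the HIVE-1511 description, the query can be generated rather than typed. The sketch below is only an illustration: the class name is hypothetical, and the exact number of predicates in the original report is not known (the description shows about twenty and mentions roughly a hundred more), so the count is a parameter.

```java
// Hypothetical generator for the long OR chain described in HIVE-1511.
final class OrChainQuery {
    static String build(int predicates) {
        if (predicates < 1) {
            throw new IllegalArgumentException("need at least one predicate");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM src WHERE key=0");
        for (int i = 1; i < predicates; i++) {
            sql.append(" OR key=0"); // every predicate is identical, as in the report
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // 122 is a guess at "about 22 plus 100 more"; adjust as needed.
        System.out.println(build(122));
    }
}
```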
[jira] [Resolved] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4964. Resolution: Fixed Fix Version/s: 0.12.0 Committed to trunk. Thanks, Harish! Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, HIVE-4964.D12585.1.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements.
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753793#comment-13753793 ] Hudson commented on HIVE-4964: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #76 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/76/]) HIVE-4964 : Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518680)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, HIVE-4964.D12585.1.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements.
[jira] [Updated] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5137: --- Attachment: HIVE-5137.D12453.7-test.patch A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.11.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.7-test.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Updated] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5137: --- Status: Open (was: Patch Available) Uploading a copy of same patch to kickoff tests A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.11.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.7-test.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Updated] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5137: --- Status: Patch Available (was: Open) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.11.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.7-test.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753811#comment-13753811 ] Ashutosh Chauhan commented on HIVE-5158: ~33 tests failed. allow getting all partitions for table to also use direct SQL path -- Key: HIVE-5158 URL: https://issues.apache.org/jira/browse/HIVE-5158 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5158.D12573.1.patch, HIVE-5158.D12573.2.patch, HIVE-5158.D12573.3.patch While testing some queries I noticed that getPartitions can be very slow (which happens e.g. in non-strict mode with no partition column filter); with a table with many partitions it can take 10-12s easily. SQL perf path can also be used for this path.
[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4
[ https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753839#comment-13753839 ] Brock Noland commented on HIVE-5112: With 2.1.0-beta released, should we move ahead on this one? Upgrade protobuf to 2.5 from 2.4 Key: HIVE-5112 URL: https://issues.apache.org/jira/browse/HIVE-5112 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Owen O'Malley Attachments: HIVE-5112.D12429.1.patch Hadoop and Hbase have both upgraded protobuf. We should as well.
[jira] [Commented] (HIVE-4660) Let there be Tez
[ https://issues.apache.org/jira/browse/HIVE-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753844#comment-13753844 ] Bikas Saha commented on HIVE-4660: -- Folks, FYI, based on recent feedback we have changed the names used in some of the Tez APIs. It is a simple refactoring on the Tez side and should be a simple refactoring fix on the Pig side too. Jira for reference: TEZ-410. Let there be Tez Key: HIVE-4660 URL: https://issues.apache.org/jira/browse/HIVE-4660 Project: Hive Issue Type: New Feature Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Tez is a new application framework built on Hadoop Yarn that can execute complex directed acyclic graphs of general data processing tasks. Here's the project's page: http://incubator.apache.org/projects/tez.html The interesting thing about Tez from Hive's perspective is that it will over time allow us to overcome inefficiencies in query processing due to having to express every algorithm in the map-reduce paradigm. The barrier to entry is pretty low as well: Tez can actually run unmodified MR jobs; But as a first step we can without much trouble start using more of Tez' features by taking advantage of the MRR pattern. MRR simply means that there can be any number of reduce stages following a single map stage - without having to write intermediate results to HDFS and re-read them in a new job. This is common when queries require multiple shuffles on keys without correlation (e.g.: join - grp by - window function - order by) For more details see the design doc here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez
[jira] [Updated] (HIVE-5102) ORC getSplits should create splits based the stripes
[ https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5102: -- Attachment: HIVE-5102.D12579.2.patch omalley updated the revision HIVE-5102 [jira] ORC getSplits should create splits based the stripes. Replaced local fs with the mockfs to prevent random reorderings that caused a test failure. Reviewers: JIRA
REVISION DETAIL https://reviews.facebook.net/D12579
CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12579?vs=39189&id=39261#toc
AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
To: JIRA, omalley
ORC getSplits should create splits based the stripes - Key: HIVE-5102 URL: https://issues.apache.org/jira/browse/HIVE-5102 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5102.D12579.1.patch, HIVE-5102.D12579.2.patch Currently ORC inherits getSplits from FileFormat, which basically makes one split per HDFS block. This can create too little parallelism and would be better done by having getSplits look at the file footer and create splits based on the stripes.
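The idea in the HIVE-5102 description, building splits from stripe boundaries instead of HDFS block boundaries, can be sketched roughly as follows. This is not the code from the attached patch: the StripeSplits and Split types are hypothetical, stripe offsets and lengths are assumed to have been read from the file footer already, and the simplest possible policy (one split per stripe) is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one split per stripe, given stripe metadata from the footer.
final class StripeSplits {
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    static List<Split> fromStripes(long[] stripeOffsets, long[] stripeLengths) {
        if (stripeOffsets.length != stripeLengths.length) {
            throw new IllegalArgumentException("offsets and lengths must pair up");
        }
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < stripeOffsets.length; i++) {
            // Each split covers exactly one stripe, so a reader never starts mid-stripe.
            splits.add(new Split(stripeOffsets[i], stripeLengths[i]));
        }
        return splits;
    }
}
```

A real implementation would probably also merge small adjacent stripes or take block locality into account; the description does not say which policy the patch uses.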
[jira] [Commented] (HIVE-4660) Let there be Tez
[ https://issues.apache.org/jira/browse/HIVE-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753845#comment-13753845 ] Bikas Saha commented on HIVE-4660: -- Sorry I meant Hive instead of Pig. Let there be Tez Key: HIVE-4660 URL: https://issues.apache.org/jira/browse/HIVE-4660 Project: Hive Issue Type: New Feature Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Tez is a new application framework built on Hadoop Yarn that can execute complex directed acyclic graphs of general data processing tasks. Here's the project's page: http://incubator.apache.org/projects/tez.html The interesting thing about Tez from Hive's perspective is that it will over time allow us to overcome inefficiencies in query processing due to having to express every algorithm in the map-reduce paradigm. The barrier to entry is pretty low as well: Tez can actually run unmodified MR jobs; But as a first step we can without much trouble start using more of Tez' features by taking advantage of the MRR pattern. MRR simply means that there can be any number of reduce stages following a single map stage - without having to write intermediate results to HDFS and re-read them in a new job. This is common when queries require multiple shuffles on keys without correlation (e.g.: join - grp by - window function - order by) For more details see the design doc here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Hi All:

Put some debugging code in TUGIContainingTransport.getTransport() and I tracked it down to

    @Override
    public TUGIContainingTransport getTransport(TTransport trans) {
      // UGI information is not available at connection setup time, it will be set later
      // via set_ugi() rpc.
      transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
      //return transMap.get(trans); -change
      TUGIContainingTransport retTrans = transMap.get(trans);
      if ( retTrans == null ) { }

On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote:

Thanks Nitin. There aren't too many connections in CLOSE_WAIT state, only one or two when we run into this. Most likely it's because of a dropped connection. I could not find any read or write timeouts we can set for the thrift server which will tell thrift to hold on to the client connection. See https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't seem to have been implemented yet. We have set a client connection timeout but cannot find an equivalent setting for the server.

We have a suspicion that this happens when we run two client processes which modify two distinct partitions of the same hive table. We put in a workaround so that the two hive client processes never run together, and so far things look OK, but we will keep monitoring.

Could it be because the hive metastore server is not thread safe? Would running two alter table statements on two distinct partitions of the same table using two client connections cause problems like these, where the hive metastore server closes or drops the wrong client connection and leaves the other hanging?

Agateaaa

On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

The mentioned flow is called when you have unsecure mode of thrift metastore client-server connection. So one way to avoid this is to use secure mode.

    public boolean process(final TProtocol in, final TProtocol out) throws TException {
      setIpAddress(in);
      ...
      ...
      ...

    @Override
    protected void setIpAddress(final TProtocol in) {
      TUGIContainingTransport ugiTrans = (TUGIContainingTransport)in.getTransport();
      Socket socket = ugiTrans.getSocket();
      if (socket != null) {
        setIpAddress(socket);

From the above code snippet, it looks like the null pointer exception is not handled if getSocket returns null. Can you check what the ulimit setting on the server is? If it's set to default, can you set it to unlimited and restart the hcat server? (This is just a wild guess.)

Also, the getSocket method suggests: If the underlying TTransport is an instance of TSocket, it returns the Socket object which it contains. Otherwise it returns null. So one of the thrift gurus needs to tell us what's happening. I have no knowledge at this depth; maybe Ashutosh or Thejas will be able to help on this.

From the netstat CLOSE_WAIT, it looks like the hive metastore server has not closed the connection (do not know why yet); maybe the hive dev guys can help. Are there too many connections in CLOSE_WAIT state?

On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote:

Looking at the hive metastore server logs, I see errors like these:

    2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
    java.lang.NullPointerException
      at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
      at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
      at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

at approximately the same time as we see timeout or connection reset errors. Don't know if this is the cause or the side effect of the connection timeout/connection reset errors. Does anybody have any pointers or suggestions? Thanks

On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:

Thanks Nitin! We have a similar setup (identical hcatalog and hive server versions) on another production environment and don't see any errors (it's been running OK for a few months). Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive 0.10 soon. I did see that the last time we ran into this problem, doing a netstat -ntp | grep :1 showed that the server was holding on to one socket connection in CLOSE_WAIT state for a long time (hive metastore server is running on port 1). Don't know if that's relevant here or not. Can you suggest any hive configuration settings we can tweak or networking tools/tips, we can use to
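The debugging note at the top of the thread suggests that transMap.get(trans) can come back null right after putIfAbsent. A minimal sketch of one way to avoid the second lookup is below. It assumes the entry can disappear between the two calls (for example through concurrent removal or a map that does not hold its values strongly; the thread does not confirm either), and WrapperCache is a hypothetical stand-in, not the Hive class.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache illustrating a lookup that cannot return null.
final class WrapperCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    /**
     * Returns the wrapper already registered for key, or registers and returns
     * candidate. The caller always gets a strongly held, non-null reference,
     * because the result comes from putIfAbsent itself rather than a later get().
     */
    V getOrRegister(K key, V candidate) {
        V existing = map.putIfAbsent(key, candidate);
        return existing != null ? existing : candidate;
    }
}
```

ConcurrentMap.putIfAbsent returns the previous value, or null when there was no mapping, so the method falls back to the candidate it just inserted instead of reading the map again.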
[jira] [Commented] (HIVE-5133) webhcat jobs that need to access metastore fails in secure mode
[ https://issues.apache.org/jira/browse/HIVE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753882#comment-13753882 ] Thejas M Nair commented on HIVE-5133: - Thanks for the feedback, I will create a new patch addressing these comments. I also need to add e2e tests. Note about the patch: with this change, to submit pig or MR jobs you need to specify usehcatalog=true as a POST param (in a curl command, -d usehcatalog=true). In the case of pig this argument is optional; it is sufficient that you have an arg='-useHCatalog' POST param. webhcat jobs that need to access metastore fails in secure mode --- Key: HIVE-5133 URL: https://issues.apache.org/jira/browse/HIVE-5133 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5133.1.patch Webhcat job submission requests result in the pig/hive/mr job being run from a map task that it launches. In secure mode, for the pig/hive/mr job that is run to be authorized to perform actions on metastore, it has to have the delegation tokens from the hive metastore. In case of pig/MR job this is needed if hcatalog is being used in the script/job.
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Attachment: HIVE-4844.10.patch attaching HIVE-4844.10.patch - remove instances of precision/scale where appropriate per Xuefu's request Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.10.patch, HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc.
[jira] [Updated] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-3976:
Attachment: remove_prec_scale.diff

Here is the patch containing the instances where I've removed precision/scale from the patch to HIVE-4844, if you are interested in re-applying these changes on your side.

Support specifying scale and precision with Hive decimal type
Key: HIVE-3976
URL: https://issues.apache.org/jira/browse/HIVE-3976
Project: Hive
Issue Type: Improvement
Components: Query Processor, Types
Reporter: Mark Grover
Assignee: Xuefu Zhang
Attachments: remove_prec_scale.diff

HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table:
{code}
CREATE TABLE numbers (a DECIMAL(20,2));
{code}
Hive should support something similar too.
[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type
[ https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5161: - Attachment: HIVE-5161.1.patch Additional SerDe support for varchar type - Key: HIVE-5161 URL: https://issues.apache.org/jira/browse/HIVE-5161 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-5161.1.patch Breaking out support for varchar for the various SerDes as an additional task. NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type
[ https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5161: - Description: Breaking out support for varchar for the various SerDes as an additional task. NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed was:Breaking out support for varchar for the various SerDes as an additional task. Additional SerDe support for varchar type - Key: HIVE-5161 URL: https://issues.apache.org/jira/browse/HIVE-5161 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-5161.1.patch Breaking out support for varchar for the various SerDes as an additional task. NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type
[ https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5161: - Status: Patch Available (was: Open) Additional SerDe support for varchar type - Key: HIVE-5161 URL: https://issues.apache.org/jira/browse/HIVE-5161 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-5161.1.patch Breaking out support for varchar for the various SerDes as an additional task. NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753980#comment-13753980 ]

Hudson commented on HIVE-4964:

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #144 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/144/])
HIVE-4964 : Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518680)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java

Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
Key: HIVE-4964
URL: https://issues.apache.org/jira/browse/HIVE-4964
Project: Hive
Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, HIVE-4964.D12585.1.patch

There are still pieces of code that deal with:
- supporting select expressions with Windowing
- supporting a filter with windowing
Need to do this before introducing Perf. improvements.
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753993#comment-13753993 ]

Hudson commented on HIVE-4964:

FAILURE: Integrated in Hive-trunk-h0.21 #2297 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2297/])
HIVE-4964 : Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518680)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java

Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
Key: HIVE-4964
URL: https://issues.apache.org/jira/browse/HIVE-4964
Project: Hive
Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, HIVE-4964.D12585.1.patch

There are still pieces of code that deal with:
- supporting select expressions with Windowing
- supporting a filter with windowing
Need to do this before introducing Perf. improvements.
[jira] [Created] (HIVE-5169) Sorted Bucketed Partitioned Insert does not sort by dynamic partition column causing reducer OOMs/lease-expiry errors
Gopal V created HIVE-5169:

Summary: Sorted Bucketed Partitioned Insert does not sort by dynamic partition column causing reducer OOMs/lease-expiry errors
Key: HIVE-5169
URL: https://issues.apache.org/jira/browse/HIVE-5169
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0
Environment: Ubuntu LXC, hadoop-2
Reporter: Gopal V

When a bulk-ETL operation is in progress, the query plan only sorts based on the SORTED BY key. This means that the FileSinkOperator in the reducer has to keep all the dynamic partition RecordWriters open till the end of the reducer lifetime.

A more MR-friendly approach would be to sort by partition_col,sorted_col so that the data entering the reducer requires keeping only one partition and bucket open at any given time.

As a test-case, a partitioned insert for the TPC-h benchmark's lineitem table will suffice:
{code}
create table lineitem (L_ORDERKEY INT, ...
partitioned by (L_SHIPDATE STRING)
clustered by (l_orderkey) sorted by (l_orderkey) into 4 buckets
stored as ORC;

explain from (select L_ORDERKEY , ...) tbl
insert overwrite table lineitem partition (L_SHIPDATE)
select * ;
{code}
The generated plan very clearly has
{code}
Reduce Output Operator
  key expressions:
    expr: _col0
    type: int
  sort order: +
  Map-reduce partition columns:
    expr: _col0
    type: int
  tag: -1
{code}
where _col0 is L_ORDERKEY.

In the FileSinkOperator on the reducer side, this results in a larger than usual number of open files. This causes memory pressure due to the compression buffers used by ORC/RCFile and really slows down the reducers.

A side-effect of this is that I had to pump 350Gb of TPC-h data through 4 reducers, which on occasion took 1 hour to get from opening a file in the FS to writing the first ORC stripe. This caused HDFS lease expiry and the task dying from that error.

All of these can be avoided by adding the partition column to the sort keys as well as the partition keys, which keeps only one writer open in the FileSinkOperator.
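A minimal sketch of the effect described in HIVE-5169, assuming a simplified model in which a reducer opens one writer per dynamic partition and can close a writer only once it knows no more rows for that partition will arrive. The function peak_open_writers and the sample rows are illustrative and are not Hive code.

```python
def peak_open_writers(rows, close_on_partition_change):
    """Return the peak number of partition writers open at the same time.

    rows: iterable of (partition_value, sort_key) in arrival order.
    close_on_partition_change: True only when rows are known to arrive
    grouped by partition, so the previous writer can be closed safely.
    """
    open_writers = set()
    peak = 0
    previous = None
    for partition, _sort_key in rows:
        if close_on_partition_change and previous is not None and partition != previous:
            open_writers.discard(previous)  # done with the previous partition
        open_writers.add(partition)
        peak = max(peak, len(open_writers))
        previous = partition
    return peak


# 10 ship dates (partitions), 5 order keys each.
rows = [(f"1998-01-{day:02d}", key) for key in range(5) for day in range(1, 11)]

# Current plan: sorted by the SORTED BY key only, so partitions interleave.
by_sort_key = sorted(rows, key=lambda r: r[1])
# Proposed plan: sorted by (partition_col, sorted_col).
by_partition_then_key = sorted(rows, key=lambda r: (r[0], r[1]))

print(peak_open_writers(by_sort_key, close_on_partition_change=False))
print(peak_open_writers(by_partition_then_key, close_on_partition_change=True))
```

In this toy model the first ordering keeps all 10 writers open, while the second never needs more than one.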
[jira] [Updated] (HIVE-5169) Sorted Bucketed Partitioned Insert does not sort by dynamic partition column causing reducer OOMs/lease-expiry errors
[ https://issues.apache.org/jira/browse/HIVE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-5169:
Attachment: orc2.sql

Scale=2 ORC loader. To generate TPC-h text tables, you can use https://github.com/t3rmin4t0r/tpch-gen. The text DDL is in the ddl/text.sql file.

Sorted Bucketed Partitioned Insert does not sort by dynamic partition column causing reducer OOMs/lease-expiry errors
Key: HIVE-5169
URL: https://issues.apache.org/jira/browse/HIVE-5169
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0
Environment: Ubuntu LXC, hadoop-2
Reporter: Gopal V
Attachments: orc2.sql

When a bulk-ETL operation is in progress, the query plan only sorts based on the SORTED BY key. This means that the FileSinkOperator in the reducer has to keep all the dynamic partition RecordWriters open till the end of the reducer lifetime.

A more MR-friendly approach would be to sort by partition_col,sorted_col so that the data entering the reducer requires keeping only one partition and bucket open at any given time.

As a test-case, a partitioned insert for the TPC-h benchmark's lineitem table will suffice:
{code}
create table lineitem (L_ORDERKEY INT, ...
partitioned by (L_SHIPDATE STRING)
clustered by (l_orderkey) sorted by (l_orderkey) into 4 buckets
stored as ORC;

explain from (select L_ORDERKEY , ...) tbl
insert overwrite table lineitem partition (L_SHIPDATE)
select * ;
{code}
The generated plan very clearly has
{code}
Reduce Output Operator
  key expressions:
    expr: _col0
    type: int
  sort order: +
  Map-reduce partition columns:
    expr: _col0
    type: int
  tag: -1
{code}
where _col0 is L_ORDERKEY.

In the FileSinkOperator on the reducer side, this results in a larger than usual number of open files. This causes memory pressure due to the compression buffers used by ORC/RCFile and really slows down the reducers.

A side-effect of this is that I had to pump 350Gb of TPC-h data through 4 reducers, which on occasion took 1 hour to get from opening a file in the FS to writing the first ORC stripe. This caused HDFS lease expiry and the task dying from that error.

All of these can be avoided by adding the partition column to the sort keys as well as the partition keys, which keeps only one writer open in the FileSinkOperator.
[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support
[ https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754023#comment-13754023 ] Brock Noland commented on HIVE-5168: The design document needs to go here https://cwiki.apache.org/confluence/display/Hive/DesignDocs Extend Hive for spatial query support - Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang Labels: Hadoop-GIS, Spatial, I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5170) Sorted Bucketed Partitioned Insert hard-codes the reducer count == bucket count
Gopal V created HIVE-5170:

Summary: Sorted Bucketed Partitioned Insert hard-codes the reducer count == bucket count
Key: HIVE-5170
URL: https://issues.apache.org/jira/browse/HIVE-5170
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0
Environment: Ubuntu LXC
Reporter: Gopal V

When performing a hive sorted-partitioned insert, the insert optimizer hard-codes the number of output files to the actual bucket count of the table.

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L4852

We need at least that many reducers or, if limited, a switch to multi-spray (as implemented already), but more reducers are wasteful as long as the HiveKey only contains the partition columns.

At this point, we're still limited to reducers = n-buckets, which is a problem for partitioning requests which need to insert nearly a terabyte of data into a single-digit bucket count and four-digit partition count.

Since rows are routed by the hashCode of the HiveKey, we can ensure that works by modifying the HiveKey to handle n-buckets internally. Basically it should only generate

hashCode = (sort_cols.hashCode() % n)

routing only to n reducers over-all, regardless of how many we spin up.

So far so good with the hard-coded reducer count. But provided we fix the issues brought up by HIVE-5169, the insert becomes friendlier to a higher reducer count as well. At this juncture, we can modify the hashCode to be slightly more interesting:

hashCode = (part_cols.hashCode()*31 + (sort_cols.hashCode() % n))

This generates somewhere between n and partition_count * n unique hash-codes. Since the sort-order bucketing has to be maintained per-partition dir, distributing these equally across any number of reducers lets the insert scale out with the reducer count.

This permits a higher reducer count, and with it far faster inserts of ORC data into a partitioned/sorted table.
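The two hashCode formulas proposed in HIVE-5170 can be compared with a small sketch. It assumes Java-style hashing (String.hashCode for the partition value, the integer itself for an int sort column), ignores Java int overflow in the final sum, and uses Python's non-negative modulo; the function names and the sample partitions are hypothetical, and this is not the HiveKey implementation.

```python
def java_string_hash(s):
    """Java-style String.hashCode, kept as an unsigned 32-bit value."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_only_code(sort_key, n):
    # hashCode = sort_cols.hashCode() % n
    return sort_key % n


def partition_and_bucket_code(partition, sort_key, n):
    # hashCode = part_cols.hashCode() * 31 + (sort_cols.hashCode() % n)
    return java_string_hash(partition) * 31 + (sort_key % n)


n_buckets = 4
partitions = ["1998-01-01", "1998-01-02", "1998-01-03"]
order_keys = range(100)

codes_bucket_only = {bucket_only_code(k, n_buckets) for k in order_keys}
codes_with_partition = {
    partition_and_bucket_code(p, k, n_buckets)
    for p in partitions
    for k in order_keys
}

# Spread the richer codes over a reducer count larger than n_buckets.
reducers = 6
reducers_used = {code % reducers for code in codes_with_partition}

print(len(codes_bucket_only), len(codes_with_partition), len(reducers_used))
```

With only the bucket in the key there are never more than n distinct codes, so extra reducers would sit idle; including the partition columns yields up to partition_count * n codes, which is what allows the work to spread over more reducers.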
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754031#comment-13754031 ] Mohammad Kamrul Islam commented on HIVE-1511: - [~romixlev] Thanks a lot for quick fix. Now working on the next failed one. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: generated_plan.xml, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754038#comment-13754038 ] Jason Dere commented on HIVE-4844: -- Xuefu, you're going to hate me for this one, but upon review of the code with hbutani, I am planning to remove the ParameterizedPrimitiveTypeInfo/ParameterizedPrimitiveObjectInspector interfaces and just add those methods to the PrimitiveTypeInfo/PrimitiveObjectInspector interfaces. I hope this doesn't cause too many rebase issues with your decimal work. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.10.patch, HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754035#comment-13754035 ] Brock Noland commented on HIVE-1511: Great to hear guys! When you are at a point where it makes sense it'd be interesting to see another run of the precommit tests. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: generated_plan.xml, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support
[ https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754055#comment-13754055 ] Fusheng Wang commented on HIVE-5168: The DesignDocs wiki doesn't allow uploads from non-admin users. Should I update it here? Extend Hive for spatial query support - Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang Labels: Hadoop-GIS, Spatial, I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. 
Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support
[ https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754060#comment-13754060 ] Brock Noland commented on HIVE-5168: Yeah that is unfortunate. You can either upload it here or [~ashutoshc] can give you edit privs. Extend Hive for spatial query support - Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang Labels: Hadoop-GIS, Spatial, I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754073#comment-13754073 ] Gunther Hagleitner commented on HIVE-5091: -- Committed to trunk. Thanks Owen! ORC files should have an option to pad stripes to the HDFS block boundaries --- Key: HIVE-5091 URL: https://issues.apache.org/jira/browse/HIVE-5091 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, HIVE-5091.D12249.3.patch With ORC stripes being large, if a stripe straddles an HDFS block, the locality of read is suboptimal. It would be good to add padding to ensure that stripes don't straddle HDFS blocks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
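The padding idea in HIVE-5091 comes down to a little arithmetic, sketched here under stated assumptions: the writer knows the current file offset, the size of the next stripe and the HDFS block size, and pads only when the stripe would otherwise cross a block boundary. The function padding_before_stripe is hypothetical and does not reflect how the committed patch decides when to pad.

```python
MB = 1024 * 1024


def padding_before_stripe(offset, stripe_size, block_size):
    """Bytes of padding to write so the next stripe does not straddle a block.

    Returns 0 when the stripe already fits in the current block, or when the
    stripe is larger than a whole block (padding cannot help in that case).
    """
    remaining_in_block = block_size - (offset % block_size)
    if stripe_size <= remaining_in_block or stripe_size > block_size:
        return 0
    return remaining_in_block


block = 256 * MB
# 56 MB left in the block, 64 MB stripe: pad to the next block boundary.
print(padding_before_stripe(200 * MB, 64 * MB, block) // MB)
# 156 MB left in the block, 64 MB stripe: fits, no padding needed.
print(padding_before_stripe(100 * MB, 64 * MB, block) // MB)
```

The trade-off is wasted space (up to one stripe's worth per block) in exchange for each stripe being readable from a single block.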
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754076#comment-13754076 ] Yin Huai commented on HIVE-5149: Right, we should only use key as the partitioning column. Actually, the example I posted above is from test file reduce_deduplicate_extended.q. The plan of explain from (select key, value from src group by key, value) s select s.key group by s.key in hive trunk is wrong. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5091: - Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) ORC files should have an option to pad stripes to the HDFS block boundaries --- Key: HIVE-5091 URL: https://issues.apache.org/jira/browse/HIVE-5091 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.12.0 Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, HIVE-5091.D12249.3.patch With ORC stripes being large, if a stripe straddles an HDFS block, the locality of read is suboptimal. It would be good to add padding to ensure that stripes don't straddle HDFS blocks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5171) metastore server can cache pruning results across queries
[ https://issues.apache.org/jira/browse/HIVE-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5171: --- Assignee: (was: Sergey Shelukhin) metastore server can cache pruning results across queries -- Key: HIVE-5171 URL: https://issues.apache.org/jira/browse/HIVE-5171 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Partition pruning results are cached during a query (SemanticAnalyzer and ParseContext are the scope). We could also cache them between queries in MetaStore, which would be especially useful if metastore server is remote and thus long-lived/shared between clients. It may be more complex than it seems due to OOM potential. Also the key would need to be changed since the same expression string that is currently used may mean different things for different queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server
agate created HIVE-5172: --- Summary: TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server Key: HIVE-5172 URL: https://issues.apache.org/jira/browse/HIVE-5172 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0 Reporter: agate We frequently run into a problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors on the client and a NullPointerException in TUGITransport on the server. hive client logs: = org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ... 31 more hive metastore server logs: === 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message. 
java.lang.NullPointerException at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) After adding some extra debug log messages in TUGIBasedProcessor, I noticed that the TUGIContainingTransport is null, which results in the NullPointerException on the server. Drilling further into TUGIContainingTransport, I noticed that getTransport() returns null, which causes the above error. Correlating further with the GC logs, I observed that the error always hits when the CMS GC has just kicked in (but it does not happen after every GC). I put some debugging code in TUGIContainingTransport.getTransport() and tracked it down to @Override public TUGIContainingTransport getTransport(TTransport trans) { // UGI information is not
[jira] [Commented] (HIVE-5102) ORC getSplits should create splits based the stripes
[ https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754092#comment-13754092 ] Hive QA commented on HIVE-5102: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12600616/HIVE-5102.D12579.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2907 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/560/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/560/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ORC getSplits should create splits based the stripes - Key: HIVE-5102 URL: https://issues.apache.org/jira/browse/HIVE-5102 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5102.D12579.1.patch, HIVE-5102.D12579.2.patch Currently ORC inherits getSplits from FileFormat, which basically makes a split per an HDFS block. This can create too little parallelism and would be better done by having getSplits look at the file footer and create splits based on the stripes.
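The difference between block-based and stripe-based splitting can be illustrated with a small sketch. The Stripe and Split types below are hypothetical stand-ins, not the ORC reader's classes, and the one-split-per-stripe policy is an assumption; the actual patch may group or subdivide stripes differently.

```java
import java.util.ArrayList;
import java.util.List;

final class StripeSplitSketch {
    /** Hypothetical stripe descriptor, as it might be read from a file footer. */
    static final class Stripe {
        final long offset;
        final long length;
        Stripe(long offset, long length) { this.offset = offset; this.length = length; }
    }

    /** Hypothetical split descriptor: the byte range one task would read. */
    static final class Split {
        final long start;
        final long length;
        Split(long start, long length) { this.start = start; this.length = length; }
    }

    /** Block-based splitting: one split per HDFS block, ignoring stripe layout. */
    static long blockSplitCount(long fileLength, long blockSize) {
        return (fileLength + blockSize - 1) / blockSize;
    }

    /** Stripe-based splitting: one split per stripe listed in the footer. */
    static List<Split> stripeSplits(List<Stripe> stripes) {
        List<Split> splits = new ArrayList<Split>();
        for (Stripe s : stripes) {
            splits.add(new Split(s.offset, s.length));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four 64-byte stripes in a single 256-byte block.
        List<Stripe> stripes = new ArrayList<Stripe>();
        for (int i = 0; i < 4; i++) {
            stripes.add(new Stripe(i * 64L, 64L));
        }
        System.out.println("block-based splits:  " + blockSplitCount(256L, 256L));
        System.out.println("stripe-based splits: " + stripeSplits(stripes).size());
    }
}
```

In this toy layout the block-based approach yields a single split while the stripe-based approach yields four, which is the parallelism gap the issue describes.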
[jira] [Updated] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server
[ https://issues.apache.org/jira/browse/HIVE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] agate updated HIVE-5172: Description: We frequently run into a problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors on the client and a NullPointerException in TUGITransport on the server. hive client logs: = org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ... 31 more hive metastore server logs: === 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message. 
java.lang.NullPointerException at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) After adding some extra debug log messages in TUGIBasedProcessor, I noticed that the TUGIContainingTransport is null, which results in the NullPointerException on the server. Drilling further into TUGIContainingTransport, I noticed that getTransport() returns null, which causes the above error. Correlating further with the GC logs, I observed that the error always hits when the CMS GC has just kicked in (but it does not happen after every GC). I put some debugging code in TUGIContainingTransport.getTransport() and tracked it down to @Override public TUGIContainingTransport getTransport(TTransport trans) { // UGI information is not available at connection setup time, it will be set later // via set_ugi() rpc. transMap.putIfAbsent(trans, new TUGIContainingTransport(trans)); //return transMap.get(trans); //-change TUGIContainingTransport retTrans = transMap.get(trans); if ( retTrans == null )
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Thanks Ashutosh.

Filed https://issues.apache.org/jira/browse/HIVE-5172

On Thu, Aug 29, 2013 at 11:53 AM, Ashutosh Chauhan hashut...@apache.org wrote:

Thanks Agatea for digging in. Seems like you have hit a bug. Would you mind opening a jira and adding your findings to it?

Thanks,
Ashutosh

On Thu, Aug 29, 2013 at 11:22 AM, agateaaa agate...@gmail.com wrote:

Sorry, hit send too soon ...

Hi All:

I put some debugging code in TUGIContainingTransport.getTransport() and tracked it down to

    @Override
    public TUGIContainingTransport getTransport(TTransport trans) {
      // UGI information is not available at connection setup time, it will be set later
      // via set_ugi() rpc.
      transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
      //return transMap.get(trans); //-change
      TUGIContainingTransport retTrans = transMap.get(trans);
      if ( retTrans == null ) {
        LOGGER.error("cannot find transport that was in map !!");
      } else {
        LOGGER.debug("cannot find transport that was in map !!");
        return retTrans;
      }
    }

When we run this in our test environment, we see that we run into the problem just after GC runs, and the "cannot find transport that was in map !!" message gets logged. Could the GC be collecting entries from transMap just before we get them?

I tried a minor change which seems to work:

    public TUGIContainingTransport getTransport(TTransport trans) {
      TUGIContainingTransport retTrans = transMap.get(trans);
      if ( retTrans == null ) {
        // UGI information is not available at connection setup time, it will be set later
        // via set_ugi() rpc.
        transMap.putIfAbsent(trans, retTrans);
      }
      return retTrans;
    }

My questions for Hive and Thrift experts:

1.) Do we need to use a ConcurrentMap?

    ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new MapMaker().weakKeys().weakValues().makeMap();

It does use == to compare keys (which might be the problem). Also, in this case we can't rely on the trans always being there in the transMap, even after a put, so in that case the change above probably makes sense.

2.) Is it a better idea to use WeakHashMap with WeakReference instead? (I was looking at org.apache.thrift.transport.TSaslServerTransport, esp. the change made by THRIFT-1468.) E.g.

    private static Map<TTransport, WeakReference<TUGIContainingTransport>> transMap3 = Collections.synchronizedMap(new WeakHashMap<TTransport, WeakReference<TUGIContainingTransport>>());

getTransport() would be something like

    public TUGIContainingTransport getTransport(TTransport trans) {
      WeakReference<TUGIContainingTransport> ret = transMap.get(trans);
      if (ret == null || ret.get() == null) {
        ret = new WeakReference<TUGIContainingTransport>(new TUGIContainingTransport(trans));
        transMap3.put(trans, ret); // No need for putIfAbsent().
        // Concurrent calls to getTransport() will pass in different TTransports.
      }
      return ret.get();
    }

I did try 1.) above in our test environment and it does seem to resolve the problem, though I am not sure if I am introducing any other problem.

Can someone help?

Thanks
Agatea

On Thu, Aug 29, 2013 at 10:57 AM, agateaaa agate...@gmail.com wrote:

Hi All:

Put some debugging code in TUGIContainingTransport.getTransport() and I tracked it down to

    @Override
    public TUGIContainingTransport getTransport(TTransport trans) {
      // UGI information is not available at connection setup time, it will be set later
      // via set_ugi() rpc.
      transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
      //return transMap.get(trans); -change
      TUGIContainingTransport retTrans = transMap.get(trans);
      if ( retTrans == null ) { }

On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote:

Thanks Nitin.

There aren't too many connections in close_wait state, only one or two when we run into this. Most likely it's because of a dropped connection. I could not find any read or write timeouts we can set for the thrift server which will tell thrift to hold on to the client connection. See https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't seem to have been implemented yet. We do have a client connection timeout set but cannot find an equivalent setting for the server.

We have a suspicion that this happens when we run two client processes which modify two distinct partitions of the same hive table. We put in a workaround so that the two hive client processes never run together, and so far things look ok, but we will keep monitoring.

Could it be because the hive metastore server is not thread safe? Would running two alter table statements on two distinct partitions of the same table using two client connections cause problems like these, where the hive metastore server closes or drops a wrong client connection and leaves the other hanging?

Agateaaa

On Tue, Jul 30, 2013
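The failure mode discussed in this thread (a weakly referenced map value disappearing between the put and the following get) can be avoided by never re-reading the value from the map after inserting it. The sketch below shows that pattern using only JDK classes. Transport and UgiTransport are hypothetical stand-ins for TTransport and TUGIContainingTransport, and this is not the fix that was committed to Hive; it only illustrates holding a strong local reference until the method returns.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

final class TransportCacheSketch {
    /** Stand-in for Thrift's TTransport (hypothetical, for illustration only). */
    static class Transport {}

    /** Stand-in for TUGIContainingTransport: wraps an underlying transport. */
    static final class UgiTransport extends Transport {
        final Transport wrapped;
        UgiTransport(Transport wrapped) { this.wrapped = wrapped; }
    }

    // Weak keys via WeakHashMap, weak values via WeakReference, so neither the
    // key nor the wrapper (which points back at the key) is pinned by the map.
    private static final Map<Transport, WeakReference<UgiTransport>> CACHE =
        Collections.synchronizedMap(
            new WeakHashMap<Transport, WeakReference<UgiTransport>>());

    static UgiTransport getTransport(Transport trans) {
        // Lock on the map itself so the get-then-put sequence is atomic.
        synchronized (CACHE) {
            WeakReference<UgiTransport> ref = CACHE.get(trans);
            UgiTransport cached = (ref == null) ? null : ref.get();
            if (cached == null) {
                cached = new UgiTransport(trans);
                CACHE.put(trans, new WeakReference<UgiTransport>(cached));
            }
            // 'cached' is a strong local reference, so the GC cannot clear it
            // before we return; this method never returns null.
            return cached;
        }
    }

    public static void main(String[] args) {
        Transport raw = new Transport();
        UgiTransport first = getTransport(raw);
        UgiTransport second = getTransport(raw);
        System.out.println(first != null && first == second); // prints "true"
    }
}
```

Note that WeakHashMap compares keys with equals() rather than ==, which matches identity here only because the stand-in Transport does not override equals(). The caller must also keep the returned wrapper strongly reachable for the life of the connection, otherwise a later call could create a fresh wrapper.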
[jira] [Updated] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server
[ https://issues.apache.org/jira/browse/HIVE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] agate updated HIVE-5172: Description: We frequently run into a problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors on the client and a NullPointerException in TUGIBasedProcessor on the server. hive client logs: = org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ... 31 more hive metastore server logs: === 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message. 
java.lang.NullPointerException at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) After adding some extra debug log messages in TUGIBasedProcessor, I noticed that the TUGIContainingTransport is null, which results in the NullPointerException on the server. Drilling further into TUGIContainingTransport, I noticed that getTransport() returns null, which causes the above error. Correlating further with the GC logs, I observed that the error always hits when the CMS GC has just kicked in (but it does not happen after every GC). I put some debugging code in TUGIContainingTransport.getTransport() and tracked it down to @Override public TUGIContainingTransport getTransport(TTransport trans) { // UGI information is not available at connection setup time, it will be set later // via set_ugi() rpc. transMap.putIfAbsent(trans, new TUGIContainingTransport(trans)); //return transMap.get(trans); //-change TUGIContainingTransport retTrans = transMap.get(trans); if ( retTrans ==
[jira] [Updated] (HIVE-5014) [HCatalog] Fix HCatalog build issue on Windows
[ https://issues.apache.org/jira/browse/HIVE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5014: --- Status: Patch Available (was: Open) (Setting to patch-available to let jenkins pick it up) [HCatalog] Fix HCatalog build issue on Windows -- Key: HIVE-5014 URL: https://issues.apache.org/jira/browse/HIVE-5014 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5014-1.patch
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754093#comment-13754093 ] Phabricator commented on HIVE-5029: --- sershe has commented on the revision HIVE-5029 [jira] direct SQL perf optimization cannot be tested well. INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1684 the rollback is performed in finally. Here we only roll back to re-open it metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1783 the rollback is performed in finally. Here we only roll back to re-open it REVISION DETAIL https://reviews.facebook.net/D12483 To: JIRA, ashutoshc, sershe direct SQL perf optimization cannot be tested well -- Key: HIVE-5029 URL: https://issues.apache.org/jira/browse/HIVE-5029 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.D12483.2.patch, HIVE-5029.patch, HIVE-5029.patch HIVE-4051 introduced a perf optimization that involves getting partitions directly via SQL in metastore. Given that SQL queries might not work on all datastores (and will not work on non-SQL ones), a JDO fallback is in place. Given that the perf improvement is very large for short queries, it's on by default. However, there's a problem with tests with regard to that. If the SQL code is broken, tests may fall back to JDO and pass. If the JDO code is broken, SQL might allow tests to pass. We are going to disable SQL by default before the testing problem is resolved. There are several possible solutions: 1) Separate build for this setting. Seems like overkill... 2) Enable by default; disable by default in tests, create a clone of TestCliDriver with a subset of queries that will exercise the SQL path. 3) Have some sort of test hook inside metastore that will run both ORM and SQL and compare. 3') Or make a subclass of ObjectStore that will do that. 
ObjectStore is already pluggable. 4) Write unit tests for one of the modes (JDO, as non-default?) and declare that they are sufficient; disable fallback in tests. 3' seems like the easiest. For now we will disable SQL by default.
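Options 3 and 3' above (run both the SQL and the ORM path and compare the results) can be sketched as a small wrapper. The PartitionLookup interface and VerifyingLookup class are hypothetical and much simpler than ObjectStore; the sketch only shows the compare-and-fail idea, so that a broken path can no longer hide behind the other one in tests.

```java
import java.util.Arrays;
import java.util.List;

final class DualPathCheckSketch {
    /** Hypothetical abstraction over one way of fetching partition names. */
    interface PartitionLookup {
        List<String> partitionNames(String filter);
    }

    /** Runs both paths for every call and fails loudly if they disagree. */
    static final class VerifyingLookup implements PartitionLookup {
        private final PartitionLookup sqlPath;
        private final PartitionLookup ormPath;

        VerifyingLookup(PartitionLookup sqlPath, PartitionLookup ormPath) {
            this.sqlPath = sqlPath;
            this.ormPath = ormPath;
        }

        @Override
        public List<String> partitionNames(String filter) {
            // No fallback here: an exception from either path propagates.
            List<String> viaSql = sqlPath.partitionNames(filter);
            List<String> viaOrm = ormPath.partitionNames(filter);
            // List.equals is order-sensitive; both paths are assumed to
            // return partitions in the same order.
            if (!viaSql.equals(viaOrm)) {
                throw new IllegalStateException("SQL and ORM results differ for filter '"
                    + filter + "': " + viaSql + " vs " + viaOrm);
            }
            return viaSql;
        }
    }

    public static void main(String[] args) {
        PartitionLookup sql = filter -> Arrays.asList("part_col=2");
        PartitionLookup orm = filter -> Arrays.asList("part_col=2");
        System.out.println(new VerifyingLookup(sql, orm).partitionNames("part_col=2"));
    }
}
```

Such a wrapper doubles the metastore work per call, so it would presumably be enabled only in tests.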
[jira] [Created] (HIVE-5173) Wincompat : Add .cmd/text/crlf to .gitattributes
Sushanth Sowmyan created HIVE-5173: -- Summary: Wincompat : Add .cmd/text/crlf to .gitattributes Key: HIVE-5173 URL: https://issues.apache.org/jira/browse/HIVE-5173 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5173.patch Add .cmd entry to .gitattributes
[jira] [Updated] (HIVE-5173) Wincompat : Add .cmd/text/crlf to .gitattributes
[ https://issues.apache.org/jira/browse/HIVE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5173: --- Attachment: HIVE-5173.patch Wincompat : Add .cmd/text/crlf to .gitattributes Key: HIVE-5173 URL: https://issues.apache.org/jira/browse/HIVE-5173 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5173.patch Add .cmd entry to .gitattributes
[jira] [Updated] (HIVE-5173) Wincompat : Add .cmd/text/crlf to .gitattributes
[ https://issues.apache.org/jira/browse/HIVE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5173: --- Status: Patch Available (was: Open) Wincompat : Add .cmd/text/crlf to .gitattributes Key: HIVE-5173 URL: https://issues.apache.org/jira/browse/HIVE-5173 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5173.patch Add .cmd entry to .gitattributes
[jira] [Updated] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Wong updated HIVE-3104: -- Description: Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries *not* using LATERAL VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW. Here are some examples. In the below examples, I make use of the fact that a query with no partition filtering when run under hive.mapred.mode=strict fails. --Table creation and population DROP TABLE IF EXISTS test; CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int); INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) FROM test; INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), count(*) FROM test; -- Query 1 -- This succeeds (using LATERAL VIEW with single insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2); -- Query 2 -- This succeeds (NOT using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test INSERT OVERWRITE DIRECTORY '/test/1' SELECT col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT col1 WHERE (part_col=2); -- Query 3 -- This fails (using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT exp_col1 WHERE (part_col=2); was: Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries *not* using LATERAL VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW. Here are some examples. 
In the below examples, I make use of the fact that a query with no partition filtering when run under hive.mapred.mode=strict fails. --Table creation and population DROP TABLE IF EXISTS test; CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int); INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) FROM test; INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), count(*) FROM test; -- Query 1 -- This succeeds (using LATERAL VIEW with single insert) set hive.mapred.mode=strict; FROM partition_test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2); -- Query 2 -- This succeeds (NOT using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM partition_test INSERT OVERWRITE DIRECTORY '/test/1' SELECT col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT col1 WHERE (part_col=2); -- Query 3 -- This fails (using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT exp_col1 WHERE (part_col=2); Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW --- Key: HIVE-3104 URL: https://issues.apache.org/jira/browse/HIVE-3104 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0 Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0 Reporter: Mark Grover Assignee: Xuefu Zhang Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries *not* using LATERAL VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW. Here are some examples. In the below examples, I make use of the fact that a query with no partition filtering when run under hive.mapred.mode=strict fails. 
--Table creation and population DROP TABLE IF EXISTS test; CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int); INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) FROM test; INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), count(*) FROM test; -- Query 1 -- This succeeds (using LATERAL VIEW with single insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2); -- Query 2 -- This succeeds (NOT using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test INSERT OVERWRITE DIRECTORY '/test/1' SELECT col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT col1 WHERE (part_col=2);
[jira] [Updated] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Wong updated HIVE-3104: -- Description: Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries *not* using LATERAL VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW: It errors out right away with 'FAILED: SemanticException [Error 10041]: No partition predicate found for Alias test Table test'. Here are some examples. In the below examples, I make use of the fact that a query with no partition filtering when run under hive.mapred.mode=strict fails. --Table creation and population DROP TABLE IF EXISTS test; CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int); INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) FROM test; INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), count(*) FROM test; -- Query 1 -- This succeeds (using LATERAL VIEW with single insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2); -- Query 2 -- This succeeds (NOT using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test INSERT OVERWRITE DIRECTORY '/test/1' SELECT col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT col1 WHERE (part_col=2); -- Query 3 -- This fails (using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT exp_col1 WHERE (part_col=2); was: Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries *not* using LATERAL VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW. 
Here are some examples. In the below examples, I make use of the fact that a query with no partition filtering when run under hive.mapred.mode=strict fails. --Table creation and population DROP TABLE IF EXISTS test; CREATE TABLE test (col1 arrayint, col2 int) PARTITIONED BY (part_col int); INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) FROM test; INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), count(*) FROM test; -- Query 1 -- This succeeds (using LATERAL VIEW with single insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2); -- Query 2 -- This succeeds (NOT using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test INSERT OVERWRITE DIRECTORY '/test/1' SELECT col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT col1 WHERE (part_col=2); -- Query 3 -- This fails (using LATERAL VIEW with multi-insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2) INSERT OVERWRITE DIRECTORY '/test/2' SELECT exp_col1 WHERE (part_col=2); Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW --- Key: HIVE-3104 URL: https://issues.apache.org/jira/browse/HIVE-3104 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0 Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0 Reporter: Mark Grover Assignee: Xuefu Zhang Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries *not* using LATERAL VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW: It errors out right away with 'FAILED: SemanticException [Error 10041]: No partition predicate found for Alias test Table test'. Here are some examples. 
In the below examples, I make use of the fact that a query with no partition filtering when run under hive.mapred.mode=strict fails. --Table creation and population DROP TABLE IF EXISTS test; CREATE TABLE test (col1 arrayint, col2 int) PARTITIONED BY (part_col int); INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) FROM test; INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), count(*) FROM test; -- Query 1 -- This succeeds (using LATERAL VIEW with single insert) set hive.mapred.mode=strict; FROM test LATERAL VIEW explode(col1) tmp AS exp_col1 INSERT OVERWRITE DIRECTORY '/test/1' SELECT exp_col1 WHERE (part_col=2); -- Query 2 -- This succeeds
[jira] [Created] (HIVE-5174) Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability
Sushanth Sowmyan created HIVE-5174: -- Summary: Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability Key: HIVE-5174 URL: https://issues.apache.org/jira/browse/HIVE-5174 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Adding junit.file.schema and hadoop.testcp configurability to build, adding set-hadoop-test-classpath target. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5174) Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability
[ https://issues.apache.org/jira/browse/HIVE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5174: --- Status: Patch Available (was: Open) Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability Key: HIVE-5174 URL: https://issues.apache.org/jira/browse/HIVE-5174 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5174.patch Adding junit.file.schema and hadoop.testcp configurability to build, adding set-hadoop-test-classpath target. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5174) Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability
[ https://issues.apache.org/jira/browse/HIVE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5174: --- Attachment: HIVE-5174.patch Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability Key: HIVE-5174 URL: https://issues.apache.org/jira/browse/HIVE-5174 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5174.patch Adding junit.file.schema and hadoop.testcp configurability to build, adding set-hadoop-test-classpath target. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5175) Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty
Sushanth Sowmyan created HIVE-5175: -- Summary: Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty Key: HIVE-5175 URL: https://issues.apache.org/jira/browse/HIVE-5175 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Adding HADOOP_TIME_ZONE and env property user.timezone as US/Pacific, needed for certain tests in windows to pass. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5176) Wcompat : Changes for allowing various path compatibilities with Windows
Sushanth Sowmyan created HIVE-5176: -- Summary: Wcompat : Changes for allowing various path compatibilities with Windows Key: HIVE-5176 URL: https://issues.apache.org/jira/browse/HIVE-5176 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We need to make certain changes across the board to allow us to read/parse windows paths. Some are escaping changes, some are being strict about how we read paths (through URL.encode/decode, etc) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5178) WCompat : QTestUtil changes
Sushanth Sowmyan created HIVE-5178: -- Summary: WCompat : QTestUtil changes Key: HIVE-5178 URL: https://issues.apache.org/jira/browse/HIVE-5178 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Miscellaneous QTestUtil changes are needed to make tests work under windows: a) Aux jars needed to be set up for minimr b) Ignore empty test lines if windows -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5177) WCompat : Retrying handler related changes
Sushanth Sowmyan created HIVE-5177: -- Summary: WCompat : Retrying handler related changes Key: HIVE-5177 URL: https://issues.apache.org/jira/browse/HIVE-5177 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5179) WCompat : change script tests from bash to sh
Sushanth Sowmyan created HIVE-5179: -- Summary: WCompat : change script tests from bash to sh Key: HIVE-5179 URL: https://issues.apache.org/jira/browse/HIVE-5179 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5180) Packaging : Add Windows installation and execution .cmd and .ps1 scripts
Sushanth Sowmyan created HIVE-5180: -- Summary: Packaging : Add Windows installation and execution .cmd and .ps1 scripts Key: HIVE-5180 URL: https://issues.apache.org/jira/browse/HIVE-5180 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-5178) WCompat : QTestUtil changes
[ https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan reassigned HIVE-5178: -- Assignee: Sushanth Sowmyan WCompat : QTestUtil changes --- Key: HIVE-5178 URL: https://issues.apache.org/jira/browse/HIVE-5178 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Miscellaneous QTestUtil changes are needed to make tests work under windows: a) Aux jars needed to be set up for minimr b) Ignore empty test lines if windows -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4944) Hive Windows Scripts and Compatibility changes
[ https://issues.apache.org/jira/browse/HIVE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754123#comment-13754123 ] Sushanth Sowmyan commented on HIVE-4944: (Made subtask patches from the monolithic patches attached to this jira for easier reviewing, will upload each patch individually) Hive Windows Scripts and Compatibility changes -- Key: HIVE-4944 URL: https://issues.apache.org/jira/browse/HIVE-4944 Project: Hive Issue Type: Bug Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: compat.patch, packaging.patch Porting patches that enable hive packaging and running under windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4944) Hive Windows Scripts and Compatibility changes
[ https://issues.apache.org/jira/browse/HIVE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754127#comment-13754127 ] Sushanth Sowmyan commented on HIVE-4944: Also, these patches aren't originally by me, I'll add in the names of each of the contributors in the individual patches, they're from contributors in Microsoft who developed against hive 0.9, and asked for my help in reviewing and forward-porting to trunk. Hive Windows Scripts and Compatibility changes -- Key: HIVE-4944 URL: https://issues.apache.org/jira/browse/HIVE-4944 Project: Hive Issue Type: Bug Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: compat.patch, packaging.patch Porting patches that enable hive packaging and running under windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4944) Hive Windows Scripts and Compatibility changes
[ https://issues.apache.org/jira/browse/HIVE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754130#comment-13754130 ] Sushanth Sowmyan commented on HIVE-4944: Edit, above should read : Also, these patches aren't originally by me, I'll add in the names of each of the contributors in the individual patches, they're from contributors in Microsoft who developed against hive 0.9, and asked for my help in reviewing and forward-porting to trunk and contributing it to apache hive. Hive Windows Scripts and Compatibility changes -- Key: HIVE-4944 URL: https://issues.apache.org/jira/browse/HIVE-4944 Project: Hive Issue Type: Bug Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: compat.patch, packaging.patch Porting patches that enable hive packaging and running under windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5175) Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty
[ https://issues.apache.org/jira/browse/HIVE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5175: --- Attachment: HIVE-5175.patch Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty -- Key: HIVE-5175 URL: https://issues.apache.org/jira/browse/HIVE-5175 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5175.patch Adding HADOOP_TIME_ZONE and env property user.timezone as US/Pacific, needed for certain tests in windows to pass. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5175) Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty
[ https://issues.apache.org/jira/browse/HIVE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5175: --- Status: Patch Available (was: Open) Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty -- Key: HIVE-5175 URL: https://issues.apache.org/jira/browse/HIVE-5175 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5175.patch Adding HADOOP_TIME_ZONE and env property user.timezone as US/Pacific, needed for certain tests in windows to pass. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5178) WCompat : QTestUtil changes
[ https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5178: --- Attachment: HIVE-5178.patch WCompat : QTestUtil changes --- Key: HIVE-5178 URL: https://issues.apache.org/jira/browse/HIVE-5178 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5178.patch Miscellaneous QTestUtil changes are needed to make tests work under windows: a) Aux jars needed to be set up for minimr b) Ignore empty test lines if windows -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5178) WCompat : QTestUtil changes
[ https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5178: --- Status: Patch Available (was: Open) WCompat : QTestUtil changes --- Key: HIVE-5178 URL: https://issues.apache.org/jira/browse/HIVE-5178 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5178.patch Miscellaneous QTestUtil changes are needed to make tests work under windows: a) Aux jars needed to be set up for minimr b) Ignore empty test lines if windows -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5179) WCompat : change script tests from bash to sh
[ https://issues.apache.org/jira/browse/HIVE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5179: --- Status: Patch Available (was: Open) WCompat : change script tests from bash to sh - Key: HIVE-5179 URL: https://issues.apache.org/jira/browse/HIVE-5179 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5179.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4965) Add support so that PTFs can stream their output; Windowing PTF should do this
[ https://issues.apache.org/jira/browse/HIVE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4965: -- Attachment: HIVE-4965.D12615.1.patch hbutani requested code review of HIVE-4965 [jira] Add support so that PTFs can stream their output; Windowing PTF should do this. Reviewers: JIRA, ashutoshc fix lint issues There is no need to create an output PTF Partition for the last PTF in a chain. For the Windowing PTF this should give a perf. boost; we avoid creating temporary results for each UDAF; avoid populating an output Partition. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12615 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/30297/ To: JIRA, ashutoshc, hbutani Add support so that PTFs can stream their output; Windowing PTF should do this -- Key: HIVE-4965 URL: https://issues.apache.org/jira/browse/HIVE-4965 Project: Hive Issue Type: Bug Reporter: Harish Butani Attachments: HIVE-4965.D12033.1.patch, HIVE-4965.D12615.1.patch There is no need to create an output PTF Partition for the last PTF in a chain. For the Windowing PTF this should give a perf. boost; we avoid creating temporary results for each UDAF; avoid populating an output Partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5179) WCompat : change script tests from bash to sh
[ https://issues.apache.org/jira/browse/HIVE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5179: --- Attachment: HIVE-5179.patch WCompat : change script tests from bash to sh - Key: HIVE-5179 URL: https://issues.apache.org/jira/browse/HIVE-5179 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5179.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754165#comment-13754165 ] Ashutosh Chauhan commented on HIVE-5149: Ah.. right! I missed that. I will take a look at the patch! ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez
[ https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5095: - Attachment: HIVE-5095.1.patch Hive needs new operator walker for parallelization/optimization for tez --- Key: HIVE-5095 URL: https://issues.apache.org/jira/browse/HIVE-5095 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt For tez to compute the number of reducers, we should be walking the operator tree in a topological fashion so that the reducers down the tree get the estimate from all parents. However, the current walkers in hive only walk the operator tree in a depth-first fashion. We need to add a new walker for the topological walk. Also, since information about the parent operators needs to be propagated on a per parent basis, we need to retain some context across operators to be passed to the child which the walker will co-ordinate. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez
[ https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5095: - Description: For tez to compute the number of reducers, we should be walking the operator tree in a topological fashion so that the reducers down the tree get the estimate from all parents. However, the current walkers in hive only walk the operator tree in a depth-first fashion. We need to add a new walker for the topological walk. Also, since information about the parent operators needs to be propagated on a per parent basis, we need to retain some context across operators to be passed to the child which the walker will co-ordinate. NO PRECOMMIT TESTS (this is wip for the tez branch) was:For tez to compute the number of reducers, we should be walking the operator tree in a topological fashion so that the reducers down the tree get the estimate from all parents. However, the current walkers in hive only walk the operator tree in a depth-first fashion. We need to add a new walker for the topological walk. Also, since information about the parent operators needs to be propagated on a per parent basis, we need to retain some context across operators to be passed to the child which the walker will co-ordinate. Hive needs new operator walker for parallelization/optimization for tez --- Key: HIVE-5095 URL: https://issues.apache.org/jira/browse/HIVE-5095 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt For tez to compute the number of reducers, we should be walking the operator tree in a topological fashion so that the reducers down the tree get the estimate from all parents. However, the current walkers in hive only walk the operator tree in a depth-first fashion. We need to add a new walker for the topological walk. 
Also, since information about the parent operators needs to be propagated on a per parent basis, we need to retain some context across operators to be passed to the child which the walker will co-ordinate. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
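The topological walk described in HIVE-5095 can be illustrated with a minimal sketch. This is not the code from HIVE-5095.1.patch: the class name `TopologicalWalkSketch`, the string operator names, and the rule of summing parent estimates are all assumptions made for illustration. It shows only the general idea that a node is visited after all of its parents (Kahn's algorithm), so that a child can receive a contribution from every parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Hive's walker API.
final class TopologicalWalkSketch {

    // Returns the operators ordered so that each one appears after all of its parents.
    // `children` maps an operator name to the names of its child operators.
    static List<String> topologicalOrder(Map<String, List<String>> children) {
        // Number of parents of each operator that have not been visited yet.
        Map<String, Integer> pendingParents = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            pendingParents.putIfAbsent(e.getKey(), 0);
            for (String child : e.getValue()) {
                pendingParents.merge(child, 1, Integer::sum);
            }
        }
        // Operators without parents (e.g. table scans) are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pendingParents.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.remove();
            order.add(op);
            for (String child : children.getOrDefault(op, Collections.<String>emptyList())) {
                // A child becomes ready only once its last parent has been visited.
                if (pendingParents.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (order.size() != pendingParents.size()) {
            throw new IllegalStateException("operator graph contains a cycle");
        }
        return order;
    }

    // Pushes a per-parent value down the graph. Summing is an assumed combination
    // rule chosen for illustration, not the estimate Hive computes.
    static Map<String, Long> propagateEstimates(Map<String, List<String>> children,
                                                Map<String, Long> sourceRows) {
        Map<String, Long> estimate = new LinkedHashMap<>();
        for (String op : topologicalOrder(children)) {
            // All parents of `op` have already added their contribution here.
            estimate.merge(op, sourceRows.getOrDefault(op, 0L), Long::sum);
            long rows = estimate.get(op);
            for (String child : children.getOrDefault(op, Collections.<String>emptyList())) {
                estimate.merge(child, rows, Long::sum);
            }
        }
        return estimate;
    }
}
```

A depth-first walk could reach a join through one parent before the other parent has been visited, which is the situation the issue describes; the ready queue above avoids that by construction.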
[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support
[ https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754190#comment-13754190 ] Ashutosh Chauhan commented on HIVE-5168: Hi [~wangfsh] I have granted you privs. You should be able to upload the doc now. Extend Hive for spatial query support - Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang Labels: Hadoop-GIS, Spatial, I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. 
Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez
[ https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-5095. -- Resolution: Fixed Hive needs new operator walker for parallelization/optimization for tez --- Key: HIVE-5095 URL: https://issues.apache.org/jira/browse/HIVE-5095 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt For tez to compute the number of reducers, we should be walking the operator tree in a topological fashion so that the reducers down the tree get the estimate from all parents. However, the current walkers in hive only walk the operator tree in a depth-first fashion. We need to add a new walker for the topological walk. Also, since information about the parent operators needs to be propagated on a per parent basis, we need to retain some context across operators to be passed to the child which the walker will co-ordinate. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez
[ https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754198#comment-13754198 ] Gunther Hagleitner commented on HIVE-5095: -- Committed .1 to branch. Thanks! Hive needs new operator walker for parallelization/optimization for tez --- Key: HIVE-5095 URL: https://issues.apache.org/jira/browse/HIVE-5095 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt For tez to compute the number of reducers, we should be walking the operator tree in a topological fashion so that the reducers down the tree get the estimate from all parents. However, the current walkers in hive only walk the operator tree in a depth-first fashion. We need to add a new walker for the topological walk. Also, since information about the parent operators needs to be propagated on a per parent basis, we need to retain some context across operators to be passed to the child which the walker will co-ordinate. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-5052) Set parallelism when generating the tez tasks
[ https://issues.apache.org/jira/browse/HIVE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-5052. -- Resolution: Duplicate Set parallelism when generating the tez tasks - Key: HIVE-5052 URL: https://issues.apache.org/jira/browse/HIVE-5052 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5052.1.patch.txt, HIVE-5052.2.patch.txt In GenTezTask any intermediate task has parallelism set to 1. This needs to be fixed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754201#comment-13754201 ]

Ashutosh Chauhan commented on HIVE-4914:

If there are UDFs in the expression, we should still do the expression evaluation on the client, because:
* Otherwise the user jar is required on the server.
* Running user code in the metastore server would be a security concern.

filtering via partition name should be done inside metastore server (implementation)
Key: HIVE-4914
URL: https://issues.apache.org/jira/browse/HIVE-4914
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-4914.01.patch, HIVE-4914.D12561.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, HIVE-4914.patch, HIVE-4914.patch

Currently, if the filter pushdown is impossible (which is most cases), the client gets all partition names from metastore, filters them, and asks for partitions by names for the filtered set. Metastore server code should do that instead; it should check if pushdown is possible and do it if so; otherwise it should do name-based filtering. Saves the roundtrip with all partition names from the server to client, and also removes the need to have pushdown viability checking on both sides.

NO PRECOMMIT TESTS
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754204#comment-13754204 ] Hive QA commented on HIVE-4844: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12600624/HIVE-4844.10.patch {color:green}SUCCESS:{color} +1 2918 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/561/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/561/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.10.patch, HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc.
[jira] [Created] (HIVE-5181) RetryingRawStore should not retry on logical failures (e.g. from commit)
Sergey Shelukhin created HIVE-5181: -- Summary: RetryingRawStore should not retry on logical failures (e.g. from commit) Key: HIVE-5181 URL: https://issues.apache.org/jira/browse/HIVE-5181 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Priority: Minor RetryingRawStore retries calls. Some methods (e.g. drop_table_core in HiveMetaStore) explicitly call openTransaction and commitTransaction on RawStore. When the commit call fails due to some real issue, it is retried, and instead of the real cause of the failure one gets a bogus exception about the transaction open count. It doesn't make sense to retry logical errors, especially not from commitTransaction.
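The behavior requested in HIVE-5181 amounts to retrying only failures that might succeed on a second attempt and letting logical failures surface at once with their original cause. A minimal sketch of that idea in Python follows; it is not RetryingRawStore's actual code, and the exception classes, the `call_with_retry` helper, and the `max_attempts` parameter are hypothetical names chosen for illustration.

```python
class TransientStoreError(Exception):
    """Hypothetical: a failure that may succeed if retried (e.g. a dropped connection)."""


class LogicalStoreError(Exception):
    """Hypothetical: a failure that retrying cannot fix (e.g. a failed commit)."""


def call_with_retry(fn, max_attempts=3):
    """Call fn(), retrying only on TransientStoreError.

    Any other exception, including LogicalStoreError, propagates on the
    first failure so the caller sees the real cause.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientStoreError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
```

Because a failed commit is never re-invoked here, the retry cannot trip over transaction bookkeeping left behind by the first attempt, which is where the misleading open-count exception described above comes from.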