[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode

2013-08-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753361#comment-13753361
 ] 

Thejas M Nair commented on HIVE-4617:
-

+1. Will commit if tests pass.


 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run queries asynchronously. The current executeStatement 
 call blocks until the query run is complete.
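
As a rough client-side sketch of what non-blocking execution enables; the interface below is hypothetical, since the real method names and signatures are still being settled in this thread:

{code:java}
import java.util.Map;

// Hypothetical client-side view of the API under discussion.
interface AsyncCliClient {
  String executeStatementAsync(String statement, Map<String, String> confOverlay); // returns an operation handle
  String getOperationStatus(String operationHandle);                               // e.g. "RUNNING", "FINISHED", "ERROR"
}

class AsyncQuerySketch {
  // Instead of blocking inside executeStatement, the caller polls for completion.
  static void run(AsyncCliClient cli, Map<String, String> conf) throws InterruptedException {
    String op = cli.executeStatementAsync("select count(*) from src", conf); // returns immediately
    String state;
    do {
      Thread.sleep(500);                  // polling interval, caller's choice
      state = cli.getOperationStatus(op);
    } while ("RUNNING".equals(state));
  }
}
{code}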

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4601) WebHCat, Templeton need to support proxy users

2013-08-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753364#comment-13753364
 ] 

Thejas M Nair commented on HIVE-4601:
-

Verified the hive package target, and hcatalog tests pass (this is a 
webhcat-only change). I will commit this soon.


 WebHCat, Templeton need to support proxy users
 --

 Key: HIVE-4601
 URL: https://issues.apache.org/jira/browse/HIVE-4601
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Dilli Arumugam
Assignee: Eugene Koifman
  Labels: proxy, templeton
 Fix For: 0.12.0

 Attachments: HIVE-4601.2.patch, HIVE-4601.3.patch, HIVE-4601.4.patch, 
 HIVE-4601.5.patch, HIVE-4601.patch


 We have a use case where a Gateway would provide unified and controlled 
 access to a secure Hadoop cluster.
 The Gateway itself would authenticate to secure WebHDFS, Oozie and Templeton 
 with SPNego.
 The Gateway would authenticate the end user with HTTP Basic and would assert 
 the end-user identity via the doAs argument in the calls to downstream 
 WebHDFS, Oozie and Templeton.
 This works fine with WebHDFS and Oozie, but it does not work for Templeton 
 because Templeton does not support proxy users.
 Hence, we request that this improvement be added to Templeton.
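
A sketch of the call shape the Gateway would make once Templeton supports this; the host, port, endpoint, and the doAs parameter name are assumptions here, mirroring what WebHDFS and Oozie already accept:

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TempletonProxyCallSketch {
  public static void main(String[] args) throws IOException {
    // The Gateway authenticates itself (e.g. via SPNEGO, omitted here) and
    // asserts the end-user identity with a proxy-user query parameter.
    // Host, port, endpoint, and the "doAs" parameter name are assumptions.
    URL url = new URL("http://templeton-host:50111/templeton/v1/status"
        + "?user.name=gateway&doAs=endUser");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}
{code}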

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5128) Direct SQL for view is failing

2013-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753366#comment-13753366
 ] 

Hudson commented on HIVE-5128:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #387 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/387/])
HIVE-5128 : Direct SQL for view is failing (Sergey Shelukhin via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518258)
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java


 Direct SQL for view is failing 
 ---

 Key: HIVE-5128
 URL: https://issues.apache.org/jira/browse/HIVE-5128
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch


 I am not sure of this, but it happens when dropping views (it falls back to 
 JPA and works fine):
 {noformat}
 metastore.ObjectStore: Direct SQL failed, falling back to ORM
 MetaException(message:Unexpected null for one of the IDs, SD null, column 
 null, serde null)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
 ...
 {noformat}
 Should it be disabled for views, or can it be fixed?
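
The log above reflects a try-direct-SQL-then-fall-back-to-ORM pattern; here is a self-contained sketch of that control flow (names are illustrative, not the actual ObjectStore code):

{code:java}
import java.util.Collections;
import java.util.List;

class PartitionFetchFallbackSketch {
  // Mirrors the "Direct SQL failed, falling back to ORM" behavior in the log.
  List<String> getPartitionsByFilter(String filter) {
    try {
      return getPartitionsViaDirectSql(filter);   // fast hand-written SQL path
    } catch (Exception ex) {
      // e.g. MetaException: unexpected null IDs when the "table" is a view
      return getPartitionsViaOrm(filter);         // slower but reliable JDO path
    }
  }

  private List<String> getPartitionsViaDirectSql(String filter) {
    throw new RuntimeException("Unexpected null for one of the IDs");
  }

  private List<String> getPartitionsViaOrm(String filter) {
    return Collections.emptyList();
  }
}
{code}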

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753365#comment-13753365
 ] 

Hudson commented on HIVE-3562:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #387 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/387/])
HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518234)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 A query with a LIMIT clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 produces the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 but the LIMIT can be partially calculated in RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753379#comment-13753379
 ] 

Hive QA commented on HIVE-5091:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600486/HIVE-5091.D12249.3.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/555/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/555/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 ORC files should have an option to pad stripes to the HDFS block boundaries
 ---

 Key: HIVE-5091
 URL: https://issues.apache.org/jira/browse/HIVE-5091
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, 
 HIVE-5091.D12249.3.patch


 With ORC stripes being large, if a stripe straddles an HDFS block, the 
 locality of read is suboptimal. It would be good to add padding to ensure 
 that stripes don't straddle HDFS blocks.
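
A sketch of the padding arithmetic such an option implies (illustrative only, not the patch itself):

{code:java}
class StripePaddingSketch {
  // Bytes of padding needed so the next stripe does not straddle a block boundary.
  static long bytesToPad(long currentPos, long stripeSize, long blockSize) {
    long remainingInBlock = blockSize - (currentPos % blockSize);
    // If the stripe cannot fit in the rest of this block, pad to the boundary
    // so the whole stripe is served from a single block (better read locality).
    return (stripeSize > remainingInBlock) ? remainingInBlock : 0;
  }

  public static void main(String[] args) {
    // 200 MB written into 256 MB blocks leaves 56 MB; a 64 MB stripe won't fit.
    long pad = bytesToPad(200L << 20, 64L << 20, 256L << 20);
    System.out.println("pad bytes: " + pad);   // prints 58720256 (56 MB)
  }
}
{code}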

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode

2013-08-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753381#comment-13753381
 ] 

Carl Steinbach commented on HIVE-4617:
--

[~thejas] I found some minor issues and am adding comments to phabricator. 
Please do not commit this patch until Jaideep has had a chance to respond. 
Thanks.

 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run queries asynchronously. The current executeStatement 
 call blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4460:


Status: Open  (was: Patch Available)

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4460:


Attachment: HIVE-4460.Dtest.3.patch

HIVE-4460.Dtest.3.patch - copy of HIVE-4460.3.patch to get pre-commit tests 
running


 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, 
 HIVE-4460.Dtest.3.patch, HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4460:


Status: Patch Available  (was: Open)

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, 
 HIVE-4460.Dtest.3.patch, HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode

2013-08-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753389#comment-13753389
 ] 

Thejas M Nair commented on HIVE-4617:
-

[~cwsteinbach] Sure. Please note that the latest patch is in a new phabricator 
link - https://reviews.facebook.net/D12507. Vaibhav had some issues updating 
the earlier one. 

 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run queries asynchronously. The current executeStatement 
 call blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode

2013-08-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753394#comment-13753394
 ] 

Thejas M Nair commented on HIVE-4617:
-

[~jaideepdhok] I hope you can also take a look at the revised patch, which is 
based on your original patch.
We can discuss, in HIVE-4569, how the GetQueryPlan api can be implemented in a 
way that makes it possible to guarantee backward compatibility, and any impact 
that would have on GetOperationStatus.


 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run queries asynchronously. The current executeStatement 
 call blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4460:


Status: Patch Available  (was: Open)

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, 
 HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4460:


Attachment: (was: HIVE-4460.Dtest.3.patch)

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, 
 HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4460:


Status: Open  (was: Patch Available)

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, 
 HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4460:


Attachment: HIVE-4460.4.patch

HIVE-4460.4.patch - copy of HIVE-4460.3.patch to get pre-commit tests running. 
The filename of the previous file did not match the pattern expected by the 
pre-commit test framework.


 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, 
 HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode

2013-08-29 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753422#comment-13753422
 ] 

Phabricator commented on HIVE-4617:
---

cwsteinbach has commented on the revision HIVE-4617 [jira] 
ExecuteStatementAsync call to run a query in non-blocking mode.

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:739 Please add 
documentation for these new conf properties to hive-default.xml.template (and 
specify the units of 1).
  service/if/TCLIService.thrift:41 Please add HIVE_CLI_SERVICE_PROTOCOL_V2 
along with a short comment explaining what's new with this protocol version.
  service/if/TCLIService.thrift:455 Bump this to HIVE_CLI_SERVICE_PROTOCOL_V2. 
Also, the client should probably check to make sure it's talking to a >=V2 
server before trying to execute an asynchronous call.
  service/if/TCLIService.thrift:474 Ditto.
  service/src/java/org/apache/hive/service/cli/CLIService.java:162 There's 
currently a 1:1 correspondence between operation methods in CLIService and 
SessionManager. I think it's worth maintaining that relationship, so I would 
advocate adding SessionManager.executeStatementAsync() instead of overloading 
SessionManager.executeStatement().
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:78 
This should be private.
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:66 I 
know that Java boolean variables default to false, but I think it would be 
good to set this explicitly anyway.
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:133 
Server code should never print to stdout. Also, this is squelching the error 
instead of returning it to the client.
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:138 
Unnecessary use of this.
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java:181 
Let's add an async parameter to OperationManager.newExecuteStatementOperation() 
instead of calling instanceof and casting.
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java:57 
It would be useful to log the size of the threadpool (INFO level).
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java:79 
This log message should tell the user that 
HIVE_SERVER2_ASYNC_EXEC_SHUTDOWN_TIMEOUT=xx has been exceeded and background 
tasks are still running, and that it's going to exit anyway without doing a 
graceful task cleanup.
  service/src/test/org/apache/hive/service/cli/CLIServiceTest.java:135 Can you 
add a statement that fails (e.g. because of a syntax error) and verify that 
error information is correctly returned to the client?
  service/src/test/org/apache/hive/service/cli/CLIServiceTest.java:151 I think 
this test should verify that getOperationStatus returns OperationState.RUNNING 
at least once.
  service/src/test/org/apache/hive/service/cli/CLIServiceTest.java:118 It would 
be nice to add automated tests that cover version discrepancies between client 
and server, but that's probably too much work. Can you try testing this by hand 
and at least get a handle on what the behavior is? Users are definitely going 
to run into this, so it would be good to know what to expect before the first 
question appears on the user mailing list.
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:738 Is 10 a good 
default value? A lot of people are probably going to hit this limit and wonder 
why their queries are blocking. I think this also implies that we should add 
OperationState.PENDING or OperationState.WAITING instead of returning 
OperationState.RUNNING.

REVISION DETAIL
  https://reviews.facebook.net/D12507

To: JIRA, vaibhavgumashta
Cc: cwsteinbach
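
A self-contained sketch of the fixed-size background pool behavior the inline comments above are concerned with (plain JDK; names and the default of 10 taken from the review, everything else illustrative). Once the pool is saturated, further async submissions queue up, which is why reporting a PENDING state was suggested:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BackgroundPoolSketch {
  public static void main(String[] args) throws InterruptedException {
    int poolSize = 10;   // the default value the last inline comment questions
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    System.out.println("Background operation pool size: " + poolSize); // the requested INFO log

    for (int i = 0; i < 15; i++) {
      final int id = i;
      // Submissions beyond poolSize wait in the queue; to a client they look like
      // RUNNING queries making no progress -- hence the OperationState.PENDING idea.
      pool.submit(new Runnable() {
        public void run() {
          try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
          System.out.println("operation " + id + " finished");
        }
      });
    }
    pool.shutdown();
    // cf. the shutdown-timeout comment: give background tasks a bounded grace period
    if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
      System.out.println("timeout exceeded; exiting with tasks still running");
    }
  }
}
{code}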


 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run queries asynchronously. The current executeStatement 
 call blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode

2013-08-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753426#comment-13753426
 ] 

Carl Steinbach commented on HIVE-4617:
--

[~thejas] I added comments to phabricator. I'll leave it up to you to decide 
whether or not these issues should be addressed now or in a followup patch. 
Thanks.

 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run queries asynchronously. The current executeStatement 
 call blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2013-08-29 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-951:


Assignee: (was: Carl Steinbach)

 Selectively include EXTERNAL TABLE source files via REGEX
 -

 Key: HIVE-951
 URL: https://issues.apache.org/jira/browse/HIVE-951
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Carl Steinbach
 Attachments: HIVE-951.patch


 CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular 
 expression. 
 CREATE EXTERNAL TABLE was designed to allow users to access data that exists 
 outside of Hive, and currently makes the assumption that all of the files 
 located under the supplied path should be included in the new table. Users 
 frequently encounter directories containing multiple datasets, or directories 
 that contain data in heterogeneous schemas, and it's often impractical or 
 impossible to adjust the layout of the directory to meet the requirements of 
 CREATE EXTERNAL TABLE. A good example of this problem is creating an external 
 table based on the contents of an S3 bucket.
 One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
 as follows:
 CREATE EXTERNAL TABLE
 ...
 LOCATION path [file_regex]
 ...
 For example:
 {code:sql}
 CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
 STORED AS TEXTFILE
 LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
 {code}
 Creates mytable1, which includes all files in s3://my.bucket with a filename 
 matching 'folder/2009*.bz2'.
 {code:sql}
 CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
 STORED AS TEXTFILE 
 LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
 {code}
 Creates mytable2 including all files matching 'xyz*2009.bz2' located 
 under hdfs://data/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1367) cluster by multiple columns does not work if parenthesis is present

2013-08-29 Thread efan lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753451#comment-13753451
 ] 

efan lee commented on HIVE-1367:


I found that the result of DISTRIBUTE BY is not deterministic.
This causes the unit test failure.

 cluster by multiple columns does not work if parenthesis is present
 ---

 Key: HIVE-1367
 URL: https://issues.apache.org/jira/browse/HIVE-1367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Zhenxiao Luo
 Fix For: 0.10.0

 Attachments: HIVE-1367.1.patch.txt


 The following query:
 select ...  from src cluster by (key, value)
 throws a compile error, whereas the query
 select ...  from src cluster by key, value
 works fine.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2013-08-29 Thread indrajit (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753454#comment-13753454
 ] 

indrajit commented on HIVE-951:
---

External tables really give you the power to use different tools on top of a 
table, so you get a chance to do data mining. They are also very fast and easy 
to create.

 Selectively include EXTERNAL TABLE source files via REGEX
 -

 Key: HIVE-951
 URL: https://issues.apache.org/jira/browse/HIVE-951
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Carl Steinbach
 Attachments: HIVE-951.patch


 CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular 
 expression. 
 CREATE EXTERNAL TABLE was designed to allow users to access data that exists 
 outside of Hive, and currently makes the assumption that all of the files 
 located under the supplied path should be included in the new table. Users 
 frequently encounter directories containing multiple datasets, or directories 
 that contain data in heterogeneous schemas, and it's often impractical or 
 impossible to adjust the layout of the directory to meet the requirements of 
 CREATE EXTERNAL TABLE. A good example of this problem is creating an external 
 table based on the contents of an S3 bucket.
 One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
 as follows:
 CREATE EXTERNAL TABLE
 ...
 LOCATION path [file_regex]
 ...
 For example:
 {code:sql}
 CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
 STORED AS TEXTFILE
 LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
 {code}
 Creates mytable1, which includes all files in s3://my.bucket with a filename 
 matching 'folder/2009*.bz2'.
 {code:sql}
 CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
 STORED AS TEXTFILE 
 LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
 {code}
 Creates mytable2 including all files matching 'xyz*2009.bz2' located 
 under hdfs://data/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5168) Extend Hive for spatial query support

2013-08-29 Thread Fusheng Wang (JIRA)
Fusheng Wang created HIVE-5168:
--

 Summary: Extend Hive for spatial query support
 Key: HIVE-5168
 URL: https://issues.apache.org/jira/browse/HIVE-5168
 Project: Hive
  Issue Type: New Feature
Reporter: Fusheng Wang


I would like to propose incorporating a newly developed spatial querying 
component into Hive.

We have recently developed Hadoop-GIS, a high-performance MapReduce-based 
spatial querying system, to support large-scale spatial queries and analytics. 

Hadoop-GIS is a scalable and high performance spatial data warehousing system 
for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple 
types of spatial queries on MapReduce through space partitioning, customizable 
spatial query engine RESQUE, implicit parallel spatial query execution on 
MapReduce, and effective methods for amending query results through handling 
boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition 
indexing and customizable on demand local spatial indexing to achieve efficient 
query processing. Hadoop-GIS is integrated into Hive to support declarative 
spatial queries with an integrated architecture. 

We have an alpha release. We look forward to contributors in the Hive 
community contributing to the system. 

github: https://github.com/hadoop-gis

Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS

References:
1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong 
Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing 
System Over MapReduce. In Proceedings of the 39th International Conference on 
Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf

2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High 
Performance Spatial Query System for Large Scale Medical Imaging Data. In 
Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in 
Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, 
California, USA, November 6-9, 2012. 
http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753555#comment-13753555
 ] 

Hive QA commented on HIVE-4460:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600550/HIVE-4460.4.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/557/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/557/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, 
 HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-08-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753554#comment-13753554
 ] 

Xuefu Zhang commented on HIVE-4844:
---

[~jdere] for 2 and 3, could you please exclude them? They will not go to 
waste. (I will eventually include the patch for HIVE-3976.) This will help 
rebase and review. Thanks a lot.

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, 
 HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, 
 HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-08-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753556#comment-13753556
 ] 

Brock Noland commented on HIVE-4789:


OK, cool. Do you have time to do that? If not I'd be willing to help out with 
that.

 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt


 HIVE-3953 fixed using partitioned avro tables for anything that used the 
 MapOperator, but those that rely on FetchOperator still fail with the same 
 error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x

2013-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753645#comment-13753645
 ] 

Hive QA commented on HIVE-4460:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600550/HIVE-4460.4.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/558/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/558/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.4.patch, 
 HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are currently published only for Hadoop 1.x. As more 
 projects add HCatalog integration, HCatalog artifacts are needed for every 
 Hadoop version the product supports, so that automated builds targeting 
 different Hadoop releases can succeed. For example, SQOOP-931 introduces 
 Sqoop/HCatalog integration, and Sqoop builds against both Hadoop 1.x and 2.x 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-29 Thread Leo Romanoff (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753681#comment-13753681
 ] 

Leo Romanoff commented on HIVE-1511:


[~kamrul] I think I fixed the problem you reported. Your test seems to pass now 
on my side. I fixed the bug in Kryo (a serious one, related to the use of 
nested generic classes, e.g. Maps of Maps), and it has just been committed to 
Kryo trunk. Simply update your Kryo 2.22-SNAPSHOT to make sure it uses the 
latest trunk and you should be fine.

-Leo
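
A minimal round-trip of the nested-generics shape described above (Maps of Maps), using the Kryo 2.x Output/Input API, as one might verify the fix; the plan data itself is illustrative:

{code:java}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.HashMap;

class NestedGenericsRoundTrip {
  public static void main(String[] args) {
    // A Map of Maps, the nested-generics shape the fix addresses.
    HashMap<String, HashMap<String, Integer>> plan = new HashMap<String, HashMap<String, Integer>>();
    HashMap<String, Integer> inner = new HashMap<String, Integer>();
    inner.put("key", 0);
    plan.put("stage-1", inner);

    Kryo kryo = new Kryo();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    Output out = new Output(bytes);
    kryo.writeObject(out, plan);   // serialize
    out.close();

    Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()));
    HashMap roundTripped = kryo.readObject(in, HashMap.class);   // deserialize
    System.out.println(roundTripped);   // should equal {stage-1={key=0}} with the fix
  }
}
{code}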

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, 
 HIVE-1511-wip4.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had originally introduced

2013-08-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4964.


   Resolution: Fixed
Fix Version/s: 0.12.0

Committed to trunk. Thanks, Harish!

 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 originally introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, 
 HIVE-4964.D12585.1.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 We need to do this cleanup before introducing performance improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had originally introduced

2013-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753793#comment-13753793
 ] 

Hudson commented on HIVE-4964:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #76 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/76/])
HIVE-4964 : Cleanup PTF code: remove code dealing with non standard sql 
behavior we had originally introduced (Harish Butani via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518680)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java


 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 originally introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, 
 HIVE-4964.D12585.1.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 We need to do this cleanup before introducing performance improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask

2013-08-29 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-5137:
---

Attachment: HIVE-5137.D12453.7-test.patch

 A Hive SQL query should not return a ResultSet when the underlying plan does 
 not include a FetchTask
 

 Key: HIVE-5137
 URL: https://issues.apache.org/jira/browse/HIVE-5137
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.11.0

 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, 
 HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, 
 HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, 
 HIVE-5137.D12453.7-test.patch


 Currently, a query like create table if not exists t2 as select * from t1 
 sets hasResultSet to true in SQLOperation and, in turn, the query returns a 
 result set. However, as a DDL command, this should ideally not return a 
 result set.
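
A sketch of the check this implies: derive hasResultSet from whether the compiled plan actually contains a FetchTask. The getPlan()/getFetchTask() accessors exist on Hive's Driver/QueryPlan classes, but the exact wiring in the patch may differ:

{code:java}
import org.apache.hadoop.hive.ql.Driver;

class ResultSetCheckSketch {
  // Only report a result set when the compiled plan includes a FetchTask;
  // for a DDL like CREATE TABLE ... AS SELECT there is nothing to fetch.
  static boolean hasResultSet(Driver driver) {
    return driver.getPlan() != null && driver.getPlan().getFetchTask() != null;
  }
}
{code}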

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask

2013-08-29 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-5137:
---

Status: Open  (was: Patch Available)

Uploading a copy of the same patch to kick off tests.

 A Hive SQL query should not return a ResultSet when the underlying plan does 
 not include a FetchTask
 

 Key: HIVE-5137
 URL: https://issues.apache.org/jira/browse/HIVE-5137
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.11.0

 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, 
 HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, 
 HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, 
 HIVE-5137.D12453.7-test.patch


 Currently, a query like create table if not exists t2 as select * from t1 
 sets hasResultSet to true in SQLOperation and, in turn, the query returns a 
 result set. However, as a DDL command, this should ideally not return a 
 result set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask

2013-08-29 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-5137:
---

Status: Patch Available  (was: Open)

 A Hive SQL query should not return a ResultSet when the underlying plan does 
 not include a FetchTask
 

 Key: HIVE-5137
 URL: https://issues.apache.org/jira/browse/HIVE-5137
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.11.0

 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, 
 HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, 
 HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, 
 HIVE-5137.D12453.7-test.patch


 Currently, a query like create table if not exists t2 as select * from t1 
 sets hasResultSet to true in SQLOperation and, in turn, the query returns a 
 result set. However, as a DDL command, this should ideally not return a 
 result set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path

2013-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753811#comment-13753811
 ] 

Ashutosh Chauhan commented on HIVE-5158:


~33 tests failed.

 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch, HIVE-5158.D12573.2.patch, 
 HIVE-5158.D12573.3.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. The direct SQL path 
 can also be used for this case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4

2013-08-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753839#comment-13753839
 ] 

Brock Noland commented on HIVE-5112:


With 2.1.0-beta released, should we move ahead on this one?

 Upgrade protobuf to 2.5 from 2.4
 

 Key: HIVE-5112
 URL: https://issues.apache.org/jira/browse/HIVE-5112
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Owen O'Malley
 Attachments: HIVE-5112.D12429.1.patch


 Hadoop and HBase have both upgraded protobuf. We should as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4660) Let there be Tez

2013-08-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753844#comment-13753844
 ] 

Bikas Saha commented on HIVE-4660:
--

Folks, FYI, based on recent feedback we have changed the names used in some of 
the Tez APIs. It's a simple refactoring on the Tez side and should be a simple 
refactoring fix on the Pig side too. JIRA for reference: TEZ-410.

 Let there be Tez
 

 Key: HIVE-4660
 URL: https://issues.apache.org/jira/browse/HIVE-4660
 Project: Hive
  Issue Type: New Feature
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner

 Tez is a new application framework built on Hadoop Yarn that can execute 
 complex directed acyclic graphs of general data processing tasks. Here's the 
 project's page: http://incubator.apache.org/projects/tez.html
 The interesting thing about Tez from Hive's perspective is that it will over 
 time allow us to overcome inefficiencies in query processing due to having to 
 express every algorithm in the map-reduce paradigm.
 The barrier to entry is pretty low as well: Tez can actually run unmodified 
 MR jobs; but as a first step we can, without much trouble, start using more 
 of Tez's features by taking advantage of the MRR pattern. 
 MRR simply means that there can be any number of reduce stages following a 
 single map stage - without having to write intermediate results to HDFS and 
 re-read them in a new job. This is common when queries require multiple 
 shuffles on keys without correlation (e.g.: join - grp by - window function - 
 order by)
 For more details see the design doc here: 
 https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5102) ORC getSplits should create splits based the stripes

2013-08-29 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5102:
--

Attachment: HIVE-5102.D12579.2.patch

omalley updated the revision HIVE-5102 [jira] ORC getSplits should create 
splits based the stripes.

  Replaced local fs with the mockfs to prevent random reorderings that
  caused a test failure.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12579

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12579?vs=39189id=39261#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
  shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
  shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
  shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java

To: JIRA, omalley


 ORC getSplits should create splits based the stripes 
 -

 Key: HIVE-5102
 URL: https://issues.apache.org/jira/browse/HIVE-5102
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5102.D12579.1.patch, HIVE-5102.D12579.2.patch


 Currently ORC inherits getSplits from FileFormat, which basically makes a 
 split per an HDFS block. This can create too little parallelism and would be 
 better done by having getSplits look at the file footer and create splits 
 based on the stripes.
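
 A hedged sketch of the stripe-per-split idea (method names follow the ORC 
 reader interfaces of this era but should be treated as illustrative):
 {code}
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;

 import org.apache.hadoop.fs.BlockLocation;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapred.FileSplit;

 public class StripeSplitSketch {
   // Minimal stand-in for the ORC StripeInformation interface.
   public interface StripeInformation {
     long getOffset();
     long getIndexLength();
     long getDataLength();
     long getFooterLength();
   }

   // One FileSplit per ORC stripe; stripe boundaries come from the file footer.
   public static List<FileSplit> stripeSplits(FileSystem fs, Path path,
       Iterable<StripeInformation> stripes) throws IOException {
     FileStatus status = fs.getFileStatus(path);
     List<FileSplit> splits = new ArrayList<FileSplit>();
     for (StripeInformation stripe : stripes) {
       long len = stripe.getIndexLength() + stripe.getDataLength()
           + stripe.getFooterLength();
       // locality: prefer the hosts serving the block the stripe starts in
       BlockLocation[] blocks =
           fs.getFileBlockLocations(status, stripe.getOffset(), len);
       String[] hosts = blocks.length > 0 ? blocks[0].getHosts() : new String[0];
       splits.add(new FileSplit(path, stripe.getOffset(), len, hosts));
     }
     return splits;
   }
 }
 {code}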

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4660) Let there be Tez

2013-08-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753845#comment-13753845
 ] 

Bikas Saha commented on HIVE-4660:
--

Sorry I meant Hive instead of Pig.

 Let there be Tez
 

 Key: HIVE-4660
 URL: https://issues.apache.org/jira/browse/HIVE-4660
 Project: Hive
  Issue Type: New Feature
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner

 Tez is a new application framework built on Hadoop Yarn that can execute 
 complex directed acyclic graphs of general data processing tasks. Here's the 
 project's page: http://incubator.apache.org/projects/tez.html
 The interesting thing about Tez from Hive's perspective is that it will over 
 time allow us to overcome inefficiencies in query processing due to having to 
 express every algorithm in the map-reduce paradigm.
 The barrier to entry is pretty low as well: Tez can actually run unmodified 
 MR jobs; But as a first step we can without much trouble start using more of 
 Tez' features by taking advantage of the MRR pattern. 
 MRR simply means that there can be any number of reduce stages following a 
 single map stage - without having to write intermediate results to HDFS and 
 re-read them in a new job. This is common when queries require multiple 
 shuffles on keys without correlation (e.g.: join - grp by - window function - 
 order by)
 For more details see the design doc here: 
 https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-08-29 Thread agateaaa
Hi All:

Put some debugging code in TUGIContainingTransport.getTransport() and I
tracked it down to

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

  // UGI information is not available at connection setup time; it will be
  // set later via the set_ugi() rpc.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

  // return transMap.get(trans); // -change
  TUGIContainingTransport retTrans = transMap.get(trans);

  if (retTrans == null) {

  }





On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote:

 Thanks Nitin

 There aren't too many connections in CLOSE_WAIT state, only one or two when we
 run into this. Most likely it's because of a dropped connection.

 I could not find any read or write timeouts we can set for the thrift
 server which will tell thrift to hold on to the client connection.
  See this https://issues.apache.org/jira/browse/HIVE-2006 but it doesn't seem
 to have been implemented yet. We do have a client connection timeout set,
 but cannot find an equivalent setting for the server.

 We have a suspicion that this happens when we run two client processes
 which modify two distinct partitions of the same hive table. We put in a
 workaround so that the two hive client processes never run together, and so
 far things look ok, but we will keep monitoring.

 Could it be that the hive metastore server is not thread safe? Would
 running two alter table statements on two distinct partitions of the same
 table using two client connections cause problems like these, where the hive
 metastore server closes or drops the wrong client connection and leaves the
 other hanging?

 Agateaaa




 On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 The mentioned flow is invoked when the thrift metastore client-server
 connection is in unsecure mode, so one way to avoid this is to use a
 secure setup.

 {code}
 public boolean process(final TProtocol in, final TProtocol out)
     throws TException {
   setIpAddress(in);
   ...
   ...
   ...

 @Override
 protected void setIpAddress(final TProtocol in) {
   TUGIContainingTransport ugiTrans =
       (TUGIContainingTransport) in.getTransport();
   Socket socket = ugiTrans.getSocket();
   if (socket != null) {
     setIpAddress(socket);
 {code}


 From the above code snippet, it looks like the null pointer exception is
 not handled when getSocket returns null.
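
 A minimal null-safe variant of the lookup above (sketch only;
 setIpAddress(Socket) is the existing metastore method, the rest is
 illustrative):

 {code}
 @Override
 protected void setIpAddress(final TProtocol in) {
   TUGIContainingTransport ugiTrans =
       (TUGIContainingTransport) in.getTransport();
   Socket socket = ugiTrans.getSocket();
   if (socket != null) {
     setIpAddress(socket);
   }
   // else: the underlying transport is not a TSocket, so there is no peer
   // address to record; skipping avoids the NullPointerException in the logs
 }
 {code}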

 Can you check what the ulimit setting is on the server? If it's set to the
 default, can you set it to unlimited and restart the hcat server? (This is
 just a wild guess.)

 Also, the getSocket method's documentation suggests: "If the underlying
 TTransport is an instance of TSocket, it returns the Socket object which it
 contains. Otherwise it returns null."

 So one of the thrift gurus needs to tell us what's happening; I have no
 knowledge at this depth.

 may be Ashutosh or Thejas will be able to help on this.




 From the netstat CLOSE_WAIT, it looks like the hive metastore server has
 not closed the connection (do not know why yet); maybe the hive dev guys
 can help. Are there too many connections in CLOSE_WAIT state?



 On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote:

  Looking at the hive metastore server logs see errors like these:
 
  2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
  (TThreadPoolServer.java:run(182)) - Error occurred during processing of
  message.
  java.lang.NullPointerException
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
  at
 
 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
 
  approx same time as we see timeout or connection reset errors.
 
  Don't know if this is the cause or a side effect of the connection
  timeout/connection reset errors. Does anybody have any pointers or
  suggestions?
 
  Thanks
 
 
  On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:
 
   Thanks Nitin!
  
   We have a similar setup (identical hcatalog and hive server versions) on
   another production environment and don't see any errors (it's been running
   ok for a few months).
  
   Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive
   0.10 soon.
  
   I did see, the last time we ran into this problem, doing a netstat -ntp
   | grep :1, that the server was holding on to one socket connection in
   CLOSE_WAIT state for a long time
   (hive metastore server is running on port 1). Don't know if that's
   relevant here or not.
  
   Can you suggest any hive configuration settings we can tweak, or networking
   tools/tips we can use to 

[jira] [Commented] (HIVE-5133) webhcat jobs that need to access metastore fails in secure mode

2013-08-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753882#comment-13753882
 ] 

Thejas M Nair commented on HIVE-5133:
-

Thanks for the feedback, I will create a new patch addressing these comments. I 
also need to add e2e tests.

Note about the patch - With this change, for submitting pig or MR jobs you need 
to specify usehcatalog=true as a POST param (in the curl command, -d 
usehcatalog=true). In case of pig this argument is optional; it is sufficient 
to have an arg='-useHCatalog' POST param.

 webhcat jobs that need to access metastore fails in secure mode
 ---

 Key: HIVE-5133
 URL: https://issues.apache.org/jira/browse/HIVE-5133
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5133.1.patch


 Webhcat job submission requests result in the pig/hive/mr job being run from 
 a map task that it launches. In secure mode, for the pig/hive/mr job that is 
 run to be authorized to perform actions on metastore, it has to have the 
 delegation tokens from the hive metastore.
 In case of pig/MR job this is needed if hcatalog is being used in the 
 script/job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-08-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Attachment: HIVE-4844.10.patch

attaching HIVE-4844.10.patch - remove instances of precision/scale where 
appropriate per Xuefu's request

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.10.patch, HIVE-4844.1.patch.hack, 
 HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, 
 HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, 
 screenshot.png


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.
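
 As a rough illustration of the intended semantics (not the patch's actual 
 classes): varchar enforces a declared maximum length, while char comparison 
 additionally ignores trailing pad spaces:
 {code}
 public final class VarcharSemanticsSketch {
   // varchar(n): truncate values longer than the declared maximum length
   static String enforceVarchar(String s, int maxLength) {
     return s.length() <= maxLength ? s : s.substring(0, maxLength);
   }

   // char(n) comparison: SQL pads with spaces, so trailing blanks are ignored
   static int compareChar(String a, String b) {
     return stripTrailingSpaces(a).compareTo(stripTrailingSpaces(b));
   }

   private static String stripTrailingSpaces(String s) {
     int end = s.length();
     while (end > 0 && s.charAt(end - 1) == ' ') {
       end--;
     }
     return s.substring(0, end);
   }
 }
 {code}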

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2013-08-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-3976:
-

Attachment: remove_prec_scale.diff

Here is the patch containing the instances where I've removed precision/scale 
from the patch to HIVE-4844, in case you are interested in re-applying these 
changes on your side.

 Support specifying scale and precision with Hive decimal type
 -

 Key: HIVE-3976
 URL: https://issues.apache.org/jira/browse/HIVE-3976
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, Types
Reporter: Mark Grover
Assignee: Xuefu Zhang
 Attachments: remove_prec_scale.diff


 HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
 current implementation has unlimited precision and provides no way to specify 
 precision and scale when creating the table.
 For example, MySQL allows users to specify scale and precision of the decimal 
 datatype when creating the table:
 {code}
 CREATE TABLE numbers (a DECIMAL(20,2));
 {code}
 Hive should support something similar too.
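
 For reference, a sketch of what DECIMAL(precision, scale) bounds mean in 
 java.math.BigDecimal terms (illustrative only, not Hive's implementation):
 {code}
 import java.math.BigDecimal;
 import java.math.RoundingMode;

 public class DecimalBoundsSketch {
   // Enforce DECIMAL(precision, scale): fix the scale, then check that the
   // total digit count fits the declared precision.
   static BigDecimal enforce(BigDecimal v, int precision, int scale) {
     BigDecimal scaled = v.setScale(scale, RoundingMode.HALF_UP);
     // out-of-range values would be rejected (or nulled) by the engine
     return scaled.precision() <= precision ? scaled : null;
   }

   public static void main(String[] args) {
     // prints 12345.68 for DECIMAL(20,2)
     System.out.println(enforce(new BigDecimal("12345.678"), 20, 2));
   }
 }
 {code}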

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type

2013-08-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5161:
-

Attachment: HIVE-5161.1.patch

 Additional SerDe support for varchar type
 -

 Key: HIVE-5161
 URL: https://issues.apache.org/jira/browse/HIVE-5161
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-5161.1.patch


 Breaking out support for varchar for the various SerDes as an additional task.
 NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type

2013-08-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5161:
-

Description: 
Breaking out support for varchar for the various SerDes as an additional task.

NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed

  was:Breaking out support for varchar for the various SerDes as an additional 
task.


 Additional SerDe support for varchar type
 -

 Key: HIVE-5161
 URL: https://issues.apache.org/jira/browse/HIVE-5161
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-5161.1.patch


 Breaking out support for varchar for the various SerDes as an additional task.
 NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type

2013-08-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5161:
-

Status: Patch Available  (was: Open)

 Additional SerDe support for varchar type
 -

 Key: HIVE-5161
 URL: https://issues.apache.org/jira/browse/HIVE-5161
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-5161.1.patch


 Breaking out support for varchar for the various SerDes as an additional task.
 NO_COMMIT_TESTS - can't run tests until HIVE-4844 is committed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced

2013-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753980#comment-13753980
 ] 

Hudson commented on HIVE-4964:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #144 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/144/])
HIVE-4964 : Cleanup PTF code: remove code dealing with non standard sql 
behavior we had original introduced (Harish Butani via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518680)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java


 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 original introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, 
 HIVE-4964.D12585.1.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 Need to do this before introducing  Perf. improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced

2013-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753993#comment-13753993
 ] 

Hudson commented on HIVE-4964:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2297 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2297/])
HIVE-4964 : Cleanup PTF code: remove code dealing with non standard sql 
behavior we had original introduced (Harish Butani via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518680)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java


 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 original introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, 
 HIVE-4964.D12585.1.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 Need to do this before introducing  Perf. improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5169) Sorted Bucketed Partitioned Insert does not sort by dynamic partition column causing reducer OOMs/lease-expiry errors

2013-08-29 Thread Gopal V (JIRA)
Gopal V created HIVE-5169:
-

 Summary: Sorted Bucketed Partitioned Insert does not sort by 
dynamic partition column causing reducer OOMs/lease-expiry errors
 Key: HIVE-5169
 URL: https://issues.apache.org/jira/browse/HIVE-5169
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Ubuntu LXC, hadoop-2
Reporter: Gopal V


When a bulk-ETL operation is in progress, the query plan only sorts based on 
the SORTED BY key.

This means that the FileSinkOperator in the reducer has to keep all the dynamic 
partition RecordWriters open till the end of the reducer lifetime.

A more MR-friendly approach would be to sort by partition_col,sorted_col so 
that the reducer needs to keep only one partition and bucket open at any given 
time.

As a test-case a partitioned insert for the TPC-h benchmark's lineitem table 
will suffice

{code}
create table lineitem
(L_ORDERKEY INT,
...
partitioned by (L_SHIPDATE STRING)
clustered by (l_orderkey)
sorted by (l_orderkey)
into 4 buckets
stored as ORC;

explain from (select
L_ORDERKEY ,
...) tbl 
insert overwrite table lineitem partition (L_SHIPDATE)
select *
;
{code}

The generated plan very clearly has 

{code}
 Reduce Output Operator
key expressions:
  expr: _col0
  type: int
sort order: +
Map-reduce partition columns:
  expr: _col0
  type: int
tag: -1
{code}

And col0 being L_ORDERKEY.

In the FileSinkOperator over at the reducer side, this results in a larger than 
usual number of open files.

This causes memory pressure due to the compression buffers used by ORC/RCFile 
and really slows down the reducers.

A side-effect of this is that I had to pump 350Gb of TPC-h data through 4 
reducers, which on occasion took over an hour to get from opening a file in the 
FS to writing the first ORC stripe.

This caused HDFS lease expiry and the task dying from that error.

All of these can be avoided by adding the partition column to the sort keys as 
well as the partition keys, keeping only one writer open in the 
FileSinkOperator.
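
As a rough sketch of the proposed key ordering (illustrative only; the real 
change would be in the reduce sink key expressions), ordering on the dynamic 
partition column first and the sort column second means each reducer sees one 
(partition, bucket) at a time:

{code}
public class CompositeKeySketch {
  // Hedged sketch: composite reduce key ordered as (partition col, sort col).
  // Names and types are illustrative, not the actual plan structures.
  static int compare(String partA, long sortA, String partB, long sortB) {
    int c = partA.compareTo(partB);       // dynamic partition column first
    if (c != 0) {
      return c;
    }
    return sortA < sortB ? -1 : (sortA == sortB ? 0 : 1);
  }
}
{code}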

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5169) Sorted Bucketed Partitioned Insert does not sort by dynamic partition column causing reducer OOMs/lease-expiry errors

2013-08-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-5169:
--

Attachment: orc2.sql

Scale=2 ORC loader.

To generate TPC-h text tables, you can use 
https://github.com/t3rmin4t0r/tpch-gen

And for the text DDL, you can find it in the ddl/text.sql file.

 Sorted Bucketed Partitioned Insert does not sort by dynamic partition column 
 causing reducer OOMs/lease-expiry errors
 -

 Key: HIVE-5169
 URL: https://issues.apache.org/jira/browse/HIVE-5169
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Ubuntu LXC, hadoop-2
Reporter: Gopal V
 Attachments: orc2.sql


 When a bulk-ETL operation is in progress, the query plan only sorts based on 
 the SORTED BY key.
 This means that the FileSinkOperator in the reducer has to keep all the 
 dynamic partition RecordWriters open till the end of the reducer lifetime.
 A more MR-friendly approach would be to sort by partition_col,sorted_col so 
 that the reducer needs to keep only one partition and bucket open at any 
 given time.
 As a test-case a partitioned insert for the TPC-h benchmark's lineitem table 
 will suffice
 {code}
 create table lineitem
 (L_ORDERKEY INT,
 ...
 partitioned by (L_SHIPDATE STRING)
 clustered by (l_orderkey)
 sorted by (l_orderkey)
 into 4 buckets
 stored as ORC;
 explain from (select
 L_ORDERKEY ,
 ...) tbl 
 insert overwrite table lineitem partition (L_SHIPDATE)
 select *
 ;
 {code}
 The generated plan very clearly has 
 {code}
  Reduce Output Operator
 key expressions:
   expr: _col0
   type: int
 sort order: +
 Map-reduce partition columns:
   expr: _col0
   type: int
 tag: -1
 {code}
 And col0 being L_ORDERKEY.
 In the FileSinkOperator over at the reducer side, this results in a larger 
 than usual number of open files.
 This causes memory pressure due to the compression buffers used by ORC/RCFile 
 and really slows down the reducers.
 A side-effect of this is that I had to pump 350Gb of TPC-h data through 4 
 reducers, which on occasion took over an hour to get from opening a file in 
 the FS to writing the first ORC stripe.
 This caused HDFS lease expiry and the task dying from that error.
 All of these can be avoided by adding the partition column to the sort keys 
 as well as the partition keys, keeping only one writer open in the 
 FileSinkOperator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support

2013-08-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754023#comment-13754023
 ] 

Brock Noland commented on HIVE-5168:


The design document needs to go here 
https://cwiki.apache.org/confluence/display/Hive/DesignDocs

 Extend Hive for spatial query support
 -

 Key: HIVE-5168
 URL: https://issues.apache.org/jira/browse/HIVE-5168
 Project: Hive
  Issue Type: New Feature
Reporter: Fusheng Wang
  Labels: Hadoop-GIS, Spatial,

 I would like to propose to incorporate a newly developed spatial querying 
 component into Hive.
 We have recently developed a high performance MapReduce based spatial 
 querying system Hadoop-GIS, to support large scale spatial queries and 
 analytics. 
 Hadoop-GIS is a scalable and high performance spatial data warehousing system 
 for running large scale spatial queries on Hadoop. Hadoop-GIS supports 
 multiple types of spatial queries on MapReduce through space partitioning, 
 customizable spatial query engine RESQUE, implicit parallel spatial query 
 execution on MapReduce, and effective methods for amending query results 
 through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of 
 global partition indexing and customizable on demand local spatial indexing 
 to achieve efficient query processing. Hadoop-GIS is integrated into Hive to 
 support declarative spatial queries with an integrated architecture. 
 We have an alpha release. We look forward to contributors in the Hive 
 community contributing to the system. 
 github: https://github.com/hadoop-gis
 Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS
 References:
 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong 
 Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing 
 System Over MapReduce. In Proceedings of the 39th International Conference on 
 Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
 http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf
 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High 
 Performance Spatial Query System for Large Scale Medical Imaging Data. In 
 Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances 
 in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, 
 California, USA, November 6-9, 2012. 
 http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5170) Sorted Bucketed Partitioned Insert hard-codes the reducer count == bucket count

2013-08-29 Thread Gopal V (JIRA)
Gopal V created HIVE-5170:
-

 Summary: Sorted Bucketed Partitioned Insert hard-codes the reducer 
count == bucket count
 Key: HIVE-5170
 URL: https://issues.apache.org/jira/browse/HIVE-5170
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
 Environment: Ubuntu LXC
Reporter: Gopal V


When performing a hive sorted-partitioned insert, the insert optimizer 
hard-codes the number of output files to the actual bucket count of the table.

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L4852

We need at least that many reducers, or if limited, switch to multi-spray (as 
implemented already), but more reducers are wasteful as long as the HiveKey only 
contains the partition columns.

At this point, we're limited to reducers = n-bucket still, which is a problem 
for partitioning requests which need to insert nearly a terabyte of data into a 
single-digit bucket count and four-digit partition count.

Since that is routed by the hashCode of the HiveKey, we can ensure that works by 
modifying the HiveKey to handle n-buckets internally.

Basically it should only generate hashCode = (sort_cols.hashCode() % n), routing 
to only n reducers overall, despite how many we spin up.

So far so good with the hard-coded reducer count.

But provided we fix the issues brought up by HIVE-5169, the insert becomes 
friendlier to a higher reducer count as well.

At this juncture, we can modify the hashCode to be slightly more interesting.

hashCode = (part_cols.hashCode()*31 + (sort_cols.hashCode() % n)) 

This generates somewhere between n to partition_count * n unique hash-codes.

Since the sort-order and bucketing have to be maintained per partition dir, 
distributing this equally across any number of reducers will result in the 
scale-out of the reducer count.

This will allow a reducer count that will allow for far faster inserts of ORC 
data into a partitioned/sorted table.
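
A small sketch of the two routing functions described above (illustrative; the 
real change would live in HiveKey, and production code must also guard the 
Integer.MIN_VALUE edge case of Math.abs):

{code}
public class BucketRoutingSketch {
  // Current proposal: route on sort columns only, modulo bucket count n, so
  // exactly n reducers receive data regardless of how many are launched.
  static int bucketOnly(int sortColsHash, int n) {
    return Math.abs(sortColsHash % n);
  }

  // Refined proposal: fold in the partition columns so up to
  // (#partitions * n) distinct codes spread the load across more reducers
  // while bucket/sort order is still maintained within each partition dir.
  static int partitionAware(int partColsHash, int sortColsHash, int n) {
    return Math.abs(partColsHash * 31 + (sortColsHash % n));
  }
}
{code}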

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-29 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754031#comment-13754031
 ] 

Mohammad Kamrul Islam commented on HIVE-1511:
-

[~romixlev] Thanks a lot for the quick fix.

Now working on the next failing one.



 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, 
 HIVE-1511-wip4.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.
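
 Given the KryoHiveTest.java attachment above, the direction being explored is 
 a binary serializer in place of XML encoding; a minimal Kryo round-trip sketch 
 (assumes the plan object graph is Kryo-serializable):
 {code}
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;

 import com.esotericsoftware.kryo.Kryo;
 import com.esotericsoftware.kryo.io.Input;
 import com.esotericsoftware.kryo.io.Output;

 public class KryoRoundTripSketch {
   static byte[] serialize(Kryo kryo, Object plan) {
     ByteArrayOutputStream bos = new ByteArrayOutputStream();
     Output out = new Output(bos);
     kryo.writeObject(out, plan);  // binary, avoids reflection-heavy XML
     out.close();
     return bos.toByteArray();
   }

   static <T> T deserialize(Kryo kryo, byte[] bytes, Class<T> clazz) {
     Input in = new Input(new ByteArrayInputStream(bytes));
     try {
       return kryo.readObject(in, clazz);
     } finally {
       in.close();
     }
   }
 }
 {code}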

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-08-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754038#comment-13754038
 ] 

Jason Dere commented on HIVE-4844:
--

Xuefu, you're going to hate me for this one, but upon review of the code with 
hbutani, I am planning to remove the 
ParameterizedPrimitiveTypeInfo/ParameterizedPrimitiveObjectInspector interfaces 
and just add those methods to the PrimitiveTypeInfo/PrimitiveObjectInspector 
interfaces. I hope this doesn't cause too many rebase issues with your decimal 
work.

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.10.patch, HIVE-4844.1.patch.hack, 
 HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, 
 HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, 
 screenshot.png


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754035#comment-13754035
 ] 

Brock Noland commented on HIVE-1511:


Great to hear guys! When you are at a point where it makes sense it'd be 
interesting to see another run of the precommit tests.

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, 
 HIVE-1511-wip4.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support

2013-08-29 Thread Fusheng Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754055#comment-13754055
 ] 

Fusheng Wang commented on HIVE-5168:


The DesignDocs wiki doesn't allow uploads from non-admin users. Should I upload 
it here?


 Extend Hive for spatial query support
 -

 Key: HIVE-5168
 URL: https://issues.apache.org/jira/browse/HIVE-5168
 Project: Hive
  Issue Type: New Feature
Reporter: Fusheng Wang
  Labels: Hadoop-GIS, Spatial,

 I would like to propose to incorporate a newly developed spatial querying 
 component into Hive.
 We have recently developed a high performance MapReduce based spatial 
 querying system Hadoop-GIS, to support large scale spatial queries and 
 analytics. 
 Hadoop-GIS is a scalable and high performance spatial data warehousing system 
 for running large scale spatial queries on Hadoop. Hadoop-GIS supports 
 multiple types of spatial queries on MapReduce through space partitioning, 
 customizable spatial query engine RESQUE, implicit parallel spatial query 
 execution on MapReduce, and effective methods for amending query results 
 through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of 
 global partition indexing and customizable on demand local spatial indexing 
 to achieve efficient query processing. Hadoop-GIS is integrated into Hive to 
 support declarative spatial queries with an integrated architecture. 
 We have an alpha release. We look forward to contributors in the Hive 
 community contributing to the system. 
 github: https://github.com/hadoop-gis
 Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS
 References:
 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong 
 Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing 
 System Over MapReduce. In Proceedings of the 39th International Conference on 
 Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
 http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf
 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High 
 Performance Spatial Query System for Large Scale Medical Imaging Data. In 
 Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances 
 in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, 
 California, USA, November 6-9, 2012. 
 http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support

2013-08-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754060#comment-13754060
 ] 

Brock Noland commented on HIVE-5168:


Yeah that is unfortunate. You can either upload it here or [~ashutoshc] can 
give you edit privs.

 Extend Hive for spatial query support
 -

 Key: HIVE-5168
 URL: https://issues.apache.org/jira/browse/HIVE-5168
 Project: Hive
  Issue Type: New Feature
Reporter: Fusheng Wang
  Labels: Hadoop-GIS, Spatial,

 I would like to propose to incorporate a newly developed spatial querying 
 component into Hive.
 We have recently developed a high performance MapReduce based spatial 
 querying system Hadoop-GIS, to support large scale spatial queries and 
 analytics. 
 Hadoop-GIS is a scalable and high performance spatial data warehousing system 
 for running large scale spatial queries on Hadoop. Hadoop-GIS supports 
 multiple types of spatial queries on MapReduce through space partitioning, 
 customizable spatial query engine RESQUE, implicit parallel spatial query 
 execution on MapReduce, and effective methods for amending query results 
 through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of 
 global partition indexing and customizable on demand local spatial indexing 
 to achieve efficient query processing. Hadoop-GIS is integrated into Hive to 
 support declarative spatial queries with an integrated architecture. 
 We have an alpha release. We look forward to contributors in the Hive 
 community contributing to the system. 
 github: https://github.com/hadoop-gis
 Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS
 References:
 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong 
 Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing 
 System Over MapReduce. In Proceedings of the 39th International Conference on 
 Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
 http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf
 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High 
 Performance Spatial Query System for Large Scale Medical Imaging Data. In 
 Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances 
 in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, 
 California, USA, November 6-9, 2012. 
 http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754073#comment-13754073
 ] 

Gunther Hagleitner commented on HIVE-5091:
--

Committed to trunk. Thanks Owen!

 ORC files should have an option to pad stripes to the HDFS block boundaries
 ---

 Key: HIVE-5091
 URL: https://issues.apache.org/jira/browse/HIVE-5091
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, 
 HIVE-5091.D12249.3.patch


 With ORC stripes being large, if a stripe straddles an HDFS block, the 
 locality of read is suboptimal. It would be good to add padding to ensure 
 that stripes don't straddle HDFS blocks.
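
 The padding decision itself is simple arithmetic; a hedged sketch with 
 illustrative sizes:
 {code}
 public class StripePaddingSketch {
   // If writing the next stripe from the current offset would cross an HDFS
   // block boundary, emit zero padding up to the boundary first.
   static long padNeeded(long currentOffset, long stripeSize, long blockSize) {
     long intoBlock = currentOffset % blockSize;
     if (intoBlock + stripeSize > blockSize) {
       return blockSize - intoBlock;  // pad so the stripe starts a new block
     }
     return 0L;
   }

   public static void main(String[] args) {
     long blockSize = 256L << 20;   // 256MB HDFS block (illustrative)
     long stripeSize = 200L << 20;  // 200MB stripe (illustrative)
     // 100MB into the block, a 200MB stripe would straddle: pad 156MB
     System.out.println(padNeeded(100L << 20, stripeSize, blockSize));
   }
 }
 {code}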

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns

2013-08-29 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754076#comment-13754076
 ] 

Yin Huai commented on HIVE-5149:


Right, we should only use key as the partitioning column.

Actually, the example I posted above is from the test file 
reduce_deduplicate_extended.q. The plan of "explain from (select key, value 
from src group by key, value) s select s.key group by s.key" in hive trunk is 
wrong.

 ReduceSinkDeDuplication can pick the wrong partitioning columns
 ---

 Key: HIVE-5149
 URL: https://issues.apache.org/jira/browse/HIVE-5149
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch


 https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5091:
-

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

 ORC files should have an option to pad stripes to the HDFS block boundaries
 ---

 Key: HIVE-5091
 URL: https://issues.apache.org/jira/browse/HIVE-5091
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.12.0

 Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, 
 HIVE-5091.D12249.3.patch


 With ORC stripes being large, if a stripe straddles an HDFS block, the 
 locality of read is suboptimal. It would be good to add padding to ensure 
 that stripes don't straddle HDFS blocks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5171) metastore server can cache pruning results across queries

2013-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5171:
---

Assignee: (was: Sergey Shelukhin)

 metastore server can cache pruning results across queries 
 --

 Key: HIVE-5171
 URL: https://issues.apache.org/jira/browse/HIVE-5171
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin

 Partition pruning results are cached during a query (SemanticAnalyzer and 
 ParseContext are the scope). 
 We could also cache them between queries in the MetaStore, which would be 
 especially useful if the metastore server is remote and thus long-lived/shared 
 between clients.
 It may be more complex than it seems due to OOM potential. Also the key would 
 need to be changed since the same expression string that is currently used 
 may mean different things for different queries.
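
 A hedged sketch of what such a cross-query cache could look like, using 
 Guava's CacheBuilder; the size bound and the key shape are assumptions and 
 correspond exactly to the OOM and key concerns above:
 {code}
 import java.util.List;

 import com.google.common.cache.Cache;
 import com.google.common.cache.CacheBuilder;

 public class PruneResultCacheSketch {
   // Bounded to address the OOM concern; the key must carry more than the
   // raw expression string, since the same string can mean different things
   // for different queries.
   private final Cache<String, List<String>> cache =
       CacheBuilder.newBuilder().maximumSize(10000).build();

   String key(String db, String table, String exprDigest) {
     return db + "." + table + "#" + exprDigest;
   }

   List<String> lookup(String key) {
     return cache.getIfPresent(key);
   }

   void store(String key, List<String> partNames) {
     cache.put(key, partNames);
   }
 }
 {code}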

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server

2013-08-29 Thread agate (JIRA)
agate created HIVE-5172:
---

 Summary: TUGIContainingTransport returning null transport, causing 
intermittent SocketTimeoutException on hive client and NullPointerException in 
TUGIBasedProcessor on the server
 Key: HIVE-5172
 URL: https://issues.apache.org/jira/browse/HIVE-5172
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0
Reporter: agate


We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore 
Server 0.9) where we get connection reset or connection timeout errors on the 
client and a NullPointerException in TUGIBasedProcessor on the server. 



hive client logs:
=

org.apache.thrift.transport.TTransportException: 
java.net.SocketTimeoutException: Read timed out
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 31 more



hive metastore server logs:
===

2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer 
(TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
java.lang.NullPointerException
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


Adding some extra debug log messages in TUGIBasedProcessor, I noticed that the 
TUGIContainingTransport is null, which results in the NullPointerException on 
the server.

Further drilling into TUGIContainingTransport, I noticed that getTransport() 
returns a null, which causes the above error.

Further correlating with GC logs, I observed that the error always hits when 
the CMS GC has just kicked in (but it does not happen after every GC).



Put some debugging code in TUGIContainingTransport.getTransport() and I tracked 
it down to 

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

  // UGI information is not 

[jira] [Commented] (HIVE-5102) ORC getSplits should create splits based the stripes

2013-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754092#comment-13754092
 ] 

Hive QA commented on HIVE-5102:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600616/HIVE-5102.D12579.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2907 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/560/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/560/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 ORC getSplits should create splits based the stripes 
 -

 Key: HIVE-5102
 URL: https://issues.apache.org/jira/browse/HIVE-5102
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5102.D12579.1.patch, HIVE-5102.D12579.2.patch


 Currently ORC inherits getSplits from FileFormat, which basically makes a 
 split per an HDFS block. This can create too little parallelism and would be 
 better done by having getSplits look at the file footer and create splits 
 based on the stripes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server

2013-08-29 Thread agate (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

agate updated HIVE-5172:


Description: 
We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore 
Server 0.9) where we get connection reset or connection timeout errors on the 
client and a NullPointerException in TUGIBasedProcessor on the server. 



hive client logs:
=

org.apache.thrift.transport.TTransportException: 
java.net.SocketTimeoutException: Read timed out
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 31 more



hive metastore server logs:
===

2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer 
(TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
java.lang.NullPointerException
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


After adding some extra debug log messages in TUGIBasedProcessor, we noticed 
that the TUGIContainingTransport is null, which results in the 
NullPointerException on the server.

Drilling further into TUGIContainingTransport, we noticed that getTransport() 
returns a null, which causes the above error.

Correlating with GC logs, we observed that the error always hits just after the 
CMS GC has kicked in (but it does not happen after every GC).



Put some debugging code in TUGIContainingTransport.getTransport() and I tracked 
it down to 

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

  // UGI information is not available at connection setup time, it will be 
set later
  // via set_ugi() rpc.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

  //return transMap.get(trans); //-change

  TUGIContainingTransport retTrans = transMap.get(trans);

  if ( retTrans == null ) 

Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-08-29 Thread agateaaa
Thanks Ashutosh.

Filed https://issues.apache.org/jira/browse/HIVE-5172


 On Thu, Aug 29, 2013 at 11:53 AM, Ashutosh Chauhan hashut...@apache.org wrote:

 Thanks Agatea for digging in. Seems like you have hit a bug. Would you
 mind opening a jira and adding your findings to it?

 Thanks,
 Ashutosh


 On Thu, Aug 29, 2013 at 11:22 AM, agateaaa agate...@gmail.com wrote:

 Sorry hit send too soon ...

 Hi All:

 Put some debugging code in TUGIContainingTransport.getTransport() and I
 tracked it down to:

 @Override
 public TUGIContainingTransport getTransport(TTransport trans) {

   // UGI information is not available at connection setup time, it will be
   // set later via set_ugi() rpc.
   transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

   //return transMap.get(trans); //-change
   TUGIContainingTransport retTrans = transMap.get(trans);

   if ( retTrans == null ) {
     LOGGER.error("cannot find transport that was in map !!");
     return null; // this null later surfaces as the NPE in TUGIBasedProcessor
   } else {
     LOGGER.debug("found transport in map");
     return retTrans;
   }
 }

 When we run this in our test environment, we see that we run into the problem
 just after GC runs, and the 'cannot find transport that was in map !!' message
 gets logged.

 Could the GC be collecting entries from transMap just before we get them? (A
 small demo of that possibility follows.)
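
 A minimal sketch (assuming Guava's MapMaker, which is what transMap uses) of
 how a weakKeys()/weakValues() map can lose an entry when nothing else holds
 the value strongly; System.gc() is only a hint, so the null outcome is
 possible rather than guaranteed:

 import com.google.common.collect.MapMaker;
 import java.util.concurrent.ConcurrentMap;

 public class WeakValuesRaceDemo {
   public static void main(String[] args) {
     ConcurrentMap<Object, Object> map =
         new MapMaker().weakKeys().weakValues().makeMap();
     Object key = new Object();
     map.putIfAbsent(key, new Object()); // the value has no strong reference
     System.gc(); // a collection here may clear the value before the get()
     System.out.println(map.get(key));   // can print null, as on the server
   }
 }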

 Tried a minor change which seems to work:

 public TUGIContainingTransport getTransport(TTransport trans) {

   TUGIContainingTransport retTrans = transMap.get(trans);

   if ( retTrans == null ) {
     // UGI information is not available at connection setup time, it will be
     // set later via set_ugi() rpc.
     retTrans = new TUGIContainingTransport(trans);
     transMap.putIfAbsent(trans, retTrans);
   }
   return retTrans;
 }

 Holding the new transport in the retTrans local keeps a strong reference to
 it, so the GC cannot clear the weak map entry between the put and the return.


 My questions for hive and thrift experts:

 1.) Do we need to use a ConcurrentMap?
 ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new
 MapMaker().weakKeys().weakValues().makeMap();
 It does use == to compare keys (which might be the problem), and in this case
 we can't rely on the trans always being there in the transMap, even after a
 put, so the change above probably makes sense. (A sketch of the identity
 comparison follows.)
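
 A minimal sketch (assuming Guava on the classpath) showing that a
 MapMaker().weakKeys() map compares keys by identity (==), not by equals():

 import com.google.common.collect.MapMaker;
 import java.util.concurrent.ConcurrentMap;

 public class WeakKeysIdentityDemo {
   public static void main(String[] args) {
     ConcurrentMap<String, Integer> map = new MapMaker().weakKeys().makeMap();
     String k1 = new String("key");
     String k2 = new String("key"); // equals() k1, but a different instance
     map.put(k1, 1);
     System.out.println(map.get(k1)); // 1    (same instance is found)
     System.out.println(map.get(k2)); // null (identity lookup misses)
   }
 }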


 2.) Is it a better idea to use WeakHashMap with WeakReference instead? (was
 looking at org.apache.thrift.transport.TSaslServerTransport, esp. the change
 made by THRIFT-1468)

 e.g.
 private static Map<TTransport, WeakReference<TUGIContainingTransport>>
 transMap3 = Collections.synchronizedMap(new WeakHashMap<TTransport,
 WeakReference<TUGIContainingTransport>>());

 getTransport() would be something like:

 public TUGIContainingTransport getTransport(TTransport trans) {
   WeakReference<TUGIContainingTransport> ret = transMap3.get(trans);
   if (ret == null || ret.get() == null) {
     ret = new WeakReference<TUGIContainingTransport>(
         new TUGIContainingTransport(trans));
     transMap3.put(trans, ret); // No need for putIfAbsent().
     // Concurrent calls to getTransport() will pass in different TTransports.
   }
   return ret.get();
 }


 I did try 1.) above in our test environment and it does seem to resolve the
 problem, though I am not sure if I am introducing any other problem.

 Can someone help?

 Thanks
 Agatea

 On Thu, Aug 29, 2013 at 10:57 AM, agateaaa agate...@gmail.com wrote:

 
  On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote:
 
  Thanks Nitin

  There aren't too many connections in close_wait state, only one or two when
  we run into this. Most likely it's because of a dropped connection.

  I could not find any read or write timeouts we can set for the thrift
  server which will tell thrift to hold on to the client connection.
  See https://issues.apache.org/jira/browse/HIVE-2006, but that doesn't
  seem to have been implemented yet. We do have a client connection timeout
  set, but cannot find an equivalent setting for the server.

  We have a suspicion that this happens when we run two client processes
  which modify two distinct partitions of the same hive table. We put in a
  workaround so that the two hive client processes never run together, and so
  far things look ok, but we will keep monitoring.

  Could it be because the hive metastore server is not thread safe? Would
  running two alter table statements on two distinct partitions of the same
  table using two client connections cause problems like these, where the
  hive metastore server closes or drops the wrong client connection and
  leaves the other hanging?

  Agateaaa
 
 
 
 
  On Tue, Jul 30, 2013 

[jira] [Updated] (HIVE-5014) [HCatalog] Fix HCatalog build issue on Windows

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5014:
---

Status: Patch Available  (was: Open)

(Setting to patch-available to let jenkins pick it up)

 [HCatalog] Fix HCatalog build issue on Windows
 --

 Key: HIVE-5014
 URL: https://issues.apache.org/jira/browse/HIVE-5014
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-5014-1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well

2013-08-29 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754093#comment-13754093
 ] 

Phabricator commented on HIVE-5029:
---

sershe has commented on the revision HIVE-5029 [jira] direct SQL perf 
optimization cannot be tested well.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1684 the 
rollback is performed in finally. Here we only roll back to re-open it
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1783 the 
rollback is performed in finally. Here we only roll back to re-open it

REVISION DETAIL
  https://reviews.facebook.net/D12483

To: JIRA, ashutoshc, sershe


 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.D12483.2.patch, 
 HIVE-5029.patch, HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default until the testing problem is resolved.
 There are several possible solutions:
 1) Separate build for this setting. Seems like overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default. (A rough 
 sketch of option 3' follows.)
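
 A rough sketch of option 3' as a standalone illustration (PartitionFetcher 
 and VerifyingStore are hypothetical names, not the actual patch): wrap the 
 store so every partition fetch runs both paths and fails loudly on any 
 difference.

 import java.util.List;

 // Hypothetical stand-in for the relevant ObjectStore internals.
 interface PartitionFetcher {
   List<String> getPartitionsViaSql(String db, String tbl);
   List<String> getPartitionsViaJdo(String db, String tbl);
 }

 class VerifyingStore {
   private final PartitionFetcher fetcher;

   VerifyingStore(PartitionFetcher fetcher) { this.fetcher = fetcher; }

   // Runs both the direct-SQL and JDO paths and compares the results.
   List<String> getPartitions(String db, String tbl) {
     List<String> viaSql = fetcher.getPartitionsViaSql(db, tbl);
     List<String> viaJdo = fetcher.getPartitionsViaJdo(db, tbl);
     if (!viaSql.equals(viaJdo)) {
       throw new IllegalStateException(
           "Direct SQL and JDO results differ for " + db + "." + tbl);
     }
     return viaSql;
   }
 }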

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5173) Wincompat : Add .cmd/text/crlf to .gitattributes

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5173:
--

 Summary: Wincompat : Add .cmd/text/crlf to .gitattributes
 Key: HIVE-5173
 URL: https://issues.apache.org/jira/browse/HIVE-5173
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5173.patch

Add .cmd entry to .gitattributes
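
Presumably (an assumption from the issue title, not from the attached patch)
the entry takes the standard .gitattributes form, marking .cmd scripts as text
with CRLF line endings:

*.cmd text eol=crlf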

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5173) Wincompat : Add .cmd/text/crlf to .gitattributes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5173:
---

Attachment: HIVE-5173.patch

 Wincompat : Add .cmd/text/crlf to .gitattributes
 

 Key: HIVE-5173
 URL: https://issues.apache.org/jira/browse/HIVE-5173
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5173.patch


 Add .cmd entry to .gitattributes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5173) Wincompat : Add .cmd/text/crlf to .gitattributes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5173:
---

Status: Patch Available  (was: Open)

 Wincompat : Add .cmd/text/crlf to .gitattributes
 

 Key: HIVE-5173
 URL: https://issues.apache.org/jira/browse/HIVE-5173
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5173.patch


 Add .cmd entry to .gitattributes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW

2013-08-29 Thread Steven Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Wong updated HIVE-3104:
--

Description: 
Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. 
It also seems to work for multi-insert queries *not* using LATERAL VIEW. 
However, it doesn't work for multi-insert queries using LATERAL VIEW.

Here are some examples. In the below examples, I make use of the fact that a 
query with no partition filtering when run under hive.mapred.mode=strict 
fails.

--Table creation and population
DROP TABLE IF EXISTS test;
CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) 
FROM test;
INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), 
count(*) FROM test;

-- Query 1
-- This succeeds (using LATERAL VIEW with single insert)
set hive.mapred.mode=strict;
FROM test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2);

-- Query 2
-- This succeeds (NOT using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM test
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT col1
WHERE (part_col=2);

-- Query 3
-- This fails (using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1
WHERE (part_col=2);


  was:
Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. 
It also seems to work for multi-insert queries *not* using LATERAL VIEW. 
However, it doesn't work for multi-insert queries using LATERAL VIEW.

Here are some examples. In the below examples, I make use of the fact that a 
query with no partition filtering when run under hive.mapred.mode=strict 
fails.

--Table creation and population
DROP TABLE IF EXISTS test;
CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) 
FROM test;
INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), 
count(*) FROM test;

-- Query 1
-- This succeeds (using LATERAL VIEW with single insert)
set hive.mapred.mode=strict;
FROM partition_test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2);

-- Query 2
-- This succeeds (NOT using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM partition_test
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT col1
WHERE (part_col=2);

-- Query 3
-- This fails (using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1
WHERE (part_col=2);



 Predicate pushdown doesn't work with multi-insert statements using LATERAL 
 VIEW
 ---

 Key: HIVE-3104
 URL: https://issues.apache.org/jira/browse/HIVE-3104
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.9.0
 Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0
Reporter: Mark Grover
Assignee: Xuefu Zhang

 Predicate pushdown seems to work for single-insert queries using LATERAL 
 VIEW. It also seems to work for multi-insert queries *not* using LATERAL 
 VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW.
 Here are some examples. In the below examples, I make use of the fact that a 
 query with no partition filtering when run under hive.mapred.mode=strict 
 fails.
 --Table creation and population
 DROP TABLE IF EXISTS test;
 CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
 INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), 
 count(*) FROM test;
 INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), 
 count(*) FROM test;
 -- Query 1
 -- This succeeds (using LATERAL VIEW with single insert)
 set hive.mapred.mode=strict;
 FROM test
 LATERAL VIEW explode(col1) tmp AS exp_col1
 INSERT OVERWRITE DIRECTORY '/test/1'
 SELECT exp_col1
 WHERE (part_col=2);
 -- Query 2
 -- This succeeds (NOT using LATERAL VIEW with multi-insert)
 set hive.mapred.mode=strict;
 FROM test
 INSERT OVERWRITE DIRECTORY '/test/1'
 SELECT col1
 WHERE (part_col=2)
 INSERT OVERWRITE DIRECTORY '/test/2'
 SELECT col1
 WHERE (part_col=2);
 

[jira] [Updated] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW

2013-08-29 Thread Steven Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Wong updated HIVE-3104:
--

Description: 
Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. 
It also seems to work for multi-insert queries *not* using LATERAL VIEW. 
However, it doesn't work for multi-insert queries using LATERAL VIEW: It errors 
out right away with 'FAILED: SemanticException [Error 10041]: No partition 
predicate found for Alias "test" Table "test"'.

Here are some examples. In the below examples, I make use of the fact that a 
query with no partition filtering when run under hive.mapred.mode=strict 
fails.

--Table creation and population
DROP TABLE IF EXISTS test;
CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) 
FROM test;
INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), 
count(*) FROM test;

-- Query 1
-- This succeeds (using LATERAL VIEW with single insert)
set hive.mapred.mode=strict;
FROM test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2);

-- Query 2
-- This succeeds (NOT using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM test
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT col1
WHERE (part_col=2);

-- Query 3
-- This fails (using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1
WHERE (part_col=2);


  was:
Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. 
It also seems to work for multi-insert queries *not* using LATERAL VIEW. 
However, it doesn't work for multi-insert queries using LATERAL VIEW.

Here are some examples. In the below examples, I make use of the fact that a 
query with no partition filtering when run under hive.mapred.mode=strict 
fails.

--Table creation and population
DROP TABLE IF EXISTS test;
CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), count(*) 
FROM test;
INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), 
count(*) FROM test;

-- Query 1
-- This succeeds (using LATERAL VIEW with single insert)
set hive.mapred.mode=strict;
FROM test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2);

-- Query 2
-- This succeeds (NOT using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM test
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT col1
WHERE (part_col=2);

-- Query 3
-- This fails (using LATERAL VIEW with multi-insert)
set hive.mapred.mode=strict;
FROM test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1
WHERE (part_col=2);



 Predicate pushdown doesn't work with multi-insert statements using LATERAL 
 VIEW
 ---

 Key: HIVE-3104
 URL: https://issues.apache.org/jira/browse/HIVE-3104
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.9.0
 Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0
Reporter: Mark Grover
Assignee: Xuefu Zhang

 Predicate pushdown seems to work for single-insert queries using LATERAL 
 VIEW. It also seems to work for multi-insert queries *not* using LATERAL 
 VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW: 
 It errors out right away with 'FAILED: SemanticException [Error 10041]: No 
 partition predicate found for Alias "test" Table "test"'.
 Here are some examples. In the below examples, I make use of the fact that a 
 query with no partition filtering when run under hive.mapred.mode=strict 
 fails.
 --Table creation and population
 DROP TABLE IF EXISTS test;
 CREATE TABLE test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
 INSERT OVERWRITE TABLE test PARTITION (part_col=1) SELECT array(1,2), 
 count(*) FROM test;
 INSERT OVERWRITE TABLE test PARTITION (part_col=2) SELECT array(2,4,6), 
 count(*) FROM test;
 -- Query 1
 -- This succeeds (using LATERAL VIEW with single insert)
 set hive.mapred.mode=strict;
 FROM test
 LATERAL VIEW explode(col1) tmp AS exp_col1
 INSERT OVERWRITE DIRECTORY '/test/1'
 SELECT exp_col1
 WHERE (part_col=2);
 -- Query 2
 -- This succeeds 

[jira] [Created] (HIVE-5174) Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5174:
--

 Summary: Wincompat : junit.file.schema and hadoop.testcp, 
set-hadoop-test-classpath build configurability
 Key: HIVE-5174
 URL: https://issues.apache.org/jira/browse/HIVE-5174
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Adding junit.file.schema and hadoop.testcp configurability to build, adding 
set-hadoop-test-classpath target.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5174) Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5174:
---

Status: Patch Available  (was: Open)

 Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath 
 build configurability
 

 Key: HIVE-5174
 URL: https://issues.apache.org/jira/browse/HIVE-5174
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5174.patch


 Adding junit.file.schema and hadoop.testcp configurability to build, adding 
 set-hadoop-test-classpath target.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5174) Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath build configurability

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5174:
---

Attachment: HIVE-5174.patch

 Wincompat : junit.file.schema and hadoop.testcp, set-hadoop-test-classpath 
 build configurability
 

 Key: HIVE-5174
 URL: https://issues.apache.org/jira/browse/HIVE-5174
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5174.patch


 Adding junit.file.schema and hadoop.testcp configurability to build, adding 
 set-hadoop-test-classpath target.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5175) Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5175:
--

 Summary: Wcompat : adds HADOOP_TIME_ZONE env property and 
user.timezone sysproperty
 Key: HIVE-5175
 URL: https://issues.apache.org/jira/browse/HIVE-5175
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Adding the HADOOP_TIME_ZONE env property and the user.timezone system property 
as US/Pacific, needed for certain tests on Windows to pass.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5176) Wcompat : Changes for allowing various path compatibilities with Windows

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5176:
--

 Summary: Wcompat : Changes for allowing various path 
compatibilities with Windows
 Key: HIVE-5176
 URL: https://issues.apache.org/jira/browse/HIVE-5176
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


We need to make certain changes across the board to allow us to read/parse 
Windows paths. Some are escaping changes, and some involve being stricter about 
how we read paths (through URL encode/decode, etc.); an illustrative round-trip 
follows.
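
An illustrative round-trip (hypothetical, not from the patch) of a Windows path
through java.net URL encoding, the kind of strictness described above:

import java.net.URLDecoder;
import java.net.URLEncoder;

public class PathEncodeDemo {
  public static void main(String[] args) throws Exception {
    String winPath = "C:\\Users\\hive\\warehouse\\part=1";
    // Encode reserved characters so the path survives URI handling intact.
    String encoded = URLEncoder.encode(winPath, "UTF-8");
    System.out.println(encoded);  // C%3A%5CUsers%5Chive%5Cwarehouse%5Cpart%3D1
    System.out.println(URLDecoder.decode(encoded, "UTF-8")); // original path
  }
}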

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5178) WCompat : QTestUtil changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5178:
--

 Summary: WCompat : QTestUtil changes
 Key: HIVE-5178
 URL: https://issues.apache.org/jira/browse/HIVE-5178
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


Miscellaneous QTestUtil changes are needed to make tests work under Windows:

a) Aux jars need to be set up for minimr
b) Ignore empty test lines on Windows

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5177) WCompat : Retrying handler related changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5177:
--

 Summary: WCompat : Retrying handler related changes
 Key: HIVE-5177
 URL: https://issues.apache.org/jira/browse/HIVE-5177
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5179) WCompat : change script tests from bash to sh

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5179:
--

 Summary: WCompat : change script tests from bash to sh
 Key: HIVE-5179
 URL: https://issues.apache.org/jira/browse/HIVE-5179
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5180) Packaging : Add Windows installation and execution .cmd and .ps1 scripts

2013-08-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5180:
--

 Summary: Packaging : Add Windows installation and execution .cmd 
and .ps1 scripts
 Key: HIVE-5180
 URL: https://issues.apache.org/jira/browse/HIVE-5180
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-5178) WCompat : QTestUtil changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reassigned HIVE-5178:
--

Assignee: Sushanth Sowmyan

 WCompat : QTestUtil changes
 ---

 Key: HIVE-5178
 URL: https://issues.apache.org/jira/browse/HIVE-5178
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

 Miscellaneous QTestUtil changes are needed to make tests work under Windows:
 a) Aux jars need to be set up for minimr
 b) Ignore empty test lines on Windows

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4944) Hive Windows Scripts and Compatibility changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754123#comment-13754123
 ] 

Sushanth Sowmyan commented on HIVE-4944:


(Made subtask patches from the monolithic patches attached to this jira for 
easier reviewing, will upload each patch individually)

 Hive Windows Scripts and Compatibility changes
 --

 Key: HIVE-4944
 URL: https://issues.apache.org/jira/browse/HIVE-4944
 Project: Hive
  Issue Type: Bug
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: compat.patch, packaging.patch


 Porting patches that enable hive packaging and running under windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4944) Hive Windows Scripts and Compatibility changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754127#comment-13754127
 ] 

Sushanth Sowmyan commented on HIVE-4944:


Also, these patches aren't originally by me, I'll add in the names of each of 
the contributors in the individual patches, they're from contributors in 
Microsoft who developed against hive 0.9, and asked for my help in reviewing 
and forward-porting to trunk.

 Hive Windows Scripts and Compatibility changes
 --

 Key: HIVE-4944
 URL: https://issues.apache.org/jira/browse/HIVE-4944
 Project: Hive
  Issue Type: Bug
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: compat.patch, packaging.patch


 Porting patches that enable hive packaging and running under windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4944) Hive Windows Scripts and Compatibility changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754130#comment-13754130
 ] 

Sushanth Sowmyan commented on HIVE-4944:


Edit, above should read :

Also, these patches aren't originally by me, I'll add in the names of each of 
the contributors in the individual patches, they're from contributors in 
Microsoft who developed against hive 0.9, and asked for my help in reviewing 
and forward-porting to trunk and contributing it to apache hive.

 Hive Windows Scripts and Compatibility changes
 --

 Key: HIVE-4944
 URL: https://issues.apache.org/jira/browse/HIVE-4944
 Project: Hive
  Issue Type: Bug
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: compat.patch, packaging.patch


 Porting patches that enable hive packaging and running under windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5175) Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5175:
---

Attachment: HIVE-5175.patch

 Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty
 --

 Key: HIVE-5175
 URL: https://issues.apache.org/jira/browse/HIVE-5175
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5175.patch


 Adding the HADOOP_TIME_ZONE env property and the user.timezone system 
 property as US/Pacific, needed for certain tests on Windows to pass.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5175) Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5175:
---

Status: Patch Available  (was: Open)

 Wcompat : adds HADOOP_TIME_ZONE env property and user.timezone sysproperty
 --

 Key: HIVE-5175
 URL: https://issues.apache.org/jira/browse/HIVE-5175
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5175.patch


 Adding the HADOOP_TIME_ZONE env property and the user.timezone system 
 property as US/Pacific, needed for certain tests on Windows to pass.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5178) WCompat : QTestUtil changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5178:
---

Attachment: HIVE-5178.patch

 WCompat : QTestUtil changes
 ---

 Key: HIVE-5178
 URL: https://issues.apache.org/jira/browse/HIVE-5178
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5178.patch


 Miscellaneous QTestUtil changes are needed to make tests work under Windows:
 a) Aux jars need to be set up for minimr
 b) Ignore empty test lines on Windows

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5178) WCompat : QTestUtil changes

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5178:
---

Status: Patch Available  (was: Open)

 WCompat : QTestUtil changes
 ---

 Key: HIVE-5178
 URL: https://issues.apache.org/jira/browse/HIVE-5178
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5178.patch


 Miscellaneous QTestUtil changes are needed to make tests work under Windows:
 a) Aux jars need to be set up for minimr
 b) Ignore empty test lines on Windows

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5179) WCompat : change script tests from bash to sh

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5179:
---

Status: Patch Available  (was: Open)

 WCompat : change script tests from bash to sh
 -

 Key: HIVE-5179
 URL: https://issues.apache.org/jira/browse/HIVE-5179
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5179.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4965) Add support so that PTFs can stream their output; Windowing PTF should do this

2013-08-29 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4965:
--

Attachment: HIVE-4965.D12615.1.patch

hbutani requested code review of HIVE-4965 [jira] Add support so that PTFs can 
stream their output; Windowing PTF should do this.

Reviewers: JIRA, ashutoshc

fix lint issues

There is no need to create an output PTF Partition for the last PTF in a chain. 
For the Windowing PTF this should give a perf boost: we avoid creating 
temporary results for each UDAF and avoid populating an output Partition.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12615

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/30297/

To: JIRA, ashutoshc, hbutani


 Add support so that PTFs can stream their output; Windowing PTF should do this
 --

 Key: HIVE-4965
 URL: https://issues.apache.org/jira/browse/HIVE-4965
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
 Attachments: HIVE-4965.D12033.1.patch, HIVE-4965.D12615.1.patch


 There is no need to create an output PTF Partition for the last PTF in a 
 chain. For the Windowing PTF this should give a perf boost: we avoid 
 creating temporary results for each UDAF and avoid populating an output 
 Partition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5179) WCompat : change script tests from bash to sh

2013-08-29 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5179:
---

Attachment: HIVE-5179.patch

 WCompat : change script tests from bash to sh
 -

 Key: HIVE-5179
 URL: https://issues.apache.org/jira/browse/HIVE-5179
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5179.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns

2013-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754165#comment-13754165
 ] 

Ashutosh Chauhan commented on HIVE-5149:


Ah.. right! I missed that. I will take a look at the patch!

 ReduceSinkDeDuplication can pick the wrong partitioning columns
 ---

 Key: HIVE-5149
 URL: https://issues.apache.org/jira/browse/HIVE-5149
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch


 https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez

2013-08-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5095:
-

Attachment: HIVE-5095.1.patch

 Hive needs new operator walker for parallelization/optimization for tez
 ---

 Key: HIVE-5095
 URL: https://issues.apache.org/jira/browse/HIVE-5095
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt


 For tez to compute the number of reducers, we should be walking the operator 
 tree in a topological fashion so that the reducers down the tree get the 
 estimate from all parents. However, the current walkers in hive only walk the 
 operator tree in a depth-first fashion. We need to add a new walker for the 
 topological walk. Also, since information about the parent operators needs to 
 be propagated on a per parent basis, we need to retain some context across 
 operators to be passed to the child which the walker will co-ordinate.
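
 An illustrative sketch (not the actual patch) of such a topological walk over
 an operator DAG, using Kahn's algorithm; Node here is a hypothetical stand-in
 for Hive's Operator class:

 import java.util.ArrayDeque;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Deque;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;

 class Node {
   final List<Node> parents = new ArrayList<Node>();
   final List<Node> children = new ArrayList<Node>();
 }

 class TopoWalker {
   // Visit a node only after all of its parents have been visited, so a
   // child can aggregate estimates (e.g. reducer counts) from every parent.
   static List<Node> walk(Collection<Node> roots) {
     Map<Node, Integer> remaining = new HashMap<Node, Integer>();
     Deque<Node> ready = new ArrayDeque<Node>(roots); // roots have no parents
     List<Node> order = new ArrayList<Node>();
     while (!ready.isEmpty()) {
       Node n = ready.poll();
       order.add(n);
       for (Node c : n.children) {
         Integer left = remaining.get(c);
         int nowLeft = (left == null) ? c.parents.size() - 1 : left - 1;
         remaining.put(c, nowLeft);
         if (nowLeft == 0) {
           ready.add(c); // all parents visited; child is ready
         }
       }
     }
     return order;
   }
 }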

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez

2013-08-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5095:
-

Description: 
For tez to compute the number of reducers, we should be walking the operator 
tree in a topological fashion so that the reducers down the tree get the 
estimate from all parents. However, the current walkers in hive only walk the 
operator tree in a depth-first fashion. We need to add a new walker for the 
topological walk. Also, since information about the parent operators needs to 
be propagated on a per parent basis, we need to retain some context across 
operators to be passed to the child which the walker will co-ordinate.

NO PRECOMMIT TESTS (this is wip for the tez branch)

  was:For tez to compute the number of reducers, we should be walking the 
operator tree in a topological fashion so that the reducers down the tree get 
the estimate from all parents. However, the current walkers in hive only walk 
the operator tree in a depth-first fashion. We need to add a new walker for the 
topological walk. Also, since information about the parent operators needs to 
be propagated on a per parent basis, we need to retain some context across 
operators to be passed to the child which the walker will co-ordinate.


 Hive needs new operator walker for parallelization/optimization for tez
 ---

 Key: HIVE-5095
 URL: https://issues.apache.org/jira/browse/HIVE-5095
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt


 For tez to compute the number of reducers, we should be walking the operator 
 tree in a topological fashion so that the reducers down the tree get the 
 estimate from all parents. However, the current walkers in hive only walk the 
 operator tree in a depth-first fashion. We need to add a new walker for the 
 topological walk. Also, since information about the parent operators needs to 
 be propagated on a per parent basis, we need to retain some context across 
 operators to be passed to the child which the walker will co-ordinate.
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support

2013-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754190#comment-13754190
 ] 

Ashutosh Chauhan commented on HIVE-5168:


Hi [~wangfsh] I have granted you privs. You should be able to upload the doc 
now.

 Extend Hive for spatial query support
 -

 Key: HIVE-5168
 URL: https://issues.apache.org/jira/browse/HIVE-5168
 Project: Hive
  Issue Type: New Feature
Reporter: Fusheng Wang
  Labels: Hadoop-GIS, Spatial,

 I would like to propose to incorporate a newly developed spatial querying 
 component into Hive.
 We have recently developed a high performance MapReduce based spatial 
 querying system Hadoop-GIS, to support large scale spatial queries and 
 analytics. 
 Hadoop-GIS is a scalable and high performance spatial data warehousing system 
 for running large scale spatial queries on Hadoop. Hadoop-GIS supports 
 multiple types of spatial queries on MapReduce through space partitioning, 
 customizable spatial query engine RESQUE, implicit parallel spatial query 
 execution on MapReduce, and effective methods for amending query results 
 through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of 
 global partition indexing and customizable on demand local spatial indexing 
 to achieve efficient query processing. Hadoop-GIS is integrated into Hive to 
 support declarative spatial queries with an integrated architecture. 
 We have an alpha release. We look forward to contributors in the Hive 
 community contributing to the system. 
 github: https://github.com/hadoop-gis
 Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS
 References:
 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong 
 Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing 
 System Over MapReduce. In Proceedings of the 39th International Conference on 
 Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
 http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf
 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High 
 Performance Spatial Query System for Large Scale Medical Imaging Data. In 
 Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances 
 in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, 
 California, USA, November 6-9, 2012. 
 http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez

2013-08-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-5095.
--

Resolution: Fixed

 Hive needs new operator walker for parallelization/optimization for tez
 ---

 Key: HIVE-5095
 URL: https://issues.apache.org/jira/browse/HIVE-5095
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt


 For tez to compute the number of reducers, we should be walking the operator 
 tree in a topological fashion so that the reducers down the tree get the 
 estimate from all parents. However, the current walkers in hive only walk the 
 operator tree in a depth-first fashion. We need to add a new walker for the 
 topological walk. Also, since information about the parent operators needs to 
 be propagated on a per parent basis, we need to retain some context across 
 operators to be passed to the child which the walker will co-ordinate.
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5095) Hive needs new operator walker for parallelization/optimization for tez

2013-08-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754198#comment-13754198
 ] 

Gunther Hagleitner commented on HIVE-5095:
--

Committed .1 to branch. Thanks!

 Hive needs new operator walker for parallelization/optimization for tez
 ---

 Key: HIVE-5095
 URL: https://issues.apache.org/jira/browse/HIVE-5095
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5095.1.patch, HIVE-5095.WIP.patch.txt


 For tez to compute the number of reducers, we should be walking the operator 
 tree in a topological fashion so that the reducers down the tree get the 
 estimate from all parents. However, the current walkers in hive only walk the 
 operator tree in a depth-first fashion. We need to add a new walker for the 
 topological walk. Also, since information about the parent operators needs to 
 be propagated on a per parent basis, we need to retain some context across 
 operators to be passed to the child which the walker will co-ordinate.
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-5052) Set parallelism when generating the tez tasks

2013-08-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-5052.
--

Resolution: Duplicate

 Set parallelism when generating the tez tasks
 -

 Key: HIVE-5052
 URL: https://issues.apache.org/jira/browse/HIVE-5052
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5052.1.patch.txt, HIVE-5052.2.patch.txt


 In GenTezTask any intermediate task has parallelism set to 1. This needs to 
 be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)

2013-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754201#comment-13754201
 ] 

Ashutosh Chauhan commented on HIVE-4914:


If there are UDFs in the expression we should still do expression evaluation 
on the client, because:
* Otherwise the user's jar would be required on the server.
* It would be a security concern to run user code in the metastore server.

 filtering via partition name should be done inside metastore server 
 (implementation)
 

 Key: HIVE-4914
 URL: https://issues.apache.org/jira/browse/HIVE-4914
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-4914.01.patch, HIVE-4914.D12561.1.patch, 
 HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, 
 HIVE-4914.patch, HIVE-4914.patch


 Currently, if filter pushdown is impossible (which it is in most cases), the 
 client gets all partition names from the metastore, filters them, and asks 
 for the partitions by name for the filtered set.
 The metastore server code should do that instead: it should check whether 
 pushdown is possible and do it if so; otherwise it should do name-based 
 filtering.
 This saves the roundtrip that sends all partition names from the server to 
 the client, and also removes the need for pushdown-viability checking on 
 both sides.
 NO PRECOMMIT TESTS
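
Background for the name-based approach: metastore partition names encode the partition key values (e.g. ds=2013-08-29/region=us), which is what makes server-side filtering on names possible without materializing full partition objects. A minimal sketch of the idea, using hypothetical helper names rather than the actual ObjectStore/MetaStoreDirectSql code:

{noformat}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch of server-side, name-based partition filtering (hypothetical API). */
public class PartitionNameFilter {

  /** Parses "ds=2013-08-29/region=us" into {ds=2013-08-29, region=us}. */
  static Map<String, String> parseName(String partName) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String component : partName.split("/")) {
      int eq = component.indexOf('=');
      values.put(component.substring(0, eq), component.substring(eq + 1));
    }
    return values;
  }

  /**
   * Filters partition names against a predicate over the parsed key/values;
   * only the surviving names then need to be fetched as partition objects.
   */
  static List<String> filterNames(List<String> allNames,
                                  Predicate<Map<String, String>> filter) {
    List<String> matched = new ArrayList<>();
    for (String name : allNames) {
      if (filter.test(parseName(name))) {
        matched.add(name);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> names = List.of("ds=2013-08-28/region=us",
                                 "ds=2013-08-29/region=us",
                                 "ds=2013-08-29/region=eu");
    // e.g. the filter expression ds = '2013-08-29'
    List<String> result =
        filterNames(names, kv -> "2013-08-29".equals(kv.get("ds")));
    System.out.println(result); // [ds=2013-08-29/region=us, ds=2013-08-29/region=eu]
  }
}
{noformat}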

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754204#comment-13754204
 ] 

Hive QA commented on HIVE-4844:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600624/HIVE-4844.10.patch

{color:green}SUCCESS:{color} +1 2918 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/561/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/561/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.10.patch, HIVE-4844.1.patch.hack, 
 HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, 
 HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, 
 screenshot.png


 Add new char/varchar data types that support more SQL-compliant behavior, 
 such as SQL string comparison semantics, maximum length, etc.
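
For context on the "SQL string comparison semantics" mentioned above: in standard SQL, a CHAR(n) value is blank-padded to length n and compared with the trailing pad ignored, while a VARCHAR(n) value enforces a maximum length. The snippet below illustrates those generic semantics only; it is not Hive's implementation of the new types.

{noformat}
/** Illustrates generic SQL CHAR/VARCHAR semantics, not Hive's implementation. */
public class SqlStringSemantics {

  /** VARCHAR(n): values longer than n are truncated (or rejected) on write. */
  static String toVarchar(String value, int maxLength) {
    return value.length() <= maxLength ? value : value.substring(0, maxLength);
  }

  /** CHAR(n) comparison: trailing pad spaces are not significant. */
  static int compareChar(String a, String b) {
    return stripTrailingSpaces(a).compareTo(stripTrailingSpaces(b));
  }

  private static String stripTrailingSpaces(String s) {
    int end = s.length();
    while (end > 0 && s.charAt(end - 1) == ' ') {
      end--;
    }
    return s.substring(0, end);
  }

  public static void main(String[] args) {
    System.out.println(toVarchar("hello world", 5));      // "hello"
    System.out.println(compareChar("abc  ", "abc") == 0); // true
  }
}
{noformat}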

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5181) RetryingRawStore should not retry on logical failures (e.g. from commit)

2013-08-29 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-5181:
--

 Summary: RetryingRawStore should not retry on logical failures 
(e.g. from commit)
 Key: HIVE-5181
 URL: https://issues.apache.org/jira/browse/HIVE-5181
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Priority: Minor


RetryingRawStore retries calls. Some methods (e.g. drop_table_core in 
HiveMetaStore) explicitly call openTransaction and commitTransaction on the 
RawStore.
When the commit call fails due to a real issue, the call is retried, and 
instead of the real cause of the failure one gets a bogus exception about the 
transaction open count.

It doesn't make sense to retry logical errors, especially not from 
commitTransaction.
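
One plausible shape for such a fix, sketched with hypothetical exception types rather than the actual RetryingRawStore code: the retry loop distinguishes transient failures, which are worth retrying, from logical ones such as a failed commit, which must propagate immediately so the caller sees the real cause.

{noformat}
import java.util.concurrent.Callable;

/** Hypothetical: a failure that must not be retried (e.g. a failed commit). */
class LogicalStoreException extends RuntimeException {
  LogicalStoreException(String msg) { super(msg); }
}

/** Hypothetical: a transient failure (e.g. dropped connection) worth retrying. */
class TransientStoreException extends RuntimeException {
  TransientStoreException(String msg) { super(msg); }
}

public class RetryingCaller {
  /** Retries transient failures only; logical failures propagate at once. */
  static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
    TransientStoreException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (LogicalStoreException e) {
        throw e; // never retry: the real cause must reach the caller
      } catch (TransientStoreException e) {
        last = e; // record and retry
      }
    }
    throw last != null ? last : new IllegalStateException("maxAttempts < 1");
  }

  public static void main(String[] args) throws Exception {
    try {
      callWithRetry(() -> { throw new LogicalStoreException("commit failed"); }, 3);
    } catch (LogicalStoreException e) {
      System.out.println("not retried: " + e.getMessage()); // real cause, once
    }
  }
}
{noformat}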

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


  1   2   >