date:20130429


 [ 
https://issues.apache.org/jira/browse/HIVE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4381:
--

Attachment: HIVE-4381.D10551.2.patch

rusanu updated the revision HIVE-4381 [jira] Implement vectorized aggregation 
expressions.

  update patch after 4f7470d

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10551

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10551?vs=32901&id=33147#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ColumnExpression.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorAggregateExpression.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFAvgDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFAvgLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFCountDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFCountLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFMaxDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFMaxLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFMinDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFMinLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFStdPopDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFStdPopLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFStdSampDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFStdSampLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFSumDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFSumLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFVarPopDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFVarPopLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFVarSampDouble.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen/VectorUDAFVarSampLong.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFAvg.txt
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFCount.txt
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFMinMax.txt
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFSum.txt
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFVar.txt
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorGroupByOperator.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeCaptureOutputDesc.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeCaptureOutputOperator.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorDataSourceOperator.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorDataSourceOperatorDesc.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchBase.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromConcat.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromIterables.java
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromRepeats.java

To: JIRA, rusanu


 Implement vectorized aggregation expressions
 

 Key: HIVE-4381
 URL: https://issues.apache.org/jira/browse/HIVE-4381
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Remus Rusanu
  Labels: patch
 Fix For: vectorization-branch

 Attachments: HIVE-4381.D10449.1.patch, HIVE-4381.D10449.2.patch, 
 HIVE-4381.D10449.3.patch, HIVE-4381.D10449.4.patch, HIVE-4381.D10551.1.patch, 
 HIVE-4381.D10551.2.patch


 Vectorized implementation for sum, min, max, average and count.
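
 For readers unfamiliar with the idea, here is a rough sketch of what
 batch-at-a-time aggregation for these five functions might look like. It is
 written in Python rather than the Java of the patch, and the names
 LongColumnBatch and Aggregates, along with the null-flag layout, are
 assumptions made for illustration; they are not taken from the patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LongColumnBatch:
    """Hypothetical stand-in for one long column of a row batch."""
    values: List[int]
    is_null: List[bool]
    size: int


@dataclass
class Aggregates:
    """Running state for count, sum, min and max; average is derived."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def average(self) -> Optional[float]:
        # Average of zero non-null rows is undefined, so return None.
        return self.total / self.count if self.count else None


def aggregate_batch(state: Aggregates, batch: LongColumnBatch) -> None:
    """Fold a whole batch into the running state in one tight loop."""
    for i in range(batch.size):
        if batch.is_null[i]:
            continue  # null rows do not contribute to any aggregate
        v = batch.values[i]
        state.count += 1
        state.total += v
        state.minimum = v if state.minimum is None else min(state.minimum, v)
        state.maximum = v if state.maximum is None else max(state.maximum, v)
```

 The point of the sketch is only that the operator is invoked once per batch
 instead of once per row; the real implementation may differ in every detail.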

--
This message is automatically generated by JIRA.
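If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira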

[jira] [Commented] (HIVE-4381) Implement vectorized aggregation expressions


[ 
https://issues.apache.org/jira/browse/HIVE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644563#comment-13644563
 ] 

Phabricator commented on HIVE-4381:
---

ashutoshc has accepted the revision HIVE-4381 [jira] Implement vectorized 
aggregation expressions.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D10551

BRANCH
  vectorization

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, rusanu


 Implement vectorized aggregation expressions
 

 Key: HIVE-4381
 URL: https://issues.apache.org/jira/browse/HIVE-4381
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Remus Rusanu
  Labels: patch
 Fix For: vectorization-branch

 Attachments: HIVE-4381.D10449.1.patch, HIVE-4381.D10449.2.patch, 
 HIVE-4381.D10449.3.patch, HIVE-4381.D10449.4.patch, HIVE-4381.D10551.1.patch, 
 HIVE-4381.D10551.2.patch


 Vectorized implementation for sum, min, max, average and count.


[jira] [Updated] (HIVE-4381) Implement vectorized aggregation expressions


 [ 
https://issues.apache.org/jira/browse/HIVE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4381:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch. Thanks, Remus!

 Implement vectorized aggregation expressions
 

 Key: HIVE-4381
 URL: https://issues.apache.org/jira/browse/HIVE-4381
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Remus Rusanu
  Labels: patch
 Fix For: vectorization-branch

 Attachments: HIVE-4381.D10449.1.patch, HIVE-4381.D10449.2.patch, 
 HIVE-4381.D10449.3.patch, HIVE-4381.D10449.4.patch, HIVE-4381.D10551.1.patch, 
 HIVE-4381.D10551.2.patch


 Vectorized implementation for sum, min, max, average and count.


[jira] [Updated] (HIVE-4383) Implement vectorized string column-scalar filters


 [ 
https://issues.apache.org/jira/browse/HIVE-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4383:
---

Status: Open  (was: Patch Available)

Patch is not applying cleanly on branch. Can you please rebase it?

 Implement vectorized string column-scalar filters
 -

 Key: HIVE-4383
 URL: https://issues.apache.org/jira/browse/HIVE-4383
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4383.1.patch, HIVE-4383.2.patch


 Create patch for implementing string columns compared with scalars as 
 vectorized filters, and apply it to vectorization branch.


[jira] [Updated] (HIVE-4370) Change ORC tree readers to return batches of rows instead of a row


 [ 
https://issues.apache.org/jira/browse/HIVE-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4370:
---

Affects Version/s: vectorization-branch
   Status: Open  (was: Patch Available)

Patch is not applying cleanly. Can you please rebase it?

 Change ORC tree readers to return batches of rows instead of a row 
 ---

 Key: HIVE-4370
 URL: https://issues.apache.org/jira/browse/HIVE-4370
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: HIVE-4370.1.patch, HIVE-4370.2.patch, HIVE-4370.3.patch


 Change ORC Record reader and Tree readers to return a set of Rows instead of 
 a row. 


[jira] [Commented] (HIVE-4389) thrift files are re-generated by compiling

2013-04-29 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644631#comment-13644631
 ] 

Gang Tim Liu commented on HIVE-4389:


+1

 thrift files are re-generated by compiling
 --

 Key: HIVE-4389
 URL: https://issues.apache.org/jira/browse/HIVE-4389
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4389.1.patch


 I am not sure what is going on, but there seem to be a bunch of thrift 
 changes if I run ant thriftif.


[jira] [Updated] (HIVE-4370) Change ORC tree readers to return batches of rows instead of a row

2013-04-29 Thread Sarvesh Sakalanaga (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4370:
-

Attachment: HIVE-4370.4.patch

Sure. Attached a new patch which is rebased.

 Change ORC tree readers to return batches of rows instead of a row 
 ---

 Key: HIVE-4370
 URL: https://issues.apache.org/jira/browse/HIVE-4370
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: HIVE-4370.1.patch, HIVE-4370.2.patch, HIVE-4370.3.patch, 
 HIVE-4370.4.patch, HIVE-4370.4.patch, HIVE-4370.4.patch


 Change ORC Record reader and Tree readers to return a set of Rows instead of 
 a row. 


[jira] [Updated] (HIVE-4370) Change ORC tree readers to return batches of rows instead of a row

2013-04-29 Thread Sarvesh Sakalanaga (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4370:
-

Attachment: HIVE-4370.4.patch

Hi Ashutosh,
Sorry about that. Can you try the one attached?

Thanks,
Sarvesh



 Change ORC tree readers to return batches of rows instead of a row 
 ---

 Key: HIVE-4370
 URL: https://issues.apache.org/jira/browse/HIVE-4370
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: HIVE-4370.1.patch, HIVE-4370.2.patch, HIVE-4370.3.patch, 
 HIVE-4370.4.patch, HIVE-4370.4.patch, HIVE-4370.4.patch


 Change ORC Record reader and Tree readers to return a set of Rows instead of 
 a row. 


[jira] [Updated] (HIVE-4441) [WebHCat] WebHCat does not honor user home directory


 [ 
https://issues.apache.org/jira/browse/HIVE-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4441:
-

Attachment: HIVE-4441-1.patch

 [WebHCat] WebHCat does not honor user home directory
 

 Key: HIVE-4441
 URL: https://issues.apache.org/jira/browse/HIVE-4441
 Project: Hive
  Issue Type: Bug
Reporter: Daniel Dai
 Attachments: HIVE-4441-1.patch


 If I submit a job as user A and I specify statusdir as a relative path, I 
 would expect results to be stored in the folder relative to the user A's home 
 folder.
 For example, if I run:
 {code}curl -s -d user.name=hdinsightuser -d 'execute=show+tables;' -d 
 statusdir=pokes.output 'http://localhost:50111/templeton/v1/hive'{code}
 I get the results under:
 {code}/user/hdp/pokes.output{code}
 And I expect them to be under:
 {code}/user/hdinsightuser/pokes.output{code}
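
 The expected behaviour can be sketched as follows. This is only an
 illustration of the path rule described above, not WebHCat code: the
 function name resolve_status_dir and the /user home root are assumptions.

```python
import posixpath


def resolve_status_dir(status_dir: str, user_name: str,
                       home_root: str = "/user") -> str:
    """Resolve statusdir against the submitting user's home directory.

    Absolute paths are left alone; relative paths are anchored at
    <home_root>/<user_name>, which is the behaviour the report expects.
    """
    if posixpath.isabs(status_dir):
        return posixpath.normpath(status_dir)
    return posixpath.normpath(posixpath.join(home_root, user_name, status_dir))
```

 With this rule, statusdir=pokes.output submitted as hdinsightuser lands
 under that user's home folder rather than under the service account's.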


[jira] [Updated] (HIVE-4441) [HCatalog] WebHCat does not honor user home directory


 [ 
https://issues.apache.org/jira/browse/HIVE-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4441:
-

Summary: [HCatalog] WebHCat does not honor user home directory  (was: 
[WebHCat] WebHCat does not honor user home directory)

 [HCatalog] WebHCat does not honor user home directory
 -

 Key: HIVE-4441
 URL: https://issues.apache.org/jira/browse/HIVE-4441
 Project: Hive
  Issue Type: Bug
Reporter: Daniel Dai
 Attachments: HIVE-4441-1.patch


 If I submit a job as user A and I specify statusdir as a relative path, I 
 would expect results to be stored in the folder relative to the user A's home 
 folder.
 For example, if I run:
 {code}curl -s -d user.name=hdinsightuser -d 'execute=show+tables;' -d 
 statusdir=pokes.output 'http://localhost:50111/templeton/v1/hive'{code}
 I get the results under:
 {code}/user/hdp/pokes.output{code}
 And I expect them to be under:
 {code}/user/hdinsightuser/pokes.output{code}


[jira] [Created] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig

Daniel Dai created HIVE-4444:


 Summary: [HCatalog] WebHCat Hive should support equivalent 
parameters as Pig 
 Key: HIVE-4444
 URL: https://issues.apache.org/jira/browse/HIVE-4444
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai


Currently there are no files and args parameters in Hive. We shall add them 
to make Hive similar to Pig.


[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Component/s: HCatalog

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai

 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects with jobid but no detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID
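
 The difference between the two listing calls proposed above can be sketched
 like this. The in-memory JOBS table, the function names and the shape of the
 returned objects are all assumptions for illustration; the proposal does not
 specify the JSON layout.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory job table standing in for the server's job state.
JOBS: Dict[str, dict] = {
    "job_001": {"status": "RUNNING", "user": "alice"},
    "job_002": {"status": "SUCCEEDED", "user": "bob"},
}


def get_job(job_id: str) -> dict:
    """Sketch of GET jobs/jobID: one job with its detailed info."""
    return {"id": job_id, "detail": dict(JOBS[job_id])}


def get_jobs(fields: Optional[str] = None) -> List[dict]:
    """Sketch of GET jobs: ids only by default, full detail when fields == '*'."""
    if fields == "*":
        return [get_job(job_id) for job_id in sorted(JOBS)]
    return [{"id": job_id} for job_id in sorted(JOBS)]
```

 A client that wants a summary of every job would then make one call with
 fields=* instead of one listing call plus one call per job id.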


[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call


 [ 
https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4442:
-

Component/s: HCatalog

 [HCatalog] WebHCat should not override user.name parameter for Queue call
 -

 Key: HIVE-4442
 URL: https://issues.apache.org/jira/browse/HIVE-4442
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Daniel Dai

 Currently templeton for the Queue call uses the user.name to filter the 
 results of the call in addition to the default security.
 Ideally the filter is an optional parameter to the call independent of the 
 security check.
 I would suggest an additional parameter to GET queue (jobs) that gives you 
 all the jobs a user has permission to see:
 GET queue?showall=true
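
 A small sketch of the separation being asked for: the security check always
 runs, and the owner filter is applied only when showall is not set. The Job
 type, the can_view callback and the function name are hypothetical and are
 not part of templeton.

```python
from typing import Callable, List, NamedTuple


class Job(NamedTuple):
    job_id: str
    owner: str


def visible_jobs(jobs: List[Job], caller: str,
                 can_view: Callable[[str, Job], bool],
                 show_all: bool = False) -> List[Job]:
    """Return the jobs to list for `caller`.

    The permission check is independent of the filter: it is applied in
    both modes. Without show_all, results are narrowed to the caller's
    own jobs, which mirrors the current user.name filtering.
    """
    permitted = [job for job in jobs if can_view(caller, job)]
    if show_all:
        return permitted
    return [job for job in permitted if job.owner == caller]
```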


[jira] [Updated] (HIVE-4441) [HCatalog] WebHCat does not honor user home directory


 [ 
https://issues.apache.org/jira/browse/HIVE-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4441:
-

Component/s: HCatalog

 [HCatalog] WebHCat does not honor user home directory
 -

 Key: HIVE-4441
 URL: https://issues.apache.org/jira/browse/HIVE-4441
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4441-1.patch


 If I submit a job as user A and I specify statusdir as a relative path, I 
 would expect results to be stored in the folder relative to the user A's home 
 folder.
 For example, if I run:
 {code}curl -s -d user.name=hdinsightuser -d 'execute=show+tables;' -d 
 statusdir=pokes.output 'http://localhost:50111/templeton/v1/hive'{code}
 I get the results under:
 {code}/user/hdp/pokes.output{code}
 And I expect them to be under:
 {code}/user/hdinsightuser/pokes.output{code}


[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call


 [ 
https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4442:
-

Attachment: HIVE-4442-1.patch

 [HCatalog] WebHCat should not override user.name parameter for Queue call
 -

 Key: HIVE-4442
 URL: https://issues.apache.org/jira/browse/HIVE-4442
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4442-1.patch


 Currently templeton for the Queue call uses the user.name to filter the 
 results of the call in addition to the default security.
 Ideally the filter is an optional parameter to the call independent of the 
 security check.
 I would suggest an additional parameter to GET queue (jobs) that gives you 
 all the jobs a user has permission to see:
 GET queue?showall=true


[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Attachment: HIVE-4443-1.patch

Attached patch. The patch also contains e2e tests for HIVE-4442, because 
HIVE-4442 and HIVE-4443 are closely intertwined and it is hard to separate 
the tests.

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4443-1.patch


 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects with jobid but no detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID


[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Attachment: (was: HIVE-4443-1.patch)

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4443-1.patch


 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects with jobid but no detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID


[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Attachment: HIVE-4443-1.patch

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4443-1.patch


 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects with jobid but no detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID


[jira] [Commented] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call


[ 
https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644674#comment-13644674
 ] 

Daniel Dai commented on HIVE-4442:
--

Attached patch. Note that the e2e tests are intertwined with those for 
HIVE-4443, so I have included all the tests in HIVE-4443.

 [HCatalog] WebHCat should not override user.name parameter for Queue call
 -

 Key: HIVE-4442
 URL: https://issues.apache.org/jira/browse/HIVE-4442
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4442-1.patch


 Currently templeton for the Queue call uses the user.name to filter the 
 results of the call in addition to the default security.
 Ideally the filter is an optional parameter to the call independent of the 
 security check.
 I would suggest an additional parameter to GET queue (jobs) that gives you 
 all the jobs a user has permission to see:
 GET queue?showall=true


[jira] [Updated] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig


 [ 
https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4444:
-

Attachment: HIVE-4444-1.patch

 [HCatalog] WebHCat Hive should support equivalent parameters as Pig 
 

 Key: HIVE-4444
 URL: https://issues.apache.org/jira/browse/HIVE-4444
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4444-1.patch


 Currently there are no files and args parameters in Hive. We shall add them 
 to make Hive similar to Pig.


Weird issue with running tests

2013-04-29 Thread Sushanth Sowmyan

Hi folks,

I'm running into a weird issue with testing HIVE-3682. I don't think
it's so much to do with the jira at hand itself, as it is to do with
the test or the testing framework.

Basically, if I run the .q file test itself, it succeeds. If I run it
as part of a broader ant test, it fails, and seemingly consistently.
What's more, the reason it fails is that the produced .out file does
not match the golden output in an interesting way. I'm attaching the
two files with this mail if anyone wants to look at it further, but
the diff between them is as follows:

926,929c926
< REHOOK: query: create table array_table (a array<string>, b array<string>)
< ROW FORMAT DELIMITED
< FIELDS TERMINATED BY '\t'
< COLLECTION ITEMS TERMINATED BY ','^@163:val_163^@
---
> 163:val_163
943c940
< REHOOK: type: CREATETABLE^@444:val_444^@
---
> 444:val_444
1027a1025,1029
> PREHOOK: query: create table array_table (a array<string>, b array<string>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> PREHOOK: type: CREATETABLE

Note#1 : the PREHOOK log line for a create table seems to have been
logged before the !cat that preceded it finished logging.
Note#2 : The ^@ separators seem to be indicating a switch between
streams writing out to the .out file.
Note#3 : The P in PREHOOK seems to get gobbled up each time this happens.

To me, this looks like !cat runs in a separate thread, or at least writes
to a separate PrintStream that hasn't completely flushed its buffers. Is
there a way to force this? I mean, yes, I suppose I can go
edit QTestUtil.execute so as to put an explicit flush, but I won't
know if that works or not till after I do a complete test run (given
that a solo .q run succeeds), and even then, if it succeeds, I won't
know if that is what fixed it.
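
The ordering half of this theory can be illustrated with a small sketch. It
is Python, not the Hive test harness, and SharedSink and run are made-up
names; it only shows how two buffered writers sharing one sink can emit lines
out of order unless the first is flushed, and it does not try to reproduce
the ^@ bytes or the missing P.

```python
import io


class SharedSink(io.RawIOBase):
    """Hypothetical unbuffered sink that records writes in arrival order."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.chunks.append(bytes(b))
        return len(b)


def run(flush_after_cat: bool) -> bytes:
    sink = SharedSink()
    cat_out = io.BufferedWriter(sink)   # stands in for the !cat output stream
    hook_log = io.BufferedWriter(sink)  # stands in for the PREHOOK logger

    cat_out.write(b"163:val_163\n")     # sits in cat_out's buffer for now
    if flush_after_cat:
        cat_out.flush()                 # the explicit flush being considered
    hook_log.write(b"PREHOOK: type: CREATETABLE\n")
    hook_log.flush()
    cat_out.flush()                     # late flush; a no-op if already flushed
    return b"".join(sink.chunks)
```

Without the early flush the PREHOOK line reaches the sink first even though
it was written second; with it, the lines arrive in the order written.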

Has anyone hit something like this before or have any thoughts/theories?

Thanks,
-Sushanth

Re: Weird issue with running tests

2013-04-29 Thread Ashutosh Chauhan

Hi Sushanth,

I would suggest trying dfs -cat in your test instead of !cat. For ! commands
we fork a separate process, so it's possible the streams get mangled, whereas
dfs -cat would get you what you want without needing to fork.

Thanks,
Ashutosh


On Mon, Apr 29, 2013 at 10:46 AM, Sushanth Sowmyan khorg...@gmail.comwrote:

 Hi folks,

 I'm running into a weird issue with testing HIVE-3682. I don't think
 it's so much to do with the jira at hand itself, as it is to do with
 the test or the testing framework.

 Basically, if I run the .q file test itself, it succeeds. If I run it
 as part of a broader ant test, it fails, and seemingly consistently.
 What's more, the reason it fails is that the produced .out file does
 not match the golden output in an interesting way. I'm attaching the
 two files with this mail if anyone wants to look at it further, but
 the diff between them is as follows:

 926,929c926
  REHOOK: query: create table array_table (a arraystring, b
 arraystring)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','^@163:val_163^@
 ---
  163:val_163
 943c940
  REHOOK: type: CREATETABLE^@444:val_444^@
 ---
  444:val_444
 1027a1025,1029
  PREHOOK: query: create table array_table (a arraystring, b
 arraystring)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  PREHOOK: type: CREATETABLE

 Note#1 : The PREHOOK log line for a create table seems to have been
 logged before the !cat that preceded it finished logging.
 Note#2 : The ^@ separators seem to indicate a switch between
 streams writing out to the .out file.
 Note#3 : The P in PREHOOK seems to get gobbled up each time this happens.

 To me, this looks like !cat runs in a separate thread, or at least writes to a
 separate PrintStream that hasn't quite finished flushing its
 buffers. Is there a way to force this? I mean, yes, I suppose I can go
 edit QTestUtil.execute so as to put in an explicit flush, but I won't
 know if that works or not till after I do a complete test run (given
 that a solo .q run succeeds), and even then, if it succeeds, I won't
 know if that is what fixed it.

 Has anyone hit something like this before or have any thoughts/theories?

 Thanks,
 -Sushanth
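
The interleaving described in this thread can be reproduced in miniature. The following is a minimal Python sketch, not Hive's QTestUtil code: two writers with private buffers share one sink, and the order of their flushes (not the order of their writes) decides what lands in the file. ToyWriter and run are hypothetical names chosen for this illustration.

```python
import io


class ToyWriter:
    """Hypothetical writer: holds text in a private buffer until flush()."""

    def __init__(self, sink):
        self.sink = sink
        self.pending = []

    def write(self, text):
        self.pending.append(text)

    def flush(self):
        # Only now does the buffered text reach the shared sink.
        self.sink.write("".join(self.pending))
        self.pending.clear()


def run(flush_child_first):
    sink = io.StringIO()      # stands in for the .out file
    child = ToyWriter(sink)   # stands in for the forked "!cat"
    hook = ToyWriter(sink)    # stands in for the PREHOOK logger

    child.write("163:val_163\n")
    if flush_child_first:
        child.flush()         # explicit flush before the next statement logs
    hook.write("PREHOOK: type: CREATETABLE\n")
    hook.flush()
    child.flush()             # late flush; a no-op if already flushed
    return sink.getvalue()
```

With flush_child_first=False the PREHOOK line lands ahead of the cat output even though it was written later, which matches Note#1. Running the command in-process on a single stream, as dfs -cat is said to do above, removes the second buffer altogether.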

Re: Weird issue with running tests

2013-04-29 Thread Sushanth Sowmyan

Aha, that makes sense. Thanks!

On Mon, Apr 29, 2013 at 10:55 AM, Ashutosh Chauhan hashut...@apache.org wrote:
 Hi Sushanth,

 I would suggest trying dfs -cat in your test instead of !cat. For ! commands
 we fork a separate process, so it's possible for the streams to get mangled, but
 dfs -cat gets you what you want without needing to fork.

 Thanks,
 Ashutosh



[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Attachment: (was: HIVE-4443-1.patch)

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai

 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects containing the jobid but no detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID
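
 To make the proposed split concrete, here is a small Python sketch of the two list calls. It is an illustration under assumptions, not the HCatalog implementation: the in-memory job_store, the field names (status, user), and the function list_jobs are all hypothetical.

```python
def list_jobs(job_store, fields=None):
    """Return job summaries; include details only when fields == "*".

    job_store is a hypothetical mapping of job id -> detail dict.
    """
    if fields == "*":
        # One call returns every job with its detailed info.
        return [{"id": job_id, **job_store[job_id]} for job_id in sorted(job_store)]
    # Default: ids only, mirroring what GET queue returns today.
    return [{"id": job_id} for job_id in sorted(job_store)]


jobs = {
    "job_2": {"status": "RUNNING", "user": "hive"},
    "job_1": {"status": "SUCCEEDED", "user": "hive"},
}
summary = list_jobs(jobs)
detailed = list_jobs(jobs, fields="*")
```

 The point of the design is that a client showing a job summary makes one request instead of one request per job id.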

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Attachment: HIVE-4443-1.patch

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4443-1.patch


 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects containing the jobid but no detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-29 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130428-branch-0.11-bugfix.txt

Patch with only the bug fix. The previously failing tests pass now.

 merge map-job followed by map-reduce job
 

 Key: HIVE-3952
 URL: https://issues.apache.org/jira/browse/HIVE-3952
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.11.0

 Attachments: hive.3952.1.patch, HIVE-3952-20130226.txt, 
 HIVE-3952-20130227.1.txt, HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, 
 HIVE-3952-20130424.txt, HIVE-3952-20130428-branch-0.11-bugfix.txt, 
 HIVE-3952-20130428-branch-0.11.txt, HIVE-3952-20130428-branch-0.11-v2.txt


 Consider a query like:
 select count(*) FROM
 ( select idOne, idTwo, value FROM
   bigTable   
   JOIN
 
   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
   
   ) firstjoin 
 
 JOIN  
 
 smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
 where smallTableOne and smallTableTwo are smaller than 
 hive.auto.convert.join.noconditionaltask.size and
 hive.auto.convert.join.noconditionaltask is set to true.
 The joins are collapsed into mapjoins, and it leads to a map-only job
 (for the map-joins) followed by a map-reduce job (for the group by).
 Ideally, the map-only job should be merged with the following map-reduce job.
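
 The merge asked for here can be pictured with a toy plan representation. This is a hedged Python sketch only: the stage dictionaries and the function merge_map_only are hypothetical and do not reflect how Hive's query processor models or rewrites its task graph.

```python
def merge_map_only(stages):
    """Fold each map-only stage into the map phase of the stage after it.

    Each stage is a hypothetical dict: {"map": [...ops], "reduce": [...ops] or None}.
    """
    merged = []
    for stage in stages:
        if merged and merged[-1]["reduce"] is None:
            # Previous stage had no reducer: run its map work in this stage's mappers.
            previous = merged.pop()
            stage = {"map": previous["map"] + stage["map"], "reduce": stage["reduce"]}
        merged.append(stage)
    return merged


plan = [
    {"map": ["mapjoin smallTableOne", "mapjoin smallTableTwo"], "reduce": None},
    {"map": ["group by (partial)"], "reduce": ["count(*)"]},
]
merged_plan = merge_map_only(plan)
```

 In this toy model the two jobs from the description collapse into one map-reduce job whose mappers perform both map-joins before the partial aggregation.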


[jira] [Updated] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice


 [ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3682:
--

Attachment: HIVE-3682.D10275.4.patch

khorgath updated the revision HIVE-3682 [jira] when output hive table to 
file,users should could have a separator of their own choice.

  Converted !cat to dfs -cat to prevent issues with multiple streams writing to 
the .out file

Reviewers: ashutoshc, JIRA, omalley

REVISION DETAIL
  https://reviews.facebook.net/D10275

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10275?vs=33045&id=33153#toc

BRANCH
  HIVE-3682

ARCANIST PROJECT
  hive

AFFECTED FILES
  data/files/array_table.txt
  data/files/map_table.txt
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
  ql/src/test/queries/clientpositive/insert_overwrite_local_directory_1.q
  ql/src/test/results/clientpositive/insert_overwrite_local_directory_1.q.out

To: JIRA, ashutoshc, omalley, khorgath


 when output hive table to file,users should could have a separator of their 
 own choice
 --

 Key: HIVE-3682
 URL: https://issues.apache.org/jira/browse/HIVE-3682
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.8.1
 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
 java version 1.6.0_25
 hadoop-0.20.2-cdh3u0
 hive-0.8.1
Reporter: caofangkun
Assignee: Sushanth Sowmyan
 Fix For: 0.11.0

 Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
 HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, 
 HIVE-3682.with.serde.patch


 By default, when a Hive table is output to a file, its columns are 
 separated by the ^A character (that is, \001).
 Users should be able to set a separator of their own choice.
 Usage Example:
 create table for_test (key string, value string);
 load data local inpath './in1.txt' into table for_test;
 select * from for_test;
 UT-01: default separator is \001, line separator is \n
 insert overwrite local directory './test-01' 
 select * from src ;
 create table array_table (a array<string>, b array<string>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ',';
 load data local inpath '../hive/examples/files/arraytest.txt' overwrite into 
 table table2;
 CREATE TABLE map_table (foo STRING , bar MAP<STRING, STRING>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 STORED AS TEXTFILE;
 UT-02: defined field separator as ':'
 insert overwrite local directory './test-02' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-03: the line separator is NOT ALLOWED to be defined as another separator 
 insert overwrite local directory './test-03' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-04: define map separators 
 insert overwrite local directory './test-04' 
 row format delimited 
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 select * from src;
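
 As a rough illustration of what the three delimiters control, the Python sketch below serializes one row. It is not the Hive SerDe: format_row is a hypothetical helper, and the only facts taken from Hive are the default delimiters \001 (fields), \002 (collection items), and \003 (map keys).

```python
def format_row(columns, field_sep="\x01", item_sep="\x02", key_sep="\x03"):
    """Join one row's columns using the given delimiters (hypothetical helper)."""

    def render(value):
        if isinstance(value, dict):
            # Map column: key<key_sep>value pairs joined by the collection separator.
            return item_sep.join(f"{k}{key_sep}{v}" for k, v in value.items())
        if isinstance(value, (list, tuple)):
            # Array column: items joined by the collection separator.
            return item_sep.join(str(item) for item in value)
        return str(value)

    return field_sep.join(render(column) for column in columns)


default_line = format_row(["163", "val_163"])
ut04_line = format_row(["k1", {"a": "1", "b": "2"}], field_sep="\t", item_sep=",", key_sep=":")
```

 With the UT-04 delimiters the map column becomes a:1,b:2 after a tab, whereas the default output keeps the control characters that the issue asks to make configurable.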


[jira] [Updated] (HIVE-4383) Implement vectorized string column-scalar filters


 [ 
https://issues.apache.org/jira/browse/HIVE-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4383:
--

Attachment: HIVE-4383.3.patch

Updated patch to apply to current version of public vectorization branch

 Implement vectorized string column-scalar filters
 -

 Key: HIVE-4383
 URL: https://issues.apache.org/jira/browse/HIVE-4383
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4383.1.patch, HIVE-4383.2.patch, HIVE-4383.3.patch


 Create patch for implementing string columns compared with scalars as 
 vectorized filters, and apply it to vectorization branch.
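
 For readers unfamiliar with the idea, a vectorized filter evaluates the predicate over a whole batch of rows and records the surviving row indexes instead of emitting rows one by one. The Python sketch below shows that shape under stated assumptions: the dict-based batch and the function name are hypothetical stand-ins, null handling is left out, and the real patch is generated Java code.

```python
def filter_string_col_equal_scalar(batch, column, scalar):
    """Keep only the rows of `batch` whose string column equals `scalar`.

    `batch` is a hypothetical dict with keys: columns, selected,
    selected_in_use, size. The filter narrows the selection in place.
    """
    values = batch["columns"][column]
    if batch["selected_in_use"]:
        candidates = batch["selected"][: batch["size"]]
    else:
        candidates = range(batch["size"])
    # Tight loop over one column; no per-row object creation.
    kept = [i for i in candidates if values[i] == scalar]
    batch["selected"] = kept
    batch["selected_in_use"] = True
    batch["size"] = len(kept)
    return batch


batch = {
    "columns": {0: [b"a", b"b", b"a", b"c"]},
    "selected": [],
    "selected_in_use": False,
    "size": 4,
}
filter_string_col_equal_scalar(batch, 0, b"a")
```

 Because the result is just a narrower selection, a second filter applied to the same batch only visits the rows that survived the first one.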


Review Request: Implement vectorized string column-scalar filters

2013-04-29 Thread Eric Hanson


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10840/
---

Review request for hive.


Description
---

Implement vectorized string column-scalar filters. Includes changes equivalent 
to HIVE-4348 to correct unit test build failure on Windows in this branch.


This addresses bug HIVE-4383.
https://issues.apache.org/jira/browse/HIVE-4383


Diffs
-

  hbase-handler/src/test/templates/TestHBaseCliDriver.vm 9c1651a 
  hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm 5940cbb 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColGreaterEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColGreaterStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColLessEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColLessStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColNotEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
 318541b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterStringColumnCompareScalar.txt
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
 fe34b11 

Diff: https://reviews.apache.org/r/10840/diff/
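
The file list above pairs one template (FilterStringColumnCompareScalar.txt) with six generated classes, one per comparison operator. A minimal Python sketch of that kind of template expansion follows. The class names come from the diff; the template text, the placeholder names, and the generate function are assumptions made for illustration and are not the contents of CodeGen.java or of the real template.

```python
from string import Template

# Hypothetical template body; the real template is not shown in this review.
CLASS_TEMPLATE = Template(
    "public class $class_name extends VectorExpression {\n"
    "  // keep row i when compare(vector[i], value) $operator 0\n"
    "}\n"
)

OPERATORS = {
    "Equal": "==",
    "NotEqual": "!=",
    "Less": "<",
    "LessEqual": "<=",
    "Greater": ">",
    "GreaterEqual": ">=",
}


def generate():
    """Return {file name: source text} for each comparison operator."""
    files = {}
    for name, symbol in OPERATORS.items():
        class_name = f"FilterStringCol{name}StringScalar"
        files[class_name + ".java"] = CLASS_TEMPLATE.substitute(
            class_name=class_name, operator=symbol
        )
    return files


generated = generate()
```

Generating the variants from one template keeps the six classes identical except for the operator, so a fix to the shared loop only has to be made once.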


Testing
---


Thanks,

Eric Hanson

[jira] [Created] (HIVE-4445) Fix the Hive unit test failures on Windows when Linux scripts or commands are used in test cases

Xi Fang created HIVE-4445:
-

 Summary: Fix the Hive unit test failures on Windows when Linux 
scripts or commands are used in test cases
 Key: HIVE-4445
 URL: https://issues.apache.org/jira/browse/HIVE-4445
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.11.0
Reporter: Xi Fang


The following unit tests fail on Windows because Linux scripts or commands are 
used in the test cases or .q files:

1. TestMinimrCliDriver: scriptfile1.q
2. TestNegativeMinimrCliDriver: mapreduce_stack_trace_hadoop20.q, 
minimr_broken_pipe.q
3. TestCliDriver: hiveprofiler_script0.q
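
One general way to avoid this class of failure is to pick the external command by platform rather than hard-coding the Linux one. The Python sketch below is only an illustration of that idea: cat_command is a hypothetical helper, and the attached patch may take a different approach, such as editing the .q files directly.

```python
def cat_command(path, platform_name):
    """Return an argv list that prints `path` on the given platform.

    platform_name follows Python's sys.platform convention ("win32", "linux", ...).
    """
    if platform_name.startswith("win"):
        # Windows has no cat; "type" is a cmd.exe built-in.
        return ["cmd", "/c", "type", path]
    return ["cat", path]
```

A test harness would call cat_command(path, sys.platform) so that the same test definition runs on both systems.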


[jira] [Commented] (HIVE-4383) Implement vectorized string column-scalar filters


[ 
https://issues.apache.org/jira/browse/HIVE-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644725#comment-13644725
 ] 

Eric Hanson commented on HIVE-4383:
---

Code review available at https://reviews.apache.org/r/10840/

 Implement vectorized string column-scalar filters
 -

 Key: HIVE-4383
 URL: https://issues.apache.org/jira/browse/HIVE-4383
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4383.1.patch, HIVE-4383.2.patch, HIVE-4383.3.patch


 Create patch for implementing string columns compared with scalars as 
 vectorized filters, and apply it to vectorization branch.


[jira] [Updated] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice

2013-04-29 Thread Sushanth Sowmyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-3682:
---

Attachment: HIVE-3682.D10275.4.patch.for.0.11

Attaching 0.11 patch for latest patch.

 when output hive table to file,users should could have a separator of their 
 own choice
 --

 Key: HIVE-3682
 URL: https://issues.apache.org/jira/browse/HIVE-3682
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.8.1
 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
 java version 1.6.0_25
 hadoop-0.20.2-cdh3u0
 hive-0.8.1
Reporter: caofangkun
Assignee: Sushanth Sowmyan
 Fix For: 0.11.0

 Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
 HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, 
 HIVE-3682.D10275.4.patch.for.0.11, HIVE-3682.with.serde.patch




[jira] [Updated] (HIVE-4373) Hive Version returned by HiveDatabaseMetaData.getDatabaseProductVersion is incorrect


 [ 
https://issues.apache.org/jira/browse/HIVE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4373:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and 0.11 branch. Thanks, Thejas!

 Hive Version returned by HiveDatabaseMetaData.getDatabaseProductVersion is 
 incorrect
 

 Key: HIVE-4373
 URL: https://issues.apache.org/jira/browse/HIVE-4373
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Deepesh Khandelwal
Assignee: Thejas M Nair
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4373.1.patch, HIVE-4373.2.patch, HIVE-4373.3.patch


 When running beeline
 {code}
 % beeline -u 'jdbc:hive2://localhost:1' -n hive -p passwd -d 
 org.apache.hive.jdbc.HiveDriver
 Connecting to jdbc:hive2://localhost:1
 Connected to: Hive (version 0.10.0)
 Driver: Hive (version 0.11.0)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 {code}
 The Hive version in the "Connected to: " string says 0.10.0 instead of 0.11.0.
 Looking at the code, it seems that the version is hardcoded in two places:
 line 250 in jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java
 line 833 in jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java
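
 A mismatch like the one in the banner above is easy to catch mechanically. The Python sketch below parses the two version lines exactly as they appear in this report and compares them. It is a hypothetical check, not part of beeline or of the Hive test suite, and it assumes that server and driver come from the same build, as they do in this report.

```python
import re

VERSION_LINE = re.compile(r"^(Connected to|Driver): Hive \(version ([0-9.]+)\)$")


def banner_versions(banner):
    """Extract {"Connected to": ..., "Driver": ...} from beeline banner text."""
    versions = {}
    for line in banner.splitlines():
        match = VERSION_LINE.match(line.strip())
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def versions_agree(banner):
    """True only when both lines are present and report the same version."""
    versions = banner_versions(banner)
    return "Connected to" in versions and versions["Connected to"] == versions.get("Driver")


reported = """Connected to: Hive (version 0.10.0)
Driver: Hive (version 0.11.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ"""
```

 A check like this does not remove the root cause named above, which is the version string being hardcoded in the source and in its test.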


[jira] [Updated] (HIVE-4445) Fix the Hive unit test failures on Windows when Linux scripts or commands are used in test cases


 [ 
https://issues.apache.org/jira/browse/HIVE-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated HIVE-4445:
--

Environment: Windows

 Fix the Hive unit test failures on Windows when Linux scripts or commands are 
 used in test cases
 

 Key: HIVE-4445
 URL: https://issues.apache.org/jira/browse/HIVE-4445
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.11.0
 Environment: Windows
Reporter: Xi Fang
 Attachments: HIVE-4445.1.patch


 The following unit tests fail on Windows because Linux scripts or commands 
 are used in the test cases or .q files:
 1. TestMinimrCliDriver: scriptfile1.q
 2. TestNegativeMinimrCliDriver: mapreduce_stack_trace_hadoop20.q, 
 minimr_broken_pipe.q
 3. TestCliDriver: hiveprofiler_script0.q


[jira] [Updated] (HIVE-4445) Fix the Hive unit test failures on Windows when Linux scripts or commands are used in test cases


 [ 
https://issues.apache.org/jira/browse/HIVE-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated HIVE-4445:
--

Attachment: HIVE-4445.1.patch

 Fix the Hive unit test failures on Windows when Linux scripts or commands are 
 used in test cases
 

 Key: HIVE-4445
 URL: https://issues.apache.org/jira/browse/HIVE-4445
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.11.0
Reporter: Xi Fang
 Attachments: HIVE-4445.1.patch


 The following unit tests fail on Windows because Linux scripts or commands 
 are used in the test cases or .q files:
 1. TestMinimrCliDriver: scriptfile1.q
 2. TestNegativeMinimrCliDriver: mapreduce_stack_trace_hadoop20.q, 
 minimr_broken_pipe.q
 3. TestCliDriver: hiveprofiler_script0.q


[jira] [Updated] (HIVE-4445) Fix the Hive unit test failures on Windows when Linux scripts or commands are used in test cases


 [ 
https://issues.apache.org/jira/browse/HIVE-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated HIVE-4445:
--

Fix Version/s: 0.11.0
   Status: Patch Available  (was: Open)

 Fix the Hive unit test failures on Windows when Linux scripts or commands are 
 used in test cases
 

 Key: HIVE-4445
 URL: https://issues.apache.org/jira/browse/HIVE-4445
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.11.0
 Environment: Windows
Reporter: Xi Fang
 Fix For: 0.11.0

 Attachments: HIVE-4445.1.patch


 The following unit tests fail on Windows because Linux scripts or commands 
 are used in the test cases or .q files:
 1. TestMinimrCliDriver: scriptfile1.q
 2. TestNegativeMinimrCliDriver: mapreduce_stack_trace_hadoop20.q, 
 minimr_broken_pipe.q
 3. TestCliDriver: hiveprofiler_script0.q


[jira] [Commented] (HIVE-4445) Fix the Hive unit test failures on Windows when Linux scripts or commands are used in test cases


[ 
https://issues.apache.org/jira/browse/HIVE-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644751#comment-13644751
 ] 

Xi Fang commented on HIVE-4445:
---

Update the .q script files for these test cases so that they can work on 
Windows:
1. TestMinimrCliDriver: scriptfile1.q
2. TestNegativeMinimrCliDriver: mapreduce_stack_trace_hadoop20.q, 
minimr_broken_pipe.q
3. TestCliDriver: hiveprofiler_script0.q

 Fix the Hive unit test failures on Windows when Linux scripts or commands are 
 used in test cases
 

 Key: HIVE-4445
 URL: https://issues.apache.org/jira/browse/HIVE-4445
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.11.0
 Environment: Windows
Reporter: Xi Fang
 Fix For: 0.11.0

 Attachments: HIVE-4445.1.patch


 The following unit tests fail on Windows because Linux scripts or commands 
 are used in the test cases or .q files:
 1. TestMinimrCliDriver: scriptfile1.q
 2. TestNegativeMinimrCliDriver: mapreduce_stack_trace_hadoop20.q, 
 minimr_broken_pipe.q
 3. TestCliDriver: hiveprofiler_script0.q


Re: Review Request: Implement vectorized string column-scalar filters

2013-04-29 Thread Eric Hanson


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10840/
---

(Updated April 29, 2013, 6:53 p.m.)


Review request for hive.


Changes
---

removed changes related to HIVE-4348 


Description
---

Implement vectorized string column-scalar filters. Includes changes equivalent 
to HIVE-4348 to correct unit test build failure on Windows in this branch.


This addresses bug HIVE-4383.
https://issues.apache.org/jira/browse/HIVE-4383


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColGreaterEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColGreaterStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColLessEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColLessStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColNotEqualStringScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
 318541b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterStringColumnCompareScalar.txt
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
 fe34b11 

Diff: https://reviews.apache.org/r/10840/diff/


Testing
---


Thanks,

Eric Hanson

[jira] [Updated] (HIVE-4383) Implement vectorized string column-scalar filters


 [ 
https://issues.apache.org/jira/browse/HIVE-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4383:
--

Attachment: HIVE-4384.4.patch

removed changes related to 4348 (unit test compile failure)

 Implement vectorized string column-scalar filters
 -

 Key: HIVE-4383
 URL: https://issues.apache.org/jira/browse/HIVE-4383
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4383.1.patch, HIVE-4383.2.patch, HIVE-4383.3.patch, 
 HIVE-4384.4.patch


 Create patch for implementing string columns compared with scalars as 
 vectorized filters, and apply it to vectorization branch.


Re: Review Request: Implement vectorized string column-scalar filters

2013-04-29 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10840/#review19883
---

Ship it!


Ship It!

- Ashutosh Chauhan


On April 29, 2013, 6:53 p.m., Eric Hanson wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/10840/
 ---
 
 (Updated April 29, 2013, 6:53 p.m.)
 
 
 Review request for hive.
 
 
 Description
 ---
 
 Implement vectorized string column-scalar filters. Includes changes 
 equivalent to HIVE-4348 to correct unit test build failure on Windows in this 
 branch.
 
 
 This addresses bug HIVE-4383.
 https://issues.apache.org/jira/browse/HIVE-4383
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColEqualStringScalar.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColGreaterEqualStringScalar.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColGreaterStringScalar.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColLessEqualStringScalar.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColLessStringScalar.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/FilterStringColNotEqualStringScalar.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
  318541b 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterStringColumnCompareScalar.txt
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
  fe34b11 
 
 Diff: https://reviews.apache.org/r/10840/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Eric Hanson

[jira] [Resolved] (HIVE-4383) Implement vectorized string column-scalar filters


 [ 
https://issues.apache.org/jira/browse/HIVE-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4383.


   Resolution: Fixed
Fix Version/s: vectorization-branch

Committed to branch. Thanks, Eric!

 Implement vectorized string column-scalar filters
 -

 Key: HIVE-4383
 URL: https://issues.apache.org/jira/browse/HIVE-4383
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Fix For: vectorization-branch

 Attachments: HIVE-4383.1.patch, HIVE-4383.2.patch, HIVE-4383.3.patch, 
 HIVE-4384.4.patch


 Create patch for implementing string columns compared with scalars as 
 vectorized filters, and apply it to vectorization branch.


[jira] [Commented] (HIVE-3739) Hive auto convert join result error: java.lang.InstantiationException: org.antlr.runtime.CommonToken

2013-04-29 Thread Garry Turkington (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644781#comment-13644781
]

Garry Turkington commented on HIVE-3739:

I've been seeing this issue on and off but couldn't find a pattern until today.

The variable appears to be JDK 7 vs. JDK 6. On a CDH 4.2.1 cluster running
Oracle JDK 6 u32 64-bit, my Hive queries that make use of auto join run
successfully. But as a test I tried running the cluster with Oracle JDK 7 u21
64-bit, and the same Hive queries throw lots of these Antlr exceptions.

On further investigation one of my client boxes has JDK 7 by default and this
is where the errors were seen in the past; JDK 6 clients didn't show the issue.

As mentioned, this was on Cloudera 4.2.1, i.e. Hive 0.10. I'm not sure whether core
Hive views JDK 7 as a supported platform. I see other Jiras resolved that fix
JDK 7 problems, but the getting started page only mentions JDK 6:

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Garry

Hive auto convert join result error: java.lang.InstantiationException:
org.antlr.runtime.CommonToken

Key: HIVE-3739
URL: https://issues.apache.org/jira/browse/HIVE-3739
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.9.0
Environment: hive.auto.convert.join=true
Reporter: fantasy

After I set hive.auto.convert.join=true, any HiveQL query with a join executed in
Hive results in an error like this:
-
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
---
Can anyone tell why?

[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent


 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Status: Patch Available  (was: Open)

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch




[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent


 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Attachment: HIVE-4435.1.patch

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch




[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

[
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shreepadma Venugopalan updated HIVE-4435:
-

Description: The current implementation of Flajolet-Martin estimator to
estimate the number of distinct values doesn't use hash functions that are
pairwise independent. This is problematic because the input values don't
distribute uniformly. When run on large TPC-H data sets, this leads to a huge
discrepancy for primary key columns. Primary key columns are typically a
monotonically increasing sequence.

Column stats: Distinct value estimator should use hash functions that are
pairwise independent
--

Key: HIVE-4435
URL: https://issues.apache.org/jira/browse/HIVE-4435
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Attachments: HIVE-4435.1.patch

The current implementation of Flajolet-Martin estimator to estimate the
number of distinct values doesn't use hash functions that are pairwise
independent. This is problematic because the input values don't distribute
uniformly. When run on large TPC-H data sets, this leads to a huge
discrepancy for primary key columns. Primary key columns are typically a
monotonically increasing sequence.

[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644840#comment-13644840
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

The fix is to use hash functions that are pairwise independent. More on 
pairwise independence and family of hash functions - 
http://people.csail.mit.edu/ronitt/COURSE/S12/handouts/lec5.pdf
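
For readers unfamiliar with the term, a common textbook construction is the family h(x) = (a*x + b) mod p, with p prime and a, b drawn uniformly from [0, p-1]; this family is pairwise independent over the integers mod p. The sketch below only illustrates that construction together with the "lowest set bit" value that a Flajolet-Martin style sketch records. It is not the code in HIVE-4435.1.patch, and the class name, the choice of prime and the bit-vector width are all assumptions.

```java
import java.util.Random;

/** Illustrative member of the family h(x) = (a*x + b) mod p, p prime. */
final class PairwiseHash {
    // 2^31 - 1 is a Mersenne prime; chosen here only for convenience.
    static final long P = 2147483647L;

    private final long a; // uniform in [0, P-1]
    private final long b; // uniform in [0, P-1]

    PairwiseHash(Random rng) {
        // (int) P equals Integer.MAX_VALUE, so nextInt returns [0, P-1].
        this.a = rng.nextInt((int) P);
        this.b = rng.nextInt((int) P);
    }

    /** Maps any long key to [0, P-1]. */
    long hash(long x) {
        long r = Math.floorMod(x, P); // reduce the key into [0, P-1]
        return (a * r + b) % P;       // a*r + b < 2^63, so no overflow
    }

    /** Index of the lowest set bit, capped to a bit vector of the given width. */
    static int lowestSetBit(long h, int width) {
        return Math.min(Long.numberOfTrailingZeros(h), width - 1);
    }

    public static void main(String[] args) {
        PairwiseHash h = new PairwiseHash(new Random(42));
        // Monotonically increasing keys, as in a primary key column.
        for (long key = 1; key <= 5; key++) {
            long v = h.hash(key);
            System.out.println(key + " -> " + v
                + " (lowest set bit " + lowestSetBit(v, 31) + ")");
        }
    }
}
```

Because a and b are random, consecutive keys such as 1, 2, 3 are spread over the whole range instead of staying clustered, which is the property the estimator needs for sequential primary keys.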

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch


 The current implementation of Flajolet-Martin estimator to estimate the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.

Review Request: HIVE-4435: Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10841/
---

Review request for hive.


Description
---

Fixes the FM estimator to use hash functions that are pairwise independent.


This addresses bug HIVE-4435.
https://issues.apache.org/jira/browse/HIVE-4435


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
 69e6f46 

Diff: https://reviews.apache.org/r/10841/diff/


Testing
---

The estimates are within the expected error after this fix. Tested on TPC-H 
data sets of varying sizes.


Thanks,

Shreepadma Venugopalan

[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644844#comment-13644844
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

review board: https://reviews.apache.org/r/10841/

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch


 The current implementation of Flajolet-Martin estimator to estimate the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.

[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent


 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Attachment: chart_1(1).png

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch


 The current implementation of Flajolet-Martin estimator to estimate the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.

[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644850#comment-13644850
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

Attached plot of relative error vs. number of distinct values after the fix. 
Dataset: TPC-H of varying sizes up to 10TB
hive.stats.ndv.error = 5% (standard error for the estimator)
Column types: String, Long, Double

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch


 The current implementation of Flajolet-Martin estimator to estimate the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.

[jira] [Updated] (HIVE-4446) [HCatalog] Documentation for HIVE-4442, HIVE-4443, HIVE-4444


 [ 
https://issues.apache.org/jira/browse/HIVE-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4446:
-

Attachment: HIVE-4446-1.patch

 [HCatalog] Documentation for HIVE-4442, HIVE-4443, HIVE-4444
 

 Key: HIVE-4446
 URL: https://issues.apache.org/jira/browse/HIVE-4446
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4446-1.patch




[jira] [Updated] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice


 [ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3682:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and 0.11 branch. Thanks, Sushanth!

 when output hive table to file,users should could have a separator of their 
 own choice
 --

 Key: HIVE-3682
 URL: https://issues.apache.org/jira/browse/HIVE-3682
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.8.1
 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
 java version 1.6.0_25
 hadoop-0.20.2-cdh3u0
 hive-0.8.1
Reporter: caofangkun
Assignee: Sushanth Sowmyan
 Fix For: 0.11.0

 Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
 HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, 
 HIVE-3682.D10275.4.patch.for.0.11, HIVE-3682.with.serde.patch


 By default, when a Hive table is output to a file, its columns are 
 separated by the ^A character (that is, \001).
 But users should be able to set a separator of their own choice.
 Usage Example:
 create table for_test (key string, value string);
 load data local inpath './in1.txt' into table for_test;
 select * from for_test;
 UT-01: default separator is \001, line separator is \n
 insert overwrite local directory './test-01' 
 select * from src ;
 create table array_table (a array<string>, b array<string>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ',';
 load data local inpath '../hive/examples/files/arraytest.txt' overwrite into 
 table table2;
 CREATE TABLE map_table (foo STRING , bar MAP<STRING, STRING>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 STORED AS TEXTFILE;
 UT-02: defined field separator as ':'
 insert overwrite local directory './test-02' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-03: the line separator is NOT ALLOWED to be defined as another separator 
 insert overwrite local directory './test-03' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-04: define map separators 
 insert overwrite local directory './test-04' 
 row format delimited 
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 select * from src;

[jira] [Commented] (HIVE-4349) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters

2013-04-29 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644998#comment-13644998
 ] 

Carl Steinbach commented on HIVE-4349:
--

I want to add that a better place for a patch like this is Ant's JUnit task. 
That way everyone automatically benefits from the fix without having to gunk up 
their build files with special case logic and bespoke Ant tasks. Based on the 
prevalence of this problem I'm kind of surprised that someone hasn't already 
done this.

 Fix the Hive unit test failures when the Hive enlistment root path is longer 
 than ~12 characters
 

 Key: HIVE-4349
 URL: https://issues.apache.org/jira/browse/HIVE-4349
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Xi Fang
 Fix For: 0.11.0

 Attachments: HIVE-4349.1.patch


 If the Hive enlistment root path is longer than about 12 characters, the test 
 classpath “hadoop.testcp” exceeds 8K characters, so we are unable to run most 
 of the Hive unit tests on Windows.

[jira] [Updated] (HIVE-4385) Implement vectorized LIKE filter


 [ 
https://issues.apache.org/jira/browse/HIVE-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4385:
--

Attachment: HIVE-4385.3.patch

Based off most recent vectorization branch

 Implement vectorized LIKE filter
 

 Key: HIVE-4385
 URL: https://issues.apache.org/jira/browse/HIVE-4385
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4385.2.patch, HIVE-4385.3.patch




[jira] [Commented] (HIVE-4385) Implement vectorized LIKE filter


[ 
https://issues.apache.org/jira/browse/HIVE-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645038#comment-13645038
 ] 

Eric Hanson commented on HIVE-4385:
---

Code review available at https://reviews.apache.org/r/10844/

 Implement vectorized LIKE filter
 

 Key: HIVE-4385
 URL: https://issues.apache.org/jira/browse/HIVE-4385
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4385.2.patch, HIVE-4385.3.patch




[jira] [Updated] (HIVE-4385) Implement vectorized LIKE filter


 [ 
https://issues.apache.org/jira/browse/HIVE-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4385:
--

Status: Patch Available  (was: Open)

apply to vectorization branch

 Implement vectorized LIKE filter
 

 Key: HIVE-4385
 URL: https://issues.apache.org/jira/browse/HIVE-4385
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4385.2.patch, HIVE-4385.3.patch




Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #135

2013-04-29 Thread Apache Jenkins Server

See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/135/

--
[...truncated 41967 lines...]
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2013-04-29 16:33:11,107 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] Execution completed successfully
[junit] Mapred Local Task Succeeded . Convert the Join into MapJoin
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-04-29_16-33-07_856_7912485236401692106/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/tmp/hive_job_log_jenkins_201304291633_223742817.txt
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] Table default.testhivedrivertable stats: [num_partitions: 0, 
num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
[junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-04-29_16-33-12_495_3859802960431102280/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-04-29_16-33-12_495_3859802960431102280/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/tmp/hive_job_log_jenkins_201304291633_605803189.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable

[jira] [Created] (HIVE-4447) hcatalog version numbers need to be updated

Ashutosh Chauhan created HIVE-4447:
--

 Summary: hcatalog version numbers need to be updated 
 Key: HIVE-4447
 URL: https://issues.apache.org/jira/browse/HIVE-4447
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Ashutosh Chauhan




Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #362

2013-04-29 Thread Apache Jenkins Server

See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/362/

--
[...truncated 36511 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2013-04-29_16-57-50_551_7553408207680302479/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/362/artifact/hive/build/service/tmp/hive_job_log_jenkins_201304291657_645085803.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] Copying file: 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2013-04-29_16-57-55_154_3606858085066822556/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2013-04-29_16-57-55_154_3606858085066822556/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/362/artifact/hive/build/service/tmp/hive_job_log_jenkins_201304291657_1454527651.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/362/artifact/hive/build/service/tmp/hive_job_log_jenkins_201304291657_1510446235.txt
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/362/artifact/hive/build/service/tmp/hive_job_log_jenkins_201304291657_1387074030.txt
[junit] Copying file: 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK:

[jira] [Updated] (HIVE-4232) JDBC2 HiveConnection has odd defaults


 [ 
https://issues.apache.org/jira/browse/HIVE-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4232:
--

Affects Version/s: 0.12.0
Fix Version/s: 0.12.0

 JDBC2 HiveConnection has odd defaults
 -

 Key: HIVE-4232
 URL: https://issues.apache.org/jira/browse/HIVE-4232
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0, 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome
 Fix For: 0.11.0, 0.12.0

 Attachments: HIVE-4232-1.patch, HIVE-4232-2.patch, HIVE-4232.patch


 HiveConnection defaults to using a plain SASL transport if auth is not set. 
 To get a raw transport auth must be set to noSasl; furthermore noSasl is case 
 sensitive. Code tries to infer Kerberos or plain authentication based on the 
 presence of principal. There is no provision for specifying QOP level.
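
One way to picture the behaviour described above, and a less surprising variant of it, is the sketch below. It is a hypothetical illustration, not code from the attached patches: the class AuthModeSketch and the Transport enum are invented for the example, and only the session variable names "auth" and "principal" and the value "noSasl" come from the description. The sketch accepts "noSasl" in any letter case and otherwise keeps the inference based on the presence of a principal.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: choosing a transport from JDBC session variables. */
final class AuthModeSketch {
    enum Transport { RAW, PLAIN_SASL, KERBEROS_SASL }

    static Transport choose(Map<String, String> sessionVars) {
        String auth = sessionVars.get("auth");
        // Accept "noSasl" regardless of letter case and surrounding whitespace.
        if (auth != null && "noSasl".equalsIgnoreCase(auth.trim())) {
            return Transport.RAW;
        }
        // Mirrors the inference described in the issue: a principal implies Kerberos.
        return sessionVars.containsKey("principal")
            ? Transport.KERBEROS_SASL
            : Transport.PLAIN_SASL;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("auth", "NOSASL");
        System.out.println(choose(vars));
    }
}
```

Keeping the decision in one small function makes the default (plain SASL when auth is unset) explicit and easy to test. A QOP setting, which the description notes is missing, would be a further variable read in the same place.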

[jira] [Updated] (HIVE-4232) JDBC2 HiveConnection has odd defaults


 [ 
https://issues.apache.org/jira/browse/HIVE-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4232:
--

Attachment: HIVE-4232-trunk-3.patch
HIVE-4232-0.11-3.patch

New patch fixes test failure.

 JDBC2 HiveConnection has odd defaults
 -

 Key: HIVE-4232
 URL: https://issues.apache.org/jira/browse/HIVE-4232
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0, 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome
 Fix For: 0.11.0, 0.12.0

 Attachments: HIVE-4232-0.11-3.patch, HIVE-4232-1.patch, 
 HIVE-4232-2.patch, HIVE-4232.patch, HIVE-4232-trunk-3.patch


 HiveConnection defaults to using a plain SASL transport if auth is not set. 
 To get a raw transport auth must be set to noSasl; furthermore noSasl is case 
 sensitive. Code tries to infer Kerberos or plain authentication based on the 
 presence of principal. There is no provision for specifying QOP level.

[jira] [Updated] (HIVE-4392) Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns


 [ 
https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4392:
--

Attachment: HIVE-4392.D10431.4.patch

navis updated the revision HIVE-4392 [jira] Illogical InvalidObjectException 
throwed when use mulit aggregate functions with star columns.

  Addressed comments

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10431

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10431?vs=32847&id=33177#toc

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/ctas_colname.q
  ql/src/test/results/clientpositive/ctas_colname.q.out

To: JIRA, ashutoshc, navis
Cc: hbutani


 Illogical InvalidObjectException throwed when use mulit aggregate functions 
 with star columns 
 --

 Key: HIVE-4392
 URL: https://issues.apache.org/jira/browse/HIVE-4392
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Apache Hadoop 0.20.1
 Apache Hive Trunk
Reporter: caofangkun
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, 
 HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch


 For Example:
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Starting Job = job_201304191025_0003, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0003
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:09:28,017 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:09:34,054 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:09:37,074 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0003
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 12 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 hive (default) create table liza_1 as 
select *, sum(key), sum(value) 
from new_src   
group by key, value;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks not specified. Estimated from input data size: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201304191025_0004, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0004
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:11:58,945 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:12:01,964 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:12:04,982 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0004
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 But the following tow Queries  work:
 hive (default) create table liza_1 as select * from new_src;
 Total MapReduce jobs = 3
 Launching Job 1 out of 3
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201304191025_0006, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
 Kill Command =

[jira] [Commented] (HIVE-4392) Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns


[ 
https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645096#comment-13645096
 ] 

Navis commented on HIVE-4392:
-

[~rhbutani] / [~ashutoshc] Updated patch. Could you take a look at it?

 Illogical InvalidObjectException throwed when use mulit aggregate functions 
 with star columns 
 --

 Key: HIVE-4392
 URL: https://issues.apache.org/jira/browse/HIVE-4392
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Apache Hadoop 0.20.1
 Apache Hive Trunk
Reporter: caofangkun
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, 
 HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch


 For Example:
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Starting Job = job_201304191025_0003, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0003
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:09:28,017 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:09:34,054 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:09:37,074 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0003
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 12 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src   
group by key, value;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks not specified. Estimated from input data size: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Starting Job = job_201304191025_0004, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0004
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:11:58,945 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:12:01,964 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:12:04,982 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0004
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 But the following two queries work:
 hive (default)> create table liza_1 as select * from new_src;
 Total MapReduce jobs = 3
 Launching Job 1 out of 3
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201304191025_0006, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0006
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers:  0
 2013-04-22 11:15:00,681 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:15:03,697 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0006
 Stage-4 is selected by condition resolver.
 Stage-3 is filtered out by condition resolver.
 Stage-5 is filtered out by condition resolver.
 Moving data to: 
 hdfs://hd17-vm5:9101/user/zongren/hive-scratchdir/hive_2013-04-22_11-14-54_632_6709035018023861094/-ext-10001
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 Table default.liza_1 stats: [num_partitions: 0, num_files: 0, num_rows: 0, 
 total_size: 0, raw_data_size: 0]
 MapReduce Jobs

[jira] [Updated] (HIVE-4232) JDBC2 HiveConnection has odd defaults


 [ 
https://issues.apache.org/jira/browse/HIVE-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4232:
--

Attachment: (was: HIVE-4232-0.11-3.patch)

 JDBC2 HiveConnection has odd defaults
 -

 Key: HIVE-4232
 URL: https://issues.apache.org/jira/browse/HIVE-4232
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0, 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome
 Fix For: 0.11.0, 0.12.0

 Attachments: HIVE-4232-1.patch, HIVE-4232-2.patch, HIVE-4232.patch


 HiveConnection defaults to using a plain SASL transport if auth is not set. 
 To get a raw transport, auth must be set to noSasl; furthermore, noSasl is 
 case-sensitive. The code tries to infer Kerberos or plain authentication based 
 on the presence of a principal. There is no provision for specifying a QOP level.
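
The case-sensitivity problem described above can be sketched as follows. This is a minimal illustration, assuming the connection parameters have already been parsed into a map; the class `AuthModeSketch`, the enum `Transport`, and both method names are hypothetical and are not HiveConnection's actual code. Only the names `auth`, `noSasl`, and `principal` come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the real HiveConnection logic.
final class AuthModeSketch {
    enum Transport { RAW, PLAIN_SASL, KERBEROS_SASL }

    // Behaviour as described in the issue: only the exact string "noSasl"
    // selects a raw transport; anything else falls through to SASL.
    static Transport caseSensitive(Map<String, String> vars) {
        if ("noSasl".equals(vars.get("auth"))) {
            return Transport.RAW;
        }
        return vars.containsKey("principal") ? Transport.KERBEROS_SASL : Transport.PLAIN_SASL;
    }

    // One possible remedy: compare the value without regard to case.
    static Transport caseInsensitive(Map<String, String> vars) {
        if ("noSasl".equalsIgnoreCase(vars.get("auth"))) {
            return Transport.RAW;
        }
        return vars.containsKey("principal") ? Transport.KERBEROS_SASL : Transport.PLAIN_SASL;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("auth", "nosasl"); // lower-case spelling supplied by a user
        System.out.println(caseSensitive(vars));
        System.out.println(caseInsensitive(vars));
    }
}
```

With a lower-case `nosasl`, the first method silently selects a SASL transport, which is the surprising default the issue complains about.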


[jira] [Commented] (HIVE-3739) Hive auto convert join result error: java.lang.InstantiationException: org.antlr.runtime.CommonToken


[ 
https://issues.apache.org/jira/browse/HIVE-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645102#comment-13645102
 ] 

Navis commented on HIVE-3739:
-

It's XMLEncoder complaining that the AST is not serializable (similar to 
HIVE-4222). Is Hive in CDH different from vanilla Hive?
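
For readers unfamiliar with this failure mode, here is a minimal sketch of how `java.beans.XMLEncoder` reacts to a class it cannot re-instantiate. It assumes the problem is a missing public no-argument constructor, which is what the InstantiationException in the report suggests; `NoDefaultCtor` is a hypothetical stand-in for `CommonToken`, and none of this is Hive code.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class XmlEncoderSketch {
    // Hypothetical stand-in for a class like CommonToken:
    // it offers no public no-argument constructor.
    public static final class NoDefaultCtor {
        private final int type;
        public NoDefaultCtor(int type) { this.type = type; }
        public int getType() { return type; }
    }

    // Encodes the value and returns whatever the encoder reported to its
    // exception listener instead of printing "Continuing ..." to stderr.
    static List<Exception> encode(Object value) {
        List<Exception> errors = new ArrayList<>();
        XMLEncoder encoder = new XMLEncoder(new ByteArrayOutputStream());
        encoder.setExceptionListener(errors::add);
        encoder.writeObject(value);
        encoder.close();
        return errors;
    }

    public static void main(String[] args) {
        for (Exception e : encode(new NoDefaultCtor(7))) {
            System.out.println(e);
        }
    }
}
```

The encoder does not abort; it reports each problem to the listener and carries on, which matches the repeated "Continuing ..." lines in the report.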

 Hive auto convert join result error: java.lang.InstantiationException: 
 org.antlr.runtime.CommonToken
 

 Key: HIVE-3739
 URL: https://issues.apache.org/jira/browse/HIVE-3739
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.9.0
 Environment: hive.auto.convert.join=true
Reporter: fantasy

 After I set hive.auto.convert.join=true, any HiveQL query with a join executed 
 in Hive results in an error like this:
 -
 java.lang.InstantiationException: org.antlr.runtime.CommonToken
  Continuing ...
  java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
  Continuing ...
  java.lang.InstantiationException: org.antlr.runtime.CommonToken
  Continuing ...
  java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
  Continuing ...
  java.lang.InstantiationException: org.antlr.runtime.CommonToken
  Continuing ...
  java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
  Continuing ...
  java.lang.InstantiationException: org.antlr.runtime.CommonToken
  Continuing ...
  java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
  Continuing ...
 ---
 Can anyone tell why?


[jira] [Updated] (HIVE-4232) JDBC2 HiveConnection has odd defaults


 [ 
https://issues.apache.org/jira/browse/HIVE-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4232:
--

Attachment: HIVE-4232-0.11-3.patch

Missed a file.


[jira] [Updated] (HIVE-4232) JDBC2 HiveConnection has odd defaults


 [ 
https://issues.apache.org/jira/browse/HIVE-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4232:
--

Attachment: (was: HIVE-4232-0.11-3.patch)


[jira] [Updated] (HIVE-4232) JDBC2 HiveConnection has odd defaults


 [ 
https://issues.apache.org/jira/browse/HIVE-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4232:
--

Attachment: (was: HIVE-4232-trunk-3.patch)


[jira] [Updated] (HIVE-4232) JDBC2 HiveConnection has odd defaults


 [ 
https://issues.apache.org/jira/browse/HIVE-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4232:
--

Attachment: HIVE-4232-3-trunk.patch
HIVE-4232-3-0.11.patch

Uploaded renamed files.

 JDBC2 HiveConnection has odd defaults
 -

 Key: HIVE-4232
 URL: https://issues.apache.org/jira/browse/HIVE-4232
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0, 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome
 Fix For: 0.11.0, 0.12.0

 Attachments: HIVE-4232-1.patch, HIVE-4232-2.patch, 
 HIVE-4232-3-0.11.patch, HIVE-4232-3-trunk.patch, HIVE-4232.patch



ObjectInspectorUtils.copyToStandardObject and union types

2013-04-29 Thread Siyang Chen

Calling ObjectInspectorUtils.copyToStandardObject() on a union type object 
returns just the object without any sort of information about its type. (See 
lines 302-310, which start case UNION:.)

I'd like to change the method to return an object of type UnionObject so that 
we retain type information. Does anyone have any issues with this?
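
To illustrate why the tag matters, here is a minimal sketch with hypothetical types; `TaggedValue` and both copy methods are illustrative only and are not Hive's `UnionObject` or `ObjectInspectorUtils` code. For a union whose branches share a Java type, the payload alone cannot say which branch was populated.

```java
final class UnionCopySketch {
    // Hypothetical tagged value: the tag records which branch of the union is set.
    static final class TaggedValue {
        final byte tag;
        final Object value;
        TaggedValue(byte tag, Object value) {
            this.tag = tag;
            this.value = value;
        }
    }

    // Behaviour as described above: only the payload survives the copy.
    static Object copyDroppingTag(TaggedValue u) {
        return u.value;
    }

    // Proposed behaviour: the copy keeps the tag alongside the payload.
    static TaggedValue copyKeepingTag(TaggedValue u) {
        return new TaggedValue(u.tag, u.value);
    }

    public static void main(String[] args) {
        // Think of a union with two string branches: both payloads are Strings.
        TaggedValue first = new TaggedValue((byte) 0, "a");
        TaggedValue second = new TaggedValue((byte) 1, "a");
        // The two copies are indistinguishable once the tag is dropped.
        System.out.println(copyDroppingTag(first).equals(copyDroppingTag(second)));
        // The tags still differ when they are kept.
        System.out.println(copyKeepingTag(first).tag != copyKeepingTag(second).tag);
    }
}
```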


Siyang Chen

[jira] [Updated] (HIVE-2379) Hive/HBase integration could be improved


 [ 
https://issues.apache.org/jira/browse/HIVE-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2379:


Attachment: HIVE-2379-0.11.patch.txt

Sorry, missed this.

 Hive/HBase integration could be improved
 

 Key: HIVE-2379
 URL: https://issues.apache.org/jira/browse/HIVE-2379
 Project: Hive
  Issue Type: Bug
  Components: CLI, Clients, HBase Handler
Affects Versions: 0.7.1, 0.8.0, 0.9.0
Reporter: Roman Shaposhnik
Assignee: Navis
Priority: Critical
 Fix For: 0.12.0

 Attachments: HIVE-2379-0.11.patch.txt, HIVE-2379.D7347.1.patch, 
 HIVE-2379.D7347.2.patch, HIVE-2379.D7347.3.patch


 For now any Hive/HBase queries would require the following jars to be 
 explicitly added via hive's add jar command:
 add jar /usr/lib/hive/lib/hbase-0.90.1-cdh3u0.jar;
 add jar /usr/lib/hive/lib/hive-hbase-handler-0.7.0-cdh3u0.jar;
 add jar /usr/lib/hive/lib/zookeeper-3.3.1.jar;
 add jar /usr/lib/hive/lib/guava-r06.jar;
 The longer-term solution, perhaps, should be to have the code at submit time 
 call HBase's 
 TableMapReduceUtil.addDependencyJar(job, HBaseStorageHandler.class) to ship 
 it in the distributed cache.


[jira] [Updated] (HIVE-4209) Cache evaluation result of deterministic expression and reuse it


 [ 
https://issues.apache.org/jira/browse/HIVE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4209:


Attachment: HIVE-4209.6.patch.txt

 Cache evaluation result of deterministic expression and reuse it
 

 Key: HIVE-4209
 URL: https://issues.apache.org/jira/browse/HIVE-4209
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4209.6.patch.txt, HIVE-4209.D9585.1.patch, 
 HIVE-4209.D9585.2.patch, HIVE-4209.D9585.3.patch, HIVE-4209.D9585.4.patch, 
 HIVE-4209.D9585.5.patch


 For example, 
 {noformat}
 select key from src where key + 1 > 100 AND key + 1 < 200 limit 3;
 {noformat}
 key + 1 need not be evaluated twice.
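
The idea can be sketched outside Hive as follows, taking the two comparisons in the example to be `> 100` and `< 200` (an assumption). The class and method names are hypothetical, and the counter exists only to make the number of evaluations observable; this is not the patch's implementation.

```java
final class ExprCacheSketch {
    // Counts how often the deterministic sub-expression is evaluated.
    static int evaluations = 0;

    // The deterministic sub-expression "key + 1".
    static int keyPlusOne(int key) {
        evaluations++;
        return key + 1;
    }

    // Naive filter: evaluates "key + 1" once per comparison.
    static boolean filterNaive(int key) {
        return keyPlusOne(key) > 100 & keyPlusOne(key) < 200;
    }

    // Cached filter: evaluates "key + 1" once per row and reuses the result.
    static boolean filterCached(int key) {
        int cached = keyPlusOne(key);
        return cached > 100 && cached < 200;
    }

    public static void main(String[] args) {
        filterNaive(150);
        System.out.println("naive evaluations: " + evaluations);
        evaluations = 0;
        filterCached(150);
        System.out.println("cached evaluations: " + evaluations);
    }
}
```

The naive variant uses the non-short-circuit `&` so that both comparisons always run, which keeps the evaluation count independent of the input.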


[jira] [Created] (HIVE-4448) Fix metastore warehouse incorrect path on Windows for test case TestExecDriver and TestHiveMetaStoreChecker

Shuaishuai Nie created HIVE-4448:


 Summary: Fix metastore warehouse incorrect path on Windows for 
test case TestExecDriver and TestHiveMetaStoreChecker
 Key: HIVE-4448
 URL: https://issues.apache.org/jira/browse/HIVE-4448
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.11.0
 Environment: Windows
Reporter: Shuaishuai Nie
Assignee: Shuaishuai Nie
 Attachments: HIVE-4448.1.patch

Unit test cases that do not use QTestUtil pass an incompatible Windows path 
for METASTOREWAREHOUSE to HiveConf, which results in the 
/test/data/warehouse folder being created in the wrong location on Windows. 
This folder is not deleted at the beginning of the unit test, and its contents 
cause unit tests to fail if the same test case is run repeatedly. The root 
cause is that for a path like 
pfile://C:\hive\build\ql/test/data/warehouse, the C:\hive\build\ part is 
parsed as the authority of the path and removed from the path string. The patch 
fixes this problem and makes the unit test results consistent between Windows 
and Linux.
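
The authority problem can be reproduced in miniature with the JDK's `java.net.URI`. This sketch rests on several assumptions: it does not use Hadoop's `Path` class, it uses forward slashes because `java.net.URI` rejects backslashes, and the triple-slash form is just one possible normalisation, not necessarily what the patch does. The class and method names are hypothetical.

```java
import java.net.URI;

final class WarehousePathSketch {
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // Two slashes: the drive letter lands in the authority component
        // and is therefore missing from the path.
        String broken = "pfile://C:/hive/build/ql/test/data/warehouse";
        // Three slashes: the authority is empty and the drive stays in the path.
        String normalised = "pfile:///C:/hive/build/ql/test/data/warehouse";

        System.out.println(authorityOf(broken) + " | " + pathOf(broken));
        System.out.println(authorityOf(normalised) + " | " + pathOf(normalised));
    }
}
```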


[jira] [Updated] (HIVE-4448) Fix metastore warehouse incorrect path on Windows for test case TestExecDriver and TestHiveMetaStoreChecker


 [ 
https://issues.apache.org/jira/browse/HIVE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuaishuai Nie updated HIVE-4448:
-

Attachment: (was: HIVE-4448.1.patch)


[jira] [Updated] (HIVE-4448) Fix metastore warehouse incorrect path on Windows for test case TestExecDriver and TestHiveMetaStoreChecker


 [ 
https://issues.apache.org/jira/browse/HIVE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuaishuai Nie updated HIVE-4448:
-

Status: Patch Available  (was: Open)


[jira] [Commented] (HIVE-4448) Fix metastore warehouse incorrect path on Windows for test case TestExecDriver and TestHiveMetaStoreChecker


[ 
https://issues.apache.org/jira/browse/HIVE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645130#comment-13645130
 ] 

Shuaishuai Nie commented on HIVE-4448:
--

All unit test cases that do not use QTestUtil can use the same method from 
the patch to convert the paths in HiveConf and avoid inconsistent test 
results on Windows.


[jira] [Updated] (HIVE-4447) hcatalog version numbers need to be updated

2013-04-29 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-4447:
-

Attachment: hcat-releasable-build.patch

 hcatalog version numbers need to be updated 
 

 Key: HIVE-4447
 URL: https://issues.apache.org/jira/browse/HIVE-4447
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Ashutosh Chauhan
 Attachments: hcat-releasable-build.patch





[jira] [Updated] (HIVE-4447) hcatalog version numbers need to be updated

2013-04-29 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-4447:
-

Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-4196) Support for Streaming Partitions in Hive

2013-04-29 Thread Roshan Naik (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Roshan Naik updated HIVE-4196:
--

Attachment: HIVE-4196.v1.patch

Draft patch for review, based on phase 1 mentioned in the design doc. It
deviates slightly from the doc in the following ways:
1) Adds a couple of (temporary) REST calls to enable/disable streaming on a
table. Later these will be replaced with support in DDL.

2) All HTTP methods are GET, for easy testing with a web browser.

3) Authentication is disabled on the new streaming HTTP methods.

Usage examples on a db named 'sdb' and a table named 'log':

1) *Set up db and table with single partition column 'date':*
hcat -e "create database sdb; use sdb; create table log(msg string, region
string) partitioned by (date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY
',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;"

2) *To check streaming status:*
http://localhost:50111/templeton/v1/streaming/status?database=sdb&table=log

3) *Enable Streaming:*

http://localhost:50111/templeton/v1/streaming/enable?database=sdb&table=log&col=date&value=1000

4) *Get Chunk File to write to:*
http://localhost:50111/templeton/v1/streaming/chunkget?database=sdb&table=log&schema=blah&format=blah&record_separator=blah&field_separator=blah

5) *Commit Chunk File:*
http://localhost:50111/templeton/v1/streaming/chunkcommit?database=sdb&table=log&chunkfile=/user/hive/streaming/tmp/sdb/log/2

6) *Abort Chunk File:*
http://localhost:50111/templeton/v1/streaming/chunkabort?database=sdb&table=log&chunkfile=/user/hive/streaming/tmp/sdb/log/3

7) *Roll Partition:*
http://localhost:50111/templeton/v1/streaming/partitionroll?database=sdb&table=log&partition_column=date&partition_value=3000
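
A client could assemble these request URLs programmatically. The sketch below assumes the query parameters in the examples above are joined with `&`, as in ordinary HTTP query strings; the class `StreamingUrlSketch` and its `build` method are hypothetical helpers, not part of the patch. Only the base URL, the action names, and the parameter names are taken from the examples.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the patch.
final class StreamingUrlSketch {
    static final String BASE = "http://localhost:50111/templeton/v1/streaming/";

    // Joins the parameters with '?' and '&', URL-encoding keys and values.
    static String build(String action, Map<String, String> params) {
        StringBuilder url = new StringBuilder(BASE).append(action);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(separator)
               .append(encode(e.getKey()))
               .append('=')
               .append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("database", "sdb");
        params.put("table", "log");
        System.out.println(build("status", params));
    }
}
```

Note that `URLEncoder` percent-encodes the slashes in a `chunkfile` value, whereas the hand-written examples leave them as-is.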

Support for Streaming Partitions in Hive

Key: HIVE-4196
URL: https://issues.apache.org/jira/browse/HIVE-4196
Project: Hive
Issue Type: New Feature
Components: Database/Schema, HCatalog
Affects Versions: 0.10.1
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments:
HCatalogStreamingIngestFunctionalSpecificationandDesign.docx,
HIVE-4196.v1.patch

Motivation: Allow Hive users to immediately query data streaming in through
clients such as Flume.
Currently Hive partitions must be created after all the data for the
partition is available. Thereafter, data in the partitions is considered
immutable.
This proposal introduces the notion of a streaming partition into which new
files can be committed periodically and made available for queries before the
partition is closed and converted into a standard partition.
The admin enables streaming partition on a table using DDL. He provides the
following pieces of information:
- Name of the partition in the table on which streaming is enabled
- Frequency at which the streaming partition should be closed and converted
into a standard partition.
Tables with streaming partition enabled will be partitioned by one and only
one column. It is assumed that this column will contain a timestamp.
Closing the current streaming partition converts it into a standard
partition. Based on the specified frequency, the current streaming partition
is closed and a new one created for future writes. This is referred to as
'rolling the partition'.
A streaming partition's life cycle is as follows:
- A new streaming partition is instantiated for writes
- Streaming clients request (via WebHCat) an HDFS file name into which
they can write a chunk of records for a specific table.
- Streaming clients write a chunk (via WebHDFS) to that file and commit
it (via WebHCat). Committing merely indicates that the chunk has been written
completely and is ready for serving queries.
- When the partition is rolled, all committed chunks are swept into a single
directory and a standard partition pointing to that directory is created. The
streaming partition is closed and a new streaming partition is created. Rolling
the partition is atomic. Streaming clients are agnostic of partition rolling.

- Hive queries will be able to query the partition that is currently open
for streaming. Only committed chunks will be visible. Read consistency will
be ensured so that repeated reads of the same partition will be idempotent
for the lifespan of the query.
Partition rolling requires an active agent/thread running to check when it is
time to roll and trigger the roll. This could be achieved either by using
an external agent such as Oozie (preferably) or an internal agent.
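
The life cycle above can be modelled in memory as a small state holder. This is a sketch under stated assumptions, not the design's implementation: the class and method names are hypothetical (loosely mirroring the chunkget/chunkcommit/chunkabort/partitionroll calls), chunk names are invented, and leaving uncommitted chunks pending across a roll is an assumption the description does not settle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one streaming partition.
final class StreamingPartitionSketch {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private int nextChunk = 0;

    // A client asks for a file name to write a chunk into.
    synchronized String chunkGet() {
        String file = "chunk-" + nextChunk++;
        pending.add(file);
        return file;
    }

    // Committing marks the chunk as completely written and queryable.
    synchronized void chunkCommit(String file) {
        if (pending.remove(file)) {
            committed.add(file);
        }
    }

    // Aborting discards a chunk that was never committed.
    synchronized void chunkAbort(String file) {
        pending.remove(file);
    }

    // Queries see committed chunks only; the returned list is a snapshot.
    synchronized List<String> visibleToQueries() {
        return new ArrayList<>(committed);
    }

    // Rolling sweeps the committed chunks into a standard partition and
    // leaves an empty streaming partition behind; done under one lock so
    // the switch appears atomic to readers of this object.
    synchronized List<String> roll() {
        List<String> standardPartition = new ArrayList<>(committed);
        committed.clear();
        return standardPartition;
    }
}
```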

[jira] [Commented] (HIVE-4447) hcatalog version numbers need to be updated


[ 
https://issues.apache.org/jira/browse/HIVE-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645171#comment-13645171
 ] 

Ashutosh Chauhan commented on HIVE-4447:


+1


Review Request: Draft patch for review. Based on phase 1 mentioned in design doc.

2013-04-29 Thread Roshan Naik


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10857/
---

Review request for hive.


Description
---

Draft patch for review. based on phase 1 mentioned in design doc. Deviates 
slighlty from doc in the follow ways... 
1) adds a couple of (temporary) rest calls to enable/disable streaming on a 
table. Later these will be replaced with support in DDL.
2) Also also HTTP methods are GET for easy testing with web browser
3) Authentication disabled on the new streaming HTTP methods


Usage examples on db named 'sdb' & table named 'log':

1) Set up db & table with single partition column 'date':
hcat -e "create database sdb; use sdb; create table log(msg string, region 
string) partitioned by (date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY 
',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;"
2) To check streaming status:
http://localhost:50111/templeton/v1/streaming/status?database=sdb&table=log
3) Enable Streaming:
http://localhost:50111/templeton/v1/streaming/enable?database=sdb&table=log&col=date&value=1000
4) Get Chunk File to write to:
http://localhost:50111/templeton/v1/streaming/chunkget?database=sdb&table=log&schema=blah&format=blah&record_separator=blah&field_separator=blah
5) Commit Chunk File:
http://localhost:50111/templeton/v1/streaming/chunkcommit?database=sdb&table=log&chunkfile=/user/hive/streaming/tmp/sdb/log/2
6) Abort Chunk File:
http://localhost:50111/templeton/v1/streaming/chunkabort?database=sdb&table=log&chunkfile=/user/hive/streaming/tmp/sdb/log/3
7) Roll Partition:
http://localhost:50111/templeton/v1/streaming/partitionroll?database=sdb&table=log&partition_column=date&partition_value=3000
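
For scripted testing, the request URLs for these temporary GET endpoints can be assembled programmatically instead of typed by hand. The sketch below only builds the URL strings; the endpoint names and query parameters are the ones listed in the usage examples, while the helper name `streaming_url` and the `BASE` constant are made up for illustration. It does not send any request, and the draft endpoints may well change.

```python
from urllib.parse import urlencode

# Host and port as used in the usage examples; adjust for a real deployment.
BASE = "http://localhost:50111/templeton/v1/streaming"


def streaming_url(action: str, **params: str) -> str:
    """Build a GET URL for one of the draft streaming endpoints.

    action is the last path segment (status, enable, chunkget, ...);
    params become the query string in the order given.
    """
    # safe="/" keeps path-like values such as the chunk file readable.
    return f"{BASE}/{action}?{urlencode(params, safe='/')}"
```

For example, `streaming_url("status", database="sdb", table="log")` yields the status URL from step 2, and passing `chunkfile="/user/hive/streaming/tmp/sdb/log/2"` to the `chunkcommit` action yields the commit URL from step 5.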


This addresses bug HIVE-4196.
https://issues.apache.org/jira/browse/HIVE-4196


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c61d95b 
  
hcatalog/webhcat/svr/src/main/java/org/apache/hcatalog/templeton/HcatStreamingDelegator.java
 PRE-CREATION 
  hcatalog/webhcat/svr/src/main/java/org/apache/hcatalog/templeton/Server.java 
29ac4b3 
  metastore/if/hive_metastore.thrift c2051f4 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 7b31d28 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 3d69472 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 
3b90b44 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java
 d8d6e71 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedValueList.java
 030b54a 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
 5929cda 
  metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php a69d214 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 
6fd2cce 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 9b856e5 
  metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 25aa30c 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
dc14084 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
cef50f4 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
a2d6b1b 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 2079337 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 233fb46 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTable.java 
2a78ce9 
  metastore/src/model/package.jdo a84d2bf 
  metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 
00eb0b4 

Diff: https://reviews.apache.org/r/10857/diff/


Testing
---

Manual testing only


Thanks,

Roshan Naik

[jira] [Updated] (HIVE-4196) Support for Streaming Partitions in Hive

2013-04-29 Thread Roshan Naik (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Roshan Naik updated HIVE-4196:
--

Attachment: (was:
HCatalogStreamingIngestFunctionalSpecificationandDesign.docx)

Support for Streaming Partitions in Hive

Key: HIVE-4196
URL: https://issues.apache.org/jira/browse/HIVE-4196
Project: Hive
Issue Type: New Feature
Components: Database/Schema, HCatalog
Affects Versions: 0.10.1
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign-
apr 29- patch1.docx, HIVE-4196.v1.patch

[jira] [Updated] (HIVE-4196) Support for Streaming Partitions in Hive

2013-04-29 Thread Roshan Naik (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Roshan Naik updated HIVE-4196:
--

Attachment: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr
29- patch1.docx

Support for Streaming Partitions in Hive

Key: HIVE-4196
URL: https://issues.apache.org/jira/browse/HIVE-4196
Project: Hive
Issue Type: New Feature
Components: Database/Schema, HCatalog
Affects Versions: 0.10.1
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign-
apr 29- patch1.docx, HIVE-4196.v1.patch

[jira] [Commented] (HIVE-4209) Cache evaluation result of deterministic expression and reuse it

2013-04-29 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645175#comment-13645175
 ] 

Namit Jain commented on HIVE-4209:
--

Thanks [~navis]

Looks good. Can you commit it if tests pass?

+1

 Cache evaluation result of deterministic expression and reuse it
 

 Key: HIVE-4209
 URL: https://issues.apache.org/jira/browse/HIVE-4209
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4209.6.patch.txt, HIVE-4209.D9585.1.patch, 
 HIVE-4209.D9585.2.patch, HIVE-4209.D9585.3.patch, HIVE-4209.D9585.4.patch, 
 HIVE-4209.D9585.5.patch


 For example, 
 {noformat}
 select key from src where key + 1  100 AND key + 1  200 limit 3;
 {noformat}
 key + 1 need not be evaluated twice.
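
 The idea can be illustrated with a small stand-alone sketch. This is not the code from the attached patches; it is a toy evaluator, assuming expressions are either a column name, an integer literal, or a tuple ("+", left, right), and assuming every expression is deterministic. The class and method names are hypothetical.

```python
class CachingEvaluator:
    """Evaluates toy expression trees, caching sub-results for the current row."""

    def __init__(self):
        self.cache = {}
        self.evaluations = 0  # number of '+' nodes actually computed

    def start_row(self):
        """Cached results are only valid for one row, so reset per row."""
        self.cache.clear()

    def evaluate(self, expr, row):
        if isinstance(expr, str):   # column reference
            return row[expr]
        if isinstance(expr, int):   # literal
            return expr
        if expr in self.cache:      # identical sub-expression already computed
            return self.cache[expr]
        op, left, right = expr
        if op != "+":
            raise ValueError("only '+' is supported in this sketch")
        self.evaluations += 1
        result = self.evaluate(left, row) + self.evaluate(right, row)
        self.cache[expr] = result
        return result


key_plus_1 = ("+", "key", 1)
evaluator = CachingEvaluator()
evaluator.start_row()
row = {"key": 150}
first = evaluator.evaluate(key_plus_1, row)   # computed
second = evaluator.evaluate(key_plus_1, row)  # served from the cache
```

 Both predicates of the WHERE clause would share the one cached value of key + 1 for the row. A non-deterministic expression (rand(), for instance) must not be cached this way.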


[jira] [Commented] (HIVE-4209) Cache evaluation result of deterministic expression and reuse it


[ 
https://issues.apache.org/jira/browse/HIVE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645176#comment-13645176
 ] 

Phabricator commented on HIVE-4209:
---

njain has accepted the revision HIVE-4209 [jira] Cache evaluation result of 
deterministic expression and reuse it.

REVISION DETAIL
  https://reviews.facebook.net/D9585

BRANCH
  HIVE-4209

ARCANIST PROJECT
  hive

To: JIRA, njain, navis
Cc: njain



[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999

2013-04-29 Thread Namit Jain (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645178#comment-13645178
]

Namit Jain commented on HIVE-4440:
--

I really like the title of the jira.

Changing the parameter name is backward incompatible.
Can you support both the current parameter and the proposed parameter for now?
Document it clearly, and say that the current parameter
hive.mapjoin.bucket.cache.size will not be supported
for this from 0.13 or something like that.
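
Supporting both names during a deprecation window usually comes down to a lookup with a fallback. The sketch below shows the pattern on a plain dict standing in for the configuration; it is not HiveConf code, and the helper name `smb_cache_rows` is made up. The two parameter names are the ones discussed in this issue.

```python
import warnings

OLD_KEY = "hive.mapjoin.bucket.cache.size"  # current parameter name
NEW_KEY = "hive.smbjoin.cache.rows"         # proposed parameter name


def smb_cache_rows(conf: dict, default: int) -> int:
    """Resolve the SMB join row cache size, preferring the new parameter name."""
    if NEW_KEY in conf:
        return int(conf[NEW_KEY])
    if OLD_KEY in conf:
        # Still honoured for now, but flagged so users can migrate.
        warnings.warn(f"{OLD_KEY} is deprecated; use {NEW_KEY}", DeprecationWarning)
        return int(conf[OLD_KEY])
    return default
```

If both are set, the new name wins, so existing configurations keep working and can be migrated one at a time.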

SMB Operator spills to disk like it's 1999
--

Key: HIVE-4440
URL: https://issues.apache.org/jira/browse/HIVE-4440
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Attachments: HIVE-4440.1.patch

I was recently looking into some performance issue with a query that used SMB
join and was running really slow. Turns out that the SMB join by default
caches only 100 values per key before spilling to disk. That seems overly
conservative to me. Changing the parameter resulted in a ~5x speedup - quite
significant.
The parameter is: hive.mapjoin.bucket.cache.size
which right now is only used by the SMB Operator as far as I can tell.
The parameter was introduced originally (3 yrs ago) for the map join operator
(looks like pre-SMB) and set to 100 to avoid OOM. That seems to have been in
a different context though where you had to avoid running out of memory with
the cached hash table in the same process, I think.
Two things I'd like to propose:
a) Rename it to what it does: hive.smbjoin.cache.rows
b) Set it to something less restrictive: 1
If you string together a 5 table smb join with a map join and a map-side
group by aggregation you might still run out of memory, but the renamed
parameter should be easier to find and reduce. For most queries, I would
think that 1 is still a reasonable number to cache (On the reduce side we
use 25000 for shuffle joins).
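
To make the trade-off concrete, here is a toy buffer that keeps a bounded number of rows in memory and writes the rest to a temporary file. It is only a sketch of the general "cache N rows, then spill" pattern, not Hive's actual row container; the class name and the `cache_rows` argument are hypothetical, and pickle is used purely for brevity.

```python
import pickle
import tempfile


class SpillingRowBuffer:
    """Keeps up to cache_rows rows in memory; further rows go to a temp file."""

    def __init__(self, cache_rows: int):
        self.cache_rows = cache_rows
        self.in_memory = []
        self.spill_file = None
        self.spilled = 0  # number of rows written to disk

    def add(self, row):
        if len(self.in_memory) < self.cache_rows:
            self.in_memory.append(row)
            return
        if self.spill_file is None:
            self.spill_file = tempfile.TemporaryFile()
        pickle.dump(row, self.spill_file)
        self.spilled += 1

    def rows(self):
        """Yield all rows in insertion order (do not call add() while iterating)."""
        yield from self.in_memory
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self.spill_file)
            self.spill_file.seek(0, 2)  # return to the end for later appends
```

With a small `cache_rows`, any key with many matching rows pays disk I/O and deserialization on every pass over its rows, which is the kind of cost the speedup above suggests.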

[jira] [Updated] (HIVE-4447) hcatalog version numbers need to be updated

2013-04-29 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-4447:
-

Resolution: Fixed
  Assignee: Alan Gates
Status: Resolved  (was: Patch Available)

Patch checked into branch 0.11.  Thanks Ashutosh for the review.

 hcatalog version numbers need to be updated 
 

 Key: HIVE-4447
 URL: https://issues.apache.org/jira/browse/HIVE-4447
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Ashutosh Chauhan
Assignee: Alan Gates
 Attachments: hcat-releasable-build.patch





[jira] [Updated] (HIVE-4438) Remove unused join configuration parameter: hive.mapjoin.size.key


 [ 
https://issues.apache.org/jira/browse/HIVE-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4438:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gunther!

 Remove unused join configuration parameter: hive.mapjoin.size.key
 -

 Key: HIVE-4438
 URL: https://issues.apache.org/jira/browse/HIVE-4438
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0

 Attachments: HIVE-4438.1.patch


 The config parameter that used to limit the number of cached rows per key is 
 no longer used in the code base. I suggest removing it to make things less 
 confusing.


[jira] [Updated] (HIVE-4439) Remove unused join configuration parameter: hive.mapjoin.cache.numrows


 [ 
https://issues.apache.org/jira/browse/HIVE-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4439:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gunther!

 Remove unused join configuration parameter: hive.mapjoin.cache.numrows
 --

 Key: HIVE-4439
 URL: https://issues.apache.org/jira/browse/HIVE-4439
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0

 Attachments: HIVE-4439.1.patch


 The description says:
 How many rows should be cached by jdbm for map join.
 I can't find any reference to that parameter in the code however.


[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements

2013-04-29 Thread Jeremy Rayner (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645269#comment-13645269
 ] 

Jeremy Rayner commented on HIVE-4064:
-

Any progress on this issue? I understand that there are workarounds, but it 
would be nice if this got resolved sometime in the near future.

 Handle db qualified names consistently across all HiveQL statements
 ---

 Key: HIVE-4064
 URL: https://issues.apache.org/jira/browse/HIVE-4064
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan

 Hive doesn't consistently handle db qualified names across all HiveQL 
 statements. While some HiveQL statements such as SELECT support DB qualified 
 names, others such as CREATE INDEX don't. 


Is HCatalog stand-alone going to die ?

2013-04-29 Thread Rodrigo Trujillo


Hi,

I have followed the discussion about the merging of HCatalog into Hive.
However, it is not clear to me whether new stand-alone versions of HCatalog
are going to be released.

Is 0.5.0-incubating the last?

Will it be possible to build only HCatalog from the Hive tree?

Regards,

Rodrigo Trujillo

[VOTE] Apache Hive 0.11.0 Release Candidate 0

2013-04-29 Thread Ashutosh Chauhan

Hey all,

I am excited to announce the availability of Apache Hive 0.11.0 Release
Candidate 0 at:
http://people.apache.org/~hashutosh/hive-0.11.0-rc0/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-154/

This release has many goodies including HiveServer2, windowing and
analytical functions, decimal data type, better query planning,
performance enhancements and various bug fixes. In total, we resolved
more than 350 issues. The full list of fixed issues can be found at:
http://s.apache.org/8Fr


Voting will conclude in 72 hours.

Hive PMC Members: Please test and vote.

Thanks,

Ashutosh (On behalf of Hive contributors who made 0.11 a possibility)

[jira] [Updated] (HIVE-4447) hcatalog version numbers need to be updated