[jira] [Commented] (HIVE-633) ADD FILE command does not accept quoted filenames

2013-01-30 Thread kiran sreekumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566331#comment-13566331
 ] 

kiran sreekumar commented on HIVE-633:
--

Is this issue still relevant? I would like to work on it.

 ADD FILE command does not accept quoted filenames
 -

 Key: HIVE-633
 URL: https://issues.apache.org/jira/browse/HIVE-633
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.3.0
 Environment: Ubuntu Linux (intrepid)
Reporter: Saurabh Nanda
Priority: Minor

 The following command reports that the file does not exist. Removing the
 quotes around the filename makes it work.
 hive> add files '/tmp/testing.jar';
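
One way a command parser could tolerate quoted filenames is to strip a single pair of matching quotes before checking that the file exists. The sketch below is illustrative only: `strip_matching_quotes` is a hypothetical helper, not part of Hive, and it does not reflect how Hive actually parses `ADD FILE`.

```python
def strip_matching_quotes(token: str) -> str:
    """Remove one pair of matching single or double quotes around token.

    Hypothetical helper for illustration; not Hive's implementation.
    """
    token = token.strip()
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return token[1:-1]
    # Unquoted or mismatched quotes: leave the token untouched.
    return token


print(strip_matching_quotes("'/tmp/testing.jar'"))  # /tmp/testing.jar
print(strip_matching_quotes("/tmp/testing.jar"))    # /tmp/testing.jar
```

Tokens with mismatched quotes are returned unchanged, so a genuinely malformed filename would still surface as a "file does not exist" error.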

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3962) number of distinct values are in column statistics

2013-01-30 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-3962:
-

 Summary: number of distinct values are in column statistics
 Key: HIVE-3962
 URL: https://issues.apache.org/jira/browse/HIVE-3962
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Amareshwari Sriramadasu


When we run the following query on the Hive QL src table:

select count(distinct(key)), count(distinct(value)) from src;
309 309

After running the following analyze query, the stats in metastore seem wrong:

analyze table src compute statistics for columns key, value; 

--- stats in metastore ---

mysql> select * from TAB_COL_STATS where TABLE_NAME='src';

Column                 | key row    | value row
-----------------------+------------+-----------
CS_ID                  | 5          | 6
DB_NAME                | default    | default
TABLE_NAME             | src        | src
COLUMN_NAME            | key        | value
COLUMN_TYPE            | int        | string
TBL_ID                 | 11         | 11
LONG_LOW_VALUE         | 0          | 0
LONG_HIGH_VALUE        | 498        | 0
DOUBLE_HIGH_VALUE      | 0.         | 0.
DOUBLE_LOW_VALUE       | 0.         | 0.
BIG_DECIMAL_LOW_VALUE  | NULL       | NULL
BIG_DECIMAL_HIGH_VALUE | NULL       | NULL
NUM_NULLS              | 0          | 0
NUM_DISTINCTS          | 291        | 112
AVG_COL_LEN            | 0.         | 6.8120
MAX_COL_LEN            | 0          | 7
NUM_TRUES              | 0          | 0
NUM_FALSES             | 0          | 0
LAST_ANALYZED          | 1359539181 | 1359539181
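
To put the discrepancy in numbers, the short sketch below compares the NUM_DISTINCTS values stored in the metastore (291 and 112) against the exact count of 309 returned by the query. The variable and function names are illustrative, not taken from Hive.

```python
# Exact distinct counts from the query, and the values stored in TAB_COL_STATS.
exact = {"key": 309, "value": 309}
stored = {"key": 291, "value": 112}


def relative_error(estimate: int, truth: int) -> float:
    """Absolute difference between estimate and truth, as a fraction of truth."""
    return abs(estimate - truth) / truth


for column in exact:
    print(f"{column}: {relative_error(stored[column], exact[column]):.1%}")
# key: 5.8%
# value: 63.8%
```

An error of a few percent on `key` might be plausible for an approximate estimator, but the figure for `value` is off by well over half.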





[jira] [Updated] (HIVE-3962) Number of distinct values are wrong in column statistics

2013-01-30 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-3962:
--

Summary: Number of distinct values are wrong in column statistics  (was: 
number of distinct values are in column statistics)

 Number of distinct values are wrong in column statistics
 

 Key: HIVE-3962
 URL: https://issues.apache.org/jira/browse/HIVE-3962
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Amareshwari Sriramadasu

 When we run the following query on the Hive QL src table:
 select count(distinct(key)), count(distinct(value)) from src;
 309 309
 After running the following analyze query, the stats in metastore seem wrong:
 analyze table src compute statistics for columns key, value; 
 --- stats in metastore ---
 mysql> select * from TAB_COL_STATS where TABLE_NAME='src';
 Column                 | key row    | value row
 -----------------------+------------+-----------
 CS_ID                  | 5          | 6
 DB_NAME                | default    | default
 TABLE_NAME             | src        | src
 COLUMN_NAME            | key        | value
 COLUMN_TYPE            | int        | string
 TBL_ID                 | 11         | 11
 LONG_LOW_VALUE         | 0          | 0
 LONG_HIGH_VALUE        | 498        | 0
 DOUBLE_HIGH_VALUE      | 0.         | 0.
 DOUBLE_LOW_VALUE       | 0.         | 0.
 BIG_DECIMAL_LOW_VALUE  | NULL       | NULL
 BIG_DECIMAL_HIGH_VALUE | NULL       | NULL
 NUM_NULLS              | 0          | 0
 NUM_DISTINCTS          | 291        | 112
 AVG_COL_LEN            | 0.         | 6.8120
 MAX_COL_LEN            | 0          | 7
 NUM_TRUES              | 0          | 0
 NUM_FALSES             | 0          | 0
 LAST_ANALYZED          | 1359539181 | 1359539181



[jira] [Commented] (HIVE-3785) Core hive changes for HiveServer2 implementation

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566399#comment-13566399
 ] 

Namit Jain commented on HIVE-3785:
--

I am sorry for the delay on my part.
Can you refresh the patch? I will definitely review it this time.

 Core hive changes for HiveServer2 implementation
 

 Key: HIVE-3785
 URL: https://issues.apache.org/jira/browse/HIVE-3785
 Project: Hive
  Issue Type: Sub-task
  Components: Authentication, Build Infrastructure, Configuration, 
 Thrift API
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HS2-changed-files-only.patch


 The subtask to track changes in the core hive components for HiveServer2 
 implementation



[jira] [Commented] (HIVE-3785) Core hive changes for HiveServer2 implementation

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566400#comment-13566400
 ] 

Namit Jain commented on HIVE-3785:
--

cc [~mgrover], [~prasadm]

 Core hive changes for HiveServer2 implementation
 

 Key: HIVE-3785
 URL: https://issues.apache.org/jira/browse/HIVE-3785
 Project: Hive
  Issue Type: Sub-task
  Components: Authentication, Build Infrastructure, Configuration, 
 Thrift API
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HS2-changed-files-only.patch


 The subtask to track changes in the core hive components for HiveServer2 
 implementation



[jira] [Commented] (HIVE-3950) Remove code for merging files via MR job

2013-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566401#comment-13566401
 ] 

Hudson commented on HIVE-3950:
--

Integrated in Hive-trunk-hadoop2 #97 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/97/])
HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, 
Reviewed by Namit Jain) (Revision 1440238)

 Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440238
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/dyn_part_merge.q
* /hive/trunk/ql/src/test/results/clientnegative/dyn_part_merge.q.out


 Remove code for merging files via MR job
 

 Key: HIVE-3950
 URL: https://issues.apache.org/jira/browse/HIVE-3950
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.11.0

 Attachments: hive-3950_1.patch, hive-3950_2.patch, hive-3950.patch


 Hive can merge files either via an MR job or via a map-only job. Doing it via
 a map-only job is more efficient, but the MR-job option exists because
 CombineFileInputFormat is available only in hadoop-0.20 and later. Since we
 no longer support Hadoop versions earlier than 0.20, all of that is now dead
 code and we should get rid of it.



[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566402#comment-13566402
 ] 

Hudson commented on HIVE-933:
-

Integrated in Hive-trunk-hadoop2 #97 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/97/])
HIVE-933 Infer bucketing/sorting properties
(Kevin Wilfong via namit) (Revision 1440271)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440271
Files : 
* /hive/trunk/build-common.xml
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lib/RuleExactMatch.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientnegative/merge_negative_3.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_bucketed_table.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_convert_join.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_dyn_part.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_grouping_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_list_bucket.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_map_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_merge.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_multi_insert.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_num_buckets.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_reducers_power_two.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientnegative/merge_negative_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_bucketed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_list_bucket.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_map_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_multi_insert.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_reducers_power_two.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input7.q.xml
* 

[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-01-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.19.patch

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.2.patch, 
 hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, 
 hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.



[jira] [Commented] (HIVE-3950) Remove code for merging files via MR job

2013-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566422#comment-13566422
 ] 

Hudson commented on HIVE-3950:
--

Integrated in Hive-trunk-h0.21 #1946 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1946/])
HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, 
Reviewed by Namit Jain) (Revision 1440238)

 Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440238
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/dyn_part_merge.q
* /hive/trunk/ql/src/test/results/clientnegative/dyn_part_merge.q.out


 Remove code for merging files via MR job
 

 Key: HIVE-3950
 URL: https://issues.apache.org/jira/browse/HIVE-3950
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.11.0

 Attachments: hive-3950_1.patch, hive-3950_2.patch, hive-3950.patch


 Hive can merge files either via an MR job or via a map-only job. Doing it via
 a map-only job is more efficient, but the MR-job option exists because
 CombineFileInputFormat is available only in hadoop-0.20 and later. Since we
 no longer support Hadoop versions earlier than 0.20, all of that is now dead
 code and we should get rid of it.



Hive-trunk-h0.21 - Build # 1946 - Failure

2013-01-30 Thread Apache Jenkins Server
Changes for Build #1944
[namit] HIVE-3873 lot of tests failing for hadoop 23
(Gang Tim Liu via namit)


Changes for Build #1945
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784

[hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)


Changes for Build #1946
[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh 
Chauhan, Reviewed by Namit Jain)




1 tests failed.
FAILED: org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
    at net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
    at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
    at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:324)
    at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1946)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1946/ to 
view the results.

[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566726#comment-13566726
 ] 

Hudson commented on HIVE-933:
-

Integrated in Hive-trunk-h0.21 #1947 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1947/])
HIVE-933 Infer bucketing/sorting properties
(Kevin Wilfong via namit) (Revision 1440271)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440271
Files : 
* /hive/trunk/build-common.xml
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lib/RuleExactMatch.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientnegative/merge_negative_3.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_bucketed_table.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_convert_join.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_dyn_part.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_grouping_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_list_bucket.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_map_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_merge.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_multi_insert.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_num_buckets.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_reducers_power_two.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientnegative/merge_negative_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_bucketed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_list_bucket.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_map_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_multi_insert.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_reducers_power_two.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input7.q.xml
* 

Hive-trunk-h0.21 - Build # 1947 - Fixed

2013-01-30 Thread Apache Jenkins Server
Changes for Build #1944
[namit] HIVE-3873 lot of tests failing for hadoop 23
(Gang Tim Liu via namit)


Changes for Build #1945
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784

[hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)


Changes for Build #1946
[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh 
Chauhan, Reviewed by Namit Jain)


Changes for Build #1947
[namit] HIVE-933 Infer bucketing/sorting properties
(Kevin Wilfong via namit)




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1947)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1947/ to 
view the results.

[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3874:
-

Attachment: hive.3874.2.patch

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows isn't stored in the file
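
To illustrate the idea of lightweight indexes that let push-down filters skip entire row groups, here is a minimal sketch that assumes each row group records the minimum and maximum value of a column. The `RowGroupStats` class and `groups_to_read` function are hypothetical and do not describe the actual layout of the proposed format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group index entry for one integer column."""
    first_row: int   # row number where the group starts
    num_rows: int    # number of rows in the group
    min_value: int   # smallest column value in the group
    max_value: int   # largest column value in the group


def groups_to_read(index: List[RowGroupStats], low: int, high: int) -> List[RowGroupStats]:
    """Keep only row groups whose [min, max] range can overlap [low, high]."""
    return [g for g in index if g.max_value >= low and g.min_value <= high]


index = [
    RowGroupStats(first_row=0, num_rows=10000, min_value=0, max_value=99),
    RowGroupStats(first_row=10000, num_rows=10000, min_value=100, max_value=199),
    RowGroupStats(first_row=20000, num_rows=10000, min_value=200, max_value=498),
]

# A filter such as "key BETWEEN 150 AND 160" only needs the second group.
for group in groups_to_read(index, 150, 160):
    print(group.first_row, group.num_rows)
```

Because each entry also stores `first_row` and `num_rows`, the same index could support seeking to a particular row number, which the list above names as a separate requirement.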



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566727#comment-13566727
 ] 

Namit Jain commented on HIVE-3874:
--

I took a stab at it. I am attaching it just in case - feel free to ignore it.
I was not able to get the protocol buffer file auto-generated from ant, so I
manually generated it for the purpose of this patch.

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows isn't stored in the file



[jira] [Updated] (HIVE-3940) Track columns accessed in each table in a query

2013-01-30 Thread Samuel Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Yuan updated HIVE-3940:
--

Attachment: HIVE-3940.3.patch.txt

Updated.

 Track columns accessed in each table in a query
 ---

 Key: HIVE-3940
 URL: https://issues.apache.org/jira/browse/HIVE-3940
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
 Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, 
 HIVE-3940.3.patch.txt


 Similar to partition access logs, we need column access logs, so that we can
 later build tools/reports that tell users whether a table has unused columns
 that could be trimmed.



[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566810#comment-13566810
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

FYI: Ran all unit tests on patch .9. Failing tests are: 
groupby_distinct_samekey.q,join31.q,reduce_deduplicate_extended.q 
(TestCliDriver). Failures look like outdated golden files (explain output 
changed). Uploaded testclidriver.txt for reference.

 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
 HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
 HIVE-2340.D1209.9.patch


 Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY
 optimizer (cluster-by following group-by).



[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby

2013-01-30 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-2340:
-

Attachment: testclidriver.txt

Just the diff of the latest unit test run.

 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
 HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
 HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY
 optimizer (cluster-by following group-by).



[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-30 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566898#comment-13566898
 ] 

Phabricator commented on HIVE-2340:
---

hagleitn has commented on the revision HIVE-2340 [jira] optimize orderby 
followed by a groupby.

  Partial review

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521
  Not sure why this is needed or why this defaults to 4. From the comment below
  it seems this is just to avoid the single-reducer order-by case for
  performance reasons, is that correct?

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
  Is this required or extra protection? The comment at the top of the file says
  mapjoin optimization happens before this (and probably should, for
  performance reasons). Also, if I understand it correctly, joinAndSort might be
  a better name than fixed. You're basically saying that if an optimization
  wants to change the join after this, it needs to make sure the ordering of
  the keys is preserved, right?

  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136
  Seems orthogonal to this patch.

  ql/src/test/queries/clientpositive/reduce_deduplicate.q:7
  There are not a lot of tests for min.reducer=1; there is no order-by case,
  for instance. Maybe reduce_deduplicate_extended.q should run with both the
  default and min.reducer=1.

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn


 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
 HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
 HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
 optimizer(cluster-by following group-by).



[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566902#comment-13566902
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

Partial review on phabricator. Biggest question is around 
hive.optimize.reducededuplication.min.reducer. That basically disables the 
orderby followed by groupby optimization which was the original motivation 
for the jira. Navis, can you explain this some more?

Might be another ticket, but would it be possible to optimize group by/sort by 
as well with this?

 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
 HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
 HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
 optimizer(cluster-by following group-by).



[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Attachment: HIVE-3917.patch.2

 Support noscan operation for analyze command
 

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2


 hive supports analyze command to gather statistics from existing 
 tables/partition 
 https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
 It collects:
 1. Number of Rows
 2. Number of files
 3. Size in Bytes
 If table/partition is big, the operation would take time since it will open 
 all files and scan all data.
 It would be nice to support fast operation to gather statistics which doesn't 
 require to open all files:
 1. Number of files
 2. Size in Bytes
 Potential syntax is 
 ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
 COMPUTE STATISTICS [noscan];
 In the future, all statistics without scan can be retrieved via this optional 
 parameter.
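
The noscan idea in the description above (gather file count and total size
without opening any file) can be sketched roughly as follows. This is only an
illustration under stated assumptions: a local directory stands in for a
table/partition location, and noscan_stats is a hypothetical helper, not part
of Hive or of the attached patch.

```python
import os
import tempfile


def noscan_stats(directory):
    """Return (number_of_files, size_in_bytes) using directory metadata only.

    No file is opened or read here; os.stat() consults file metadata, which
    is why row counts cannot be produced by this kind of pass.
    """
    num_files = 0
    total_size = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            num_files += 1
            total_size += os.stat(os.path.join(root, name)).st_size
    return num_files, total_size


# Demo data: a hypothetical "partition" directory with two small files.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "part-00000"), "wb") as f:
    f.write(b"abc")
with open(os.path.join(demo_dir, "part-00001"), "wb") as f:
    f.write(b"defgh")

print(noscan_stats(demo_dir))
```

The number of rows is deliberately absent from the result, matching the two
statistics the description lists for the fast path.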



[jira] [Work started] (HIVE-3917) Support noscan operation for analyze command

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3917 started by Gang Tim Liu.

 Support noscan operation for analyze command
 

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2


 hive supports analyze command to gather statistics from existing 
 tables/partition 
 https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
 It collects:
 1. Number of Rows
 2. Number of files
 3. Size in Bytes
 If table/partition is big, the operation would take time since it will open 
 all files and scan all data.
 It would be nice to support fast operation to gather statistics which doesn't 
 require to open all files:
 1. Number of files
 2. Size in Bytes
 Potential syntax is 
 ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
 COMPUTE STATISTICS [noscan];
 In the future, all statistics without scan can be retrieved via this optional 
 parameter.



[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Status: Patch Available  (was: In Progress)

patch is available.

 Support noscan operation for analyze command
 

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2


 hive supports analyze command to gather statistics from existing 
 tables/partition 
 https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
 It collects:
 1. Number of Rows
 2. Number of files
 3. Size in Bytes
 If table/partition is big, the operation would take time since it will open 
 all files and scan all data.
 It would be nice to support fast operation to gather statistics which doesn't 
 require to open all files:
 1. Number of files
 2. Size in Bytes
 Potential syntax is 
 ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
 COMPUTE STATISTICS [noscan];
 In the future, all statistics without scan can be retrieved via this optional 
 parameter.



Jenkins build is back to normal : Hive-0.10.0-SNAPSHOT-h0.20.1 #50

2013-01-30 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/50/



[jira] [Updated] (HIVE-3940) Track columns accessed in each table in a query

2013-01-30 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3940:


   Resolution: Fixed
Fix Version/s: 0.11.0
   Status: Resolved  (was: Patch Available)

Committed, thanks Samuel.

 Track columns accessed in each table in a query
 ---

 Key: HIVE-3940
 URL: https://issues.apache.org/jira/browse/HIVE-3940
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, 
 HIVE-3940.3.patch.txt


 Similar to partition access logs, we need to have columns access logs, so 
 later we can build tools/reports to inform users if there are wasted columns 
 in a table to be trimmed.



[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566946#comment-13566946
 ] 

Ashutosh Chauhan commented on HIVE-896:
---

PTFDesc only contains a serialized string for PTFDef. I think we should just 
merge these two classes: rename the existing PTFDef to PTFDesc and remove the 
existing PTFDef. And then make sure that PTFDesc is serializable. Does that 
sound right?
 

 Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
 ---

 Key: HIVE-896
 URL: https://issues.apache.org/jira/browse/HIVE-896
 Project: Hive
  Issue Type: New Feature
  Components: OLAP, UDF
Reporter: Amr Awadallah
Priority: Minor
 Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, 
 Hive-896.2.patch.txt


 Windowing functions are very useful for click stream processing and similar 
 time-series/sliding-window analytics.
 More details at:
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
 -- amr



[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566952#comment-13566952
 ] 

Ashutosh Chauhan commented on HIVE-896:
---

Also need to make sure that ASTNode and other antlr data structures referenced 
(directly or via contained fields) in this new PTFDesc are not required in 
PTFOperator and are thus not serialized, thereby eliminating the antlr runtime 
dependency.

 Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
 ---

 Key: HIVE-896
 URL: https://issues.apache.org/jira/browse/HIVE-896
 Project: Hive
  Issue Type: New Feature
  Components: OLAP, UDF
Reporter: Amr Awadallah
Priority: Minor
 Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, 
 Hive-896.2.patch.txt


 Windowing functions are very useful for click stream processing and similar 
 time-series/sliding-window analytics.
 More details at:
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
 -- amr



[jira] [Created] (HIVE-3963) Allow Hive to get connect to RDBMS

2013-01-30 Thread Maxime LANCIAUX (JIRA)
Maxime LANCIAUX created HIVE-3963:
-

 Summary: Allow Hive to get connect to RDBMS
 Key: HIVE-3963
 URL: https://issues.apache.org/jira/browse/HIVE-3963
 Project: Hive
  Issue Type: New Feature
Reporter: Maxime LANCIAUX






[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Attachment: (was: HIVE-3917.patch.2)

 Support noscan operation for analyze command
 

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1


 hive supports analyze command to gather statistics from existing 
 tables/partition 
 https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
 It collects:
 1. Number of Rows
 2. Number of files
 3. Size in Bytes
 If table/partition is big, the operation would take time since it will open 
 all files and scan all data.
 It would be nice to support fast operation to gather statistics which doesn't 
 require to open all files:
 1. Number of files
 2. Size in Bytes
 Potential syntax is 
 ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
 COMPUTE STATISTICS [noscan];
 In the future, all statistics without scan can be retrieved via this optional 
 parameter.



[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3778:
---

Attachment: HIVE-3778.patch.8

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Work started] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3778 started by Gang Tim Liu.

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567023#comment-13567023
 ] 

Gang Tim Liu commented on HIVE-3778:


patch is attached to the jira also.

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3778:
---

Status: Patch Available  (was: In Progress)

patch is available https://reviews.facebook.net/D8259

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Commented] (HIVE-3917) Support noscan operation for analyze command

2013-01-30 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567024#comment-13567024
 ] 

Gang Tim Liu commented on HIVE-3917:


patch is in both https://reviews.facebook.net/D8235 and attachment. thanks

 Support noscan operation for analyze command
 

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2


 hive supports analyze command to gather statistics from existing 
 tables/partition 
 https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
 It collects:
 1. Number of Rows
 2. Number of files
 3. Size in Bytes
 If table/partition is big, the operation would take time since it will open 
 all files and scan all data.
 It would be nice to support fast operation to gather statistics which doesn't 
 require to open all files:
 1. Number of files
 2. Size in Bytes
 Potential syntax is 
 ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
 COMPUTE STATISTICS [noscan];
 In the future, all statistics without scan can be retrieved via this optional 
 parameter.



Jenkins build is back to normal : Hive-0.9.1-SNAPSHOT-h0.21 #277

2013-01-30 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/277/



[jira] [Commented] (HIVE-3940) Track columns accessed in each table in a query

2013-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567149#comment-13567149
 ] 

Hudson commented on HIVE-3940:
--

Integrated in hive-trunk-hadoop1 #60 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/60/])
HIVE-3940. Track columns accessed in each table in a query. (Samuel Yuan 
via kevinwilfong) (Revision 1440695)

 Result = ABORTED
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440695
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckColumnAccessHook.java
* /hive/trunk/ql/src/test/queries/clientpositive/column_access_stats.q
* /hive/trunk/ql/src/test/results/clientpositive/column_access_stats.q.out


 Track columns accessed in each table in a query
 ---

 Key: HIVE-3940
 URL: https://issues.apache.org/jira/browse/HIVE-3940
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, 
 HIVE-3940.3.patch.txt


 Similar to partition access logs, we need to have columns access logs, so 
 later we can build tools/reports to inform users if there are wasted columns 
 in a table to be trimmed.



[jira] [Created] (HIVE-3964) Add upgrade script for Oracle backend to metastore.

2013-01-30 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-3964:
--

 Summary: Add upgrade script for Oracle backend to metastore.
 Key: HIVE-3964
 URL: https://issues.apache.org/jira/browse/HIVE-3964
 Project: Hive
  Issue Type: Bug
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan


upgrade-0.9.0-0.10.0.oracle.sql isn't available in 
metastore/scripts/upgrade/oracle. 

This warrants testing as well. My concern is that 
SDS::IS_STOREDASSUBDIRECTORIES is a new, non-nullable column. Existing rows in 
SDS might need updating with a default value (0) before the constraint is 
applied.

I'll post a patch shortly.
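
The backfill concern above can be illustrated with a small sketch. It is an
assumption-laden stand-in, not the upgrade script: an in-memory SQLite
database replaces Oracle, the SDS table is reduced to two made-up columns, and
backfill_new_column is a hypothetical helper. The order of steps is the point:
add the column as nullable, set existing rows to 0, and only then would the
NOT NULL constraint be applied (that last Oracle-specific step is not shown).

```python
import sqlite3


def backfill_new_column(conn, table, column, default):
    """Add a nullable column, backfill existing rows, return remaining NULLs."""
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} INTEGER")
    conn.execute(
        f"UPDATE {table} SET {column} = ? WHERE {column} IS NULL", (default,)
    )
    (nulls,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()
    return nulls


# Simplified stand-in for the metastore SDS table with two pre-existing rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT)")
conn.executemany(
    "INSERT INTO SDS (LOCATION) VALUES (?)",
    [("/warehouse/a",), ("/warehouse/b",)],
)

remaining_nulls = backfill_new_column(conn, "SDS", "IS_STOREDASSUBDIRECTORIES", 0)
values = [row[0] for row in conn.execute("SELECT IS_STOREDASSUBDIRECTORIES FROM SDS")]
print(remaining_nulls, values)
```

Once no NULLs remain, a non-nullable constraint can be applied without
rejecting the rows that existed before the upgrade.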



[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-30 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567271#comment-13567271
 ] 

Harish Butani commented on HIVE-896:


Yes, exactly. Will start to introduce the new Spec classes as noted in the 
DataStruct attachment, and refactor the Def classes to remove the antlr 
dependency. 

But before doing this I had to handle the following issue. The plan we 
generate has the form 
... -> ReduceSink -> Extract -> PTF Op -> ...
The ReduceSink RowResolver contains the Virtual Columns from its input 
Operators. During translation we set the RowResolver of the Extract Op to be 
the same as the ReduceSink RR, and this same RR was used to set up the 
ExprNodeDescs in PTF translation. But at runtime the Extract Op doesn't contain 
the Virtual Columns, so the internal column names can be different. For 
example, our testJoinWithLeadLag test case is a self join on part that also has 
a Windowing expression. The RR of the RS op at translation time looks something 
like this:
  (_col1,_col2,..,_col7, 
_col8(vc=true),_col9(vc=true),_col10,_col11,.._col15(vc=true),_col16(vc=true),..)
At runtime the Virtual Columns are removed and all the columns after _col7 are 
shifted 1 or 2 positions. So in child Operators ColumnExprNodeDescs are no 
longer referring to the right columns.
We were handling this issue by recreating the ExprNodeDescs from the ASTNodes 
at runtime. 
So to avoid carrying forward the ASTNodes we now build a new RR for the Extract 
Op, with the Virtual Columns removed. We hand this to the PTFTranslator as the 
starting RR to use to translate a PTF Chain. 

With the above change, now it should be possible to use the ExprNodeDescs 
created during translation in the execution of the PTF Op. So will now start a 
sequence of steps to move to the new data structures and avoid recreation of 
ExprNodeDescs at runtime. 

I apologize if I am not being clear. This is a little hard to explain w/o 
walking through an example. Happy to go over this in detail offline.
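
The column shift described in this comment can be shown with a toy model. This
is only a sketch under assumptions: plain (name, is_virtual) tuples stand in
for RowResolver entries, drop_virtual_columns is a hypothetical helper, and
none of it is Hive's actual API.

```python
def drop_virtual_columns(columns):
    """Map each surviving translation-time column name to its runtime position.

    columns is a list of (internal_name, is_virtual) pairs in translation-time
    order. Virtual columns are skipped, so every column after them moves up.
    """
    runtime_position = {}
    next_position = 0
    for name, is_virtual in columns:
        if is_virtual:
            continue
        runtime_position[name] = next_position
        next_position += 1
    return runtime_position


# Translation-time row: _col1 .. _col11, with _col8 and _col9 virtual.
translation_rr = [("_col%d" % i, i in (8, 9)) for i in range(1, 12)]
positions = drop_virtual_columns(translation_rr)

# _col10 sat at index 9 at translation time but lands at index 7 at runtime,
# so an expression that still points at index 9 reads the wrong column.
print(positions["_col7"], positions["_col10"])
```

Building the starting RR with the virtual columns already removed, as the
comment proposes, makes translation-time and runtime positions agree.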


 Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
 ---

 Key: HIVE-896
 URL: https://issues.apache.org/jira/browse/HIVE-896
 Project: Hive
  Issue Type: New Feature
  Components: OLAP, UDF
Reporter: Amr Awadallah
Priority: Minor
 Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, 
 Hive-896.2.patch.txt


 Windowing functions are very useful for click stream processing and similar 
 time-series/sliding-window analytics.
 More details at:
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
 -- amr



[jira] [Updated] (HIVE-3958) support partial scan for analyze command

2013-01-30 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-3958:
---

Description: 
analyze commands allows us to collect statistics on existing tables/partitions. 
It works great but might be slow since it scans all files.

There are 2 ways to speed it up:
1. collect stats without file scan. It may not collect all stats but good and 
fast enough for use case. HIVE-3917 addresses it
2. collect stats via partial file scan. It doesn't scan all content of files 
but part of it to get file metadata. some examples are 
https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and 
HFile of Hbase

This jira is targeted to address the #2

  was:
analyze commands allows us to collect statistics on existing tables/partitions. 
It works great but might be slow since it scans all files.

There are 2 ways to speed it up:
1. collect stats without file scan. It may not collect all stats but good and 
fast enough for use case. Hive-3917 addresses it
2. collect stats via partial file scan. It doesn't scan all content of files 
but part of it to get file metadata. some examples are 
https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and 
HFile of Hbase

This jira is targeted to address the #2


 support partial scan for analyze command
 

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 analyze commands allows us to collect statistics on existing 
 tables/partitions. It works great but might be slow since it scans all files.
 There are 2 ways to speed it up:
 1. collect stats without file scan. It may not collect all stats but good and 
 fast enough for use case. HIVE-3917 addresses it
 2. collect stats via partial file scan. It doesn't scan all content of files 
 but part of it to get file metadata. some examples are 
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) 
 and HFile of Hbase
 This jira is targeted to address the #2
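
As a rough illustration of option 2 (partial file scan), the sketch below
reads a row count from a fixed-size footer instead of scanning the file body.
The file layout is invented for this example and is not the RCFile, ORC, or
HFile format; write_toy_file and read_row_count are hypothetical helpers.

```python
import os
import struct
import tempfile

# Toy footer: a single little-endian unsigned 64-bit row count.
FOOTER = struct.Struct("<Q")


def write_toy_file(path, rows):
    """Write newline-terminated rows followed by the row-count footer."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(row.encode("utf-8") + b"\n")
        f.write(FOOTER.pack(len(rows)))


def read_row_count(path):
    """Read only the last FOOTER.size bytes; the row data is never scanned."""
    with open(path, "rb") as f:
        f.seek(-FOOTER.size, os.SEEK_END)
        (count,) = FOOTER.unpack(f.read(FOOTER.size))
    return count


demo_path = os.path.join(tempfile.mkdtemp(), "part-00000")
write_toy_file(demo_path, ["a", "b", "c"])
row_count = read_row_count(demo_path)
print(row_count)
```

The cost of the read is independent of the file size, which is the property
that makes a partial scan cheaper than a full one.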



[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567318#comment-13567318
 ] 

Ashutosh Chauhan commented on HIVE-3778:


Gang, cool idea to address the concern. I think we should extend its usage to 
all the different booleans we have in explain of other *Desc classes. That 
will probably update a lot more .q.out files, so it should probably be done in a 
separate ticket. Can you open a follow-up jira for that?

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567331#comment-13567331
 ] 

Gang Tim Liu commented on HIVE-3778:


[~ashutoshc] Glad you like it. Yes, here is the follow-up jira: HIVE-3965.

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Created] (HIVE-3965) Reduce output of explain plan by printing boolean value only if it is true

2013-01-30 Thread Gang Tim Liu (JIRA)
Gang Tim Liu created HIVE-3965:
--

 Summary: Reduce output of explain plan by printing boolean value 
only if it is true
 Key: HIVE-3965
 URL: https://issues.apache.org/jira/browse/HIVE-3965
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu


Leverage the design in HIVE-3778 to reduce the output of the explain plan by 
printing a boolean value only if it is true.

This will probably update many more .q.out files, so it should be done in a 
separate ticket from HIVE-3778; hence this one.
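
A minimal Java sketch of the idea, not Hive's actual explain code. The 
renderPlan helper, the attribute names, and the map-based plan below are all 
hypothetical; they only illustrate leaving out boolean attributes whose value 
is false.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExplainBooleanSketch {
    // Hypothetical helper: renders one "name: value" line per attribute and
    // skips Boolean attributes whose value is false.
    static String renderPlan(Map<String, Object> attrs) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Object> e : attrs.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Boolean && !((Boolean) v)) {
                continue; // a false flag carries no information, so omit it
            }
            out.append(e.getKey()).append(": ").append(v).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Attribute names are made up for illustration.
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("BucketMapJoin", Boolean.TRUE);
        attrs.put("SomeOtherFlag", Boolean.FALSE);
        attrs.put("Position of Big Table", 0);
        System.out.print(renderPlan(attrs));
    }
}
```

With this approach, only plans in which a flag is actually set change, but 
every existing .q.out file that currently prints a false value would need to 
be regenerated.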



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567336#comment-13567336
 ] 

Namit Jain commented on HIVE-3833:
--

[~jakobhoman], this was definitely not intentional. Unfortunately, there was no 
test case, so I missed this.
Can you provide me with a complete test case? I will take a look.

 object inspectors should be initialized based on partition metadata
 ---

 Key: HIVE-3833
 URL: https://issues.apache.org/jira/browse/HIVE-3833
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.3833.10.patch, hive.3833.11.patch, 
 hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
 hive.3833.16.path, hive.3833.17.patch, hive.3833.18.patch, 
 hive.3833.19.patch, hive.3833.1.patch, hive.3833.20.patch, 
 hive.3833.21.patch, hive.3833.22.patch, hive.3833.23.patch, 
 hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, 
 hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch


 Currently, different partitions can be picked up for the same input split, 
 depending on their SerDes etc., and we don't allow the schema to be changed 
 for LazyColumnarBinarySerDe.
 Instead, different partitions should be part of the same split only if their 
 partition schemas match exactly. The operator tree object inspectors should 
 be based on the partition schema. That would give greater flexibility and 
 also help when using a binary SerDe with RCFile.



[jira] [Commented] (HIVE-3953) Reading of partitioned Avro data fails because of missing properties

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567337#comment-13567337
 ] 

Namit Jain commented on HIVE-3953:
--

Copying from HIVE-3833:
Can you provide me with a complete test case? I will take a look.

 Reading of partitioned Avro data fails because of missing properties
 

 Key: HIVE-3953
 URL: https://issues.apache.org/jira/browse/HIVE-3953
 Project: Hive
  Issue Type: Bug
Reporter: Mark Wagner

 After HIVE-3833, reading partitioned Avro data fails due to missing 
 properties. The avro.schema.(url|literal) properties are not making it all 
 the way to the SerDe. Non-partitioned data can still be read.



reduced unit test timings

2013-01-30 Thread Namit Jain
I have noticed that the time taken to run the unit tests has dropped 
considerably (to nearly half) over the last week or so.
Just wondering if anyone else has noticed this too.

If so, does anyone know the root cause of this speedup?

Thanks,
-namit


[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567345#comment-13567345
 ] 

Namit Jain commented on HIVE-3778:
--

+1

Running tests

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



Review Request: Request to review the change.

2013-01-30 Thread Arun A K

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9171/
---

Review request for hive.


Description
---

Patch for issue https://issues.apache.org/jira/browse/HIVE-3850. The patch has 
been accepted by the person who raised the issue. Please review.


This addresses bug https://issues.apache.org/jira/browse/HIVE-3850.



Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 85b514a 

Diff: https://reviews.apache.org/r/9171/diff/


Testing
---

The change made was tested


Thanks,

Arun A K



[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype

2013-01-30 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567355#comment-13567355
 ] 

Mark Grover commented on HIVE-3850:
---

For completeness, review is at: https://reviews.apache.org/r/9171/

 hour() function returns 12 hour clock value when using timestamp datatype
 -

 Key: HIVE-3850
 URL: https://issues.apache.org/jira/browse/HIVE-3850
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.9.0
Reporter: Pieterjan Vriends
 Attachments: HIVE-3850.patch.txt


 Apparently, UDFHour.java has two evaluate() functions: one accepts a Text 
 object as its parameter and the other a TimeStampWritable object. The first 
 returns the value of Calendar.HOUR_OF_DAY (24-hour clock) and the second 
 that of Calendar.HOUR (12-hour clock). I couldn't find any information in 
 the documentation about this overload of evaluate(), and I spent quite some 
 time finding out why my statement didn't return a 24-hour clock value.
 Shouldn't both functions return the same value?
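
The difference between the two Calendar fields can be reproduced outside 
Hive with the standard java.util.Calendar API. The sketch below is only an 
illustration; the class and method names (HourFieldDemo, hour12, hour24) are 
made up here and are not taken from UDFHour.java.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class HourFieldDemo {
    // Calendar.HOUR is the hour on a 12-hour clock (0-11).
    static int hour12(Calendar c) {
        return c.get(Calendar.HOUR);
    }

    // Calendar.HOUR_OF_DAY is the hour on a 24-hour clock (0-23).
    static int hour24(Calendar c) {
        return c.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        // 2013-01-30 15:45:00 in the default time zone
        Calendar c = new GregorianCalendar(2013, Calendar.JANUARY, 30, 15, 45, 0);
        System.out.println("HOUR        = " + hour12(c)); // 3
        System.out.println("HOUR_OF_DAY = " + hour24(c)); // 15
    }
}
```

For any time from 12:00 onwards, the two fields differ by 12, which matches 
the behavior described in this report.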



[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype

2013-01-30 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567356#comment-13567356
 ] 

Mark Grover commented on HIVE-3850:
---

Patch looks good to me.

Usually, I would ask for unit tests to be added with any change, but given that 
it's a trivial change, I would be OK without new tests. We should, however, 
make sure we update the existing unit tests if needed.

Did you get a chance to run the unit tests (at least the ones that use the 
hour UDF) and make sure no changes are required in their output?


 hour() function returns 12 hour clock value when using timestamp datatype
 -

 Key: HIVE-3850
 URL: https://issues.apache.org/jira/browse/HIVE-3850
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.9.0
Reporter: Pieterjan Vriends
 Attachments: HIVE-3850.patch.txt


 Apparently, UDFHour.java has two evaluate() functions: one accepts a Text 
 object as its parameter and the other a TimeStampWritable object. The first 
 returns the value of Calendar.HOUR_OF_DAY (24-hour clock) and the second 
 that of Calendar.HOUR (12-hour clock). I couldn't find any information in 
 the documentation about this overload of evaluate(), and I spent quite some 
 time finding out why my statement didn't return a 24-hour clock value.
 Shouldn't both functions return the same value?



[jira] [Reopened] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype

2013-01-30 Thread Mark Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Grover reopened HIVE-3850:
---


The change wasn't committed, so I am re-opening the JIRA.

 hour() function returns 12 hour clock value when using timestamp datatype
 -

 Key: HIVE-3850
 URL: https://issues.apache.org/jira/browse/HIVE-3850
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.9.0, 0.10.0
Reporter: Pieterjan Vriends
 Fix For: 0.11.0

 Attachments: HIVE-3850.patch.txt


 Apparently, UDFHour.java has two evaluate() functions: one accepts a Text 
 object as its parameter and the other a TimeStampWritable object. The first 
 returns the value of Calendar.HOUR_OF_DAY (24-hour clock) and the second 
 that of Calendar.HOUR (12-hour clock). I couldn't find any information in 
 the documentation about this overload of evaluate(), and I spent quite some 
 time finding out why my statement didn't return a 24-hour clock value.
 Shouldn't both functions return the same value?



[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3778:
---

Attachment: HIVE-3778.patch.9

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8, 
 HIVE-3778.patch.9


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3778:
-

Status: Open  (was: Patch Available)

comments

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8, 
 HIVE-3778.patch.9


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3778:
---

Attachment: HIVE-3778.patch.10

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.10, HIVE-3778.patch.3, 
 HIVE-3778.patch.6, HIVE-3778.patch.8, HIVE-3778.patch.9


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3778:
---

Attachment: HIVE-3778.patch.10

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.10, HIVE-3778.patch.10, 
 HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8, HIVE-3778.patch.9


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3778:
---

Status: Patch Available  (was: Open)

patch is available.

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.10, HIVE-3778.patch.10, 
 HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8, HIVE-3778.patch.9


 This is follow up of HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-01-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.21.patch

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, 
 hive.3403.21.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, 
 hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, 
 hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.



Re: reduced unit test timings

2013-01-30 Thread Ashutosh Chauhan
I am not sure about half, but
https://issues.apache.org/jira/browse/HIVE-3947 has certainly helped. Both
MiniMRCliDriver and NegativeMiniMRCliDriver used to hang for ~10 minutes
after all tests had run, while the minicluster was tearing down.
That patch has saved at least ~15 minutes per test run in my environment.

Thanks to Navis for that!

Ashutosh


On Wed, Jan 30, 2013 at 8:23 PM, Namit Jain nj...@fb.com wrote:

 I have noticed that the time taken to run the unit tests has dropped
 considerably (to nearly half) over the last week or so.
 Just wondering if anyone else has noticed this too.

 If so, does anyone know the root cause of this speedup?

 Thanks,
 -namit



Re: reduced unit test timings

2013-01-30 Thread Namit Jain
I run tests on a parallel cluster (8 machines).
There, the test time has gone down from 2:15 hours to approximately 1:15.


On 1/31/13 11:55 AM, Ashutosh Chauhan hashut...@apache.org wrote:

I am not sure about half, but
https://issues.apache.org/jira/browse/HIVE-3947 has certainly helped. Both
MiniMRCliDriver and NegativeMiniMRCliDriver used to hang for ~10 minutes
after all tests had run, while the minicluster was tearing down.
That patch has saved at least ~15 minutes per test run in my environment.

Thanks to Navis for that!

Ashutosh


On Wed, Jan 30, 2013 at 8:23 PM, Namit Jain nj...@fb.com wrote:

 I have noticed that the time taken to run the unit tests has dropped
 considerably (to nearly half) over the last week or so.
 Just wondering if anyone else has noticed this too.

 If so, does anyone know the root cause of this speedup?

 Thanks,
 -namit




[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command

2013-01-30 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Attachment: HIVE-3917.patch.3

 Support noscan operation for analyze command
 

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2, HIVE-3917.patch.3


 Hive supports the analyze command to gather statistics from existing 
 tables/partitions: 
 https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
 It collects:
 1. Number of rows
 2. Number of files
 3. Size in bytes
 If the table/partition is big, the operation takes time since it opens 
 all files and scans all data.
 It would be nice to support a fast operation that gathers only the 
 statistics that don't require opening all files:
 1. Number of files
 2. Size in bytes
 Potential syntax is 
 ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
 COMPUTE STATISTICS [noscan];
 In the future, all statistics that can be gathered without a scan can be 
 retrieved via this optional parameter.
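
To illustrate why these two statistics are cheap, here is a rough Java 
sketch that derives the file count and total size from file system metadata 
alone, without reading any file contents. It is not the proposed Hive 
implementation: it uses java.nio.file on a local directory rather than 
Hadoop's file system API, and the names NoScanStatsSketch, fileCountAndBytes 
and demo are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class NoScanStatsSketch {
    // Returns {number of files, size in bytes} for everything under dir,
    // using only the attributes supplied by the directory walk.
    static long[] fileCountAndBytes(Path dir) throws IOException {
        final long[] stats = new long[2];
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    stats[0]++;              // number of files
                    stats[1] += attrs.size(); // size in bytes, from metadata
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return stats;
    }

    // Builds a small temporary "partition" directory and measures it.
    static long[] demo() {
        try {
            Path dir = Files.createTempDirectory("noscan-sketch");
            Files.write(dir.resolve("part-0"), "abc".getBytes(StandardCharsets.US_ASCII));
            Files.write(dir.resolve("part-1"), "hello".getBytes(StandardCharsets.US_ASCII));
            return fileCountAndBytes(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] stats = demo();
        System.out.println("files=" + stats[0] + " bytes=" + stats[1]);
    }
}
```

The number of rows cannot be obtained this way, which is why it is left out 
of the noscan variant.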



[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-01-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.22.patch

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, 
 hive.3403.21.patch, hive.3403.22.patch, hive.3403.2.patch, hive.3403.3.patch, 
 hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, 
 hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.



[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567412#comment-13567412
 ] 

Namit Jain commented on HIVE-3403:
--

To help in review, the class hierarchy is:

AbstractBucketJoinProc
 AbstractSMBJoinProc
   SortedMergeBucketMapjoinProc
   SortedMergeJoinProc
 BucketMapjoinOptProc


The context needed is:

BucketJoinOptProcCtx
 SortBucketJoinOptProcCtx

Most of the code in AbstractBucketJoinProc and AbstractSMBJoinProc is old code 
that has been moved.
BucketMapjoinOptProc is also old code, but there has been a little refactoring 
to break it up into context.

As such, the only new code is SortedMergeJoinProc. Thanks to the refactoring, I 
am able to reuse a lot of code between map-join and join processing.
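
For readers who prefer code to an indented list, the hierarchy above could 
be written as the following Java skeleton. Only the class names and the 
inheritance relationships come from this comment; the empty bodies, the 
abstract modifiers, the wrapper class and the lineage helper are assumptions 
made for illustration and do not reflect the patch itself.

```java
public class JoinProcHierarchySketch {
    // Processor hierarchy (bodies intentionally empty).
    static abstract class AbstractBucketJoinProc {}
    static abstract class AbstractSMBJoinProc extends AbstractBucketJoinProc {}
    static class SortedMergeBucketMapjoinProc extends AbstractSMBJoinProc {}
    static class SortedMergeJoinProc extends AbstractSMBJoinProc {}
    static class BucketMapjoinOptProc extends AbstractBucketJoinProc {}

    // Context hierarchy.
    static class BucketJoinOptProcCtx {}
    static class SortBucketJoinOptProcCtx extends BucketJoinOptProcCtx {}

    // Hypothetical helper: lists a class and its ancestors up to, but not
    // including, java.lang.Object.
    static String lineage(Class<?> c) {
        StringBuilder sb = new StringBuilder(c.getSimpleName());
        for (Class<?> s = c.getSuperclass();
             s != null && s != Object.class;
             s = s.getSuperclass()) {
            sb.append(" -> ").append(s.getSimpleName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lineage(SortedMergeJoinProc.class));
        System.out.println(lineage(BucketMapjoinOptProc.class));
        System.out.println(lineage(SortBucketJoinOptProcCtx.class));
    }
}
```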


 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, 
 hive.3403.21.patch, hive.3403.22.patch, hive.3403.2.patch, hive.3403.3.patch, 
 hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, 
 hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.
