[jira] [Commented] (HIVE-633) ADD FILE command does not accept quoted filenames
[ https://issues.apache.org/jira/browse/HIVE-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566331#comment-13566331 ]

kiran sreekumar commented on HIVE-633:
--------------------------------------

is this issue still relevant, as i would like to work on this.

ADD FILE command does not accept quoted filenames
-------------------------------------------------

Key: HIVE-633
URL: https://issues.apache.org/jira/browse/HIVE-633
Project: Hive
Issue Type: Bug
Affects Versions: 0.3.0
Environment: Ubuntu Linux (intrepid)
Reporter: Saurabh Nanda
Priority: Minor

The following command says the file does not exist. Removing the quotes around the filename makes it work.

hive> add file '/tmp/testing.jar';

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
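The reported behavior is consistent with the quote characters being passed through to the filesystem existence check. A minimal sketch of the likely fix, in Python for illustration (the helper names here are hypothetical, not Hive's actual CLI code): strip one pair of matching quotes from the path token before checking that the file exists.

```python
import os

def parse_add_file_arg(token: str) -> str:
    """Strip one pair of matching single or double quotes from a path token.

    Hypothetical helper illustrating the likely fix: without it, the literal
    string "'/tmp/testing.jar'" (quotes included) is handed to the filesystem
    check, which then reports "file does not exist".
    """
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return token[1:-1]
    return token

def add_file_exists(token: str) -> bool:
    # Existence check on the *unquoted* path, as ADD FILE should do.
    return os.path.exists(parse_add_file_arg(token))
```

Unquoted arguments pass through unchanged, so the fix would be backward compatible with the working form of the command.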
[jira] [Created] (HIVE-3962) number of distinct values are in column statistics
Amareshwari Sriramadasu created HIVE-3962:
------------------------------------------

Summary: number of distinct values are in column statistics
Key: HIVE-3962
URL: https://issues.apache.org/jira/browse/HIVE-3962
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Amareshwari Sriramadasu

When we run the query on the hive ql src table:

select count(distinct(key)), count(distinct(value)) from src;
309	309

After running the following analyze query, the stats in the metastore seem wrong:

analyze table src compute statistics for columns key, value;

--- stats in metastore ---
mysql> select * from TAB_COL_STATS where TABLE_NAME='src';

| CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
| 5 | default | src | key | int | 11 | 0 | 498 | 0. | 0. | NULL | NULL | 0 | 291 | 0. | 0 | 0 | 0 | 1359539181 |
| 6 | default | src | value | string | 11 | 0 | 0 | 0. | 0. | NULL | NULL | 0 | 112 | 6.8120 | 7 | 0 | 0 | 1359539181 |
[jira] [Updated] (HIVE-3962) Number of distinct values are wrong in column statistics
[ https://issues.apache.org/jira/browse/HIVE-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-3962:
------------------------------------------

Summary: Number of distinct values are wrong in column statistics  (was: number of distinct values are in column statistics)

Number of distinct values are wrong in column statistics
--------------------------------------------------------

Key: HIVE-3962
URL: https://issues.apache.org/jira/browse/HIVE-3962
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Amareshwari Sriramadasu

When we run the query on the hive ql src table:

select count(distinct(key)), count(distinct(value)) from src;
309	309

After running the following analyze query, the stats in the metastore seem wrong:

analyze table src compute statistics for columns key, value;

--- stats in metastore ---
mysql> select * from TAB_COL_STATS where TABLE_NAME='src';

| CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
| 5 | default | src | key | int | 11 | 0 | 498 | 0. | 0. | NULL | NULL | 0 | 291 | 0. | 0 | 0 | 0 | 1359539181 |
| 6 | default | src | value | string | 11 | 0 | 0 | 0. | 0. | NULL | NULL | 0 | 112 | 6.8120 | 7 | 0 | 0 | 1359539181 |
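For context on why NUM_DISTINCTS can legitimately differ a little from an exact count(distinct ...): column statistics are usually gathered with a probabilistic estimator rather than an exact count. Below is a minimal Flajolet-Martin style sketch in Python (an assumption for illustration, not Hive's actual NumDistinctValueEstimator code). A small deviation such as 291 vs. 309 for `key` is the kind of error such a sketch produces; 112 for `value` is far outside the expected error band, which is what makes the reported stats look wrong.

```python
import hashlib

def fm_estimate_ndv(values, num_bitvectors=16):
    """Minimal Flajolet-Martin style distinct-value estimator.

    Illustrative sketch only: records, per hashed bit vector, which
    trailing-one positions have been seen, then estimates NDV from the
    lowest unset bit averaged over all vectors.
    """
    bitvectors = [0] * num_bitvectors
    for v in values:
        for i in range(num_bitvectors):
            h = int.from_bytes(
                hashlib.sha1(f"{i}:{v}".encode()).digest()[:8], "big")
            if h == 0:
                continue
            # Position of the lowest set bit of the hash.
            r = (h & -h).bit_length() - 1
            bitvectors[i] |= 1 << r
    total_r = 0
    for bv in bitvectors:
        r = 0
        while bv & (1 << r):  # lowest unset bit position
            r += 1
        total_r += r
    # Standard FM correction factor phi ~ 0.77351.
    return int(2 ** (total_r / num_bitvectors) / 0.77351)
```

Such an estimate should land within a small constant factor of the true count; a deterministic exact count over a 500-row table like src should not be off by a factor of ~3, so the bug is in the stats computation, not in expected sketch error.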
[jira] [Commented] (HIVE-3785) Core hive changes for HiveServer2 implementation
[ https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566399#comment-13566399 ]

Namit Jain commented on HIVE-3785:
----------------------------------

I am sorry for the delay on my part. Can you refresh? I will definitely review this time.

Core hive changes for HiveServer2 implementation
------------------------------------------------

Key: HIVE-3785
URL: https://issues.apache.org/jira/browse/HIVE-3785
Project: Hive
Issue Type: Sub-task
Components: Authentication, Build Infrastructure, Configuration, Thrift API
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
Attachments: HS2-changed-files-only.patch

The subtask to track changes in the core hive components for the HiveServer2 implementation.
[jira] [Commented] (HIVE-3785) Core hive changes for HiveServer2 implementation
[ https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566400#comment-13566400 ]

Namit Jain commented on HIVE-3785:
----------------------------------

cc [~mgrover], [~prasadm]

Core hive changes for HiveServer2 implementation
------------------------------------------------

Key: HIVE-3785
URL: https://issues.apache.org/jira/browse/HIVE-3785
Project: Hive
Issue Type: Sub-task
Components: Authentication, Build Infrastructure, Configuration, Thrift API
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
Attachments: HS2-changed-files-only.patch

The subtask to track changes in the core hive components for the HiveServer2 implementation.
[jira] [Commented] (HIVE-3950) Remove code for merging files via MR job
[ https://issues.apache.org/jira/browse/HIVE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566401#comment-13566401 ]

Hudson commented on HIVE-3950:
------------------------------

Integrated in Hive-trunk-hadoop2 #97 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/97/])
HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain) (Revision 1440238)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440238
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/dyn_part_merge.q
* /hive/trunk/ql/src/test/results/clientnegative/dyn_part_merge.q.out

Remove code for merging files via MR job
----------------------------------------

Key: HIVE-3950
URL: https://issues.apache.org/jira/browse/HIVE-3950
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.11.0
Attachments: hive-3950_1.patch, hive-3950_2.patch, hive-3950.patch

Hive can merge files either via an MR job or via a map-only job. Doing it via a map-only job is more efficient, but the option of doing it via an MR job existed because CombineFileInputFormat is available only in hadoop-0.20 and later. Since we no longer support hadoop versions earlier than 0.20, all of that is now dead code and we should get rid of it.
[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties
[ https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566402#comment-13566402 ]

Hudson commented on HIVE-933:
-----------------------------

Integrated in Hive-trunk-hadoop2 #97 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/97/])
HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit) (Revision 1440271)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440271
Files :
* /hive/trunk/build-common.xml
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lib/RuleExactMatch.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientnegative/merge_negative_3.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_bucketed_table.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_convert_join.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_dyn_part.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_grouping_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_list_bucket.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_map_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_merge.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_multi_insert.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_num_buckets.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_reducers_power_two.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientnegative/merge_negative_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_bucketed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_list_bucket.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_map_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_multi_insert.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_reducers_power_two.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input7.q.xml
*
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3403:
-----------------------------

Attachment: hive.3403.19.patch

user should not specify mapjoin to perform sort-merge bucketed join
-------------------------------------------------------------------

Key: HIVE-3403
URL: https://issues.apache.org/jira/browse/HIVE-3403
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch

Currently, in order to perform a sort-merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints.
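To see why no mapjoin hint should be needed: when both tables are bucketed and sorted on the join key, each pair of corresponding buckets can be joined by a single forward merge pass, with no in-memory hash table. A minimal sketch in Python (illustrative only, not Hive's actual SMB join operator):

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key,
    mirroring what a sort-merge bucketed join does with pre-sorted buckets:
    one forward pass over each side, emitting the cross product of
    equal-key runs."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the extent of the equal-key run on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out
```

Because the metadata (bucketing and sort columns) already tells the compiler when this plan is valid, the optimizer can choose it automatically instead of relying on a user-supplied hint.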
[jira] [Commented] (HIVE-3950) Remove code for merging files via MR job
[ https://issues.apache.org/jira/browse/HIVE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566422#comment-13566422 ]

Hudson commented on HIVE-3950:
------------------------------

Integrated in Hive-trunk-h0.21 #1946 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1946/])
HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain) (Revision 1440238)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440238
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/dyn_part_merge.q
* /hive/trunk/ql/src/test/results/clientnegative/dyn_part_merge.q.out

Remove code for merging files via MR job
----------------------------------------

Key: HIVE-3950
URL: https://issues.apache.org/jira/browse/HIVE-3950
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.11.0
Attachments: hive-3950_1.patch, hive-3950_2.patch, hive-3950.patch

Hive can merge files either via an MR job or via a map-only job. Doing it via a map-only job is more efficient, but the option of doing it via an MR job existed because CombineFileInputFormat is available only in hadoop-0.20 and later. Since we no longer support hadoop versions earlier than 0.20, all of that is now dead code and we should get rid of it.
Hive-trunk-h0.21 - Build # 1946 - Failure
Changes for Build #1944
[namit] HIVE-3873 lot of tests failing for hadoop 23 (Gang Tim Liu via namit)

Changes for Build #1945
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784
[hashutosh] HIVE-3784 de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)

Changes for Build #1946
[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain)

1 tests failed.
FAILED: org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
	at net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
	at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
	at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:324)
	at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1946)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1946/ to view the results.
[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties
[ https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566726#comment-13566726 ]

Hudson commented on HIVE-933:
-----------------------------

Integrated in Hive-trunk-h0.21 #1947 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1947/])
HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit) (Revision 1440271)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1440271
Files :
* /hive/trunk/build-common.xml
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lib/RuleExactMatch.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientnegative/merge_negative_3.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_bucketed_table.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_convert_join.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_dyn_part.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_grouping_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_list_bucket.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_map_operators.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_merge.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_multi_insert.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_num_buckets.q
* /hive/trunk/ql/src/test/queries/clientpositive/infer_bucket_sort_reducers_power_two.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientnegative/merge_negative_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_bucketed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_list_bucket.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_map_operators.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_multi_insert.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort_reducers_power_two.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input7.q.xml
*
Hive-trunk-h0.21 - Build # 1947 - Fixed
Changes for Build #1944
[namit] HIVE-3873 lot of tests failing for hadoop 23 (Gang Tim Liu via namit)

Changes for Build #1945
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784
[hashutosh] HIVE-3784 de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)

Changes for Build #1946
[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain)

Changes for Build #1947
[namit] HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit)

All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1947)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1947/ to view the results.
[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3874:
-----------------------------

Attachment: hive.3874.2.patch

Create a new Optimized Row Columnar file format for Hive
--------------------------------------------------------

Key: HIVE-3874
URL: https://issues.apache.org/jira/browse/HIVE-3874
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz

There are several limitations of the current RC File format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per a file or row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes.
* there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
* the type of the rows aren't stored in the file
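The light-weight-index point can be made concrete with a toy columnar layout. The sketch below (illustrative only, not the proposed ORC layout) keeps per-row-group min/max statistics so that a push-down predicate can skip entire row groups without reading, decompressing, or deserializing them:

```python
from dataclasses import dataclass

@dataclass
class RowGroupStats:
    min_val: int
    max_val: int
    num_rows: int

def write_column(values, rows_per_group=3):
    """Split one column into row groups and record min/max/count per group,
    the kind of light-weight index the proposal describes."""
    groups, stats = [], []
    for i in range(0, len(values), rows_per_group):
        chunk = values[i:i + rows_per_group]
        groups.append(chunk)
        stats.append(RowGroupStats(min(chunk), max(chunk), len(chunk)))
    return groups, stats

def scan_greater_than(groups, stats, threshold):
    """Predicate pushdown for `value > threshold`: only row groups whose
    max can satisfy the predicate are actually read."""
    out = []
    for chunk, st in zip(groups, stats):
        if st.max_val <= threshold:
            continue  # entire row group skipped via its stats alone
        out.extend(v for v in chunk if v > threshold)
    return out
```

The same per-group stats also give row counts for free, addressing the "number of rows per file or row group" limitation in the list above.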
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566727#comment-13566727 ]

Namit Jain commented on HIVE-3874:
----------------------------------

I took a stab at it. I am attaching it just in case - feel free to ignore it. I was not able to get the protocol buffer file auto-generated from ant, so I manually generated it for the purpose of this patch.

Create a new Optimized Row Columnar file format for Hive
--------------------------------------------------------

Key: HIVE-3874
URL: https://issues.apache.org/jira/browse/HIVE-3874
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz

There are several limitations of the current RC File format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per a file or row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes.
* there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
* the type of the rows aren't stored in the file
[jira] [Updated] (HIVE-3940) Track columns accessed in each table in a query
[ https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samuel Yuan updated HIVE-3940:
------------------------------

Attachment: HIVE-3940.3.patch.txt

Updated.

Track columns accessed in each table in a query
-----------------------------------------------

Key: HIVE-3940
URL: https://issues.apache.org/jira/browse/HIVE-3940
Project: Hive
Issue Type: Task
Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, HIVE-3940.3.patch.txt

Similar to partition access logs, we need to have column access logs, so that later we can build tools/reports to inform users if there are wasted columns in a table to be trimmed.
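A minimal sketch of the idea (hypothetical names, not the patch's actual classes): the analyzer records which columns each query touches per table, and unused columns can later be reported by diffing the aggregated accesses against the table schema.

```python
from collections import defaultdict

class ColumnAccessTracker:
    """Hypothetical sketch of column-access logging: record, per table,
    which columns queries actually read, so that columns never accessed
    can be reported as candidates for trimming."""

    def __init__(self):
        self.accessed = defaultdict(set)

    def record(self, table, column):
        # Called wherever the query plan resolves a column reference.
        self.accessed[table].add(column)

    def unused_columns(self, table, schema):
        # Diff the schema against everything ever accessed.
        return sorted(set(schema) - self.accessed[table])
```

In Hive itself the recording point would be the semantic analyzer, and the output would go to the query logs rather than stay in memory.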
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566810#comment-13566810 ]

Gunther Hagleitner commented on HIVE-2340:
------------------------------------------

FYI: Ran all unit tests on patch .9. Failing tests are: groupby_distinct_samekey.q, join31.q, reduce_deduplicate_extended.q (TestCliDriver). Failures look like outdated golden files (explain output changed). Uploaded testclidriver.txt for reference.

optimize orderby followed by a groupby
--------------------------------------

Key: HIVE-2340
URL: https://issues.apache.org/jira/browse/HIVE-2340
Project: Hive
Issue Type: Sub-task
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: perfomance
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch

Before implementing the optimizer for JOIN-GBY, try to implement the RS-GBY optimizer (cluster-by following group-by).
[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-2340:
-------------------------------------

Attachment: testclidriver.txt

just the diff of the latest unit test run.

optimize orderby followed by a groupby
--------------------------------------

Key: HIVE-2340
URL: https://issues.apache.org/jira/browse/HIVE-2340
Project: Hive
Issue Type: Sub-task
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: perfomance
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt

Before implementing the optimizer for JOIN-GBY, try to implement the RS-GBY optimizer (cluster-by following group-by).
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566898#comment-13566898 ]

Phabricator commented on HIVE-2340:
-----------------------------------

hagleitn has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

Partial review

INLINE COMMENTS

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521
Not sure why this is needed or why this defaults to 4. From the comment below it seems this is just to avoid the single-reducer order-by case for performance reasons, is that correct?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
Is this required or extra protection? The comment at the top of the file says mapjoin optimization happens before this (and probably should, for performance reasons). Also, if I understand it correctly, joinAndSort might be a better name than fixed. You're basically saying that if an optimization wants to change the join after this, it needs to make sure the ordering of the keys is preserved, right?

ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136
seems orthogonal to this patch.

ql/src/test/queries/clientpositive/reduce_deduplicate.q:7
There are not a lot of tests for min.reducer=1. No order-by case, for instance. Maybe reduce_deduplicate_extended.q should run with both the default and min.reducer=1.

REVISION DETAIL
https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn

optimize orderby followed by a groupby
--------------------------------------

Key: HIVE-2340
URL: https://issues.apache.org/jira/browse/HIVE-2340
Project: Hive
Issue Type: Sub-task
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: perfomance
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt

Before implementing the optimizer for JOIN-GBY, try to implement the RS-GBY optimizer (cluster-by following group-by).
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566902#comment-13566902 ] Gunther Hagleitner commented on HIVE-2340: -- Partial review on phabricator. Biggest question is around hive.optimize.reducededuplication.min.reducer. That basically disables the orderby followed by groupby optimization which was the original motivation for the jira. Navis, can you explain this some more? Might be another ticket, but would it be possible to optimize group by/sort by as well with this? optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
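The pattern under discussion can be illustrated with a short HiveQL sketch (a hypothetical query against the standard src sample table; the config name is the one quoted above):

```sql
-- A group-by whose output is then ordered on the same key normally
-- compiles into two MR jobs (a GBY job plus an ORDER BY job with a
-- single reducer). The ReduceSinkDeDuplication optimizer can merge the
-- two reduce sinks into one job when the key ordering is preserved.
SELECT key, count(value)
FROM src
GROUP BY key
ORDER BY key;

-- hive.optimize.reducededuplication.min.reducer guards this: per the
-- review comments above, its default (4) skips the merge when the
-- combined job would run with too few reducers, e.g. the
-- single-reducer ORDER BY case.
SET hive.optimize.reducededuplication.min.reducer=1;
```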
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Attachment: HIVE-3917.patch.2 Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
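The proposed syntax can be sketched concretely (table and partition names are hypothetical):

```sql
-- Full scan: opens every file; collects number of rows, number of
-- files, and size in bytes.
ANALYZE TABLE page_view PARTITION(ds='2013-01-30') COMPUTE STATISTICS;

-- Proposed noscan variant: reads only file metadata, so it collects
-- number of files and size in bytes without opening any file.
ANALYZE TABLE page_view PARTITION(ds='2013-01-30') COMPUTE STATISTICS noscan;
```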
[jira] [Work started] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3917 started by Gang Tim Liu. Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Status: Patch Available (was: In Progress) patch is available. Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Jenkins build is back to normal : Hive-0.10.0-SNAPSHOT-h0.20.1 #50
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/50/
[jira] [Updated] (HIVE-3940) Track columns accessed in each table in a query
[ https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3940: Resolution: Fixed Fix Version/s: 0.11.0 Status: Resolved (was: Patch Available) Committed, thanks Samuel. Track columns accessed in each table in a query --- Key: HIVE-3940 URL: https://issues.apache.org/jira/browse/HIVE-3940 Project: Hive Issue Type: Task Components: Query Processor Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, HIVE-3940.3.patch.txt Similar to partition access logs, we need to have columns access logs, so later we can build tools/reports to inform users if there are wasted columns in a table to be trimmed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566946#comment-13566946 ] Ashutosh Chauhan commented on HIVE-896: --- PTFDesc only contains a serialized string for PTFDef. I think we should just merge these two classes. Rename the existing PTFDef to PTFDesc and removing the existing PTFDef. And than make sure that PTFDesc is serializable. Does that sound right? Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Priority: Minor Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566952#comment-13566952 ] Ashutosh Chauhan commented on HIVE-896: --- Also need to make sure that ASTNode and other antlr datastructures referenced (directly or via contained fields) in this new PTFDesc are not required in PTFOperator and are thus not serialized, thereby eliminating antlr runtime dependency. Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Priority: Minor Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3963) Allow Hive to get connect to RDBMS
Maxime LANCIAUX created HIVE-3963: - Summary: Allow Hive to get connect to RDBMS Key: HIVE-3963 URL: https://issues.apache.org/jira/browse/HIVE-3963 Project: Hive Issue Type: New Feature Reporter: Maxime LANCIAUX -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Attachment: (was: HIVE-3917.patch.2) Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.8 Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3778 started by Gang Tim Liu. Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567023#comment-13567023 ] Gang Tim Liu commented on HIVE-3778: patch is attached to the jira also. Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Status: Patch Available (was: In Progress) patch is available https://reviews.facebook.net/D8259 Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567024#comment-13567024 ] Gang Tim Liu commented on HIVE-3917: patch is in both https://reviews.facebook.net/D8235 and attachment. thanks Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2 hive supports analyze command to gather statistics from existing tables/partition https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If table/partition is big, the operation would take time since it will open all files and scan all data. It would be nice to support fast operation to gather statistics which doesn't require to open all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics without scan can be retrieved via this optional parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Jenkins build is back to normal : Hive-0.9.1-SNAPSHOT-h0.21 #277
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/277/
[jira] [Commented] (HIVE-3940) Track columns accessed in each table in a query
[ https://issues.apache.org/jira/browse/HIVE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567149#comment-13567149 ] Hudson commented on HIVE-3940: -- Integrated in hive-trunk-hadoop1 #60 (See [https://builds.apache.org/job/hive-trunk-hadoop1/60/]) HIVE-3940. Track columns accessed in each table in a query. (Samuel Yuan via kevinwilfong) (Revision 1440695) Result = ABORTED kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1440695 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/conf/hive-default.xml.template * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckColumnAccessHook.java * /hive/trunk/ql/src/test/queries/clientpositive/column_access_stats.q * /hive/trunk/ql/src/test/results/clientpositive/column_access_stats.q.out Track columns accessed in each table in a query --- Key: HIVE-3940 URL: https://issues.apache.org/jira/browse/HIVE-3940 Project: Hive Issue Type: Task Components: Query Processor Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3940.1.patch.txt, HIVE-3940.2.patch.txt, HIVE-3940.3.patch.txt Similar to partition access logs, we need to have columns access logs, so later we can build tools/reports to inform users if there are wasted columns in a table to be trimmed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3964) Add upgrade script for Oracle backend to metastore.
Mithun Radhakrishnan created HIVE-3964: -- Summary: Add upgrade script for Oracle backend to metastore. Key: HIVE-3964 URL: https://issues.apache.org/jira/browse/HIVE-3964 Project: Hive Issue Type: Bug Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan upgrade-0.9.0-0.10.0.oracle.sql isn't available in metastore/scripts/upgrade/oracle. This warrants testing as well. My concern is that SDS::IS_STOREDASSUBDIRECTORIES is a new, non-nullable column. Existing rows in SDS might need updating with a default value (0) before the constraint is applied. I'll post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
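The backfill concern described above can be sketched as an upgrade-script fragment (a sketch only, assuming Oracle syntax and the column name quoted above; the actual patch may differ):

```sql
-- Add the new column as nullable first, backfill existing SDS rows
-- with the default (0), and only then apply the NOT NULL constraint,
-- so that pre-existing rows do not violate it.
ALTER TABLE SDS ADD (IS_STOREDASSUBDIRECTORIES NUMBER(1) NULL);
UPDATE SDS SET IS_STOREDASSUBDIRECTORIES = 0;
ALTER TABLE SDS MODIFY (IS_STOREDASSUBDIRECTORIES NOT NULL);
```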
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567271#comment-13567271 ] Harish Butani commented on HIVE-896: Yes, exactly. Will start to introduce the new Spec classes as noted in the DataStruct attachment, and refactor the Def classes to remove the antlr dependency. But before doing this had to handle the following issue. So the plan we generate has the form ... - ReduceSink - Extract - PTF Op - ... The Reduce Sink RowResolver contains the Virtual Columns from its input Operators. During translation we set the RowResolver of the Extract Op to be the same as the Reduce Sink RR; and this same RR was used to setup the ExprNodeDescs in PTF translation. But at runtime the Extract Op doesn't contain the Virtual Columns and so the internal column names can be different. For e.g. in our testJoinWithLeadLag testCase, which is a self join on part and also has a Windowing expression. The RR of the RS op at translation time looks something like this: (_co1,_col2,..,_col7, _col8(vc=true),_col9(vc=true),_col10,_col11,.._col15(vc=true),_col16(vc=true),..) At runtime the Virtual columns are removed and all the columns after _col7 are shifted 1 or 2 positions. So in child Operators ColumnExprNodeDescs are no longer referring to the right columns. We were handling this issue by recreating the ExprNodeDescs from the ASTNodes at runtime. So to avoid carrying forward the ASTNodes we now build a new RR for the Extract Op, with the Virtual Columns removed. We hand this to the PTFTranslator as the starting RR to use to translate a PTF Chain. With the above change, now it should be possible to use the ExprNodeDescs created during translation in the execution of the PTF Op. So will now start a sequence of steps to move to the new data structures and avoid recreation of ExprNodeDescs at runtime. I apologize if I am not being clear. This is a little hard to explain w/o walking through an example. 
Happy to go over this in detail offline. Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Priority: Minor Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3958) support partial scan for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-3958: --- Description: analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. HIVE-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 was: analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. Hive-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 support partial scan for analyze command Key: HIVE-3958 URL: https://issues.apache.org/jira/browse/HIVE-3958 Project: Hive Issue Type: Improvement Reporter: Gang Tim Liu Assignee: Gang Tim Liu analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. HIVE-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. 
some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567318#comment-13567318 ] Ashutosh Chauhan commented on HIVE-3778: Gang cool idea to address the concern. I think we should extend its usage for all the different booleans we have in explain of other *Desc classes. That probably will update lot more .q.out files so probably should be done in a separate ticket. Can you open a follow-up jira for that? Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567331#comment-13567331 ] Gang Tim Liu commented on HIVE-3778: [~ashutoshc]glad you like it. yes, here is the follow-up jira HIVE-3965. Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3965) Reduce output of explain plan by printing boolean value only if it is true
Gang Tim Liu created HIVE-3965: -- Summary: Reduce output of explain plan by printing boolean value only if it is true Key: HIVE-3965 URL: https://issues.apache.org/jira/browse/HIVE-3965 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Leverage the design in HIVE-3778 to reduce output of explain plan by printing boolean value only if it is true. That probably will update lot more .q.out files so probably should be done in a separate ticket than 3778. so it ends up here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567336#comment-13567336 ] Namit Jain commented on HIVE-3833: -- [~jakobhoman], this was definitely not intentional. Unfortunately, there was no test case, so I missed this. Can you provide me a complete testcase ? I will take a look. object inspectors should be initialized based on partition metadata --- Key: HIVE-3833 URL: https://issues.apache.org/jira/browse/HIVE-3833 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.11.0 Attachments: hive.3833.10.patch, hive.3833.11.patch, hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, hive.3833.16.path, hive.3833.17.patch, hive.3833.18.patch, hive.3833.19.patch, hive.3833.1.patch, hive.3833.20.patch, hive.3833.21.patch, hive.3833.22.patch, hive.3833.23.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch Currently, different partitions can be picked up for the same input split based on the serdes' etc. And, we dont allow to change the schema for LazyColumnarBinarySerDe. Instead of that, different partitions should be part of the same split, only if the partition schemas exactly match. The operator tree object inspectors should be based on the partition schema. That would give greater flexibility and also help using binary serde with rcfile -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3953) Reading of partitioned Avro data fails because of missing properties
[ https://issues.apache.org/jira/browse/HIVE-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567337#comment-13567337 ] Namit Jain commented on HIVE-3953: -- Copying from HIVE-3833. Can you provide me a complete testcase ? I will take a look. Reading of partitioned Avro data fails because of missing properties Key: HIVE-3953 URL: https://issues.apache.org/jira/browse/HIVE-3953 Project: Hive Issue Type: Bug Reporter: Mark Wagner After HIVE-3833, reading partitioned Avro data fails due to missing properties. The avro.schema.(url|literal) properties are not making it all the way to the SerDe. Non-partitioned data can still be read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
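A minimal repro sketch of the reported failure (hypothetical table and schema; assumes the standard Avro SerDe and container input/output format classes):

```sql
-- Avro-backed table whose column schema comes from avro.schema.literal
-- rather than an explicit column list.
CREATE EXTERNAL TABLE avro_part
PARTITIONED BY (ds STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record", "name": "rec",
  "fields": [{"name": "id", "type": "long"}]
}');

-- After HIVE-3833, a read from a partition fails because
-- avro.schema.(url|literal) no longer reaches the SerDe; the same
-- table without partitioning still reads fine.
SELECT id FROM avro_part WHERE ds = '2013-01-30';
```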
reduced unit test timings
I have noticed that the time taken to run the unit tests has dropped considerably (it has nearly halved) over the last week or so. Just wondering if anyone else has noticed this too. If yes, does anyone know the root cause of this speedup? Thanks, -namit
[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567345#comment-13567345 ] Namit Jain commented on HIVE-3778: -- +1 Running tests Add MapJoinDesc.isBucketMapJoin() as part of explain plan - Key: HIVE-3778 URL: https://issues.apache.org/jira/browse/HIVE-3778 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3778.patch.3, HIVE-3778.patch.6, HIVE-3778.patch.8 This is follow up of HIVE-3767: Add MapJoinDesc.isBucketMapJoin() as part of explain plan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: Request to review the change.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9171/ --- Review request for hive. Description --- Patch for issue https://issues.apache.org/jira/browse/HIVE-3850. The patch has been accepted by the person who raised the issue. Please review. This addresses bug https://issues.apache.org/jira/browse/HIVE-3850. Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 85b514a Diff: https://reviews.apache.org/r/9171/diff/ Testing --- The change was tested. Thanks, Arun A K
[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567355#comment-13567355 ] Mark Grover commented on HIVE-3850: --- For completeness, review is at: https://reviews.apache.org/r/9171/ hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0 Reporter: Pieterjan Vriends Attachments: HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimestampWritable object as a parameter. The first function returns the value of Calendar.HOUR_OF_DAY and the second one of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate function. I did spend quite some time finding out why my statement didn't return a 24-hour clock value. Shouldn't both functions return the same?
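The distinction at the heart of this bug can be reproduced directly with `java.util.Calendar`. The sketch below is illustrative only (the class and field names are the JDK's, not Hive's): `Calendar.HOUR` is the hour on a 12-hour clock, while `Calendar.HOUR_OF_DAY` is the hour on a 24-hour clock, so the two diverge for any time after noon.

```java
import java.util.Calendar;

public class HourDemo {
    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance();
        // 5 PM on an arbitrary date.
        cal.set(2013, Calendar.JANUARY, 30, 17, 0, 0);

        // HOUR is the hour within the AM/PM half-day (0-11): prints 5.
        System.out.println(cal.get(Calendar.HOUR));
        // HOUR_OF_DAY is the 24-hour clock value (0-23): prints 17.
        System.out.println(cal.get(Calendar.HOUR_OF_DAY));
    }
}
```

This is why a timestamp-typed argument fed to the `Calendar.HOUR`-based overload silently yields 12-hour values.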
[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567356#comment-13567356 ] Mark Grover commented on HIVE-3850: --- Patch looks good to me. Usually, I would ask for unit tests to be added with any change but given that it's a trivial change, I would be ok without new tests. We should, however, make sure we update the existing unit tests if needed. Did you get a chance to run the unit tests (at least the ones that use the hour UDF) and make sure no changes are required in their output?
[jira] [Reopened] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Grover reopened HIVE-3850: --- The change wasn't committed, re-opening the JIRA. Affects Versions: 0.9.0, 0.10.0 Fix For: 0.11.0 Attachments: HIVE-3850.patch.txt
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.9
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3778: - Status: Open (was: Patch Available) comments
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.10
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Attachment: HIVE-3778.patch.10
[jira] [Updated] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan
[ https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3778: --- Status: Patch Available (was: Open) patch is available.
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3403: - Attachment: hive.3403.21.patch user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints.
Re: reduced unit test timings
I am not sure about half, but https://issues.apache.org/jira/browse/HIVE-3947 has certainly helped. Both MiniMRCliDriver and NegativeMiniMRCliDriver used to remain in a hung state for ~10 minutes after all tests had run, while the minicluster was tearing down. That patch has saved at least ~15 mins per test run in my environment. Thanks to Navis for that! Ashutosh On Wed, Jan 30, 2013 at 8:23 PM, Namit Jain nj...@fb.com wrote: I have noticed that the time taken to run the unit tests has reduced considerably (it has become nearly half) from the last week or so. Just wondering, if anyone else has noticed this too. If yes, does anyone know the root cause of this speedup ? Thanks, -namit
Re: reduced unit test timings
I run tests on a parallel cluster (8 machines). For that, the test time has gone down from 2:15 hours to approx. 1:15. On 1/31/13 11:55 AM, Ashutosh Chauhan hashut...@apache.org wrote: I am not sure about half, but https://issues.apache.org/jira/browse/HIVE-3947 has certainly helped. Both MiniMRCliDriver and NegativeMiniMRCliDriver used to remain in hung state for ~10 minutes after all tests have run and minicluster is tearing down. That patch has saved atleast ~15 mins for test runs in my environment. Thanks to Navis for that! Ashutosh On Wed, Jan 30, 2013 at 8:23 PM, Namit Jain nj...@fb.com wrote: I have noticed that the time taken to run the unit tests has reduced considerably (it has become nearly half) from the last week or so. Just wondering, if anyone else has noticed this too. If yes, does anyone know the root cause of this speedup ? Thanks, -namit
[jira] [Updated] (HIVE-3917) Support noscan operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3917: --- Attachment: HIVE-3917.patch.3 Support noscan operation for analyze command Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1, HIVE-3917.patch.2, HIVE-3917.patch.3 Hive supports the analyze command to gather statistics from existing tables/partitions: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of rows 2. Number of files 3. Size in bytes If the table/partition is big, the operation takes time, since it opens all files and scans all data. It would be nice to support a fast operation that gathers the statistics which don't require opening all files: 1. Number of files 2. Size in bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics that can be computed without a scan can be retrieved via this optional parameter.
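The proposed syntax could be exercised as follows. This is a sketch based only on the syntax quoted in the issue; the table and partition names are hypothetical, and the noscan option is exactly what this patch would add:

```sql
-- Full statistics: opens and scans every file (row count, file count, size).
ANALYZE TABLE page_views PARTITION (dt='2013-01-30') COMPUTE STATISTICS;

-- Proposed fast path: file count and total size only, no data scan.
ANALYZE TABLE page_views PARTITION (dt='2013-01-30') COMPUTE STATISTICS noscan;
```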
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3403: - Attachment: hive.3403.22.patch
[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567412#comment-13567412 ] Namit Jain commented on HIVE-3403: -- To help in review, the class hierarchy is: AbstractBucketJoinProc AbstractSMBJoinProc SortedMergeBucketMapjoinProc SortedMergeJoinProc BucketMapjoinOptProc The contexts needed are: BucketJoinOptProcCtx SortBucketJoinOptProcCtx Most of the code in AbstractBucketJoinProc and AbstractSMBJoinProc is old code moved. BucketMapjoinOptProc is also old code, but there has been a little refactoring to break it up into a context. As such, the only new code is SortedMergeJoinProc. Due to the refactoring, I am able to re-use a lot of code between map-join and join processing.
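The hierarchy named in the comment above can be sketched as empty Java class declarations. The class names come from the comment itself, but the inheritance arrangement and the empty bodies are assumptions for illustration, not Hive's actual code:

```java
// Shared bucket-join detection logic (old code, moved here).
abstract class AbstractBucketJoinProc { }

// Old code, refactored so its state lives in a context object.
class BucketMapjoinOptProc extends AbstractBucketJoinProc { }

// Shared sort-merge-bucket logic layered on the bucket-join base.
abstract class AbstractSMBJoinProc extends AbstractBucketJoinProc { }
class SortedMergeBucketMapjoinProc extends AbstractSMBJoinProc { }
class SortedMergeJoinProc extends AbstractSMBJoinProc { } // the only new code

// Contexts threaded through the processors.
class BucketJoinOptProcCtx { }
class SortBucketJoinOptProcCtx extends BucketJoinOptProcCtx { }
```

Laying the classes out this way shows why the refactoring pays off: anything placed in the two abstract bases is shared between the map-join and join paths.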