Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-27 Thread Deepak Jaiswal


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
> > Lines 32 (patched)
> > 
> >
> > Docs for this UDF will probably need to mention that this uses the old 
> > hashing/bucketing scheme which and that a new one has replaced it.
> 
> Deepak Jaiswal wrote:
> Should I open a documentation JIRA to track this?

opened https://issues.apache.org/jira/browse/HIVE-19342


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---


On April 27, 2018, 1:14 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 27, 2018, 1:14 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  fe2b1c1f3c 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 1a346593fd 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 25035433c7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
>  a42c299537 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 71498a125c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 019682fb10 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 7121bceb22 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
>  5f65f638ca 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
> 2be3c9b9a2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
>  1c5656267d 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-27 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review202074
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
Line 341 (original), 352 (patched)


Can you just add a comment here describing why it is ok to hardcode 
bucketing version to 2 here?


- Jason Dere


On April 27, 2018, 1:14 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 27, 2018, 1:14 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  fe2b1c1f3c 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 1a346593fd 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 25035433c7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
>  a42c299537 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 71498a125c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 019682fb10 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 7121bceb22 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
>  5f65f638ca 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
> 2be3c9b9a2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
>  1c5656267d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
>  0e995d79d2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
>  69d9f3125a 
>   

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-26 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 27, 2018, 1:14 a.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Addressed the comments, updated results, and put bucketing version back in 
OpTraits as it is needed for hybridgracejoin.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses JAVA hash which is not as good as murmur for better distribution and 
efficiency in bucketing a table.
Migrate to murmur hash but still keep backward compatibility for existing users 
so that they dont have to reload the existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in high number of result updates.


Diffs (updated)
-

  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 fe2b1c1f3c 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties 1a346593fd 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 25035433c7 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
 a42c299537 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 71498a125c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 019682fb10 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
068f25e75f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 7b1fd5f206 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1dccf969ff 
  ql/src/java/org/apache/hadoop/hive/ql/plan/OpTraits.java 9621c3be53 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java aa3c72bc6d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 25b91899de 
  ql/src/java/org/apache/hadoop/hive/ql/plan/VectorReduceSinkDesc.java 
adea3b53a9 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java 
7cd571815d 
  

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Deepak Jaiswal


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
> > Lines 179 (patched)
> > 
> >
> > Check the existing table params for bucketing_version before 
> > hard-coding to v2.

Will do that.


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
> > Lines 143 (patched)
> > 
> >
> > This derives from Operator? So it should already have the 
> > bucketingVersion field from that?

Good point. Let me verify this and work on it.


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java
> > Line 339 (original), 339 (patched)
> > 
> >
> > I think this change is no longer necessary.

I will verify and update accordingly.


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
> > Lines 89 (patched)
> > 
> >
> > Is this no longer used?

I might have missed the place where it is used in original patch. It is beyond 
scope of this patch. Will track it with HIVE-19311 which makes this redundant 
anyway.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201975
---


On April 25, 2018, 7:21 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 25, 2018, 7:21 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  fe2b1c1f3c 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 2c1a76d89b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201975
---




hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
Lines 179 (patched)


Check the existing table params for bucketing_version before hard-coding to 
v2.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
Lines 143 (patched)


This derives from Operator? So it should already have the bucketingVersion 
field from that?



ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java
Line 339 (original), 339 (patched)


I think this change is no longer necessary.



standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
Lines 89 (patched)


Is this no longer used?


- Jason Dere


On April 25, 2018, 7:21 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 25, 2018, 7:21 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  fe2b1c1f3c 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 2c1a76d89b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
>  4077552a56 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 71498a125c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java dc6cc62fbb 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 7121bceb22 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Deepak Jaiswal


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > hbase-handler/src/test/results/positive/external_table_ppd.q.out
> > Lines 59 (patched)
> > 
> >
> > Are there any tests for the old-style bucketing, to make sure that 
> > previously created bucketed tables still work properly?
> 
> Deepak Jaiswal wrote:
> That is a good point. Will work on it.

TestAcidOnTez#testMapJoinOnTez is one such test.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---


On April 25, 2018, 7:21 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 25, 2018, 7:21 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  fe2b1c1f3c 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 2c1a76d89b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
>  4077552a56 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 71498a125c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java dc6cc62fbb 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 7121bceb22 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
>  5f65f638ca 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
> 2be3c9b9a2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
>  1c5656267d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
>  0e995d79d2 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 068f25e75f 
>   

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 25, 2018, 7:21 a.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Removed bucketingVersion from Optraits.
Removed the config to create table using old bucketing version. Users can 
always do it by setting the table property.
Cleaned up some old code related to BinarySortSerDe which was still there.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses JAVA hash which is not as good as murmur for better distribution and 
efficiency in bucketing a table.
Migrate to murmur hash but still keep backward compatibility for existing users 
so that they dont have to reload the existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in high number of result updates.


Diffs (updated)
-

  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 fe2b1c1f3c 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties 2c1a76d89b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 71498a125c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java dc6cc62fbb 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
068f25e75f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a00f9279c0 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java aa3c72bc6d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 25b91899de 
  ql/src/java/org/apache/hadoop/hive/ql/plan/VectorReduceSinkDesc.java 
adea3b53a9 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java 
7cd571815d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMurmurHash.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java 7f7bc11410 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 12d57c6feb 
  

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-24 Thread Deepak Jaiswal


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
> > Line 156 (original), 156 (patched)
> > 
> >
> > What is the point of the conf and HIVE_BUCKETING_JAVA_HASH, is it 
> > supposed to be for testing? I don't see this setting being used anywhere.
> 
> Deepak Jaiswal wrote:
> The setting lets user to use old bucketing logic if they want to. I am 
> working on a testcase to cover it.
> 
> Jason Dere wrote:
> Why not just allow users to set bucketing_version in the tble prpperties?

Actually, I like this idea. I will work on undoing it. Thanks


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out
> > Lines 181 (patched)
> > 
> >
> > Why did bucketing version disappear here?
> 
> Deepak Jaiswal wrote:
> I removed a place where I was setting it, possibly due to that.
> 
> Jason Dere wrote:
> Is it a bug that it is not being set here now?

The test has been removed from MR. Works fine on LLAP and Spark.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201866
---


On April 23, 2018, 5:26 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 23, 2018, 5:26 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 3aaa68b11f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-24 Thread Jason Dere


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
> > Line 156 (original), 156 (patched)
> > 
> >
> > What is the point of the conf and HIVE_BUCKETING_JAVA_HASH, is it 
> > supposed to be for testing? I don't see this setting being used anywhere.
> 
> Deepak Jaiswal wrote:
> The setting lets user to use old bucketing logic if they want to. I am 
> working on a testcase to cover it.

Why not just allow users to set bucketing_version in the tble prpperties?


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out
> > Lines 181 (patched)
> > 
> >
> > Why did bucketing version disappear here?
> 
> Deepak Jaiswal wrote:
> I removed a place where I was setting it, possibly due to that.

Is it a bug that it is not being set here now?


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201866
---


On April 23, 2018, 5:26 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 23, 2018, 5:26 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 3aaa68b11f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
>  a42c299537 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
>  ddb26e529e 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-24 Thread Deepak Jaiswal


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
> > Line 138 (original), 139 (patched)
> > 
> >
> > Remove?

yes. Thanks for pointing it out.


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
> > Line 156 (original), 156 (patched)
> > 
> >
> > What is the point of the conf and HIVE_BUCKETING_JAVA_HASH, is it 
> > supposed to be for testing? I don't see this setting being used anywhere.

The setting lets user to use old bucketing logic if they want to. I am working 
on a testcase to cover it.


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out
> > Lines 181 (patched)
> > 
> >
> > Why did bucketing version disappear here?

I removed a place where I was setting it, possibly due to that.


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
> > Lines 1605 (patched)
> > 
> >
> > Can you use bucketing version from OpTraits, rather than having to 
> > redefine it here?

Actually, this reminded me that I need to undo all the bucketing version code 
changes for optraits as we rely on Operator rather than optraits. The reason 
being, we dont use optraits in MR and it was breaking backward compatibility 
where data is loaded in MR but the query would use newer logic.


> On April 24, 2018, 11:29 p.m., Jason Dere wrote:
> > ql/src/test/results/clientpositive/results_cache_invalidation2.q.out
> > Lines 88 (patched)
> > 
> >
> > This plan should not change to not using the cache. It's possible this 
> > is because of HIVE-19232.

Okay.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201866
---


On April 23, 2018, 5:26 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 23, 2018, 5:26 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-24 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201866
---




ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
Line 138 (original), 139 (patched)


Remove?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
Line 156 (original), 156 (patched)


What is the point of the conf and HIVE_BUCKETING_JAVA_HASH, is it supposed 
to be for testing? I don't see this setting being used anywhere.



ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out
Lines 181 (patched)


Why did bucketing version disappear here?



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
Lines 1605 (patched)


Can you use bucketing version from OpTraits, rather than having to redefine 
it here?



ql/src/test/results/clientpositive/results_cache_invalidation2.q.out
Lines 88 (patched)


This plan should not change to not using the cache. It's possible this is 
because of HIVE-19232.


- Jason Dere


On April 23, 2018, 5:26 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 23, 2018, 5:26 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 3aaa68b11f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
>  a42c299537 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-23 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 23, 2018, 5:26 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Fixed a minor bug introduced in previous patch. Please disregard the version 
before it for comparison purpose.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses JAVA hash which is not as good as murmur for better distribution and 
efficiency in bucketing a table.
Migrate to murmur hash but still keep backward compatibility for existing users 
so that they dont have to reload the existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in high number of result updates.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2403d7ac6c 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
 5dd0b8ea5b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
 ad14c7265f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 3733e3d02f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties 3aaa68b11f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
 a42c299537 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
 ddb26e529e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 71498a125c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java dc6cc62fbb 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 49c355be01 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
068f25e75f 
  

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-23 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 23, 2018, 4:36 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Implemented several minor fixes and result updates.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses JAVA hash which is not as good as murmur for better distribution and 
efficiency in bucketing a table.
Migrate to murmur hash but still keep backward compatibility for existing users 
so that they dont have to reload the existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in high number of result updates.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 536c7b427f 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
 5dd0b8ea5b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
 ad14c7265f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 3733e3d02f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties 3aaa68b11f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
 a42c299537 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
 ddb26e529e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 71498a125c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java fe109d7b96 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 49c355be01 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
068f25e75f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 7b1fd5f206 
  

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-18 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 19, 2018, 1:44 a.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Added a config to pick bucketing version.
Fixed several junit tests due to changes to hashcode logic.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses JAVA hash which is not as good as murmur for better distribution and 
efficiency in bucketing a table.
Migrate to murmur hash but still keep backward compatibility for existing users 
so that they dont have to reload the existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in high number of result updates.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 73492ff99c 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
 5dd0b8ea5b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
 ad14c7265f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 3733e3d02f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties d26f0ccb17 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
 a42c299537 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
 ddb26e529e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java fe109d7b96 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 009a890888 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
e15c5b7b66 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 7b1fd5f206 
  

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-16 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 17, 2018, 4:57 a.m.)


Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.


Changes
---

Implemented changes recommended by Matt and Jason.
Using BiFunction to select hashing method at compile time.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses JAVA hash which is not as good as murmur for better distribution and 
efficiency in bucketing a table.
Migrate to murmur hash but still keep backward compatibility for existing users 
so that they dont have to reload the existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in high number of result updates.


Diffs (updated)
-

  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
 5dd0b8ea5b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
 ad14c7265f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 3733e3d02f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
353b890b7c 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties d26f0ccb17 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
e15c5b7b66 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 7b1fd5f206 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 88b5ed81f1 
  ql/src/java/org/apache/hadoop/hive/ql/plan/OpTraits.java 9621c3be53 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java aa3c72bc6d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 25b91899de 
  ql/src/java/org/apache/hadoop/hive/ql/plan/VectorReduceSinkDesc.java 
adea3b53a9 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java 
7cd571815d 
  

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-16 Thread Deepak Jaiswal


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 813 (patched)
> > 
> >
> > For these primitive types, might make sense to pre-allocate fixed size 
> > ByteBuffers of size 2/4/8 which can be used here rather than having to 
> > allocate new ones for every value.
> 
> Deepak Jaiswal wrote:
> That is how I did it before but it would send a byte array of length 8 
> all the time. The murmur function would consider all 8 bytes to generate 
> hash. When I noticed it was creating different hashes for same key, I found 
> the bug, hence the specific size allocation. Also, it wont affect the 
> efficiency.
> 
> Jason Dere wrote:
> What I mean is this is performing an allocation for every call to 
> hashCode() here, which I think could affect the efficiency. This could be 
> avoided by passing in pre-allocated arrays of each size to this method. Also, 
> could you use the other version of hash32() where you can also pass in the 
> array length - that way you could just use the same array of size 8, but pass 
> in length 2/4/8 depending on which type you are hashing.

Aah. Got it. I wrote the method which takes buffer and its length for other 
purpose and forgot it could be used here.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 858 (patched)
> > 
> >
> > Old impl (based on DateWritable.hashCode()) did hashCode based on 
> > daysSinceEpoc value, will be faster than doing toString()
> 
> Deepak Jaiswal wrote:
> The new one converts it into string format to get bytes array. Are you 
> suggesting what we get from getPrimitiveWritableObject is daysSinceEpoc? And 
> since it is integer, it is faster to convert it into byte array directly 
> rethet than doing "toString"?
> 
> Jason Dere wrote:
> Yes, DateWritable.toString() converts to Date, which then has to call 
> toString() which means date conversion/formatting. Simpler to base it on the 
> int value.

Thanks for confirming.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
>  7c2cadefa7 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-16 Thread Jason Dere


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 813 (patched)
> > 
> >
> > For these primitive types, might make sense to pre-allocate fixed size 
> > ByteBuffers of size 2/4/8 which can be used here rather than having to 
> > allocate new ones for every value.
> 
> Deepak Jaiswal wrote:
> That is how I did it before but it would send a byte array of length 8 
> all the time. The murmur function would consider all 8 bytes to generate 
> hash. When I noticed it was creating different hashes for same key, I found 
> the bug, hence the specific size allocation. Also, it wont affect the 
> efficiency.

What I mean is this is performing an allocation for every call to hashCode() 
here, which I think could affect the efficiency. This could be avoided by 
passing in pre-allocated arrays of each size to this method. Also, could you 
use the other version of hash32() where you can also pass in the array length - 
that way you could just use the same array of size 8, but pass in length 2/4/8 
depending on which type you are hashing.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 858 (patched)
> > 
> >
> > Old impl (based on DateWritable.hashCode()) did hashCode based on 
> > daysSinceEpoc value, will be faster than doing toString()
> 
> Deepak Jaiswal wrote:
> The new one converts it into string format to get bytes array. Are you 
> suggesting what we get from getPrimitiveWritableObject is daysSinceEpoc? And 
> since it is integer, it is faster to convert it into byte array directly 
> rethet than doing "toString"?

Yes, DateWritable.toString() converts to Date, which then has to call 
toString() which means date conversion/formatting. Simpler to base it on the 
int value.


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
>  7c2cadefa7 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-15 Thread Deepak Jaiswal


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> >

Thanks for the review. I will work on the issues and update the patch.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > hbase-handler/src/test/results/positive/external_table_ppd.q.out
> > Lines 59 (patched)
> > 
> >
> > Are there any tests for the old-style bucketing, to make sure that 
> > previously created bucketed tables still work properly?

That is a good point. Will work on it.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
> > Lines 25 (patched)
> > 
> >
> > Unnecessary change?

Yes. I was using it before, but eventually stopped. Thanks for pointing it out.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
> > Lines 850 (patched)
> > 
> >
> > missing comment?

missed cleanup.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
> > Line 1053 (original), 1051 (patched)
> > 
> >
> > If this occurs every row, I wonder if it would be better to determine 
> > the bucketing version once during initializeOp() and create some object 
> > which knows which knows which bucketing hash code method to call here

Makes sense. Let me work on this.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
> > Lines 469 (patched)
> > 
> >
> > should we validate that this is a valid bucketing version that we 
> > support?

The way I use bucketingVersion, if it is version 2, we use murmur hash, else we 
dont care and use old hashing.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
> > Lines 639 (patched)
> > 
> >
> > Do we also need to check the bucketing type in the case that op is not 
> > a TableScan? If op is a ReduceSink or Join, would that end up being 
> > bucketingVersion 2?

The idea is to make sure all the tables involved in SMB when it happens in Map, 
then all the tables should have same bucketing version.
I was thinking that if SMB happens at reducer side, it wont matter, but it 
looks like it may. Lets talk about it in person.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/AnnotateWithOpTraits.java
> > Lines 72 (patched)
> > 
> >
> > Was this commented code for testing?

My bad again, should have been cleanedup.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
> > Lines 411 (patched)
> > 
> >
> > It seems to me a lot of the logic will treat -1 as bucketing version 1, 
> > since there are a lot of (bucketingVersion == 2 ? doVersion2 : doVersion1) 
> > statements. Where in the code would SMB be disabled because of -1 
> > bucketingVersion?

-1 is used when bucketing version for the join can't be determined, thus 
disabling SMB.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
> > Lines 187 (patched)
> > 
> >
> > Maybe make some common utility to parse/validate bucketing version, 
> > that both places can use?

Hmm. Let me look into this.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
> > Lines 198 (patched)
> > 
> >
> > Validate bucketing version number?

Same as for Table.java comment.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
> > Lines 32 (patched)
> > 
> >
> > Docs for this UDF will probably need to mention that this uses the old 
> > hashing/bucketing scheme which and that a new one has replaced it.

Should I open a documentation JIRA to track this?


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMurmurHash.java
> > Lines 1 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-13 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---




hbase-handler/src/test/results/positive/external_table_ppd.q.out
Lines 59 (patched)


Are there any tests for the old-style bucketing, to make sure that 
previously created bucketed tables still work properly?



hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
Lines 25 (patched)


Unnecessary change?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Lines 850 (patched)


missing comment?



ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
Line 1053 (original), 1051 (patched)


If this occurs every row, I wonder if it would be better to determine the 
bucketing version once during initializeOp() and create some object which knows 
which knows which bucketing hash code method to call here



ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
Lines 469 (patched)


should we validate that this is a valid bucketing version that we support?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
Lines 639 (patched)


Do we also need to check the bucketing type in the case that op is not a 
TableScan? If op is a ReduceSink or Join, would that end up being 
bucketingVersion 2?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/AnnotateWithOpTraits.java
Lines 72 (patched)


Was this commented code for testing?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
Lines 411 (patched)


It seems to me a lot of the logic will treat -1 as bucketing version 1, 
since there are a lot of (bucketingVersion == 2 ? doVersion2 : doVersion1) 
statements. Where in the code would SMB be disabled because of -1 
bucketingVersion?



ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
Lines 187 (patched)


Maybe make some common utility to parse/validate bucketing version, that 
both places can use?



ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
Lines 198 (patched)


Validate bucketing version number?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
Lines 32 (patched)


Docs for this UDF will probably need to mention that this uses the old 
hashing/bucketing scheme which and that a new one has replaced it.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMurmurHash.java
Lines 1 (patched)


Missing Apache header



serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
Lines 813 (patched)


For these primitive types, might make sense to pre-allocate fixed size 
ByteBuffers of size 2/4/8 which can be used here rather than having to allocate 
new ones for every value.



serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
Lines 858 (patched)


Old impl (based on DateWritable.hashCode()) did hashCode based on 
daysSinceEpoc value, will be faster than doing toString()



serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
Lines 866 (patched)


Faster to do hashcode based on the underlying values (totalMonths) rather 
than toString



serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
Lines 869 (patched)


Faster to do hashcode based on the underlying values (totalSeconds/nanos) 
rather than toString


- Jason Dere


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-13 Thread Deepak Jaiswal


> On April 13, 2018, 6:05 p.m., Matt McCline wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
> > Lines 338 (patched)
> > 
> >
> > Logging per row too expensive to leave in.

Thanks for pointing out. It was put in for dev purpose but I forgot to remove.


> On April 13, 2018, 6:05 p.m., Matt McCline wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
> > Line 338 (original), 344 (patched)
> > 
> >
> > Unnecessary line.

Sure.


> On April 13, 2018, 6:05 p.m., Matt McCline wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
> > Lines 453 (patched)
> > 
> >
> > Please add comments as to the significanse of checking the acidOp flag.

Will do.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201121
---


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
>  7c2cadefa7 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 353b890b7c 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  5966740f88 
>   itests/src/test/resources/testconfiguration.properties 48d62a8bf9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
>  4077552a56 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-13 Thread Matt McCline

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201121
---




ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
Lines 338 (patched)


Logging per row too expensive to leave in.



ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
Line 338 (original), 344 (patched)


Unnecessary line.



ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
Lines 453 (patched)


Please add comments as to the significanse of checking the acidOp flag.


- Matt McCline


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses JAVA hash which is not as good as murmur for better distribution 
> and efficiency in bucketing a table.
> Migrate to murmur hash but still keep backward compatibility for existing 
> users so that they dont have to reload the existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
>  7c2cadefa7 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 353b890b7c 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  5966740f88 
>   itests/src/test/resources/testconfiguration.properties 48d62a8bf9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
>  4077552a56 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 7121bceb22 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
>  5f65f638ca 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
> 2be3c9b9a2 
>   
>