Re: Support for Hive buckets

2018-12-02 Thread welder404
https://issues.apache.org/jira/browse/SPARK-19256 is an active umbrella feature. But as of 2.2, you can invoke APIs on DataFrames today to bucketize them on serialization using Hive. If you invoke
val bucketCount = 100
df1
  .repartition(bucketCount, col("a"), col("b"))
  .bucketBy(bucketCount,

Re: Support for Hive buckets

2014-12-24 Thread tanejagagan
billion rows

Re: Support for Hive buckets

2014-09-22 Thread Michael Armbrust
: I noticed that the release notes for 1.1.0 said that Spark doesn't support Hive buckets yet. I didn't notice any JIRA issues related to adding support. Broadly speaking, what would be involved in supporting buckets, especially the bucketmapjoin and sortedmerge optimizations?

Support for Hive buckets

2014-09-14 Thread Cody Koeninger
I noticed that the release notes for 1.1.0 said that Spark doesn't support Hive buckets yet. I didn't notice any JIRA issues related to adding support. Broadly speaking, what would be involved in supporting buckets, especially the bucketmapjoin and sortedmerge optimizations?