The Spark SQL documentation states:

> Tables with buckets: bucket is the hash partitioning within a Hive table partition. Spark SQL doesn't support buckets yet.
What exactly does that mean?

- That writing to a bucketed table won't respect this feature, and data will be written in a non-bucketed manner?
- That reading from a bucketed table won't use this feature to improve performance?
- Both?

Also, even if bucketing is not supported for reading, do we still benefit from having a bucketed table simply because of the way the data is laid out in HDFS? If we read a bucketed table in Spark, is it more likely that data from the same bucket will be processed by the same task/executor?
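For concreteness, this is the kind of table I have in mind: a sketch in Hive DDL, with table and column names that are purely illustrative (not taken from any real schema):

```
-- Hypothetical example: a partitioned Hive table that is additionally
-- bucketed (hash-partitioned) by user_id within each partition.
CREATE TABLE events (
  user_id BIGINT,
  event_type STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;
```

The question is what happens when Spark SQL writes to or reads from a table declared like this.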