does anyone care about list bucketing stored as directories?

Sergey Shelukhin Tue, 03 Oct 2017 12:59:36 -0700

1) There seem to be some bugs and limitations in list bucketing (LB), e.g. 
incorrect cleanup (https://issues.apache.org/jira/browse/HIVE-14886), and nobody 
appears to so much as watch the JIRAs ;) Does anyone actually use this stuff? 
Should we nuke it in 3.0, and by 3.0 I mean I’ll remove it from master in a few 
weeks? :)


2) I actually wonder: on top of the same SQL syntax, wouldn’t it be much easier 
to add logic to partitioning that writes skew values into their own partitions 
and non-skew values into a new type of default partition? It wouldn’t affect 
nearly as many low-level codepaths in obscure and non-obvious ways, instead 
keeping all the logic in metastore and split generation, and it would integrate 
with Hive features like PPD automatically.
Especially if we are ok with the same limitations - e.g. if you add a new skew 
value right now, I’m not sure what happens to the rows with that value already 
sitting in the non-skew directories, but I don’t expect anything reasonable...
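
To make the idea in 2) concrete, here is a rough toy sketch in Python. It is not Hive code: the in-memory "table", the function names, and the DEFAULT_PARTITION name are all made up for illustration. It only shows skew values being routed to their own partitions, everything else going to one default partition, and what pruning an equality predicate would look like under that layout.

```python
from collections import defaultdict

# Hypothetical name for the catch-all partition; not an actual Hive identifier.
DEFAULT_PARTITION = "__NON_SKEW_DEFAULT__"


def partition_for(value, skew_values):
    """Return the partition a row with this column value would be written to."""
    return str(value) if value in skew_values else DEFAULT_PARTITION


def write_rows(rows, column, skew_values, table=None):
    """Append rows to a toy table: a dict of partition name -> list of rows."""
    table = defaultdict(list) if table is None else table
    for row in rows:
        table[partition_for(row[column], skew_values)].append(row)
    return table


def partitions_to_scan(value, skew_values):
    """Toy partition pruning for an equality predicate `column = value`."""
    return [partition_for(value, skew_values)]


# k = 1 is declared as the only skew value.
rows = [{"k": 1}, {"k": 1}, {"k": 2}, {"k": 3}]
table = write_rows(rows, "k", skew_values={1})

# Same limitation as today: declare k = 2 as skewed afterwards and pruning
# points at partition "2", while the old k = 2 row still sits in the default
# partition unless something rewrites it.
stale = [r for r in table[DEFAULT_PARTITION] if r["k"] == 2]
```

In this toy version, partitions_to_scan(2, {1, 2}) returns only partition "2", so the stale row would be missed unless the existing data is moved or the default partition is also scanned.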
