I'm having trouble understanding the difference between a skewed table and a 
list bucketed table:

https://cwiki.apache.org/confluence/display/Hive/ListBucketing

Is the only difference that ListBucketing stores the data as directories and a 
"plain" skewed table stores them as files? I think that's what the wiki page is 
saying, but it's very confusing. For one, the title of the page is 
ListBucketing and in many places it seems to use the phrase "List Bucketing" as 
the general feature of partitioning a table by skewed columns (whether in 
directories or files).

There's a section "Skewed Table vs. List Bucketing Table" 
(https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing) 
that I would assume spells out the differences between the two, but it 
says:

 - Skewed Table is a table which has skewed information.
 - List Bucketing Table is a skewed table. In addition, it tells Hive to use 
the list bucketing feature on the skewed table: create sub-directories for 
skewed values.

That makes it seem like "the list bucketing feature" is just using 
sub-directories for the data. If that's the case, why is the whole article 
titled ListBucketing, and why is the section describing the basic idea (which 
skewed tables and list bucketed tables apparently have in common) titled 
just "List Bucketing" 
(https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing)?

The article also says, "Mainly due to its sub-directory nature, list bucketing 
can't coexist with some features." Does that mean only list bucketing (the 
sub-directory feature that skewed tables can have as an option) is incompatible 
with the features mentioned, or that any skewed table is incompatible with 
them?

-Steve
