I'm having trouble understanding the difference between a skewed table and a list bucketed table:
https://cwiki.apache.org/confluence/display/Hive/ListBucketing

Is the only difference that list bucketing stores the skewed data in sub-directories, while a "plain" skewed table stores it as files? I think that's what the wiki page is saying, but it's very confusing.

For one, the title of the page is ListBucketing, and in many places it seems to use the phrase "List Bucketing" for the general feature of partitioning a table by skewed columns (whether in directories or files). There's a section, "Skewed Table vs. List Bucketing Table" (https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing), that I would expect to spell out the differences between the two, but it says:

- Skewed Table is a table which has skewed information.
- List Bucketing Table is a skewed table. In addition, it tells Hive to use the list bucketing feature on the skewed table: create sub-directories for skewed values.

That makes it seem like "the list bucketing feature" is just storing the data in sub-directories. If that's the case, why is the whole article titled ListBucketing, and why is the section describing the basic idea (which skewed tables and list-bucketed tables apparently have in common) titled just "List Bucketing" (https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing)?

The article also says, "Mainly due to its sub-directory nature, list bucketing can't coexist with some features." Does that mean only list bucketing (the sub-directory option that a skewed table can have) is incompatible with the features mentioned, or that any skewed table is incompatible with them?

-Steve
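P.S. To check that I'm reading "create sub-directories for skewed values" correctly, here is a rough Python sketch of the on-disk split as I understand it. It is my own illustration, not Hive code: the function name `list_bucket_layout` is made up, and the name of the catch-all directory for non-skewed rows is an assumption (I believe Hive uses `HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME`, but I haven't confirmed it). Each skewed value combination gets its own `col=value` sub-directory, and every other row goes to the single default directory.

```python
from collections import defaultdict

# ASSUMPTION: name of the catch-all directory for rows whose values are not skewed.
DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME"


def list_bucket_layout(rows, skew_cols, skewed_values):
    """Hypothetical helper: group rows by the sub-directory list bucketing
    would (as I understand it) write them to.

    rows:          list of dicts mapping column name -> value
    skew_cols:     tuple of skewed column names, e.g. ("key",)
    skewed_values: set of tuples of skewed values, e.g. {(1,), (5,)}
    """
    layout = defaultdict(list)
    for row in rows:
        combo = tuple(row[c] for c in skew_cols)
        if combo in skewed_values:
            # One sub-directory per skewed value combination, e.g. "key=1"
            # or "a=1/b=x" when the table is skewed on two columns.
            subdir = "/".join(f"{c}={v}" for c, v in zip(skew_cols, combo))
        else:
            # All non-skewed rows share one default directory.
            subdir = DEFAULT_DIR
        layout[subdir].append(row)
    return dict(layout)


rows = [
    {"key": 1, "val": "a"},
    {"key": 2, "val": "b"},
    {"key": 1, "val": "c"},
    {"key": 5, "val": "d"},
]
layout = list_bucket_layout(rows, ("key",), {(1,), (5,)})
for subdir, group in sorted(layout.items()):
    print(subdir, [r["val"] for r in group])
```

With the table skewed on `key` for values 1 and 5, this prints `HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME ['b']`, `key=1 ['a', 'c']` and `key=5 ['d']`, one per line.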
