Are all or ost of the 1000+ columns populated for a given row? If they are sparse you can replace them with a single map collection column which would only occupy the entries that are populated.
-- Jack Krupansky On Tue, Nov 24, 2015 at 11:04 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > As always, your queries should drive your data model. Unless you really > need 1000+ columns for most queries, you should consider separate tables > for the subsets of the columns that need to be returned for a given query. > > The new 3.0 Materialized View feature can be used to easily create subsets > of a base table, although that begs the question of whether you ever really > need all 1000+ columns in the same table. > > -- Jack Krupansky > > On Tue, Nov 24, 2015 at 10:45 AM, Kai Wang <dep...@gmail.com> wrote: > >> Hi all, >> >> If I have the following table: >> CREATE TABLE t ( >> pk int, >> ck int, >> c1 int, >> c2 int, >> ... >> PRIMARY KEY (pk, ck) >> ) >> >> There are lots of non-clustering columns (1000+). From time to time I >> need to do a query like this: >> >> SELECT c1 FROM t WHERE pk = abc AND ck > xyz; >> >> How efficient is this query compared to SELECT * ...? Apparently SELECT >> c1 would save a lot of network bandwidth since only c1 needs to be >> transferred on the wire. But I am more interested in the impact on disk IO. >> If I understand C* storage engine correctly, one CQL row is clustered >> together on disk. That means c1 from different rows are stored apart. In >> the case of SELECT c1, does C* do multiple seeks to only lift c1 of each >> row from disk or lift the whole row into memory and return c1 from there? >> >> From comments on https://issues.apache.org/jira/browse/CASSANDRA-5762 it >> seems C* lifts the whole row as of 1.2.7. Is this still the case on 2.1.*? >> >> Thanks. >> > >