Thanks, Sylvain! I'll read them thoroughly, but after a quick glance I'd like to repeat another (implied) question of mine that I suspect these articles won't answer.
Why does explicitly defining the columns of a column family significantly improve performance and the key cache hit ratio (the latter being almost zero when there are no explicit column definitions)?

2013/8/30 Sylvain Lebresne <sylv...@datastax.com>

> The short story is that you're probably not up to date on how CQL and
> thrift table definitions relate to one another, and it may not be exactly
> how you think it is. If you haven't done so already, I'd suggest reading
> http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
> (it should answer your "what about dynamic column names" case) and
> http://www.datastax.com/dev/blog/thrift-to-cql3 (it should help explain how
> CQL3 interprets thrift tables, and why you saw what you saw).
>
> --
> Sylvain
>
>
> On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:
>
>> Hi all!
>>
>> We have encountered the following problem. We create our column families
>> via Hector like this:
>>
>> ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition("mykeyspace", "mycf");
>> cfdef.setColumnType(ColumnType.STANDARD);
>> cfdef.setComparatorType(ComparatorType.UTF8TYPE);
>> cfdef.setDefaultValidationClass("BytesType");
>> cfdef.setKeyValidationClass("UTF8Type");
>> cfdef.setReadRepairChance(0.1);
>> cfdef.setGcGraceSeconds(864000);
>> cfdef.setMinCompactionThreshold(4);
>> cfdef.setMaxCompactionThreshold(32);
>> cfdef.setReplicateOnWrite(true);
>> cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
>> Map<String, String> compressionOptions = new HashMap<String, String>();
>> compressionOptions.put("sstable_compression", "");
>> cfdef.setCompressionOptions(compressionOptions);
>> cluster.addColumnFamily(cfdef, true);
>>
>> When we describe this column family via cqlsh, we get this:
>>
>> CREATE TABLE "mycf" (
>>   key text,
>>   column1 text,
>>   value blob,
>>   PRIMARY KEY (key, column1)
>> ) WITH COMPACT STORAGE AND
>>   bloom_filter_fp_chance=0.010000 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.000000 AND
>>   gc_grace_seconds=864000 AND
>>   read_repair_chance=0.100000 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={};
>>
>> As you can see, there is a mysterious column1, and moreover it is added
>> to the primary key. We thought this was wrong, so we tried to get rid of
>> it. We managed to do so by adding explicit column definitions like this:
>>
>> BasicColumnDefinition cdef = new BasicColumnDefinition();
>> cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
>> cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
>> cdef.setIndexType(ColumnIndexType.CUSTOM);
>> cfdef.addColumnDefinition(cdef);
>>
>> After this, the primary key became
>>
>> PRIMARY KEY (key)
>>
>> The effect of this was *overwhelming*: we got a tremendous performance
>> improvement, and according to the stats the key cache began working,
>> while previously its hit ratio was close to zero.
>>
>> My questions are:
>>
>> 1) What is this all about? Is what we did right?
>> 2) In this project we can provide explicit column definitions. But in
>> another project we have some column families where this is not possible
>> because the column names are dynamic (based on timestamps). If what we
>> did is right, how can we adapt this solution to the dynamic column name
>> case?
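A minimal, dependency-free Java sketch of the two shapes discussed in this thread, assuming the thrift-to-CQL3 mapping as I understand it: with no column metadata, every thrift cell is presented as its own CQL row (key, column1, value), so column1 acts as a clustering column and ends up in the PRIMARY KEY; with column metadata, the whole thrift row is presented as one CQL row keyed by key alone, with one CQL column per declared name. The class and method names (ThriftToCql3Sketch, dynamicView, staticView) are hypothetical, and no Hector or Cassandra classes are used. It models only how rows are presented, not caching or performance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, NOT Cassandra or Hector code: a thrift row is a
// partition key plus a sorted map of column name -> value.
public class ThriftToCql3Sketch {

    // No column metadata: each thrift cell becomes one CQL row
    // (key, column1, value). column1 holds the former column name, which
    // is why it is part of the PRIMARY KEY as a clustering column.
    static List<String> dynamicView(String key, TreeMap<String, String> cells) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            rows.add("(" + key + ", " + cell.getKey() + ", " + cell.getValue() + ")");
        }
        return rows;
    }

    // With column metadata: the whole thrift row becomes a single CQL row
    // keyed by `key` alone, with one CQL column per declared name.
    // Only declared names are shown; missing ones are rendered as null.
    static String staticView(String key, List<String> declared, Map<String, String> cells) {
        StringBuilder sb = new StringBuilder("(" + key);
        for (String name : declared) {
            sb.append(", ").append(name).append("=").append(cells.getOrDefault(name, "null"));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // Timestamp-like column names, as in the "dynamic column name" case.
        // TreeMap keeps them sorted, as a UTF8Type comparator would for ASCII names.
        TreeMap<String, String> cells = new TreeMap<>();
        cells.put("2013-08-30T09:51", "0x02");
        cells.put("2013-08-30T09:50", "0x01");
        dynamicView("row1", cells).forEach(System.out::println);
        // prints:
        // (row1, 2013-08-30T09:50, 0x01)
        // (row1, 2013-08-30T09:51, 0x02)

        Map<String, String> staticCells = Map.of("mycolumn", "0xff");
        System.out.println(staticView("row1", List.of("mycolumn"), staticCells));
        // prints: (row1, mycolumn=0xff)
    }
}
```

In this model, timestamp-style names fit the first shape: each timestamp becomes a clustering value rather than a declared column, which is the wide-row pattern the dynamic-columns post refers to.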