better performance on long row read
-----------------------------------

                 Key: CASSANDRA-2843
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Yang Yang
         Attachments: fast_cf.diff

currently if a row contains > 1000 columns, the run time becomes considerably 
slow (my test of 
a row with 30 00 columns (standard, regular) each with 8 bytes in name, and 40 
bytes in value, is about 16ms.
this is all running in memory, no disk read is involved.

through debugging we can find
most of this time is spent on 
[Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
[Wall Time]  
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
ColumnFamily)
[Wall Time]  
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
ColumnFamily)
[Wall Time]  
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, 
ColumnFamily)
[Wall Time]  
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, 
Iterator, int)
[Wall Time]  
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
 Iterator, int)
[Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)

ColumnFamily.addColumn() is slow because it inserts into an internal 
concurrentSkipListMap() that maps column names to values.
this structure is slow for two reasons: it needs to do synchronization; it 
needs to maintain a more complex structure of map.

but if we look at the whole read path, thrift already defines the read output 
to be List<ColumnOrSuperColumn> so it does not make sense to use a luxury map 
data structure in the interium and finally convert it to a list. on the 
synchronization side, since the return CF is never going to be shared/modified 
by other threads, we know the access is always single thread, so no 
synchronization is needed.

but these 2 features are indeed needed for ColumnFamily in other cases, 
particularly write. so we can provide a different ColumnFamily to 
CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always 
creates the standard ColumnFamily, but take a provided returnCF, whose cost is 
much cheaper.

the provided patch is for demonstration now, will work further once we agree on 
the general direction. 
CFS, ColumnFamily, and Table  are changed; a new FastColumnFamily is provided. 
the main work is to let the FastColumnFamily use an array  for internal 
storage. at first I used binary search to insert new columns in addColumn(), 
but later I found that even this is not necessary, since all calling scenarios 
of ColumnFamily.addColumn() has an invariant that the inserted columns come in 
sorted order (I still have an issue to resolve descending or ascending  now, 
but ascending works). so the current logic is simply to compare the new column 
against the end column in the array, if names not equal, append, if equal, 
reconcile.

slight temporary hacks are made on getTopLevelColumnFamily so we have 2 flavors 
of the method, one accepting a returnCF. but we could definitely think about 
what is the better way to provide this returnCF.


this patch compiles fine, no tests are provided yet. but I tested it in my 
application, and the performance improvement is dramatic: it offers about 50% 
reduction in read time in the 3000-column case.


thanks
Yang


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to