optimizing index sampling for better memory usage

2011-12-28 Radim Kolar
Currently j.o.a.c.io.sstable.indexsummary is implemented as an ArrayList of KeyPosition (RowPosition key, long offset). I propose changing it to RowPosition keys[] and long offsets[]. This will lower the number of Java objects used per entry from 2 (KeyPosition + RowPosition) to 1. For building these ar
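A minimal sketch of the proposed layout, assuming a generic comparable key in place of Cassandra's RowPosition. The class `ParallelArrayIndexSummary` and its methods are hypothetical, not Cassandra code. Keys and offsets sit in two arrays indexed together, so each entry costs one key object plus a slot in a primitive `long[]` instead of a KeyPosition wrapper plus its key. Because the final count is not known while sampling, the arrays grow by doubling during building and are trimmed once building is complete.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Cassandra's IndexSummary): sampled index entries
 * stored as two parallel arrays instead of a List of (key, offset) objects.
 */
final class ParallelArrayIndexSummary<K extends Comparable<? super K>> {
    private Object[] keys = new Object[16];  // sampled keys, ascending order
    private long[] offsets = new long[16];   // index-file offset for keys[i]
    private int size = 0;

    /** Appends a sample; callers must add keys in ascending order. */
    void add(K key, long offset) {
        if (size == keys.length) {
            // Grow both arrays together, much as ArrayList does internally.
            int newCapacity = Math.max(16, keys.length * 2);
            keys = Arrays.copyOf(keys, newCapacity);
            offsets = Arrays.copyOf(offsets, newCapacity);
        }
        keys[size] = key;
        offsets[size] = offset;
        size++;
    }

    /** Drops unused capacity once all samples have been added. */
    void complete() {
        keys = Arrays.copyOf(keys, size);
        offsets = Arrays.copyOf(offsets, size);
    }

    int size() {
        return size;
    }

    @SuppressWarnings("unchecked")
    K keyAt(int i) {
        return (K) keys[i];
    }

    long offsetAt(int i) {
        return offsets[i];
    }

    /**
     * Binary search for the last sampled key that is <= target.
     * Returns its position, or -1 if target sorts before every sample.
     */
    int floorIndex(K target) {
        int lo = 0, hi = size - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keyAt(mid).compareTo(target) <= 0) {
                result = mid;     // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }
}
```

A reader would then start scanning the index file at `offsetAt(floorIndex(key))`, covering at most one sampling interval's worth of entries.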

Re: index sampling

2011-12-27 Peter Schuller
> On a node with 300m rows (a small node), there will be 585937 index sample entries
> with 512 sampling. Let's say 100 bytes per entry; this will be 585 MB. Bloom
> filters are 884 MB. With the default sampling of 128, sampled entries will use the
> majority of node memory. Index sampling should be r
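A back-of-the-envelope sketch of the estimate in the quote, assuming one sample is kept per `interval` rows and a flat per-entry cost (both taken from the quote, not measured). The class `IndexSampleEstimate` is hypothetical. At 100 bytes per entry the product for interval 512 is 58,593,700 bytes, about 58.6 MB; the quoted 585 MB corresponds to roughly 1,000 bytes per entry.

```java
/** Hypothetical helper: rough memory estimate for index samples. */
final class IndexSampleEstimate {
    /** One sample per `interval` rows, so the count is rows / interval. */
    static long sampleEntries(long rows, int interval) {
        return rows / interval;
    }

    /** Assumes a flat per-entry cost; real JVM overhead varies by layout. */
    static long estimatedBytes(long rows, int interval, long bytesPerEntry) {
        return sampleEntries(rows, interval) * bytesPerEntry;
    }

    public static void main(String[] args) {
        long rows = 300_000_000L;
        for (int interval : new int[] {512, 128}) {
            long entries = sampleEntries(rows, interval);
            long bytes = estimatedBytes(rows, interval, 100);
            System.out.printf("interval %d: %d entries, ~%.1f MB at 100 B/entry%n",
                    interval, entries, bytes / 1e6);
        }
    }
}
```

Dropping the interval from 512 to the default 128 multiplies the entry count by 4 (2,343,750 entries for 300m rows), whatever the per-entry size turns out to be.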

index sampling

2011-12-27 Radim Kolar
> That is a good reason for both to be configurable IMO.

Index sampling is currently configurable only per node; it would be better to have it per keyspace, because we are using OLTP-like and OLAP keyspaces in the same cluster. The OLAP keyspaces have about 1000x more rows. But it's difficult
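A minimal sketch of what per-keyspace sampling could look like, assuming a node-wide default plus optional per-keyspace overrides. `IndexIntervalConfig`, its methods, and the keyspace names used with it are hypothetical, not a Cassandra API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: per-keyspace sampling interval with a node-wide fallback. */
final class IndexIntervalConfig {
    private final int nodeDefault;
    private final Map<String, Integer> perKeyspace = new HashMap<>();

    IndexIntervalConfig(int nodeDefault) {
        if (nodeDefault <= 0) throw new IllegalArgumentException("interval must be positive");
        this.nodeDefault = nodeDefault;
    }

    /** Overrides the interval for one keyspace. */
    void setForKeyspace(String keyspace, int interval) {
        if (interval <= 0) throw new IllegalArgumentException("interval must be positive");
        perKeyspace.put(keyspace, interval);
    }

    /** Keyspaces without an override use the node-wide default. */
    int intervalFor(String keyspace) {
        return perKeyspace.getOrDefault(keyspace, nodeDefault);
    }
}
```

With a default of 128, an OLTP keyspace keeps today's behaviour, while an OLAP keyspace set to, say, 1024 keeps 8x fewer samples at the cost of longer index scans per lookup.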