Thanks very much. Precisely answers my questions. :-)

2010/4/26 Schubert Zhang <zson...@gmail.com>
> Please refer to the code:
>
> org.apache.cassandra.db.ColumnFamilyStore
>
>     public String getFlushPath()
>     {
>         long guessedSize = 2 * DatabaseDescriptor.getMemtableThroughput() * 1024*1024; // 2* adds room for keys, column indexes
>         String location = DatabaseDescriptor.getDataFileLocationForTable(table_, guessedSize);
>         if (location == null)
>             throw new RuntimeException("Insufficient disk space to flush");
>         return new File(location, getTempSSTableFileName()).getAbsolutePath();
>     }
>
> and we can go through org.apache.cassandra.config.DatabaseDescriptor:
>
>     public static String getDataFileLocationForTable(String table, long expectedCompactedFileSize)
>     {
>         long maxFreeDisk = 0;
>         int maxDiskIndex = 0;
>         String dataFileDirectory = null;
>         String[] dataDirectoryForTable = getAllDataFileLocationsForTable(table);
>
>         for ( int i = 0 ; i < dataDirectoryForTable.length ; i++ )
>         {
>             File f = new File(dataDirectoryForTable[i]);
>             if( maxFreeDisk < f.getUsableSpace())
>             {
>                 maxFreeDisk = f.getUsableSpace();
>                 maxDiskIndex = i;
>             }
>         }
>         // Load factor of 0.9 we do not want to use the entire disk that is too risky.
>         maxFreeDisk = (long)(0.9 * maxFreeDisk);
>         if( expectedCompactedFileSize < maxFreeDisk )
>         {
>             dataFileDirectory = dataDirectoryForTable[maxDiskIndex];
>             currentIndex = (maxDiskIndex + 1 )%dataDirectoryForTable.length ;
>         }
>         else
>         {
>             currentIndex = maxDiskIndex;
>         }
>         return dataFileDirectory;
>     }
>
> So, DataFileDirectories is meant for multiple disks or disk partitions.
> I think your storage01, storage02 and storage03 are on the same disk or
> disk partition.
>
> 2010/4/26 Roland Hänel <rol...@haenel.me>
>
>> I have a configuration like this:
>>
>> <DataFileDirectories>
>>   <DataFileDirectory>/storage01/cassandra/data</DataFileDirectory>
>>   <DataFileDirectory>/storage02/cassandra/data</DataFileDirectory>
>>   <DataFileDirectory>/storage03/cassandra/data</DataFileDirectory>
>> </DataFileDirectories>
>>
>> After loading a big chunk of data into Cassandra, I end up with some 70GB
>> in the first directory, and only about 10GB in the second and third one. All
>> rows are quite small, so it's not just some big rows that contain the
>> majority of data.
>>
>> Does Cassandra have the ability to 'see' the maximum available space in
>> these directories? I'm asking myself this question since my limit is 100GB,
>> and the first directory is approaching this limit...
>>
>> And, wouldn't it be better if Cassandra tried to 'load-balance' the files
>> inside the directories because this will result in better (read) performance
>> if the directories are on different disks (which is the case for me)?
>>
>> Any help is appreciated.
>>
>> Roland
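
For anyone reading this thread later, the selection rule in the quoted getDataFileLocationForTable can be tried out in isolation. The following is only a minimal sketch, not Cassandra code: the class name DataDirectoryChooser and the method choose are hypothetical, the array of byte counts stands in for calling File.getUsableSpace() on each configured directory, and the currentIndex bookkeeping is left out.

```java
public class DataDirectoryChooser {

    /**
     * Hypothetical stand-in for the quoted selection logic.
     * usableBytes[i] plays the role of new File(dir[i]).getUsableSpace().
     * Returns the index of the chosen directory, or -1 if none has room.
     */
    public static int choose(long[] usableBytes, long expectedSize) {
        long maxFree = 0;
        int maxIndex = 0;
        for (int i = 0; i < usableBytes.length; i++) {
            // Strict '<' as in the quoted code: on a tie the earlier directory stays selected.
            if (maxFree < usableBytes[i]) {
                maxFree = usableBytes[i];
                maxIndex = i;
            }
        }
        // Same 0.9 load factor as the quoted code: never plan to fill the disk completely.
        long budget = (long) (0.9 * maxFree);
        return expectedSize < budget ? maxIndex : -1;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Directories on one partition all report the same free space: first entry wins.
        System.out.println(choose(new long[] {30 * gb, 30 * gb, 30 * gb}, gb)); // prints 0
        // Separate disks with different free space: the emptiest one wins.
        System.out.println(choose(new long[] {30 * gb, 90 * gb, 90 * gb}, gb)); // prints 1
        // Expected file does not fit within 90% of the largest free space.
        System.out.println(choose(new long[] {gb, gb, gb}, 2 * gb));            // prints -1
    }
}
```

By this logic, directories that report identical usable space (as they would on a single partition) always resolve to the first configured entry, which would match the skew described above. With separate disks, the free space is re-read on every flush, so the choice should shift as the disks fill.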