get_range_slices OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Mick Semb Wever
After an upgrade to cassandra-1.0 any get_range_slices gives me:

java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:93)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:66)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.metadata(CompressedRandomAccessReader.java:53)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:63)
    at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:896)
    at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:72)
    at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:748)
    at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:88)
    at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1310)
    at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:840)
    at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:698)


I set chunk_length_kb to 16 as my rows are very skinny (typically 100b)

Any way around this?

~mck

-- 
Physics is the universe's operating system. Steven R Garman 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |





Re: get_range_slices OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Mick Semb Wever
On Mon, 2011-10-31 at 08:00 +0100, Mick Semb Wever wrote:
 After an upgrade to cassandra-1.0 any get_range_slices gives me:
 
 java.lang.OutOfMemoryError: Java heap space
     at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:93)
     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:66)
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.metadata(CompressedRandomAccessReader.java:53)
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:63)
     at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:896)
     at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:72)
     at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:748)
     at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:88)
     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1310)
     at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:840)
     at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:698)
 
 
 I set chunk_length_kb to 16 as my rows are very skinny (typically 100b)


I see now this was a bad choice.
The read pattern for these rows is always in bulk, so the chunk_length
could have been much higher, which would have reduced memory usage (my
largest sstable is 61G).

After changing the chunk_length, is there any way to rebuild just some
sstables rather than having to do a full nodetool scrub?
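For reference, the schema change itself can be done from cassandra-cli with
something along these lines (the column family name and the 64kb value are
only illustrative); the new chunk length only applies to sstables written
after the change, which is why rebuilding the old ones matters:

update column family MyCF
  with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};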

~mck

-- 
“An idea is a point of departure and no more. As soon as you elaborate
it, it becomes transformed by thought.” - Pablo Picasso 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |




Re: get_range_slices OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Sylvain Lebresne
On Mon, Oct 31, 2011 at 9:07 AM, Mick Semb Wever m...@apache.org wrote:
 On Mon, 2011-10-31 at 08:00 +0100, Mick Semb Wever wrote:
 After an upgrade to cassandra-1.0 any get_range_slices gives me:

 java.lang.OutOfMemoryError: Java heap space
     at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:93)
     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:66)
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.metadata(CompressedRandomAccessReader.java:53)
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:63)
     at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:896)
     at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:72)
     at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:748)
     at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:88)
     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1310)
     at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:840)
     at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:698)


 I set chunk_length_kb to 16 as my rows are very skinny (typically 100b)


 I see now this was a bad choice.
 The read pattern of these rows is always in bulk so the chunk_length
 could have been much higher so to reduce memory usage (my largest
 sstable is 61G).

 After changing the chunk_length, is there any way to rebuild just some
 sstables rather than having to do a full nodetool scrub?

Provided you're using SizeTieredCompactionStrategy (i.e., the default), you can
trigger a user-defined compaction through JMX on each of the sstables
you want to rebuild. Not necessarily a fun process, though. Also note that
you can scrub just an individual column family, if that was the question.

--
Sylvain


 ~mck

 --
 “An idea is a point of departure and no more. As soon as you elaborate
 it, it becomes transformed by thought.” - Pablo Picasso

 | http://semb.wever.org | http://sesat.no |
 | http://tech.finn.no   | Java XSS Filter |



Re: get_range_slices OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Mick Semb Wever
On Mon, 2011-10-31 at 10:08 +0100, Sylvain Lebresne wrote:
  I set chunk_length_kb to 16 as my rows are very skinny (typically 100b)
 
 
  I see now this was a bad choice.
  The read pattern of these rows is always in bulk so the chunk_length
  could have been much higher so to reduce memory usage (my largest
  sstable is 61G).
 
  After changing the chunk_length, is there any way to rebuild just some
  sstables rather than having to do a full nodetool scrub?
 
 Provided you're using SizeTieredCompaction (i.e, the default), you can
 trigger a user defined compaction through JMX on each of the sstable
 you want to rebuild. Not necessarily a fun process though. Also note that
 you can scrub only an individual column family if that was the question. 

Actually, this won't work, I think.

I presume that scrub or any user-defined compaction will still need to
call SSTableReader.openDataReader(..) and so will still OOM no matter what...

How the hell am I supposed to re-chunk_length an sstable? :-(

~mck

-- 
We all may have come on different ships, but we’re in the same boat
now. Martin Luther King. Jr. 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |





[RELEASE] Apache Cassandra 1.0.1 released

2011-10-31 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.0.1.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first maintenance/bug fix release[1] in the 1.0 series.
It contains a fair number of fixes, so upgrading is encouraged, but please pay
attention to the release notes[2] before doing so. Let us know[3] if you
encounter any problems.

Have fun and happy Halloween!


[1]: http://goo.gl/x6eAD (CHANGES.txt)
[2]: http://goo.gl/N3xpE (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: get_range_slices OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Sylvain Lebresne
On Mon, Oct 31, 2011 at 11:35 AM, Mick Semb Wever m...@apache.org wrote:
 On Mon, 2011-10-31 at 10:08 +0100, Sylvain Lebresne wrote:
  I set chunk_length_kb to 16 as my rows are very skinny (typically 100b)
 
 
  I see now this was a bad choice.
  The read pattern of these rows is always in bulk so the chunk_length
  could have been much higher so to reduce memory usage (my largest
  sstable is 61G).
 
  After changing the chunk_length, is there any way to rebuild just some
  sstables rather than having to do a full nodetool scrub?

 Provided you're using SizeTieredCompaction (i.e, the default), you can
 trigger a user defined compaction through JMX on each of the sstable
 you want to rebuild. Not necessarily a fun process though. Also note that
 you can scrub only an individual column family if that was the question.

 Actually this won't work i think.

 I presume that scrub or any user defined compaction will still need to
 SSTableReader.openDataReader(..) and so will still OOM no matter what...

 How the hell am i supposed to re-chunk_length an sstable? :-(

You could start the node without joining the ring (to make sure it doesn't
get any work), i.e., with -Dcassandra.join_ring=false, and give the JVM
the maximum heap the machine allows. Hopefully that is enough
to recompact the sstable without OOMing.
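A minimal sketch of that setup in conf/cassandra-env.sh (the heap sizes are
only illustrative and depend on the machine):

MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"

Start the node with those settings, trigger the compaction, then revert them
and restart normally.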


 ~mck

 --
 We all may have come on different ships, but we’re in the same boat
 now. Martin Luther King. Jr.

 | http://semb.wever.org | http://sesat.no |
 | http://tech.finn.no   | Java XSS Filter |




Re: get_range_slices OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Sylvain Lebresne
On Mon, Oct 31, 2011 at 11:41 AM, Mick Semb Wever m...@apache.org wrote:
 On Mon, 2011-10-31 at 10:08 +0100, Sylvain Lebresne wrote:
 you can
 trigger a user defined compaction through JMX on each of the sstable
 you want to rebuild.

 May I ask how?
 Everything I see from NodeProbe to StorageProxy is ks and cf based.

It's exposed through JMX but not nodetool (i.e. NodeProbe). It's in the
CompactionManagerMBean and it's called forceUserDefinedCompaction.
It takes a keyspace and a comma-separated list of paths to sstables (but it's
fine with only one sstable).
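A rough sketch of invoking it from a standalone JMX client, assuming the
default JMX port 7199, no JMX authentication, and purely illustrative
keyspace/sstable names:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceCompaction {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName cm = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // forceUserDefinedCompaction(keyspace, comma-separated sstable data file names)
            mbs.invoke(cm, "forceUserDefinedCompaction",
                    new Object[] { "MyKeyspace", "MyCF-h-1234-Data.db" },
                    new String[] { "java.lang.String", "java.lang.String" });
        } finally {
            jmxc.close();
        }
    }
}

jconsole pointed at the same port works just as well if you prefer clicking
through to the CompactionManager MBean.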


 ~mck

 --
 “Anyone who lives within their means suffers from a lack of
 imagination.” - Oscar Wilde

 | http://semb.wever.org | http://sesat.no |
 | http://tech.finn.no   | Java XSS Filter |



Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Mick Semb Wever
On Mon, 2011-10-31 at 09:07 +0100, Mick Semb Wever wrote:
 The read pattern of these rows is always in bulk so the chunk_length
 could have been much higher so to reduce memory usage (my largest
 sstable is 61G). 

Isn't CompressionMetadata.readChunkOffsets(..) rather dangerous here?

Given a 60G sstable, even with 64kb chunk_length, to read just that one
sstable requires close to 8G free heap memory...

Especially when the default for cassandra is 4G heap in total.

~mck

-- 
Anyone who has attended a computer conference in a fancy hotel can tell
you that a sentence like You're one of those computer people, aren't
you? is roughly equivalent to Look, another amazingly mobile form of
slime mold! in the mouth of a hotel cocktail waitress. Elizabeth
Zwicky 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |




Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Mick Semb Wever
On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote:
 Given a 60G sstable, even with 64kb chunk_length, to read just that one
 sstable requires close to 8G free heap memory... 

Arg, that calculation was a little off...
 (a long isn't exactly 8K...)

But you get my concern...

~mck

-- 
When you say: I wrote a program that crashed Windows, people just
stare at you blankly and say: Hey, I got those with the system -- for
free. Linus Torvalds 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |




Re: Newbie question - fetching multiple columns of different datatypes and conversion from byte[]

2011-10-31 Thread Ertio Lew
Should column values or names of different datatypes first be read as byte
buffers and then converted to the appropriate type using Hector's
serializer API, the way shown below?

ByteBuffer bb;
..

String s= StringSerializer.get().fromByteBuffer(bb);


Or are there any better ways?
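A sketch of that byte-buffer-then-convert approach with Hector, assuming a
Keyspace object is already built and using illustrative column family and
column names:

import java.nio.ByteBuffer;
import me.prettyprint.cassandra.serializers.ByteBufferSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;

public class RawColumnRead {
    // read one column as raw bytes, then convert with the serializer that
    // matches how the value was originally written
    static long readLongColumn(Keyspace keyspace, String rowKey, String columnName) {
        HColumn<String, ByteBuffer> col = HFactory.createColumnQuery(
                        keyspace, StringSerializer.get(), StringSerializer.get(),
                        ByteBufferSerializer.get())
                .setColumnFamily("MyCF")
                .setKey(rowKey)
                .setName(columnName)
                .execute()
                .get();
        ByteBuffer bb = col.getValue();
        return LongSerializer.get().fromByteBuffer(bb);
    }
}

The other common option is to parameterize the query with the value serializer
you expect (e.g. LongSerializer directly), so Hector does the conversion for you.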


Re: get_range_slices OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Jonathan Ellis
Cleanup would have the same effect I think, in exchange for a minor
amount of extra CPU used.
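For reference, cleanup can be scoped to a single column family from nodetool
(host and names are placeholders):

nodetool -h <host> cleanup <keyspace> <column_family>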

On Mon, Oct 31, 2011 at 4:08 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 On Mon, Oct 31, 2011 at 9:07 AM, Mick Semb Wever m...@apache.org wrote:
 On Mon, 2011-10-31 at 08:00 +0100, Mick Semb Wever wrote:
 After an upgrade to cassandra-1.0 any get_range_slices gives me:

 java.lang.OutOfMemoryError: Java heap space
     at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:93)
     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:66)
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.metadata(CompressedRandomAccessReader.java:53)
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:63)
     at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:896)
     at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:72)
     at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:748)
     at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:88)
     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1310)
     at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:840)
     at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:698)


 I set chunk_length_kb to 16 as my rows are very skinny (typically 100b)


 I see now this was a bad choice.
 The read pattern of these rows is always in bulk so the chunk_length
 could have been much higher so to reduce memory usage (my largest
 sstable is 61G).

  After changing the chunk_length, is there any way to rebuild just some
  sstables rather than having to do a full nodetool scrub?

 Provided you're using SizeTieredCompaction (i.e, the default), you can
 trigger a user defined compaction through JMX on each of the sstable
 you want to rebuild. Not necessarily a fun process though. Also note that
 you can scrub only an individual column family if that was the question.

 --
 Sylvain


 ~mck

 --
 “An idea is a point of departure and no more. As soon as you elaborate
 it, it becomes transformed by thought.” - Pablo Picasso

 | http://semb.wever.org | http://sesat.no |
 | http://tech.finn.no   | Java XSS Filter |





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Sylvain Lebresne
On Mon, Oct 31, 2011 at 1:10 PM, Mick Semb Wever m...@apache.org wrote:
 On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote:
 Given a 60G sstable, even with 64kb chunk_length, to read just that one
 sstable requires close to 8G free heap memory...

 Arg, that calculation was a little off...
  (a long isn't exactly 8K...)

 But you get my concern...

Well, with a long being only 8 bytes, that's 8MB of free heap memory. While not
negligible, that's not completely crazy to me.

No, the problem is that we create those 8MB for each read, which *is* crazy
(the fact that we allocate those 8MB in one block is not very nice for
the GC either, but that's another problem).
Anyway, that's really a bug and I've created CASSANDRA-3427 to fix it.
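The arithmetic behind those numbers, just to make the orders of magnitude
concrete (one 8-byte long per chunk for a ~61G sstable):

public class ChunkOffsetMath {
    public static void main(String[] args) {
        long dataLength = 61L * 1024 * 1024 * 1024;   // ~61G sstable
        long chunks64k = dataLength / (64 * 1024);    // ~1.0 million chunks
        long chunks16k = dataLength / (16 * 1024);    // ~4.0 million chunks
        System.out.println("64kb chunks: ~" + chunks64k * 8 / (1024 * 1024) + " MB of offsets"); // ~7 MB
        System.out.println("16kb chunks: ~" + chunks16k * 8 / (1024 * 1024) + " MB of offsets"); // ~30 MB
    }
}

So a single allocation is modest, but repeating it for every sstable scanned
by every range query adds up quickly, which is what CASSANDRA-3427 addresses.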

--
Sylvain


 ~mck

 --
 When you say: I wrote a program that crashed Windows, people just
 stare at you blankly and say: Hey, I got those with the system -- for
 free. Linus Torvalds

 | http://semb.wever.org | http://sesat.no |
 | http://tech.finn.no   | Java XSS Filter |



Re: OOM on CompressionMetadata.readChunkOffsets(..)

2011-10-31 Thread Sylvain Lebresne
On Mon, Oct 31, 2011 at 2:58 PM, Sylvain Lebresne sylv...@datastax.com wrote:
 On Mon, Oct 31, 2011 at 1:10 PM, Mick Semb Wever m...@apache.org wrote:
 On Mon, 2011-10-31 at 13:05 +0100, Mick Semb Wever wrote:
 Given a 60G sstable, even with 64kb chunk_length, to read just that one
 sstable requires close to 8G free heap memory...

 Arg, that calculation was a little off...
  (a long isn't exactly 8K...)

 But you get my concern...

 Well, with a long being only 8 bytes, that's 8MB of free heap memory. Without
 being negligible, that's not completely crazy to me.

 No, the problem is that we create those 8MB for each reads, which *is* crazy
 (the fact that we allocate those 8MB in one block is not very nice for
 the GC either
 but that's another problem).
 Anyway, that's really a bug and I've created CASSANDRA-3427 to fix.

Note that it's only a problem for range queries.

--
Sylvain


 --
 Sylvain


 ~mck

 --
 When you say: I wrote a program that crashed Windows, people just
 stare at you blankly and say: Hey, I got those with the system -- for
 free. Linus Torvalds

 | http://semb.wever.org | http://sesat.no |
 | http://tech.finn.no   | Java XSS Filter |




Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-10-31 Thread Ertio Lew
Thanks so much, SebWajam, for this great piece of work!

Is there a way to set a data type for displaying the column names/values
of a CF? It seems that your project always uses a String serializer for
any piece of data; however, in most real-world cases this is not
true. Can we somehow configure which serializer to use while reading the
data, so that the data is properly identified by your project and
delivered in a readable format?

On Mon, Aug 22, 2011 at 7:17 AM, SebWajam sebast...@wajam.com wrote:

 Hi,

 I've been working on this project for a few months now and I think it's mature
 enough to post it here:
 Cassandra Cluster Admin on GitHub: https://github.com/sebgiroux/Cassandra-Cluster-Admin

 Basically, it's a GUI for Cassandra. If you're like me and used MySQL for
 a while (and still using it!), you get used to phpMyAdmin and its simple
 and easy to use user interface. I thought it would be nice to have a
 similar tool for Cassandra and I couldn't find any, so I built my own!

 Supported actions:

- Keyspace manipulation (add/edit/drop)
- Column Family manipulation (add/edit/truncate/drop)
- Row manipulation on column family and super column family
(insert/edit/remove)
- Basic data browser to navigate in the data of a column family (seems
to be the favorite feature so far)
- Support Cassandra 0.8+ atomic counters
- Support management of multiple Cassandra clusters

 Bug report and/or pull request are always welcome!




data model for unique users in a time period

2011-10-31 Thread Ed Anuff
I'm looking at the scenario of how to keep track of the number of
unique visitors within a given time period.  Inserting user ids into a
wide row would allow me to have a list of every user within the time
period that the row represented.  My experience in the past was that
using get_count on a row to get the column count got slow pretty quick
but that might still be the easiest way to get the count of unique
users with some sort of caching of the count so that it's not
expensive subsequently.  Using Hadoop is overkill for this scenario.
Any other approaches?
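A minimal Hector sketch of that wide-row approach, with illustrative column
family and row key names (the count query is the part that gets slow once the
row gets very wide):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class UniqueVisitors {
    // record a visit: one (empty-valued) column per user id, under a row
    // keyed by the time period, e.g. "2011-10-31"
    static void recordVisit(Keyspace keyspace, String timeBucket, String userId) {
        HFactory.createMutator(keyspace, StringSerializer.get())
                .insert(timeBucket, "UniqueVisitors",
                        HFactory.createStringColumn(userId, ""));
    }

    // count the uniques for the period with a get_count-style query
    static int countUniques(Keyspace keyspace, String timeBucket) {
        return HFactory.createCountQuery(keyspace, StringSerializer.get(), StringSerializer.get())
                .setColumnFamily("UniqueVisitors")
                .setKey(timeBucket)
                .setRange("", "", Integer.MAX_VALUE)
                .execute()
                .get();
    }
}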

Ed


Re : Best way to search content in Cassandra

2011-10-31 Thread Laurent Aufrechter
Hello,

One good way to manage such things is to give your columns names that will
allow you to run slice queries...

Your column name could be something like:
image-png-other_identifier1
image-gif-other_identifier2


In your slice query, you could do a search for image-png-A to image-png-Z.
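A Hector sketch of that kind of prefix slice, with an illustrative column
family name; the finish column uses a high sentinel character so the range
stays within the image-png- prefix:

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class PrefixSlice {
    static ColumnSlice<String, byte[]> pngColumns(Keyspace keyspace, String rowKey) {
        SliceQuery<String, String, byte[]> q = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), StringSerializer.get(), BytesArraySerializer.get());
        q.setColumnFamily("Files").setKey(rowKey);
        // all columns from "image-png-" up to just past the prefix, max 1000
        q.setRange("image-png-", "image-png-\uffff", false, 1000);
        return q.execute().get();
    }
}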

Regards.

Laurent



From: Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com
To: user@cassandra.apache.org
Sent: Friday, October 28, 2011, 3:10 AM
Subject: Best way to search content in Cassandra

Normally in SQL I would use the % operator (with LIKE) to find values that
look like what I am searching for.

Example:

[...] type = image/%

It would give me all the rows whose type column starts with image/.

So those would show up:

image/png
image/gif
...

Is there anything similar with Cassandra?

I am also using Solandra... But I doubt that Solandra is made for that.

Are there any extensions or techniques I could use?

Thank you a lot in advance for any tips.

Re: data model for unique users in a time period

2011-10-31 Thread Zach Richardson
Ed,

I could be completely wrong about this working--I haven't specifically
looked at how the counts are executed, but I think this makes sense.

You could potentially shard across several rows, based on a hash of
the username combined with the time period as the row key.  Run a
count across each row and then add them up.  If your cluster is large
enough this could spread the computation enough to make each query for
the count a bit faster.

Depending on how often this query would be hit, I would still
recommend caching, but you could recompute the real count a little more
often.
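A sketch of that sharding scheme with a hypothetical shard count and column
family name; writes use shardKey(...) as the row key, and the read side sums a
count query per shard:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class ShardedUniqueCount {
    static final int NUM_SHARDS = 16;  // hypothetical; size it to the cluster

    // row key = time period plus a shard derived from the user id
    static String shardKey(String timeBucket, String userId) {
        int shard = (userId.hashCode() & 0x7fffffff) % NUM_SHARDS;
        return timeBucket + ":" + shard;
    }

    // sum get_count across the shards; each count runs against a narrower row
    static int countUniques(Keyspace keyspace, String timeBucket) {
        int total = 0;
        for (int shard = 0; shard < NUM_SHARDS; shard++) {
            total += HFactory.createCountQuery(keyspace, StringSerializer.get(), StringSerializer.get())
                    .setColumnFamily("UniqueVisitors")
                    .setKey(timeBucket + ":" + shard)
                    .setRange("", "", Integer.MAX_VALUE)
                    .execute()
                    .get();
        }
        return total;
    }
}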

Zach


On Mon, Oct 31, 2011 at 12:22 PM, Ed Anuff e...@anuff.com wrote:
 I'm looking at the scenario of how to keep track of the number of
 unique visitors within a given time period.  Inserting user ids into a
 wide row would allow me to have a list of every user within the time
 period that the row represented.  My experience in the past was that
 using get_count on a row to get the column count got slow pretty quick
 but that might still be the easiest way to get the count of unique
 users with some sort of caching of the count so that it's not
 expensive subsequently.  Using Hadoop is overkill for this scenario.
 Any other approaches?

 Ed



Re: data model for unique users in a time period

2011-10-31 Thread Ed Anuff
Thanks, good point, splitting wide rows via sharding is a good
optimization for the get_count approach.

On Mon, Oct 31, 2011 at 10:58 AM, Zach Richardson
j.zach.richard...@gmail.com wrote:
 Ed,

 I could be completely wrong about this working--I haven't specifically
 looked at how the counts are executed, but I think this makes sense.

 You could potentially shard across several rows, based on a hash of
 the username combined with the time period as the row key.  Run a
 count across each row and then add them up.  If your cluster is large
 enough this could spread the computation enough to make each query for
 the count a bit faster.

 Depending on how often this query would be hit, I would still
 recommend caching, but you could calculate reality a little more
 often.

 Zach


 On Mon, Oct 31, 2011 at 12:22 PM, Ed Anuff e...@anuff.com wrote:
 I'm looking at the scenario of how to keep track of the number of
 unique visitors within a given time period.  Inserting user ids into a
 wide row would allow me to have a list of every user within the time
 period that the row represented.  My experience in the past was that
 using get_count on a row to get the column count got slow pretty quick
 but that might still be the easiest way to get the count of unique
 users with some sort of caching of the count so that it's not
 expensive subsequently.  Using Hadoop is overkill for this scenario.
 Any other approaches?

 Ed




[RELEASE] Apache Cassandra 0.7.10 released

2011-10-31 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 0.7.10.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1], and upgrading from previous
0.7 versions is highly encouraged. Please always pay attention to the release
notes[2] before upgrading.

If you encounter any problems, please let us know[3].

Have fun!


[1]: http://goo.gl/pk6Ku (CHANGES.txt)
[2]: http://goo.gl/Vq6ry (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Very slow writes in Cassandra

2011-10-31 Thread Adrian Cockcroft
You are using a replication factor of one and the Lustre clustered
filesystem over the network. That is not good practice.

Try RF=3 and local disks. Lustre duplicates much of the functionality
of Cassandra; there is no point in using both. Make your Lustre server
nodes into Cassandra nodes instead.
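For the keyspace shown below, bumping the replication factor would look
something like this in cassandra-cli (the exact strategy_options syntax varies
a little between versions), followed by a nodetool repair on each node so the
new replicas get their data:

update keyspace MD
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = {datacenter1: 3};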

Adrian

On Sun, Oct 30, 2011 at 6:33 PM, Evgeny erepe...@cmcrc.com wrote:
 Hello Cassandra users,

 I'm a newbie in NoSQL, and Cassandra in particular. At the moment I'm doing
 some benchmarking with Cassandra and experiencing very slow write throughput.

 Cassandra is said to perform hundreds of thousands of inserts per second;
 however, I'm not observing this: 1) when I send 100 thousand inserts
 simultaneously via 8 CQL clients, throughput is ~14470 inserts per
 second; 2) when I do the same via 8 Thrift clients, throughput is ~16300
 inserts per second.

 I think Cassandra's performance can be improved, but I don't know what to
 tune. Please take a look at the test conditions below and advise.
 Thank you.

 Tests conditions:

   1. The Cassandra cluster is deployed on three machines; each machine has 8
   Intel(R) Xeon(R) E5420 cores @ 2.50GHz, 16GB of RAM, and a 1000Mb/s
   network.

   2. The data sample is

 set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] =
 '1.0';
 set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['security'] =
 'AA1';
 set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['price'] =
 '47.1';
 set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['volume'] =
 '300.0';
 set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['se'] = '1';
 set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] =
 '2.0';
 set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['security'] =
 'AA1';
 set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['price'] =
 '44.89';
 set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['volume'] =
 '310.0';
 set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['se'] = '1';
 set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] =
 '3.0';
 set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['security'] =
 'AA2';
 set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['price'] =
 '0.35';

  3. The commit log is written to the local hard drive; the data is written to
  Lustre.

    4. Keyspace description:

  Keyspace: MD:
    Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
    Durable Writes: true
      Options: [datacenter1:1]
    Column Families:
      ColumnFamily: MM
        Key Validation Class: org.apache.cassandra.db.marshal.BytesType
        Default column value validator: org.apache.cassandra.db.marshal.BytesType
        Columns sorted by: org.apache.cassandra.db.marshal.BytesType
        Row cache size / save period in seconds: 0.0/0
        Key cache size / save period in seconds: 20.0/14400
        Memtable thresholds: 2.3247/1440/496 (millions of ops/minutes/MB)
        GC grace seconds: 864000
        Compaction min/max thresholds: 4/32
        Read repair chance: 1.0
        Replicate on write: true
        Built indexes: []

 Thanks in advance.

 Evgeny.