RE: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
You can list multiple DataFileDirectories, and Cassandra will scatter files 
across all of them. Use 1 disk for the commitlog, and 3 disks for data 
directories.
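
For what it's worth, a minimal storage-conf.xml sketch of that layout (the
mount points are just placeholders for your own disks):

    <CommitLogDirectory>/mnt/disk1/cassandra/commitlog</CommitLogDirectory>
    <DataFileDirectories>
        <DataFileDirectory>/mnt/disk2/cassandra/data</DataFileDirectory>
        <DataFileDirectory>/mnt/disk3/cassandra/data</DataFileDirectory>
        <DataFileDirectory>/mnt/disk4/cassandra/data</DataFileDirectory>
    </DataFileDirectories>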

See http://wiki.apache.org/cassandra/CassandraHardware#Disk

Thanks,
Stu

-Original Message-
From: Eric Rosenberry epros...@gmail.com
Sent: Wednesday, March 10, 2010 2:00am
To: cassandra-user@incubator.apache.org
Subject: Effective allocation of multiple disks

Based on the documentation, it is clear that with Cassandra you want to have
one disk for commitlog, and one disk for data.

My question is: If you think your workload is going to require more io
performance to the data disks than a single disk can handle, how would you
recommend effectively utilizing additional disks?

It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
 If we use one for commitlog, is there a way to have Cassandra itself
equally split data across the three remaining disks?  Or is this something
that needs to be handled by the hardware level, or operating system/file
system level?

Options include a hardware RAID controller in a RAID 0 stripe (this is more
$$$ and for what gain?), or utilizing a volume manager like LVM.
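
If you go the LVM route, a striped volume across the three data disks might
look something like this (device names are hypothetical; -i is the stripe
count, -I the stripe size in KB):

    pvcreate /dev/sdb /dev/sdc /dev/sdd
    vgcreate cassandra_vg /dev/sdb /dev/sdc /dev/sdd
    lvcreate -i 3 -I 256 -l 100%FREE -n data cassandra_vg
    mkfs.xfs /dev/cassandra_vg/data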

Along those same lines, if you do implement some type of striping, what RAID
stripe size is recommended?  (I think Todd Burruss asked this earlier but I
did not see a response)

Thanks for any input!

-Eric




RE: CassandraHardware link on the wiki FrontPage

2010-03-10 Thread Stu Hood
Anyone can edit any page once they have an account: click the Login link at 
the top right next to the search box to create an account.

Thanks,
Stu

-Original Message-
From: Eric Rosenberry e...@rosenberry.org
Sent: Wednesday, March 10, 2010 2:52am
To: cassandra-user@incubator.apache.org
Subject: CassandraHardware link on the wiki FrontPage

Would it be possible to add a link to the CassandraHardware page from the
FrontPage of the wiki?

I think other new folks to Cassandra may find it useful.  ;-)

(I would do it myself, though that page is Immutable)

http://wiki.apache.org/cassandra/FrontPage

http://wiki.apache.org/cassandra/CassandraHardware

Thanks!

-Eric




Re: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
Yea, I suppose major compactions are the wildcard here. Nonetheless, the 
situation where you only have 1 SSTable should be very rare.

I'll open a ticket though, because we really ought to be able to utilize those 
disks more thoroughly, and I have some ideas there.


-Original Message-
From: Anthony Molinaro antho...@alumni.caltech.edu
Sent: Wednesday, March 10, 2010 3:38pm
To: cassandra-user@incubator.apache.org
Subject: Re: Effective allocation of multiple disks

This is incorrect, as discussed a few weeks ago.  I have a setup with multiple
disks, and as soon as compaction occurs all the data ends up on one disk.  If
you need the additional io, you will want raid0.  But simply listing multiple
DataFileDirectories will not work.

-Anthony

On Wed, Mar 10, 2010 at 02:08:13AM -0600, Stu Hood wrote:
 You can list multiple DataFileDirectories, and Cassandra will scatter files 
 across all of them. Use 1 disk for the commitlog, and 3 disks for data 
 directories.
 
 See http://wiki.apache.org/cassandra/CassandraHardware#Disk
 
 Thanks,
 Stu
 
 -Original Message-
 From: Eric Rosenberry epros...@gmail.com
 Sent: Wednesday, March 10, 2010 2:00am
 To: cassandra-user@incubator.apache.org
 Subject: Effective allocation of multiple disks
 
 Based on the documentation, it is clear that with Cassandra you want to have
 one disk for commitlog, and one disk for data.
 
 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?
 
 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?
 
 Options include a hardware RAID controller in a RAID 0 stripe (this is more
 $$$ and for what gain?), or utilizing a volume manager like LVM.
 
 Along those same lines, if you do implement some type of striping, what RAID
 stripe size is recommended?  (I think Todd Burruss asked this earlier but I
 did not see a response)
 
 Thanks for any input!
 
 -Eric
 
 

-- 

Anthony Molinaro   antho...@alumni.caltech.edu




Re: Hackathon?!?

2010-03-09 Thread Stu Hood
Definitely on board!

-Original Message-
From: Dan Di Spaltro dan.dispal...@gmail.com
Sent: Tuesday, March 9, 2010 8:05pm
To: cassandra-user@incubator.apache.org
Subject: Re: Hackathon?!?

Alright guys, we have settled on a date for the Cassandra meetup on...

April 15th, better known as, Tax day!

We can host it here at Cloudkick, unless a cooler startup wants to host it.
http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=19
1499 Potrero Ave, San Francisco, CA 94110

Bottom line, it would be great to get some folks together and spend some
time doing an intro, covering some deployments and data models, and trying
to address all the other burning questions out there.

We pushed it out from PyCON and hopefully settled on a good day; let's get a
count of how many folks are interested!

Thanks,

On Tue, Feb 9, 2010 at 3:10 PM, Reuben Smith reuben.sm...@gmail.com wrote:

 I live in the city and I'd like to add my vote for an Intro to
 Cassandra night.

 Reuben

 On Tue, Feb 9, 2010 at 10:43 AM, Dan Di Spaltro dan.dispal...@gmail.com
 wrote:
  I think the tentative plans would be to push this out a bit farther
  away from PyCon, to get a bigger attendance.
 
  It sounds like an Intro to Cassandra would be a better theme; focus
  on the education piece.
 
  But it will happen! So stay tuned.
 
  On Tue, Feb 9, 2010 at 3:53 AM, Wayne Lewis wa...@lewisclan.org wrote:
 
  Hi Dan,
 
  Are you still planning for end of Feb?
 
  Please add me to the very interested list.
 
  Thanks!
  Wayne Lewis
 
 
  On Jan 26, 2010, at 8:42 PM, Dan Di Spaltro wrote:
 
  Would anyone be interested in a Cassandra hack-a-thon at the end of
  February in San Francisco?
 
  I think it would be great to get everyone together, since the last
  hack-a-thon was at the Twitter office back around OSCON time.   We
  could provide space in the Mission area or someone else could too, our
  office is in a pretty interesting area
 
  (
 http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=17
 ).
 
  Tell me what you guys think!
 
  --
  Dan Di Spaltro
 
 
 
 
 
  --
  Dan Di Spaltro
 




-- 
Dan Di Spaltro




RE: Latest check-in to trunk/ is broken

2010-03-08 Thread Stu Hood
Run `ant clean` before building. A few files moved around.
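
That is:

    $ ant clean
    $ ant

so the stale copies of the relocated classes are removed before compiling.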

-Original Message-
From: Cool BSD c...@coolbsd.com
Sent: Monday, March 8, 2010 5:18pm
To: cassandra-user cassandra-user@incubator.apache.org
Subject: Latest check-in to trunk/ is broken

version info:
$ svn info
Path: .
URL: https://svn.apache.org/repos/asf/incubator/cassandra/trunk
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 920560
Node Kind: directory
Schedule: normal
Last Changed Author: gdusbabek
Last Changed Rev: 920537
Last Changed Date: 2010-03-08 14:00:51 -0800 (Mon, 08 Mar 2010)

and error message:

build-project:
 [echo] apache-cassandra:
/net/f5/shared/nosql/cassandra/archive/svn/build.xml
[javac] Compiling 277 source files to
/net/f5/shared/nosql/cassandra/archive/svn/build/classes
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:112:
reference to SSTableReader is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableReader in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableReader in org.apache.cassandra.io match
[javac] private void updateEstimateFor(ColumnFamilyStore cfs,
Set<List<SSTableReader>> buckets)

[javac]^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:138:
reference to SSTableReader is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableReader in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableReader in org.apache.cassandra.io match
[javac] public Future<List<SSTableReader>>
submitAnticompaction(final ColumnFamilyStore cfStore, final
Collection<Range> ranges, final InetAddress target)
[javac]^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:240:
reference to SSTableReader is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableReader in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableReader in org.apache.cassandra.io match
[javac] int doCompaction(ColumnFamilyStore cfs,
Collection<SSTableReader> sstables, int gcBefore) throws IOException
[javac]^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:341:
reference to SSTableReader is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableReader in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableReader in org.apache.cassandra.io match
[javac] private List<SSTableReader>
doAntiCompaction(ColumnFamilyStore cfs, Collection<SSTableReader> sstables,
Collection<Range> ranges, InetAddress target)

[javac]
^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:341:
reference to SSTableReader is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableReader in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableReader in org.apache.cassandra.io match
[javac] private List<SSTableReader>
doAntiCompaction(ColumnFamilyStore cfs, Collection<SSTableReader> sstables,
Collection<Range> ranges, InetAddress target)
[javac]  ^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:451:
reference to SSTableReader is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableReader in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableReader in org.apache.cassandra.io match
[javac] static Set<List<SSTableReader>>
getBuckets(Iterable<SSTableReader> files, long min)
[javac] ^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:451:
reference to SSTableReader is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableReader in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableReader in org.apache.cassandra.io match
[javac] static Set<List<SSTableReader>>
getBuckets(Iterable<SSTableReader> files, long min)
[javac] ^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:498:
reference to SSTableScanner is ambiguous, both class
org.apache.cassandra.io.sstable.SSTableScanner in
org.apache.cassandra.io.sstable and class
org.apache.cassandra.io.SSTableScanner in org.apache.cassandra.io match
[javac] private Set<SSTableScanner> scanners;
[javac] ^
[javac]
/net/f5/shared/nosql/cassandra/archive/svn/src/java/org/apache/cassandra/db/CompactionManager.java:500:
reference to SSTableReader is ambiguous, both class

Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread Stu Hood
But rather than switching, you should definitely try the 'loadbalance' approach 
first, and see whether OrderPP works out for you.
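
If I have the nodeprobe syntax right, that would be something like:

    $ bin/nodeprobe -host <node> loadbalance

which moves the node to a token in the most heavily loaded part of the ring.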

-Original Message-
From: Chris Goffinet goffi...@digg.com
Sent: Friday, March 5, 2010 1:43pm
To: cassandra-user@incubator.apache.org
Subject: Re: Dynamically Switching from Ordered Partitioner to Random?

At this time, you have to re-import the data.

-Chris

On Fri, Mar 5, 2010 at 11:42 AM, shiv shivaji shivaji...@yahoo.com wrote:

 I started with the ordered partitioner as I was hoping to make use of the
 map-reduce functionality. However, my data was likely lopped onto 2 key
 machines with most of it on one (as seen from another thread. There were
 also machine failures to blame for the uneven distribution). One solution
 which I am trying is to load balance. Is there any other thing I can try to
 convert the partitioner to random on a live system?

 I know this sounds like an odd request. Curious about my options though. I
 did see a post mentioning that one can compute the md5 hash of each key and
 then insert using that and have a mapping table from key to md5 hash.
 Unfortunately, the data is already loaded using an ordered partitioner and I
 was wondering if there is a way to switch to random now.

 Shiv




-- 
Chris Goffinet




Re: Connect during bootstrapping?

2010-03-02 Thread Stu Hood
You are probably in the portion of bootstrap where data to be transferred is 
split out to disk, which can take a while: see 
https://issues.apache.org/jira/browse/CASSANDRA-579

Look for a 'streaming' subdirectory in your data directories to confirm.
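
For example, assuming the default data directory location:

    $ ls /var/lib/cassandra/data/streaming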

-Original Message-
From: Brian Frank Cooper coop...@yahoo-inc.com
Sent: Tuesday, March 2, 2010 11:50pm
To: cassandra-user@incubator.apache.org cassandra-user@incubator.apache.org
Subject: Re: Connect during bootstrapping?

Thanks for the note.

Can you help me with something else? I can't seem to get any data to transfer 
during bootstrapping...I must be doing something wrong.

Here is what I did: I took 0.6.0-beta2, loaded 2 machines with 60-70GB each. 
Then I started a third node, with AutoBootstrap true. The node claims it is 
bootstrapping:

INFO - Auto DiskAccessMode determined to be mmap
INFO - Saved Token not found. Using Rb0mePN3PheW3haA
INFO - Creating new commitlog segment 
/home/cooperb/cassandra/commitlog/CommitLog-1267594407761.log
INFO - Starting up server gossip
INFO - Joining: getting load information
INFO - Sleeping 9 ms to wait for load information...
INFO - Node /98.137.30.37 is now part of the cluster
INFO - Node /98.137.30.38 is now part of the cluster
INFO - InetAddress /98.137.30.37 is now UP
INFO - InetAddress /98.137.30.38 is now UP
INFO - Joining: getting bootstrap token
INFO - New token will be user148315419 to assume load from /98.137.30.38
INFO - Joining: sleeping 3 for pending range setup
INFO - Bootstrapping

But when I run nodetool streams, no streams are transferring:

Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

And it doesn't look like the node is getting any data. Any ideas?

Thanks for the help...

Brian


On 3/2/10 12:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

On Tue, Mar 2, 2010 at 1:54 PM, Brian Frank Cooper
coop...@yahoo-inc.com wrote:
 Hi folks,

 I'm running 0.5 and I had 2 nodes up and running, then added a 3rd node in
 bootstrap mode. I understand from other discussion list threads that the new
 node doesn't serve reads while it is bootstrapping, but does that mean it
 won't connect at all?

it doesn't start the thrift listener until it is bootstrapped, so yes.

(you can tell when it's bootstrapped by when it appears in nodeprobe
ring.  0.6 also adds bootstrap progress reporting via jmx.)

 When I try to connect from my java client, or
 cassandra-cli, I get the exception below. Is it the expected behavior?
 (Also, cassandra-cli says "Connected to xxx.yahoo.com" even though it isn't
 really connected...)

This is fixed in https://issues.apache.org/jira/browse/CASSANDRA-807
for 0.6, fwiw.

-Jonathan


--
Brian Cooper
Principal Research Scientist
Yahoo! Research





Re: Is Cassandra a document based DB?

2010-03-01 Thread Stu Hood
 In HBase you have table:row:family:key:val:version, which some people
 might consider richer
Cassandra is actually table:family:row:key:val[:subval], where subvals are the 
columns stored in a supercolumn (which can be easily arranged by timestamp to 
give the versioned approach).
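
As a hypothetical illustration, a super column family whose subcolumns sort
by a timestamp name could be declared in storage-conf.xml like this:

    <ColumnFamily Name="VersionedValues"
                  ColumnType="Super"
                  CompareWith="UTF8Type"
                  CompareSubcolumnsWith="LongType"/>

Each supercolumn then holds one subcolumn per version, ordered by timestamp.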


-Original Message-
From: Erik Holstad erikhols...@gmail.com
Sent: Monday, March 1, 2010 3:49pm
To: cassandra-user@incubator.apache.org
Subject: Re: Is Cassandra a document based DB?

On Mon, Mar 1, 2010 at 4:41 AM, Brandon Williams dri...@gmail.com wrote:

 On Mon, Mar 1, 2010 at 5:34 AM, HHB hubaghd...@yahoo.ca wrote:


 What are the advantages/disadvantages of Cassandra over HBase?


 Ease of setup: all nodes are the same.

 No single point of failure: all nodes are the same.

 Speed: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf

 Richer model: supercolumns.

I think there are people who would be of a different opinion here. Cassandra
has, as I've understood it, table:key:name:val, and in some cases the val is
a serialized data structure. In HBase you have
table:row:family:key:val:version, which some people might consider richer.


 Multi-datacenter awareness.

 There are likely other things I'm forgetting, but those stand out for me.

 -Brandon




-- 
Regards Erik




Re: StackOverflowError on high load

2010-02-21 Thread Stu Hood
Ran,

There are bounds to how large your data directory will grow, relative to the 
actual data. Please read up on compaction: 
http://wiki.apache.org/cassandra/MemtableSSTable , and if you have a 
significant number of deletes occurring, also read 
http://wiki.apache.org/cassandra/DistributedDeletes

The key mitigation is to ensure that minor compactions get a chance to occur 
regularly. This will happen automatically, but the faster you write data to 
your nodes, the more behind on compactions they can get. We consider this a 
bug, and CASSANDRA-685 will be exploring solutions so that your client 
automatically backs off as a node becomes overloaded.
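
One way to watch for this is the compaction pool's pending count, e.g.:

    $ bin/nodeprobe -host localhost tpstats

If COMPACTION-POOL pending grows without bound, writes are arriving faster
than the node can compact.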

Thanks,
Stu

-Original Message-
From: Ran Tavory ran...@gmail.com
Sent: Sunday, February 21, 2010 9:01am
To: cassandra-user@incubator.apache.org
Subject: Re: StackOverflowError on high load

This sort of explains it, yes, but what solution can I use?
I do see the OPP writes go faster than the RP, so it makes sense that when
using the OPP there's a higher chance that a host will fall behind with
compaction and eventually crash. It's not a nice feature, but hopefully
there are mitigations to this.
So my question is - what are the mitigations? What should I tell my admin to
do in order to prevent this? Telling him to increase the directory size 2x
isn't going to cut it, as the directory just keeps growing and is not
bounded...
I'm also not clear whether CASSANDRA-804 is going to be a real fix.
Thanks

On Sat, Feb 20, 2010 at 9:36 PM, Jonathan Ellis jbel...@gmail.com wrote:

 if OPP is configured w/ imbalanced ranges (or less balanced than RP)
 then that would explain it.

 OPP is actually slightly faster in terms of raw speed.

 On Sat, Feb 20, 2010 at 2:31 PM, Ran Tavory ran...@gmail.com wrote:
  interestingly, I ran the same load but this time with a random
 partitioner
  and, although from time to time test2 was a little behind with its
  compaction task, it did not crash and was able to eventually close the
 gaps
  that were opened.
  Does this make sense? Is there a reason why random partitioner is less
  likely to be faulty in this scenario? The scenario is of about 1300
  writes/sec of small amounts of data to a single CF on a cluster with two
  nodes and no replication. With the order-preserving-partitioner after a
 few
  hours of load the compaction pool is behind on one of the hosts and
  eventually this host crashes, but with the random partitioner it doesn't
  crash.
  thanks
 
  On Sat, Feb 20, 2010 at 6:27 AM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  looks like test1 started gc storming, so test2 treats it as dead and
  starts doing hinted handoff for it, which increases test2's load, even
  though test1 is not completely dead yet.
 
  On Thu, Feb 18, 2010 at 1:16 AM, Ran Tavory ran...@gmail.com wrote:
   I found another interesting graph, attached.
   I looked at the write-count and write-latency of the CF I'm writing to
   and I
   see a few interesting things:
   1. the host test2 crashed at 18:00
   2. At 16:00, after a few hours of load both hosts dropped their
   write-count.
   test1 (which did not crash) started slowing down first and then test2
   slowed.
   3. At 16:00 I start seeing high write-latency on test2 only. This
 takes
   about 2h until finally at 18:00 it crashes.
   Does this help?
  
   On Thu, Feb 18, 2010 at 7:44 AM, Ran Tavory ran...@gmail.com wrote:
  
   I ran the process again and after a few hours the same node crashed
 the
   same way. Now I can tell for sure this is indeed what Jonathan
 proposed
   -
   the data directory needs to be 2x of what it is, but it looks like a
   design
   problem, how large do I need to tell my admin to set it then?
   Here's what I see when the server crashes:
   $ df -h /outbrain/cassandra/data/
   FilesystemSize  Used Avail Use% Mounted on
   /dev/mapper/cassandra-data
  97G   46G   47G  50% /outbrain/cassandra/data
   The directory is 97G and when the host crashes it's at 50% use.
   I'm also monitoring various JMX counters and I see that
 COMPACTION-POOL
   PendingTasks grows for a while on this host (not on the other host,
   btw,
   which is fine, just this host) and then flats for 3 hours. After 3
   hours of
   flat it crashes. I'm attaching the graph.
   When I restart cassandra on this host (not changed file allocation
   size,
   just restart) it does manage to compact the data files pretty fast,
 so
   after
   a minute I get 12% use, so I wonder what made it crash before that
   doesn't
   now? (could be the load that's not running now)
   $ df -h /outbrain/cassandra/data/
   FilesystemSize  Used Avail Use% Mounted on
   /dev/mapper/cassandra-data
  97G   11G   82G  12% /outbrain/cassandra/data
   The question is what size does the data directory need to be? It's
 not
   2x
   the size of the data I expect to have (I only have 11G of real data
   after
   compaction and the dir is 97G, so it 

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Stu Hood
 After I ran nodeprobe compact on node B its read latency went up to 150ms.
The compaction process can take a while to finish... in 0.5 you need to watch 
the logs to figure out when it has actually finished, and then you should start 
seeing the improvement in read latency.

 Is there any way to utilize all of the heap space to decrease the read 
 latency?
In 0.5 you can adjust the number of keys that are cached by changing the 
'KeysCachedFraction' parameter in your config file. In 0.6 you can additionally 
cache rows. You don't want to use up all of the memory on your box for those 
caches though: you'll want to leave at least 50% for your OS's disk cache, 
which will store the full row content.
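
For example, in 0.5's storage-conf.xml (0.01, the default, caches 1% of
keys; exact placement may differ in your config):

    <KeysCachedFraction>0.01</KeysCachedFraction>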


-Original Message-
From: Weijun Li weiju...@gmail.com
Sent: Tuesday, February 16, 2010 12:16pm
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency
(> 100ms)?

Thanks for the DataFileDirectory trick; I'll give it a try.

Just noticed the impact of the number of data files: node A has 13 data files
with read latency of 20ms and node B has 27 files with read latency of 60ms.
After I ran nodeprobe compact on node B its read latency went up to 150ms.
The read latency of node A became as low as 10ms. Is this normal behavior?
I'm using random partitioner and the hardware/JVM settings are exactly the
same for these two nodes.

Another problem is that Java heap usage is always 900MB out of 6GB. Is there
any way to utilize all of the heap space to decrease the read latency?

-Weijun

On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:

 One more thoughts about Martin's suggestion: is it possible to put the
 data files into multiple directories that are located in different physical
 disks? This should help to improve the i/o bottleneck issue.


 Yes, you can already do this, just add more DataFileDirectory directives
 pointed at multiple drives.


 Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?


 Row cache and key cache both help tremendously if your read pattern has a
 decent repeat rate.  Completely random io can only be so fast, however.

 -Brandon





Re: TimeOutExceptions and Cluster Performance

2010-02-13 Thread Stu Hood
The combination of 'too many open files' and lots of memtable flushes could 
mean you have tons and tons of sstables on disk. This can make reads especially 
slow.
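
A rough way to check, assuming the default data location and 0.5-era file
naming (one *-Data.db per sstable):

    $ ls /var/lib/cassandra/data/<Keyspace>/*-Data.db | wc -l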

If you are seeing the timeouts on reads a lot more often than on writes, then 
this explanation might make sense, and you should watch 
https://issues.apache.org/jira/browse/CASSANDRA-685.

Thanks,
Stu

-Original Message-
From: Jonathan Ellis jbel...@gmail.com
Sent: Friday, February 12, 2010 9:43pm
To: cassandra-user@incubator.apache.org
Subject: Re: TimeOutExceptions and Cluster Performance

There's a lot more details that would be useful, but if you are on the
verge of OOMing and something actually running out, then that's
probably the culprit; when the JVM gets low on ram it will consume all
your CPU trying to GC enough to continue.  (you mentioned seeing high
cpu on one core which tends to corroborate this; to confirm you can
look at the thread using the CPU:
http://publib.boulder.ibm.com/infocenter/javasdk/tools/index.jsp?topic=/com.ibm.java.doc.igaa/_1vg0001475cb4a-1190e2e0f74-8000_1007.html)

Look at your executor queues, in the output of nodeprobe tpstats if
you have no other metrics system.  You probably are just swamping it
with writes, if you have 1000s of ops in any of the pending queues,
that's bad.

-Jonathan

On Fri, Feb 12, 2010 at 7:40 PM, Stephen Hamer stephen.ha...@gmail.com wrote:
 Hi,
 I'm running a 5 node Cassandra cluster and am having a very tough time
 getting reasonable performance from it. Many of the requests are failing
 with TimeOutException. This is making it difficult to use Cassandra in a
 production setting.

 The cluster was running fine for a week or two (it was created 3 weeks ago)
 but has started to degrade in the last week. The cluster was originally only
 3 nodes but when performance started to degrade I added another two nodes.
 This doesn't seem to have helped though.

 Requests being made from my application are being balanced across the
 cluster in a round robin fashion. Many of these requests are failing with
 TimeOutException. When this occurs I can look at the DB servers and see
 several of them fully utilizing 1 core. I can turn off my application when
 this is going on (which stops all reads and writes to Cassandra). The
 cluster seems to stay in this state for several more hours before returning
 to a resting state.

 When the CPU is loaded I see lots of messages about en-queuing, sorting, and
 writing memtables, so I have tried adjusting the memtable size down to 16MB
 and raising MemtableFlushAfterMinutes to 1440. This doesn't seem to have
 affected anything though.

 I was seeing errors about too many file descriptors being open, so I added
 "ulimit -n 32768" to cassandra.in.sh. This seems to have fixed it. I was also
 seeing lots of out-of-memory exceptions, so I raised the heap size to 4GB.
 This has helped but not eliminated the OOM issues.

 I'm not sure if it's related to any of the performance issues but I see lots
 of log entries about DigestMismatchExceptions. I've included a sample of the
 exceptions below.

 My Cassandra cluster is almost unusable in its current state because of the
 number of timeout exceptions that I'm seeing. I suspect that this is because
 of a configuration problem or something I have set up improperly. It feels
 like the database has entered a bad state which is causing it to churn as
 much as it is, but I have no way to verify this.

 What steps can I take to address the performance issues I am seeing and the
 consistent stream of TimeOutExceptions?

 Thanks,
 Stephen


 Here are some specifics about the cluster configuration:

 5 Large EC2 instances - 7.5 GB ram, 2 cores (64bit, 1-1.2Ghz), data and
 commit logs stored on separate EBS volumes. Boxes are running Debian 5.

 r...@prod-cassandra4 ~/cassandra # bin/nodeprobe -host localhost ring
 Address        Status  Load      Range                                       Ring
                                  101279862673517536112907910111793343978
 10.254.55.191  Up      2.94 GB   27246729060092122727944947571993545        |--|
 10.214.119.127 Up      3.67 GB   34209800341332764076889844611182786881     |   ^
 10.215.122.208 Up      11.86 GB  42649376116143870288751410571644302377     v   |
 10.215.30.47   Up      6.37 GB   81374929113514034361049243620869663203     |   ^
 10.208.246.160 Up      5.15 GB   101279862673517536112907910111793343978    |--|


 I am running the 0.5 release of Cassandra (at commit 44e8c2e...). Here are
 some of my configuration options:

 Memory, disk, performance section of storage-conf.xml (I've only included
 options that I've changed from the defaults):
 <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
 <ReplicationFactor>3</ReplicationFactor>

 <SlicedBufferSizeInKB>512</SlicedBufferSizeInKB>
 <FlushDataBufferSizeInMB>64</FlushDataBufferSizeInMB>
 <FlushIndexBufferSizeInMB>16</FlushIndexBufferSizeInMB>
 <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
 <MemtableSizeInMB>16</MemtableSizeInMB>
 

Re: OOM Exception

2009-12-13 Thread Stu Hood
PS: If this turns out to actually be the problem, I'll open a ticket for it.

Thanks,
Stu

-Original Message-
From: Stu Hood stuart.h...@rackspace.com
Sent: Sunday, December 13, 2009 12:28pm
To: cassandra-user@incubator.apache.org
Subject: Re: OOM Exception

With 248G per box, you probably have slightly more than 1/2 billion items?

One current implementation detail in Cassandra is that it loads 1/128th of the 
index into memory for faster lookups. This means you might have something like 
4.5 million keys in memory at the moment.

The '128' value is a constant at SSTable.INDEX_INTERVAL. You should be able to 
recompile with '1024' to allow for an 8 times larger database, but understand 
that this will have a negative effect on your read performance.
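
(Roughly: a bit over half a billion keys / 128 gives the ~4.5 million
in-memory entries estimated above.) A sketch of the change, assuming the
0.5-era source layout:

    // src/java/org/apache/cassandra/io/SSTable.java
    public static final int INDEX_INTERVAL = 1024; // was 128; saves memory, slows reads

then rebuild with `ant`.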

Thanks,
Stu

-Original Message-
From: Dan Di Spaltro dan.dispal...@gmail.com
Sent: Sunday, December 13, 2009 12:06pm
To: cassandra-user@incubator.apache.org
Subject: Re: OOM Exception

What consistencyLevel are you using to insert the elements?  If you do
./bin/nodeprobe -host localhost tpstats on each machine do you see one
metric that has a lot of pending items?

On Sun, Dec 13, 2009 at 8:14 AM, Brian Burruss bburr...@real.com wrote:

 another OOM exception.  the only thing interesting about my testing is that
 there are 2 servers, RF=2, W=1, R=1 ... there is 248G of data on each
 server.  I have -Xmx3G assigned to each server

 2009-12-12 22:04:37,436 ERROR [pool-1-thread-309] [Cassandra.java:734]
 Internal error processing get
 java.lang.RuntimeException: java.util.concurrent.ExecutionException:
 java.lang.OutOfMemoryError: Java heap space
at
 org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:523)
at
 org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:373)
at
 org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:92)
at
 org.apache.cassandra.service.CassandraServer.multigetColumns(CassandraServer.java:265)
at
 org.apache.cassandra.service.CassandraServer.multigetInternal(CassandraServer.java:320)
at
 org.apache.cassandra.service.CassandraServer.get(CassandraServer.java:253)
at
 org.apache.cassandra.service.Cassandra$Processor$get.process(Cassandra.java:724)
at
 org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:712)
at
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

 
 From: Brian Burruss
 Sent: Saturday, December 12, 2009 7:45 AM
 To: cassandra-user@incubator.apache.org
 Subject: OOM Exception

 this happened after cassandra was running for a couple of days.  I have
 -Xmx3G on JVM.

 is there any other info you need so this makes sense?

 thx!


 2009-12-11 21:38:37,216 ERROR [HINTED-HANDOFF-POOL:1]
 [DebuggableThreadPoolExecutor.java:157] Error in ThreadPoolExecutor
 java.lang.OutOfMemoryError: Java heap space
at
 org.apache.cassandra.io.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:151)
at
 org.apache.cassandra.io.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:144)
at
 org.apache.cassandra.io.SSTableWriter.<init>(SSTableWriter.java:53)
at
 org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:911)
at
 org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:855)
at
 org.apache.cassandra.db.ColumnFamilyStore.doMajorCompactionInternal(ColumnFamilyStore.java:698)
at
 org.apache.cassandra.db.ColumnFamilyStore.doMajorCompaction(ColumnFamilyStore.java:670)
at
 org.apache.cassandra.db.HintedHandOffManager.deliverAllHints(HintedHandOffManager.java:190)
at
 org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:75)
at
 org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:249)
at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)




-- 
Dan Di Spaltro






Re: quorum / hinted handoff

2009-11-20 Thread Stu Hood
You need a quorum relative to your replication factor. You mentioned in the 
first e-mail that you have RF=2, so you need a quorum of 2. If you use RF=3, 
then you need a quorum of 2 as well.
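
In other words:

    quorum = (replication_factor / 2) + 1    (integer division)

so RF=2 gives quorum 2, RF=3 gives quorum 2, and RF=5 gives quorum 3.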

-Original Message-
From: B. Todd Burruss bburr...@real.com
Sent: Friday, November 20, 2009 4:14pm
To: cassandra-user@incubator.apache.org
Subject: Re: quorum / hinted handoff

not really.  it seems that if i start with 3 nodes, remove 1 of them, i
should still have a quorum, which is 2.  this is not what i experience.

On Fri, 2009-11-20 at 16:03 -0600, Jonathan Ellis wrote:
 Oh, okay.  Then it's working as expected.
 
 Does it make more sense to you now? :)
 
 -Jonathan
 
 On Fri, Nov 20, 2009 at 3:43 PM, B. Todd Burruss bburr...@real.com wrote:
  this was on the build i got yesterday, 882359.
 
  ... and you are correct about if you start with 2 nodes and take one
  down - there isn't a quorum and the write/read fails.  i tested that as
  well.
 
  thx!
 
 
  On Fri, 2009-11-20 at 15:30 -0600, Jonathan Ellis wrote:
  On Fri, Nov 20, 2009 at 11:31 AM, B. Todd Burruss bburr...@real.com 
  wrote:
   one more point on this .. if i only start a cluster with 2 nodes, and i
   use the same config setup (RF=2, etc) .. it works fine.  it's only when
   i start with the 3 nodes and remove 1.  in fact, i remove the node
   before i do any reads or writes at all, completely fresh database.
 
  That sounds like a bug.  If you have 2 nodes, RF of 2, and take one
  node down then quorum anything should always fail.
 
  Is this on trunk still?
 
  -Jonathan
 
 
 






Re: bandwidth limiting Cassandra's replication and access control

2009-11-11 Thread Stu Hood
Hey Ted,

Would you mind creating a ticket for this issue in JIRA? A lot of discussion 
has gone on, and a place to collect the design and feedback would be a good 
start.

Thanks,
Stu

-Original Message-
From: Ted Zlatanov t...@lifelogs.com
Sent: Wednesday, November 11, 2009 3:28pm
To: cassandra-user@incubator.apache.org
Cc: cassandra-...@incubator.apache.org
Subject: Re: bandwidth limiting Cassandra's replication and access control

On Wed, 11 Nov 2009 07:40:00 -0800 Coe, Robin robin@bluecoat.com wrote: 

CR Just going to chime in here, because I have experience writing apps
CR that use JAAS and JNDI to authenticate against LDAP and JDBC
CR services.  However, I only just started looking at Cassandra this
CR week, so I'm not certain of the premise behind controlling access to
CR the Cassandra service.

CR IMO, auth services should be left to the application layer that
CR interfaces to Cassandra and not built into Cassandra.  In the
CR tutorial snippet included below, the access being granted is at the
CR codebase level, not the transaction level.  Since users of Cassandra
CR will generally be fronted by a service layer, the java security
CR manager isn’t going to suffice.  What this snippet could do, though,
CR and may be the rationale for the request, is to ensure that
CR unauthorized users cannot instantiate a new Cassandra server.
CR However, if a user has physical access to the machine on which
CR Cassandra is installed, they could easily bypass that layer of
CR security.

CR So, I guess I'm wondering whether this discussion pertains to
CR application-layer security, i.e., permission to execute Thrift
CR transactions, or Cassandra service security?  Or is it strictly a
CR utility function, to create a map of users to specific Keyspaces, to
CR simplify the Thrift API?

(note followups to the devel list)

I mentioned I didn't know JAAS so I appreciate any help you can give.
Specifically, I don't know yet what the difference is between the
codebase level and the transaction level in JAAS terms.  Can you
explain?

I am interested in controlling the Thrift client API, not the Gossip
replication service.  The authenticating clients will not have physical
access to the machine and all the authentication tokens will have to be
passed over a Thrift login call. How would you use JAAS+JNDI to control
that?  The access point is CassandraServer.java as Jonathan mentioned.

Ted