Re: CompactionExecutor holds 8000+ SSTableReader 6G+ memory

2013-07-01 Thread sulong
These two fields:
CompressedRandomAccessReader.buffer
CompressedRandomAccessReader.compressed

held in the queue SSTableReader.dfile.pool consumed that memory. I think
SSTableReader.dfile is the cache for the SSTable file.


On Sat, Jun 29, 2013 at 1:09 PM, aaron morton aa...@thelastpickle.com wrote:

 Lots of memory are consumed by the SSTableReader's cache

   The file cache is managed by the OS.
 However the SSTableReader will have bloom filters and compression
 metadata, both off heap in 1.2. The Key and Row caches are global, so not
 associated with any one SSTable.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 28/06/2013, at 6:23 PM, sulong sulong1...@gmail.com wrote:

 Total 100G data per node.


 On Fri, Jun 28, 2013 at 2:14 PM, sulong sulong1...@gmail.com wrote:

 aaron, thanks for your reply. Yes, I do use the Leveled compaction
 strategy, and the SSTable size is 10M. If it happens again, I will try to
 enlarge the SSTable size.

 I just wonder why Cassandra doesn't limit the SSTableReaders' total
 memory usage when compacting. Lots of memory are consumed by the
 SSTableReaders' caches. Why not clear these caches first at the beginning
 of compaction?


 On Fri, Jun 28, 2013 at 1:14 PM, aaron morton aa...@thelastpickle.com wrote:

 Are you running the Leveled Compaction Strategy?
 If so, what is the max SSTable size, and what is the total data per node?

  If you are, try using a larger SSTable size, like 32MB

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 27/06/2013, at 2:02 PM, sulong sulong1...@gmail.com wrote:

 According to the OpsCenter records, yes, compaction was running
 then, at 8.5 MB/s


 On Thu, Jun 27, 2013 at 9:54 AM, sulong sulong1...@gmail.com wrote:

 version: 1.2.2
 cluster read requests 800/s, write requests 22/s
 Sorry, I don't know whether the compaction was running then.


 On Thu, Jun 27, 2013 at 1:02 AM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Jun 25, 2013 at 10:13 PM, sulong sulong1...@gmail.com wrote:
  I have a 4-node Cassandra cluster. Every node has 32G memory, and the
  Cassandra JVM uses 8G. The cluster is suffering from GC. It looks like the
  CompactionExecutor thread holds too many SSTableReaders. See the
  attachment.

 What version of Cassandra?
 What workload?
 Is compaction actually running?

 =Rob










Re: How to do a CAS UPDATE on single column CF?

2013-07-01 Thread Sylvain Lebresne
You're right, there is currently no way to do this, since 1) an insert can't
have an IF currently and 2) an update can't update such a table.

We'll fix that: https://issues.apache.org/jira/browse/CASSANDRA-5715
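For reference, this is roughly the shape the syntax could take once that ticket is resolved (a hypothetical sketch only; the exact grammar is whatever CASSANDRA-5715 settles on):

```sql
-- Hypothetical once CASSANDRA-5715 lands: a conditional insert into a
-- table whose only column is the primary key. The insert applies only
-- if no row with that key already exists.
INSERT INTO schema_migrations (version) VALUES ('20130701')
IF NOT EXISTS;
```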

--
Sylvain


On Sat, Jun 29, 2013 at 9:51 PM, Blair Zajac bl...@orcaware.com wrote:

 On 6/24/13 8:23 PM, Blair Zajac wrote:

 How does one do an atomic update in a column family with a single column?

 I have this CF:

 CREATE TABLE schema_migrations (
 version TEXT PRIMARY KEY
 ) WITH COMPACTION = {'class': 'LeveledCompactionStrategy'};


 Anyone?  Should I raise this on the developer mailing list or open a
 ticket?

 Blair



Re: Cassandra as storage for cache data

2013-07-01 Thread Dmitry Olshansky

Hello,

thanks to all for your answers and comments.

What we've done:
- increased Java heap memory up to 6 GB
- changed replication factor to 1
- set durable_writes to false
- set memtable_total_space_in_mb to 5000
- set commitlog_total_space_in_mb to 6000

If I understand correctly, the last parameter doesn't matter since we set 
durable_writes to false.


Now the overall performance is much better but still not outstanding. We 
continue to observe quite frequent compactions on every node.


According to OpsCenter's graphs, the Java heap never grows above 3.5 GB, so 
there is enough memory to keep the memtables. Why do they still get flushed 
to disk, triggering compactions?


--
Best regards,
Dmitry Olshansky


C* 1.2.5 AssertionError in ColumnSerializer:40

2013-07-01 Thread horschi
Hi,

using C* 1.2.5 I just found a weird AssertionError in our logfiles:

...
 INFO [OptionalTasks:1] 2013-07-01 09:15:43,608 MeteredFlusher.java (line
58) flushing high-traffic column family CFS(Keyspace='Monitoring',
ColumnFamily='cfDateOrderedMessages') (estimated 5242880 bytes)
 INFO [OptionalTasks:1] 2013-07-01 09:15:43,609 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-cfDateOrderedMessages@2147245119(4616888/5242880
serialized/live bytes, 23714 ops)
 INFO [FlushWriter:9] 2013-07-01 09:15:43,610 Memtable.java (line 461)
Writing Memtable-cfDateOrderedMessages@2147245119(4616888/5242880
serialized/live bytes, 23714 ops)
ERROR [FlushWriter:9] 2013-07-01 09:15:44,145 CassandraDaemon.java (line
192) Exception in thread Thread[FlushWriter:9,5,main]
java.lang.AssertionError
at
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:40)
at
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:30)
at
org.apache.cassandra.db.OnDiskAtom$Serializer.serializeForSSTable(OnDiskAtom.java:62)
at
org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:181)
at
org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:133)
at
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:185)
at
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:489)
at
org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:448)
at
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


I looked into the code and it seems to be coming from the following code:

public void serialize(IColumn column, DataOutput dos) throws IOException
{
    assert column.name().remaining() > 0; // crash
    ByteBufferUtil.writeWithShortLength(column.name(), dos);
    try
    {...


Does anybody have an idea why this is happening? The machine has some
issues with its disks, but flush shouldn't be affected by bad disks, right?
I can rule out that this memtable was filled by a bad commitlog.

Thanks,
Christian


10,000s of column families/keyspaces

2013-07-01 Thread Kirk True
Hi all,

I know it's an old topic, but I want to see if anything's changed on the
number of column families that C* supports, either in 1.2.x or 2.x.

For a number of reasons [1], we'd like to support multi-tenancy via
separate column families. The problem is that there are around 5,000
tenants to support, and each one needs a small handful of column families.

The last I heard C* supports 'a couple of hundred' column families before
things start to bog down.

What will it take for C* to support 50,000 column families?

I'm about to dive into the code and run some tests, but I was curious about
how to quantify the overhead of a column family. Is the reason performance?
Memory? Does the off-heap work help here?

Thanks,
Kirk

[1] The main three reasons:


   1. ability to wholesale drop data for a given tenant via drop
   keyspace/drop CFs
   2. ability to have divergent schema for each tenant (partially effected
   by DSE Solr integration)
   3. secondary indexes per tenant (given requirement #2)


Re: 10,000s of column families/keyspaces

2013-07-01 Thread Hiller, Dean
We use playorm to do 80,000 virtual column families (a playorm feature, though 
the pattern could be copied). We found out later, and are working on this now, 
that we want to map the 80,000 virtual CFs into 10 real CFs so leveled 
compaction can run more in parallel; otherwise we get stuck with 
single-threaded LCS at the last tier, which can take a while. We are about to 
map/reduce our dataset into our newest format.

Dean

From: Kirk True kirktrue...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Monday, July 1, 2013 10:19 AM
To: user@cassandra.apache.org
Subject: 10,000s of column families/keyspaces

Hi all,

I know it's an old topic, but I want to see if anything's changed on the number 
of column families that C* supports, either in 1.2.x or 2.x.

For a number of reasons [1], we'd like to support multi-tenancy via separate 
column families. The problem is that there are around 5,000 tenants to 
support and each one needs a small handful of column families each.

The last I heard C* supports 'a couple of hundred' column families before 
things start to bog down.

What will it take for C* to support 50,000 column families?

I'm about to dive into the code and run some tests, but I was curious about how 
to quantify the overhead of a column family. Is the reason performance? Memory? 
Does the off-heap work help here?

Thanks,
Kirk

[1] The main three reasons:


 1.  ability to wholesale drop data for a given tenant via drop keyspace/drop 
CFs
 2.  ability to have divergent schema for each tenant (partially effected by 
DSE Solr integration)
 3.  secondary indexes per tenant (given requirement #2)


Re: 10,000s of column families/keyspaces

2013-07-01 Thread Hiller, Dean
Oh, and if you are using STCS, I don't think the below is an issue at all,
since STCS can already run in parallel if needed.

Dean

On 7/1/13 10:24 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

We use playorm to do 80,000 virtual column families(a playorm feature
though the pattern could be copied).  We did find out later and we are
working on this now that we wanted to map 80,000 virtual CF's into 10
real CF's so leveled compaction can run more in parallel though or else
we get stuck with single threaded LCS at the last tier which can take a
while.  We are about to map/reduce our dataset into our newest format.

Dean

From: Kirk True kirktrue...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Monday, July 1, 2013 10:19 AM
To: user@cassandra.apache.org
Subject: 10,000s of column families/keyspaces

Hi all,

I know it's an old topic, but I want to see if anything's changed on the
number of column families that C* supports, either in 1.2.x or 2.x.

For a number of reasons [1], we'd like to support multi-tenancy via
separate column families. The problem is that there are around 5,000
tenants to support and each one needs a small handful of column families
each.

The last I heard C* supports 'a couple of hundred' column families before
things start to bog down.

What will it take for C* to support 50,000 column families?

I'm about to dive into the code and run some tests, but I was curious
about how to quantify the overhead of a column family. Is the reason
performance? Memory? Does the off-heap work help here?

Thanks,
Kirk

[1] The main three reasons:


 1.  ability to wholesale drop data for a given tenant via drop
keyspace/drop CFs
 2.  ability to have divergent schema for each tenant (partially effected
by DSE Solr integration)
 3.  secondary indexes per tenant (given requirement #2)



Re: CorruptBlockException

2013-07-01 Thread Robert Coli
On Sat, Jun 29, 2013 at 8:39 PM, Glenn Thompson gatman1...@gmail.com wrote:
 I'm Glenn Thompson and new to Cassandra.  I have been trying to figure out
 how to recover from a CorruptBlockException.
 ...
 One of my nodes must have a hardware problem.  Although I've been unable to
 find anything wrong via logs, smart, or mce.
 ...
 The repair, scrub, and decommission all produced Exceptions related to the
 same few corrupt files.

Hardware problem sounds relatively likely, especially if you have not
crashed your nodes. Only other thing I can think of is an issue with
the relationship of the compression library and the JVM. What JVM/JDK
are you using, and what compression method is in use on the Column
Family?

In general the actions you took were reasonable. Do you have the full
stack trace?

=Rob


Re: CorruptBlockException

2013-07-01 Thread Glenn Thompson
Hi Rob,

It was hardware: memory. I've been loading data since I originally
posted, with no exceptions so far. I had some issues with OOMs when I first
started playing with Cassandra. I increased the amount of RAM in the VM and
reduced the memtable size. I'm guessing it's because I'm using I3s. More
cores would most likely improve GC performance.

I put all the logs and my configs on my google drive.  The link is in the
original post.  I'm running 1.2.4.  There have been two releases since my
original download.  I'm going to attempt an upgrade soon.

I'm also considering using leveled compaction.  I just have two 750GB
drives per node.  I'd like to use more than 50% of the drives if I can.

Thanks,
Glenn


On Mon, Jul 1, 2013 at 11:08 AM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jun 29, 2013 at 8:39 PM, Glenn Thompson gatman1...@gmail.com
 wrote:
  I'm Glenn Thompson and new to Cassandra.  I have been trying to figure
 out
  how to recover from a CorruptBlockException.
  ...
  One of my nodes must have a hardware problem.  Although I've been unable
 to
  find anything wrong via logs, smart, or mce.
  ...
  The repair, scrub, and decommission all produced Exceptions related to
 the
  same few corrupt files.

 Hardware problem sounds relatively likely, especially if you have not
 crashed your nodes. Only other thing I can think of is an issue with
 the relationship of the compression library and the JVM. What JVM/JDK
 are you using, and what compression method is in use on the Column
 Family?

 In general the actions you took were reasonable. Do you have the full
 stack trace?

 =Rob



Re: How to do a CAS UPDATE on single column CF?

2013-07-01 Thread Blair Zajac

Thanks!

On 7/1/13 1:41 AM, Sylvain Lebresne wrote:

You're right, there is currently no way to do this since 1) insert can't have a
IF currently and 2) update can't update such table.

We'll fix that: https://issues.apache.org/jira/browse/CASSANDRA-5715

--
Sylvain


On Sat, Jun 29, 2013 at 9:51 PM, Blair Zajac bl...@orcaware.com
mailto:bl...@orcaware.com wrote:

On 6/24/13 8:23 PM, Blair Zajac wrote:

How does one do an atomic update in a column family with a single 
column?

I have a this CF

CREATE TABLE schema_migrations (
version TEXT PRIMARY KEY,
) WITH COMPACTION = {'class': 'LeveledCompactionStrategy'};


Anyone?  Should I raise this on the developer mailing list or open a ticket?

Blair


Re: How to do a CAS UPDATE on single column CF?

2013-07-01 Thread Hiller, Dean
What does CAS stand for? And is it the row-locking feature like HBase's
setAndReadWinner, where you give the previous val and next val, and your next
val is returned if you won; otherwise the current result is returned and
you know some other node won?

Thanks,
Dean

On 7/1/13 12:09 PM, Blair Zajac bl...@orcaware.com wrote:

Thanks!

On 7/1/13 1:41 AM, Sylvain Lebresne wrote:
 You're right, there is currently no way to do this since 1) insert
can't have a
 IF currently and 2) update can't update such table.

 We'll fix that: https://issues.apache.org/jira/browse/CASSANDRA-5715

 --
 Sylvain


 On Sat, Jun 29, 2013 at 9:51 PM, Blair Zajac bl...@orcaware.com
 mailto:bl...@orcaware.com wrote:

 On 6/24/13 8:23 PM, Blair Zajac wrote:

 How does one do an atomic update in a column family with a
single column?

 I have a this CF

 CREATE TABLE schema_migrations (
 version TEXT PRIMARY KEY,
 ) WITH COMPACTION = {'class': 'LeveledCompactionStrategy'};


 Anyone?  Should I raise this on the developer mailing list or open
a ticket?

 Blair



Re: Patterns for enabling Compute apps which only request Local Node's

2013-07-01 Thread Robert Coli
On Sun, Jun 30, 2013 at 1:48 AM, rekt...@voodoowarez.com wrote:

 Question; if we're co-locating our Cassandra and our compute application
 on the same nodes, are there any in-use
 patterns in Cassandra user (or Cassandra dev) applications for having the
 compute application only pull data off the
 localhost Cassandra process? If we have the ability to manage where we do
 compute, what options are there for keeping
 compute happening on local data as much as possible?


Cassandra's Hadoop support provides Hadoop-like locality. One presumes
you could make use of this functionality even if you were not actually
running Hadoop map/reduce as the compute application.

http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig

=Rob


Re: How to do a CAS UPDATE on single column CF?

2013-07-01 Thread Francisco Andrades Grassi
http://en.wikipedia.org/wiki/Compare-and-swap

I believe C* uses Paxos for CAS, but I'm not completely sure.
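As a sketch of the semantics (hypothetical table and values, not a real schema), a compare-and-swap in CQL conditions the write on the current value and reports whether it won:

```sql
-- Hypothetical example: the update applies only if the column currently
-- holds the expected value.
UPDATE users SET password = 'new' WHERE login = 'jdoe'
IF password = 'old';
-- The result row contains [applied] = true if the swap won; if it lost,
-- the current value of the column is returned instead (much like the
-- HBase check-and-put behavior Dean describes).
```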

--
Francisco Andrades Grassi
www.bigjocker.com
@bigjocker

On Jul 1, 2013, at 1:49 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 What does CAS stand for? And is that the row locking feature like hbase's
 setAndReadWinner that you give the previous val and next val and your next
 val is returned if you won otherwise the current result is returned and
 you know some other node won?
 
 Thanks,
 Dean
 
 On 7/1/13 12:09 PM, Blair Zajac bl...@orcaware.com wrote:
 
 Thanks!
 
 On 7/1/13 1:41 AM, Sylvain Lebresne wrote:
 You're right, there is currently no way to do this since 1) insert
 can't have a
 IF currently and 2) update can't update such table.
 
 We'll fix that: https://issues.apache.org/jira/browse/CASSANDRA-5715
 
 --
 Sylvain
 
 
 On Sat, Jun 29, 2013 at 9:51 PM, Blair Zajac bl...@orcaware.com
 mailto:bl...@orcaware.com wrote:
 
On 6/24/13 8:23 PM, Blair Zajac wrote:
 
How does one do an atomic update in a column family with a
 single column?
 
I have a this CF
 
CREATE TABLE schema_migrations (
version TEXT PRIMARY KEY,
) WITH COMPACTION = {'class': 'LeveledCompactionStrategy'};
 
 
Anyone?  Should I raise this on the developer mailing list or open
 a ticket?
 
Blair
 



Re: How to do a CAS UPDATE on single column CF?

2013-07-01 Thread Andrew Cobley
According to Jonathan Ellis's talk at Cassandra Summit 2013, it does use Paxos:

http://www.youtube.com/watch?v=PcUpPR4nSr4&list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU

http://www.slideshare.net/jbellis/cassandra-summit-2013-keynote

Andy

On 1 Jul 2013, at 19:40, Francisco Andrades Grassi bigjoc...@gmail.com wrote:

http://en.wikipedia.org/wiki/Compare-and-swap

I believe C* uses Paxos for CAS but not completely sure?

--
Francisco Andrades Grassi
www.bigjocker.com
@bigjocker

On Jul 1, 2013, at 1:49 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

What does CAS stand for? And is that the row locking feature like hbase's
setAndReadWinner that you give the previous val and next val and your next
val is returned if you won otherwise the current result is returned and
you know some other node won?



The University of Dundee is a registered Scottish Charity, No: SC015096


Re: 10,000s of column families/keyspaces

2013-07-01 Thread Robert Coli
On Mon, Jul 1, 2013 at 9:19 AM, Kirk True kirktrue...@gmail.com wrote:

 What will it take for C* to support 50,000 column families?


As I understand it, a (the?) big problem with huge numbers of column
families is that each ColumnFamily has a large number of MBeans associated
with it, each of which consumes heap. So: a lot fewer MBeans per column
family and/or MBean stuff not consuming heap? Then you still have the
problem of each CF having at least one live memtable, which even if empty
will still consume heap...

I'm thinking the real answer to what it will take for C* to support 50k
CFs is a JVM which can functionally support heap sizes over 8 GB ...
which seems unlikely to happen any time soon.

=Rob


Re: 10,000s of column families/keyspaces

2013-07-01 Thread Edward Capriolo
There is another problem. You now need to run repair for a large number of
column families and keyspaces and manage that, look out for schema
mismatches etc.


On Mon, Jul 1, 2013 at 4:09 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Jul 1, 2013 at 9:19 AM, Kirk True kirktrue...@gmail.com wrote:

 What will it take for C* to support 50,000 column families?


 As I understand it, a (the?) big problem with huge numbers of Column
 Families is that each ColumnFamily has a large number of MBeans associated
 with it, each of which consume heap. So.. a lot fewer MBeans per column
 family and/or MBean stuff not consuming heap? Then you still have the
 problem of each CF having at least one live Memtable, which even if empty
 will still consume heap...

 I'm thinking the real answer to what it will take for C* to support 50k
 CFs is a JVM which can functionally support heap sizes over 8gb ...
 which seems unlikely to happen any time soon.

 =Rob



Re: Cassandra as storage for cache data

2013-07-01 Thread Robert Coli
The most effective way to deal with obsolete tombstones in the short-lived
cache case seems to be to drop them on the floor en masse... :D

a) have two column families that the application alternates between, modulo
time_period
b) truncate and populate the cold one
c) read from the hot one
d) clear snapshots frequently

This avoids the downsides of dealing with Tombstones entirely, with only
the cost of increased complexity to manage snapshots. One could (NOT
RECOMMENDED) also disable automatic snapshotting on truncate...
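A minimal sketch of the pattern (hypothetical table names; assume the application switches on the current time period):

```sql
-- Hypothetical pair of identical tables the application alternates
-- between, choosing by (current_time / period) % 2.
CREATE TABLE cache_a (k text PRIMARY KEY, v blob);
CREATE TABLE cache_b (k text PRIMARY KEY, v blob);

-- At each period boundary: truncate the cold table, repopulate it, then
-- flip reads over to it. TRUNCATE drops the SSTables wholesale, so no
-- tombstones are ever written or compacted.
TRUNCATE cache_b;
```

Since truncate auto-snapshots by default, step d) amounts to running `nodetool clearsnapshot` on a schedule.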

=Rob
PS - apparently in the past this would have resulted in schema CF growing
without bound, but that is no longer the case...


Dynamic Snitch and EC2MultiRegionSnitch

2013-07-01 Thread Daning Wang
How does the dynamic snitch work with EC2MultiRegionSnitch? Can dynamic
routing be limited to one data center? We don't want requests routed to the
other data center even when its nodes are idle, since the network could be
slow.

Thanks in advance,

Daning


RE: about FlushWriter All time blocked

2013-07-01 Thread Arindam Barua

Thanks guys, these sound like good suggestions, will try those out.

Aaron, we have around 80 CFs.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Friday, June 28, 2013 10:05 PM
To: user@cassandra.apache.org
Subject: Re: about FlushWriter All time blocked

We do not use secondary indexes or snapshots
Out of interest how many CF's do you have ?

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/06/2013, at 7:52 AM, Nate McCall zznat...@gmail.com wrote:


Non-zero for pending tasks is too transient. Try monitoring tpstats
with a (much) higher frequency and look for sustained threshold over a
duration.

Then, using a percentage of the configured maximums - 75% of
memtable_flush_queue_size in this case - alert when it has been
higher than 3 for longer than N. (Start with N=60 seconds and go
from there).

Also, that is a very high 'all time blocked' to 'completed' ratio for
FlushWriter. If iostat is happy, i'd do as Aaron suggested above and
turn up the memtable_flush_queue_size and play around with turning up
memtable_flush_writers (incrementally and separately for both of
course so you can see the effect).
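For reference, both knobs live in cassandra.yaml; an illustrative starting point might look like this (the values are examples only, not recommendations - raise them incrementally and watch tpstats and iostat after each change):

```yaml
# Illustrative values only. Queue of memtables waiting to be flushed;
# the 1.2 default is 4.
memtable_flush_queue_size: 8
# Threads performing flushes; the default is one per data directory.
memtable_flush_writers: 2
```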

On Thu, Jun 27, 2013 at 2:27 AM, Arindam Barua aba...@247-inc.com wrote:

In our performance tests, we are seeing similar FlushWriter, MutationStage, 
MemtablePostFlusher pending tasks become non-zero. We collect snapshots every 5 
minutes, and they seem to clear after ~10-15 minutes though. (The flush writer 
has an 'All time blocked' count of 540 in the below example).

We do not use secondary indexes or snapshots. We do not use SSDs. We have a 
4-node cluster with around 30-40 GB data on each node. Each node has 3 1-TB 
disks with a RAID 0 setup.

Currently we monitor tpstats every 5 minutes and alert if FlushWriter or 
MutationStage has a non-zero Pending count. Any suggestions on whether this is 
already cause for concern, or whether we should alert only if the count exceeds 
a larger number, say 10, or remains non-zero for longer than a specified time?

Pool Name               Active  Pending  Completed  Blocked  All time blocked
ReadStage                    0        0   15685133        0                 0
RequestResponseStage         0        0   29880863        0                 0
MutationStage                0        0   40457340        0                 0
ReadRepairStage              0        0     704322        0                 0
ReplicateOnWriteStage        0        0          0        0                 0
GossipStage                  0        0    2283062        0                 0
AntiEntropyStage             0        0          0        0                 0
MigrationStage               0        0         70        0                 0
MemtablePostFlusher          1        1       1837        0                 0
StreamStage                  0        0          0        0                 0
FlushWriter                  1        1       1446        0               540
MiscStage                    0        0          0        0                 0
commitlog_archiver           0        0          0        0                 0
InternalResponseStage        0        0         43        0                 0
HintedHandoff                0        0          3        0                 0

Thanks,
Arindam

-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, June 25, 2013 10:29 PM
To: user@cassandra.apache.org
Subject: Re: about FlushWriter All time blocked


FlushWriter   0 0191 0  
  12

This means there were 12 times the code wanted to put a memtable in the queue 
to be flushed to disk but the queue was full.

The length of this queue is controlled by the memtable_flush_queue_size 
https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299 
and memtable_flush_writers .

When this happens an internal lock around the commit log is held which prevents 
writes from being processed.

In general it means the IO system cannot keep up. It can sometimes happen when 
snapshot is used, as all the CFs are flushed to disk at once. I also suspect it 
happens sometimes when a commit log segment is flushed and there are a lot of 
dirty CFs. But I've never proved it.

Increase memtable_flush_queue_size following the help in the yaml file. If you 
do not use secondary indexes, are you using snapshot?

Hope that helps.
A
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/06/2013, at 3:41 PM, yue.zhang 

schema management

2013-07-01 Thread Franc Carter
Hi,

I've been giving some thought to the way we deploy schemas and am looking
for something better than our current approach, which is to use
cassandra-cli scripts.

What do people use for this ?

cheers

-- 

*Franc Carter* | Systems architect | Sirca Ltd
 marc.zianideferra...@sirca.org.au

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


query deadlock(?) after flushing a table

2013-07-01 Thread Mohica Jasha
Hey,

I created a table with a wide row. Querying the wide row after removing the
entries and flushing the table becomes very slow. I am aware of the impact
of tombstones, but it seems there is a deadlock that prevents the
query from completing.

step by step:

1. creating the keyspace and the table:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'};
use test;
CREATE TABLE job_index (   stage text,   timestamp text,   PRIMARY KEY
(stage, timestamp) ) WITH gc_grace_seconds=10 AND
compaction={'sstable_size_in_mb': '10', 'class':
'LeveledCompactionStrategy'};

2. insert 5000 entries to the job_index column family using the attached
script (insert_1-5000.cql)

3. flushing the table:
nodetool flush test job_index

4. delete the 5000 entries in the wide row using the attached script
(delete_1-5000.cql)

so far the queries return all the entries in the wide row in a fraction of
a second.

5. flushing the table:
nodetool flush test job_index

6. run the following query:
cqlsh:test> SELECT * from job_index limit 1 ;
Request did not complete within rpc_timeout.

The execution of the query gets blocked and eventually the query times out.

In the cassandra's log file I see the following lines:

DEBUG [ScheduledTasks:1] 2013-07-01 19:10:39,469 GCInspector.java (line
121) GC for ParNew: 16 ms for 5 collections, 754590496 used; max is
2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 19:10:40,473 GCInspector.java (line
121) GC for ParNew: 19 ms for 6 collections, 547894840 used; max is
2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 19:10:41,475 GCInspector.java (line
121) GC for ParNew: 16 ms for 5 collections, 771812864 used; max is
2093809664

A few minutes later after the compaction finishes the problem goes away.

I am using cassandra 1.2.6.
I tested on Linux (CentOS) and MacOS and I get the same result!

Is this a known issue?


Re: schema management

2013-07-01 Thread sankalp kohli
You can generate the schema through code. That is also one option.


On Mon, Jul 1, 2013 at 4:10 PM, Franc Carter franc.car...@sirca.org.au wrote:


 Hi,

 I've been giving some thought to the way we deploy schemas and am looking
 for something better than out current approach, which is to use
 cassandra-cli scripts.

 What do people use for this ?

 cheers

 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 8355 2514

 Level 4, 55 Harrington St, The Rocks NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215





Re: schema management

2013-07-01 Thread Todd Fast
Franc--

I think you will find Mutagen Cassandra very interesting; it is similar to
schema management tools like Flyway for SQL databases:

Mutagen Cassandra is a framework (based on Mutagen) that provides schema
 versioning and mutation for Apache Cassandra.

 Mutagen is a lightweight framework for applying versioned changes (known
 as mutations) to a resource, in this case a Cassandra schema. Mutagen takes
 into account the resource's existing state and only applies changes that
 haven't yet been applied.

 Schema mutation with Mutagen helps you make manageable changes to the
 schema of live Cassandra instances as you update your software, and is
 especially useful when used across development, test, staging, and
 production environments to automatically keep schemas in sync.



https://github.com/toddfast/mutagen-cassandra
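The core of the "only applies changes that haven't yet been applied" idea can be sketched in CQL (a hypothetical tracking table; Mutagen's actual bookkeeping schema may differ):

```sql
-- Hypothetical version-tracking table: record the highest mutation
-- applied to each resource so a later run only applies newer changes.
CREATE TABLE schema_versions (
    subject text,   -- the resource being versioned, e.g. a keyspace
    version int,    -- high-water mark of applied mutations
    PRIMARY KEY (subject)
);
```

A migration runner then reads this row at startup, applies any numbered change scripts above that version, and writes the new high-water mark.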

Todd


 On Mon, Jul 1, 2013 at 5:23 PM, sankalp kohli kohlisank...@gmail.com wrote:

 You can generate schema through the code. That is also one option.


  On Mon, Jul 1, 2013 at 4:10 PM, Franc Carter franc.car...@sirca.org.au wrote:


 Hi,

 I've been giving some thought to the way we deploy schemas and am looking
 for something better than out current approach, which is to use
 cassandra-cli scripts.

 What do people use for this ?

 cheers

 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 8355 2514

 Level 4, 55 Harrington St, The Rocks NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215






Re: schema management

2013-07-01 Thread Franc Carter
On Tue, Jul 2, 2013 at 10:33 AM, Todd Fast t...@digitalexistence.com wrote:

 Franc--

 I think you will find Mutagen Cassandra very interesting; it is similar to
 schema management tools like Flyway for SQL databases:


Oops - forgot to mention in my original email that we will be looking into
Mutagen Cassandra in the medium term. I'm after something with a low
barrier to entry initially, as we are quite time-constrained.

cheers



 Mutagen Cassandra is a framework (based on Mutagen) that provides schema
 versioning and mutation for Apache Cassandra.

 Mutagen is a lightweight framework for applying versioned changes (known
 as mutations) to a resource, in this case a Cassandra schema. Mutagen takes
 into account the resource's existing state and only applies changes that
 haven't yet been applied.

 Schema mutation with Mutagen helps you make manageable changes to the
 schema of live Cassandra instances as you update your software, and is
 especially useful when used across development, test, staging, and
 production environments to automatically keep schemas in sync.
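The versioned-mutation idea described above can be sketched in a few lines. This is an illustration of the concept, not Mutagen's actual API: the names `apply_mutations` and `applied_version` are made up for this sketch, and a real tool would execute each statement against the cluster and persist the version in a bookkeeping table.

```python
# Sketch of the versioned-mutation idea behind tools like Mutagen: each
# mutation carries a monotonically increasing version, the store records the
# highest version already applied, and only newer mutations are run.
def apply_mutations(mutations, state):
    """Apply only mutations newer than state['applied_version'].

    mutations: list of (version, cql_statement) tuples.
    state: dict standing in for the schema-version bookkeeping table.
    Returns the list of statements that were applied this run.
    """
    applied = []
    for version, cql in sorted(mutations):
        if version > state.get("applied_version", 0):
            # A real tool would execute `cql` against the cluster here.
            applied.append(cql)
            state["applied_version"] = version
    return applied

if __name__ == "__main__":
    state = {"applied_version": 1}
    mutations = [
        (1, "CREATE TABLE users (id uuid PRIMARY KEY)"),
        (2, "ALTER TABLE users ADD email text"),
        (3, "ALTER TABLE users ADD created timestamp"),
    ]
    print(apply_mutations(mutations, state))  # only versions 2 and 3 run
    print(apply_mutations(mutations, state))  # second run is a no-op: []
```

Because the state is consulted before each run, re-running the same mutation set is idempotent, which is what keeps development, staging, and production schemas in sync.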



 https://github.com/toddfast/mutagen-cassandra

 Todd


 On Mon, Jul 1, 2013 at 5:23 PM, sankalp kohli kohlisank...@gmail.comwrote:

 You can generate schema through the code. That is also one option.


 On Mon, Jul 1, 2013 at 4:10 PM, Franc Carter 
 franc.car...@sirca.org.auwrote:


 Hi,

 I've been giving some thought to the way we deploy schemas and am
 looking for something better than our current approach, which is to use
 cassandra-cli scripts.

 What do people use for this ?

 cheers

 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 8355 2514

 Level 4, 55 Harrington St, The Rocks NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215







-- 

*Franc Carter* | Systems architect | Sirca Ltd
 marc.zianideferra...@sirca.org.au

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Streaming performance with 1.2.6

2013-07-01 Thread Mike Heffner
Hi,

We've recently been testing some of the higher performance instance classes
on EC2, specifically the hi1.4xlarge, with Cassandra. For those that are
not familiar with them, they have two SSD disks and 10 GigE networking.

While we have observed much improved raw performance over our current
instances, we are seeing a fairly large gap between Cassandra and raw
performance. We have particularly noticed a gap in the streaming
performance when bootstrapping a new node. I wanted to ensure that we have
configured these instances correctly to get the best performance out of
Cassandra.

When bootstrapping a new node into a small ring with a 35GB streaming
payload, we see a 5-8 MB/sec max streaming rate joining the new node to the
ring. We are using 1.2.6 with vnode support (256 tokens per node). In our tests the
ring is small enough so all streaming occurs from a single node.

To test hardware performance for this use case, we ran an rsync of the
sstables from one node to the next (to/from the same file systems) and
observed a consistent rate of 115 MB/sec.

The only changes we've made to the config (aside from dirs/hosts) are:

-concurrent_reads: 32
-concurrent_writes: 32
+concurrent_reads: 128 # 32
+concurrent_writes: 128 # 32

-rpc_server_type: sync
+rpc_server_type: hsha # sync

-compaction_throughput_mb_per_sec: 16
+compaction_throughput_mb_per_sec: 256 # 16

-read_request_timeout_in_ms: 1
+read_request_timeout_in_ms: 6000 # 1

-endpoint_snitch: SimpleSnitch
+endpoint_snitch: Ec2Snitch # SimpleSnitch

-internode_compression: all
+internode_compression: none

We use a 10G heap with a 2G new size. We are using the Oracle 1.7.0_25 JVM.

I've adjusted our streaming throughput limit from 200MB/sec up to 800MB/sec
on both the sending and receiving streaming nodes, but that doesn't appear
to make a difference.

The disks are RAID0 (2 x 1TB SSD) with a 512-sector read-ahead, formatted XFS.

The nodes in the ring are running about 23% CPU on average, with spikes up
to a maximum of 45% CPU.

As I mentioned, on the same boxes with the same workloads, I've seen up to
115 MB/sec transfers with rsync.


Any suggestions for what to adjust to see better streaming performance? 5%
of what a single rsync can do seems somewhat limited.
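The arithmetic behind that 5% figure, using the midpoint of the observed 5-8 MB/sec range:

```python
# Observed streaming rate vs. the raw rsync rate measured on the same boxes,
# and what each implies for a 35GB bootstrap.
stream_mb_s = (5 + 8) / 2   # midpoint of the observed 5-8 MB/sec range
rsync_mb_s = 115            # measured rsync rate on the same file systems
payload_gb = 35             # bootstrap streaming payload

ratio = stream_mb_s / rsync_mb_s
stream_hours = payload_gb * 1024 / stream_mb_s / 3600
rsync_minutes = payload_gb * 1024 / rsync_mb_s / 60

print(f"streaming runs at {ratio:.1%} of the rsync rate")
print(f"35GB bootstrap: {stream_hours:.1f}h streaming vs {rsync_minutes:.0f}min rsync")
```

At these rates the 35GB join takes about an hour and a half over streaming versus roughly five minutes over rsync, which is the gap being asked about.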


Thanks,

Mike


-- 

  Mike Heffner m...@librato.com
  Librato, Inc.


very inefficient operation with tombstones

2013-07-01 Thread Mohica Jasha
Querying a table with 5,000 tombstones takes 3 minutes to complete!
But querying the same table, with the same data pattern and 10,000 live
entries, takes a fraction of a second to complete!


Details:
1. created the following table:
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'};
use test;
CREATE TABLE job_index (   stage text,   timestamp text,   PRIMARY KEY
(stage, timestamp));

2. inserted 5000 entries to the table:
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '0001' );
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '0002' );
...
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '4999' );
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '5000' );

3. flushed the table:
nodetool flush test job_index

4. deleted the 5000 entries:
DELETE from job_index WHERE stage ='a' AND timestamp = '0001' ;
DELETE from job_index WHERE stage ='a' AND timestamp = '0002' ;
...
DELETE from job_index WHERE stage ='a' AND timestamp = '4999' ;
DELETE from job_index WHERE stage ='a' AND timestamp = '5000' ;
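For anyone reproducing this, the 5,000 INSERT and matching DELETE statements above can be generated with a short script (the keyspace and table match the ones created in step 1; `job_index_statements` is just a name for this sketch):

```python
# Generate the 5,000 INSERT and matching DELETE statements used in the test.
# Timestamps are zero-padded to four digits ('0001' .. '5000') so they sort
# lexically, matching the clustering order of the text `timestamp` column.
def job_index_statements(n=5000):
    inserts = [
        "INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '%04d' );" % i
        for i in range(1, n + 1)
    ]
    deletes = [
        "DELETE from job_index WHERE stage ='a' AND timestamp = '%04d' ;" % i
        for i in range(1, n + 1)
    ]
    return inserts, deletes

if __name__ == "__main__":
    inserts, deletes = job_index_statements()
    print(inserts[0])   # first INSERT, timestamp '0001'
    print(deletes[-1])  # last DELETE, timestamp '5000'
```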

5. flushed the table:
nodetool flush test job_index

6. querying the table takes 3 minutes to complete:
cqlsh:test> SELECT * FROM job_index LIMIT 2;
tracing:
http://pastebin.com/jH2rZN2X

while query was getting executed I saw a lot of GC entries in cassandra's
log:
DEBUG [ScheduledTasks:1] 2013-07-01 23:47:59,221 GCInspector.java (line
121) GC for ParNew: 30 ms for 6 collections, 263993608 used; max is
2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 23:48:00,222 GCInspector.java (line
121) GC for ParNew: 29 ms for 6 collections, 186209616 used; max is
2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 23:48:01,223 GCInspector.java (line
121) GC for ParNew: 29 ms for 6 collections, 108731464 used; max is
2093809664

It seems that something very inefficient is happening in managing
tombstones.

If I start with a clean table and do the following:
1. insert 5000 entries
2. flush to disk
3. insert new 5000 entries
4. flush to disk
Querying job_index for all 10,000 entries takes a fraction of a second to
complete:
tracing:
http://pastebin.com/scUN9JrP

The fact that iterating over 5,000 tombstones takes 3 minutes, while iterating
over 10,000 live cells takes a fraction of a second, suggests that something
very inefficient is happening in how tombstones are managed.
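A simplified model shows why the two cases are so asymmetric. This is an illustration of the read path's behavior, not Cassandra's actual code: a slice query with a LIMIT keeps reading cells until it has collected enough *live* ones, so tombstones are scanned and discarded rather than skipped for free.

```python
# Simplified model of a slice read with a LIMIT: tombstoned cells are read
# and thrown away, so a partition full of tombstones must be scanned end to
# end before the query can return, while a partition of live cells can stop
# as soon as the limit is satisfied.
def slice_query(cells, limit):
    """cells: list of (key, is_live); returns (live_rows, cells_scanned)."""
    live, scanned = [], 0
    for key, is_live in cells:
        scanned += 1
        if is_live:
            live.append(key)
            if len(live) == limit:
                break
    return live, scanned

# Case 1: 5,000 tombstones -- every cell is scanned, nothing is returned.
tombstoned = [("%04d" % i, False) for i in range(1, 5001)]
print(slice_query(tombstoned, limit=2))   # ([], 5000)

# Case 2: 10,000 live cells -- the scan stops after 2 cells.
alive = [("%05d" % i, True) for i in range(1, 10001)]
print(slice_query(alive, limit=2))
```

The model only explains why the tombstone case scans 2,500x more cells; the additional per-tombstone overhead and GC pressure seen in the trace are on top of that.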

I appreciate if any developer can look into this.

-M