Problems with node rejoining cluster

2013-06-25 Thread Arindam Barua

We need to do a rolling upgrade of our Cassandra cluster in production, since 
we are moving Cassandra from Solaris to CentOS.
(We went with Solaris initially since most of our other hosts in production are 
Solaris, but we ran into some lockup issues during perf tests and decided to 
switch to Linux.)

Here are the steps we are following to take the node out of service and bring it 
back (a rough command sketch follows the list). Can someone comment on whether we 
are missing anything (e.g. is it recommended to specify tokens in cassandra.yaml, 
or to do something different with the seed hosts than mentioned below)?

1.   nodetool decommission - wait for the data to be streamed out.

2.   Re-image (everything is wiped off the disks) the host to CentOS, with 
the same Cassandra version

3.   Get Cassandra back up.
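In command terms, the sequence above looks roughly like this (host names are 
placeholders, and the exact command used to start Cassandra depends on how it is 
installed on the host):

  # on the node being replaced
  nodetool -h <old-node> decommission
  nodetool -h <old-node> netstats      # wait until no streams remain
  nodetool -h <seed-host> ring         # confirm the node has left the ring

  # after re-imaging to CentOS and installing the same Cassandra version,
  # start Cassandra and let it bootstrap back in, then verify:
  nodetool -h <seed-host> ring         # confirm the node rejoined and owns a token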

Other details:

-  Using Cassandra 1.1.5

-  We do not specify any tokens in cassandra.yaml, relying on bootstrap 
to assign the tokens automatically.

-  We are testing with a 4 node cluster, with only one seed host. The 
seed host is specified in the cassandra.yaml of each node and is not changed at 
any point.

While testing the Solaris-to-Linux upgrade path, things seem to work smoothly. 
The data streams out fine, and streams back in when the node comes back up. 
However, when testing the Linux-to-Solaris path (in case we need to roll back), we 
are facing some issues with the node rejoining the ring. nodetool indicates 
that the node has joined the ring, but no data streams in, the node 
doesn't know about the keyspaces/column families, etc. We see some errors in 
the logs of the newly added nodes, pasted below.

[17/06/2013:14:10:17 PDT] MutationStage:1: ERROR RowMutationVerbHandler.java 
(line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1020
at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
at 
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

Thanks,
Arindam


Re: How to do a CAS UPDATE on single column CF?

2013-06-25 Thread Blair Zajac

On 06/24/2013 08:35 PM, Arthur Zubarev wrote:

On 06/24/2013 11:23 PM, Blair Zajac wrote:

CAS UPDATE

Since when does C* have IF NOT EXISTS in the DML part of CQL?


It's new in 2.0.

https://issues.apache.org/jira/browse/CASSANDRA-5062
https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py#L3044

Blair
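For reference, the conditional-write syntax in 2.0 looks roughly like the following 
(the table and column names here are made up for illustration):

  -- insert only if no row with this key exists yet
  INSERT INTO users (user_id, email) VALUES (123, 'a@example.com') IF NOT EXISTS;

  -- compare-and-set an existing value on a single column
  UPDATE users SET email = 'b@example.com' WHERE user_id = 123 IF email = 'a@example.com';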



Cassandra as storage for cache data

2013-06-25 Thread Dmitry Olshansky

Hello,

we are using Cassandra as the data storage for our caching system. Our 
application generates about 20 put and get requests per second. The 
average size of one cache item is about 500 KB.


Cache items are placed into one column family with TTL set to 20 - 60 
minutes. Keys and values are bytes (not utf8 strings). Compaction 
strategy is SizeTieredCompactionStrategy.
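In CQL terms the writes look roughly like this (the table and the literal values 
below are only illustrative, not our real schema):

CREATE TABLE cache_entries (
  key blob PRIMARY KEY,
  value blob
);

-- each put carries its own TTL, here one hour
INSERT INTO cache_entries (key, value) VALUES (0x0102, 0xcafe) USING TTL 3600;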


We set up a Cassandra 1.2.6 cluster of 4 nodes. The replication factor is 2. 
Each node has 10 GB of RAM and enough space on HDD.


Now, when we put this cluster under load, it quickly fills 
with our runtime data (about 5 GB on every node) and we start observing 
performance degradation, with frequent timeouts on the client side.


We see that on each node compaction starts very frequently and takes 
several minutes to complete. It seems that each node is usually busy with 
compaction.


Here are the questions:

What is the recommended configuration for our use case?

Does it make sense to somehow tell Cassandra to keep all data in memory 
(memtables) to avoid flushing it to disk (sstables), thus decreasing the 
number of compactions? How can we achieve this behavior?


Cassandra is started with the default shell script, which produces the following 
command line:


jsvc.exec -user cassandra -home 
/usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ -pidfile 
/var/run/cassandra.pid -errfile 1 -outfile 
/var/log/cassandra/output.log -cp CLASSPATH_SKIPPED 
-Dlog4j.configuration=log4j-server.properties 
-Dlog4j.defaultInitOverride=true 
-XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof 
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea 
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M 
-Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+UseTLAB -Djava.net.preferIPv4Stack=true 
-Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
org.apache.cassandra.service.CassandraDaemon


--
Best regards,
Dmitry Olshansky



Re: Cassandra as storage for cache data

2013-06-25 Thread Jeremy Hanna
If you have rapidly expiring data, then tombstones are probably filling your 
disk and your heap (depending on how you order the data on disk).  To check to 
see if your queries are affected by tombstones, you might try using the query 
tracing that's built-in to 1.2.
See:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
  -- has an example of tracing where you can see tombstones affecting the query
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

You'll want to consider reducing the gc_grace period from the default of 10 
days for those column families - with an understanding of why gc_grace exists in 
the first place, see http://wiki.apache.org/cassandra/DistributedDeletes .  
Then, once the gc_grace period has passed, the tombstones will still stay around 
until they are compacted away.  So there are two options currently to compact them 
away more quickly:
1) use leveled compaction - see 
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction  Leveled 
compaction only requires 10% headroom (as opposed to 50% for size tiered 
compaction) in terms of the amount of disk that needs to be kept free.
2) if option 1 doesn't work and you're still seeing performance degrade and the 
tombstones aren't getting cleared out fast enough, you might consider using 
size tiered compaction but performing regular major compactions to get rid of 
expired data.

Keep in mind though that if you use gc_grace of 0 and do any kind of manual 
deletes outside of TTLs, you probably want to do the deletes at 
ConsistencyLevel.ALL or else if a node goes down, then comes back up, there's a 
chance that deleted data may be resurrected.  That only applies to non-ttl data 
where you manually delete it.  See the explanation of distributed deletes for 
more information.
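As a concrete sketch of those two settings in CQL (the table name is a placeholder 
and 3600 is just an example value, to be weighed against the caveats above):

ALTER TABLE cache_entries
  WITH compaction = {'class': 'LeveledCompactionStrategy'}
  AND gc_grace_seconds = 3600;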

 
On 25 Jun 2013, at 13:31, Dmitry Olshansky dmitry.olshan...@gridnine.com 
wrote:

 Hello,
 
 we are using Cassandra as a data storage for our caching system. Our 
 application generates about 20 put and get requests per second. An average 
 size of one cache item is about 500 Kb.
 
 Cache items are placed into one column family with TTL set to 20 - 60 
 minutes. Keys and values are bytes (not utf8 strings). Compaction strategy is 
 SizeTieredCompactionStrategy.
 
 We setup Cassandra 1.2.6 cluster of 4 nodes. Replication factor is 2. Each 
 node has 10GB of RAM and enough space on HDD.
 
 Now when we're putting this cluster into the load it's quickly fills with our 
 runtime data (about 5 GB on every node) and we start observing performance 
 degradation with often timeouts on client side.
 
 We see that on each node compaction starts very frequently and lasts for 
 several minutes to complete. It seems that each node usually busy with 
 compaction process.
 
 Here the questions:
 
 What are the recommended setup configuration for our use case?
 
 Is it makes sense to somehow tell Cassandra to keep all data in memory 
 (memtables) to eliminate flushing it to disk (sstables) thus decreasing 
 number of compactions? How to achieve this behavior?
 
 Cassandra is starting with default shell script that gives the following 
 command line:
 
 jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ 
 -pidfile /var/run/cassandra.pid -errfile 1 -outfile 
 /var/log/cassandra/output.log -cp CLASSPATH_SKIPPED 
 -Dlog4j.configuration=log4j-server.properties 
 -Dlog4j.defaultInitOverride=true 
 -XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof 
 -XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea 
 -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
 -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M -Xmn400M 
 -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC 
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB 
 -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
 -Dcom.sun.management.jmxremote.ssl=false 
 -Dcom.sun.management.jmxremote.authenticate=false 
 org.apache.cassandra.service.CassandraDaemon
 
 -- 
 Best regards,
 Dmitry Olshansky
 



Re: NREL has released open source Databus on github for time series data

2013-06-25 Thread Hiller, Dean
When you say aggregates, do you mean converting 1-minute data to 15-minute data, 
or do you mean summing different streams such that you have the total energy 
from energy streams A, B, C, etc.?

P.S. We are working on supporting both. There is a clusterable cron job in 
place right now that does some aggregation already, but there is another in the 
works for moving higher-rate data to lower rates.

Dean

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Monday, June 24, 2013 9:51 PM
To: user@cassandra.apache.org
Subject: Re: NREL has released open source Databus on github for time series 
data

Hi Dean,
Does this handle rollup aggregates along with the time series data ?
I had a quick look at the links and could not see anything.

Cheers
Aaron

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/06/2013, at 2:51 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

NREL has released their open source databus.  They spin it as energy data (and 
a system for campus energy/building energy) but it is very general right now 
and probably will stay pretty general.  More information can be found here

http://www.nrel.gov/analysis/databus/

The source code can be found here
https://github.com/deanhiller/databus

Star the project if you like the idea.  NREL just did a big press release and 
is developing a community around the project.  It is in its early stages, but 
there are users using it and I am helping HP set up an instance this month.  If 
you want to become a committer on the project, let me know as well.

Later,
Dean




Are nested selects supported by Cassandra JDBC??

2013-06-25 Thread Tony Anecito
Hi All,

Is nested select supported by Cassandra JDBC driver?

So for a simple example to get a list of user details from a users column 
family:

Select * from user_details where user_id in (Select user_id from users)

Thanks!
-Tony


Re: Are nested selects supported by Cassandra JDBC??

2013-06-25 Thread Sylvain Lebresne
No. CQL3 doesn't support nested selects.

--
Sylvain


On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito adanec...@yahoo.com wrote:

 Hi All,

 Is nested select supported by Cassandra JDBC driver?

 So for a simple example to get a list of user details from a users column
 family:

 Select * from user_details where user_id in (Select user_id from users)

 Thanks!
 -Tony



cassandra-unit 1.2.0.1 is released : CQL3 and Spring

2013-06-25 Thread Jérémy SEVELLEC
Hi all,

Just to let you know that a new release of cassandra-unit is available with
CQL3 dataset support and Spring integration.

More here :
http://www.unchticafe.fr/2013/06/cassandra-unit-1201-is-out-cql3-script.html

Regards,

-- 
Jérémy


Re: Are nested selects supported by Cassandra JDBC??

2013-06-25 Thread Tony Anecito
OK. So if I have a composite key table, then instead of a nested select I will have 
to run 2 queries or else denormalize? Unless there is something provided by CQL3 
to do the same thing?

Thanks,
-Tony





 From: Sylvain Lebresne sylv...@datastax.com
To: user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com
Sent: Tuesday, June 25, 2013 9:06 AM
Subject: Re: Are nested selects supported by Cassandra JDBC??
 


No. CQL3 doesn't support nested selects.

--
Sylvain



On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito adanec...@yahoo.com wrote:

Hi All,


Is nested select supported by Cassandra JDBC driver?


So for a simple example to get a list of user details from a users column 
family:


Select * from user_details where user_id in (Select user_id from users)


Thanks!-Tony

Re: Are nested selects supported by Cassandra JDBC??

2013-06-25 Thread Sylvain Lebresne
Yes, denormalization is usually the answer to the absence of sub-queries
(and joins for that matter) in Cassandra (though sometimes, simply doing 2
queries is fine, depends on your use case and performance requirements).
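For the example in this thread, the two-query route would look roughly like this 
(the literal ids are made up for illustration):

SELECT user_id FROM users;
SELECT * FROM user_details WHERE user_id IN (1, 2, 3);  -- ids taken from the first result

The denormalized route is to copy the detail columns into the table you actually 
read (for instance into users itself) at write time, so that a single SELECT 
answers the question.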


On Tue, Jun 25, 2013 at 6:46 PM, Tony Anecito adanec...@yahoo.com wrote:

 Ok. So if I have a composite key table instead of a nested select I will
 have to run 2 queries else denormalize? Unless there is something provided
 by CQL 3 to do the same thing?

 Thanks,
 -Tony


   --
  From: Sylvain Lebresne sylv...@datastax.com
 To: user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com
 Sent: Tuesday, June 25, 2013 9:06 AM
 Subject: Re: Are nested selects supported by Cassandra JDBC??

 No. CQL3 doesn't support nested selects.

 --
 Sylvain


 On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito adanec...@yahoo.com wrote:

 Hi All,

 Is nested select supported by Cassandra JDBC driver?

 So for a simple example to get a list of user details from a users column
 family:

 Select * from user_details where user_id in (Select user_id from users)

 Thanks!
 -Tony







Re: [Cassandra] Replacing a cassandra node with one of the same IP

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 8:53 PM, aaron morton aa...@thelastpickle.com wrote:
 so I am just wondering if this means the hinted handoffs are also updated to 
 reflect the new Cassandra node uuid.
 Without checking the code I would guess not.
 Because it would involve a potentially large read / write / delete to create 
 a new row with the same data. And Hinted Handoff is an optimisation.

So are hints to a given UUID discarded after some period of time with
that UUID not present in the cluster? Or might they need to be
manually purged?

=Rob


Re: Problems with node rejoining cluster

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 11:19 PM, Arindam Barua aba...@247-inc.com wrote:
 -  We do not specify any tokens in cassandra.yaml relying on
 bootstrap assigning the tokens automatically.

As cassandra.yaml comments state, you should never ever do this in a
real cluster.

I don't know what is causing your underlying issue, but not-specifying
tokens is a strong contender.

=Rob
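For a 4-node cluster on the RandomPartitioner (the 1.1 default), that means giving 
each node an evenly spaced initial_token in cassandra.yaml before it first starts, 
for example:

  # node 1
  initial_token: 0
  # node 2
  initial_token: 42535295865117307932921825928971026432
  # node 3
  initial_token: 85070591730234615865843651857942052864
  # node 4
  initial_token: 127605887595351923798765477786913079296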


Re: Counter value becomes incorrect after several dozen reads writes

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 6:42 PM, Josh Dzielak j...@keen.io wrote:
 There is only 1 thread running this sequence, and consistency levels are set
 to ALL. The behavior is fairly repeatable - the unexpected mutation will
 happen at least 10% of the time I run this program, but at different points.
 When it does not go awry, I can run this loop many thousands of times and
 keep the counter exact. But if it starts happening to a specific counter,
 the counter will never recover and will continue to maintain its
 incorrect value even after successful subsequent writes.

Sounds like a corrupt counter shard. Hard to understand how it can
happen at ALL. If I were you I would file a JIRA including your repro
path...

=Rob


Re: copy data between clusters

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 8:35 PM, S C as...@outlook.com wrote:
 I have a scenario here. I have a cluster A and cluster B running on
 cassandra 1.1. I need to copy data from Cluster A to Cluster B. Cluster A
 has few keyspaces that I need to copy over to Cluster B. What are my
 options?

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob


Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-25 Thread sankalp kohli
Your young gen is 1/4 of 1.8G, which is 450MB. Also, in slice queries, the
co-ordinator will get the results from replicas as per the consistency level
used and merge the results before returning to the client.
What is the replication factor of your keyspace, and what consistency level are
you reading with?
Also 55MB on disk will not mean 55MB in memory. The data is compressed on
disk and also there are other overheads.



On Mon, Jun 24, 2013 at 8:38 PM, Mohammed Guller moham...@glassbeam.com wrote:

  No deletes. In my test, I am just writing and reading data.

  There is a lot of GC, but only on the younger generation. Cassandra
 terminates before the GC for old generation kicks in.

  I know that our queries are reading an unusual amount of data. However,
 I expected it to throw a timeout exception instead of crashing. Also, don't
 understand why 1.8 Gb heap is getting full when the total data stored in
 the entire Cassandra cluster is less than 55 MB.

 Mohammed

 On Jun 21, 2013, at 7:30 PM, sankalp kohli kohlisank...@gmail.com
 wrote:

   Looks like you are putting lot of pressure on the heap by doing a slice
 query on a large row.
 Do you have lot of deletes/tombstone on the rows? That might be causing a
 problem.
 Also, why are you returning so many columns at once? You can use the auto
 paginate feature in Astyanax.

  Also do you see lot of GC happening?


 On Fri, Jun 21, 2013 at 1:13 PM, Jabbar Azam aja...@gmail.com wrote:

 Hello Mohammed,

  You should increase the heap space. You should also tune the garbage
 collection so young generation objects are collected faster, relieving
 pressure on the heap. We have been using JDK 7 and it uses G1 as the default
 collector. It does a better job than me trying to optimise the JDK 6 GC
 collectors.

  Bear in mind though that the OS will need memory, so will the row cache
 and the filing system. Although memory usage will depend on the workload of
 your system.

  I'm sure you'll also get good advice from other members of the mailing
 list.

  Thanks

 Jabbar Azam


 On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote:

  We have a 3-node cassandra cluster on AWS. These nodes are running
 cassandra 1.2.2 and have 8GB memory. We didn't change any of the default
 heap or GC settings. So each node is allocating 1.8GB of heap space. The
 rows are wide; each row stores around 260,000 columns. We are reading the
 data using Astyanax. If our application tries to read 80,000 columns each
 from 10 or more rows at the same time, some of the nodes run out of heap
 space and terminate with OOM error. Here is the error message:

 java.lang.OutOfMemoryError: Java heap space
 at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
 at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
 at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
 at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
 at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
 at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
 at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
 at org.apache.cassandra.db.Table.getRow(Table.java:355)
 at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
 at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
 at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)

 ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main]

 java.lang.OutOfMemoryError: Java heap space
 at java.lang.Long.toString(Long.java:269)
 at java.lang.Long.toString(Long.java:764)
 at

Re: Cassandra as storage for cache data

2013-06-25 Thread sankalp kohli
Apart from what Jeremy said, you can try these:
1) Use replication = 1. It is cache data and you don't need persistence.
2) Try playing with the memtable size.
3) Use the Netflix client library, as it will reduce one hop. It will choose the
node with the data as the coordinator.
4) Work on your schema. You might want to have fewer columns in each row.
With fatter rows, the bloom filter will return more sstables as eligible.

-Sankalp
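Point 1 in CQL terms would be something like this (the keyspace name is a 
placeholder for whatever the cache keyspace is actually called):

ALTER KEYSPACE cache_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};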


On Tue, Jun 25, 2013 at 9:04 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

 If you have rapidly expiring data, then tombstones are probably filling
 your disk and your heap (depending on how you order the data on disk).  To
 check to see if your queries are affected by tombstones, you might try
 using the query tracing that's built-in to 1.2.
 See:

 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
  -- has an example of tracing where you can see tombstones affecting the
 query
 http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

 You'll want to consider reducing the gc_grace period from the default of
 10 days for those column families - with the understanding why gc_grace
 exists in the first place, see
 http://wiki.apache.org/cassandra/DistributedDeletes .  Then once the
 gc_grace period has passed, the tombstones will stay around until they are
 compacted away.  So there are two options currently to compact them away
 more quickly:
 1) use leveled compaction - see
 http://www.datastax.com/dev/blog/when-to-use-leveled-compaction  Leveled
 compaction only requires 10% headroom (as opposed to 50% for size tiered
 compaction) for amount of disk that needs to be kept free.
 2) if 1 doesn't work and you're still seeing performance degrading and the
 tombstones aren't getting cleared out fast enough, you might consider using
 size tiered compaction but performing regular major compactions to get rid
 of expired data.

 Keep in mind though that if you use gc_grace of 0 and do any kind of
 manual deletes outside of TTLs, you probably want to do the deletes at
 ConsistencyLevel.ALL or else if a node goes down, then comes back up,
 there's a chance that deleted data may be resurrected.  That only applies
 to non-ttl data where you manually delete it.  See the explanation of
 distributed deletes for more information.


 On 25 Jun 2013, at 13:31, Dmitry Olshansky dmitry.olshan...@gridnine.com
 wrote:

  Hello,
 
  we are using Cassandra as a data storage for our caching system. Our
 application generates about 20 put and get requests per second. An average
 size of one cache item is about 500 Kb.
 
  Cache items are placed into one column family with TTL set to 20 - 60
 minutes. Keys and values are bytes (not utf8 strings). Compaction strategy
 is SizeTieredCompactionStrategy.
 
  We setup Cassandra 1.2.6 cluster of 4 nodes. Replication factor is 2.
 Each node has 10GB of RAM and enough space on HDD.
 
  Now when we're putting this cluster into the load it's quickly fills
 with our runtime data (about 5 GB on every node) and we start observing
 performance degradation with often timeouts on client side.
 
  We see that on each node compaction starts very frequently and lasts for
 several minutes to complete. It seems that each node usually busy with
 compaction process.
 
  Here the questions:
 
  What are the recommended setup configuration for our use case?
 
  Is it makes sense to somehow tell Cassandra to keep all data in memory
 (memtables) to eliminate flushing it to disk (sstables) thus decreasing
 number of compactions? How to achieve this behavior?
 
  Cassandra is starting with default shell script that gives the following
 command line:
 
  jsvc.exec -user cassandra -home
 /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ -pidfile
 /var/run/cassandra.pid -errfile 1 -outfile /var/log/cassandra/output.log
 -cp CLASSPATH_SKIPPED -Dlog4j.configuration=log4j-server.properties
 -Dlog4j.defaultInitOverride=true
 -XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof
 -XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea
 -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
 -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M -Xmn400M
 -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
 -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 org.apache.cassandra.service.CassandraDaemon
 
  --
  Best regards,
  Dmitry Olshansky
 




Re: Counter value becomes incorrect after several dozen reads writes

2013-06-25 Thread Andrew Bialecki
If you can reproduce the invalid behavior 10+% of the time with steps to
repro that take 5-10s/iteration, that sounds extremely interesting for
getting to the bottom of the invalid shard issue (if that's what the root
cause ends up being). Would be very interested in the set up to see if the
behavior can be duplicated.

Andrew


On Tue, Jun 25, 2013 at 2:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Jun 24, 2013 at 6:42 PM, Josh Dzielak j...@keen.io wrote:
  There is only 1 thread running this sequence, and consistency levels are
 set
   to ALL. The behavior is fairly repeatable - the unexpected mutation will
   happen at least 10% of the time I run this program, but at different points.
   When it does not go awry, I can run this loop many thousands of times and
   keep the counter exact. But if it starts happening to a specific counter,
   the counter will never recover and will continue to maintain its
   incorrect value even after successful subsequent writes.

 Sounds like a corrupt counter shard. Hard to understand how it can
 happen at ALL. If I were you I would file a JIRA including your repro
 path...

 =Rob



Re: Custom 1.2 Authentication plugin will not work unless user is in system_auth.users column family

2013-06-25 Thread Bao Le
Sorry for not following up on this one in time. I filed a JIRA (5651) and it 
seems user lookup is here to stay.

https://issues.apache.org/jira/browse/CASSANDRA-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

On a related note, that column family is, by default, set up with key caching 
only. It might be a good idea to turn on its row caching if the row cache is 
enabled.


Bao


RE: copy data between clusters

2013-06-25 Thread S C
Bob and Arthur - thanks for your inputs.
I tried sstableloader but ran into the issue below. Is it anything to do with the 
configuration needed to run sstableloader?
sstableloader -d 10.225.64.2,10.225.64.3 service/context
INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for service/context/service-context-hf-50: 128 ms.
INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for service/context/service-context-hf-49: 13 ms.
INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for service/context/service-context-hf-51: 2 ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db service/context/service-context-hf-49-Data.db service/context/service-context-hf-51-Data.db to [/10.225.64.2, /10.225.64.3]
INFO 14:43:50,124 Stream context metadata [service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%], 3 sstables.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-51-Data.db to be streamed.
INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%
INFO 14:43:50,159 Stream context metadata [service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%], 3 sstables.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,160 Adding file service/context/service-context-hf-51-Data.db to be streamed.
INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
WARN 14:44:02,309 Failed attempt 3 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
DEBUG 14:44:18,231 closing with status false
Streaming session to /10.225.64.3 failed
ERROR 14:44:18,236 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.net.SocketException: Invalid argument or cannot assign requested address
  at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:636)
  at

what happens if coordinator node fails during write

2013-06-25 Thread Jiaan Zeng
Hi there,

I am writing data to Cassandra with a thrift client (not Hector) and
wonder what happens if the coordinator node fails. The same question
applies to the bulk loader, which uses the gossip protocol instead of the thrift
protocol. In my understanding, HintedHandoff only takes care of the case where
a replica node fails.

Thanks.

--
Regards,
Jiaan


Re: copy data between clusters

2013-06-25 Thread Arthur Zubarev
Hello SC,

whilst most of the sstableloader errors stem from incorrect setups, I suspect 
this time you merely have a connectivity issue, e.g. a firewall blocking traffic.

From: S C 
Sent: Tuesday, June 25, 2013 5:28 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters

Bob and Arthur - thanks for your inputs. 

I tried sstableloader but ran into below issue. Anything to do with the 
configuration to run sstableloader?

sstableloader -d 10.225.64.2,10.225.64.3 service/context
INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for service/context/service-context-hf-50: 
128 ms.
INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for service/context/service-context-hf-49: 
13 ms.
INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for service/context/service-context-hf-51: 2 
ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db 
service/context/service-context-hf-49-Data.db 
service/context/service-context-hf-51-Data.db to [/10.225.64.2, /10.225.64.3]
INFO 14:43:50,124 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%
INFO 14:43:50,159 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,160 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%

progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:44:02,309 Failed attempt 3 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)]DEBUG 14:44:18,231 closing with status false
Streaming session to /10.225.64.3 failed
ERROR 

Re: what happens if coordinator node fails during write

2013-06-25 Thread Andrey Ilinykh
It depends on the Cassandra version. As far as I know, in 1.2 the coordinator logs
the request before it updates replicas. If it fails, it will replay the log on
startup.
In 1.1 you may end up with an inconsistent state, because only part of your request
is propagated to replicas.

Thank you,
  Andrey


On Tue, Jun 25, 2013 at 5:11 PM, Jiaan Zeng ji...@bloomreach.com wrote:

 Hi there,

 I am writing data to Cassandra by thrift client (not hector) and
 wonder what happen if the coordinator node fails. The same question
 applies for bulk loader which uses gossip protocol instead of thrift
 protocol. In my understanding, the HintedHandoff only takes care of
 the replica node fails.

 Thanks.

 --
 Regards,
 Jiaan



Re: Date range queries

2013-06-25 Thread Colin Blower
You could just separate the history data from the current data. Then
when the user's result is updated, just write into two tables.

CREATE TABLE all_answers (
  user_id uuid,
  created timeuuid,
  result text,
  question_id varint,
  PRIMARY KEY (user_id, created)
)

CREATE TABLE current_answers (
  user_id uuid,
  question_id varint,
  created timeuuid,
  result text,
  PRIMARY KEY (user_id, question_id)
)


 select * FROM current_answers;

 user_id                              | question_id | result | created
--------------------------------------+-------------+--------+--------------------------------------
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           1 | no     | f9893ee0-ddfa-11e2-b74c-35d7be46b354
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           2 | blah   | f7af75d0-ddfa-11e2-b74c-35d7be46b354

 select * FROM all_answers;

 user_id                              | created                              | question_id | result
--------------------------------------+--------------------------------------+-------------+--------
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f0141234-ddfa-11e2-b74c-35d7be46b354 |           1 | yes
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f7af75d0-ddfa-11e2-b74c-35d7be46b354 |           2 | blah
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f9893ee0-ddfa-11e2-b74c-35d7be46b354 |           1 | no

This way you can get the history of answers if you want and there is a
simple way to get the most current answers.

Just a thought.
-Colin B.
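The write path for an answer update is then just two inserts, one per table (the 
values mirror the example rows above; in practice you would generate the timeuuid 
once on the client so both tables share the same created value, rather than calling 
now() twice):

INSERT INTO all_answers (user_id, created, question_id, result)
VALUES (11b1e59c-ddfa-11e2-a28f-0800200c9a66, now(), 1, 'no');

INSERT INTO current_answers (user_id, question_id, created, result)
VALUES (11b1e59c-ddfa-11e2-a28f-0800200c9a66, 1, now(), 'no');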


On 06/24/2013 03:28 PM, Christopher J. Bottaro wrote:
 Yes, that makes sense and that article helped a lot, but I still have
 a few questions...

 The created_at in our answers table is basically used as a version id.
  When a user updates his answer, we don't overwrite the old answer,
 but rather insert a new answer with a more recent timestamp (the version).

 answers
 ---
 user_id | created_at | question_id | result
 ---
   1 | 2013-01-01 | 1   | yes
   1 | 2013-01-01 | 2   | blah
   1 | 2013-01-02 | 1   | no

 So the queries we really want to run are "find me all the answers for
 a given user at a given time".  So given the date of 2013-01-02 and
 user_id 1, we would want rows 2 and 3 returned (since row 3 obsoletes
 row 1).  Is it possible to do this with CQL given the current schema?

 As an aside, we can do this in Postgresql using window functions, not
 standard SQL, but pretty neat.

 We can alter our schema like so...

 answers
 ---
 user_id | start_at | end_at | question_id | result

 Where the start_at and end_at denote when an answer is active.  So the
 example above would become:

 answers
 ---
 user_id | start_at   | end_at | question_id | result
 
   1 | 2013-01-01 | 2013-01-02 | 1   | yes
   1 | 2013-01-01 | null   | 2   | blah
   1 | 2013-01-02 | null   | 1   | no

 Now we can query SELECT * FROM answers WHERE user_id = 1 AND start_at
 <= '2013-01-02' AND (end_at > '2013-01-02' OR end_at IS NULL).

 How would one define the partitioning key and cluster columns in CQL
 to accomplish this?  Is it as simple as PRIMARY KEY (user_id,
 start_at, end_at, question_id) (remembering that we sometimes want to
 limit by question_id)?

 Also, we are a bit worried about race conditions.  Consider two
 separate processes updating an answer for a given user_id /
 question_id.  There will be a race condition between the two to update
 the correct row's end_at field.  Does that make sense?  I can draw it
 out with ASCII tables, but I feel like this email is already too
 long... :P

 Thanks for the help.



 On Wed, Jun 19, 2013 at 2:28 PM, David McNelis dmcne...@gmail.com wrote:

 So, if you want to grab by the created_at and occasionally limit
 by question id, that is why you'd use created_at.

 The way the primary keys work is the first part of the primary key
 is the Partitioner key; that field is essentially the single
 cassandra row.  The second key is the order preserving key, so you
 can sort by that key.  If you have a third piece, then that is the
 secondary order preserving key.

 The reason you'd want to do (user_id, created_at, question_id) is
 because when you do a query on the keys, you MUST use the
 preceding pieces of the primary key.  So in your case, you could
 not do a query with just user_id and question_id with the
 user-created-question key.  Alternatively if you went with
 (user_id, question_id, created_at), you would not be able to
 include a range of created_at unless you were also filtering on
 the question_id.

 Does that make sense?

 As for the large rows, 10k is unlikely to cause you too many
 issues (unless the answer is potentially a big blob of text).
  Newer versions of cassandra deal with a lot of things in 

Re: what happens if coordinator node fails during write

2013-06-25 Thread sankalp kohli
Read this
http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
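Roughly, the idea described there: in 1.2 the coordinator first writes a logged 
batch to a batchlog (replicated to other nodes) before applying the mutations, so 
if the coordinator dies mid-write the batch is replayed instead of being left 
half-applied. A minimal sketch in CQL (table names are illustrative):

BEGIN BATCH
  INSERT INTO events (id, payload) VALUES (1, 'a');
  INSERT INTO events_by_day (day, id) VALUES ('2013-06-25', 1);
APPLY BATCH;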


On Tue, Jun 25, 2013 at 8:45 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 It depends on cassandra version. As far as I know in 1.2 coordinator logs
 request before it updates replicas. If it fails it will replay log on
 startup.
 In 1.1 you may have inconsistant state, because only part of your request
 is propagated to replicas.

 Thank you,
   Andrey


 On Tue, Jun 25, 2013 at 5:11 PM, Jiaan Zeng ji...@bloomreach.com wrote:

 Hi there,

 I am writing data to Cassandra by thrift client (not hector) and
 wonder what happen if the coordinator node fails. The same question
 applies for bulk loader which uses gossip protocol instead of thrift
 protocol. In my understanding, the HintedHandoff only takes care of
 the replica node fails.

 Thanks.

 --
 Regards,
 Jiaan





RE: copy data between clusters

2013-06-25 Thread S C
Is there any configuration reference that can help me?

Thanks,
SC

From: arthur.zuba...@aol.com
To: user@cassandra.apache.org
Subject: Re: copy data between clusters
Date: Tue, 25 Jun 2013 20:30:23 -0400







Hello SC,
 
whilst most of the sstableloader errors stem from incorrect setups I 
suspect this time you merely have a connectivity issue e.g. a firewall blocking 
traffic.


 

From: S C 
Sent: Tuesday, June 25, 2013 5:28 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters
 

Bob and Arthur - thanks for your inputs. 
 
I tried sstableloader but ran into below issue. Anything to do with the 
configuration to run sstableloader?
 

sstableloader -d 10.225.64.2,10.225.64.3 service/context
INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 
bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for 
service/context/service-context-hf-50: 128 ms.
INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 
bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for 
service/context/service-context-hf-49: 13 ms.
INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 
bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for 
service/context/service-context-hf-51: 2 ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db 
service/context/service-context-hf-49-Data.db 
service/context/service-context-hf-51-Data.db to [/10.225.64.2, 
/10.225.64.3]
INFO 14:43:50,124 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 
0%], 3 sstables.
DEBUG 14:43:50,124 Adding file 
service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file 
service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file 
service/context/service-context-hf-51-Data.db to be streamed.
INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db 
sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db 
sections=1 progress=0/164863 - 0%
INFO 14:43:50,159 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 
0%], 3 sstables.
DEBUG 14:43:50,159 Adding file 
service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,159 Adding file 
service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,160 Adding file 
service/context/service-context-hf-51-Data.db to be streamed.
INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db 
sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db 
sections=1 progress=0/164863 - 0%
 
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s 
(avg: 0MB/s)] WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to 
stream service/context/service-context-hf-49-Data.db sections=1 
progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: 
Invalid 
argument or cannot assign requested address)
WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s 
(avg: 0MB/s)] WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to 
stream service/context/service-context-hf-49-Data.db sections=1 
progress=0/7688939 - 0%. Retrying in 8000 ms. (java.net.SocketException: 
Invalid 
argument or cannot assign requested address)
WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s 
(avg: 0MB/s)] WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to 
stream service/context/service-context-hf-49-Data.db sections=1 
progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 14:44:02,309 Failed attempt 3 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 16000 ms. (java.net.SocketException: Invalid 

Re: copy data between clusters

2013-06-25 Thread Arthur Zubarev
This is the best reference I have seen so far: 
http://www.datastax.com/dev/blog/bulk-loading But I must tell you it is not updated 
to match the most recent changes in C*. I suggest you read through the comments, too.

From: S C 
Sent: Tuesday, June 25, 2013 10:23 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters

Is there any configuration reference that help me?

Thanks, 
SC





From: arthur.zuba...@aol.com
To: user@cassandra.apache.org
Subject: Re: copy data between clusters
Date: Tue, 25 Jun 2013 20:30:23 -0400


Hello SC,

whilst most of the sstableloader errors stem from incorrect setups I suspect 
this time you merely have a connectivity issue e.g. a firewall blocking traffic.

From: S C 
Sent: Tuesday, June 25, 2013 5:28 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters

Bob and Arthur - thanks for your inputs. 

I tried sstableloader but ran into below issue. Anything to do with the 
configuration to run sstableloader?

sstableloader -d 10.225.64.2,10.225.64.3 service/context
INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for service/context/service-context-hf-50: 
128 ms.
INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for service/context/service-context-hf-49: 
13 ms.
INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for service/context/service-context-hf-51: 2 
ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db 
service/context/service-context-hf-49-Data.db 
service/context/service-context-hf-51-Data.db to [/10.225.64.2, /10.225.64.3]
INFO 14:43:50,124 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%
INFO 14:43:50,159 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,160 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%

progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to stream 

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-25 Thread Mohammed Guller
Replication is 3 and the read consistency level is one. One of the non-coordinator 
nodes is crashing, so the OOM is happening before aggregation of the data to be 
returned.

Thanks for the info about the space allocated to young generation heap. That is 
helpful.

Mohammed

On Jun 25, 2013, at 1:28 PM, sankalp kohli kohlisank...@gmail.com wrote:

Your young gen is 1/4 of 1.8G which is 450MB. Also in slice queries, the 
co-ordinator will get the results from replicas as per consistency level used 
and merge the results before returning to the client.
What is the replication in your keyspace and what consistency you are reading 
with.
Also 55MB on disk will not mean 55MB in memory. The data is compressed on disk 
and also there are other overheads.



On Mon, Jun 24, 2013 at 8:38 PM, Mohammed Guller moham...@glassbeam.com wrote:
No deletes. In my test, I am just writing and reading data.

There is a lot of GC, but only on the younger generation. Cassandra terminates 
before the GC for old generation kicks in.

I know that our queries are reading an unusual amount of data. However, I 
expected it to throw a timeout exception instead of crashing. Also, don't 
understand why 1.8 Gb heap is getting full when the total data stored in the 
entire Cassandra cluster is less than 55 MB.

Mohammed

On Jun 21, 2013, at 7:30 PM, sankalp kohli kohlisank...@gmail.com wrote:

Looks like you are putting a lot of pressure on the heap by doing a slice query 
on a large row.
Do you have a lot of deletes/tombstones on the rows? That might be causing a 
problem.
Also, why are you returning so many columns at once? You can use the auto-paginate 
feature in Astyanax; see the sketch below.

Also, do you see a lot of GC happening?
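
(For reference, a minimal sketch of the Astyanax paging pattern being suggested here; 
the column family name, the String key/column types and the page size are made up for 
illustration:)

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.query.RowQuery;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.util.RangeBuilder;

public class WideRowPager {
    // Column family name and String key/column types are illustrative only.
    private static final ColumnFamily<String, String> CF_WIDE =
            new ColumnFamily<String, String>("wide_rows",
                    StringSerializer.get(), StringSerializer.get());

    public static void readInPages(Keyspace keyspace, String rowKey)
            throws ConnectionException {
        // Fetch the row in pages of 1,000 columns instead of materializing
        // all 80,000 columns in one call.
        RowQuery<String, String> query = keyspace.prepareQuery(CF_WIDE)
                .getKey(rowKey)
                .autoPaginate(true)
                .withColumnRange(new RangeBuilder().setLimit(1000).build());

        ColumnList<String> page;
        while (!(page = query.execute().getResult()).isEmpty()) {
            // process this page of columns, then loop for the next one
        }
    }
}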


On Fri, Jun 21, 2013 at 1:13 PM, Jabbar Azam aja...@gmail.com wrote:
Hello Mohammed,

You should increase the heap space. You should also tune garbage collection so that 
young-generation objects are collected faster, relieving pressure on the heap. We 
have been using JDK 7 with the G1 collector; it does a better job than my attempts 
at tuning the JDK 6 collectors.
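
(For example, and this is illustrative only, not a tested configuration: the JVM 
options live in conf/cassandra-env.sh, where the stock settings select CMS/ParNew 
via -XX:+UseConcMarkSweepGC and -XX:+UseParNewGC; switching to G1 means replacing 
those with -XX:+UseG1GC and not pinning the young-generation size, since G1 sizes 
it adaptively.)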

Bear in mind, though, that the OS, the row cache and the file system cache all need 
memory too, and actual memory usage will depend on the workload of your system.

I'm sure you'll also get good advice from other members of the mailing list.

Thanks

Jabbar Azam


On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote:
We have a 3-node Cassandra cluster on AWS. These nodes are running Cassandra 
1.2.2 and have 8GB memory. We didn't change any of the default heap or GC 
settings, so each node is allocating 1.8GB of heap space. The rows are wide; 
each row stores around 260,000 columns. We are reading the data using Astyanax. 
If our application tries to read 80,000 columns each from 10 or more rows at 
the same time, some of the nodes run out of heap space and terminate with an OOM 
error. Here is the error message:

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
at 
org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
at org.apache.cassandra.db.Table.getRow(Table.java:355)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
   at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at 

Re: Heap is not released and streaming hangs at 0%

2013-06-25 Thread aaron morton
 bloom_filter_fp_chance value that was changed from default to 0.1, looked at 
 the filters and they are about 2.5G on disk and I have around 8G of heap.
 I will try increasing the value to 0.7 and report my results. 
You need to re-write the sstables on disk using nodetool upgradesstables. 
Otherwise only newly written sstables will have the 0.1 setting. 

 I will try increasing the value to 0.7 and report my results. 
No need to, the result will probably be something like "Oh no, really, what, how, 
please make it stop" :)
0.7 will mean reads will hit most / all of the SSTables for the CF. 
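
(For a rough sense of scale, using the standard Bloom filter sizing formula, 
bits per key ~= -ln(p) / (ln 2)^2:

  p = 0.000744  ->  ~15 bits per key
  p = 0.1       ->   ~5 bits per key

so moving from the very old 0.000744 setting to 0.1 shrinks the bloom filters 
roughly threefold, without sending reads to every sstable the way 0.7 would.)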

I covered a high row-count situation in one of my talks at the summit this month; the 
slide deck is here: 
http://www.slideshare.net/aaronmorton/cassandra-sf-2013-in-case-of-emergency-break-glass
 and the videos will soon be up at Planet Cassandra. 

Rebuild the sstables, then reduce index_interval if you still need to reduce 
memory pressure. 
 
Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com


On 22/06/2013, at 1:17 PM, sankalp kohli kohlisank...@gmail.com wrote:

 I will take a heap dump and see what's in there rather than guessing. 
 
 
 On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot btal...@aeriagames.com wrote:
 bloom_filter_fp_chance = 0.7 is probably way too large to be effective and 
 you'll probably have issues compacting deleted rows and get poor read 
 performance with a value that high.  I'd guess that anything larger than 0.1 
 might as well be 1.0.
 
 -Bryan
 
 
 
 On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote:
 
 On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.com wrote:
  nodetool -h localhost flush didn't do much good.
 Do you have 100's of millions of rows ?
 If so see recent discussions about reducing the bloom_filter_fp_chance and 
 index_sampling. 
 Yes, I have 100's of millions of rows. 
  
 
 If this is an old schema you may be using the very old setting of 0.000744 
 which creates a lot of bloom filters. 
 
 bloom_filter_fp_chance value that was changed from default to 0.1, looked at 
 the filters and they are about 2.5G on disk and I have around 8G of heap.
 I will try increasing the value to 0.7 and report my results. 
 
 It also appears to be a case of hard GC failure (as Rob mentioned): the heap is 
 never released even after 24+ hours of idle time, and the JVM needs to be 
 restarted to reclaim it.
 
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:
 
 If you want, you can try to force a GC through JConsole: Memory -> Perform GC.
 
 It theoretically triggers a full GC; exactly when it happens depends on the JVM.
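
(If there is no GUI handy, the same MBean operation that JConsole's button invokes 
can be triggered from a few lines of Java over JMX. A rough sketch, assuming 
Cassandra's default JMX port 7199, no JMX authentication or SSL, and network access 
to the node:)

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceGc {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Proxy for the java.lang:type=Memory MXBean; its gc() operation is
            // what JConsole's Memory -> Perform GC button calls.
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            memory.gc();
        } finally {
            connector.close();
        }
    }
}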
 
 -Wei
 
 From: Robert Coli rc...@eventbrite.com
 To: user@cassandra.apache.org
 Sent: Tuesday, June 18, 2013 10:43:13 AM
 Subject: Re: Heap is not released and streaming hangs at 0%
 
 On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:
 But then shouldn't the JVM GC it eventually? I can still see Cassandra alive
 and kicking, but it looks like the heap is locked up even after the traffic has
 long stopped.
 
No, when the GC system fails this hard it is often a permanent failure
which requires a restart of the JVM.
 
  nodetool -h localhost flush didn't do much good.
 
 This adds support to the idea that your heap is too full, and not full
 of memtables.
 
 You could try nodetool -h localhost invalidatekeycache, but that
 probably will not free enough memory to help you.
 
 =Rob
 
 
 
 



Re: Cassandra 1.0.9 Performance

2013-06-25 Thread aaron morton
 serving a load of approximately 600GB
Is that 600GB in the cluster or 600GB per node? 
In pre-1.2 days we recommended around 300GB to 500GB per node with spinning disks 
and 1GbE networking. It's a soft rule of thumb, not a hard rule. Above that size, 
repair and replacing a failed node can take a long time.
 
 Does anyone have CPU/memory/network graphs (e.g. Cacti) over the last 1-2 
 months they are willing to share of their Cassandra database nodes?
If you can share yours, along with any specific concerns you may have, we may be 
able to help. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/06/2013, at 1:14 PM, G man gmanli...@gmail.com wrote:

 Hi All,
 
 We are running a 1.0.9 cluster with 3 nodes (RF=3) serving a load of 
 approximately 600GB, and since I am fairly new to Cassandra, I'd like to 
 compare notes with other people running a cluster of similar size (perhaps 
 not in the amount of data, but the number of nodes).
 
 Does anyone have CPU/memory/network graphs (e.g. Cacti) over the last 1-2 
 months they are willing to share of their Cassandra database nodes?
 
 Just trying to compare our patterns with others to see if they are normal.
 
 Thanks in advance.
 G



Re: about FlushWriter All time blocked

2013-06-25 Thread aaron morton
 FlushWriter                       0         0          191         0                12

This means there were 12 times when the code wanted to put a memtable in the queue 
to be flushed to disk but the queue was full. 

The length of this queue is controlled by memtable_flush_queue_size 
https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299 
and memtable_flush_writers.

When this happens, an internal lock around the commit log is held, which prevents 
writes from being processed. 

In general it means the IO system cannot keep up. It can sometimes happen when 
snapshot is used, as all the CFs are flushed to disk at once. I also suspect it 
happens sometimes when a commit log segment is flushed and there are a lot of 
dirty CFs, but I've never proved it. 

Increase memtable_flush_queue_size following the help in the yaml file. If you 
do not use secondary indexes, are you using snapshots?
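
(For illustration only, and assuming I remember the 1.2 defaults correctly: 
memtable_flush_queue_size defaults to 4, so raising it to, say, 8 gives flushes 
more headroom, while memtable_flush_writers defaults to the number of data 
directories and can be raised if the disks can absorb more parallel flush IO. 
Check the comments in your own cassandra.yaml before changing either.)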

Hope that helps. 
A
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/06/2013, at 3:41 PM, yue.zhang yue.zh...@chinacache.com wrote:

 3 nodes
 CentOS
 CPU: 8 cores, memory: 32GB
 Cassandra 1.2.5
 My scenario: many counter increments; every node has one client program, and 
 performance is 400 writes/sec per client (which is very slow)
  
 My question:
 - nodetool tpstats
 -
 Pool Name                    Active   Pending    Completed   Blocked  All time blocked
 ReadStage                         0         0         8453         0                 0
 RequestResponseStage              0         0    138303982         0                 0
 MutationStage                     0         0    172002988         0                 0
 ReadRepairStage                   0         0            0         0                 0
 ReplicateOnWriteStage             0         0     82246354         0                 0
 GossipStage                       0         0      1052389         0                 0
 AntiEntropyStage                  0         0            0         0                 0
 MigrationStage                    0         0            0         0                 0
 MemtablePostFlusher               0         0          670         0                 0
 FlushWriter                       0         0          191         0                12
 MiscStage                         0         0            0         0                 0
 commitlog_archiver                0         0            0         0                 0
 InternalResponseStage             0         0            0         0                 0
 HintedHandoff                     0         0           56         0                 0
 ---
 FlushWriter "All time blocked" = 12. I restarted the node, but it did not help. Is 
 this normal?
  
 thx
  
 -heipark