Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Janne Jalkanen

 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].

Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using 
them for semi-wide frequently-updated CounterColumns and they're performing 
much better on LCS than on STCS.

 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier, including 1.0.x) you need to run offline scrub using
 Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run offline
 scrub.

The  1.1.5 README does not mention this. Should it?

/Janne



Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-12 Thread Janne Jalkanen

On 12 Sep 2012, at 00:50, Omid Aladini wrote:

 On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
 janne.jalka...@ecyrd.com wrote:
 
 Does this mean that LCS on 1.0.x should be considered unsafe to
 use? I'm using them for semi-wide frequently-updated CounterColumns
 and they're performing much better on LCS than on STCS.
 
 That's true. Unsafe in the sense that your data might not be in the
 right shape with respect to order of keys in sstables and LCS's
 properties and you might need to offline-scrub when you upgrade to the
 latest 1.1.x.

OK, so what's the worst case here? Data loss? Bad performance?

 The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
 I agree it would be helpful to have it on NEWS.txt.

I'll file a bug on this, unless someone can get to it first :)

/Janne


Re: Cassandra nodes failing with OOM

2012-11-19 Thread Janne Jalkanen

Something that bit us recently was the size of bloom filters: we have a column 
family which is mostly written to, and only read sequentially, so we were able 
to free a lot of memory and decrease GC pressure by increasing 
bloom_filter_fp_chance for that particular CF.

This on 1.0.12.
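
For a rough feel of why this helps, here's a back-of-the-envelope sketch (plain Python; the key count is hypothetical and this is the textbook bloom filter formula, not Cassandra's exact sizing code):

import math

def bloom_bits_per_key(fp_chance):
    # standard formula: bits per element ~= -ln(p) / (ln 2)^2
    return -math.log(fp_chance) / (math.log(2) ** 2)

keys = 100 * 1000 * 1000   # hypothetical number of row keys on one node
for p in (0.0007, 0.01, 0.1):
    mb = bloom_bits_per_key(p) * keys / 8 / 1024 / 1024
    print("fp_chance=%.4f -> roughly %.0f MB of bloom filter" % (p, mb))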

/Janne

On 18 Nov 2012, at 21:38, aaron morton wrote:

 1. How many GCInspector warnings per hour are considered 'normal'?
 None. 
 A couple during compaction or repair is not the end of the world. But if you 
 have enough per hour to think about, it's too many. 
 
 2. What should be the next thing to check?
 Try to determine if the GC activity correlates to application workload, 
 compaction or repair. 
 
 Try to determine what the working set of the server is. Watch the GC activity 
 (via gc logs or JMX) and see what the size of the tenured heap is after a 
 CMS. Or try to calculate it 
 http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
 
 Look at your data model and query patterns for places where very large 
 queries are being made. Or rows that are very long lived with a lot of 
 deletes (probably not as much of an issue with LDB). 
 
 
 3. What are the possible failure reasons and how to prevent those?
 
 As above. 
 As a workaround, sometimes drastically slowing down compaction can help. For 
 LDB try reducing in_memory_compaction_limit_in_mb and 
 compaction_throughput_mb_per_sec
 
 
 Hope that helps. 
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 17/11/2012, at 7:07 PM, Ивaн Cобoлeв sobol...@gmail.com wrote:
 
 Dear Community, 
 
 advice from you needed. 
 
 We have a cluster, 1/6 of whose nodes died for various reasons (3 had an OOM 
 message). 
 Nodes died in groups of 3, 1, 2. No adjacent nodes died, though we use 
 SimpleSnitch.
 
 Version: 1.1.6
 Hardware:  12Gb RAM / 8 cores(virtual)
 Data:  40Gb/node
 Nodes:   36 nodes
 
 Keyspaces:2(RF=3, R=W=2) + 1(OpsCenter)
 CFs:36, 2 indexes
 Partitioner:  Random
 Compaction:   Leveled(we don't want 2x space for housekeeping)
 Caching:  Keys only
 
 All is pretty much standard apart from the one CF receiving writes in 64K 
 chunks and having sstable_size_in_mb=100.
 No JNA installed - this is to be fixed soon.
 
 Checking sysstat/sar I can see 80-90% CPU idle, no anomalies in IO, and the 
 only change is network activity spiking. 
 All the nodes before dying had the following on logs:
 INFO [ScheduledTasks:1] 2012-11-15 21:35:05,512 StatusLogger.java (line 72) 
 MemtablePostFlusher   1 4 0
 INFO [ScheduledTasks:1] 2012-11-15 21:35:13,540 StatusLogger.java (line 72) 
 FlushWriter   1 3 0
 INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 72) 
 HintedHandoff 1 6 0
 INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 77) 
 CompactionManager 5 9
 
 GCInspector warnings were there too; they showed the heap going from ~0.8 GB to 3 GB in 
 5-10 minutes.
 
 So, could you please give me a hint on:
 1. How many GCInspector warnings per hour are considered 'normal'?
 2. What should be the next thing to check?
 3. What are the possible failure reasons and how to prevent those?
 
 Thank you very much in advance,
 Ivan



Re: Wide rows in CQL 3

2013-01-09 Thread Janne Jalkanen

On 10 Jan 2013, at 01:30, Edward Capriolo edlinuxg...@gmail.com wrote:

 Column families that mix static and dynamic columns are pretty common. In 
 fact it is pretty much the default case: you have a default validator, then 
 some columns have specific validators. In the old days people used to say 
 you only need one column family; you would subdivide your row key into parts: 
 username=username, password=password, friend-friene = friends, pet-pets = 
 pets. It's very efficient and very easy if you understand what a slice is. Is 
 everyone else just adding a column family every time they have new data? :) 
 Sounds very un-no-sql-like. 

Well, we for sure are heavily mixing static and dynamic columns; it's quite 
useful, really. Which is why upgrading to CQL3 isn't really something I've 
considered seriously at any point.

 Most people are probably going to store column names as tersely as possible. 
 You're not going to store 'password' as a multibyte UTF8('password'). You store 
 it as ascii('password') (or really ascii('pw')).

UTF8('password') === ascii('password'), actually - as long as you're within 
ascii range, UTF8 and ascii are equal byte for byte. It's not until code points 
> 128 where you start getting multibytes.
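
A quick way to convince yourself (plain Python, just illustrating the byte-for-byte claim):

# ASCII-range text encodes to identical bytes in UTF-8 and ASCII.
s = "password"
print(s.encode("utf-8") == s.encode("ascii"))   # True - one byte per character
print("münchen".encode("utf-8"))                # multibyte only appears past the ASCII range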

Having said that, doesn't the sparse storage lend itself really well to 
further column name optimisation - like using a single byte to denote the 
column name and then having a lookup table?  The server could do a lot of nice 
tricks in this area, when afforded so by a tighter schema. Also, I think that 
compression pretty much does this already - the effect is the same even if the 
mechanism is different.

/Janne



HintedHandoff IOError?

2013-03-11 Thread Janne Jalkanen
I keep seeing these in my log.  Three-node cluster, one node is working fine, 
but two other nodes have increased latencies and these in the error logs (might 
of course be unrelated). No obvious GC pressure, no disk errors that I can see. 
 Ubuntu 12.04 on EC2, Java 7. Repair is run regularly.

My three questions: 1) should I worry, 2) what might be going on, and 3) is 
there any way to get rid of these? Can I just blow my HintedHandoff table to 
smithereens?

The only relevant issue I might see is CASSANDRA-5158, but it's not about HH.

Any more info I could dig?

Node A:

ERROR [OptionalTasks:1] 2013-03-11 13:34:19,153 AbstractCassandraDaemon.java 
(line 135) Exception in thread Thread[OptionalTasks:1,5,main]
java.io.IOError: java.io.EOFException: bloom filter claims to be 0 bytes, 
longer than entire row size 2147483647
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:101)
at 
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:66)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:86)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator$1.create(SSTableScanner.java:198)
at 
org.apache.cassandra.db.columniterator.LazyColumnIterator.getSubIterator(LazyColumnIterator.java:54)
at 
org.apache.cassandra.db.columniterator.LazyColumnIterator.getColumnFamily(LazyColumnIterator.java:66)
at 
org.apache.cassandra.db.RowIteratorFactory$2.reduce(RowIteratorFactory.java:95)
at 
org.apache.cassandra.db.RowIteratorFactory$2.reduce(RowIteratorFactory.java:79)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1403)
at 
org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1399)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1476)
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1455)
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1450)
at 
org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:406)
at 
org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:85)
at 
org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:120)
at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:79)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException: bloom filter claims to be 0 bytes, longer than 
entire row size 2147483647
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:129)
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:110)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:113)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:96)
... 30 more

Node B:

ERROR [OptionalTasks:1] 2013-03-11 13:51:02,177 AbstractCassandraDaemon.java 
(line 135) Exception in thread Thread[OptionalTasks:1,5,main]
java.io.IOError: java.io.EOFException
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:101)
at 
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:66)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:86)
at 

Re: HintedHandoff IOError?

2013-03-15 Thread Janne Jalkanen

JMX ended up just with lots more IOErrors. Did a rolling restart of the cluster 
and removed the HH family in the meantime. That seemed to do the trick. Thanks!

/Janne

On Mar 14, 2013, at 06:58 , aaron morton aa...@thelastpickle.com wrote:

 What is the sanctioned way of removing hints? rm -f HintsColumnFamily*? 
 Truncate from CLI?
 There is a JMX command to do it for a particular node. 
 But if you just want to remove all of them, stop and delete the files. 
 
  the only ones with zero size are the -tmp- files.  It seems odd…
 Temp files are created during compaction and flushing sstables. 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 11/03/2013, at 11:19 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:
 
 
 Oops, forgot to mention that, did I… Cass 1.1.10. 
 
 What is the sanctioned way of removing hints? rm -f HintsColumnFamily*? 
 Truncate from CLI?
 
 This is ls -l of my /system/HintsColumnFamily/ btw - the only ones with zero 
 size are the -tmp- files.  It seems odd…
 
 -rw-rw-r--  1 ubuntu ubuntu 86373144 Jan 26 21:39 
 system-HintsColumnFamily-hf-11-Data.db
 -rw-rw-r--  1 ubuntu ubuntu   80 Jan 26 21:39 
 system-HintsColumnFamily-hf-11-Digest.sha1
 -rw-rw-r--  1 ubuntu ubuntu  976 Jan 26 21:39 
 system-HintsColumnFamily-hf-11-Filter.db
 -rw-rw-r--  1 ubuntu ubuntu   11 Jan 26 21:39 
 system-HintsColumnFamily-hf-11-Index.db
 -rw-rw-r--  1 ubuntu ubuntu 4348 Jan 26 21:39 
 system-HintsColumnFamily-hf-11-Statistics.db
 -rw-rw-r--  1 ubuntu ubuntu  569 Feb 27 08:33 
 system-HintsColumnFamily-hf-23-Data.db
 -rw-rw-r--  1 ubuntu ubuntu   80 Feb 27 08:33 
 system-HintsColumnFamily-hf-23-Digest.sha1
 -rw-rw-r--  1 ubuntu ubuntu 1936 Feb 27 08:33 
 system-HintsColumnFamily-hf-23-Filter.db
 -rw-rw-r--  1 ubuntu ubuntu   11 Feb 27 08:33 
 system-HintsColumnFamily-hf-23-Index.db
 -rw-rw-r--  1 ubuntu ubuntu 4356 Feb 27 08:33 
 system-HintsColumnFamily-hf-23-Statistics.db
 -rw-rw-r--  1 ubuntu ubuntu  5500155 Feb 27 08:57 
 system-HintsColumnFamily-hf-24-Data.db
 -rw-rw-r--  1 ubuntu ubuntu   80 Feb 27 08:57 
 system-HintsColumnFamily-hf-24-Digest.sha1
 -rw-rw-r--  1 ubuntu ubuntu   16 Feb 27 08:57 
 system-HintsColumnFamily-hf-24-Filter.db
 -rw-rw-r--  1 ubuntu ubuntu   26 Feb 27 08:57 
 system-HintsColumnFamily-hf-24-Index.db
 -rw-rw-r--  1 ubuntu ubuntu 4340 Feb 27 08:57 
 system-HintsColumnFamily-hf-24-Statistics.db
 -rw-rw-r--  1 ubuntu ubuntu0 Feb 27 08:57 
 system-HintsColumnFamily-tmp-hf-25-Data.db
 -rw-rw-r--  1 ubuntu ubuntu0 Feb 27 08:57 
 system-HintsColumnFamily-tmp-hf-25-Index.db
 
 
 /Janne
 
 On Mar 12, 2013, at 08:07 , aaron morton aa...@thelastpickle.com wrote:
 
 What version of cassandra are you using?
 I would stop each node and delete the hints. If it happens again it could 
 either indicate a failing disk or a bug. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 11/03/2013, at 2:13 PM, Robert Coli robert.d.a.c...@gmail.com wrote:
 
 On Mon, Mar 11, 2013 at 7:05 AM, Janne Jalkanen
 janne.jalka...@ecyrd.com wrote:
 I keep seeing these in my log.  Three-node cluster, one node is working 
 fine, but two other nodes have increased latencies and these in the error 
 logs (might of course be unrelated). No obvious GC pressure, no disk 
 errors that I can see.  Ubuntu 12.04 on EC2, Java 7. Repair is run 
 regularly.
 
 My three questions: 1) should I worry, 2) what might be going on, and 
 3) is there any way to get rid of these? Can I just blow my HintedHandoff 
 table to smithereens?
 
 http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/io/sstable/IndexHelper.java
 
 public static Filter defreezeBloomFilter(FileDataInput file, long
 maxSize, boolean useOldBuffer) throws IOException
 {
     int size = file.readInt();
     if (size > maxSize || size <= 0)
         throw new EOFException("bloom filter claims to be " + size
                                + " bytes, longer than entire row size " + maxSize);
     ByteBuffer bytes = file.readBytes(size);
 
 
 Based on the above, I would suspect either a zero byte -Filter.db file
 or a corrupt one. Probably worry a little bit, but only a little bit
 unless your cluster is RF=1.
 
 =Rob
 
 
 



Re: secondary index problem

2013-03-15 Thread Janne Jalkanen

This could be either of the following bugs (which might be the same thing).  I 
get it too every time I recycle a node on 1.1.10.

https://issues.apache.org/jira/browse/CASSANDRA-4973
or
https://issues.apache.org/jira/browse/CASSANDRA-4785

/Janne

On Mar 15, 2013, at 23:24 , Brett Tinling btinl...@lacunasystems.com wrote:

 We have a CF with an indexed column 'type', but we get incomplete results 
 when we query that CF for all rows matching 'type'.  We can find the missing 
 rows if we query by key.
 
 * we are seeing this on a small, single node, 1.2.2 instance with few rows.
 * we use thrift execute_cql_query, no CL is specified
 * none of repair, restart, compact, scrub helped
 
 Finally, nodetool rebuild_index fixed it.  
 
 Is index rebuild something we need to do periodically?  How often?  Is there 
 a way to know when it needs to be done?  Do we have to run rebuild on all 
 nodes?
 
 We have not noticed this until 1.2
 
 Regards,
  - Brett
 
 
 
 
 
 



Munin plugins stupid question

2011-06-22 Thread Janne Jalkanen
Heya!

I know I should probably be able to figure this out on my own, but...

The Cassandra Munin plugins (all of them) define in their 
storageproxy_latency.conf the following (this is from a 0.6 config):

read_latency.jmxObjectName org.apache.cassandra.db:type=StorageProxy
read_latency.jmxAttributeName TotalReadLatencyMicros
read_latency.type DERIVE
read_latency.cdef read_latency,300,/

Ok... Why is the derived difference divided by 3 million?  A three-second 
update interval?

/Janne

1.0.3 CLI oddities

2011-11-28 Thread Janne Jalkanen
Hi!

(Asked this on IRC too, but didn't get anyone to respond, so here goes...)

Is it just me, or are these real bugs? 

On 1.0.3, from CLI: update column family XXX with gc_grace = 36000; just says 
null with nothing logged.  Previous value is the default.

Also, on 1.0.3, update column family XXX with 
compression_options={sstable_compression:SnappyCompressor,chunk_length_kb:64}; 
returns Internal error processing system_update_column_family and log says 
Invalid negative or null chunk_length_kb (stack trace below)

Setting the compression options worked on 1.0.0 when I was testing (though my 
64 kB became 64 MB, but I believe this was fixed in 1.0.3.)

Did the syntax change between 1.0.0 and 1.0.3? Or am I doing something wrong? 

The database was upgraded from 0.6.13 to 1.0.0, then scrubbed, then compression 
options set to some CFs, then upgraded to 1.0.3 and trying to set compression 
on other CFs.

Stack trace:

ERROR [pool-2-thread-68] 2011-11-28 09:59:26,434 Cassandra.java (line 4038) 
Internal error processing system_update_column_family
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
Invalid negative or null chunk_length_kb
at 
org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:898)
at 
org.apache.cassandra.thrift.CassandraServer.system_update_column_family(CassandraServer.java:1089)
at 
org.apache.cassandra.thrift.Cassandra$Processor$system_update_column_family.process(Cassandra.java:4032)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
chunk_length_kb
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:890)
... 7 more
Caused by: java.io.IOException: 
org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
chunk_length_kb
at 
org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
at 
org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
... 3 more
Caused by: org.apache.cassandra.config.ConfigurationException: Invalid negative 
or null chunk_length_kb
at 
org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
at 
org.apache.cassandra.io.compress.CompressionParameters.create(CompressionParameters.java:52)
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:796)
at 
org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:74)
... 7 more
ERROR [MigrationStage:1] 2011-11-28 09:59:26,434 AbstractCassandraDaemon.java 
(line 133) Fatal exception in thread Thread[MigrationStage:1,5,main]
java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
Invalid negative or null chunk_length_kb
at 
org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
at 
org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.cassandra.config.ConfigurationException: Invalid negative 
or null chunk_length_kb
at 
org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
at 
org.apache.cassandra.io.compress.CompressionParameters.create(CompressionParameters.java:52)
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:796)
at 
org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:74)

Re: [RELEASE] Apache Cassandra 1.0.5 released

2011-12-02 Thread Janne Jalkanen

Would be glad to be of any help; it's kind of annoying.

* Nothing unusual on any nodes that I can see
* Cannot reproduce on a single-node cluster; I see it only on our prod cluster 
which was running 0.6.13 until this point (cluster conf is attached to the JIRA 
issue mentioned below).

Let me know of anything that I can try, short of taking my production cluster 
offline :-P

/Janne

On Dec 2, 2011, at 20:42 , Jonathan Ellis wrote:

 The first step towards determining how serious it is, is showing us
 how to reproduce it or otherwise narrowing down what could be causing
 it, because timeouts can be caused by a lot of non-bug scenarios.
 Does it occur for every query or just some?  Is there anything unusual
 on the coordinator or replica nodes, like high CPU?  Can you reproduce
 with the stress tool?  Can you reproduce on a single-node-cluster?
 That kind of thing.
 
 On Fri, Dec 2, 2011 at 12:18 PM, Pierre Belanger
 pierre.belan...@xobni.com wrote:
 Hello,
 
 Is this bug serious enough for 1.0.6 to come out shortly or not?
 
 Thank you,
 PBR
 
 
 
 On Thu, Dec 1, 2011 at 6:05 PM, Zhong Li z...@voxeo.com wrote:
 
 After upgrading to 1.0.5, RangeSlice got timeouts. Ticket
 https://issues.apache.org/jira/browse/CASSANDRA-3551
 
 On Dec 1, 2011, at 5:43 PM, Evgeniy Ryabitskiy wrote:
 
 +1
 After upgrading to 1.0.5 we also get Timeout exceptions on Secondary Index 
 search (get_indexed_slices API).
 
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Upgrade from 0.6 to 1.0

2011-12-07 Thread Janne Jalkanen

I did this just last week, 0.6.13 -> 1.0.5.

Basically, I grabbed the 0.7 distribution and ran the configuration conversion 
tool there first, but since the config it produced wasn't compatible with 1.0, 
in the end I just opened two editor windows, one with my 0.6 config and one 
with the 1.0 cassandra.yaml file and modified it by hand. It wasn't 
particularly hard, since most of my config was happy with the defaults. Also I 
created a schema file based on the CF family structure I had, also by hand, but 
I've got only ten or so CFs.

I ran nodetool compact on all nodes, then nodetool drain, then shut down the 
cluster. Upgraded to 1.0, restarted cluster. Then imported the schema file 
(with cassandra-cli --file myschema.txt). Then ran nodetool scrub and let my 
clients connect. The entire process was fairly smooth except for some 
client-side oddities which were my own fault, and took about (compaction time + 
1) hours, most of which was spent in debugging my application. I don't think 
compaction is necessary, but I wanted to make sure I wouldn't run into disk 
space problems.

After this everything has been just fine, except for CASSANDRA-3551, which has 
been causing headaches for us, but not badly enough to make me seriously 
consider downgrading.

/Janne

On Dec 6, 2011, at 22:12 , Jehan Bing wrote:

 Hi,
 
 I've seen recent posts saying it was possible to upgrade directly from 0.6 to 
 1.0. But how?
 
 I ran nodetool drain on all my nodes and shut them down.
 
 However, there is no config-convert tool anymore. Since I'm basically using 
 the default config, is it important? Or is it OK to just use the default one 
 and change the few settings I need?
 
 Also, there is no schematool anymore either. So how do I load the schema? Can 
 I just create one using cassandra-cli? Will cassandra then load the existing 
 data?
 
 Lastly, when I tried to start cassandra 1.0.5, I got the following error in 
 cassandra.log:
 
ERROR 11:41:19,399 Exception encountered during startup
java.lang.AssertionError
at 
 org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:295)
at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:150)
at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:337)
at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
java.lang.AssertionError
at 
 org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:295)
at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:150)
at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:337)
at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Exception encountered during startup: null
 
 Then cassandra exits so I can't run nodetool repair (or create the schema if 
 that's the problem).
 
 So how should I proceed?
 Or maybe I misread the previous post and I should actually do 0.6 -> 0.7 -> 1.0?
 
 
 Thanks,
Jehan



Re: 1.0.3 CLI oddities

2011-12-14 Thread Janne Jalkanen

Correct. 1.0.6 fixes this for me.

/Janne

On 12 Dec 2011, at 02:57, Chris Burroughs wrote:

 Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3558 and the
 other tickets referenced there.
 
 On 11/28/2011 05:05 AM, Janne Jalkanen wrote:
 Hi!
 
 (Asked this on IRC too, but didn't get anyone to respond, so here goes...)
 
 Is it just me, or are these real bugs? 
 
 On 1.0.3, from CLI: update column family XXX with gc_grace = 36000; just 
 says null with nothing logged.  Previous value is the default.
 
 Also, on 1.0.3, update column family XXX with 
 compression_options={sstable_compression:SnappyCompressor,chunk_length_kb:64};
  returns Internal error processing system_update_column_family and log 
 says Invalid negative or null chunk_length_kb (stack trace below)
 
 Setting the compression options worked on 1.0.0 when I was testing (though 
 my 64 kB became 64 MB, but I believe this was fixed in 1.0.3.)
 
 Did the syntax change between 1.0.0 and 1.0.3? Or am I doing something 
 wrong? 
 
 The database was upgraded from 0.6.13 to 1.0.0, then scrubbed, then 
 compression options set to some CFs, then upgraded to 1.0.3 and trying to 
 set compression on other CFs.
 
 Stack trace:
 
 ERROR [pool-2-thread-68] 2011-11-28 09:59:26,434 Cassandra.java (line 4038) 
 Internal error processing system_update_column_family
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
 java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
 Invalid negative or null chunk_length_kb
  at 
 org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:898)
  at 
 org.apache.cassandra.thrift.CassandraServer.system_update_column_family(CassandraServer.java:1089)
  at 
 org.apache.cassandra.thrift.Cassandra$Processor$system_update_column_family.process(Cassandra.java:4032)
  at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
  at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:680)
 Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
 org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
 chunk_length_kb
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
  at java.util.concurrent.FutureTask.get(FutureTask.java:83)
  at 
 org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:890)
  ... 7 more
 Caused by: java.io.IOException: 
 org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
 chunk_length_kb
  at 
 org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
  at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
  at 
 org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  ... 3 more
 Caused by: org.apache.cassandra.config.ConfigurationException: Invalid 
 negative or null chunk_length_kb
  at 
 org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
  at 
 org.apache.cassandra.io.compress.CompressionParameters.create(CompressionParameters.java:52)
  at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:796)
  at 
 org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:74)
  ... 7 more
 ERROR [MigrationStage:1] 2011-11-28 09:59:26,434 
 AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
 Thread[MigrationStage:1,5,main]
 java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
 Invalid negative or null chunk_length_kb
  at 
 org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
  at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
  at 
 org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:680)
 Caused by: org.apache.cassandra.config.ConfigurationException: Invalid 
 negative or null chunk_length_kb
  at 
 org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167

Re: Counters and Top 10

2011-12-24 Thread Janne Jalkanen

In our case we didn't need an exact daily top-10 list of pages, just a good 
guess of it.  So the way we did it was to insert a column with a short TTL 
(e.g. 12 hours) with the page id as the column name.  Then, when constructing 
the top-10 list, we'd just slice through the entire list of unexpired page 
id's, get the actual activity data for each from another CF and then sort.  The 
theory is that if a page is popular, it'd be referenced at least once in the 
past 12 hours anyway.  Depending on the size of your hot pages and the 
frequency at which you'd need the top-10 list, you can then tune the TTL 
accordingly.  We started at 24 hrs, then went down to 12 and then gradually 
downwards.

So while it's not guaranteed to be the precise top-10 list for the day, it is a 
fairly accurate sampling of one.
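
A minimal sketch of the idea, with plain Python dicts standing in for the two column families (the real thing uses TTL'd columns in Cassandra; the names here are made up):

import time

recently_seen = {}   # "recent pages" CF: page_id -> expiry timestamp (the TTL'd column)
activity = {}        # activity CF: page_id -> view counter

def record_view(page_id, ttl=12 * 3600):
    recently_seen[page_id] = time.time() + ttl   # re-inserting refreshes the TTL
    activity[page_id] = activity.get(page_id, 0) + 1

def top10():
    now = time.time()
    live = [p for p, exp in recently_seen.items() if exp > now]  # the "slice" of unexpired ids
    return sorted(live, key=lambda p: activity.get(p, 0), reverse=True)[:10]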

/Janne

On 23 Dec 2011, at 11:52, aaron morton wrote:

 Counters only update the value of the column, they cannot be used as column 
 names. So you cannot have a dynamically updating top ten list using counters.
 
 You have a couple of options. First use something like redis if that fits 
 your use case. Redis could either be the database of record for the counts. 
 Or just an aggregation layer, write the data to cassandra and sorted sets in 
 redis then read the top ten from redis and use cassandra to rebuild redis if 
 needed. 
 
 The other is to periodically pivot the counts into a top ten row where you 
 use regular integers for the column name. With only 10K users you could do 
 this with a process that periodically reads all the user rows, or wherever 
 the counters are, and updates the aggregate row. Depending on data size you 
 could use hive/pig or whatever regular programming language you are happy 
 with.
 
 I guess you could also use redis to keep the top ten sorted and then 
 periodically dump that back to cassandra and serve the read traffic from 
 there.  
 
 Hope that helps 
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 23/12/2011, at 3:46 AM, R. Verlangen wrote:
 
 I would suggest you create a CF with a single row (or multiple for 
 historical data) with a date as key (utf8, e.g. 2011-12-22) and multiple 
 columns for every user's score. The column (utf8) would then be the score + 
 something unique of the user (e.g. hex representation of the TimeUUID). The 
 value would be the TimeUUID of the user.
 
 By default columns will be sorted and you can perform a slice to get the top 
 10.
 
 2011/12/14 cbert...@libero.it cbert...@libero.it
 Hi all,
 I'm using Cassandra in production for a small social network (~10.000 
 people).
 Now I have to assign some credits to each user operation (login, write post
 and so on) and then be capable of providing at any moment the top 10 of
 the most active users. I'm on Cassandra 0.7.6; I'd like to migrate to a new
 version in order to use Counters for the user points but ... what about the 
 top
 10?
 I was thinking about a specific ROW that always keeps the 10 most active 
 users
 ... but I think it would be heavy (to write and to handle in thread-safe 
 mode)
 ... can counters provide something like a value ordered list?
 
 Thanks for any help.
 Best regards,
 
 Carlo
 
 
 
 



Re: cassandra hit a wall: Too many open files (98567!)

2012-01-18 Thread Janne Jalkanen

1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?

https://issues.apache.org/jira/browse/CASSANDRA-3616

/Janne

On Jan 18, 2012, at 03:52 , dir dir wrote:

 Very interesting. Why do you open so many files? Actually, what kind of
 system have you built that opens so many files? Would you tell us?
 Thanks...
 
 
 On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com 
 wrote:
 I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
 
 ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
 AbstractCassandraDaemon.java (line 133) Fatal exception in thread
 Thread[CompactionExecutor:2918,1,main] java.io.IOError:
 java.io.FileNotFoundException:
 /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
 open files in system)
 
 After that it stopped working and just sat there with this error
 (understandable). I did an lsof and saw that it had 98567 open files,
 yikes! An ls in the data directory shows 234011 files. After restarting
 it spent about 5 hours compacting, then quieted down. About 173k files
 left in the data directory. I'm using leveldb (with compression). I
 looked into the json of the two large CFs and gen 0 is empty, most
 sstables are gen 3 and 4. I have a total of about 150GB of data
 (compressed). Almost all the SStables are around 3MB in size. Aren't
 they supposed to get 10x bigger at higher gen's?
 
 This situation can't be healthy, can it? Suggestions?
 



Re: how stable is 1.0 these days?

2012-01-26 Thread Janne Jalkanen

With 1.0.5 and 1.0.6 we had some longer-term stability problems (fd leaks, 
etc.), but so far 1.0.7 is running like a train for us.

/Janne

On Jan 26, 2012, at 08:43 , Radim Kolar wrote:

 On 26.1.2012 2:32, David Carlton wrote:
 How stable is 1.0 these days?
 good. but hector 1.0 is unstable.



Re: multi region EC2

2012-03-31 Thread Janne Jalkanen

I've switched from SS to NTS on 1.0.x on a single-az cluster with RF3 (which 
obviously created a single-dc, single-rack NTS cluster). Worked without a 
hitch. Also switched from SimpleSnitch to Ec2Snitch on-the-fly. I had about 
12GB of data per node.

Of course, your mileage may vary, so while I can report that it has been done 
successfully, I'd still recommend testing it out first...

/Janne

On Mar 31, 2012, at 22:45 , aaron morton wrote:

 I'm kind of guessing here because it's not something I've done before. 
 Obviously test things first…
 
 The NTS with a single DC and a single Rack will place data in the same 
 location as the Simple Strategy. You *should* be able to change the 
 replication strategy from, say, SS with RF 3 to NTS with RF 3 in a single DC 
 with a single Rack. 
 
 I think you can also migrate to using multiple racks under NTS, but I would 
 need to double check the code. 
 
 Cheers
  
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 27/03/2012, at 11:31 AM, Deno Vichas wrote:
 
 On 3/26/2012 2:15 PM, aaron morton wrote:
 - can i migrate the replication strategy one node at a time or do i need 
 to shut down the whole cluster to do this?
 Just use the NTS from the start.
 but what if i already have a bunch (8 GB per node) of data that i need and i 
 don't have a way to re-create it?
 
 
 thanks,
 deno
 



Re: cassandra 1.0.9 is out!

2012-04-07 Thread Janne Jalkanen

...or if you're a Pig user, you get support for both counter columns and 
composite columns.

/Janne

On Apr 7, 2012, at 07:46 , Watanabe Maki wrote:

 1.0.9 is a maintenance release, so it's basically bug fixes with some minor 
 improvements.
 If you plan to use LeveledCompaction, you'd better use 1.0.9+ or 
 1.1.0+.
 
 maki
 
 
 On 2012/04/07, at 6:49, Tim Dunphy bluethu...@gmail.com wrote:
 
 Hello list,
 
 
 I just noticed that cassandra 1.0.9 was released.  What's so cool
 about it? It's really hard for me to keep up with all the upgrades to
 cassandra db, although I really enjoy learning it and working with it.
 Is there any place I can go to learn about what's new in the latest
 release? Or if someone out there who really understands these issues
 would be kind enough to explain this to me in a human,
 non-faq-ish way, that would be rather cool.
 
 I'm also curious about the evolution of cassandra, which as we all
 know happens rapidly. I'm wondering what some of the most important
 developments were and which versions you can find them in.
 
 I'd certainly appreciate any knowledge you have to share.
 
 Thanks
 Tim
 
 
 -- 
 GPG me!!
 
 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B



Re: issue with composite row key on CassandraStorage pig?

2012-04-09 Thread Janne Jalkanen

I don't think the Pig code supports Composite *keys* yet. The 1.0.9 code 
supports Composite Column Names tho'...

/Janne

On Apr 8, 2012, at 06:02 , Janwar Dinata wrote:

 Hi,
 
 I have a column family that uses DynamicCompositeType for its 
 key_validation_class.
 When I try to dump the row keys using pig, it fails with 
 java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be 
 cast to org.apache.pig.data.Tuple
 
 This is how I create the column family
 create column family CompoKey
with
  key_validation_class =
'DynamicCompositeType(
  a=AsciiType,
  o=BooleanType,
  b=BytesType,
  e=DateType,
  d=DoubleType,
  f=FloatType,
  i=IntegerType,
  x=LexicalUUIDType,
  l=LongType,
  t=TimeUUIDType,
  s=UTF8Type,
  u=UUIDType)' and
  comparator =
'DynamicCompositeType(
  a=AsciiType,
  o=BooleanType,
  b=BytesType,
  e=DateType,
  d=DoubleType,
  f=FloatType,
  i=IntegerType,
  x=LexicalUUIDType,
  l=LongType,
  t=TimeUUIDType,
  s=UTF8Type,
  u=UUIDType)' and
  default_validation_class = CounterColumnType;   
 
 This is my pig script
 rows =  LOAD 'cassandra://PigTest/CompoKey' USING CassandraStorage();
 keys = FOREACH rows GENERATE flatten(key);
 dump keys;
 
 I'm on cassandra 1.0.9 and pig 0.9.2.
 
 Thanks.



Re: issue with composite row key on CassandraStorage pig?

2012-04-10 Thread Janne Jalkanen

There doesn't seem to be an open JIRA ticket for it - can you please make one 
at https://issues.apache.org/jira/browse/CASSANDRA? That ensures that at some 
point someone will take a look at it and it just won't be forgotten in the 
endless barrage of emails...

Yup, I did the composite columns support. I'd start by looking at 
CassandraStorage.getNext().

/Janne

On 9 Apr 2012, at 22:02, Janwar Dinata wrote:

 Hi Janne,
 
 Do you happen to know if support for composite row key is in the pipeline?
 
 It seems that you did a patch for composite columns support on 
 CassandraStorage.java.
 Do you have any pointers for implementing composite row key feature?
 
 Thanks.
 
 On Mon, Apr 9, 2012 at 11:32 AM, Janne Jalkanen janne.jalka...@ecyrd.com 
 wrote:
 
 I don't think the Pig code supports Composite *keys* yet. The 1.0.9 code 
 supports Composite Column Names tho'...
 
 /Janne
 
 On Apr 8, 2012, at 06:02 , Janwar Dinata wrote:
 
 Hi,
 
 I have a column family that uses DynamicCompositeType for its 
 key_validation_class.
 When I try to dump the row keys using pig, it fails with 
 java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be 
 cast to org.apache.pig.data.Tuple
 
 This is how I create the column family
 create column family CompoKey
with
  key_validation_class =
'DynamicCompositeType(
  a=AsciiType,
  o=BooleanType,
  b=BytesType,
  e=DateType,
  d=DoubleType,
  f=FloatType,
  i=IntegerType,
  x=LexicalUUIDType,
  l=LongType,
  t=TimeUUIDType,
  s=UTF8Type,
  u=UUIDType)' and
  comparator =
'DynamicCompositeType(
  a=AsciiType,
  o=BooleanType,
  b=BytesType,
  e=DateType,
  d=DoubleType,
  f=FloatType,
  i=IntegerType,
  x=LexicalUUIDType,
  l=LongType,
  t=TimeUUIDType,
  s=UTF8Type,
  u=UUIDType)' and
  default_validation_class = CounterColumnType;   
 
 This is my pig script
 rows =  LOAD 'cassandra://PigTest/CompoKey' USING CassandraStorage();
 keys = FOREACH rows GENERATE flatten(key);
 dump keys;
 
 I'm on cassandra 1.0.9 and pig 0.9.2.
 
 Thanks.
 
 



Re: cql shell error

2012-04-15 Thread Janne Jalkanen

You might have hit this bug: 
https://issues.apache.org/jira/browse/CASSANDRA-4003

/Janne

On Apr 15, 2012, at 17:21 , Tamar Fraenkel wrote:

 Hi!
 I have an error when I try to read column value using cql but I can read it 
 when I use cli.
 
 When I read in cli I get:
  get cf['a52efb7a-b2ea-417b-b54a-9d6a2ebf6d71']['i:nwtp_name']=
 = (column=i:nwtp_name, value=Günter Grass's Israel poem provokes outrage, 
 timestamp=1333816116526001)
 
 When I try to read with cqlsh I get:
 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in 
 range(128)
 
 Do I need to save only ascii chars, or can I read it somehow using cql?
 
 Thanks
 
 
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 tokLogo.png
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 
 



Re: cql shell error

2012-04-15 Thread Janne Jalkanen

The Resolution line says Fixed, and the Fix Version line says 1.0.9, 
1.1.0. So upgrade to 1.0.9 to get a fix for this particular bug :-)

(Luckily, 1.0.9 has been released a few days ago, so you can just download and 
upgrade.)

/Janne

On Apr 15, 2012, at 20:31 , Tamar Fraenkel wrote:

 I apologize for what must be a dumb question, but I see that there are 
 patches etc., what do I need to do in order to get the fix? I am running 
 the latest Cassandra 1.0.8.
 
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 tokLogo.png
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 
 
 
 
 On Sun, Apr 15, 2012 at 7:46 PM, Janne Jalkanen janne.jalka...@ecyrd.com 
 wrote:
 
 You might have hit this bug: 
 https://issues.apache.org/jira/browse/CASSANDRA-4003
 
 /Janne
 
 On Apr 15, 2012, at 17:21 , Tamar Fraenkel wrote:
 
 Hi!
 I have an error when I try to read column value using cql but I can read it 
 when I use cli.
 
 When I read in cli I get:
  get cf['a52efb7a-b2ea-417b-b54a-9d6a2ebf6d71']['i:nwtp_name']=
 = (column=i:nwtp_name, value=Günter Grass's Israel poem provokes outrage, 
 timestamp=1333816116526001)
 
 When I try to read with cqlsh I get:
 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in 
 range(128)
 
 Do I need to save only ascii chars, or can I read it somehow using cql?
 
 Thanks
 
 
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 tokLogo.png
 
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 
 
 
 



Re: Column Family per User

2012-04-18 Thread Janne Jalkanen

Each CF takes a fair chunk of memory regardless of how much data it has, so 
this is probably not a good idea, if you have lots of users. Also using a 
single CF means that compression is likely to work better (more redundant data).

However, Cassandra distributes the load across different nodes based on the row 
key, and the writes scale roughly linearly with the number of nodes. So 
make sure that no single row gets overly burdened by writes (50 
million writes/day to a single row would always go to the same nodes - this is 
on the order of 600 writes/second/node, which shouldn't really pose a problem, 
IMHO). The main problem is that if a single row gets lots of columns it'll 
start to slow down at some point, and your row caches become less useful, as 
they cache the entire row.

Keep your rows suitably sized and you should be fine. To partition the data, 
you can either distribute it to a few CFs based on use or use some other 
distribution method (like user:1234:00, where the 00 is the hour-of-the-day).
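
As a small sketch of the hour-of-day bucketing (plain Python; the key format is just the example above, adapt to taste):

from datetime import datetime, timezone

def row_key(user_id, ts=None):
    # hypothetical key format following the user:1234:00 example above
    ts = ts or datetime.now(timezone.utc)
    return "user:%s:%02d" % (user_id, ts.hour)

# 50M writes/day is roughly 50e6 / 86400 ~= 580 writes/sec across the cluster;
# bucketing by hour keeps any single row from collecting all of a day's columns.
print(row_key(1234))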

(There's a great article by Aaron Morton on how wide rows impact performance at 
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/, but as always, 
running your own tests to determine the optimal setup is recommended.)

/Janne

On Apr 18, 2012, at 21:20 , Trevor Francis wrote:

 Our application has users that can write in upwards of 50 million records per 
 day. However, they all write the same format of records (20 fields…columns). 
 Should I put each user in their own column family, even though the column 
 family schema will be the same per user?
 
 Would this help with dimensioning, if each user is querying their keyspace 
 and only their keyspace?
 
 
 Trevor Francis
 
 



Re: Data aggregation - averages, sums, etc.

2012-05-19 Thread Janne Jalkanen
 2. I know I have counter columns. I can do sums. But can I do averages ?

One counter column for the sum, one counter column for the count. Divide for 
average :-)
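
A minimal sketch of the bookkeeping (plain Python standing in for two counter columns; any counter-capable client works the same way):

counters = {"sum": 0, "count": 0}

def record(value):
    counters["sum"] += value     # one counter column incremented by the value
    counters["count"] += 1       # one counter column incremented by 1

def average():
    return counters["sum"] / counters["count"] if counters["count"] else None

record(10); record(20); record(40)
print(average())   # 23.33...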

/Janne

Re: cassandra as session store

2011-02-01 Thread Janne Jalkanen

If your sessions are fairly long-lived (more like hours instead of minutes) and 
you crank up a suitable row cache and make sure your db is consistent (via 
quorum read/writes or write:all, read:1) - sure, why not?  Especially if you're 
already familiar with Cassandra; possibly even have a deployed instance already 
for your web app. Adding new components to the mix is always a sure way to get 
some headscratching going. For a small team who does not want to spend too much 
time on configuring yet another database, Cassandra would probably work well as 
a session store. And you would get cross-datacenter reliability too.

However, you might want to use 0.7 and expiring columns; otherwise cleaning up 
is going to be boring.

/Janne

On Feb 1, 2011, at 22:24 , Sasha Dolgy wrote:

 
 What I'm still unclear about, and where I think this is suitable, is 
 Cassandra being used as a data warehouse for current and past sessions tied 
 to a user.  Yes, other things are great for session management, but I want to 
 provide near real time session information to my users ... quick and simple 
 and i want to use cassandra ... surely i can't be that bad for thinking this 
 is a good idea?
   
 -sd
 
 On Tue, Feb 1, 2011 at 9:20 PM, Kallin Nagelberg kallin.nagelb...@gmail.com 
 wrote:
 nvm on the persistence, it seems like it does support it:
 
 'Since version 1.1 the safer alternative is an append-only file (a
 journal) that is written as operations modifying the dataset in memory
 are processed. Redis is able to rewrite the append-only file in the
 background in order to avoid an indefinite growth of the journal.'
 
 This thread probably shouldn't digress too much from Cassandra's
 suitability for session management though..



Re: unsubscribe

2011-02-02 Thread Janne Jalkanen

How about adding an autosignature with unsubscription info?

/Janne

On Feb 2, 2011, at 19:42 , Norman Maurer wrote:

 To make it short.. No it can't.
 
 Bye,
 Norman
 
 (ASF Infrastructure Team)
 
 2011/2/2 F. Hugo Zwaal h...@unitedgames.com:
 Can't the mailinglist server be changed to treat messages with unsubscribe
 as subject as an unsubscribe as well? Otherwise it will just keep happening,
 as people simply don't remember or take time to find out?
 
 Just my 2 cents...
 
 Groets, Hugo.
 
 On 2 feb 2011, at 16:54, Jonathan Ellis jbel...@gmail.com wrote:
 
 http://wiki.apache.org/cassandra/FAQ#unsubscribe
 
 On Wed, Feb 2, 2011 at 7:55 AM, JJ jjcha...@gmail.com wrote:
 
 
 Sent from my iPad
 
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 



Explaining the Replication Factor, N and W and R

2011-02-13 Thread Janne Jalkanen
Folks,

as it seems that wrapping the brain around the R+W > N concept is a big hurdle 
for a lot of users, I made a simple web page that allows you to try out the 
different parameters and see how they affect the system.

http://www.ecyrd.com/cassandracalculator/

Let me know if you have any suggestions to improve the wording, or you spot a 
bug. Trying to go for simplicity and clarity over absolute correctness here, as 
this is meant to help newbies.

(App is completely self-contained HTML and Javascript.)
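
For the impatient, the rule it demonstrates boils down to this (a tiny Python sketch with example numbers):

# With N replicas, a read overlaps the latest write whenever R + W > N.
def strongly_consistent(n, r, w):
    return r + w > n

for n, r, w in [(3, 1, 1), (3, 2, 2), (3, 3, 1)]:
    label = "consistent" if strongly_consistent(n, r, w) else "eventually consistent"
    print("N=%d R=%d W=%d -> %s" % (n, r, w, label))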

/Janne

Re: Explaining the Replication Factor, N and W and R

2011-02-13 Thread Janne Jalkanen
 Excellent! How about adding Hinted Handoff enabled/disabled option?

Sure, once I understand it ;-)

/Janne


Re: Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-10 Thread Janne Jalkanen

On Apr 7, 2011, at 23:43 , Jonathan Ellis wrote:

 The history is that, way back in the early days, we used to max it out
 the other way (MTT=128) but observed behavior is that objects that
 survive 1 new gen collection are very likely to survive forever.

Just a quick note: my own tests seem to indicate likewise - I've been running 
one machine in our cluster with MTT=8 for some time now, and I'm not seeing any 
real difference between MTT=1 and MTT=8, except for a (very) slight increase in 
CPU usage, consistent with the increased copying. We have a read-heavy cluster 
that uses RowCaches a lot, so YMMV.

/Janne



Minor question on index design

2010-09-14 Thread Janne Jalkanen
Hi all!

I'm pondering between a couple of alternatives here: I've got two CFs, one 
which contains Objects, and one which contains Users. Now, each Object has an 
owner associated to it, so obviously I need some sort of an index to point from 
Users to Objects.  This would be of course the perfect usecase for secondary 
indices on 0.7, but I'm still on 0.6.x.

So, esteemed Cassandra-heads, I'm pondering what would be a better design here:

1) I can create a separate CF OwnerIdx which has user id's as keys, and then 
each of the columns points at an object (with a dummy value, since I just need 
a list).  This would add a new CF, but on the other hand, this would be easy to 
drop once 0.7 comes along and I can just make an index query to the Objects CF, 
OR

2) Put the index inside the Users CF, with object:id for column name and a 
dummy value, and then get slices as necessary? This would mean less CFs (and 
hence no schema modification), but might mean that I have to clean it up at 
some point.
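
To make the two options concrete, here's a little sketch of the layouts (plain Python dicts as stand-ins for the CFs; all keys and column names are made up):

# Option 1: a separate OwnerIdx CF - row key = user id, column names = object ids.
owner_idx = {
    "user-42": {"obj-1": "", "obj-7": ""},   # dummy values; only the names matter
}

# Option 2: index columns mixed into the Users CF next to the user's own columns.
users = {
    "user-42": {"name": "Janne", "object:obj-1": "", "object:obj-7": ""},
}

# Either way, listing a user's objects is a column slice over a single row:
print(sorted(owner_idx["user-42"]))
print(sorted(c.split(":", 1)[1] for c in users["user-42"] if c.startswith("object:")))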

I don't yet have a lot of CFs, so I'm not worried about mem consumption really. 
 The Users CF is very read-heavy as-is, but the index and Objects will be a bit 
more balanced.

Experiences? Recommendations? Tips? Other possibilities? What other 
considerations should I take into account?

/Janne

Re: OrderPreservingPartitioner for get_range_slices

2010-09-15 Thread Janne Jalkanen


Correct. You can use get_range_slices with RandomPartitioner too, BUT  
the iteration order is non-predictable, that is, you will not know in  
which order you get the rows (RandomPartitioner would probably better  
be called ObscurePartitioner - it ain't random, but it's as good as if  
it were ;-). This I find to be mostly useful when you want to go  
through your entire keyspace, like when doing something map-reduce  
like.  Or you just have a fairly small CF.
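
The usual way to walk the whole keyspace looks something like this (a sketch in Python; fetch_rows is a hypothetical wrapper around your client's get_range_slices call):

def iterate_all_rows(fetch_rows, batch=100):
    # Page by key, starting each page from the last key of the previous one.
    start = ""                        # empty start key = beginning of the ring
    while True:
        rows = fetch_rows(start_key=start, count=batch)
        if not rows:
            return
        for key, columns in rows:
            if key != start:          # each new page repeats the previous last key
                yield key, columns
        if len(rows) < batch:
            return
        start = rows[-1][0]           # continue from where we left off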


/Janne

On Sep 15, 2010, at 20:26 , Rana Aich wrote:


Hi All,

I was under the impression that in order to query with  
get_range_slices one has to have an OrderPreservingPartitioner.


Can we do get_range_slices with RandomPartitioner also? I can  
distinctly remember I read that(OrderPreservingPartitioner for  
get_range_slices) in Cassnadra WIKI but now somehow I'm not finding  
it anymore.


Can anyone throw some light on it.

Thanks and Regards,

Rana




Re: Minor question on index design

2010-09-15 Thread Janne Jalkanen


Ok, thanks.  I'm going with Option 1, and try to steer away from  
SuperColumns. That also gives me the option to tweak the caches  
depending on the use pattern (User CF will be accessed in a lot of  
different ways, not just with relation to Objects).


/Janne

On Sep 14, 2010, at 23:46 , Aaron Morton wrote:

I've been doing option 1 under 0.6. As usual in cassandra though a  
lot depends on how you access the data.


- If you often want to get the user and all of the objects they  
have, use option 2. It's easier to have one read from one CF to  
answer your query.
- If the user has potentially 10k objects go with option 2. AFAIK  
large super columns are still inefficient https://issues.apache.org/jira/browse/CASSANDRA-674 
 https://issues.apache.org/jira/browse/CASSANDRA-598
- In your OwnerIndex CF consider making the column name something  
meaningful such as the Object Name or Timestamp (if it has one) so  
you can slice against it, e.g. to support paging operations. Make  
the column value the key for the object.


Aaron


On 15 Sep, 2010,at 02:41 AM, Janne Jalkanen  
janne.jalka...@ecyrd.com wrote:



Hi all!

I'm pondering between a couple of alternatives here: I've got two  
CFs, one which contains Objects, and one which contains Users. Now,  
each Object has an owner associated to it, so obviously I need some  
sort of an index to point from Users to Objects. This would be of  
course the perfect usecase for secondary indices on 0.7, but I'm  
still on 0.6.x.


So, esteemed Cassandra-heads, I'm pondering what would be a better  
design here:


1) I can create a separate CF OwnerIdx which has user id's as  
keys, and then each of the columns points at an object (with a  
dummy value, since I just need a list). This would add a new CF,  
but on the other hand, this would be easy to drop once 0.7 comes  
along and I can just make a index query to the Objects CF, OR


2) Put the index inside the Users CF, with object:id for column  
name and a dummy value, and then get slices as necessary? This  
would mean less CFs (and hence no schema modification), but might  
mean that I have to clean it up at some point.


I don't yet have a lot of CFs, so I'm not worried about mem  
consumption really. The Users CF is very read-heavy as-is, but the  
index and Objects will be a bit more balanced.


Experiences? Recommendations? Tips? Other possibilities? What other  
considerations should I take into account?


/Janne




Re: Best strategy for adding new nodes to the cluster

2010-09-28 Thread Janne Jalkanen

On 28 Sep 2010, at 08:37, Michael Dürgner wrote:

 What do you mean by running live? I am also planning to use cassandra on 
 EC2 using small nodes. Small nodes have 1/4 cpu of the large ones, 1/4 cost, 
 but I/O is more than 1/4 (amazon does not give explicit I/O numbers...), so 
 I think 4 small instances should perform better than 1 large one (and the 
 cost is the same), am I wrong?
 
 Based on results we saw and what you also find in different sources around 
 the web, EC2 small instances perform worse than 1/4 regarding IO performance.

Ditto. My tests indicate that while the peak IO performance of small nodes can 
be ok (up to 1/2 of large), it degrades over time down to 1/6 or even less. It 
seems that Amazon dedicates sufficient bandwidth to small nodes in the 
beginning to ensure a smooth and quick boot, but then throttles down fairly 
aggressively within a few minutes.  This seems to affect reads more than 
writes, though.

Note also that large instances have over 4x the memory (1.7 GB vs 7.5 GB), and 
that makes a world of difference (you can have larger caches, for example). You 
don't really want to start swapping on the small instances.

(However, small instances are awesome for doing testing and learning how to 
manage a cluster.)

/Janne

Re: normal thread counts?

2013-05-01 Thread Janne Jalkanen

This sounds very much like 
https://issues.apache.org/jira/browse/CASSANDRA-5175, which was fixed in 1.1.10.

/Janne

On Apr 30, 2013, at 23:34 , aaron morton aa...@thelastpickle.com wrote:

  Many many many of the threads are trying to talk to IPs that aren't in the 
 cluster (I assume they are the IP's of dead hosts). 
 Are these IP's from before the upgrade ? Are they IP's you expect to see ? 
 
 Cross reference them with the output from nodetool gossipinfo to see why the 
 node thinks they should be used. 
 Could you provide a list of the thread names ? 
 
 One way to remove those IPs that may be to rolling restart with 
 -Dcassandra.load_ring_state=false in the JVM opts at the bottom of 
 cassandra-env.sh
 
 The OutboundTcpConnection threads are created in pairs by the 
 OutboundTcpConnectionPool, which is created here 
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L502
  The threads are created in the OutboundTcpConnectionPool constructor 
 checking to see if this could be the source of the leak. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 1/05/2013, at 2:18 AM, William Oberman ober...@civicscience.com wrote:
 
 I use phpcassa.
 
 I did a thread dump.  99% of the threads look very similar (I'm using 1.1.9 
 in terms of matching source lines).  The thread names are all like this: 
 WRITE-/10.x.y.z.  There are a LOT of duplicates (in terms of the same IP). 
  Many many many of the threads are trying to talk to IPs that aren't in the 
 cluster (I assume they are the IP's of dead hosts).  The stack trace is 
 basically the same for them all, attached at the bottom.   
 
 There is a lot of things I could talk about in terms of my situation, but 
 what I think might be pertinent to this thread: I hit a tipping point 
 recently and upgraded a 9 node cluster from AWS m1.large to m1.xlarge 
 (rolling, one at a time).  7 of the 9 upgraded fine and work great.  2 of 
 the 9 keep struggling.  I've replaced them many times now, each time using 
 this process:
 http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node
 And even this morning the only two nodes with a high number of threads are 
 those two (yet again).  And at some point they'll OOM.
 
 Seems like there is something about my cluster (caused by the recent 
 upgrade?) that causes a thread leak on OutboundTcpConnection   But I don't 
 know how to escape from the trap.  Any ideas?
 
 
 
   stackTrace = [ { 
 className = sun.misc.Unsafe;
 fileName = Unsafe.java;
 lineNumber = -2;
 methodName = park;
 nativeMethod = true;
}, { 
 className = java.util.concurrent.locks.LockSupport;
 fileName = LockSupport.java;
 lineNumber = 158;
 methodName = park;
 nativeMethod = false;
}, { 
 className = 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject;
 fileName = AbstractQueuedSynchronizer.java;
 lineNumber = 1987;
 methodName = await;
 nativeMethod = false;
}, { 
 className = java.util.concurrent.LinkedBlockingQueue;
 fileName = LinkedBlockingQueue.java;
 lineNumber = 399;
 methodName = take;
 nativeMethod = false;
}, { 
 className = org.apache.cassandra.net.OutboundTcpConnection;
 fileName = OutboundTcpConnection.java;
 lineNumber = 104;
 methodName = run;
 nativeMethod = false;
} ];
 --
 
 
 
 
 On Mon, Apr 29, 2013 at 4:31 PM, aaron morton aa...@thelastpickle.com 
 wrote:
  I used JMX to check current number of threads in a production cassandra 
 machine, and it was ~27,000.
 That does not sound too good. 
 
 My first guess would be lots of client connections. What client are you 
 using, does it do connection pooling ?
 See the comments in cassandra.yaml around rpc_server_type; the default, sync, uses 
 one thread per connection, you may be better off with HSHA. But if 
 your app is leaking connections you should probably deal with that first. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 30/04/2013, at 3:07 AM, William Oberman ober...@civicscience.com wrote:
 
 Hi,
 
 I'm having some issues.  I keep getting:
 
 ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java 
 (line 135) Exception in thread Thread[GossipStage:1,5,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 --
 after a day or two of runtime.  I've checked and my system settings seem 
 acceptable:
 memlock=unlimited
 nofiles=10
 nproc=122944
 
 I've messed with heap sizes from 6-12GB (15 physical, m1.xlarge in AWS), 
 and I keep OOM'ing with the above error.
 
 I've found some (what seem to me) to be obscure references to the stack 
 size interacting with # of threads.  If I'm understanding it correctly, to 
 

Re: (unofficial) Community Poll for Production Operators : Repair

2013-05-16 Thread Janne Jalkanen

Might you be experiencing this? 
https://issues.apache.org/jira/browse/CASSANDRA-4417

/Janne

On May 16, 2013, at 14:49 , Alain RODRIGUEZ arodr...@gmail.com wrote:

 @Rob: Thanks about the feedback.
 
 Yet I have a weird behavior still unexplained about repairing. Are counters 
 supposed to be repaired too ? I mean, while reading at CL.ONE I can have 
 different values depending on what node is answering. Even after a read 
 repair or a full repair. Shouldn't a repair fix these discrepancies ?
 
 The only way I found to get always the same count is to read data at 
 CL.QUORUM, but this is a workaround since the data itself remains wrong on 
 some nodes. 
 
 Any clue on it ?
 
 Alain
 
 2013/5/15 Edward Capriolo edlinuxg...@gmail.com
 http://basho.com/introducing-riak-1-3/
 
 Introduced Active Anti-Entropy. Riak now has active anti-entropy. In 
 distributed systems, inconsistencies can arise between replicas due to 
 failure modes, concurrent updates, and physical data loss or corruption. 
 Pre-1.3 Riak already had several features for repairing this “entropy”, but 
 they all required some form of user intervention. Riak 1.3 introduces 
 automatic, self-healing properties that repair entropy on an ongoing basis.
 
 
 On Wed, May 15, 2013 at 5:32 PM, Robert Coli rc...@eventbrite.com wrote:
 On Wed, May 15, 2013 at 1:27 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 Rob, I was wondering something. Are you a committer working on improving the
  repair or something similar ?
 
 I am not a committer [1], but I have an active interest in potential
 improvements to the best practices for repair. The specific change
 that I am considering is a modification to the default
 gc_grace_seconds value, which seems picked out of a hat at 10 days. My
 view is that the current implementation of repair has such negative
 performance consequences that I do not believe that holding onto
 tombstones for longer than 10 days could possibly be as bad as the
 fixed cost of running repair once every 10 days. I believe that this
 value is too low for a default (it also does not map cleanly to the
 work week!) and likely should be increased to 14, 21 or 28 days.
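 For reference, gc_grace_seconds is a per-table setting, so anyone wanting a longer
 window can change it with a schema alter; a sketch only, table name made up and 21
 days used as an example value:

 -- 21 days = 1814400 seconds
 ALTER TABLE mykeyspace.mytable WITH gc_grace_seconds = 1814400;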
 
 Anyway, if a committer (or any other expert) could give us some feedback on
  our comments (Are we doing well or not, whether things we observe are normal
  or unexplained, what is going to be improved in the future about repair...)
 
 1) you are doing things according to best practice
 2) unfortunately your experience with significantly degraded
 performance, including a blocked go-live due to repair bloat is pretty
 typical
 3) the things you are experiencing are part of the current
 implementation of repair and are also typical, however I do not
 believe they are fully explained [2]
 4) as has been mentioned further down thread, there are discussions
 regarding (and some already committed) improvements to both the
 current repair paradigm and an evolution to a new paradigm
 
 Thanks to all for the responses so far, please keep them coming! :D
 
 =Rob
 [1] hence the (unofficial) tag for this thread. I do have minor
 patches accepted to the codebase, but always merged by an actual
 committer. :)
 [2] driftx@#cassandra feels that these things are explained/understood
 by core team, and points to
 https://issues.apache.org/jira/browse/CASSANDRA-5280 as a useful
 approach to minimize same.
 
 



Re: best practices on EC2 question

2013-05-16 Thread Janne Jalkanen
On May 16, 2013, at 17:05 , Brian Tarbox tar...@cabotresearch.com wrote:

 An alternative that we had explored for a while was to do a two stage backup:
 1) copy a C* snapshot from the ephemeral drive to an EBS drive
 2) do an EBS snapshot to S3.
 
 The idea being that EBS is quite reliable, S3 is still the emergency backup 
 and copying back from EBS to ephemeral is likely much faster than the 15 
 MB/sec we get from S3.

Yup, this is what we do.  We use rsync with --bwlimit=4000 to copy the 
snapshots from the eph drive to EBS; this is intentionally very low so that the 
backup process does not take eat our I/O.  This is on m1.xlarge instances; YMMV 
so measure :).  EBS drives are then snapshot with ec2-consistent-snapshot and 
then old snapshots expired using ec2-expire-snapshots (I believe these scripts 
are from Alestic).

/Janne



Re: Billions of counters

2013-06-13 Thread Janne Jalkanen

Hi!

We have a similar situation of millions of events on millions of items - turns 
out that this isn't really a problem, because there tends to be a very strong 
power -distribution: very few of the items get a lot of hits, some get some, 
and the majority gets no hits (though most of them do get hits every now and 
then).  So it's basically a sparse multidimensional array, and turns out that 
Cassandra is pretty good at storing those.  We just treat a missing counter 
column as zero, and add a counter only when necessary.  To avoid I/O, we also 
do some statistical sampling for certain counters where we don't need an exact 
figure.
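A minimal sketch of what such a sparse counter layout can look like in CQL - all
names here are made up, and a row that was never incremented simply doesn't exist,
which we read back as zero:

CREATE TABLE item_hits (
  item_id text,
  day     text,
  hits    counter,
  PRIMARY KEY (item_id, day)
);

-- executed only when an event actually occurs, so untouched cells never get created
UPDATE item_hits SET hits = hits + 1 WHERE item_id = 'item-42' AND day = '2013-06-13';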

YMMV, of course, but I'd look at the likelihood of all the products being 
purchased from the same location during one week at least once and start the 
modeling from there. :)

/Janne

On 13 Jun 2013, at 21:19, Darren Smythe darren1...@gmail.com wrote:

 We want to precalculate counts for some common metrics for usage. We have 
 events, locations, products, etc. The problem is we have millions events/day, 
 thousands of locations and millions of products.
 
 Were trying to precalculate counts for some common queries like 'how many 
 times was product X purchased in location Y last week'.
 
 It seems like we'll end up with trillions of counters for even these basic 
 permutations. Is this a cause for concern?
 
 TIA
 
 -- Darren



Re: Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?

2013-07-15 Thread Janne Jalkanen

I had exactly the same problem, so I increased the sstable size (from 5 to 50 
MB - the default 5MB is most certainly too low for serious usecases).  Now the 
number of SSTableReader objects is manageable, and my heap is happier.

Note that for immediate effect I stopped the node, removed the *.json files and 
restarted - which put all SSTables to L0, which meant a weekend full of 
compactions… Would be really cool if there was a way to automatically drop all 
LCS SSTables one level down to make them compact earlier while avoiding the 
OMG-must-compact-everything-aargh-my-L0-is-full -effect of removing the JSON 
file.

/Janne

On 15 Jul 2013, at 10:48, sulong sulong1...@gmail.com wrote:

 Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader? The 
 RandomAccessReader objects consums too much memory.
 
 I have a cluster of 4 nodes. Every node's Cassandra JVM has an 8G heap. The 
 Cassandra memory fills up after about one month, so I have to restart the 4 
 nodes every month. 
 
 I have 100G data on every node, with LevedCompactionStrategy and 10M sstable 
 size, so there are more than 10,000 sstable files. By looking through the heap 
 dump file, I see there are more than 9000 SSTableReader objects in memory, 
 which references lots of  RandomAccessReader objects. The memory is consumed 
 by these RandomAccessReader objects. 
 
 I see the PoolingSegementedFile has a recycle method, which puts the 
 RandomAccessReader to a queue. Looks like the queue always grows until the 
 sstable is compacted.  Is there any way to stop the RandomAccessReader 
 recycling? Or, set a limit to the recycled RandomAccessReader's number?
 
 



Re: Cassandra Out of Memory on startup while reading cache

2013-07-22 Thread Janne Jalkanen

Sounds like this: https://issues.apache.org/jira/browse/CASSANDRA-5706, which 
is fixed in 1.2.7.

/Janne

On 22 Jul 2013, at 20:40, Jason Tyler jaty...@yahoo-inc.com wrote:

 Hello,
 
 Since upgrading from 1.1.9 to 1.2.6 over the last week, we've had two 
 instances where cassandra was unable to start, but kept trying to restart:
 
 SNIP
  INFO [main] 2013-07-19 16:12:36,769 AutoSavingCache.java (line 140) reading 
 saved cache /var/cassandra/caches/SyncCore-CommEvents-KeyCache-b.db
 ERROR [main] 2013-07-19 16:12:36,966 CassandraDaemon.java (line 458) 
 Exception encountered during startup
 java.lang.OutOfMemoryError: Java heap space
 at 
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
 at 
 org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:379)
 at 
 org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:145)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:266)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:382)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:354)
 at org.apache.cassandra.db.Table.initCf(Table.java:329)
 at org.apache.cassandra.db.Table.init(Table.java:272)
 at org.apache.cassandra.db.Table.open(Table.java:109)
 at org.apache.cassandra.db.Table.open(Table.java:87)
 at 
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:271)
 at 
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:441)
 at 
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:484)
  INFO [main] 2013-07-19 16:12:43,288 CassandraDaemon.java (line 118) Logging 
 initialized
 SNIP
 
 This is new behavior with 1.2.6.  
 
 Stopping cassandra, moving the offending file, then starting cassandra does 
 succeed.  
 
 Any config suggestions (key cache config?) to prevent this from happening?
 
 THX
 
 
 Cheers,
 
 ~Jason



Re: sstable size change

2013-07-22 Thread Janne Jalkanen

I don't think upgradesstables is enough, since it's more of a "change this file 
to a new format but don't try to merge sstables and compact" kind of thing.

Deleting the .json -file is probably the only way, but someone more familiar 
with cassandra LCS might be able to tell whether manually editing the json file 
so that you drop all sstables a level might work? Since they would overflow the 
new level, they would compact soon, but the impact might be less drastic than 
just deleting the .json file (which takes everything to L0)...
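For reference, the size itself is just a compaction sub-option, so the schema part
of the change is a one-liner (names made up, 160 MB used only as an example); the
catch is that it applies only to sstables written from then on, which is exactly why
the recompaction / .json question comes up:

ALTER TABLE mykeyspace.mytable
  WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};

-- chunk size is a compression sub-option, if you want to change that at the same time
ALTER TABLE mykeyspace.mytable
  WITH compression = {'sstable_compression': 'SnappyCompressor', 'chunk_length_kb': 64};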

/Janne

On 22 Jul 2013, at 16:02, Keith Wright kwri...@nanigans.com wrote:

 Hi all,
 
I know there has been several threads recently on this but I wanted to 
 make sure I got a clear answer:  we are looking to increase our SSTable size 
 for a couple of our LCS tables as well as chunk size (to match the SSD block 
 size).   The largest table is at 500 GB across 6 nodes (RF 3, C* 1.2.4 
 VNodes).  I wanted to get feedback on the best way to make this change with 
 minimal load impact on the cluster.  After I make the change, I understand 
 that I need to force the nodes to re-compact the tables.  
 
 Can this be done via upgrade sstables or do I need to shutdown the node, 
 delete the .json file, and restart as some have suggested?  
 
 I assume I can do this one node at a time?
 
 If I change the bloom filter size, I assume I will need to force compaction 
 again?  Using the same methodology?
 
 Thank you



Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-07 Thread Janne Jalkanen

Well, Amazon is expensive. Hetzner will sell you dedicated SSD RAID-1 servers 
with 32GB RAM and 4 cores with HT for €59/mth.  However, if pricing is an 
issue, you could start with:

1 server : read at ONE, write at ONE, RF=1. You will have consistency, but not 
high availability. This is the same as with MySQL or any other single-server 
solution - if the db server goes down, your service goes down.  You will need 
to be extra careful with backups here, because if your node blows, you will 
need to restore.

then you upgrade to

2 servers: read at ONE, write at ONE, RF=2. You can now tolerate one node going 
down with automatic failover, but you won't get consistency.  This is kinda like 
having MySQL master/slave replication (yes, I know, it's not really the same, 
but it's pretty close as an effect)

then you upgrade to

3 servers: read at QUORUM, write at QUORUM, RF=3. You can tolerate one node 
going down, and you will have consistent data. This is where Cassandra starts 
to shine.

then you get a big heap-o-money and keep adding servers, and you realize that 
with pretty much everything else you would be spending a LOT of time just making 
sure that your cluster is up and running and performing.

It's always a question of tradeoffs. Cassandra is cool 'cos it gives you the 
ability to run a lot of different configurations and will go up-up-up when you 
need it without a lot of special magic.
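The replication factor part of each step above is just a keyspace setting; a minimal
sketch, keyspace name made up:

-- step 1: single node
CREATE KEYSPACE myapp
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- later steps: bump the RF (and then run repair so existing data gets its new replicas)
ALTER KEYSPACE myapp
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};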

/Janne

On Aug 7, 2013, at 07:36 , Ertio Lew ertio...@gmail.com wrote:

 Amazon seems to overprice its services quite a bit. If you look at a similar 
 size deployment elsewhere, like Linode or Digital Ocean (very competitive 
 pricing), you'll notice huge differences. Ok, some services & features are 
 extra, but maybe we don't all necessarily need them & when you can host on 
 non-dedicated virtual servers on Amazon, you can also do it with similar 
 configuration nodes elsewhere too.
 
 IMO these huge costs associated with cassandra deployment are too heavy for 
 small startups just starting out. I believe if you consider a deployment for a 
 similar application using MySQL, it should be quite a bit cheaper / more affordable 
 (though I'm not exactly sure). At least you don't usually create a cluster from the 
 beginning. Probably we made a wrong decision to choose cassandra considering 
 only its technological advantages.



Re: understanding memory footprint

2013-08-15 Thread Janne Jalkanen

Also, if you are using leveled compaction, remember that each SSTable will take 
a couple of MB of heap space.  You can tune this by choosing a good 
sstable_size_in_mb value for those CFs which are on LCS and contain lots of 
data.  Default is 5 MB, which is for many cases inadequate, so most people seem 
to be happy running with sizes that range from 64 MB and up.  The right size 
for you will most probably vary.

/Janne

On Aug 15, 2013, at 06:05 , Aaron Morton aa...@thelastpickle.com wrote:

 Does the number of column families still significantly impact the memory 
 footprint? If so, what is the incremental cost of a column family/table?
 IMHO there would be little difference in memory use for a node with zero data 
 that had 10 CF's and one that had 100 CF's. When you start putting data in 
 the story changes. 
 
 As Alain said, the number of rows can impact the memory use. In 1.2+ that's 
 less of an issue, but the index samples are still on heap. In my experience 
 with normal heaps (4 GB to 8 GB) this is not an issue until you get into 500+ 
 million rows. 
 
 The number of CF's is still used when calculating when to flush to disk. If 
 you have 100 cf's the server will flush to disk more frequently than if you 
 have 10. Because it needs to leave more room for the memtables to grow. 
 
 The best way to get help on this is provide details on the memory settings, 
 the numbers of CF's, the total number of rows, and the cache settings. 
 
 Hope that helps. 
  
 -
 Aaron Morton
 Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 13/08/2013, at 9:10 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 
 If using 1.2.*, bloom filters are in native memory, so they are not pressuring your 
 heap. How much data do you have per node? If this value is big, you have index 
 samples in the heap consuming a lot of memory, for sure, and growing 
 as your data per node grows.
 
 Solutions: increase the heap if < 8GB and / or reduce sampling 
 (index_interval: 128) to a bigger value (256 - 512) and / or wait for 2.0.*, 
 which, off the top of my head, should move the sampling into native memory, 
 allowing heap size to be independent from the data size per node.
 
 This should alleviate things. Yet these are only guesses since I know almost 
 nothing about your cluster...
 
 Hope this help somehow.
 
 
 2013/8/12 Robert Coli rc...@eventbrite.com
 On Mon, Aug 12, 2013 at 11:14 AM, Paul Ingalls paulinga...@gmail.com wrote:
 I don't really need exact numbers, just a rough cost would be sufficient.  
 I'm running into memory problems on my cluster, and I'm trying to decide if 
 reducing the number of column families would be worth the effort.  Looking 
 at the rule of thumb from the wiki entry made it seem like reducing the 
 number of tables would make a big impact, but I'm running 1.2.8 so not sure 
 if it is still true.
 
 Is there a new rule of thumb?
  
 If you want a cheap/quick measure of how much space partially full memtables 
 are taking, just nodetool flush and check heap usage before and after?
 
 If you want a cheap/quick measure of how much space empty sstables take in 
 heap, I think you're out of luck.
 
 =Rob
 
 
 



Re: Cassandra JVM heap sizes on EC2

2013-08-24 Thread Janne Jalkanen

We've been trying to keep the heap as small as possible; the disk access 
penalty on EC2 is big enough - even on instance store - that you want to give 
as much memory to disk caches as you can.  Of course, then you will need to 
keep extra vigilant on your garbage collection and tune various things like 
bloom filters, cache sizes (if using on-heap cache) and sstable size for LCS 
accordingly.

YMMV of course; we're running on m1.xlarge, so we have less RAM to play with 
than you. It all depends on your data size, the size of the hot portion, etc.  
Currently we use 3.5GB for Cassandra 1.2.8, which seems like a good tradeoff 
for our usage patterns.  I tend to bump the heap up and down in .5 GB intervals 
just to see what happens; let it run for a few hours or a day and then check 
Munin graphs to see what the effect was compared to other nodes.

/Janne

On Aug 24, 2013, at 01:12 , David Laube d...@stormpath.com wrote:

 Hi All,
 
 We are evaluating our JVM heap size configuration on Cassandra 1.2.8 and 
 would like to get some feedback from the community as to what the proper JVM 
 heap size should be for cassandra nodes deployed on to Amazon EC2. We are 
 running m2.4xlarge EC2 instances (64GB RAM, 8 core, 2 x 840GB disks) --so we 
 will have plenty of RAM. I've already consulted the docs at 
 http://www.datastax.com/documentation/cassandra/1.2/mobile/cassandra/operations/ops_tune_jvm_c.html
  but would love to hear what is working or not working for you in the wild. 
 Since Datastax cautions against using more than 8GB, I'm wondering if it is 
 even advantageous to use even slightly more.
 
 Thanks,
 -David Laube
 



Failed decommission

2013-08-25 Thread Janne Jalkanen
This on cass 1.2.8

Ring state before decommission

--  Address Load   Owns   Host ID   
TokenRack
UN  10.0.0.1  38.82 GB   33.3%  21a98502-dc74-4ad0-9689-0880aa110409  1 
   1a
UN  10.0.0.2   33.5 GB33.3%  cba6b27a-4982-4f04-854d-cc73155d5f69  
56713727820156407428984779325531226110   1b
UN  10.0.0.3  37.41 GB   0.0%   6ba2c7d4-713e-4c14-8df8-f861fb211b0d  
56713727820156407428984779325531226111   1b
UN  10.0.0.4  35.7 GB33.3%  bf3d4792-f3e0-4062-afe3-be292bc85ed7  
11342745564031281485796955865106245  1c

Trying to decommission the node

ubuntu@10.0.0.3:~$ nodetool decommission
Exception in thread main java.lang.NumberFormatException: For input string: 
56713727820156407428984779325531226111
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:444)
at java.lang.Long.parseLong(Long.java:483)
at 
org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
at 
org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1515)
at 
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1234)
at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:949)
at 
org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1116)
at 
org.apache.cassandra.service.StorageService.leaveRing(StorageService.java:2817)
at 
org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2861)
at 
org.apache.cassandra.service.StorageService.decommission(StorageService.java:2808)

Now I'm in a state where the machine is still up but leaving but I can't 
seem to get it out of the ring.  For example:

% nodetool removenode 6ba2c7d4-713e-4c14-8df8-f861fb211b0d
Exception in thread main java.lang.UnsupportedOperationException: Node 
/10.0.0.3 is alive and owns this ID. Use decommission command to remove it from 
the ring

Any ideas?

/Janne

Re: Failed decommission

2013-08-25 Thread Janne Jalkanen

Thanks; this worked for me too.

/Janne

On Aug 25, 2013, at 18:47 , Mike Heffner m...@librato.com wrote:

 Janne,
 
 We ran into this too. Appears it's a bug in 1.2.8 that is fixed in the 
 upcoming 1.2.9. I added the steps I took to finally remove the node here: 
 https://issues.apache.org/jira/browse/CASSANDRA-5857?focusedCommentId=13748998&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13748998
 
 
 Cheers,
 
 Mike
 
 
 On Sun, Aug 25, 2013 at 4:06 AM, Janne Jalkanen janne.jalka...@ecyrd.com 
 wrote:
 This on cass 1.2.8
 
 Ring state before decommission
 
 --  Address Load   Owns   Host ID   
 TokenRack
 UN  10.0.0.1  38.82 GB   33.3%  21a98502-dc74-4ad0-9689-0880aa110409  1   
  1a
 UN  10.0.0.2   33.5 GB33.3%  cba6b27a-4982-4f04-854d-cc73155d5f69  
 56713727820156407428984779325531226110   1b
 UN  10.0.0.3  37.41 GB   0.0%   6ba2c7d4-713e-4c14-8df8-f861fb211b0d  
 56713727820156407428984779325531226111   1b
 UN  10.0.0.4  35.7 GB33.3%  bf3d4792-f3e0-4062-afe3-be292bc85ed7  
 11342745564031281485796955865106245  1c
 
 Trying to decommission the node
 
 ubuntu@10.0.0.3:~$ nodetool decommission
 Exception in thread main java.lang.NumberFormatException: For input string: 
 56713727820156407428984779325531226111
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 at java.lang.Long.parseLong(Long.java:444)
 at java.lang.Long.parseLong(Long.java:483)
 at 
 org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
 at 
 org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1515)
 at 
 org.apache.cassandra.service.StorageService.onChange(StorageService.java:1234)
 at 
 org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:949)
 at 
 org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1116)
 at 
 org.apache.cassandra.service.StorageService.leaveRing(StorageService.java:2817)
 at 
 org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2861)
 at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2808)
 
 Now I'm in a state where the machine is still up but leaving but I can't 
 seem to get it out of the ring.  For example:
 
 % nodetool removenode 6ba2c7d4-713e-4c14-8df8-f861fb211b0d
 Exception in thread main java.lang.UnsupportedOperationException: Node 
 /10.0.0.3 is alive and owns this ID. Use decommission command to remove it 
 from the ring
 
 Any ideas?
 
 /Janne
 
 
 
 -- 
 
   Mike Heffner m...@librato.com
   Librato, Inc.
 



Re: [Pig] ERROR 2118: Could not get input splits

2013-09-20 Thread Janne Jalkanen

I just started moving our scripts to Pig 0.11.1 from 0.9.2 and I see the same 
issue - about 75-80% of the time it fails. So I'm not moving :-/. 

I am using OSX + Oracle Java7 and CassandraStorage, but I did not see any 
difference between CassandraStorage and CqlStorage.

Cassandra 1.2.9, though 1.1.10 seemed to have the same symptoms.

(Also, ?widerows=true seems to cause a NullPointerException in 
ByteBufferUtil.toString().  So not a very successful evening yesterday.)

/Janne

On Sep 20, 2013, at 11:55 , Cyril Scetbon cyril.scet...@free.fr wrote:

 Hi,
 
 I get a lot of exceptions when using Pig scripts over Cassandra. I have to 
 launch them again and again until they work. You can find a sample of the 
 stacks when it works (twice) and when it fails (3 times) at 
 http://pastebin.com/yWsTHbix. I use the following sample script (there are 
 only a few lines) :
 
 data = LOAD 'cql://ks1/lc' USING CqlStorage();
 describe data;
 dump data;
 rows  = FOREACH data GENERATE filtre;
 dump rows;
 describe rows;
 
 Any idea or a way to get rid of them?
 
 -- 
 Cyril SCETBON
 



Mystery PIG issue with 1.2.10

2013-09-25 Thread Janne Jalkanen
Heya!

I am seeing something rather strange in the way Cass 1.2 + Pig seem to handle 
integer values.

Setup: Cassandra 1.2.10, OSX 10.8, JDK 1.7u40, Pig 0.11.1.  Single node for 
testing this. 

First a table:

 CREATE TABLE testc (
  key text PRIMARY KEY,
  ivalue int,
  svalue text,
  value bigint
) WITH COMPACT STORAGE;

 insert into testc (key,ivalue,svalue,value) values ('foo',10,'bar',65);
 select * from testc;

 key | ivalue | svalue | value
-+++---
 foo | 10 |bar | 65

For my Pig setup, I then use libraries from different C* versions to actually 
talk to my database (which stays on 1.2.10 all the time).

Cassandra 1.0.12 (using cassandra_storage.jar):

 testc = LOAD 'cassandra://keyspace/testc' USING CassandraStorage();
 dump testc
(foo,(svalue,bar),(ivalue,10),(value,65),{})

Cassandra 1.1.10:

 testc = LOAD 'cassandra://keyspace/testc' USING CassandraStorage();
 dump testc
(foo,(svalue,bar),(ivalue,10),(value,65),{})

Cassandra 1.2.10:

 (testc = LOAD 'cassandra://keyspace/testc' USING CassandraStorage();
 dump testc
foo,{(ivalue,
),(svalue,bar),(value,A)})


To me it appears that ints and bigints are interpreted as ascii values in cass 
1.2.10.  Did something change for CassandraStorage, is there a regression, or 
am I doing something wrong?  Quick perusal of the JIRA didn't reveal anything 
that I could directly pin on this.

Note that using compact storage does not seem to affect the issue, though it 
obviously changes the resulting pig format.

In addition, trying to use Pygmalion 

 tf = foreach testc generate key, 
 flatten(FromCassandraBag('ivalue,svalue,value',columns)) as 
 (ivalue:int,svalue:chararray,lvalue:long);
 dump tf

(foo,
,bar,A)

So no help there. Explicitly casting the values to (long) or (int) just results 
in a ClassCastException.

/Janne

Re: Mystery PIG issue with 1.2.10

2013-09-26 Thread Janne Jalkanen

Unfortunately no, as I have a dozen legacy columnfamilies… Since no clear 
answers appeared, I'm going to assume that this is a regression and file a JIRA 
ticket on this.

/Janne

On 26 Sep 2013, at 08:00, Aaron Morton aa...@thelastpickle.com wrote:

  (testc = LOAD 'cassandra://keyspace/testc' USING CassandraStorage();
  dump testc
 foo,{(ivalue,
 ),(svalue,bar),(value,A)})
 
 
 
 If the CQL 3 data ye wish to read, CqlStorage be the driver of your success. 
 
 (btw there is a ticket out to update the example if you get excited 
 https://issues.apache.org/jira/browse/CASSANDRA-5709)
 
 Cheers
 
 
 -
 Aaron Morton
 New Zealand
 @aaronmorton
 
 Co-Founder  Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com
 
 On 26/09/2013, at 3:57 AM, Chad Johnston cjohns...@megatome.com wrote:
 
 As an FYI, creating the table without the WITH COMPACT STORAGE and using 
 CqlStorage works just fine in 1.2.10.
 
 I know that CqlStorage and AbstractCassandraStorage got changed for 1.2.10 - 
 maybe there's a regression with the existing CassandraStorage?
 
 Chad
 
 
 On Wed, Sep 25, 2013 at 1:51 AM, Janne Jalkanen janne.jalka...@ecyrd.com 
 wrote:
 Heya!
 
 I am seeing something rather strange in the way Cass 1.2 + Pig seem to 
 handle integer values.
 
 Setup: Cassandra 1.2.10, OSX 10.8, JDK 1.7u40, Pig 0.11.1.  Single node for 
 testing this.
 
 First a table:
 
  CREATE TABLE testc (
   key text PRIMARY KEY,
   ivalue int,
   svalue text,
   value bigint
 ) WITH COMPACT STORAGE;
 
  insert into testc (key,ivalue,svalue,value) values ('foo',10,'bar',65);
  select * from testc;
 
  key | ivalue | svalue | value
 -+++---
  foo | 10 |bar | 65
 
 For my Pig setup, I then use libraries from different C* versions to 
 actually talk to my database (which stays on 1.2.10 all the time).
 
 Cassandra 1.0.12 (using cassandra_storage.jar):
 
  testc = LOAD 'cassandra://keyspace/testc' USING CassandraStorage();
  dump testc
 (foo,(svalue,bar),(ivalue,10),(value,65),{})
 
 Cassandra 1.1.10:
 
  testc = LOAD 'cassandra://keyspace/testc' USING CassandraStorage();
  dump testc
 (foo,(svalue,bar),(ivalue,10),(value,65),{})
 
 Cassandra 1.2.10:
 
  (testc = LOAD 'cassandra://keyspace/testc' USING CassandraStorage();
  dump testc
 foo,{(ivalue,
 ),(svalue,bar),(value,A)})
 
 
 To me it appears that ints and bigints are interpreted as ascii values in 
 cass 1.2.10.  Did something change for CassandraStorage, is there a 
 regression, or am I doing something wrong?  Quick perusal of the JIRA didn't 
 reveal anything that I could directly pin on this.
 
 Note that using compact storage does not seem to affect the issue, though it 
 obviously changes the resulting pig format.
 
 In addition, trying to use Pygmalion
 
  tf = foreach testc generate key, 
  flatten(FromCassandraBag('ivalue,svalue,value',columns)) as 
  (ivalue:int,svalue:chararray,lvalue:long);
  dump tf
 
 (foo,
 ,bar,A)
 
 So no help there. Explicitly casting the values to (long) or (int) just 
 results in a ClassCastException.
 
 /Janne
 
 



Re: Mystery PIG issue with 1.2.10

2013-09-27 Thread Janne Jalkanen

Sorry, got sidetracked :)

https://issues.apache.org/jira/browse/CASSANDRA-6102

/Janne

On Sep 26, 2013, at 20:04 , Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 26, 2013 at 1:00 AM, Janne Jalkanen janne.jalka...@ecyrd.com 
 wrote:
 
 Unfortunately no, as I have a dozen legacy columnfamilies… Since no clear 
 answers appeared, I'm going to assume that this is a regression and file a 
 JIRA ticket on this.
 
 Could you let the list know the ticket number, when you do? :)
 
 =Rob



Re: Disappearing index data.

2013-10-07 Thread Janne Jalkanen

https://issues.apache.org/jira/browse/CASSANDRA-5732

There is now a reproducible test case.

/Janne

On Oct 7, 2013, at 16:29 , Michał Michalski mich...@opera.com wrote:

 I had similar issue (reported many times here, there's also a JIRA issue, but 
 people reporting this problem were unable to reproduce it).
 
 What I can say is that for me the solution was to run major compaction on the 
 index CF via JMX. To be clear - we're not talking about compacting the CF 
 that IS indexed (your CF), but Cassandra's internal one, which is 
 responsible for storing index data.
 
 MBean you should look for looks like this:
 
 org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=KS,columnfamily=CF.IDX
 
 M.
 
 On 07.10.2013 at 15:22, Tom van den Berge wrote:
 On a 2-node cluster with replication factor 2, I have a column family with
 an index on one of the columns.
 
 Every now and then, I notice that a lookup of the record through the index
 on node 1 produces the record, but the same lookup on node 2 does not! If I
 do a lookup by row key, the record is found, and the indexed value is there.
 
 
 So as far as I can tell, the index on one of the nodes loses values, and
 is no longer in sync with the other node, even though the replication
 factor requires it. I typically repair these issues by storing the indexed
 column value again.
 
 The indexed data is static data; it doesn't change.
 
 I'm running cassandra 1.2.3. I'm running a nodetool repair on each node
 every day (although this does not fix this problem).
 
 This problem worries me a lot. I don't have a clue about the cause of it.
 Any help would be greatly appreciated.
 
 
 
 Tom
 



Re: [RELEASE] Apache Cassandra 1.2.11 released

2013-10-23 Thread Janne Jalkanen

Question - is https://issues.apache.org/jira/browse/CASSANDRA-6102 in 1.2.11 or 
not? CHANGES.txt says it's not, JIRA says it is.

/Janne (temporarily unable to check out the git repo)

On Oct 22, 2013, at 13:48 , Sylvain Lebresne sylv...@datastax.com wrote:

 The Cassandra team is pleased to announce the release of Apache Cassandra
 version 1.2.11.
 
 Cassandra is a highly scalable second-generation distributed database,
 bringing together Dynamo's fully distributed design and Bigtable's
 ColumnFamily-based data model. You can read more here:
 
  http://cassandra.apache.org/
 
 Downloads of source and binary distributions are listed in our download
 section:
 
  http://cassandra.apache.org/download/
 
 This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
 please pay attention to the release notes[2] and Let us know[3] if you were to
 encounter any problem.
 
 Enjoy!
 
 [1]: http://goo.gl/xjiN74 (CHANGES.txt)
 [2]: http://goo.gl/r5pVU2 (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA
 



Re: Efficient IP address location lookup

2013-11-16 Thread Janne Jalkanen
Idea:

Put only range end points in the table with primary key (part, remainder)

insert into location (part, remainder, city) values (100, 10, 'Sydney')    // 
100.0.0.1 - 100.0.0.10 is Sydney
insert into location (part, remainder, city) values (100, 50, 'Melbourne') // 
100.0.0.11 - 100.0.0.50 is Melbourne

then look up (100.0.0.30) as

select * from location where part=100 and remainder >= 30 limit 1

For unused ranges just put in an empty city or some other known value :)
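Putting it together as a sketch (types assumed; clustering order is ascending, so 
LIMIT 1 picks the first endpoint at or above the address):

CREATE TABLE location (
  part      int,
  remainder int,
  city      text,
  PRIMARY KEY (part, remainder)
);

-- which range does 100.0.0.30 fall into?
SELECT city FROM location WHERE part = 100 AND remainder >= 30 LIMIT 1;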

/Janne

On Nov 16, 2013, at 04:51 , Jacob Rhoden jacob.rho...@me.com wrote:

 
 On 16 Nov 2013, at 1:47 pm, Jon Haddad j...@jonhaddad.com wrote:
 Instead of determining your table first, you should figure out what you want 
 to ask Cassandra.
 
 Thanks Jon. Perhaps I should have been clearer. I need to efficiently look 
 up the location of an IP address.
 
 On Nov 15, 2013, at 4:36 PM, Jacob Rhoden jacob.rho...@me.com wrote:
 
 Hi Guys,
 
 It occurs to me that someone may have done this before and be willing to 
 share, or may just be interested in helping work out it.
 
 Assuming a database table where the partition key is the first component of 
 a user's IPv4 address, i.e. (ip=100.0.0.1, part=100) and the remaining three 
 parts of the IP address become a 24bit integer.
 
 create table location(
 part int,
 start bigint,
 end bigint,
 country text,
 city text,
 primary key (part, start, end));
 
 // range 100.0.0.0 - 100.0.0.10
 insert into location (part, start, end, country, city) 
 values(100,0,10,'AU','Melbourne');
 
 // range 100.0.0.11 - 100.0.0.200
 insert into location (part, start, end, country, city) 
 values(100,11,200,'US','New York');
 
 // range 100.0.0.201-100.0.0.255
 insert into location (part, start, end, country, city) 
 values(100,201,255,'UK','London');
 
 What is the appropriate way to then query this? While the following is 
 possible:
 
 select * from location where part=100 and start <= 30
 
 What I need to do, is this, which seems not allowed. What is the correct 
 way to query this?
 
 select * from location where part=100 and start <= 30 and end >= 30
 
 Or perhaps I’m going about this all wrong? Thanks!
 



Re: Data loss when swapping out cluster

2013-11-26 Thread Janne Jalkanen

That sounds bad!  Did you run repair at any stage?  Which CL are you reading 
with? 

/Janne

On 25 Nov 2013, at 19:00, Christopher J. Bottaro cjbott...@academicworks.com 
wrote:

 Hello,
 
 We recently experienced (pretty severe) data loss after moving our 4 node 
 Cassandra cluster from one EC2 availability zone to another.  Our strategy 
 for doing so was as follows:
 One at a time, bring up new nodes in the new availability zone and have them 
 join the cluster.
 One at a time, decommission the old nodes in the old availability zone and 
 turn them off (stop the Cassandra process).
 Everything seemed to work as expected.  As we decommissioned each node, we 
 checked the logs for messages indicating "yes, this node is done 
 decommissioning" before turning the node off.
 
 Pretty quickly after the old nodes left the cluster, we started getting 
 client calls about data missing.
 
 We immediately turned the old nodes back on and when they rejoined the 
 cluster *most* of the reported missing data returned.  For the rest of the 
 missing data, we had to spin up a new cluster from EBS snapshots and copy it 
 over.
 
 What did we do wrong?
 
 In hindsight, we noticed a few things which may be clues...
 The new nodes had much lower load after joining the cluster than the old ones 
 (3-4 gb as opposed to 10 gb).
 We have EC2Snitch turned on, although we're using SimpleStrategy for 
 replication.
 The new nodes showed even ownership (via nodetool status) after joining the 
 cluster.
 Here's more info about our cluster...
 Cassandra 1.2.10
 Replication factor of 3
 Vnodes with 256 tokens
 All tables made via CQL
 Data dirs on EBS (yes, we are aware of the performance implications)
 
 Thanks for the help.



Re: Data loss when swapping out cluster

2013-11-27 Thread Janne Jalkanen

A-yup. Got burned by this too some time ago myself. If you do accidentally try to 
bootstrap a seed node, the solution is to run repair after adding the new node 
but before removing the old one. However, during this time the node will 
advertise itself as owning a range, but when queried, it'll return no data 
until the repair has completed :-(.

Honestly, with reference to the JIRA ticket, I just don't see a situation where 
the current behaviour would really be useful. It's a nasty thing that you just 
have to know when upgrading your cluster - there's no warning, no logging, no 
documentation; just something that you might accidentally do and which will 
manifest itself as random data loss.

/Janne

On 26 Nov 2013, at 21:20, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Nov 26, 2013 at 9:48 AM, Christopher J. Bottaro 
 cjbott...@academicworks.com wrote:
 One thing that I didn't mention, and I think may be the culprit after doing a 
 lot or mailing list reading, is that when we brought the 4 new nodes into the 
 cluster, they had themselves listed in the seeds list.  I read yesterday that 
 if a node has itself in the seeds list, then it won't bootstrap properly.
 
 https://issues.apache.org/jira/browse/CASSANDRA-5836
 
 =Rob 



Re: user / password authentication advice

2013-12-11 Thread Janne Jalkanen

Hi!

You're right, this isn't really Cassandra-specific. Most languages/web 
frameworks have their own way of doing user authentication, and then you just 
typically write a plugin that just stores whatever data the system needs in 
Cassandra.

For example, if you're using Java (or Scala or Groovy or anything else 
JVM-based), Apache Shiro is a good way of doing user authentication and 
authorization. http://shiro.apache.org/. Just implement a custom Realm for 
Cassandra and you should be set.
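A minimal sketch of the kind of table such a custom realm might read from - the
names and hashing scheme are assumptions, and what you store should of course be a
salted hash (bcrypt or similar), never the plaintext password:

CREATE TABLE users (
  email         text PRIMARY KEY,
  password_hash text,
  created_at    timestamp
);

-- the realm fetches the stored hash and verifies the supplied password against it
SELECT password_hash FROM users WHERE email = 'user@example.com';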

/Janne

On Dec 12, 2013, at 05:31 , onlinespending onlinespend...@gmail.com wrote:

 Hi,
 
 I’m using Cassandra in an environment where many users can login to use an 
 application I’m developing. I’m curious if anyone has any advice or links to 
 documentation / blogs where it discusses common implementations or best 
 practices for user and password authentication. My cursory search online 
 didn’t bring much up on the subject. I suppose the information needn’t even 
 be specific to Cassandra.
 
 I imagine a few basic steps will be as follows:
 
 user types in username (e.g. email address) and password
 this is verified against a table storing username and passwords (encrypted in 
 some way)
 a token is return to the app / web browser to allow further transactions 
 using secure token (e.g. cookie)
 
 Obviously I’m only scratching the surface and it’s the detail and best 
 practices of implementing this user / password authentication that I’m 
 curious about.
 
 Thank you,
 Ben
 
 



Re: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Janne Jalkanen

This may be hard because the coordinator could store hinted handoff (HH) data 
on disk. You could turn HH off and have RF=1 to keep data on a single instance, 
but you would be likely to lose data if you had any problems with your 
instances… Also you would need to tweak the memtable flushing so that it goes 
to disk more often than the ten seconds which is the default. Or lose data. You 
will also have an interesting time scaling your cluster and would have to 
plan for that in your custom database.

Essentially you want to turn off all the features which make Cassandra a robust 
product ;-). Without knowing your requirements more precisely, I'd be inclined 
to recommend manually sharding on MariaDB or Postgres instances instead, or use 
their underlying storage engines directly (e.g. InnoDB), if you're just looking 
for a key-value store.

/Janne

On 18 Dec 2013, at 11:20, Colin MacDonald colin.macdon...@sas.com wrote:

 Ahoy the list.  I am evaluating Cassandra in the context of using it as a 
 storage back end for the Titan graph database.
  
 We’ll have several nodes in the cluster.  However, one of our requirements is 
 that data has to be loaded into and stored on a specific node and only on 
 that node.  Also, it cannot be replicated around the system, at least not 
 stored persistently on disk – we will of course make copies in memory and on 
 the wire as we access remote notes.  These requirements are non-negotiable.
  
 We understand that this is essentially the opposite of what Cassandra is 
 designed for, and that we’re missing all the scalability and robustness, but 
 is it technically possible?
  
 First, I would need to create a custom partitioner – is there any tutorial on 
 that?  I see a few “you don’t need” to threads, but I do.
  
 Second, how easy is it to have Cassandra not replicate data between nodes in 
 a cluster?  I’m not seeing an obvious configuration option for that, 
 presumably because it obviates much of the point of using Cassandra, but 
 again, we’re working within some rather unfortunate constraints.
  
 Any hints or suggestions would be most gratefully received.
  
 Kind regards,
  
 -Colin MacDonald-
  



Re: Setting up Cassandra to store on a specific node and not replicate

2013-12-19 Thread Janne Jalkanen

Probably yes, if you also disabled any sort of failovers from the token-aware 
client…

(Talking about this makes you realize how many failsafes Cassandra has. And 
still you can lose data… :-P)

/Janne

On 18 Dec 2013, at 20:31, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Dec 18, 2013 at 2:44 AM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
 As Janne said, you could still have hint being written by other nodes if the 
 one storage node is dead, but you can use the system property 
 cassandra.maxHintTTL to 0 to disable hints.
 
 If one uses a Token Aware client with RF=1, that would seem to preclude 
 hinting even without disabling HH for the entire system; if the coordinator 
 is always the single replica, why would it send a copy anywhere else?
 
 =Rob



Re: Row cache vs. OS buffer cache

2014-01-23 Thread Janne Jalkanen

Our experience is that you want to have all your very hot data fit in the row 
cache (assuming you don’t have very large rows), and leave the rest for the OS. 
 Unfortunately, it completely depends on your access patterns and data what is 
the right size for the cache - zero makes sense for a lot of cases.

Try out different sizes, and watch for row cache hit ratio and read latency. 
Ditto for heap sizes, btw - if your nodes are short on RAM, you may get better 
performance by running at lower heap sizes because OS caches will get more 
memory and your gc pauses will be shorter (though more numerous).
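The row cache is enabled per column family (overall capacity comes from
row_cache_size_in_mb in cassandra.yaml), so it's easy to experiment per CF; a sketch
using the 1.1/1.2-era caching values, table names made up:

-- only CFs whose hot rows are small and genuinely hot
ALTER TABLE mykeyspace.hot_cf WITH caching = 'rows_only';

-- everything else: keys only, and let the OS page cache do the rest
ALTER TABLE mykeyspace.cold_cf WITH caching = 'keys_only';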

/Janne

On 23 Jan 2014, at 09:13 , Katriel Traum katr...@google.com wrote:

 Hello list,
 
 I was wondering if anyone has any pointers or some advice regarding using row cache vs 
 leaving it up to the OS buffer cache.
 
 I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.
 
 Any input appreciated.
 Katriel



Weird row cache behaviour

2014-04-06 Thread Janne Jalkanen
Heya!

I’ve been observing some strange and worrying behaviour all this week with row 
cache hits taking hundreds of milliseconds.

Cassandra 1.2.15, Datastax CQL driver 1.0.4.
EC2 m1.xlarge instances
RF=3, N=4
vnodes in use
key cache: 200M
row cache: 200M
row_cache_provider: SerializingCacheProvider
Query: PreparedStatement SELECT * from uniques3 WHERE hash=? AND item=? AND 
event=?. All values are < 20 bytes. 
All data is written with a TTL of days.

Row is not particularly wide (see cfhistograms in the pastebin).  Row cache hit 
can take hundreds of milliseconds, pretty much screwing performance. My initial 
thought was garbage collection, but I collected traces and GC logs to the 
pastebin below, so while there *is* plenty of GC going on, I don’t think it’s 
the reason.  We also have other column families accessed through Thrift which 
do not exhibit this behaviour at all. There are no abnormal query times for 
cache misses.

http://pastebin.com/ac6PVHhm

Notice also the weird “triple hump” on the cfhistograms - I’m kinda used to 
seeing two humps, one for cache hits and one for disk access, but this one has 
clearly three humps, one at the 200ms area. Also odd is the very large bloom filter 
false ratio, but that might be just our data.

Armed with the traces I formed a hypothesis that perhaps row cache is a bad 
idea, turned it off for this CF, and hey! The average read latencies dropped to 
about 2 milliseconds.  So I’m kinda fine here now, but I would really 
appreciate it if someone could explain to me what is going on, and why would a 
row cache hit ever take up to 450 milliseconds? In our usecase, this CF does 
contain some hot data, and the row cache hit ratio is around 80%, so keeping it 
would be kinda useful.

(The pastebin contains a couple of traces, GC logs from all servers noted in 
the trace, cfstats, cfhistograms and schema.)

/Janne

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Janne Jalkanen

Don’t know, but as a potential customer of DataStax I’m also concerned at the 
fact that there does not seem to be a competitor offering Cassandra support and 
services. All innovation seems to be occurring only in the OSS version or 
DSE(*).  I’d welcome a competitor for DSE - it does not even have to be so 
well-rounded ;-)

(DSE is really cool, and I think DataStax is doing awesome work. I just get 
uncomfortable when there’s a SPoF - that’s why I’m running Cassandra in the 
first place ;-)

((So yes, you, exactly you who is reading this and thinking of starting a 
company around Cassandra, pitch me when you have a product.))

(((* Yes, Netflix is open sourcing a lot of Cassandra stuff, but I don’t think 
they’re planning to pivot.)))

/Janne

On 14 May 2014, at 23:39, Kevin Burton bur...@spinn3r.com wrote:

 I'm curious what % of cassandra developers are employed by Datastax?
 
 … vs other companies.
 
 When MySQL was acquired by Oracle this became a big issue because even though 
 you can't really buy an Open Source project, you can acquire all the 
 developers and essentially do the same thing.
 
 It would be sad if all of Cassandra's 'eggs' were in one basket and a similar 
 situation happens with Datastax.
 
 Seems like they're doing an awesome job to be sure but I guess it worries me 
 in the back of my mind.
 
 
 
 -- 
 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 Skype: burtonator
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 War is peace. Freedom is slavery. Ignorance is strength. Corporations are 
 people.
 



Re: Moving Cassandra from EC2 Classic into VPC

2014-09-09 Thread Janne Jalkanen

Alain Rodriguez outlined this procedure that he was going to try, but failed to 
mention whether this actually worked :-)

https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201406.mbox/%3cca+vsrlopop7th8nx20aoz3as75g2jrjm3ryx119deklynhq...@mail.gmail.com%3E

/Janne

On 8 Sep 2014, at 23:05, Oleg Dulin oleg.du...@gmail.com wrote:

 I get that, but if you read my opening post, I have an existing cluster in 
 EC2 classic that I have no idea how to move to VPC cleanly.
 
 
 On 2014-09-08 19:52:28 +, Bram Avontuur said:
 
 I have setup Cassandra into VPC with the EC2Snitch and it works without 
 issues. I didn't need to do anything special to the configuration. I have 
 created instances in 2 availability zones, and it automatically
 picks it up as 2 different data racks. Just make sure your nodes can see each 
 other in the VPC, e.g. setup a security group that allows connections from 
 other nodes from the same group.
 
 There should be no need to use public IP's if whatever talks to cassandra is 
 also within your VPC.
 
 Hope this helps.
 Bram
 
 
 On Mon, Sep 8, 2014 at 3:34 PM, Oleg Dulin oleg.du...@gmail.com wrote:
 Dear Colleagues:
 
 I need to move Cassandra from EC2 classic into VPC.
 
 What I was thinking is that I can create a new data center within VPC and 
 rebuild it from my existing one (switching to vnodes while I am at it). 
 However, I don't understand how the ec2-snitch will deal with this.
 
 Another idea I had was taking the ec2-snitch configuration and converting it 
 into a Property file snitch. But I still don't understand how to perform this 
 move since I need my newly created VPC instances to have public IPs -- 
 something I would like to avoid.
 
 Any thoughts are appreciated.
 
 Regards,
 Oleg
 
 
 



Re: are repairs in 2.0 more expensive than in 1.2

2014-10-23 Thread Janne Jalkanen

On 23 Oct 2014, at 21:29 , Robert Coli rc...@eventbrite.com wrote:

 On Thu, Oct 23, 2014 at 9:33 AM, Sean Bridges sean.brid...@gmail.com wrote:
 The change from parallel to sequential is very dramatic.  For a small cluster 
 with 3 nodes, using cassandra 2.0.10,  a parallel repair takes 2 hours, and 
 io throughput peaks at 6 mb/s.  Sequential repair takes 40 hours, with 
 average io around 27 mb/s.  Should I file a jira?
 
 As you are an actual user actually encountering the problem I had only 
 conjectured about, you are the person best suited to file such a ticket on 
 the reasonableness of the -par default. :D

Hm?  I’ve been banging my head against the exact same problem (cluster size 
five nodes, RF=3, ~40GB/node) - parallel repair takes about 6 hrs whereas 
serial takes some 48 hours or so. In addition, the compaction impact is roughly 
the same - that is, there’s the same number of compactions triggered per 
minute, but serial runs eight times more of them. There does not seem to be a 
difference between the node response latency during parallel or serial repair.

NB: We do increase our compaction throughput during calmer times, and lower it 
through busy times, and the serial compaction takes enough time to hit the busy 
period - that might also have an impact to the overall performance.

If I had known that this had so far been a theoretical problem, I would’ve 
spoken up earlier. Perhaps serial repair is not the best default.

/Janne



Re: are repairs in 2.0 more expensive than in 1.2

2014-10-24 Thread Janne Jalkanen

Commented and added a munin graph, if it helps. For the record, I’m happy with 
-par performance for now.

/Janne

On 24 Oct 2014, at 18:59, Sean Bridges sean.brid...@gmail.com wrote:

 Janne,
 
 I filed CASSANDRA-8177 [1] for this.  Maybe comment on the jira that you are 
 having the same problem.
 
 Sean
 
 [1]  https://issues.apache.org/jira/browse/CASSANDRA-8177
 
 On Thu, Oct 23, 2014 at 2:04 PM, Janne Jalkanen janne.jalka...@ecyrd.com 
 wrote:
 
 On 23 Oct 2014, at 21:29 , Robert Coli rc...@eventbrite.com wrote:
 
 On Thu, Oct 23, 2014 at 9:33 AM, Sean Bridges sean.brid...@gmail.com wrote:
 The change from parallel to sequential is very dramatic.  For a small 
 cluster with 3 nodes, using cassandra 2.0.10,  a parallel repair takes 2 
 hours, and io throughput peaks at 6 mb/s.  Sequential repair takes 40 hours, 
 with average io around 27 mb/s.  Should I file a jira?
 
 As you are an actual user actually encountering the problem I had only 
 conjectured about, you are the person best suited to file such a ticket on 
 the reasonableness of the -par default. :D
 
 Hm?  I’ve been banging my head against the exact same problem (cluster size 
 five nodes, RF=3, ~40GB/node) - parallel repair takes about 6 hrs whereas 
 serial takes some 48 hours or so. In addition, the compaction impact is 
 roughly the same - that is, the same number of compactions is triggered 
 per minute, but serial ends up running eight times more of them in total. There does not seem to 
 be a difference in node response latency during parallel or serial 
 repair.
 
 NB: We do increase our compaction throughput during calmer times, and lower 
 it during busy times, and the serial repair takes enough time to hit the 
 busy period - that might also have an impact on the overall performance.
 
 If I had known that this had so far been a theoretical problem, I would’ve 
 spoken up earlier. Perhaps serial repair is not the best default.
 
 /Janne
 
 



Re: Practical use of counters in the industry

2014-12-23 Thread Janne Jalkanen

On 20 Dec 2014, at 09:46, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Dec 18, 2014 at 7:19 PM, Rajath Subramanyam rajat...@gmail.com 
 wrote:
 Thanks Ken. Any other use cases where counters are used apart from Rainbird ? 
 
 Disqus use(d? s?) them behind an in-memory accumulator which batches and 
 periodically flushes. This is the best way to use old counters. New 
 counters should be usable in more cases without something in front of them. 

We at Thinglink have also been using the same strategy successfully: collect 
stats in memory, then batch them periodically into a wide LCS CF.  Our stats are 
somewhat bursty (the same event can occur several times per second), so doing 
in-memory accumulation is useful to reduce write load.  We’ll re-benchmark once 
we upgrade to 2.1, since the counter and CQL implementations have changed for 
the better.
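
For the curious, the shape of that CF in CQL terms is roughly the following 
(table and column names are made up, not our actual schema):

CREATE TABLE stats_by_day (
    object_id text,
    day text,
    metric text,
    hits counter,
    PRIMARY KEY ((object_id, day), metric)
) WITH compaction = {'class': 'LeveledCompactionStrategy'};

-- 'day' is a bucket like '2014-12-23' so partitions stay bounded;
-- the in-memory accumulator flushes its per-metric sums periodically:
UPDATE stats_by_day SET hits = hits + 42
 WHERE object_id = 'abc' AND day = '2014-12-23' AND metric = 'views';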

/Janne



Re: User click count

2014-12-29 Thread Janne Jalkanen

Hi!

It’s really a tradeoff between accuracy and speed, and your read access patterns; 
if you need it to be fairly fast, use counters by all means, but accept the 
fact that they will (especially in older versions of Cassandra or under adverse 
network conditions) drift off from the true click count.  If you need accuracy, 
use a timeuuid per click and count the rows (this is fairly safe for replays too).  
However, if using timeuuids your storage will need lots of space, and your 
reads will be slow if the click counts are huge (because Cassandra will need to 
read every item).  Using counters makes it easy to just grab a slice of the 
time series data and shove it to a client for visualization.

You could of course do a hybrid system: use timeuuids, then periodically 
count them, add the result to a regular column, and remove the counted columns.  
Note that you might want to optimize this so that you don’t end up with a lot 
of tombstones, e.g. by bucketing the writes so that you can delete everything 
with just a single partition delete.
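
A rough sketch of what such a hybrid could look like (all names are made up, 
and whether you bucket by day or by hour depends on your volumes):

-- raw clicks, bucketed so that a whole closed bucket can be dropped
-- with a single partition-level delete (few tombstones to scan):
CREATE TABLE clicks_raw (
    link_id text,
    day text,
    click_id timeuuid,
    PRIMARY KEY ((link_id, day), click_id)
);

-- rolled-up totals kept in a plain column for fast, accurate reads:
CREATE TABLE clicks_rollup (
    link_id text,
    day text,
    clicks bigint,
    PRIMARY KEY (link_id, day)
);

-- periodically: count a closed bucket, store the total, then drop the
-- whole raw partition in one go:
INSERT INTO clicks_rollup (link_id, day, clicks) VALUES ('abc', '2014-12-28', 1234);
DELETE FROM clicks_raw WHERE link_id = 'abc' AND day = '2014-12-28';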

At Thinglink some of the more important counters that we use are backed up by 
the actual data. So for speed purposes we always use counters for reads, but 
there’s a repair process that fixes the counter value if we suspect it starts 
drifting too far away from the real data.  (You might be able to tell that we’ve 
been using counters for quite some time :-P)

/Janne

On 29 Dec 2014, at 13:00, Ajay ajay.ga...@gmail.com wrote:

 Hi,
 
 Is it better to use Counter to User click count than maintaining creating new 
 row as user id : timestamp and count it.
 
 Basically we want to track the user clicks and use the same for 
 hourly/daily/monthly report.
 
 Thanks
 Ajay



Re: User click count

2014-12-30 Thread Janne Jalkanen

Hi!

Yes, since all the writes for a partition (or row, if you speak Thrift) always 
go to the same replicas, you will need to design to avoid hotspots - a pure day 
row will cause all the writes for a single day to go to the same replicas, so 
those nodes will have to work really hard for a day, and then the next day it’s 
again hard work for some other nodes.  If you have a user id there in front, 
then it will distribute better.

For tombstone purposes think of your access patterns; if you have a date-based 
system, it probably does not matter since you will scan those UUIDs once, and 
then they will be tombstoned away.  It’s cleaner if you can delete the entire 
row with a single command, but as long as you never read it again, I don’t 
think this matters much.

The real problems with wide rows come with compaction, but you shouldn’t have 
much of a problem there because this is an append-only row, so it should 
be fine as a fairly wide row.  Make some back-of-the-envelope calculations, and 
if it looks like you’re going to be hitting tens of millions of columns per 
day, then store per hour.

One important thing: in order not to lose clicks, always use timeuuids instead 
of timestamps (or else two clicks coming in with the same timestamp for the same 
id would overwrite each other and count as one).
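
To make that concrete, here’s a rough sketch of the per-hour variant (names are 
made up). As a back-of-the-envelope check: even a steady 100 clicks/s is about 
8.6 million columns per day, or roughly 360 000 per hourly bucket, which is very 
manageable.

-- 'bucket' is e.g. '2014/12/30/06' for an hourly bucket
CREATE TABLE user_clicks (
    user_id text,
    bucket text,
    click_id timeuuid,
    PRIMARY KEY ((user_id, bucket), click_id)
);

-- now() yields a timeuuid, so two clicks arriving at the same instant don't collide:
INSERT INTO user_clicks (user_id, bucket, click_id)
VALUES ('u123', '2014/12/30/06', now());

-- the periodic correction pass counts a closed bucket; note that COUNT
-- respects LIMIT, so raise it explicitly for big buckets:
SELECT COUNT(*) FROM user_clicks
 WHERE user_id = 'u123' AND bucket = '2014/12/30/06'
 LIMIT 1000000;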

/Janne

On 30 Dec 2014, at 06:28, Ajay ajay.ga...@gmail.com wrote:

 Thanks Janne, Alain and Eric.
 
 Now say I go with counters (hourly, daily, monthly) and also store UUID as 
 below:
 
 user Id : /mm/dd as row key and dynamic columns for each click with 
 column key as timestamp and value as empty. Periodically count the columns 
 and rows and correct the counters. Now in this case, there will be one row 
 per day but as many columns as there are user clicks. 
 
 Other way is to store row per hour
 user id : /mm/dd/hh as row key and dynamic columns for each click with 
 column key as timestamp and value as empty.
 
 Is there any difference (in performance or any known issues) between more 
 rows vs. more columns, given that Cassandra deletes them through tombstones (say, by 
 default, 20 days)?
 
 Thanks
 Ajay
 
 On Mon, Dec 29, 2014 at 7:47 PM, Eric Stevens migh...@gmail.com wrote:
  If the counters get incorrect, it couldn't be corrected
 
 You'd have to store something that allowed you to correct it.  For example, 
 the TimeUUID approach to keep true counts, which are slow to read but 
 accurate, and a background process that trues up your counter columns 
 periodically.   
 
 On Mon, Dec 29, 2014 at 7:05 AM, Ajay ajay.ga...@gmail.com wrote:
 Thanks for the clarification.
 
 In my case, Cassandra is the only storage. If the counters get incorrect, they 
 couldn't be corrected. And if we store the raw data for that, we might as well go with that 
 approach. But the granularity has to be at the seconds level, as more than one 
 user can click the same link. So the data will be huge, with more writes and 
 more rows to count for reads, right?
 
 Thanks
 Ajay
  
 
 On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 Hi Ajay,
 
 Here is a good explanation you might want to read.
 
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
 
 Though we have used counters for 3 years now - we used them from the start, C* 0.8 - and we 
 are happy with them. Limits I can see in both ways are:
 
 Counters:
 
 - accuracy indeed (tends to be small in our use case, < 5%, when the business 
 allows 10%, so fair enough for us) + we recount them through a batch 
 processing tool (Spark / Hadoop - kind of a lambda architecture). So our 
 real-time stats are inaccurate and after a few minutes or hours we have the 
 real value.
 - Read-before-write model, which is an anti-pattern. Makes you use more 
 machines due to the pressure involved; affordable for us too.
 
 Raw data (counted):
 
 - Space used (can become quite impressive very fast, depending on your 
 business)!
 - Time to answer a request (we expose the data to customers; they don't want 
 to wait 10 sec for Cassandra to read 1 000 000+ columns).
 - Performance is O(n) (linear) instead of O(1) (constant). Customers won't 
 always understand that for you it is harder to read 1 000 000 values than 1, since 
 they expect to be reading 1 number in both cases, and your interface will have very 
 unstable read times.
 
 Pick the best solution (or combination) for your use case. Those 
 disadvantages lists are not exhaustive, just things that came to my mind 
 right now.
 
 C*heers
 
 Alain
 
 2014-12-29 13:33 GMT+01:00 Ajay ajay.ga...@gmail.com:
 Hi,
 
 So you mean to say counters are not accurate? (It is highly likely that 
 multiple parallel threads will be trying to increment the counter as users click the 
 links.) 
 
 Thanks
 Ajay
 
 
 On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen janne.jalka...@ecyrd.com 
 wrote:
 
 Hi!
 
 It’s really a tradeoff between accurate and fast and your read access 
 patterns; if you need it to be fairly fast, use counters by all means, but 
 accept the fact

Re: Cassandra Data Stax java driver Snappy Compression library

2015-08-02 Thread Janne Jalkanen
No, this just tells you that your client (Service S3, using the Datastax driver) cannot 
communicate with the Cassandra cluster using a compressed protocol, since the 
necessary libraries are missing on the client side.  Servers will still 
compress the data they receive when they write it to disk.

In other words

Client -> [uncompressed data] -> Server -> [compressed data] -> Disk. 

To fix, make sure that the Snappy libraries are in the classpath of your S3 
service application.  As always, there’s no guarantee that this improves your 
performance, since if your app is already CPU-heavy, the extra CPU overhead of 
compression *may* be a problem.  So measure :-)
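
(For reference, the on-disk compression is controlled entirely by the table 
definition on the server and is unaffected by the client; on a 1.2 cluster it 
is the table’s compression option, roughly like the sketch below - keyspace 
and table names here are made up:)

ALTER TABLE my_keyspace.table_b
  WITH compression = {'sstable_compression': 'SnappyCompressor'};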

/Janne

 On 02 Aug 2015, at 02:17 , Sachin Nikam skni...@gmail.com wrote:
 
 I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables i.e.
 TableA and TableB.
 
 TableA is read and written to by Services S1 and S2 which use Astyanax client 
 library.
 
 TableB is read and written by Service S3 which uses the datastax java driver 
 2.1. S3 also reads data from TableA.
 
 Both TableA and TableB are defined on the Cassandra nodes to use 
 SnappyCompressor.
 
 On start-up, Service S3 throws the following WARN messages. The 
 service is able to continue doing its normal operation thereafter.
 
 **
 [main] WARN  loggerClass=com.datastax.driver.core.FrameCompressor;Cannot find 
 Snappy class, you should make sure the Snappy library is in the classpath if 
 you intend to use it. Snappy compression will not be available for the 
 protocol.
 ***
 
 
 My questions are as follows--
 #1. Does the compression happen on the cassandra client side or within 
 cassandra server side itself?
 #2. Does Service S3 need to pull in additional dependencies for Snappy 
 Compressions as mentioned here --
 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
  
 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
 #3. What happens if this additional library is not present on the class 
 path of Service S3? Will any data that S3 writes to TableB not be compressed? 
 Regards
 Sachin



Re: Cassandra Data Stax java driver Snappy Compression library

2015-08-05 Thread Janne Jalkanen
I’ve never used Astyanax, so it’s difficult to say, but if you can find 
snappy-java in the classpath, it’s quite possible that compression is enabled 
for S1 and S2 automatically. You could try removing the Snappy jar from S1 and 
see if that changes the latencies compared to S2. ;-)

It probably has some impact on end-to-end latency, but there are multiple other 
things which also impact latency, such as whether you’re using prepared queries 
with the Datastax driver, how large your queries are, etc.  In general the 
consensus seems to be that using CQL over the Datastax driver is 1) very fast - 
and since Cassandra 2.1, arguably faster than the Thrift interface that the 
older clients still use, 2) clearer - the CQL interface gives a 
productivity boost for developers - and 3) future-proof, since all new features 
will be implemented for it.

/Janne

 On 5 Aug 2015, at 06:34, Sachin Nikam skni...@gmail.com wrote:
 
 Janne,
 A little clarification: I found snappy-java-1.0.4.1.jar on the classpath. But 
 the other questions still remain.
 
 On Tue, Aug 4, 2015 at 8:24 PM, Sachin Nikam skni...@gmail.com 
 mailto:skni...@gmail.com wrote:
 Janne,
 Thanks for continuing to take the time to answer my queries. We noticed that 
 the write latency (tp99) from Services S1 and S2 is 50% of the write latency 
 (tp99) for Service S3. I also noticed that S1 and S2, which use the Astyanax 
 client library, also have compress-lzf.jar on their classpath, although the 
 table is defined to use Snappy compression. Is this compression library, or 
 some other transitive dependency pulled in by Astyanax, enabling compression 
 of the payload sent over the wire, and could that account for the difference in tp99?
 Regards
 Sachin
 
 On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen janne.jalka...@ecyrd.com 
 mailto:janne.jalka...@ecyrd.com wrote:
 
 Correct. Note that you may lose some performance this way though; in a 
 typical case, saving bandwidth by increasing CPU usage is a good trade. However, it 
 always depends on your use case and whether you’re running your cluster to the 
 max. It’s a good, low-hanging optimization to keep in mind for 
 production environments, if you choose not to enable compression now.
 
 /Janne
 
 On 3 Aug 2015, at 08:40, Sachin Nikam skni...@gmail.com 
 mailto:skni...@gmail.com wrote:
 
 Thanks Janne...
 To clarify, Service S3 should not run in to any issues and I may choose to 
 not fix the issue?
 Regards
 Sachin
 
 On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen janne.jalka...@ecyrd.com 
 mailto:janne.jalka...@ecyrd.com wrote:
 No, this just tells you that your client (Service S3, using the Datastax driver) cannot 
 communicate with the Cassandra cluster using a compressed protocol, since the 
 necessary libraries are missing on the client side.  Servers will still 
 compress the data they receive when they write it to disk.
 
 In other words
 
 Client -> [uncompressed data] -> Server -> [compressed data] -> Disk. 
 
 To fix, make sure that the Snappy libraries are in the classpath of your S3 
 service application.  As always, there’s no guarantee that this improves 
 your performance, since if your app is already CPU-heavy, the extra CPU 
 overhead of compression *may* be a problem.  So measure :-)
 
 /Janne
 
 On 02 Aug 2015, at 02:17 , Sachin Nikam skni...@gmail.com 
 mailto:skni...@gmail.com wrote:
 
 I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables 
 i.e.
 TableA and TableB.
 
 TableA is read and written to by Services S1 and S2 which use Astyanax 
 client library.
 
 TableB is read and written by Service S3 which uses the datastax java 
 driver 2.1. S3 also reads data from TableA.
 
 Both TableA and TableB are defined on the Cassandra nodes to use 
 SnappyCompressor.
 
 On start-up, Service S3 throws the following WARN messages. The 
 service is able to continue doing its normal operation thereafter.
 
 **
 [main] WARN  loggerClass=com.datastax.driver.core.FrameCompressor;Cannot 
 find Snappy class, you should make sure the Snappy library is in the 
 classpath if you intend to use it. Snappy compression will not be available 
 for the protocol.
 ***
 
 
 My questions are as follows--
 #1. Does the compression happen on the cassandra client side or within 
 cassandra server side itself?
 #2. Does Service S3 need to pull in additional dependencies for Snappy 
 Compressions as mentioned here --
 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
  
 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
 #3. What happens if this additional library is not present on the class 
 path of Service S3? Will any data that S3 writes to TableB not be 
 compressed? 
 Regards
 Sachin
 
 
 
 
 



Re: Cassandra Data Stax java driver Snappy Compression library

2015-08-03 Thread Janne Jalkanen

Correct. Note that you may lose some performance this way though; in a typical 
case, saving bandwidth by increasing CPU usage is a good trade. However, it always 
depends on your use case and whether you’re running your cluster to the max. 
It’s a good, low-hanging optimization to keep in mind for production 
environments, if you choose not to enable compression now.

/Janne

 On 3 Aug 2015, at 08:40, Sachin Nikam skni...@gmail.com wrote:
 
 Thanks Janne...
 To clarify, Service S3 should not run in to any issues and I may choose to 
 not fix the issue?
 Regards
 Sachin
 
 On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen janne.jalka...@ecyrd.com 
 mailto:janne.jalka...@ecyrd.com wrote:
 No, this just tells you that your client (Service S3, using the Datastax driver) cannot 
 communicate with the Cassandra cluster using a compressed protocol, since the 
 necessary libraries are missing on the client side.  Servers will still 
 compress the data they receive when they write it to disk.
 
 In other words
 
 Client -> [uncompressed data] -> Server -> [compressed data] -> Disk. 
 
 To fix, make sure that the Snappy libraries are in the classpath of your S3 
 service application.  As always, there’s no guarantee that this improves your 
 performance, since if your app is already CPU-heavy, the extra CPU overhead 
 of compression *may* be a problem.  So measure :-)
 
 /Janne
 
 On 02 Aug 2015, at 02:17 , Sachin Nikam skni...@gmail.com 
 mailto:skni...@gmail.com wrote:
 
 I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables 
 i.e.
 TableA and TableB.
 
 TableA is read and written to by Services S1 and S2 which use Astyanax 
 client library.
 
 TableB is read and written by Service S3 which uses the datastax java driver 
 2.1. S3 also reads data from TableA.
 
 Both TableA and TableB are defined on the Cassandra nodes to use 
 SnappyCompressor.
 
 On start-up, Service S3 throws the following WARN messages. The 
 service is able to continue doing its normal operation thereafter.
 
 **
 [main] WARN  loggerClass=com.datastax.driver.core.FrameCompressor;Cannot 
 find Snappy class, you should make sure the Snappy library is in the 
 classpath if you intend to use it. Snappy compression will not be available 
 for the protocol.
 ***
 
 
 My questions are as follows--
 #1. Does the compression happen on the cassandra client side or within 
 cassandra server side itself?
 #2. Does Service S3 need to pull in additional dependencies for Snappy 
 Compressions as mentioned here --
 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
  
 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
 #3. What happens if this additional library is not present on the class 
 path of Service S3? Will any data that S3 writes to TableB not be 
 compressed? 
 Regards
 Sachin
 
 



Re: [RELEASE] Apache Cassandra 3.1 released

2015-12-09 Thread Janne Jalkanen

I’m sorry, I don’t understand the new release scheme at all. Both of these are 
bug fixes on 3.0? What’s the actual difference?

If I just want to run the most stable 3.0, should I run 3.0.1 or 3.1?  Will 3.0 
gain new features which will not go into 3.1, because that’s a bug fix release 
on 3.0? So 3.0.x will contain more features than 3.1, as even-numbered releases 
will be getting new features? Or is 3.0.1 and 3.1 essentially the same thing? 
Then what’s the role of 3.1? Will there be more than one 3.1? 3.1.1? Or is it 
3.3? What’s the content of that? 3.something + patches = 3.what?

What does this statement in the referred blog post mean? "Under normal 
conditions, we will NOT release 3.x.y stability releases for x > 0.” Why are 
the normal conditions being violated already by releasing 3.1 (since 1 > 0)? 

/Janne, who is completely confused by all this, and suspects he’s the target of 
some hideous joke.

> On 8 Dec 2015, at 22:26, Jake Luciani  wrote:
> 
> 
> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 3.1. This is the first release from our new Tick-Tock release 
> process[4]. 
> It contains only bugfixes on the 3.0 release.
> 
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
> 
>  http://cassandra.apache.org/ 
> 
> Downloads of source and binary distributions are listed in our download
> section:
> 
>  http://cassandra.apache.org/download/ 
> 
> This version is a bug fix release[1] on the 3.x series. As always, please pay
> attention to the release notes[2] and Let us know[3] if you were to encounter
> any problem.
> 
> Enjoy!
> 
> [1]: http://goo.gl/rQJ9yd  (CHANGES.txt)
> [2]: http://goo.gl/WBrlCs  (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA 
> 
> [4]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ 
> 
> 



Re: [RELEASE] Apache Cassandra 3.1 released

2015-12-13 Thread Janne Jalkanen
> There's not going to be a 3.3.x series, there will be one 3.3 release (unless 
> there is a critical bug, as mentioned above).
> 
> There are two separate release lines going on:
> 
> 3.0.1 -> 3.0.2 -> 3.0.3 -> 3.0.4 -> ... (every release is a bugfix)
> 
> 3.1 -> 3.2 -> 3.3 -> 3.4 -> ... (odd numbers are bugfix releases, even 
> numbers may contain new features)

Ooh, okay. This explains everything. I wish this schematic had been part 
of the initial discussion. I'm not entirely convinced this will actually 
achieve the desired effect, but it's worth a try anyway :-)  Thank you, Mr 
Hobbs!

/Janne



Re: [RELEASE] Apache Cassandra 3.1 released

2015-12-11 Thread Janne Jalkanen
Thanks for this clarification, however...

> So, for the 3.x line:
> If you absolutely must have the most stable version of C* and don't care at 
> all about the new features introduced in even versions of 3.x, you want the 
> 3.0.N release.

So there is no reason why you would ever want to run 3.1 then?  Why was it 
released?  What is the lifecycle of 3.0.x? Will it become obsolete once 3.3 
comes out?

> If you want access to the new features introduced in even release versions of 
> 3.x (3.2, 3.4, 3.6), you'll want to run the latest odd version (3.3, 3.5, 
> 3.7, etc) after the release containing the feature you want access to (so, if 
> the feature's introduced in 3.4 and we haven't dropped 3.5 yet, obviously 
> you'd need to run 3.4).

Are there going to be minor releases of the even releases, i.e. 3.2.1?  Or will 
they all be delegated to the 3.3.x series?  Or will there be a series of identical 
releases like 3.1 and 3.0.1, with 3.2.1 and 3.3?

> This is only going to be the case during the transition phase from old 
> release cycles to tick-tock. We're targeting changes to CI and quality focus 
> going forward to greatly increase the stability of the odd releases of major 
> branches (3.1, 3.3, etc) so, for the 4.X releases, our recommendation would 
> be to run the highest # odd release for greatest stability.

So here you tell me to run 3.1, but above you tell me to run 3.0.1?  Why is there a 
different release scheme specifically for 3.0.x instead of putting those fixes 
into 3.1?

/Janne

Re: Revisit Cassandra EOL Policy

2016-01-07 Thread Janne Jalkanen

If you wish to have a specific EOL policy, you basically need to buy it. It's 
unusual for open source projects to give any sort of an EOL policy; that's 
something that people with very specific requirements are willing to cough up a 
lot of money for. And providing support for older versions - with contracts 
and EOL dates and all that stuff that corporations love - is one of the things 
that enables companies to actually make money on open source projects.

Have you considered contacting Datastax and checking their Cassandra EOL policy? 
 They seem to be very well aligned with what you are looking for.

http://www.datastax.com/support-policy#9 


/Janne

> On 07 Jan 2016, at 03:26, Anuj Wadehra  wrote:
> 
> I would appreciate it if you guys shared your thoughts on the concerns I 
> expressed regarding the Cassandra End of Life policy. I think these concerns are 
> quite genuine and should be openly discussed, so that EOL is more predictable 
> and generates less overhead for users.
> 
> I would like to understand how various users are dealing with the situation. 
> Are you upgrading Cassandra every 3-6 months? How do you cut short your 
> planning, test and release cycles for Cassandra upgrades in your 
> applications/products?
> 
> 
> 
> 
> Thanks
> Anuj
> 
> 
> 
> On Tue, 5 Jan, 2016 at 8:04 pm, Anuj Wadehra
>  wrote:
> Hi,
> 
> As per my understanding, a Cassandra version n is implicitly declared EOL 
> when two major versions have been released after version n, i.e. when version n 
> + 2 is released.
> 
> I think the EOL policy must be revisited in the interest of the expanding 
> Cassandra user base. 
> 
> Concerns with current EOL Policy:
> 
> In March 2015, the Apache web site mentioned that 2.0.14 was the most stable 
> version of Cassandra recommended for production. So, one would push one's 
> clients to upgrade to 2.0.14 in Mar 2015. It takes months to roll out a 
> Cassandra upgrade to all your clients, and by the time all your clients get 
> the upgrade, the version is declared EOL with the release of 2.2 in Aug 2015 
> (within 6 months of being declared production ready). I completely understand 
> that supporting multiple versions is tougher, but at the same time it is very 
> painful and somewhat unrealistic for users to push Cassandra upgrades to all 
> their clients every few months.
> 
> One proposed solution could be to declare a version n as EOL one year after 
> n+1 was declared Production Ready. E.g. if 2.1.7 is the first production 
> ready release of 2.1 which is released in Jun 2015, I would declare 2.0 EOL 
> in Jun 2016. This gives reasonable time for users to plan upgrades.
> 
> Moreover, I think the EOL policy and declarations must be documented 
> explicitly on Apache web site.
> 
> Please share your feedback on revisiting the EOL policy.
> 
> Thanks
> Anuj
> 



Re: Read efficiency question

2016-12-30 Thread Janne Jalkanen
In practice, the performance you’re getting is likely to be impacted by your reading patterns. If you do a lot of sequential reads where key1 and key2 stay the same and only key3 varies, then you may get better performance out of the second option due to hitting the row and disk caches more often. If you are doing a lot of scatter reads, then you’re likely to get better performance out of the first option, because the reads will be distributed more evenly across multiple nodes.

It also depends on how large rows you’re planning to use, as this will directly impact things like compaction, which has an overall impact on the speed of the entire cluster. For just a few values of key3, I doubt there would be much difference in performance, but if key3 has a cardinality of, say, a million, you might be better off with option 1.

As always, the advice is: benchmark your intended use case - put a few hundred gigs of mock data into a cluster, trigger compactions, and do perf tests for different kinds of read/write loads. :-)

(Though if I didn’t know what my read pattern would be, I’d probably go for option 1 purely on a gut feeling, if I was sure I would never need range queries on key3; shorter rows *usually* are a bit better for performance, compaction, etc. Really wide rows can sometimes be a headache operationally.)
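
For concreteness, here’s roughly what the two options look like as tables and 
what each lets you ask at query time (the column types here are just placeholders):

CREATE TABLE opt1 (
    key1 text, key2 text, key3 text, value text,
    PRIMARY KEY ((key1, key2, key3))
);

CREATE TABLE opt2 (
    key1 text, key2 text, key3 text, value text,
    PRIMARY KEY ((key1, key2), key3)
);

-- opt1: every query must supply all three keys; each (key1, key2, key3)
-- combination is its own partition, spread across the cluster:
SELECT value FROM opt1 WHERE key1 = 'a' AND key2 = 'b' AND key3 = 'x';

-- opt2: key3 is a clustering column, so slices within one partition work too:
SELECT value FROM opt2 WHERE key1 = 'a' AND key2 = 'b' AND key3 >= 'x' AND key3 <= 'y';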
May you have energy and success!

/Janne



On 28 Dec 2016, at 16:44, Manoj Khangaonkar  wrote:

 In the first case, the partitioning is based on key1, key2, key3.
 
 In the second case, partitioning is based on key1, key2. Additionally you have a 
 clustering key key3. This means within a partition you can do range queries on 
 key3 efficiently. That is the difference.
 
 regards
 
 On Tue, Dec 27, 2016 at 7:42 AM, Voytek Jarnot  wrote:
 Wondering if there's a difference when querying by primary key between the two 
 definitions below:
 
 primary key ((key1, key2, key3))
 primary key ((key1, key2), key3)
 
 In terms of read speed/efficiency... I don't have much of a reason otherwise to 
 prefer one setup over the other, so would prefer the most efficient for querying.
 
 Thanks.
-- http://khangaonkar.blogspot.com/