Re: Weird GC

2014-01-31 Thread Joel Samuelsson
Thanks for your help.

I've added those flags as well as some others I saw in another thread that
redirect stdout to a file. What information do you need?


2014-01-29 Benedict Elliott Smith belliottsm...@datastax.com:

 It's possible the time attributed to GC is actually spent somewhere else;
 a multitude of tasks may occur during the same safepoint as a GC. We've
 seen some batch revoke of biased locks take a long time, for instance;
 *if* this is happening in your case, and we can track down which objects,
 I would consider it a bug and we may be able to fix it.

 -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1


 On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote:

 Hi,

 We've been trying to figure out why we have so long and frequent
 stop-the-world GC even though we have basically no load.

 Today we got a log of a weird GC that I wonder if you have any theories
 of why it might have happened.

 A plot of our heap at the time, paired with the GC time from the
 Cassandra log:
 http://imgur.com/vw5rOzj
 -The blue line is the ratio of Eden space used (i.e. 1.0 = full)
 -The red line is the ratio of Survivor0 space used
 -The green line is the ratio of Survivor1 space used
 -The teal line is the ratio of Old Gen space used
 -The pink line shows during which period of time a GC happened (from the
 Cassandra log)

 Eden space is filling up and being cleared as expected in the first and
 last hill but on the middle one, it takes two seconds to clear Eden (note
 that Eden has ratio 1 for 2 seconds). Neither the survivor spaces nor old
 generation increase significantly afterwards.

 Any ideas of why this might be happening?
 We have swap disabled, JNA enabled, no CPU spikes at the time, no disk
 I/O spikes at the time. What else could be causing this?

 /Joel Samuelsson





Reverting from VirtualNode

2014-01-31 Thread Víctor Hugo Oliveira Molinar
Once we set nodes to act as virtual nodes, is there a way to revert to
manually assigned tokens?

I have two nodes for testing; there I set 'num_tokens: 256' and left the
initial_token line commented. Virtual nodes worked fine.
But then I tried to switch back by commenting out the 'num_tokens' line and
uncommenting 'initial_token'. However, after starting Cassandra and typing
./nodetool -h 'ip' ring
there are still the default 256 tokens per node.

What am I missing?

Regards,
*Víctor Hugo Molinar*


Re: Weird GC

2014-01-31 Thread Benedict Elliott Smith
You should expect to see lines of output like:

         vmop                [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop]  page_trap_count
0.436: Deoptimize            [  10   0   0 ]  [ 0  0  0  0  0 ]  0
1.437: no vm operation       [  18   0   1 ]  [ 0  0  0  0  0 ]  0
1.762: Deoptimize            [  21   0   0 ]  [ 0  0  0  0  0 ]  0
2.764: no vm operation       [ 160   0   1 ]  [ 0  0  0  0  0 ]  0
2.876: Deoptimize            [ 161   0   0 ]  [ 0  0  0  0  0 ]  0
4.503: EnableBiasedLocking   [ 164   0   0 ]  [ 0  0  0  0  0 ]  0
6.916: RevokeBias            [ 164   0   0 ]  [ 0  0  0  0  0 ]  0

You're looking for any of these lines printed at or around one of your
unexpectedly long pauses.




On 31 January 2014 10:40, Joel Samuelsson samuelsson.j...@gmail.com wrote:

 Thanks for your help.

 I've added those flags as well as some others I saw in another thread that
 redirects stdout to a file. What information is it that you need?


 2014-01-29 Benedict Elliott Smith belliottsm...@datastax.com:

 It's possible the time attributed to GC is actually spent somewhere else;
 a multitude of tasks may occur during the same safepoint as a GC. We've
 seen some batch revoke of biased locks take a long time, for instance;
 *if* this is happening in your case, and we can track down which
 objects, I would consider it a bug and we may be able to fix it.

 -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1


 On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote:

 Hi,

 We've been trying to figure out why we have so long and frequent
 stop-the-world GC even though we have basically no load.

 Today we got a log of a weird GC that I wonder if you have any theories
 of why it might have happened.

 A plot of our heap at the time, paired with the GC time from the
 Cassandra log:
 http://imgur.com/vw5rOzj
 -The blue line is the ratio of Eden space used (i.e. 1.0 = full)
 -The red line is the ratio of Survivor0 space used
 -The green line is the ratio of Survivor1 space used
 -The teal line is the ratio of Old Gen space used
 -The pink line shows during which period of time a GC happened (from the
 Cassandra log)

 Eden space is filling up and being cleared as expected in the first and
 last hill but on the middle one, it takes two seconds to clear Eden (note
 that Eden has ratio 1 for 2 seconds). Neither the survivor spaces nor old
 generation increase significantly afterwards.

 Any ideas of why this might be happening?
 We have swap disabled, JNA enabled, no CPU spikes at the time, no disk
 I/O spikes at the time. What else could be causing this?

 /Joel Samuelsson






Ultra wide row anti pattern

2014-01-31 Thread DuyHai Doan
Hello all

 I've read some material on the net about Cassandra anti-patterns, among
which the very large wide row anti-pattern is mentioned.

 The main reasons to avoid very wide rows are:

 1) fragmentation of data across multiple SSTables when the row is very
wide, leading to very slow reads by slice query

 2) inefficient repair. During repair, C* exchanges hashes of row data.
Even if only one column differs, C* still exchanges the whole row. Having
very wide rows makes repair very expensive

 3) bad scaling. Having wide rows localized on some nodes of your cluster
will create hotspots

 4) hard limit of 2*10⁹ columns per physical row

All those recommendations are quite sensible. Now my customer has a quite
specific use case:

 a. no repair nor durability. C* is used to dump massive data (heavy write
+ read) for temporary processing. The tables are truncated at the end of a
long-running process, so point 2) does not apply here

 b. maximum number of items to be processed is 24*10⁶, far below the hard
limit of  2*10⁹ columns so point 4) does not apply either

 c. small cluster of only 2 nodes, so load balancing is quite
straightforward (50% roughly on each node). Therefore point 3) does not
apply either

 The only drawback for ultra wide row I can see is point 1). But if I use
leveled compaction with a sufficiently large value for sstable_size_in_mb
(let's say 200Mb), will my read performance be impacted as the row grows ?
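
 For reference, a rough CQL sketch of that setting on the widerow table
defined below (200 MB is just the figure discussed above, not a tested
value):

 ALTER TABLE widerow
   WITH compaction = {'class': 'LeveledCompactionStrategy',
                      'sstable_size_in_mb': '200'};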

 Of course, splitting a wide row into several rows using a bucketing
technique is one solution, but it forces us to keep track of the bucket
number, which is not convenient. We have one process (JVM) that inserts data
and another process (JVM) that reads data. With bucketing, we need to
synchronize the bucket number between the 2 processes.

 For information, below is the wide row table definition:

 create table widerow(
  status text,
  insertiondate timeuuid,
  userid bigint,
  PRIMARY KEY (status, insertiondate));

 the widerow serves to track user insertion status (status : {TODO,
IMPORTED,CHECKED}).

 The read pattern is always:

 SELECT userid FROM widerow WHERE status = 'xxx' AND
insertiondate > {last_processed_user_insertion_date}
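
 For illustration, a concrete form of that query (the timeuuid literal is
only a placeholder for the last processed insertion date, and the LIMIT is
an arbitrary page size):

 SELECT userid FROM widerow
   WHERE status = 'TODO'
   AND insertiondate > 8ab7e1c0-8a2f-11e3-a5e2-0800200c9a66
   LIMIT 100000;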




I'll be interested by your insights and remarks about this data model.

 Regards

 Duy Hai DOAN


exception during add node due to test beforeAppend on SSTableWriter

2014-01-31 Thread Desimpel, Ignace
4 node, byte ordered, LCS, 3 Compaction Executors, replication factor 1
Code is the 2.0.4 version but with the patch for CASSANDRA-6638
(https://issues.apache.org/jira/browse/CASSANDRA-6638). However, no cleanup
is run, so the patch should not play a role.

The 4-node cluster is started and inserts/queries are run up to only about
10 GB of data on each node.
Then one node is decommissioned and its local files are deleted.
Then the node is added again.
Exception: see below.

Any idea?

Regards,
Ignace Desimpel


  *   2014-01-31 17:12:02.600 == Bootstrap is streaming data from other 
nodes... Please wait ...
  *   2014-01-31 17:12:02.600 == Bootstrap stream state : rx= 29.00 tx= 
100.00 Please wait ...
  *   2014-01-31 17:12:18.908 Enqueuing flush of 
Memtable-compactions_in_progress@350895652(0/0 serialized/live bytes, 1 ops)
  *   2014-01-31 17:12:18.908 Writing 
Memtable-compactions_in_progress@350895652(0/0 serialized/live bytes, 1 ops)
  *   2014-01-31 17:12:19.009 Completed flushing 
../../../../data/cdi.cassandra.cdi/dbdatafile/system/compactions_in_progress/system-compactions_in_progress-jb-74-Data.db
 (42 bytes) for commitlog position ReplayPosition(segmentId=1391184546183, 
position=561494)
  *   2014-01-31 17:12:19.018 Exception in thread 
Thread[CompactionExecutor:1,1,main]
  *   java.lang.RuntimeException: Last written key
DecoratedKey(8afc9237010380178575, 8afc9237010380178575) >= current key
DecoratedKey(6e0bb955010383dfdd1d, 6e0bb955010383dfdd1d) writing into
/media/datadrive1/cdi.cassandra.cdi/dbdatafile/Ks100K/ForwardLongFunction/Ks100K-ForwardLongFunction-tmp-jb-159-Data.db
  *   at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) 
~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
~[na:1.7.0_40]
  *   at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_40]
  *   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
~[na:1.7.0_40]
  *   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_40]
  *   at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]



exception during add node due to test beforeAppend on SSTableWriter

2014-01-31 Thread Desimpel, Ignace
The join with auto bootstrap itself had finished, so I restarted the added
node. During the restart I saw a message indicating that something is wrong
with this row and sstable.
Of course, in my case I did not drop an sstable from another node into the
data directory. But I did decommission and re-add the node, so that is still
a kind of 'data-from-another-node'.

At level 2, 
SSTableReader(path='../../../../data/cdi.cassandra.cdi/dbdatafile/Ks100K/ForwardStringFunction/Ks100K-ForwardStringFunction-jb-67-Data.db')
 [DecoratedKey(065864ce01024e4e505300, 065864ce01024e4e505300), 
DecoratedKey(14c9d35e0102646973706f736974696f6e7300, 
14c9d35e0102646973706f736974696f6e7300)] overlaps 
SSTableReader(path='../../../../data/cdi.cassandra.cdi/dbdatafile/Ks100K/ForwardStringFunction/Ks100K-ForwardStringFunction-jb-64-Data.db')
 [DecoratedKey(068c2e4101024d6f64616c207665726200, 
068c2e4101024d6f64616c207665726200), 
DecoratedKey(06c566b4010244657465726d696e657200, 
06c566b4010244657465726d696e657200)].  This could be caused by a bug in 
Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from 
another node into the data directory. Sending back to L0.  If you didn't drop 
in sstables, and have not yet run scrub, you should do so since you may also 
have rows out-of-order within an sstable



From: Desimpel, Ignace
Sent: vrijdag 31 januari 2014 17:43
To: user@cassandra.apache.org
Subject: exception during add node due to test beforeAppend on SSTableWriter

4 node, byte ordered, LCS, 3 Compaction Executors, replication factor 1
Code is the 2.0.4 version but with the patch for CASSANDRA-6638
(https://issues.apache.org/jira/browse/CASSANDRA-6638). However, no cleanup
is run, so the patch should not play a role.

4 node cluster is started and insert/queries are done up to about only 10 GB of 
data on each node.
Then decommission one node, and delete local files.
Then add node again.
Exception : see below.

Any idea?

Regards,
Ignace Desimpel


  *   2014-01-31 17:12:02.600 == Bootstrap is streaming data from other 
nodes... Please wait ...
  *   2014-01-31 17:12:02.600 == Bootstrap stream state : rx= 29.00 tx= 
100.00 Please wait ...
  *   2014-01-31 17:12:18.908 Enqueuing flush of 
Memtable-compactions_in_progress@350895652(0/0 serialized/live bytes, 1 ops)
  *   2014-01-31 17:12:18.908 Writing 
Memtable-compactions_in_progress@350895652(0/0 serialized/live bytes, 1 ops)
  *   2014-01-31 17:12:19.009 Completed flushing 
../../../../data/cdi.cassandra.cdi/dbdatafile/system/compactions_in_progress/system-compactions_in_progress-jb-74-Data.db
 (42 bytes) for commitlog position ReplayPosition(segmentId=1391184546183, 
position=561494)
  *   2014-01-31 17:12:19.018 Exception in thread 
Thread[CompactionExecutor:1,1,main]
  *   java.lang.RuntimeException: Last written key
DecoratedKey(8afc9237010380178575, 8afc9237010380178575) >= current key
DecoratedKey(6e0bb955010383dfdd1d, 6e0bb955010383dfdd1d) writing into
/media/datadrive1/cdi.cassandra.cdi/dbdatafile/Ks100K/ForwardLongFunction/Ks100K-ForwardLongFunction-tmp-jb-159-Data.db
  *   at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) 
~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
 ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
  *   at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
~[na:1.7.0_40]
  *   at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_40]
  *   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
~[na:1.7.0_40]
  *   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_40]
  *   at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]



Question: ConsistencyLevel.ONE with multiple datacenters

2014-01-31 Thread Paulo Ricardo Motta Gomes
Hey,

When adding a new data center to our production C* datacenter using the
procedure described in [1], some of our application requests were returning
null/empty values. Rebuild was not complete in the new datacenter, so my
guess is that some requests were being directed to the brand new datacenter
which still didn't have the data.

Our Hector client was connected only to the original nodes, with
autoDiscoverHosts=false and we use ConsistencyLevel.ONE for reads. The
keyspace schema was already configured to use both data centers.

My question is: is it possible that the dynamic snitch is choosing the
nodes in the new (empty) datacenter when CL=ONE? In this case, it's
mandatory to use CL=LOCAL_ONE during bootstrap/rebuild of a new datacenter,
otherwise empty data might be returned, correct?

Cheers,

[1]
http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/operations/ops_add_dc_to_cluster_t.html

-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br http://www.chaordic.com.br/*
+55 48 3232.3200
+55 83 9690-1314


Re: Ultra wide row anti pattern

2014-01-31 Thread Robert Coli
On Fri, Jan 31, 2014 at 6:52 AM, DuyHai Doan doanduy...@gmail.com wrote:

  4) hard limit of 2*10⁹ columns per physical row
  b. maximum number of items to be processed is 24*10⁶, far below the hard
 limit of  2*10⁹ columns so point 4) does not apply either


Before discarding this point, try writing an example row this large with
your actual keys and column names and values, and then reading it and
compacting it.

Given the lack of need for durability, I suggest turning durable_writes off.
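
A minimal CQL sketch of that (the keyspace name and replication settings
below are placeholders, not taken from this thread; an existing keyspace
can be altered similarly):

CREATE KEYSPACE widerow_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
  AND durable_writes = false;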

=Rob


Re: Reverting from VirtualNode

2014-01-31 Thread Robert Coli
On Fri, Jan 31, 2014 at 5:08 AM, Víctor Hugo Oliveira Molinar 
vhmoli...@gmail.com wrote:

 Once we set nodes to act as virtualnodes, there is an way to revert to
 manual assigned token?


On a given node? My understanding is that there is no officially supported
way. You now have 256 contiguous tokens.

You can decommission the node, have it stream all its data to other nodes,
and then re-add it with a single token.

You could also removetoken (or better (?) unsafeassassinate) each of the
tokens and then restart the node with auto_bootstrap:false to join the node
at its old token, then repair it. That would probably work because your
node still has the data for the ranges it had before you converted it, but
has a risk of stale reads at CL.ONE before repair completes. The benefit
here is that you avoid bootstrapping data that you already have on your
node.

https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

=Rob


Re: Ultra wide row anti pattern

2014-01-31 Thread DuyHai Doan
Durable writes have already been disabled for the entire keyspace.

 I'll run a benchmark on a 24*10⁶-column wide row and give feedback soon.


On Fri, Jan 31, 2014 at 7:55 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Jan 31, 2014 at 6:52 AM, DuyHai Doan doanduy...@gmail.com wrote:

  4) hard limit of 2*10⁹ columns per physical row
  b. maximum number of items to be processed is 24*10⁶, far below the hard
 limit of  2*10⁹ columns so point 4) does not apply either


 Before discarding this point, try writing an example row this large with
 your actual keys and column names and values, and then reading it and
 compacting it.

 Given the lack of need for durability, I suggest turning durable_writes
 off.

 =Rob



Fwd: {kundera-discuss} Kundera 2.10 released

2014-01-31 Thread Vivek Mishra
fyi

-- Forwarded message --
From: Vivek Mishra vivek.mis...@impetus.co.in
Date: Sat, Feb 1, 2014 at 1:18 AM
Subject: {kundera-discuss} Kundera 2.10 released
To: kundera-disc...@googlegroups.com kundera-disc...@googlegroups.com


Hi All,

We are happy to announce the Kundera 2.10 release.

Kundera is a JPA 2.0 compliant, object-datastore mapping library for NoSQL
datastores. The idea behind Kundera is to make working with NoSQL databases
drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB,
Redis, OracleNoSQL, Neo4j, ElasticSearch, CouchDB and relational databases.

Major Changes:
==
1) Support added for bean validation.


Github Bug Fixes:
===
https://github.com/impetus-opensource/Kundera/issues/208
https://github.com/impetus-opensource/Kundera/issues/380
https://github.com/impetus-opensource/Kundera/issues/408
https://github.com/impetus-opensource/Kundera/issues/453
https://github.com/impetus-opensource/Kundera/issues/454
https://github.com/impetus-opensource/Kundera/issues/456
https://github.com/impetus-opensource/Kundera/issues/460
https://github.com/impetus-opensource/Kundera/issues/465
https://github.com/impetus-opensource/Kundera/issues/476
https://github.com/impetus-opensource/Kundera/issues/478
https://github.com/impetus-opensource/Kundera/issues/479
https://github.com/impetus-opensource/Kundera/issues/484
https://github.com/impetus-opensource/Kundera/issues/494
https://github.com/impetus-opensource/Kundera/issues/509
https://github.com/impetus-opensource/Kundera/issues/514
https://github.com/impetus-opensource/Kundera/issues/516
https://github.com/impetus-opensource/Kundera/issues/517
https://github.com/impetus-opensource/Kundera/issues/518


How to Download:
To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera

Latest released tag version is 2.10 Kundera maven libraries are now
available at:
https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here:
https://github.com/impetus-opensource/Kundera/tree/trunk/src/kundera-tests

Survey/Feedback:
http://www.surveymonkey.com/s/BMB9PWG

Thank you all for your contributions and using Kundera!

PS: The group artifactId has been changed from the 2.9.1 release onward.
Please refer to
https://github.com/impetus-opensource/Kundera/blob/trunk/src/README.md#note
for the same.











Re: Ultra wide row anti pattern

2014-01-31 Thread Nate McCall


  The only drawback for ultra wide row I can see is point 1). But if I use
 leveled compaction with a sufficiently large value for sstable_size_in_mb
 (let's say 200Mb), will my read performance be impacted as the row grows ?


For this use case, you would want to use SizeTieredCompaction and play
around with the configuration a bit to keep a small number of large
SSTables. Specifically: keep min|max_threshold really low, set bucket_low
and bucket_high closer together maybe even both to 1.0, and maybe a larger
min_sstable_size.
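
As a rough CQL sketch of that tuning (the exact values are only a starting
point to experiment with, and min_sstable_size is assumed here to be in
bytes):

ALTER TABLE widerow
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'min_threshold': '2',
                     'max_threshold': '4',
                     'bucket_low': '0.9',
                     'bucket_high': '1.1',
                     'min_sstable_size': '104857600'};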

YMMV though - per Rob's suggestion, take the time to run some tests
tweaking these options.



  Of course, splitting wide row into several rows using bucketing technique
 is one solution but it forces us to keep track of the bucket number and
 it's not convenient. We have one process (jvm) that insert data and another
 process (jvm) that read data. Using bucketing, we need to synchronize the
 bucket number between the 2 processes.


This could be as simple as adding year and month to the primary key (in the
form 'mm'). Alternatively, you could add this to the partition key in the
table definition. Either way, it then becomes pretty easy to re-generate
these based on the query parameters.
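
For example, a hypothetical month-bucketed variant of the widerow table
(the table and column names here are illustrative only):

CREATE TABLE widerow_by_month (
  status text,
  month text,
  insertiondate timeuuid,
  userid bigint,
  PRIMARY KEY ((status, month), insertiondate)
);

SELECT userid FROM widerow_by_month
  WHERE status = 'TODO' AND month = '201401'
  AND insertiondate > {last_processed_user_insertion_date};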



-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Ultra wide row anti pattern

2014-01-31 Thread DuyHai Doan
Thanks Nate for your ideas.

This could be as simple as adding year and month to the primary key (in
the form 'mm'). Alternatively, you could add this in the partition in
the definition. Either way, it then becomes pretty easy to re-generate
these based on the query parameters.

 The thing is that it's not that simple. My customer has a very BAD idea:
using Cassandra as a queue (the perfect anti-pattern).

 Before telling them to redesign their entire architecture and put in some
queueing system like ActiveMQ or something similar, I would like to see how
I can use wide rows to meet the requirements.

 The functional need is quite simple:

 1) A process A loads users into Cassandra and sets the status of each user
to 'TODO'. With the bucketing technique, we can limit a row width to, let's
say, 100 000 columns. So at the end of the current row, process A knows that
it should move to the next bucket. The bucket is coded using a *composite
partition key*; in our example it would be 'TODO:1', 'TODO:2', etc.


 2) A process B reads the wide row for the 'TODO' status. It starts at
bucket 1, so it will read the row with partition key 'TODO:1'. The users are
processed and inserted into a new row, 'PROCESSED:1' for example, to keep
track of the status. After retrieving 100 000 columns, it switches
automatically to the next bucket. Simple. Fair enough.


 3) Now what sucks is that sometimes process B does not have enough data to
perform the functional logic on the users it fetched from the wide row, so
it has to REPUT some users back into the 'TODO' status rather than
transitioning them to the 'PROCESSED' status. That's exactly queue behavior.

 A simplistic idea would be to re-insert those *m* users into 'TODO:*n*',
with *n* higher than the current bucket number, so they can be processed
later. *But then it screws up the whole counting system*. Process A, which
inserts data, will not know that there are already *m* users in row *n*, so
it will happily add 100 000 columns, making the row size grow to *100 000 +
m*. When process B reads this row back again, it will stop at the first
100 000 columns and skip the trailing *m* elements.

  That's the main reason I dropped the idea of bucketing (which is quite
smart in the normal case) in favor of an ultra-wide row.

 Anyway, I'll follow your advice and play around with the SizeTiered
parameters.

 Regards

 Duy Hai DOAN


On Fri, Jan 31, 2014 at 9:23 PM, Nate McCall n...@thelastpickle.com wrote:


  The only drawback for ultra wide row I can see is point 1). But if I use
 leveled compaction with a sufficiently large value for sstable_size_in_mb
 (let's say 200Mb), will my read performance be impacted as the row grows ?


 For this use case, you would want to use SizeTieredCompaction and play
 around with the configuration a bit to keep a small number of large
 SSTables. Specifically: keep min|max_threshold really low, set bucket_low
 and bucket_high closer together maybe even both to 1.0, and maybe a larger
 min_sstable_size.

 YMMV though - per Rob's suggestion, take the time to run some tests
 tweaking these options.



  Of course, splitting wide row into several rows using bucketing
 technique is one solution but it forces us to keep track of the bucket
 number and it's not convenient. We have one process (jvm) that insert data
 and another process (jvm) that read data. Using bucketing, we need to
 synchronize the bucket number between the 2 processes.


 This could be as simple as adding year and month to the primary key (in
 the form 'mm'). Alternatively, you could add this in the partition in
 the definition. Either way, it then becomes pretty easy to re-generate
 these based on the query parameters.



 --
 -
 Nate McCall
 Austin, TX
 @zznate

 Co-Founder & Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com