Re: data model question

2012-03-13 Thread Tamar Fraenkel
Thanks!
Better than mine, as it considers later additions of services!
Will update my code.
Thanks

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, Mar 12, 2012 at 11:13 AM, aaron morton aa...@thelastpickle.com wrote:

 In this case, where you know the query upfront, I add a custom secondary
 index using another CF to support the query. It's a little easier here
 because the data won't change.

 UserLookupCF (using composite types for the key value)

 row_key: system_name:id e.g. facebook:12345 or twitter:12345
 col_name : internal_user_id e.g. 5678
 col_value: empty
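
 A minimal Hector sketch of writing and reading such a lookup row (this is an
 illustration only, not from the original message; it assumes Hector 1.0-style
 APIs, a Keyspace object in scope, and a CF named "UserLookup" with UTF8 keys):

   import java.util.List;
   import me.prettyprint.cassandra.serializers.StringSerializer;
   import me.prettyprint.hector.api.Keyspace;
   import me.prettyprint.hector.api.beans.HColumn;
   import me.prettyprint.hector.api.factory.HFactory;
   import me.prettyprint.hector.api.mutation.Mutator;
   import me.prettyprint.hector.api.query.SliceQuery;

   public class FbLookup {
       static void demo(Keyspace keyspace) {
           StringSerializer se = StringSerializer.get();

           // Write: one row per external id; the internal id is the column
           // name and the column value is empty.
           Mutator<String> mutator = HFactory.createMutator(keyspace, se);
           mutator.insert("facebook:12345", "UserLookup",
                   HFactory.createColumn("5678", "", se, se));

           // Read: slice the row for the external id; the returned column
           // names are the matching internal user ids.
           SliceQuery<String, String, String> query =
                   HFactory.createSliceQuery(keyspace, se, se, se);
           query.setColumnFamily("UserLookup");
           query.setKey("facebook:12345");
           query.setRange(null, null, false, 10);
           List<HColumn<String, String>> matches =
                   query.execute().get().getColumns();
       }
   }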

 Hope that helps.

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 11/03/2012, at 11:15 PM, Tamar Fraenkel wrote:

 Hi!
 Thanks for the response.
 From what I read, secondary indices are good only for columns with few
 possible values. Is this a good fit for my case? I have a unique facebook id
 for every user.
 Thanks

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Sun, Mar 11, 2012 at 11:48 AM, Marcel Steinbach mstei...@gmail.com wrote:

 Either you do that or you could think about using a secondary index on
 the fb user name in your primary cf.

 See http://www.datastax.com/docs/1.0/ddl/indexes

 Cheers

 On 11.03.2012, at 09:51, Tamar Fraenkel ta...@tok-media.com wrote:

  Hi!
 I need some advice:
 I have a user CF, which has a UUID key which is my internal user id.
 One of the columns is the user's facebook_id (if it exists).

 I need to have the reverse mapping from facebook_id to my UUID.
 My intention is to add a CF for the mapping from Facebook Id to my id:

 user_by_fbid = {
   // key is fb Id, column name is our User Id, value is empty
   13101876963: {
 f94f6b20-161a-4f7e-995f-0466c62a1b6b : 
   }
 }

 Does this make sense?
 This CF will be used whenever a user logs in through Facebook to retrieve
 the internal id.
 Thanks

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media



 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956







Re: running two rings on the same subnet

2012-03-13 Thread Tamar Fraenkel
Done it. Now it generally runs ok, till one of the nodes gets stuck at
100% CPU and I need to reboot it.

Last lines in the system.log just before are:
 INFO [OptionalTasks:1] 2012-03-13 07:36:43,850 MeteredFlusher.java (line
62) flushing high-traffic column family CFS(Keyspace='tok',
ColumnFamily='tk_vertical_tag_story_indx') (estimated 35417890 bytes)
 INFO [OptionalTasks:1] 2012-03-13 07:36:43,869 ColumnFamilyStore.java
(line 704) Enqueuing flush of
Memtable-tk_vertical_tag_story_indx@2002820169(1620316/35417890
serialized/live bytes, 30572 ops)
 INFO [FlushWriter:76] 2012-03-13 07:36:43,869 Memtable.java (line 246)
Writing Memtable-tk_vertical_tag_story_indx@2002820169(1620316/35417890
serialized/live bytes, 30572 ops)
 INFO [FlushWriter:76] 2012-03-13 07:36:44,015 Memtable.java (line 283)
Completed flushing
/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-191-Data.db (2134123
bytes)
 INFO [OptionalTasks:1] 2012-03-13 07:37:37,886 MeteredFlusher.java (line
62) flushing high-traffic column family CFS(Keyspace='tok',
ColumnFamily='tk_vertical_tag_story_indx') (estimated 34389135 bytes)
 INFO [OptionalTasks:1] 2012-03-13 07:37:37,887 ColumnFamilyStore.java
(line 704) Enqueuing flush of
Memtable-tk_vertical_tag_story_indx@1869953681(1573252/34389135
serialized/live bytes, 29684 ops)
 INFO [FlushWriter:76] 2012-03-13 07:37:37,887 Memtable.java (line 246)
Writing Memtable-tk_vertical_tag_story_indx@1869953681(1573252/34389135
serialized/live bytes, 29684 ops)
 INFO [FlushWrit

Any idea?
I am considering adding a third node, so that a replication factor of 2 won't
stall my system when one node goes down. Does it make sense?

Thanks


*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Tue, Mar 6, 2012 at 7:51 PM, aaron morton aa...@thelastpickle.com wrote:

 Reduce these settings for the CF
 row_cache (disable it)
 key_cache (disable it)

 Increase these settings for the CF
 bloom_filter_fp_chance

 Reduce these settings in cassandra.yaml

 flush_largest_memtables_at
 memtable_flush_queue_size
 sliced_buffer_size_in_kb
 in_memory_compaction_limit_in_mb
 concurrent_compactors


 Increase these settings
 index_interval
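
 For reference, a sketch of what the cassandra.yaml side of this could look
 like (1.0-era setting names; the values are illustrative starting points
 only, and note that row_cache, key_cache and bloom_filter_fp_chance are
 per-CF schema settings rather than yaml entries):

   # cassandra.yaml (1.0.x) -- illustrative values, not recommendations
   flush_largest_memtables_at: 0.45       # reduce (default 0.75)
   memtable_flush_queue_size: 2           # reduce (default 4)
   sliced_buffer_size_in_kb: 32           # reduce (default 64)
   in_memory_compaction_limit_in_mb: 16   # reduce (default 64)
   concurrent_compactors: 1               # reduce (default: one per core)
   index_interval: 512                    # increase (default 128)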


 While it obviously depends on load, I would not be surprised if you had a
 lot of trouble running cassandra with that setup.

 Cheers
 A


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6/03/2012, at 11:02 PM, Tamar Fraenkel wrote:

 Arron, Thanks for your response. I was afraid this is the issue.
 Can you give me some direction regarding the fine tuning of my VMs, I
 would like to explore that option some more.
 Thanks!

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Tue, Mar 6, 2012 at 11:58 AM, aaron morton aa...@thelastpickle.com wrote:

 You do not have enough memory allocated to the JVM and are suffering from
 excessive GC as a result.

 There are some tuning things you can try, but 480MB is not enough. 1GB
 would be a better start, 2 better than that.

 Consider using https://github.com/pcmanus/ccm for testing multiple
 instances on a single server rather than a VM.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6/03/2012, at 10:21 PM, Tamar Fraenkel wrote:

 I have some more info: after a couple of hours of running, the problematic node
 again hit 100% CPU and I had to reboot it. The last lines from the log show it
 did GC:

  INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line
 122) GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240
  INFO [ScheduledTasks:1] 2012-03-06 10:28:50,595 GCInspector.java (line
 122) GC for Copy: 3927 ms for 1 collections, 156572576 used; max is
 513802240
  INFO [ScheduledTasks:1] 2012-03-06 10:28:55,434 StatusLogger.java (line
 50) Pool NameActive   Pending   Blocked
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,298 StatusLogger.java (line
 65) ReadStage 2 2 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,499 StatusLogger.java (line
 65) RequestResponseStage  0 0 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
 65) ReadRepairStage   0 0 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
 65) MutationStage 0 0 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
 65) ReplicateOnWriteStage 0 0 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
 65) GossipStage   0 0 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
 65) 

Re: Adding node to Cassandra

2012-03-13 Thread aaron morton
 2. Move node 'D' initial token down from 150... to 130... 
 Here we ran into a problem. When the move started, disk usage for node 'C' 
 grew from 400 to 750GB; we saw running compactions on node 'D' but some 
 compactions failed with 
Did you run out of space on C or D ? 

 We expected a decrease of used disk space on node 'D' because we shrank the 
 token range for this node, but saw the opposite. Why did that happen, and is 
 it normal behavior?
Remember that node D is also holding replicas of the token ranges assigned to 
nodes B and C. 

At first glance it sounds unusual but it's hard to tell without knowing more 
about what happened. How long did it take to build up? What sort of load was 
the system under? What was in the data directory, were there -tmp files in there 
or lots of small files? What did nodetool compactionstats say, was compaction 
keeping up? 

Moving forward, *if* you see a lot of old files in the data dir you may benefit 
from running a manual compaction, as it may reduce the amount of data 
transferred. There are some downsides to this. Check the DataStax site or ask 
if you do not know what they are. 

Hope that helps


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/03/2012, at 3:38 AM, Rustam Aliyev wrote:

 It's hard to answer this question because there is a whole bunch of operations 
 which may cause disk usage growth - repair, compaction, move etc. Any 
 combination of these operations will make things only worse. But let's assume 
 that in your case the only operation increasing disk usage was move.
 
 Simply speaking, move does not move data from one node to another, it just 
 copies data. Once the data is copied, you need to clean up data the node is 
 no longer responsible for, using the cleanup command.
 
 If you can't increase storage, maybe you can try moving nodes slowly, i.e. 
 instead of moving node D from 150... to 130..., try going first to 140..., 
 cleanup, and then from 140... to 130... However, I never tried this and can't 
 guarantee that it will use less disk space.
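
 A sketch of that two-step sequence with nodetool (the host and the elided
 tokens are placeholders, as above):

   nodetool -h <node-d-host> move 140...   # intermediate token
   nodetool -h <node-d-host> cleanup       # drop data the node no longer owns
   nodetool -h <node-d-host> move 130...   # final token
   nodetool -h <node-d-host> cleanup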
 
 In the past, someone reported a 2.5x increase when they went from 4 nodes to 5.
 
 --
 Rustam.
 
 On 12/03/2012 12:46, Vanger wrote:
 
 Cassandra v1.0.8
 once again: 4-node cluster, RF = 3. 
 
 
 On 12.03.2012 16:18, Rustam Aliyev wrote:
 
 What version of Cassandra do you have?
 
 On 12/03/2012 11:38, Vanger wrote:
 
 We were aware of compaction overhead, but still don't understand why that 
 should happen: node 'D' was in stable condition, had worked for at least a 
 month, had all the data for its token range and was comfortable with its disk 
 space. 
 Why does the node suddenly need 2x more space for data it already has? Why 
 does decreasing the token range not lead to decreasing disk usage? 
 
 On 12.03.2012 15:14, Rustam Aliyev wrote:
 
 Hi,
 
 If you use SizeTieredCompactionStrategy, you should have x2 disk space to 
 be on the safe side. So if you want to store 2TB data, you need partition 
 size of 4TB at least.  LeveledCompactionStrategy is available in 1.x and 
 supposed to require less free disk space (but comes at price of I/O).
 
 --
 Rustam.
 
 On 12/03/2012 09:23, Vanger wrote:
 
 We have a 4-node Cassandra cluster with RF = 3 (nodes named from 'A' to 
 'D', initial tokens: 
 A (25%): 20543402371996174596346065790779111550, 
 B (25%): 63454860067234500516210522518260948578, 
 C (25%): 106715317233367107622067286720208938865,
 D (25%): 150141183460469231731687303715884105728), 
 and want to add a 5th node ('E') with initial token = 
 164163260474281062972548100673162157075, then we want to rebalance the A, 
 D, and E nodes such that they'll own an equal percentage of data. All nodes 
 have ~400 GB of data and around ~300GB of free disk space.
 What we did:
 1. 'Join' the new cassandra instance (node 'E') to the cluster and wait till 
 it loads data for its token range.
 
 2. Move node 'D' initial token down from 150... to 130... 
 Here we ran into a problem. When the move started, disk usage for node 'C' 
 grew from 400 to 750GB; we saw running compactions on node 'D', but some 
 compactions failed with: WARN [CompactionExecutor:580] 2012-03-11 
 16:57:56,036 CompactionTask.java (line 87) insufficient space to compact 
 all requested files SSTableReader. After that we killed the move process 
 to avoid an out-of-disk-space error (when 5GB of free space was left). After 
 restart it freed 100GB of space and now we have a total of 105GB of free 
 disk space on node 'D'. Also we noticed disk usage increased by ~150GB at 
 node 'B', but it stopped growing before we stopped the move.
 
 
 So now we have 5 nodes in the cluster in a status like this:
 Node  Owns%  Load   Init. token
 A     16%    400GB  020...
 B     25%    520GB  063...
 C     25%    400GB  106...
 D     25%    640GB  150...
 E      9%    300GB  164...
 
 We'll add disk space to all nodes and run some cleanups, but there are 
 still some questions left:
 
 What is 

Re: OOM opening bloom filter

2012-03-13 Thread aaron morton
Thanks for the update. 

How much smaller did the BF get to ? 

A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/03/2012, at 8:24 AM, Mick Semb Wever wrote:

 
 It's my understanding then for this use case that bloom filters are of
  little importance and that I can
 
 
 Ok. To summarise our actions to get us out of this situation, in hope
 that it may help others one day, we did the following actions:
 
 1) upgrade to 1.0.7
 2) set fp_ratio=0.99
 3) set index_interval=4096
 4) restarted the node with Xmx30G
 5) run `nodetool scrub` 
  and monitor total size of bf files
  using `du -hc *-Filter.db | grep total`
 6) restart node with original Xmx setting once total bf size is under
  (scrub was running for 12hrs)
  (remaining bloom filters can be rebuilt later from normal compact)
 
 Hopefully it will also eventuate that this cluster can run with a more
 normal Xmx4G rather than the previous Xmx12G.
 
 (2) and (3) are very much dependent on our set up using hadoop where all
 reads are get_range_slice with 16k rows per request. Both could be tuned
 correctly but they're the numbers that worked first up.
 
 ~mck
 
 -- 
 When there is no enemy within, the enemies outside can't hurt you.
 African proverb 
 
 | http://github.com/finn-no | http://tech.finn.no |



Re: running two rings on the same subnet

2012-03-13 Thread aaron morton
If you are on Ubuntu it may be this 
http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs

otherwise I would look for GC problems. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/03/2012, at 7:53 PM, Tamar Fraenkel wrote:

 Done it. Now it generally runs ok, till one of the nodes gets stuck at 
 100% CPU and I need to reboot it.
 
 Last lines in the system.log just before are:
  INFO [OptionalTasks:1] 2012-03-13 07:36:43,850 MeteredFlusher.java (line 62) 
 flushing high-traffic column family CFS(Keyspace='tok', 
 ColumnFamily='tk_vertical_tag_story_indx') (estimated 35417890 bytes)
  INFO [OptionalTasks:1] 2012-03-13 07:36:43,869 ColumnFamilyStore.java (line 
 704) Enqueuing flush of 
 Memtable-tk_vertical_tag_story_indx@2002820169(1620316/35417890 
 serialized/live bytes, 30572 ops)
  INFO [FlushWriter:76] 2012-03-13 07:36:43,869 Memtable.java (line 246) 
 Writing Memtable-tk_vertical_tag_story_indx@2002820169(1620316/35417890 
 serialized/live bytes, 30572 ops)
  INFO [FlushWriter:76] 2012-03-13 07:36:44,015 Memtable.java (line 283) 
 Completed flushing 
 /opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-191-Data.db (2134123 
 bytes)
  INFO [OptionalTasks:1] 2012-03-13 07:37:37,886 MeteredFlusher.java (line 62) 
 flushing high-traffic column family CFS(Keyspace='tok', 
 ColumnFamily='tk_vertical_tag_story_indx') (estimated 34389135 bytes)
  INFO [OptionalTasks:1] 2012-03-13 07:37:37,887 ColumnFamilyStore.java (line 
 704) Enqueuing flush of 
 Memtable-tk_vertical_tag_story_indx@1869953681(1573252/34389135 
 serialized/live bytes, 29684 ops)
  INFO [FlushWriter:76] 2012-03-13 07:37:37,887 Memtable.java (line 246) 
 Writing Memtable-tk_vertical_tag_story_indx@1869953681(1573252/34389135 
 serialized/live bytes, 29684 ops)
  INFO [FlushWrit
 
 Any idea?
 I am considering adding a third node, so that a replication factor of 2 won't 
 stall my system when one node goes down. Does it make sense?
 
 Thanks
 
 
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 
 
 
 
 On Tue, Mar 6, 2012 at 7:51 PM, aaron morton aa...@thelastpickle.com wrote:
 Reduce these settings for the CF
 row_cache (disable it)
 key_cache (disable it)
 
 Increase these settings for the CF
 bloom_filter_fp_chance
 
 Reduce these settings in cassandra.yaml
 
 flush_largest_memtables_at
 memtable_flush_queue_size
 sliced_buffer_size_in_kb
 in_memory_compaction_limit_in_mb
 concurrent_compactors
 
 
 Increase these settings 
 index_interval
 
 
 While it obviously depends on load, I would not be surprised if you had a lot 
 of trouble running cassandra with that setup. 
 
 Cheers
 A
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 6/03/2012, at 11:02 PM, Tamar Fraenkel wrote:
 
 Arron, Thanks for your response. I was afraid this is the issue.
 Can you give me some direction regarding the fine tuning of my VMs, I would 
 like to explore that option some more.
 Thanks!
 
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 
 
 
 
 On Tue, Mar 6, 2012 at 11:58 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 You do not have enough memory allocated to the JVM and are suffering from 
 excessive GC as a result.
 
 There are some tuning things you can try, but 480MB is not enough. 1GB would 
 be a better start, 2 better than that. 
 
 Consider using https://github.com/pcmanus/ccm for testing multiple instances 
 on a single server rather than a VM.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 6/03/2012, at 10:21 PM, Tamar Fraenkel wrote:
 
 I have some more info: after a couple of hours of running, the problematic node 
 again hit 100% CPU and I had to reboot it. The last lines from the log show it 
 did GC:
 
  INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line 
 122) GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240
  INFO [ScheduledTasks:1] 2012-03-06 10:28:50,595 GCInspector.java (line 
 122) GC for Copy: 3927 ms for 1 collections, 156572576 used; max is 
 513802240
  INFO [ScheduledTasks:1] 2012-03-06 10:28:55,434 StatusLogger.java (line 
 50) Pool NameActive   Pending   Blocked
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,298 StatusLogger.java (line 
 65) ReadStage 2 2 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,499 StatusLogger.java (line 
 65) RequestResponseStage  0 0 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 
 65) ReadRepairStage   0 0 0
  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 
 65) MutationStage

Row iteration over indexed clause

2012-03-13 Thread Vivek Mishra
Hi,
Is it possible to iterate and fetch in chunks using the thrift API by querying
with secondary indexes?

-Vivek


Adding a new node to already existing single-node-cluster cassandra

2012-03-13 Thread Rishabh Agrawal
Hello,

I have been trying to add a node to a single-node cluster of Cassandra (1.0.8) 
but I always get the following error:

INFO 17:50:35,555 JOINING: schema complete, ready to bootstrap
INFO 17:50:35,556 JOINING: getting bootstrap token
ERROR 17:50:35,557 Exception encountered during startup
java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you 
intended to start a single-node cluster, you should make sure your 
broadcast_address (or listen_address) is listed as a seed.  Otherwise, you need 
to determine why the seed being contacted has no knowledge of the rest of the 
cluster.  Usually, this can be solved by giving all nodes the same seed list.
at 
org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
at 
org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
at 
org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you 
intended to start a single-node cluster, you should make sure your 
broadcast_address (or listen_address) is listed as a seed.  Otherwise, you need 
to determine why the seed being contacted has no knowledge of the rest of the 
cluster.  Usually, this can be solved by giving all nodes the same seed list.
at 
org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
at 
org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
at 
org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Exception encountered during startup: No other nodes seen!  Unable to 
bootstrap.If you intended to start a single-node cluster, you should make sure 
your broadcast_address (or listen_address) is listed as a seed.  Otherwise, you 
need to determine why the seed being contacted has no knowledge of the rest of 
the cluster.  Usually, this can be solved by giving all nodes the same seed 
list.
INFO 17:50:35,571 Waiting for messaging service to quiesce
INFO 17:50:35,571 MessagingService shutting down server thread.

Kindly help me asap.

Regards
Rishabh Agrawal





Re: Row iteration over indexed clause

2012-03-13 Thread Shimi Kiviti
Yes. Use get_indexed_slices (http://wiki.apache.org/cassandra/API)
On Tue, Mar 13, 2012 at 2:12 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is it possible to iterate and fetch in chunks using the thrift API by querying
 with secondary indexes?

 -Vivek



Re: Row iteration over indexed clause

2012-03-13 Thread Vivek Mishra
Thanks.

Attribute    Type                   Default  Required  Description
expressions  list<IndexExpression>  n/a      Y         The list of IndexExpression objects, which must
                                                       contain one EQ IndexOperator among the expressions
start_key    binary                 n/a      Y         Start the index query at the specified key - can
                                                       be set to '', i.e. an empty byte array, to start
                                                       with the first key
count        integer                100      Y         The number of results to which the index query
                                                       will be constrained




How do I iterate using it? How do I ensure that it does not return
previous results (without needing to keep something in memory)?

This is the method I am looking into:

get_indexed_slices(ColumnParent column_parent, IndexClause index_clause,
SlicePredicate column_predicate, ConsistencyLevel consistency_level)

It does not have anything like count.


Thanks,
Vivek
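
A sketch of the usual pagination idiom over get_indexed_slices, via Hector's
IndexedSlicesQuery (illustrative only, not from this thread: it assumes a
Keyspace object and a CF named "users" with an indexed "state" column; since
start_key is inclusive, each page after the first skips its first row):

  import java.util.List;

  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.beans.OrderedRows;
  import me.prettyprint.hector.api.beans.Row;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.query.IndexedSlicesQuery;

  public class IndexPaging {
      static void pageThrough(Keyspace keyspace) {
          StringSerializer se = StringSerializer.get();
          IndexedSlicesQuery<String, String, String> query =
                  HFactory.createIndexedSlicesQuery(keyspace, se, se, se);
          query.setColumnFamily("users");
          query.addEqualsExpression("state", "UT"); // one EQ expression required
          query.setRange(null, null, false, 100);   // columns per row

          int pageSize = 100;
          String startKey = "";                     // '' = start at the first key
          boolean firstPage = true;
          while (true) {
              query.setStartKey(startKey);          // maps to IndexClause start_key
              query.setRowCount(pageSize);          // maps to IndexClause count
              OrderedRows<String, String, String> rows = query.execute().get();
              List<Row<String, String, String>> page = rows.getList();
              // start_key is inclusive, so skip the row the last page ended on
              for (int i = firstPage ? 0 : 1; i < page.size(); i++) {
                  // ... process page.get(i) ...
              }
              if (page.size() < pageSize) {
                  break;                            // short page: no more rows
              }
              startKey = rows.peekLast().getKey();
              firstPage = false;
          }
      }
  }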

On Tue, Mar 13, 2012 at 6:24 PM, Shimi Kiviti shim...@gmail.com wrote:

 Yes. Use get_indexed_slices (http://wiki.apache.org/cassandra/API)
 On Tue, Mar 13, 2012 at 2:12 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is it possible to iterate and fetch in chunks using the thrift API by
 querying with secondary indexes?

 -Vivek





Re: OOM opening bloom filter

2012-03-13 Thread Mick Semb Wever


 How much smaller did the BF get to ? 

After pending compactions completed today (I'm presuming fp_ratio is now
applied to all sstables in the keyspace), it has gone from 20G+ down
to 1G. This node is now running comfortably on Xmx4G (used heap ~1.5G).


~mck


-- 
A Microsoft Certified System Engineer is to information technology as a
McDonalds Certified Food Specialist is to the culinary arts. Michael
Bacarella 

| http://github.com/finn-no | http://tech.finn.no |




Re: Can't bootstrap a new node to my cluster

2012-03-13 Thread aaron morton
Can you provide some context for the log files please. 

The original error had to do with bootstrapping a new node into a cluster. The 
log looks like a node is starting with -Dcassandra.join_ring=false and then 
nodetool join is run. 

Is there an error when this runs ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/03/2012, at 11:58 PM, Cyril Scetbon wrote:

 I don't know if it helps, but the only thing I see on the cluster's nodes is:
 
 == /var/log/cassandra/output.log ==
  INFO 10:57:28,530 InetAddress /10.0.1.70 is now dead.
 
 
 when I try to join the node 10.0.1.70 to the cluster
 
 On 3/12/12 11:27 AM, Cyril Scetbon wrote:
 
 It's done.
 
 Nothing new on stderr when I use the join command. I'm sending you the 
 logfiles from after I tried to add the node.
 
 Regards
 
 On 3/12/12 10:47 AM, aaron morton wrote:
 
 Modify this line in log4j-server.properties. It will normally be located 
 in /etc/cassandra: 
 
 https://github.com/apache/cassandra/blob/trunk/conf/log4j-server.properties#L21
 
 Change INFO to DEBUG 
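
 For example, the 1.0-era default line becomes:

   # conf/log4j-server.properties
   log4j.rootLogger=DEBUG,stdout,R   # was: log4j.rootLogger=INFO,stdout,R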
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 12/03/2012, at 10:12 PM, Cyril Scetbon wrote:
 
 On 3/12/12 9:50 AM, aaron morton wrote:
 
 It may be the case that the joining node does not have enough 
 information. But there is a default 30 second delay while the node waits 
 for the ring information to stabilise. 
 
 What version are you using ? 
 1.0.7
 
 Next time you add a new node, can you try it with logging set to DEBUG? 
 If you get the error please add it to 
 https://issues.apache.org/jira/browse/CASSANDRA with the relevant logs. 
 Where do I have to add it? I added it to cassandra-env.sh and got a 
 lot of things, but are you saying that I must add it to the join command? 
 If yes, how?
 
 After the join command fails, as you saw, I have the ring information after 
 that. I don't know if it took 30 seconds or not ...
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 
 -- 
 Cyril SCETBON
 
 
 
 -- 
 Cyril SCETBON
 
 
 -- 
 Cyril SCETBON



Re: how to increase compaction rate?

2012-03-13 Thread Thorsten von Eicken
On 3/12/2012 6:52 AM, Brandon Williams wrote:
 On Mon, Mar 12, 2012 at 4:44 AM, aaron morton aa...@thelastpickle.com wrote:
 I don't understand why I
 don't get multiple concurrent compactions running, that's what would
 make the biggest performance difference.

 concurrent_compactors
 Controls how many concurrent compactions to run, by default it's the number
 of cores on the machine.
I'm on a quad-core machine so not setting concurrent_compactors should
not be a limiting factor...
 With leveled compaction, I don't think you get any concurrency because
 it has to compact an entire level, and it can't proceed to the next
 level without completing the one before it.

 In short, if you want maximum throughput, stick with size tiered.
I switched the CFs to tiered compaction and I still get no concurrency
for the same CF. I now have two compactions running concurrently but
always for different CFs. I've briefly seen a third for one of the small
CFs, so it's willing to run more than two concurrently. Looks like I
have to wait for a few days for all the compactions to complete. Talk
about compaction hell!


 -Brandon



Re: Why is row lookup much faster than column lookup

2012-03-13 Thread Dave Brosius
Given the hashtable nature of cassandra, finding a row is probably 'relatively'
constant no matter how many columns you have. The smaller the number of columns,
I suppose the more likely that all the columns will be in one sstable. If
you've got a ton of columns per row, it is much more likely that these columns
will be spread out in multiple sstables. Plus, columns are read in chunks,
depending on yaml settings.

- Original Message - From: "A J" <s5a...@gmail.com>

Re: Why is row lookup much faster than column lookup

2012-03-13 Thread Dave Brosius
 Sorry, should have been: Given the hashtable nature of cassandra, finding a
row is probably 'relatively' constant no matter how many *rows* you have.

- Original Message - From: "Dave Brosius" <dbros...@mebigfatguy.com>

Question on ByteOrdered rebalancing

2012-03-13 Thread work late
The ring command on nodetool shows as

Address   DC           Rack   Status  State   Load     Owns     Token
                                                                Token(bytes[88401b216270ab8ebb690946b0b70eab])
10.1.1.1  datacenter1  rack1  Up      Normal  69.1 KB  50.00%   Token(bytes[4936c862b88db2bdd92d684583bf0280])
10.1.1.2  datacenter1  rack1  Up      Normal  69.1 KB  50.00%   Token(bytes[88401b216270ab8ebb690946b0b70eab])


The token looks like an MD5 value, is that correct? So when rebalancing the
cluster, what is the token value I am supposed to give the move command
(with RP it is a token between 0 and 2^127)? What should I use with BOP?


Thanks


Re: Question on ByteOrdered rebalancing

2012-03-13 Thread Tyler Hobbs
The tokens are hex encoded arrays of bytes.
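
For illustration, one rough way to pick a midpoint between two equal-length
BOP tokens is to treat the hex strings as big-endian integers (a sketch, not
an official tool; the zero-padding keeps byte-lexical order consistent with
numeric order):

  import java.math.BigInteger;

  public class BopMidpoint {
      public static void main(String[] args) {
          // The two tokens from the nodetool ring output above.
          String lo = "4936c862b88db2bdd92d684583bf0280";
          String hi = "88401b216270ab8ebb690946b0b70eab";
          BigInteger mid = new BigInteger(lo, 16)
                  .add(new BigInteger(hi, 16))
                  .shiftRight(1);                 // midpoint of the two values
          // Left-pad with zeros to the original length for nodetool move.
          System.out.println(String.format("%0" + lo.length() + "x", mid));
      }
  }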

On Tue, Mar 13, 2012 at 1:05 PM, work late worklate1...@gmail.com wrote:

 The ring command on nodetool shows as

 Address   DC           Rack   Status  State   Load     Owns     Token
                                                                 Token(bytes[88401b216270ab8ebb690946b0b70eab])
 10.1.1.1  datacenter1  rack1  Up      Normal  69.1 KB  50.00%   Token(bytes[4936c862b88db2bdd92d684583bf0280])
 10.1.1.2  datacenter1  rack1  Up      Normal  69.1 KB  50.00%   Token(bytes[88401b216270ab8ebb690946b0b70eab])


 The token looks like an MD5 value, is that correct? So when rebalancing the
 cluster, what is the token value I am supposed to give the move command
 (with RP it is a token between 0 and 2^127)? What should I use with BOP?


 Thanks




-- 
Tyler Hobbs
DataStax http://datastax.com/


Does the 'batch' order matter ?

2012-03-13 Thread A J
I know batch operations are not atomic, but does the success of a write
imply all writes preceding it in the batch were successful?

For example, using cql:
BEGIN BATCH USING CONSISTENCY QUORUM AND TTL 864
  INSERT INTO users (KEY, password, name) VALUES ('user2',
'ch@ngem3b', 'second user')
  UPDATE users SET password = 'ps22dhds' WHERE KEY = 'user2'
  INSERT INTO users (KEY, password) VALUES ('user3', 'ch@ngem3c')
  DELETE name FROM users WHERE key = 'user2'
  INSERT INTO users (KEY, password, name) VALUES ('user4',
'ch@ngem3c', 'Andrew')
APPLY BATCH;

Say the batch failed but I see that the third write was present on a
node. Does it imply that the first insert and the second update
definitely made to that node as well ?

Thanks.


Building a brand new cluster and readying it for production -- advice needed

2012-03-13 Thread Maxim Potekhin

Dear All,

after all the testing and continuous operation of my first cluster,
I've been given an OK to build a second production Cassandra cluster in
Europe.

There were posts in recent weeks regarding the most stable and solid
Cassandra version. I was wondering if anything better has appeared since
it was last discussed.

At this juncture, I don't need features, just rock solid stability. Are
0.8.* versions still acceptable, since I have experience with these, or
should I take the plunge to 1+?

I realize that I won't need more than 8GB RAM because I can't make the
Java heap too big. Is it still worth it to pay money for extra RAM? Is the
cache located outside of the heap in recent versions?


Thanks to all of you for the advice I'm receiving on this board.

Best regards

Maxim



Re: Adding a new node to already existing single-node-cluster cassandra

2012-03-13 Thread aaron morton
Sounds similar to 
http://www.mail-archive.com/user@cassandra.apache.org/msg20926.html

Are you able to try adding the node again with logging set to DEBUG (in 
/etc/cassandra/log4j-server.properties)? Please make sure the system 
directory is empty (/var/lib/cassandra/data/system). *NOTE*: do not clear this 
dir if the node has already joined.

It looks like the node has not detected the cluster yet for some reason. You 
can try passing the JVM option cassandra.ring_delay_ms (in cassandra-env.sh) 
to override the period it waits; the default is 30000 (30 secs). 
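
For example, in conf/cassandra-env.sh (60000 ms here is only an illustrative
value):

  JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=60000"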

Could you add a ticket here https://issues.apache.org/jira/browse/CASSANDRA as 
well. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 1:36 AM, Rishabh Agrawal wrote:

 Hello,
  
 I have been trying to add a node to a single-node cluster of Cassandra (1.0.8) 
 but I always get the following error:
  
 INFO 17:50:35,555 JOINING: schema complete, ready to bootstrap
 INFO 17:50:35,556 JOINING: getting bootstrap token
 ERROR 17:50:35,557 Exception encountered during startup
 java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you 
 intended to start a single-node cluster, you should make sure your 
 broadcast_address (or listen_address) is listed as a seed.  Otherwise, you 
 need to determine why the seed being contacted has no knowledge of the rest 
 of the cluster.  Usually, this can be solved by giving all nodes the same 
 seed list.
 at 
 org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
 at 
 org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
 at 
 org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
 at 
 org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
 at 
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
 at 
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
 java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you 
 intended to start a single-node cluster, you should make sure your 
 broadcast_address (or listen_address) is listed as a seed.  Otherwise, you 
 need to determine why the seed being contacted has no knowledge of the rest 
 of the cluster.  Usually, this can be solved by giving all nodes the same 
 seed list.
 at 
 org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
 at 
 org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
 at 
 org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
 at 
 org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
 at 
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
 at 
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
 Exception encountered during startup: No other nodes seen!  Unable to 
 bootstrap.If you intended to start a single-node cluster, you should make 
 sure your broadcast_address (or listen_address) is listed as a seed.  
 Otherwise, you need to determine why the seed being contacted has no 
 knowledge of the rest of the cluster.  Usually, this can be solved by giving 
 all nodes the same seed list.
 INFO 17:50:35,571 Waiting for messaging service to quiesce
 INFO 17:50:35,571 MessagingService shutting down server thread.
  
 Kindly help me asap.
  
 Regards
 Rishabh Agrawal
 
 



Re: how to increase compaction rate?

2012-03-13 Thread Viktor Jevdokimov
After losing one node we had to repair; the CFs were on leveled compaction.
For one CF each node had about 7GB of data.
Running a repair without the primary range switch left some nodes bloated
to about 60-100GB of 5MB sstables for that CF (a lot of files).
After switching back from leveled to tiered we ended up with completely
blocked compactions on all nodes, since this CF was compacting forever.
On one node a major compaction for that CF is CPU bound and may run with
unlimited compaction speed for 4-7 days at a maximum 1MB/s rate, finally
compacting to 3GB of data (some data is deleted by TTL, some merged).

What we did to speed up this process and return all exhausted nodes to a
normal state faster:
We created 6 temporary single-instance virtual Cassandra nodes with 2 CPU
cores and 8GB RAM.
We stopped compaction for the CF completely on a production node.
The leveled sstables from this production node were divided into 6 ranges and
copied onto the 6 temporary empty nodes.
On each node we ran a major compaction to compact just 1/6 of the data, about
10-14GB. It took 1-2 hours to compact them into 1GB of data.
Then all 6 sstables were copied onto one of the 6 nodes for a major compaction,
finally getting the expected 3GB sstable.
We then stopped the production node, deleted the files that had been copied,
returned the compacted sstable (it may need renaming), and the node was back
to normal.

Using separate nodes saved the production nodes from compacting the exhausted
CF forever and blocking compactions for other CFs. With 6 separate nodes we
compacted 2 production nodes a day, so maybe it took the same time, but the
production nodes were free for regular compactions of other CFs.

After getting back to normal, for our use case we stick to tiered compaction
with a major compaction nightly.
With our insertion/TTL deletion rates, leveled compaction is a nightmare,
even if the amount of data is not very huge, just a few GBs/node.

2012/3/13 Thorsten von Eicken t...@rightscale.com

 On 3/12/2012 6:52 AM, Brandon Williams wrote:
  On Mon, Mar 12, 2012 at 4:44 AM, aaron morton aa...@thelastpickle.com
 wrote:
  I don't understand why I
  don't get multiple concurrent compactions running, that's what would
  make the biggest performance difference.
 
  concurrent_compactors
  Controls how many concurrent compactions to run, by default it's the
 number
  of cores on the machine.
 I'm on a quad-core machine so not setting concurrent_compactors should
 not be a limiting factor...
  With leveled compaction, I don't think you get any concurrency because
  it has to compact an entire level, and it can't proceed to the next
  level without completing the one before it.
 
  In short, if you want maximum throughput, stick with size tiered.
 I switched the CFs to tiered compaction and I still get no concurrency
 for the same CF. I now have two compactions running concurrently but
 always for different CFs. I've briefly seen a third for one of the small
 CFs, so it's willing to run more than two concurrently. Looks like I
 have to wait for a few days for all the compactions to complete. Talk
 about compaction hell!

 
  -Brandon
 



Re: Composite keys and range queries

2012-03-13 Thread John Laban
Forwarding to the Cassandra mailing list as well, in case this is more of
an issue with how I'm using Cassandra.

Am I correct to assume that I can use range queries on composite row keys,
even when using a RandomPartitioner, if I make sure that the first part of
the composite key is fixed?

Any help would be appreciated,
John



On Tue, Mar 13, 2012 at 12:15 PM, John Laban j...@pagerduty.com wrote:

 Hi,

 I have a column family that uses a composite key:

 (ID, priority) - ...

 Where the ID is a UUID and the priority is an integer.

 I'm trying to perform a range query now:  I want all the rows where the ID
 matches some fixed UUID, but within a range of priorities.  This is
 supported even if I'm using a RandomPartitioner, right?  (Because the first
 key in the composite key is the partition key, and the second part of the
 composite key is automatically ordered?)

 So I perform a range slices query:

 val rangeQuery = HFactory.createRangeSlicesQuery(keyspace,
   new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)

 rangeQuery.setColumnFamily(RouteColumnFamilyName).
   setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ).
   setRange( null, null, false, Int.MaxValue )


 But I get this error:

 me.prettyprint.hector.api.exceptions.HInvalidRequestException: 
 InvalidRequestException(why:start key's md5 sorts after end key's md5.  this 
 is not allowed; you probably should not specify end key at all, under 
 RandomPartitioner)


 Shouldn't they have the same md5, since they have the same partition key?

 Am I using the wrong query here, or does Hector not support composite range
 queries, or am I making some mistake in how I think Cassandra's composite
 keys work?

 Thanks,
 John
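
One 1.0-era way to get an ordered priority range under RandomPartitioner is to
keep the UUID alone as the row key and put the priority in the column name,
then slice columns within that single row; a minimal Hector sketch (the CF and
all names are illustrative, not from John's message):

  import java.util.UUID;

  import me.prettyprint.cassandra.serializers.BytesArraySerializer;
  import me.prettyprint.cassandra.serializers.IntegerSerializer;
  import me.prettyprint.cassandra.serializers.UUIDSerializer;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.beans.ColumnSlice;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.query.SliceQuery;

  public class PrioritySlice {
      static ColumnSlice<Integer, byte[]> byPriority(
              Keyspace keyspace, UUID id, int priorityStart, int priorityEnd) {
          // Row key = the UUID (hashed by RandomPartitioner); column name =
          // priority. Columns within a row are always stored sorted, so this
          // slice is an ordered range scan over priorities.
          SliceQuery<UUID, Integer, byte[]> query = HFactory.createSliceQuery(
                  keyspace, UUIDSerializer.get(), IntegerSerializer.get(),
                  BytesArraySerializer.get());
          query.setColumnFamily("Routes");
          query.setKey(id);
          query.setRange(priorityStart, priorityEnd, false, Integer.MAX_VALUE);
          return query.execute().get();
      }
  }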




[Windows] How to configure simple authentication and authorization ?

2012-03-13 Thread Sabbiolina
Hi, I followed this:



To set up simple authentication and authorization
1. Edit cassandra.yaml, setting
org.apache.cassandra.auth.SimpleAuthenticator as the
authenticator value. The default value of AllowAllAuthenticator is
equivalent to no authentication.
2. Edit access.properties, adding entries for users and their permissions
to read and write to specified
keyspaces and column families. See access.properties below for details on
the correct format.
3. Make sure that users specified in access.properties have corresponding
entries in passwd.properties.
See passwd.properties below for details and examples.
4. After making the required configuration changes, you must specify the
properties files when starting Cassandra
with the flags -Dpasswd.properties and -Daccess.properties. For example:
cd $CASSANDRA_HOME
sh bin/cassandra -f -Dpasswd.properties=conf/passwd.properties
-Daccess.properties=conf/access.properties


I started the service with the additional parameters, but no result, no log,
nothing.

I use the DataStax 1.0.8 Community Edition on Win 7 64-bit.


Tnxs


snapshot files locked

2012-03-13 Thread Jim Newsham


Hi,

I'm using Cassandra 1.0.8 on Windows 7.  When I take a snapshot of the 
database, I find that I am unable to delete the snapshot directory 
(i.e., the dir named {datadir}\{keyspacename}\snapshots\{snapshottag}) 
while Cassandra is running: "The action can't be completed because the 
folder or a file in it is open in another program. Close the folder or 
file and try again."  If I terminate Cassandra, then I can delete the 
directory with no problem.  Is there a reason why Cassandra must hold 
onto these files?


Thanks,
Jim



high level of MemtablePostFlusher pending events

2012-03-13 Thread David Hawthorne
5 node cluster running 1.0.2, doing about 1300 reads and 1300 writes/sec into 3 
column families in the same keyspace.  2 client machines, doing about the same 
amount of reads/writes, but one has an average response time in the 4-40ms 
range and the other in the 200-800ms range.  Both running identical software, 
homebrew with hector-1.0-3 client.

Traffic was peaking out at 6k reads and 6k writes/sec, according to reporting 
from our software, and now it's topping out at 1300/sec each.  The cpus on the 
cassy boxes are bored.  None of the threads within cassandra are chewing more 
than 3% cpu.  Disk is only 10% full on the most loaded box.

MemtablePostFlusher   1   102 36

Not all servers have the same number of pending tasks.  They have 0, 1, 17, 37, 
and 105.

It looks like it's stuck and not recovering, cuz it's been like this for an 
hour.  I've attached the end of the cassandra.log from the server with the most 
pending tasks.  There are some interesting exceptions in there.

As always, all help is always appreciated!  :p




cassandra.log
Description: Binary data


Re: how to increase compaction rate?

2012-03-13 Thread Thorsten von Eicken
On 3/13/2012 4:13 PM, Viktor Jevdokimov wrote:
 What we did to speed up this process and return all exhausted nodes to a
 normal state faster:
 We created 6 temporary single-instance virtual Cassandra nodes with 2
 CPU cores and 8GB RAM.
 We stopped compaction for the CF completely on a production node.
 The leveled sstables from this production node were divided into 6 ranges
 and copied onto the 6 temporary empty nodes.
 On each node we ran a major compaction to compact just 1/6 of the data,
 about 10-14GB. It took 1-2 hours to compact them into 1GB of data.
 Then all 6 sstables were copied onto one of the 6 nodes for a major
 compaction, finally getting the expected 3GB sstable.
 We then stopped the production node, deleted the files that had been
 copied, returned the compacted sstable (it may need renaming), and the
 node was back to normal.

 Using separate nodes saved the production nodes from compacting the
 exhausted CF forever and blocking compactions for other CFs. With 6
 separate nodes we compacted 2 production nodes a day, so maybe it took
 the same time, but the production nodes were free for regular
 compactions of other CFs.
Yikes, that's quite the ordeal, but I totally get why you had to go
there. Cassandra seems to work well within some use-case bounds and
lacks the sophistication to handle others well. I've been wondering
about the way I use it, which is to hold the last N days of logs and
corresponding index. This means that every day I make a zillion inserts
and a corresponding zillion of deletes for the data inserted N days ago.
The way the compaction works this is horrible. The data is essentially
immutable until it's deleted, yet it's copied a whole bunch of times. In
addition, it takes forever for the deletion tombstones to meet the
original data in a compaction and actually compact it away. I've also
run into the zillions of files problem with level compaction you did. I
ended up with over 30k SSTables for ~1TB of data. At that point the
compaction just ceases to make progress. And starting cassandra takes
30 minutes just for it to open all the SSTables and when done 12GB of
memory are used. Better algorithms and some tools will be needed for all
this to just work. But then, we're also just at V1.0.8...
TvE


Re: how to increase compaction rate?

2012-03-13 Thread Edward Capriolo
On Tue, Mar 13, 2012 at 11:32 PM, Thorsten von Eicken
t...@rightscale.com wrote:
 On 3/13/2012 4:13 PM, Viktor Jevdokimov wrote:
  What we did to speed up this process and return all exhausted nodes to a
  normal state faster:
  We created 6 temporary single-instance virtual Cassandra nodes with 2
  CPU cores and 8GB RAM.
  We stopped compaction for the CF completely on a production node.
  The leveled sstables from this production node were divided into 6 ranges
  and copied onto the 6 temporary empty nodes.
  On each node we ran a major compaction to compact just 1/6 of the data,
  about 10-14GB. It took 1-2 hours to compact them into 1GB of data.
  Then all 6 sstables were copied onto one of the 6 nodes for a major
  compaction, finally getting the expected 3GB sstable.
  We then stopped the production node, deleted the files that had been
  copied, returned the compacted sstable (it may need renaming), and the
  node was back to normal.

  Using separate nodes saved the production nodes from compacting the
  exhausted CF forever and blocking compactions for other CFs. With 6
  separate nodes we compacted 2 production nodes a day, so maybe it took
  the same time, but the production nodes were free for regular
  compactions of other CFs.
 Yikes, that's quite the ordeal, but I totally get why you had to go
 there. Cassandra seems to work well within some use-case bounds and
 lacks the sophistication to handle others well. I've been wondering
 about the way I use it, which is to hold the last N days of logs and
 corresponding index. This means that every day I make a zillion inserts
 and a corresponding zillion of deletes for the data inserted N days ago.
 The way the compaction works this is horrible. The data is essentially
 immutable until it's deleted, yet it's copied a whole bunch of times. In
 addition, it takes forever for the deletion tombstones to meet the
 original data in a compaction and actually compact it away. I've also
 run into the zillions of files problem with level compaction you did. I
 ended up with over 30k SSTables for ~1TB of data. At that point the
 compaction just ceases to make progress. And starting cassandra takes
30 minutes just for it to open all the SSTables and when done 12GB of
 memory are used. Better algorithms and some tools will be needed for all
 this to just work. But then, we're also just at V1.0.8...
    TvE

You are correct to say that the way Cassandra works is not ideal for
a dataset where you completely delete and re-add the entire dataset
each day. In fact that may be one of the worst use cases for
Cassandra. This has to do with the structured log format and with the
tombstones and grace period. Maybe you can set a lower base.

Leveled compaction (LevelDB-style) is new and not as common in the wild
as size-tiered. Again, it works the way it works. Google must think it
is brilliant; after all, they invented it.

For 1TB of data your 12GB is used by bloom filters. Again this is
just a fact of life. Bloom filters are there to make negative lookups
faster. Maybe you can lower the bloom filter sizes and the index
interval. This should use less memory and help the system start up
faster, respectively.
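
A sketch of where those two knobs live (1.0-era; the values are illustrative,
and the CLI attribute name is an assumption based on Aaron's earlier list, so
verify it against your version):

  # cassandra.yaml: sample fewer index entries in memory (default 128)
  index_interval: 512

  # cassandra-cli: per-CF bloom filter false-positive chance (assumed syntax)
  update column family logs with bloom_filter_fp_chance = 0.1;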

But nodes stuffed with a trillion keys may not be optimal for many
reasons. In our case we want a high portion of the data set in memory,
so a 1TB node might need say 256 GB of RAM :) We opt for more, smaller
boxes.


Re: Building a brand new cluster and readying it for production -- advice needed

2012-03-13 Thread Edward Capriolo
Agreed, if you are using SSD you likely will not need as much RAM.
I said "you could always do better with more RAM", not "you should
definitely get more RAM" :)

On Tue, Mar 13, 2012 at 7:37 PM, Maxim Potekhin potek...@bnl.gov wrote:
 Thank you Edward.

 As can be expected, my data volume is a multiple of whatever RAM I can
 realistically buy, and in fact much bigger. In my very limited experience,
 the money might be well spent on multicore CPUs, because they make routine
 operations like compact/repair (which always include writes) so much faster,
 hence reducing the periods of high occupancy. I'm trying to scope out how
 much SSD I will need, because it appears to be an economical solution to
 problems I have previously had.

 Regards,
 Maxim


 On 3/13/2012 10:40 PM, Edward Capriolo wrote:

 I am on 1.0.7; I would suggest that. The memtable and JAMM stuff is very
 stable. I would not set up 0.8.X because, with 1.1 coming soon, 0.8.X is
 not likely to see too many more minor releases. You can always do
 better with more RAM, up to the size of your data; having more RAM than
 data size will not help noticeably. The off-heap row cache can use
 this, and the OS can cache disk blocks with it.

 Edward

 On Tue, Mar 13, 2012 at 3:15 PM, Maxim Potekhin potek...@bnl.gov wrote:

 Dear All,

 after all the testing and continuous operation of my first cluster,
 I've been given an OK to build a second production Cassandra cluster in
 Europe.

 There were posts in recent weeks regarding the most stable and solid
 Cassandra version.
 I was wondering if anything better has appeared since it was last
 discussed.

 At this juncture, I don't need features, just rock solid stability. Are
 0.8.* versions still acceptable,
 since I have experience with these, or should I take the plunge to 1+?

 I realize that I won't need more than 8GB RAM because I can't make the Java
 heap too big. Is it still worth it to pay money for extra RAM? Is the cache
 located outside of the heap in recent versions?

 Thanks to all of you for the advice I'm receiving on this board.

 Best regards

 Maxim