Re: data model question
Thanks! Better than mine, as it considers later additions of services! Will update my code. Thanks,
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736  Mob: +972 54 8356490  Fax: +972 2 5612956

On Mon, Mar 12, 2012 at 11:13 AM, aaron morton <aa...@thelastpickle.com> wrote:

In this case, where you know the query upfront, I add a custom secondary index using another CF to support the query. It's a little easier here because the data won't change.

UserLookupCF (using composite types for the key value):
  row_key: system_name:id, e.g. facebook:12345 or twitter:12345
  col_name: internal_user_id, e.g. 5678
  col_value: empty

Hope that helps.
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/03/2012, at 11:15 PM, Tamar Fraenkel wrote:

Hi! Thanks for the response. From what I read, secondary indices are good only for columns with few possible values. Is this a good fit for my case? I have a unique Facebook id for every user. Thanks,
Tamar

On Sun, Mar 11, 2012 at 11:48 AM, Marcel Steinbach <mstei...@gmail.com> wrote:

Either you do that, or you could think about using a secondary index on the fb user name in your primary cf. See http://www.datastax.com/docs/1.0/ddl/indexes
Cheers

On 11.03.2012 at 09:51, Tamar Fraenkel <ta...@tok-media.com> wrote:

Hi! I need some advice. I have a user CF whose key is a UUID, my internal user id. One of the columns is the facebook_id of the user (if it exists). I need the reverse mapping from facebook_id to my UUID. My intention is to add a CF for the mapping from Facebook id to my id:

user_by_fbid = { // key is fb id, column name is our user id, value is empty
    13101876963: { f94f6b20-161a-4f7e-995f-0466c62a1b6b : }
}

Does this make sense?
This CF will be used whenever a user logs in through Facebook, to retrieve the internal id. Thanks,
Tamar
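The lookup pattern described above can be sketched in plain Python, modeling both column families as in-memory dicts (no Cassandra client involved; the function names are illustrative, not part of any API):

```python
import uuid

# In-memory model of the two column families discussed in this thread.
# The lookup CF keys rows by a composite "system_name:id" string, so
# more services (twitter, etc.) can be added later without a new CF.
users = {}        # internal UUID (str) -> user columns
user_lookup = {}  # "facebook:12345" -> internal UUID (str)

def register_user(system_name, external_id, columns):
    """Insert a user row plus its reverse-lookup row."""
    internal_id = str(uuid.uuid4())
    users[internal_id] = dict(columns, **{"%s_id" % system_name: external_id})
    # row key is the composite "system:id"; the internal id is the column name
    user_lookup["%s:%s" % (system_name, external_id)] = internal_id
    return internal_id

def lookup_user(system_name, external_id):
    """Resolve an external id (e.g. a Facebook id) to the internal UUID."""
    return user_lookup.get("%s:%s" % (system_name, external_id))
```

On login the application does one read against the lookup structure, then fetches the user row by internal id.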
Re: running two rings on the same subnet
Done it. Now it generally runs ok, till one of the nodes gets stuck at 100% CPU and I need to reboot it. The last lines in system.log just before are:

INFO [OptionalTasks:1] 2012-03-13 07:36:43,850 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='tok', ColumnFamily='tk_vertical_tag_story_indx') (estimated 35417890 bytes)
INFO [OptionalTasks:1] 2012-03-13 07:36:43,869 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-tk_vertical_tag_story_indx@2002820169(1620316/35417890 serialized/live bytes, 30572 ops)
INFO [FlushWriter:76] 2012-03-13 07:36:43,869 Memtable.java (line 246) Writing Memtable-tk_vertical_tag_story_indx@2002820169(1620316/35417890 serialized/live bytes, 30572 ops)
INFO [FlushWriter:76] 2012-03-13 07:36:44,015 Memtable.java (line 283) Completed flushing /opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-191-Data.db (2134123 bytes)
INFO [OptionalTasks:1] 2012-03-13 07:37:37,886 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='tok', ColumnFamily='tk_vertical_tag_story_indx') (estimated 34389135 bytes)
INFO [OptionalTasks:1] 2012-03-13 07:37:37,887 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-tk_vertical_tag_story_indx@1869953681(1573252/34389135 serialized/live bytes, 29684 ops)
INFO [FlushWriter:76] 2012-03-13 07:37:37,887 Memtable.java (line 246) Writing Memtable-tk_vertical_tag_story_indx@1869953681(1573252/34389135 serialized/live bytes, 29684 ops)
INFO [FlushWrit

Any idea? I am considering adding a third node, so that with a replication factor of 2 my system won't stall when one node goes down. Does it make sense?
Thanks,
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736  Mob: +972 54 8356490  Fax: +972 2 5612956

On Tue, Mar 6, 2012 at 7:51 PM, aaron morton <aa...@thelastpickle.com> wrote:

Reduce these settings for the CF:
  row_cache (disable it)
  key_cache (disable it)
Increase these settings for the CF:
  bloom_filter_fp_chance
Reduce these settings in cassandra.yaml:
  flush_largest_memtables_at
  memtable_flush_queue_size
  sliced_buffer_size_in_kb
  in_memory_compaction_limit_in_mb
  concurrent_compactors
Increase these settings:
  index_interval

While it obviously depends on load, I would not be surprised if you had a lot of trouble running cassandra with that setup.
Cheers
A
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 11:02 PM, Tamar Fraenkel wrote:

Aaron, thanks for your response. I was afraid this is the issue. Can you give me some direction regarding the fine tuning of my VMs? I would like to explore that option some more. Thanks!
Tamar

On Tue, Mar 6, 2012 at 11:58 AM, aaron morton <aa...@thelastpickle.com> wrote:

You do not have enough memory allocated to the JVM and are suffering from excessive GC as a result. There are some tuning things you can try, but 480MB is not enough. 1GB would be a better start, 2 better than that. Consider using https://github.com/pcmanus/ccm for testing multiple instances on a single server rather than a VM.
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 10:21 PM, Tamar Fraenkel wrote:

I have some more info: after a couple of hours running, the problematic node again hit 100% CPU and I had to reboot it. The last lines from the log show it did GC:

INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line 122) GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240
INFO [ScheduledTasks:1] 2012-03-06 10:28:50,595 GCInspector.java (line 122) GC for Copy: 3927 ms for 1 collections, 156572576 used; max is 513802240
INFO [ScheduledTasks:1] 2012-03-06 10:28:55,434 StatusLogger.java (line 50) Pool Name                 Active   Pending   Blocked
INFO [ScheduledTasks:1] 2012-03-06 10:29:03,298 StatusLogger.java (line 65) ReadStage                      2         2         0
INFO [ScheduledTasks:1] 2012-03-06 10:29:03,499 StatusLogger.java (line 65) RequestResponseStage           0         0         0
INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) ReadRepairStage                0         0         0
INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) MutationStage                  0         0         0
INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) ReplicateOnWriteStage          0         0         0
INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) GossipStage                    0         0         0
INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65)
Re: Adding node to Cassandra
> 2. Move node 'D' initial token down from 150... to 130... Here we ran into a problem. When the move started, disk usage for node C grew from 400 to 750GB; we saw running compactions on node 'D', but some compactions failed with

Did you run out of space on C or D?

> We expected a decrease of used disk space on node 'D' because we shrank the token range for this node, but saw the opposite. Why did it happen, and is it normal behavior?

Remember that node D is also holding replicas of the token ranges assigned to nodes B and C.

At first glance it sounds unusual, but it's hard to tell without knowing more about what happened. How long did it take to build up? What sort of load was the system under? What was in the data directory? Were there -tmp files in there, or lots of small files? What did nodetool compactionstats say, was compaction keeping up?

Moving forward, *if* you see a lot of old files in the data dir you may benefit from running a manual compaction, as it may reduce the amount of data transferred. There are some downsides to this. Check the DataStax site, or ask if you do not know what they are.

Hope that helps
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/03/2012, at 3:38 AM, Rustam Aliyev wrote:

It's hard to answer this question because there is a whole bunch of operations which may cause disk usage growth (repair, compaction, move, etc.). Any combination of these operations will only make things worse. But let's assume that in your case the only operation increasing disk usage was move. Simply speaking, move does not move data from one node to another, it just copies it. Once the data is copied, you need to clean up the data which the node is no longer responsible for, using the cleanup command.

If you can't increase storage, maybe you can try moving nodes slowly. I.e., instead of moving node D from 150... to 130..., try going first to 140..., cleanup, and then from 140... to 130... However, I never tried this and can't guarantee that it will use less disk space.
In the past, someone reported a 2.5x increase when they went from 4 nodes to 5.
-- Rustam.

On 12/03/2012 12:46, Vanger wrote:

Cassandra v1.0.8. Once again: 4-node cluster, RF = 3.

On 12.03.2012 16:18, Rustam Aliyev wrote:

What version of Cassandra do you have?

On 12/03/2012 11:38, Vanger wrote:

We were aware of the compaction overhead, but still don't understand why that should happen: node 'D' was in a stable condition, had been working for at least a month, had all the data for its token range, and was comfortable with that disk space. Why does the node suddenly need 2x more space for data it already has? Why does decreasing the token range not lead to decreased disk usage?

On 12.03.2012 15:14, Rustam Aliyev wrote:

Hi, if you use SizeTieredCompactionStrategy, you should have 2x disk space to be on the safe side. So if you want to store 2TB of data, you need a partition size of 4TB at least. LeveledCompactionStrategy is available in 1.x and is supposed to require less free disk space (but comes at the price of I/O).
-- Rustam.

On 12/03/2012 09:23, Vanger wrote:

We have a 4-node Cassandra cluster with RF = 3 (nodes named from 'A' to 'D'), initial tokens:
  A (25%): 20543402371996174596346065790779111550
  B (25%): 63454860067234500516210522518260948578
  C (25%): 106715317233367107622067286720208938865
  D (25%): 150141183460469231731687303715884105728
We want to add a 5th node ('E') with initial token = 164163260474281062972548100673162157075, then rebalance nodes A, D, E so that they'll own equal percentages of data. All nodes have ~400GB of data and around ~300GB free disk space. What we did:
1. 'Join' the new cassandra instance (node 'E') to the cluster and wait till it loads the data for its token range.
2. Move node 'D' initial token down from 150... to 130... Here we ran into a problem.
When the move started, disk usage for node C grew from 400 to 750GB. We saw running compactions on node 'D', but some compactions failed with:

WARN [CompactionExecutor:580] 2012-03-11 16:57:56,036 CompactionTask.java (line 87) insufficient space to compact all requested files SSTableReader

After that we killed the move process to avoid an out-of-disk-space error (when 5GB of free space was left). After restart it freed 100GB of space, and now we have a total of 105GB free disk space on node 'D'. We also noticed disk usage increased by ~150GB at node 'B', but it stopped growing before we stopped the token move. So now we have 5 nodes in the cluster in a status like this:

Node  Owns%  Load   Init. token
A     16%    400GB  020...
B     25%    520GB  063...
C     25%    400GB  106...
D     25%    640GB  150...
E     9%     300GB  164...

We'll add disk space for all nodes and run some cleanups, but there are still some questions left: What is
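For the rebalancing step, evenly spaced RandomPartitioner tokens follow the standard formula token_i = i * 2^127 / N. A quick sketch (the function name is illustrative, not a Cassandra tool):

```python
def balanced_tokens(node_count, ring_size=2**127):
    """Evenly spaced initial tokens for RandomPartitioner,
    whose ring spans 0 .. 2**127."""
    return [i * ring_size // node_count for i in range(node_count)]

# For a 5-node cluster, each node should own exactly 20% of the ring.
five = balanced_tokens(5)
```

Each computed value would be handed to `nodetool move` (followed by `nodetool cleanup` on the affected nodes).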
Re: OOM opening bloom filter
Thanks for the update. How much smaller did the BF get to?
A
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/03/2012, at 8:24 AM, Mick Semb Wever wrote:

> It's my understanding then for this use case that bloom filters are of little importance and that i can

Ok. To summarise our actions to get us out of this situation, in the hope that it may help others one day, we did the following:
1) upgrade to 1.0.7
2) set fp_ratio=0.99
3) set index_interval=4096
4) restart the node with Xmx30G
5) run `nodetool scrub` and monitor the total size of bf files using `du -hc *-Filter.db | grep total`
6) restart the node with the original Xmx setting once the total bf size is under control (scrub was running for 12hrs; the remaining bloom filters can be rebuilt later by normal compaction)

Hopefully it will also eventuate that this cluster can run with a more normal Xmx4G rather than the previous Xmx12G.

(2) and (3) are very much dependent on our setup using hadoop, where all reads are get_range_slice with 16k rows per request. Both could be tuned correctly, but they're the numbers that worked first up.

~mck
--
When there is no enemy within, the enemies outside can't hurt you. African proverb | http://github.com/finn-no | http://tech.finn.no |
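The dramatic shrink from fp_ratio=0.99 follows from standard Bloom filter sizing, m/n = -ln(p) / (ln 2)^2 bits per key. This is the general formula, not Cassandra-specific code:

```python
import math

def bloom_bits_per_key(fp_chance):
    """Bits per key for an optimally sized Bloom filter with the given
    false-positive chance: m/n = -ln(p) / (ln 2)**2."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

# A 1% filter costs ~9.6 bits per key; at p=0.99 the filter all but
# vanishes, which is why the on-disk *-Filter.db files shrank so much.
```

The cost is that nearly every negative lookup now goes to disk, which is acceptable here because the workload is range scans rather than point reads.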
Re: running two rings on the same subnet
If you are on Ubuntu it may be this: http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs
Otherwise I would look for GC problems.
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/03/2012, at 7:53 PM, Tamar Fraenkel wrote:

Done it. Now it generally runs ok, till one of the nodes gets stuck with 100% cpu and I need to reboot it. Last lines in the system.log just before are: [...] Any idea? I am considering adding a third node, so that replication factor of 2 won't stall my system when one node goes down. Does it make sense?
Thanks,
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
[...]
Row iteration over indexed clause
Hi, is it possible to iterate and fetch in chunks using the thrift API when querying with secondary indexes?
-Vivek
Adding a new node to already existing single-node-cluster cassandra
Hello, I have been trying to add a node to a single-node cluster of Cassandra (1.0.8) but I always get the following error:

INFO 17:50:35,555 JOINING: schema complete, ready to bootstrap
INFO 17:50:35,556 JOINING: getting bootstrap token
ERROR 17:50:35,557 Exception encountered during startup
java.lang.RuntimeException: No other nodes seen! Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.
        at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
        at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
        at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
java.lang.RuntimeException: No other nodes seen! Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.
        at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
        at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
        at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Exception encountered during startup: No other nodes seen! Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.
INFO 17:50:35,571 Waiting for messaging service to quiesce
INFO 17:50:35,571 MessagingService shutting down server thread.

Kindly help me asap.
Regards,
Rishabh Agrawal

Impetus to sponsor and exhibit at Structure Data 2012, NY; Mar 21-22. Know more about our Big Data quick-start program at the event. New Impetus webcast 'Cloud-enabled Performance Testing vis-à-vis On-premise' available at http://bit.ly/z6zT4L. NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error.
Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
Re: Row iteration over indexed clause
Yes. Use get_indexed_slices (http://wiki.apache.org/cassandra/API)

On Tue, Mar 13, 2012 at 2:12 PM, Vivek Mishra <mishra.v...@gmail.com> wrote:

Hi, is it possible to iterate and fetch in chunks using the thrift API when querying with secondary indexes?
-Vivek
Re: Row iteration over indexed clause
Thanks. The IndexClause attributes are:

Attribute    Type                   Default  Required  Description
expressions  list<IndexExpression>  n/a      Y         The list of IndexExpression objects, which must contain one EQ IndexOperator among the expressions
start_key    binary                 n/a      Y         Start the index query at the specified key; can be set to '' (an empty byte array) to start with the first key
count        integer                100      Y         The number of results to which the index query will be constrained

How do I iterate using it? How do I ensure that it does not return me previous results (without needing to keep something in memory)? This is the method I am looking into:

get_indexed_slices(ColumnParent column_parent, IndexClause index_clause, SlicePredicate column_predicate, ConsistencyLevel consistency_level)

It does not have anything like count.
Thanks,
Vivek

On Tue, Mar 13, 2012 at 6:24 PM, Shimi Kiviti <shim...@gmail.com> wrote:

Yes. Use get_indexed_slices (http://wiki.apache.org/cassandra/API)

On Tue, Mar 13, 2012 at 2:12 PM, Vivek Mishra <mishra.v...@gmail.com> wrote:

Hi, is it possible to iterate and fetch in chunks using the thrift API when querying with secondary indexes?
-Vivek
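Since start_key is inclusive, the usual paging pattern is: issue a query, remember the last key of the page, pass it as start_key of the next query, and drop the first row of every page after the first. A sketch of that loop in Python, with `fetch_page` standing in for the real get_indexed_slices call (all names here are illustrative helpers, not thrift API):

```python
def page_indexed_slices(fetch_page, page_size=100):
    """Page through an index query. `fetch_page(start_key, count)` must
    return at most `count` (key, columns) rows starting at start_key,
    inclusive, in key order."""
    start_key = ""
    seen_any = False
    while True:
        rows = fetch_page(start_key, page_size)
        if seen_any and rows:
            rows = rows[1:]  # start_key is inclusive: skip the repeated row
        if not rows:
            return
        for key, columns in rows:
            yield key, columns
        start_key = rows[-1][0]  # resume from the last key we saw
        seen_any = True

# Demo against an in-memory "index": 7 rows with ordered keys.
_rows = [("key%02d" % i, {"v": i}) for i in range(7)]

def _fake_fetch(start_key, count):
    matching = [r for r in _rows if r[0] >= start_key]
    return matching[:count]

keys = [k for k, _ in page_indexed_slices(_fake_fetch, page_size=3)]
```

With the real API, the same loop would set index_clause.start_key and index_clause.count before each get_indexed_slices call.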
Re: OOM opening bloom filter
> How much smaller did the BF get to?

After pending compactions completed today (I'm presuming fp_ratio is now applied to all sstables in the keyspace), it has gone from 20G+ down to 1G. This node is now running comfortably on Xmx4G (used heap ~1.5G).

~mck
--
A Microsoft Certified System Engineer is to information technology as a McDonalds Certified Food Specialist is to the culinary arts. Michael Bacarella | http://github.com/finn-no | http://tech.finn.no |
Re: CAn't bootstrap a new node to my cluster
Can you provide some context for the log files please? The original error had to do with bootstrapping a new node into a cluster. The log looks like a node is starting with -Dcassandra.join_ring=false and then nodetool join is run. Is there an error when this runs?
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/03/2012, at 11:58 PM, Cyril Scetbon wrote:

I don't know if it helps, but the only thing I see on the cluster's nodes is:

==> /var/log/cassandra/output.log <==
 INFO 10:57:28,530 InetAddress /10.0.1.70 is now dead.

when I try to join the node 10.0.1.70 to the cluster.

On 3/12/12 11:27 AM, Cyril Scetbon wrote:

It's done. Nothing new on stderr when I use the join command. I'll send you the log files after I've tried to add the node.
Regards

On 3/12/12 10:47 AM, aaron morton wrote:

Modify this line in log4j-server.properties (it will normally be located in /etc/cassandra):
https://github.com/apache/cassandra/blob/trunk/conf/log4j-server.properties#L21
Change INFO to DEBUG.
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/03/2012, at 10:12 PM, Cyril Scetbon wrote:

On 3/12/12 9:50 AM, aaron morton wrote:

> It may be the case that the joining node does not have enough information. But there is a default 30 second delay while the node waits for the ring information to stabilise. What version are you using?

1.0.7

> Next time you add a new node, can you try it with logging set to DEBUG. If you get the error please add it to https://issues.apache.org/jira/browse/CASSANDRA with the relevant logs.

Where do I have to add it? I added it to cassandra-env.sh and got a lot of things, but are you saying that I must add it to the join command? If yes, how? After the join command fails, as you saw, I have the ring information after that. I don't know if it took 30 seconds or not ...
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

--
Cyril SCETBON
Re: how to increase compaction rate?
On 3/12/2012 6:52 AM, Brandon Williams wrote:

> On Mon, Mar 12, 2012 at 4:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>> I don't understand why I don't get multiple concurrent compactions running; that's what would make the biggest performance difference.
>> concurrent_compactors controls how many concurrent compactions to run; by default it's the number of cores on the machine.
> I'm on a quad-core machine, so not setting concurrent_compactors should not be a limiting factor...

With leveled compaction, I don't think you get any concurrency because it has to compact an entire level, and it can't proceed to the next level without completing the one before it. In short, if you want maximum throughput, stick with size tiered.
-Brandon

I switched the CFs to tiered compaction and I still get no concurrency for the same CF. I now have two compactions running concurrently, but always for different CFs. I've briefly seen a third for one of the small CFs, so it's willing to run more than two concurrently. Looks like I have to wait a few days for all the compactions to complete. Talk about compaction hell!
Re: Why is row lookup much faster than column lookup
Given the hashtable nature of cassandra, finding a row is probably 'relatively' constant no matter how many columns you have. The smaller the number of columns, I suppose, the more likely that all the columns will be in one sstable. If you've got a ton of columns per row, it is much more likely that these columns will be spread out across multiple sstables. Plus, columns are read in chunks, depending on yaml settings.

----- Original Message -----
From: "A J" <s5a...@gmail.com>
Re: Why is row lookup much faster than column lookup
Sorry, should have been: Given the hashtable nature of cassandra, finding a row is probably 'relatively' constant no matter how many *rows* you have.

----- Original Message -----
From: "Dave Brosius" <dbros...@mebigfatguy.com>
Question on ByteOrdered rebalancing
The ring command on nodetool shows:

Address   DC           Rack   Status  State   Load     Owns    Token
                                                               Token(bytes[88401b216270ab8ebb690946b0b70eab])
10.1.1.1  datacenter1  rack1  Up      Normal  69.1 KB  50.00%  Token(bytes[4936c862b88db2bdd92d684583bf0280])
10.1.1.2  datacenter1  rack1  Up      Normal  69.1 KB  50.00%  Token(bytes[88401b216270ab8ebb690946b0b70eab])

The token looks like an MD5 value, is that correct? So when rebalancing the cluster, what is the token value I am supposed to give the move command? With RP it is a token between 0 and 2^127; what should I use for BOP?
Thanks
Re: Question on ByteOrdered rebalancing
The tokens are hex-encoded arrays of bytes.

On Tue, Mar 13, 2012 at 1:05 PM, work late <worklate1...@gmail.com> wrote:

The ring command on nodetool shows as [...] The token looks like an MD5 value, is that correct? So when rebalancing the cluster, what is the token value I am supposed to give the move command (with RP it is a token between 0 and 2^127)? What should I use for BOP?
Thanks

--
Tyler Hobbs
DataStax
http://datastax.com/
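To pick a balanced move target for a ByteOrdered ring, one option is to treat the hex-encoded tokens as big-endian integers and take their midpoint. A sketch of that arithmetic only (the function name is illustrative; it assumes both tokens have the same width, as nodetool prints them):

```python
def midpoint_token(lo_hex, hi_hex):
    """Midpoint between two hex-encoded BOP tokens, computed by treating
    the byte arrays as big-endian integers and padding the result back
    to the wider of the two inputs (so lexicographic order is preserved)."""
    width = max(len(lo_hex), len(hi_hex))
    lo, hi = int(lo_hex, 16), int(hi_hex, 16)
    return "%0*x" % (width, (lo + hi) // 2)
```

The resulting hex string is what would be handed to `nodetool move` on a BOP cluster.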
Does the 'batch' order matter ?
I know batch operations are not atomic, but does the success of a write imply that all writes preceding it in the batch were successful? For example, using CQL:

BEGIN BATCH USING CONSISTENCY QUORUM AND TTL 864
  INSERT INTO users (KEY, password, name) VALUES ('user2', 'ch@ngem3b', 'second user')
  UPDATE users SET password = 'ps22dhds' WHERE KEY = 'user2'
  INSERT INTO users (KEY, password) VALUES ('user3', 'ch@ngem3c')
  DELETE name FROM users WHERE key = 'user2'
  INSERT INTO users (KEY, password, name) VALUES ('user4', 'ch@ngem3c', 'Andrew')
APPLY BATCH;

Say the batch failed, but I see that the third write was present on a node. Does it imply that the first insert and the second update definitely made it to that node as well?
Thanks.
Building a brand new cluster and readying it for production -- advice needed
Dear All, after all the testing and continuous operation of my first cluster, I've been given an OK to build a second production Cassandra cluster in Europe. There were posts in recent weeks regarding the most stable and solid Cassandra version, and I was wondering if anything better has appeared since it was last discussed. At this juncture I don't need features, just rock-solid stability. Are 0.8.* versions still acceptable, since I have experience with these, or should I take the plunge to 1.x? I realize that I won't need more than 8GB RAM because I can't make the Java heap too big. Is it still worth paying for extra RAM? Is the cache located outside the heap in recent versions? Thanks to all of you for the advice I'm receiving on this board.
Best regards,
Maxim
Re: Adding a new node to already existing single-node-cluster cassandra
Sounds similar to http://www.mail-archive.com/user@cassandra.apache.org/msg20926.html

Are you able to try adding the node again with logging set to DEBUG (in /etc/cassandra/log4j-server.properties)? (Please make sure the system directory (/var/lib/cassandra/data/system) is empty. *NOTE* do not clear this dir if the node has already joined.)

It looks like the node has not detected the cluster yet for some reason. You can try passing the JVM option cassandra.ring_delay_ms (in cassandra-env.sh) to override the period it waits; the default is 30000 (30 secs). Could you add a ticket here https://issues.apache.org/jira/browse/CASSANDRA as well.
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 1:36 AM, Rishabh Agrawal wrote:

Hello, I have been trying to add a node to a single-node cluster of Cassandra (1.0.8) but I always get the following error:

INFO 17:50:35,555 JOINING: schema complete, ready to bootstrap
INFO 17:50:35,556 JOINING: getting bootstrap token
ERROR 17:50:35,557 Exception encountered during startup
java.lang.RuntimeException: No other nodes seen! Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.
at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168) at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150) at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145) at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:484) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:395) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107) java.lang.RuntimeException: No other nodes seen! Unable to bootstrap.If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list. at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168) at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150) at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145) at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:484) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:395) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107) Exception encountered during startup: No other nodes seen! 
Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.
INFO 17:50:35,571 Waiting for messaging service to quiesce
INFO 17:50:35,571 MessagingService shutting down server thread.

Kindly help me ASAP.
Regards
Rishabh Agrawal
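The fix for this error is usually the one the message itself suggests: give every node the same seed list, and have a genuine single-node cluster list its own address as a seed. A minimal cassandra.yaml sketch, using hypothetical addresses:

```yaml
# cassandra.yaml fragment (addresses are made-up examples)
# Put the SAME seed list on both the existing node and the joining node:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.1.10"

# On the joining (second) node only:
listen_address: 192.168.1.11
rpc_address: 192.168.1.11
```

The joining node must be able to gossip with the seed before the ring delay expires, so also check firewalls on the storage port between the two machines.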
Re: how to increase compaction rate?
After losing one node we had to repair; the CFs were on leveled compaction. For one CF, each node had about 7GB of data. Running a repair without the primary range switch left some nodes exhausted, with about 60-100GB of 5MB sstables for that CF (a lot of files). After switching back from leveled to tiered compaction, we ended up with completely blocked compactions on all nodes, since this CF was compacting forever. On one node, a major compaction for that CF is CPU bound and may run with unlimited compaction speed for 4-7 days at a maximum 1MB/s rate, finally compacting down to 3GB of data (some data is deleted by TTL, some merged).

What we did to speed up this process and return all exhausted nodes to a normal state faster: we created 6 temporary virtual single-node Cassandra instances with 2 CPU cores and 8GB RAM each. We completely stopped compaction for the CF on a production node. The leveled sstables from this production node were divided into 6 ranges and copied onto the 6 temporary empty nodes. On each node we ran a major compaction to compact just 1/6 of the data, about 10-14GB. It took 1-2 hours to compact them into 1GB of data. Then all 6 sstables were copied onto one of the 6 nodes for a final major compaction, producing the expected 3GB sstable. We then stopped the production node, deleted the files that had been copied out, returned the compacted sstable (which may need renaming), and the node was back to normal.

By using separate nodes, we saved the production nodes from compacting the exhausted CF forever and blocking compactions for other CFs. With 6 separate nodes we compacted 2 production nodes a day, so it may have taken the same wall-clock time, but the production nodes were free for regular compactions of other CFs.

After getting back to normal, for our use case we stick to tiered compaction with a nightly major compaction. With our insertion/TTL-deletion rates, leveled compaction is a nightmare, even though the amount of data is not very large, just a few GBs per node.
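The divide-into-ranges step above can be sketched abstractly: given a pile of sstables, split them into n groups of roughly equal total size so each temporary node gets a similar amount of work. A toy illustration (greedy bin packing, my own simplification, not any Cassandra tool):

```python
def split_into_ranges(sstables, n):
    """Greedily split (name, size) pairs into n groups of roughly
    equal total size, placing the largest files first."""
    groups = [[] for _ in range(n)]
    totals = [0] * n
    for name, size in sorted(sstables, key=lambda f: -f[1]):
        i = totals.index(min(totals))  # put the next file in the lightest group
        groups[i].append(name)
        totals[i] += size
    return groups

# Hypothetical sstable names and sizes in GB:
files = [("a-Data.db", 14), ("b-Data.db", 10), ("c-Data.db", 12),
         ("d-Data.db", 11), ("e-Data.db", 13), ("f-Data.db", 12)]
print(split_into_ranges(files, 6))  # with 6 groups and 6 files, one file per node
```

In practice the split would be over many thousands of 5MB files, but the goal is the same: balance the total bytes each temporary node has to compact.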
2012/3/13 Thorsten von Eicken t...@rightscale.com

On 3/12/2012 6:52 AM, Brandon Williams wrote:
On Mon, Mar 12, 2012 at 4:44 AM, aaron morton aa...@thelastpickle.com wrote:

I don't understand why I don't get multiple concurrent compactions running; that's what would make the biggest performance difference.

concurrent_compactors controls how many concurrent compactions to run; by default it's the number of cores on the machine.

I'm on a quad-core machine, so not setting concurrent_compactors should not be a limiting factor...

With leveled compaction, I don't think you get any concurrency because it has to compact an entire level, and it can't proceed to the next level without completing the one before it. In short, if you want maximum throughput, stick with size-tiered.

I switched the CFs to tiered compaction and I still get no concurrency for the same CF. I now have two compactions running concurrently, but always for different CFs. I've briefly seen a third for one of the small CFs, so it's willing to run more than two concurrently. Looks like I have to wait for a few days for all the compactions to complete. Talk about compaction hell!

-Brandon
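The concurrency difference comes from how the two strategies pick work: size-tiered groups sstables into independent buckets of similar size, which can in principle compact in parallel, while leveled must finish one level before moving to the next. A toy model of size-tiered bucketing (my own simplification, not Cassandra's actual code):

```python
def bucket_by_size(sizes, ratio=2.0):
    """Toy size-tiered bucketing: walk sstable sizes in ascending order
    and start a new bucket whenever an sstable is more than `ratio`
    times the smallest member of the current bucket."""
    buckets = []
    for s in sorted(sizes):
        if buckets and s <= buckets[-1][0] * ratio:
            buckets[-1].append(s)
        else:
            buckets.append([s])
    return buckets

# Three independent buckets -> three compactions could run concurrently.
print(bucket_by_size([1, 1, 2, 10, 11, 100]))  # [[1, 1, 2], [10, 11], [100]]
```

The real strategy uses average sizes and min/max sstable thresholds, but the key property is the same: buckets are independent units of work, which is what gives size-tiered its throughput edge.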
Re: Composite keys and range queries
Forwarding to the Cassandra mailing list as well, in case this is more of an issue with how I'm using Cassandra. Am I correct to assume that I can use range queries on composite row keys, even when using a RandomPartitioner, if I make sure that the first part of the composite key is fixed? Any help would be appreciated,
John

On Tue, Mar 13, 2012 at 12:15 PM, John Laban j...@pagerduty.com wrote:

Hi, I have a column family that uses a composite key: (ID, priority) -> ... where the ID is a UUID and the priority is an integer. I'm trying to perform a range query now: I want all the rows where the ID matches some fixed UUID, but within a range of priorities. This is supported even if I'm using a RandomPartitioner, right? (Because the first part of the composite key is the partition key, and the second part of the composite key is automatically ordered?) So I perform a range slices query:

    val rangeQuery = HFactory.createRangeSlicesQuery(keyspace,
      new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
    rangeQuery.setColumnFamily(RouteColumnFamilyName).
      setKeys(new Composite(id, priorityStart), new Composite(id, priorityEnd)).
      setRange(null, null, false, Int.MaxValue)

But I get this error:

    me.prettyprint.hector.api.exceptions.HInvalidRequestException:
    InvalidRequestException(why: start key's md5 sorts after end key's md5.
    this is not allowed; you probably should not specify end key at all,
    under RandomPartitioner)

Shouldn't they have the same md5, since they have the same partition key? Am I using the wrong query here, does Hector not support composite range queries, or am I making some mistake in how I think Cassandra's composite keys work?
Thanks,
John
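The error makes sense once you look at what RandomPartitioner hashes: the entire serialized composite row key, priority included, so two keys sharing a UUID still get unrelated md5 tokens and cannot be range-scanned. A small self-contained demonstration (the serialization below mimics the CompositeType wire layout as I understand it, and is illustrative only):

```python
import hashlib
import struct
import uuid

def composite(*parts):
    """Serialize components as <2-byte length><bytes><end-of-component 0x00>,
    mimicking Cassandra's CompositeType layout."""
    out = b""
    for p in parts:
        out += struct.pack(">H", len(p)) + p + b"\x00"
    return out

uid = uuid.uuid4().bytes
k1 = composite(uid, struct.pack(">i", 1))   # (ID, priority=1)
k2 = composite(uid, struct.pack(">i", 5))   # (ID, priority=5)

t1 = hashlib.md5(k1).hexdigest()
t2 = hashlib.md5(k2).hexdigest()
print(t1 != t2)  # True: same UUID, completely different tokens
```

The usual pre-CQL3 design for this access pattern is to make the row key just the UUID and put the priority in the (composite) column name, then use an ordinary column slice query, since columns within a row are always sorted regardless of partitioner.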
[Windows] How to configure simple authentication and authorization?
Hi. I followed this:

To set up simple authentication and authorization:
1. Edit cassandra.yaml, setting org.apache.cassandra.auth.SimpleAuthenticator as the authenticator value. The default value of AllowAllAuthenticator is equivalent to no authentication.
2. Edit access.properties, adding entries for users and their permissions to read and write to specified keyspaces and column families. See access.properties below for details on the correct format.
3. Make sure that users specified in access.properties have corresponding entries in passwd.properties. See passwd.properties below for details and examples.
4. After making the required configuration changes, you must specify the properties files when starting Cassandra with the flags -Dpasswd.properties and -Daccess.properties. For example:

    cd $CASSANDRA_HOME
    sh bin/cassandra -f -Dpasswd.properties=conf/passwd.properties -Daccess.properties=conf/access.properties

I started the services with the additional parameters, but no result: no log, nothing. I use the DataStax 1.0.8 Community edition on Windows 7 64-bit.
Thanks
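For reference, the property files the docs mention look roughly like the sketch below (user name and keyspace are made-up examples; double-check against the templates shipped in your conf/ directory). Note also that the sh bin/cassandra line above is for Unix; on Windows the -D flags must reach the JVM via bin\cassandra.bat (or the service wrapper's JVM options), which may be why they appear to have no effect.

```properties
# passwd.properties: user=password
jsmith=havebadpass

# access.properties: keyspace (or keyspace.columnfamily) permissions
Keyspace1.<rw>=jsmith
```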
snapshot files locked
Hi, I'm using Cassandra 1.0.8 on Windows 7. When I take a snapshot of the database, I find that I am unable to delete the snapshot directory (i.e., the dir named {datadir}\{keyspacename}\snapshots\{snapshottag}) while Cassandra is running:

    The action can't be completed because the folder or a file in it is open in another program. Close the folder or file and try again.

If I terminate Cassandra, then I can delete the directory with no problem. Is there a reason why Cassandra must hold onto these files?
Thanks,
Jim
high level of MemtablePostFlusher pending events
5-node cluster running 1.0.2, doing about 1300 reads and 1300 writes/sec into 3 column families in the same keyspace. 2 client machines doing about the same amount of reads/writes, but one has an average response time in the 4-40ms range and the other in the 200-800ms range. Both are running identical software, homebrew with the hector-1.0-3 client. Traffic was peaking out at 6k reads and 6k writes/sec, according to reporting from our software, and now it's topping out at 1300/sec each. The CPUs on the Cassandra boxes are bored; none of the threads within Cassandra is chewing more than 3% CPU. Disk is only 10% full on the most loaded box.

    MemtablePostFlusher    1    102    36

Not all servers have the same number of pending tasks: they have 0, 1, 17, 37, and 105. It looks like it's stuck and not recovering, because it's been like this for an hour. I've attached the end of the cassandra.log from the server with the most pending tasks. There are some interesting exceptions in there. As always, all help is always appreciated! :p

cassandra.log
Description: Binary data
Re: how to increase compaction rate?
On 3/13/2012 4:13 PM, Viktor Jevdokimov wrote:

What we did to speed up this process and return all exhausted nodes to a normal state faster: we created 6 temporary virtual single-node Cassandra instances with 2 CPU cores and 8GB RAM each. We completely stopped compaction for the CF on a production node. The leveled sstables from this production node were divided into 6 ranges and copied onto the 6 temporary empty nodes. On each node we ran a major compaction to compact just 1/6 of the data, about 10-14GB. It took 1-2 hours to compact them into 1GB of data. Then all 6 sstables were copied onto one of the 6 nodes for a final major compaction, producing the expected 3GB sstable. We then stopped the production node, deleted the files that had been copied out, returned the compacted sstable (which may need renaming), and the node was back to normal. By using separate nodes, we saved the production nodes from compacting the exhausted CF forever and blocking compactions for other CFs. With 6 separate nodes we compacted 2 production nodes a day, so it may have taken the same time, but the production nodes were free for regular compactions of other CFs.

Yikes, that's quite the ordeal, but I totally get why you had to go there. Cassandra seems to work well within some use-case bounds and lacks the sophistication to handle others well. I've been wondering about the way I use it, which is to hold the last N days of logs and a corresponding index. This means that every day I make a zillion inserts and a corresponding zillion deletes for the data inserted N days ago. The way the compaction works, this is horrible: the data is essentially immutable until it's deleted, yet it's copied a whole bunch of times. In addition, it takes forever for the deletion tombstones to meet the original data in a compaction and actually compact it away. I've also run into the zillions-of-files problem with leveled compaction that you did. I ended up with over 30k SSTables for ~1TB of data. At that point the compaction just ceases to make progress.
And starting Cassandra takes 30 minutes just to open all the SSTables, and when done, 12GB of memory are used. Better algorithms and some tools will be needed for all this to just work. But then, we're also only at v1.0.8...
TvE
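For a rolling N-day log window, one workaround people use is to avoid per-row deletes entirely: write each day into its own CF and drop the whole CF once it ages out, which reclaims the space wholesale with no tombstones to compact away. A sketch of the bookkeeping, with a hypothetical naming scheme:

```python
from datetime import date, timedelta

def cf_for_day(d):
    # hypothetical per-day column family naming scheme
    return "logs_" + d.strftime("%Y%m%d")

def cfs_to_drop(today, retain_days, existing_cfs):
    """Return the CFs that have aged out of the retention window;
    dropping a whole CF needs no tombstones or grace period."""
    keep = {cf_for_day(today - timedelta(days=i)) for i in range(retain_days)}
    return sorted(cf for cf in existing_cfs if cf not in keep)

existing = ["logs_20120310", "logs_20120313", "logs_20120314"]
print(cfs_to_drop(date(2012, 3, 14), 2, existing))  # ['logs_20120310']
```

The cost is fanning queries out across N column families on read, but for append-heavy, expire-by-age data it sidesteps the tombstone-meets-data compaction problem entirely.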
Re: how to increase compaction rate?
On Tue, Mar 13, 2012 at 11:32 PM, Thorsten von Eicken t...@rightscale.com wrote:

On 3/13/2012 4:13 PM, Viktor Jevdokimov wrote:

What we did to speed up this process and return all exhausted nodes to a normal state faster: we created 6 temporary virtual single-node Cassandra instances with 2 CPU cores and 8GB RAM each. We completely stopped compaction for the CF on a production node. The leveled sstables from this production node were divided into 6 ranges and copied onto the 6 temporary empty nodes. On each node we ran a major compaction to compact just 1/6 of the data, about 10-14GB. It took 1-2 hours to compact them into 1GB of data. Then all 6 sstables were copied onto one of the 6 nodes for a final major compaction, producing the expected 3GB sstable. We then stopped the production node, deleted the files that had been copied out, returned the compacted sstable (which may need renaming), and the node was back to normal. By using separate nodes, we saved the production nodes from compacting the exhausted CF forever and blocking compactions for other CFs. With 6 separate nodes we compacted 2 production nodes a day, so it may have taken the same time, but the production nodes were free for regular compactions of other CFs.

Yikes, that's quite the ordeal, but I totally get why you had to go there. Cassandra seems to work well within some use-case bounds and lacks the sophistication to handle others well. I've been wondering about the way I use it, which is to hold the last N days of logs and a corresponding index. This means that every day I make a zillion inserts and a corresponding zillion deletes for the data inserted N days ago. The way the compaction works, this is horrible: the data is essentially immutable until it's deleted, yet it's copied a whole bunch of times. In addition, it takes forever for the deletion tombstones to meet the original data in a compaction and actually compact it away. I've also run into the zillions-of-files problem with leveled compaction that you did. I ended up with over 30k SSTables for ~1TB of data.
At that point the compaction just ceases to make progress. And starting Cassandra takes 30 minutes just to open all the SSTables, and when done, 12GB of memory are used. Better algorithms and some tools will be needed for all this to just work. But then, we're also only at v1.0.8... TvE

You are correct to say that the way Cassandra works is not ideal for a dataset where you completely delete and re-add the entire dataset each day. In fact, that may be one of the worst use cases for Cassandra. This has to do with the structured log format and with the tombstones and grace period. Maybe you can set a lower base. LevelDB-style leveled compaction is new and not as common in the wild as size-tiered. Again, it works the way it works; Google must think it is brilliant, after all they invented it. For 1TB of data, your 12GB is used by bloom filters. Again, this is just a fact of life: bloom filters are there to make negative lookups faster. Maybe you can lower the bloom filter sizes and the index interval; this should use less memory and help the system start up faster, respectively. But nodes stuffed with a trillion keys may not be optimal for many reasons. In our case we want a high portion of the data set in memory, so a 1TB node might need, say, 256GB of RAM :) We opt for more, smaller boxes.
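The 12GB-for-1TB figure is plausible back-of-envelope: a Bloom filter needs about -ln(p)/(ln 2)^2 bits per key, so the aggressive false-positive targets Cassandra uses cost on the order of 2 bytes per key, and billions of small rows add up fast. Lowering the target trades memory for extra disk seeks on misses. A quick check of the standard formula:

```python
import math

def bloom_bits_per_key(fp_rate):
    """Classic Bloom filter sizing: m/n = -ln(p) / (ln 2)^2 bits per key."""
    return -math.log(fp_rate) / (math.log(2) ** 2)

for p in (0.01, 0.001, 0.0001):
    print(p, round(bloom_bits_per_key(p), 1))
# A 1% false-positive rate needs ~9.6 bits/key; each factor of 10
# tighter costs ~4.8 more bits/key.
```

At ~2 bytes/key, 12GB of filters implies several billion keys on the node, which is consistent with the "nodes stuffed with a trillion keys" worry above.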
Re: Building a brand new cluster and readying it for production -- advice needed
Agreed, if you are using SSD you likely will not need as much RAM. I said "you can always do better with more RAM", not "you should definitely get more RAM" :)

On Tue, Mar 13, 2012 at 7:37 PM, Maxim Potekhin potek...@bnl.gov wrote:

Thank you, Edward. As can be expected, my data volume is a multiple of whatever RAM I can realistically buy, and in fact much bigger. In my very limited experience, the money might be well spent on multicore CPUs, because that makes routine operations like compact/repair (which always include writes) so much faster, hence reducing the periods of high occupancy. I'm trying to scope out how much SSD I will need, because it appears to be an economical solution to problems I have previously had. Regards, Maxim

On 3/13/2012 10:40 PM, Edward Capriolo wrote:

I am on 1.0.7 and would suggest that. The memtable and JAMM stuff is very stable. I would not set up 0.8.x, because with 1.1 coming soon, 0.8.x is not likely to see too many more minor releases. You can always do better with more RAM, up to the size of your data; having more RAM than data size will not help noticeably. The off-heap row cache can use this, and the OS can cache disk blocks with it. Edward

On Tue, Mar 13, 2012 at 3:15 PM, Maxim Potekhin potek...@bnl.gov wrote:

Dear All, after all the testing and continuous operation of my first cluster, I've been given an OK to build a second production Cassandra cluster in Europe. There were posts in recent weeks regarding the most stable and solid Cassandra version; I was wondering if anything better has appeared since it was last discussed. At this juncture, I don't need features, just rock-solid stability. Are 0.8.x versions still acceptable, since I have experience with these, or should I take the plunge to 1+? I realize that I won't need more than 8GB RAM, because I can't make the Java heap too big. Is it still worth it to pay money for extra RAM? Is the cache located outside the heap in recent versions?
Thanks to all of you for the advice I'm receiving on this board. Best regards Maxim