Re: secondary indexes TTL - strange issues
Issue created, will attach debug logs asap: CASSANDRA-4670 (https://issues.apache.org/jira/browse/CASSANDRA-4670)

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, 17 September 2012 03:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

> Data gets inserted and is accessible via index query for some time. At some point in time the indexes are completely empty and start filling again (while new data enters the system).

If you can reproduce this please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA. If you can include DEBUG level logs that would be helpful.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 10:08 PM, Roland Gude <roland.g...@ez.no> wrote:

I am not sure it is compacting an old file: the same thing happens every time I rebuild the index. New files appear, get compacted, and vanish. We have set up a new, smaller cluster with fresh data, and the same thing happens here as well. Data gets inserted and is accessible via index query for some time. At some point in time the indexes are completely empty and start filling again (while new data enters the system).

I am currently testing with SizeTiered on both the fresh set and the imported set. For the fresh set (which is significantly smaller), first results imply that the issue is not happening with SizeTieredCompaction. I have not yet tested everything that comes to mind and will update if something new comes up.

As for the failing query, it is from the cli:

get EventsByItem where 0003--1000--=utf8('someValue');

0003--1000-- is a TimeUUID we use as a marker for a time series. (Equivalent queries fail with Astyanax and Hector as well.)

This is a CF with the issue:

create column family EventsByItem
  with column_type = 'Standard'
  and comparator = 'TimeUUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and read_repair_chance = 0.5
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and column_metadata = [
    {column_name : '--1000--', validation_class : BytesType, index_name : 'ebi_mandatorIndex', index_type : 0},
    {column_name : '0002--1000--', validation_class : BytesType, index_name : 'ebi_itemidIndex', index_type : 0},
    {column_name : '0003--1000--', validation_class : BytesType, index_name : 'ebi_eventtypeIndex', index_type : 0}]
  and compression_options = {sstable_compression : SnappyCompressor, chunk_length_kb : 64};

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Friday, 14 September 2012 10:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

> INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,]. 78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/s. Time: 1,272,883ms.

There are a lot of weird things here. It could be levelled compaction compacting an older file for the first time. But that would be a guess.

> Rebuilding the index gives us back the data for a couple of minutes - then it vanishes again.

Are you able to do a test with SizeTieredCompaction?
Are you able to replicate the problem with a fresh testing CF and some test data? If it's only a problem with imported data, can you provide a sample of the failing query? And maybe the CF definition?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 2:46 AM, Roland Gude <roland.g...@ez.no> wrote:

Hi,

we have been running a system on Cassandra 0.7, relying heavily on secondary indexes for columns with TTL. This has been working like a charm, but we are trying hard to move forward with Cassandra and are struggling at this point: when we put our data into a new cluster (any 1.1.x version - currently 1.1.5), rebuild indexes and run our system, everything seems to work well - until at some point in time index queries do not return any data at all anymore (note that the TTL will not expire for several months). Rebuilding the index gives us back the data for a couple of minutes - then it vanishes again.

What seems strange is that compaction apparently is very aggressive:

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to
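For comparison with the cli query above, an equivalent index lookup via Astyanax might look roughly like the sketch below. This is not the poster's code: the CF handle, the serializer choices, and the marker UUID value are all assumptions (the real marker is the abbreviated 0003-...-1000-... TimeUUID above).

    import java.util.UUID;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.Rows;
    import com.netflix.astyanax.serializers.BytesArraySerializer;
    import com.netflix.astyanax.serializers.TimeUUIDSerializer;

    // Mirrors the CF definition above: byte[] row keys, TimeUUID column names.
    ColumnFamily<byte[], UUID> EVENTS_BY_ITEM = new ColumnFamily<byte[], UUID>(
            "EventsByItem", BytesArraySerializer.get(), TimeUUIDSerializer.get());

    // Placeholder only; substitute the real 0003-...-1000-... marker TimeUUID.
    UUID eventTypeMarker = UUID.fromString("00000000-0000-1000-0000-000000000000");

    // keyspace is an initialized com.netflix.astyanax.Keyspace.
    Rows<byte[], UUID> rows = keyspace.prepareQuery(EVENTS_BY_ITEM)
            .searchWithIndex()
            .addExpression()
                .whereColumn(eventTypeMarker).equals().value("someValue")
            .execute().getResult();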
Re: nodetool cfstats and compression
Yes. It is the space taken up on disk, including compaction.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:30 AM, Jim Ancona <j...@anconafamily.com> wrote:

Do the row size stats reported by 'nodetool cfstats' include the effect of compression?

Thanks,
Jim
Re: minor compaction and delete expired column-tombstones
> Does minor compaction delete expired column-tombstones when the row is also present in another table which is not subject to the minor compaction?

No. Compaction is per Column Family. Tombstones will be expired by Minor Compaction if all fragments of the row are contained in the SSTables being compacted.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:32 AM, Rene Kochen <rene.koc...@schange.com> wrote:

Hi all,

Does minor compaction delete expired column-tombstones when the row is also present in another table which is not subject to the minor compaction?

Example: say there are 5 SSTables:

- Customers_0 (10 MB)
- Customers_1 (10 MB)
- Customers_2 (10 MB)
- Customers_3 (10 MB)
- Customers_4 (30 MB)

A minor compaction is triggered which will compact the similar sized tables 0 to 3. In these tables is a customer record with key C1 with an expired column tombstone. Customer C1 is also present in table 4. Will the minor compaction delete the column (i.e. will the tombstone be present in the newly created table)?

Thanks,
Rene
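A hedged illustration of the rule above, using the Customers example (the keyspace name here is hypothetical): a user-triggered major compaction merges all of a column family's SSTables into one, so every fragment of row C1 ends up in the compacted set and the expired tombstone can be purged:

    nodetool -h localhost compact Keyspace1 Customers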
Re: Disk configuration in new cluster node
> 4 drives for data and 1 drive for commitlog,
How are you configuring the drives? It's normally best to present one big data volume, e.g. using RAID 0, and put the commit log on, say, the system mirror.

> will the node balance out the load on the drives, or is it agnostic to usage of drives underlying data directories?
It will not. There is a feature coming in v1.2 to add better support for JBOD configurations.

A word of warning: if you put more than 300GB to 400GB per node you may experience some issues such as repair, compaction or disaster recovery taking a long time. These are simply soft limits that provide a good rule of thumb for HDD based systems with 1 GigE networking.

Hope that helps.
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 7:39 AM, Casey Deccio <ca...@deccio.net> wrote:

I'm building a new cluster (to replace the broken setup I've written about in previous posts) that will consist of only two nodes. I understand that I'll be sacrificing high availability of writes if one of the nodes goes down, and I'm okay with that. I'm more interested in maintaining high consistency and high read availability, so I've decided to use a write-level consistency of ALL and read-level consistency of ONE.

My first question is about the drives in this setup. If I initially set up the system with, say, 4 drives for data and 1 drive for commitlog, and later I decide to add more capacity to the node by adding more drives for data (adding the new data directory entries in cassandra.yaml), will the node balance out the load on the drives, or is it agnostic to usage of the drives underlying the data directories?

My second question has to do with RAID striping. Would it be more useful to stripe the disk with the commitlog or the disks with the data? Of course, with a single striped volume for data directories, it would be more difficult to add capacity to the node later, as I've suggested above.

Casey
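For reference, the cassandra.yaml settings involved under the layout Aaron suggests would look something like this (a sketch; the mount points are hypothetical). One big RAID 0 volume means a single data directory, with the commit log on its own device:

    data_file_directories:
        - /mnt/raid0/cassandra/data
    commitlog_directory: /var/lib/cassandra/commitlog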
Re: Disk configuration in new cluster node
> A word of warning. If you put more than 300GB to 400GB per node you may experience some issues ...

I think this is probably the solution to your multiple disk problem. You could easily use one single disk to store the data on, and one disk for the commitlog. No issues with JBOD, RAID or whatever. If you want to improve throughput you might consider a RAID 0 setup.

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl

2012/9/17 aaron morton <aa...@thelastpickle.com>

> 4 drives for data and 1 drive for commitlog,
How are you configuring the drives? It's normally best to present one big data volume, e.g. using RAID 0, and put the commit log on, say, the system mirror.

> will the node balance out the load on the drives, or is it agnostic to usage of drives underlying data directories?
It will not. There is a feature coming in v1.2 to add better support for JBOD configurations.

A word of warning: if you put more than 300GB to 400GB per node you may experience some issues such as repair, compaction or disaster recovery taking a long time. These are simply soft limits that provide a good rule of thumb for HDD based systems with 1 GigE networking.

Hope that helps.
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 7:39 AM, Casey Deccio <ca...@deccio.net> wrote:

I'm building a new cluster (to replace the broken setup I've written about in previous posts) that will consist of only two nodes. I understand that I'll be sacrificing high availability of writes if one of the nodes goes down, and I'm okay with that. I'm more interested in maintaining high consistency and high read availability, so I've decided to use a write-level consistency of ALL and read-level consistency of ONE.

My first question is about the drives in this setup. If I initially set up the system with, say, 4 drives for data and 1 drive for commitlog, and later I decide to add more capacity to the node by adding more drives for data (adding the new data directory entries in cassandra.yaml), will the node balance out the load on the drives, or is it agnostic to usage of the drives underlying the data directories?

My second question has to do with RAID striping. Would it be more useful to stripe the disk with the commitlog or the disks with the data? Of course, with a single striped volume for data directories, it would be more difficult to add capacity to the node later, as I've suggested above.

Casey
Re: minor compaction and delete expired column-tombstones
OK, thanks! So a column tombstone will only be removed if all row fragments are present in the tables being compacted.

I have a row called Index which contains columns like page0, page1, page2, etc. Every several minutes, new columns are created and old ones deleted. The problem is that I now have an Index row in several SSTables, but the column tombstones are never deleted, and reading the Index row (and all its column tombstones) takes longer and longer. If I do a major compaction, all tombstones are deleted and reading the Index row takes one millisecond again (and the garbage-collection issues caused by all those tombstones go away).

Is it not advised to use rows with many new column creates/deletes (because of how minor compactions work)?

Thanks!

Rene

2012/9/17 aaron morton <aa...@thelastpickle.com>:

> Does minor compaction delete expired column-tombstones when the row is also present in another table which is not subject to the minor compaction?

No. Compaction is per Column Family. Tombstones will be expired by Minor Compaction if all fragments of the row are contained in the SSTables being compacted.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:32 AM, Rene Kochen <rene.koc...@schange.com> wrote:

Hi all,

Does minor compaction delete expired column-tombstones when the row is also present in another table which is not subject to the minor compaction?

Example: say there are 5 SSTables:

- Customers_0 (10 MB)
- Customers_1 (10 MB)
- Customers_2 (10 MB)
- Customers_3 (10 MB)
- Customers_4 (30 MB)

A minor compaction is triggered which will compact the similar sized tables 0 to 3. In these tables is a customer record with key C1 with an expired column tombstone. Customer C1 is also present in table 4. Will the minor compaction delete the column (i.e. will the tombstone be present in the newly created table)?

Thanks,
Rene
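One way around the churn pattern Rene describes, for what it's worth (a sketch under assumptions, not something proposed in this thread): split the hot Index row into time buckets, so each bucket stops churning once its window passes and readers never touch the tombstones piling up in old buckets.

    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Hypothetical: derive a day-bucketed row key such as "Index:20120917".
    // Readers and writers agree on the current bucket; tombstones accumulate
    // only in buckets that soon drop out of the read path.
    String bucket = new SimpleDateFormat("yyyyMMdd").format(new Date());
    String rowKey = "Index:" + bucket;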
Re: Query advice to prevent node overload
On Sep 17, 2012, at 3:04 AM, aaron morton <aa...@thelastpickle.com> wrote:

> I have a schema that represents a filesystem and one example of a Super CF is:

This may help with some ideas: http://www.datastax.com/dev/blog/cassandra-file-system-design

In general we advise avoiding Super Columns if possible. They are often slower, and the sub columns are not indexed, meaning all the sub columns have to be read into memory.

> So if I set column_count = 1, as I have now, but fetch 1000 dirs (rows) and each one happens to have 1 file (column), the dataset is 1000x1.

This is the way the query works internally. Multiget is simply a collection of independent gets.

> The multiget() is more efficient, but I'm having trouble trying to limit the size of the data returned in order to not crash the cassandra node.

Often less is more. I would only ask for a few tens of rows at a time, or try to limit the size of the returned query to a few MBs. Otherwise a lot of data gets dragged through Cassandra, the network, and finally Python.

You may want to consider a CF like the inode CF in the article above, where the parent dir is a column with a secondary index.

Thanks Aaron! I will take your points into consideration.

Best regards,
André
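Aaron's "few tens of rows at a time" advice amounts to chunking the key list. A sketch of the idea, in Java/Astyanax for illustration even though the thread's client appears to be Python; DIRS (a ColumnFamily<String, String> handle), allDirKeys and keyspace are all assumptions:

    import java.util.List;
    import com.google.common.collect.Lists;
    import com.netflix.astyanax.model.Rows;

    // Fetch directory rows 20 keys at a time instead of one huge multiget.
    for (List<String> batch : Lists.partition(allDirKeys, 20)) {
        Rows<String, String> rows = keyspace.prepareQuery(DIRS)
                .getKeySlice(batch)
                .execute().getResult();
        // process this chunk before requesting the next one
    }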
Re: Repair: Issue in netstats
Sorry for the delay; been out of the loop. Could this problem be due to running repair on a node upgraded to 1.0.11 while the other node in the cluster is still at 0.8.x?

On Fri, Sep 7, 2012 at 9:11 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:

That obviously shouldn't happen and I don't remember any open ticket related to that. You might want to open a ticket on jira (https://issues.apache.org/jira/browse/CASSANDRA).

--
Sylvain

On Fri, Sep 7, 2012 at 10:50 AM, B R <software.research.w...@gmail.com> wrote:

We have upgraded a 0.8 cluster to 1.0.11. After upgrading the first node and running upgradesstables, we ran a routine repair operation. This operation has been running for a long time and does not seem to be progressing. Running netstats has shown unexpected values for the percentages, as shown below. Any clue as to what could be the issue?

bin/nodetool -h 172.16.0.34 netstats
Mode: NORMAL
Streaming to: /172.16.0.29
   /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=116 progress=19946657796608/334406146 - 5964800%
   /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=116 progress=0/179880575 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=12 progress=0/1448134 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=116 progress=0/350403675 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=89 progress=0/27569594 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=1 progress=0/95043 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16617-Data.db sections=1 progress=0/232800 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16612-Data.db sections=1 progress=0/82705 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=116 progress=0/724836994 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=116 progress=0/401797714 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=2 progress=0/301297 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=3 progress=0/829914 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=2 progress=0/288460 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=13 progress=0/1954639 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=8 progress=0/1187649 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=1 progress=0/141714 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=116 progress=0/390168999 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=111 progress=13620592201686/303748754 - 4484163%
   /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=110 progress=0/162808076 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=10 progress=0/1922996 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=111 progress=0/350744309 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=87 progress=0/24364920 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=2 progress=0/228764 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=111 progress=0/720722886 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=111 progress=0/364643588 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=4 progress=0/963207 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=2 progress=0/360024 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=1 progress=0/72842 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=11 progress=0/1381176 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=13 progress=0/3266736 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=2 progress=0/639705 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=111 progress=0/358443928 - 0%
Nothing streaming from /172.16.0.29

Pool Name    Active   Pending   Completed
Commands     n/a      0         19
Responses    n/a      0         2050444

Regards.
Re: Query advice to prevent node overload
On Sep 17, 2012, at 3:04 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> I have a schema that represents a filesystem and one example of a Super CF is:
> This may help with some ideas: http://www.datastax.com/dev/blog/cassandra-file-system-design

Could you explain the usage of the sentinel? Which nodes have it? I understand that it should be used for recursive dir listings, to restrict the nodes returned to the /tmp/ dir, but I'm not sure I understand how it works.

Thanks,
André
Re: Many ParNew collections
Thanks Aaron, I found the problem. It's in this thread: "minor compaction and delete expired column-tombstones". The problem was that I have one big row called Index which contains many tombstones. Reading all these tombstones caused the memory issues. I think nodes 1 and 3 have had enough minor compactions that the tombstones were removed. The second node still contains several old SSTables, and it takes some time before the whole thing is compacted again.

Thanks,
Rene

2012/9/17 aaron morton <aa...@thelastpickle.com>:

> The second node (the one suffering from many GC) has a high read latency compared to the others. Another thing is that the compacted row maximum size is bigger than on the other nodes.

Node 2 also:
* has about 220MB of data, while the others have about 45MB
* has about 1 million keys while the others have about 0.3 million

> - Should the other nodes also have that wide row?
Yes. Are you running repair? What CL are you using?

> - Could repeatedly reading a wide row cause ParNew problems?
Maybe. Are you reading the whole thing? It's only 22MB; it's big but not huge.

I would:
* ensure repair is running and completing; this may even out the data load.
* determine if GC is associated with compactions, repair or general activity.
* if GC is associated with compactions, the simple thing is to reduce concurrent_compactions and in_memory_compaction_limit in the yaml. Note this is often a simple / quick fix that can increase IO load and slow down compaction. The harder thing is to tune the JVM memory settings (the defaults often do a good job).

Hope that helps.
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 10:41 PM, Rene Kochen <rene.koc...@schange.com> wrote:

Thanks Aaron,

At another production site the exact same problems occur (also after ~6 months). Here I have a very small cluster of three nodes with replication factor = 3. One of the three nodes begins to have many long ParNews and high CPU load. I upgraded to Cassandra 1.0.11, but the GC problem still continues on that node. If I look at the cfstats of the three nodes, there is one CF which is different:

Column Family: Logs
SSTable count: 1
Space used (live): 47606705
Space used (total): 47606705
Number of Keys (estimate): 338176
Memtable Columns Count: 22297
Memtable Data Size: 51542275
Memtable Switch Count: 1
Read Count: 189441
Read Latency: 0,768 ms.
Write Count: 123411
Write Latency: 0,035 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0,0
Bloom Filter Space Used: 721456
Key cache capacity: 20
Key cache size: 56685
Key cache hit rate: 0.9132482658217008
Row cache: disabled
Compacted row minimum size: 73
Compacted row maximum size: 263210
Compacted row mean size: 94

Column Family: Logs
SSTable count: 3
Space used (live): 233688199
Space used (total): 233688199
Number of Keys (estimate): 1191936
Memtable Columns Count: 20147
Memtable Data Size: 47067518
Memtable Switch Count: 1
Read Count: 188473
Read Latency: 4031,791 ms.
Write Count: 120412
Write Latency: 0,042 ms.
Pending Tasks: 0
Bloom Filter False Postives: 234
Bloom Filter False Ratio: 0,0
Bloom Filter Space Used: 2603808
Key cache capacity: 20
Key cache size: 5153
Key cache hit rate: 1.0
Row cache: disabled
Compacted row minimum size: 73
Compacted row maximum size: 25109160
Compacted row mean size: 156

Column Family: Logs
SSTable count: 1
Space used (live): 47714798
Space used (total): 47714798
Number of Keys (estimate): 338176
Memtable Columns Count: 29046
Memtable Data Size: 66585390
Memtable Switch Count: 1
Read Count: 196048
Read Latency: 1,466 ms.
Write Count: 127709
Write Latency: 0,034 ms.
Pending Tasks: 0
Bloom Filter False Postives: 8
Bloom Filter False Ratio: 0,00847
Bloom Filter Space Used: 720496
Key cache capacity: 20
Key cache size: 54166
Key cache hit rate: 0.9833443960960739
Row cache: disabled
Compacted row minimum size: 73
Compacted row maximum size: 263210
Compacted row mean size: 95

The second node (the one suffering from many GC) has a high read latency compared to the others. Another thing is that the compacted row maximum size is bigger than on the other nodes.

What puzzles me:
- Should the other nodes also have that wide row, because the replication factor is three and I only have three nodes? I must say that the wide row is probably the index row which has columns added/removed continuously. Maybe the other nodes lost much data because of compactions?
- Could repeatedly reading a wide row cause ParNew problems?

Thanks!

Rene

2012/8/17 aaron morton <aa...@thelastpickle.com>:

> - Cassandra 0.7.10
You _really_ should look at getting up to 1.1 :) Memory management is much better and the JVM heap requirements are less.

> However, there is one node with high read latency and far too many ParNew collections (compared
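For reference, the yaml knobs Aaron mentions correspond to these cassandra.yaml settings in the 1.x line (setting names from memory; the values shown are illustrative only, not recommendations):

    concurrent_compactors: 1
    in_memory_compaction_limit_in_mb: 32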
Re: Repair: Issue in netstats
On Mon, Sep 17, 2012 at 11:06 AM, B R <software.research.w...@gmail.com> wrote:

> Could this problem be due to running repair on a node upgraded to 1.0.11 while the other node in the cluster is still at 0.8.x?

Yes, repair (as with all operations requiring streaming) doesn't work correctly across major Cassandra versions. The first thing you should do is finish the upgrade of the nodes.

--
Sylvain

On Fri, Sep 7, 2012 at 9:11 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:

That obviously shouldn't happen and I don't remember any open ticket related to that. You might want to open a ticket on jira (https://issues.apache.org/jira/browse/CASSANDRA).

--
Sylvain

On Fri, Sep 7, 2012 at 10:50 AM, B R <software.research.w...@gmail.com> wrote:

We have upgraded a 0.8 cluster to 1.0.11. After upgrading the first node and running upgradesstables, we ran a routine repair operation. This operation has been running for a long time and does not seem to be progressing. Running netstats has shown unexpected values for the percentages, as shown below. Any clue as to what could be the issue?

bin/nodetool -h 172.16.0.34 netstats
Mode: NORMAL
Streaming to: /172.16.0.29
   /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=116 progress=19946657796608/334406146 - 5964800%
   /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=116 progress=0/179880575 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=12 progress=0/1448134 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=116 progress=0/350403675 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=89 progress=0/27569594 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=1 progress=0/95043 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16617-Data.db sections=1 progress=0/232800 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16612-Data.db sections=1 progress=0/82705 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=116 progress=0/724836994 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=116 progress=0/401797714 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=2 progress=0/301297 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=3 progress=0/829914 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=2 progress=0/288460 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=13 progress=0/1954639 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=8 progress=0/1187649 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=1 progress=0/141714 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=116 progress=0/390168999 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=111 progress=13620592201686/303748754 - 4484163%
   /data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=110 progress=0/162808076 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=10 progress=0/1922996 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=111 progress=0/350744309 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=87 progress=0/24364920 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=2 progress=0/228764 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=111 progress=0/720722886 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=111 progress=0/364643588 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=4 progress=0/963207 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=2 progress=0/360024 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=1 progress=0/72842 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=11 progress=0/1381176 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=13 progress=0/3266736 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=2 progress=0/639705 - 0%
   /data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=111 progress=0/358443928 - 0%
Nothing streaming from /172.16.0.29

Pool Name    Active   Pending   Completed
Commands     n/a      0         19
Responses    n/a      0         2050444

Regards.
Re: cassandra/hadoop BulkOutputFormat failures
As suggested, it was a version-skew problem. Thanks.

Brian

On Sep 14, 2012, at 11:34 PM, Jeremy Hanna wrote:

A couple of guesses:
- are you mixing versions of Cassandra? Streaming differences between versions might throw this error. That is, are you bulk loading with one version of Cassandra into a cluster that's a different version?
- (shot in the dark) is your cluster overwhelmed for some reason?

If the temp dir hasn't been cleaned up yet, you are able to retry, fwiw.

Jeremy

On Sep 14, 2012, at 1:34 PM, Brian Jeltema <brian.jelt...@digitalenvoy.net> wrote:

I'm trying to do a bulk load from a Cassandra/Hadoop job using the BulkOutputFormat class. It appears that the reducers are generating the SSTables, but loading them into the cluster is failing:

12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_04_0, Status : FAILED
java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4]
    at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
    at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

A brief look at the BulkOutputFormat class shows that it depends on SSTableLoader. My Hadoop cluster and my Cassandra cluster are co-located on the same set of machines. I haven't found any stated restrictions, but does this technique only work if the Hadoop cluster is distinct from the Cassandra cluster? Any suggestions on how to get past this problem?

Thanks in advance.

Brian
Cassandra Messages Dropped
Hello,

While under load, we have occasionally been seeing "messages dropped" errors in our cassandra log. Doing some research, I understand this is part of Cassandra's design to shed load, and we should look at the tpstats-like output to determine what should be done to resolve the situation. Typically, you will see lots of messages blocked or pending, and that might be an indicator that a specific piece of hardware needs to be improved/tuned/upgraded. However, looking at the output we are getting, I'm finding it difficult to see what needs to be tuned, as it looks to me like cassandra is handling the load within the mutation stage:

INFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 658) 119 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) Pool Name Active Pending Blocked
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) ReadStage 3 3 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) RequestResponseStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) ReadRepairStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) MutationStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) GossipStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) StreamStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher 1 5 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter 1 5 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) commitlog_archiver 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) AntiEntropySessions 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) CompactionManager 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) MessagingService n/a 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) Cache Type Size Capacity KeysToSave Provider
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) KeyCache 2184533 2184533 all
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) ColumnFamily Memtable ops,data
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116) system.NodeIdInfo 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.IndexInfo 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.LocationInfo 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.Versions 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.schema_keyspaces 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.Migrations 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.schema_columnfamilies 0,0
Astyanax error
Hello,

I am trying to retrieve a list of column names (that are defined as Integer) from a CF with a RowKey that is Integer as well. (I don't care about the column values, which are just nulls.) Following is a snippet of my Astyanax code. I am getting 0 columns, but I know the key that I am querying contains a few hundred columns. Any idea what part of the code below is incorrect? Thanks.

Astyanax code:

    ColumnFamily<Integer, Integer> CF1 = new ColumnFamily<Integer, Integer>(
            "CF1",                    // Column Family Name
            IntegerSerializer.get(),  // Key Serializer
            IntegerSerializer.get()); // Column Serializer

    // Reading data
    int NUM_EVENTS = 9;
    StopWatch clock = new StopWatch();
    clock.start();
    for (int i = 0; i < NUM_EVENTS; ++i) {
        ColumnList<Integer> result = keyspace.prepareQuery(CF1)
                .getKey(1919)
                .execute().getResult();
        System.out.println("results are: " + result.size());
    }
    clock.stop();

CF definition:
===============
[default@ks1] describe CF1;
ColumnFamily: CF1
  Key Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.IntegerType
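One possible culprit worth checking (an assumption on my part, not something confirmed in this thread): org.apache.cassandra.db.marshal.IntegerType is Cassandra's variable-length BigInteger ("varint") type, while Astyanax's IntegerSerializer writes fixed-width 32-bit values. Both encodings validate, but they are different byte strings, so a key written by another client in the minimal varint form will not match a getKey(1919) issued with the 4-byte form. A sketch using BigIntegerSerializer instead:

    import java.math.BigInteger;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ColumnList;
    import com.netflix.astyanax.serializers.BigIntegerSerializer;

    // Hypothetical: align the serializers with the IntegerType validators above.
    ColumnFamily<BigInteger, BigInteger> CF1 = new ColumnFamily<BigInteger, BigInteger>(
            "CF1",
            BigIntegerSerializer.get(),   // key serializer, matches IntegerType key validator
            BigIntegerSerializer.get());  // column serializer, matches IntegerType comparator

    ColumnList<BigInteger> result = keyspace.prepareQuery(CF1)
            .getKey(BigInteger.valueOf(1919))
            .execute().getResult();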
Re: Disk configuration in new cluster node
On Mon, Sep 17, 2012 at 1:19 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> 4 drives for data and 1 drive for commitlog,
> How are you configuring the drives? It's normally best to present one big data volume, e.g. using RAID 0, and put the commit log on, say, the system mirror.

Given the advice to use a single RAID 0 volume, I think that's what I'll do. By system mirror, you are referring to the volume on which the OS is installed? Should the volume with the commit log also have multiple disks in a RAID 0 volume? Alternatively, would a RAID 1 setup be reasonable for the system volume/OS, so the system itself can be resilient to disk failure, or would that kill commit performance? Any preference for hardware RAID 0 vs. using something like mdadm?

> A word of warning. If you put more than 300GB to 400GB per node you may experience some issues such as repair, compaction or disaster recovery taking a long time. These are simply soft limits that provide a good rule of thumb for HDD based systems with 1 GigE networking.

Hmm. My hope was to be able to run a minimal number of nodes and maximize their capacity, because it doesn't make sense in my case to build or maintain a large cluster. I wanted to run a two-node setup (RF=1, RCL=ONE, WCL=ALL), each node with several large-capacity disks totaling 10 - 12 TB. Is this (another) bad idea?

Casey
Re: minor compaction and delete expired column-tombstones
We've run into exactly the same problem recently. Some specific keys in a couple of CFs accumulate a fair amount of column churn over time. Pre-1.x Cassandra, we scheduled full compactions often to purge them. However, when we moved to 1.x we adopted the recommended practice of avoiding full compactions. The problem took a while to manifest itself, but over the course of several weeks (a few months) of not doing full compactions, the load on those services slowly increased... and despite having everything monitored, it was not trivial to find out that it was the accumulation of tombstones on 'some' keys, for 'some' CFs in the cluster, that was really causing long latencies and CPU spikes (high CPU is a typical signature when there is a fair amount of tombstones in the SSTables).

Is there any JIRA or enhancement to perhaps be able to detect when certain column tombstones can be deleted in minor compactions? The new introduction of SSTable min-max timestamps might help? Or perhaps there are new ones coming up that I'm not aware of.

I'm saying this because there is absolutely no way (that I know of) to find out or monitor when Cassandra encounters many column tombstones when doing searches. That alone could help detect these cases, so one can change the data model and/or realize that full compactions are needed. For example, a new metric at the CF level that tracks the % of tombstones read per row (ideally a histogram based on row size), or perhaps spit something out in the logs (a la the MySQL slow query log) when a wide row is read and a certain % of tombstone columns is encountered... this alone could be a huge help in at least detecting the latent problem.

...what we had to do to fully debug and understand the issue was to build some tools that scanned SSTables and provided some of those stats. In a large cluster that is painful to do.

Anyway, just wanted to chime in on the thread to provide our input on the matter.

Cheers,
Josep M.

On Mon, Sep 17, 2012 at 2:01 AM, Rene Kochen <rene.koc...@emea.schange.com> wrote:

OK, thanks! So a column tombstone will only be removed if all row fragments are present in the tables being compacted.

I have a row called Index which contains columns like page0, page1, page2, etc. Every several minutes, new columns are created and old ones deleted. The problem is that I now have an Index row in several SSTables, but the column tombstones are never deleted, and reading the Index row (and all its column tombstones) takes longer and longer. If I do a major compaction, all tombstones are deleted and reading the Index row takes one millisecond again (and the garbage-collection issues caused by all those tombstones go away).

Is it not advised to use rows with many new column creates/deletes (because of how minor compactions work)?

Thanks!

Rene

2012/9/17 aaron morton <aa...@thelastpickle.com>:

> Does minor compaction delete expired column-tombstones when the row is also present in another table which is not subject to the minor compaction?

No. Compaction is per Column Family. Tombstones will be expired by Minor Compaction if all fragments of the row are contained in the SSTables being compacted.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:32 AM, Rene Kochen <rene.koc...@schange.com> wrote:

Hi all,

Does minor compaction delete expired column-tombstones when the row is also present in another table which is not subject to the minor compaction?

Example: say there are 5 SSTables:

- Customers_0 (10 MB)
- Customers_1 (10 MB)
- Customers_2 (10 MB)
- Customers_3 (10 MB)
- Customers_4 (30 MB)

A minor compaction is triggered which will compact the similar sized tables 0 to 3. In these tables is a customer record with key C1 with an expired column tombstone. Customer C1 is also present in table 4. Will the minor compaction delete the column (i.e. will the tombstone be present in the newly created table)?

Thanks,
Rene
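For what it's worth, one stock tool that can do a rough version of this inspection (a hedged pointer, not the custom tooling Josep describes, and the SSTable path here is hypothetical) is the sstable2json utility shipped with Cassandra, which dumps SSTable contents with tombstone markers included:

    bin/sstable2json /var/lib/cassandra/data/Keyspace1/Customers-hd-1-Data.db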
Re: minor compaction and delete expired column-tombstones
> Is there any JIRA or enhancement to perhaps be able to detect when certain column tombstones can be deleted in minor compactions? The new introduction of SSTable min-max timestamps might help? Or perhaps there are new ones coming up that I'm not aware of.

https://issues.apache.org/jira/browse/CASSANDRA-4671

--
Sylvain
persistent compaction issue (1.1.4 and 1.1.5)
Hi All,

I have an issue where each one of my nodes (currently all running 1.1.5) is reporting around 30,000 pending compactions. I understand that a pending compaction doesn't necessarily mean it is a scheduled task; however, I'm confused about why this behavior is occurring. It is the same on all nodes: the count occasionally goes down by ~5k pending compaction tasks, and then returns to 25,000-35,000 pending tasks. I have tried a repair operation and a scrub operation on two of the nodes, and while compactions initially happen, the number of pending compactions does not decrease.

Any ideas? Thanks for your time.

Best,
michael
Bloom Filters in Cassandra
How are bloom filters used in Cassandra? Is my understanding correct that there is one per SSTable, encapsulating which keys are in that SSTable? Please advise.
Is Cassandra right for me?
Hello,

I am new to Cassandra and I am in doubt whether Cassandra is the right technology to use in the architecture I am defining. I also saw a presentation which said that if I don't have rows with more than a hundred columns in Cassandra, I am either doing something wrong or I shouldn't be using Cassandra. Therefore, it might be the case that I am doing something wrong. If you could help me find the answers to these questions by giving any feedback, it would be highly appreciated. Here is my need and what I am thinking of using Cassandra for:

- I need to support a high volume of writes per second. I might have a billion writes per hour.
- I need to write non-structured data that will be processed later by Hadoop processes to generate structured data from it. Later, I index the structured data using SOLR or SOLANDRA, so the data can be queried by my end-user application. Is Cassandra recommended for that, or should I be thinking of writing directly to HDFS files, for instance? What's the main advantage I get from storing data in a NoSQL service like Cassandra, compared to storing files in HDFS?
- Usually I will write JSON data associated with an ID, and my Hadoop processes will process this data to write data to a database. I have two doubts here:
  - If I don't need to perform complicated queries in Cassandra, should I store the JSON-like data just as a column value? I am afraid of doing something wrong here, as I would just need to store the JSON file and some 5 or 6 more fields to query the files later.
  - Does it make sense to use Hadoop to process data from Cassandra and store the results in a database like HBase? Once I have structured data, is there any reason I should use Cassandra instead of HBase?

I am sorry if the questions are too basic; I have been watching a lot of videos and reading a lot of documentation about Cassandra, but honestly, the more I read, the more questions I have.

Thanks in advance.

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
are counters stable enough for production?
Hi,

Does anyone have any experience with using Cassandra counters in production? We rely heavily on them, and recently we've had a few very serious problems. Our counter values suddenly became a few times higher than expected. From the business point of view this is a disaster :/ Also, there are a few open major bugs related to them, some of them open for quite a long time (months).

We are seriously considering going back to other solutions (e.g. SQL databases). We simply cannot afford incorrect counter values. We can tolerate losing a few increments from time to time, but we cannot tolerate counters suddenly being 3 times higher or lower than the expected values.

What is the current status of counters? Should I consider them a production-ready feature and assume we just had some bad luck? Or should I rather consider them an experimental feature and look for some other solution? Do you have any experience with them?

Any comments would be very helpful for us!

Thanks,
Bartek
Re: Query advice to prevent node overload
> Could you explain the usage of the sentinel?

Queries that use a secondary index must include an equality clause. That's what the sentinel is there for:

select filename from inode where filename > '/tmp' and filename < '/tmq' and sentinel = 'x';

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/09/2012, at 9:17 PM, André Cruz <andre.c...@co.sapo.pt> wrote:

On Sep 17, 2012, at 3:04 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> I have a schema that represents a filesystem and one example of a Super CF is:
> This may help with some ideas: http://www.datastax.com/dev/blog/cassandra-file-system-design

Could you explain the usage of the sentinel? Which nodes have it? I understand that it should be used for recursive dir listings, to restrict the nodes returned to the /tmp/ dir, but I'm not sure I understand how it works.

Thanks,
André
Re: Cassandra Messages Dropped
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher 1 5 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter 1 5 0

Looks suspiciously like http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E

What version are you on? Are there any ERROR log messages before this? Are you seeing MutationStage back up? Are you seeing log messages from GCInspector?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:16 AM, Michael Theroux <mthero...@yahoo.com> wrote:

Hello,

While under load, we have occasionally been seeing "messages dropped" errors in our cassandra log. Doing some research, I understand this is part of Cassandra's design to shed load, and we should look at the tpstats-like output to determine what should be done to resolve the situation. Typically, you will see lots of messages blocked or pending, and that might be an indicator that a specific piece of hardware needs to be improved/tuned/upgraded. However, looking at the output we are getting, I'm finding it difficult to see what needs to be tuned, as it looks to me like cassandra is handling the load within the mutation stage:

INFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 658) 119 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) Pool Name Active Pending Blocked
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) ReadStage 3 3 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) RequestResponseStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) ReadRepairStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) MutationStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) GossipStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) StreamStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher 1 5 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter 1 5 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) commitlog_archiver 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) AntiEntropySessions 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) CompactionManager 0 0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) MessagingService n/a 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) Cache Type Size Capacity KeysToSave Provider
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) KeyCache 2184533 2184533 all
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) ColumnFamily Memtable ops,data
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116)
Re: Stream definition is lost after server restart
Sorry, forgot to mention that I'm using Cassandra 1.1.3.

--
Thank you..!
-
071-6372089
Ishan's info: www.ishans.info
My notes: www.siblog.ishans.info
Ishan's way: www.blog.ishans.info
-

On Mon, Sep 17, 2012 at 9:32 PM, Ishan Thilina <is...@ishans.info> wrote:

Hi all,

I am currently working on a project which uses Cassandra. I have a task running in my server which periodically looks at a certain set of pre-defined data (of the server) and writes it to Cassandra. The procedure for this is as follows:

1. I give a name and a version to the task.
2. I configure what data the task should monitor.
3. The task then checks whether a stream definition exists for the task, using the task name and its version.
4. If a definition does not exist, the task creates one (by looking at the types of data to be monitored).
5. Then (or if a stream definition already exists) the task writes the data to Cassandra.
6. The task repeats steps 3 to 5 forever (even after server restart).

Please note that there can be multiple tasks like this monitoring different sets of data. The problem occurs once the server has been used for a few days and several (around 100) stream definitions have been created: I have observed that after the server is restarted, a "stream definition does not exist" exception is thrown in step 3. I checked manually, and the stream definition actually exists. When a new server is used (with a clean Cassandra server), everything works fine for a few days. But most of the time, after a few days the same issue arises.

Has anyone experienced this?

--
Thank you..!
-
071-6372089
Ishan's info: www.ishans.info
My notes: www.siblog.ishans.info
Ishan's way: www.blog.ishans.info
-
HTimedOutException and cluster not working
Hello,

Some context on our environment: we have a cluster of 9 nodes with a few keyspaces. The client writes at consistency level ONE to a keyspace with a replication factor of 3. The Hector client is configured with all the nodes in the cluster specified; the intent is that on any write request two nodes can fail and the write still succeeds on one node.

However, in certain situations we see HTimedOutException logged while writing to the cluster. The Hector client then fails over to the next node in the cluster, but what we noticed is that the same HTimedOutException is logged for all the nodes. The result is that the cluster is not working as a whole.

Logically, we checked all the nodes in the cluster for load. Only node-3 seems to have a high pending MutationStage count when nodetool tpstats is run. The other nodes are fine, with 0 active and 0 pending for all stages.

./nodetool -h localhost tpstats
Pool Name              Active   Pending   Completed    Blocked   All time blocked
ReadStage              0        0         6983         0         0
RequestResponseStage   0        0         1252368951   0         0
MutationStage          16       2177067   879092633    0         0
ReadRepairStage        0        0         3648106      0         0
ReplicateOnWriteStage  0        0         33722610     0         0
GossipStage            0        0         20504608     0         0
AntiEntropyStage       0        0         1197         0         0
MigrationStage         0        0         89           0         0
MemtablePostFlusher    0        0         5659         0         0
StreamStage            0        0         296          0         0
FlushWriter            0        0         5616         0         1321
MiscStage              0        0         5964         0         0
AntiEntropySessions    0        0         88           0         0
InternalResponseStage  0        0         27           0         0
HintedHandoff          1        2         5976         0         0

Message type       Dropped
RANGE_SLICE        0
READ_REPAIR        0
BINARY             0
READ               178
MUTATION           17467
REQUEST_RESPONSE   0

We proceeded to check whether there was any compaction running on node-3 and found the following:

./nodetool -h localhost compactionstats
pending tasks: 196
compaction type   keyspace     column family   bytes compacted   bytes total   progress
Cleanup           MyKeyspace   MyCF            6946398685        10230720119   67.90%

Questions:
* With a replication factor of 3 in the keyspace and a client write consistency level of ONE, given the Hector client and cluster settings above, shouldn't it be possible in this scenario for a write to succeed on one of the nodes even though node-3 is too busy or failing for any reason?
* When the Hector client fails over to other nodes, basically all the nodes fail; why is this so?
* What factors increase the MutationStage active and pending values?

Thank you for any comments and insight.

Regards,
Jason
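For completeness, a sketch of pinning the Hector write consistency level to ONE on the client side (class names from Hector's API as I recall them; the keyspace name and the cluster variable are assumptions). With RF=3 and CL.ONE the coordinator only waits for one replica acknowledgement, but the coordinator itself can still time out if it is overloaded, which may be what is happening here:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    // Hypothetical: cluster is an existing me.prettyprint.hector.api.Cluster.
    ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
    ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);  // write succeeds once one replica acks
    ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
    Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster, ccl);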