[jira] [Updated] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Herron updated CASSANDRA-5025:

    Attachment: 5025-v5.txt

(Following up on IRC discussion)
* My patch 3 incorrectly hardcoded {{Schema.emptyVersion}} for the announcement in SS.joinTokenRing. For an actual bootstrap scenario, the schema version should be {{Schema.emptyVersion}} anyway.
* Since {{Schema.updateVersion}} actually reads rows, I wondered whether its result will be equivalent to {{Schema.emptyVersion}} (perhaps the schema tables themselves are already represented by that point in time?). Brandon said that he would check this.
* As I had asked in a previous comment on this JIRA, and as Brandon also noticed, SS.joinTokenRing had been calling {{Schema.updateVersionAndAnnounce}} and {{Schema.passiveAnnounce}} in quick succession. Brandon said the {{passiveAnnounce}} call should be removed.

I'm attaching patch 5 with these changes:
* Reverted my hardcoded {{Schema.emptyVersion}} in SS.joinTokenRing (back to the original {{Schema.updateVersionAndAnnounce}}).
* Removed the apparently redundant call to {{Schema.passiveAnnounce}}.

Brandon, could you please confirm whether it is safe to assume that {{Schema.updateVersionAndAnnounce}} would emit {{Schema.emptyVersion}} in a bootstrap scenario?

Schema push/pull race
---------------------

                 Key: CASSANDRA-5025
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5025
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.1.0
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
            Priority: Minor
             Fix For: 1.1.8
         Attachments: 5025.txt, 5025-v2.txt, 5025-v3.txt, 5025-v4.txt, 5025-v5.txt

When a schema change is made, the coordinator pushes the delta to the other nodes in the cluster. This is more efficient than sending the entire schema. But the coordinator also announces the new schema version, so the other nodes' reception of the new version races with their processing of the delta, and seeing the new version usually wins the race. So the other nodes also issue a pull to the coordinator for the entire schema. Thus, schema changes tend to become O(n) in the number of KS and CF present.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
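The race in the issue description above can be sketched with a toy Python model. The class and method names here are illustrative stand-ins, not Cassandra's actual API:

```python
# Toy model of the schema push/pull race: a coordinator both pushes the
# schema delta and announces the new version. If the announcement arrives
# before the delta is applied, the receiving node requests the ENTIRE schema.

class Node:
    def __init__(self, version):
        self.version = version
        self.full_pulls = 0  # count of redundant full-schema pulls

    def receive_delta(self, new_version):
        # Applying the pushed delta brings us to the coordinator's version.
        self.version = new_version

    def on_version_announce(self, announced_version):
        # Version mismatch means the delta has not landed yet, so the node
        # pulls the whole schema: O(total KS/CF) work per pull.
        if announced_version != self.version:
            self.full_pulls += 1

# Delta wins the race: no redundant pull.
a = Node("v1")
a.receive_delta("v2")
a.on_version_announce("v2")

# Announcement wins the race: one redundant full-schema pull.
b = Node("v1")
b.on_version_announce("v2")
b.receive_delta("v2")

print(a.full_pulls, b.full_pulls)  # 0 1
```

Either ordering converges on the same schema version; the difference is only the redundant full pull, which is what makes bursts of migrations expensive.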
[jira] [Commented] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527694#comment-13527694 ]

Chris Herron commented on CASSANDRA-5025:

Thanks [~brandon.williams], [~xedin].
[jira] [Commented] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526467#comment-13526467 ]

Chris Herron commented on CASSANDRA-5025:

Could StorageService.joinTokenRing wait max(RING_DELAY, 1min) (the 1 min being the delay in MigrationManager.maybeScheduleSchemaPull)? Or could MigrationManager.maybeScheduleSchemaPull use some multiple of RING_DELAY?

Related: is it correct that StorageService.joinTokenRing calls Schema.instance.updateVersionAndAnnounce and MigrationManager.passiveAnnounce(Schema.instance.getVersion()) in quick succession?
[jira] [Commented] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526533#comment-13526533 ]

Chris Herron commented on CASSANDRA-5025:

From discussion on #cassandra-dev with [~brandon.williams], StorageService.joinTokenRing could use Schema.emptyVersion as the schema UUID in order to allow the maybeScheduleSchemaPull delay to be skipped. Patch to follow...
[jira] [Updated] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Herron updated CASSANDRA-5025:

    Attachment: 5025-v3.txt

Attached patch 3, proposing the use of Schema.emptyVersion to differentiate StorageService.joinTokenRing from other scenarios, so that the migration delay can be skipped for bootstrapping.
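The sentinel idea above can be sketched as a toy Python rule. The constant names and the 60-second value are illustrative stand-ins for Schema.emptyVersion and the MigrationManager delay, not Cassandra's actual code:

```python
# Toy sketch: skip the schema-pull delay for a bootstrapping node whose
# local schema version is still the empty sentinel. Names are illustrative.

EMPTY_VERSION = "empty-version-sentinel"  # stand-in for Schema.emptyVersion
MIGRATION_DELAY_SECONDS = 60              # stand-in for the pull delay

def schema_pull_delay(local_version):
    """Return how long to wait before pulling the full schema from a peer."""
    if local_version == EMPTY_VERSION:
        # Bootstrapping node: nothing local to race against, pull immediately.
        return 0
    # Established node: give the pushed delta a chance to arrive first.
    return MIGRATION_DELAY_SECONDS

print(schema_pull_delay(EMPTY_VERSION))   # 0
print(schema_pull_delay("some-uuid"))     # 60
```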
[jira] [Commented] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13512078#comment-13512078 ]

Chris Herron commented on CASSANDRA-5025:

[~jbellis]: patch 5025-v2.txt works better. For the same test, after 60s, the CF creation time goes from sub-second to an average of 5 seconds. Delayed rectifySchema work will still interfere with coincident schema migrations, but I think this is the right compromise. Thank you!

Minor: the import for {{Callable}} was dropped, but it is still referenced at line 229.

[~xedin]: This test was not endorsing a high rate of CF creation for real-world use; the goal was to investigate if/why CF creation time was {{O(N)}}.
[jira] [Commented] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510585#comment-13510585 ]

Chris Herron commented on CASSANDRA-5025:

Clarifying for anyone else who encounters this issue:
* This problem was introduced in CASSANDRA-3931.
* For use cases that involve creation/update/deletion of multiple keyspaces or column families, the symptom will be increasingly slow schema migrations as the KS/CF population grows. Depending on client RPC timeout config, schema change requests may fail.
* In a test environment running stock C* 1.1.7, for a test that creates new CFs in sequence, we see the following CF creation times:
** Empty cluster: sub-second
** 200+ CFs: 15s average
** 400+ CFs: 30s+, with eventual failure due to the 30s client-side (Hector) RPC timeout.
* In the same test environment running 1.1.7 patched with 5025.txt:
** For the first 60s of the test, CF creation times are sub-second.
** At 60s, the delayed rectifySchema migration calls kick in and creation times jump to 50s+ (including waits for schema agreement), with eventual failure due to the 30s client-side RPC timeout.
[jira] [Commented] (CASSANDRA-3931) gossipers notion of schema differs from reality as reported by the nodes in question
[ https://issues.apache.org/jira/browse/CASSANDRA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510589#comment-13510589 ]

Chris Herron commented on CASSANDRA-3931:

FYI, the fixes for this issue introduced issue CASSANDRA-5025.

gossipers notion of schema differs from reality as reported by the nodes in question
------------------------------------------------------------------------------------

                 Key: CASSANDRA-3931
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3931
             Project: Cassandra
          Issue Type: Bug
            Reporter: Peter Schuller
            Assignee: Brandon Williams
             Fix For: 1.1.0
         Attachments: 3931.txt, 3931-v2.txt

On a 1.1 cluster we happened to notice that {{nodetool gossipinfo | grep SCHEMA}} reported disagreement:
{code}
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:b0d7bab7-c13c-37d9-9adb-8ab8a5b7215d
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:bcdbd318-82df-3518-89e3-6b72227b3f66
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:bcdbd318-82df-3518-89e3-6b72227b3f66
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
{code}
However, the result of a thrift {{describe_ring}} on the cluster claims they all agree and that {{b0d7bab7-c13c-37d9-9adb-8ab8a5b7215d}} is the schema they have. The schemas seem to actually propagate; e.g. dropping a keyspace actually drops the keyspace.
[jira] [Commented] (CASSANDRA-5025) Schema push/pull race
[ https://issues.apache.org/jira/browse/CASSANDRA-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510087#comment-13510087 ]

Chris Herron commented on CASSANDRA-5025:

For patch 5025.txt: a single schema migration will still result in N (num nodes) gossips of the new schema version (as before). Through MigrationManager.onChange() -> rectifySchema(), those will each result in a delayed comparison of the value 'theirVersion', but that value is now one minute old. Further, if some new schema migration happens to be underway, the same effect of redundant repeat RowMutations will occur. Schema migrations tend to happen in bursts, so this patch seems like it might reduce the problem but not eliminate it.

Would it not be better to have DefsTable.mergeSchema call Schema.instance.updateVersion instead of Schema.instance.updateVersionAndAnnounce, and then deal with temporarily unavailable nodes by doing a MigrationManager.passiveAnnounce(version) if/when we see them come back online?
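The alternative floated in that comment can be modeled with a toy Python sketch. All names here (merge_schema standing in for DefsTable.mergeSchema, on_node_alive for a gossip up-notification) are illustrative assumptions, not Cassandra's actual API:

```python
# Toy model of the proposal: merging a schema updates the local version
# WITHOUT an immediate cluster-wide announce; the version is passively
# announced to a peer only when gossip marks it alive again.

class SchemaHolder:
    def __init__(self, version):
        self.version = version
        self.passive_announcements = []  # (peer, version) pairs sent out

    def merge_schema(self, new_version):
        # updateVersion rather than updateVersionAndAnnounce: peers that
        # received the pushed delta already agree, so no broadcast needed.
        self.version = new_version

    def on_node_alive(self, peer):
        # A node that was down missed the delta; passively announce the
        # current version so it can pull and reconcile.
        self.passive_announcements.append((peer, self.version))

s = SchemaHolder("v1")
s.merge_schema("v2")          # no announcement here
s.on_node_alive("10.0.0.2")   # returning peer gets the current version
print(s.passive_announcements)  # [('10.0.0.2', 'v2')]
```

The design choice this models: announcements become targeted at nodes that provably missed the delta, instead of a broadcast that races with delta processing on every node.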
[jira] [Commented] (CASSANDRA-4906) Avoid flushing other columnfamilies on truncate
[ https://issues.apache.org/jira/browse/CASSANDRA-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495524#comment-13495524 ]

Chris Herron commented on CASSANDRA-4906:

Would it be possible to backport this to Cassandra 1.1?

Avoid flushing other columnfamilies on truncate
-----------------------------------------------

                 Key: CASSANDRA-4906
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4906
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
            Priority: Minor
             Fix For: 1.2.0
         Attachments: 4906.txt, 4906-v2.txt

Currently truncate flushes *all* columnfamilies so it can get rid of the commitlog segments containing truncated data. Otherwise, the truncated data could be replayed on restart, since the replay position is contained in the sstables we're trying to delete.
[jira] [Commented] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482552#comment-13482552 ]

Chris Herron commented on CASSANDRA-4417:

bq. I'm really starting to think that CASSANDRA-4071 is likely the main cause for this and is very easy to reproduce in that case. The commit log we've discussed earlier can also trigger that error, but it's probably much harder to trigger.

In our case:
* We haven't made any topology changes.
* Our test drops and recreates the affected CFs. No nodes die during the test (w.r.t. unclean shutdown and the commit log).
* After previous load test runs under different configuration (see below), no nodes die, and we use nodetool drain before restarting with updated configs.

Note that in an earlier comment above I said:
bq. In investigating CASSANDRA-4687 we disabled key cache, repeated the load+upgradesstables test and these invalid counter shard warnings did not appear.

Given that we don't have a topology change, can you think of a scenario where a commitlog issue is still contributing?

invalid counter shard detected
------------------------------

                 Key: CASSANDRA-4417
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4417
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.1.1
         Environment: Amazon Linux
            Reporter: Senthilvel Rangaswamy

Seeing errors like these:

2012-07-06_07:00:27.22662 ERROR 07:00:27,226 invalid counter shard detected; (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 13) and (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 1) differ only in count; will pick highest to self-heal; this indicates a bug or corruption generated a bad counter shard

What does it mean?
[jira] [Commented] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481729#comment-13481729 ]

Chris Herron commented on CASSANDRA-4417:

bq. Quick question: do you always increment by the same value by any chance?

No, sorry, I just happened to pick that example. We have many other log entries where both values are higher and don't differ by 1.
[jira] [Updated] (CASSANDRA-4832) AssertionError: keys must not be empty
[ https://issues.apache.org/jira/browse/CASSANDRA-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Herron updated CASSANDRA-4832:

    Attachment: FlushWriterKeyAssertionBlock.txt

Came across this while investigating an apparent deadlock in schema migrations. If this assertion fails on the flushWriter executor, it blocks indefinitely, and anything upstream locking-wise gets stuck as well. This was on 1.1.6. Log output below, thread dump attached.

{code}
ERROR [FlushWriter:3] 2012-10-19 22:27:56,948 org.apache.cassandra.service.AbstractCassandraDaemon Exception in thread Thread[FlushWriter:3,5,main]
java.lang.AssertionError: Keys must not be empty
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
    at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
    at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{code}

AssertionError: keys must not be empty
--------------------------------------

                 Key: CASSANDRA-4832
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4832
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.1.6
         Environment: Debian 6.0.5
            Reporter: Tristan Seligmann
            Assignee: Tristan Seligmann
            Priority: Minor
              Labels: indexing
             Fix For: 1.1.7
         Attachments: FlushWriterKeyAssertionBlock.txt

I'm getting errors like this logged:

{code}
INFO 07:08:32,104 Compacting [SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hf-114-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hf-113-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hf-110-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hd-108-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hd-106-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hd-107-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hf-112-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hf-109-Data.db'), SSTableReader(path='/var/lib/cassandra/data/Fusion/quoteinfo/Fusion-quoteinfo.quoteinfo_search_value_idx-hf-111-Data.db')]
ERROR 07:08:32,108 Exception in thread Thread[CompactionExecutor:5,1,main]
java.lang.AssertionError: Keys must not be empty
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:154)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{code}

I'm not really sure when this started happening; they tend to be logged during a repair, but I can't reproduce the error 100% reliably.
[jira] [Commented] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480121#comment-13480121 ]

Chris Herron commented on CASSANDRA-4417:

Another observation since then: in previous runs with key cache disabled we were not seeing any errors. However, I've since found some invalid counter shard errors that occur during normal compaction.
{code}
ERROR [CompactionExecutor:6] 2012-10-19 15:43:50,920 org.apache.cassandra.db.context.CounterContext invalid counter shard detected; (15b843e0-ff7c-11e0--07f4b18563ff, 1, 1) and (15b843e0-ff7c-11e0--07f4b18563ff, 1, 2) differ only in count; will pick highest to self-heal; this indicates a bug or corruption generated a bad counter shard
{code}
So, to be clear, this particular scenario is:
* C* 1.1.6 with key cache disabled.
* A load test ran earlier against this same setup, but with no upgradesstables during that run; no errors under load during that test run.
* Later, some nightly jobs ran that read from Super CF counters and write to other CFs.
* Compaction activity occurs after the load test and nightly jobs complete. Invalid counter shard errors are seen for some CFs. Gleaning from the log output order, the affected CFs:
** *Did* have upgradesstables run on them under previous configurations (1.1.6, key cache on).
** Have not been written to at all for the purposes of the load test I've been mentioning.
** Have been read from for these nightly jobs.
[jira] [Commented] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479561#comment-13479561 ]

Chris Herron commented on CASSANDRA-4417:

bq. Also, during that test, is there anything involving streaming going on (a repair, a node bootstrapping/moving/decommissioning)?

There are definitely no repairs or node bootstrapping/moving/decommissioning happening during the test. I re-ran the test, and the JMX stats for StreamStage indicated zero tasks on all nodes after the test completed.
[jira] [Commented] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477943#comment-13477943 ]

Chris Herron commented on CASSANDRA-4417:

bq. Unless you've been able to reproduce on a brand new cluster where the commit log was set to batch from the beginning (in which case, if you have an easy way to reproduce, that would be interesting to know)

In our test the affected Super CF is completely deleted and recreated - so in that sense the commit log was set to batch from the beginning. Is that equivalent?

This does reproduce for every test run. Unfortunately, our test is non-trivial to share. It involves heavy writes and moderate reads to counters, while simultaneously running upgradesstables on all nodes against multiple CFs (including the affected one). Interestingly, the symptom appears even before compaction reaches the Super CF that is active during the test.
[jira] [Commented] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478505#comment-13478505 ]

Chris Herron commented on CASSANDRA-4417:

bq. Probably, how was it deleted/recreated. Did you drop and recreate?

Yes, dropped (the schema migration flavor) and recreated a CF of the same name.

bq. Perform the same test without the upgradesstables part (i.e. only the writes and reads). If so, does that change something?

Have already tested that scenario. Running this load test without the concurrent upgradesstables compaction activity, the problem does not exhibit.

bq. during that test, is there anything involving streaming going on (a repair, a node bootstrapping/moving/decommissioning)?

Not that I know of. I can test again and monitor for streaming activity to see.

By the way, as we've been testing in preparation for a 1.1.x upgrade, we were seeing symptoms of CASSANDRA-4571 and CASSANDRA-4687 as well as this issue on C* 1.1.6. In investigating CASSANDRA-4687 we disabled key cache, repeated the load+upgradesstables test, and these invalid counter shard warnings did not appear.
[jira] [Commented] (CASSANDRA-4571) Strange permament socket descriptors increasing leads to Too many open files
[ https://issues.apache.org/jira/browse/CASSANDRA-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477121#comment-13477121 ] Chris Herron commented on CASSANDRA-4571:

We are also seeing errors similar to those reported in CASSANDRA-4687. Could this be a side-effect of that problem? In {{SSTableSliceIterator}} as of commit {{e1b10590e84189b92af168e33a63c14c3ca1f5fa}}, if the constructor's key equality assertion fails, {{fileToClose}} does not get closed.

Strange permament socket descriptors increasing leads to Too many open files
Key: CASSANDRA-4571
URL: https://issues.apache.org/jira/browse/CASSANDRA-4571
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Environment: CentOS 5.8, Linux 2.6.18-308.13.1.el5 #1 SMP Tue Aug 21 17:10:18 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux. Java version 1.6.0_33, Java(TM) SE Runtime Environment (build 1.6.0_33-b03), Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
Reporter: Serg Shnerson
Assignee: Jonathan Ellis
Priority: Critical
Fix For: 1.1.5
Attachments: 4571.txt

On a two-node cluster we found a strange increase in socket descriptors. {{lsof -n | grep java}} shows many rows like:
java 8380 cassandra 113r unix 0x8101a374a080 938348482 socket
java 8380 cassandra 114r unix 0x8101a374a080 938348482 socket
java 8380 cassandra 115r unix 0x8101a374a080 938348482 socket
java 8380 cassandra 116r unix 0x8101a374a080 938348482 socket
java 8380 cassandra 117r unix 0x8101a374a080 938348482 socket
java 8380 cassandra 118r unix 0x8101a374a080 938348482 socket
java 8380 cassandra 119r unix 0x8101a374a080 938348482 socket
java 8380 cassandra 120r unix 0x8101a374a080 938348482 socket
The number of these rows increases constantly. After about 24 hours this leads to an error. We use the PHPCassa client. Load is not especially high (around ~50kb/s on write).
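The resource-leak shape described in the comment above (an assertion throwing in a constructor before the opened file handle can be closed) can be sketched as follows. This is an illustrative example only, not Cassandra's actual {{SSTableSliceIterator}} code; {{TrackingResource}}, {{openLeaky}}, and {{openSafe}} are hypothetical names.

```java
// Illustrative sketch of the leak pattern: a resource opened before a check
// leaks if that check throws, because no caller ever gets a chance to close it.
import java.io.Closeable;

public class GuardedOpen {
    /** Stand-in for an open file handle; records whether close() ran. */
    static final class TrackingResource implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    /** Leaky shape: if keysMatch is false, r is never closed. */
    static TrackingResource openLeaky(TrackingResource r, boolean keysMatch) {
        if (!keysMatch)
            throw new AssertionError("decorated keys differ"); // r leaks here
        return r;
    }

    /** Fixed shape: close the resource on any failure path, then rethrow. */
    static TrackingResource openSafe(TrackingResource r, boolean keysMatch) {
        try {
            if (!keysMatch)
                throw new AssertionError("decorated keys differ");
            return r;
        } catch (Throwable t) {
            // Release the handle before propagating the original failure.
            try { r.close(); } catch (Exception ignored) { }
            throw t;
        }
    }
}
```

A guard in the style of {{openSafe}} ensures the underlying file handle is released even when the key equality check throws, which is the behavior the comment suggests is missing.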
[jira] [Commented] (CASSANDRA-4687) Exception: DecoratedKey(xxx, yyy) != DecoratedKey(zzz, kkk)
[ https://issues.apache.org/jira/browse/CASSANDRA-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477136#comment-13477136 ] Chris Herron commented on CASSANDRA-4687:

Has anybody experienced Linux socket FD leakage alongside these errors? (See CASSANDRA-4571.) To check this, you can run:
{{watch -n 10 'sudo lsof -n | grep java | grep unix | wc -l'}}
This number should stay at 1. If you see growth towards your limits (/etc/security/limits.conf), that suggests CASSANDRA-4571 might be a side-effect of this problem.

Exception: DecoratedKey(xxx, yyy) != DecoratedKey(zzz, kkk)
Key: CASSANDRA-4687
URL: https://issues.apache.org/jira/browse/CASSANDRA-4687
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.5
Environment: CentOS 6.3 64-bit, Oracle JRE 1.6.0.33 64-bit, single node cluster
Reporter: Leonid Shalupov
Assignee: Pavel Yaskevich
Priority: Critical
Fix For: 1.1.7
Attachments: 4687-debugging.txt

Under heavy write load, Cassandra sometimes fails with an assertion error. git bisect leads to commit 295aedb278e7a495213241b66bc46d763fd4ce66. It works fine if the global key/row caches are disabled in code.
{quote}
java.lang.AssertionError: DecoratedKey(xxx, yyy) != DecoratedKey(zzz, kkk) in /var/lib/cassandra/data/...-he-1-Data.db
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:60)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:67)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:819)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1253)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{quote}
[jira] [Commented] (CASSANDRA-4571) Strange permament socket descriptors increasing leads to Too many open files
[ https://issues.apache.org/jira/browse/CASSANDRA-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477176#comment-13477176 ] Chris Herron commented on CASSANDRA-4571:

Yes, we are seeing the key equality AssertionErrors from two SSTable iterators: SSTableSliceIterator:60 and SSTableNamesIterator:72. We are also seeing the same EOF error reported by [~tjake] in CASSANDRA-4687:
{code}
java.io.IOError: java.io.EOFException: unable to seek to position 61291844 in /redacted/cassandra/data/test1/redacted/test1-redacted-hf-1-Data.db (59874704 bytes) in read-only mode
at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:69)
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:898)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:50)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:67)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: unable to seek to position 61291844 in /redacted/cassandra/data/test1/redacted/test1-redacted-hf-1-Data.db (59874704 bytes) in read-only mode
at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:253)
at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:64)
... 16 more
{code}
[jira] [Commented] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477556#comment-13477556 ] Chris Herron commented on CASSANDRA-4417:

We are seeing large volumes of this error on all nodes when running a load test while also running upgradesstables on multiple CFs on each node. After reading Sylvain's comments above, we tried running the same test with commitlog_sync: batch; we get a similar volume of the same errors. (Running a build from branch cassandra-1.1 at commit 4d2e5e73b127dc0b335176ddc1dec1f0244e7f6d, with Java 6u35 on Amazon Linux 2.6.35.)
[jira] [Commented] (CASSANDRA-4571) Strange permament socket descriptors increasing leads to Too many open files
[ https://issues.apache.org/jira/browse/CASSANDRA-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477576#comment-13477576 ] Chris Herron commented on CASSANDRA-4571:

Tested this patch: https://gist.github.com/2f10efd3922fab9a095e applied to a build from branch cassandra-1.1 at commit 4d2e5e73b127dc0b335176ddc1dec1f0244e7f6d. This definitely reduced the growth of socket FD handles, but there must be other scenarios like this in the codebase, because the count still grew beyond 2, which is where I have seen it at steady state under normal conditions. The AssertionErrors from CASSANDRA-4687 were so frequent that they were pegging disk I/O. When I ran the same test again with assertions disabled for the org.apache.cassandra.db.columniterator package, I saw many errors like those described in CASSANDRA-4417 (invalid counter shard detected). See my comments in that issue. Shouldn't CASSANDRA-4571 be re-opened?
[jira] [Commented] (CASSANDRA-4571) Strange permament socket descriptors increasing leads to Too many open files
[ https://issues.apache.org/jira/browse/CASSANDRA-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476344#comment-13476344 ] Chris Herron commented on CASSANDRA-4571:

For anybody else encountering this unbounded socket growth problem on 1.1.5: note that while upgrading to 1.6.0_35 seemed to help, a longer load test still reproduced the symptom. FWIW, upgradesstables ran for a period during this particular test; it is unclear whether the increased compaction activity contributed.
[jira] [Commented] (CASSANDRA-4571) Strange permament socket descriptors increasing leads to Too many open files
[ https://issues.apache.org/jira/browse/CASSANDRA-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476710#comment-13476710 ] Chris Herron commented on CASSANDRA-4571:

FYI, I was able to reproduce the symptom on Cassandra 1.1.6. @[~jbellis] Re: CASSANDRA-4740 and whether it relates to this:
* Haven't looked across all nodes for phantom connections yet.
* Have searched across all logs; found a single instance of "Timed out replaying hints".
* Mina mentioned that nodes running earlier kernels (2.6.39, 3.0, 3.1) haven't exhibited this. We are seeing this on Linux kernel 2.6.35 with Java 1.6.0_35.
[jira] [Commented] (CASSANDRA-3070) counter repair
[ https://issues.apache.org/jira/browse/CASSANDRA-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100744#comment-13100744 ] Chris Herron commented on CASSANDRA-3070:

I have seen something similar. I recently grew a 0.8.4 test cluster from N nodes to N*2 nodes. After running nodetool repair on each node, I found that some counters were out of sync (counter values would vary when reading from different hosts).

counter repair
Key: CASSANDRA-3070
URL: https://issues.apache.org/jira/browse/CASSANDRA-3070
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.8.4
Reporter: ivan
Assignee: Sylvain Lebresne
Attachments: counter_local_quroum_maybeschedulerepairs.txt, counter_local_quroum_maybeschedulerepairs_2.txt, counter_local_quroum_maybeschedulerepairs_3.txt

Hi! We have some counters out of sync, but repair doesn't sync the values. We tried nodetool repair. We use LOCAL_QUORUM for reads. A repair row mutation is sent to other nodes while reading a bad row, but the counters weren't repaired by the mutation. Output from two nodes was uploaded. (Some new debug messages were added.)