Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?
Yes, the recommendation still applies.

Wide partitions have a huge impact on repair (over-streaming), compaction, and bootstrap.

On 10 May 2017 at 23:54, "Kant Kodali" wrote:

> Hi All,
>
> The Cassandra community has always recommended about 100MB per partition as a sweet spot. Does this limitation still exist, given that there is a B-tree implementation to identify rows inside a partition?
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/rows/BTreeRow.java
>
> Thanks!
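Since the recommendation stands, the usual mitigation (not stated in this thread, offered here as a hedged illustration) is to add a synthetic bucket to the partition key so no single partition grows without bound. A minimal sketch, assuming a hypothetical `user_id`-keyed table and a day-sized bucket granularity:

```python
from datetime import datetime, timezone

def bucketed_partition_key(user_id, ts, bucket_hours=24):
    """Compound partition key (user_id, bucket): rows for one user spread
    across many bounded partitions instead of one ever-growing wide one."""
    bucket = int(ts.timestamp()) // 3600 // bucket_hours
    return (user_id, bucket)

# Writes on either side of a UTC day boundary land in different partitions.
k1 = bucketed_partition_key("user42", datetime(2017, 5, 10, 23, 54, tzinfo=timezone.utc))
k2 = bucketed_partition_key("user42", datetime(2017, 5, 11, 1, 0, tzinfo=timezone.utc))
print(k1 != k2)  # True: the day rolled over, so the bucket changed
```

The trade-off is that reads spanning many buckets fan out into multiple partition queries.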
Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?
Hi DuyHai,

I am trying to see what we could do to get around this limitation:

1. Would https://issues.apache.org/jira/browse/CASSANDRA-7447 help at all?
2. Could we build Merkle trees for groups of rows within a partition, so that we stream only the groups whose hashes differ?
3. It would be interesting to see if we can spread a partition across nodes.

I am just trying to validate some ideas that could help get past this 100MB limitation, since we may not always fit into a time-series model.

Thanks!

On Thu, May 11, 2017 at 12:37 AM, DuyHai Doan wrote:

> Yes, the recommendation still applies.
>
> Wide partitions have a huge impact on repair (over-streaming), compaction, and bootstrap.
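Idea 2 above can be sketched quickly. This is a toy illustration (hypothetical helper names, not Cassandra code): hash fixed-size groups of rows within a partition, then diff the group hashes between two replicas, so only mismatched groups would need streaming:

```python
import hashlib

def group_hashes(rows, group_size=3):
    """Hash fixed-size groups of (clustering_key, value) rows in one partition."""
    hashes = {}
    for i in range(0, len(rows), group_size):
        group = rows[i:i + group_size]
        hashes[i // group_size] = hashlib.sha256(repr(group).encode()).hexdigest()
    return hashes

def differing_groups(replica_a, replica_b, group_size=3):
    """Group indices whose hashes differ -- only these would be streamed."""
    ha = group_hashes(replica_a, group_size)
    hb = group_hashes(replica_b, group_size)
    return sorted(k for k in ha.keys() | hb.keys() if ha.get(k) != hb.get(k))

a = [(i, f"v{i}") for i in range(9)]
b = list(a)
b[4] = (4, "stale")  # one divergent row in the second group
print(differing_groups(a, b))  # -> [1]
```

A real implementation would hash serialized cells rather than `repr`, and would need to handle groups that shift when rows are inserted or deleted.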
Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?
Oh, this looks like the one I am looking for: https://issues.apache.org/jira/browse/CASSANDRA-9754. Is this in Cassandra 3.10, or merged somewhere?

On Thu, May 11, 2017 at 1:13 AM, Kant Kodali wrote:

> Hi DuyHai,
>
> I am trying to see what we could do to get around this limitation:
>
> 1. Would https://issues.apache.org/jira/browse/CASSANDRA-7447 help at all?
> 2. Could we build Merkle trees for groups of rows within a partition, so that we stream only the groups whose hashes differ?
> 3. It would be interesting to see if we can spread a partition across nodes.
Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?
I'm almost done with a rebased trunk patch; I've hit a few snags, and I want nothing more than to finish this thing. The latest issue was due to range tombstones and the fact that the deletion time is stored in the index from 3.0 onwards. I hope to have everything pushed very shortly. Sorry for the delay, I'm doing my best... there are never enough hours in the day. :)

best,
kjellman

> On May 11, 2017, at 1:48 AM, Kant Kodali wrote:
>
> Oh, this looks like the one I am looking for: https://issues.apache.org/jira/browse/CASSANDRA-9754. Is this in Cassandra 3.10, or merged somewhere?
Dropped Mutation and Read messages.
Hi Experts,

Seeking your help on a production issue. We were running a write-intensive job on our 3-node Cassandra cluster, v2.1.7.

TPS on the nodes was high. The job ran for more than 2 days, and thereafter the load average on one of the nodes climbed very high, around 29.

The system log reports:

INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms

The job was stopped due to the heavy load, but still, 12 hours later, we see dropped MUTATION messages and sudden spikes in load average.

Are these hinted-handoff mutations? Can we stop them? Strangely, this behaviour is seen on only 2 nodes; node 1 does not show any load or any such activity.

Due to the heavy load and GC, there are intermittent gossip failures among the nodes. Can someone please help?

PS: The load job was stopped on the cluster. Everything ran fine for a few hours, and later the issue started again with dropped mutation messages.

Thanks and Regards,
Varun Saluja

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
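For what it's worth, drop counts like these can be tallied straight from system.log. A small sketch (a hypothetical helper, not part of Cassandra's tooling), fed the log lines quoted above:

```python
import re

# Sample MessagingService lines as they appear in system.log.
LOG = """\
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
"""

PATTERN = re.compile(r"(\d+) (\w+) messages dropped in last (\d+)ms")

def dropped_by_verb(log_text):
    """Sum dropped-message counts per verb across all matching log lines."""
    totals = {}
    for count, verb, _window in PATTERN.findall(log_text):
        totals[verb] = totals.get(verb, 0) + int(count)
    return totals

print(dropped_by_verb(LOG))  # -> {'MUTATION': 839, 'READ': 2, 'REQUEST_RESPONSE': 1}
```

Watching whether the MUTATION total keeps growing after the load job stops is one quick way to tell whether hints or compaction backlog are still generating pressure.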
Re: Soliciting volunteers for flaky dtests on trunk
I've taken CASSANDRA-13507 and CASSANDRA-13517.

-Jason

On Wed, May 10, 2017 at 9:45 PM, Lerh Chuan Low wrote:

> I'll try my hand on https://issues.apache.org/jira/browse/CASSANDRA-13182.
>
> On 11 May 2017 at 05:59, Blake Eggleston wrote:
>
>> I've taken CASSANDRA-13194, CASSANDRA-13506, CASSANDRA-13515, and CASSANDRA-13372 to start.
>>
>> On May 10, 2017 at 12:44:47 PM, Ariel Weisberg (ar...@weisberg.ws) wrote:
>>
>> Hi,
>>
>> The dev list murdered my rich-text-formatted email. Here it is reformatted as plain text.
>>
>> The unit tests are looking pretty reliable right now. There is a long tail of infrequently failing tests, but it's not bad, and almost all builds succeed in the current build environment. In CircleCI the unit tests seem a little less reliable, but still usable.
>>
>> The dtests, on the other hand, aren't producing clean builds yet. There is also a pretty diverse set of failing tests.
>>
>> I did a bit of triaging of the flaky dtests. I started by cataloging everything, but I found that the long tail of flaky dtests is very long indeed, so I narrowed focus to just the most frequently failing tests for now. See https://goo.gl/b96CdO
>>
>> I created a spreadsheet with some of the failing tests: links to JIRA, the last time each test was seen failing, and how many failures I found in Apache Jenkins across the 3 dtest builds. There are a lot of failures not listed; there would be 50+ entries if I cataloged each one.
>>
>> There are two hard-failing tests, but both are already moving along:
>>
>> CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta reviewing, last updated April 2017): dtest failure in topology_test.TestTopology.size_estimates_multidc_test
>> CASSANDRA-13113 (Ready to commit, assigned Alex Petrov, Sam T reviewing, last updated March 2017): test failure in auth_test.TestAuth.system_auth_ks_is_alterable_test
>>
>> I think the tests we should tackle first are on this sheet, in priority order: https://goo.gl/S3khv1
>>
>> Suite: bootstrap_test
>> Test: TestBootstrap.simultaneous_bootstrap_test
>> JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13506
>> Last failure: 5/5/2017
>> Counted failures: 45
>>
>> Suite: repair_test
>> Test: incremental_repair_test.TestIncRepair.compaction_test
>> JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13194
>> Last failure: 5/4/2017
>> Counted failures: 44
>>
>> Suite: sstableutil_test
>> Test: SSTableUtilTest.compaction_test
>> JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13182
>> Last failure: 5/4/2017
>> Counted failures: 35
>>
>> Suite: paging_test
>> Test: TestPagingWithDeletions.test_ttl_deletions
>> JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13507
>> Last failure: 4/25/2017
>> Counted failures: 31
>>
>> Suite: repair_test
>> Test: incremental_repair_test.TestIncRepair.multiple_repair_test
>> JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13515
>> Last failed: 5/4/2017
>> Counted failures: 18
>>
>> Suite: cqlsh_tests
>> Test: cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_*
>> JIRA: https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20text%20~%20%22CqlshCopyTest%22
>> Last failed: 5/8/2017
>> Counted failures: 23
>>
>> Suite: paxos_tests
>> Test: TestPaxos.contention_test_many_threads
>> JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13517
>> Last failed: 5/8/2017
>> Counted failures: 15
>>
>> Suite: repair_test
>> Test: TestRepair
>> JIRA: https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20text%20~%20%22dtest%20failure%20repair_test%22
>> Last failure: 5/4/2017
>> Comment: No single test fails a lot, but the number of failing tests is substantial.
>>
>> Suite: cqlsh_tests
>> Test: cqlsh_tests.CqlshSmokeTest.[test_insert | test_truncate | test_use_keyspace | test_create_keyspace]
>> JIRA: No JIRA yet
>> Last failed: 4/22/2017
>> Counted failures: 6
>>
>> If you have spare cycles, you can make a huge difference in test stability by picking off one of these.
>>
>> Regards,
>> Ariel
Re: Dropped Mutation and Read messages.
Do you have a lot of compactions going on? It sounds like you might have built up a huge backlog. Is your throttling configured properly?

> On 11 May 2017, at 18:50, varun saluja wrote:
>
> Hi Experts,
>
> Seeking your help on a production issue. We were running a write-intensive job on our 3-node Cassandra cluster, v2.1.7.
>
> TPS on the nodes was high. The job ran for more than 2 days, and thereafter the load average on one of the nodes climbed very high, around 29.
Re: Dropped Mutation and Read messages.
What does nodetool compactionstats show?

I meant compaction throttling: nodetool getcompactionthroughput

> On 11 May 2017, at 19:41, varun saluja wrote:
>
> Hi Oskar,
>
> Thanks for the response.
>
> Yes, I can see a lot of compaction threads. We are loading around 400GB of data per node on a 3-node Cassandra cluster. Throttling was set to write around 7k TPS per node. The job ran fine for 2 days, and then we started getting mutation drops, longer GCs, and very high load on the system.
>
> The system log reports:
> Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) off-heap
>
> The job was stopped 12 hours ago, but these failures can still be seen. Can you please let me know how I should proceed? If possible, please suggest some parameters for write-intensive jobs.
>
> Regards,
> Varun Saluja
Re: Dropped Mutation and Read messages.
That seems way too low. Depending on what type of disk you have, it should be closer to 100-200 MB/s. That's probably causing your problems. It would still take a while for you to compact all your data, though.

Sent from my iPhone

> On 11 May 2017, at 19:50, varun saluja wrote:
>
> nodetool getcompactionthroughput
>
> ./nodetool getcompactionthroughput
> Current compaction throughput: 16 MB/s
>
> Regards,
> Varun Saluja
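As a back-of-the-envelope check (assuming the 16 MB/s throttle is the bottleneck and that "MB" here means MiB), dividing the backlog from the compactionstats output quoted in this thread by the throttle reproduces nodetool's ~26h32m estimate, and shows what a higher cap would buy:

```python
# Total and completed byte counts from the quoted `nodetool compactionstats` output.
totals = [837522028005, 4722068388, 753221762663, 18605501268]
completed = [5762711108, 101477894, 1511866634, 3664734135]

remaining_bytes = sum(totals) - sum(completed)

def eta_hours(throughput_mib_per_s):
    """Hours to drain the backlog at a given compaction throttle (MiB/s)."""
    return remaining_bytes / (throughput_mib_per_s * 1024 * 1024) / 3600

print(f"at 16 MiB/s:  {eta_hours(16):.1f} h")   # matches the reported ~26h32m
print(f"at 128 MiB/s: {eta_hours(128):.1f} h")  # only if the disks can keep up
```

128 MiB/s is an illustrative value for the raised throttle, not a figure from this thread; the sketch also ignores that compaction output gets re-compacted by later levels, so real drain time can be longer.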
Re: Dropped Mutation and Read messages.
Hi,

PFB the results for the same. The numbers are scary here.

[root@WA-CASSDB2 bin]# ./nodetool compactionstats
pending tasks: 137
   compaction type        keyspace                table    completed         total   unit   progress
        Compaction          system                hints   5762711108  837522028005  bytes      0.69%
        Compaction  walletkeyspace  user_txn_history_v2    101477894    4722068388  bytes      2.15%
        Compaction  walletkeyspace  user_txn_history_v2   1511866634  753221762663  bytes      0.20%
        Compaction  walletkeyspace  user_txn_history_v2   3664734135   18605501268  bytes     19.70%
Active compaction remaining time : 26h32m28s

On 11 May 2017 at 23:15, Oskar Kjellin wrote:

> What does nodetool compactionstats show?
>
> I meant compaction throttling: nodetool getcompactionthroughput
Re: Dropped Mutation and Read messages.
nodetool getcompactionthroughput

./nodetool getcompactionthroughput
Current compaction throughput: 16 MB/s

Regards,
Varun Saluja

On 11 May 2017 at 23:18, varun saluja wrote:

> Hi,
>
> PFB the results for the same. The numbers are scary here.
Re: Dropped Mutation and Read messages.
Hi Oskar,

Thanks for the response.

Yes, I can see a lot of compaction threads. We are loading around 400GB of data per node on a 3-node Cassandra cluster. Throttling was set to write around 7k TPS per node. The job ran fine for 2 days, and then we started getting mutation drops, longer GCs, and very high load on the system.

The system log reports:
Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) off-heap

The job was stopped 12 hours ago, but these failures can still be seen. Can you please let me know how I should proceed? If possible, please suggest some parameters for write-intensive jobs.

Regards,
Varun Saluja

On 11 May 2017 at 23:01, Oskar Kjellin wrote:

> Do you have a lot of compactions going on? It sounds like you might have built up a huge backlog. Is your throttling configured properly?
Re: Dropped Mutation and Read messages.
This discussion should be on the C* user mailing list. Thanks!

best,
kjellman

> On May 11, 2017, at 10:53 AM, Oskar Kjellin wrote:
>
> That seems way too low. Depending on what type of disk you have, it should
> be closer to 100-200 MB/s. That's probably causing your problems. It would
> still take a while for you to compact all your data, though.
>
> Sent from my iPhone
>
>> On 11 May 2017, at 19:50, varun saluja wrote:
>>
>> nodetool getcompactionthroughput
>>
>> ./nodetool getcompactionthroughput
>> Current compaction throughput: 16 MB/s
>>
>> Regards,
>> Varun Saluja
>>
>>> On 11 May 2017 at 23:18, varun saluja wrote:
>>>
>>> Hi,
>>>
>>> PFB the results for same. The numbers are scary here.
>>>
>>> [root@WA-CASSDB2 bin]# ./nodetool compactionstats
>>> pending tasks: 137
>>>    compaction type        keyspace                  table    completed          total   unit   progress
>>>         Compaction          system                  hints   5762711108   837522028005  bytes      0.69%
>>>         Compaction  walletkeyspace    user_txn_history_v2    101477894     4722068388  bytes      2.15%
>>>         Compaction  walletkeyspace    user_txn_history_v2   1511866634   753221762663  bytes      0.20%
>>>         Compaction  walletkeyspace    user_txn_history_v2   3664734135    18605501268  bytes     19.70%
>>> Active compaction remaining time : 26h32m28s
>>>
>>>> On 11 May 2017 at 23:15, Oskar Kjellin wrote:
>>>>
>>>> What does nodetool compactionstats show?
>>>>
>>>> I meant compaction throttling: nodetool getcompactionthroughput
>>>>
>>>>> On 11 May 2017, at 19:41, varun saluja wrote:
>>>>>
>>>>> Hi Oskar,
>>>>>
>>>>> Thanks for the response.
>>>>>
>>>>> Yes, I could see a lot of compaction threads. We are loading around
>>>>> 400 GB of data per node on a 3-node Cassandra cluster. Throttling was
>>>>> set to write around 7k TPS per node. The job ran fine for 2 days, and
>>>>> then we started getting mutation drops, longer GCs, and very high load
>>>>> on the system.
>>>>>
>>>>> The system log reports:
>>>>> Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) off-heap
>>>>>
>>>>> The job was stopped 12 hours back, but these failures can still be
>>>>> seen. Can you please let me know how I should proceed? If possible,
>>>>> please suggest some parameters for write-intensive jobs.
>>>>>
>>>>> Regards,
>>>>> Varun Saluja
>>>>>
>>>>>> On 11 May 2017 at 23:01, Oskar Kjellin wrote:
>>>>>>
>>>>>> Do you have a lot of compactions going on? It sounds like you might've
>>>>>> built up a huge backlog. Is your throttling configured properly?
>>>>>>
>>>>>>> On 11 May 2017, at 18:50, varun saluja wrote:
>>>>>>>
>>>>>>> Hi Experts,
>>>>>>>
>>>>>>> Seeking your help on a production issue. We were running a
>>>>>>> write-intensive job on our 3-node Cassandra cluster, v2.1.7.
>>>>>>>
>>>>>>> TPS on the nodes was high. The job ran for more than 2 days, and
>>>>>>> thereafter the load average on one of the nodes increased to a very
>>>>>>> high number, around 29.
>>>>>>>
>>>>>>> The system log reports:
>>>>>>>
>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
>>>>>>>
>>>>>>> The job was stopped due to the heavy load, but still, 12 hours later,
>>>>>>> we can see mutation-drop messages and a sudden increase in load
>>>>>>> average.
>>>>>>>
>>>>>>> Are these hinted-handoff mutations? Can we stop them? Strangely, this
>>>>>>> behaviour is seen only on 2 nodes; node 1 does not show any such load
>>>>>>> or activity.
>>>>>>>
>>>>>>> Due to the heavy load and GC, there are intermittent gossip failures
>>>>>>> among the nodes. Can someone please help?
>>>>>>>
>>>>>>> PS: The load job was stopped on the cluster. Everything ran fine for
>>>>>>> a few hours, and later the issue started again, with mutation message
>>>>>>> drops.
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Varun Saluja
>>>>>>>
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
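The compactionstats figures quoted above can be sanity-checked with a quick back-of-the-envelope calculation: the remaining bytes across the four listed compactions, divided by the throttled 16 MB/s, roughly reproduce the remaining time nodetool reported. This is a standalone sketch (not part of Cassandra); the byte counts are copied verbatim from the output in this thread.

```python
# Back-of-the-envelope estimate: remaining compaction work divided by the
# throttled throughput. Byte counts are taken from the compactionstats
# output quoted above; 16 MB/s is the reported throughput cap.
MB = 1024 * 1024

compactions = [
    # (completed bytes, total bytes)
    (5_762_711_108, 837_522_028_005),   # system.hints
    (101_477_894, 4_722_068_388),       # walletkeyspace.user_txn_history_v2
    (1_511_866_634, 753_221_762_663),   # walletkeyspace.user_txn_history_v2
    (3_664_734_135, 18_605_501_268),    # walletkeyspace.user_txn_history_v2
]

remaining_bytes = sum(total - done for done, total in compactions)
hours_left = remaining_bytes / (16 * MB) / 3600

print(f"remaining: {remaining_bytes / 2**40:.2f} TiB")
print(f"time at 16 MB/s: {hours_left:.1f} h")  # ~26.5 h, matching nodetool's 26h32m28s
```

Note this is a lower bound: the 137 pending tasks and any new writes add further work, which is why the cap (typically adjusted with `nodetool setcompactionthroughput`) matters so much here.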
Re: Dropped Mutation and Read messages.
Indeed, sorry. Subscribed to both, so I missed which one this was.

Sent from my iPhone

> On 11 May 2017, at 19:56, Michael Kjellman wrote:
>
> This discussion should be on the C* user mailing list. Thanks!
>
> best,
> kjellman
>
> [...]
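For anyone triaging similar symptoms, the dropped-message log lines quoted in this thread are easy to tally by verb. Below is a small standalone sketch (not part of Cassandra; the regex and the sample lines are illustrative, with the lines copied from the log excerpt above):

```python
# Tally "N <VERB> messages dropped" log lines by verb (MUTATION, READ, ...).
import re
from collections import Counter

LOG_LINES = [
    "INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms",
    "INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms",
    "INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms",
]

# Matches the count and verb in MessagingService's drop report.
PATTERN = re.compile(r"- (\d+) (\w+) messages dropped")

drops = Counter()
for line in LOG_LINES:
    m = PATTERN.search(line)
    if m:
        drops[m.group(2)] += int(m.group(1))

print(drops)  # Counter({'MUTATION': 839, 'READ': 2, 'REQUEST_RESPONSE': 1})
```

A heavy skew toward MUTATION drops, as here, points at a backed-up write path, which is consistent with the compaction-backlog diagnosis earlier in the thread.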
Integrating vendor-specific code and developing plugins
Hi all,

In this JIRA ticket, https://issues.apache.org/jira/browse/CASSANDRA-13486, we proposed integrating our code to support a fast flash+FPGA card (called CAPI Flash) that is only available on the ppc architecture. Although we will keep discussing the topics specific to the patch (e.g. documentation, license, code quality) in the JIRA, we would like to start a more general discussion on this dev list about how to (and how not to) merge architecture-specific (or vendor-specific) changes.

I think in the end the problem boils down to how to test the architecture-specific code. The original contributors of the architecture-specific code can keep "supporting" it, in the sense that when a problem arises they can fix it and send a patch, but the committers cannot test it anyway. Are there any other factors we must consider?

Also, in this particular case, it is relatively easy to turn the code change into a plugin, because it extended the already-pluggable RowCache. I feel Cassandra has not promoted plugins as much as other pluggable software has, such as Eclipse, the Apache HTTP Server, fluentd, etc. For example, those projects keep lists of plugins on their web pages. I think that if the community wants to encourage developers to maintain vendor-specific code as plugins outside of the main source tree, a deeper commitment to the plugin ecosystem would be appreciated.

What do you think?

Thanks,
Rei Odaira
Re: Integrating vendor-specific code and developing plugins
Hey all,

I'm on board with what Rei is saying. I think we should be open to, and encourage, other platforms/architectures for integration. Of course, it will come down to specific maintainers/committers to do the testing and verification on non-typical platforms. Hopefully those maintainers will also contribute to other parts of the code base, so I see this as another way to bring more folks into the project.

WRT extensibility, it just requires someone to do the work of making reasonable abstraction points - and documenting them ;). The interesting question comes down to how to host/ship any pluggable dependencies. Much like what we had with JNA before it was relicensed, we'll probably ship some things in-tree, and some things users will have to fetch and deploy themselves; it's a case-by-case basis.

Thanks,
-Jason

On Thu, May 11, 2017 at 2:59 PM, 大平怜 wrote:
> [...]