Re: Does the partition size limitation still exist in Cassandra 3.10, given there is a B-tree implementation?

2017-05-11 Thread DuyHai Doan
Yes, the recommendation still applies.

Wide partitions have a huge impact on repair (overstreaming), compaction, and bootstrap.
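
A quick way to see whether a table is already in that territory is its partition-size histogram; a sketch, with placeholder keyspace/table names:

# Reports partition-size percentiles and maximum, in bytes, for one table.
nodetool tablehistograms my_keyspace my_table
# On 2.x clusters the equivalent command is:
# nodetool cfhistograms my_keyspace my_table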

On 10 May 2017 at 23:54, "Kant Kodali"  wrote:

Hi All,

The Cassandra community has always recommended 100 MB per partition as a sweet spot. Does this limitation still exist, given that there is a B-tree implementation for locating rows inside a partition?

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/rows/BTreeRow.java

Thanks!


Re: Does the partition size limitation still exist in Cassandra 3.10, given there is a B-tree implementation?

2017-05-11 Thread Kant Kodali
Hi DuyHai,

I am trying to see what we can do to get past this limitation:

1. Would https://issues.apache.org/jira/browse/CASSANDRA-7447 help at all?
2. Can we have Merkle trees built for groups of rows in a partition, such
that we stream only the groups whose hashes differ?
3. It would be interesting to see if we could spread a partition across nodes.

I am just trying to validate some ideas that could potentially get us past
this 100 MB limitation, since our data may not always fit a time-series model.
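
For what it's worth, the usual workaround today is to split a large partition by hand, adding a bucket to the partition key so that no single physical partition outgrows the guideline. A rough sketch (the keyspace, table, and 16-way bucket scheme are all made up; the bucket would be computed client-side, e.g. hash(txn_id) % 16):

# Hypothetical bucketed schema: each (user_id, bucket) pair becomes its own
# physical partition, so one logical user spreads across 16 partitions.
cqlsh -e "
  CREATE TABLE IF NOT EXISTS my_ks.user_txn_by_bucket (
    user_id  text,
    bucket   int,
    txn_time timestamp,
    txn_id   uuid,
    amount   decimal,
    PRIMARY KEY ((user_id, bucket), txn_time, txn_id)
  ) WITH CLUSTERING ORDER BY (txn_time DESC, txn_id ASC);"

Reads then either target one bucket or fan out over all 16 and merge client-side.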

Thanks!



Re: Does the partition size limitation still exist in Cassandra 3.10, given there is a B-tree implementation?

2017-05-11 Thread Kant Kodali
Oh, this looks like the one I am looking for:
https://issues.apache.org/jira/browse/CASSANDRA-9754. Is this in Cassandra
3.10, or merged somewhere?



Re: Does the partition size limitation still exist in Cassandra 3.10, given there is a B-tree implementation?

2017-05-11 Thread Michael Kjellman
I'm almost done with a rebased trunk patch. I hit a few snags; I want nothing 
more than to finish this thing... The latest issue was due to range tombstones and 
the fact that the deletion time is stored in the index from 3.0 onwards. 
I hope to have everything pushed very shortly. Sorry for the delay, I'm doing 
my best... there are never enough hours in the day. :)

best,
kjellman 



Dropped Mutation and Read messages.

2017-05-11 Thread varun saluja
Hi Experts,

Seeking your help on a production issue. We were running a high write-intensive 
job on our 3-node Cassandra cluster, v2.1.7.

TPS on the nodes was high. The job ran for more than 2 days, and thereafter the 
load average on 1 of the nodes increased to a very high value, around 29.

System log reports:

INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms

The job was stopped due to the heavy load, but still, after 12 hours, we can 
see mutation-drop messages and sudden increases in load average.

Are these hinted-handoff mutations? Can we stop them?
Strangely, this behaviour is seen on only 2 nodes. Node 1 does not show any load 
or any such activity.
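
In case it helps anyone with the same symptoms: whether hint replay is responsible can be checked, and if necessary stopped, with standard nodetool commands; a sketch (discarding hints sacrifices those writes, so a repair should follow):

# The Dropped section and the HintedHandoff / MutationStage pools show
# whether mutations are backing up and where.
nodetool tpstats

# If hint replay is the culprit: pause delivery, and optionally discard
# the accumulated hints (run a repair afterwards to restore consistency).
nodetool disablehandoff
nodetool truncatehints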

Due to heavy load and GC, there are intermittent gossip failures among the nodes. 
Can someone please help?

PS: The load job was stopped on the cluster. Everything ran fine for a few hours, and 
later the issue started again with mutation message drops.

Thanks and Regards,
Varun Saluja




Re: Soliciting volunteers for flaky dtests on trunk

2017-05-11 Thread Jason Brown
I've taken
CASSANDRA-13507
CASSANDRA-13517

-Jason


On Wed, May 10, 2017 at 9:45 PM, Lerh Chuan Low 
wrote:

> I'll try my hand on https://issues.apache.org/jira/browse/CASSANDRA-13182.
>
> On 11 May 2017 at 05:59, Blake Eggleston  wrote:
>
> > I've taken CASSANDRA-13194, CASSANDRA-13506, CASSANDRA-13515,
> > and CASSANDRA-13372 to start
> >
> > On May 10, 2017 at 12:44:47 PM, Ariel Weisberg (ar...@weisberg.ws)
> wrote:
> >
> > Hi,
> >
> > The dev list murdered my rich text formatted email. Here it is
> > reformatted as plain text.
> >
> > The unit tests are looking pretty reliable right now. There is a long
> > tail of infrequently failing tests but it's not bad and almost all
> > builds succeed in the current build environment. In CircleCI it seems
> > like unit tests might be a little less reliable, but still usable.
> >
> > The dtests on the other hand aren't producing clean builds yet. There
> > is also a pretty diverse set of failing tests.
> >
> > I did a bit of triaging of the flakey dtests. I started by cataloging
> > everything, but what I found is that the long tail of flakey dtests is
> > very long indeed, so I narrowed the focus to just the top frequently
> > failing tests for now. See https://goo.gl/b96CdO
> >
> > I created a spreadsheet with some of the failing tests: links to JIRA,
> > the last time each test was seen failing, and how many failures I found
> > in Apache Jenkins across the 3 dtest builds. There are a lot of failures
> > not listed; there would be 50+ entries if I cataloged each one.
> >
> > There are two hard failing tests, but both are already moving along:
> > CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta
> > reviewing, last updated April 2017) dtest failure in
> > topology_test.TestTopology.size_estimates_multidc_test
> > CASSANDRA-13113 (Ready to commit, assigned Alex Petrov, Sam T Reviewing,
> > last updated March 2017) test failure in
> > auth_test.TestAuth.system_auth_ks_is_alterable_test
> >
> > I think the tests we should tackle first are on this sheet in priority
> > order https://goo.gl/S3khv1
> >
> > Suite: bootstrap_test
> > Test: TestBootstrap.simultaneous_bootstrap_test
> > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13506
> > Last failure: 5/5/2017
> > Counted failures: 45
> >
> > Suite: repair_test
> > Test: incremental_repair_test.TestIncRepair.compaction_test
> > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13194
> > Last failure: 5/4/2017
> > Counted failures: 44
> >
> > Suite: sstableutil_test
> > Test: SSTableUtilTest.compaction_test
> > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13182
> > Last failure: 5/4/2017
> > Counted failures: 35
> >
> > Suite: paging_test
> > Test: TestPagingWithDeletions.test_ttl_deletions
> > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13507
> > Last failure: 4/25/2017
> > Counted failures: 31
> >
> > Suite: repair_test
> > Test: incremental_repair_test.TestIncRepair.multiple_repair_test
> > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13515
> > Last failed: 5/4/2017
> > Counted failures: 18
> >
> > Suite: cqlsh_tests
> > Test: cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_*
> > JIRA:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20text%20~%20%22CqlshCopyTest%22
> > Last failed: 5/8/2017
> > Counted failures: 23
> >
> > Suite: paxos_tests
> > Test: TestPaxos.contention_test_many_threads
> > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13517
> > Last failed: 5/8/2017
> > Counted failures: 15
> >
> > Suite: repair_test
> > Test: TestRepair
> > JIRA:
> > https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20text%20~%20%22dtest%20failure%20repair_test%22
> > Last failure: 5/4/2017
> > Comment: No one test fails a lot but the number of failing tests is
> > substantial
> >
> > Suite: cqlsh_tests
> > Test: cqlsh_tests.CqlshSmokeTest.[test_insert | test_truncate |
> > test_use_keyspace | test_create_keyspace]
> > JIRA: No JIRA yet
> > Last failed: 4/22/2017
> > count: 6
> >
> > If you have spare cycles you can make a huge difference in test
> > stability by picking off one of these.
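> >
> > Reproducing one of these locally is mostly a matter of pointing the dtest
> > harness at a built tree; a sketch, assuming checkouts of cassandra and
> > cassandra-dtest plus a working ccm (paths and loop count are illustrative):
> >
> > # Run a single flaky dtest repeatedly to estimate its failure rate.
> > export CASSANDRA_DIR=~/src/cassandra    # a built trunk checkout
> > cd ~/src/cassandra-dtest
> > for i in $(seq 1 20); do
> >   nosetests -s bootstrap_test.py:TestBootstrap.simultaneous_bootstrap_test
> > done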
> >
> > Regards,
> > Ariel
> >

Re: Dropped Mutation and Read messages.

2017-05-11 Thread Oskar Kjellin
Do you have a lot of compactions going on? It sounds like you might've built up 
a huge backlog. Is your throttling configured properly?




Re: Dropped Mutation and Read messages.

2017-05-11 Thread Oskar Kjellin
What does nodetool compactionstats show?

I meant compaction throttling: nodetool getcompactionthroughput




Re: Dropped Mutation and Read messages.

2017-05-11 Thread Oskar Kjellin
That seems way too low. Depending on what type of disk you have, it should be 
closer to 100-200 MB/s.
That's probably what is causing your problems. It would still take a while for 
you to compact all your data, though.
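
Raising the cap takes effect immediately and needs no restart; a quick sketch, with an illustrative value your disks would need to be able to sustain:

# Value is in MB/s; 0 removes the throttle entirely.
nodetool getcompactionthroughput
nodetool setcompactionthroughput 128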

Sent from my iPhone

> 


Re: Dropped Mutation and Read messages.

2017-05-11 Thread varun saluja
Hi,

Please find below the results. The numbers are scary here.

[root@WA-CASSDB2 bin]# ./nodetool compactionstats
pending tasks: 137
   compaction type         keyspace                 table      completed          total   unit   progress
        Compaction           system                 hints     5762711108   837522028005   bytes      0.69%
        Compaction   walletkeyspace   user_txn_history_v2      101477894     4722068388   bytes      2.15%
        Compaction   walletkeyspace   user_txn_history_v2     1511866634   753221762663   bytes      0.20%
        Compaction   walletkeyspace   user_txn_history_v2     3664734135    18605501268   bytes     19.70%
Active compaction remaining time : 26h32m28s
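
Incidentally, the remaining-time estimate is consistent with the 16 MB/s throttle reported in the next message: summing (total - completed) over the four rows gives about 1.6 TB still to compact, and at 16 MiB/s that is almost exactly 26h32m. A back-of-the-envelope check:

# (837522028005-5762711108) + (4722068388-101477894)
#   + (753221762663-1511866634) + (18605501268-3664734135)
#   = 1,603,030,570,553 bytes; at 16 MiB/s that is ~95,548 s ~= 26h 32m.
echo $(( 1603030570553 / (16 * 1024 * 1024) )) seconds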





Re: Dropped Mutation and Read messages.

2017-05-11 Thread varun saluja
nodetool getcompactionthroughput:

./nodetool getcompactionthroughput
Current compaction throughput: 16 MB/s

Regards,
Varun Saluja



Re: Dropped Mutation and Read messages.

2017-05-11 Thread varun saluja
Hi Oskar,

Thanks for response.

Yes, I can see a lot of compaction threads. We are actually loading
around 400 GB of data per node on a 3-node Cassandra cluster.
Throttling was set to write around 7k TPS per node. The job ran fine for 2 days,
and then we started getting mutation drops, longer GCs, and very high load
on the system.

System log reports:
Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) off-heap

The job was stopped 12 hours back, but these failures can still be seen.
Can you please let me know how I should proceed further? If possible, please
suggest some parameters for high write-intensive jobs.


Regards,
Varun Saluja




Re: Dropped Mutation and Read messages.

2017-05-11 Thread Michael Kjellman
This discussion should be on the C* user mailing list. Thanks!

best,
kjellman




Re: Dropped Mutation and Read messages.

2017-05-11 Thread Oskar Kjellin
Indeed, sorry. I am subscribed to both, so I missed which one this was.

Sent from my iPhone




Integrating vendor-specific code and developing plugins

2017-05-11 Thread 大平怜
Hi all,

In this JIRA ticket, https://issues.apache.org/jira/browse/CASSANDRA-13486,
we proposed integrating our code to support a fast flash+FPGA card (called
CAPI Flash) that is only available on the ppc architecture. Although we will
keep discussing the topics specific to the patch (e.g. documentation, license,
code quality) in the JIRA, we would like to start a more general discussion
on this dev list about how to (and how not to) merge architecture-specific
(or vendor-specific) changes.

I think in the end the problem boils down to how to test architecture-specific
code. The original contributors of the architecture-specific code can keep
"supporting" it, in the sense that when a problem arises they can fix it and
send a patch, but the committers cannot test it themselves. Are there any
other factors we must consider?

Also, in this particular case, it is relatively easy to turn the code change
into a plugin, because it extends the already-pluggable RowCache. I feel
Cassandra has not promoted plugins as much as other pluggable software has,
such as Eclipse, the Apache HTTP Server, and fluentd; they, for example,
maintain lists of plugins on their web pages. I think that if the community
wants to encourage developers to maintain vendor-specific code as plugins
outside of the main source tree, a deeper commitment to the plugin ecosystem
would be appreciated.
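
As a concrete data point for that plugin path: an out-of-tree row cache deploys today with just a jar and one yaml setting. A sketch, with a hypothetical jar and provider class name:

# Drop the plugin jar onto Cassandra's classpath (jar name is hypothetical):
cp capi-rowcache-plugin.jar "$CASSANDRA_HOME/lib/"
# ...then point cassandra.yaml at the provider (OHCProvider is the stock
# default for row_cache_class_name):
#   row_cache_class_name: com.example.CapiRowCacheProvider
#   row_cache_size_in_mb: 512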

What do you think?


Thanks,
Rei Odaira


Re: Integrating vendor-specific code and developing plugins

2017-05-11 Thread Jason Brown
Hey all,

I'm on board with what Rei is saying. I think we should be open to, and
encourage, other platforms/architectures for integration. Of course, it
will come down to specific maintainers/committers to do the testing and
verification on non-typical platforms. Hopefully those maintainers will
also contribute to other parts of the code base, so I see this as
another way to bring more folks into the project.

WRT extensibility, it just requires someone to do the work of making
reasonable abstraction points - and documenting them ;). The interesting
question comes down to how to host/ship any pluggable dependencies. Much
like what we had with jna before it was relicensed, we'll probably ship some
things in-tree and leave others for users to fetch and deploy themselves;
it's a case-by-case basis.

Thanks,

-Jason

