[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-07-02 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049702#comment-14049702
 ] 

Marcus Eriksson commented on CASSANDRA-6621:


Just realized that since CASSANDRA-6503 we will wait until all files have 
completed streaming before adding them to the manifest. The latest addition to 
the patch will just make sure we don't do STCS on the *new* data.

We should probably change that to add the files immediately after they are 
streamed (unless we are doing repair), WDYT [~yukim]? It should always be 
beneficial to do at least some compaction on the streamed data during bootstrap, 
and we should not have the problem of resurfacing data in the bootstrap case.

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.10

 Attachments: 0001-option-to-disallow-L0-stcs.patch, 
 0001-property-to-disable-stcs-in-l0-v2.patch, 
 0001-property-to-disable-stcs-in-l0.patch, 
 0001-wip-keep-sstable-level-when-bootstrapping.patch


 The initial discussion started in (closed) CASSANDRA-5371. I've rewritten my 
 last comment here...
 After streaming (e.g. during bootstrap) Cassandra places all sstables at L0. 
 At the end of the process we end up with a huge number of sstables at the 
 lowest level. 
 Currently, Cassandra falls back to STCS until the number of sstables at L0 
 reaches a reasonable level (32 or something).
 I'm not sure if falling back to STCS is the best way to handle this 
 particular situation. I've read the comment in the code and I'm aware why it 
 is a good thing to do if we have too many sstables at L0 as a result of too 
 many random inserts. We have a lot of sstables, each of them covers the whole 
 ring, there's simply no better option.
 However, after bootstrap the situation looks a bit different. The loaded 
 sstables already have very small ranges! We just have to tidy up a bit and 
 everything should be OK. STCS ignores that completely and after a while we 
 have somewhat fewer sstables, but each of them covers the whole ring instead 
 of just a small part. I believe that in that case letting LCS do the job is a 
 better option than allowing STCS to mix everything up first.
 Is there a way to disable STCS fallback? I'd like to test that scenario in 
 practice during our next bootstrap...
 Does Cassandra really have to put streamed sstables at L0? The only thing we 
 have to ensure is that sstables at any given level do not overlap. If we 
 stream different regions from different nodes how can we get any overlaps?
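
The non-overlap condition raised in the last question can be sketched roughly as follows. This is only an illustration, not Cassandra's code: the class LevelOverlapSketch, the TokenRange type and the use of plain long tokens are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LevelOverlapSketch {
    /** Hypothetical stand-in for an sstable: the closed token range [first, last] it covers. */
    static final class TokenRange {
        final long first;
        final long last;

        TokenRange(long first, long last) {
            this.first = first;
            this.last = last;
        }
    }

    /** Returns true if no two ranges in the candidate level share a token. */
    static boolean isNonOverlapping(List<TokenRange> level) {
        // Sort a copy by start token; then only neighbours need to be compared.
        List<TokenRange> sorted = new ArrayList<>(level);
        sorted.sort(Comparator.comparingLong((TokenRange r) -> r.first));
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).first <= sorted.get(i - 1).last) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, a set of streamed sstables could be placed above L0 only if a check like this passes for the target level; sstables that each span the whole ring would always fail it.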



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-07-02 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050034#comment-14050034
 ] 

Yuki Morishita commented on CASSANDRA-6621:
---

I think a better solution is CASSANDRA-7460, so we don't need to introduce a 
special case for bootstrap streaming in 2.0.



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-07-01 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049062#comment-14049062
 ] 

T Jake Luciani commented on CASSANDRA-6621:
---

+1



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-30 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047705#comment-14047705
 ] 

T Jake Luciani commented on CASSANDRA-6621:
---

[~krummas] in addition to the property, can't we also automatically avoid L0 
STCS during bootstrap by checking SystemKeyspace.bootstrapInProgress()?
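
A rough sketch of what combining the property with a bootstrap check might look like. Everything here is an assumption for illustration: the class name, the property name "example.disable_stcs_in_l0" and the threshold constant are hypothetical, and the bootstrapInProgress flag simply stands in for the result of SystemKeyspace.bootstrapInProgress().

```java
public class L0StcsGateSketch {
    // Hypothetical property name, for illustration only; the attached patches may use another.
    static final String DISABLE_PROPERTY = "example.disable_stcs_in_l0";

    // The issue description says "32 or something"; treated here as an assumed threshold.
    static final int MAX_L0_SSTABLES = 32;

    /**
     * Decides whether L0 should fall back to STCS.
     *
     * @param l0SstableCount      number of sstables currently in L0
     * @param bootstrapInProgress stand-in for SystemKeyspace.bootstrapInProgress()
     */
    static boolean shouldUseStcsInL0(int l0SstableCount, boolean bootstrapInProgress) {
        // Boolean.getBoolean is true only if the system property is set to "true".
        boolean disabledByProperty = Boolean.getBoolean(DISABLE_PROPERTY);
        if (disabledByProperty || bootstrapInProgress) {
            return false;
        }
        return l0SstableCount > MAX_L0_SSTABLES;
    }
}
```

With this shape the property remains useful after bootstrap has finished, when many files may still sit in L0, while the bootstrap check covers the common case without any operator action.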

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.10

 Attachments: 0001-option-to-disallow-L0-stcs.patch, 
 0001-property-to-disable-stcs-in-l0.patch, 
 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-30 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047715#comment-14047715
 ] 

Marcus Eriksson commented on CASSANDRA-6621:


Ah, now I get what you meant. Yes, we can by default always disable STCS in L0 
during bootstrap; will fix.



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-30 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047712#comment-14047712
 ] 

Marcus Eriksson commented on CASSANDRA-6621:


My reasoning was that you probably want it disabled for a while after bootstrap 
is done since there will probably still be many files in L0.



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-30 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047770#comment-14047770
 ] 

T Jake Luciani commented on CASSANDRA-6621:
---

[~rbranson] had raised concerns about using STCS at all, so I ran some tests 
that show STCS of L0 improves overall compaction time by 15% and lowers read 
latency. I wrote up my tests here:

https://docs.google.com/document/d/1eI8BbVed_TWBYKZwX9y8dPowCIOPPOwTvD_hF_47l7w/edit?usp=sharing





[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-27 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046016#comment-14046016
 ] 

Marcus Eriksson commented on CASSANDRA-6621:


created CASSANDRA-7460 for the keep-sstable-level patch

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-option-to-disallow-L0-stcs.patch, 
 0001-property-to-disable-stcs-in-l0.patch, 
 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-26 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044375#comment-14044375
 ] 

Marcus Eriksson commented on CASSANDRA-6621:


I think Rick means that if the attached patch does not hit 2.0, we should at 
least add an option to disable STCS in L0 in 2.0.

And I agree, though the default should probably leave the behavior as-is (i.e., 
continue doing STCS unless disabled) to avoid any surprises.



 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-26 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045053#comment-14045053
 ] 

Yuki Morishita commented on CASSANDRA-6621:
---

[~krummas] Controlling this through a compaction option is a nice idea. But do 
we want to change all nodes at once? If we just want to disable it on the 
bootstrapping node, then we need to add a JMX interface or system property to 
control it.

Also, let's move Marcus's first patch to a new JIRA and target it to the 2.1.x 
release.

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-option-to-disallow-L0-stcs.patch, 
 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-26 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045067#comment-14045067
 ] 

T Jake Luciani commented on CASSANDRA-6621:
---

Yeah, I agree with [~yukim]; it should perhaps be a property.



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-26 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045237#comment-14045237
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

+1 on [~yukim]'s comment.



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-25 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043644#comment-14043644
 ] 

Yuki Morishita commented on CASSANDRA-6621:
---

The streaming protocol version is independent of MessagingService, so the 
impact is limited to streaming.
However, since we currently check that the streaming version matches exactly 
between nodes, it would be problematic when bootstrapping a new node in the 
middle of an upgrade.
We need to change that check to accept a lower stream version and make sure 
the message serializers accept both old and new versions.

With that said, I'm +1 on moving this to 2.1 if a protocol change is involved.
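
To make the compatibility concern concrete, here is a minimal sketch of a relaxed version check and a version-aware serializer. It is not the actual streaming code: the class, the version numbers, and the header layout (an estimated key count followed by an sstable level that only the newer version carries) are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StreamVersionSketch {
    // Hypothetical version numbers, not the real streaming protocol versions.
    static final int OLD_VERSION = 1;
    static final int CURRENT_VERSION = 2;

    /** The strict check described above: only an identical version is accepted. */
    static boolean matchesExactly(int peerVersion) {
        return peerVersion == CURRENT_VERSION;
    }

    /** The relaxed check: lower (but still known) versions are accepted too. */
    static boolean isAccepted(int peerVersion) {
        return peerVersion >= OLD_VERSION && peerVersion <= CURRENT_VERSION;
    }

    /** Writes a header; the assumed sstable level field exists only in the newer version. */
    static byte[] writeHeader(long estimatedKeys, int sstableLevel, int version) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(estimatedKeys);
            if (version >= CURRENT_VERSION) {
                out.writeInt(sstableLevel);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the level back, defaulting to L0 when the peer's version does not send it. */
    static int readLevel(byte[] header, int version) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
            in.readLong(); // skip the estimated key count
            return version >= CURRENT_VERSION ? in.readInt() : 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch an old-version peer is still accepted, and its sstables simply land in L0 as before, which is the behaviour a node would need while the cluster is mid-upgrade.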



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-25 Thread Rick Branson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043921#comment-14043921
 ] 

Rick Branson commented on CASSANDRA-6621:
-

How about we add a switch to enable/disable this and disable it by default in 
2.0 then? This behavior is a net negative for people using LCS *unless* they 
happen to have long bursts of *very* high write volume.
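
A minimal sketch of such a switch, assuming it is exposed as a JVM system property. The property name `example.disable_stcs_in_l0` and the class are hypothetical, and the threshold of 32 is taken from the "32 or something" in the issue description rather than from the code.

```java
// Hypothetical sketch of an on/off switch for the STCS fallback in L0.
final class L0FallbackSwitch {
    // Hypothetical property name, chosen for illustration only.
    static final String PROPERTY = "example.disable_stcs_in_l0";
    // Assumed threshold, based on the "32 or something" in the issue description.
    static final int MAX_L0_SSTABLES = 32;

    // Returns true when L0 is over the threshold and the fallback has not been disabled.
    static boolean useStcsInL0(int l0SstableCount) {
        // Boolean.getBoolean reads the named system property; an unset property means false.
        boolean disabled = Boolean.getBoolean(PROPERTY);
        return !disabled && l0SstableCount > MAX_L0_SSTABLES;
    }
}
```

In this sketch the fallback stays on unless the JVM is started with `-Dexample.disable_stcs_in_l0=true`; inverting the default is a one-line change.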

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-25 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043927#comment-14043927
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

bq. This behavior is a net negative for people using LCS
Why is it negative? 

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-25 Thread Rick Branson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043976#comment-14043976
 ] 

Rick Branson commented on CASSANDRA-6621:
-

As I said above, it causes bootstraps to require 2x disk space. One of the 
stated goals of LCS is to avoid the 2x disk space requirement, which the 
fallback violates.

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-25 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044015#comment-14044015
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

If we are doing it in 2.0, then we need to add a note to NEWS.txt that 
streaming won't be supported during upgrade. 
Also, CASSANDRA-7414 will be key to recovering from levels that are not full. 
Your patch looks good otherwise. 

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-25 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044017#comment-14044017
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

[~rbranson] I am a little confused :)
bq. it causes bootstraps to require 2x disk space.
Do you mean that the current behavior causes it to take 2x space, or the 
behavior once the fix is in place? 

Why do you want it to default to off in 2.0? I am fine if it is bumped to 
2.1, but curious :)

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction, streaming
 Fix For: 2.0.9

 Attachments: 0001-wip-keep-sstable-level-when-bootstrapping.patch




[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-12 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029149#comment-14029149
 ] 

Marcus Eriksson commented on CASSANDRA-6621:


bq. Stream sstables from the source by sorting them by level which will cause 
streaming of sstables in following order L1 to Lx and then finally L0.
Would it be simpler to just look at all sstables on the new node after we have 
bootstrapped (but before we start compacting) and try to optimally distribute 
them in levels? Feels like we could do a better job in this case.

btw, we also need the "pick high-level sstables for lower-level compactions" 
thing (or something similar) after we have run nodetool cleanup on a node; we 
would have the same situation with many half-empty levels there as well.

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Priority: Minor



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-12 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029432#comment-14029432
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

bq. Would it be simpler to just look at all sstables on the new node after we 
have bootstrapped (but before we start compacting) and try to optimally 
distribute them in levels? Feels like we could do a better job in this case
Yes, this will be better if we can afford to pause compaction of organic 
writes. I was trying to give an approach in which we don't pause compaction. 
The thing to keep in mind is that for large nodes, bootstrap will take several 
hours, and compaction won't be running on the new organic data during that 
time. This might be a problem for cases where people insert into C* in batches.

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Assignee: Marcus Eriksson
Priority: Minor
  Labels: compaction



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-11 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028766#comment-14028766
 ] 

Marcus Eriksson commented on CASSANDRA-6621:


bq. We might want to also stream the sstable level and can put sstables coming 
from same level in one level on the bootstrapping node. The problem with this 
will be that we might end up with very few sstables in higher levels, violating 
the constraint that only last level can be less than limit.

This should be fine for short periods of time, right? The problem will be that 
it will take a long time until the highest level gets compacted. What if we 
detect that, and include a couple of those high-level sstables in lower-level 
compactions until the higher level is empty or starts doing real compactions?



 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Priority: Minor



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-11 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028820#comment-14028820
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

bq. What if we detect that, and include a couple of those high level sstables 
in lower-level compactions until the higher level is empty or starts doing real 
compactions?
Yes, we can keep doing that until the last level is the only level that is not 
full. To minimize this, like I said, we can do this:
Stream sstables from the source sorted by level, so that they are streamed in 
the following order: L1 to Lx, and then finally L0.
Sounds good? 
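
The proposed streaming order (L1 to Lx, then L0) can be sketched as a sort with a custom key. The class and method names below are hypothetical, and sstables are reduced to their level numbers for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: orders sstables for streaming as L1..Lx first, L0 last.
final class StreamOrder {
    // Sort key: L0 is pushed to the end, all other levels keep their natural order.
    static int sortKey(int level) {
        return level == 0 ? Integer.MAX_VALUE : level;
    }

    // Takes the level of each sstable and returns the levels in streaming order.
    static List<Integer> streamingOrder(List<Integer> sstableLevels) {
        List<Integer> ordered = new ArrayList<>(sstableLevels);
        ordered.sort(Comparator.comparingInt(StreamOrder::sortKey));
        return ordered;
    }
}
```

For sstables at levels 0, 3, 1, 0, 2 this yields the order 1, 2, 3, 0, 0.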


 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Priority: Minor



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-10 Thread Rick Branson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026916#comment-14026916
 ] 

Rick Branson commented on CASSANDRA-6621:
-

This is definitely sub-optimal for us as well, just in terms of time spent 
compacting after a bootstrap/rebuild. We never get behind in L0 during normal 
operation. In addition, it causes one of the nice things about LCS to be 
invalidated as well, which is that you never have to worry about having double 
the disk space to compact. Bootstraps cause large compactions (~50% of the size 
of the CF), which means we need a ton of extra disk on bootstrap just to build 
nodes.
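
A rough back-of-the-envelope sketch of the disk headroom difference described here. The 160 MB sstable size and fan-out of 10 are assumed LCS parameters, not values taken from this thread, and the class is hypothetical.

```java
// Hypothetical sketch: rough free-space estimates for a single compaction.
final class CompactionHeadroom {
    // Assumed LCS parameters for illustration: 160 MB sstables, fan-out of 10.
    static final long SSTABLE_SIZE_MB = 160;
    static final int FANOUT = 10;

    // A size-tiered compaction rewrites all of its inputs, so in the worst case
    // (nothing is dropped during the merge) it needs free space equal to the input size.
    static long stcsHeadroomMb(long inputSizeMb) {
        return inputSizeMb;
    }

    // A leveled compaction merges one sstable with roughly FANOUT overlapping
    // sstables from the next level, so its output is bounded by those inputs.
    static long lcsHeadroomMb() {
        return (1 + FANOUT) * SSTABLE_SIZE_MB;
    }
}
```

Under these assumptions, a fallback compaction covering ~50% of a 1,000,000 MB column family needs up to 500,000 MB of free space, while a single leveled compaction needs about 1,760 MB.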

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Priority: Minor



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027006#comment-14027006
 ] 

Jonathan Ellis commented on CASSANDRA-6621:
---

What do you think we could do to mitigate this, [~krummas]?

 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Priority: Minor



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-10 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027029#comment-14027029
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

bq. If we stream different regions from different nodes how can we get any 
overlaps?
If a node from which it is streaming has the same row in two sstables in 
different levels, it will come in as 2 sstables. The only place we can put them 
is L0. 
We might also want to stream the sstable level, so that we can put sstables 
coming from the same level into one level on the bootstrapping node. The 
problem with this is that we might end up with very few sstables in the higher 
levels, violating the constraint that only the last level can be below its limit. 
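
A minimal sketch of the constraint mentioned above, assuming level n holds up to 10^n sstables (an assumption about LCS sizing, not stated in this thread). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: finds levels that break "only the last level may be under its limit".
final class LevelFill {
    // Assumed capacity rule: level n holds up to 10^n sstables (n >= 1).
    static long capacity(int level) {
        long cap = 1;
        for (int i = 0; i < level; i++) {
            cap *= 10;
        }
        return cap;
    }

    // counts[n] is the number of sstables in level n; index 0 (L0) is ignored.
    // Returns the levels that are under capacity even though a higher level holds data.
    static List<Integer> underfullLevels(int[] counts) {
        int highest = 0;
        for (int n = 1; n < counts.length; n++) {
            if (counts[n] > 0) {
                highest = n;
            }
        }
        List<Integer> underfull = new ArrayList<>();
        for (int n = 1; n < highest; n++) {
            if (counts[n] < capacity(n)) {
                underfull.add(n);
            }
        }
        return underfull;
    }
}
```

A bootstrapping node that keeps the source levels might end up with, say, 3 sstables in L1, 20 in L2, and 5 in L3; the sketch reports L1 and L2 as under their limits while L3 already holds data.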


 STCS fallback is not optimal when bootstrapping
 ---

 Key: CASSANDRA-6621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6621
 Project: Cassandra
  Issue Type: Improvement
Reporter: Bartłomiej Romański
Priority: Minor



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027035#comment-14027035
 ] 

Jonathan Ellis commented on CASSANDRA-6621:
---

What if we just special-cased LCS during bootstrap to put streamed data into 
the first level where it doesn't overlap, up to X levels, where X is calculated 
from the total dataset being streamed?  This won't necessarily be optimal, but 
it will always do better than the status quo and never worse.
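
A minimal sketch of that first-fit placement, assuming sstables are described 
only by an inclusive token range. The SSTable class, place_streamed and 
levels_needed are hypothetical names, and the way X is derived from the 
streamed size (fanout of 10, a fixed sstable size) is one possible 
interpretation, not what the patch does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    first: int  # smallest token covered, inclusive
    last: int   # largest token covered, inclusive


def overlaps(a, b):
    """True if the token ranges of two sstables intersect."""
    return a.first <= b.last and b.first <= a.last


def levels_needed(total_bytes, sstable_bytes, fanout=10):
    """One way to pick X: the smallest number of levels whose combined
    assumed capacity (fanout**level sstables each) covers the streamed data."""
    level = 1
    capacity = fanout * sstable_bytes
    cumulative = capacity
    while cumulative < total_bytes:
        level += 1
        capacity *= fanout
        cumulative += capacity
    return level


def place_streamed(levels, sstable, max_level):
    """Put `sstable` in the first level from L1 to L<max_level> where it
    overlaps nothing already there; fall back to L0 otherwise.

    `levels` maps level number -> list of SSTable and is updated in place.
    Returns the level that was chosen.
    """
    for level in range(1, max_level + 1):
        current = levels.setdefault(level, [])
        if not any(overlaps(sstable, other) for other in current):
            current.append(sstable)
            return level
    levels.setdefault(0, []).append(sstable)
    return 0
```

Since L0 remains the fallback, an sstable that fits nowhere is handled exactly 
as it is today.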



[jira] [Commented] (CASSANDRA-6621) STCS fallback is not optimal when bootstrapping

2014-06-10 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027113#comment-14027113
 ] 

sankalp kohli commented on CASSANDRA-6621:
--

"What if we just special cased LCS during bootstrap to just put streamed data 
into the first level it doesn't overlap"
I agree this would be better, but here is another optional improvement that 
can minimize the number of sstables in L0. 

Stream sstables from the source sorted by level, so that they are streamed in 
the order L1 to Lx and then finally L0. Here is why this will help: 
1) If we stream an sstable from a higher level first, it will take a slot in 
L1 and will kick other sstables to higher levels or even to L0. 
2) If L0 of the streaming node is backed up and has 20-30 sstables, it might 
end up filling X levels and will kick other sstables to L0 due to overlapping. 
Streaming L0 last will help in this case. 

Also, I find it cleaner to visualize that Level Z sstables will go into Level 
Z on the node being bootstrapped.
"where X is calculated from total dataset being streamed"
Also, I am not sure whether the sort-based improvement I am proposing will 
result in a limited number of levels on the bootstrapping node. If a node is 
bootstrapping from nodes A and B, and A has 5 levels and B has 3 levels, the 
bootstrapping node will have 5 levels. So we might not need to calculate X.
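
The proposed ordering can be sketched as follows, assuming each sstable on the 
source is known as a (name, level) pair. Both function names are hypothetical 
and nothing here reflects the actual streaming code.

```python
def streaming_order(sstables):
    """Order sstables for streaming: L1 up to Lx first, L0 last.

    `sstables` is a list of (name, level) pairs. The sort is stable, so
    sstables within one level keep their original relative order.
    """
    return sorted(sstables, key=lambda s: (s[1] == 0, s[1]))


def expected_levels(source_level_counts):
    """If level Z sstables land in level Z on the bootstrapping node, it ends
    up with as many levels as the deepest source node."""
    return max(source_level_counts)
```

For the example above, expected_levels([5, 3]) gives 5, matching the claim 
that the bootstrapping node ends up with 5 levels.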


