Re: Nodetool rebuild question

2016-10-06 Thread Jeff Jirsa
Read repairs (both foreground/blocking, driven by the consistency level of the read, and 
background/non-blocking, driven by the per-table probability option) go memtable -> 
flush -> SSTable.
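
For context: the background/probabilistic path above is controlled by per-table options in 
the Cassandra versions of this era (they were later removed in 4.0). A minimal sketch, 
assuming a hypothetical table my_ks.my_table:

  # 10% chance of background read repair within the local DC, none across DCs
  cqlsh -e "ALTER TABLE my_ks.my_table WITH dclocal_read_repair_chance = 0.1 AND read_repair_chance = 0.0;"

The foreground/blocking path is not a table option; it fires whenever a read at a 
consistency level such as QUORUM finds replicas out of sync.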

 

 

 



RE: Nodetool rebuild question

2016-10-06 Thread Anubhav Kale
Sure.

When a read repair happens, does the data go via the memtable -> SSTable route, or does 
the source node send SSTable tmp files directly to the inconsistent replica?



Re: Nodetool rebuild question

2016-10-05 Thread Jeff Jirsa
If you set RF to 0, you can ignore my second sentence/paragraph. The third 
still applies.

 

 



RE: Nodetool rebuild question

2016-10-05 Thread Anubhav Kale
Thanks.

We always set the RF to 0 and then "removenode" all nodes in the DC that we want to 
decommission, so I highly doubt that is the problem. Plus, the SSTable count on a given 
node is ~2,000 on average (we have 140 nodes in one ring and two rings overall).
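
For readers of the archive, a rough sketch of that workflow with placeholder keyspace, DC, 
and host-ID values (the ALTER has to be run for every keyspace replicated to the retired 
DC, and the host IDs come from nodetool status):

  # 1. Drop the retired DC to RF 0 (keyspace and DC names are placeholders)
  cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_LIVE': 3, 'DC_OLD': 0};"
  # 2. Remove every node that belonged to the retired DC, by host ID
  nodetool removenode 11111111-2222-3333-4444-555555555555

(nodetool decommission, run on the node itself, is the equivalent when the node being 
removed is still up.)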



Re: Nodetool rebuild question

2016-10-05 Thread Jeff Jirsa
Both of your statements are true.

 

During your decom, you likely streamed lots of SSTables to the remaining nodes 
(especially true if you didn’t drop the replication factor to 0 for the DC you 
decommissioned). Since those tens of thousands of SSTables take a while to 
compact, if you then rebuild (or bootstrap) before compaction is done, you’ll 
get a lot of extra SSTables.
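
A quick way to see whether that backlog has cleared before starting a rebuild (the table 
name is a placeholder; on older nodetool versions tablestats is called cfstats):

  # pending compaction tasks and bytes remaining on this node
  nodetool compactionstats
  # per-table SSTable count, to watch it come back down
  nodetool tablestats my_ks.my_table | grep 'SSTable count'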

 

This is one of the reasons that people with large clusters don’t use vnodes – 
if you needed to bootstrap ~100 more nodes into a cluster, you’d have to wait 
potentially a day or more per node to compact away the leftovers before 
bootstrapping the next, which is prohibitive at scale. 

 

-  Jeff

 



Nodetool rebuild question

2016-10-05 Thread Anubhav Kale
Hello,

As part of a rebuild, I noticed that the destination node gets -tmp- files from 
other nodes. Are the following statements correct?

1.   The files are written to disk without going through memtables.

2.   Regular compactors eventually compact them to bring down # SSTables to 
a reasonable number.

We have noticed that the destination node created > 40K *Data* files in the 
first hour of streaming alone. We have not seen such a pattern before, so we are trying 
to understand what could have changed. (We do use vnodes, and we haven't 
increased the number of nodes recently, but we have decommissioned a DC.)
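
A rough way to watch this while it happens, assuming the default data directory and a 
placeholder source DC name:

  # pull data for this (new) DC from an existing one
  nodetool rebuild -- DC_SOURCE
  # incoming stream sessions and their progress
  nodetool netstats
  # count Data components on disk as they arrive
  find /var/lib/cassandra/data -name '*-Data.db' | wc -l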

Thanks much!