Re: [VOTE PASSED] Release Apache Cassandra 3.0.15

2017-10-09 Thread Michael Shuler
With 8 binding +1, 1 non-binding +1, and no other votes, the 3.0.15 vote
has passed. I'll get the artifacts published soon.

-- 
Kind regards,
Michael

On 10/02/2017 12:18 PM, Michael Shuler wrote:
> I propose the following artifacts for release as 3.0.15.
> 
> sha1: b32a9e6452c78e6ad08e371314bf1ab7492d0773
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.15-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1150/org/apache/cassandra/apache-cassandra/3.0.15/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1150/
> 
> The Debian packages are available here: http://people.apache.org/~mshuler
> 
> The vote will be open for 72 hours (longer if needed).
> 
> [1]: (CHANGES.txt) https://goo.gl/RyuPpw
> [2]: (NEWS.txt) https://goo.gl/qxwUti
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: What would be the appropriate number of vnodes (num_tokens) to use?

2017-10-09 Thread Jeff Jirsa
As long as balance is achieved, the fewer vnodes the better.
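
A back-of-envelope sketch of why fewer vnodes also shortens repair (an editor's simplification, not Cassandra internals): the ring has one token range per vnode, and repair pays a fixed per-range cost (Merkle-tree build plus a sync round), so that overhead scales roughly linearly with num_tokens.

```python
# Rough model only: one token range per vnode, with a fixed validation
# cost per range during repair. Not Cassandra code.
def total_token_ranges(num_nodes: int, num_tokens: int) -> int:
    """Total ranges in the ring: one per vnode."""
    return num_nodes * num_tokens

# The 3-node cluster discussed in this thread:
print(total_token_ranges(3, 256))  # 768 ranges at the old setting
print(total_token_ranges(3, 32))   # 96 ranges after the reduction
```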

-- 
Jeff Jirsa


> On Oct 9, 2017, at 7:53 AM, Li, Guangxing  wrote:
> 
> Jeff,
> 
> so the key really is to keep nodes load balanced, and as long as such
> balance is achieved, using a smaller number of vnodes has no other
> negative impact?
> 
> Thanks.
> 
> George
> 
>> On Mon, Oct 9, 2017 at 8:46 AM, Jeff Jirsa  wrote:
>> 
>> 256 was chosen because the original vnode allocation algorithm was random,
>> and fewer than 256 vnodes could lead to unbalanced nodes.
>>
>> In 3.0 there's a less naive algorithm that ensures a more balanced
>> distribution, and with it, 16 or 32 is probably preferable.
>> 
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>>> On Oct 9, 2017, at 7:38 AM, Li, Guangxing 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> The documentation says '...The recommended initial value for
>>> num_tokens is 256...', and that is what we did: our cluster runs
>>> Cassandra Community 2.0.9 and has 3 physical nodes with replication
>>> factor 3 for all keyspaces, each node with 256 vnodes and about 96 GB
>>> of data. We noticed that repairing some keyspaces can take up to 37
>>> hours. In testing, we reduced the vnodes per physical node from 256
>>> to 32, which cut repair time considerably:
>>>
>>> Repair command (all on Cassandra 2.0.9)          vnodes/node  Repair time
>>> nodetool repair courseassociation associations           256  26 h 04 m
>>>                                                           32  21 h 46 m
>>> nodetool repair userassociation associations             256  37 h 02 m
>>>                                                           32  26 h 29 m
>>> nodetool repair orguserassociation associations          256  13 h 35 m
>>>                                                           32   6 h 27 m
>>> nodetool repair userorgassociation associations          256   3 h 26 m
>>>                                                           32   1 h 39 m
>>>
>>> So a smaller number of vnodes does reduce repair time, but what are
>>> the other implications: performance? System resource consumption? Is
>>> there a general guideline on the number of vnodes to configure?
>>>
>>> Thanks.
>>>
>>> George
>> 




Re: [VOTE PASSED] Release Apache Cassandra 3.11.1

2017-10-09 Thread Michael Shuler
On 10/09/2017 10:15 AM, Michael Shuler wrote:
> I count 7 binding +1 votes and no other votes for the 3.11.1 release.
> Thanks for all the feedback. I will publish the release artifacts shortly.
> 

Resend with subject updated to [VOTE PASSED] :)

-- 
Michael




Re: [VOTE] Release Apache Cassandra 3.11.1

2017-10-09 Thread Michael Shuler
I count 7 binding +1 votes and no other votes for the 3.11.1 release.
Thanks for all the feedback. I will publish the release artifacts shortly.

-- 
Kind regards,
Michael

On 10/02/2017 12:58 PM, Michael Shuler wrote:
> I propose the following artifacts for release as 3.11.1.
> 
> sha1: 983c72a84ab6628e09a78ead9e20a0c323a005af
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.1-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1151/org/apache/cassandra/apache-cassandra/3.11.1/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1151/
> 
> The Debian packages are available here: http://people.apache.org/~mshuler
> 
> The vote will be open for 72 hours (longer if needed).
> 
> [1]: (CHANGES.txt) https://goo.gl/dZCRk8
> [2]: (NEWS.txt) https://goo.gl/rh24MX
> 





Re: What would be the appropriate number of vnodes (num_tokens) to use?

2017-10-09 Thread Jeff Jirsa
256 was chosen because the original vnode allocation algorithm was random,
and fewer than 256 vnodes could lead to unbalanced nodes.

In 3.0 there's a less naive algorithm that ensures a more balanced
distribution, and with it, 16 or 32 is probably preferable.
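
For reference, the balance-aware allocator added in 3.0 is driven from cassandra.yaml; a minimal sketch (the keyspace name below is a placeholder, and the option only helps once a keyspace with the target replication settings exists):

```yaml
# cassandra.yaml, Cassandra 3.0+ (sketch; adjust names to your cluster)
num_tokens: 16
# Allocate tokens to optimize replica placement for this keyspace's
# replication factor instead of choosing them randomly:
allocate_tokens_for_keyspace: my_keyspace   # placeholder name
```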


-- 
Jeff Jirsa


> On Oct 9, 2017, at 7:38 AM, Li, Guangxing  wrote:
> 
> Hi,
> 
> The documentation says '...The recommended initial value for
> num_tokens is 256...', and that is what we did: our cluster runs
> Cassandra Community 2.0.9 and has 3 physical nodes with replication
> factor 3 for all keyspaces, each node with 256 vnodes and about 96 GB
> of data. We noticed that repairing some keyspaces can take up to 37
> hours. In testing, we reduced the vnodes per physical node from 256
> to 32, which cut repair time considerably:
>
> Repair command (all on Cassandra 2.0.9)          vnodes/node  Repair time
> nodetool repair courseassociation associations           256  26 h 04 m
>                                                           32  21 h 46 m
> nodetool repair userassociation associations             256  37 h 02 m
>                                                           32  26 h 29 m
> nodetool repair orguserassociation associations          256  13 h 35 m
>                                                           32   6 h 27 m
> nodetool repair userorgassociation associations          256   3 h 26 m
>                                                           32   1 h 39 m
>
> So a smaller number of vnodes does reduce repair time, but what are
> the other implications: performance? System resource consumption? Is
> there a general guideline on the number of vnodes to configure?
>
> Thanks.
>
> George
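
For quick comparison, the quoted timings work out to these speedups (computed from the numbers in the table above, nothing more):

```python
# Repair timings from the quoted table, in minutes: (256 vnodes, 32 vnodes).
timings = {
    "courseassociation":  (26 * 60 + 4,  21 * 60 + 46),
    "userassociation":    (37 * 60 + 2,  26 * 60 + 29),
    "orguserassociation": (13 * 60 + 35,  6 * 60 + 27),
    "userorgassociation": ( 3 * 60 + 26,  1 * 60 + 39),
}
for table, (t256, t32) in timings.items():
    print(f"{table}: {t256 / t32:.2f}x faster at 32 vnodes")
```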

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: What would be the appropriate number of vnodes (num_tokens) to use?

2017-10-09 Thread Li, Guangxing
Jeff,

so the key really is to keep nodes load balanced, and as long as such
balance is achieved, using a smaller number of vnodes has no other
negative impact?

Thanks.

George

On Mon, Oct 9, 2017 at 8:46 AM, Jeff Jirsa  wrote:

> 256 was chosen because the original vnode allocation algorithm was random,
> and fewer than 256 vnodes could lead to unbalanced nodes.
>
> In 3.0 there's a less naive algorithm that ensures a more balanced
> distribution, and with it, 16 or 32 is probably preferable.
>
>
> --
> Jeff Jirsa
>
>
> > On Oct 9, 2017, at 7:38 AM, Li, Guangxing 
> wrote:
> >
> > Hi,
> >
> > The documentation says '...The recommended initial value for
> > num_tokens is 256...', and that is what we did: our cluster runs
> > Cassandra Community 2.0.9 and has 3 physical nodes with replication
> > factor 3 for all keyspaces, each node with 256 vnodes and about 96 GB
> > of data. We noticed that repairing some keyspaces can take up to 37
> > hours. In testing, we reduced the vnodes per physical node from 256
> > to 32, which cut repair time considerably:
> >
> > Repair command (all on Cassandra 2.0.9)          vnodes/node  Repair time
> > nodetool repair courseassociation associations           256  26 h 04 m
> >                                                           32  21 h 46 m
> > nodetool repair userassociation associations             256  37 h 02 m
> >                                                           32  26 h 29 m
> > nodetool repair orguserassociation associations          256  13 h 35 m
> >                                                           32   6 h 27 m
> > nodetool repair userorgassociation associations          256   3 h 26 m
> >                                                           32   1 h 39 m
> >
> > So a smaller number of vnodes does reduce repair time, but what are
> > the other implications: performance? System resource consumption? Is
> > there a general guideline on the number of vnodes to configure?
> >
> > Thanks.
> >
> > George
>


Re: What would be the appropriate number of vnodes (num_tokens) to use?

2017-10-09 Thread Li, Guangxing
That is good info. Thanks.
George

On Mon, Oct 9, 2017 at 10:23 AM, Jeff Jirsa  wrote:

> One of my very smart coworkers who rarely posts to the list pointed out
> privately that I've oversimplified this, and there are other advantages to
> having more vnodes SOMETIMES.
>
> In particular: most of our longest streaming operations
> (bootstrap/decommission/removenode) are CPU-bound on the stream receiver.
> Having a single token per node can make those streams take quite some time,
> as we send a single file at a time per stream. If you had more vnodes per
> machine, you could stream more ranges in parallel, taking advantage of more
> cores and streaming significantly faster. This is a very real gain if you
> are regularly adding or removing a FEW nodes. If you're regularly doubling
> your cluster, a single token per node is probably better, as you can add
> multiple nodes at any given time and guarantee new nodes won't interact
> with other joining/leaving nodes.
>
>
>
>
> On Mon, Oct 9, 2017 at 8:26 AM, Jeff Jirsa  wrote:
>
> > As long as balance is achieved, the fewer vnodes the better.
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Oct 9, 2017, at 7:53 AM, Li, Guangxing 
> > wrote:
> > >
> > > Jeff,
> > >
> > > so the key really is to keep nodes load balanced, and as long as such
> > > balance is achieved, using a smaller number of vnodes has no other
> > > negative impact?
> > >
> > > Thanks.
> > >
> > > George
> > >
> > >> On Mon, Oct 9, 2017 at 8:46 AM, Jeff Jirsa  wrote:
> > >>
> > >> 256 was chosen because the original vnode allocation algorithm was
> > >> random, and fewer than 256 vnodes could lead to unbalanced nodes.
> > >>
> > >> In 3.0 there's a less naive algorithm that ensures a more balanced
> > >> distribution, and with it, 16 or 32 is probably preferable.
> > >>
> > >>
> > >> --
> > >> Jeff Jirsa
> > >>
> > >>
> > >>> On Oct 9, 2017, at 7:38 AM, Li, Guangxing 
> > >> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> The documentation says '...The recommended initial value for
> > >>> num_tokens is 256...', and that is what we did: our cluster runs
> > >>> Cassandra Community 2.0.9 and has 3 physical nodes with replication
> > >>> factor 3 for all keyspaces, each node with 256 vnodes and about
> > >>> 96 GB of data. We noticed that repairing some keyspaces can take up
> > >>> to 37 hours. In testing, we reduced the vnodes per physical node
> > >>> from 256 to 32, which cut repair time considerably:
> > >>>
> > >>> Repair command (all on Cassandra 2.0.9)          vnodes/node  Repair time
> > >>> nodetool repair courseassociation associations           256  26 h 04 m
> > >>>                                                           32  21 h 46 m
> > >>> nodetool repair userassociation associations             256  37 h 02 m
> > >>>                                                           32  26 h 29 m
> > >>> nodetool repair orguserassociation associations          256  13 h 35 m
> > >>>                                                           32   6 h 27 m
> > >>> nodetool repair userorgassociation associations          256   3 h 26 m
> > >>>                                                           32   1 h 39 m
> > >>>
> > >>> So a smaller number of vnodes does reduce repair time, but what are
> > >>> the other implications: performance? System resource consumption?
> > >>> Is there a general guideline on the number of vnodes to configure?
> > >>>
> > >>> Thanks.
> > >>>
> > >>> George
> > >>
> >
>


Re: What would be the appropriate number of vnodes (num_tokens) to use?

2017-10-09 Thread Jeff Jirsa
One of my very smart coworkers who rarely posts to the list pointed out
privately that I've oversimplified this, and there are other advantages to
having more vnodes SOMETIMES.

In particular: most of our longest streaming operations
(bootstrap/decommission/removenode) are CPU-bound on the stream receiver.
Having a single token per node can make those streams take quite some time,
as we send a single file at a time per stream. If you had more vnodes per
machine, you could stream more ranges in parallel, taking advantage of more
cores and streaming significantly faster. This is a very real gain if you are
regularly adding or removing a FEW nodes. If you're regularly doubling your
cluster, a single token per node is probably better, as you can add
multiple nodes at any given time and guarantee new nodes won't interact
with other joining/leaving nodes.
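
The streaming point above can be caricatured in a few lines (an editor's toy model, not Cassandra internals): each stream sends one file at a time, so a receiver's usable concurrency is roughly capped by the number of distinct ranges it receives and the cores it has.

```python
# Toy model: per-stream transfers are serial, so effective parallelism
# is bounded by min(ranges being received, CPU cores on the receiver).
def usable_stream_parallelism(ranges_received: int, cores: int) -> int:
    return min(ranges_received, cores)

print(usable_stream_parallelism(1, 16))   # single token per node -> 1
print(usable_stream_parallelism(32, 16))  # 32 vnodes can saturate 16 cores
```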




On Mon, Oct 9, 2017 at 8:26 AM, Jeff Jirsa  wrote:

> As long as balance is achieved, the fewer vnodes the better.
>
> --
> Jeff Jirsa
>
>
> > On Oct 9, 2017, at 7:53 AM, Li, Guangxing 
> wrote:
> >
> > Jeff,
> >
> > so the key really is to keep nodes load balanced, and as long as such
> > balance is achieved, using a smaller number of vnodes has no other
> > negative impact?
> >
> > Thanks.
> >
> > George
> >
> >> On Mon, Oct 9, 2017 at 8:46 AM, Jeff Jirsa  wrote:
> >>
> >> 256 was chosen because the original vnode allocation algorithm was
> >> random, and fewer than 256 vnodes could lead to unbalanced nodes.
> >>
> >> In 3.0 there's a less naive algorithm that ensures a more balanced
> >> distribution, and with it, 16 or 32 is probably preferable.
> >>
> >>
> >> --
> >> Jeff Jirsa
> >>
> >>
> >>> On Oct 9, 2017, at 7:38 AM, Li, Guangxing 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> The documentation says '...The recommended initial value for
> >>> num_tokens is 256...', and that is what we did: our cluster runs
> >>> Cassandra Community 2.0.9 and has 3 physical nodes with replication
> >>> factor 3 for all keyspaces, each node with 256 vnodes and about
> >>> 96 GB of data. We noticed that repairing some keyspaces can take up
> >>> to 37 hours. In testing, we reduced the vnodes per physical node
> >>> from 256 to 32, which cut repair time considerably:
> >>>
> >>> Repair command (all on Cassandra 2.0.9)          vnodes/node  Repair time
> >>> nodetool repair courseassociation associations           256  26 h 04 m
> >>>                                                           32  21 h 46 m
> >>> nodetool repair userassociation associations             256  37 h 02 m
> >>>                                                           32  26 h 29 m
> >>> nodetool repair orguserassociation associations          256  13 h 35 m
> >>>                                                           32   6 h 27 m
> >>> nodetool repair userorgassociation associations          256   3 h 26 m
> >>>                                                           32   1 h 39 m
> >>>
> >>> So a smaller number of vnodes does reduce repair time, but what are
> >>> the other implications: performance? System resource consumption?
> >>> Is there a general guideline on the number of vnodes to configure?
> >>>
> >>> Thanks.
> >>>
> >>> George
> >>
>


Re: Stream is failing while removing the node

2017-10-09 Thread Jason Brown
Varun,

Please open a JIRA ticket with all the details of what you are seeing.

Thanks,

-Jason

On Mon, Oct 9, 2017 at 7:54 PM, Varun Barala 
wrote:

> Hi developers,
>
> Recently, I was removing one node from the cluster without downtime.
>
> Cluster size: 3 nodes [test cluster with nodeA, nodeB, nodeC]
>
> *Procedure:*
> 1. Change RF to 1
> 2. Check all the nodes are up
> 3. Run repair on all the nodes
> 4. Run decommission on nodeC
> 5. After decommission
> 6. Check node status (status should be *UN*)
> 7. Clean up on all the nodes
> * Repeat steps [2..7] for nodeB
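
The quoted procedure maps roughly to the following commands (an editor's sketch only; 'ks1' is a placeholder keyspace name, and dropping RF like this removes data redundancy, so treat it strictly as a test-cluster recipe):

```shell
# Sketch of the procedure above; run against a test cluster only.
# 1. Drop RF to 1 ("ks1" is a placeholder keyspace name):
cqlsh -e "ALTER KEYSPACE ks1 WITH replication = \
  {'class': 'SimpleStrategy', 'replication_factor': 1};"
nodetool status            # 2. all nodes should show UN
nodetool repair ks1        # 3. run on every node
nodetool decommission      # 4. run on the node being removed (nodeC)
nodetool status            # 5/6. remaining nodes should still be UN
nodetool cleanup           # 7. run on each remaining node
```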
>
>
> But we found a weird exception in the logs:
>
> level:WARN  thread:STREAM-IN-/10.142.1.11  @timestamp:2017-10-06T02:44:21.567+
> hostname:ip-10-142-1-10.compute.internal
> loggerName:org.apache.cassandra.streaming.compress.CompressedStreamReader
> file:CompressedStreamReader.java:115  message:[Stream
> cc2f84d0-aa3e-11e7-a2fc-fd45c9a98801] Error while reading partition
> DecoratedKey(-9223372036854775808, ) from stream on ks='ks1' and
> table='cf1'.  throwable:
>
> level:WARN  thread:STREAM-IN-/10.142.1.11  @timestamp:2017-10-06T02:44:21.568+
> hostname:ip-10-142-1-10.compute.internal
> loggerName:org.apache.cassandra.streaming.StreamSession
> file:StreamSession.java:626  message:[Stream
> #cc2f84d0-aa3e-11e7-a2fc-fd45c9a98801] Retrying for following error
> throwable:java.lang.AssertionError: null
>   at org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:106)
>   at java.io.InputStream.read(InputStream.java:170)
>   at java.io.InputStream.skip(InputStream.java:224)
>   at org.apache.cassandra.streaming.StreamReader.drain(StreamReader.java:158)
>   at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:129)
>   at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48)
>   at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38)
>   at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
>   at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250)
>   at java.lang.Thread.run(Thread.java:745)
>
> The assertion fails at:
> https://github.com/apache/cassandra/blob/cassandra-2.1.13/src/java/org/apache/cassandra/streaming/compress/CompressedInputStream.java#L106
>
> Could someone please tell me in which scenarios this happens?
>
> Thanks in advance!!
>
> Regards
> Varun Barala
>