How long/how many days will 'nodetool gossipinfo' keep decommissioned node info

2016-09-25 Thread Laxmikanth S
Hi,

Recently we decommissioned nodes from our Cassandra cluster, but even
after nearly 48 hours 'nodetool gossipinfo' still shows the removed nodes
(as LEFT).

I want to recommission the same node again, so I just wanted to know:
will it create a problem if I recommission the same node (same IP) again
while its state is still shown as LEFT in 'nodetool gossipinfo'?


Thanks,
Techpyaasa


Iterating over a table with multiple producers [Python]

2016-09-25 Thread Bhuvan Rawal
Hi,

It's a common occurrence that a full scan of a Cassandra table is required.
One of the most common requirements is to get the count of rows in a table.
As Cassandra doesn't keep count information stored anywhere (a node may have
no clue about writes happening on other nodes), when we aggregate using
count(*) essentially all rows are sent to the coordinator by the other
nodes, which is not really recommended and adds pressure on the
coordinator's heap.

Another common use case is filtering by a cell value on which no secondary
index has been created. It may not be efficient to create a secondary index
for just a one-off requirement; a single full scan can again resolve it.

I worked on scans using a single producer, but it became pretty time
consuming as the table grew large. With motivation from the Java driver test
cases (DataStax Java Driver), I worked on a multi token range scan.

This approach gave pretty interesting results (of course depending on the
client machine and cluster size), and I thought of sharing it with the user
list. We achieved a scan rate in excess of 1.5 million rows per second using
50 workers on a 6 node cluster, which was pretty cool. It can be made faster
on larger clusters and a better client machine. This approach has been
tested on a 710 million row table, and the scan took 473 seconds without
overwhelming the Cassandra nodes.
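
To make the idea concrete, below is a minimal sketch of the approach using
the Python driver (this is not the code from the blog/GitHub links below;
keyspace, table and partition key names such as my_keyspace, my_table and pk
are placeholders). The full Murmur3 token space is split into sub-ranges and
a pool of worker threads scans each range with a token()-bounded query:

from concurrent.futures import ThreadPoolExecutor

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

NUM_WORKERS = 50
NUM_RANGES = 1000                         # more ranges than workers keeps every worker busy
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3Partitioner token space

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # Session is thread-safe, workers can share it

def token_ranges(n):
    # yield (start, end] token bounds covering the whole ring
    step = (MAX_TOKEN - MIN_TOKEN) // n
    start = MIN_TOKEN
    for _ in range(n - 1):
        yield start, start + step
        start += step
    yield start, MAX_TOKEN

def scan_range(bounds):
    start, end = bounds
    stmt = SimpleStatement(
        "SELECT * FROM my_table WHERE token(pk) > %s AND token(pk) <= %s",
        fetch_size=1000)
    count = 0
    for row in session.execute(stmt, (start, end)):
        count += 1  # or filter / process the row here
    return count

with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    total = sum(pool.map(scan_range, token_ranges(NUM_RANGES)))
print("rows scanned:", total)

Splitting into many more ranges than workers keeps the load even when some
token ranges hold far more rows than others.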

This has been discussed in detail on my blog, with sample code on GitHub.
Feel free to reach out if I can help, or if there is a better way to do it.

Regards,
Bhuvan

# Note 1: The paging state reinjection feature has been used in case of an
exception; it is new in the 3.7 driver and makes failover pretty easy (a
rough sketch follows).
# Note 2: This may not beat Spark, but if you don't have Spark
infrastructure set up in locality with Cassandra, this could be a pretty
good way to get things done quickly.
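
As an illustration of that failover note, here is a rough sketch that
assumes the Python driver's paging API (ResultSet.paging_state and the
paging_state argument to Session.execute()); the author's actual
implementation is in the repository linked above. If a range scan fails
part-way, the last saved paging state lets the same query resume near where
it stopped instead of rescanning the whole range:

from cassandra.query import SimpleStatement

def scan_with_resume(session, query, params, fetch_size=1000, max_retries=3):
    # Yields rows for one token range, resuming from the last saved paging
    # state if an error is raised mid-scan.
    stmt = SimpleStatement(query, fetch_size=fetch_size)
    paging_state = None
    retries = 0
    while True:
        try:
            results = session.execute(stmt, params, paging_state=paging_state)
            for row in results:
                yield row
                # remember the current position so a retry can pick up here
                paging_state = results.paging_state
            return
        except Exception:  # real code would catch the driver's specific errors
            retries += 1
            if retries > max_retries:
                raise
            # loop around and re-inject the saved paging state; rows near the
            # failure point may be seen twice, so processing should tolerate
            # the occasional duplicate

The scan_range() worker from the earlier sketch could call this helper
instead of session.execute() directly to get per-range retry with resume.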


Re: regarding drain process

2016-09-25 Thread jason zhao yang
Hi Varun,

It looks like a scheduled job that runs "nodetool drain".

Zhao Yang

Varun Barala wrote on Sunday, September 25, 2016 at 7:45 PM:

> Jeff Jirsa, thanks for your reply!!
>
> We are not using any chef/puppet, and it happens only on one node; the
> other nodes are working fine.
> All machines are using the same AMI image.
>
> Did anybody face such a situation, or have any suggestions?
>
> Thanking you.
>
> On Wed, Jul 27, 2016 at 10:27 PM, Jeff Jirsa 
> wrote:
>
>> Are you running chef/puppet or similar?
>>
>>
>>
>> *From: *Varun Barala 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Tuesday, July 26, 2016 at 10:15 PM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *regarding drain process
>>
>>
>>
>> Hi all,
>>
>>
>> Recently I've been facing a problem with Cassandra nodes. Nodes go down
>> very frequently.
>> I went through system.log and found the reason: somehow C* triggers the
>> *draining process*.
>>
>> I know the purpose of *nodetool drain*, but it should not trigger
>> automatically, right?
>>
>> Or is there any specific setting for this?
>>
>>
>> *We are using C*-2.1.13.*
>>
>> Please let me know if you need more info.
>>
>> Thanking you!!
>>
>> Regards,
>> Varun Barala
>>
>
>


Re: regarding drain process

2016-09-25 Thread Varun Barala
Jeff Jirsa, thanks for your reply!!

We are not using any chef/puppet, and it happens only on one node; the
other nodes are working fine.
All machines are using the same AMI image.

Did anybody face such a situation, or have any suggestions?

Thanking you.

On Wed, Jul 27, 2016 at 10:27 PM, Jeff Jirsa 
wrote:

> Are you running chef/puppet or similar?
>
>
>
> *From: *Varun Barala 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Tuesday, July 26, 2016 at 10:15 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *regarding drain process
>
>
>
> Hi all,
>
>
> Recently I've been facing a problem with Cassandra nodes. Nodes go down
> very frequently.
> I went through system.log and found the reason: somehow C* triggers the
> *draining process*.
>
> I know the purpose of *nodetool drain*, but it should not trigger
> automatically, right?
>
> Or is there any specific setting for this?
>
>
> *We are using C*-2.1.13.*
>
> Please let me know if you need more info.
>
> Thanking you!!
>
> Regards,
> Varun Barala
>