Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Alexander DEJANOVSKI
Hi Siddharth, yes, we are sure token ranges will never overlap (I think the start token in describering output is excluded and the end token included). You can get per-host information in the DataStax Java driver using: Set rangesForKeyspace = cluster.getMetadata().getTokenRanges(
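To illustrate the point about non-overlapping ranges, here is a minimal sketch that assumes, as suggested above, that each range excludes its start token and includes its end token, and that the range whose start is not below its end wraps around the ring. The ring tokens and the helper names `in_range` and `owners` are hypothetical and are not part of nodetool or the Java driver.

```python
def in_range(token, start, end):
    """Return True if token lies in the half-open range (start, end].

    A range with start >= end is treated as wrapping around the ring.
    """
    if start < end:
        return start < token <= end
    # Wrapping range: everything above start plus everything up to end.
    return token > start or token <= end


def owners(token, ranges):
    """Return every range that contains the token."""
    return [r for r in ranges if in_range(token, r[0], r[1])]


# Hypothetical ring with three tokens; each range runs from the
# previous token (exclusive) to the current token (inclusive).
ring = [-100, 0, 100]
ranges = [(ring[i - 1], ring[i]) for i in range(len(ring))]

for t in (-150, -100, -50, 0, 50, 100, 150):
    print(t, owners(t, ranges))
```

Under these assumptions every token, including the boundary tokens themselves, falls into exactly one range, which is what makes the ranges non-overlapping.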

RE: How to get information of each read/write request?

2016-08-30 Thread Jun Wu
Hi Chris, Thank you so much for the reply. Tracing in cqlsh gives only very high-level information, and I need more detailed information. The ticket is exactly what I want: the waiting time for the thread pool queue. Actually I do want to the waiting time

RE: How to get information of each read/write request?

2016-08-30 Thread Jun Wu
Hi Matija, Thank you so much for the reply. Zipkin seems to be a very useful tool and I'll take a close look at it. Meanwhile, I know how to use JConsole to get the information exposed via JMX. However, in JConsole, I need to click the refresh button to get the updated

Re: Can Sqoop be used in cassandra only cluster for data migration?

2016-08-30 Thread G P
You can look into using the open Talend big data package. -- Sent from the myMail app for Android. Tuesday, 30 August 2016, 09:42PM +01:00 from Amit Trivedi tria...@gmail.com: I am working on a POC and would like to move data from a relational database to

Can Sqoop be used in cassandra only cluster for data migration?

2016-08-30 Thread Amit Trivedi
I am working on a POC and would like to move data from a relational database to Cassandra. I was wondering if I can use Sqoop for this, since it is a one-time thing and it would be easy to just give a select query to Sqoop to pull data from the relational database. However, it looks like I need to set up

Re: How to get information of each read/write request?

2016-08-30 Thread Chris Lohfink
Running a query with trace (`TRACING ON` in cqlsh) can give you a lot of the information for an individual request. There has been a ticket to track time in queue (https://issues.apache.org/jira/browse/CASSANDRA-8398) but no one has worked on it yet. Chris On Tue, Aug 30, 2016 at 12:20 PM, Jun Wu

Re: How to get information of each read/write request?

2016-08-30 Thread Matija Gobec
Hi Jun, If you are looking to track each request, Zipkin is your best bet. The Last Pickle has a blog post about tracing using Zipkin. Regarding the stats you see in nodetool, did you check the

Re: Read timeouts on primary key queries

2016-08-30 Thread Joseph Tech
On further analysis, this issue happens only on one table in the KS, the one with the most reads. @Atul, I will look at system health, but didn't see anything standing out in the GC logs (using JDK 1.8_92 with G1GC). @Patrick, could you please elaborate on the "mismatch on node count + RF" part. On Tue,

How to get information of each read/write request?

2016-08-30 Thread Jun Wu
Hi there, I'm very interested in the read/write path of Cassandra. Specifically, I'd like to know the whole process when a read/write request comes in. I noticed that each request can go through multiple stages. For example, a read request could be in ReadStage,

ApacheCon Seville CFP closes September 9th

2016-08-30 Thread Rich Bowen
It's traditional. We wait for the last minute to get our talk proposals in for conferences. Well, the last minute has arrived. The CFP for ApacheCon Seville closes on September 9th, which is less than 2 weeks away. It's time to get your talks in, so that we can make this the best ApacheCon yet.

Select..IN query specs

2016-08-30 Thread Atul Saroha
I understand that *IN* queries are only allowed on clustering columns. I just want to understand: why are they not allowed on non-primary-key columns with ALLOW FILTERING when the "where clause" also contains all the partition keys? Can someone guide me to a DOC/Blog for better understanding

Re: Read timeouts on primary key queries

2016-08-30 Thread Atul Saroha
There could be many reasons for this if it is intermittent. Check CPU usage and I/O wait status: as reads are I/O intensive, your IOPS requirement should be met under the load at that time. It could be a heap issue if the CPU is busy with GC only. Network health could also be the reason. So it is better to look at system health during the time when

Re: Read timeouts on primary key queries

2016-08-30 Thread Joseph Tech
Hi Patrick, The nodetool status shows all nodes up and normal now. In the OpsCenter "Event Log", some nodes are reported as going down/up during the timeframe of the timeouts, but these are Search workload nodes from the remote (non-local) DC. The RF is 3 and there are 9 nodes per DC.

Re: Read Repairs and CL

2016-08-30 Thread Ben Slater
Thanks Sam - a couple of subtleties there that we missed in our review. Cheers Ben On Tue, 30 Aug 2016 at 19:42 Sam Tunnicliffe wrote: > Just to clarify a little further, it's true that read repair queries are > performed at CL ALL, but this is slightly different to a regular,

Re: Read Repairs and CL

2016-08-30 Thread Sam Tunnicliffe
Just to clarify a little further, it's true that read repair queries are performed at CL ALL, but this is slightly different to a regular, user-initiated query at that CL. Say you have RF=5 and you issue a read at CL ALL, the coordinator will send requests to all 5 replicas and block until it

Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Siddharth Verma
Hi, can we be sure that the token ranges in nodetool describering will be non-overlapping? Thanks Siddharth Verma

Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Siddharth Verma
Hi Alex, Thanks for your reply. I saw describering yesterday, but didn't know about "the first endpoint being the primary". Thanks for that. Is there any way to get the same information in the application? If there isn't any way to get the same information at the application layer, I would be using this as a

Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Alexander DEJANOVSKI
Hi Siddharth, I would recommend running "nodetool describering keyspace_name" as its output is much simpler to reason about: Schema Version:9a091b4e-3712-3149-b187-d2b09250a19b TokenRange: TokenRange(start_token:1943978523300203561, end_token:2137919499801737315, endpoints:[127.0.0.3,

Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Siddharth Verma
Hi, I saw that in cassandra-driver-core (3.1.0), Metadata.TokenMap has primaryToTokens, which holds the values for ALL the nodes. I tried to find the (primary) range ownership for nodes in one DC, and executed the following in debug mode in the IDE. TreeMap primaryTokenMap = new TreeMap<>();

Re: Bootstrapping multiple C* nodes in AWS

2016-08-30 Thread Jeff Jirsa
Related reading: https://issues.apache.org/jira/browse/CASSANDRA-2434 and https://issues.apache.org/jira/browse/CASSANDRA-7069 From: Ben Slater Reply-To: "user@cassandra.apache.org" Date: Monday, August 29, 2016 at 11:48 PM To:

Re: Bootstrapping multiple C* nodes in AWS

2016-08-30 Thread Ben Slater
Hi Aiman, Best practice would be to map Cassandra racks to AWS availability zones. If you are following this, then you would add one node per AZ to keep the number of nodes in each rack balanced. It is technically possible to add multiple nodes simultaneously (at least joining simultaneously -

Bootstrapping multiple C* nodes in AWS

2016-08-30 Thread Aiman Parvaiz
Hi all, I am running C* 2.1.12 in AWS EC2 Classic with RF=3 and vnodes (256 tokens/node). My nodes are distributed across three different availability zones. I want to scale up the cluster; given the data size per node, it takes around 24 hours to add one node. I wanted to know if it's safe to add