Re: Query on Performance Dip

2024-04-05 Thread Jon Haddad
Try changing the chunk length parameter in the compression settings to 4KB, and reduce read-ahead to 16KB if you’re using EBS or 4KB if you’re using decent local SSD or NVMe. Counters perform a read before each write. -- Jon Haddad Rustyrazorblade Consulting rustyrazorblade.com On Fri, Apr 5, 2024 at 9:27 AM
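
As a small worked example of the read-ahead sizes mentioned above: if read-ahead is set with `blockdev --setra` (an assumption, the message does not name a tool), the value is given in 512-byte sectors rather than kilobytes. The sketch below only does that unit conversion; the helper name `readahead_sectors` is made up for illustration.

```python
SECTOR_BYTES = 512  # blockdev --setra counts in 512-byte sectors


def readahead_sectors(readahead_kb: int) -> int:
    """Convert a read-ahead size in KB to a count of 512-byte sectors."""
    return readahead_kb * 1024 // SECTOR_BYTES


# The two sizes suggested in the message above.
for label, kb in (("EBS", 16), ("local SSD/NVMe", 4)):
    print(f"{label}: {kb}KB read-ahead = {readahead_sectors(kb)} sectors")
```

The chunk length itself is a per-table compression option (`chunk_length_in_kb` in recent Cassandra versions, to the best of my knowledge), so it is changed with an `ALTER TABLE` rather than at the block-device level.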

Re: Query on Performance Dip

2024-04-05 Thread Subroto Barua via user
Follow-up question on the performance issue with 'counter writes': is there a parameter or condition that limits the allocation rate for 'CounterMutationStage'? I see 13-18 MB/s for 4.1.4 vs 20-25 MB/s for 4.0.5. The back-end infra is the same for both clusters, with the same test cases and data model. On

Re: Query on Performance Dip

2024-03-30 Thread Jon Haddad
Hi, Unfortunately, the numbers you're posting have no meaning without context. The speculative retries could be the cause of a problem, or you could simply be executing enough queries and you have a fairly high variance in latency which triggers them often. It's unclear how many queries / second

Re: Query on Performance Dip

2024-03-30 Thread ranju goel
Hi All, While debugging the cluster for the performance dip seen with 4.1.4, I found a high speculative retries value in nodetool tablestats during read operations. I ran the tablestats command below and checked its output every few seconds, and noticed that the retries keep rising. Also

Re: Query on Performance Dip

2024-03-27 Thread Subroto Barua via user
we are seeing similar perf issues with counter writes - to reproduce: cassandra-stress counter_write n=10 no-warmup cl=LOCAL_QUORUM -rate threads=50 -mode native cql3 user= password= -name op rate: 39,260 ops (4.1) and 63,689 ops (4.0) latency 99th percentile: 7.7ms (4.1) and 1.8ms

Re: Query on version 4.1.3

2024-01-11 Thread Luciano Greiner
We are about to do the same upgrade, although aiming for v4.1.2. Highly interested in this topic as well. Luciano Greiner On Thu, Jan 11, 2024 at 4:13 AM ranju goel wrote: > > Hi Everyone, > > We are planning to upgrade from 4.0.11 to 4.1.3, the main motive of upgrading > is 4.0.11 going EOS in

Re: Query on Token range

2023-06-10 Thread ranju goel
Thanks, it helped, but I am also looking for a way to get the total number of token ranges assigned to that node, which I am currently doing manually (by subtracting) using nodetool ring. Best Regards Ranju On Fri, Jun 9, 2023 at 12:50 PM guo Maxwell wrote: > I think nodetool info with --token may do
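
One way to avoid counting by hand is to count the token lines per address in captured `nodetool ring` output. The sketch below assumes each token is printed on its own line starting with the owning node's address, which is how I understand the output format; verify against your version. The sample text and the helper name `count_tokens` are hypothetical.

```python
def count_tokens(ring_output: str, address: str) -> int:
    """Count lines of `nodetool ring` output whose first field is `address`."""
    count = 0
    for line in ring_output.splitlines():
        fields = line.split()
        if fields and fields[0] == address:
            count += 1
    return count


# Hypothetical, abbreviated sample; real output has more columns per line.
SAMPLE = """\
Address    Rack   Status State   Load     Owns    Token
10.0.0.1   rack1  Up     Normal  1.2 GiB  33.3%   -9100000000000000000
10.0.0.2   rack1  Up     Normal  1.1 GiB  33.3%   -3000000000000000000
10.0.0.1   rack1  Up     Normal  1.2 GiB  33.3%   2500000000000000000
"""

print(count_tokens(SAMPLE, "10.0.0.1"))
```

Since each token a node holds marks the end of one primary range, the count of tokens is also the count of primary token ranges for that node.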

Re: Query on Token range

2023-06-09 Thread guo Maxwell
I think nodetool info with --token may be of some help. On Fri, Jun 9, 2023 at 15:09, ranju goel wrote: > Hi everyone, > > Is there any faster way to calculate the number of token ranges allocated > to a node > (x.y.z.w)? > > I used the manual way by subtracting the last token with the start token > shown in the

Re: Query for Cassandra Driver

2022-12-22 Thread manish khandelwal
Hi Deepti I think you can reach out to https://groups.google.com/a/lists.datastax.com/g/cpp-driver-user. Regards Manish On Fri, Dec 23, 2022 at 12:52 PM Deepti Sharma S via user < user@cassandra.apache.org> wrote: > Hello Team, > > > > Could you please help in answering below query. > > > > >

RE: Query for Cassandra Driver

2022-12-22 Thread Deepti Sharma S via user
Hello Team, Could you please help in answering below query. Regards, Deepti Sharma PMP(r) & ITIL From: Deepti Sharma S via user Sent: 20 December 2022 18:39 To: user@cassandra.apache.org Cc: Nandita Singh S Subject: Query for Cassandra Driver Hello Team, We have an Application following

Re: Query regarding EOS for Cassandra version 3.11.13

2022-12-15 Thread manish khandelwal
3.11.x versions will be maintained till May July 2023. Please refer https://cassandra.apache.org/_/download.html On Thu, Dec 15, 2022, 20:55 Pranav Kumar (EXT) via user < user@cassandra.apache.org> wrote: > Hi Team, > > > > Could you please help us to know when version 3.11.13 is going to be

Re: Query drivertimeout PT2S

2022-11-09 Thread Cédrick Lunven
; > > > *From:* Bowen Song via user > *Sent:* Tuesday, November 8, 2022 1:53 PM > *To:* user@cassandra.apache.org > *Subject:* [EXTERNAL] Re: Query drivertimeout PT2S > > > > This is a mailing list for the Apache Cassandra, and that's not the same > as DataStax Enter

RE: Query drivertimeout PT2S

2022-11-09 Thread Durity, Sean R via user
ou would need to give us more data to guide >those discussions. Sean R. Durity From: Bowen Song via user Sent: Tuesday, November 8, 2022 1:53 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Query drivertimeout PT2S This is a mailing list for the Apache Cassandra, and that's not the s

Re: Query drivertimeout PT2S

2022-11-08 Thread Bowen Song via user
This is a mailing list for the Apache Cassandra, and that's not the same as DataStax Enterprise Cassandra you are using. We may still be able to help here if you could provide more details, such as the queries, table schema, system stats (cpu, ram, disk io, network, and so on), logs, table

Re: Query around Data Modelling -2

2022-07-01 Thread Bowen Song via user
From:* Bowen Song *Sent:* Friday, July 1, 2022 08:48 *To:* user@cassandra.apache.org *Subject:* Re: Query around Data Modelling -2 This message was sent from outside the company. Please do not click links or open attachments unless you recognise the sourc

Re: Query around Data Modelling -2

2022-07-01 Thread MyWorld
> > > <https://www.facebook.com/SkylineCommunications/> > > <https://www.instagram.com/skyline.dataminer/> > > > <https://skyline.be/skyline/awards?utm_source=signature_medium=email_campaign=icon> > > > > > > > > *From:* Bowen Song &g

RE: Query around Data Modelling -2

2022-06-30 Thread Michiel Saelen
From: Bowen Song Sent: Friday, July 1, 2022 08:48 To: user@cassandra.apache.org Subject: Re: Query around Data Modelling -2 This message was sent from outside the company. Please do not click links or open attachments unless you recognise the source of t

Re: Query around Data Modelling -2

2022-06-30 Thread Bowen Song
And why do you do that? On 30/06/2022 16:35, MyWorld wrote: We run major compaction once in a week On Thu, Jun 30, 2022, 8:14 PM Bowen Song wrote: I have noticed this "running a weekly repair and compaction job". What do you mean weekly compaction job? Have you disabled the

Re: Query around Data Modelling -2

2022-06-30 Thread MyWorld
We run major compaction once in a week On Thu, Jun 30, 2022, 8:14 PM Bowen Song wrote: > I have noticed this "running a weekly repair and compaction job". > > What do you mean weekly compaction job? Have you disabled the > auto-compaction on the table and is relying on weekly scheduled >

Re: Query around Data Modelling -2

2022-06-30 Thread Bowen Song
I have noticed this "running a weekly repair and compaction job". What do you mean by weekly compaction job? Have you disabled auto-compaction on the table and are relying on weekly scheduled compactions? Or are you running weekly major compactions? Neither of these sounds right. On 30/06/2022

Re: Query around Data Modelling -2

2022-06-30 Thread MyWorld
Hi Jeff, We are running repair with the -pr option. You are right, it would have no or very minimal impact on reads (considering that data now has to be read from 2 levels instead of 3). But my guess is that there is no negative impact from model2. On Thu, Jun 30, 2022, 7:41 PM Jeff Jirsa wrote: >

Re: Query around Data Modelling -2

2022-06-30 Thread Jeff Jirsa
How are you running repair? -pr? Or -st/-et? 4.0 gives you real incremental repair which helps. Splitting the table won’t make reads faster. It will increase the potential parallelization of compaction. > On Jun 30, 2022, at 7:04 AM, MyWorld wrote: > >  > Hi all, > > Another query around

Re: Query around Data Modelling

2022-06-22 Thread MyWorld
Thanks a lot Jeff, Michiel and Manish for your replies. Really helpful. On Thu, Jun 23, 2022, 9:50 AM Jeff Jirsa wrote: > This is assuming each row is like … I dunno 10-1000 bytes. If you’re > storing like a huge 1mb blob use two tables for sure. > > On Jun 22, 2022, at 9:06 PM, Jeff Jirsa

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
This is assuming each row is like … I dunno 10-1000 bytes. If you’re storing like a huge 1mb blob use two tables for sure. > On Jun 22, 2022, at 9:06 PM, Jeff Jirsa wrote: > >  > > Ok so here’s how I would think about this > > The writes don’t matter. (There’s a tiny tiny bit of nuance in

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
Ok so here’s how I would think about this The writes don’t matter. (There’s a tiny tiny bit of nuance in one table where you can contend adding to the memtable but the best cassandra engineers on earth probably won’t notice that unless you have really super hot partitions, so ignore the

Re: Query around Data Modelling

2022-06-22 Thread MyWorld
Hi Jeff, Let me know how the number of rows has an impact here. Maybe today I have 80-100 rows per partition, but what if I start storing 2-4k rows per partition? The total partition size would still be under 100 MB. On Thu, Jun 23, 2022, 7:18 AM Jeff Jirsa wrote: > How many rows per partition in each

RE: Query around Data Modelling

2022-06-22 Thread Michiel Saelen
I guess it will depend on your use case. If your columns for table1 and table2 are significant in size it might be the case that model 2 is faster and you could perform queries in parallel, but … If you always need to retrieve both the row from table1 and table2, then both queries together might

Re: Query around Data Modelling

2022-06-22 Thread manish khandelwal
Table1 should be fine. If some column values are not entered, then Cassandra will not create entries for them, so the partition will be almost the same in both cases. On Thu, Jun 23, 2022, 07:08 MyWorld wrote: > Hi all, > > Just a small query around data Modelling. > Suppose we have to design the data model

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
How many rows per partition in each model? > On Jun 22, 2022, at 6:38 PM, MyWorld wrote: > >  > Hi all, > > Just a small query around data Modelling. > Suppose we have to design the data model for 2 different use cases which will > query the data on same set of (partion+clustering key). So

Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - the answer was spark.cassandra.input.split.sizeInMB. The default value is 512MBytes.  Setting this to 50 resulted in a lot more splits and the job ran in under 11 minutes; no timeout errors.  In this case the job was a simple count.  10 minutes 48 seconds for over 8.2 billion rows. 
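
The effect of the split size is mostly arithmetic: the number of Spark partitions is roughly the table size divided by the split size. The sketch below shows that estimate for the two settings mentioned above. The 1 TiB table size is an assumption made for illustration (the thread does not state it), and the connector works from size estimates, so real counts will differ.

```python
import math


def estimate_splits(table_size_mb: float, split_size_mb: float) -> int:
    """Rough number of Spark partitions for a given input split size."""
    if split_size_mb <= 0:
        raise ValueError("split size must be positive")
    return math.ceil(table_size_mb / split_size_mb)


TABLE_SIZE_MB = 1024 * 1024  # hypothetical 1 TiB table, not from the thread

for split_mb in (512, 50):
    print(f"split size {split_mb} MB -> ~{estimate_splits(TABLE_SIZE_MB, split_mb)} splits")
```

More, smaller splits mean each task reads less data per query, which is consistent with the timeouts disappearing once the value was lowered.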

Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - I believe that for large tables, the spark.cassandra.read.timeoutMS needs to be very long; like 4 hours or longer.  The job now runs much longer, but still doesn't complete.  I'm now facing this all too familiar error: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException:

Re: Query timed out after PT2M

2022-02-07 Thread Joe Obernberger
Some more info.  Tried different GC strategies - no luck. It only happens on large tables (more than 1 billion rows).  Works fine on a 300million row table.  There is very high CPU usage during the run. I've tried setting spark.dse.continuousPagingEnabled to false and I've tried setting

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger
I've tried several different GC settings - but still getting timeouts. Using openJDK 11 with: -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=24 -XX:ConcGCThreads=24 Machine has 40

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger
Still no go.  Oddly, I can use trino and do a count OK, but with spark I get the timeouts.  I don't believe tombstones are an issue: nodetool cfstats doc.doc Total number of tables: 82 Keyspace : doc     Read Count: 1514288521     Read Latency: 0.5080819034089475 ms

Re: Query timed out after PT2M

2022-02-03 Thread manish khandelwal
It may be the case that you have lots of tombstones in this table, which makes reads slow and causes timeouts during bulk reads. On Fri, Feb 4, 2022, 03:23 Joe Obernberger wrote: > So it turns out that number after PT is increments of 60 seconds. I > changed the timeout to 96, and now I get PT16M

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
So it turns out that number after PT is increments of 60 seconds.  I changed the timeout to 96, and now I get PT16M (96/6).  Since I'm still getting timeouts, something else must be wrong. Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
I did find this: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md And "spark.cassandra.read.timeoutMS" is set to 12. Running a test now, and I think that is it.  Thank you Scott. -Joe On 2/3/2022 3:19 PM, Joe Obernberger wrote: Thank you Scott! I am

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
Thank you Scott! I am using the spark cassandra connector.  Code: SparkSession spark = SparkSession     .builder()     .appName("SparkCassandraApp")     .config("spark.cassandra.connection.host", "chaos")    

Re: Query timed out after PT2M

2022-02-03 Thread C. Scott Andreas
Hi Joe, it looks like "PT2M" may refer to a timeout value that could be set by your Spark job's initialization of the client. I don't see a string matching this in the Cassandra codebase itself, but I do see that this is parseable as a Duration.```jshell>

Re: Query timed out after PT1M

2021-04-13 Thread Bowen Song
Ouch, counters. Counters in Cassandra have pretty bad performance compared to everything else in Cassandra or to counters (and their equivalents, such as integer types) in other mainstream databases, and they are often inaccurate too. I personally would recommend against the use of counters in

Re: Query timed out after PT1M

2021-04-13 Thread Joe Obernberger
Interestingly, I just tried creating two CqlSession objects and when I use both instead of a single CqlSession for all queries, the 'No Node available to execute query' no longer happens.  In other words, if I use a different CqlSession for updating the doc.seq table, it works.  If that

Re: Query timed out after PT1M

2021-04-13 Thread Joe Obernberger
Thank you Bowen - I wasn't familiar with PT1M. I'm doing the following: update doc.seq set doccount=doccount+? where id=? Which runs OK. Immediately following the update, I do: select doccount from doc.seq where id=? It is the above statement that is throwing the error under heavy load. The

Re: Query timed out after PT1M

2021-04-13 Thread Bowen Song
The error message is clear, it was a DriverTimeoutException, and it was because the query timed out after one minute. Note: "PT1M" means a period of one minute, see https://en.wikipedia.org/wiki/ISO_8601#Durations If you need help from
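
To make the `PT1M` notation concrete, here is a minimal sketch that converts the time-only subset of ISO 8601 durations (`PT…H…M…S`) into seconds. The function name `parse_pt_duration` is made up for this example, and date components such as days or months are deliberately not handled.

```python
import re

# Matches only the time part of an ISO 8601 duration, e.g. PT2S, PT1M, PT1H30M.
_PT_PATTERN = re.compile(
    r"^PT"
    r"(?:(?P<h>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<m>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def parse_pt_duration(text: str) -> float:
    """Return the number of seconds in a PT...H...M...S duration string."""
    match = _PT_PATTERN.match(text)
    # A bare "PT" matches the pattern but carries no value, so reject it.
    if match is None or text == "PT":
        raise ValueError(f"not a supported ISO 8601 time duration: {text!r}")
    hours = float(match.group("h") or 0)
    minutes = float(match.group("m") or 0)
    seconds = float(match.group("s") or 0)
    return hours * 3600 + minutes * 60 + seconds


for value in ("PT2S", "PT1M", "PT2M"):
    print(value, "=", parse_pt_duration(value), "seconds")
```

Read this way, the thread subjects "PT2S", "PT1M" and "PT2M" on this page are timeouts of two seconds, one minute and two minutes respectively.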

Re: Query data through python using IN clause

2020-04-02 Thread Nitan Kainth
Thanks Alex. On Thu, Apr 2, 2020 at 1:39 AM Alex Ott wrote: > Hi > > Working code is below, but I want to warn you - prefer not to use IN with > partition keys - because you'll have different partition key values, > coordinator node will need to perform queries to other hosts that hold > these

Re: Query data through python using IN clause

2020-04-02 Thread Alex Ott
Hi Working code is below, but I want to warn you - prefer not to use IN with partition keys - because you'll have different partition key values, the coordinator node will need to perform queries to other hosts that hold these partition keys, and this slows down the operation, and adds an additional

Re: Query timeouts after Cassandra Migration

2020-02-07 Thread Reid Pinchback
To: "user@cassandra.apache.org" Subject: Re: Query timeouts after Cassandra Migration Message from External Sender So do you advise copying tokens in such cases ? What procedure is advisable ? Specifically for your case with 3 nodes + RF=3, it won't make a difference so leave it as it is

Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Erick Ramirez
> > So do you advise copying tokens in such cases ? What procedure is > advisable ? > Specifically for your case with 3 nodes + RF=3, it won't make a difference so leave it as it is. > Latency increased on target cluster. > Have you tried to run a trace of the queries which are slow? It will

Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Ankit Gadhiya
Thanks Eric. So do you advise copying tokens in such cases ? What procedure is advisable ? Latency increased on target cluster. I’d double check on storage disks but it should be same. — Ankit On Thu, Feb 6, 2020 at 9:07 PM Erick Ramirez wrote: > I didn’t copy tokens since it’s an identical

Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Erick Ramirez
> > I didn’t copy tokens since it’s an identical cluster and we have RF as 3 > on 3 node cluster. Is it still needed , why? > In C*, same number of nodes alone isn't enough. Clusters aren't really identical unless token assignments are the same. In your case though since each node has a full copy

Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Ankit Gadhiya
Hi Michael, Thanks for your response. I didn’t copy tokens since it’s an identical cluster and we have RF as 3 on 3 node cluster. Is it still needed , why? Don’t see anything in cassandra log as such. I don’t have debugs enabled. Thanks & Regards, Ankit On Thu, Feb 6, 2020 at 1:47 PM Michael

Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Michael Shuler
Did you copy the tokens from cluster1 to new cluster2? Same Cassandra version, same instance type/size? What do the logs say on cluster2 that looks different from the cluster1 norm? There are a number of possible `nodetool` utilities that may help see what is happening on new cluster2. Michael

Re: Query failure

2019-03-14 Thread Léo FERLIN SUTTON
I checked and the configuration file matched on all the nodes. I checked `cqlsh --cqlversion "3.4.0" -u cassandra_superuser -p my_password nodeXX 9042` with each node and finally one failed. It had somehow not been restarted since the configuration change. It was not responsive to `systemctl

Re: Query failure

2019-03-14 Thread Sam Tunnicliffe
Hi Leo my guess would be that your configuration is not consistent across all nodes in the cluster. The responses you’re seeing are totally indicative of being connected to a node where PasswordAuthenticator is not enabled in cassandra.yaml. Thanks, Sam > On 14 Mar 2019, at 10:56, Léo

Re: Query With Limit Clause

2018-11-07 Thread shalom sagges
Thanks a lot for the info :) On Tue, Nov 6, 2018 at 11:11 AM DuyHai Doan wrote: > Cassandra will execute such request using a Partition Range Scan. > > See more details here http://www.doanduyhai.com/blog/?p=13191, chapter E > Cluster Read Path (look at the formula of Concurrency Factor) > > >

Re: Query on Data Modelling of a specific usecase

2017-04-20 Thread Naresh Yadav
Hi Jon, Thanks for your guidance. In above mentioned table i can have different scale depending on Report. One report may have 1 rows. Second report may have half million rows. Third report may have 1 million rows. Fourth report may have 10 million rows. As this is timeseries data that was

Re: Query on Data Modelling of a specific usecase

2017-04-19 Thread Jon Haddad
How much data do you plan to store in each table? I’ll be honest, this doesn’t sound like a Cassandra use case at first glance. 1 table per report x 1000 is going to be a bad time. Odds are with different queries, you’ll need multiple views, so lets call that a handful of tables per report.

Re: Query on Data Modelling of a specific usecase

2017-04-19 Thread Naresh Yadav
Looking for cassandra expert's recommendation on above usecase, please reply. On Mon, Apr 17, 2017 at 7:37 PM, Naresh Yadav wrote: > Hi all, > > This is my existing table configured on apache-cassandra-3.0.9: > > CREATE TABLE report_id1 ( >mc_id text, >tag_id text,

RE: Query on Cassandra clusters

2017-01-03 Thread SEAN_R_DURITY
[mailto:sumit.anve...@gmail.com] Sent: Wednesday, December 21, 2016 3:47 PM To: user@cassandra.apache.org Subject: Re: Query on Cassandra clusters Thank you Alain for the detailed explanation. To answer you question on Java version, JVM settings and Memory usage. We are using using 1.8.0_45

Re: Query

2016-12-30 Thread Work
means Sql part is missing and needs to be handled. > > > > Can anyone please tell me some big name who is using Cassandra for handling > its huge data sets like Twitter etc. > > > > Sent from Outlook > > > > From: Edward Capriolo <edlinuxg...@gmail.c

RE: Query

2016-12-30 Thread SEAN_R_DURITY
studies, too, with their enterprise version of Cassandra: http://www.datastax.com/resources/casestudies Sean Durity From: Sikander Rafiq [mailto:hafiz_ra...@hotmail.com] Sent: Friday, December 30, 2016 8:00 AM To: user@cassandra.apache.org Subject: Re: Query Thanks for your comments/suggestions

Re: Query

2016-12-30 Thread Sikander Rafiq
2016 5:53 AM To: user@cassandra.apache.org Subject: Re: Query You should start with understanding your needs. Once you understand your need you can pick the software that fits your need. Starting with a software stack is backwards. On Thu, Dec 29, 2016 at 11:34 PM, Ben Slater <ben.sla...@

Re: Query

2016-12-29 Thread Edward Capriolo
You should start with understanding your needs. Once you understand your need you can pick the software that fits your need. Starting with a software stack is backwards. On Thu, Dec 29, 2016 at 11:34 PM, Ben Slater wrote: > I wasn’t familiar with Gizzard either so I

Re: Query

2016-12-29 Thread Ben Slater
I wasn’t familiar with Gizzard either so I thought I’d take a look. The first thing on their GitHub README is: *NB: This project is currently not recommended as a base for new consumers.* (And no commits since 2013) So, Cassandra definitely looks like a better choice as your datastore for a new

Re: Query

2016-12-29 Thread Manoj Khangaonkar
I am not that familiar with gizzard, but with gizzard + mysql you have multiple moving parts in the system that need to be managed separately. You'll need the mysql expert for mysql and the gizzard expert to manage the distributed part. It can be argued that long term this will have higher

Re: Query on Cassandra clusters

2016-12-21 Thread Sumit Anvekar
Thank you Alain for the detailed explanation. To answer your question on Java version, JVM settings and Memory usage. We are using 1.8.0_45. precisely >java -version java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build

Re: Query on Cassandra clusters

2016-12-21 Thread Alain RODRIGUEZ
Hi Sumit, 1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra > version 3.0.3 and then newer 5 nodes have 3.6.0 version. I strongly recommend to: - Stick with one version of Apache Cassandra per cluster. - Always be as close as possible from the last minor release of

Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Anyways, thanks for your reply. On Thu, Apr 28, 2016 at 1:59 PM, Hannu Kröger wrote: > Ok, then I don’t understand the problem. > > Hannu > > On 28 Apr 2016, at 11:19, Siddharth Verma > wrote: > > Hi Hannu, > > Had the issue been caused due to

Re: Query regarding spark on cassandra

2016-04-28 Thread Hannu Kröger
Ok, then I don’t understand the problem. Hannu > On 28 Apr 2016, at 11:19, Siddharth Verma > wrote: > > Hi Hannu, > > Had the issue been caused due to read, the insert, and delete statement would > have been erroneous. > "I saw the stdout from web-ui of spark,

Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Hi Hannu, Had the issue been caused due to read, the insert, and delete statement would have been erroneous. "I saw the stdout from web-ui of spark, and the query along with true was printed for both the queries.". The statements were correct as seen on the UI. Thanks, Siddharth Verma On Thu,

Re: Query regarding spark on cassandra

2016-04-28 Thread Hannu Kröger
Hi, could it be consistency level issue? If you use ONE for reads and writes, might be that sometimes you don't get what you are writing. See: https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html Br, Hannu 2016-04-27 20:41 GMT+03:00 Siddharth Verma

Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Edit: 1. dc2 node has been removed. nodetool status shows only active nodes. 2. Repair done on all nodes. 3. Cassandra restarted Still it doesn't solve the problem. On Thu, Apr 28, 2016 at 9:00 AM, Siddharth Verma < verma.siddha...@snapdeal.com> wrote: > Hi, If the info could be used > we

Re: Query regarding spark on cassandra

2016-04-27 Thread Siddharth Verma
Hi, If the info could be used we are using two DCs dc1 - 3 nodes dc2 - 1 node however, dc2 has been down for 3-4 weeks, and we haven't removed it yet. spark slaves on same machines as the cassandra nodes. each node has two instances of slaves. spark master on a separate machine. If anyone could

Re: Query regarding CassandraJavaRDD while running spark job on cassandra

2016-03-24 Thread Kai Wang
I suggest you post this to spark-cassandra-connector list. On Sat, Mar 12, 2016 at 12:52 AM, Siddharth Verma < verma.siddha...@snapdeal.com> wrote: > In cassandra I have a table with the following schema. > > CREATE TABLE my_keyspace.my_table1 ( > col_1 text, > col_2 text, > col_3

Re: Query Consistency Issues...

2015-12-15 Thread Paulo Motta
What cassandra and driver versions are you running? It may be that the second update is getting the same timestamp as the first, or even a lower timestamp if it's being processed by another server with unsynced clock, so that update may be getting lost. If you have high frequency updates in the

Re: Query Consistency Issues...

2015-12-15 Thread James Carman
On Tue, Dec 15, 2015 at 2:57 PM Paulo Motta wrote: > What cassandra and driver versions are you running? > > We are using 2.1.7.1 > It may be that the second update is getting the same timestamp as the > first, or even a lower timestamp if it's being processed by

Re: Query Consistency Issues...

2015-12-15 Thread Steve Robenalt
I agree with Jon. It's almost a statistical certainty that such updates will be processed out of order some of the time because the clock sync between machines will never be perfect. Depending on how your actual code that shows this problem is structured, there are ways to reduce or eliminate

Re: Query Consistency Issues...

2015-12-15 Thread Jonathan Haddad
High volume updates to a single key in a distributed system that relies on a timestamp for conflict resolution is not a particularly great idea. If you ever do this from multiple clients you'll find unexpected results at least some of the time. On Tue, Dec 15, 2015 at 12:41 PM Paulo Motta
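
The lost-update behaviour discussed in this thread can be sketched with a toy last-write-wins cell. This is a simplified model, not Cassandra's implementation: the `Cell` class and `apply_write` function are hypothetical, and on equal timestamps this sketch simply keeps the existing value, whereas the real tie-breaking rules are more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    timestamp_us: int  # write timestamp in microseconds


def apply_write(current: Optional[Cell], incoming: Cell) -> Cell:
    """Keep whichever write carries the higher timestamp (last write wins)."""
    if current is None or incoming.timestamp_us > current.timestamp_us:
        return incoming
    return current  # simplification: ties and older timestamps lose


# Client A writes first. Client B writes afterwards in wall-clock time,
# but its clock runs 5 ms behind, so its timestamp is lower.
cell = apply_write(None, Cell("first", timestamp_us=1_000_000))
cell = apply_write(cell, Cell("second", timestamp_us=995_000))

print(cell.value)  # the later update was silently discarded
```

The second write arrives later but is ignored because its timestamp is older, which is the failure mode the replies describe for high-frequency updates to a single key.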

Re: query statement return empty

2015-07-30 Thread Jeff Jirsa
What consistency level are you using with your query? What replication factor are you using on your keyspace? Have you run repair? The most likely explanation is that you wrote with low consistency (ANY, ONE, etc), and that one or more replicas does not have the cell. You’re then reading with
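
A common rule of thumb behind this advice is that a read is guaranteed to overlap the latest write only when the number of replicas written plus the number of replicas read exceeds the replication factor. The sketch below encodes that check; the function names are made up for illustration, and datacenter-aware levels such as LOCAL_QUORUM are left out.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM at replication factor rf."""
    return rf // 2 + 1


def replicas_for(level: str, rf: int) -> int:
    """Replicas that must acknowledge an operation at this consistency level."""
    levels = {
        "ANY": 0,  # a hint alone can satisfy ANY, so no replica is guaranteed
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(rf),
        "ALL": rf,
    }
    return levels[level]


def read_sees_latest_write(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (W + R > RF)."""
    return replicas_for(write_cl, rf) + replicas_for(read_cl, rf) > rf


for w, r in (("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")):
    print(f"write={w} read={r} rf=3 ->", read_sees_latest_write(w, r, 3))
```

With RF=3, writing and reading at ONE leaves room for a read to hit a replica that never received the cell, which matches the empty-result symptom; running repair closes that gap after the fact.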

RE: query statement return empty

2015-07-30 Thread 鄢来琼
@cassandra.apache.org Subject: Re: query statement return empty What consistency level are you using with your query? What replication factor are you using on your keyspace? Have you run repair? The most likely explanation is that you wrote with low consistency (ANY, ONE, etc), and that one or more

Re: Query returning tombstones

2015-05-03 Thread horschi
Hi Jens, thanks a lot for the link! Your ticket seems very similar to my request. kind regards, Christian On Sat, May 2, 2015 at 2:25 PM, Jens Rantil jens.ran...@tink.se wrote: Hi Christian, I just know Sylvain explicitly stated he wasn't a fan of exposing tombstones here:

Re: query contains IN on the partition key and an ORDER BY

2015-05-02 Thread Robert Wille
Bag the IN clause and execute multiple parallel queries instead. It’s more performant anyway. On May 2, 2015, at 11:46 AM, Abhishek Singh Bailoo abhishek.singh.bai...@gmail.com wrote: Hi I have run into the following issue
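
The "multiple parallel queries" approach can be sketched as a fan-out over partition keys. This is only an outline under stated assumptions: `fetch_one` is a hypothetical callable standing in for whatever runs a single-partition query (for example a wrapper around a prepared `SELECT ... WHERE pk = ?`), and any ordering across partitions has to be done on the client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, TypeVar

K = TypeVar("K")
R = TypeVar("R")


def fetch_partitions(
    keys: Iterable[K],
    fetch_one: Callable[[K], R],
    max_workers: int = 8,
) -> Dict[K, R]:
    """Run one single-partition query per key in parallel and collect results."""
    key_list: List[K] = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with key_list.
        results = list(pool.map(fetch_one, key_list))
    return dict(zip(key_list, results))


# Stand-in for a real query: look rows up in an in-memory dict.
FAKE_TABLE = {"a": [3, 1], "b": [2], "c": []}
rows_by_key = fetch_partitions(["a", "b", "c"], lambda key: FAKE_TABLE[key])

# Client-side merge and sort, replacing ORDER BY across partitions.
merged = sorted(row for rows in rows_by_key.values() for row in rows)
print(merged)
```

Each query then goes to a replica that owns the partition, instead of one coordinator gathering every partition named in the IN list.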

Re: Query returning tombstones

2015-05-02 Thread Jens Rantil
Hi Christian, I just know Sylvain explicitly stated he wasn't a fan of exposing tombstones here: https://issues.apache.org/jira/browse/CASSANDRA-8574?focusedCommentId=14292063&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14292063 Cheers, Jens On Wed, Apr 29, 2015

Re: query by column size

2015-02-13 Thread chandra Varahala
I already have a secondary index on that column, but how do I query that column by size? thanks chandra On Fri, Feb 13, 2015 at 3:30 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: There is no automatic indexing in Cassandra. There are secondary indexes, but not for

Re: query by column size

2015-02-13 Thread Tyler Hobbs
On Fri, Feb 13, 2015 at 11:18 AM, chandra Varahala hadoopandcassan...@gmail.com wrote: I already have a secondary index on that column, but how do I query that column by size? You can't. If this is a query that you want to do regularly and efficiently, I suggest creating a second table to

Re: Query strategy with respect to tombstones

2014-12-17 Thread Ryan Svihla
So first, limits are good: the unlimited row count of a user can eventually eat you, which I suspect is what is happening here. You may be better off partitioning your data with some reasonable limits, but this is a bigger domain modeling conversation. Second, tombstone overflowing is typically a canary for a data

Re: query tracing

2014-11-15 Thread Jimmy Lin
Well, we are able to do the tracing under normal load, but not yet able to turn on tracing on demand during heavy load from the client side (because the traffic pattern is hard to predict). Under normal load we saw that most of the query time was spent (in one particular row we focus on) between merging data from

Re: query tracing

2014-11-15 Thread Jens Rantil
Maybe you should try to lower your read repair probability? — Sent from Mailbox On Sat, Nov 15, 2014 at 9:40 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: Well we are able to do the tracing under normal load, but not yet able to turn on tracing on demand during heavy load from client side(due

Re: query tracing

2014-11-15 Thread Jimmy Lin
hi Jen, interesting idea, but I thought read repair happen in background, and so won't affect the actual read request calling from real client. ? On Sat, Nov 15, 2014 at 1:04 AM, Jens Rantil jens.ran...@tink.se wrote: Maybe you should try to lower your read repair probability? — Sent from

Re: query tracing

2014-11-10 Thread Johnny Miller
Be cautious enabling query tracing. Great tool for dev/testing/diagnosing etc.. - but it does persist data to the system_traces keyspace with a TTL of 24 hours and will, as a consequence, consume resources. http://www.datastax.com/dev/blog/advanced-request-tracing-in-cassandra-1-2

Re: query tracing

2014-11-10 Thread DuyHai Doan
As Jonathan said, it's better to activate query tracing client side. It'll give you better flexibility over when to turn tracing on and off, and on which table. Server-side tracing is global (all tables) and probabilistic, and thus may not give a satisfactory level of debugging. Programmatically it's pretty

Re: query tracing

2014-11-07 Thread Robert Coli
On Fri, Nov 7, 2014 at 9:35 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: is there any significant performance penalty if one turn on Cassandra query tracing, through DataStax java driver (say, per every query request of some trouble query)? What does 'significant' mean in your sentence? I'm

Re: query tracing

2014-11-07 Thread Chris Lohfink
It saves a lot of information for each request that's traced, so there is significant overhead. If you start at a low probability and move it up based on the load impact, it will provide a lot of insight and you can control the cost. --- Chris Lohfink On Fri, Nov 7, 2014 at 11:35 AM, Jimmy Lin

Re: query tracing

2014-11-07 Thread Jonathan Haddad
Personally I've found that using query timing + log aggregation on the client side is more effective than trying to mess with tracing probability in order to find a single query which has recently become a problem. I recommend wrapping your session with something that can automatically log the
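
The idea of wrapping the session so that slow queries are logged automatically can be sketched as follows. This is a minimal illustration, not the author's code: `TimedSession` is a hypothetical class, and it wraps any `execute` callable rather than a specific driver's session object. The injectable `clock` parameter exists only to make the wrapper easy to test.

```python
import logging
import time
from typing import Any, Callable, Optional


class TimedSession:
    """Wrap an execute callable and log queries slower than a threshold."""

    def __init__(
        self,
        execute: Callable[..., Any],
        slow_threshold_s: float = 0.5,
        logger: Optional[logging.Logger] = None,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self._execute = execute
        self._slow_threshold_s = slow_threshold_s
        self._logger = logger or logging.getLogger(__name__)
        self._clock = clock

    def execute(self, query: str, *args: Any, **kwargs: Any) -> Any:
        start = self._clock()
        try:
            return self._execute(query, *args, **kwargs)
        finally:
            # Runs for both successful and failed queries.
            elapsed = self._clock() - start
            if elapsed >= self._slow_threshold_s:
                self._logger.warning("slow query (%.3fs): %s", elapsed, query)


# Usage with a stand-in execute function that sleeps briefly.
def fake_execute(query: str) -> str:
    time.sleep(0.01)
    return "rows"


session = TimedSession(fake_execute, slow_threshold_s=0.005)
print(session.execute("SELECT * FROM ks.tbl WHERE id = 1"))
```

With the warnings shipped to a log aggregator, the slow queries can be found after the fact without enabling tracing on the cluster.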

Re: Query returns incomplete result

2014-05-19 Thread Aaron Morton
Calling execute the second time runs the query a second time, and it looks like the query mutates instance state during the pagination. What happens if you only call execute() once ? Cheers Aaron - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache

Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-19 Thread Bryan Talbot
I think there are several issues in your schema and queries. First, the schema can't efficiently return the single newest post for every author. It can efficiently return the newest N posts for a particular author. On Fri, May 16, 2014 at 11:53 PM, 後藤 泰陽 matope@gmail.com wrote: But I

Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-17 Thread 後藤 泰陽
Hello, Thank you for addressing this. But I consider LIMIT to be a keyword that limits the number of results from the WHOLE result set retrieved by the SELECT statement. The result with SELECT.. LIMIT is below. Unfortunately, this is not what I wanted. I wanted the latest posts of each author. (Now I doubt if

Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-17 Thread DuyHai Doan
Clearly with your current data model, having the X latest posts for each author is not possible. However, what about this? CREATE TABLE latest_posts_per_user ( author ascii, latest_post map<uuid,text>, PRIMARY KEY (author) ) The latest_post will keep a collection of X latest posts for

Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-17 Thread Matope Ono
Hmm. Something like a user-managed-index looks the only way to do what I want to do. Thank you, I'll try that. 2014-05-17 18:07 GMT+09:00 DuyHai Doan doanduy...@gmail.com: Clearly with your current data model, having X latest post for each author is not possible. However, what's about this

Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-16 Thread Jonathan Lacefield
Hello, Have you looked at using the CLUSTERING ORDER BY and LIMIT features of CQL3? These may help you achieve your goals. http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

Re: Query on blob col using CQL3

2014-02-28 Thread Mikhail Stepura
Did you try http://cassandra.apache.org/doc/cql3/CQL.html#blobFun ? On 2/28/14, 9:14, Senthil, Athinanthny X. -ND wrote: Anyone can suggest how to query on blob column via CQL3. I get bad request error saying cannot parse data. I want to lookup on key column which is defined as blob. But I
