RE: [EXTERNAL] Re: Cassandra migration from 1.25 to 3.x

2019-06-17 Thread Durity, Sean R
The advice so far is exactly correct for an in-place kind of upgrade. The blog 
post you mentioned is different. They decided to jump versions in Cassandra by 
standing up a new cluster and using a dual-write/dual-read process for their 
app. They also wrote code to read and interpret sstables in order to migrate 
existing data. Getting that right with compaction running, data consistency, 
etc. is not easy. That is what Cassandra does, of course. They had to reverse
engineer that process.

I would not personally take that path as it seems a more difficult way to go -- 
for the DBA/admin. It is a nice path for the development team, though. They 
only had to look at their reads and writes (already encapsulated in a DAO) for 
the dual clusters. In a multi-upgrade scenario, drivers and statements probably 
have to be upgraded at several steps along the way (likely including a move from 
Thrift to CQL), and more app testing is required at each upgrade. So, the 
decision has to be based on which resources you have and trust (app dev and 
testing + Cassandra upgrades or data migration and testing). Once you have 
automated/semi-automated Cassandra upgrades in place, that is an easier path, 
but that company obviously hadn't invested there.

Sean Durity

-Original Message-
From: Michael Shuler  On Behalf Of Michael Shuler
Sent: Monday, June 17, 2019 8:26 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra migration from 1.25 to 3.x

First and foremost, read NEWS.txt from your current version to the version you 
wish to upgrade to. There are too many details that you may need to be aware 
of. For instance, in the 2.0.0 Upgrading notes:

https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_cassandra_blob_cassandra-2D3.11_NEWS.txt-23L1169-2DL1178=DwIDaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=CPWy2XBaDFHzVDapNO4E7kLIFFfbkRd8KqftSrypjSU=D3Y18E9gxewpushCMETjHt9cS8lKvLMrhUdhPriF4Dk=

I assume you meant 1.2.5, so your first step is to upgrade to at least
1.2.9 (I would suggest using the latest 1.2.x, which is 1.2.19). Then you can go to 
2.0.x and up.

Practicing on a scratch cluster is valuable experience. Reading the upgrade 
notes in NEWS.txt is a must.
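
For anyone new to in-place upgrades, the per-node loop for each version hop looks 
roughly like this sketch (package name, service name and paths are illustrative; 
adjust for your install method, and re-read the NEWS.txt and cassandra.yaml changes 
for that hop first):

# one hop, run node by node while the rest of the cluster stays up
nodetool drain                      # flush memtables and stop accepting writes
sudo service cassandra stop
sudo yum install cassandra-1.2.19   # or apt/tarball, whichever matches your environment
# merge any cassandra.yaml changes called out in NEWS.txt before restarting
sudo service cassandra start
nodetool upgradesstables            # rewrite sstables into the new on-disk format
nodetool status                     # wait for this node to show UN before moving to the next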

--
Kind regards,
Michael

On 6/17/19 3:34 AM, Anurag Sharma wrote:
> Thanks Alex,
>
> I came across some interesting and efficient ways of upgrading from
> 1.x to 3.x as described in the blog here
> <…eezilkha/database-migration-at-scale-ae85c14c3621> and others. Was curious if someone has
> open-sourced their custom utility.  :D
>
> Regards
> Anurag
>
> On Mon, Jun 17, 2019 at 1:27 PM Oleksandr Shulgin
> mailto:oleksandr.shul...@zalando.de>> wrote:
>
> On Mon, Jun 17, 2019 at 9:30 AM Anurag Sharma
> mailto:anurag.rp.sha...@gmail.com>> wrote:
>
>
> We are upgrading Cassandra from 1.25 to 3.X. Just curious if
> there is any recommended open source utility for the same.
>
>
> Hi,
>
> The "recommended  open source utility" is the Apache Cassandra
> itself. ;-)
>
> Given the huge difference between the major versions, though, you
> will need a decent amount of planning and preparation to
> successfully complete such a migration.  Most likely you will want
> to do it in small steps, first upgrading to the latest minor version
> in the 1.x series, then making a jump to 2.x, then to 3.0, and only
> then to 3.x if you really mean to.  On each upgrade step, be sure to
> examine the release notes carefully to understand if there is any
> impact for your cluster and/or client applications.  Do have a test
> system with preferably identical setup and configuration and execute
> the upgrade steps there first to verify your expectations.
>
> Good luck!
> --
> Alex
>


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org





Re: [EXTERNAL] Re: Sstableloader

2019-05-30 Thread Goetz, Anthony
It appears you have two goals you are trying to accomplish at the same time.  
My recommendation is to break it into two different steps.  You need to decide 
if you are going to upgrade DSE or OSS.


  *   Upgrade DSE then migrate to OSS
 *   Upgrade DSE to version that matches OSS 3.11.3 binary
 *   Perform datacenter switch
  *   Migrate to OSS then upgrade
 *   Migrate to OSS using version that matches DSE Cassandra binary (DSE 
5.0.7 = 3.0.11)
 *   Upgrade OSS to 3.11.3 binary
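
Either way, a quick sanity check of which Cassandra binary a node is actually 
running (works the same on DSE and OSS; keyspace-less, so safe to run anywhere):

nodetool version
# or, from cqlsh:
SELECT release_version FROM system.local;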

From: Rahul Reddy 
Date: Thursday, May 30, 2019 at 6:37 AM
To: Cassandra User List 
Cc: Anthony Goetz 
Subject: [EXTERNAL] Re: Sstableloader

Thank you Anthony and Jonathan. To add a new ring, it doesn't have to be the same 
version of Cassandra, right? For example, DSE 5.12, which is 3.11.0, has sstables with 
the 'mc' format name, and Apache 3.11.3 also uses the 'mc' sstable format. We should 
still be able to add it to the ring, correct?

On Wed, May 29, 2019, 9:55 PM Goetz, Anthony 
mailto:anthony_goe...@comcast.com>> wrote:
My team migrated from DSE to OSS a few years ago by doing a datacenter switch.  
You will need to update the replication strategy from EverywhereStrategy to 
NetworkTopologyStrategy for all keyspaces that use it before adding any OSS nodes.  
As Jonathan mentioned, DSE nodes will revert this change on restart.  To account for 
this, we modified our init script to call a cql script that would make sure the 
keyspaces were set back to NetworkTopologyStrategy.
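
The replication change described here is a plain ALTER KEYSPACE; a sketch, with the 
datacenter names and replication factors as placeholders:

ALTER KEYSPACE dse_system
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

Repeat for every keyspace still on EverywhereStrategy (and, as noted, expect DSE to 
put it back on restart until the init-script guard is in place).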

High Level Plan:

  *   Find DSE Cassandra binary version
  *   Review config to make sure you are not using any DSE specific settings
  *   Update replication strategy on keyspaces using Everywhere to 
NetworkTopologyStrategy
  *   Add OSS DC using same binary version as DSE
  *   Migrate clients to new OSS DC
  *   Decommission DSE DC

Note:  OpsCenter will stop working once you add OSS nodes.

From: Jonathan Koppenhofer mailto:j...@koppedomain.com>>
Reply-To: Cassandra User List 
mailto:user@cassandra.apache.org>>
Date: Wednesday, May 29, 2019 at 6:45 PM
To: Cassandra User List 
mailto:user@cassandra.apache.org>>
Subject: [EXTERNAL] Re: Sstableloader

Has anyone tried to do a DC switch as a means to migrate from Datastax to OSS? 
This would be the safest route as the ability to revert back to Datastax is 
easy. However, I'm curious how the dse_system keyspace would be replicated to 
OSS using their custom Everywhere strategy. You may have to change the to 
Network topology strategy before firing up OSS nodes. Also, keep in mind if you 
restart any DSE nodes, it will revert that keyspace back to EverywhereStrategy.

I also posted a means to migrate in place on this mailing list a few months 
back (thanks for help from others on the mailing list), but it is a little more 
involved and risky. Let me know if you can't find it, and I'll dig it up.

Finally, DSE 5.0 is equivalent to open source 3.0.x. I recommend you go to OSS 3.0 
then up to 3.11.
On Wed, May 29, 2019, 5:56 PM Nitan Kainth 
mailto:nitankai...@gmail.com>> wrote:
If the Cassandra version is the same, it should work.
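
For reference, the sstableloader invocation is along these lines (target hosts and 
the data directory path are placeholders; the table directory name includes a 
generated ID suffix):

sstableloader -d 10.0.0.1,10.0.0.2 /var/lib/cassandra/data/my_keyspace/my_table-<table_id>/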

Regards,
Nitan
Cell: 510 449 9629

On May 28, 2019, at 4:21 PM, Rahul Reddy 
mailto:rahulreddy1...@gmail.com>> wrote:
Hello,

Does sstableloader work between DataStax and Apache Cassandra? I'm trying to 
migrate DSE 5.0.7 to Apache 3.11.1.


RE: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Durity, Sean R
This may sound a bit harsh, but I teach my developers that if they are trying 
to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for 
its high availability and scalability characteristics. We love no downtime. 
ALLOW FILTERING is breaking the rules of availability and scalability.

Look at the full text of the error (not just the ending):
Bad Request: Cannot execute this query as it might involve data filtering and 
thus may have unpredictable performance. If you want to execute this query 
despite the performance unpredictability, use ALLOW FILTERING.
It is being polite, but it does warn you that performance is unpredictable. I 
can predict this: allow filtering will not scale. It won’t scale to large 
numbers of nodes (with small tables) or to large numbers of rows (regardless of 
node count). If you ignore the admittedly too polite warning, Cassandra will 
try to answer your query. It does it with a brute force, scan everything 
approach on all nodes (because you didn’t give it any partitions to target 
directly). That gets expensive and dangerous quickly. And, yes, it can endanger 
the whole cluster.

As an administrator, I do think that Cassandra should be able to protect itself 
better, perhaps by letting the administrator disallow such queries entirely. 
It does at least warn you.
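
To make the contrast concrete, an illustrative pair of queries (table and column 
names are made up):

-- scans every partition on every node; the pattern being warned about here
SELECT * FROM shop.orders WHERE status = 'OPEN' ALLOW FILTERING;

-- goes straight to the replicas that own one partition, and keeps scaling as nodes are added
SELECT * FROM shop.orders WHERE customer_id = 42;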


From: Attila Wind 
Sent: Tuesday, May 28, 2019 4:47 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to 
prevent such behavior?


Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine

Then maybe the best rules to follow are these:
a) never(!) run a query "ALLOW FILTERING" on a Production cluster
b) if you need these queries build a test cluster (somehow) and mirror the data 
(somehow) OR add denormalized tables (write + code complexity overhead) to 
fulfill those queries

Can we agree on this one maybe as a "good to follow" policy?

In our case, luckily, users = developers always, so I can expect them to be aware 
of the consequences of a particular query.
We also have test data fully mirrored into a test cluster, so running those 
queries on the test system is possible.
Plus, if for whatever reason we really need to run such a query in Prod, I can 
simply instruct them to test a query like this on the test system first.

cheers
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 28. 8:59, shalom sagges wrote:
Hi Attila,

I'm definitely no guru, but I've experienced several cases where people at my 
company used allow filtering and caused major performance issues.
As data size increases, the impact will be stronger. If you have large 
partitions, performance will decrease.
GC can be affected. And if GC stops the world for too long or too many times, you 
will feel it.

I sincerely believe the best way would be to educate the users and remodel the 
data. Perhaps you need to denormalize your tables or at least use secondary 
indices (I prefer to keep it as simple as possible and denormalize).
If it's a cluster for analytics, perhaps you need to build a designated cluster 
only for that so if something does break or get too pressured, normal 
activities wouldn't be affected, but there are pros and cons for that idea too.

Hope this helps.

Regards,


On Tue, May 28, 2019 at 9:43 AM Attila Wind 
 wrote:

Hi Gurus,

Looks we stopped this thread. However I would be very much curious answers 
regarding b) ...

Anyone any comments on that?
I do see this as a potential production outage risk now... Especially as we are 
planning to run analysis queries by hand exactly like that over the cluster...

thanks!
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 11:42, shalom sagges wrote:
a) Interesting... But only in case you do not provide partitioning key right? 
(so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I using ALLOW 
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data to multiple tables or at least create an index 
on the requested column (preferably queried together with a known partition 
key).
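
As a sketch of that suggestion (hypothetical table and column names, continuing the 
example above):

-- secondary index, intended to be queried together with a known partition key
CREATE INDEX IF NOT EXISTS orders_status_idx ON shop.orders (status);
SELECT * FROM shop.orders WHERE customer_id = 42 AND status = 'OPEN';

-- or denormalize into a query-specific table instead of indexing
CREATE TABLE shop.orders_by_status (
    status text, order_id uuid, customer_id bigint,
    PRIMARY KEY ((status), order_id)
);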


b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness 
to external requests" behavior... Even if servers are busy with the request 
seriously becoming non-responsive...?

I think it can justify the unresponsiveness. 

RE: [EXTERNAL] Re: Python driver consistency problem

2019-05-28 Thread Durity, Sean R
This is a stretch, but are you using authentication and/or authorization? In my 
understanding the queries executed for you to do the authentication and/or 
authorization are usually done at LOCAL_ONE (or QUORUM for cassandra user), but 
maybe there is something that has changed in the security setup? Any UDTs or 
triggers involved in the query? To me, your error seems more like a query being 
executed “for you” instead of your actual query.


Sean Durity


From: Vlad 
Sent: Wednesday, May 22, 2019 6:53 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Python driver consistency problem

That's the issue - I do not use consistency ALL. I set QUORUM or ONE but it 
still performs with ALL.

On Wednesday, May 22, 2019 12:42 PM, shalom sagges 
mailto:shalomsag...@gmail.com>> wrote:

In a lot of cases, the issue is with the data model.
Can you describe the table?
Can you provide the query you use to retrieve the data?
What's the load on your cluster?
Are there lots of tombstones?

You can set the consistency level to ONE, just to check if you get responses. 
Although normally I would never use ALL unless I run a DDL command.
I prefer local_quorum if I want my consistency to be strong while keeping 
Cassandra's high availability.
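
A quick way to try that from cqlsh (keyspace, table and key are placeholders), since 
the consistency level there is set per session rather than in application code:

CONSISTENCY LOCAL_QUORUM;   -- cqlsh session command, not CQL
SELECT * FROM my_keyspace.my_table WHERE id = 123;
CONSISTENCY ONE;
SELECT * FROM my_keyspace.my_table WHERE id = 123;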

Regards,











Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-16 Thread Ahmed Eljami
The issue is fixed with nodetool scrub; now both rows are under the same
clustering.
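
For anyone hitting the same thing, the commands involved here were of the form 
(keyspace, table and sstable path are placeholders):

sstabledump /path/to/mc-1-big-Data.db   # inspect the rows/tombstones in a given sstable
nodetool scrub my_keyspace my_table     # rewrite the sstables; in this case it merged the split rows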

I'll open a jira to analyze the source of this issue with Cassandra 3.11.3

Thanks.

Le jeu. 16 mai 2019 à 04:53, Jeff Jirsa  a écrit :

> I don’t have a good answer for you - I don’t know if scrub will fix this
> (you could copy an sstable offline and try it locally in ccm) - you may
> need to delete and reinsert, though I’m really interested in knowing how
> this happened if you weren’t ever exposed to #14008.
>
> Can you open a JIRA? If your sstables aren’t especially sensitive,
> uploading them would be swell. Otherwise , an anonymized JSON dump may be
> good enough for whichever developer looks at fixing this
>
> --
> Jeff Jirsa
>
>
> On May 15, 2019, at 7:27 PM, Ahmed Eljami  wrote:
>
> Jeff, In this case is there any solution to resolve that directly in the
> sstable (compact, scrub...) or we have to apply a batch on the client level
> (delete a partition and re write it)?
>
> Thank you for your reply.
>
> Le mer. 15 mai 2019 à 18:09, Ahmed Eljami  a
> écrit :
>
>> effectively, this was written in 2.1.14 and we upgrade to 3.11.3 so we
>> should not be impacted by this issue ?!
>> thanks
>>
>>


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
I don’t have a good answer for you - I don’t know if scrub will fix this (you 
could copy an sstable offline and try it locally in ccm) - you may need to 
delete and reinsert, though I’m really interested in knowing how this happened 
if you weren’t ever exposed to #14008. 

Can you open a JIRA? If your sstables aren’t especially sensitive, uploading 
them would be swell. Otherwise , an anonymized JSON dump may be good enough for 
whichever developer looks at fixing this 

-- 
Jeff Jirsa


> On May 15, 2019, at 7:27 PM, Ahmed Eljami  wrote:
> 
> Jeff, In this case is there any solution to resolve that directly in the 
> sstable (compact, scrub...) or we have to apply a batch on the client level 
> (delete a partition and re write it)?
> 
> Thank you for your reply. 
> 
>> Le mer. 15 mai 2019 à 18:09, Ahmed Eljami  a écrit :
>> effectively, this was written in 2.1.14 and we upgrade to 3.11.3 so we 
>> should not be impacted by this issue ?!
>> thanks
>> 


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Jeff, in this case is there any solution to resolve that directly in the
sstable (compact, scrub...) or do we have to apply a batch at the client level
(delete the partition and rewrite it)?

Thank you for your reply.

Le mer. 15 mai 2019 à 18:09, Ahmed Eljami  a écrit :

> effectively, this was written in 2.1.14 and we upgrade to 3.11.3 so we
> should not be impacted by this issue ?!
> thanks
>
>


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Effectively, this was written in 2.1.14 and we upgraded to 3.11.3, so we
should not be impacted by this issue?!
thanks


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-14008

If this was written in 2.1/2.2 and you upgraded to 3.0.x (x < 16) or 
3.1-3.11.1, could be this issue. 

-- 
Jeff Jirsa


> On May 15, 2019, at 8:43 AM, Ahmed Eljami  wrote:
> 
> What about this part of the dump:
> 
> "type" : "row",
> "position" : 4123,
> "clustering" : [ "", "Token", "abcd", "" ],
> "cells" : [
>   { "name" : "dvalue", "value" : "", "tstamp" : 
> "2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" : 
> "2020-04-27T17:20:31Z", "expired" : false } 
> 
> Why we don't have a liveness_info for this row ?
> 
> Thanks
> 
>> Le mer. 15 mai 2019 à 17:40, Ahmed Eljami  a écrit :
>> Hi Sean,
>> Thanks for reply,
>> I'm agree with you about uniquness but when  the output of sstabledump show 
>> that we have the same value for the column g => "clustering" : [ "", 
>> "Token", "abcd", "" ], 
>> and when we select with the whole primary key with the valuers wich I see in 
>> the sstable, cqlsh return 2 rows..
>> 
>>> Le mer. 15 mai 2019 à 17:27, Durity, Sean R  a 
>>> écrit :
>>> Uniqueness is determined by the partition key PLUS the clustering columns. 
>>> Hard to tell from your data below, but is it possible that one of the 
>>> clustering columns (perhaps g) has different values? That would easily 
>>> explain the 2 rows returned – because they ARE different rows in the same 
>>> partition. In your data model, make sure you need all the clustering 
>>> columns to determine uniqueness or you will indeed have more rows than you 
>>> might expect.
>>> 
>>>  
>>> 
>>> Sean Durity
>>> 
>>>  
>>> 
>>>  
>>> 
>>> From: Ahmed Eljami  
>>> Sent: Wednesday, May 15, 2019 10:56 AM
>>> To: user@cassandra.apache.org
>>> Subject: [EXTERNAL] Two separate rows for the same partition !!
>>> 
>>>  
>>> 
>>> Hi guys,
>>> 
>>>  
>>> 
>>> We have a strange problem with the data in cassandra, after inserting twice 
>>> the same partition with differents columns, we see that cassandra returns 2 
>>> rows on cqlsh rather than one...:
>>> 
>>>  
>>> 
>>> a| b| c| d| f| g| h| i| j| k| l
>>> 
>>> --++---+--+---+-++---+--++
>>> 
>>> |bbb|  rrr| | Token | abcd|| False | 
>>> {'expiration': '1557943260838', 'fname': 'WS', 'freshness': 
>>> '1556299239910'} |   null |   null
>>> 
>>> |bbb|  rrr| | Token | abcd||  null |
>>>  null | 
>>>|   null
>>> 
>>>  
>>> 
>>> With the primary key = PRIMARY KEY ((a, b, c), d, e, f, g)
>>> 
>>>  
>>> 
>>> On the sstable we have the following data:
>>> 
>>>  
>>> 
>>> [
>>>   {
>>> "partition" : {
>>>   "key" : [ "", "bbb", "rrr" ],
>>>   "position" : 3760
>>> },
>>> "rows" : [
>>>   {
>>> "type" : "range_tombstone_bound",
>>> "start" : {
>>>   "type" : "inclusive",
>>>   "clustering" : [ "", "Token", "abcd", "*" ],
>>>   "deletion_info" : { "marked_deleted" : 
>>> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
>>> }
>>>   },
>>>   {
>>> "type" : "range_tombstone_bound",
>>> "end" : {
>>>   "type" : "exclusive",
>>>   "clustering" : [ "", "Token", "abcd", "" ],
>>>   "deletion_info" : { "marked_deleted" : 
>>> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
>>> }
>>>   },
>>>   {
>>> "type" : "row",
>>> "position" : 3974,
>>> "clustering" : [ "", "Token", "abcd", "" ],
>>> "liveness_info" : { "tstamp" : "2019-04-26T17:20:39.910Z", "ttl" : 
>>> 31708792, "expires_at" : "2020-04-27T17:20:31Z", "expired" : false },
>>> "cells" : [
>>>   { "name" : "connected", "value" : false },
>>>   { "name" : "dattrib", "deletion_info" : { "marked_deleted" : 
>>> "2019-04-26T17:20:39.90Z", "local_delete_time" : "2019-04-26T17:20:39Z" 
>>> } },
>>>   { "name" : "dattrib", "path" : [ "expiration" ], "value" : 
>>> "1557943260838" },
>>>   { "name" : "dattrib", "path" : [ "fname" ], "value" : "WS" },
>>>   { "name" : "dattrib", "path" : [ "freshness" ], "value" : 
>>> "1556299239910" }
>>> ]
>>>   },
>>>   {
>>> "type" : "row",
>>> "position" : 4123,
>>> "clustering" : [ "", "Token", "abcd", "" ],
>>> "cells" : [
>>>   { "name" : "dvalue", "value" : "", "tstamp" : 
>>> "2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" : 
>>> "2020-04-27T17:20:31Z", "expired" : false }
>>> ]
>>>   },
>>>   {
>>> "type" : "range_tombstone_bound",
>>> "start" : {
>>>   

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
What about this part of the dump:

"type" : "row",
"position" : 4123,
"clustering" : [ "", "Token", "abcd", "" ],
"cells" : [
  { "name" : "dvalue", "value" : "", "tstamp" :
"2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" :
"2020-04-27T17:20:31Z", "expired" : false }

Why we don't have a *liveness_info* for this row ?

Thanks

Le mer. 15 mai 2019 à 17:40, Ahmed Eljami  a écrit :

> Hi Sean,
> Thanks for reply,
> I'm agree with you about uniquness but when  the output of sstabledump
> show that we have the same value for the column g => "clustering" : [
> "", "Token", "abcd", "" ],
> and when we select with the whole primary key with the valuers wich I see
> in the sstable, cqlsh return 2 rows..
>
> Le mer. 15 mai 2019 à 17:27, Durity, Sean R 
> a écrit :
>
>> Uniqueness is determined by the partition key PLUS the clustering
>> columns. Hard to tell from your data below, but is it possible that one of
>> the clustering columns (perhaps g) has different values? That would easily
>> explain the 2 rows returned – because they ARE different rows in the same
>> partition. In your data model, make sure you need all the clustering
>> columns to determine uniqueness or you will indeed have more rows than you
>> might expect.
>>
>>
>>
>> Sean Durity
>>
>>
>>
>>
>>
>> *From:* Ahmed Eljami 
>> *Sent:* Wednesday, May 15, 2019 10:56 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] Two separate rows for the same partition !!
>>
>>
>>
>> Hi guys,
>>
>>
>>
>> We have a strange problem with the data in cassandra, after inserting
>> twice the same partition with differents columns, we see that cassandra
>> returns 2 rows on cqlsh rather than one...:
>>
>>
>>
>> a| b| c| d| f| g| h| i| j| k| l
>>
>>
>> --++---+--+---+-++---+--++
>>
>> |bbb|  rrr| | Token | abcd|| False |
>> {'expiration': '1557943260838', 'fname': 'WS', 'freshness':
>> '1556299239910'} |   null |   null
>>
>> |bbb|  rrr| | Token | abcd||  null |
>>
>> null ||   null
>>
>>
>>
>> With the primary key = PRIMARY KEY ((a, b, c), d, e, f, g)
>>
>>
>>
>> On the sstable we have the following data:
>>
>>
>>
>> [
>>   {
>> "partition" : {
>>   "key" : [ "", "bbb", "rrr" ],
>>   "position" : 3760
>> },
>> "rows" : [
>>   {
>> "type" : "range_tombstone_bound",
>> "start" : {
>>   "type" : "inclusive",
>>   "clustering" : [ "", "Token", "abcd", "*" ],
>>   "deletion_info" : { "marked_deleted" :
>> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
>> }
>>   },
>>   {
>> "type" : "range_tombstone_bound",
>> "end" : {
>>   "type" : "exclusive",
>>   "clustering" : [ "", "Token", "abcd", "" ],
>>   "deletion_info" : { "marked_deleted" :
>> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
>> }
>>   },
>>   {
>> "type" : "row",
>> "position" : 3974,
>> "clustering" : [ "", "Token", "abcd", "" ],
>> "liveness_info" : { "tstamp" : "2019-04-26T17:20:39.910Z", "ttl"
>> : 31708792, "expires_at" : "2020-04-27T17:20:31Z", "expired" : false },
>> "cells" : [
>>   { "name" : "connected", "value" : false },
>>   { "name" : "dattrib", "deletion_info" : { "marked_deleted" :
>> "2019-04-26T17:20:39.90Z", "local_delete_time" : "2019-04-26T17:20:39Z"
>> } },
>>   { "name" : "dattrib", "path" : [ "expiration" ], "value" :
>> "1557943260838" },
>>   { "name" : "dattrib", "path" : [ "fname" ], "value" : "WS" },
>>   { "name" : "dattrib", "path" : [ "freshness" ], "value" :
>> "1556299239910" }
>> ]
>>   },
>>   {
>> "type" : "row",
>> "position" : 4123,
>> "clustering" : [ "", "Token", "abcd", "" ],
>> "cells" : [
>>   { "name" : "dvalue", "value" : "", "tstamp" :
>> "2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" :
>> "2020-04-27T17:20:31Z", "expired" : false }
>> ]
>>   },
>>   {
>> "type" : "range_tombstone_bound",
>> "start" : {
>>   "type" : "exclusive",
>>   "clustering" : [ "", "Token", "abcd", "" ],
>>   "deletion_info" : { "marked_deleted" :
>> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
>> }
>>   },
>>   {
>> "type" : "range_tombstone_bound",
>> "end" : {
>>   "type" : "inclusive",
>>   "clustering" : [ "", "Token", "abcd", "*" ],
>>   "deletion_info" : { "marked_deleted" :
>> "2019-04-26T17:20:39.909Z", "local_delete_time" 

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Hi Sean,
Thanks for the reply,
I agree with you about uniqueness, but the output of sstabledump shows
that we have the same value for the column g => "clustering" : [ "",
"Token", "abcd", "" ],
and when we select with the whole primary key, with the values which I see
in the sstable, cqlsh returns 2 rows..

Le mer. 15 mai 2019 à 17:27, Durity, Sean R  a
écrit :

> Uniqueness is determined by the partition key PLUS the clustering columns.
> Hard to tell from your data below, but is it possible that one of the
> clustering columns (perhaps g) has different values? That would easily
> explain the 2 rows returned – because they ARE different rows in the same
> partition. In your data model, make sure you need all the clustering
> columns to determine uniqueness or you will indeed have more rows than you
> might expect.
>
>
>
> Sean Durity
>
>
>
>
>
> *From:* Ahmed Eljami 
> *Sent:* Wednesday, May 15, 2019 10:56 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Two separate rows for the same partition !!
>
>
>
> Hi guys,
>
>
>
> We have a strange problem with the data in cassandra, after inserting
> twice the same partition with differents columns, we see that cassandra
> returns 2 rows on cqlsh rather than one...:
>
>
>
> a| b| c| d| f| g| h| i| j| k| l
>
>
> --++---+--+---+-++---+--++
>
> |bbb|  rrr| | Token | abcd|| False |
> {'expiration': '1557943260838', 'fname': 'WS', 'freshness':
> '1556299239910'} |   null |   null
>
> |bbb|  rrr| | Token | abcd||  null |
>
> null ||   null
>
>
>
> With the primary key = PRIMARY KEY ((a, b, c), d, e, f, g)
>
>
>
> On the sstable we have the following data:
>
>
>
> [
>   {
> "partition" : {
>   "key" : [ "", "bbb", "rrr" ],
>   "position" : 3760
> },
> "rows" : [
>   {
> "type" : "range_tombstone_bound",
> "start" : {
>   "type" : "inclusive",
>   "clustering" : [ "", "Token", "abcd", "*" ],
>   "deletion_info" : { "marked_deleted" :
> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
> }
>   },
>   {
> "type" : "range_tombstone_bound",
> "end" : {
>   "type" : "exclusive",
>   "clustering" : [ "", "Token", "abcd", "" ],
>   "deletion_info" : { "marked_deleted" :
> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
> }
>   },
>   {
> "type" : "row",
> "position" : 3974,
> "clustering" : [ "", "Token", "abcd", "" ],
> "liveness_info" : { "tstamp" : "2019-04-26T17:20:39.910Z", "ttl" :
> 31708792, "expires_at" : "2020-04-27T17:20:31Z", "expired" : false },
> "cells" : [
>   { "name" : "connected", "value" : false },
>   { "name" : "dattrib", "deletion_info" : { "marked_deleted" :
> "2019-04-26T17:20:39.90Z", "local_delete_time" : "2019-04-26T17:20:39Z"
> } },
>   { "name" : "dattrib", "path" : [ "expiration" ], "value" :
> "1557943260838" },
>   { "name" : "dattrib", "path" : [ "fname" ], "value" : "WS" },
>   { "name" : "dattrib", "path" : [ "freshness" ], "value" :
> "1556299239910" }
> ]
>   },
>   {
> "type" : "row",
> "position" : 4123,
> "clustering" : [ "", "Token", "abcd", "" ],
> "cells" : [
>   { "name" : "dvalue", "value" : "", "tstamp" :
> "2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" :
> "2020-04-27T17:20:31Z", "expired" : false }
> ]
>   },
>   {
> "type" : "range_tombstone_bound",
> "start" : {
>   "type" : "exclusive",
>   "clustering" : [ "", "Token", "abcd", "" ],
>   "deletion_info" : { "marked_deleted" :
> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
> }
>   },
>   {
> "type" : "range_tombstone_bound",
> "end" : {
>   "type" : "inclusive",
>   "clustering" : [ "", "Token", "abcd", "*" ],
>   "deletion_info" : { "marked_deleted" :
> "2019-04-26T17:20:39.909Z", "local_delete_time" : "2019-04-26T17:20:39Z" }
> }
>   }
> ]
>   }
>
>
>
> what's weired that the two rows with "position" : 3974, and  "position" :
> 4123 should be on the same row...!!
>
> Since, we can't reproduce the issue ...
>
>
>
> Any idea please ?
>
> Thanks.
>
>
>
> --
>
> Cordialement;
>
> Ahmed ELJAMI
>

RE: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Durity, Sean R
Uniqueness is determined by the partition key PLUS the clustering columns. Hard 
to tell from your data below, but is it possible that one of the clustering 
columns (perhaps g) has different values? That would easily explain the 2 rows 
returned – because they ARE different rows in the same partition. In your data 
model, make sure you need all the clustering columns to determine uniqueness or 
you will indeed have more rows than you might expect.

Sean Durity


From: Ahmed Eljami 
Sent: Wednesday, May 15, 2019 10:56 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Two separate rows for the same partition !!

Hi guys,

We have a strange problem with the data in Cassandra: after inserting the same 
partition twice with different columns, we see that Cassandra returns 2 rows in 
cqlsh rather than one:

a| b| c| d| f| g| h| i| j| k| l

--++---+--+---+-++---+--++

|bbb|  rrr| | Token | abcd|| False | 
{'expiration': '1557943260838', 'fname': 'WS', 'freshness': '1556299239910'} |  
 null |   null

|bbb|  rrr| | Token | abcd||  null |
 null |
|   null

With the primary key = PRIMARY KEY ((a, b, c), d, e, f, g)

On the sstable we have the following data:

[
  {
"partition" : {
  "key" : [ "", "bbb", "rrr" ],
  "position" : 3760
},
"rows" : [
  {
"type" : "range_tombstone_bound",
"start" : {
  "type" : "inclusive",
  "clustering" : [ "", "Token", "abcd", "*" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "range_tombstone_bound",
"end" : {
  "type" : "exclusive",
  "clustering" : [ "", "Token", "abcd", "" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "row",
"position" : 3974,
"clustering" : [ "", "Token", "abcd", "" ],
"liveness_info" : { "tstamp" : "2019-04-26T17:20:39.910Z", "ttl" : 
31708792, "expires_at" : "2020-04-27T17:20:31Z", "expired" : false },
"cells" : [
  { "name" : "connected", "value" : false },
  { "name" : "dattrib", "deletion_info" : { "marked_deleted" : 
"2019-04-26T17:20:39.90Z", "local_delete_time" : "2019-04-26T17:20:39Z" } },
  { "name" : "dattrib", "path" : [ "expiration" ], "value" : 
"1557943260838" },
  { "name" : "dattrib", "path" : [ "fname" ], "value" : "WS" },
  { "name" : "dattrib", "path" : [ "freshness" ], "value" : 
"1556299239910" }
]
  },
  {
"type" : "row",
"position" : 4123,
"clustering" : [ "", "Token", "abcd", "" ],
"cells" : [
  { "name" : "dvalue", "value" : "", "tstamp" : 
"2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" : 
"2020-04-27T17:20:31Z", "expired" : false }
]
  },
  {
"type" : "range_tombstone_bound",
"start" : {
  "type" : "exclusive",
  "clustering" : [ "", "Token", "abcd", "" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "range_tombstone_bound",
"end" : {
  "type" : "inclusive",
  "clustering" : [ "", "Token", "abcd", "*" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  }
]
  }

What's weird is that the two rows at "position" : 3974 and "position" : 4123 
should be the same row...!!
Since, we can't reproduce the issue ...

Any idea please ?
Thanks.

--
Cordialement;
Ahmed ELJAMI




RE: [EXTERNAL] Re: Using Cassandra as an object store

2019-04-19 Thread Durity, Sean R
Object stores are some of our largest and oldest use cases. Cassandra has been 
a good choice for us. We do chunk the objects into 64k chunks (I think), so 
that partitions are not too large and it scales predictably. For us, the choice 
was more about high availability and scalability, which Cassandra provides well.
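
For context, a chunked object table generally looks something like this (a sketch 
with illustrative names and chunk size, not the exact schema in use):

CREATE TABLE objstore.blobs (
    object_id uuid,
    chunk_no  int,     -- objects are split into ~64k pieces on the client side
    data      blob,
    PRIMARY KEY ((object_id), chunk_no)
);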

Sean Durity




From: Paul Chandler 
Sent: Friday, April 19, 2019 5:24 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Using Cassandra as an object store

Gene,

I have found that clusters used as object stores have caused me more problems 
than normal in the past, so I recommend using a separate object store if 
possible.

However, it certainly can be done, there is just a few things to consider:

1) Deletion policy: How are these objects going to be deleted, we have had 
problems in the past where deleted objects didn’t get removed from disk. This 
was because by the time they were deleted they had been compacted into very 
large sstables that were rarely compacted again. So think about compaction 
strategy and any tombstone issues you may come across.

2) Compression: Are the objects already compressed before they are stored, e.g. 
jpgs? If so, turn compression off on the table (a one-line ALTER, sketched after 
point 3); this reduces the amount of data read into memory when reading the data, 
reducing pressure on the heap. We did some trials with one system, and found much 
better performance if the compression was performed on the client side. So try some 
tests with that.

3) How often is the data read? There will be completely different hardware 
requirements depending on whether this is an image store for an e-commerce site, 
compared with a pdf store holding client invoices. With a small amount of reads 
per object, you can specify smaller CPU and memory machines with a large 
amount of storage. If there are a large amount of reads, then you need to think 
much more carefully about memory and CPU, as per the Walmart article you 
referenced.
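
On point 2, disabling table-level compression for already-compressed payloads is a 
one-line change (keyspace and table names are hypothetical):

ALTER TABLE objstore.blobs WITH compression = {'enabled': 'false'};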

Thanks

Paul Chandler
www.redshots.com




On 19 Apr 2019, at 09:04, DuyHai Doan 
mailto:doanduy...@gmail.com>> wrote:

Idea:

To guarantee data integrity, you can store an MD5 of all the chunks' data as a static 
column in the partition that contains the chunks.
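
A sketch of that idea against a chunked table like the one shown earlier in this 
thread (names are illustrative):

ALTER TABLE objstore.blobs ADD chunks_md5 text static;   -- one digest per object, i.e. per partition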

On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 
mailto:cclive1...@gmail.com>> wrote:
We have used Cassandra as an object store for some years; you can just split the 
object into some small pieces. The object gets a PK, and the small pieces get their 
own PKs; the object's PK and the pieces' PKs are stored in a meta table in Cassandra, 
and the pieces' PKs and the pieces themselves are stored in a data table. We store 
videos, pictures and other unstructured data.

Gene mailto:gh5...@gmail.com>> wrote on Friday, April 19, 2019 at 1:25 PM:
Howdy

I'm looking at the possibility of using cassandra as an object store to offload 
image/blob data from an Oracle database.  I've seen mentions of it being used 
as an object store in a large scale fashion, like with Walmart:

https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593

However I have found little on small scale setups and if it's even worth using 
Cassandra in place of something else that's meant to be used for object 
storage, like Ceph.

Additionally, I've read that cassandra struggles with storing objects 10MB or 
larger and it's recommended to break objects up into smaller chunks, which 
either requires some kind of middleware between our application and cassandra, 
or it would require our application to split objects into smaller chunks and 
recombine them as needed.

I've looked into pithos and astyanax, but those are both no longer developed 
and I'm not seeing anything that might replace them in the long term.

https://github.com/exoscale/pithos
https://github.com/Netflix/astyanax

Any helpful information or advice would be greatly appreciated.

Thanks in advance.

-Gene


--
you are the apple of my eye !



Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jonathan Koppenhofer
We do multiple nodes per host as a standard practice. In our case, we never
put 2 nodes from a single cluster  on the same host, though as mentioned
before, you could potentially get away with that if you properly use rack
awareness, just be careful of load.

We also do NOT use any other layer of segregation such as docker or VMs, we
just have multiple IPs per host, and bind each IP to a distinct node. We
have looked at VMs and Containers, but they either add abstraction
complexity or some kind of performance penalty.

As for system resources, we dedicate individual SSDs to each node, but
CPU, memory, and network are shared. We are spoiled by good network and
beefy memory, so the only place we have to be careful is CPU. As such, we
pick fairly conservative Cassandra.yaml settings and monitor CPU usage. If
workloads get hot on a particular host, we have some flexibility to move
things around.

In any case, it sounds like you will be fine running 1 node per host. With
that many resources, be sure to tune your nodes to make use of them.

Good luck.

On Thu, Apr 18, 2019, 2:49 PM William R 
wrote:

> hi,
>
> Thank you for your answers, starting with the most important point from
> your answers I understand that
>
> "it is OK to go more than 1 TB in disk usage"
>
> so in this case if I am going to use the 50% of the disk capacity I will
> end up having around 3 TB per node which in this case I will not need to
> use a docker solution which is a very good usa case for us.
>
> The goal of my setup is to save large data volumes in every node (~ 3 TB -
> 50% usage of HD) with the current hardware that we possess. The high
> availability I consider it standard since we are going to have 2 DCs with
> RF3.
>
> I also have to note that Datastax also recommends usage no more than 500
> GB - 1 TB.
>
> Cheers,
>
> Vasilis
>
>
> Sent with ProtonMail <https://protonmail.com> Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Thursday, April 18, 2019 6:56 PM, Jacques-Henri Berthemet <
> jacques-henri.berthe...@genesys.com> wrote:
>
> So how much data can you safely fit per node using SSDs with Cassandra
> 3.11? How much free space do you need on your disks?
>
> There should be some recommendations on node sizes on:
>
> http://cassandra.apache.org/doc/latest/operating/hardware.html
>
> Documentation - Apache Cassandra
> <http://cassandra.apache.org/doc/latest/operating/hardware.html>
> cassandra.apache.org
> The Apache Cassandra database is the right choice when you need
> scalability and high availability without compromising performance. Linear
> scalability and proven fault-tolerance on commodity hardware or cloud
> infrastructure make it the perfect platform for mission-critical data.
> Cassandra's support for replicating across multiple datacenters is
> best-in-class, providing lower latency for your ...
>
>
> --
>
> *From:* Jon Haddad 
> *Sent:* Thursday, April 18, 2019 6:43:15 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] multiple Cassandra instances per server,
> possible?
>
> Agreed with Jeff here.  The whole "community recommends no more than
> 1TB" has been around, and inaccurate, for a long time.
>
> The biggest issue with dense nodes is how long it takes to replace
> them.  4.0 should help with that under certain circumstances.
>
>
> On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa  wrote:
> >
> > Agreed that you can go larger than 1T on ssd
> >
> > You can do this safely with both instances in the same cluster if you
> guarantee two replicas aren’t on the same machine. Cassandra provides a
> primitive to do this - rack awareness through the network topology snitch.
> >
> > The limitation (until 4.0) is that you’ll need two IPs per machine as
> both instances have to run in the same port.
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > On Apr 18, 2019, at 6:45 AM, Durity, Sean R 
> wrote:
> >
> > What is the data problem that you are trying to solve with Cassandra? Is
> it high availability? Low latency queries? Large data volumes? High
> concurrent users? I would design the solution to fit the problem(s) you are
> solving.
> >
> >
> >
> > For example, if high availability is the goal, I would be very cautious
> about 2 nodes/machine. If you need the full amount of the disk – you *can*
> have larger nodes than 1 TB. I agree that administration tasks (like
> adding/removing nodes, etc.) are more painful with large nodes – but not
> impossible. For large amounts of data, I like nodes that have about 2.5 – 3
> TB of usable SSD disk.
> >
> >
> >
> > It is possible

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread William R
hi,

Thank you for your answers, starting with the most important point from your 
answers I understand that

"it is OK to go more than 1 TB in disk usage"

so in this case, if I am going to use 50% of the disk capacity, I will end up 
having around 3 TB per node, which means I will not need to use a docker 
solution; that is a very good use case for us.

The goal of my setup is to save large data volumes in every node (~ 3 TB - 50% 
usage of HD) with the current hardware that we possess. The high availability I 
consider it standard since we are going to have 2 DCs with RF3.

I also have to note that Datastax also recommends using no more than 500 GB - 1 
TB.

Cheers,

Vasilis

Sent with [ProtonMail](https://protonmail.com) Secure Email.

‐‐‐ Original Message ‐‐‐
On Thursday, April 18, 2019 6:56 PM, Jacques-Henri Berthemet 
 wrote:

> So how much data can you safely fit per node using SSDs with Cassandra 3.11? 
> How much free space do you need on your disks?
>
> There should be some recommendations on node sizes on:
>
> http://cassandra.apache.org/doc/latest/operating/hardware.html
>
>
> ---
>
> From: Jon Haddad 
> Sent: Thursday, April 18, 2019 6:43:15 PM
> To: user@cassandra.apache.org
> Subject: Re: [EXTERNAL] multiple Cassandra instances per server, possible?
>
> Agreed with Jeff here.  The whole "community recommends no more than
> 1TB" has been around, and inaccurate, for a long time.
>
> The biggest issue with dense nodes is how long it takes to replace
> them.  4.0 should help with that under certain circumstances.
>
> On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa  wrote:
>>
>> Agreed that you can go larger than 1T on ssd
>>
>> You can do this safely with both instances in the same cluster if you 
>> guarantee two replicas aren’t on the same machine. Cassandra provides a 
>> primitive to do this - rack awareness through the network topology snitch.
>>
>> The limitation (until 4.0) is that you’ll need two IPs per machine as both 
>> instances have to run in the same port.
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Apr 18, 2019, at 6:45 AM, Durity, Sean R  
>> wrote:
>>
>> What is the data problem that you are trying to solve with Cassandra? Is it 
>> high availability? Low latency queries? Large data volumes? High concurrent 
>> users? I would design the solution to fit the problem(s) you are solving.
>>
>>
>>
>> For example, if high availability is the goal, I would be very cautious 
>> about 2 nodes/machine. If you need the full amount of the disk – you *can* 
>> have larger nodes than 1 TB. I agree that administration tasks (like 
>> adding/removing nodes, etc.) are more painful with large nodes – but not 
>> impossible. For large amounts of data, I like nodes that have about 2.5 – 3 
>> TB of usable SSD disk.
>>
>>
>>
>> It is possible that your nodes might be under-utilized, especially at first. 
>> But if the hardware is already available, you have to use what you have.
>>
>>
>>
>> We have done multiple nodes on single physical hardware, but they were two 
>> separate clusters (for the same application). In that case, we had  a 
>> different install location and different ports for one of the clusters.
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> From: William R 
>> Sent: Thursday, April 18, 2019 9:14 AM
>> To: user@cassandra.apache.org
>> Subject: [EXTERNAL] multiple Cassandra instances per server, possible?
>>
>>
>>
>> Hi all,
>>
>>
>>
>> In our small company we have 10 nodes of (2 x 3 TB HD) 6 TB each, 128 GB ram 
>> and 64 cores and we are thinking to use them as Cassandra nodes. From what I 
>> am reading around, the community recommends that every node should not keep 
>> more than 1 TB data so in this case I am wondering if it is possible to 
>> install 2 instances per node using docker so each docker instance can write 
>> to its own physical disk and utilise more efficiently the rest hardware (CP

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jacques-Henri Berthemet
So how much data can you safely fit per node using SSDs with Cassandra 3.11? 
How much free space do you need on your disks?

There should be some recommendations on node sizes on:

http://cassandra.apache.org/doc/latest/operating/hardware.html






From: Jon Haddad 
Sent: Thursday, April 18, 2019 6:43:15 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] multiple Cassandra instances per server, possible?

Agreed with Jeff here.  The whole "community recommends no more than
1TB" has been around, and inaccurate, for a long time.

The biggest issue with dense nodes is how long it takes to replace
them.  4.0 should help with that under certain circumstances.


On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa  wrote:
>
> Agreed that you can go larger than 1T on ssd
>
> You can do this safely with both instances in the same cluster if you 
> guarantee two replicas aren’t on the same machine. Cassandra provides a 
> primitive to do this - rack awareness through the network topology snitch.
>
> The limitation (until 4.0) is that you’ll need two IPs per machine as both 
> instances have to run in the same port.
>
>
> --
> Jeff Jirsa
>
>
> On Apr 18, 2019, at 6:45 AM, Durity, Sean R  
> wrote:
>
> What is the data problem that you are trying to solve with Cassandra? Is it 
> high availability? Low latency queries? Large data volumes? High concurrent 
> users? I would design the solution to fit the problem(s) you are solving.
>
>
>
> For example, if high availability is the goal, I would be very cautious about 
> 2 nodes/machine. If you need the full amount of the disk – you *can* have 
> larger nodes than 1 TB. I agree that administration tasks (like 
> adding/removing nodes, etc.) are more painful with large nodes – but not 
> impossible. For large amounts of data, I like nodes that have about 2.5 – 3 
> TB of usable SSD disk.
>
>
>
> It is possible that your nodes might be under-utilized, especially at first. 
> But if the hardware is already available, you have to use what you have.
>
>
>
> We have done multiple nodes on single physical hardware, but they were two 
> separate clusters (for the same application). In that case, we had  a 
> different install location and different ports for one of the clusters.
>
>
>
> Sean Durity
>
>
>
> From: William R 
> Sent: Thursday, April 18, 2019 9:14 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] multiple Cassandra instances per server, possible?
>
>
>
> Hi all,
>
>
>
> In our small company we have 10 nodes of (2 x 3 TB HD) 6 TB each, 128 GB ram 
> and 64 cores and we are thinking to use them as Cassandra nodes. From what I 
> am reading around, the community recommends that every node should not keep 
> more than 1 TB data so in this case I am wondering if it is possible to 
> install 2 instances per node using docker so each docker instance can write 
> to its own physical disk and utilise more efficiently the rest hardware (CPU 
> & RAM).
>
>
>
> I understand with this setup there is the danger of creating a single point 
> of failure for 2 Cassandra nodes but except that do you think that is a 
> possible setup to start with the cluster?
>
>
>
> Except the docker solution do you recommend any other way to split the 
> physical node to 2 instances? (VMWare? or even maybe 2 separate installations 
> of Cassandra? )
>
>
>
> Eventually we are aiming in a cluster consisted of 2 DCs with 10 nodes each 
> (5 baremetal nodes with 2 Cassandra instances)
>
>
>
> Probably later when we will start introducing more nodes to the cluster we 
> can decommissioning the "double-instaned" ones and aim for a more homogeneous 
> solution..
>
>
>
> Thank you,
>
>
>
> Wil
>
>
> 
>

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jon Haddad
Agreed with Jeff here.  The whole "community recommends no more than
1TB" has been around, and inaccurate, for a long time.

The biggest issue with dense nodes is how long it takes to replace
them.  4.0 should help with that under certain circumstances.


On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa  wrote:
>
> Agreed that you can go larger than 1T on ssd
>
> You can do this safely with both instances in the same cluster if you 
> guarantee two replicas aren’t on the same machine. Cassandra provides a 
> primitive to do this - rack awareness through the network topology snitch.
>
> The limitation (until 4.0) is that you’ll need two IPs per machine as both 
> instances have to run in the same port.
>
>
> --
> Jeff Jirsa
>
>
> On Apr 18, 2019, at 6:45 AM, Durity, Sean R  
> wrote:
>
> What is the data problem that you are trying to solve with Cassandra? Is it 
> high availability? Low latency queries? Large data volumes? High concurrent 
> users? I would design the solution to fit the problem(s) you are solving.
>
>
>
> For example, if high availability is the goal, I would be very cautious about 
> 2 nodes/machine. If you need the full amount of the disk – you *can* have 
> larger nodes than 1 TB. I agree that administration tasks (like 
> adding/removing nodes, etc.) are more painful with large nodes – but not 
> impossible. For large amounts of data, I like nodes that have about 2.5 – 3 
> TB of usable SSD disk.
>
>
>
> It is possible that your nodes might be under-utilized, especially at first. 
> But if the hardware is already available, you have to use what you have.
>
>
>
> We have done multiple nodes on single physical hardware, but they were two 
> separate clusters (for the same application). In that case, we had  a 
> different install location and different ports for one of the clusters.
>
>
>
> Sean Durity
>
>
>
> From: William R 
> Sent: Thursday, April 18, 2019 9:14 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] multiple Cassandra instances per server, possible?
>
>
>
> Hi all,
>
>
>
> In our small company we have 10 nodes of (2 x 3 TB HD) 6 TB each, 128 GB ram 
> and 64 cores and we are thinking to use them as Cassandra nodes. From what I 
> am reading around, the community recommends that every node should not keep 
> more than 1 TB data so in this case I am wondering if it is possible to 
> install 2 instances per node using docker so each docker instance can write 
> to its own physical disk and utilise more efficiently the rest hardware (CPU 
> & RAM).
>
>
>
> I understand with this setup there is the danger of creating a single point 
> of failure for 2 Cassandra nodes but except that do you think that is a 
> possible setup to start with the cluster?
>
>
>
> Except the docker solution do you recommend any other way to split the 
> physical node to 2 instances? (VMWare? or even maybe 2 separate installations 
> of Cassandra? )
>
>
>
> Eventually we are aiming in a cluster consisted of 2 DCs with 10 nodes each 
> (5 baremetal nodes with 2 Cassandra instances)
>
>
>
> Probably later when we will start introducing more nodes to the cluster we 
> can decommissioning the "double-instaned" ones and aim for a more homogeneous 
> solution..
>
>
>
> Thank you,
>
>
>
> Wil
>
>
> 
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jeff Jirsa
Agreed that you can go larger than 1T on ssd

You can do this safely with both instances in the same cluster if you guarantee 
two replicas aren’t on the same machine. Cassandra provides a primitive to do 
this - rack awareness through the network topology snitch. 

The limitation (until 4.0) is that you’ll need two IPs per machine as both 
instances have to run on the same port.
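
For illustration, a rough sketch of that layout, assuming GossipingPropertyFileSnitch; the host names, IPs and rack labels are made up. The idea is one rack name per physical machine, so that rack-aware placement should never put two replicas of the same range on the same box, as long as there are at least as many machines (racks) as the replication factor:

    # cassandra-rackdc.properties for BOTH instances on hypothetical physical host hw1
    dc=DC1
    rack=hw1

    # cassandra.yaml, instance 1 on hw1 (its own IP, default ports)
    endpoint_snitch: GossipingPropertyFileSnitch
    listen_address: 10.0.0.11
    rpc_address: 10.0.0.11

    # cassandra.yaml, instance 2 on hw1 (second IP on the same machine, same ports)
    endpoint_snitch: GossipingPropertyFileSnitch
    listen_address: 10.0.0.12
    rpc_address: 10.0.0.12

With, say, 5 physical machines per DC and RF 3, NetworkTopologyStrategy spreads the three replicas across three different "racks", i.e. three different machines.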


-- 
Jeff Jirsa


> On Apr 18, 2019, at 6:45 AM, Durity, Sean R  
> wrote:
> 
> What is the data problem that you are trying to solve with Cassandra? Is it 
> high availability? Low latency queries? Large data volumes? High concurrent 
> users? I would design the solution to fit the problem(s) you are solving.
>  
> For example, if high availability is the goal, I would be very cautious about 
> 2 nodes/machine. If you need the full amount of the disk – you *can* have 
> larger nodes than 1 TB. I agree that administration tasks (like 
> adding/removing nodes, etc.) are more painful with large nodes – but not 
> impossible. For large amounts of data, I like nodes that have about 2.5 – 3 
> TB of usable SSD disk.
>  
> It is possible that your nodes might be under-utilized, especially at first. 
> But if the hardware is already available, you have to use what you have.
>  
> We have done multiple nodes on single physical hardware, but they were two 
> separate clusters (for the same application). In that case, we had  a 
> different install location and different ports for one of the clusters.
>  
> Sean Durity
>  
> From: William R  
> Sent: Thursday, April 18, 2019 9:14 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] multiple Cassandra instances per server, possible?
>  
> Hi all,
>  
> In our small company we have 10 nodes with 6 TB each (2 x 3 TB HD), 128 GB RAM 
> and 64 cores, and we are thinking of using them as Cassandra nodes. From what I 
> am reading around, the community recommends that every node should not keep 
> more than 1 TB of data, so I am wondering if it is possible to install 2 
> instances per node using docker, so that each docker instance can write to its 
> own physical disk and utilise the rest of the hardware (CPU & RAM) more 
> efficiently.
>  
> I understand that with this setup there is the danger of creating a single 
> point of failure for 2 Cassandra nodes, but apart from that, do you think this 
> is a feasible setup to start the cluster with?
>
> Apart from the docker solution, do you recommend any other way to split the 
> physical node into 2 instances? (VMWare? Or even maybe 2 separate installations 
> of Cassandra?)
>
> Eventually we are aiming for a cluster consisting of 2 DCs with 10 nodes each 
> (5 bare-metal nodes with 2 Cassandra instances each).
>
> Probably later, when we start introducing more nodes to the cluster, we can 
> decommission the "double-instanced" ones and aim for a more homogeneous 
> solution.
>  
> Thank you,
>  
> Wil
> 
> 


RE: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Durity, Sean R
What is the data problem that you are trying to solve with Cassandra? Is it 
high availability? Low latency queries? Large data volumes? High concurrent 
users? I would design the solution to fit the problem(s) you are solving.

For example, if high availability is the goal, I would be very cautious about 2 
nodes/machine. If you need the full amount of the disk – you *can* have larger 
nodes than 1 TB. I agree that administration tasks (like adding/removing nodes, 
etc.) are more painful with large nodes – but not impossible. For large amounts 
of data, I like nodes that have about 2.5 – 3 TB of usable SSD disk.

It is possible that your nodes might be under-utilized, especially at first. 
But if the hardware is already available, you have to use what you have.

We have done multiple nodes on single physical hardware, but they were two 
separate clusters (for the same application). In that case, we had  a different 
install location and different ports for one of the clusters.

Sean Durity

From: William R 
Sent: Thursday, April 18, 2019 9:14 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] multiple Cassandra instances per server, possible?

Hi all,

In our small company we have 10 nodes with 6 TB each (2 x 3 TB HD), 128 GB RAM 
and 64 cores, and we are thinking of using them as Cassandra nodes. From what I 
am reading around, the community recommends that every node should not keep more 
than 1 TB of data, so I am wondering if it is possible to install 2 instances 
per node using docker, so that each docker instance can write to its own 
physical disk and utilise the rest of the hardware (CPU & RAM) more efficiently.

I understand that with this setup there is the danger of creating a single point 
of failure for 2 Cassandra nodes, but apart from that, do you think this is a 
feasible setup to start the cluster with?

Apart from the docker solution, do you recommend any other way to split the 
physical node into 2 instances? (VMWare? Or even maybe 2 separate installations 
of Cassandra?)

Eventually we are aiming for a cluster consisting of 2 DCs with 10 nodes each (5 
bare-metal nodes with 2 Cassandra instances each).

Probably later, when we start introducing more nodes to the cluster, we can 
decommission the "double-instanced" ones and aim for a more homogeneous 
solution.

Thank you,

Wil





Re: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Krishnanand Khambadkone
 Thank you gentlemen for all your responses. Reading through them I was able 
to resolve the issue by doing the following:
a. Creating an index on one of the query fields
b. Setting page size to 200
Now the query runs instantaneously.
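
For reference, roughly what those two settings look like with the DataStax Java driver 3.x; the keyspace, table, column and value below are made up, and 'session' is assumed to be an already-connected Session:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.SimpleStatement;

    // hypothetical keyspace/table; the secondary index is on indexed_col
    SimpleStatement stmt = new SimpleStatement(
            "SELECT * FROM my_ks.my_table WHERE indexed_col = ?", "some-value");
    stmt.setFetchSize(200);             // page size: rows fetched per round trip
    stmt.setReadTimeoutMillis(30000);   // per-statement read timeout
    ResultSet rs = session.execute(stmt);
    for (Row row : rs) {
        // iterating pulls the remaining pages transparently
    }
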
On Wednesday, April 17, 2019, 7:12:21 AM PDT, Shaurya Gupta 
 wrote:  
 
 As already mentioned in this thread, ALLOW FILTERING should be avoided in any 
scenario.
It seems to work in test scenarios, but as soon as the data increases to a 
certain size (a few MBs), it starts failing miserably and fails almost always.
Thanks
Shaurya

On Wed, Apr 17, 2019, 6:44 PM Durity, Sean R  
wrote:

If you are just trying to get a sense of the data, you could try adding a limit 
clause to limit the amount of results and hopefully beat the timeout.

However, ALLOW FILTERING really means "ALLOW ME TO DESTROY MY APPLICATION AND 
CLUSTER." It means the data model does not support the query and will not scale 
-- in this case, not even on one node. Design a new table to support the query 
with a proper partition key (and any clustering keys).


Sean Durity


-Original Message-
From: Dinesh Joshi 
Sent: Wednesday, April 17, 2019 2:39 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Caused by: 
com.datastax.driver.core.exceptions.ReadTimeoutException:

More info with detailed explanation: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.instaclustr.com_apache-2Dcassandra-2Dscalability-2Dallow-2Dfiltering-2Dpartition-2Dkeys_=DwIFAg=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=HXMAEKpR-N5O0-U5rclUrsVk5QPmSUQYels4VTOVZWI=LHl3QGlLsAdszkJ6XK3O2w7_EcSyRyaSFjBgEcK9nfo=

Dinesh

> On Apr 16, 2019, at 11:24 PM, Mahesh Daksha  wrote:
>
> Hi,
>
> How much data are you trying to read in the single query? Is it large in size 
> or normal text data?
> Looking at the exception, it seems the node is unable to deliver data within 
> the stipulated time. I have faced a similar issue with the response data being 
> huge in size (some binary data). But it was solved as we spread the data across 
> multiple rows.
>
> Thanks,
> Mahesh Daksha
>
> On Wed, Apr 17, 2019 at 11:42 AM Krishnanand Khambadkone 
>  wrote:
> Hi,  I have a single instance cassandra server.  I am trying to execute a 
> query with ALLOW FILTERING option.  When I run this same query from cqlsh it 
> runs fine but when I try to execute the query through the java driver it 
> throws this exception.  I have increased all the timeouts in cassandra.yaml 
> file and also included the read timeout option in the SimpleStatement query I 
> am running.  Any idea how I can fix this issue.
> Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: 
> Cassandra timeout during read query at consistency LOCAL_ONE (1 responses 
> were required but only 0 replica responded)
>
>


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org





-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


  

Re: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Shaurya Gupta
As already mentioned in this thread, ALLOW FILTERING should be avoided in
any scenario.

It seems to work in test scenarios, but as soon as the data increases to a
certain size (a few MBs), it starts failing miserably and fails almost
always.

Thanks
Shaurya


On Wed, Apr 17, 2019, 6:44 PM Durity, Sean R 
wrote:

> If you are just trying to get a sense of the data, you could try adding a
> limit clause to limit the amount of results and hopefully beat the timeout.
>
> However, ALLOW FILTERING really means "ALLOW ME TO DESTROY MY APPLICATION
> AND CLUSTER." It means the data model does not support the query and will
> not scale -- in this case, not even on one node. Design a new table to
> support the query with a proper partition key (and any clustering keys).
>
>
> Sean Durity
>
>
> -Original Message-
> From: Dinesh Joshi 
> Sent: Wednesday, April 17, 2019 2:39 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Re: Caused by:
> com.datastax.driver.core.exceptions.ReadTimeoutException:
>
> More info with detailed explanation:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.instaclustr.com_apache-2Dcassandra-2Dscalability-2Dallow-2Dfiltering-2Dpartition-2Dkeys_=DwIFAg=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=HXMAEKpR-N5O0-U5rclUrsVk5QPmSUQYels4VTOVZWI=LHl3QGlLsAdszkJ6XK3O2w7_EcSyRyaSFjBgEcK9nfo=
>
> Dinesh
>
> > On Apr 16, 2019, at 11:24 PM, Mahesh Daksha  wrote:
> >
> > Hi,
> >
> > How much data are you trying to read in the single query? Is it large in
> size or normal text data?
> > Looking at the exception, it seems the node is unable to deliver data
> within the stipulated time. I have faced a similar issue with the response data
> being huge in size (some binary data). But it was solved as we spread the data
> across multiple rows.
> >
> > Thanks,
> > Mahesh Daksha
> >
> > On Wed, Apr 17, 2019 at 11:42 AM Krishnanand Khambadkone <
> kkhambadk...@yahoo.com.invalid> wrote:
> > Hi,  I have a single instance cassandra server.  I am trying to execute
> a query with ALLOW FILTERING option.  When I run this same query from cqlsh
> it runs fine but when I try to execute the query through the java driver it
> throws this exception.  I have increased all the timeouts in cassandra.yaml
> file and also included the read timeout option in the SimpleStatement query
> I am running.  Any idea how I can fix this issue.
> > Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:
> Cassandra timeout during read query at consistency LOCAL_ONE (1 responses
> were required but only 0 replica responded)
> >
> >
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
> 
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


RE: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Durity, Sean R
If you are just trying to get a sense of the data, you could try adding a limit 
clause to limit the amount of results and hopefully beat the timeout.

However, ALLOW FILTERING really means "ALLOW ME TO DESTROY MY APPLICATION AND 
CLUSTER." It means the data model does not support the query and will not scale 
-- in this case, not even on one node. Design a new table to support the query 
with a proper partition key (and any clustering keys).
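As a hedged illustration only (the real schema depends on the actual query), the idea is to denormalize into a table keyed by whatever the query filters on; every name below is hypothetical:

    -- hypothetical example: the query "give me all events for a device" becomes
    CREATE TABLE my_ks.events_by_device (
        device_id  text,
        event_time timestamp,
        event_id   uuid,
        payload    text,
        PRIMARY KEY ((device_id), event_time, event_id)
    ) WITH CLUSTERING ORDER BY (event_time DESC);

    -- single-partition read, no ALLOW FILTERING needed
    SELECT * FROM my_ks.events_by_device WHERE device_id = 'dev-42' LIMIT 100;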


Sean Durity


-Original Message-
From: Dinesh Joshi 
Sent: Wednesday, April 17, 2019 2:39 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Caused by: 
com.datastax.driver.core.exceptions.ReadTimeoutException:

More info with detailed explanation: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.instaclustr.com_apache-2Dcassandra-2Dscalability-2Dallow-2Dfiltering-2Dpartition-2Dkeys_=DwIFAg=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=HXMAEKpR-N5O0-U5rclUrsVk5QPmSUQYels4VTOVZWI=LHl3QGlLsAdszkJ6XK3O2w7_EcSyRyaSFjBgEcK9nfo=

Dinesh

> On Apr 16, 2019, at 11:24 PM, Mahesh Daksha  wrote:
>
> Hi,
>
> How much data are you trying to read in the single query? Is it large in size 
> or normal text data?
> Looking at the exception, it seems the node is unable to deliver data within 
> the stipulated time. I have faced a similar issue with the response data being 
> huge in size (some binary data). But it was solved as we spread the data across 
> multiple rows.
>
> Thanks,
> Mahesh Daksha
>
> On Wed, Apr 17, 2019 at 11:42 AM Krishnanand Khambadkone 
>  wrote:
> Hi,  I have a single instance cassandra server.  I am trying to execute a 
> query with ALLOW FILTERING option.  When I run this same query from cqlsh it 
> runs fine but when I try to execute the query through the java driver it 
> throws this exception.  I have increased all the timeouts in cassandra.yaml 
> file and also included the read timeout option in the SimpleStatement query I 
> am running.  Any idea how I can fix this issue.
> Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: 
> Cassandra timeout during read query at consistency LOCAL_ONE (1 responses 
> were required but only 0 replica responded)
>
>


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org





-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-12 Thread Jean Carlo
I think this jira

https://issues.apache.org/jira/browse/CASSANDRA-9895

Answer my question

Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Fri, Apr 12, 2019 at 10:04 AM Jean Carlo 
wrote:

> Hello Sean
>
> Well this is a little bit confusing. After digging into the doc, I found
> this old documentation of Datastax that says
> "First, we can dynamically adjust behavior depending on the cluster size
> and arrangement. Cassandra prefers to perform batchlog writes to two
> different replicas in the same datacenter as the coordinator."
>
> https://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
>
> Which may explain the message in the timeout. However, I do not know if
> this information is still true. Does someone know if it is?
>
> Reading the comments in
> https://issues.apache.org/jira/browse/CASSANDRA-9620, which say ' *Writing
> the batch log will always be done using CL ONE.*', contradicts what I
> understood from Datastax's doc.
>
> Yes, I understood batches are not for speed. Still, we are using them for a
> consistency need.
>
> @Mahesh Yes we do set the consistency like that
>
> Thank you
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Thu, Apr 11, 2019 at 3:39 PM Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>> https://issues.apache.org/jira/browse/CASSANDRA-9620 has something
>> similar that was determined to be a driver error. I would start with
>> looking at the driver version and also the RetryPolicy that is in effect
>> for the Cluster. Secondly, I would look at whether a batch is really needed
>> for the statements. Cassandra batches are for atomicity – not speed.
>>
>>
>>
>> Sean Durity
>>
>> Staff Systems Engineer – Cassandra
>>
>> MTC 2250
>>
>> #cassandra - for the latest news and updates
>>
>>
>>
>>
>>
>> *From:* Mahesh Daksha 
>> *Sent:* Thursday, April 11, 2019 5:21 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] Re: Getting Consistency level TWO when it is
>> requested LOCAL_ONE
>>
>>
>>
>> Hi Jean,
>>
>>
>>
>> I want to understand how you are setting the write consistency level as
>> LOCAL ONE. That is with every query you mentioning consistency level or you
>> have set the spring cassandra config with provided consistency level.
>>
>> Like this:
>>
>> cluster.setQueryOptions(new
>> QueryOptions().setConsistencyLevel(ConsistencyLevel.valueOf(cassandraConsistencyLevel)));
>>
>>
>>
>> The only possibility i see of such behavior is its getting overridden
>> from some where.
>>
>>
>>
>> Thanks,
>>
>> Mahesh Daksha
>>
>>
>>
>> On Thu, Apr 11, 2019 at 1:43 PM Jean Carlo 
>> wrote:
>>
>> Hello everyone,
>>
>>
>>
>> I have a case where the developers are using spring data framework for
>> Cassandra. We are writing batches setting consistency level at LOCAL_ONE
>> but we got a timeout like this
>>
>>
>>
>> *Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException:
>> Cassandra timeout during BATCH_LOG write query at consistency TWO (2
>> replica were required but only 1 acknowledged the write)*
>>
>>
>>
>> Is it Cassandra that somehow writes to the *system.batchlog* using
>> consistency TWO or is it spring data that makes some dirty things behind
>> the scenes ?
>>
>> (I want to believe it is the second one)
>>
>>
>>
>> Cheers
>>
>>
>>
>> Jean Carlo
>>
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>>
>> --
>>
>>
>


Re: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-12 Thread Jean Carlo
Hello Sean

Well this is a little bit confusing. After digging into the doc, I found
this old documentation of Datastax that says
"First, we can dynamically adjust behavior depending on the cluster size
and arrangement. Cassandra prefers to perform batchlog writes to two
different replicas in the same datacenter as the coordinator."

https://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2

Which may explain the message in the timeout. However, I do not know if this
information is still true. Does someone know if it is?

Reading the comments in https://issues.apache.org/jira/browse/CASSANDRA-9620,
which say ' *Writing the batch log will always be done using CL ONE.*',
contradicts what I understood from Datastax's doc.

Yes, I understood batches are not for speed. Still, we are using them for a
consistency need.

@Mahesh Yes we do set the consistency like that

Thank you

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Thu, Apr 11, 2019 at 3:39 PM Durity, Sean R 
wrote:

> https://issues.apache.org/jira/browse/CASSANDRA-9620 has something
> similar that was determined to be a driver error. I would start with
> looking at the driver version and also the RetryPolicy that is in effect
> for the Cluster. Secondly, I would look at whether a batch is really needed
> for the statements. Cassandra batches are for atomicity – not speed.
>
>
>
> Sean Durity
>
> Staff Systems Engineer – Cassandra
>
> MTC 2250
>
> #cassandra - for the latest news and updates
>
>
>
>
>
> *From:* Mahesh Daksha 
> *Sent:* Thursday, April 11, 2019 5:21 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Getting Consistency level TWO when it is
> requested LOCAL_ONE
>
>
>
> Hi Jean,
>
>
>
> I want to understand how you are setting the write consistency level as
> LOCAL ONE. That is with every query you mentioning consistency level or you
> have set the spring cassandra config with provided consistency level.
>
> Like this:
>
> cluster.setQueryOptions(new
> QueryOptions().setConsistencyLevel(ConsistencyLevel.valueOf(cassandraConsistencyLevel)));
>
>
>
> The only possibility i see of such behavior is its getting overridden from
> some where.
>
>
>
> Thanks,
>
> Mahesh Daksha
>
>
>
> On Thu, Apr 11, 2019 at 1:43 PM Jean Carlo 
> wrote:
>
> Hello everyone,
>
>
>
> I have a case where the developers are using spring data framework for
> Cassandra. We are writing batches setting consistency level at LOCAL_ONE
> but we got a timeout like this
>
>
>
> *Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException:
> Cassandra timeout during BATCH_LOG write query at consistency TWO (2
> replica were required but only 1 acknowledged the write)*
>
>
>
> Is it Cassandra that somehow writes to the *system.batchlog* using
> consistency TWO or is it spring data that makes some dirty things behind
> the scenes ?
>
> (I want to believe it is the second one)
>
>
>
> Cheers
>
>
>
> Jean Carlo
>
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> --
>
>


RE: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-11 Thread Durity, Sean R
https://issues.apache.org/jira/browse/CASSANDRA-9620 has something similar that 
was determined to be a driver error. I would start with looking at the driver 
version and also the RetryPolicy that is in effect for the Cluster. Secondly, I 
would look at whether a batch is really needed for the statements. Cassandra 
batches are for atomicity – not speed.
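For what it is worth, a minimal sketch (Java driver 3.x, the bound statements are hypothetical) of pinning the consistency level on the batch itself, which helps rule out a global override; as discussed further down this thread, the internal BATCH_LOG write is performed by the coordinator and is not directly governed by this setting:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.ConsistencyLevel;

    BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
    batch.add(boundInsert1);   // previously prepared/bound statements (hypothetical)
    batch.add(boundInsert2);
    batch.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);  // applies to the batch mutation
    session.execute(batch);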


Sean Durity
Staff Systems Engineer – Cassandra
MTC 2250
#cassandra - for the latest news and updates



From: Mahesh Daksha 
Sent: Thursday, April 11, 2019 5:21 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Getting Consistency level TWO when it is requested 
LOCAL_ONE

Hi Jean,

I want to understand how you are setting the write consistency level as LOCAL 
ONE. That is with every query you mentioning consistency level or you have set 
the spring cassandra config with provided consistency level.
Like this:
cluster.setQueryOptions(new 
QueryOptions().setConsistencyLevel(ConsistencyLevel.valueOf(cassandraConsistencyLevel)));

The only possibility i see of such behavior is its getting overridden from some 
where.

Thanks,
Mahesh Daksha

On Thu, Apr 11, 2019 at 1:43 PM Jean Carlo 
mailto:jean.jeancar...@gmail.com>> wrote:
Hello everyone,

I have a case where the developers are using spring data framework for 
Cassandra. We are writing batches setting consistency level at LOCAL_ONE but we 
got a timeout like this

Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra 
timeout during BATCH_LOG write query at consistency TWO (2 replica were 
required but only 1 acknowledged the write)

Is it Cassandra that somehow writes to the system.batchlog using consistency 
TWO or is it spring data that makes some dirty things behind the scenes ?
(I want to believe it is the second one)

Cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay





Re: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Mahesh Daksha
Thank you Sean for your response. We are also suspecting the same and
analyzing/troubleshooting it around queries associated timestamp.

Thanks,
Mahesh Daksha


On Tue, Apr 9, 2019 at 7:08 PM Durity, Sean R 
wrote:

> My first suspicion would be to look at the server times in the cluster. It
> looks like other cases where a write occurs (with no errors) but the data
> is not retrieved as expected. If the write occurs with an earlier timestamp
> than the existing data, this is the behavior you would see. The write would
> occur, but it would not be the latest data to be retrieved. The write looks
> like it fails silently, but it actually does exactly what it is designed to
> do.
>
>
>
> Sean Durity
>
>
>
> *From:* Mahesh Daksha 
> *Sent:* Tuesday, April 09, 2019 9:10 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Issue while updating a record in 3 node cassandra
> cluster deployed using kubernetes
>
>
>
> Hello All,
>
> I have a 3 node cassandra cluster with Replication factor as 2 and
> read-write consistency set to QUORUM. We are using Spring data cassandra.
> All infrastructure is deployed using kubernetes.
>
> Now in normal use case many records gets inserted to cassandra table. Then
> we try to modify/update one of the record using save method of repo, like
> below:
>
> ChunkMeta *tmpRec* = chunkMetaRepository.*save*(chunkMeta);
>
> After execution of the above statement we never see any exception or error.
> But still this update silently fails intermittently. That is, at
> times the record in the db gets updated successfully whereas other times it
> fails. Also, in the above query, when we print *tmpRec* it contains the
> updated and correct value every time. Still, in the db these updated values
> don't get reflected.
>
> We checked the cassandra transport TRACE logs on all nodes and found that
> our queries are getting logged there and are being executed as well, without
> any error or exception.
>
> Now another weird observation: this all works perfectly fine if I
> am using a single cassandra node (in kubernetes) or if we deploy the above
> infra using ansible (it even works for 3 nodes with Ansible).
>
> It looks like some issue specific to the kubernetes 3 node deployment
> of cassandra. Primarily it looks like replication among nodes is causing this.
>
> Please suggest.
>
>
>
>
> I have a 3 node cassandra cluster with Replication factor as 2 and
> read-write consistency set to QUORUM. We are using Spring data cassandra.
> All infrastructure is deployed using kubernetes.
>
> Now in normal use case many records gets inserted to cassandra table. Then
> we try to modify/update one of the record using save method of repo, like
> below:
>
> ChunkMeta tmpRec = chunkMetaRepository.*save*(chunkMeta);
>
> After execution of the above statement we never see any exception or error.
> But still this update fails intermittently. That is, when we check the record
> in the db, sometimes it gets updated successfully whereas other times it
> fails. Also, in the above query, when we print *tmpRec* it contains the
> updated and correct value. Still, in the db these updated values don't get
> reflected.
>
> We checked the cassandra transport TRACE logs on all nodes and found that
> our queries are getting logged there and are being executed as well.
>
> Now another weird observation: this all works if I am using a single
> cassandra node (in kubernetes) or if we deploy the above infra using ansible
> (it even works for 3 nodes with Ansible).
>
> It looks like some issue specific to the kubernetes 3 node deployment
> of cassandra. Primarily it looks like replication among nodes is causing this.
>
> Please suggest.
>
> Below are the contents of  my cassandra Docker file:
>
> FROM ubuntu:16.04
>
>
>
> RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils 
> net-tools && apt-get clean && \
>
> addgroup testuser && useradd -g testuser testuser && usermod --password 
> testuser testuser;
>
>
>
> RUN mkdir -p /opt/test && \
>
> mkdir -p /opt/test/data;
>
>
>
> ADD jre8.tar.gz /opt/test/
>
> ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/
>
>
>
> RUN chmod 755 -R /opt/test/jre && \
>
> ln -s /opt/test/jre/bin/java /usr/bin/java && \
>
> mv /opt/test/apache-cassandra* /opt/test/cassandra;
>
>
>
> RUN mkdir -p /opt/test/cassandra/logs;
>
>
>
> ENV JAVA_HOME /opt/test/jre
>
> RUN export JAVA_HOME
>
>
>
> COPY version.txt /opt/test/cassandra/version.txt
>
>
>
> WORKDIR /opt/test/cassandra/bin/
>
>
>
> RUN mkdir -p /opt/test/data/saved_caches && \
>
> mkdir -p /opt/test/data/commitlog && \
>
> mkdir -p /opt/test/data/hints && \
>
> chown -R testuser:testuser /opt/test/data && \
>
> chown -R testuser:testuser /opt/test;
>
>
>
> USER testuser
>
>
>
> CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && perl -p -e 
> 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' 
> ../conf/conf.yml > ../conf/cassandra.yaml && rm ../conf/conf.yml && 
> ./cassandra -f
>
> 

RE: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Durity, Sean R
My first suspicion would be to look at the server times in the cluster. It 
looks like other cases where a write occurs (with no errors) but the data is 
not retrieved as expected. If the write occurs with an earlier timestamp than 
the existing data, this is the behavior you would see. The write would occur, 
but it would not be the latest data to be retrieved. The write looks like it 
fails silently, but it actually does exactly what it is designed to do.
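One way to check that theory from cqlsh, with hypothetical table and column names (WRITETIME returns the microsecond timestamp Cassandra stored for the cell):

    -- inspect the timestamp of the column you expected to change
    SELECT id, status, WRITETIME(status)
    FROM my_ks.chunk_meta
    WHERE id = 123e4567-e89b-12d3-a456-426655440000;

    -- run it before and after a "silent" update; if the new write carries an
    -- older timestamp than the existing cell, the update is applied but loses.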

Sean Durity

From: Mahesh Daksha 
Sent: Tuesday, April 09, 2019 9:10 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster 
deployed using kubernetes


Hello All,

I have a 3 node cassandra cluster with Replication factor as 2 and read-write 
consistency set to QUORUM. We are using Spring data cassandra. All 
infrastructure is deployed using kubernetes.

Now in normal use case many records gets inserted to cassandra table. Then we 
try to modify/update one of the record using save method of repo, like below:

ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta);

After execution of the above statement we never see any exception or error. But 
still this update silently fails intermittently. That is, at times the record in 
the db gets updated successfully whereas other times it fails. Also, in the 
above query, when we print tmpRec it contains the updated and correct value 
every time. Still, in the db these updated values don't get reflected.

We checked the cassandra transport TRACE logs on all nodes and found that our 
queries are getting logged there and are being executed as well, without any 
error or exception.

Now another weird observation: this all works perfectly fine if I am using a 
single cassandra node (in kubernetes) or if we deploy the above infra using 
ansible (it even works for 3 nodes with Ansible).

It looks like some issue specific to the kubernetes 3 node deployment of 
cassandra. Primarily it looks like replication among nodes is causing this.

Please suggest.



I have a 3 node cassandra cluster with Replication factor as 2 and read-write 
consistency set to QUORUM. We are using Spring data cassandra. All 
infrastructure is deployed using kubernetes.

Now in normal use case many records gets inserted to cassandra table. Then we 
try to modify/update one of the record using save method of repo, like below:

ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta);

After execution of the above statement we never see any exception or error. But 
still this update fails intermittently. That is, when we check the record in the 
db, sometimes it gets updated successfully whereas other times it fails. Also, 
in the above query, when we print tmpRec it contains the updated and correct 
value. Still, in the db these updated values don't get reflected.

We checked the cassandra transport TRACE logs on all nodes and found that our 
queries are getting logged there and are being executed as well.

Now another weird observation: this all works if I am using a single cassandra 
node (in kubernetes) or if we deploy the above infra using ansible (it even 
works for 3 nodes with Ansible).

It looks like some issue specific to the kubernetes 3 node deployment of 
cassandra. Primarily it looks like replication among nodes is causing this.

Please suggest.

Below are the contents of  my cassandra Docker file:

FROM ubuntu:16.04

RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils net-tools && apt-get clean && \
    addgroup testuser && useradd -g testuser testuser && usermod --password testuser testuser;

RUN mkdir -p /opt/test && \
    mkdir -p /opt/test/data;

ADD jre8.tar.gz /opt/test/
ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/

RUN chmod 755 -R /opt/test/jre && \
    ln -s /opt/test/jre/bin/java /usr/bin/java && \
    mv /opt/test/apache-cassandra* /opt/test/cassandra;

RUN mkdir -p /opt/test/cassandra/logs;

ENV JAVA_HOME /opt/test/jre
RUN export JAVA_HOME

COPY version.txt /opt/test/cassandra/version.txt

WORKDIR /opt/test/cassandra/bin/

RUN mkdir -p /opt/test/data/saved_caches && \
    mkdir -p /opt/test/data/commitlog && \
    mkdir -p /opt/test/data/hints && \
    chown -R testuser:testuser /opt/test/data && \
    chown -R testuser:testuser /opt/test;

USER testuser

CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && perl -p -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' ../conf/conf.yml > ../conf/cassandra.yaml && rm ../conf/conf.yml && ./cassandra -f

Please note conf.yml is basically the cassandra.yml file, holding properties 
related to cassandra.



Thanks,

Mahesh Daksha



Re: [EXTERNAL] Re: Garbage Collector

2019-03-22 Thread Ahmed Eljami
Thx guys for sharing your experiences with G1.

Since I sent you my question about GC, we have updated the version of Java:
still with CMS on Java 8, moving from u9x to u201. Just with that, we
observed a gain of 66% (150 ms ==> 50 ms of STW) :)

We are planning a second tuning, this time with G1.

Thanks.

Le mar. 19 mars 2019 à 19:56, Durity, Sean R 
a écrit :

> My default is G1GC using 50% of available RAM (so typically a minimum of
> 16 GB for the JVM). That has worked in just about every case I’m familiar
> with. In the old days we used CMS, but tuning that beast is a black art
> with few wizards available (though several on this mailing list). Today, I
> just don’t see GC issues – unless there is a bad query in play. For me, the
> data model/query construction is the more fruitful path to achieving
> performance and reliability.
>
>
>
> Sean Durity
>
>
>
> *From:* Jon Haddad 
> *Sent:* Tuesday, March 19, 2019 2:16 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Garbage Collector
>
>
>
> G1 is optimized for high throughput with higher pause times.  It's great
> if you have mixed / unpredictable workloads, and as Elliott mentioned is
> mostly set & forget.
>
>
>
> ZGC requires Java 11, which is only supported on trunk.  I plan on messing
> with it soon, but I haven't had time yet.  We'll share the results on our
> blog (TLP) when we get to it.
>
>
>
> Jon
>
>
>
> On Tue, Mar 19, 2019 at 10:12 AM Elliott Sims 
> wrote:
>
> I use G1, and I think it's actually the default now for newer Cassandra
> versions.  For G1, I've done very little custom config/tuning.  I increased
> heap to 16GB (out of 64GB physical), but most of the rest is at or near
> default.  For the most part, it's been "feed it more RAM, and it works"
> compared to CMS's "lower overhead, works great until it doesn't" and dozens
> of knobs.
>
> I haven't tried ZGC yet, but anecdotally I've heard that it doesn't really
> match or beat G1 quite yet.
>
>
>
> On Tue, Mar 19, 2019 at 9:44 AM Ahmed Eljami 
> wrote:
>
> Hi Folks,
>
>
>
> Does someone use G1 GC or ZGC on production?
>
>
>
> Can you share your feedback, the configuration used if it's possible ?
>
>
>
> Thanks.
>
>
>
>
> --
>
>


-- 
Cordialement;

Ahmed ELJAMI


RE: [EXTERNAL] Re: Garbage Collector

2019-03-19 Thread Durity, Sean R
My default is G1GC using 50% of available RAM (so typically a minimum of 16 GB 
for the JVM). That has worked in just about every case I’m familiar with. In 
the old days we used CMS, but tuning that beast is a black art with few wizards 
available (though several on this mailing list). Today, I just don’t see GC 
issues – unless there is a bad query in play. For me, the data model/query 
construction is the more fruitful path to achieving performance and reliability.
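As a rough sketch only (exact sizes depend on the host, and the pause target is just an example value), that translates to something like the following in conf/jvm.options, with the CMS lines commented out:

    ## heap: roughly 50% of a 32 GB machine
    -Xms16G
    -Xmx16G
    ## G1 settings (the stock jvm.options ships these commented out)
    -XX:+UseG1GC
    -XX:G1RSetUpdatingPauseTimePercent=5
    -XX:MaxGCPauseMillis=500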



Sean Durity

From: Jon Haddad 
Sent: Tuesday, March 19, 2019 2:16 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Garbage Collector

G1 is optimized for high throughput with higher pause times.  It's great if you 
have mixed / unpredictable workloads, and as Elliott mentioned is mostly set & 
forget.

ZGC requires Java 11, which is only supported on trunk.  I plan on messing with 
it soon, but I haven't had time yet.  We'll share the results on our blog (TLP) 
when we get to it.

Jon

On Tue, Mar 19, 2019 at 10:12 AM Elliott Sims 
mailto:elli...@backblaze.com>> wrote:
I use G1, and I think it's actually the default now for newer Cassandra 
versions.  For G1, I've done very little custom config/tuning.  I increased 
heap to 16GB (out of 64GB physical), but most of the rest is at or near 
default.  For the most part, it's been "feed it more RAM, and it works" 
compared to CMS's "lower overhead, works great until it doesn't" and dozens of 
knobs.
I haven't tried ZGC yet, but anecdotally I've heard that it doesn't really 
match or beat G1 quite yet.

On Tue, Mar 19, 2019 at 9:44 AM Ahmed Eljami 
mailto:ahmed.elj...@gmail.com>> wrote:
Hi Folks,

Does someone use G1 GC or ZGC on production?

Can you share your feedback, the configuration used if it's possible ?

Thanks.






Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Rahul Singh
Adding to Stefan's comment. There is a "scylladb" migrator, which uses the
spark connector from Datastax, and theoretically can work on any
Cassandra-compliant DB, so it should not be limited to cassandra-to-scylla.

https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/

https://github.com/scylladb/scylla-migrator

On Thu, Mar 14, 2019 at 3:04 PM Durity, Sean R 
wrote:

> The possibility of a highly available way to do this gives more
> challenges. I would be weighing the cost of a complex solution vs the
> possibility of a maintenance window when you stop your app to move the
> data, then restart.
>
>
>
> For the straight copy of the data, I am currently enamored with DataStax’s
> dsbulk utility for unloading and loading larger amounts of data. I don’t
> have extensive experience, yet, but it has been fast enough in my
> experiments – and that is without doing too much tuning for speed. From a
> host not in the cluster, I was able to extract 3.5 million rows in about 11
> seconds. I inserted them into a differently partitioned table in about 26
> seconds. Very small data rows, but it was impressive for not doing much to
> try and speed it up further. (In some other tests, it was about ¼ the time
> of simple copy statement from cqlsh)
>
>
>
> If I was designing something for a “can’t take an outage” scenario, I
> would start with:
>
> -  Writing the data to the old and new tables on all inserts
>
> -  On reads, read from the new table first. If not there, read
> from the old table ß could introduce some latency, but would be
> available; could also do asynchronous reads on both tables and choose the
> latest
>
> -  Do this until the data has been copied from old to new (with
> dsbulk or custom code or Spark)
>
> -  Drop the double writes and conditional reads
>
>
>
>
>
> Sean
>
>
>
> *From:* Stefan Miklosovic 
> *Sent:* Wednesday, March 13, 2019 6:39 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Re: Migrate large volume of data from one table
> to another table within the same cluster when COPY is not an option.
>
>
>
> Hi Leena,
>
>
>
> as already suggested in my previous email, you could use Apache Spark and
> Cassandra Spark connector (1). I have checked TTLs and I believe you should
> especially read this section (2) about TTLs. Seems like thats what you need
> to do, ttls per row. The workflow would be that you read from your source
> table, making transformations per row (via some mapping) and then you would
> save it to new table.
>
>
>
> This would import it "all" but until you switch to the new table and
> records are still being saved into the original one, I am not sure how to
> cover "the gap" in such sense that once you make the switch, you would miss
> records which were created in the first table after you did the loading.
> You could maybe leverage Spark streaming (Cassandra connector knows that
> too) so you would make this transformation on the fly with new ones.
>
>
>
> (1) https://github.com/datastax/spark-cassandra-connector
>
> (2)
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#using-a-different-value-for-each-row
>
>
>
>
>
> On Thu, 14 Mar 2019 at 00:13, Leena Ghatpande 
> wrote:
>
> Understand, 2nd table would be a better approach. So what would be the
> best way to copy 70M rows from current table to the 2nd table with ttl set
> on each record as the first table?
>
>
> --
>
> *From:* Durity, Sean R 
> *Sent:* Wednesday, March 13, 2019 8:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] Re: Migrate large volume of data from one table
> to another table within the same cluster when COPY is not an option.
>
>
>
> Correct, there is no current flag. I think there SHOULD be one.
>
>
>
>
>
> *From:* Dieudonné Madishon NGAYA 
> *Sent:* Tuesday, March 12, 2019 7:17 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Migrat

RE: [EXTERNAL] Re: Default TTL on CF

2019-03-14 Thread Durity, Sean R
I spent a month of my life on a similar problem... There wasn't an easy answer, 
but this is what I did:

#1 - Stop the problem from growing further. Get new inserts using a TTL (or set 
the default on the table so they get it). App team had to do this one.
#2 - Delete any data  that should already be expired.
- In my case the partition key included a date in the composite string they had 
built. So I could know from the partition key if the data needed to be deleted. 
I used sstablekeys to get the keys into files on each host. Then I parsed the 
files and created deletes for only those expired records. Then I executed the 
deletes. Then I had to do some compaction to actually create disk space. A long 
process with hundreds of billions of records...
#3 - Add TTL to data that should live. I gave this to the app team. Using the 
extracted keys I gave them, they could calculate the proper TTL. They read the 
data with the key, calculated TTL, and rewrote the data with TTL. Long, boring, 
etc. but they did it.
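A hedged sketch of the three steps in CQL, with a made-up table, keys and TTL values (the date embedded in the partition key is what drives the decision):

    -- #1: new writes expire automatically (only affects data written afterwards)
    ALTER TABLE my_ks.events WITH default_time_to_live = 2592000;  -- 30 days

    -- #2: purge rows whose embedded date says they are already expired
    DELETE FROM my_ks.events WHERE event_key = 'cust42#2018-01-01';

    -- #3: re-write still-live rows with the TTL the app calculated from the key
    INSERT INTO my_ks.events (event_key, payload)
    VALUES ('cust42#2019-06-01', 'payload')
    USING TTL 1209600;  -- whatever remains of the intended lifetime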



Sean Durity

-Original Message-
From: Jeff Jirsa 
Sent: Thursday, March 14, 2019 9:30 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Default TTL on CF

SSTableReader and CQLSSTableWriter if you’re comfortable with Java
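
A hedged sketch of the CQLSSTableWriter route (Cassandra 3.x utility classes); the schema, path and the TTL bind marker below are assumptions, so check the Javadoc before relying on it:

    import org.apache.cassandra.io.sstable.CQLSSTableWriter;

    String schema = "CREATE TABLE my_ks.events (event_key text PRIMARY KEY, payload text)";
    String insert = "INSERT INTO my_ks.events (event_key, payload) VALUES (?, ?) USING TTL ?";

    CQLSSTableWriter writer = CQLSSTableWriter.builder()
            .inDirectory("/tmp/events-rewrite")   // hypothetical output directory
            .forTable(schema)
            .using(insert)
            .build();
    writer.addRow("cust42#2019-06-01", "payload", 1209600);
    writer.close();
    // the generated sstables could then be brought in with sstableloader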


--
Jeff Jirsa


> On Mar 14, 2019, at 1:28 PM, Nick Hatfield  wrote:
>
> Bummer, but reasonable. Any cool tricks I could use to make that process
> easier? I have many TB of data on a live cluster and was hoping to
> start cleaning out the earlier bad habits of data housekeeping
>
>> On 3/14/19, 9:24 AM, "Jeff Jirsa"  wrote:
>>
>> It does not impact existing data
>>
>> The data gets an expiration time stamp when you write it. Changing the
>> default only impacts newly written data
>>
>> If you need to change the expiration time on existing data, you must
>> update it
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>>> On Mar 14, 2019, at 1:16 PM, Nick Hatfield 
>>> wrote:
>>>
>>> Hello,
>>>
>>> Can anyone tell me if setting a default TTL will affect existing data?
>>> I would like to enable a default TTL and have cassandra add that TTL to
> >>> any rows that don't currently have a TTL set.
>>>
>>> Thanks,
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org





-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Durity, Sean R
Doing this in a highly available way brings more challenges. I would be weighing 
the cost of a complex solution vs the possibility of a maintenance window where you 
stop your app to move the data, then restart.

For the straight copy of the data, I am currently enamored with DataStax’s 
dsbulk utility for unloading and loading larger amounts of data. I don’t have 
extensive experience, yet, but it has been fast enough in my experiments – and 
that is without doing too much tuning for speed. From a host not in the 
cluster, I was able to extract 3.5 million rows in about 11 seconds. I inserted 
them into a differently partitioned table in about 26 seconds. Very small data 
rows, but it was impressive for not doing much to try and speed it up further. 
(In some other tests, it was about ¼ of the time of a simple COPY statement from 
cqlsh.)
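
The basic unload/load pair looks something like this (host, keyspace and table 
names are placeholders, and dsbulk has plenty more tuning knobs than shown):

dsbulk unload -h 10.0.0.1 -k my_ks -t old_table -url /data/export
dsbulk load -h 10.0.0.1 -k my_ks -t new_table -url /data/export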

If I was designing something for a “can’t take an outage” scenario, I would 
start with:

-  Writing the data to the old and new tables on all inserts

-  On reads, read from the new table first. If not there, read from the 
old table <-- could introduce some latency, but would be available; could also 
do asynchronous reads on both tables and choose the latest

-  Do this until the data has been copied from old to new (with dsbulk 
or custom code or Spark); a rough sketch of carrying each row's remaining TTL over is below

-  Drop the double writes and conditional reads
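
For the copy itself, if the rows need to keep their remaining TTL, the idea in plain 
CQL terms (column names are made up; dsbulk and Spark can do the equivalent with 
custom queries) is to read the TTL out and write it back:

SELECT bucket, id, payload, TTL(payload) AS remaining FROM my_ks.old_table;
INSERT INTO my_ks.new_table (bucket, id, payload) VALUES (?, ?, ?) USING TTL ?;   (bind "remaining")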


Sean

From: Stefan Miklosovic 
Sent: Wednesday, March 13, 2019 6:39 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Migrate large volume of data from one table to 
another table within the same cluster when COPY is not an option.

Hi Leena,

as already suggested in my previous email, you could use Apache Spark and 
Cassandra Spark connector (1). I have checked TTLs and I believe you should 
especially read this section (2) about TTLs. Seems like that's what you need to 
do: TTLs per row. The workflow would be that you read from your source table, 
apply transformations per row (via some mapping) and then save the result to the 
new table.

This would import it "all" but until you switch to the new table and records 
are still being saved into the original one, I am not sure how to cover "the 
gap" in such sense that once you make the switch, you would miss records which 
were created in the first table after you did the loading. You could maybe 
leverage Spark streaming (Cassandra connector knows that too) so you would make 
this transformation on the fly with new ones.

(1) 
https://github.com/datastax/spark-cassandra-connector<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_datastax_spark-2Dcassandra-2Dconnector=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=_DgzHjtyiXt4QUBiWPplE-cs_HMaVflC9fAK6I4TdpQ=mMB-uNoPbBBK9Zfn5WuDoKoF31IgSi1MXgNlYG7jhDE=>
(2) 
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#using-a-different-value-for-each-row<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_datastax_spark-2Dcassandra-2Dconnector_blob_master_doc_5-5Fsaving.md-23using-2Da-2Ddifferent-2Dvalue-2Dfor-2Deach-2Drow=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=_DgzHjtyiXt4QUBiWPplE-cs_HMaVflC9fAK6I4TdpQ=AwO-LFAxHWvYgzjuWt9ez5FHKDeNdS3C6KYfaoUUgOs=>


On Thu, 14 Mar 2019 at 00:13, Leena Ghatpande 
mailto:lghatpa...@hotmail.com>> wrote:
Understand, 2nd table would be a better approach. So what would be the best way 
to copy 70M rows from current table to the 2nd table with ttl set on each 
record as the first table?


From: Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>>
Sent: Wednesday, March 13, 2019 8:17 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: RE: [EXTERNAL] Re: Migrate large volume of data from one table to 
another table within the same cluster when COPY is not an option.


Correct, there is no current flag. I think there SHOULD be one.





From: Dieudonné Madishon NGAYA mailto:dmng...@gmail.com>>
Sent: Tuesday, March 12, 2019 7:17 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another 
table within the same cluster when COPY is not an option.



Hi Sean, you can't set a flag in cassandra.yaml to disallow ALLOW FILTERING; the 
only control you have is your data model.

Don't ask Cassandra to query all the data in a table; the ideal query reads a 
single partition.



On Tue, Mar 12, 2019 at 6:46 PM Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>> 
wrote:

Hi Sean,



for sure, the best approach would be to create another table which would treat 
just that specific query.



How do I set the flag for not allowing allow filtering in cassandra.yaml? I 
read a doco and there seems to be nothing about that.



Regards



On Wed, 13 Mar 2019 at 06:57, Duri

Re: [EXTERNAL] Re: Cluster size "limit"

2019-03-14 Thread Ahmed Eljami
So fewer vnodes allow more nodes, I understand.

But it is still hard to implement on an existing cluster with more than 10
keyspaces with different RFs...


Re: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Gregory Raevski
Instaclustr's done some analysis around "optimal" number of vnodes to use -
https://www.instaclustr.com/cassandra-vnodes-how-many-should-i-use/
I might be able to provide more details if you're interested.

As Sean said, you can create the new DC with a different number of vnodes.

Kind Regards,
*Gregory Raevski*

On Thu, 14 Mar 2019 at 05:11, Durity, Sean R 
wrote:

> Rebuild the DCs with a new number of vnodes… I have done it.
>
>
>
> Sean
>
>
>
> *From:* Ahmed Eljami 
> *Sent:* Wednesday, March 13, 2019 2:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Re: Cluster size "limit"
>
>
>
> Is not possible with an existing cluster!
>
> Le mer. 13 mars 2019 à 18:39, Durity, Sean R 
> a écrit :
>
> If you can change to 8 vnodes, it will be much better for repairs and
> other kinds of streaming operations. The old advice of 256 per node is now
> not very helpful.
>
>
>
> Sean
>
>
>
> *From:* Ahmed Eljami 
> *Sent:* Wednesday, March 13, 2019 1:27 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Cluster size "limit"
>
>
>
> Yes, 256 vnodes
>
>
>
> Le mer. 13 mars 2019 à 17:31, Jeff Jirsa  a écrit :
>
> Do you use vnodes? How many vnodes per machine?
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Mar 13, 2019, at 3:58 PM, Ahmed Eljami  wrote:
>
> Hi,
>
>
>
> We are planning to add a third datacenter to our cluster (already has 2
> datacenter, every datcenter has 50 nodes, so 100 nodes in total).
>
>
>
> My fear is that an important number of nodes per cluster (> 100) could
> cause a lot of problems like gossip duration, maintenance (repair...)...
>
>
>
> I know that it depends on use cases, volume of data and many other thing,
> but I would like that you share your  experiences with that.
>
>
>
> Thx
>
>
>
>
>
>
>
>
>


Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Stefan Miklosovic
Hi Leena,

as already suggested in my previous email, you could use Apache Spark and
Cassandra Spark connector (1). I have checked TTLs and I believe you should
especially read this section (2) about TTLs. Seems like that's what you need
to do: TTLs per row. The workflow would be that you read from your source
table, apply transformations per row (via some mapping) and then save the
result to the new table.

This would import it "all" but until you switch to the new table and
records are still being saved into the original one, I am not sure how to
cover "the gap" in such sense that once you make the switch, you would miss
records which were created in the first table after you did the loading.
You could maybe leverage Spark streaming (Cassandra connector knows that
too) so you would make this transformation on the fly with new ones.

(1) https://github.com/datastax/spark-cassandra-connector
(2)
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#using-a-different-value-for-each-row


On Thu, 14 Mar 2019 at 00:13, Leena Ghatpande 
wrote:

> Understand, 2nd table would be a better approach. So what would be the
> best way to copy 70M rows from current table to the 2nd table with ttl set
> on each record as the first table?
>
> --
> *From:* Durity, Sean R 
> *Sent:* Wednesday, March 13, 2019 8:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] Re: Migrate large volume of data from one table
> to another table within the same cluster when COPY is not an option.
>
>
> Correct, there is no current flag. I think there SHOULD be one.
>
>
>
>
>
> *From:* Dieudonné Madishon NGAYA 
> *Sent:* Tuesday, March 12, 2019 7:17 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to
> another table within the same cluster when COPY is not an option.
>
>
>
> Hi Sean, you can’t flag in Cassandra.yaml not allowing allow filtering ,
> the only thing you can do will be from your data model .
>
> Don’t ask Cassandra to query all data from table but the ideal query will
> be using single partition.
>
>
>
> On Tue, Mar 12, 2019 at 6:46 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
> Hi Sean,
>
>
>
> for sure, the best approach would be to create another table which would
> treat just that specific query.
>
>
>
> How do I set the flag for not allowing allow filtering in cassandra.yaml?
> I read a doco and there seems to be nothing about that.
>
>
>
> Regards
>
>
>
> On Wed, 13 Mar 2019 at 06:57, Durity, Sean R 
> wrote:
>
> If there are 2 access patterns, I would consider having 2 tables. The
> first one with the ID, which you say is the majority use case.  Then have a
> second table that uses a time-bucket approach as others have suggested:
>
> (time bucket, id) as primary key
>
> Choose a time bucket (day, week, hour, month, whatever) that would hold
> less than 100 MB of data in the time-bucket partition.
>
>
>
> You could include all relevant data in the second table to meet your
> query. OR, if that data seems too large or too volatile to duplicate, just
> include your primary key and look-up the data in the primary table as
> needed.
>
>
>
> If you use allow filtering, you are setting yourself up for failure to
> scale. I tell my developers, “if you use allow filtering, you are doing it
> wrong.” In fact, I think the Cassandra admin should be able to set a flag
> in cassandra.yaml to not allow filtering at all. The cluster should be able
> to protect itself from bad queries.
>
>
>
>
>
>
>
> *From:* Leena Ghatpande 
> *Sent:* Tuesday, March 12, 2019 9:02 AM
> *To:* Stefan Miklosovic ;
> user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to
> another table within the same cluster when COPY is not an option.
>
>
>
> Our data model cannot be like below as you have recommended as majority of
> the reads need to select the data by the partition key (id) only, not by
> date.
>
> You could remodel your data in such way that you would make primary key
> like this
>
> ((date), hour-minute, id)
>
> or
>
> ((date, hour-minute), id)
>
>
>
>
>
> By adding the date as clustering column, yes the idea was to use the Allow
> Filtering on the date and pull the records. Understand that it is not
> recommended to do this, but we have been doing this on another existing
> large table and have not run into any issue so far. But want to understand
> if there is a better approach to this?
>
>
>
> Thanks
>
>
> --
>
> *From

RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
Rebuild the DCs with a new number of vnodes… I have done it.
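
The rough sequence (DC and keyspace names below are placeholders): bring up the 
new DC's nodes with the desired num_tokens and auto_bootstrap: false, add the new 
DC to the keyspace replication, stream the data into it, then retire the old DC 
once clients have moved over:

ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_old': 3, 'DC_new': 3};
nodetool rebuild -- DC_old     (run on every node in the new DC)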

Sean

From: Ahmed Eljami 
Sent: Wednesday, March 13, 2019 2:09 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Cluster size "limit"

It is not possible with an existing cluster!
Le mer. 13 mars 2019 à 18:39, Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> a écrit :
If you can change to 8 vnodes, it will be much better for repairs and other 
kinds of streaming operations. The old advice of 256 per node is now not very 
helpful.

Sean

From: Ahmed Eljami mailto:ahmed.elj...@gmail.com>>
Sent: Wednesday, March 13, 2019 1:27 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Cluster size "limit"

Yes, 256 vnodes

Le mer. 13 mars 2019 à 17:31, Jeff Jirsa 
mailto:jji...@gmail.com>> a écrit :
Do you use vnodes? How many vnodes per machine?
--
Jeff Jirsa


On Mar 13, 2019, at 3:58 PM, Ahmed Eljami 
mailto:ahmed.elj...@gmail.com>> wrote:
Hi,

We are planning to add a third datacenter to our cluster (it already has 2 
datacenters, each with 50 nodes, so 100 nodes in total).

My fear is that a large number of nodes per cluster (> 100) could cause a 
lot of problems, like gossip duration, maintenance (repair...)...

I know that it depends on use cases, volume of data and many other things, but I 
would like you to share your experiences with that.

Thx








Re: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Ahmed Eljami
It is not possible with an existing cluster!

Le mer. 13 mars 2019 à 18:39, Durity, Sean R 
a écrit :

> If you can change to 8 vnodes, it will be much better for repairs and
> other kinds of streaming operations. The old advice of 256 per node is now
> not very helpful.
>
>
>
> Sean
>
>
>
> *From:* Ahmed Eljami 
> *Sent:* Wednesday, March 13, 2019 1:27 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Cluster size "limit"
>
>
>
> Yes, 256 vnodes
>
>
>
> Le mer. 13 mars 2019 à 17:31, Jeff Jirsa  a écrit :
>
> Do you use vnodes? How many vnodes per machine?
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Mar 13, 2019, at 3:58 PM, Ahmed Eljami  wrote:
>
> Hi,
>
>
>
> We are planning to add a third datacenter to our cluster (already has 2
> datacenter, every datcenter has 50 nodes, so 100 nodes in total).
>
>
>
> My fear is that an important number of nodes per cluster (> 100) could
> cause a lot of problems like gossip duration, maintenance (repair...)...
>
>
>
> I know that it depends on use cases, volume of data and many other thing,
> but I would like that you share your  experiences with that.
>
>
>
> Thx
>
>
>
>
>
>
>
>
>


RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
If you can change to 8 vnodes, it will be much better for repairs and other 
kinds of streaming operations. The old advice of 256 per node is now not very 
helpful.
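
Since num_tokens cannot be changed on a live node, that means new nodes (or a new 
DC). On those nodes the relevant cassandra.yaml settings would look something like 
this (keyspace name is a placeholder; allocate_tokens_for_keyspace needs 3.0 or later):

num_tokens: 8
allocate_tokens_for_keyspace: my_main_keyspace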

Sean

From: Ahmed Eljami 
Sent: Wednesday, March 13, 2019 1:27 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cluster size "limit"

Yes, 256 vnodes

Le mer. 13 mars 2019 à 17:31, Jeff Jirsa 
mailto:jji...@gmail.com>> a écrit :
Do you use vnodes? How many vnodes per machine?
--
Jeff Jirsa


On Mar 13, 2019, at 3:58 PM, Ahmed Eljami 
mailto:ahmed.elj...@gmail.com>> wrote:
Hi,

We are planning to add a third datacenter to our cluster (it already has 2 
datacenters, each with 50 nodes, so 100 nodes in total).

My fear is that a large number of nodes per cluster (> 100) could cause a 
lot of problems, like gossip duration, maintenance (repair...)...

I know that it depends on use cases, volume of data and many other things, but I 
would like you to share your experiences with that.

Thx








Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Leena Ghatpande
Understood, a 2nd table would be a better approach. So what would be the best way 
to copy 70M rows from the current table to the 2nd table, with the same TTL set on 
each record as in the first table?


From: Durity, Sean R 
Sent: Wednesday, March 13, 2019 8:17 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] Re: Migrate large volume of data from one table to 
another table within the same cluster when COPY is not an option.


Correct, there is no current flag. I think there SHOULD be one.





From: Dieudonné Madishon NGAYA 
Sent: Tuesday, March 12, 2019 7:17 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another 
table within the same cluster when COPY is not an option.



Hi Sean, you can't set a flag in cassandra.yaml to disallow ALLOW FILTERING; the 
only control you have is your data model.

Don't ask Cassandra to query all the data in a table; the ideal query reads a 
single partition.



On Tue, Mar 12, 2019 at 6:46 PM Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>> 
wrote:

Hi Sean,



for sure, the best approach would be to create another table which would treat 
just that specific query.



How do I set the flag for not allowing allow filtering in cassandra.yaml? I 
read a doco and there seems to be nothing about that.



Regards



On Wed, 13 Mar 2019 at 06:57, Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:

If there are 2 access patterns, I would consider having 2 tables. The first one 
with the ID, which you say is the majority use case.  Then have a second table 
that uses a time-bucket approach as others have suggested:

(time bucket, id) as primary key

Choose a time bucket (day, week, hour, month, whatever) that would hold less 
than 100 MB of data in the time-bucket partition.



You could include all relevant data in the second table to meet your query. OR, 
if that data seems too large or too volatile to duplicate, just include your 
primary key and look-up the data in the primary table as needed.



If you use allow filtering, you are setting yourself up for failure to scale. I 
tell my developers, “if you use allow filtering, you are doing it wrong.” In 
fact, I think the Cassandra admin should be able to set a flag in 
cassandra.yaml to not allow filtering at all. The cluster should be able to 
protect itself from bad queries.







From: Leena Ghatpande mailto:lghatpa...@hotmail.com>>
Sent: Tuesday, March 12, 2019 9:02 AM
To: Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>>; 
user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another 
table within the same cluster when COPY is not an option.



Our data model cannot be as you recommended below, as the majority of the 
reads need to select the data by the partition key (id) only, not by date.

You could remodel your data in such way that you would make primary key like 
this

((date), hour-minute, id)

or

((date, hour-minute), id)





By adding the date as clustering column, yes the idea was to use the Allow 
Filtering on the date and pull the records. Understand that it is not 
recommended to do this, but we have been doing this on another existing large 
table and have not run into any issue so far. But want to understand if there 
is a better approach to this?



Thanks





From: Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>>
Sent: Monday, March 11, 2019 7:12 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Migrate large volume of data from one table to another table 
within the same cluster when COPY is not an option.



The query which does not work should be like this, I made a mistake there



cqlsh> SELECT * from my_keyspace.my_table where  number > 2;

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
execute this query as it might involve data filtering and thus may have 
unpredictable performance. If you want to execute this query despite the 
performance unpredictability, use ALLOW FILTERING"





On Tue, 12 Mar 2019 at 10:10, Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>> 
wrote:

Hi Leena,



"We are thinking of creating a new table with a date field as a clustering 
column to be able to query for date ranges, but partition key to clustering key 
will be 1-1. Is this a good approach?"



If you want to select by some time range here, I am wondering how would making 
datetime a clustering column help you here? You still have to provide primary 
key, right?



E.g. select * from your_keyspace.your_table where id=123 and my_date > 
yesterday and my_date < tomorrow (you got the idea)



If you make my_date a clustering column, you cannot do the query below, because you 
still have to specify the partition key fully

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Durity, Sean R
Correct, there is no current flag. I think there SHOULD be one.


From: Dieudonné Madishon NGAYA 
Sent: Tuesday, March 12, 2019 7:17 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another 
table within the same cluster when COPY is not an option.

Hi Sean, you can't set a flag in cassandra.yaml to disallow ALLOW FILTERING; the 
only control you have is your data model.
Don't ask Cassandra to query all the data in a table; the ideal query reads a 
single partition.

On Tue, Mar 12, 2019 at 6:46 PM Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>> 
wrote:
Hi Sean,

for sure, the best approach would be to create another table which would treat 
just that specific query.

How do I set the flag for not allowing allow filtering in cassandra.yaml? I 
read a doco and there seems to be nothing about that.

Regards

On Wed, 13 Mar 2019 at 06:57, Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
If there are 2 access patterns, I would consider having 2 tables. The first one 
with the ID, which you say is the majority use case.  Then have a second table 
that uses a time-bucket approach as others have suggested:
(time bucket, id) as primary key
Choose a time bucket (day, week, hour, month, whatever) that would hold less 
than 100 MB of data in the time-bucket partition.
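
A daily bucket, for example, might look like this (column names are illustrative):

CREATE TABLE my_ks.items_by_day (
    bucket date,
    id uuid,
    payload text,
    PRIMARY KEY ((bucket), id)
);
SELECT id, payload FROM my_ks.items_by_day WHERE bucket = '2019-03-13';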

You could include all relevant data in the second table to meet your query. OR, 
if that data seems too large or too volatile to duplicate, just include your 
primary key and look-up the data in the primary table as needed.

If you use allow filtering, you are setting yourself up for failure to scale. I 
tell my developers, “if you use allow filtering, you are doing it wrong.” In 
fact, I think the Cassandra admin should be able to set a flag in 
cassandra.yaml to not allow filtering at all. The cluster should be able to 
protect itself from bad queries.



From: Leena Ghatpande mailto:lghatpa...@hotmail.com>>
Sent: Tuesday, March 12, 2019 9:02 AM
To: Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>>; 
user@cassandra.apache.org
Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another 
table within the same cluster when COPY is not an option.

Our data model cannot be as you recommended below, as the majority of the 
reads need to select the data by the partition key (id) only, not by date.
You could remodel your data in such way that you would make primary key like 
this
((date), hour-minute, id)
or
((date, hour-minute), id)


By adding the date as clustering column, yes the idea was to use the Allow 
Filtering on the date and pull the records. Understand that it is not 
recommended to do this, but we have been doing this on another existing large 
table and have not run into any issue so far. But want to understand if there 
is a better approach to this?

Thanks


From: Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>>
Sent: Monday, March 11, 2019 7:12 PM
To: user@cassandra.apache.org
Subject: Re: Migrate large volume of data from one table to another table 
within the same cluster when COPY is not an option.

The query which does not work should be like this, I made a mistake there

cqlsh> SELECT * from my_keyspace.my_table where  number > 2;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
execute this query as it might involve data filtering and thus may have 
unpredictable performance. If you want to execute this query despite the 
performance unpredictability, use ALLOW FILTERING"


On Tue, 12 Mar 2019 at 10:10, Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>> 
wrote:
Hi Leena,

"We are thinking of creating a new table with a date field as a clustering 
column to be able to query for date ranges, but partition key to clustering key 
will be 1-1. Is this a good approach?"

If you want to select by some time range here, I am wondering how would making 
datetime a clustering column help you here? You still have to provide primary 
key, right?

E.g. select * from your_keyspace.your_table where id=123 and my_date > 
yesterday and my_date < tomorrow (you got the idea)

If you make my_date a clustering column, you cannot do the query below, because you 
still have to specify the partition key fully and then the clustering key (optionally), 
where you can further order and do ranges. But you can't run a query without 
specifying the partition key. Well, you can use ALLOW FILTERING, but you do not want 
to do that at all in your situation as it would scan everything.

select * from your_keyspace.your_table where my_date > yesterday and my_date < 
tomorrow

cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 
'NetworkTopologyStrategy', 'dc1': '1'};
cqlsh> CREATE TABLE my_keyspace.my_table (id uuid, number int, PRIMARY KEY 
((id), number));

cqlsh> SELECT * from my_keyspace.my_table ;


RE: [EXTERNAL] Re: A Question About Hints

2019-03-05 Thread Durity, Sean R
Versions 2.0 and 2.1 were generally very stable, so I can understand a 
reticence to move when there are so many other things competing for time and 
attention.

Sean Durity




From: shalom sagges 
Sent: Monday, March 04, 2019 4:21 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: A Question About Hints

Everyone really should move off of the 2.x versions just like you are doing.
Tell me about it... But since there are a lot of groups involved, these things 
take time unfortunately.

Thanks for your assistance Kenneth

On Mon, Mar 4, 2019 at 11:04 PM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
Since you are in the process of upgrading, I’d do nothing on the settings right 
now.  But if you wanted to do something on the settings in the meantime, based 
on my read of the information available, I’d maybe double the default settings. 
The upgrade will help a lot of things as you know.

Everyone really should move off of the 2.x versions just like you are doing.

From: shalom sagges 
[mailto:shalomsag...@gmail.com]
Sent: Monday, March 04, 2019 12:34 PM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

See my comments inline.

Do the 8 nodes clusters have the problem too?
Yes

To the same extent?
It depends on the throughput, but basically the smaller clusters get low 
throughput, so the problem is naturally smaller.

Is it any cluster across multi-DC’s?
Yes

Do all the clusters use nodes with similar specs?
All nodes have similar specs within a cluster but different specs on different 
clusters.

The version of Cassandra you are on can make a difference.  What version are 
you on?
Currently I'm on various versions, 2.0.14, 2.1.15 and 3.0.12. In the process of 
upgrading to 3.11.4

Did you see Edward Capriolo’s presentation at 26:19 into the YouTube video at: 
https://www.youtube.com/watch?v=uN4FtAjYmLU
 where he briefly mentions you can get into trouble if you go too fast or too 
slow?
I guess you can say it about almost any parameter you change :)

BTW, I thought the comments at the end of the article you mentioned were really 
good.
The entire article is very good, but I wonder if it's still valid since it was 
created around 4 years ago.

Thanks!




On Mon, Mar 4, 2019 at 9:37 PM Kenneth Brotman 
mailto:kenbrotman@yahoocom.invalid>> wrote:
Makes sense  If you have time and don’t mind, could you answer the following:
Do the 8 nodes clusters have the problem too?
To the same extent?
Is it just the clusters with the large node count?
Is it any cluster across multi-DC’s?
Do all the clusters use nodes with similar specs?

The version of Cassandra you are on can make a difference.  What version are 
you on?

Did you see Edward Capriolo’s presentation at 26:19 into the YouTube video at: 
https://www.youtube.com/watch?v=uN4FtAjYmLU
 where he briefly mentions you can get into trouble if you go too fast or too 
slow?
BTW, I thought the comments at the end of the article you mentioned were really 
good.



From: shalom sagges 
[mailto:shalomsag...@gmail.com]
Sent: Monday, March 04, 2019 11:04 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

It varies...
Some clusters have 48 nodes, others 24 nodes and some 8 nodes.
Both settings are on default.

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.
That's the plan, but I thought I might first get some valuable information from 
someone in the community that has already experienced in this type of change.

Thanks!

On Mon, Mar 4, 2019 at 8:27 PM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
It sounds like your use case might be appropriate for tuning those two settings 
some.

How many nodes are in the cluster?
Are both settings definitely on the default values currently?

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.

Then of course share your results with us.

From: shalom sagges 
[mailto:shalomsag...@gmail.com]
Sent: Monday, March 04, 2019 9:54 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

Hi Kenneth,

The concern is that in some cases, hints accumulate on nodes, and it takes a 
while until they are delivered (multi DCs).
I see that 

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Peter Heitman
I appreciate the thoughtful replies. We will have to evaluate whether
cassandra is the right datastore for us. It was chosen because our primary
requirement is to store lots of data about lots of devices at a high rate.
The search requirements are very secondary but required for the management
of the devices. We are close to being able to do some scale testing of the
solution and will evaluate the cost of cassandra for this application at
that time.

On Wed, Feb 27, 2019 at 2:04 PM Jonathan Haddad  wrote:

> If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
> use DSE Search or Ellesandra, they're two solutions designed to solve
> problems that are Cassandra first, search second.
>
> I'd go straight to elastic search for workloads that are primarily search
> driven, like you listed above.  The idea of having one DB doing both things
> sounds great until it's an operational nightmare.
>
> On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh 
> wrote:
>
>> +1 on Datastax and could consider looking at Elassandra.
>>
>> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R <
>> sean_r_dur...@homedepot.com> wrote:
>>
>>> Kenneth is right. Trying to port/support a relational model to a CQL
>>> model the way you are doing it is not going to go well. You won’t be able
>>> to scale or get the search flexibility that you want. It will make
>>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>>> availability, low latency, scalability, etc. so you need to store the data
>>> the way you want to retrieve it (query first modeling!). You could look at
>>> defining the “right” partition and clustering keys, so that the searches
>>> are within a single, reasonably sized partition. And you could have lookup
>>> tables for other common search patterns (item_by_model_name, etc.)
>>>
>>>
>>>
>>> If that kind of modeling gets you to a situation where you have too many
>>> lookup tables to keep consistent, you could consider something like
>>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>>> searchable fields. A SOLR query will typically be an order of magnitude
>>> slower than a partition key lookup, though.
>>>
>>>
>>>
>>> It really boils down to the purpose of the data store. If you are
>>> looking for primarily an “anything goes” search engine, Cassandra may not
>>> be a good choice. If you need Cassandra-level availability, extremely low
>>> latency queries (on known access patterns), high volume/low latency writes,
>>> easy scalability, etc. then you are going to have to rethink how you model
>>> the data.
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity
>>>
>>>
>>>
>>> *From:* Kenneth Brotman 
>>> *Sent:* Thursday, February 07, 2019 7:01 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Peter,
>>>
>>>
>>>
>>> Sounds like you may need to use a different architecture.  Perhaps you
>>> need something like Presto or Kafka as a part of the solution.  If the data
>>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>>> have to transform the data you want to use with Cassandra so that a proper
>>> data model for Cassandra can be used.
>>>
>>>
>>>
>>> *From:* Peter Heitman [mailto:pe...@heitman.us ]
>>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Yes, I have read the material. The problem is that the application has a
>>> query facility available to the user where they can type in "(A = foo AND B
>>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>>> many of which are columns in the mytable below while others are from other
>>> tables. This query facility was implemented and shipped years before we
>>> decided to move to Cassandra
>>>
>>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>>> kenbrot...@yahoo.com.invalid> wrote:
>>>
>>> The problem is you’re not using a query first design.  I would recommend
>>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>>> Carpenter and Eben Hewitt.  It’s available free online at this link
>>> 
>>> .
>>>
>>>
>>>
>>> Kenneth Brotman
>>>
>>>
>>>
>>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>>> *Sent:* Wednesday, February 06, 2019 6:33 PM
>>>
>>>
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Yes, I 

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Jonathan Haddad
If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
use DSE Search or Elassandra; they're two solutions designed to solve
problems that are Cassandra first, search second.

I'd go straight to Elasticsearch for workloads that are primarily search
driven, like you listed above.  The idea of having one DB doing both things
sounds great until it's an operational nightmare.

On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh 
wrote:

> +1 on Datastax and could consider looking at Elassandra.
>
> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R 
> wrote:
>
>> Kenneth is right. Trying to port/support a relational model to a CQL
>> model the way you are doing it is not going to go well. You won’t be able
>> to scale or get the search flexibility that you want. It will make
>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>> availability, low latency, scalability, etc. so you need to store the data
>> the way you want to retrieve it (query first modeling!). You could look at
>> defining the “right” partition and clustering keys, so that the searches
>> are within a single, reasonably sized partition. And you could have lookup
>> tables for other common search patterns (item_by_model_name, etc.)
>>
>>
>>
>> If that kind of modeling gets you to a situation where you have too many
>> lookup tables to keep consistent, you could consider something like
>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>> searchable fields. A SOLR query will typically be an order of magnitude
>> slower than a partition key lookup, though.
>>
>>
>>
>> It really boils down to the purpose of the data store. If you are looking
>> for primarily an “anything goes” search engine, Cassandra may not be a good
>> choice. If you need Cassandra-level availability, extremely low latency
>> queries (on known access patterns), high volume/low latency writes, easy
>> scalability, etc. then you are going to have to rethink how you model the
>> data.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Kenneth Brotman 
>> *Sent:* Thursday, February 07, 2019 7:01 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Peter,
>>
>>
>>
>> Sounds like you may need to use a different architecture.  Perhaps you
>> need something like Presto or Kafka as a part of the solution.  If the data
>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>> have to transform the data you want to use with Cassandra so that a proper
>> data model for Cassandra can be used.
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us ]
>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I have read the material. The problem is that the application has a
>> query facility available to the user where they can type in "(A = foo AND B
>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>> many of which are columns in the mytable below while others are from other
>> tables. This query facility was implemented and shipped years before we
>> decided to move to Cassandra
>>
>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>> kenbrot...@yahoo.com.invalid> wrote:
>>
>> The problem is you’re not using a query first design.  I would recommend
>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>> Carpenter and Eben Hewitt.  It’s available free online at this link
>> 
>> .
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:33 PM
>>
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
>> inefficient data model. I haven't figured out how to do it correctly yet
>>
>> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <
>> kenbrot...@yahoo.com.invalid> wrote:
>>
>> Exactly.  When you design your data model correctly you shouldn’t have to
>> use ALLOW FILTERING in the queries.  That is not recommended.
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:09 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> You are completely right! My problem is that I am 

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Rahul Singh
+1 on Datastax and could consider looking at Elassandra.

On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R 
wrote:

> Kenneth is right. Trying to port/support a relational model to a CQL model
> the way you are doing it is not going to go well. You won’t be able to
> scale or get the search flexibility that you want. It will make Cassandra
> seem like a bad fit. You want to play to Cassandra’s strengths –
> availability, low latency, scalability, etc. so you need to store the data
> the way you want to retrieve it (query first modeling!). You could look at
> defining the “right” partition and clustering keys, so that the searches
> are within a single, reasonably sized partition. And you could have lookup
> tables for other common search patterns (item_by_model_name, etc.)
>
>
>
> If that kind of modeling gets you to a situation where you have too many
> lookup tables to keep consistent, you could consider something like
> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
> searchable fields. A SOLR query will typically be an order of magnitude
> slower than a partition key lookup, though.
>
>
>
> It really boils down to the purpose of the data store. If you are looking
> for primarily an “anything goes” search engine, Cassandra may not be a good
> choice. If you need Cassandra-level availability, extremely low latency
> queries (on known access patterns), high volume/low latency writes, easy
> scalability, etc. then you are going to have to rethink how you model the
> data.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Kenneth Brotman 
> *Sent:* Thursday, February 07, 2019 7:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>
>
>
> Peter,
>
>
>
> Sounds like you may need to use a different architecture.  Perhaps you
> need something like Presto or Kafka as a part of the solution.  If the data
> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
> have to transform the data you want to use with Cassandra so that a proper
> data model for Cassandra can be used.
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us ]
> *Sent:* Wednesday, February 06, 2019 10:05 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> Yes, I have read the material. The problem is that the application has a
> query facility available to the user where they can type in "(A = foo AND B
> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
> many of which are columns in the mytable below while others are from other
> tables. This query facility was implemented and shipped years before we
> decided to move to Cassandra
>
> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman 
> wrote:
>
> The problem is you’re not using a query first design.  I would recommend
> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
> Carpenter and Eben Hewitt.  It’s available free online at this link
> 
> .
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:33 PM
>
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
> inefficient data model. I haven't figured out how to do it correctly yet
>
> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman 
> wrote:
>
> Exactly.  When you design your data model correctly you shouldn’t have to
> use ALLOW FILTERING in the queries.  That is not recommended.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> You are completely right! My problem is that I am trying to port code for
> SQL to CQL for an application that provides the user with a relatively
> general search facility. The original implementation didn't worry about
> secondary indexes - it just took advantage of the ability to create
> arbitrarily complex queries with inner joins, left joins, etc. I am
> reimplimenting it to create a parse tree of CQL queries and doing the ANDs
> and ORs in the application. Of course once I get enough of this implemented
> I will have to load up the table with a large data set and see if it gives
> acceptable performance for our use case.
>
> On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman 
> 

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
I am not making a recommendation for anyone else – just sharing our experience 
and reasoning. It is why I argued for keeping PropertyFileSnitch in some JIRA 
that proposed dropping it completely. GPFS is the typical recommendation for 
production use. Just a hurdle not worth my time at the moment.

From: Alexander Dejanovski 
Sent: Wednesday, February 27, 2019 9:22 AM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Question on changing node IP address

It has to be balanced with the dangers related to the PropertyFileSnitch.
I've seen such incidents happen twice in the last few months in different 
places and both times recovery was difficult and hazardous.

I still strongly recommend against it.

On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
We use the PropertyFileSnitch precisely because it is the same on every node. 
If each node has to have a different file (for GPFS) – deployment is more 
complicated. (And for any automated configuration you would have a list of 
hosts and DC/rack information to compile anyway)

I do put UNKNOWN as the default DC so that any missed node easily appears in 
its own unused DC.


Sean Durity

From: Alexander Dejanovski 
mailto:a...@thelastpickle.com>>
Sent: Wednesday, February 27, 2019 4:43 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Question on changing node IP address

This snitch is easy to misconfigure. It allows some nodes to have a different 
view of the cluster if they are configured differently, which can result in 
data loss (or at least data that is very hard to recover).
Also it has a nasty feature that allows to set a default DC/Rack. If one node 
isn't properly declared in all the files throughout the cluster, it will be 
seen as part of that "default" DC and then again, it's hard to recover.
Be aware that while the GossipingPropertyFileSnitch will not allow changing the 
rack or DC of a node that has already bootstrapped, the PropertyFileSnitch will 
allow the change without any notice. So a little misconfiguration could merge 
all nodes from DC1 into DC2, abruptly changing token ownership (and it could 
very well be the case that DC1 thinks it's part of DC2 but DC2 still thinks DC1 is 
DC1).
So again, I think this snitch is dangerous and shouldn't be used. The 
GossipingPropertyFileSnitch is much more secure and easy to use.

Cheers,


On Wed, Feb 27, 2019 at 10:13 AM shalom sagges 
mailto:shalomsag...@gmail.com>> wrote:
If you're using the PropertyFileSnitch, well... you shouldn't as it's a rather 
dangerous and tedious snitch to use

I inherited Cassandra clusters that use the PropertyFileSnitch. It's been 
working fine, but you've kinda scared me :-)
Why is it dangerous to use?
If I decide to change the snitch, is it seamless or is there a specific 
procedure one must follow?

Thanks!


On Wed, Feb 27, 2019 at 10:08 AM Alexander Dejanovski 
mailto:a...@thelastpickle.com>> wrote:
I confirm what Oleksandr said.
Just stop Cassandra, change the IP, and restart Cassandra.
If you're using the GossipingPropertyFileSnitch, the node will redeclare its 
new IP through Gossip and that's it.
If you're using the PropertyFileSnitch, well... you shouldn't as it's a rather 
dangerous and tedious snitch to use. But if you are, it'll require to change 
the file containing all the IP addresses across the cluster.

I've been changing IPs on a whole cluster back in 2.1 this way and it went 
through seamlessly.

Cheers,

On Wed, Feb 27, 2019 at 8:54 AM Oleksandr Shulgin 
mailto:oleksandr.shul...@zalando.de>> wrote:
On Wed, Feb 27, 2019 at 4:15 AM 
wxn...@zjqunshuo.com<mailto:wxn...@zjqunshuo.com> 
mailto:wxn...@zjqunshuo.com>> wrote:
>After restart with the new address the server will notice it and log a 
>warning, but it will keep token ownership as long as it keeps the old host id 
>(meaning it must use the same data directory as before restart).

Based on my understanding, the token range is bound to the host id. As long as the host id 
doesn't change, everything is ok. Besides the data directory, can anything else 
lead to a host id change? And how is the host id calculated? For example, if I upgrade 
the Cassandra binary to a new version, will the host id change after restart?

I believe host id is calculated once the new node is initialized and never 
changes afterwards, even through major upgrades.  It is stored in system 
keyspace in data directory, and is stable across restarts.

--
Alex

--
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.thelastpickle.com_=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=xojzDh-fJSOl_ZfDMCYIYi4sckWpwqdnKDG5QMx2nUE=HwAxm8xI-Bmc8IFmwEK0we9hlvNwUVuj7DGpXuNM8r4=>
--
-
Alexander Dejanovski
France
@alexanderdeja


Re: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Oleksandr Shulgin
On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R 
wrote:

> We use the PropertyFileSnitch precisely because it is the same on every
> node. If each node has to have a different file (for GPFS) – deployment is
> more complicated. (And for any automated configuration you would have a
> list of hosts and DC/rack information to compile anyway)
>
>
>
> I do put UNKNOWN as the default DC so that any missed node easily appears
> in its own unused DC.
>

Alright, it obviously makes a lot of difference which snitch to use.  We
are deploying to EC2, so we are using the EC2 snitches at all times.  I
guess some complexity is hidden from us by these custom implementations.

At the same time, we do try to assign IP addresses in a predictable manner
when deploying a new cluster, in order to fix the list of seed nodes in
advance (we wouldn't care about the rest of nodes).

So I think, for the original question: be careful when changing the IP address of
seed nodes.  Probably you want to start with non-seeds and promote some of
them to seeds before you start changing IP addresses of the old seeds.

--
Alex


Re: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Alexander Dejanovski
It has to be balanced with the dangers related to the PropertyFileSnitch.
I've seen such incidents happen twice in the last few months in different
places and both times recovery was difficult and hazardous.

I still strongly recommend against it.

On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R 
wrote:

> We use the PropertyFileSnitch precisely because it is the same on every
> node. If each node has to have a different file (for GPFS) – deployment is
> more complicated. (And for any automated configuration you would have a
> list of hosts and DC/rack information to compile anyway)
>
>
>
> I do put UNKNOWN as the default DC so that any missed node easily appears
> in its own unused DC.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Alexander Dejanovski 
> *Sent:* Wednesday, February 27, 2019 4:43 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Question on changing node IP address
>
>
>
> This snitch is easy to misconfigure. It allows some nodes to have a
> different view of the cluster if they are configured differently, which can
> result in data loss (or at least data that is very hard to recover).
>
> Also it has a nasty feature that allows you to set a default DC/Rack. If one
> node isn't properly declared in all the files throughout the cluster, it
> will be seen as part of that "default" DC and then again, it's hard to
> recover.
>
> Be aware that while the GossipingPropertyFileSnitch will not allow
> changing the rack or DC of a node that has already bootstrapped, the
> PropertyFileSnitch will allow you to change it without any warning. So a little
> misconfiguration could merge all nodes from DC1 into DC2, abruptly changing
> token ownership (and it could very well be the case that DC1 thinks it's part of
> DC2 but DC2 still thinks DC1 is DC1).
>
> So again, I think this snitch is dangerous and shouldn't be used. The
> GossipingPropertyFileSnitch is much more secure and easy to use.
>
>
>
> Cheers,
>
>
>
>
>
> On Wed, Feb 27, 2019 at 10:13 AM shalom sagges 
> wrote:
>
> If you're using the PropertyFileSnitch, well... you shouldn't as it's a
> rather dangerous and tedious snitch to use
>
>
>
> I inherited Cassandra clusters that use the PropertyFileSnitch. It's been
> working fine, but you've kinda scared me :-)
>
> Why is it dangerous to use?
>
> If I decide to change the snitch, is it seamless or is there a specific
> procedure one must follow?
>
>
>
> Thanks!
>
>
>
>
>
> On Wed, Feb 27, 2019 at 10:08 AM Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> I confirm what Oleksandr said.
>
> Just stop Cassandra, change the IP, and restart Cassandra.
>
> If you're using the GossipingPropertyFileSnitch, the node will redeclare
> its new IP through Gossip and that's it.
>
> If you're using the PropertyFileSnitch, well... you shouldn't as it's a
> rather dangerous and tedious snitch to use. But if you are, it'll require
> changing the file containing all the IP addresses across the cluster.
>
>
>
> I've been changing IPs on a whole cluster back in 2.1 this way and it went
> through seamlessly.
>
>
>
> Cheers,
>
>
>
> On Wed, Feb 27, 2019 at 8:54 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> On Wed, Feb 27, 2019 at 4:15 AM wxn...@zjqunshuo.com 
> wrote:
>
> >After restart with the new address the server will notice it and log a
> warning, but it will keep token ownership as long as it keeps the old host
> id (meaning it must use the same data directory as before restart).
>
>
>
> Based on my understanding, token range is bound to host id. As long as the
> host id doesn't change, everything is ok. Besides the data directory, can anything
> else lead to a host id change? And how is the host id calculated? For
> example, if I upgrade the Cassandra binary to a new version, will the host id
> change after restart?
>
>
>
> I believe host id is calculated once the new node is initialized and never
> changes afterwards, even through major upgrades.  It is stored in system
> keyspace in data directory, and is stable across restarts.
>
>
>
> --
>
> Alex
>
>
>
> --
>
> -
>
> Alexander Dejanovski
>
> France
>
> @alexanderdeja
>
>
>
> Consultant
>
> Apache Cassandra Consulting
>
> http://www.thelastpickle.com
> 
>

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
We use the PropertyFileSnitch precisely because it is the same on every node. 
If each node has to have a different file (for GPFS) – deployment is more 
complicated. (And for any automated configuration you would have a list of 
hosts and DC/rack information to compile anyway)

I do put UNKNOWN as the default DC so that any missed node easily appears in 
its own unused DC.
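
For illustration, a minimal cassandra-topology.properties along those lines (IPs and names are placeholders):

    # same file on every node in the cluster
    10.0.0.1=DC1:RAC1
    10.0.0.2=DC1:RAC2
    10.0.1.1=DC2:RAC1
    # any node missing from the list above lands in an obviously unused DC
    default=UNKNOWN:RAC1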


Sean Durity

From: Alexander Dejanovski 
Sent: Wednesday, February 27, 2019 4:43 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Question on changing node IP address

This snitch is easy to misconfigure. It allows some nodes to have a different 
view of the cluster if they are configured differently, which can result in 
data loss (or at least data that is very hard to recover).
Also it has a nasty feature that allows you to set a default DC/Rack. If one node 
isn't properly declared in all the files throughout the cluster, it will be 
seen as part of that "default" DC and then again, it's hard to recover.
Be aware that while the GossipingPropertyFileSnitch will not allow changing the 
rack or DC of a node that has already bootstrapped, the PropertyFileSnitch will 
allow you to change it without any warning. So a little misconfiguration could merge 
all nodes from DC1 into DC2, abruptly changing token ownership (and it could 
very well be the case that DC1 thinks it's part of DC2 but DC2 still thinks DC1 is 
DC1).
So again, I think this snitch is dangerous and shouldn't be used. The 
GossipingPropertyFileSnitch is much more secure and easy to use.
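
For comparison, a sketch of what the GossipingPropertyFileSnitch needs (placeholder values): cassandra.yaml names the snitch once, and each node declares only its own location in cassandra-rackdc.properties, which gossip then propagates:

    # cassandra.yaml
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties (per node, describes this node only)
    dc=DC1
    rack=RAC1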

Cheers,


On Wed, Feb 27, 2019 at 10:13 AM shalom sagges 
mailto:shalomsag...@gmail.com>> wrote:
If you're using the PropertyFileSnitch, well... you shouldn't as it's a rather 
dangerous and tedious snitch to use

I inherited Cassandra clusters that use the PropertyFileSnitch. It's been 
working fine, but you've kinda scared me :-)
Why is it dangerous to use?
If I decide to change the snitch, is it seamless or is there a specific 
procedure one must follow?

Thanks!


On Wed, Feb 27, 2019 at 10:08 AM Alexander Dejanovski 
mailto:a...@thelastpickle.com>> wrote:
I confirm what Oleksandr said.
Just stop Cassandra, change the IP, and restart Cassandra.
If you're using the GossipingPropertyFileSnitch, the node will redeclare its 
new IP through Gossip and that's it.
If you're using the PropertyFileSnitch, well... you shouldn't as it's a rather 
dangerous and tedious snitch to use. But if you are, it'll require changing 
the file containing all the IP addresses across the cluster.

I've been changing IPs on a whole cluster back in 2.1 this way and it went 
through seamlessly.

Cheers,

On Wed, Feb 27, 2019 at 8:54 AM Oleksandr Shulgin 
mailto:oleksandr.shul...@zalando.de>> wrote:
On Wed, Feb 27, 2019 at 4:15 AM 
wxn...@zjqunshuo.com 
mailto:wxn...@zjqunshuo.com>> wrote:
>After restart with the new address the server will notice it and log a 
>warning, but it will keep token ownership as long as it keeps the old host id 
>(meaning it must use the same data directory as before restart).

Based on my understanding, token range is bound to host id. As long as the host id 
doesn't change, everything is ok. Besides the data directory, can anything else 
lead to a host id change? And how is the host id calculated? For example, if I upgrade 
the Cassandra binary to a new version, will the host id change after restart?

I believe host id is calculated once the new node is initialized and never 
changes afterwards, even through major upgrades.  It is stored in system 
keyspace in data directory, and is stable across restarts.

--
Alex

--
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com




Re: [EXTERNAL] Re: Question on changing node IP address

2019-02-26 Thread Oleksandr Shulgin
On Tue, Feb 26, 2019 at 3:26 PM Durity, Sean R 
wrote:

> This has not been my experience. Changing IP address is one of the worst
> admin tasks for Cassandra. System.peers and other information on each node
> is stored by ip address. And gossip is really good at sending around the
> old information mixed with new…
>

Hm, on which version was it?  I might be biased, not having worked with
anything but 3.0 recently.

--
Alex


RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-26 Thread Durity, Sean R
This has not been my experience. Changing IP address is one of the worst admin 
tasks for Cassandra. System.peers and other information on each node is stored 
by ip address. And gossip is really good at sending around the old information 
mixed with new…



Sean Durity

From: Oleksandr Shulgin 
Sent: Tuesday, February 26, 2019 5:36 AM
To: User 
Subject: [EXTERNAL] Re: Question on changing node IP address

On Tue, Feb 26, 2019 at 9:39 AM 
wxn...@zjqunshuo.com 
mailto:wxn...@zjqunshuo.com>> wrote:

I'm running 2.2.8 with vnodes and I'm planning to change node IP address.
My procedure is:
Turn down one node, setting auto_bootstrap to false in yaml file, then bring it 
up with -Dcassandra.replace_address. Repeat the procedure one by one for the 
other nodes.

I care about streaming because the data is very large and if there is 
streaming, it will take a long time. When the node with the new IP is brought up, 
will it take over the token ranges it had before? I expect no token range 
reassignment and no streaming. Am I right?

Is there anything I need to care about when making the IP address change?

Changing the IP address of a node does not require special considerations.  
After restart with the new address the server will notice it and log a warning, 
but it will keep token ownership as long as it keeps the old host id (meaning 
it must use the same data directory as before restart).

At the same time, *do not* use the replace_address option: it assumes empty 
data directory and will try to stream data from other replicas into the node.
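
A rough sketch of the in-place procedure (commands assume a package install managed by systemd; adjust to your environment):

    nodetool drain                   # flush memtables and stop accepting traffic
    sudo systemctl stop cassandra
    # update listen_address / rpc_address (and broadcast_* if set) in cassandra.yaml
    # and the OS network config, but leave the data directories untouched
    sudo systemctl start cassandra
    nodetool status                  # same host id and tokens, new IP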

--
Alex






Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-14 Thread Jeff Jirsa
It takes effect on each sstable as it’s written, so you have to rewrite your 
dataset before it’s fully in effect

You can do that with “nodetool upgradesstables -a”
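
Putting the two together, a sketch of the change (64 KiB is the default; 256 is only an example value, and the keyspace/table names are placeholders):

    # cassandra.yaml
    column_index_size_in_kb: 256

    # after a rolling restart, rewrite existing sstables so old data picks up
    # the new setting as well:
    nodetool upgradesstables -a my_keyspace my_table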



-- 
Jeff Jirsa


> On Feb 13, 2019, at 11:43 PM, "ishib...@gmail.com"  wrote:
> 
> Hi Jeff,
> If I increase the value, it will affect only newly created indexes. Will repair 
> rebuild old indexes with the new, larger size, or leave them with the same 
> size?
> 
> Best regards, Ilya
> 
> 
> 
> -------- Original message --------
> Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without 
> changing primary partition formation.
> From: Jeff Jirsa 
> To: user@cassandra.apache.org
> Cc: 
> 
> 
> We build an index on each partition as we write it - in 3.0 it’s a list 
> that relates the clustering columns for a given partition key to a point in 
> the file. When you read, we use that index to skip to the point at the 
> beginning of your read.
> 
> That 64k value is just a default that few people ever have reason to change - 
> it’s somewhat similar to the 64k compression chunk size, though they’re 
> not aligned.
> 
> If you increase the value from 64k to 128k, you’ll have half as many index 
> markers per partition. This means when you use the index, you’ll do a bit 
> more IO to find the actual start of your result. However, it also means you 
> have half as many index objects created in the heap on reads - for many use 
> cases with wide partitions, the IndexInfo objects on reads create far too 
> much garbage and cause bad latency/gc. This just gives you a way to trade off 
> between the two options - disk IO or gc pauses.
> 
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Feb 13, 2019, at 10:45 PM, "ishib...@gmail.com"  
>> wrote:
>> 
>> Hello!
>> Increase column_index_size_in_kb so that index entries are created more rarely, am I correct?
>> But will it be used in every read request, or is the column index only used for queries 
>> within a row?
>> 
>> Best regards, Ilya
>> 
>> 
>> 
>> -------- Original message --------
>> Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select 
>> without changing primary partition formation.
>> From: Jeff Jirsa 
>> To: user@cassandra.apache.org
>> Cc: 
>> 
>> 
>> Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206) is 
>> in 3.11 and does have a few knobs to make this less painful
>> 
>> You can also increase the column index size from 64kb to something 
>> significantly higher to decrease the cost of those reads on the JVM 
>> (shifting cost to the disk) - consider 256k or 512k for 100-1000mb 
>> partitions.
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On Feb 13, 2019, at 5:48 AM, Durity, Sean R  
>>> wrote:
>>> 
>>> Agreed. It’s pretty close to impossible to administrate your way out 
>>> of a data model that doesn’t play to Cassandra’s strengths. 
>>> Which is true for other data storage technologies – you need to 
>>> model the data the way that the engine is designed to work.
>>>  
>>>  
>>> Sean Durity
>>>  
>>> From: DuyHai Doan  
>>> Sent: Wednesday, February 13, 2019 8:08 AM
>>> To: user 
>>> Subject: [EXTERNAL] Re: Make large partitons lighter on select without 
>>> changing primary partition formation.
>>>  
>>> Plain answer is NO
>>>  
>>> There is a slight hope that the JIRA 
>>> https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into 4.0 release
>>>  
>>> But right now, there seems to be few interest in this ticket, the last 
>>> comment 23/Feb/2017 old ...
>>>  
>>>  
>>> On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov  
>>> wrote:
>>> Hi all,
>>>  
>>> The question.
>>>  
>>> We have Cassandra 3.11.1 with really heavy primary partitions:
>>> cfhistograms 95%% is 130+ mb, 95%% cell count is 3.3mln and higher, 98%% 
>>> partition size is 220+ mb sometimes partitions are 1+ gb. We have regular 
>>> problems with node lockdowns leading to read request timeouts under read 
>>> requests load.
>>>  
>>> Changing primary partition key structure is out of question.
>>>  
>>> Are there any sharding techniques available to dilute partitions at level 
>>> lower than 'select' requests to make read performance bett

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread ishib...@gmail.com
Hi Jeff,
If I increase the value, it will affect only newly created indexes. Will repair rebuild old indexes with the new, larger size, or leave them with the same size?

Best regards, Ilya

-------- Original message --------
Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.
From: Jeff Jirsa
To: user@cassandra.apache.org
Cc:

We build an index on each partition as we write it - in 3.0 it’s a list that relates the clustering columns for a given partition key to a point in the file. When you read, we use that index to skip to the point at the beginning of your read.

That 64k value is just a default that few people ever have reason to change - it’s somewhat similar to the 64k compression chunk size, though they’re not aligned.

If you increase the value from 64k to 128k, you’ll have half as many index markers per partition. This means when you use the index, you’ll do a bit more IO to find the actual start of your result. However, it also means you have half as many index objects created in the heap on reads - for many use cases with wide partitions, the IndexInfo objects on reads create far too much garbage and cause bad latency/gc. This just gives you a way to trade off between the two options - disk IO or gc pauses.

--
Jeff Jirsa

On Feb 13, 2019, at 10:45 PM, "ishib...@gmail.com" <ishib...@gmail.com> wrote:

Hello!
Increase column_index_size_in_kb so that index entries are created more rarely, am I correct?
But will it be used in every read request, or is the column index only used for queries within a row?

Best regards, Ilya

-------- Original message --------
Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.
From: Jeff Jirsa
To: user@cassandra.apache.org
Cc:

Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206) is in 3.11 and does have a few knobs to make this less painful

You can also increase the column index size from 64kb to something significantly higher to decrease the cost of those reads on the JVM (shifting cost to the disk) - consider 256k or 512k for 100-1000mb partitions.

--
Jeff Jirsa

On Feb 13, 2019, at 5:48 AM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:







Agreed. It’s pretty close to impossible to administrate your way out of a data model that doesn’t play to Cassandra’s strengths. Which is true for other data storage technologies – you need to model the data the way that the engine is designed to work.

Sean Durity

From: DuyHai Doan <doanduy...@gmail.com>
Sent: Wednesday, February 13, 2019 8:08 AM
To: user <user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

Plain answer is NO

There is a slight hope that the JIRA https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into 4.0 release

But right now, there seems to be few interest in this ticket, the last comment 23/Feb/2017 old ...

On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov <vsfilare...@gmail.com> wrote:

Hi all,

The question.

We have Cassandra 3.11.1 with really heavy primary partitions:
cfhistograms 95%% is 130+ mb, 95%% cell count is 3.3mln and higher, 98%% partition size is 220+ mb sometimes partitions are 1+ gb. We have regular problems with node lockdowns leading to read request timeouts under read requests load.

Changing primary partition key structure is out of question.

Are there any sharding techniques available to dilute partitions at level lower than 'select' requests to make read performance better? Without changing read requests syntax?

Thank you all in advance,
Vsevolod Filaretov.

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Jeff Jirsa
We build an index on each partition as we write it - in 3.0 it’s a list that 
relates the clustering columns for a given partition key to a point in the 
file. When you read, we use that index to skip to the point at the beginning of 
your read.

That 64k value is just a default that few people ever have reason to change - 
it’s somewhat similar to the 64k compression chunk size, though they’re not 
aligned.

If you increase the value from 64k to 128k, you’ll have half as many index 
markers per partition. This means when you use the index, you’ll do a bit more 
IO to find the actual start of your result. However, it also means you have 
half as many index objects created in the heap on reads - for many use cases 
with wide partitions, the IndexInfo objects on reads create far too much 
garbage and cause bad latency/gc. This just gives you a way to trade off 
between the two options - disk IO or gc pauses.


-- 
Jeff Jirsa


> On Feb 13, 2019, at 10:45 PM, "ishib...@gmail.com"  wrote:
> 
> Hello!
> Increase column_index_size_in_kb so that index entries are created more rarely, am I correct?
> But will it be used in every read request, or is the column index only used for queries within 
> a row?
> 
> Best regards, Ilya
> 
> 
> 
> -------- Original message --------
> Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without 
> changing primary partition formation.
> From: Jeff Jirsa 
> To: user@cassandra.apache.org
> Cc: 
> 
> 
> Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206) is in 
> 3.11 and does have a few knobs to make this less painful
> 
> You can also increase the column index size from 64kb to something 
> significantly higher to decrease the cost of those reads on the JVM (shifting 
> cost to the disk) - consider 256k or 512k for 100-1000mb partitions.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Feb 13, 2019, at 5:48 AM, Durity, Sean R  
>> wrote:
>> 
>> Agreed. It’s pretty close to impossible to administrate your way out of a 
>> data model that doesn’t play to Cassandra’s strengths. Which is true for 
>> other data storage technologies – you need to model the data the way that 
>> the engine is designed to work.
>>  
>>  
>> Sean Durity
>>  
>> From: DuyHai Doan  
>> Sent: Wednesday, February 13, 2019 8:08 AM
>> To: user 
>> Subject: [EXTERNAL] Re: Make large partitons lighter on select without 
>> changing primary partition formation.
>>  
>> Plain answer is NO
>>  
>> There is a slight hope that the JIRA 
>> https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into 4.0 release
>>  
>> But right now, there seems to be few interest in this ticket, the last 
>> comment 23/Feb/2017 old ...
>>  
>>  
>> On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov  
>> wrote:
>> Hi all,
>>  
>> The question.
>>  
>> We have Cassandra 3.11.1 with really heavy primary partitions:
>> cfhistograms 95%% is 130+ mb, 95%% cell count is 3.3mln and higher, 98%% 
>> partition size is 220+ mb sometimes partitions are 1+ gb. We have regular 
>> problems with node lockdowns leading to read request timeouts under read 
>> requests load.
>>  
>> Changing primary partition key structure is out of question.
>>  
>> Are there any sharding techniques available to dilute partitions at level 
>> lower than 'select' requests to make read performance better? Without 
>> changing read requests syntax?
>>  
>> Thank you all in advance,
>> Vsevolod Filaretov.
>> 
>> 


Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread ishib...@gmail.com
Hello!
Increase column_index_size_in_kb so that index entries are created more rarely, am I correct?
But will it be used in every read request, or is the column index only used for queries within a row?

Best regards, Ilya

-------- Original message --------
Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.
From: Jeff Jirsa
To: user@cassandra.apache.org
Cc:

Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206) is in 3.11 and does have a few knobs to make this less painful

You can also increase the column index size from 64kb to something significantly higher to decrease the cost of those reads on the JVM (shifting cost to the disk) - consider 256k or 512k for 100-1000mb partitions.

--
Jeff Jirsa

On Feb 13, 2019, at 5:48 AM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:







Agreed. It’s pretty close to impossible to administrate your way out of a data model that doesn’t play to Cassandra’s strengths. Which is true for other data storage technologies – you need to model the data the way that the engine is designed to work.

Sean Durity

From: DuyHai Doan <doanduy...@gmail.com>
Sent: Wednesday, February 13, 2019 8:08 AM
To: user <user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

Plain answer is NO

There is a slight hope that the JIRA https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into 4.0 release

But right now, there seems to be few interest in this ticket, the last comment 23/Feb/2017 old ...

On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov <vsfilare...@gmail.com> wrote:

Hi all,

The question.

We have Cassandra 3.11.1 with really heavy primary partitions:
cfhistograms 95%% is 130+ mb, 95%% cell count is 3.3mln and higher, 98%% partition size is 220+ mb sometimes partitions are 1+ gb. We have regular problems with node lockdowns leading to read request timeouts under read requests load.

Changing primary partition key structure is out of question.

Are there any sharding techniques available to dilute partitions at level lower than 'select' requests to make read performance better? Without changing read requests syntax?

Thank you all in advance,
Vsevolod Filaretov.





Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Vsevolod Filaretov
@all, thank you for your answers,

Jeff, Thank you very much, will look into it.

On Wed, Feb 13, 2019, 18:38 Jeff Jirsa jji...@gmail.com:

> Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206)
> is in 3.11 and does have a few knobs to make this less painful
>
> You can also increase the column index size from 64kb to something
> significantly higher to decrease the cost of those reads on the JVM
> (shifting cost to the disk) - consider 256k or 512k for 100-1000mb
> partitions.
>
> --
> Jeff Jirsa
>
>
> On Feb 13, 2019, at 5:48 AM, Durity, Sean R 
> wrote:
>
> Agreed. It’s pretty close to impossible to administrate your way out of a
> data model that doesn’t play to Cassandra’s strengths. Which is true for
> other data storage technologies – you need to model the data the way that
> the engine is designed to work.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* DuyHai Doan 
> *Sent:* Wednesday, February 13, 2019 8:08 AM
> *To:* user 
> *Subject:* [EXTERNAL] Re: Make large partitons lighter on select without
> changing primary partition formation.
>
>
>
> Plain answer is NO
>
>
>
> There is a slight hope that the JIRA
> https://issues.apache.org/jira/browse/CASSANDRA-9754
> 
> gets into 4.0 release
>
>
>
> But right now, there seems to be few interest in this ticket, the last
> comment 23/Feb/2017 old ...
>
>
>
>
>
> On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov 
> wrote:
>
> Hi all,
>
>
>
> The question.
>
>
>
> We have Cassandra 3.11.1 with really heavy primary partitions:
>
> cfhistograms 95%% is 130+ mb, 95%% cell count is 3.3mln and higher, 98%%
> partition size is 220+ mb sometimes partitions are 1+ gb. We have regular
> problems with node lockdowns leading to read request timeouts under read
> requests load.
>
>
>
> Changing primary partition key structure is out of question.
>
>
>
> Are there any sharding techniques available to dilute partitions at level
> lower than 'select' requests to make read performance better? Without
> changing read requests syntax?
>
>
>
> Thank you all in advance,
>
> Vsevolod Filaretov.
>
>
> --
>
>
>


Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Jeff Jirsa
Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206) is in 
3.11 and does have a few knobs to make this less painful

You can also increase the column index size from 64kb to something 
significantly higher to decrease the cost of those reads on the JVM (shifting 
cost to the disk) - consider 256k or 512k for 100-1000mb partitions.

-- 
Jeff Jirsa


> On Feb 13, 2019, at 5:48 AM, Durity, Sean R  
> wrote:
> 
> Agreed. It’s pretty close to impossible to administrate your way out of a 
> data model that doesn’t play to Cassandra’s strengths. Which is true for 
> other data storage technologies – you need to model the data the way that the 
> engine is designed to work.
>  
>  
> Sean Durity
>  
> From: DuyHai Doan  
> Sent: Wednesday, February 13, 2019 8:08 AM
> To: user 
> Subject: [EXTERNAL] Re: Make large partitons lighter on select without 
> changing primary partition formation.
>  
> Plain answer is NO
>  
> There is a slight hope that the JIRA 
> https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into 4.0 release
>  
> But right now, there seems to be few interest in this ticket, the last 
> comment 23/Feb/2017 old ...
>  
>  
> On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov  
> wrote:
> Hi all,
>  
> The question.
>  
> We have Cassandra 3.11.1 with really heavy primary partitions:
> cfhistograms 95%% is 130+ mb, 95%% cell count is 3.3mln and higher, 98%% 
> partition size is 220+ mb sometimes partitions are 1+ gb. We have regular 
> problems with node lockdowns leading to read request timeouts under read 
> requests load.
>  
> Changing primary partition key structure is out of question.
>  
> Are there any sharding techniques available to dilute partitions at level 
> lower than 'select' requests to make read performance better? Without 
> changing read requests syntax?
>  
> Thank you all in advance,
> Vsevolod Filaretov.
> 
> 


RE: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Durity, Sean R
Agreed. It’s pretty close to impossible to administrate your way out of a data 
model that doesn’t play to Cassandra’s strengths. Which is true for other data 
storage technologies – you need to model the data the way that the engine is 
designed to work.


Sean Durity

From: DuyHai Doan 
Sent: Wednesday, February 13, 2019 8:08 AM
To: user 
Subject: [EXTERNAL] Re: Make large partitons lighter on select without changing 
primary partition formation.

Plain answer is NO

There is a slight hope that the JIRA 
https://issues.apache.org/jira/browse/CASSANDRA-9754
 gets into 4.0 release

But right now, there seems to be few interest in this ticket, the last comment 
23/Feb/2017 old ...


On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov 
mailto:vsfilare...@gmail.com>> wrote:
Hi all,

The question.

We have Cassandra 3.11.1 with really heavy primary partitions:
cfhistograms 95%% is 130+ mb, 95%% cell count is 3.3mln and higher, 98%% 
partition size is 220+ mb sometimes partitions are 1+ gb. We have regular 
problems with node lockdowns leading to read request timeouts under read 
requests load.

Changing primary partition key structure is out of question.

Are there any sharding techniques available to dilute partitions at level lower 
than 'select' requests to make read performance better? Without changing read 
requests syntax?

Thank you all in advance,
Vsevolod Filaretov.





Re: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Léo FERLIN SUTTON
Thank you for the recommendation.

We are already using datastax's recommended settings for tcp_keepalive.

Regards,

Leo

On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R 
wrote:

> I have seen unreliable streaming (streaming that doesn’t finish) because
> of TCP timeouts from firewalls or switches. The default tcp_keepalive
> kernel parameters are usually not tuned for that. See
> https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
> for more details. These “remote” timeouts are difficult to detect or prove
> if you don’t have access to the intermediate network equipment.
>
>
>
> Sean Durity
>
> *From:* Léo FERLIN SUTTON 
> *Sent:* Thursday, February 07, 2019 10:26 AM
> *To:* user@cassandra.apache.org; dinesh.jo...@yahoo.com
> *Subject:* [EXTERNAL] Re: Bootstrap keeps failing
>
>
>
> Hello !
>
> Thank you for your answers.
>
>
>
> So I have tried, multiple times, to start bootstrapping from scratch. I
> often have the same problem (on other nodes as well) but sometimes it works
> and I can move on to another node.
>
>
>
> I have joined a jstack dump and some logs.
>
>
>
> Our node was shut down at around 97% disk space used.
>
> I turned it back on and it started the bootstrap process again.
>
>
>
> The log file is the log from this attempt, same for the thread dump.
>
>
>
> Small warning, I have somewhat anonymised the log files so there may be
> some inconsistencies.
>
>
>
> Regards,
>
>
>
> Leo
>
>
>
> On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID <
> dinesh.jo...@yahoo.com.invalid> wrote:
>
> Would it be possible for you to take a thread dump & logs and share them?
>
>
>
> Dinesh
>
>
>
>
>
> On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON <
> lfer...@mailjet.com.INVALID> wrote:
>
>
>
>
>
> Hello !
>
>
>
> I am having a recurrent problem when trying to bootstrap a few new nodes.
>
>
>
> Some general info :
>
>- I am running cassandra 3.0.17
>- We have about 30 nodes in our cluster
>- All healthy nodes have between 60% to 90% used disk space on
>/var/lib/cassandra
>
> So I create a new node and let auto_bootstrap do its job. After a few
> days the bootstrapping node stops streaming new data but is still not a
> member of the cluster.
>
>
>
> `nodetool status` says the node is still joining,
>
>
>
> When this happens I run `nodetool bootstrap resume`. This usually ends up
> in two different ways :
>
>1. The node fills up to 100% disk space and crashes.
>2. The bootstrap resume finishes with errors
>
> When I look at `nodetool netstats -H` it looks like `bootstrap resume`
> does not resume but restarts a full transfer of every data from every node.
>
>
>
> This is the output I get from `nodetool resume` :
>
> [2019-02-06 01:39:14,369] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:39:16,821] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:39:17,003] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress:
> 2113%)
>
> [2019-02-06 01:41:15,160] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:02,864] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:09,284] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:10,522] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:10,622] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:11,925] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
> (progress: 2114%)
>
> [2019-02-06 01:42:14,887] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
> (progress: 2114%)
>
> [2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress:
> 2114%)
>
> [2019-02-06 01:42:14,980] Stream failed
>
> [2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
>
> [2019-02-06 01:42:14,982] Resume bootstrap complete
>
>
>
> The bootstrap `progress` goes way over 100% and eventually fails.
>
>
>
>
>
> Right now I have a node with this output from `nodetool status` :
>
> `UJ  10.16.XX.YYY  2.93 TB256  ?
>  5788f061-a3c0-46af-b712-ebeecd397bf7  c`
>
>
>
> It is almost filled with data, yet if I look at 

RE: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Durity, Sean R
I have seen unreliable streaming (streaming that doesn’t finish) because of TCP 
timeouts from firewalls or switches. The default tcp_keepalive kernel 
parameters are usually not tuned for that. See 
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
 for more details. These “remote” timeouts are difficult to detect or prove if 
you don’t have access to the intermediate network equipment.
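
For reference, a sketch of the kind of kernel tuning that page describes (the exact values should be taken from the linked document and validated for your own network):

    sudo sysctl -w net.ipv4.tcp_keepalive_time=60
    sudo sysctl -w net.ipv4.tcp_keepalive_probes=3
    sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
    # persist the settings in /etc/sysctl.conf or /etc/sysctl.d/ so they survive a reboot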

Sean Durity
From: Léo FERLIN SUTTON 
Sent: Thursday, February 07, 2019 10:26 AM
To: user@cassandra.apache.org; dinesh.jo...@yahoo.com
Subject: [EXTERNAL] Re: Bootstrap keeps failing

Hello !

Thank you for your answers.

So I have tried, multiple times, to start bootstrapping from scratch. I often 
have the same problem (on other nodes as well) but sometimes it works and I can 
move on to another node.

I have joined a jstack dump and some logs.

Our node was shut down at around 97% disk space used.
I turned it back on and it started the bootstrap process again.

The log file is the log from this attempt, same for the thread dump.

Small warning, I have somewhat anonymised the log files so there may be some 
inconsistencies.

Regards,

Leo

On Thu, Feb 7, 2019 at 8:13 AM 
dinesh.jo...@yahoo.com.INVALID 
mailto:dinesh.jo...@yahoo.com.invalid>> wrote:
Would it be possible for you to take a thread dump & logs and share them?

Dinesh


On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON 
mailto:lfer...@mailjet.com.INVALID>> wrote:


Hello !

I am having a recurrent problem when trying to bootstrap a few new nodes.

Some general info :

  *   I am running cassandra 3.0.17
  *   We have about 30 nodes in our cluster
  *   All healthy nodes have between 60% to 90% used disk space on 
/var/lib/cassandra
So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.

`nodetool status` says the node is still joining,

When this happens I run `nodetool bootstrap resume`. This usually ends up in 
two different ways :

  1.  The node fills up to 100% disk space and crashes.
  2.  The bootstrap resume finishes with errors
When I look at `nodetool netstats -H` it looks like `bootstrap resume` does 
not resume but restarts a full transfer of every data from every node.

This is the output I get from `nodetool resume` :
[2019-02-06 01:39:14,369] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
 (progress: 2113%)
[2019-02-06 01:39:16,821] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
 (progress: 2113%)
[2019-02-06 01:39:17,003] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
 (progress: 2113%)
[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)
[2019-02-06 01:41:15,160] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:02,864] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:09,284] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:10,522] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:10,622] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:11,925] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
 (progress: 2114%)
[2019-02-06 01:42:14,887] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
 (progress: 2114%)
[2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)
[2019-02-06 01:42:14,980] Stream failed
[2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
[2019-02-06 01:42:14,982] Resume bootstrap complete

The bootstrap `progress` goes way over 100% and eventually fails.


Right now I have a node with this output from `nodetool status` :
`UJ  10.16.XX.YYY  2.93 TB256  ? 
5788f061-a3c0-46af-b712-ebeecd397bf7  c`

It is almost filled with data, yet if I look at `nodetool netstats` :
Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 
MB total
Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB 
total
Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 
MB total
Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB 
total
Receiving 424 files, 281.25 GB total. Already 

RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Durity, Sean R
Kenneth is right. Trying to port/support a relational model to a CQL model the 
way you are doing it is not going to go well. You won’t be able to scale or get 
the search flexibility that you want. It will make Cassandra seem like a bad 
fit. You want to play to Cassandra’s strengths – availability, low latency, 
scalability, etc. so you need to store the data the way you want to retrieve it 
(query first modeling!). You could look at defining the “right” partition and 
clustering keys, so that the searches are within a single, reasonably sized 
partition. And you could have lookup tables for other common search patterns 
(item_by_model_name, etc.)
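
As an illustration of such a lookup table (the names and columns are made up, not taken from the original schema):

    CREATE TABLE item_by_model_name (
        model_name text,
        item_id    uuid,
        item_data  text,
        PRIMARY KEY (model_name, item_id)
    );

    -- the application writes the base table and this table together (for example
    -- in a logged batch), so that SELECT ... WHERE model_name = ? stays a
    -- single-partition read.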

If that kind of modeling gets you to a situation where you have too many lookup 
tables to keep consistent, you could consider something like DataStax 
Enterprise Search (embedded SOLR) to create SOLR indexes on searchable fields. 
A SOLR query will typically be an order of magnitude slower than a partition 
key lookup, though.

It really boils down to the purpose of the data store. If you are looking for 
primarily an “anything goes” search engine, Cassandra may not be a good choice. 
If you need Cassandra-level availability, extremely low latency queries (on 
known access patterns), high volume/low latency writes, easy scalability, etc. 
then you are going to have to rethink how you model the data.


Sean Durity

From: Kenneth Brotman 
Sent: Thursday, February 07, 2019 7:01 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Peter,

Sounds like you may need to use a different architecture.  Perhaps you need 
something like Presto or Kafka as a part of the solution.  If the data from the 
legacy system is wrong for Cassandra it’s an ETL problem?  You’d have to 
transform the data you want to use with Cassandra so that a proper data model 
for Cassandra can be used.

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 10:05 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I have read the material. The problem is that the application has a query 
facility available to the user where they can type in "(A = foo AND B = bar) OR 
C = chex" where A, B, and C are from a defined list of terms, many of which are 
columns in the mytable below while others are from other tables. This query 
facility was implemented and shipped years before we decided to move to 
Cassandra
On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
The problem is you’re not using a query first design.  I would recommend first 
reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben 
Hewitt.  It’s available free online at this 
link.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:33 PM

To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient 
data model. I haven't figured out how to do it correctly yet
On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

You are completely right! My problem is that I am trying to port code for SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes - it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplimenting it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course once I get enough of this implemented I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case.
On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman 
mailto:kenbrotman@yahoo.cominvalid>> wrote:
Isn’t that a lot of SASI indexes for one table.  Could you denormalize 

Re: [EXTERNAL] fine tuning for wide rows and mixed worload system

2019-01-11 Thread Marco Gasparini
Hi Sean,

> I will start – knowing that others will have additional help/questions
I hope that, I really need help with this :)

> What heap size are you using? Sounds like you are using the CMS garbage
collector.

Yes, I'm using the CMS garbage collector. I have not used G1 because I read it
isn't recommended, but if you are saying it is going to help with my
use case I have no objection to using it. I will try.
I have 3 nodes: node1 has 32 GB and node2 and node3 have 16 GB. I'm currently
using 50% of the RAM for the heap on each node.


> Spinning disks are a problem, too. Can you tell if the IO is getting
overwhelmed? SSDs are much preferred.

I'm not sure about it; 'dstat' and 'iostat' tell me that rMB/s is
constantly above 100MB/s and %util is close to 100%, and in these
conditions the node is frozen.
The HDD specs say the maximum transfer rate is 175MB/s for node1 and
155MB/s for node2 and node3.
Unfortunately, switching from spinning disks to SSDs is not an option.



> Read before write is usually an anti-pattern for Cassandra. From your
queries, it seems you have a partition key and clustering key.
Can you give us the table schema? I’m also concerned about the IF EXISTS in
your delete.
I think that invokes a light weight transaction – costly for performance.
Is it really required for your use case?

I don't need the 'IF EXISTS' parameter. Actually it is pretty much a leftover
from an old query and I can try to remove it.

Here is the schema:

CREATE KEYSPACE my_keyspace WITH replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = false;
CREATE TABLE my_keyspace.my_table (
pkey text,
event_datetime timestamp,
f1 text,
f2 text,
f3 text,
f4 text,
f5 int,
f6 bigint,
f7 bigint,
f8 text,
f9 text,
PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 9
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';


Thank you very much
Marco

Il giorno ven 11 gen 2019 alle ore 16:14 Durity, Sean R <
sean_r_dur...@homedepot.com> ha scritto:

> I will start – knowing that others will have additional help/questions.
>
>
>
> What heap size are you using? Sounds like you are using the CMS garbage
> collector. That takes some arcane knowledge and lots of testing to tune. I
> would start with G1 and using ½ the available RAM as the heap size. I would
> want 32 GB RAM as a minimum on the hosts.
>
>
>
> Spinning disks are a problem, too. Can you tell if the IO is getting
> overwhelmed? SSDs are much preferred.
>
>
>
> Read before write is usually an anti-pattern for Cassandra. From your
> queries, it seems you have a partition key and clustering key. Can you give
> us the table schema? I’m also concerned about the IF EXISTS in your delete.
> I think that invokes a light weight transaction – costly for performance.
> Is it really required for your use case?
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Marco Gasparini 
> *Sent:* Friday, January 11, 2019 8:20 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] fine tuning for wide rows and mixed worload system
>
>
>
> Hello everyone,
>
>
>
> I need some advise in order to solve my use case problem. I have already
> tried some solutions but it didn't work out.
>
> Can you help me with the following configuration please? any help is very
> appreciate
>
>
>
> I'm using:
>
> - Cassandra 3.11.3
>
> - java version "1.8.0_191"
>
>
>
> My use case is composed by the following constraints:
>
> - about 1M reads per day (it is going to rise up)
>
> - about 2M writes per day (it is going to rise up)
>
> - there is a high peek of requests in less than 2 hours in which the
> system receives half of all day traffic (500K reads, 1M writes)
>
> - each request is composed by 1 read and 2 writes (1 delete + 1 write)
>
>
>
> * the read query selects max 3 records based on the primary
> key (select * from my_keyspace.my_table where pkey = ? limit 3)
>
> * then is performed a deletion of one record (delete from
> my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS)
>
> * finally the new data is stored (insert into
> my_keyspace.my_table (event_datetime, pkey, agent, some_id, ft, ftt..)
> values (?,?,?,?,?,?...))
>
>
>
> - each row is pretty wide. I don't really know the exact size because
> there are 2 dynamic text columns that stores 

RE: [EXTERNAL] fine tuning for wide rows and mixed worload system

2019-01-11 Thread Durity, Sean R
I will start – knowing that others will have additional help/questions.

What heap size are you using? Sounds like you are using the CMS garbage 
collector. That takes some arcane knowledge and lots of testing to tune. I 
would start with G1 and using ½ the available RAM as the heap size. I would 
want 32 GB RAM as a minimum on the hosts.
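
As a rough illustration only (not from the original mail), a G1 starting point
on a 32 GB host could look like this in jvm.options; exact values need testing:

    -Xms16G
    -Xmx16G
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=500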

Spinning disks are a problem, too. Can you tell if the IO is getting 
overwhelmed? SSDs are much preferred.
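
One generic way to check, as a sketch (not from the original mail):

    iostat -x 5    # sustained %util near 100 and high await on the data disks means the IO is saturated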

Read before write is usually an anti-pattern for Cassandra. From your queries, 
it seems you have a partition key and clustering key. Can you give us the table 
schema? I’m also concerned about the IF EXISTS in your delete. I think that 
invokes a light weight transaction – costly for performance. Is it really 
required for your use case?


Sean Durity

From: Marco Gasparini 
Sent: Friday, January 11, 2019 8:20 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] fine tuning for wide rows and mixed worload system

Hello everyone,

I need some advice in order to solve my use-case problem. I have already tried 
some solutions but they didn't work out.
Can you help me with the following configuration, please? Any help is very much 
appreciated.

I'm using:
- Cassandra 3.11.3
- java version "1.8.0_191"

My use case has the following constraints:
- about 1M reads per day (it is going to rise up)
- about 2M writes per day (it is going to rise up)
- there is a high peak of requests in less than 2 hours, during which the system 
receives half of the whole day's traffic (500K reads, 1M writes)
- each request is composed by 1 read and 2 writes (1 delete + 1 write)

* the read query selects max 3 records based on the primary key 
(select * from my_keyspace.my_table where pkey = ? limit 3)
* then one record is deleted (delete from 
my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS)
* finally the new data is stored (insert into my_keyspace.my_table 
(event_datetime, pkey, agent, some_id, ft, ftt..) values (?,?,?,?,?,?...))

- each row is pretty wide. I don't really know the exact size, because there are 
2 dynamic text columns that store between 1MB and 50MB of data each.
  So, reads are going to be heavy because I read 3 records of that size 
every time. Writes are complex as well because each row is that wide.

Currently, I own 3 nodes with the following properties:
- node1:
* Intel Core i7-3770
* 2x HDD SATA 3,0 TB
* 4x RAM 8192 MB DDR3
* nominative bit rate 175MB/s
# blockdev --report /dev/sd[ab]
RORA   SSZ   BSZ   StartSecSize   Device
rw   256   512  4096  0   3000592982016   
/dev/sda
rw   256   512  4096  0   3000592982016   
/dev/sdb

- node2,3:
* Intel Core i7-2600
* 2x HDD SATA 3,0 TB
* 4x RAM 4096 MB DDR3
* nominative bit rate 155MB/s
# blockdev --report /dev/sd[ab]
RORA   SSZ   BSZ   StartSecSize   Device
rw   256   512  4096  0   3000592982016   
/dev/sda
rw   256   512  4096  0   3000592982016   
/dev/sdb

Each node has 2 disks, but I have disabled the RAID option and created a single 
virtual disk in order to get more free space.
Can this configuration cause issues?

I have already tried some configurations in order to make it work, like:
1) straightforward attempt
- default Cassandra configuration (cassandra.yaml)
- RF=1
- SizeTieredCompactionStrategy  (write strategy)
- no row cache (because the rows are so wide, it is better to have no 
row cache)
- gc_grace_seconds = 1 day (unfortunately, I had no repair schedule 
at all)
results:
too many timeouts, losing data

2)
- added repair schedules
- RF=3 (in order to increase read speed)
results:
- too many timeouts, losing data
- high I/O consumption on each node (iostat shows 100% 
in %util on each node, dstat shows hundreds of MB read per iteration)
- node2 frozen until I stopped data writes.
- node3 almost frozen
- many pending MutationStage events in tpstats on node2
- many full GC
- many HintsDispatchExecutor events in system.log

actual)
- added repair schedules
- RF=3
- set durable_writes = false in order to speed up writes
- increased young heap
- decreased SurvivorRatio in order to get more young-generation space 
available, because of the wide-row data
- increased MaxTenuringThreshold from 1 to 3 in order to decrease 
read latency
- increased 
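
A hedged sketch of the kind of jvm.options changes being described above (CMS
with a larger young generation; values are purely illustrative):

    -XX:+UseConcMarkSweepGC
    -Xmn4G                        # larger young generation
    -XX:SurvivorRatio=4           # lower ratio means larger survivor spaces
    -XX:MaxTenuringThreshold=3    # raised from 1 to 3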

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
RF in the Analytics DC can be 2 (or even 1) if storage cost is more important 
than availability. There is a storage (and CPU and network latency) cost for a 
separate Spark cluster. So, the variables of your specific use case may swing 
the decision in different directions.


Sean Durity
From: Dor Laor 
Sent: Wednesday, January 09, 2019 11:23 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache 
Cassandra

On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
I think you could consider option C: Create a (new) analytics DC in Cassandra 
and run your spark nodes there. Then you can address the scaling just on that 
DC. You can also use less vnodes, only replicate certain keyspaces, etc. in 
order to perform the analytics more efficiently.

But this way you duplicate the entire dataset RF times over. It's very very 
expensive.
It is a common practice to run Spark on a separate Cassandra (virtual) 
datacenter but it's done
in order to isolate the analytic workload from the realtime workload for 
isolation and low latency guarantees.
We addressed this problem elsewhere, beyond this scope.



Sean Durity

From: Dor Laor <d...@scylladb.com>
Sent: Friday, January 04, 2019 4:21 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Good way of configuring Apache spark with Apache 
Cassandra

I strongly recommend option B, separate clusters. Reasons:
 - Networking of node-node is negligible compared to networking within the node
 - Different scaling considerations
   Your workload may require 10 Spark nodes and 20 database nodes, so why 
bundle them?
   This ratio may also change over time as your application evolves and amount 
of data changes.
 - Isolation - If Spark has a spike in cpu/IO utilization, you wouldn't want it 
to affect Cassandra and the opposite.
   If you isolate it with cgroups, you may have too much idle time when the 
above doesn't happen.


On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy <goutham.chiru...@gmail.com> wrote:
Hi,
We have requirement of heavy data lifting and analytics requirement and decided 
to go with Apache Spark. In the process we have come up with two patterns
a. Apache Spark and Apache Cassandra co-located and shared on same nodes.
b. Apache Spark on one independent cluster and Apache Cassandra as one 
independent cluster.

Need good pattern how to use the analytic engine for Cassandra. Thanks in 
advance.

Regards
Goutham.





RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
At this point, I would be talking to DataStax. They already have Spark and 
SOLR/search fully embedded in their product. You can look at their docs for 
some idea of the RAM and CPU required for combined Search/Analytics use cases. 
I would expect this to be a much faster route to production.


Sean Durity
From: Goutham reddy 
Sent: Wednesday, January 09, 2019 11:29 AM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache 
Cassandra

Thanks Sean. But what if I want to have both Spark and elasticsearch with 
Cassandra as separare data center. Does that cause any overhead ?

On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
I think you could consider option C: Create a (new) analytics DC in Cassandra 
and run your spark nodes there. Then you can address the scaling just on that 
DC. You can also use less vnodes, only replicate certain keyspaces, etc. in 
order to perform the analytics more efficiently.


Sean Durity

From: Dor Laor <d...@scylladb.com>
Sent: Friday, January 04, 2019 4:21 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Good way of configuring Apache spark with Apache 
Cassandra

I strongly recommend option B, separate clusters. Reasons:
 - Networking of node-node is negligible compared to networking within the node
 - Different scaling considerations
   Your workload may require 10 Spark nodes and 20 database nodes, so why 
bundle them?
   This ratio may also change over time as your application evolves and amount 
of data changes.
 - Isolation - If Spark has a spike in cpu/IO utilization, you wouldn't want it 
to affect Cassandra and the opposite.
   If you isolate it with cgroups, you may have too much idle time when the 
above doesn't happen.


On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy <goutham.chiru...@gmail.com> wrote:
Hi,
We have requirement of heavy data lifting and analytics requirement and decided 
to go with Apache Spark. In the process we have come up with two patterns
a. Apache Spark and Apache Cassandra co-located and shared on same nodes.
b. Apache Spark on one independent cluster and Apache Cassandra as one 
independent cluster.

Need good pattern how to use the analytic engine for Cassandra. Thanks in 
advance.

Regards
Goutham.



--
Regards
Goutham Reddy





Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Dor Laor
On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R 
wrote:

> I think you could consider option C: Create a (new) analytics DC in
> Cassandra and run your spark nodes there. Then you can address the scaling
> just on that DC. You can also use less vnodes, only replicate certain
> keyspaces, etc. in order to perform the analytics more efficiently.
>

But this way you duplicate the entire dataset RF times over. It's very very
expensive.
It is a common practice to run Spark on a separate Cassandra (virtual)
datacenter but it's done
in order to isolate the analytic workload from the realtime workload for
isolation and low latency guarantees.
We addressed this problem elsewhere, beyond this scope.


>
>
>
> Sean Durity
>
>
>
> *From:* Dor Laor 
> *Sent:* Friday, January 04, 2019 4:21 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Good way of configuring Apache spark with
> Apache Cassandra
>
>
>
> I strongly recommend option B, separate clusters. Reasons:
>
>  - Networking of node-node is negligible compared to networking within the
> node
>
>  - Different scaling considerations
>
>Your workload may require 10 Spark nodes and 20 database nodes, so why
> bundle them?
>
>This ratio may also change over time as your application evolves and
> amount of data changes.
>
>  - Isolation - If Spark has a spike in cpu/IO utilization, you wouldn't
> want it to affect Cassandra and the opposite.
>
>If you isolate it with cgroups, you may have too much idle time when
> the above doesn't happen.
>
>
>
>
>
> On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy 
> wrote:
>
> Hi,
>
> We have requirement of heavy data lifting and analytics requirement and
> decided to go with Apache Spark. In the process we have come up with two
> patterns
>
> a. Apache Spark and Apache Cassandra co-located and shared on same nodes.
>
> b. Apache Spark on one independent cluster and Apache Cassandra as one
> independent cluster.
>
>
>
> Need good pattern how to use the analytic engine for Cassandra. Thanks in
> advance.
>
>
>
> Regards
>
> Goutham.
>
>


Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Jonathan Haddad
> I’m still not sure if having tombstones vs. empty values / frozen UDTs
will have the same results.

When in doubt, benchmark.

Good luck,
Jon

On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos 
wrote:

> Loosing atomic updates is a good point, but in my use case its not a
> problem, since I always overwrite the whole record (no partitial updates).
>
> I’m still not sure if having tombstones vs. empty values / frozen UDTs
> will have the same results.
> When I update one row with 10 null columns it will create 10 tombstones.
> We do OLAP processing of data stored in Cassandra with Spark.
>
> When Spark requests range of data, lets say 1000 rows, I can easily hit
> the 10 000 tombstones threshold.
>
> Even if I would not hit the error threshold Spark requests would increase
> the heap pressure, because tombstones have to be collected and returned to
> coordinator.
>
> Are my assumptions correct ?
>
> On 4 Jan 2019, at 21:15, DuyHai Doan  wrote:
>
> The idea of storing your data as a single blob can be dangerous.
>
> Indeed, you loose the ability to perform atomic update on each column.
>
> In Cassandra, LWW is the rule. Suppose 2 concurrent updates on the same
> row, 1st update changes column Firstname (let's say it's a Person record)
> and 2nd update changes column Lastname
>
> Now depending on the timestamp between the 2 updates, you'll have:
>
> - old Firstname, new Lastname
> - new Firstname, old Lastname
>
> having updates on columns atomically guarantees you to have new Firstname,
> new Lastname
>
> On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad  wrote:
>
>> Those are two different cases though.  It *sounds like* (again, I may be
>> missing the point) you're trying to overwrite a value with another value.
>> You're either going to serialize a blob and overwrite a single cell, or
>> you're going to overwrite all the cells and include a tombstone.
>>
>> When you do a read, reading a single tombstone vs a single vs is
>> essentially the same thing, performance wise.
>>
>> In your description you said "~ 20-100 events", and you're overwriting
>> the event each time, so I don't know how you go to 10K tombstones either.
>> Compaction will bring multiple tombstones together for a cell in the same
>> way it compacts multiple values for a single cell.
>>
>> I sounds to make like you're taking some advice about tombstones out of
>> context and trying to apply the advice to a different problem.  Again, I
>> might be misunderstanding what you're doing.
>>
>>
>> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos 
>> wrote:
>>
>>> Hello Jon,
>>>
>>> I thought having tombstones is much higher overhead than just
>>> overwriting values. The compaction overhead can be l similar, but I think
>>> the read performance is much worse.
>>>
>>> Tombstones accumulate and hang for 10 days (by default) before they are
>>> eligible for compaction.
>>>
>>> Also we have tombstone warning and error thresholds. If cassandra scans
>>> more than 10 000 tombstones, she will abort the query.
>>>
>>> According to this article:
>>> https://opencredo.com/blogs/cassandra-tombstones-common-issues/
>>>
>>> "The cassandra.yaml comments explain in perfectly: *“When executing a
>>> scan, within or across a partition, we need to keep the tombstones seen in
>>> memory so we can return them to the coordinator, which will use them to
>>> make sure other replicas also know about the deleted rows. With workloads
>>> that generate a lot of tombstones, this can cause performance problems and
>>> even exhaust the server heap. "*
>>>
>>> Regards,
>>> Tomas
>>>
>>> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad >>
 If you're overwriting values, it really doesn't matter much if it's a
 tombstone or any other value, they still need to be compacted and have the
 same overhead at read time.

 Tombstones are problematic when you try to use Cassandra as a queue (or
 something like a queue) and you need to scan over thousands of tombstones
 in order to get to the real data.  You're simply overwriting a row and
 trying to avoid a single tombstone.

 Maybe I'm missing something here.  Why do you think overwriting a
 single cell with a tombstone is any worse than overwriting a single cell
 with a value?

 Jon


 On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
 wrote:

> Hello,
>
> I beleive your approach is the same as using spark with "
> spark.cassandra.output.ignoreNulls=true"
> This will not cover the situation when a value have to be overwriten
> with null.
>
> I found one possible solution - change the schema to keep only primary
> key fields and move all other fields to frozen UDT.
> create table (year, month, day, id, frozen, primary key((year,
> month, day), id) )
> In this way anything that is null inside event doesn't create
> tombstone, since event is serialized to BLOB.
> The penalty is in need of deserializing the whole Event when 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Tomas Bartalos
Losing atomic updates is a good point, but in my use case it's not a problem, 
since I always overwrite the whole record (no partial updates).

I’m still not sure if having tombstones vs. empty values / frozen UDTs will 
have the same results.
When I update one row with 10 null columns it will create 10 tombstones.
We do OLAP processing of data stored in Cassandra with Spark.

When Spark requests a range of data, let's say 1000 rows, I can easily hit the 10 
000 tombstone threshold.

Even if I did not hit the error threshold, Spark requests would increase the 
heap pressure, because tombstones have to be collected and returned to the 
coordinator. 

Are my assumptions correct ?
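
As a minimal sketch of the distinction in question (using the happening table
from Sean's earlier example in this thread): binding an explicit null writes a
cell tombstone, while simply omitting the column leaves the existing value
untouched.

    insert into happening (id, a, b) values ('x', 'v1', null);  -- writes a tombstone for column b
    insert into happening (id, a) values ('x', 'v1');           -- leaves column b as it was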

> On 4 Jan 2019, at 21:15, DuyHai Doan  wrote:
> 
> The idea of storing your data as a single blob can be dangerous.
> 
> Indeed, you loose the ability to perform atomic update on each column.
> 
> In Cassandra, LWW is the rule. Suppose 2 concurrent updates on the same row, 
> 1st update changes column Firstname (let's say it's a Person record) and 2nd 
> update changes column Lastname
> 
> Now depending on the timestamp between the 2 updates, you'll have:
> 
> - old Firstname, new Lastname
> - new Firstname, old Lastname
> 
> having updates on columns atomically guarantees you to have new Firstname, 
> new Lastname
> 
> On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad  > wrote:
> Those are two different cases though.  It *sounds like* (again, I may be 
> missing the point) you're trying to overwrite a value with another value.  
> You're either going to serialize a blob and overwrite a single cell, or 
> you're going to overwrite all the cells and include a tombstone.
> 
> When you do a read, reading a single tombstone vs a single vs is essentially 
> the same thing, performance wise.  
> 
> In your description you said "~ 20-100 events", and you're overwriting the 
> event each time, so I don't know how you go to 10K tombstones either.  
> Compaction will bring multiple tombstones together for a cell in the same way 
> it compacts multiple values for a single cell.  
> 
> I sounds to make like you're taking some advice about tombstones out of 
> context and trying to apply the advice to a different problem.  Again, I 
> might be misunderstanding what you're doing.
> 
> 
> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos  > wrote:
> Hello Jon, 
> 
> I thought having tombstones is much higher overhead than just overwriting 
> values. The compaction overhead can be l similar, but I think the read 
> performance is much worse.
> 
> Tombstones accumulate and hang for 10 days (by default) before they are 
> eligible for compaction. 
> 
> Also we have tombstone warning and error thresholds. If cassandra scans more 
> than 10 000 tombstones, she will abort the query.
> 
> According to this article: 
> https://opencredo.com/blogs/cassandra-tombstones-common-issues/ 
> 
> 
> "The cassandra.yaml comments explain in perfectly: “When executing a scan, 
> within or across a partition, we need to keep the tombstones seen in memory 
> so we can return them to the coordinator, which will use them to make sure 
> other replicas also know about the deleted rows. With workloads that generate 
> a lot of tombstones, this can cause performance problems and even exhaust the 
> server heap. "
> 
> Regards, 
> Tomas
> 
> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad   wrote:
> If you're overwriting values, it really doesn't matter much if it's a 
> tombstone or any other value, they still need to be compacted and have the 
> same overhead at read time.  
> 
> Tombstones are problematic when you try to use Cassandra as a queue (or 
> something like a queue) and you need to scan over thousands of tombstones in 
> order to get to the real data.  You're simply overwriting a row and trying to 
> avoid a single tombstone.  
> 
> Maybe I'm missing something here.  Why do you think overwriting a single cell 
> with a tombstone is any worse than overwriting a single cell with a value?
> 
> Jon
> 
> 
> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos  > wrote:
> Hello,
> 
> I beleive your approach is the same as using spark with 
> "spark.cassandra.output.ignoreNulls=true"
> This will not cover the situation when a value have to be overwriten with 
> null. 
> 
> I found one possible solution - change the schema to keep only primary key 
> fields and move all other fields to frozen UDT.
> create table (year, month, day, id, frozen, primary key((year, month, 
> day), id) )
> In this way anything that is null inside event doesn't create tombstone, 
> since event is serialized to BLOB.
> The penalty is in need of deserializing the whole Event when selecting only 
> few columns. 
> Can anyone confirm if this is good solution performance wise?
> 
> Thank you, 
> 
> On Fri, 4 Jan 2019, 2:20 pm DuyHai 

Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Goutham reddy
Thanks Sean. But what if I want to have both Spark and Elasticsearch with
Cassandra as separate data centers? Does that cause any overhead?

On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R 
wrote:

> I think you could consider option C: Create a (new) analytics DC in
> Cassandra and run your spark nodes there. Then you can address the scaling
> just on that DC. You can also use less vnodes, only replicate certain
> keyspaces, etc. in order to perform the analytics more efficiently.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Dor Laor 
> *Sent:* Friday, January 04, 2019 4:21 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Good way of configuring Apache spark with
> Apache Cassandra
>
>
>
> I strongly recommend option B, separate clusters. Reasons:
>
>  - Networking of node-node is negligible compared to networking within the
> node
>
>  - Different scaling considerations
>
>Your workload may require 10 Spark nodes and 20 database nodes, so why
> bundle them?
>
>This ratio may also change over time as your application evolves and
> amount of data changes.
>
>  - Isolation - If Spark has a spike in cpu/IO utilization, you wouldn't
> want it to affect Cassandra and the opposite.
>
>If you isolate it with cgroups, you may have too much idle time when
> the above doesn't happen.
>
>
>
>
>
> On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy 
> wrote:
>
> Hi,
>
> We have requirement of heavy data lifting and analytics requirement and
> decided to go with Apache Spark. In the process we have come up with two
> patterns
>
> a. Apache Spark and Apache Cassandra co-located and shared on same nodes.
>
> b. Apache Spark on one independent cluster and Apache Cassandra as one
> independent cluster.
>
>
>
> Need good pattern how to use the analytic engine for Cassandra. Thanks in
> advance.
>
>
>
> Regards
>
> Goutham.
>
>
-- 
Regards
Goutham Reddy


RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Durity, Sean R
I think you could consider option C: Create a (new) analytics DC in Cassandra 
and run your spark nodes there. Then you can address the scaling just on that 
DC. You can also use fewer vnodes, only replicate certain keyspaces, etc. in 
order to perform the analytics more efficiently.
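
As a rough CQL sketch of the "only replicate certain keyspaces" idea (the DC
names, replication factors and second keyspace are illustrative, not from this
thread):

    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'analytics': '2'};
    -- a keyspace the analytics jobs do not need simply omits the analytics DC
    ALTER KEYSPACE ops_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'};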


Sean Durity

From: Dor Laor 
Sent: Friday, January 04, 2019 4:21 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Good way of configuring Apache spark with Apache 
Cassandra

I strongly recommend option B, separate clusters. Reasons:
 - Networking of node-node is negligible compared to networking within the node
 - Different scaling considerations
   Your workload may require 10 Spark nodes and 20 database nodes, so why 
bundle them?
   This ratio may also change over time as your application evolves and amount 
of data changes.
 - Isolation - If Spark has a spike in cpu/IO utilization, you wouldn't want it 
to affect Cassandra and the opposite.
   If you isolate it with cgroups, you may have too much idle time when the 
above doesn't happen.


On Fri, Jan 4, 2019 at 12:47 PM Goutham reddy <goutham.chiru...@gmail.com> wrote:
Hi,
We have requirement of heavy data lifting and analytics requirement and decided 
to go with Apache Spark. In the process we have come up with two patterns
a. Apache Spark and Apache Cassandra co-located and shared on same nodes.
b. Apache Spark on one independent cluster and Apache Cassandra as one 
independent cluster.

Need good pattern how to use the analytic engine for Cassandra. Thanks in 
advance.

Regards
Goutham.





Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
The idea of storing your data as a single blob can be dangerous.

Indeed, you lose the ability to perform atomic updates on each column.

In Cassandra, LWW is the rule. Suppose 2 concurrent updates on the same
row, 1st update changes column Firstname (let's say it's a Person record)
and 2nd update changes column Lastname

Now depending on the timestamp between the 2 updates, you'll have:

- old Firstname, new Lastname
- new Firstname, old Lastname

having updates on columns atomically guarantees you to have new Firstname,
new Lastname
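
A minimal CQL sketch of that scenario (table and values are illustrative):

    -- two concurrent updates to the same row, touching different columns
    UPDATE person SET firstname = 'Anna'  WHERE id = 1;   -- update 1
    UPDATE person SET lastname  = 'Smith' WHERE id = 1;   -- update 2
    -- with separate columns, last-write-wins applies per cell and both changes survive;
    -- with a single serialized blob column, the write with the later timestamp silently
    -- discards the other update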

On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad  wrote:

> Those are two different cases though.  It *sounds like* (again, I may be
> missing the point) you're trying to overwrite a value with another value.
> You're either going to serialize a blob and overwrite a single cell, or
> you're going to overwrite all the cells and include a tombstone.
>
> When you do a read, reading a single tombstone vs a single vs is
> essentially the same thing, performance wise.
>
> In your description you said "~ 20-100 events", and you're overwriting the
> event each time, so I don't know how you go to 10K tombstones either.
> Compaction will bring multiple tombstones together for a cell in the same
> way it compacts multiple values for a single cell.
>
> I sounds to make like you're taking some advice about tombstones out of
> context and trying to apply the advice to a different problem.  Again, I
> might be misunderstanding what you're doing.
>
>
> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos 
> wrote:
>
>> Hello Jon,
>>
>> I thought having tombstones is much higher overhead than just overwriting
>> values. The compaction overhead can be l similar, but I think the read
>> performance is much worse.
>>
>> Tombstones accumulate and hang for 10 days (by default) before they are
>> eligible for compaction.
>>
>> Also we have tombstone warning and error thresholds. If cassandra scans
>> more than 10 000 tombstones, she will abort the query.
>>
>> According to this article:
>> https://opencredo.com/blogs/cassandra-tombstones-common-issues/
>>
>> "The cassandra.yaml comments explain in perfectly: *“When executing a
>> scan, within or across a partition, we need to keep the tombstones seen in
>> memory so we can return them to the coordinator, which will use them to
>> make sure other replicas also know about the deleted rows. With workloads
>> that generate a lot of tombstones, this can cause performance problems and
>> even exhaust the server heap. "*
>>
>> Regards,
>> Tomas
>>
>> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad >
>>> If you're overwriting values, it really doesn't matter much if it's a
>>> tombstone or any other value, they still need to be compacted and have the
>>> same overhead at read time.
>>>
>>> Tombstones are problematic when you try to use Cassandra as a queue (or
>>> something like a queue) and you need to scan over thousands of tombstones
>>> in order to get to the real data.  You're simply overwriting a row and
>>> trying to avoid a single tombstone.
>>>
>>> Maybe I'm missing something here.  Why do you think overwriting a single
>>> cell with a tombstone is any worse than overwriting a single cell with a
>>> value?
>>>
>>> Jon
>>>
>>>
>>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
>>> wrote:
>>>
 Hello,

 I beleive your approach is the same as using spark with "
 spark.cassandra.output.ignoreNulls=true"
 This will not cover the situation when a value have to be overwriten
 with null.

 I found one possible solution - change the schema to keep only primary
 key fields and move all other fields to frozen UDT.
 create table (year, month, day, id, frozen, primary key((year,
 month, day), id) )
 In this way anything that is null inside event doesn't create
 tombstone, since event is serialized to BLOB.
 The penalty is in need of deserializing the whole Event when selecting
 only few columns.
 Can anyone confirm if this is good solution performance wise?

 Thank you,

 On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan >>>
> "The problem is I can't know the combination of set/unset values" -->
> Just for this requirement, Achilles has a working solution for many years
> using INSERT_NOT_NULL_FIELDS strategy:
>
> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
>
> Or you can use the Update API that by design only perform update on
> not null fields:
> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity
>
>
> Behind the scene, for each new combination of INSERT INTO table(x,y,z)
> statement, Achilles will check its prepared statement cache and if the
> statement does not exist yet, create a new prepared statement and put it
> into the cache for later re-use for you
>
> Disclaiment: I'm the creator of Achilles
>
>
>
> On Thu, Dec 27, 2018 at 10:21 PM Tomas 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
Those are two different cases though.  It *sounds like* (again, I may be
missing the point) you're trying to overwrite a value with another value.
You're either going to serialize a blob and overwrite a single cell, or
you're going to overwrite all the cells and include a tombstone.

When you do a read, reading a single tombstone vs a single value is
essentially the same thing, performance wise.

In your description you said "~ 20-100 events", and you're overwriting the
event each time, so I don't know how you get to 10K tombstones either.
Compaction will bring multiple tombstones together for a cell in the same
way it compacts multiple values for a single cell.

It sounds to me like you're taking some advice about tombstones out of
context and trying to apply the advice to a different problem.  Again, I
might be misunderstanding what you're doing.


On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos 
wrote:

> Hello Jon,
>
> I thought having tombstones is much higher overhead than just overwriting
> values. The compaction overhead can be l similar, but I think the read
> performance is much worse.
>
> Tombstones accumulate and hang for 10 days (by default) before they are
> eligible for compaction.
>
> Also we have tombstone warning and error thresholds. If cassandra scans
> more than 10 000 tombstones, she will abort the query.
>
> According to this article:
> https://opencredo.com/blogs/cassandra-tombstones-common-issues/
>
> "The cassandra.yaml comments explain in perfectly: *“When executing a
> scan, within or across a partition, we need to keep the tombstones seen in
> memory so we can return them to the coordinator, which will use them to
> make sure other replicas also know about the deleted rows. With workloads
> that generate a lot of tombstones, this can cause performance problems and
> even exhaust the server heap. "*
>
> Regards,
> Tomas
>
> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad 
>> If you're overwriting values, it really doesn't matter much if it's a
>> tombstone or any other value, they still need to be compacted and have the
>> same overhead at read time.
>>
>> Tombstones are problematic when you try to use Cassandra as a queue (or
>> something like a queue) and you need to scan over thousands of tombstones
>> in order to get to the real data.  You're simply overwriting a row and
>> trying to avoid a single tombstone.
>>
>> Maybe I'm missing something here.  Why do you think overwriting a single
>> cell with a tombstone is any worse than overwriting a single cell with a
>> value?
>>
>> Jon
>>
>>
>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
>> wrote:
>>
>>> Hello,
>>>
>>> I beleive your approach is the same as using spark with "
>>> spark.cassandra.output.ignoreNulls=true"
>>> This will not cover the situation when a value have to be overwriten
>>> with null.
>>>
>>> I found one possible solution - change the schema to keep only primary
>>> key fields and move all other fields to frozen UDT.
>>> create table (year, month, day, id, frozen, primary key((year,
>>> month, day), id) )
>>> In this way anything that is null inside event doesn't create tombstone,
>>> since event is serialized to BLOB.
>>> The penalty is in need of deserializing the whole Event when selecting
>>> only few columns.
>>> Can anyone confirm if this is good solution performance wise?
>>>
>>> Thank you,
>>>
>>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan >>
 "The problem is I can't know the combination of set/unset values" -->
 Just for this requirement, Achilles has a working solution for many years
 using INSERT_NOT_NULL_FIELDS strategy:

 https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy

 Or you can use the Update API that by design only perform update on not
 null fields:
 https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity


 Behind the scene, for each new combination of INSERT INTO table(x,y,z)
 statement, Achilles will check its prepared statement cache and if the
 statement does not exist yet, create a new prepared statement and put it
 into the cache for later re-use for you

 Disclaiment: I'm the creator of Achilles



 On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <
 tomas.barta...@gmail.com> wrote:

> Hello,
>
> The problem is I can't know the combination of set/unset values. From
> my perspective every value should be set. The event from Kafka represents
> the complete state of the happening at certain point in time. In my table 
> I
> want to store the latest event so the most recent state of the happening
> (in this table I don't care about the history). Actually I used wrong
> expression since its just the opposite of "incremental update", every 
> event
> carries all data (state) for specific point of time.
>
> The event is represented with nested json structure. Top level
> elements of the json 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Tomas Bartalos
Hello Jon,

I thought having tombstones is much higher overhead than just overwriting
values. The compaction overhead can be similar, but I think the read
performance is much worse.

Tombstones accumulate and hang for 10 days (by default) before they are
eligible for compaction.

Also we have tombstone warning and error thresholds. If cassandra scans
more than 10 000 tombstones, she will abort the query.

According to this article:
https://opencredo.com/blogs/cassandra-tombstones-common-issues/

"The cassandra.yaml comments explain in perfectly: *“When executing a scan,
within or across a partition, we need to keep the tombstones seen in memory
so we can return them to the coordinator, which will use them to make sure
other replicas also know about the deleted rows. With workloads that
generate a lot of tombstones, this can cause performance problems and even
exhaust the server heap. "*
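
For reference, those thresholds live in cassandra.yaml; the shipped defaults in
the 3.x line look roughly like this (worth double-checking against your own
version):

    tombstone_warn_threshold: 1000
    tombstone_failure_threshold: 100000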

Regards,
Tomas

On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad wrote:
> If you're overwriting values, it really doesn't matter much if it's a
> tombstone or any other value, they still need to be compacted and have the
> same overhead at read time.
>
> Tombstones are problematic when you try to use Cassandra as a queue (or
> something like a queue) and you need to scan over thousands of tombstones
> in order to get to the real data.  You're simply overwriting a row and
> trying to avoid a single tombstone.
>
> Maybe I'm missing something here.  Why do you think overwriting a single
> cell with a tombstone is any worse than overwriting a single cell with a
> value?
>
> Jon
>
>
> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
> wrote:
>
>> Hello,
>>
>> I beleive your approach is the same as using spark with "
>> spark.cassandra.output.ignoreNulls=true"
>> This will not cover the situation when a value have to be overwriten with
>> null.
>>
>> I found one possible solution - change the schema to keep only primary
>> key fields and move all other fields to frozen UDT.
>> create table (year, month, day, id, frozen, primary key((year,
>> month, day), id) )
>> In this way anything that is null inside event doesn't create tombstone,
>> since event is serialized to BLOB.
>> The penalty is in need of deserializing the whole Event when selecting
>> only few columns.
>> Can anyone confirm if this is good solution performance wise?
>>
>> Thank you,
>>
>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan >
>>> "The problem is I can't know the combination of set/unset values" -->
>>> Just for this requirement, Achilles has a working solution for many years
>>> using INSERT_NOT_NULL_FIELDS strategy:
>>>
>>> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
>>>
>>> Or you can use the Update API that by design only perform update on not
>>> null fields:
>>> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity
>>>
>>>
>>> Behind the scene, for each new combination of INSERT INTO table(x,y,z)
>>> statement, Achilles will check its prepared statement cache and if the
>>> statement does not exist yet, create a new prepared statement and put it
>>> into the cache for later re-use for you
>>>
>>> Disclaiment: I'm the creator of Achilles
>>>
>>>
>>>
>>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <
>>> tomas.barta...@gmail.com> wrote:
>>>
 Hello,

 The problem is I can't know the combination of set/unset values. From
 my perspective every value should be set. The event from Kafka represents
 the complete state of the happening at certain point in time. In my table I
 want to store the latest event so the most recent state of the happening
 (in this table I don't care about the history). Actually I used wrong
 expression since its just the opposite of "incremental update", every event
 carries all data (state) for specific point of time.

 The event is represented with nested json structure. Top level elements
 of the json are table fields with type like text, boolean, timestamp, list
 and the nested elements are UDT fields.

 Simplified example:
 There is a new purchase for the happening, event:
 {total_amount: 50, items : [A, B, C, new_item], purchase_time :
 '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
 I don't know what actually happened for this event, maybe there is a
 new item purchased, maybe some customer info have been changed, maybe the
 specials have been revoked and I have to reset them. I just need to store
 the state as it artived from Kafka, there might already be an event for
 this happening saved before, or maybe this is the first one.

 BR,
 Tomas


 On Thu, 27 Dec 2018, 9:36 pm Eric Stevens >>>
> Depending on the use case, creating separate prepared statements for
> each combination of set / unset values in large INSERT/UPDATE statements
> may be prohibitive.
>
> Instead, you can look into driver level support for UNSET 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
If you're overwriting values, it really doesn't matter much if it's a
tombstone or any other value, they still need to be compacted and have the
same overhead at read time.

Tombstones are problematic when you try to use Cassandra as a queue (or
something like a queue) and you need to scan over thousands of tombstones
in order to get to the real data.  You're simply overwriting a row and
trying to avoid a single tombstone.

Maybe I'm missing something here.  Why do you think overwriting a single
cell with a tombstone is any worse than overwriting a single cell with a
value?

Jon


On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
wrote:

> Hello,
>
> I beleive your approach is the same as using spark with "
> spark.cassandra.output.ignoreNulls=true"
> This will not cover the situation when a value have to be overwriten with
> null.
>
> I found one possible solution - change the schema to keep only primary key
> fields and move all other fields to frozen UDT.
> create table (year, month, day, id, frozen, primary key((year,
> month, day), id) )
> In this way anything that is null inside event doesn't create tombstone,
> since event is serialized to BLOB.
> The penalty is in need of deserializing the whole Event when selecting
> only few columns.
> Can anyone confirm if this is good solution performance wise?
>
> Thank you,
>
> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan 
>> "The problem is I can't know the combination of set/unset values" -->
>> Just for this requirement, Achilles has a working solution for many years
>> using INSERT_NOT_NULL_FIELDS strategy:
>>
>> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
>>
>> Or you can use the Update API that by design only perform update on not
>> null fields:
>> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity
>>
>>
>> Behind the scene, for each new combination of INSERT INTO table(x,y,z)
>> statement, Achilles will check its prepared statement cache and if the
>> statement does not exist yet, create a new prepared statement and put it
>> into the cache for later re-use for you
>>
>> Disclaiment: I'm the creator of Achilles
>>
>>
>>
>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos 
>> wrote:
>>
>>> Hello,
>>>
>>> The problem is I can't know the combination of set/unset values. From my
>>> perspective every value should be set. The event from Kafka represents the
>>> complete state of the happening at certain point in time. In my table I
>>> want to store the latest event so the most recent state of the happening
>>> (in this table I don't care about the history). Actually I used wrong
>>> expression since its just the opposite of "incremental update", every event
>>> carries all data (state) for specific point of time.
>>>
>>> The event is represented with nested json structure. Top level elements
>>> of the json are table fields with type like text, boolean, timestamp, list
>>> and the nested elements are UDT fields.
>>>
>>> Simplified example:
>>> There is a new purchase for the happening, event:
>>> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
>>> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
>>> I don't know what actually happened for this event, maybe there is a new
>>> item purchased, maybe some customer info have been changed, maybe the
>>> specials have been revoked and I have to reset them. I just need to store
>>> the state as it artived from Kafka, there might already be an event for
>>> this happening saved before, or maybe this is the first one.
>>>
>>> BR,
>>> Tomas
>>>
>>>
>>> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens >>
 Depending on the use case, creating separate prepared statements for
 each combination of set / unset values in large INSERT/UPDATE statements
 may be prohibitive.

 Instead, you can look into driver level support for UNSET values.
 Requires Cassandra 2.2 or later IIRC.

 See:
 Java Driver:
 https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
 Python Driver:
 https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
 Node Driver:
 https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset

 On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
 sean_r_dur...@homedepot.com> wrote:

> You say the events are incremental updates. I am interpreting this to
> mean only some columns are updated. Others should keep their original
> values.
>
> You are correct that inserting null creates a tombstone.
>
> Can you only insert the columns that actually have new values? Just
> skip the columns with no information. (Make the insert generator a bit
> smarter.)
>
> Create table happening (id text primary key, event text, a text, b
> text, c text);
> Insert into table 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Tomas Bartalos
Hello,

I believe your approach is the same as using spark with "
spark.cassandra.output.ignoreNulls=true"
This will not cover the situation when a value has to be overwritten with
null.

I found one possible solution - change the schema to keep only primary key
fields and move all other fields to frozen UDT.
create table (year, month, day, id, frozen, primary key((year,
month, day), id) )
In this way, anything that is null inside the event doesn't create a tombstone,
since the event is serialized to a BLOB.
The penalty is the need to deserialize the whole Event when selecting only a
few columns.
Can anyone confirm whether this is a good solution performance-wise?
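
A hedged CQL sketch of that layout (the type, table and field names here are
illustrative, not from the original post):

    CREATE TYPE event_t (total_amount decimal, items list<text>,
                         purchase_time timestamp, specials text);
    CREATE TABLE happening_by_day (
        year int, month int, day int, id text,
        event frozen<event_t>,
        PRIMARY KEY ((year, month, day), id)
    );
    -- overwriting a row rewrites the single frozen cell; null fields inside the UDT
    -- do not become individual cell tombstones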

Thank you,

On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan wrote:
> "The problem is I can't know the combination of set/unset values" --> Just
> for this requirement, Achilles has a working solution for many years using
> INSERT_NOT_NULL_FIELDS strategy:
>
> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
>
> Or you can use the Update API that by design only perform update on not
> null fields:
> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity
>
>
> Behind the scene, for each new combination of INSERT INTO table(x,y,z)
> statement, Achilles will check its prepared statement cache and if the
> statement does not exist yet, create a new prepared statement and put it
> into the cache for later re-use for you
>
> Disclaiment: I'm the creator of Achilles
>
>
>
> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos 
> wrote:
>
>> Hello,
>>
>> The problem is I can't know the combination of set/unset values. From my
>> perspective every value should be set. The event from Kafka represents the
>> complete state of the happening at certain point in time. In my table I
>> want to store the latest event so the most recent state of the happening
>> (in this table I don't care about the history). Actually I used wrong
>> expression since its just the opposite of "incremental update", every event
>> carries all data (state) for specific point of time.
>>
>> The event is represented with nested json structure. Top level elements
>> of the json are table fields with type like text, boolean, timestamp, list
>> and the nested elements are UDT fields.
>>
>> Simplified example:
>> There is a new purchase for the happening, event:
>> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
>> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
>> I don't know what actually happened for this event, maybe there is a new
>> item purchased, maybe some customer info have been changed, maybe the
>> specials have been revoked and I have to reset them. I just need to store
>> the state as it artived from Kafka, there might already be an event for
>> this happening saved before, or maybe this is the first one.
>>
>> BR,
>> Tomas
>>
>>
>> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens >
>>> Depending on the use case, creating separate prepared statements for
>>> each combination of set / unset values in large INSERT/UPDATE statements
>>> may be prohibitive.
>>>
>>> Instead, you can look into driver level support for UNSET values.
>>> Requires Cassandra 2.2 or later IIRC.
>>>
>>> See:
>>> Java Driver:
>>> https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
>>> Python Driver:
>>> https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
>>> Node Driver:
>>> https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
>>>
>>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
>>> sean_r_dur...@homedepot.com> wrote:
>>>
 You say the events are incremental updates. I am interpreting this to
 mean only some columns are updated. Others should keep their original
 values.

 You are correct that inserting null creates a tombstone.

 Can you only insert the columns that actually have new values? Just
 skip the columns with no information. (Make the insert generator a bit
 smarter.)

 Create table happening (id text primary key, event text, a text, b
 text, c text);
 Insert into table happening (id, event, a, b, c) values
 ("MainEvent","The most complete info we have right now","Priceless","10
 pm","Grand Ballroom");
 -- b changes
 Insert into happening (id, b) values ("MainEvent","9:30 pm");


 Sean Durity


 -Original Message-
 From: Tomas Bartalos 
 Sent: Thursday, December 27, 2018 9:27 AM
 To: user@cassandra.apache.org
 Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values

 Hello,

 I’d start with describing my use case and how I’d like to use Cassandra
 to solve my storage needs.
 We're processing a stream of events for various happenings. Every event
 has a unique happening_id.
 One happening may have many events, 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
"The problem is I can't know the combination of set/unset values" --> Just
for this requirement, Achilles has a working solution for many years using
INSERT_NOT_NULL_FIELDS strategy:

https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy

Or you can use the Update API, which by design only performs updates on
non-null fields:
https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity


Behind the scenes, for each new combination of columns in an INSERT INTO
table(x, y, z) statement, Achilles checks its prepared statement cache and, if the
statement does not exist yet, creates a new prepared statement and puts it
into the cache for later re-use.
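
For readers not using Achilles, the same idea is easy to sketch by hand. The snippet below is a minimal illustration (not Achilles code) of caching one prepared statement per combination of non-null columns, using the DataStax Python driver; the contact point, keyspace and the "happening" table from Sean's example further down this thread are assumptions:

    # Cache one prepared statement per (table, set-of-bound-columns) combination,
    # preparing it lazily the first time that combination is seen.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_keyspace')
    prepared_cache = {}

    def insert_non_null(table, row):
        # Keep only the columns that actually carry a value.
        columns = tuple(sorted(c for c, v in row.items() if v is not None))
        ps = prepared_cache.get((table, columns))
        if ps is None:
            cql = "INSERT INTO %s (%s) VALUES (%s)" % (
                table, ", ".join(columns), ", ".join("?" * len(columns)))
            ps = session.prepare(cql)
            prepared_cache[(table, columns)] = ps
        session.execute(ps, [row[c] for c in columns])

    # Columns that are None are simply not part of the statement, so no tombstones.
    insert_non_null('happening', {'id': '1', 'event': 'event1', 'b': None})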

Disclaimer: I'm the creator of Achilles



On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos 
wrote:

> Hello,
>
> The problem is I can't know the combination of set/unset values. From my
> perspective every value should be set. The event from Kafka represents the
> complete state of the happening at a certain point in time. In my table I
> want to store the latest event, so the most recent state of the happening
> (in this table I don't care about the history). Actually I used the wrong
> expression, since it's just the opposite of an "incremental update": every event
> carries all data (state) for a specific point in time.
>
> The event is represented with a nested JSON structure. Top-level elements of
> the JSON are table fields with types like text, boolean, timestamp and list, and
> the nested elements are UDT fields.
>
> Simplified example:
> There is a new purchase for the happening, event:
> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
> I don't know what actually happened for this event, maybe there is a new
> item purchased, maybe some customer info has been changed, maybe the
> specials have been revoked and I have to reset them. I just need to store
> the state as it arrived from Kafka; there might already be an event for
> this happening saved before, or maybe this is the first one.
>
> BR,
> Tomas
>
>
> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens wrote:
>> Depending on the use case, creating separate prepared statements for each
>> combination of set / unset values in large INSERT/UPDATE statements may be
>> prohibitive.
>>
>> Instead, you can look into driver level support for UNSET values.
>> Requires Cassandra 2.2 or later IIRC.
>>
>> See:
>> Java Driver:
>> https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
>> Python Driver:
>> https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
>> Node Driver:
>> https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
>>
>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
>> sean_r_dur...@homedepot.com> wrote:
>>
>>> You say the events are incremental updates. I am interpreting this to
>>> mean only some columns are updated. Others should keep their original
>>> values.
>>>
>>> You are correct that inserting null creates a tombstone.
>>>
>>> Can you only insert the columns that actually have new values? Just skip
>>> the columns with no information. (Make the insert generator a bit smarter.)
>>>
>>> Create table happening (id text primary key, event text, a text, b text, c text);
>>> Insert into happening (id, event, a, b, c) values ('MainEvent', 'The most
>>> complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
>>> -- b changes
>>> Insert into happening (id, b) values ('MainEvent', '9:30 pm');
>>>
>>>
>>> Sean Durity
>>>
>>>
>>> -Original Message-
>>> From: Tomas Bartalos 
>>> Sent: Thursday, December 27, 2018 9:27 AM
>>> To: user@cassandra.apache.org
>>> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>>>
>>> Hello,
>>>
>>> I’d start with describing my use case and how I’d like to use Cassandra
>>> to solve my storage needs.
>>> We're processing a stream of events for various happenings. Every event
>>> has a unique happening_id.
>>> One happening may have many events, usually ~ 20-100 events. I’d like to
>>> store only the latest event for the same happening (Event is an incremental
>>> update and it contains all up-to date data about happening).
>>> Technically the events are streamed from Kafka, processed with Spark and
>>> saved to Cassandra.
>>> In Cassandra we use upserts (insert with same primary key).  So far so
>>> good, however there comes the tombstone...
>>>
>>> When I’m inserting field with NULL value, Cassandra creates tombstone
>>> for this field. As I understood this is due to space efficiency, Cassandra
>>> doesn’t have to remember there is a NULL value, she just deletes the
>>> respective column and a delete creates a ... tombstone.
>>> I was hoping there could be an option to tell Cassandra not to be so
>>> space effective and store “unset" info without generating 

Re: [EXTERNAL] Writes and Reads with high latency

2018-12-28 Thread Marco Gasparini
 to all your questions
Thank you very much!

Regards
Marco


On Thu, Dec 27, 2018 at 9:09 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> Your RF is only 1, so the data only exists on one node. This is not
> typically how Cassandra is used. If you need the high availability and low
> latency, you typically set RF to 3 per DC.
>
>
>
> How many event_datetime records can you have per pkey? How many pkeys
> (roughly) do you have? In general, you only want to have at most 100 MB of
> data per partition (pkey). If it is larger than that, I would expect some
> timeouts. And because only one node has the data, a single timeout means
> you won’t get any data. Server timeouts default to just 10 seconds. The
> secret to Cassandra is to always select your data by at least the primary
> key (which you are doing). So, I suspect you either have very wide rows or
> lots of tombstones.
>
>
>
> Since you mention lots of deletes, I am thinking it could be tombstones.
> Are you getting any tombstone warnings or errors in your system.log? When
> you delete, are you deleting a full partition? If you are deleting just
> part of a partition over and over, I think you will be creating too many
> tombstones. I try to design my data partitions so that deletes are for a
> full partition. Then I won’t be reading through 1000s (or more) tombstones
> trying to find the live data.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Marco Gasparini 
> *Sent:* Thursday, December 27, 2018 3:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Writes and Reads with high latency
>
>
>
> Hello Sean,
>
>
>
> here are my schema and RF:
>
>
>
> -
>
> CREATE KEYSPACE my_keyspace WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DC1': '1'}  AND durable_writes = true;
>
>
>
> CREATE TABLE my_keyspace.my_table (
>
> pkey text,
>
> event_datetime timestamp,
>
> agent text,
>
> ft text,
>
> ftt text,
>
> some_id bigint,
>
> PRIMARY KEY (pkey, event_datetime)
>
> ) WITH CLUSTERING ORDER BY (event_datetime DESC)
>
> AND bloom_filter_fp_chance = 0.01
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
> AND comment = ''
>
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 9
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
>
>
> -
>
>
>
> Queries I make are very simple:
>
>
>
> select pkey, event_datetime, ft, some_id, ftt from my_keyspace.my_table
> where pkey = ? limit ?;
>
> and
>
> insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id,
> ft, ftt) values (?,?,?,?,?,?);
>
>
>
> About the Retry policy, the answer is yes: when a write fails I store
> it somewhere else and, after a period, I try to write it to Cassandra
> again. This way I can store almost all my data, but when the problem is on the
> read side I don't apply any retry policy (and that is my problem)
>
>
>
>
>
> Thanks
>
> Marco
>
>
>
>
>
> On Fri, Dec 21, 2018 at 5:18 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
>
> Can you provide the schema and the queries? What is the RF of the keyspace
> for the data? Are you using any Retry policy on your Cluster object?
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Marco Gasparini 
> *Sent:* Friday, December 21, 2018 10:45 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Writes and Reads with high latency
>
>
>
> hello all,
>
>
>
> I have 1 DC of 3 nodes running Cassandra 3.11.3, with
> consistency level ONE and Java 1.8.0_191.
>
>
>
> Every day, there are many nodejs programs that send data to the
> cassandra's cluster via NodeJs cassandra-driver.
>
> Every day I get around 600k requests. Each request makes the server:
>
> 1_ READ some data in Cassandra (by an id, usually I get 3 records),
>
> 2_ DELETE one of those 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Tomas Bartalos
Hello,

The problem is I can't know the combination of set/unset values. From my
perspective every value should be set. The event from Kafka represents the
complete state of the happening at a certain point in time. In my table I
want to store the latest event, so the most recent state of the happening
(in this table I don't care about the history). Actually I used the wrong
expression, since it's just the opposite of an "incremental update": every event
carries all data (state) for a specific point in time.

The event is represented with a nested JSON structure. Top-level elements of
the JSON are table fields with types like text, boolean, timestamp and list, and
the nested elements are UDT fields.

Simplified example:
There is a new purchase for the happening, event:
{total_amount: 50, items : [A, B, C, new_item], purchase_time : '2018-12-27
13:30', specials: null, customer : {... }, fare_amount,...}
I don't know what actually happened for this event, maybe there is a new
item purchased, maybe some customer info has been changed, maybe the
specials have been revoked and I have to reset them. I just need to store
the state as it arrived from Kafka; there might already be an event for
this happening saved before, or maybe this is the first one.

BR,
Tomas


On Thu, 27 Dec 2018, 9:36 pm Eric Stevens wrote:
> Depending on the use case, creating separate prepared statements for each
> combination of set / unset values in large INSERT/UPDATE statements may be
> prohibitive.
>
> Instead, you can look into driver level support for UNSET values.
> Requires Cassandra 2.2 or later IIRC.
>
> See:
> Java Driver:
> https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
> Python Driver:
> https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
> Node Driver:
> https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
>
> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>> You say the events are incremental updates. I am interpreting this to
>> mean only some columns are updated. Others should keep their original
>> values.
>>
>> You are correct that inserting null creates a tombstone.
>>
>> Can you only insert the columns that actually have new values? Just skip
>> the columns with no information. (Make the insert generator a bit smarter.)
>>
>> Create table happening (id text primary key, event text, a text, b text, c text);
>> Insert into happening (id, event, a, b, c) values ('MainEvent', 'The most
>> complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
>> -- b changes
>> Insert into happening (id, b) values ('MainEvent', '9:30 pm');
>>
>>
>> Sean Durity
>>
>>
>> -Original Message-
>> From: Tomas Bartalos 
>> Sent: Thursday, December 27, 2018 9:27 AM
>> To: user@cassandra.apache.org
>> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>>
>> Hello,
>>
>> I’d start with describing my use case and how I’d like to use Cassandra
>> to solve my storage needs.
>> We're processing a stream of events for various happenings. Every event
>> has a unique happening_id.
>> One happening may have many events, usually ~ 20-100 events. I’d like to
>> store only the latest event for the same happening (Event is an incremental
>> update and it contains all up-to date data about happening).
>> Technically the events are streamed from Kafka, processed with Spark and
>> saved to Cassandra.
>> In Cassandra we use upserts (insert with same primary key).  So far so
>> good, however there comes the tombstone...
>>
>> When I’m inserting field with NULL value, Cassandra creates tombstone for
>> this field. As I understood this is due to space efficiency, Cassandra
>> doesn’t have to remember there is a NULL value, she just deletes the
>> respective column and a delete creates a ... tombstone.
>> I was hoping there could be an option to tell Cassandra not to be so
>> space effective and store “unset" info without generating tombstones.
>> Something similar to inserting empty strings instead of null values:
>>
>> CREATE TABLE happening (id text PRIMARY KEY, event text);
>> insert into happening (id, event) values ('1', 'event1');
>> -- tombstone is generated:
>> insert into happening (id, event) values ('1', null);
>> -- tombstone is not generated:
>> insert into happening (id, event) values ('1', '');
>>
>> Possible solutions:
>> 1. Disable tombstones with gc_grace_seconds = 0, or set it to a reasonably low
>> value (1 hour?). Not good, since phantom data may re-appear.
>> 2. Ignore NULLs on the Spark side with “spark.cassandra.output.ignoreNulls=true”. Not
>> good, since this will never overwrite a previously inserted event field with an
>> “empty” one.
>> 3. On inserts with Spark, find all NULL values and replace them with an “empty”
>> equivalent (empty string for text, 0 for integer). Very inefficient, and
>> problematic to find an “empty” equivalent for some data types.
>>
>> Until tombstones appeared 

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Eric Stevens
Depending on the use case, creating separate prepared statements for each
combination of set / unset values in large INSERT/UPDATE statements may be
prohibitive.

Instead, you can look into driver level support for UNSET values.  Requires
Cassandra 2.2 or later IIRC.

See:
Java Driver:
https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
Python Driver:
https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
Node Driver:
https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
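
To make the driver-level UNSET support concrete, here is a minimal sketch with the Python driver (the contact point, keyspace and the "happening" table from Sean's example below are assumptions; this needs Cassandra 2.2+ / protocol v4):

    # Bind UNSET_VALUE for the columns you don't want to touch: nothing is
    # written for them, so no value is overwritten and no tombstone is created.
    from cassandra.cluster import Cluster
    from cassandra.query import UNSET_VALUE

    session = Cluster(['127.0.0.1']).connect('my_keyspace')
    ps = session.prepare(
        "INSERT INTO happening (id, event, a, b, c) VALUES (?, ?, ?, ?, ?)")

    # Only id and b carry new data; event, a and c are left as they are.
    session.execute(ps, ('MainEvent', UNSET_VALUE, UNSET_VALUE, '9:30 pm', UNSET_VALUE))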

On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R 
wrote:

> You say the events are incremental updates. I am interpreting this to mean
> only some columns are updated. Others should keep their original values.
>
> You are correct that inserting null creates a tombstone.
>
> Can you only insert the columns that actually have new values? Just skip
> the columns with no information. (Make the insert generator a bit smarter.)
>
> Create table happening (id text primary key, event text, a text, b text, c text);
> Insert into happening (id, event, a, b, c) values ('MainEvent', 'The most
> complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
> -- b changes
> Insert into happening (id, b) values ('MainEvent', '9:30 pm');
>
>
> Sean Durity
>
>
> -Original Message-
> From: Tomas Bartalos 
> Sent: Thursday, December 27, 2018 9:27 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>
> Hello,
>
> I’d start with describing my use case and how I’d like to use Cassandra to
> solve my storage needs.
> We're processing a stream of events for various happenings. Every event
> has a unique happening_id.
> One happening may have many events, usually ~ 20-100 events. I’d like to
> store only the latest event for the same happening (Event is an incremental
> update and it contains all up-to date data about happening).
> Technically the events are streamed from Kafka, processed with Spark and
> saved to Cassandra.
> In Cassandra we use upserts (insert with same primary key).  So far so
> good, however there comes the tombstone...
>
> When I’m inserting field with NULL value, Cassandra creates tombstone for
> this field. As I understood this is due to space efficiency, Cassandra
> doesn’t have to remember there is a NULL value, she just deletes the
> respective column and a delete creates a ... tombstone.
> I was hoping there could be an option to tell Cassandra not to be so space
> effective and store “unset" info without generating tombstones.
> Something similar to inserting empty strings instead of null values:
>
> CREATE TABLE happening (id text PRIMARY KEY, event text);
> insert into happening (id, event) values ('1', 'event1');
> -- tombstone is generated:
> insert into happening (id, event) values ('1', null);
> -- tombstone is not generated:
> insert into happening (id, event) values ('1', '');
>
> Possible solutions:
> 1. Disable tombstones with gc_grace_seconds = 0, or set it to a reasonably low
> value (1 hour?). Not good, since phantom data may re-appear.
> 2. Ignore NULLs on the Spark side with “spark.cassandra.output.ignoreNulls=true”. Not
> good, since this will never overwrite a previously inserted event field with an
> “empty” one.
> 3. On inserts with Spark, find all NULL values and replace them with an “empty”
> equivalent (empty string for text, 0 for integer). Very inefficient, and
> problematic to find an “empty” equivalent for some data types.
>
> Until tombstones appeared Cassandra was the right fit for our use case,
> however now I’m not sure if we’re heading the right direction.
> Could you please give me some advice how to solve this problem ?
>
> Thank you,
> Tomas
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
> 
>

RE: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Durity, Sean R
You say the events are incremental updates. I am interpreting this to mean only 
some columns are updated. Others should keep their original values.

You are correct that inserting null creates a tombstone.

Can you only insert the columns that actually have new values? Just skip the 
columns with no information. (Make the insert generator a bit smarter.)

Create table happening (id text primary key, event text, a text, b text, c text);
Insert into happening (id, event, a, b, c) values ('MainEvent', 'The most
complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
-- b changes
Insert into happening (id, b) values ('MainEvent', '9:30 pm');
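
A minimal sketch of such a smarter insert generator in Python (the session object is an assumption; column names follow the example above):

    # Build the INSERT from only the columns that actually have new values, so
    # untouched columns are never mentioned and no tombstones are written for them.
    def build_insert(table, row):
        cols = [c for c, v in row.items() if v is not None]
        cql = "INSERT INTO %s (%s) VALUES (%s)" % (
            table, ", ".join(cols), ", ".join(["%s"] * len(cols)))
        return cql, [row[c] for c in cols]

    cql, params = build_insert("happening",
                               {"id": "MainEvent", "b": "9:30 pm", "a": None, "c": None})
    # session.execute(cql, params)  # -> INSERT INTO happening (id, b) VALUES (%s, %s)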


Sean Durity


-Original Message-
From: Tomas Bartalos 
Sent: Thursday, December 27, 2018 9:27 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values

Hello,

I’d start with describing my use case and how I’d like to use Cassandra to 
solve my storage needs.
We're processing a stream of events for various happenings. Every event has a
unique happening_id.
One happening may have many events, usually ~ 20-100 events. I’d like to store 
only the latest event for the same happening (Event is an incremental update 
and it contains all up-to date data about happening).
Technically the events are streamed from Kafka, processed with Spark and saved
to Cassandra.
In Cassandra we use upserts (insert with same primary key).  So far so good, 
however there comes the tombstone...

When I’m inserting a field with a NULL value, Cassandra creates a tombstone for this
field. As I understood, this is due to space efficiency: Cassandra doesn’t have
to remember there is a NULL value, she just deletes the respective column and a 
delete creates a ... tombstone.
I was hoping there could be an option to tell Cassandra not to be so space 
effective and store “unset" info without generating tombstones.
Something similar to inserting empty strings instead of null values:

CREATE TABLE happening (id text PRIMARY KEY, event text);
insert into happening (id, event) values ('1', 'event1');
-- tombstone is generated:
insert into happening (id, event) values ('1', null);
-- tombstone is not generated:
insert into happening (id, event) values ('1', '');

Possible solutions:
1. Disable tombstones with gc_grace_seconds = 0, or set it to a reasonably low
value (1 hour?). Not good, since phantom data may re-appear.
2. Ignore NULLs on the Spark side with “spark.cassandra.output.ignoreNulls=true”. Not
good, since this will never overwrite a previously inserted event field with an
“empty” one.
3. On inserts with Spark, find all NULL values and replace them with an “empty”
equivalent (empty string for text, 0 for integer). Very inefficient, and
problematic to find an “empty” equivalent for some data types.

Until tombstones appeared Cassandra was the right fit for our use case, however 
now I’m not sure if we’re heading the right direction.
Could you please give me some advice how to solve this problem ?

Thank you,
Tomas
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org





-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


RE: [EXTERNAL] Writes and Reads with high latency

2018-12-27 Thread Durity, Sean R
Your RF is only 1, so the data only exists on one node. This is not typically 
how Cassandra is used. If you need the high availability and low latency, you 
typically set RF to 3 per DC.
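
For reference, moving such a keyspace to RF 3 is one schema change plus a repair so the new replicas actually receive the existing data; a sketch using the keyspace and DC names from Marco's schema quoted below (the contact point is an assumption):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()
    # Raise the replication factor for the existing DC.
    session.execute(
        "ALTER KEYSPACE my_keyspace WITH replication = "
        "{'class': 'NetworkTopologyStrategy', 'DC1': '3'}")
    # Afterwards, run a repair on each node (outside this script), e.g.:
    #   nodetool repair -full my_keyspace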

How many event_datetime records can you have per pkey? How many pkeys (roughly) 
do you have? In general, you only want to have at most 100 MB of data per 
partition (pkey). If it is larger than that, I would expect some timeouts. And 
because only one node has the data, a single timeout means you won’t get any 
data. Server timeouts default to just 10 seconds. The secret to Cassandra is to 
always select your data by at least the primary key (which you are doing). So, 
I suspect you either have very wide rows or lots of tombstones.

Since you mention lots of deletes, I am thinking it could be tombstones. Are 
you getting any tombstone warnings or errors in your system.log? When you 
delete, are you deleting a full partition? If you are deleting just part of a 
partition over and over, I think you will be creating too many tombstones. I 
try to design my data partitions so that deletes are for a full partition. Then 
I won’t be reading through 1000s (or more) tombstones trying to find the live 
data.
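
To make that distinction concrete with Marco's table (a sketch; the contact point and the example key values are assumptions):

    from datetime import datetime
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()
    pkey, event_datetime = 'some-key', datetime(2018, 12, 27, 13, 30)

    # Row-level delete: leaves a row tombstone inside the partition that later
    # reads of that partition have to skip over.
    session.execute(
        "DELETE FROM my_keyspace.my_table WHERE pkey = %s AND event_datetime = %s",
        (pkey, event_datetime))

    # Full-partition delete: a single partition-level tombstone covers the whole
    # partition, so reads do not have to wade through many per-row tombstones.
    session.execute("DELETE FROM my_keyspace.my_table WHERE pkey = %s", (pkey,))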


Sean Durity

From: Marco Gasparini 
Sent: Thursday, December 27, 2018 3:01 AM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Writes and Reads with high latency

Hello Sean,

here are my schema and RF:

-
CREATE KEYSPACE my_keyspace WITH replication = {'class': 
'NetworkTopologyStrategy', 'DC1': '1'}  AND durable_writes = true;

CREATE TABLE my_keyspace.my_table (
pkey text,
event_datetime timestamp,
agent text,
ft text,
ftt text,
some_id bigint,
PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 9
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

-

Queries I make are very simple:

select pkey, event_datetime, ft, some_id, ftt from my_keyspace.my_table where 
pkey = ? limit ?;
and
insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id, ft, 
ftt) values (?,?,?,?,?,?);

About the Retry policy, the answer is yes: when a write fails I store it
somewhere else and, after a period, I try to write it to Cassandra again. This
way I can store almost all my data, but when the problem is on the read side I
don't apply any retry policy (and that is my problem)


Thanks
Marco


On Fri, Dec 21, 2018 at 5:18 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
Can you provide the schema and the queries? What is the RF of the keyspace for 
the data? Are you using any Retry policy on your Cluster object?


Sean Durity

From: Marco Gasparini 
mailto:marco.gaspar...@competitoor.com>>
Sent: Friday, December 21, 2018 10:45 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Writes and Reads with high latency

hello all,

I have 1 DC of 3 nodes running Cassandra 3.11.3, with consistency
level ONE and Java 1.8.0_191.

Every day, there are many nodejs programs that send data to the cassandra's 
cluster via NodeJs cassandra-driver.
Every day I get around 600k requests. Each request makes the server:
1_ READ some data in Cassandra (by an id, usually I get 3 records),
2_ DELETE one of those records
3_ WRITE the data into Cassandra.

So every day I make many deletes.

Every day I find errors like:
"All host(s) tried for query failed. First host tried, 
10.8.0.10:9042<https://urldefense.proofpoint.com/v2/url?u=http-3A__10.8.0.10-3A9042_=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=Y2zNzOyvqOiHqZ5yvB1rO_X6C-HivNjXYN0bLLL-yZQ=2v42cyvuxcXJ0oMfUrRcY-kRno1SkM4CTEMi4n1k0Wo=>:
 Host considered as DOWN. See innerErrors"
"Server timeout during write query at consistency LOCAL_ONE (0 peer(s) 
acknowledged the write over 1 required)"
"Server timeout during write query at consistency SERIAL (0 peer(s) 
acknowledged the write over 1 required)"
"Server timeout during read query at consistency LOCAL_ONE (0 peer(s) 
acknowledged the read over 1 required)"

nodetool tablehistograms tells me this:

Percentile  SSTables Writ

Re: [EXTERNAL] Writes and Reads with high latency

2018-12-27 Thread Marco Gasparini
Hello Sean,

here are my schema and RF:

-
CREATE KEYSPACE my_keyspace WITH replication = {'class':
'NetworkTopologyStrategy', 'DC1': '1'}  AND durable_writes = true;

CREATE TABLE my_keyspace.my_table (
pkey text,
event_datetime timestamp,
agent text,
ft text,
ftt text,
some_id bigint,
PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 9
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

-

Queries I make are very simple:

select pkey, event_datetime, ft, some_id, ftt from my_keyspace.my_table
where pkey = ? limit ?;
and
insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id, ft,
ftt) values (?,?,?,?,?,?);

About the Retry policy, the answer is yes: when a write fails I store
it somewhere else and, after a period, I try to write it to Cassandra
again. This way I can store almost all my data, but when the problem is on the
read side I don't apply any retry policy (and that is my problem)


Thanks
Marco


On Fri, Dec 21, 2018 at 5:18 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> Can you provide the schema and the queries? What is the RF of the keyspace
> for the data? Are you using any Retry policy on your Cluster object?
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Marco Gasparini 
> *Sent:* Friday, December 21, 2018 10:45 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Writes and Reads with high latency
>
>
>
> hello all,
>
>
>
> I have 1 DC of 3 nodes running Cassandra 3.11.3, with
> consistency level ONE and Java 1.8.0_191.
>
>
>
> Every day, there are many nodejs programs that send data to the
> cassandra's cluster via NodeJs cassandra-driver.
>
> Every day I get around 600k requests. Each request makes the server:
>
> 1_ READ some data in Cassandra (by an id, usually I get 3 records),
>
> 2_ DELETE one of those records
>
> 3_ WRITE the data into Cassandra.
>
>
>
> So every day I make many deletes.
>
>
>
> Every day I find errors like:
>
> "All host(s) tried for query failed. First host tried, 10.8.0.10:9042
> :
> Host considered as DOWN. See innerErrors"
>
> "Server timeout during write query at consistency LOCAL_ONE (0 peer(s)
> acknowledged the write over 1 required)"
>
> "Server timeout during write query at consistency SERIAL (0 peer(s)
> acknowledged the write over 1 required)"
>
> "Server timeout during read query at consistency LOCAL_ONE (0 peer(s)
> acknowledged the read over 1 required)"
>
>
>
> nodetool tablehistograms tells me this:
>
>
>
> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>                               (micros)          (micros)           (bytes)
> 50%             8.00            379.02           1955.67            379022                 8
> 75%            10.00            785.94         155469.30            654949                17
> 95%            12.00          17436.92         268650.95           1629722                35
> 98%            12.00          25109.16         322381.14           2346799                42
> 99%            12.00          30130.99         386857.37           3379391                50
> Min             0.00              6.87             88.15               104                 0
> Max            12.00          43388.63         386857.37          20924300               179
>
>
>
> At the 99th percentile I noted that write and read latency is pretty high, but I don't
> know how to improve that.
>
> I can provide more statistics if needed.
>
>
>
> Is there any improvement I can make to Cassandra's configuration in
> order not to lose any data?
>
>
>
> Thanks
>
>
>
> Regards
>
> Marco
>

RE: [EXTERNAL] Writes and Reads with high latency

2018-12-21 Thread Durity, Sean R
Can you provide the schema and the queries? What is the RF of the keyspace for 
the data? Are you using any Retry policy on your Cluster object?


Sean Durity

From: Marco Gasparini 
Sent: Friday, December 21, 2018 10:45 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Writes and Reads with high latency

hello all,

I have 1 DC of 3 nodes running Cassandra 3.11.3, with consistency
level ONE and Java 1.8.0_191.

Every day, there are many nodejs programs that send data to the cassandra's 
cluster via NodeJs cassandra-driver.
Every day I get around 600k requests. Each request makes the server:
1_ READ some data in Cassandra (by an id, usually I get 3 records),
2_ DELETE one of those records
3_ WRITE the data into Cassandra.

So every day I make many deletes.

Every day I find errors like:
"All host(s) tried for query failed. First host tried, 
10.8.0.10:9042:
 Host considered as DOWN. See innerErrors"
"Server timeout during write query at consistency LOCAL_ONE (0 peer(s) 
acknowledged the write over 1 required)"
"Server timeout during write query at consistency SERIAL (0 peer(s) 
acknowledged the write over 1 required)"
"Server timeout during read query at consistency LOCAL_ONE (0 peer(s) 
acknowledged the read over 1 required)"

nodetool tablehistograms tells me this:

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             8.00            379.02           1955.67            379022                 8
75%            10.00            785.94         155469.30            654949                17
95%            12.00          17436.92         268650.95           1629722                35
98%            12.00          25109.16         322381.14           2346799                42
99%            12.00          30130.99         386857.37           3379391                50
Min             0.00              6.87             88.15               104                 0
Max            12.00          43388.63         386857.37          20924300               179

At the 99th percentile I noted that write and read latency is pretty high, but I don't know
how to improve that.
I can provide more statistics if needed.

Is there any improvement I can make to Cassandra's configuration in order
not to lose any data?

Thanks

Regards
Marco





RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-05 Thread Durity, Sean R
In my understanding, there is a balance of getting upgradesstables done vs 
normal activity. I think the cluster can function fine with old and new 
sstables, but there can be a performance hit to reading the older version 
(perhaps). Personally, I don’t restart repairs until upgradesstables is 
completed. So, I push to get upgradesstables completed as soon as possible.


Sean Durity

From: Shravan R 
Sent: Tuesday, December 04, 2018 3:39 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

Thanks Sean. I have automation in place that can put the new binary and restart 
the node to a newer version as quickly as possible. upgradesstables is I/O 
intensive and it takes time and is proportional to the data on the node. Given 
these constraints, is there a risk due to prolonged upgradesstables?

On Tue, Dec 4, 2018 at 12:20 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
We have had great success with Cassandra upgrades with applications staying 
on-line. It is one of the strongest benefits of Cassandra. A couple things I 
incorporate into upgrades:

-  The main task is getting the new binaries loaded, then restarting 
the node – in a rolling fashion. Get this done as quickly as possible

-  Streaming between versions is usually problematic. So, I never do 
any node additions or decommissions during an upgrade

-  With applications running, there is not an acceptable back-out plan 
(either lose data or take a long outage or both), so we are always going 
forward. So, lower life cycle testing is important before hitting production

-  Upgrading is a more frequent activity, so get the process/automation 
in place. The upgrade process should not be a reason to delay, especially for 
minor version upgrades that might be quickly necessary (security issue or bug 
fix).


Sean Durity

From: Shravan R mailto:skr...@gmail.com>>
Sent: Tuesday, December 04, 2018 12:22 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

Thanks Jeff. I tried to bootstrap a 3.x node to a partially upgraded cluster 
(2.1.9 + 3.x) and I was not able to do so. The schema never settled.

How does the below approach sound?

  1.  Update the software binary on all nodes to use cassandra-3.x upon a 
restart.
  2.  Restart all nodes in a rolling fashion
  3.  Run nodetool upgradesstables in a rolling fashion

Is there a risk on pending nodetool upgradesstables?

On Sun, Dec 2, 2018 at 2:12 AM Jeff Jirsa 
mailto:jji...@gmail.com>> wrote:


On Dec 2, 2018, at 12:40 PM, Shravan R 
mailto:skr...@gmail.com>> wrote:
Marc/Dimitry/Jon - greatly appreciate your feedback. I will look into the 
version part that you suggested. The reason to go direct to 3.x is to take a big
leap and reduce overall effort to upgrade a large cluster (development
included).

I have these questions from my original post. Appreciate if you could shed some 
light and point me in the right direction.

1) How do I deal with decommissioning a 2.1.9 node in a partially upgraded
cluster?

If any of the replicas have already upgraded, which is almost guaranteed if 
you’re using vnodes, It’s hard / you don’t. You’d basically upgrade everything 
else and then deal with it. If a host fails mid upgrade you’ll likely have some 
period of unavailables while you bounce the replicas to finish, then you can 
decom



2) How to bootstrap a 3.x node to a partially upgraded cluster?

This may work fine, but test it because I’m not certain. It should be able to 
read the 2.1 and 3.0 sstables that’ll stream so it’ll just work

3) Is there an alternative approach to upgrading large clusters, i.e. instead
of going through nodetool upgradesstables on each node in a rolling fashion?

Bounce them all as quickly as is practical, do the upgradesstables after the 
bounces complete





On Sat, Dec 1, 2018 at 1:03 PM Jonathan Haddad 
mailto:j...@jonhaddad.com>> wrote:
Dmitry is right. Generally speaking always go with the latest bug fix release.

On Sat, Dec 1, 2018 at 10:14 AM Dmitry Saprykin 
mailto:saprykin.dmi...@gmail.com>> wrote:
See more here
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13004<https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_plugins_servlet_mobile-23issue_CASSANDRA-2D13004=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=--WtdKaRCohgTv7Y6px-TdcK2xJFB9oaDOSfdoBQ8D0=8csmPWgUEWao6E4wthrG_-BX5a2OQJKXpkKtFLjSPlI=>

On Sat, Dec 1, 2018 at 1:02 PM Dmitry Saprykin 
mailto:saprykin.dmi...@gmail.com>> wrote:
Even more, 3.0.9 is a terrible target choice by itself. It has a nasty bug 
corrupting sstables on alter.

On Sat, Dec 1, 2018 at 11:55 AM Marc Selwan 
mailto:marc.sel...@datastax.com>> wrote:
Hi Shravan,

Did you upgrade Apache Cassandra 2.1.9 to the latest patch release bef

Re: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-04 Thread Shravan R
Thanks Sean. I have automation in place that can put the new binary and
restart the node to a newer version as quickly as possible. upgradesstables
is I/O intensive and it takes time and is proportional to the data on the
node. Given these constraints, is there a risk due to prolonged
upgradesstables?

On Tue, Dec 4, 2018 at 12:20 PM Durity, Sean R 
wrote:

> We have had great success with Cassandra upgrades with applications
> staying on-line. It is one of the strongest benefits of Cassandra. A couple
> things I incorporate into upgrades:
>
> -  The main task is getting the new binaries loaded, then
> restarting the node – in a rolling fashion. Get this done as quickly as
> possible
>
> -  Streaming between versions is usually problematic. So, I never
> do any node additions or decommissions during an upgrade
>
> -  With applications running, there is not an acceptable back-out
> plan (either lose data or take a long outage or both), so we are always
> going forward. So, lower life cycle testing is important before hitting
> production
>
> -  Upgrading is a more frequent activity, so get the
> process/automation in place. The upgrade process should not be a reason to
> delay, especially for minor version upgrades that might be quickly
> necessary (security issue or bug fix).
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Shravan R 
> *Sent:* Tuesday, December 04, 2018 12:22 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9
>
>
>
> Thanks Jeff. I tried to bootstrap a 3.x node to a partially upgraded
> cluster (2.1.9 + 3.x) and I was *not* able to do so. The schema never
> settled.
>
>
>
> How does the below approach sound?
>
>1. Update the software binary on all nodes to use cassandra-3.x upon a
>restart.
>2. Restart all nodes in a rolling fashion
>3. Run nodetool upgradesstables in a rolling fashion
>
>
>
> Is there a risk on pending nodetool upgradesstables?
>
>
>
> On Sun, Dec 2, 2018 at 2:12 AM Jeff Jirsa  wrote:
>
>
>
>
> On Dec 2, 2018, at 12:40 PM, Shravan R  wrote:
>
> Marc/Dimitry/Jon - greatly appreciate your feedback. I will look into the
> version part that you suggested. The reason to go direct to 3.x is to take
> a big leap and reduce overall effort to upgrade a large cluster (development
> included).
>
>
>
> I have these questions from my original post. Appreciate if you could shed
> some light and point me in the right direction.
>
>
>
> 1) How do I deal with decommissioning a 2.1.9 node in a partially upgraded
> cluster?
>
>
>
> If any of the replicas have already upgraded, which is almost guaranteed
> if you’re using vnodes, It’s hard / you don’t. You’d basically upgrade
> everything else and then deal with it. If a host fails mid upgrade you’ll
> likely have some period of unavailables while you bounce the replicas to
> finish, then you can decom
>
>
>
>
>
>
>
> 2) How to bootstrap a 3.x node to a partially upgraded cluster?
>
>
>
> This may work fine, but test it because I’m not certain. It should be able
> to read the 2.1 and 3.0 sstables that’ll stream so it’ll just work
>
>
>
> 3) Is there an alternative approach to upgrading large clusters, i.e.
> instead of going through nodetool upgradesstables on each node in a rolling
> fashion?
>
>
>
> Bounce them all as quickly as is practical, do the upgradesstables after
> the bounces complete
>
>
>
>
>
>
>
>
>
>
>
> On Sat, Dec 1, 2018 at 1:03 PM Jonathan Haddad  wrote:
>
> Dmitry is right. Generally speaking always go with the latest bug fix
> release.
>
>
>
> On Sat, Dec 1, 2018 at 10:14 AM Dmitry Saprykin 
> wrote:
>
> See more here
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13004
> 
>
>
>
> On Sat, Dec 1, 2018 at 1:02 PM Dmitry Saprykin 
> wrote:
>
> Even more, 3.0.9 is a terrible target choice by itself. It has a nasty bug
> corrupting sstables on alter.
>
>
>
> On Sat, Dec 1, 2018 at 11:55 AM Marc Selwan 
> wrote:
>
> Hi Shravan,
>
>
>
> Did you upgrade Apache Cassandra 2.1.9 to the latest patch release before
> doing the major upgrade? It's generally favorable to go to the latest patch
> release as often times they include fixes that smooth over the upgrade
> process. There are hundreds of bug fixes between 2.1.9 and 2.1.20 (current
> version)
>
>
>
> Best,
>
> Marc
>
>
>
> On Fri, Nov 30, 2018 at 3:13 PM Shravan R  wrote:
>
> Hello,
>
>
>
> I am planning to upgrade Apache Cassandra 2.1.9 to Apache Cassandra-3.0.9.
> I came up with the version based on [1]. I followed upgrade steps as in
> [2]. I was testing the same in the lab and encountered issues (streaming
> just fails and hangs for ever) with bootstrapping a 3.0.9 node 

RE: [EXTERNAL] Cassandra Upgrade Plan 2.2.4 to 3.11.3

2018-12-04 Thread Durity, Sean R
See my recent post for some additional points. But I wanted to encourage you to 
look at the in-place upgrade on your existing hardware. No need to add a DC to 
try and upgrade. The cluster will handle reads and writes with nodes of 
different versions – no problems. I have done this many times on many clusters.

Also, I tell my teams there is no real back-out after we get the first node 
upgraded. This is because any new data is being written in the new sstable 
format (assuming the version has a new sstable format) – whether inserts or 
compaction. Any snapshot of the cluster pre-upgrade is now obsolete. Test 
thoroughly, then go forward as quickly as possible.


Sean Durity

From: Devaki, Srinivas 
Sent: Sunday, December 02, 2018 9:24 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Cassandra Upgrade Plan 2.2.4 to 3.11.3

Hi everyone,

I have planned out our org's Cassandra upgrade plan and want to make sure it
seems fine.

Details Existing Cluster:
* Cassandra 2.2.4
* 8 nodes with 32G ram and 12G max heap allocated to cassandra
* 4 nodes in each rack

1. Ensure all clients use LOCAL_* consistency levels and send all traffic to the
"old" dc (see the driver sketch after this list)
2. Add new cluster as "new" dc with cassandra 2.2.4
  2.1 update conf on all nodes in "old" dc
  2.2 rolling restart the "old" dc
3. Alter tables with similar replication factor on the "new" dc
4. cassandra repair on all nodes in "new" dc
5. upgrade each node in "new" dc to cassandra 3.11.3 (and upgradesstables)
6. switch all clients to connect to new cluster
7. repair all new nodes once more
8. alter tables to replication only on new dc
9. remove "old" dc
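
A minimal sketch of how step 1 can be enforced at the driver level (shown with the Python driver; the DC name "old", the contact point and the keyspace are assumptions):

    # Pin the client to the "old" DC and use a LOCAL_* consistency level, so
    # requests never touch the "new" DC while it is being built and upgraded.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy

    profile = ExecutionProfile(
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='old'),
        consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    cluster = Cluster(['10.0.0.1'],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect('my_keyspace')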

and I have some doubts on the same plan
D1. Can I just join a 3.11.3 cluster as the "new" dc in the 2.2.4 cluster?
D2. How does a rolling upgrade work, i.e. how can 2 versions coexist within the
same cluster?

Will be grateful if you could review this plan.

PS: I am following this plan to ensure that I can revert back to the old behaviour
at any step.

Thanks
Srinivas Devaki
SRE/SDE at Zomato








RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-04 Thread Durity, Sean R
We have had great success with Cassandra upgrades with applications staying 
on-line. It is one of the strongest benefits of Cassandra. A couple things I 
incorporate into upgrades:

-  The main task is getting the new binaries loaded, then restarting 
the node – in a rolling fashion. Get this done as quickly as possible

-  Streaming between versions is usually problematic. So, I never do 
any node additions or decommissions during an upgrade

-  With applications running, there is not an acceptable back-out plan 
(either lose data or take a long outage or both), so we are always going 
forward. So, lower life cycle testing is important before hitting production

-  Upgrading is a more frequent activity, so get the process/automation 
in place. The upgrade process should not be a reason to delay, especially for 
minor version upgrades that might be quickly necessary (security issue or bug 
fix).


Sean Durity

From: Shravan R 
Sent: Tuesday, December 04, 2018 12:22 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

Thanks Jeff. I tried to bootstrap a 3.x node to a partially upgraded cluster 
(2.1.9 + 3.x) and I was not able to do so. The schema never settled.

How does the below approach sound?

  1.  Update the software binary on all nodes to use cassandra-3.x upon a 
restart.
  2.  Restart all nodes in a rolling fashion
  3.  Run nodetool upgradesstables in a rolling fashion

Is there a risk on pending nodetool upgradesstables?
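
A heavily simplified sketch of what the per-node automation for those three steps might look like (host names, service commands and the binary-swap step are placeholders; it only illustrates the ordering, not a production script):

    # Rolling bounce first, then upgradesstables once the whole ring is back up.
    import subprocess

    HOSTS = ['node1', 'node2', 'node3']              # placeholder host names

    def run(host, cmd):
        subprocess.check_call(['ssh', host, cmd])

    for host in HOSTS:                               # one node at a time
        run(host, 'nodetool drain')                  # flush memtables, stop accepting writes
        run(host, 'sudo systemctl stop cassandra')
        run(host, 'sudo ./install_cassandra_3x.sh')  # placeholder for the binary swap
        run(host, 'sudo systemctl start cassandra')
        # verify the node is Up/Normal (nodetool status) before moving on

    for host in HOSTS:                               # after all nodes are bounced
        run(host, 'nodetool upgradesstables')        # I/O heavy; proportional to data size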

On Sun, Dec 2, 2018 at 2:12 AM Jeff Jirsa 
mailto:jji...@gmail.com>> wrote:


On Dec 2, 2018, at 12:40 PM, Shravan R 
mailto:skr...@gmail.com>> wrote:
Marc/Dimitry/Jon - greatly appreciate your feedback. I will look into the 
version part that you suggested. The reason to go direct to 3.x is to take a big
leap and reduce overall effort to upgrade a large cluster (development
included).

I have these questions from my original post. Appreciate if you could shed some 
light and point me in the right direction.

1) How do I deal with decommissioning a 2.1.9 node in a partially upgraded
cluster?

If any of the replicas have already upgraded, which is almost guaranteed if 
you’re using vnodes, It’s hard / you don’t. You’d basically upgrade everything 
else and then deal with it. If a host fails mid upgrade you’ll likely have some 
period of unavailables while you bounce the replicas to finish, then you can 
decom




2) How to bootstrap a 3.x node to a partially upgraded cluster?

This may work fine, but test it because I’m not certain. It should be able to 
read the 2.1 and 3.0 sstables that’ll stream so it’ll just work


3) Is there an alternative approach to upgrading large clusters, i.e. instead
of going through nodetool upgradesstables on each node in a rolling fashion?

Bounce them all as quickly as is practical, do the upgradesstables after the 
bounces complete






On Sat, Dec 1, 2018 at 1:03 PM Jonathan Haddad 
mailto:j...@jonhaddad.com>> wrote:
Dmitry is right. Generally speaking always go with the latest bug fix release.

On Sat, Dec 1, 2018 at 10:14 AM Dmitry Saprykin 
mailto:saprykin.dmi...@gmail.com>> wrote:
See more here
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13004

On Sat, Dec 1, 2018 at 1:02 PM Dmitry Saprykin 
mailto:saprykin.dmi...@gmail.com>> wrote:
Even more, 3.0.9 is a terrible target choice by itself. It has a nasty bug 
corrupting sstables on alter.

On Sat, Dec 1, 2018 at 11:55 AM Marc Selwan 
mailto:marc.sel...@datastax.com>> wrote:
Hi Shravan,

Did you upgrade Apache Cassandra 2.1.9 to the latest patch release before doing 
the major upgrade? It's generally favorable to go to the latest patch release 
as often times they include fixes that smooth over the upgrade process. There 
are hundreds of bug fixes between 2.1.9 and 2.1.20 (current version)

Best,
Marc

On Fri, Nov 30, 2018 at 3:13 PM Shravan R 
mailto:skr...@gmail.com>> wrote:
Hello,

I am planning to upgrade Apache Cassandra 2.1.9 to Apache Cassandra-3.0.9. I 
came up with the version based on [1]. I followed upgrade steps as in [2]. I 
was testing the same in the lab and encountered issues (streaming just fails 
and hangs forever) with bootstrapping a 3.0.9 node on a partially upgraded
cluster. [50% of nodes on 2.1.9 and 50% on 3.0.9]. The production cluster that 
I am supporting is pretty large and I am anticipating to end up in a situation 
like this (Hope not) and would like to be prepared.

1) How do I deal with decommissioning a 2.1.9 node in a partially upgraded
cluster?
2) How to bootstrap a 3.x node to a partially upgraded cluster?
3) Is there an alternative approach to the upgrade large 

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-23 Thread Daniel Seybold

Hi Alexander,

thanks a lot for the pointers, I checked the mentioned issue.

While the reported issue seems to match our problem, it only occurs for reads
and not for writes (according to the DataStax Jira). But we experience
downtimes for writes and reads.



Which version of the Datastax Driver are you using for your tests?

We use version 3.0.0

I have also tried version 3.2.0 to avoid the JAVA-1346 issue you mentioned,
but I still see the same behaviour with respect to the downtime.



How is it configured (load balancing policies, etc...) ?

Besides the write consistency of ONE it uses the default settings.

As we use the YCSB as workload for our experiments, you can have a look 
at the driver settings in the basic class: 
https://github.com/brianfrankcooper/YCSB/blob/master/cassandra/src/main/java/com/yahoo/ycsb/db/CassandraCQLClient.java 




Do you have some debug logs on the client side that could help?

On the client side the logs show no exceptions or any suspicious messages.

I also turned on the tracing but didn't find any suspicious messages 
(yet I did not spend too much time on that and I am no expert in the
Cassandra driver)


If more detailed logs or the traces would help to further investigate 
the issue let me know and I will rerun the experiments to create the 
logs and traces.


Many thanks again for your help.

Cheers,

Daniel


On 16.11.2018 at 15:08, Alexander Dejanovski wrote:

Hi Daniel,

it seems like the driver isn't detecting that the node went down, 
which is probably due to the way the node is being killed.
If I remember correctly, in some cases the Netty transport is still up in
the client, which still allows queries to be sent without them ever being
answered: https://datastax-oss.atlassian.net/browse/JAVA-1346

Eventually, the node gets discarded when the heartbeat system catches up.
It's also possible that the stuck queries then eat up all the 
available slots in the driver, preventing any other query to be sent 
in that JVM.


Which version of the Datastax Driver are you using for your tests?
How is it configured (load balancing policies, etc...) ?
Do you have some debug logs on the client side that could help?

Thanks,


On Fri, Nov 16, 2018 at 1:19 PM Daniel Seybold 
mailto:daniel.seyb...@uni-ulm.de>> wrote:


Hi Sean,

thanks for your comments, find below some more details with
respect to the (1) VM sizing and (2) the replication factor:

(1) VM sizing:

We selected the small VMs as the initial setup to run our experiments.
We have also executed the same experiments (5 nodes) on larger VMs
with 6 cores and 12GB memory (where 6GB was allocated to Cassandra).

We use the default CMS garbage collector (with default settings)
and the debug.log and system.log does not show any suspicious GC
messages.

(2) Replication factor

We set the RF to 5 as we want to emulate a scenario which is able
to survive multiple node failures. We have also tried an RF of 3
(in the 5-node cluster), but the downtime in case of a node failure
persists.


I also attached two plots which show the results with the
downtimes when using the larger VMs and setting the RF to 3.

Any further comments much appreciated,

Cheers,
Daniel


On 09.11.2018 at 19:04, Durity, Sean R wrote:


The VMs’ memory (4 GB) seems pretty small for Cassandra. What
heap size are you using? Which garbage collector? Are you seeing
long GC times on the nodes? The basic rule of thumb is to give
the Cassandra heap 50% of the RAM on the host. 2 GB isn’t very much.

Also, I wouldn’t set the replication factor to 5 (the number of
nodes). If RF is always equal to the number of nodes, you can’t
really scale beyond the size of the disk on any one node (all
data is on each node). A replication factor of 3 would be more
like a typical production set-up.

Sean Durity

*From:*Daniel Seybold 

*Sent:* Friday, November 09, 2018 5:49 AM
*To:* user@cassandra.apache.org 
*Subject:* [EXTERNAL] Availability issues for write/update/read
workloads (up to 100s downtime) in case of a Cassandra node failure

Hi Apache Cassandra experts,

we are running a set of availability evaluations under
write/read/update workloads with Apache Cassandra and experience
some unexpected results, i.e. 0 ops/s over periods of up to 100s.

In order to provide a clear picture find below the details of (1)
the setup and (2) the evaluation workflow

*1. Setup:*

Cassandra version: 3.11.2
Cluster size: 5 nodes
Replication Factor: 5
Each node runs in the same private OpenStack-based cloud, within
the same availability zone, and uses the private network.
Each node runs Ubuntu 16.04 server and has 2 cores, 4GB
RAM and a 50GB disk.

Workload:
Yahoo Cloud Serving Benchmark 0.12
W1: 100% write
W2: 100% read
   

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-16 Thread Alexander Dejanovski
Hi Daniel,

it seems like the driver isn't detecting that the node went down, which is
probably due to the way the node is being killed.
If I remember correctly, in some cases the Netty transport is still up in the
client, which still allows queries to be sent without them ever being answered:
https://datastax-oss.atlassian.net/browse/JAVA-1346
Eventually, the node gets discarded when the heartbeat system catches up.
It's also possible that the stuck queries then eat up all the available
slots in the driver, preventing any other query from being sent from that JVM.
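
For illustration, a minimal sketch (not from this thread) of the 3.x Java 
driver knobs that usually shorten that window; the interval, timeout and 
delay values below are purely illustrative:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.ConstantSpeculativeExecutionPolicy;

public class FasterFailureDetection {
    public static Cluster build(String contactPoint) {
        return Cluster.builder()
                .addContactPoint(contactPoint)
                // Lower the connection heartbeat from the 30s default so dead
                // connections (e.g. a deleted VM) are noticed sooner.
                .withPoolingOptions(new PoolingOptions().setHeartbeatIntervalSeconds(5))
                // Cap how long a single request can hang on an unresponsive node.
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(5000))
                // Retry on another replica if no answer after 500ms, at most twice.
                .withSpeculativeExecutionPolicy(new ConstantSpeculativeExecutionPolicy(500, 2))
                .build();
    }
}

Speculative executions only fire for statements marked idempotent, so this 
mostly helps reads and writes that are safe to retry.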

Which version of the Datastax Driver are you using for your tests?
How is it configured (load balancing policies, etc...) ?
Do you have some debug logs on the client side that could help?

Thanks,


On Fri, Nov 16, 2018 at 1:19 PM Daniel Seybold 
wrote:

> Hi Sean,
>
> thanks for your comments, find below some more details with respect to the
> (1) VM sizing and (2) the replication factor:
>
> (1) VM sizing:
>
> We selected the small VMs as the initial setup to run our experiments. We have
> also executed the same experiments (5 nodes) on larger VMs with 6 cores and
> 12GB memory (where 6GB was allocated to Cassandra).
>
> We use the default CMS garbage collector (with default settings) and the
> debug.log and system.log do not show any suspicious GC messages.
>
> (2) Replication factor
>
> We set the RF to 5 as we want to emulate a scenario which is able to
> survive multiple-node failures. We have also tried a RF of 3 (in the 5 node
> cluster) but the downtime in case of a node failure persists.
>
>
> I also attached two plots which show the results with the downtimes for
> using the larger VMs and setting the RF to 3
>
> Any further comments much appreciated,
> Cheers,
> Daniel
>
>
> On 09.11.2018 at 19:04, Durity, Sean R wrote:
>
> The VMs’ memory (4 GB) seems pretty small for Cassandra. What heap size
> are you using? Which garbage collector? Are you seeing long GC times on the
> nodes? The basic rule of thumb is to give the Cassandra heap 50% of the RAM
> on the host. 2 GB isn’t very much.
>
>
>
> Also, I wouldn’t set the replication factor to 5 (the number of nodes). If
> RF is always equal to the number of nodes, you can’t really scale beyond
> the size of the disk on any one node (all data is on each node). A
> replication factor of 3 would be more like a typical production set-up.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Daniel Seybold 
> 
> *Sent:* Friday, November 09, 2018 5:49 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Availability issues for write/update/read workloads
> (up to 100s downtime) in case of a Cassandra node failure
>
>
>
> Hi Apache Cassandra experts,
>
> we are running a set of availability evaluations under write/read/update
> workloads with Apache Cassandra and experience some unexpected results,
> i.e. 0 ops/s over periods of up to 100s.
>
> In order to provide a clear picture find below the details of (1) the
> setup and (2) the evaluation workflow
>
> *1. Setup:*
>
> Cassandra version: 3.11.2
> Cluster size: 5 nodes
> Replication Factor: 5
> Each node runs in the same private OpenStack-based cloud, within the same
> availability zone, and uses the private network.
> Each node runs Ubuntu 16.04 server and has 2 cores, 4GB RAM and a
> 50GB disk.
>
> Workload:
> Yahoo Cloud Serving Benchmark 0.12
> W1: 100% write
> W2: 100% read
> W3: 100% update
>
> *2. Evaluation Workflow: *
>
> 1. allocate 5 VMs & deploy DBMS cluster
> 2. start a YCSB workload (only one of W1-3) which runs up to 30 minutes
> 3. wait for 200s
> 4. randomly select a node in the cluster and delete its VM without
> stopping Cassandra first
> 5. analyze throughput time series over the evaluation
>
>
>
> *3. (Unexpected) Results* We expected to see a (slight) drop in the
> throughput as soon as the VM was deleted.
> But the throughput results show that there are periods of ~10s - 150s
> (not deterministic) where no operations are executed (all metrics are
> collected on client side)
> Yet, there are no timeout exceptions on client side and also the logs on
> cluster side do not show anything that explains this behaviour.
>
> I attached a series of plots which show the throughput and the downtimes
> over the evaluation runs.
>
> Do you have any explanations for this behaviour or recommendations how to
> reduce the  potential "downtime" ?
>
> Thanks in advance for any help and recommendations,
>
> Cheers,
> Daniel
>
>
>
> --
>
> M.Sc. Daniel Seybold
>
>
>
> Universität Ulm
>
> Institut Organisation und Management
>
> von Informationssystemen (OMI)
>
> Albert-Einstein-Allee 43
> 89081 Ulm
> Phone: +49 (0)731 50-28 799
>

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-16 Thread Daniel Seybold

Hi Sean,

thanks for your comments, find below some more details with respect to 
the (1) VM sizing and (2) the replication factor:


(1) VM sizing:

We selected the small VMs as the initial setup to run our experiments. We 
have also executed the same experiments (5 nodes) on larger VMs with 6 
cores and 12GB memory (where 6GB was allocated to Cassandra).


We use the default CMS garbage collector (with default settings), and the 
debug.log and system.log do not show any suspicious GC messages.


(2) Replication factor

We set the RF to 5 as we want to emulate a scenario which is able to 
survive multiple node failures. We have also tried an RF of 3 (in the 5-node 
cluster), but the downtime in case of a node failure persists.



I also attached two plots which show the results with the downtimes when 
using the larger VMs and setting the RF to 3.


Any further comments much appreciated,

Cheers,
Daniel


On 09.11.2018 at 19:04, Durity, Sean R wrote:


The VMs’ memory (4 GB) seems pretty small for Cassandra. What heap 
size are you using? Which garbage collector? Are you seeing long GC 
times on the nodes? The basic rule of thumb is to give the Cassandra 
heap 50% of the RAM on the host. 2 GB isn’t very much.


Also, I wouldn’t set the replication factor to 5 (the number of 
nodes). If RF is always equal to the number of nodes, you can’t really 
scale beyond the size of the disk on any one node (all data is on each 
node). A replication factor of 3 would be more like a typical 
production set-up.


Sean Durity

*From:*Daniel Seybold 
*Sent:* Friday, November 09, 2018 5:49 AM
*To:* user@cassandra.apache.org
*Subject:* [EXTERNAL] Availability issues for write/update/read 
workloads (up to 100s downtime) in case of a Cassandra node failure


Hi Apache Cassandra experts,

we are running a set of availability evaluations under 
write/read/update workloads with Apache Cassandra and experience some 
unexpected results, i.e. 0 ops/s over periods of up to 100s.


In order to provide a clear picture find below the details of (1) the 
setup and (2) the evaluation workflow


*1. Setup:*

Cassandra version: 3.11.2
Cluster size: 5 nodes
Replication Factor: 5
Each node runs in the same private OpenStack-based cloud, within the 
same availability zone, and uses the private network.
Each node runs Ubuntu 16.04 server and has 2 cores, 4GB RAM and a 
50GB disk.


Workload:
Yahoo Cloud Serving Benchmark 0.12
W1: 100% write
W2: 100% read
W3: 100% update

*2. Evaluation Workflow: *

1. allocate 5 VMs & deploy DBMS cluster
2. start a YCSB workload (only one of W1-3) which runs up to 30 minutes
3. wait for 200s
4. randomly select a node in the cluster and delete its VM without 
stopping Cassandra first

5. analyze throughput time series over the evaluation

*3. (Unexpected) Results*

We expected to see a (slight) drop in the throughput as soon as the 
VM was deleted.
But the throughput results show that there are periods of ~10s - 
150s (not deterministic) where no operations are executed (all metrics 
are collected on client side)
Yet, there are no timeout exceptions on client side and also the logs 
on cluster side do not show anything that explains this behaviour.


I attached a series of plots which show the throughput and the 
downtimes over the evaluation runs.


Do you have any explanations for this behaviour or recommendations how 
to reduce the  potential "downtime" ?


Thanks in advance for any help and recommendations,

Cheers,
Daniel



--
M.Sc. Daniel Seybold
Universität Ulm
Institut Organisation und Management
von Informationssystemen (OMI)
Albert-Einstein-Allee 43
89081 Ulm
Phone: +49 (0)731 50-28 799





--
M.Sc. Daniel Seybold

Universität Ulm
Institut Organisation und Management
von Informationssystemen (OMI)
Albert-Einstein-Allee 43
89081 Ulm
Phone: +49 (0)731 50-28 799



cassandra_failures_v2.pdf

Re: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Jonathan Haddad
Just because Cassandra doesn't do it doesn't mean you aren't able to
encrypt your data at rest, and you definitely don't need DSE to do it.  I
recommend checking out the LUKS project.

https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md

This, IMO, is a better option than having the database do it, since with
this you are able to encrypt everything: your logs, indexes, etc.

Jon



On Wed, Nov 14, 2018 at 10:47 AM Durity, Sean R 
wrote:

> I think you are asking about **encryption** at rest. To my knowledge,
> open source Cassandra does not support this natively. There are options,
> like encrypting the data in the application before it gets to Cassandra.
> Some companies offer other solutions. IMO, if you need the increased
> security, it is worth using something like DataStax Enterprise.
>
>
>
>
>
> Sean Durity
>
> *From:* Goutham reddy 
> *Sent:* Tuesday, November 13, 2018 1:22 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Is Apache Cassandra supports Data at rest
>
>
>
> Hi,
>
> Does Apache Cassandra support data at rest? DataStax Cassandra
> supports it. Can anybody help me?
>
>
>
> Thanks and Regards,
>
> Goutham.
>
> --
>
> Regards
>
> Goutham Reddy
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Ben Slater
I wrote a blog post a while ago on the pros and cons of encrypting in your
application for use with Cassandra that you might find useful background on
this subject:
https://www.instaclustr.com/securing-apache-cassandra-with-application-level-encryption/

Cheers
Ben

On Wed, 14 Nov 2018 at 13:47 Durity, Sean R 
wrote:

> I think you are asking about **encryption** at rest. To my knowledge,
> open source Cassandra does not support this natively. There are options,
> like encrypting the data in the application before it gets to Cassandra.
> Some companies offer other solutions. IMO, if you need the increased
> security, it is worth using something like DataStax Enterprise.
>
>
>
>
>
> Sean Durity
>
> *From:* Goutham reddy 
> *Sent:* Tuesday, November 13, 2018 1:22 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Is Apache Cassandra supports Data at rest
>
>
>
> Hi,
>
> Does Apache Cassandra support data at rest? DataStax Cassandra
> supports it. Can anybody help me?
>
>
>
> Thanks and Regards,
>
> Goutham.
>
> --
>
> Regards
>
> Goutham Reddy
>
-- 


*Ben Slater*
*Chief Product Officer*

Read our latest technical blog posts here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


RE: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Durity, Sean R
I think you are asking about *encryption* at rest. To my knowledge, open source 
Cassandra does not support this natively. There are options, like encrypting 
the data in the application before it gets to Cassandra. Some companies offer 
other solutions. IMO, if you need the increased security, it is worth using 
something like DataStax Enterprise.
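
As a rough sketch of the application-side option, this is what encrypting a 
value with AES-GCM before binding it into an INSERT could look like (key 
handling is deliberately simplified; a real deployment would pull the key 
from a KMS or keystore):

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AppLevelEncryption {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Encrypts a value before it is written to a text column; the random IV is
    // prepended to the ciphertext so the stored value is self-contained.
    static String encrypt(SecretKey key, String plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    static String decrypt(SecretKey key, String encoded) throws Exception {
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // In practice the key would come from a KMS/keystore, not be generated here.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey();
        String stored = encrypt(key, "sensitive value");
        System.out.println(decrypt(key, stored));  // prints: sensitive value
    }
}

The trade-off is that encrypted columns can no longer be usefully indexed or 
range-queried on the server side.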


Sean Durity
From: Goutham reddy 
Sent: Tuesday, November 13, 2018 1:22 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Is Apache Cassandra supports Data at rest

Hi,
Does Apache Cassandra support data at rest? DataStax Cassandra 
supports it. Can anybody help me?

Thanks and Regards,
Goutham.
--
Regards
Goutham Reddy



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


RE: [EXTERNAL] Re: Multiple cluster for a single application

2018-11-08 Thread Durity, Sean R
We have a cluster of over 100 nodes that performs just fine for its use case. In 
our case, we needed the disk space and did not want the admin headache of very 
dense nodes. It does take more automation and process to handle a larger 
cluster, but those are all good things to solve anyway.

But count me in on being interested in what DataStax is calling “Big Node.” 
Would love to be able to use denser nodes, if the headaches are reduced.


Sean Durity

From: Ben Slater 
Sent: Wednesday, November 07, 2018 6:08 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Multiple cluster for a single application

I tend to recommend an approach similar to Eric’s functional sharding, although 
I describe it as quality-of-service sharding - group your small, hot data into 
one cluster and your large, cooler data into another so you can provision 
infrastructure and tune accordingly. I guess it depends on your management 
environment, but if your app functionality allows you to split into multiple 
clusters (i.e. all your data is not in one giant table) then I would 
generally look to split. Splitting also gives you the advantage of making it 
harder to have an outage that brings everything down.

Cheers
Ben

On Thu, 8 Nov 2018 at 08:44 Jonathan Haddad 
mailto:j...@jonhaddad.com>> wrote:
Interesting approach Eric, thanks for sharing that.

Regarding this:

> I've read documents recommended to use clusters with less than 50 or 100 
> nodes (Netflix got hundreds of clusters with less 100 nodes on each).

Not sure where you read that, but it's nonsense.  We work with quite a few 
clusters that are several hundred nodes each.  Your problems can get a bit 
amplified, for instance dynamic snitch can make a cluster perform significantly 
worse than if you just flat out disable it, which is what I usually recommend.

I'm curious how you arrived at the estimate of needing > 100 nodes.  Is that 
due to space constraints or performance ones?



On Wed, Nov 7, 2018 at 12:52 PM Eric Stevens 
mailto:migh...@gmail.com>> wrote:
We are engaging in both strategies at the same time:

1) We call it functional sharding - we write to clusters targeted according to 
the type of data being written.  Because different data types often have 
different workloads this has the nice side effect of being able to tune each 
cluster according to its workload.  Your ability to grow in this dimension is 
limited by the number of business object types you're recording.

2) We write to clusters sharded by time.  Our objects are network security 
events, so there's always an element of time.  We encode that time into 
deterministic object IDs so that we are able to identify in the read path which 
shard to direct the request to by extracting the time component.  This basic 
idea should be able to work any time you're able to use surrogate keys instead 
of natural keys.  If you are using natural keys, you may be facing an 
unpleasant migration should you need to increase the number of shards in this 
dimension.
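
A minimal sketch of that surrogate-key idea, assuming a hypothetical bit 
layout and shard-naming rule (not Eric's actual scheme):

import java.time.Instant;
import java.util.concurrent.ThreadLocalRandom;

public class TimeShardedIds {
    // Hypothetical layout: upper bits carry the epoch millis, lower 22 bits are random.
    static long newId(Instant eventTime) {
        long millis = eventTime.toEpochMilli();
        long random = ThreadLocalRandom.current().nextLong(1L << 22);
        return (millis << 22) | random;
    }

    // The read path extracts the timestamp and picks the cluster/shard for it.
    static Instant timestampOf(long id) {
        return Instant.ofEpochMilli(id >>> 22);
    }

    // Example routing rule: one shard per calendar month (purely illustrative).
    static String shardFor(long id) {
        Instant ts = timestampOf(id);
        return "events-" + ts.toString().substring(0, 7);  // e.g. "events-2018-11"
    }

    public static void main(String[] args) {
        long id = newId(Instant.now());
        System.out.println(id + " -> " + shardFor(id));
    }
}

Because the timestamp is recoverable from the ID alone, the client can route 
a read to the right cluster without any lookup table.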

Our reason for engaging in the second strategy was not purely Cassandra's 
fault, rather we were using DSE with a search workload, and the cost of 
rebuilding Solr indexes on streaming operations (such as adding nodes to an 
existing cluster) required enough resources that we found it prohibitive.  
That's because the bootstrapping node was also taking a production write 
workload, and we didn't want to run our cluster with enough overhead that a 
node could bootstrap and take production workload at the same time.

For vanilla Cassandra workloads we have run clusters with quite a bit more 
nodes than 100 without any appreciable trouble.  Curious if you can share 
documents about clusters over 100 nodes causing troubles for users.  I'm 
wondering if it's related to node failure rate combined with vnodes meaning 
that several concurrent node failures cause a part of the ring to go offline 
too reliably.

On Mon, Nov 5, 2018 at 7:38 AM onmstester onmstester 
 wrote:
Hi,

One of my applications requires creating a cluster with more than 100 nodes. 
I've read documents that recommend using clusters with fewer than 50 or 100 nodes 
(Netflix has hundreds of clusters with fewer than 100 nodes each).
Is it a good idea to use multiple clusters for a single application, just to 
decrease maintenance problems and system complexity/performance?
If So, which one of below policies is more suitable to distribute data among 
clusters and Why?
1. each cluster would be responsible for a specific subset of tables only 
(table sizes are almost equal, so calculations are easy here); for example, inserts to 
table X would go to cluster Y
2. shard data at loader level by some business logic grouping of data, for 
example all rows with some column starting with X would go to cluster Y

I would appreciate you sharing your experiences working with big clusters, problems 
encountered and solutions.

Thanks in Advance


Sent using Zoho 

RE: [EXTERNAL] Re: rolling version upgrade, upgradesstables, and vulnerability window

2018-10-30 Thread Durity, Sean R
Just to pile on:

I agree. On our upgrades, I always aim to get the binary part done on all nodes 
before worrying about upgradesstables. Upgrade is one node at a time 
(precautionary). Upgradesstables depends on cluster size, data size, 
compaction throughput, etc. I usually start with running upgradesstables on 2 
nodes per DC and watch how the application performs. On larger clusters (over 
30 nodes), I usually work up to 4-5 nodes per DC running upgradesstables with 
staggered start times.

NOTE: I am rarely doing streaming operations outside of repairs. But I want to 
be able to handle a down node, etc., so I do not run in mixed version mode very 
long.


Sean Durity

From: Carl Mueller 
Sent: Tuesday, October 30, 2018 11:51 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: rolling version upgrade, upgradesstables, and 
vulnerability window

Thank you very much. I couldn't find any definitive answer on that on the list 
or stackoverflow.

It's clear that the safest for a prod cluster is rolling version upgrade of the 
binary, then the upgradesstables.

I will strongly consider cstar for the upgradesstables


On Tue, Oct 30, 2018 at 10:39 AM Alexander Dejanovski 
mailto:a...@thelastpickle.com>> wrote:
Yes, as the new version can read both the old and the new sstables format.

Restrictions only apply when the cluster is in mixed versions.

On Tue, Oct 30, 2018 at 4:37 PM Carl Mueller 
mailto:carl.muel...@smartthings.com.invalid>>
 wrote:
But the topology change restrictions are only in place while there are 
heterogeneous versions in the cluster? Having all the nodes at the upgraded version 
with "degraded" sstables does NOT preclude topology changes or node 
replacement/addition?


On Tue, Oct 30, 2018 at 10:33 AM Jeff Jirsa 
mailto:jji...@gmail.com>> wrote:
Wait for 3.11.4 to be cut

I also vote for doing all the binary bounces and upgradesstables after the 
fact, largely because normal writes/compactions are going to naturally start 
upgrading sstables anyway, and there are some hard restrictions on mixed mode 
(e.g. schema changes won’t cross version) that can be far more impactful.



--
Jeff Jirsa


> On Oct 30, 2018, at 8:21 AM, Carl Mueller 
> mailto:carl.muel...@smartthings.com>.INVALID> 
> wrote:
>
> We are about to finally embark on some version upgrades for lots of clusters, 
> 2.1.x and 2.2.x targetting eventually 3.11.x
>
> I have seen recipes that do the full binary upgrade + upgrade sstables for 1 
> node before moving forward, while I've seen a 2016 vote by Jon Haddad (a TLP 
> guy) that backs doing the binary version upgrades through the cluster on a 
> rolling basis, then doing the upgradesstables on a rolling basis.
>
> Under what cluster conditions are streaming/node replacement precluded, that 
> is, where are we vulnerable to a cloud provider dumping one of our nodes out from 
> under us or to hardware failure? We ain't apple, but we do have 30+ node datacenters and 
> 80-100 node clusters.
>
> Is the node replacement and streaming only disabled while there are 
> heterogeneous cassandra versions, or until all the sstables have been upgraded 
> in the cluster?
>
> My instincts tell me the best thing to do is to get all the cassandra nodes 
> to the same version without the upgradesstables step through the cluster, and 
> then roll through the upgradesstables as needed, and that upgradesstables is 
> a node-local concern that doesn't impact streaming or node replacement or 
> other situations since cassandra can read old version sstables and new 
> sstables would simply be the new format.

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 
user-h...@cassandra.apache.org
--
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com




RE: [EXTERNAL] Re: [E] Re: nodetool status and node maintenance

2018-10-29 Thread Durity, Sean R
I have wrapped nodetool info into my own script that strips out and interprets 
the information I care about. That script also sets a return code based on the 
health of that node (which protocols are up, etc.). Then I can monitor the 
individual health of the node – as that node sees itself. I have found these 
much more actionable than up/down alerts from a single node’s view of the whole 
cluster (like nodetool status)
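
A minimal sketch of that kind of per-node check (the field labels are assumed 
from 3.x "nodetool info" output and may differ by version; this is not Sean's 
actual script):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class NodeHealthCheck {
    public static void main(String[] args) throws Exception {
        // Run nodetool info on the local node and parse only the lines we care about.
        Process p = new ProcessBuilder("nodetool", "info").redirectErrorStream(true).start();
        boolean gossipUp = false;
        boolean nativeUp = false;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("Gossip active")) gossipUp = line.contains("true");
                if (line.startsWith("Native Transport active")) nativeUp = line.contains("true");
            }
        }
        p.waitFor();
        // A non-zero exit code lets the monitoring system alert on this node only.
        if (!(gossipUp && nativeUp)) {
            System.err.println("UNHEALTHY: gossip=" + gossipUp + " native=" + nativeUp);
            System.exit(2);
        }
        System.out.println("OK");
    }
}

The return code is what the monitoring layer consumes, so each node reports 
only on itself rather than on its view of the whole ring.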


Sean Durity

From: Saha, Sushanta K 
Sent: Monday, October 29, 2018 7:52 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: [E] Re: nodetool status and node maintenance

Thanks!

On Fri, Oct 26, 2018 at 2:39 PM Alain RODRIGUEZ 
mailto:arodr...@gmail.com>> wrote:
Hello

Any way to temporarily make the node under maintenance invisible  from 
"nodetool status" output?

I don't think so.
I would use a different approach, like for example only warning/emailing when the node 
is down for 30 seconds or a minute, depending on how long it takes for your 
nodes to restart. This way the failure is not invisible, but is ignored when only 
bouncing the nodes.

As a side note, be aware that the 'nodetool status' only give a view of the 
cluster from a specific node, that can be completely wrong as well :).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

Le ven. 26 oct. 2018 à 15:16, Saha, Sushanta K 
mailto:sushanta.s...@verizonwireless.com>> a 
écrit :
I have a script that parses "nodetool status" output and emails alerts if any 
node is down. So, when I stop Cassandra on a node for maintenance, all nodes 
start emailing alarms.

Any way to temporarily make the node under maintenance invisible from 
"nodetool status" output?

Thanks



--

Sushanta Saha|MTS IV-Cslt-Sys Engrg|WebIaaS_DB Group|HQ - VerizonWireless
O 770.797.1260  C 770.714.6555 Iaas Support Line 949-286-8810





RE: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs (Ubuntu+CentOS)

2018-10-23 Thread Durity, Sean R
Agreed. I have run clusters with both RHEL5 and RHEL6 nodes.


Sean Durity
From: Jeff Jirsa 
Sent: Sunday, October 14, 2018 12:40 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs 
(Ubuntu+CentOS)

Should be fine, just get the java and kernel versions and kernel tuning params 
as close as possible



--
Jeff Jirsa


On Oct 14, 2018, at 5:09 PM, Eyal Bar 
mailto:eyal@kenshoo.com>> wrote:
Hi all,

Did anyone install a Cassandra cluster with mixed Linux OSs where some of the 
nodes were Ubuntu 12/14/16 and some of the nodes were CentOS 7?

Will it work without issues?

Rationale: We have a 40-server cluster which was originally installed only with 
Ubuntu servers. Now we want to move to CentOS 7, but the effort of reinstalling 
the entire cluster + migrating to CentOS 7 is not simple. So we thought about 
adding new CentOS 7 nodes to the existing cluster and gradually removing the 
Ubuntu ones.

Would love to read your thoughts.

Best,

--
Eyal Bar
Big Data Ops Team Lead | Data Platform and Monitoring  | Kenshoo
Office +972 (3) 746-6500 *473
Mobile +972 (52) 458-6100
www.Kenshoo.com

This e-mail, as well as any attached document, may contain material which is 
confidential and privileged and may include trademark, copyright and other 
intellectual property rights that are proprietary to Kenshoo Ltd,  its 
subsidiaries or affiliates ("Kenshoo"). This e-mail and its attachments may be 
read, copied and used only by the addressee for the purpose(s) for which it was 
disclosed herein. If you have received it in error, please destroy the message 
and any attachment, and contact us immediately. If you are not the intended 
recipient, be aware that any review, reliance, disclosure, copying, 
distribution or use of the contents of this message without Kenshoo's express 
permission is strictly prohibited.





Re: [EXTERNAL] Re: Tracing in cassandra

2018-10-12 Thread Nitan Kainth
Try the query with partition key selection in the where clause. But a query with limit 11 
shouldn't time out. Are all nodes up? Do you see any corruption in any sstable?

Sent from my iPhone

> On Oct 12, 2018, at 11:40 AM, Abdul Patel  wrote:
> 
> Sean,
> 
> here it is :
> CREATE TABLE Keyspave.tblname (
> user_id bigint,
> session_id text,
> application_guid text,
> last_access_time timestamp,
> login_time timestamp,
> status int,
> terminated_by text,
> update_time timestamp,
> PRIMARY KEY (user_id, session_id)
> ) WITH CLUSTERING ORDER BY (session_id ASC)
> 
> also they see timeouts with limit 11 as well, so is it better to remove the 
> limit option? Or what is the best way to query such a schema?
> 
>> On Fri, Oct 12, 2018 at 11:05 AM Durity, Sean R 
>>  wrote:
>> Cross-partition = multiple partitions
>> 
>>  
>> 
>> Simple example:
>> 
>> Create table customer (
>> 
>> Customerid int,
>> 
>> Name text,
>> 
>> Lastvisit date,
>> 
>> Phone text,
>> 
>> Primary key (customerid) );
>> 
>>  
>> 
>> Query
>> 
>> Select customerid from customer limit 5000;
>> 
>>  
>> 
>> The query is asking for 5000 different partitions to be selected across the 
>> cluster. This is a very EXPENSIVE query for Cassandra, especially as the 
>> number of nodes goes up. Typically, you want to query a single partition. 
>> Read timeouts are usually caused by queries that are selecting many 
>> partitions or a very large partition. That is why a schema for the involved 
>> table could help.
>> 
>>  
>> 
>>  
>> 
>> Sean Durity
>> 
>>  
>> 
>> From: Abdul Patel  
>> Sent: Friday, October 12, 2018 10:04 AM
>> To: user@cassandra.apache.org
>> Subject: [EXTERNAL] Re: Tracing in cassandra
>> 
>>  
>> 
>> Cpuld you elaborate cross partition query?
>> 
>> On Friday, October 12, 2018, Durity, Sean R  
>> wrote:
>> 
>> I suspect you are doing a cross-partition query, which will not scale well 
>> (as you can see). What is the schema for the table involved?
>> 
>>  
>> 
>>  
>> 
>> Sean Durity
>> 
>>  
>> 
>> From: Abdul Patel  
>> Sent: Thursday, October 11, 2018 5:54 PM
>> To: a...@instaclustr.com
>> Cc: user@cassandra.apache.org
>> Subject: [EXTERNAL] Re: Tracing in cassandra
>> 
>>  
>> 
>> Query :
>> 
>> SELECT * FROM keysoace.tablenameWHERE user_id = 390797583 LIMIT 5000; 
>> 
>> -Error: ReadTimeout: Error from server: code=1200 [Coordinator node timed 
>> out waiting for replica nodes' responses] message="Operation timed out - 
>> received only 0 responses." info={'received_responses': 0, 
>> 'required_responses': 1, 'consistency': 'ONE'}
>> 
>>  
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70bd7c0-cd9e-11e8-8e99-15807bff4dfd 
>> | 
>> Parsing SELECT * FROM keysoace.tablenameWHERE user_id = 390797583 LIMIT 
>> 5000; | 10.54.145.32 |   4020 |   
>> Native-Transport-Requests-3
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70bfed0-cd9e-11e8-8e99-15807bff4dfd 
>> |
>> Preparing statement | 
>> 10.54.145.32 |   5065 |   Native-Transport-Requests-3
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70c25e0-cd9e-11e8-8e99-15807bff4dfd 
>> |
>>  Executing single-partition query on roles | 
>> 10.54.145.32 |   6171 |   ReadStage-2
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70c4cf0-cd9e-11e8-8e99-15807bff4dfd 
>> |
>>Acquiring sstable references | 
>> 10.54.145.32 |   6362 |   ReadStage-2
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70c4cf1-cd9e-11e8-8e99-15807bff4dfd 
>> |  
>> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 
>> 10.54.145.32 |   6641 |   ReadStage-2
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70c4cf2-cd9e-11e8-8e99-15807bff4dfd 
>> |
>>   Key cache hit for sstable 346 | 
>> 10.54.145.32 |   6955 |   ReadStage-2
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70c4cf3-cd9e-11e8-8e99-15807bff4dfd 
>> |
>>Bloom filter allows skipping sstable 347 | 
>> 10.54.145.32 |   7202 |   ReadStage-2
>> 
>> e70ac650-cd9e-11e8-8e99-15807bff4dfd | e70c7400-cd9e-11e8-8e99-15807bff4dfd 
>> |  

Re: [EXTERNAL] Re: Tracing in cassandra

2018-10-12 Thread Abdul Patel
Sean,

here it is :
CREATE TABLE Keyspave.tblname (
user_id bigint,
session_id text,
application_guid text,
last_access_time timestamp,
login_time timestamp,
status int,
terminated_by text,
update_time timestamp,
PRIMARY KEY (user_id, session_id)
) WITH CLUSTERING ORDER BY (session_id ASC)

also they see timeouts with limit 11 as well, so is it better to remove
the limit option? Or what is the best way to query such a schema?
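
For what it's worth, a minimal sketch of querying that table one partition at 
a time with driver-side paging instead of a large LIMIT (the keyspace/table 
name and contact point are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class SessionsByUser {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {
            long userId = 390797583L;
            // Single-partition query: restricted by the full partition key.
            Statement stmt = new SimpleStatement(
                    "SELECT session_id, status, login_time FROM ks.sessions WHERE user_id = ?",
                    userId);
            // Let the driver page through the partition instead of using LIMIT.
            stmt.setFetchSize(500);
            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {
                System.out.println(row.getString("session_id") + " " + row.getInt("status"));
            }
        }
    }
}

Paging keeps each round trip small while still returning the whole partition, 
so the coordinator never has to assemble thousands of rows in one response.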

On Fri, Oct 12, 2018 at 11:05 AM Durity, Sean R 
wrote:

> Cross-partition = multiple partitions
>
>
>
> Simple example:
>
> Create table customer (
>
> Customerid int,
>
> Name text,
>
> Lastvisit date,
>
> Phone text,
>
> Primary key (customerid) );
>
>
>
> Query
>
> Select customerid from customer limit 5000;
>
>
>
> The query is asking for 5000 different partitions to be selected across
> the cluster. This is a very EXPENSIVE query for Cassandra, especially as
> the number of nodes goes up. Typically, you want to query a single
> partition. Read timeouts are usually caused by queries that are selecting
> many partitions or a very large partition. That is why a schema for the
> involved table could help.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Abdul Patel 
> *Sent:* Friday, October 12, 2018 10:04 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Tracing in cassandra
>
>
>
> Cpuld you elaborate cross partition query?
>
> On Friday, October 12, 2018, Durity, Sean R 
> wrote:
>
> I suspect you are doing a cross-partition query, which will not scale well
> (as you can see). What is the schema for the table involved?
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Abdul Patel 
> *Sent:* Thursday, October 11, 2018 5:54 PM
> *To:* a...@instaclustr.com
> *Cc:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Tracing in cassandra
>
>
>
> Query :
>
> SELECT * FROM keysoace.tablenameWHERE user_id = 390797583 LIMIT 5000;
>
> -Error: ReadTimeout: Error from server: code=1200 [Coordinator node timed
> out waiting for replica nodes' responses] message="Operation timed out -
> received only 0 responses." info={'received_responses': 0,
> 'required_responses': 1, 'consistency': 'ONE'}
>
>
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70bd7c0-cd9e-11e8-8e99-15807bff4dfd
> |
> Parsing SELECT * FROM keysoace.tablenameWHERE user_id = 390797583 LIMIT
> 5000; | 10.54.145.32 |   4020 |
> Native-Transport-Requests-3
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70bfed0-cd9e-11e8-8e99-15807bff4dfd
> |
> Preparing statement |
> 10.54.145.32 |   5065 |
> Native-Transport-Requests-3
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c25e0-cd9e-11e8-8e99-15807bff4dfd |
>   
> Executing
> single-partition query on roles | 10.54.145.32 |   6171
> |   ReadStage-2
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c4cf0-cd9e-11e8-8e99-15807bff4dfd
> |
> Acquiring sstable references | 10.54.145.32 |   6362
> |   ReadStage-2
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c4cf1-cd9e-11e8-8e99-15807bff4dfd
> |
> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones |
> 10.54.145.32 |   6641 |
> ReadStage-2
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c4cf2-cd9e-11e8-8e99-15807bff4dfd
> |
> Key cache hit for sstable 346 | 10.54.145.32 |   6955
> |   ReadStage-2
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c4cf3-cd9e-11e8-8e99-15807bff4dfd
> |
>Bloom filter allows skipping sstable 347 | 10.54.145.32
> |   7202 |   ReadStage-2
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c7400-cd9e-11e8-8e99-15807bff4dfd
> |
>   Merged data
> from memtables and 2 sstables | 10.54.145.32 |   7386
> |   ReadStage-2
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c7401-cd9e-11e8-8e99-15807bff4dfd
> |
> Read 1 live and 0 tombstone cells | 10.54.145.32 |   7519
> |   ReadStage-2
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c7402-cd9e-11e8-8e99-15807bff4dfd
> |
> Executing single-partition query on roles | 10.54.145.32 |   7826
> |   ReadStage-4
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c7403-cd9e-11e8-8e99-15807bff4dfd
> |
> Acquiring sstable references | 10.54.145.32 |   7924
> |   ReadStage-4
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c7404-cd9e-11e8-8e99-15807bff4dfd
> |
> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones |
> 10.54.145.32 |   8060 |
> ReadStage-4
>
> e70ac650-cd9e-11e8-8e99-15807bff4dfd |
> e70c7405-cd9e-11e8-8e99-15807bff4dfd
> |
> 

Re: [EXTERNAL] Upcoming Cassandra-related Conferences

2018-10-08 Thread Peter Corless
Hey folks!

Sean: I did a blog on Distributed Data Summit.
On top of the Scylla-oriented content, I covered Nate's keynote and
highlighted the sidecar talk by Netflix (incl. YouTube video for anyone who
wanted to watch it after-the-fact). I'd be interested to read & compare any
other similar blogs on the event. (Summit-as-Rashomon, as it were.)

Max: Thanks for the shout-out on Scylla Summit. Besides Scylla-oriented
tracks, we'll also have presentations from Kong, Kafka (KSQL), KairosDB for
time-series, OpenNMS, and Red Hat talking about rebuilding Ceph on the
Seastar framework.

-Pete.

On Mon, Oct 8, 2018 at 7:29 AM Durity, Sean R 
wrote:

> Thank you. I do want to hear about future conferences. I would also love
> to hear reports/summaries/highlights from folks who went to Distributed
> Data Summit (or other conferences). I think user conferences are great!
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Max C. 
> *Sent:* Friday, October 05, 2018 8:33 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Upcoming Cassandra-related Conferences
>
>
>
> Some upcoming Cassandra-related conferences, if anyone is interested:
>
>
>
> *Scylla Summit*
>
> November 5-7, 2018
>
> Pullman San Francisco Bay Hotel, Redwood City CA
>
> https://www.scylladb.com/scylla-summit-2018/
> 
>
>
>
> (This one seems to be almost entirely Scylla focussed, maybe not terribly
> useful for non-Scylla users)
>
>
>
> *DataStax Accelerate*
>
> May 21-23, 2019
> National Harbor, Maryland
>
> https://www.datastax.com/accelerate
> 
>
>
>
> (No talks list or sponsors have been posted yet)
>
>
>
> *DISCLAIMER:*
>
> I’m not in the middle of the politics or nor do I have any affiliation
> with either of these companies.  I just thought lowly users like myself
> might appreciate the mention these on the -users list.
>
>
>
> I wish we should have had a post or two about the Distributed Data Summit;
>  I think we probably would have had an even better conference!  :-)
>
>
>
> - Max
>
>


-- 
Peter Corless
Technical Marketing Manager
pe...@scylladb.com
650-906-3134


RE: [EXTERNAL] Upcoming Cassandra-related Conferences

2018-10-08 Thread Durity, Sean R
Thank you. I do want to hear about future conferences. I would also love to 
hear reports/summaries/highlights from folks who went to Distributed Data 
Summit (or other conferences). I think user conferences are great!


Sean Durity

From: Max C. 
Sent: Friday, October 05, 2018 8:33 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Upcoming Cassandra-related Conferences

Some upcoming Cassandra-related conferences, if anyone is interested:

Scylla Summit
November 5-7, 2018
Pullman San Francisco Bay Hotel, Redwood City CA
https://www.scylladb.com/scylla-summit-2018/

(This one seems to be almost entirely Scylla focussed, maybe not terribly 
useful for non-Scylla users)

DataStax Accelerate
May 21-23, 2019
National Harbor, Maryland
https://www.datastax.com/accelerate

(No talks list or sponsors have been posted yet)

DISCLAIMER:
I’m not in the middle of the politics, nor do I have any affiliation with 
either of these companies.  I just thought lowly users like myself might 
appreciate the mention of these on the -users list.

I wish we had had a post or two about the Distributed Data Summit;  I 
think we probably would have had an even better conference!  :-)

- Max





<    1   2   3   4   >