Re: sstableloader & num_tokens change

2020-01-24 Thread Erick Ramirez
On the subject of DSBulk, sstableloader is the tool of choice for this scenario. +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!

Re: sstableloader & num_tokens change

2020-01-24 Thread Erick Ramirez
> If I may just loop this back to the question at hand: > > I'm curious if there are any gotchas with using sstableloader to restore > snapshots taken from 256-token nodes into a cluster with 32-token (or your > preferred number of tokens) nodes (otherwise same # of nodes and same RF). > No, there

Re: sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
If I may just loop this back to the question at hand: I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF). On Fri, Jan 24, 2020

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Reid Pinchback
Just a thought along those lines. If the memtable flush isn’t keeping up, you might find that manifested in the I/O queue length and dirty page stats leading into the time the OOM event took place. If you do see that, then you might need to do some I/O tuning as well. From: Jeff Jirsa Reply-

Re: sstableloader & num_tokens change

2020-01-24 Thread Sergio
https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html Just skimming through the docs I see examples by loading from CSV / JSON Maybe there is some other command or doc page that I am missing On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth wrote: > Dsbulk works same as sstable

Re: sstableloader & num_tokens change

2020-01-24 Thread Nitan Kainth
Dsbulk works same as sstableloader. Regards, Nitan Cell: 510 449 9629 > On Jan 24, 2020, at 10:40 AM, Sergio wrote: > > > I was wondering if that improvement for token allocation would work even with > just one rack. It should but I am not sure. > > Does Dsbulk support migration cluster to

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Jeff Jirsa
Ah, I focused too much on the literal meaning of startup. If it's happening JUST AFTER startup, it's probably getting flooded with hints from the other hosts when it comes online. If that's the case, it may be just simply overrunning the memtable, or it may be a deadlock like https://issues.apache

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Jeff Jirsa
6 GB of mutations on heap Startup would replay commitlog, which would re-materialize all of those mutations and put them into the memtable. The memtable would flush over time to disk, and clear the commitlog. It looks like PERHAPS the commitlog replay is faster than the memtable flush, so you're b

Re: sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
Why? Seems to me that the old Cassandra -> CSV/JSON and CSV/JSON -> new Cassandra are unnecessary steps in my case. On Fri, Jan 24, 2020 at 10:34 AM Nitan Kainth wrote: > Instead of sstableloader consider dsbulk by datastax. > > On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback < > rpinchb...@trip

Re: sstableloader & num_tokens change

2020-01-24 Thread Sergio
I was wondering if that improvement for token allocation would work even with just one rack. It should but I am not sure. Does Dsbulk support migration cluster to cluster without CSV or JSON export? Thanks and Regards On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth wrote: > Instead of sstableloader

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Michael Shuler
Some environment details like Cassandra version, amount of physical RAM, JVM configs (heap and others), and any other non-default cassandra.yaml configs would help. The amount of data, number of keyspaces & tables, since you mention "clients", would also be helpful for people to suggest tuning

Re: sstableloader & num_tokens change

2020-01-24 Thread Nitan Kainth
Instead of sstableloader consider dsbulk by datastax. On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback wrote: > Jon Haddad has previously made the case for num_tokens=4. His Accelerate > 2019 talk is available at: > > > > https://www.youtube.com/watch?v=swL7bCnolkU > > > > You might want to chec

Re: sstableloader & num_tokens change

2020-01-24 Thread Reid Pinchback
Jon Haddad has previously made the case for num_tokens=4. His Accelerate 2019 talk is available at: https://www.youtube.com/watch?v=swL7bCnolkU You might want to check that out. Also I think the amount of effort you put into evening out the token distribution increases as vnode count shrinks.
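To illustrate the point about distribution effort growing as vnode count shrinks, here is a rough simulation sketch. It assumes purely random token assignment (not the allocate_tokens_for_keyspace algorithm), looks only at primary range ownership and ignores replication, and the function names `ownership` and `mean_imbalance` are made up for this example. It is not a model of any particular cluster, just a way to see the trend.

```python
import random

# Size of the Murmur3Partitioner token range (-2^63 .. 2^63 - 1).
RING_SIZE = 2**64


def ownership(num_nodes, num_tokens, rng):
    """Fraction of the ring each node owns when tokens are picked at random.

    Hypothetical helper: primary ownership only, replication is ignored.
    """
    tokens = []
    for node in range(num_nodes):
        for _ in range(num_tokens):
            tokens.append((rng.randrange(RING_SIZE), node))
    tokens.sort()

    owned = [0] * num_nodes
    # Start from the last token shifted back one full ring so the first
    # token picks up the wrap-around range.
    prev = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - prev  # node owns the range (prev, token]
        prev = token
    return [o / RING_SIZE for o in owned]


def mean_imbalance(num_nodes, num_tokens, trials=200, seed=42):
    """Average ratio of the largest to the smallest node share over many rings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shares = ownership(num_nodes, num_tokens, rng)
        total += max(shares) / min(shares)
    return total / trials


if __name__ == "__main__":
    for vnodes in (4, 32, 256):
        ratio = mean_imbalance(4, vnodes)
        print(f"num_tokens={vnodes:>3}: mean max/min ownership ratio {ratio:.2f}")
```

Under these assumptions the ratio moves toward 1 as num_tokens grows, which is why a low num_tokens setting is usually paired with deliberate token allocation rather than random assignment.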

sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4 node RF=3 cluster. I've read that 256 is not an optimal default num_tokens value, and that 32 is likely a better option. We have the "opportunity" to switch, as we're migrating environments and will likely be using sstablel

Re: Uneven token distribution with allocate_tokens_for_keyspace

2020-01-24 Thread Léo FERLIN SUTTON
Hi Anthony! I have a follow-up question: Check to make sure that no other node in the cluster is assigned any of the > four tokens specified above. If there is another node in the cluster that > is assigned one of the above tokens, increment the conflicting token by > values of one until no oth

Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Behroz Sikander
We recently had a lot of OOM in C* and it was generally happening during startup. We took some heap dumps but still cannot pinpoint the exact reason. So, we need some help from experts. Our clients are not explicitly deleting data but they have TTL enabled. C* details: > show version [cqlsh 5.