Re: Seed nodes of DC2 creating own versions of system keyspaces

2018-03-06 Thread Jeff Jirsa
On Tue, Mar 6, 2018 at 9:50 AM, Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On 6 Mar 2018 16:55, "Jeff Jirsa" wrote: > > On Mar 6, 2018, at 12:32 AM, Oleksandr Shulgin < > oleksandr.shul...@zalando.de> wrote: > > On 5 Mar 2018 16:13, "Jeff Jirsa"

Re: Seed nodes of DC2 creating own versions of system keyspaces

2018-03-06 Thread Oleksandr Shulgin
On 6 Mar 2018 16:55, "Jeff Jirsa" wrote: On Mar 6, 2018, at 12:32 AM, Oleksandr Shulgin wrote: On 5 Mar 2018 16:13, "Jeff Jirsa" wrote: On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin wrote: We

Re: Cassandra at Instagram with Dikang Gu interview by Jeff Carpenter

2018-03-06 Thread Goutham reddy
It’s an interesting conversation. For more details about the pluggable storage engine, here are the links. Blog: https://thenewstack.io/instagram-supercharges-cassandra-pluggable-rocksdb-storage-engine/ JIRA: https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13475 On Tue, Mar

Re: system.size_estimates - safe to remove sstables?

2018-03-06 Thread Chris Lohfink
While it's off, you can delete the files in the directory, yeah. Chris > On Mar 6, 2018, at 2:35 AM, Kunal Gangakhedkar > wrote: > > Hi Chris, > > I checked for snapshots and backups - none found. > Also, we're not using opscenter, hadoop or spark or any such tool. > >

Re: DCAwareRoundRobinPolicy question

2018-03-06 Thread Justin Cameron
Hi Nicu, By default (with those lines commented), your application will only read data from the local datacenter. If you get different results when those lines are uncommented it means that the data is not consistent between your datacenters. You'll need to run a full repair to ensure that the

Re: Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K SSTables for a single small (GBytes) CF

2018-03-06 Thread kurt greaves
> > What we did have was some sort of overlapping between our daily repair > cronjob and the newly added node still in progress joining. Don’t know if > this sort of combination might be causing trouble. I wouldn't be surprised if this caused problems. Probably want to avoid that. with waiting a

Adding disk to operating C*

2018-03-06 Thread Eunsu Kim
Hello, I use 5 nodes to create a Cassandra cluster. (SSD 1TB) I'm trying to mount an additional disk (SSD 1TB) on each node because each disk's usage growth rate is higher than I expected. Then I will add the directory to data_file_directories in cassandra.yaml. Can I get advice from who

Re: Cassandra/Spark failing to process large table

2018-03-06 Thread Ben Slater
Hi Faraz, Yes, it likely does mean there is inconsistency in the replicas. However, you shouldn’t be too freaked out about it - Cassandra is designed to allow for this inconsistency to occur and the consistency levels allow you to achieve consistent results despite replicas not being consistent. To
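The snippet above is cut off, so the following is only an illustration of the general rule behind it, not a reconstruction of the rest of the message. With replication factor RF, a read at a level that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > RF. The function names below are hypothetical helpers, not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set.

    This is the R + W > RF rule: if the two sets together exceed the number
    of replicas, they cannot be disjoint.
    """
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    # Writes at ONE, reads at ONE: a read may miss the latest write.
    print(read_overlaps_write(rf, 1, 1))
    # Writes at ONE, reads at ALL: the read always touches the written replica.
    print(read_overlaps_write(rf, 1, rf))
    # Writes and reads both at QUORUM.
    print(read_overlaps_write(rf, quorum(rf), quorum(rf)))
```

Reading at a higher level costs latency, because the coordinator waits for more replicas before it answers.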

Re: One time major deletion/purge vs periodic deletion

2018-03-06 Thread Charulata Sharma (charshar)
Well, it’s not like that. We don’t just purge. There are business rules which will decide the records to be purged or archived and then purged, so we cannot rely on TTL. Thanks, Charu From: Jens Rantil Reply-To: "user@cassandra.apache.org" Date:

Re: ReplayException after shutdown and restart

2018-03-06 Thread André Schütz
Some more information about this error. Before creating the keyspace and the tables, we removed all former commitlog entries and data from the local data folder. "Testimport" is the keyspace for the tables. /cassandra/data/commitlog/* /cassandra/data/data/testimport/* After the filling of th

ReplayException after shutdown and restart

2018-03-06 Thread André Schütz
Hi, we created a keyspace that contained two tables. These tables are filled with data. For a shutdown of the machine, we performed "nodetool flush" (and also in a second try "nodetool drain"). After that, we stopped the Cassandra service with "SigTerm" We changed nothing after the system stop

cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread onmstester onmstester
Running this command: nodetool cfhistograms keyspace1 table1 throws this exception on the production server: javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Table,keyspace=keyspace1,scope=table1,name=EstimatePartitionSizeHistogram But I have no problem in a test

Re: Rocksandra blog post

2018-03-06 Thread Romain Hardouin
Rocksandra is very interesting for key/value data models. Let's hope it will land in C* upstream in the near future thanks to pluggable storage. Thanks Dikang! On Tuesday, 6 March 2018 at 10:06:16 UTC+1, Kyrylo Lebediev wrote:

Re: [External] Re: Whch version is the best version to run now?

2018-03-06 Thread Carlos Rolo
Hello, Our 5 cents: either 3.0.16 or 3.11.x. We are really happy with the way 3.11.1/2 is behaving. We still have a lot of really well-behaved clusters on the latest 2.1/2.2. Regards, Carlos Juzarte Rolo Cassandra Consultant / Datastax Certified Architect / Cassandra MVP Pythian - Love your data

Re: Rocksandra blog post

2018-03-06 Thread Carl Mueller
Basically they are avoiding gc, right? Not necessarily improving on the theoreticals of sstables and LSM trees. Why didn't they use/try scylla? I'd be interested to see that benchmark. On Tue, Mar 6, 2018 at 3:48 AM, Romain Hardouin wrote: > Rocksandra is very

Re: [External] Re: Whch version is the best version to run now?

2018-03-06 Thread Anuj Wadehra
We evaluated both 3.0.x and 3.11.x. +1 for 3.11.2 as we faced major performance issues with 3.0.x. We have NOT evaluated new features on 3.11.x. Anuj Sent from Yahoo Mail on Android On Tue, 6 Mar 2018 at 19:35, Alain RODRIGUEZ wrote: Hello Tom, It's good to hear this

Re: data types storage saving

2018-03-06 Thread Carl Mueller
If you're willing to do the data type conversion on insert and retrieval, then you could use blobs as a sort of "adaptive length int", AFAIK. On Tue, Mar 6, 2018 at 6:02 AM, onmstester onmstester wrote: > I'm using int data type for one of my columns but for 99.99...% its data
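A minimal sketch of the "adaptive length int" idea from the reply above, assuming non-negative values and a client that does the conversion itself before writing to a blob column. The function names are hypothetical, and any per-cell overhead Cassandra adds for variable-length values is not modeled here, so the real saving would need to be measured.

```python
def encode_adaptive_int(value: int) -> bytes:
    """Encode a non-negative int in the fewest big-endian bytes (at least one)."""
    if value < 0:
        raise ValueError("this sketch only supports non-negative values")
    # bit_length() is 0 for the value 0, so force a minimum of one byte.
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def decode_adaptive_int(blob: bytes) -> int:
    """Inverse of encode_adaptive_int: read the bytes back as a big-endian int."""
    return int.from_bytes(blob, "big")


if __name__ == "__main__":
    for sample in (0, 200, 65_000, 70_000):
        encoded = encode_adaptive_int(sample)
        # Show how many bytes each sample needs and that it round-trips.
        print(sample, len(encoded), decode_adaptive_int(encoded) == sample)
```

Values below 65,536 take at most two bytes here, while the occasional larger value simply takes more bytes instead of overflowing.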

Re: Seed nodes of DC2 creating own versions of system keyspaces

2018-03-06 Thread Jeff Jirsa
-- Jeff Jirsa > On Mar 6, 2018, at 12:32 AM, Oleksandr Shulgin > wrote: > > On 5 Mar 2018 16:13, "Jeff Jirsa" wrote: >> On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin >> wrote: >> We were deploying a second DC

Re: Please give input or approval of JIRA 14128 so we can continue document cleanup

2018-03-06 Thread Alain RODRIGUEZ
Hello Kenneth, I believe this belongs to the dev list. Please be mindful about this, I think it matters for you as you will get faster answers and for us as we will be reading what we expect to find in each place, and save time. We all know here, as Cassandra users, how much it is important to

Re: cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread Chris Lohfink
Make sure you're using the same version of nodetool as your version of Cassandra. That metric was renamed from EstimatedRowSize, so if you are using a version of nodetool made for a more recent version you would get this error, since EstimatePartitionSizeHistogram doesn’t exist on the older Cassandra host.

Re: [External] Re: Whch version is the best version to run now?

2018-03-06 Thread Alain RODRIGUEZ
Hello Tom, It's good to hear this kind of feedback. Thanks for sharing. 3.11.x seems to get more love from the community wrt patches. This is why > I'd recommend 3.11.x for new projects. > I also agree with this analysis. Stay away from any of the 2.x series, they're going EOL soonish and

Re: cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread onmstester onmstester
On my PC I have exactly the same version of Cassandra and the histograms command works perfectly, so I'm sure that nothing is wrong with the nodetool version. Sent using Zoho Mail On Tue, 06 Mar 2018 17:38:04 +0330 Chris Lohfink clohf...@apple.com wrote Make sure you're using the same

Re: cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread Nicolas Guyomar
I once had this kind of problem and it was exactly what Chris explained. Could you double check that on this remote host you do not have 2 versions of Cassandra, with nodetool pointing to the old one? On 6 March 2018 at 17:17, onmstester onmstester wrote: > On my PC I've

Re: Rocksandra blog post

2018-03-06 Thread Dikang Gu
Thanks everyone! @Kyrylo, In Rocksandra world, the storage engine is built on top of RocksDB, which is another LSM tree based storage engine. So the immutable sstables and compactions are managed by RocksDB instances. RocksDB supports different compaction strategies, similar to STCS and LCS. The

Cassandra at Instagram with Dikang Gu interview by Jeff Carpenter

2018-03-06 Thread Kenneth Brotman
Just released on the DataStax Distributed Data Show: Dikang Gu of Instagram, interviewed by author Jeff Carpenter. Found it really interesting: shadow clustering, migrating from 2.2 to 3.0, and using RocksDB as a pluggable storage engine for Cassandra

Re: Seed nodes of DC2 creating own versions of system keyspaces

2018-03-06 Thread Oleksandr Shulgin
On 5 Mar 2018 16:13, "Jeff Jirsa" wrote: On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin wrote: We were deploying a second DC today with 3 seed nodes (30 nodes in total) and we have noticed that all seed nodes reported the following: INFO

Re: Current active queries and status/type

2018-03-06 Thread Jens Rantil
You can do sampling of tracing on a table to avoid some of the overhead. On Fri, Mar 2, 2018, 00:23 D. Salvatore wrote: > Hi Nicolas, > Thank you very much for the response. > I am looking into something with a smaller time frame than a minute. > Tracing is a good way to

Re: system.size_estimates - safe to remove sstables?

2018-03-06 Thread Kunal Gangakhedkar
Hi Chris, I checked for snapshots and backups - none found. Also, we're not using opscenter, hadoop or spark or any such tool. So, do you think we can just remove the cf and restart the service? Thanks, Kunal On 5 March 2018 at 21:52, Chris Lohfink wrote: > Any chance

RE: Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K SSTables for a single small (GBytes) CF

2018-03-06 Thread Steinmaurer, Thomas
Hi Kurt, our provisioning layer allows extending a cluster only one-by-one, thus we didn’t add multiple nodes at the same time. What we did have was some sort of overlapping between our daily repair cronjob and the newly added node still in progress joining. Don’t know if this sort of

Re: One time major deletion/purge vs periodic deletion

2018-03-06 Thread Jens Rantil
Sounds like you are using Cassandra as a queue. It's an anti-pattern. What I would do is rely on TTL for removal of data and use the TWCS compaction strategy to handle removal, so that you just focus on insertion. On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar)

data types storage saving

2018-03-06 Thread onmstester onmstester
I'm using the int data type for one of my columns, but for 99.99...% of rows its value would never reach 65K. Should I change it to smallint (it would save some gigabytes of disk in a few months), or would Cassandra compression take care of it in storage? What about the blob data type? Isn't it better to use it in

DCAwareRoundRobinPolicy question

2018-03-06 Thread Marasoiu, Nicu
Hi, We normally use a single data center, but when switching from one to another we will temporarily have two. The application works with the local DC part of the cluster. Normally the configuration of the driver would contain (the commented lines absent):

Re: Rocksandra blog post

2018-03-06 Thread Kyrylo Lebediev
Thanks for sharing, Dikang! Impressive results. As you plugged in a different storage engine, I'm curious how you're dealing with compactions in Rocksandra. Is there still the concept of immutable SSTables + compaction strategies, or was it changed somehow? Best, Kyrill

Re: Cassandra/Spark failing to process large table

2018-03-06 Thread Faraz Mateen
Thanks a lot for the response. Setting consistency to ALL/TWO started giving me consistent count results on both cqlsh and spark. As expected, my query time has increased by 1.5x (before, it was taking ~1.6 hours, but with consistency level ALL the same query is taking ~2.4 hours to complete).