cassandra4 roadmap and datastax
Hi,
I came across the following article: https://hub.packtpub.com/the-road-to-cassandra-4-0-what-does-the-future-have-in-store/
It raised several questions:
1. Are DataStax and the Cassandra team working together again? (I was surprised to see DataStax hosting a Cassandra 4 conference, given that these are now two separate projects and, AFAIK, the projects won't be compatible anymore.)
2. Does Cassandra 4 bring any new features for developers? (All the features in the article were DBA- and Ops-related.)
3. Is there any official roadmap for Cassandra 4 with release date estimates?
Regards,
Vitaliy
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
Re: commitlog content
Thank you for the excellent response, Alain!

On Thu, Aug 30, 2018 at 5:25 PM Alain RODRIGUEZ wrote:
> Hello Vitaly.
>
> This sounds weird to me (unless we are speaking about a small size: MB, a few GB maybe). The commit log size is limited by default (see below), and the data should grow bigger in most cases.
>
> According to the documentation (http://cassandra.apache.org/doc/latest/architecture/storage_engine.html#commitlog):
>
>> commitlog_total_space_in_mb: Total space to use for commit logs on disk.
>> If space gets above this value, Cassandra will flush every dirty CF in the oldest segment and remove it. So a small total commitlog space will tend to cause more flush activity on less-active columnfamilies.
>> The default value is the smaller of 8192, and 1/4 of the total space of the commitlog volume.
>> Default Value: 8192
>
> The commit log is supposed to be cleaned on flush, so there are several ways to reduce the disk space used by commit logs:
> - Decrease the value of 'commitlog_total_space_in_mb' (probably the best option: you say what you want, and you get it)
> - Use the table option 'memtable_flush_period_in_ms' (default is 0; pick what you would like here. It has to be set on every table you want it to apply to)
> - Manually run 'nodetool flush', which should also clean the commit logs
> - Reduce the size of the memtables
> - Limit the maximum size per table before a flush is triggered with 'memtable_cleanup_threshold'. According to the doc it's not a good idea, though (http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#memtable-cleanup-threshold).
>
> Also, the data in Cassandra is compacted and compressed. Over a short test period, or if the data is small compared to the memory available and fits mostly in memory, I can imagine that what you describe can happen.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, Aug 28, 2018 at 18:24, Vitaliy Semochkin wrote:
>> Hello,
>>
>> I've noticed that after a stress test that does only inserts, the commitlog content exceeds the data dir 20 times.
>> What can be the cause of such behavior?
>>
>> Running nodetool compact did not change anything.
>>
>> Regards,
>> Vitaliy
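The default cap quoted from the documentation above is a simple formula, and it helps explain how a commit log can dwarf a small data directory. A minimal sketch, assuming sizes in MB and integer division; the function names and the example volume and SSTable sizes are hypothetical, not a Cassandra API or measured values:

```python
def default_commitlog_total_space_mb(commitlog_volume_mb: int) -> int:
    """Documented default for commitlog_total_space_in_mb:
    the smaller of 8192 MB and 1/4 of the commitlog volume's total space."""
    return min(8192, commitlog_volume_mb // 4)


def commitlog_to_data_ratio(commitlog_mb: float, data_mb: float) -> float:
    """How many times larger the commit log directory is than the data directory."""
    return commitlog_mb / data_mb


# Hypothetical numbers: a 20 GB commitlog volume and 256 MB of compacted, compressed SSTables.
cap = default_commitlog_total_space_mb(20 * 1024)
print(cap)                                # 5120
print(commitlog_to_data_ratio(cap, 256))  # 20.0
```

With those assumed numbers, a commit log sitting at its cap is 20 times the data directory, the same order as the stress-test observation. On a commitlog volume of 32 GB or more the cap stays at 8192 MB.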
Re: SSTable Compression Ratio -1.0
Thank you, ZAIDI. Can you please explain why the mentioned ratio is negative?

On Tue, Aug 28, 2018 at 8:18 PM ZAIDI, ASAD A wrote:
> Compression ratio is the ratio of the compressed size to the original size; smaller is better. See it as compressed/uncompressed: 1 would mean no change in size after compression!
>
> -Original Message-
> From: Vitaliy Semochkin [mailto:vitaliy...@gmail.com]
> Sent: Tuesday, August 28, 2018 12:03 PM
> To: user@cassandra.apache.org
> Subject: SSTable Compression Ratio -1.0
>
> Hello,
>
> nodetool tablestats my_kespace
> returns SSTable Compression Ratio -1.0
>
> Can someone explain what -1.0 means?
>
> Regards,
> Vitaliy
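ZAIDI's definition (compressed/uncompressed, lower is better) can be sketched as below. The SSTable sizes are hypothetical inputs, and the -1.0 branch is an assumption of this sketch: a ratio of two sizes cannot be negative, so -1.0 is best read as a "nothing to report" marker, for instance when the table has no SSTables on disk yet (data still only in memtables) or compression is disabled. Verify against your Cassandra version before relying on it.

```python
from typing import Iterable, Tuple

# Assumed "no value" marker, mirroring the -1.0 shown by nodetool tablestats.
NO_COMPRESSION_RATIO = -1.0


def compression_ratio(sstables: Iterable[Tuple[int, int]]) -> float:
    """Return compressed/uncompressed over all (compressed_bytes, uncompressed_bytes) pairs.

    Lower is better; 1.0 means compression did not shrink the data.
    With nothing to measure, return the sentinel instead of dividing by zero.
    """
    compressed_total = 0
    uncompressed_total = 0
    for compressed, uncompressed in sstables:
        compressed_total += compressed
        uncompressed_total += uncompressed
    if uncompressed_total == 0:
        return NO_COMPRESSION_RATIO
    return compressed_total / uncompressed_total


print(compression_ratio([(30, 100), (20, 100)]))  # 0.25
print(compression_ratio([]))                       # -1.0
```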
commitlog content
Hello,
I've noticed that after a stress test that does only inserts, the commitlog content exceeds the data dir 20 times. What can be the cause of such behavior?
Running nodetool compact did not change anything.
Regards,
Vitaliy
SSTable Compression Ratio -1.0
Hello,
nodetool tablestats my_kespace returns SSTable Compression Ratio -1.0.
Can someone explain what -1.0 means?
Regards,
Vitaliy
Re: data not deleted in data dir after keyspace dropped
Thank you very much, Pranay, that was exactly what I needed!

On Sat, Aug 25, 2018 at 12:17 AM Pranay akula wrote:
> Cassandra creates a snapshot when you drop a keyspace, so you should run nodetool clearsnapshot on all nodes to reclaim your space.
>
> On Fri, Aug 24, 2018, 4:14 PM Vineet G H wrote:
>> It takes a while in a cluster for a drop to propagate; this depends on the amount of data and the network traffic between your storage nodes.
>> On Fri, Aug 24, 2018 at 1:54 PM Vitaliy Semochkin wrote:
>> > Hi,
>> > I'm using cassandra 3.11
>> > When I drop a keyspace, its data is not deleted from the data dirs in a cluster.
>> > What additional steps are needed to make the cluster nodes delete the dropped data from the disk?
>> >
>> > Regards,
>> > Vitaliy
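Pranay's answer relies on auto_snapshot (enabled by default in cassandra.yaml), which takes a snapshot before a keyspace or table is dropped or truncated. The sketch below only reports how much space such snapshots hold on one node, assuming the usual 3.x layout `<data_file_directory>/<keyspace>/<table>-<id>/snapshots/<snapshot_name>/`; the function name is hypothetical, and the files should still be removed with nodetool clearsnapshot rather than by hand.

```python
import os
from typing import Dict


def snapshot_sizes(data_dir: str) -> Dict[str, int]:
    """Map each <keyspace>/<table>/snapshots/<name> directory under data_dir to its size in bytes."""
    sizes: Dict[str, int] = {}
    for keyspace in sorted(os.listdir(data_dir)):
        keyspace_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(keyspace_path):
            continue
        for table in sorted(os.listdir(keyspace_path)):
            snapshots_root = os.path.join(keyspace_path, table, "snapshots")
            if not os.path.isdir(snapshots_root):
                continue  # table directory without any snapshot
            for name in sorted(os.listdir(snapshots_root)):
                snapshot_path = os.path.join(snapshots_root, name)
                total = 0
                for root, _dirs, files in os.walk(snapshot_path):
                    total += sum(os.path.getsize(os.path.join(root, f)) for f in files)
                sizes[snapshot_path] = total
    return sizes
```

Calling it with your data_file_directories entry (for example /var/lib/cassandra/data, assumed here as the package default) shows whether the space left after DROP KEYSPACE is held by snapshots.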
Re: benefits of HBase over Cassandra
Thank you very much for the fast reply, Dinesh!
I was under the impression that with tunable consistency Cassandra can act as CP (in case it is needed), e.g. by setting ALL on both reads and writes. Do you agree with this statement?
PS Are there any other benefits of HBase you have found? I'd be glad to hear a list of use cases.

On Sat, Aug 25, 2018 at 12:44 AM dinesh.jo...@yahoo.com.INVALID wrote:
> I've worked with both databases. They're suitable for different use cases. If you look at the CAP theorem, HBase is CP while Cassandra is AP. If we talk about a specific use case, it'll be easier to discuss.
>
> Dinesh
>
> On Friday, August 24, 2018, 1:56:31 PM PDT, Vitaliy Semochkin wrote:
>
> Hi,
>
> I read that Facebook once chose HBase over Cassandra for its messenger, but I never found what the benefits of HBase over Cassandra are. Can someone list them, if there are any?
>
> Regards,
> Vitaliy
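The usual way to reason about the ALL/ALL example is the replica-overlap rule R + W > RF: if the replicas that acknowledged a write and the replicas consulted by a read must share at least one node, the read can see the latest acknowledged write. A minimal sketch for a single datacenter (LOCAL_* and EACH_* levels are left out); it does not model failed or partial writes, hints, or lightweight transactions, so it only checks overlap rather than settling the CAP question. The price of ALL is that one unavailable replica makes the operation fail.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a single-datacenter consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = replication_factor // 2 + 1
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {level}")
    if needed > replication_factor:
        raise ValueError(f"{level} needs more replicas than RF={replication_factor}")
    return needed


def reads_overlap_writes(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set must intersect every successful write replica set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


print(reads_overlap_writes("ALL", "ALL", 3))        # True
print(reads_overlap_writes("QUORUM", "QUORUM", 3))  # True
print(reads_overlap_writes("ONE", "ONE", 3))        # False
```

QUORUM on both sides gives the same overlap as ALL/ALL while tolerating one replica down at RF=3.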
Re: data not deleted in data dir after keyspace dropped
Thank you very much for the fast reply, Vineet!
Is there any way to speed up this process or manually trigger something analogous to VACUUM FULL in PostgreSQL?

On Sat, Aug 25, 2018 at 12:14 AM Vineet G H wrote:
> It takes a while in a cluster for a drop to propagate; this depends on the amount of data and the network traffic between your storage nodes.
> On Fri, Aug 24, 2018 at 1:54 PM Vitaliy Semochkin wrote:
> > Hi,
> > I'm using cassandra 3.11
> > When I drop a keyspace, its data is not deleted from the data dirs in a cluster.
> > What additional steps are needed to make the cluster nodes delete the dropped data from the disk?
> >
> > Regards,
> > Vitaliy
Re: cqlsh --request-timeout=3600 doesn't seem to work
Thank you for the fast reply, Pranay!
This is a testing environment, and using count on it will do no harm.

On Sat, Aug 25, 2018 at 12:11 AM Pranay akula wrote:
> You should change read_request_timeout_in_ms in the cassandra.yaml file.
> The default is 5 sec.
> But it is not recommended to do count in Cassandra; better if you can avoid it.
>
> On Fri, Aug 24, 2018, 4:06 PM Vitaliy Semochkin wrote:
>> Hi,
>>
>> I'm running a count query for a very small table (fewer than 1,000,000 records).
>> When the number of records reaches 800,000, I receive a read timeout error in cqlsh.
>> I tried to run cqlsh with the option --request-timeout=3600, but I receive the same error.
>> What should I do in order not to receive a timeout exception?
>>
>> Regards,
>> Vitaliy
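cqlsh's --request-timeout only changes how long the client waits; the coordinator still enforces the server-side timeouts from cassandra.yaml (read_request_timeout_in_ms per Pranay; for a scan without a partition key, range_request_timeout_in_ms is likely the one that applies). A common workaround for a one-off count on a test cluster is to split the scan by token range so each query stays short. A minimal sketch that only generates the CQL, assuming the default Murmur3Partitioner (tokens from -2^63 to 2^63-1); the table name my_keyspace.my_table and partition key id are hypothetical.

```python
from typing import List, Tuple

MIN_TOKEN = -(2 ** 63)  # Murmur3Partitioner ring bounds (assumed partitioner)
MAX_TOKEN = 2 ** 63 - 1


def token_ranges(splits: int) -> List[Tuple[int, int]]:
    """Split the ring into `splits` contiguous (start, end] ranges covering it end to end.

    The first range starts strictly above MIN_TOKEN; this assumes, as with
    Murmur3Partitioner, that no partition key is assigned MIN_TOKEN itself.
    """
    if splits < 1:
        raise ValueError("splits must be >= 1")
    width = (MAX_TOKEN - MIN_TOKEN) // splits
    ranges = []
    start = MIN_TOKEN
    for i in range(splits):
        end = MAX_TOKEN if i == splits - 1 else start + width
        ranges.append((start, end))
        start = end
    return ranges


def count_queries(table: str, partition_key: str, splits: int) -> List[str]:
    """One count(*) statement per token range; sum the results client-side."""
    return [
        f"SELECT count(*) FROM {table} "
        f"WHERE token({partition_key}) > {start} AND token({partition_key}) <= {end};"
        for start, end in token_ranges(splits)
    ]


for query in count_queries("my_keyspace.my_table", "id", 4):
    print(query)
```

Each statement can be run in cqlsh or through a driver and the counts added up; more splits mean smaller scans per query.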
why returned achievedConsistencyLevel is null
Hi,
While using the DataStax driver, session.execute("some insert query").getExecutionInfo().getAchievedConsistencyLevel() is always returned as null, even though the data is stored. Why could that be?
Is it possible to make the DataStax driver throw an exception in case the desired consistency level was not achieved during the insert?
Regards,
Vitaliy
cqlsh --request-timeout=3600 doesn't seem to work
Hi,
I'm running a count query for a very small table (fewer than 1,000,000 records). When the number of records reaches 800,000, I receive a read timeout error in cqlsh.
I tried to run cqlsh with the option --request-timeout=3600, but I receive the same error. What should I do in order not to receive a timeout exception?
Regards,
Vitaliy
speeding up cassandra-unit startup
Hi,
I'm using cassandra-unit for integration tests, which uses a regular cassandra.yaml to create a Cassandra instance. Which parameters are recommended to change in order to speed up the startup process?
Regards,
Vitaliy
benefits of HBase over Cassandra
Hi,
I read that Facebook once chose HBase over Cassandra for its messenger, but I never found what the benefits of HBase over Cassandra are. Can someone list them, if there are any?
Regards,
Vitaliy
data not deleted in data dir after keyspace dropped
Hi,
I'm using cassandra 3.11.
When I drop a keyspace, its data is not deleted from the data dirs in a cluster. What additional steps are needed to make the cluster nodes delete the dropped data from the disk?
Regards,
Vitaliy
Re: which driver to use with cassandra 3
Thank you very much, DuyHai Doan!
I have relatively simple demands, and since Spring uses the DataStax driver I can always get back to it, though I would prefer to use Spring in order to do bootstrapping and resource management for me.

On Fri, Jul 20, 2018 at 4:51 PM DuyHai Doan wrote:
> Spring Data Cassandra is so-so... It has fewer features (at least at the time I looked at it) than the default Java driver.
>
> For the driver, right now most people are using DataStax's.
>
> On Fri, Jul 20, 2018 at 3:36 PM, Vitaliy Semochkin wrote:
>> Hi,
>>
>> Which driver should I use with Cassandra 3: the one provided by DataStax, Netflix, or something else?
>>
>> Spring uses the driver from DataStax, but is it a reliable solution for a long-term project, given that DataStax and Cassandra parted?
>>
>> Regards,
>> Vitaliy
Re: how to make cassandra listen not on 127.0.0.1 on 9042
Thank you very much for the fast response, Riccardo!
Setting rpc_interface to eth0 did the trick, and now Cassandra listens on eth0 as specified.
PS I wonder, is it a bug or a feature? E.g. I don't want to expose Thrift RPC because all clients work via CQL; why should I keep it exposed in order to expose the CQL port 9042?

On Fri, Jul 20, 2018 at 4:25 PM Riccardo Ferrari wrote:
> Hi,
>
> Have a look at the rpc_address description: http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html?highlight=rpc_address#rpc-address
> What does your hostname resolve to?
>
> Best,
>
> On Fri, Jul 20, 2018 at 3:09 PM, Vitaliy Semochkin wrote:
>> Hi
>>
>> I'm building a Cassandra cluster on RHEL 7 using the Cassandra 3.11.2 rpm.
>>
>> I want Cassandra to listen on 9042 on eth0; however, no matter what I do, it listens on 127.0.0.1.
>> I tried to specify listen_address instead, but that doesn't work either.
>>
>> What am I missing?
>> How do I make Cassandra listen on 9042 on an address other than 127.0.0.1?
>>
>> Regards,
>> Vitaliy
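The behaviour in this thread follows from which settings the native (CQL) transport reads in Cassandra 3.x: it binds to rpc_address or rpc_interface, the same pair Thrift uses, while listen_address/listen_interface only cover node-to-node traffic. Thrift itself is switched on or off separately (start_rpc in 3.x), so sharing the address setting should not require running the Thrift server. A minimal model, assuming a hypothetical interface_addresses map in place of a real OS lookup; the error text mirrors Cassandra's startup check as we recall it.

```python
from typing import Dict, Optional


def native_transport_address(config: Dict[str, Optional[str]],
                             interface_addresses: Dict[str, str]) -> Optional[str]:
    """Which cassandra.yaml setting decides the CQL (port 9042) bind address.

    listen_address / listen_interface are deliberately ignored: they are for
    internode traffic, not for clients.
    """
    rpc_address = config.get("rpc_address")
    rpc_interface = config.get("rpc_interface")
    if rpc_address and rpc_interface:
        raise ValueError("Set rpc_address OR rpc_interface, not both")
    if rpc_interface:
        return interface_addresses[rpc_interface]  # address taken from the named interface
    # May be None: Cassandra then uses whatever the local hostname resolves to.
    return rpc_address
```

With the stock `rpc_address: localhost`, setting listen_address alone leaves port 9042 on loopback, which matches the original report; switching to rpc_interface: eth0 (or an explicit rpc_address) moves it.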
which driver to use with cassandra 3
Hi,
Which driver should I use with Cassandra 3: the one provided by DataStax, Netflix, or something else?
Spring uses the driver from DataStax, but is it a reliable solution for a long-term project, given that DataStax and Cassandra parted?
Regards,
Vitaliy
how to make cassandra listen not on 127.0.0.1 on 9042
Hi
I'm building a Cassandra cluster on RHEL 7 using the Cassandra 3.11.2 rpm.
I want Cassandra to listen on 9042 on eth0; however, no matter what I do, it listens on 127.0.0.1. I tried to specify listen_address instead, but that doesn't work either.
What am I missing? How do I make Cassandra listen on 9042 on an address other than 127.0.0.1?
Regards,
Vitaliy
Re: apache cassandra development process and future
Jeff and Rahul, thank you very much for the clarification.
My main concern was that since DataStax left the Cassandra project, it is unclear whether development will slow down significantly; even now the documentation site seems abandoned. Still, players like Netflix, Apple and Microsoft look promising.

On Wed, Jul 18, 2018 at 6:49 PM Rahul Singh wrote:
> YugaByte!!! <- another Cassandra "compliant" DB; not sure if they forked C* or wrote Cassandra in Go. ;)
> https://github.com/YugaByte/yugabyte-db
>
> DataStax is Cassandra compliant, can use the same sstables at least until 6.0 (which uses a patched version of "4.0" that is 2-5x faster), and has the same actual tools that are in the OS version.
>
> Here are some signals from the big players that understand its power and need:
>
> 1. Azure CosmosDB has a C*-compliant API; it seems like managed C* under the hood. They used Elasticsearch to run their Azure Search…
> 2. Oracle now has a DataStax offering.
> 3. Mesosphere offers supported versions of Cassandra and DataStax.
> 4. Kubernetes and related purveyors use Cassandra as a prime example as part of a Kubernetes-backed, cloud-agnostic orchestration framework.
> 5. What Alain mentioned earlier.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
> On Jul 18, 2018, 9:35 AM -0400, Alain RODRIGUEZ wrote:
>
> Hello,
>
> It's a complex topic that has already been extensively discussed (at least the part about DataStax). I am sharing my personal understanding, mostly from what I read in the mailing list:
>
>> Recently the Cassandra ecosystem has become very fragmented
>
> I would not put ScyllaDB in the same 'ecosystem' as Apache Cassandra. I believe it is inspired by Cassandra and claims to be compatible with it up to a certain point, but it's not the same software, thus not the same users and community.
>
> About DataStax, I think they will give you a better idea of their position themselves, here or through their support. I believe they have also communicated about it already. But in any case, I see DataStax as more in the same 'ecosystem' than ScyllaDB. DataStax uses a patched/forked version of Cassandra (plus some other tools integrated with Cassandra, and support). Plus it goes both ways: DataStax greatly contributed to making Cassandra what it is now and relies on it (or used to, at least). I don't think that's the case for ScyllaDB; I don't see much more interest in connections/exchanges with ScyllaDB than in exchanging about DynamoDB, for example. We can make standards, compatible features, compare performance, etc., but it's not the same code base.
>
>> Since DataStax used to be the major participant in Cassandra
>> development and now it looks like it is going its own way, what is
>> going to happen with Apache Cassandra?
>
> Well, this is a fair point that was discussed in the past, but to make it short, Apache Cassandra is not dead or anything close. There is a lot of activity. Some people are stepping out, others stepping in, and other companies and individuals are actively contributing to Cassandra. Version 4.0 of Cassandra is being actively worked on at the moment. If these topics are of interest, you might want to join the "Cassandra dev" mailing list (http://cassandra.apache.org/community/).
>
>> Are there any other active participants in development?
>
> Yes, directly or by open-sourcing internal tools, quite a few companies have contributed and continue to contribute to the Apache Cassandra ecosystem. I invite you to have a look directly at the dev mailing list and check people's emails, profiles or companies. Check the Jira as well :). I am not into doing this kind of stuff that much myself and am not following it closely, but I can name for sure Apple, Netflix, The Last Pickle (my company), Instaclustr I believe as well, and many others that I am sorry not to name here.
>
> Some people have been working on Apache Cassandra for years and are around to help regularly; they changed companies but are still working on Cassandra, or even changed companies to work more with Apache Cassandra in some cases.
>
>> I'm also interested in which distribution is the most popular
>> in production at the moment.
>
> I would say you should now start with C* 3.0.last or C* 3.11.last. It seems to be the general consensus in the mailing list lately.
> For ScyllaDB and DataStax I don't know about the version to use. You should ask them directly.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
apache cassandra development process and future
Hi,
Recently the Cassandra ecosystem has become very fragmented:
ScyllaDB provides a solution based on the Cassandra wire protocol, claiming it is 10 times faster than Cassandra.
DataStax provides its own solution called DSE, claiming it is twice as fast as Cassandra. Also, their site says "DataStax no longer supports the DataStax Community version of Apache Cassandra™ or the DataStax Distribution of Apache Cassandra™." Is their new software incompatible with Cassandra?
Since DataStax used to be the major participant in Cassandra development and now it looks like it is going its own way, what is going to happen with Apache Cassandra? Are there any other active participants in development?
I'm also interested in which distribution is the most popular in production at the moment.
Best Regards,
Vitaliy
Re: cassandra cluster sizing
Jeff, thank you very much for the reply. I will try to use 4 TB per instance.
If I understand it correctly, leveled compaction can also require 50%: https://docs.datastax.com/en/dse-planning/doc/planning/planningHardware.html
Regarding the question of running multiple instances per server: am I correct that with 3.11 instances, and several disks dedicated to each instance, running multiple instances per server is OK?

On Thu, Jul 12, 2018 at 5:47 PM Jeff Jirsa wrote:
> You can certainly go higher than a terabyte; 4 or so is common. I've heard of people going up to 12 TB with the awareness that time to replace scales with size on disk, so a very large host will take longer to rebuild than a small host.
>
> The 50% free guidance only applies to size-tiered compaction, and given your throughput you may prefer leveled compaction anyway. With leveled you should target 30% free for compaction and repair.
>
> You don't need more than one Cassandra instance per host for 4 TB, but you may want to consider it for more than that. Multiple instances are especially useful if you have multiple (lots of) disks and are running Cassandra before CASSANDRA-6696 (which made JBOD safer).
>
> --
> Jeff Jirsa
>
>> On Jul 12, 2018, at 7:37 AM, Vitaliy Semochkin wrote:
>>
>> Hi,
>>
>> What is the maximum amount of data a Cassandra 3 server in a cluster can serve?
>> The documentation says it is only 1 TB.
>> If the load is not high (only about 100 requests per second with 1 KB of data each), is it safe to go above 1 TB (let's say 5 TB per server)?
>> What would be the safe maximum disk size a server in such a cluster can serve?
>>
>> The documentation also says that compaction requires 50% of the disk space to be free. If I don't have update operations (only inserts), do I need that much extra space for compaction?
>>
>> In articles (outside the DataStax docs) I read that it is common practice to launch more than one Cassandra server on one physical server in order to be able to use more than 1 TB of hard drive per server. Is it recommended?
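Jeff's free-space guidance turns into a simple per-node budget. A minimal sketch, assuming the 50% (size-tiered) and 30% (leveled) free-space targets from his reply and ignoring other overheads such as snapshots and commit logs; the keys are Cassandra's compaction class names, while the function is hypothetical. If you follow the DataStax planning page instead, only the table values change.

```python
# Free-space targets from Jeff's reply (fraction of the disk kept free).
FREE_SPACE_TARGET = {
    "SizeTieredCompactionStrategy": 0.50,
    "LeveledCompactionStrategy": 0.30,
}


def max_live_data_tb(disk_tb: float, strategy: str) -> float:
    """Data a node can hold while keeping the recommended headroom for compaction and repair."""
    return disk_tb * (1.0 - FREE_SPACE_TARGET[strategy])


for strategy in FREE_SPACE_TARGET:
    print(f"{strategy}: {max_live_data_tb(4, strategy):.1f} TB of data on a 4 TB disk")
# SizeTieredCompactionStrategy: 2.0 TB of data on a 4 TB disk
# LeveledCompactionStrategy: 2.8 TB of data on a 4 TB disk
```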
cassandra cluster sizing
Hi,
What is the maximum amount of data a Cassandra 3 server in a cluster can serve? The documentation says it is only 1 TB.
If the load is not high (only about 100 requests per second with 1 KB of data each), is it safe to go above 1 TB (let's say 5 TB per server)? What would be the safe maximum disk size a server in such a cluster can serve?
The documentation also says that compaction requires 50% of the disk space to be free. If I don't have update operations (only inserts), do I need that much extra space for compaction?
In articles (outside the DataStax docs) I read that it is common practice to launch more than one Cassandra server on one physical server in order to be able to use more than 1 TB of hard drive per server. Is it recommended?
Regards,
Vitaliy
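A back-of-the-envelope check of the load in the question: 100 inserts per second of 1 KB is roughly 8.8 GB of raw payload per day across the cluster. A minimal sketch, assuming 1 KB = 1024 bytes, that every request is an insert (the question says insert-only), and ignoring compression, per-row overhead and compaction headroom; the replication factor and node count are hypothetical parameters.

```python
SECONDS_PER_DAY = 86_400


def per_node_bytes_per_day(requests_per_sec: float, bytes_per_request: int,
                           replication_factor: int, nodes: int) -> float:
    """Raw payload each node receives per day, assuming every request is an insert
    and data spreads evenly across nodes."""
    cluster_bytes = requests_per_sec * bytes_per_request * SECONDS_PER_DAY
    return cluster_bytes * replication_factor / nodes


def days_until_full(usable_bytes: float, bytes_per_day: float) -> float:
    return usable_bytes / bytes_per_day


# Figures from the question (100 req/s, 1 KB each); RF=3 on 3 nodes is hypothetical.
daily = per_node_bytes_per_day(100, 1024, replication_factor=3, nodes=3)
print(f"{daily / 1e9:.2f} GB per node per day")                 # 8.85 GB per node per day
print(f"{days_until_full(5e12, daily):.0f} days to fill 5 TB")  # 565 days to fill 5 TB
```

Each node stores RF/nodes of the cluster-wide stream, so with RF equal to the node count every node takes the full ~8.8 GB/day and would reach 5 TB in about a year and a half before compression. Adding nodes at the same RF shrinks that per-node share proportionally.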