Re: Looking for pointers about replication internal working

2021-09-02 Thread DuyHai Doan
As far as I remember, Apache Cassandra wanted to be self-sufficient and avoid pulling in yet-another-piece-of-external-software for its internal work. With lightweight transactions since 3.0, it has a sufficient primitive for some scenarios that require linearizability. My 2 cents, Duy Hai DOAN

Re: What does the community think of the DataStax 4.x Java driver changes?

2020-10-29 Thread DuyHai Doan
Just my 2 cents: because of the tremendous breaking changes in terms of API as well as public-facing classes (QueryBuilder, for example), I have stopped the development of the Achilles framework. Migrating to the 4.x version would require an almost complete rewrite of the framework, an effort which I

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread DuyHai Doan
Out of curiosity, does DynamoDB autoscaling allow you to exceed the partition limits (e.g. push more data than is allowed for some outlier heavy partitions)? If yes, it can be interesting (I guess DynamoDB is doing some kind of rebalancing behind the scenes). If no, it's just an artificial

Re: TTL on UDT

2019-12-09 Thread DuyHai Doan
It depends. The latest versions of Cassandra allow unfrozen UDTs. The individual fields of a UDT are updated atomically and they are effectively stored in distinct physical columns inside the partition, thus applying ttl() on them makes sense. I'm not sure, however, whether the CQL parser allows this syntax

Re: Cluster sizing for huge dataset

2019-10-04 Thread DuyHai Doan
is solution and limitations. > > Note: that would also probably help you with your init-load/TWCS issue . > > My2c. > Cedrick > > On Tue, Oct 1, 2019 at 11:49 PM DuyHai Doan wrote: > >> The client wants to be able to access cold data (2 years old) in the >> same cluster so

Re: Challenge with initial data load with TWCS

2019-10-01 Thread DuyHai Doan
Thanks Alex for confirming. On 30 Sept 2019 at 09:17, "Oleksandr Shulgin" wrote: > On Sun, Sep 29, 2019 at 9:42 AM DuyHai Doan wrote: > >> Thanks Jeff for sharing the ideas. I have some question though: >> >> - CQLSSTableWriter and explicitly break betwee

Re: Cluster sizing for huge dataset

2019-10-01 Thread DuyHai Doan
Regards > Julien > > On Mon, 30 Sept 2019 at 22:03, DuyHai Doan wrote: >> >> Thanks all for your reply >> >> The target deployment is on Azure so with the Nice disk snapshot feature, >> replacing a dead node is easier, no streaming from Cassandra >

Re: Cluster sizing for huge dataset

2019-09-30 Thread DuyHai Doan
Thanks all for your replies. The target deployment is on Azure, so with the nice disk snapshot feature, replacing a dead node is easier: no streaming from Cassandra. About compaction overhead, using TWCS with a 1-day bucket and removing read repair and subrange repair should be sufficient. Now the

Re: Challenge with initial data load with TWCS

2019-09-29 Thread DuyHai Doan
> historical load using this method > > > > > On Sep 28, 2019, at 1:31 PM, DuyHai Doan wrote: > > > > Hello users > > > > TWCS works great for permanent state. It creates SSTables of roughly > > fixed size if your insertion rate is pretty constant. >

Re: Cluster sizing for huge dataset

2019-09-29 Thread DuyHai Doan
sor data is similar / compressible. > > > On Sep 28, 2019, at 1:23 PM, DuyHai Doan wrote: > > > > Hello users > > > > I'm facing with a very challenging exercise: size a cluster with a huge > > dataset. > > > > Use-case = IoT > > > >

Challenge with initial data load with TWCS

2019-09-28 Thread DuyHai Doan
Hello users, TWCS works great for permanent state. It creates SSTables of roughly fixed size if your insertion rate is pretty constant. Now the big deal is about the initial load. Let's say we configure TWCS with window unit = day and window size = 1; we would have 1 SSTable per day and with

Cluster sizing for huge dataset

2019-09-28 Thread DuyHai Doan
Hello users, I'm facing a very challenging exercise: sizing a cluster with a huge dataset. Use case: IoT. Number of sensors: 30 million. Frequency of data: every 10 minutes. Estimated size of a data point: 100 bytes (including clustering columns). Data retention: 2 years. Replication factor: 3 (pretty
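A back-of-the-envelope sketch of the raw volume implied by the figures quoted in this post. It only multiplies the stated numbers; the function names are made up for illustration, a year is assumed to be 365 days, and compression, compaction headroom, indexes and per-row storage overhead are deliberately ignored, so the result is a rough starting point rather than a sizing.

```python
# Figures taken from the post; everything else is an assumption of this sketch.
SENSORS = 30_000_000
READINGS_PER_SENSOR_PER_DAY = 24 * 60 // 10   # one reading every 10 minutes
BYTES_PER_READING = 100                        # includes clustering columns
RETENTION_DAYS = 2 * 365                       # assumes 365-day years
REPLICATION_FACTOR = 3


def raw_bytes_per_day() -> int:
    """Uncompressed bytes written per day, before replication."""
    return SENSORS * READINGS_PER_SENSOR_PER_DAY * BYTES_PER_READING


def total_bytes_replicated() -> int:
    """Uncompressed bytes held over the full retention, across all replicas."""
    return raw_bytes_per_day() * RETENTION_DAYS * REPLICATION_FACTOR


if __name__ == "__main__":
    print(f"per day (1 replica): {raw_bytes_per_day() / 1e9:.0f} GB")
    print(f"2 years, RF=3:       {total_bytes_replicated() / 1e12:.2f} TB")
```

The per-node count then follows from whatever data density per node is considered acceptable, which the snippet does not state.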

Re: Is it possible to build multi cloud cluster for Cassandra

2019-09-05 Thread DuyHai Doan
Hello all, I've given some thought to this multi-cloud marketing buzz with Cassandra. It is theoretically feasible (with GossipingPropertyFileSnitch) but practically a headache if you want a minimum of performance and security. The problem comes from the network "devils in the details". Suppose DC1 in AWS

Re: Using Cassandra as an object store

2019-04-19 Thread DuyHai Doan
Idea: to guarantee data integrity, you can store an MD5 of all the chunk data as a static column in the partition that contains the chunks. On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 wrote: > we have use cassandra as object store for some years, you can just split > the object into some small
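A minimal sketch of the integrity check suggested above, assuming the object is split into fixed-size chunks and a single MD5 digest is computed over all chunk data in order. The helper names and the chunk size are hypothetical, and the actual reads and writes against Cassandra (chunk rows plus the static digest column) are left out.

```python
import hashlib
from typing import Iterable, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split an object into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def digest_of_chunks(chunks: Iterable[bytes]) -> str:
    """MD5 over all chunk data, fed in chunk order (the value to store once per partition)."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    return md5.hexdigest()


def chunks_are_intact(chunks: Iterable[bytes], stored_digest: str) -> bool:
    """Recompute the digest after reading the chunks back and compare it."""
    return digest_of_chunks(chunks) == stored_digest
```

On read, the client would fetch the chunks in clustering order together with the stored digest and reject the object if `chunks_are_intact` returns False.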

Re: Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread DuyHai Doan
Ok thanks John On Thu, Feb 14, 2019 at 8:51 PM Jonathan Haddad wrote: > Create the first node, setting the tokens manually. > Create the keyspace. > Add the rest of the nodes with the allocate tokens uncommented. > > On Thu, Feb 14, 2019 at 11:43 AM DuyHai Doan wrote:

Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread DuyHai Doan
Hello users, looking at the mailing list archive, there were already some questions about the flag "allocate_tokens_for_keyspace" from cassandra.yaml. I'm starting a fresh new cluster (with 0 data). The keyspace used by the project is raw_data so I set allocate_tokens_for_keyspace = raw_data in

Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread DuyHai Doan
The plain answer is NO. There is a slight hope that the JIRA https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into the 4.0 release. But right now there seems to be little interest in this ticket; the last comment dates from 23/Feb/2017 ... On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov wrote: > Hi

Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
d sstables drop by itself? One ttl and gc grace seconds >> past whole sstable will have only tombstones. >> >> >> Regards, >> >> Nitan >> >> Cell: 510 449 9629 >> >> On Feb 11, 2019, at 2:23 PM, DuyHai Doan wrote: >> >> Purging data is also straightforward, just dropping SSTables (by a >> script) where create date is older than a threshold, we don't even need to >> rely on TTL >> >>

Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
dows and every read touches all of the windows, > you’re going to have a bad time. > > -- > Jeff Jirsa > > > On Feb 11, 2019, at 12:12 PM, DuyHai Doan wrote: > > Hello users > > On the official documentation for TWCS ( > http://cassandra.apache.org/doc/latest/operat

Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
Hello users, in the official documentation for TWCS ( http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy) it is advised to select the window unit and size so that the total number of window intervals is around 20-30. Is there any explanation for this
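To make the 20-30 guideline concrete, here is a small sketch that counts how many TWCS windows a table ends up with for a given retention and window size. The function name is made up, and the 730-day retention is borrowed from the cluster-sizing thread elsewhere in this list purely as an illustration.

```python
import math


def window_count(retention_days: int, window_size_days: int) -> int:
    """Number of time windows that hold live data for a given retention."""
    return math.ceil(retention_days / window_size_days)


if __name__ == "__main__":
    # Illustrative values only: a 2-year retention with different window sizes.
    for size in (1, 7, 30):
        print(f"{size:>2}-day windows -> {window_count(730, size)} windows")
```

With these illustrative numbers, only the 30-day window lands inside the range the documentation recommends; a 1-day window leaves hundreds of windows.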

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
" >>>> This will not cover the situation when a value have to be overwriten >>>> with null. >>>> >>>> I found one possible solution - change the schema to keep only primary >>>> key fields and move all other fields to frozen UDT. >

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
"The problem is I can't know the combination of set/unset values" --> Just for this requirement, Achilles has a working solution for many years using INSERT_NOT_NULL_FIELDS strategy: https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy Or you can use the Update API that by design only

Re: Tombstone removal optimization and question

2018-11-06 Thread DuyHai Doan
Thanks for the confirmation, Kurt. On 6 Nov 2018 at 11:59, "kurt greaves" wrote: > Yes it does. Consider if it didn't and you kept writing to the same > partition, you'd never be able to remove any tombstones for that partition. > > On Tue., 6 Nov. 2018, 19:40 DuyHai Doan

Tombstone removal optimization and question

2018-11-06 Thread DuyHai Doan
Hello all I have tried to sum up all rules related to tombstone removal: -- Given a tombstone written at timestamp (t) for a partition key (P) in SSTable (S1). This tombstone will be removed: 1) after

Re: comprehensive list of checks before rolling version upgrades

2018-10-30 Thread DuyHai Doan
To add to your excellent list: - no topology change (joining/leaving/decommissioning nodes) - no rebuild of index/MV under way On Tue, Oct 30, 2018 at 4:35 PM Carl Mueller wrote: > Does anyone have a pretty comprehensive list of these? Many that I don't > currently know how to check but I'm

Re: Aggregation of Set Data Type

2018-10-23 Thread DuyHai Doan
You will need to use user-defined aggregates for this. On 23 Oct 2018 at 16:46, "Joseph Wonesh" wrote: > Hello all, > > I am trying to aggregate rows which each contain a column of Set. > I would like the result to contain the sum of all sets, where null would be > equivalent to the empty set.

Re: Released an ACID-compliant transaction library on top of Cassandra

2018-10-16 Thread DuyHai Doan
I think it does use LWT under the hood: https://github.com/scalar-labs/scalardb/blob/master/src/main/java/com/scalar/database/transaction/consensuscommit/CommitMutationComposer.java#L74-L79 return new Put(base.getPartitionKey(), getClusteringKey(base, result).orElse(null))

Re: About UDF/UDA

2018-09-27 Thread DuyHai Doan
65, 870, 617, 2), ''': (381, 11668, 6024, 2)} > > I would like to have something lke: > | item| min | max| average | count | > --- > | | 365 | 870 | 617| 2

Re: About UDF/UDA

2018-09-26 Thread DuyHai Doan
A hint to answer your Q3 is to use a final function to perform the flattening or transformation on the result of the aggregation. The syntax of a UDA is: CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] aggregateName(type1, type2, …) SFUNC accumulatorFunction STYPE stateType [FINALFUNC

Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Also for the record, I remember Datastax having something called Tiered Storage that does move data around (folders/disk volume) based on data age. To be checked On Mon, Sep 17, 2018 at 10:23 PM, DuyHai Doan wrote: > Sean > > Without transactions à la SQL, how can you guarantee

Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Sean, without transactions à la SQL, how can you guarantee atomicity between both tables for upserts? I mean, one write could succeed with the hot table and fail for the cold table. The only solution I see is using a logged batch, with a huge overhead and perf hit on the writes. On Mon, Sep 17, 2018 at

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-12 Thread DuyHai Doan
>>> You know what they say: Go big or go home. >>> >>> Right now candidates are Cassandra itself but embedded or on the side >>> not on the actual data clusters, zookeeper (yuck) , Kafka (which needs >>> zookeeper, yuck) , S3 (outside service depe

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-10 Thread DuyHai Doan
Also using Calvin means having to implement a distributed monotonic sequence as a primitive, not trivial at all ... On Mon, Sep 10, 2018 at 3:08 PM, Rahul Singh wrote: > In response to mimicking Advanced replication in DSE. I understand the > goal. Although DSE advanced replication does one

Re: A blog about Cassandra in the IoT arena

2018-08-24 Thread DuyHai Doan
tombstones and a threshold, it would be dedicated to deletion. It may be an > edge case , but people face issues with tombstones all the time because > they don’t know better. > > Rahul > On Aug 23, 2018, 11:50 AM -0500, DuyHai Doan , > wrote: > > As I used to tell some people, th

Re: A blog about Cassandra in the IoT arena

2018-08-23 Thread DuyHai Doan
As I used to tell some people, the day we make: 1. partition size unlimited, or at least huge partitions easily manageable (compaction, repair, streaming, partition index file) 2. tombstones a non-issue, that day Cassandra will dominate any other IoT technology out there. Until then ... On Thu,

Re: full text search on some text columns

2018-07-31 Thread DuyHai Doan
I had SASI in mind before stopping myself from replying to this thread. Actually the OP needs to index clustering column and partition key, and as far as I remember, I've myself opened a JIRA and pushed a patch for SASI to support indexing composite partition key but there are some issues so far

Re: which driver to use with cassandra 3

2018-07-20 Thread DuyHai Doan
Spring Data Cassandra is so-so ... It has fewer features (at least at the time I looked at it) than the default Java driver. For the driver, right now most people are using DataStax's ones. On Fri, Jul 20, 2018 at 3:36 PM, Vitaliy Semochkin wrote: > Hi, > > Which driver to use with cassandra 3 > >

Re: default_time_to_live vs TTL on insert statement

2018-07-12 Thread DuyHai Doan
ire table by setting the table's >>> default_time_to_live >>> <https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL> >>> property. If you try to set a TTL for a specific column that is longer >>> than the time defined b

Re: default_time_to_live vs TTL on insert statement

2018-07-11 Thread DuyHai Doan
The default_time_to_live property applies if you don't specify any TTL on your CQL statement. However, you can always override the default_time_to_live

Re: [ANNOUNCE] LDAP Authenticator for Cassandra

2018-07-05 Thread DuyHai Doan
Super great, thank you for this contribution Kurt! On Thu, Jul 5, 2018 at 1:49 PM, kurt greaves wrote: > We've seen a need for an LDAP authentication implementation for Apache > Cassandra so we've gone ahead and created an open source implementation > (ALv2) utilising the pluggable auth support

Re: Write performance degradation

2018-06-18 Thread DuyHai Doan
Maybe the disk I/O cannot keep up with the high mutation rate ? Check the number of pending compactions On Sun, Jun 17, 2018 at 9:24 AM, onmstester onmstester wrote: > Hi, > > I was doing 500K inserts + 100K counter update in seconds on my cluster of > 12 nodes (20 core/128GB ram/4 * 600 HDD

Re: Data Proxy for Cassandra

2018-06-11 Thread DuyHai Doan
Hello Chidamber When you said "In addition, the data proxy is distributed based on consistent hashing and using gossip between data proxy nodes to keep the cached data unique (per node) and consistent", did you re-implement Consistent hashing and gossip algorithm from scratch in your proxy layer

Re: what's the read cl of list read-on-write operations?

2018-04-20 Thread DuyHai Doan
then the item to set finally is not b but d, > which is unexpected from the perspective of the previous read. > > Why Cassandra do not read from cluster with somehow read CL before > updating the list? > > > 2018-04-20 16:12 GMT+08:00 DuyHai Doan <doanduy...@gmail.com>:

Re: what's the read cl of list read-on-write operations?

2018-04-20 Thread DuyHai Doan
The read operation on the list column is done locally on each replica so replication factor does not really apply here On Fri, Apr 20, 2018 at 7:37 AM, Jinhua Luo wrote: > Hi All, > > Some list operations, like set by index, needs to read the whole list > before update. >

Re: Does Cassandra supports ACID txn

2018-04-19 Thread DuyHai Doan
No ACID transactions any time soon in Cassandra. On Thu, Apr 19, 2018 at 7:35 AM, Rajesh Kishore wrote: > Hi, > > I am bit confused by reading different articles, does recent version of > Cassandra supports ACID transaction ? > > I found BATCH command , but not sure if it

Re: where does c* store the schema?

2018-04-16 Thread DuyHai Doan
There is a system_schema keyspace to store all the schema information https://docs.datastax.com/en/cql/3.3/cql/cql_using/useQuerySystem.html#useQuerySystem__table_bhg_1bw_4v On Mon, Apr 16, 2018 at 10:48 AM, Jinhua Luo wrote: > Hi All, > > Does c* use predefined

Re: Can I sort it as a result of group by?

2018-04-09 Thread DuyHai Doan
No, sorting by a column other than a clustering column is not possible. On Mon, Apr 9, 2018 at 11:42 AM, Eunsu Kim wrote: > Hello, everyone. > > I am using 3.11.0 and I have the following table. > > CREATE TABLE summary_5m ( > service_key text, > hash_key int, >

Re: Text or....

2018-04-04 Thread DuyHai Doan
Compressing client-side is better because it will save: 1) a lot of bandwidth on the network 2) a lot of Cassandra CPU because there is no decompression server-side 3) a lot of Cassandra heap because the compressed blob should be relatively small (text data compresses very well) compared to the raw size On
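A minimal sketch of the client-side approach described above: compress the text before writing it and store the result in a blob column, then decompress after reading. The choice of zlib and the function names are assumptions made for this example; the thread does not name a codec, and the driver calls that would write and read the blob are omitted.

```python
import zlib


def compress_text(text: str) -> bytes:
    """Encode to UTF-8 and compress; the result would be bound to a blob column."""
    return zlib.compress(text.encode("utf-8"))


def decompress_text(blob: bytes) -> str:
    """Inverse of compress_text, applied to the blob read back from the table."""
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    # Repetitive sample text of roughly the size mentioned in the thread (~55,000 characters).
    sample = "status=OK;sensor=42;value=17.5\n" * 1800
    blob = compress_text(sample)
    print(len(sample), "characters ->", len(blob), "bytes compressed")
```

The actual ratio depends entirely on the data; repetitive text like the sample shrinks far more than typical payloads would.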

Re: Text or....

2018-04-04 Thread DuyHai Doan
Compress it and store it as a blob, unless you ever need to index it; but I guess even with SASI, indexing such a huge text block is not a good idea. On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges wrote: > Hi All, > > A certain application is writing ~55,000 characters for a

Re: Cassandra filter with ordering query modeling

2018-03-01 Thread DuyHai Doan
https://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics On Thu, Mar 1, 2018 at 3:48 PM, Valentina Crisan wrote: > 1) I created another table for Query#2/3. The partition Key was StartTime > and clustering key was name. When I execute

Re: Secondary Indexes C* 3.0

2018-02-22 Thread DuyHai Doan
Read this: http://www.doanduyhai.com/blog/?p=13191 On Thu, Feb 22, 2018 at 6:44 PM, Akash Gangil wrote: > To provide more context, I was going through this > https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html# > useWhenIndex__highCardCol > > On Thu,

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
So before buying any marketing claims from Microsoft or whoever, maybe you should try to use it extensively? And talking about backup, have a look at DynamoDB: http://i68.tinypic.com/n1b6yr.jpg From my POV, if a multi-billion company like Amazon doesn't get it right or can't make it easy for

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
For UI and interactive data exploration there is already the Cassandra interpreter for Apache Zeppelin that is more than decent for the job On Wed, Feb 21, 2018 at 9:19 AM, Daniel Hölbling-Inzko < daniel.hoelbling-in...@bitmovin.com> wrote: > But what does this video really show? That Microsoft

Re: LWT broken?

2018-02-11 Thread DuyHai Doan
Mahdi, the issue in your code is here: else // we lost LWT, fetch the winning value existing_id = SELECT id FROM hash_id WHERE hash=computed_hash | consistency = ONE You lost the LWT, which means that a concurrent LWT has won the Paxos round and has applied the value using

Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread DuyHai Doan
Or use the new user-defined compaction option recently introduced, provided you can determine over which SSTables a partition is spread On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad wrote: > Give this a read through: > >

Re: group by select queries

2018-02-01 Thread DuyHai Doan
Worth digging into the source code of GROUP BY but as far as I remember, using GROUP BY without any aggregation function will lead to C* picking just the first row (or maybe the last, not sure on this point) at hand. About ordering, since the grouping is on a component of the partition key, do not

Re: Too many tombstones using TTL

2018-01-10 Thread DuyHai Doan
"The question is why Cassandra creates a tombstone for every column instead of single tombstone per row?" --> Simply because technically it is possible to set different TTL value on each column of a CQL row On Wed, Jan 10, 2018 at 2:59 PM, Python_Max wrote: > Hello, C*

Re: CQL Map vs clustering keys

2017-11-15 Thread DuyHai Doan
Yes, your remark is correct. However, once CASSANDRA-7396 (right now in 4.0 trunk) gets released, you will be able to get a slice of map values using their (sorted) keys: SELECT map[fromKey ... toKey] FROM TABLE ... Needless to say, it will also be possible to get a single element from the map by

Re: Securing Cassandra database

2017-11-13 Thread DuyHai Doan
You can pass in login/password from the client side and encrypt the client / cassandra connection... On 13 Nov 2017 at 12:16, "Mokkapati, Bhargav (Nokia - IN/Chennai)" < bhargav.mokkap...@nokia.com> wrote: Hi Team, We are using Apache Cassandra 3.0.13 version. As part of Cassandra

Re: Cassandra using a ton of native memory

2017-11-03 Thread DuyHai Doan
8Gb of RAM being a recommended production setting for most of the workloads out there. Having only 16Gb of RAM, and because Cassandra relies a lot on the system page cache, it should be no surprise that your 16Gb is being eaten up. On Fri, Nov 3, 2017 at 5:40 PM, Austin Sharp

Re: Golang + Cassandra + Text Search

2017-10-24 Thread DuyHai Doan
There is already a full text search index in Cassandra called SASI On Tue, Oct 24, 2017 at 6:50 AM, Ridley Submission < ridley.submission2...@gmail.com> wrote: > Hi, > > Quick question, I am wondering if anyone here who works with Go has > specific recommendations for as simple framework to add

Re: Does NTP affects LWT's ballot UUID?

2017-10-10 Thread DuyHai Doan
The ballot UUID is obtained using QUORUM agreement between replicas for a given partition key and we use this TimeUUID ballot as write-time for the mutation. The only scenario where I can see a problem is that NTP goes backward in time on a QUORUM of replicas, which would break the contract of

Re: new question ;-) // RE: understanding batch atomicity

2017-09-29 Thread DuyHai Doan
PartitionKey = 1 > => are mutations (A) & (B) done in an atomic way (all or nothing) ? > > Thanks. > > Dominique > > > > [@@ THALES GROUP INTERNAL @@] > > *From:* DuyHai Doan [mailto:doanduy...@gmail.com <doanduy...@gmail.com>] > *Sent:* Friday, 29 September 2017

Re: understanding batch atomicity

2017-09-29 Thread DuyHai Doan
All updates here means all mutations == INSERT/UPDATE or DELETE On Fri, Sep 29, 2017 at 5:07 PM, DE VITO Dominique < dominique.dev...@thalesgroup.com> wrote: > Hi, > > > > About BATCH, the Apache doc https://cassandra.apache.org/ > doc/latest/cql/dml.html?highlight=atomicity says : > > > >

Re: data loss in different DC

2017-09-28 Thread DuyHai Doan
If you're writing into DC1 with CL = LOCAL_xxx, there is no guarantee that you will read the same data in DC2. Only repair will help you. On Thu, Sep 28, 2017 at 11:41 AM, Peng Xiao <2535...@qq.com> wrote: > Dear All, > > We have a cluster with one DC1:RF=3,another DC DC2:RF=1 only for ETL,but >

Re: Datastax Driver Mapper & Secondary Indexes

2017-09-26 Thread DuyHai Doan
If you're looking for schema generation from Bean annotations: https://github.com/doanduyhai/Achilles/wiki/DDL-Scripts-Generation On Tue, Sep 26, 2017 at 2:50 PM, Daniel Hölbling-Inzko < daniel.hoelbling-in...@bitmovin.com> wrote: > Hi, I also just figured out that there is no schema generation

Re: Self-healing data integrity?

2017-09-11 Thread DuyHai Doan
eff Jirsa > > > On Sep 9, 2017, at 12:59 PM, Jeff Jirsa <jji...@gmail.com> wrote: > > There is, but they aren't consulted on the streaming paths (only on normal > reads) > > > -- > Jeff Jirsa > > > On Sep 9, 2017, at 12:02 PM, DuyHai Doan <doanduy...@g

Re: Self-healing data integrity?

2017-09-09 Thread DuyHai Doan
Jeff, with default compression enabled on each table, aren't there CRC files created alongside SSTables that can help detect bit rot? On Sat, Sep 9, 2017 at 7:50 PM, Jeff Jirsa wrote: > Cassandra doesn't do that automatically - it can guarantee consistency on >

Re: Lightweight transaction in Multi DC

2017-09-09 Thread DuyHai Doan
C > > > > LOCAL_SERIAL is dc level, SERIAL checks for complete cluster level. > > > > On Fri, Sep 8, 2017 at 2:33 PM, Charulata Sharma (charshar) < > chars...@cisco.com> wrote: > > Yes …it is with LOCAL_SERIAL. Should I be using SERIAL ? > > > > T

Re: Lightweight transaction in Multi DC

2017-09-08 Thread DuyHai Doan
Are you using CAS with SERIAL consistency level for your multi-DC setup ? On Fri, Sep 8, 2017 at 9:27 PM, Charulata Sharma (charshar) < chars...@cisco.com> wrote: > Hi, > > We are facing a serious issue with CAS in a multi DC setup and I > wanted to get some input on it from the forum. > >

Re: No columns are defined for Materialized View other than primary key

2017-09-07 Thread DuyHai Doan
> wrote: > >> There is one more column "data" here in MView? >> >> On 7 Sep 2017 7:49 p.m., "DuyHai Doan" <doanduy...@gmail.com> wrote: >> >>> The answer of your question is in the error message. For once it's very >>> clear. The p

Re: No columns are defined for Materialized View other than primary key

2017-09-07 Thread DuyHai Doan
The answer to your question is in the error message. For once it's very clear. The primary key of your materialized view is EXACTLY the same as for your base table. So the question is what's the point of creating this materialized view ... On Thu, Sep 7, 2017 at 4:01 PM, Alex Kotelnikov <

[ANNOUNCE] Achilles 5.3.0

2017-08-26 Thread DuyHai Doan
Hello Cassandra users, I'm happy to announce the release of Achilles 5.3.0. The newly added features are: - Support for Cassandra up to 3.11.0 and Datastax Enterprise up to 5.1.2 - Support for new Duration type (CASSANDRA-11873) - Support for literal value in (CASSANDRA-10783) - Support for GROUP BY

Re: SASI and secondary index simultaniously

2017-07-12 Thread DuyHai Doan
In the original source code, SASI will be chosen instead of the secondary index. On 12 Jul 2017 at 09:13, "Vlad" wrote: > Hi, > > it's possible to create both regular secondary index and SASI on the same > column: > > > > > *CREATE TABLE ks.tb (id int PRIMARY KEY, name

Re: timeoutexceptions with UDF causing cassandra forceful exits

2017-07-03 Thread DuyHai Doan
Besides the config of user_function_timeout_policy, I would say having a UDF that times out badly is generally an indication that you should review your UDF code. On Mon, Jul 3, 2017 at 7:58 PM, Jeff Jirsa wrote: > > > On 2017-06-29 17:00 (-0700), Akhil Mehra

Re: UDF for sorting

2017-07-03 Thread DuyHai Doan
The plain answer is no, you can't. The reason is that a UDF only transforms column values on each row and does not have the ability to modify row ordering. On Mon, Jul 3, 2017 at 10:14 PM, techpyaasa . wrote: > Hi all, > > I have a table like > > CREATE TABLE ks.cf ( pk1 bigint,

Re: SASI index on datetime column does not filter on minutes

2017-06-19 Thread DuyHai Doan
The + in the date format is necessary to specify timezone On Mon, Jun 19, 2017 at 5:38 PM, Hannu Kröger wrote: > Hello, > > I tried the same thing with 3.10 which I happened to have at hand and that > seems to work. > > cqlsh:test> select lastname,firstname,dateofbirth

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
> > > On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan <doanduy...@gmail.com> wrote: > >> For all those promoting ES as a PRIMARY datastore, please read this >> before: >> >> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13 >> >

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
For all those promoting ES as a PRIMARY datastore, please read this first: https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13 There are a lot of warnings before recommending ES as a datastore. The answer from Pilato, ES official evangelist: - You absolutely care

Re: Cassandra & Spark

2017-06-08 Thread DuyHai Doan
Interesting Tobias, when you said "Instead we transferred the data to Apache Kudu", did you transfer all Cassandra data into Kudu with a single migration and then tap into Kudu for aggregation, or did you run a data import every day/week/month from Cassandra into Kudu? From my point of view,

Re: Understanding the limitation to only one non-PK column in MV-PK

2017-06-06 Thread DuyHai Doan
All the explanation for why just 1 non PK column can be used as PK for MV is here: https://skillsmatter.com/skillscasts/7446-cassandra-udf-and-materialised-views-in-depth Skip to 19:18 for the explanation On Mon, May 8, 2017 at 8:08 PM, Fridtjof Sander < fridtjof.san...@googlemail.com> wrote:

Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
a query to take less than 2 > seconds. > > > > My impression was that Spark is aimed at larger scale analytics. > > > > I am ok with the limitation on “group by”. I am intending to use async > queries and token-aware load balancing to partition the query and execute

Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
First, GROUP BY is only allowed on partition keys and clustering columns, not on arbitrary columns. The internal implementation of GROUP BY tries to fetch data in clustering order to avoid having to "re-sort" them in memory, which would be very expensive. Second, GROUP BY works best when restricted

Re: Reg:- Generate dummy data for Cassandra Tables

2017-06-05 Thread DuyHai Doan
Personally I'm using https://github.com/Marak/faker.js/ to generate various kinds of datasets. That's the most comprehensive "free" data generator I've found so far but it's in JS. On Mon, Jun 5, 2017 at 7:13 AM, Jeff Jirsa wrote: > On 2017-06-04 20:03 (-0700), "@Nandan@"

Re: Apache Cassandra - Configuration Management

2017-05-17 Thread DuyHai Doan
For configuration management there are tons of tools out there: - ansible - chef - puppet - saltstack I surely forgot a few others On Wed, May 17, 2017 at 6:33 PM, ZAIDI, ASAD A wrote: > Good Morning Folks – > > > > I’m running 14 nodes Cassandra cluster in two data centers ,

Re: Reg:- DSE 5.1.0 Issue

2017-05-16 Thread DuyHai Doan
Nandan, since you have asked questions about DSE many times on this OSS mailing list, I suggest you contact Datastax directly if you're using their enterprise edition. Every Datastax customer has access to their support. If you're a sub-contractor for a final customer that is using DSE, ask

Re: Testing Stratio Index Queries with Cassandra-Stress Tool

2017-04-25 Thread DuyHai Doan
Use Gatling with the CQL plugin: https://github.com/gatling-cql/GatlingCql On Tue, Apr 25, 2017 at 2:36 PM, Akshay Suresh < akshay.sur...@unotechsoft.com> wrote: > Hi > > I have a set of tables with Stratio Index. > > Is there anyway to test Stratio based SELECT queries using the >

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Using an MV and putting id as the partition key is your best bet right now. SASI will be too expensive for this simple use case. On Thu, Feb 16, 2017 at 3:21 PM, Micha wrote: > > > it's like having a table (sha256 blob primary key, id timeuuid, data1 > text, ., ) > > So both,

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
[image: Inline image 1] On Thu, Feb 16, 2017 at 3:08 PM, Micha <mich...@fantasymail.de> wrote: > > > On 16.02.2017 14:30, DuyHai Doan wrote: > > Why indexing BLOB data ? It does not make any sense > > My partition key is a secure hash sum, I don't index a blob. > > > > >

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Why index BLOB data? It does not make any sense. "I thought sasi index is globally held, in contrast to the normal secondary index.." --> Who said that? It's just wrong. On Thu, Feb 16, 2017 at 1:50 PM, Micha wrote: > Hi, > > > my table has (among others) three

Re: Time series data model and tombstones

2017-02-08 Thread DuyHai Doan
nning > TWCS for five days now, and things are stable so far. I also had a patch to > the application code to implement date partitioning ready to go, but I > wanted to see how things went with only making the compaction changes. > > On Sun, Jan 29, 2017 at 4:05 PM, DuyHai Doan

Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread DuyHai Doan
The link you posted doesn't say anything about Cassandra. On 7 Feb 2017 at 11:41, "Kant Kodali" wrote: > Why does CockroachDB github website say Cassandra has no Availability on > datacenter failure? > > https://github.com/cockroachdb/cockroach >

Re: Global TTL vs Insert TTL

2017-02-01 Thread DuyHai Doan
@rustyrazorblade ~/dev/cassandra/data/data/test$ > ../../../tools/bin/sstablemetadata a-7bca6b50e8a511e6869a5596edf4dd > 35/mc-1-big-Data.db > . > SSTable max local deletion time: 1485980862 > > On Wed, Feb 1, 2017 at 6:59 AM DuyHai Doan <doanduy...@gmail.com> w

Re: Global TTL vs Insert TTL

2017-02-01 Thread DuyHai Doan
Global TTL is better than dynamic runtime TTL. Why? Because Global TTL is a table property and Cassandra can perform optimizations when compacting. For example, if it can see that the maxTimestamp of an SSTable is older than the table Global TTL, the SSTable can be entirely dropped during

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
there > scenarios in which Cassandra will have to read cells where time < 50? In > particular I am wondering if compression might have any affect. > > On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan <doanduy...@gmail.com> wrote: > >> "Should the data be sorted by my tim

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
ith > respect to sorting. I have been able to review some SSTables with > sstablemetadata and I can see that old/expired data is definitely living > with live data. > > > On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan <doanduy...@gmail.com> wrote: > >> Ok so give it a try with TW

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
has to scan over a lot tombstones to fetch the correct range of data thus your issue On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote: > It was with STCS. It was on a 2.x version before TWCS was available. > > On Sun, Jan 29, 2017 at 10:58 AM DuyHa

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Did you get this overwhelming tombstone behavior with STCS or with TWCS? If you're using DTCS, beware of its weird behavior and tricky configuration. On Sun, Jan 29, 2017 at 3:52 PM, John Sanda wrote: > Your partitioning key is text. If you have multiple entries per id

Re: Time series data model and tombstones

2017-01-28 Thread DuyHai Doan
When the data expires (after a TTL of 7 days), at the next compaction it is transformed into tombstones, which will still stay there during gc_grace_seconds. After that, they (the tombstones) will be completely removed at the next compaction, if there is any ... So doing some maths, supposing
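The timeline described above can be sketched as a small calculation: data becomes eligible for complete removal no earlier than its write time plus the TTL plus gc_grace_seconds, and only when a compaction actually touches it after that point. The function name is made up for this example; 864000 seconds (10 days) is Cassandra's default gc_grace_seconds, and the thread does not say whether the table overrides it.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra default: 10 days


def earliest_purge_time(written_at: datetime,
                        ttl_seconds: int,
                        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Earliest moment a compaction could drop the expired data for good.

    This is a lower bound only: removal still waits for a compaction that
    includes the SSTable(s) holding the data.
    """
    return written_at + timedelta(seconds=ttl_seconds + gc_grace_seconds)


if __name__ == "__main__":
    seven_days = 7 * 24 * 3600
    print(earliest_purge_time(datetime(2017, 1, 1), seven_days))
```

With a 7-day TTL and the default grace period, a row therefore occupies disk for at least 17 days.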

Re: implementing a 'sorted set' on top of cassandra

2017-01-14 Thread DuyHai Doan
Sorting on an "incremented" numeric value has always been a nightmare to do properly in C*. Either use the Counter type, but then no sorting is possible since a counter cannot be used as the type of a clustering column (which allows sorting), or use a simple numeric type on a clustering column, but then to
