Re: Looking for pointers about replication internal working

2021-09-02 Thread DuyHai Doan
As far as I remember, Apache Cassandra wanted to be self-sufficient and
avoid pulling in yet another piece of external software for its internal workings.

With lightweight transactions (available since 2.0), it has sufficient
primitives for scenarios that require linearizability.
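For illustration, a minimal CQL sketch (table and values are hypothetical) of the kind of linearizable operation LWT enables without any external coordinator:

CREATE TABLE IF NOT EXISTS leases (
    resource text PRIMARY KEY,
    owner    text
);

-- Paxos-backed conditional insert: only one concurrent client can win the lease
INSERT INTO leases (resource, owner) VALUES ('lock-42', 'node-a') IF NOT EXISTS;

-- Conditional hand-over, again serialized through Paxos
UPDATE leases SET owner = 'node-b' WHERE resource = 'lock-42' IF owner = 'node-a';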

My 2 cents

Duy Hai DOAN

On Thu, Sep 2, 2021 at 1:52 AM Han  wrote:

> Hi,
>
> I'm reading an old annotated version of the Cassandra paper (
> https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html)
> ,  and am curious about this annotation about "Replication" section:
>
> Zookeeper usage was restricted to Facebook’s in-house Cassandra branch;
> Apache Cassandra has always avoided it.
>
> 
>
> Is there any paper or blog or other pointers to understand what Apache
> Cassandra did to avoid Zookeeper?
>
> Thanks!
>
> Han
>
>


Re: What does the community think of the DataStax 4.x Java driver changes?

2020-10-29 Thread DuyHai Doan
Just my 2 cents

Because of the tremendous breaking changes in the API as well as in
public-facing classes (QueryBuilder, for example), I have stopped development
of the Achilles framework.

Migrating to the 4.x version would require an almost complete rewrite of
the framework, an effort I cannot afford to dedicate to it (the framework
is 7 years old now). I also advise many of my customers to adopt a wait & see
strategy with regard to the new drivers, because of the amount of application
rewriting required by the aforementioned changes to public-facing classes.

Regards

Duy Hai DOAN

On Thu, Oct 29, 2020 at 2:06 PM Johnny Miller  wrote:

> Joshua - thanks for the update, I have found the ASF slack channel
> (#cassandra-cep-drivers-donation) and google doc (
> https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#).
> Will be watching it closely.
>
> In terms of the functional changes brought into the driver with 4.x, the
> downgrading-consistency retry policy has always been a controversial feature, but as
> for the failover to a remote DC being removed - I am curious to understand why?
>
> Thanks,
>
> Johnny
>
> On Thu, 29 Oct 2020 at 13:53, Joshua McKenzie 
> wrote:
>
>> That's an immense amount of incredibly useful feedback Johnny. Thanks for
>> taking the time and energy to write all this up.
>>
>> I work with some of the engineers who authored these changes in the
>> driver and have brought this thread to their attention. The authors have
>> offered the driver as a CEP donation to the C* project so we will have one
>> in tree which should give a clear path to fixing some of these API issues
>> as well as the loss of functionality on a major.
>>
>>
>> On Thu, Oct 29, 2020 at 8:37 AM, Johnny Miller 
>> wrote:
>>
>>> Hi Everybody,
>>>
>>>
>>> We wanted to reach out to the community around the v4 changes in the
>>> DataStax Java driver and gauge people's opinions on some of the changes.
>>> DataStax have done a tremendous job over the years on the Cassandra drivers
>>> and contributing to this community. However, we are currently struggling to
>>> adopt the latest driver due to these changes.
>>>
>>>
>>> We have been working on a project to upgrade an application from v3 to
>>> v4.9 of the driver and have encountered major changes between these
>>> versions.
>>>
>>>
>>> We have observed that the latest version of the driver contains much more
>>> DataStax Enterprise (DSE)-specific code, which is not surprising as
>>> DataStax has been generous enough to build it for the Cassandra community.
>>>
>>>
>>> From our understanding, the DSE-specific code must be included even if
>>> you do not use or require it. For example, in the CqlSessionBuilder
>>> class, which is the main entry point into the driver, there are APIs
>>> relating directly to DataStax Enterprise non-OSS functionality, their cloud
>>> DBaaS, etc., e.g.:
>>>
>>>
>>> - withApplicationName (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withApplicationName-java.lang.String-
>>> )
>>>
>>> - withApplicationVersion (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withApplicationVersion-java.lang.String-
>>> )
>>>
>>> - withCloudProxyAddress (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withCloudProxyAddress-java.net.InetSocketAddress-
>>> )
>>>
>>> - withCloudSecureConnectBundle (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withCloudSecureConnectBundle-java.io.InputStream-
>>> )
>>>
>>>
>>> plus more.
>>>
>>>
>>> All of these are sitting under the com.datastax.oss package - not the
>>> com.datastax.dse package.
>>>
>>>
>>> Additionally the reference.conf for the default driver settings contains
>>> a large number of DSE specific options:
>>>
>>>
>>> https://github.com/datastax/java-driver/blob/4.9.0/core/src/main/resources/reference.conf
>>>
>>>
>>> We would have liked to see this implemented in a subclass of
>>> CqlSessionBuilder, e.g. DataStaxCqlSessionBuilder, and the conf split into
>>> two separate config files.
>>>
>>>
>>> Additionally, the structure of the library is such that it bundles
>>> all of the DSE driver code (e.g. graph, search, etc.) with the non-DSE
>>> driver code. We would also have liked to see DataStax implement these as
>>> separate libs, with the DataStax-specific lib depending on an OSS-only lib
>>> for the shared functionality.
>>>
>>>
>>> It would be great to be able to only take in the dependencies and code
>>> needed for Apache Cassandra and not the commercial products around it.
>>>
>>>
>>> However, the above observations are trivial compared to the two core
>>> features of the driver that seem to have been dropped, on which we would
>>> like your opinion.
>>>
>>>
>>> 1 - No more failovers to remote-DC
>>>
>>> Previous versions of the driver allowed the driver to 

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread DuyHai Doan
Out of curiosity, does DynamoDB autoscaling allow you to exceed the
partition limits (e.g. push more data than is allowed for some outlier
heavy partitions)? If yes, it can be interesting (I guess DynamoDB is
doing some kind of rebalancing behind the scenes). If no, it's just an
artificial capping figure they increase to cope with spikes in throughput.

On Mon, Dec 9, 2019 at 9:35 PM Carl Mueller
 wrote:

> Dynamo salespeople have been pushing autoscaling abilities that have been
> one of the key temptations to our management to switch off of cassandra.
>
> Has anyone done any numbers on how well dynamo will autoscale demand
> spikes, and how we could architect cassandra to compete with such abilities?
>
> We probably could overprovision and with the presumably higher cost of
> dynamo beat it, although the sales engineers claim they are closing the
> cost factor too. We could vertically scale to some degree, but node
> expansion seems close.
>
> VNode expansion is still limited to one at a time?
>
> We use VNodes so we can't do netflix's cluster doubling, correct? With
> cass 4.0's alleged segregation of the data by token we could though and
> possibly also "prep" the node by having the necessary sstables already
> present ahead of time?
>
> There's always "caching" too, but there isn't a lot of data on general
> fronting of cassandra with caches, and the row cache continues to be mostly
> useless?
>


Re: TTL on UDT

2019-12-09 Thread DuyHai Doan
It depends. The latest versions of Cassandra allow non-frozen UDTs. The
individual fields of such a UDT are updated atomically and are effectively
stored as distinct physical columns inside the partition, so applying ttl()
to them makes sense. I'm not sure, however, whether the CQL parser
allows this syntax.
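A minimal sketch of what that would look like with a non-frozen UDT (schema is hypothetical, and the final SELECT is exactly the syntax whose parser support is in question):

CREATE TYPE address (street text, city text);

CREATE TABLE users (
    id   uuid PRIMARY KEY,
    addr address    -- non-frozen: each field is stored as its own cell
);

-- the TTL applies only to the cell being written
UPDATE users USING TTL 86400 SET addr.city = 'Paris'
WHERE id = 00000000-0000-0000-0000-000000000001;

-- hypothetical: read the remaining TTL of that single field
SELECT ttl(addr.city) FROM users
WHERE id = 00000000-0000-0000-0000-000000000001;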

On Mon, Dec 9, 2019 at 9:13 PM Carl Mueller
 wrote:

> I could be wrong, but UDTs I think are written (and overwritten) as one
> unit, so the notion of a TTL on a UDT field doesn't exist, the TTL is
> applied to the overall structure.
>
> Think of it like a serialized json object with multiple fields. To update
> a field they deserialize the json, then reserialize the json with the new
> value, and the whole json object has the new timestamp or ttl.
>
> On Tue, Dec 3, 2019 at 10:02 AM Mark Furlong 
> wrote:
>
>> When I run the command ‘select ttl(udt_field) from table; I’m getting an
>> error ‘InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Cannot use selection function ttl on collections"’. How can I get
>> the TTL from a UDT field?
>>
>>
>>
>> *Mark Furlong*
>>
>>
>>
>>
>>
>> We empower journeys of personal discovery to enrich lives
>>
>>
>>
>>
>>
>


Re: Cluster sizing for huge dataset

2019-10-04 Thread DuyHai Doan
The problem is that the user wants to access old data using CQL as well,
not spin up SparkSQL just to fetch one or two old records.

On Oct 4, 2019 at 12:38, "Cedrick Lunven"  wrote:

> Hi,
>
> If you are using DataStax Enterprise, why not offload cold data to DSEFS
> (an HDFS implementation) in an analytics-friendly storage format like Parquet,
> and keep only the OLTP data in the Cassandra tables? The recommended size for
> DSEFS can go up to 30 TB per node.
>
> I am pretty sure you are already aware of this option and would be curious
> to get your thoughts on this solution and its limitations.
>
> Note: that would also probably help you with your init-load/TWCS issue .
>
> My2c.
> Cedrick
>
> On Tue, Oct 1, 2019 at 11:49 PM DuyHai Doan  wrote:
>
>> The client wants to be able to access cold data (2 years old) in the
>> same cluster so moving data to another system is not possible
>>
>> However, since we're using Datastax Enterprise, we can leverage Tiered
>> Storage and store old data on Spinning Disks to save on hardware
>>
>> Regards
>>
>> On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau
>>  wrote:
>> >
>> > Hi,
>> > Depending on the use case, you may also consider storage tiering with
>> fresh data on hot-tier (Cassandra) and older data on cold-tier
>> (Spark/Parquet or Presto/Parquet). It would be a lot more complex, but may
>> fit more appropriately the budget and you may reuse some tech already
>> present in your environment.
>> > You may even do subsampling during the transformation offloading data
>> from Cassandra in order to keep one point out of 10 for older data if
>> subsampling makes sense for your data signal.
>> >
>> > Regards
>> > Julien
>> >
>> > On Mon, Sep 30, 2019 at 22:03, DuyHai Doan  wrote:
>> >>
>> >> Thanks all for your reply
>> >>
>> >> The target deployment is on Azure so with the Nice disk snapshot
>> feature, replacing a dead node is easier, no streaming from Cassandra
>> >>
>> >> About compaction overhead, using TwCs with a 1 day bucket and removing
>> read repair and subrange repair should be sufficient
>> >>
>> >> Now the only remaining issue is Quorum read which triggers repair
>> automagically
>> >>
>> >> Before 4.0  there is no flag to turn it off unfortunately
>> >>
>> >> On Sep 30, 2019 at 15:47, "Eric Evans"  wrote:
>> >>
>> >> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa  wrote:
>> >>
>> >> [ ... ]
>> >>
>> >> > 2) The 2TB guidance is old and irrelevant for most people, what you
>> really care about is how fast you can replace the failed machine
>> >> >
>> >> > You’d likely be ok going significantly larger than that if you use a
>> few vnodes, since that’ll help rebuild faster (you’ll stream from more
>> sources on rebuild)
>> >> >
>> >> > If you don’t want to use vnodes, buy big machines and run multiple
>> Cassandra instances in it - it’s not hard to run 3-4TB per instance and
>> 12-16T of SSD per machine
>> >>
>> >> We do this too.  It's worth keeping in mind though that you'll still
>> >> have a 12-16T blast radius in the event of a host failure.  As the
>> >> host density goes up, consider steps to make the host more robust
>> >> (RAID, redundant power supplies, etc).
>> >>
>> >> --
>> >> Eric Evans
>> >> john.eric.ev...@gmail.com
>> >>
>> >> -
>> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> >> For additional commands, e-mail: user-h...@cassandra.apache.org
>> >>
>> >>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
> --
>
>
> Cedrick Lunven | EMEA Developer Advocate Manager
>
>
> <https://www.linkedin.com/in/clunven/> <https://twitter.com/clunven>
> <https://clun.github.io/> <https://github.com/clun/>
>
>
> ❓Ask us your questions : *DataStax Community
> <https://community.datastax.com/index.html>*
>
> Test our new products : *DataStax Labs
> <https://downloads.datastax.com/#labs>*
>
>
>
> <https://constellation.datastax.com/?utm_campaign=FY20Q2_CONSTELLATION_medium=email_source=signature>
>
>
>


Re: Challenge with initial data load with TWCS

2019-10-01 Thread DuyHai Doan
Thanks Alex for confirming

On Sep 30, 2019 at 09:17, "Oleksandr Shulgin"  wrote:

> On Sun, Sep 29, 2019 at 9:42 AM DuyHai Doan  wrote:
>
>> Thanks Jeff for sharing the ideas. I have some question though:
>>
>> - CQLSSTableWriter and explicitly break between windows --> Even if
>> you break between windows, If we have data worth of 1 years it would
>> requires us to use CQLSSTableWriter during 1 year (365 days) because
>> the write time taken into account when flushing to SSTable is the
>> current clock timestamp :
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java#L252-L259
>>
>
> Well, it's not obvious from that block of code alone if one cannot
> override the write time.  But if that's the case, maybe you want to
> subclass it and extend for that possibility.
>
> What we're looking for is a way to load 1 year of data and forcing
>> write timestamp to the past so that the initial loading operation is
>> seen by TWCS as if we have loaded the data normally day by day during
>> 1 year
>>
>> - Use the normal write path for a single window at a time, explicitly
>> calling flush between windows. --> I don't understand how calling
>> flush will trigger windowing in TWCS, as far as I know, it is based on
>> write time. And by the way, can we load data using normal CQL and just
>> forcing the write time to be in the past so that TWCS will trigger
>> compaction properly ?
>>
>
> Exactly.  If you go for this option, you should specify the write time
> explicitly by adding `USING TIMESTAMP :xxx` to your CQL statement.
>
> --
> Alex
>
>


Re: Cluster sizing for huge dataset

2019-10-01 Thread DuyHai Doan
The client wants to be able to access cold data (2 years old) in the
same cluster so moving data to another system is not possible

However, since we're using Datastax Enterprise, we can leverage Tiered
Storage and store old data on Spinning Disks to save on hardware

Regards

On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau
 wrote:
>
> Hi,
> Depending on the use case, you may also consider storage tiering with fresh 
> data on hot-tier (Cassandra) and older data on cold-tier (Spark/Parquet or 
> Presto/Parquet). It would be a lot more complex, but may fit more 
> appropriately the budget and you may reuse some tech already present in your 
> environment.
> You may even do subsampling during the transformation offloading data from 
> Cassandra in order to keep one point out of 10 for older data if subsampling 
> makes sense for your data signal.
>
> Regards
> Julien
>
> On Mon, Sep 30, 2019 at 22:03, DuyHai Doan  wrote:
>>
>> Thanks all for your reply
>>
>> The target deployment is on Azure so with the Nice disk snapshot feature, 
>> replacing a dead node is easier, no streaming from Cassandra
>>
>> About compaction overhead, using TwCs with a 1 day bucket and removing read 
>> repair and subrange repair should be sufficient
>>
>> Now the only remaining issue is Quorum read which triggers repair 
>> automagically
>>
>> Before 4.0  there is no flag to turn it off unfortunately
>>
>> On Sep 30, 2019 at 15:47, "Eric Evans"  wrote:
>>
>> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa  wrote:
>>
>> [ ... ]
>>
>> > 2) The 2TB guidance is old and irrelevant for most people, what you really 
>> > care about is how fast you can replace the failed machine
>> >
>> > You’d likely be ok going significantly larger than that if you use a few 
>> > vnodes, since that’ll help rebuild faster (you’ll stream from more sources 
>> > on rebuild)
>> >
>> > If you don’t want to use vnodes, buy big machines and run multiple 
>> > Cassandra instances in it - it’s not hard to run 3-4TB per instance and 
>> > 12-16T of SSD per machine
>>
>> We do this too.  It's worth keeping in mind though that you'll still
>> have a 12-16T blast radius in the event of a host failure.  As the
>> host density goes up, consider steps to make the host more robust
>> (RAID, redundant power supplies, etc).
>>
>> --
>> Eric Evans
>> john.eric.ev...@gmail.com
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cluster sizing for huge dataset

2019-09-30 Thread DuyHai Doan
Thanks all for your reply

The target deployment is on Azure, so with the nice disk snapshot feature,
replacing a dead node is easier - no streaming from Cassandra

About compaction overhead, using TWCS with a 1-day bucket and removing read
repair and subrange repair should be sufficient
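For reference, a sketch of the table-level read repair knobs being referred to (pre-4.0 option names, applied to a hypothetical table; these options were removed in 4.0):

ALTER TABLE iot.sensor_data
WITH read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.0;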

Now the only remaining issue is the quorum read, which triggers read repair
automagically

Before 4.0 there is no flag to turn it off, unfortunately

On Sep 30, 2019 at 15:47, "Eric Evans"  wrote:

On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa  wrote:

[ ... ]

> 2) The 2TB guidance is old and irrelevant for most people, what you
really care about is how fast you can replace the failed machine
>
> You’d likely be ok going significantly larger than that if you use a few
vnodes, since that’ll help rebuild faster (you’ll stream from more sources
on rebuild)
>
> If you don’t want to use vnodes, buy big machines and run multiple
Cassandra instances in it - it’s not hard to run 3-4TB per instance and
12-16T of SSD per machine

We do this too.  It's worth keeping in mind though that you'll still
have a 12-16T blast radius in the event of a host failure.  As the
host density goes up, consider steps to make the host more robust
(RAID, redundant power supplies, etc).

-- 
Eric Evans
john.eric.ev...@gmail.com

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


Re: Challenge with initial data load with TWCS

2019-09-29 Thread DuyHai Doan
Thanks Jeff for sharing the ideas. I have some questions though:

- CQLSSTableWriter and explicitly break between windows --> Even if
you break between windows, if we have one year's worth of data it would
require us to run CQLSSTableWriter over a full year (365 days), because
the write time taken into account when flushing to the SSTable is the
current clock timestamp:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java#L252-L259

What we're looking for is a way to load 1 year of data while forcing the
write timestamp into the past, so that the initial loading operation is
seen by TWCS as if we had loaded the data normally, day by day, over
1 year.

- Use the normal write path for a single window at a time, explicitly
calling flush between windows. --> I don't understand how calling
flush will trigger windowing in TWCS; as far as I know, it is based on
write time. And by the way, can we load data using normal CQL and just
force the write time to be in the past so that TWCS will trigger
compaction properly?
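A sketch of that second idea (table and values are hypothetical): write the historical rows through normal CQL but force the cell timestamp into the past, in microseconds since the epoch, so that TWCS buckets them into the window they logically belong to:

-- row logically belonging to 2019-01-15; 1547510400000000 is that date in microseconds
INSERT INTO iot.sensor_data (sensor_id, event_time, value)
VALUES ('sensor-001', '2019-01-15 00:00:00+0000', 0x2A)
USING TIMESTAMP 1547510400000000;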

Regards

On Sun, Sep 29, 2019 at 3:51 AM Jeff Jirsa  wrote:
>
>
>
> We used to do either:
>
> - CQLSSTableWriter and explicitly break between windows (then nodetool 
> refresh or sstableloader to push them into the system), or
>
> - Use the normal write path for a single window at a time, explicitly calling 
> flush between windows. You can’t have current data writing while you do your 
> historical load using this method
>
>
>
> > On Sep 28, 2019, at 1:31 PM, DuyHai Doan  wrote:
> >
> > Hello users
> >
> > TWCS works great for permanent state. It creates SSTables of roughly
> > fixed size if your insertion rate is pretty constant.
> >
> > Now the big deal is about the initial load.
> >
> > Let's say we configure a TWCS with window unit = day and window size =
> > 1, we would have 1 SSTable per day and with TTL = 365 days all data
> > would expire after 1 year
> >
> > Now, since the cluster is still empty we need to load data worth of 1
> > year. If we use TWCS and if the loading takes 7 days, we would have 7
> > SSTables, each of them aggregating 365/7 worth of annual data. Ideally
> > we would like TWCS to split these data into 365 distinct SSTables
> >
> > So my question is: how to manage this scenario ? How to perform an
> > initial load for a table using TWCS and make the compaction split
> > nicely the data base on source data timestamp and not insertion
> > timestamp ?
> >
> > Regards
> >
> > Duy Hai DOAN
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cluster sizing for huge dataset

2019-09-29 Thread DuyHai Doan
Thank you Jeff for the hints

We are targeting 20 TB/machine using TWCS and 8 vnodes (using
the new token allocation algorithm). We will also try the new zstd
compression.
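For reference, a sketch of what enabling Zstd on a table looks like in 4.0 (table name and chunk length are illustrative):

ALTER TABLE iot.sensor_data
WITH compression = {'class': 'ZstdCompressor', 'chunk_length_in_kb': '64'};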

As for transient replication, the underlying trade-offs and semantics
are hard to understand for most people (for example, reading at CL
ONE in the face of the loss of the 2 full replicas leads to an unavailable
exception, unlike normal replication), so we will leave it out for the
moment.

Regards

On Sun, Sep 29, 2019 at 3:50 AM Jeff Jirsa  wrote:
>
> A few random thoughts here
>
> 1) 90 nodes / 900T in a cluster isn’t that big. petabyte per cluster is a 
> manageable size.
>
> 2) The 2TB guidance is old and irrelevant for most people, what you really 
> care about is how fast you can replace the failed machine
>
> You’d likely be ok going significantly larger than that if you use a few 
> vnodes, since that’ll help rebuild faster (you’ll stream from more sources on 
> rebuild)
>
> If you don’t want to use vnodes, buy big machines and run multiple Cassandra 
> instances in it - it’s not hard to run 3-4TB per instance and 12-16T of SSD 
> per machine
>
> 3) Transient replication in 4.0 could potentially be worth trying out, 
> depending on your risk tolerance. Doing 2 full and one transient replica may 
> save you 30% storage
>
> 4) Note that you’re not factoring in compression, and some of the recent zstd 
> work may go a long way if your sensor data is similar / compressible.
>
> > On Sep 28, 2019, at 1:23 PM, DuyHai Doan  wrote:
> >
> > Hello users
> >
> > I'm facing with a very challenging exercise: size a cluster with a huge 
> > dataset.
> >
> > Use-case = IoT
> >
> > Number of sensors: 30 millions
> > Frequency of data: every 10 minutes
> > Estimate size of a data: 100 bytes (including clustering columns)
> > Data retention: 2 years
> > Replication factor: 3 (pretty standard)
> >
> > A very quick math gives me:
> >
> > 6 data points / hour * 24 * 365 ~50 000 data points/ year/ sensor
> >
> > In term of size, it is 50 000 x 100 bytes = 5Mb worth of data /year /sensor
> >
> > Now the big problem is that we have 30 millions of sensor so the disk
> > requirements adds up pretty fast: 5 Mb * 30 000 000 = 5Tb * 30 = 150Tb
> > worth of data/year
> >
> > We want to store data for 2 years => 300Tb
> >
> > We have RF=3 ==> 900Tb 
> >
> > Now, according to commonly recommended density (with SSD), one shall
> > not exceed 2Tb of data per node, which give us a rough sizing of 450
> > nodes cluster !!!
> >
> > Even if we push the limit up to 10Tb using TWCS (has anyone tried this
> > ?) We would still need 90 beefy nodes to support this.
> >
> > Any thoughts/ideas to reduce the nodes count or increase density and
> > keep the cluster manageable ?
> >
> > Regards
> >
> > Duy Hai DOAN
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Challenge with initial data load with TWCS

2019-09-28 Thread DuyHai Doan
Hello users

TWCS works great for permanent state. It creates SSTables of roughly
fixed size if your insertion rate is pretty constant.

Now the big deal is about the initial load.

Let's say we configure TWCS with window unit = day and window size =
1: we would have 1 SSTable per day, and with TTL = 365 days all data
would expire after 1 year.

Now, since the cluster is still empty, we need to load 1 year's worth of
data. If we use TWCS and the loading takes 7 days, we would have 7
SSTables, each of them aggregating 365/7 ~ 52 days' worth of data. Ideally
we would like TWCS to split this data into 365 distinct SSTables.

So my question is: how do we manage this scenario? How do we perform an
initial load for a table using TWCS and make compaction split the data
nicely based on the source data timestamp and not the insertion
timestamp?
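For context, a sketch of the kind of table definition being discussed here (names and retention are illustrative):

CREATE TABLE iot.sensor_data (
    sensor_id  text,
    event_time timestamp,
    value      blob,
    PRIMARY KEY ((sensor_id), event_time)
) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': '1'}
  AND default_time_to_live = 31536000;  -- 365 days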

Regards

Duy Hai DOAN

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Cluster sizing for huge dataset

2019-09-28 Thread DuyHai Doan
Hello users

I'm facing a very challenging exercise: sizing a cluster for a huge dataset.

Use-case = IoT

Number of sensors: 30 million
Frequency of data: every 10 minutes
Estimated size of a data point: 100 bytes (including clustering columns)
Data retention: 2 years
Replication factor: 3 (pretty standard)

A very quick calculation gives me:

6 data points/hour * 24 * 365 ~ 50,000 data points/year/sensor

In terms of size, that is 50,000 x 100 bytes = 5 MB worth of data/year/sensor

Now the big problem is that we have 30 million sensors, so the disk
requirements add up pretty fast: 5 MB * 30,000,000 = 150 TB
worth of data/year

We want to store data for 2 years => 300 TB

We have RF=3 ==> 900 TB

Now, according to the commonly recommended density (with SSDs), one should
not exceed 2 TB of data per node, which gives us a rough sizing of a
450-node cluster!

Even if we push the limit up to 10 TB per node using TWCS (has anyone tried
this?), we would still need 90 beefy nodes to support this.

Any thoughts/ideas on reducing the node count or increasing density while
keeping the cluster manageable?

Regards

Duy Hai DOAN

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Is it possible to build multi cloud cluster for Cassandra

2019-09-05 Thread DuyHai Doan
Hello all

I've given a thought to this multi-cloud marketing buzz with Cassandra

Theoretically feasible (with GossipingPropertyFileSnitch) but practically a
headache if you want a minimum of performance and security

The problem comes from the network "devils in the details"

Suppose DC1 in AWS inside a VPC and DC2 in Azure inside a VNet

One needs to allow traffic between both DCs. Of course, apart from obvious
networking stuff like allowing traffic on Security Group, there is much
more to consider.

If your C* nodes have all public IPs, then the inter-node traffic can go
through the Internet, with unpredictable latency and connection bandwidth.
You also need to encrypt this traffic with SSL and manage certificates on
each node (and, more importantly, certificate ROTATION - imagine the surprise
after one year when they expire and your production is down...). Exposing
your C* nodes with public IPs is also risky (hacking, DDoS, ...).

A better solution is to have all your nodes in private VPC and VNet with
private IPs and the big question is how to route the traffic between 2
cloud providers using their network backbone without going through the
Internet

As far as I know (some may correct me if I'm wrong) there is no direct
backbone connection between AWS and Azure so you may need to ask a 3rd
party ISP like Equinix to provide this kind of link. Basically from the AWS
VPC you have a Direct Connect to Equinix, then from Equinix an Express
Route to Azure VNet. This is technically possible, complex to implement,
and especially very expensive... Of course, the SSL certificates are also
required in this scenario unless you absolutely trust your ISP (which you
shouldn't)

Another thing to consider is the outgoing traffic. Indeed when running
repair, if you have a lot of de-synchronized data between both DCs or if
you undergo a lot of over-streaming, the bill for network traffic can also
be substantial. Most cloud providers don't charge for Data In, but for Data
Out :-)

All this blabla is about inter-node traffic, I'm not even talking about
client/server traffic, good luck!

Regards



On Thu, Sep 5, 2019 at 8:22 PM Goutham reddy 
wrote:

> Thanks Jon that explained me everything.
>
> On Thu, Sep 5, 2019 at 10:00 AM Jon Haddad  wrote:
>
>> Technically, not a problem.  Use GossipingPropertyFileSnitch to keep
>> things simple and you can go across whatever cloud providers you want
>> without issue.
>>
>> The biggest issue you're going to have isn't going to be Cassandra, it's
>> having the expertise in the different cloud providers to understand their
>> strengths and weaknesses.  You'll want to benchmark every resource, and
>> properly sizing your instances to C* is now 2x (or 3x for 3 cloud
>> providers) the work.
>>
>> I recommend using Terraform to make provisioning a bit easier.
>>
>> On Thu, Sep 5, 2019 at 9:36 AM Goutham reddy 
>> wrote:
>>
>>> Hello,
>>> Is it wise and advisable to build a multi-cloud environment for Cassandra
>>> for high availability?
>>> AWS as one datacenter and Azure as another datacenter.
>>> If yes are there any challenges involved?
>>>
>>> Thanks and regards,
>>> Goutham.
>>>
>>


Re: Using Cassandra as an object store

2019-04-19 Thread DuyHai Doan
Idea:

To guarantee data integrity, you can store an MD5 of all the chunks' data as
a static column in the partition that contains the chunks.
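A sketch of such a chunked layout with the digest kept as a static column (schema and values are illustrative):

CREATE TABLE object_store (
    object_id  text,
    chunk_id   int,
    data       blob,
    md5        text static,   -- one digest per object, shared by all its chunks
    PRIMARY KEY ((object_id), chunk_id)
);

-- each chunk is one row inside the object's partition
INSERT INTO object_store (object_id, chunk_id, data) VALUES ('img-123', 0, 0xCAFE);
INSERT INTO object_store (object_id, chunk_id, data) VALUES ('img-123', 1, 0xBABE);
-- the static digest is written once for the whole object
INSERT INTO object_store (object_id, md5) VALUES ('img-123', 'd41d8cd98f00b204e9800998ecf8427e');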

On Fri, Apr 19, 2019 at 9:18 AM cclive1601你  wrote:

> We have used Cassandra as an object store for some years. You can just split
> the object into small pieces: the object gets a PK, and each of the small
> pieces gets its own PK. The object's PK and the pieces' PKs are stored in a
> meta table in Cassandra, and the pieces themselves are stored in a data table.
> We store videos, pictures and other unstructured data.
>
> Gene  于2019年4月19日周五 下午1:25写道:
>
>> Howdy
>>
>> I'm looking at the possibility of using cassandra as an object store to
>> offload image/blob data from an Oracle database.  I've seen mentions of it
>> being used as an object store in a large scale fashion, like with Walmart:
>>
>>
>> https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593
>>
>> However I have found little on small scale setups and if it's even worth
>> using Cassandra in place of something else that's meant to be used for
>> object storage, like Ceph.
>>
>> Additionally, I've read that cassandra struggles with storing objects
>> 10MB or larger and it's recommended to break objects up into smaller
>> chunks, which either requires some kind of middleware between our
>> application and cassandra, or it would require our application to split
>> objects into smaller chunks and recombine them as needed.
>>
>> I've looked into pithos and astyanax, but those are both no longer
>> developed and I'm not seeing anything that might replace them in the long
>> term.
>>
>> https://github.com/exoscale/pithos
>> https://github.com/Netflix/astyanax
>>
>> Any helpful information or advice would be greatly appreciated.
>>
>> Thanks in advance.
>>
>> -Gene
>>
>
>
> --
> you are the apple of my eye !
>


Re: Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread DuyHai Doan
Ok thanks John

On Thu, Feb 14, 2019 at 8:51 PM Jonathan Haddad  wrote:

> Create the first node, setting the tokens manually.
> Create the keyspace.
> Add the rest of the nodes with the allocate tokens uncommented.
>
> On Thu, Feb 14, 2019 at 11:43 AM DuyHai Doan  wrote:
>
>> Hello users
>>
>> By looking at the mailing list archive, there was already some questions
>> about the flag "allocate_tokens_for_keyspace" from cassandra.yaml
>>
>> I'm starting a fresh new cluster (with 0 data).
>>
>> The keyspace used by the project is raw_data so I
>> set allocate_tokens_for_keyspace = raw_data in the cassandra.yaml
>>
>> However the cluster fails to start, the keyspace does not exist yet (of
>> course, it is not yet created).
>>
>> So to me it is like chicken and egg issue:
>>
>> 1. You create a fresh new cluster with the option
>> "allocate_tokens_for_keyspace" commented out, in this case you cannot
>> optimize the token allocations
>> 2. You create a fresh new cluster with option
>> "allocate_tokens_for_keyspace" pointing to a not-yet created keyspace, it
>> fails (logically)
>>
>> The third option is:
>>
>>  a. create a new cluster with "allocate_tokens_for_keyspace" commented out
>>  b. create the keyspace "raw_data"
>>  c. set allocate_tokens_for_keyspace = raw_data
>>
>> My question is, since after step a. the token allocation is *already*
>> done, what's the point setting the flag in step c. 
>>
>> Regards
>>
>> Duy Hai DOAN
>>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
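A sketch of the sequence Jon describes above, with the cassandra.yaml settings shown as comments (keyspace name, replication and token count are illustrative):

-- Node 1 cassandra.yaml: pick the tokens by hand, allocation flag left commented out
--   num_tokens: 8
--   initial_token: <comma-separated list of 8 tokens>
--   # allocate_tokens_for_keyspace: raw_data

-- Once node 1 is up, create the target keyspace:
CREATE KEYSPACE raw_data
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3'};

-- Remaining nodes' cassandra.yaml: let the allocator optimize for that keyspace
--   num_tokens: 8
--   allocate_tokens_for_keyspace: raw_data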


Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread DuyHai Doan
Hello users

Looking through the mailing list archive, there have already been some questions
about the flag "allocate_tokens_for_keyspace" in cassandra.yaml.

I'm starting a fresh new cluster (with 0 data).

The keyspace used by the project is raw_data so I
set allocate_tokens_for_keyspace = raw_data in the cassandra.yaml

However the cluster fails to start, the keyspace does not exist yet (of
course, it is not yet created).

So to me it looks like a chicken-and-egg issue:

1. You create a fresh new cluster with the option
"allocate_tokens_for_keyspace" commented out, in this case you cannot
optimize the token allocations
2. You create a fresh new cluster with option
"allocate_tokens_for_keyspace" pointing to a not-yet created keyspace, it
fails (logically)

The third option is:

 a. create a new cluster with "allocate_tokens_for_keyspace" commented out
 b. create the keyspace "raw_data"
 c. set allocate_tokens_for_keyspace = raw_data

My question is: since after step a. the token allocation is *already* done,
what's the point of setting the flag in step c.?

Regards

Duy Hai DOAN


Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread DuyHai Doan
The plain answer is NO.

There is a slight hope that the JIRA
https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into the 4.0 release.

But right now, there seems to be little interest in this ticket; the last
comment dates from 23/Feb/2017...


On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov 
wrote:

> Hi all,
>
> The question.
>
> We have Cassandra 3.11.1 with really heavy primary partitions: per
> cfhistograms, the 95th percentile partition size is 130+ MB, the 95th
> percentile cell count is 3.3M and higher, and the 98th percentile partition
> size is 220+ MB; sometimes partitions are 1+ GB. We have regular
> problems with node lockdowns leading to read request timeouts under read
> request load.
>
> Changing primary partition key structure is out of question.
>
> Are there any sharding techniques available to dilute partitions at a level
> below the 'select' requests, to improve read performance without
> changing the read request syntax?
>
> Thank you all in advance,
> Vsevolod Filaretov.
>


Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
Thanks for the pointer, Jeff.

On Mon, Feb 11, 2019 at 9:40 PM Jeff Jirsa  wrote:

> There's a bit of headache around overlapping sstables being strictly safe
> to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was
> added to allow the "I know it's not technically safe, but just delete it
> anyway" use case. For a lot of people who started using TWCS before 13418,
> "stop cassandra, remove stuff we know is expired, start cassandra" is a
> not-uncommon pattern in very high-write, high-disk-space use cases.
>
>
>
> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth 
> wrote:
>
>> Hi,
>> In regards to comment “Purging data is also straightforward, just
>> dropping SSTables (by a script) where create date is older than a
>> threshold, we don't even need to rely on TTL”
>>
>> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds
>> past whole sstable will have only tombstones.
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>>
>> Purging data is also straightforward, just dropping SSTables (by a
>> script) where create date is older than a threshold, we don't even need to
>> rely on TTL
>>
>>


Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
No worries about overlapping: the use case is events/time series and there
is almost no delay, so it should be fine.

On a side note, since we are guaranteed to have 1 SSTable per day of
ingestion, it is very easy to "emulate" incremental backups: you just need
to find the generated SSTable with the latest creation date and back it up
every day at midnight with a script.

Purging data is also straightforward - just drop the SSTables (by a script)
whose creation date is older than a threshold; we don't even need to rely on
TTL.



On Mon, Feb 11, 2019 at 9:19 PM Jeff Jirsa  wrote:

> Wild ass guess based on a large use case I knew about at the time
>
> If you go above that, I expect it’d largely be fine as long as you were
> sure they weren’t overlapping so reads only ever touched a small subset of
> the windows (ideally 1).
>
> If you have one day windows and every read touches all of the windows,
> you’re going to have a bad time.
>
> --
> Jeff Jirsa
>
>
> On Feb 11, 2019, at 12:12 PM, DuyHai Doan  wrote:
>
> Hello users
>
> On the official documentation for TWCS (
> http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
> it is advised to select the windows unit and size so that the total number
> of windows intervals is around 20-30.
>
> Is there any explanation for this range of 20-30 ? What if we exceed this
> range, let's say having 1 day windows and keeping data for 1year, thus
> having indeed 356 intervals ? What can go wrong with this ?
>
> Regards
>
> Duy Hai DOAN
>
>


Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
Hello users

On the official documentation for TWCS (
http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
it is advised to select the windows unit and size so that the total number
of windows intervals is around 20-30.

Is there any explanation for this range of 20-30? What if we exceed this
range, let's say having 1-day windows and keeping data for 1 year, thus
having 365 intervals? What can go wrong with this?

Regards

Duy Hai DOAN


Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
The idea of storing your data as a single blob can be dangerous.

Indeed, you lose the ability to perform atomic updates on each column.

In Cassandra, last-write-wins (LWW) is the rule. Suppose 2 concurrent updates
on the same row: the 1st update changes column Firstname (let's say it's a
Person record) and the 2nd update changes column Lastname.

Now depending on the timestamp between the 2 updates, you'll have:

- old Firstname, new Lastname
- new Firstname, old Lastname

Updating the columns individually guarantees you end up with new Firstname,
new Lastname.
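A small CQL illustration of the point (table and values are hypothetical): with separate columns, each cell keeps its own write timestamp, so two concurrent single-column updates merge instead of one erasing the other - which is not the case when the whole record is a single blob.

CREATE TABLE person (
    id        uuid PRIMARY KEY,
    firstname text,
    lastname  text
);

-- client A at timestamp t1
UPDATE person SET firstname = 'Alice'  WHERE id = 00000000-0000-0000-0000-000000000001;
-- client B at timestamp t2, concurrently
UPDATE person SET lastname  = 'Martin' WHERE id = 00000000-0000-0000-0000-000000000001;

-- both changes survive: last-write-wins is resolved per cell, not per row
SELECT firstname, lastname FROM person
WHERE id = 00000000-0000-0000-0000-000000000001;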

On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad  wrote:

> Those are two different cases though.  It *sounds like* (again, I may be
> missing the point) you're trying to overwrite a value with another value.
> You're either going to serialize a blob and overwrite a single cell, or
> you're going to overwrite all the cells and include a tombstone.
>
> When you do a read, reading a single tombstone vs a single vs is
> essentially the same thing, performance wise.
>
> In your description you said "~ 20-100 events", and you're overwriting the
> event each time, so I don't know how you go to 10K tombstones either.
> Compaction will bring multiple tombstones together for a cell in the same
> way it compacts multiple values for a single cell.
>
> I sounds to make like you're taking some advice about tombstones out of
> context and trying to apply the advice to a different problem.  Again, I
> might be misunderstanding what you're doing.
>
>
> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos 
> wrote:
>
>> Hello Jon,
>>
>> I thought having tombstones is a much higher overhead than just overwriting
>> values. The compaction overhead can be similar, but I think the read
>> performance is much worse.
>>
>> Tombstones accumulate and hang around for 10 days (by default) before they
>> are eligible for compaction.
>>
>> Also we have tombstone warning and error thresholds. If cassandra scans
>> more than 10 000 tombstones, she will abort the query.
>>
>> According to this article:
>> https://opencredo.com/blogs/cassandra-tombstones-common-issues/
>>
>> "The cassandra.yaml comments explain in perfectly: *“When executing a
>> scan, within or across a partition, we need to keep the tombstones seen in
>> memory so we can return them to the coordinator, which will use them to
>> make sure other replicas also know about the deleted rows. With workloads
>> that generate a lot of tombstones, this can cause performance problems and
>> even exhaust the server heap. "*
>>
>> Regards,
>> Tomas
>>
>> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad wrote:
>>> If you're overwriting values, it really doesn't matter much if it's a
>>> tombstone or any other value, they still need to be compacted and have the
>>> same overhead at read time.
>>>
>>> Tombstones are problematic when you try to use Cassandra as a queue (or
>>> something like a queue) and you need to scan over thousands of tombstones
>>> in order to get to the real data.  You're simply overwriting a row and
>>> trying to avoid a single tombstone.
>>>
>>> Maybe I'm missing something here.  Why do you think overwriting a single
>>> cell with a tombstone is any worse than overwriting a single cell with a
>>> value?
>>>
>>> Jon
>>>
>>>
>>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos 
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I believe your approach is the same as using Spark with
>>>> "spark.cassandra.output.ignoreNulls=true".
>>>> This will not cover the situation when a value has to be overwritten
>>>> with null.
>>>>
>>>> I found one possible solution - change the schema to keep only primary
>>>> key fields and move all other fields to a frozen UDT:
>>>> create table (year, month, day, id, event frozen<...>, primary key((year,
>>>> month, day), id))
>>>> In this way anything that is null inside the event doesn't create a
>>>> tombstone, since the event is serialized to a BLOB.
>>>> The penalty is the need to deserialize the whole Event when selecting
>>>> only a few columns.
>>>> Can anyone confirm whether this is a good solution performance-wise?
>>>>
>>>> Thank you,
>>>>
>>>>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan wrote:
>>>>> "The problem is I can't know the combination of set/unset values" -->
>>>>> Just for this requirement, Achilles has a working solution for many years
>>>>>

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
"The problem is I can't know the combination of set/unset values" --> Just
for this requirement, Achilles has a working solution for many years using
INSERT_NOT_NULL_FIELDS strategy:

https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy

Or you can use the Update API, which by design only performs updates on
non-null fields:
https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity


Behind the scenes, for each new combination of columns in an INSERT INTO
table(x,y,z) statement, Achilles checks its prepared statement cache and, if
the statement does not exist yet, creates a new prepared statement and puts it
into the cache for later reuse.

Disclaimer: I'm the creator of Achilles.



On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos 
wrote:

> Hello,
>
> The problem is I can't know the combination of set/unset values. From my
> perspective every value should be set. The event from Kafka represents the
> complete state of the happening at a certain point in time. In my table I
> want to store the latest event, i.e. the most recent state of the happening
> (in this table I don't care about the history). Actually I used the wrong
> expression, since it's just the opposite of an "incremental update": every
> event carries all the data (state) for a specific point in time.
>
> The event is represented with nested json structure. Top level elements of
> the json are table fields with type like text, boolean, timestamp, list and
> the nested elements are UDT fields.
>
> Simplified example:
> There is a new purchase for the happening, event:
> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
> I don't know what actually happened for this event, maybe there is a new
> item purchased, maybe some customer info has been changed, maybe the
> specials have been revoked and I have to reset them. I just need to store
> the state as it arrived from Kafka; there might already be an event for
> this happening saved before, or maybe this is the first one.
>
> BR,
> Tomas
>
>
> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens 
>> Depending on the use case, creating separate prepared statements for each
>> combination of set / unset values in large INSERT/UPDATE statements may be
>> prohibitive.
>>
>> Instead, you can look into driver level support for UNSET values.
>> Requires Cassandra 2.2 or later IIRC.
>>
>> See:
>> Java Driver:
>> https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
>> Python Driver:
>> https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
>> Node Driver:
>> https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
>>
>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <
>> sean_r_dur...@homedepot.com> wrote:
>>
>>> You say the events are incremental updates. I am interpreting this to
>>> mean only some columns are updated. Others should keep their original
>>> values.
>>>
>>> You are correct that inserting null creates a tombstone.
>>>
>>> Can you only insert the columns that actually have new values? Just skip
>>> the columns with no information. (Make the insert generator a bit smarter.)
>>>
>>> Create table happening (id text primary key, event text, a text, b text,
>>> c text);
>>> Insert into table happening (id, event, a, b, c) values
>>> ("MainEvent","The most complete info we have right now","Priceless","10
>>> pm","Grand Ballroom");
>>> -- b changes
>>> Insert into happening (id, b) values ("MainEvent","9:30 pm");
>>>
>>>
>>> Sean Durity
>>>
>>>
>>> -Original Message-
>>> From: Tomas Bartalos 
>>> Sent: Thursday, December 27, 2018 9:27 AM
>>> To: user@cassandra.apache.org
>>> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>>>
>>> Hello,
>>>
>>> I’d start with describing my use case and how I’d like to use Cassandra
>>> to solve my storage needs.
>>> We're processing a stream of events for various happenings. Every event
>>> have a unique happening_id.
>>> One happening may have many events, usually ~ 20-100 events. I’d like to
>>> store only the latest event for the same happening (Event is an incremental
>>> update and it contains all up-to date data about happening).
>>> Technically the events are streamed from Kafka, processed with Spark and
>>> saved to Cassandra.
>>> In Cassandra we use upserts (insert with same primary key).  So far so
>>> good, however there comes the tombstone...
>>>
>>> When I’m inserting a field with a NULL value, Cassandra creates a tombstone
>>> for this field. As I understand it, this is for space efficiency: Cassandra
>>> doesn’t have to remember that there is a NULL value, she just deletes the
>>> respective column - and a delete creates a ... tombstone.
>>> I was hoping there could be an option to tell Cassandra not to be so
>>> space effective and store “unset" info without generating 

Re: Tombstone removal optimization and question

2018-11-06 Thread DuyHai Doan
Thanks for the confirmation Kurt

On Nov 6, 2018 at 11:59, "kurt greaves"  wrote:

> Yes it does. Consider if it didn't and you kept writing to the same
> partition, you'd never be able to remove any tombstones for that partition.
>
> On Tue., 6 Nov. 2018, 19:40 DuyHai Doan 
>> Hello all
>>
>> I have tried to sum up all rules related to tombstone removal:
>>
>> 
>> --
>>
>> Given a tombstone written at timestamp (t) for a partition key (P) in
>> SSTable (S1). This tombstone will be removed:
>>
>> 1) after gc_grace_seconds period has passed
>> 2) at the next compaction round, if SSTable S1 is selected (not at all
>> guaranteed because compaction is not deterministic)
>> 3) if the partition key (P) is not present in any other SSTable that is
>> NOT picked by the current round of compaction
>>
>> Rule 3) is quite complex to understand so here is the detailed
>> explanation:
>>
>> If Partition Key (P) also exists in another SSTable (S2) that is NOT
>> compacted together with SSTable (S1), if we remove the tombstone, there is
>> some data in S2 that may resurrect.
>>
>> Precisely, at compaction time, Cassandra does not have ANY detail about
>> Partition (P) that stays in S2 so it cannot remove the tombstone right away.
>>
>> Now, for each SSTable, we have some metadata, namely minTimestamp and
>> maxTimestamp.
>>
>> I wonder if the current compaction optimization does use/leverage this
>> metadata for tombstone removal. Indeed if we know that tombstone timestamp
>> (t) < minTimestamp, it can be safely removed.
>>
>> Does someone has the info ?
>>
>> Regards
>>
>>
>>


Tombstone removal optimization and question

2018-11-06 Thread DuyHai Doan
Hello all

I have tried to sum up all rules related to tombstone removal:

--

Given a tombstone written at timestamp (t) for a partition key (P) in
SSTable (S1). This tombstone will be removed:

1) after gc_grace_seconds period has passed
2) at the next compaction round, if SSTable S1 is selected (not at all
guaranteed because compaction is not deterministic)
3) if the partition key (P) is not present in any other SSTable that is NOT
picked by the current round of compaction

Rule 3) is quite complex to understand so here is the detailed explanation:

If partition key (P) also exists in another SSTable (S2) that is NOT
compacted together with SSTable (S1), then, if we remove the tombstone,
some data in S2 may resurrect.

Precisely, at compaction time, Cassandra does not have ANY detail about
Partition (P) that stays in S2 so it cannot remove the tombstone right away.

Now, for each SSTable, we have some metadata, namely minTimestamp and
maxTimestamp.

I wonder whether the current compaction logic uses/leverages this
metadata for tombstone removal. Indeed, if we know that the tombstone
timestamp (t) < minTimestamp, it can be safely removed.

Does someone have the info?

Regards


Re: comprehensive list of checks before rolling version upgrades

2018-10-30 Thread DuyHai Doan
To add to your excellent list:

- no topology changes (joining/leaving/decommissioning nodes)
- no rebuild of an index/MV under way

On Tue, Oct 30, 2018 at 4:35 PM Carl Mueller
 wrote:

> Does anyone have a pretty comprehensive list of these? Many that I don't
> currently know how to check but I'm researching...
>
> I've seen:
>
> - verify disk space available for snapshot + sstablerewrite
> - gossip state agreement, all nodes are healthy
> - schema state agreement
> - ability to access all the nodes
> - no repairs, upgradesstables, and cleans underway
> - read repair/hinted handoff is not backed up
>
> Other possibles:
> - repair state? can we get away with unrepaired data?
> - pending tasks?
> - streaming state/tasks?
>
>


Re: Aggregation of Set Data Type

2018-10-23 Thread DuyHai Doan
You will need to use user-defined aggregates (UDAs) for this.
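A rough, untested sketch of such an aggregate for a set<text> column (names are illustrative; Java UDFs require enable_user_defined_functions: true in cassandra.yaml):

CREATE OR REPLACE FUNCTION set_union_state(state set<text>, val set<text>)
CALLED ON NULL INPUT
RETURNS set<text>
LANGUAGE java
AS $$
    // merge the incoming set into the accumulated state, treating null as the empty set
    java.util.Set<String> result = new java.util.HashSet<String>();
    if (state != null) { result.addAll(state); }
    if (val != null) { result.addAll(val); }
    return result;
$$;

CREATE OR REPLACE AGGREGATE set_union(set<text>)
SFUNC set_union_state
STYPE set<text>;

-- usage, grouping on the partition key:
SELECT my_key_column, set_union(my_set_column)
FROM my_table
GROUP BY my_key_column;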

On Oct 23, 2018 at 16:46, "Joseph Wonesh"  wrote:

> Hello all,
>
>  I am trying to aggregate rows which each contain a set-typed column.
> I would like the result to contain the sum of all sets, where null would be
> equivalent to the empty set. I expected a query like: "select
> sum(my_set_column) from my_table group by my_key_column" to do this, but
> the set type is not supported by this aggregate. Does anyone know of a way
> to aggregate this using existing cassandra built-ins? Thanks!
>
> This message is private and confidential. If you have received message in
> error, please notify us and remove from your system.


Re: Released an ACID-compliant transaction library on top of Cassandra

2018-10-16 Thread DuyHai Doan
I think it does use LWT under the hood:

https://github.com/scalar-labs/scalardb/blob/master/src/main/java/com/scalar/database/transaction/consensuscommit/CommitMutationComposer.java#L74-L79

return new Put(base.getPartitionKey(), getClusteringKey(base,
result).orElse(null))
.forNamespace(base.forNamespace().get())
.forTable(base.forTable().get())
.withConsistency(Consistency.LINEARIZABLE)
.withCondition(
new PutIf(
new ConditionalExpression(ID, toIdValue(id), Operator.EQ),
new ConditionalExpression(
STATE, toStateValue(TransactionState.PREPARED),
Operator.EQ)))
.withValue(Attribute.toCommittedAtValue(current))
.withValue(Attribute.toStateValue(TransactionState.COMMITTED));



On Tue, Oct 16, 2018 at 6:40 PM sankalp kohli 
wrote:

> What License did you use? Can we please use Apache 2.0?
>
> On Tue, Oct 16, 2018 at 9:39 AM sankalp kohli 
> wrote:
>
>> This is awesome and thanks for working on it.
>>
>> On Tue, Oct 16, 2018 at 9:37 AM Ariel Weisberg  wrote:
>>
>>> Hi,
>>>
>>> Yes this does sound great. Does this rely on Cassandra's internal SERIAL
>>> consistency and CAS functionality or is that implemented at a higher level?
>>>
>>> Regards,
>>> Ariel
>>>
>>> On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote:
>>> > This is great!
>>> >
>>> > --
>>> > Jeff Jirsa
>>> >
>>> >
>>> > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada 
>>> wrote:
>>> > >
>>> > > Hi all,
>>> > >
>>> > > # Sorry, I accidentally emailed the following to dev@, so
>>> re-sending to here.
>>> > >
>>> > > We have been working on ACID-compliant transaction library on top of
>>> > > Cassandra called Scalar DB,
>>> > > and are pleased to announce the release of v.1.0 RC version in open
>>> source.
>>> > >
>>> > > https://github.com/scalar-labs/scalardb/
>>> > >
>>> > > Scalar DB is a library that provides a distributed storage
>>> abstraction
>>> > > and client-coordinated distributed transaction on the storage,
>>> > > and makes non-ACID distributed database/storage ACID-compliant.
>>> > > And Cassandra is the first supported database implementation.
>>> > >
>>> > > It's been internally tested intensively and is jepsen-passed.
>>> > > (see jepsen directory for more detail)
>>> > > If you are looking for ACID transaction capability on top of
>>> cassandra,
>>> > > Please take a look and give us a feedback or contribution.
>>> > >
>>> > > Best regards,
>>> > > Hiroyuki Yamada
>>> > >
>>> > > -
>>> > > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> > > For additional commands, e-mail: user-h...@cassandra.apache.org
>>> > >
>>> >
>>> > -
>>> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> > For additional commands, e-mail: user-h...@cassandra.apache.org
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>


Re: About UDF/UDA

2018-09-27 Thread DuyHai Doan
I'm afraid you cannot have proper tabular formatting or an expansion into
multiple rows (which would significantly change the semantics).

Indeed, the result of the final function is returned by CQL as a single
column value, and currently there is no way to change the output formatting.


On Thu, Sep 27, 2018 at 6:55 AM, Riccardo Ferrari 
wrote:

> Thank you Doan,
>
>  Indeed I'm using a FINALFUNC to compute the average already.
> Bit more context, I'm working on bucketized data, each bucket has already
> an 'event count' and an 'average' in it.
> My functions look as follow:
>
> //SFUNC
> CREATE OR REPLACE FUNCTION summaryState(state map<text, frozen<tuple<bigint, bigint, bigint, bigint>>>,
> name text, avgloadtime int, eventcount int)
> CALLED ON NULL INPUT
> RETURNS map<text, frozen<tuple<bigint, bigint, bigint, bigint>>>
> LANGUAGE java
> AS $$
> if (name != null) {
> com.datastax.driver.core.TupleValue stats =
> (com.datastax.driver.core.TupleValue)state.get(name);
>
> if (stats == null) {
> com.datastax.driver.core.TupleType statsType = com.datastax.driver.core.TupleType.of(
>     com.datastax.driver.core.ProtocolVersion.NEWEST_SUPPORTED,
>     com.datastax.driver.core.CodecRegistry.DEFAULT_INSTANCE,
>     com.datastax.driver.core.DataType.bigint(),
>     com.datastax.driver.core.DataType.bigint(),
>     com.datastax.driver.core.DataType.bigint(),
>     com.datastax.driver.core.DataType.bigint());
>
> stats = statsType.newValue(Long.MAX_VALUE, 0L, 0L, 0L);
> }
>
> //Track min
> Long min_ = (Long) stats.getLong(0);
> min_ = min_ < avgloadtime ?  min_ : avgloadtime;
> stats.setLong(0, min_);
>
> //Track max
> Long max_ = (Long) stats.getLong(1);
> max_ = max_ > avgloadtime ?  max_ : avgloadtime;
> stats.setLong(1, max_);
>
> //Unroll average
> Long avgSum = (Long) stats.getLong(2);
> avgSum = avgSum + avgloadtime;
> stats.setLong(2, avgSum);
>
> //Event count
> Long sampleSum = (Long) stats.getLong(3);
> sampleSum = sampleSum + eventcount;
> stats.setLong(3, sampleSum);
>
> state.put(name, stats);
> }
> return state;
> $$;
>
> //FINALFUNC
> CREATE OR REPLACE FUNCTION summaryFinal (state map<text, frozen<tuple<bigint, bigint, bigint, bigint>>>)
> CALLED ON NULL INPUT
> RETURNS map<text, frozen<tuple<bigint, bigint, bigint, bigint>>>
> LANGUAGE java
> AS $$
> for (Object name : state.keySet()) {
> com.datastax.driver.core.TupleValue stats =
> (com.datastax.driver.core.TupleValue) state.get(name);
>
> Long avgSum = stats.getLong(2);
> Long sampleSum = stats.getLong(3);
>
> // Workaround: I can't escape the '/' using CQL and had to use Math.pow
> double avg_ = avgSum * Math.pow(sampleSum, -1);
> stats.setLong(2, new Double(avg_).longValue());
>
> state.put(name, stats);
> }
> return state;
> $$;
> //AGGREGATE
> CREATE OR REPLACE AGGREGATE summary(text, int, int)
> SFUNC summaryState
> STYPE map<text, frozen<tuple<bigint, bigint, bigint, bigint>>>
> FINALFUNC summaryFinal
> INITCOND {};
>
> This gives me the following output:
> .summary(event, averageloadtime, count)
> 
> --
>  {'': (365, 870, 617, 2), ''': (381, 11668, 6024, 2)}
>
> I would like to have something like:
> | item| min  | max| average | count |
> ---
> |   | 365 | 870 | 617| 2|
> |   | 381 | 11668 | 6024  | 2|
>
> Do you know if that is possible?
>
> On Wed, Sep 26, 2018 at 10:21 PM DuyHai Doan  wrote:
>
>> A hint to answer your Q3 is to use a final function to perform the
>> flattening or transformation on the result of the aggregation
>>
>> The syntax of an UDA is:
>>
>> CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
>> aggregateName(type1, type2, …)
>> SFUNC accumulatorFunction
>> STYPE stateType
>> [FINALFUNC finalFunction]
>> INITCOND initCond;
>>
>>
>> The final return type will be the return type of the FINALFUNC and not
>> necessarily the stateType
>>
>> More details by reading my blog post on it: http://www.doanduyhai.com/
>> blog/?p=1876
>>
>> On Wed, Sep 26, 2018 at 3:58 PM, Riccardo Ferrari 
>> wrote:
>>
>>> Hi users!
>>>
>>> Given my Cassandra version 3.0.x I don't have the famous GROUP BY
>>> operator available. So look

Re: About UDF/UDA

2018-09-26 Thread DuyHai Doan
A hint to answer your Q3 is to use a final function to perform the
flattening or transformation on the result of the aggregation

The syntax of an UDA is:

CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
aggregateName(type1, type2, …)
SFUNC accumulatorFunction
STYPE stateType
[FINALFUNC finalFunction]
INITCOND initCond;


The final return type will be the return type of the FINALFUNC and not
necessarily the stateType

More details by reading my blog post on it:
http://www.doanduyhai.com/blog/?p=1876
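
As a minimal, self-contained sketch (names are illustrative and not taken from the blog post), here is an average aggregate whose FINALFUNC returns a double while the state type is a tuple:

CREATE OR REPLACE FUNCTION avgState(state tuple<int, bigint>, val int)
CALLED ON NULL INPUT
RETURNS tuple<int, bigint>
LANGUAGE java
AS $$
  if (val != null) {
    state.setInt(0, state.getInt(0) + 1);        // running count
    state.setLong(1, state.getLong(1) + val);    // running sum
  }
  return state;
$$;

CREATE OR REPLACE FUNCTION avgFinal(state tuple<int, bigint>)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS $$
  if (state.getInt(0) == 0) return 0d;
  return (double) state.getLong(1) / state.getInt(0);
$$;

CREATE OR REPLACE AGGREGATE average(int)
SFUNC avgState
STYPE tuple<int, bigint>
FINALFUNC avgFinal
INITCOND (0, 0);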

On Wed, Sep 26, 2018 at 3:58 PM, Riccardo Ferrari 
wrote:

> Hi users!
>
> Given my Cassandra version 3.0.x I don't have the famous GROUP BY operator
> available. So looking around I turned to UDAs.
>
> I'm aware all/most of the magic happens on the coordinator and the plan is
> to keep the data volume low to avoid too much pressure.
>
> Q1: How much is "low volume"? Obviously the answer is "it depends", but has
> anyone some experience to share?
>
> Q2: Do I understand correctly that it does not support pagination?
>
> I need something as simple as extract `min`, `max`, `average` and  `count`
> per group where I don't know the actual group - I can't fire a query per
> each group name. - so something like `SELECT my_uda(field1, field2) WHERE
> ...;`
> This leads to:
> - a function that tracks min, max and sum up count and average. The state
> is a tuple
> - a final function that computes the average.
> - the aggregate function that uses the previous two
> the result is something like
> | 'item': (min_value, max_value, avg_value, count) , 'item2': (...),  ...|
> Q3: Is there a way to `flatten` or `explode` the result into multiple
> lines ?
> If Q3 answer is yes: Is there a way to create multiple columns out of the
> result:
> ||other_fileds | item | min | max | avg | count||
>
> BONUS: Are there altenative? Should I really take into account upgrading
> to 3.11.X ?
> Thanks!
>


Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Also for the record, I remember Datastax having something called Tiered
Storage that does move data around (folders/disk volume) based on data age.
To be checked

On Mon, Sep 17, 2018 at 10:23 PM, DuyHai Doan  wrote:

> Sean
>
> Without transactions à la SQL, how can you guarantee atomicity between
> both tables for upserts ? I mean, one write could succeed with hot table
> and fail for cold table
>
> The only solution I see is using logged batch, with a huge overhead and
> perf hit on for the writes
>
> On Mon, Sep 17, 2018 at 8:28 PM, Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>> An idea:
>>
>> On initial insert, insert into 2 tables:
>> Hot with short TTL
>> Cold/archive with a longer (or no) TTL
>> Then your hot data is always in the same table, but being expired. And
>> you can access the archive table only for the more rare circumstances. Then
>> you could have the HOT table on a different volume of faster storage. If
>> the hot/cold tables are in different keyspaces, then you could also have
>> different replication (a HOT DC and an archive DC, for example)
>>
>>
>> Sean Durity
>>
>>
>> -Original Message-
>> From: Mateusz 
>> Sent: Friday, September 14, 2018 2:40 AM
>> To: user@cassandra.apache.org
>> Subject: [EXTERNAL] Re: cold vs hot data
>>
>> On Friday, 14 September 2018 02:46:43 CEST Alaa Zubaidi (PDF) wrote:
>> > The data can grow to +100TB however the hot data will be in most cases
>> > less than 10TB but we still need to keep the rest of data accessible.
>> > Anyone has this problem?
>> > What is the best way to make the cluster more efficient?
>> > Is there a way to somehow automatically move the old data to different
>> > storage (rack, dc, etc)?
>> > Any ideas?
>>
>> We solved it using lvmcache.
>>
>> --
>> Mateusz
>> (...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite, a pious man,
>> in short - a pillar of society."
>> Nikos Kazantzakis - "Grek Zorba"
>>
>>
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>> 
>>
>> The information in this Internet Email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this Email
>> by anyone else is unauthorized. If you are not the intended recipient, any
>> disclosure, copying, distribution or any action taken or omitted to be
>> taken in reliance on it, is prohibited and may be unlawful. When addressed
>> to our clients any opinions or advice contained in this Email are subject
>> to the terms and conditions expressed in any applicable governing The Home
>> Depot terms of business or client engagement letter. The Home Depot
>> disclaims all responsibility and liability for the accuracy and content of
>> this attachment and for any damages or losses arising from any
>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>> items of a destructive nature, which may be contained in this attachment
>> and shall not be liable for direct, indirect, consequential or special
>> damages in connection with this e-mail message or its attachment.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>
>


Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Sean

Without transactions à la SQL, how can you guarantee atomicity between both
tables for upserts? I mean, one write could succeed for the hot table and
fail for the cold table

The only solution I see is using a logged batch, with a huge overhead and a
perf hit on the writes
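
Roughly (keyspace, table and column names below are made up for illustration); a logged batch guarantees that both mutations will eventually be applied, at the cost of an extra write to the batchlog:

BEGIN BATCH
  INSERT INTO hot.events (sensor_id, ts, value) VALUES (42, '2018-09-17 10:00:00+0000', 1.5) USING TTL 604800;
  INSERT INTO cold.events (sensor_id, ts, value) VALUES (42, '2018-09-17 10:00:00+0000', 1.5);
APPLY BATCH;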

On Mon, Sep 17, 2018 at 8:28 PM, Durity, Sean R  wrote:

> An idea:
>
> On initial insert, insert into 2 tables:
> Hot with short TTL
> Cold/archive with a longer (or no) TTL
> Then your hot data is always in the same table, but being expired. And you
> can access the archive table only for the more rare circumstances. Then you
> could have the HOT table on a different volume of faster storage. If the
> hot/cold tables are in different keyspaces, then you could also have
> different replication (a HOT DC and an archive DC, for example)
>
>
> Sean Durity
>
>
> -Original Message-
> From: Mateusz 
> Sent: Friday, September 14, 2018 2:40 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Re: cold vs hot data
>
> On Friday, 14 September 2018 02:46:43 CEST Alaa Zubaidi (PDF) wrote:
> > The data can grow to +100TB however the hot data will be in most cases
> > less than 10TB but we still need to keep the rest of data accessible.
> > Anyone has this problem?
> > What is the best way to make the cluster more efficient?
> > Is there a way to somehow automatically move the old data to different
> > storage (rack, dc, etc)?
> > Any ideas?
>
> We solved it using lvmcache.
>
> --
> Mateusz
> (...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite, a pious man,
> in short - a pillar of society."
> Nikos Kazantzakis - "Grek Zorba"
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
> 
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>


Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-12 Thread DuyHai Doan
The biggest problem of having CDC working correctly in C* is the
deduplication issue.

Having a process to read incoming mutations from the commitlog is not that hard;
having to dedup them across N replicas is much harder

The idea is: why don't we generate the CDC event directly at the
coordinator side? Indeed, the coordinator is the single source of truth for
each mutation request. As soon as the coordinator receives 1
acknowledgement from any replica, the mutation can be considered "durable"
and safely sent downstream to the CDC processor. This approach would
require changing the write path on the coordinator side and may have an
impact on performance (if writing to the CDC downstream is blocking or too slow)

My 2 cents

On Wed, Sep 12, 2018 at 5:56 AM, Joy Gao  wrote:

> Re Rahul:  "Although DSE advanced replication does one way, those are use
> cases with limited value to me because ultimately it’s still a master slave
> design."
> Completely agree. I'm not familiar with Calvin protocol, but that sounds
> interesting (reading time...).
>
> On Tue, Sep 11, 2018 at 8:38 PM Joy Gao  wrote:
>
>> Thank you all for the feedback so far.
>>
>> The immediate use case for us is setting up a real-time streaming data
>> pipeline from C* to our Data Warehouse (BigQuery), where other teams can
>> access the data for reporting/analytics/ad-hoc query. We already do this
>> with MySQL
>> <https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka>,
>> where we stream the MySQL Binlog via Debezium <https://debezium.io>'s
>> MySQL Connector to Kafka, and then use a BigQuery Sink Connector to stream
>> data to BigQuery.
>>
>> Re Jon's comment about why not write to Kafka first? In some cases that
>> may be ideal; but one potential concern we have with writing to Kafka first
>> is not having "read-after-write" consistency. The data could be written to
>> Kafka, but not yet consumed by C*. If the web service issues a (quorum)
>> read immediately after the (quorum) write, the data that is being returned
>> could still be outdated if the consumer did not catch up. Having web
>> service interacts with C* directly solves this problem for us (we could add
>> a cache before writing to Kafka, but that adds additional operational
>> complexity to the architecture; alternatively, we could write to Kafka and
>> C* transactionally, but distributed transaction is slow).
>>
>> Having the ability to stream its data to other systems could make C* more
>> flexible and more easily integrated into a larger data ecosystem. As Dinesh
>> has mentioned, implementing this in the database layer means there is a
>> standard approach to getting a change notification stream (unlike trigger
>> which is ad-hoc and customized). Aside from replication, the change events
>> could be used for updating Elasticsearch, generating derived views (i.e.
>> for reporting), sending to an audit services, sending to a notification
>> service, and in our case, streaming to our data warehouse for analytics.
>> (one article that goes over database streaming is Martin Kleppman's Turning
>> the Database Inside Out with Apache Samza
>> <https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/>,
>> which seems relevant here). For reference, this turning database into a
>> stream of change events is pretty common in SQL databases (i.e. mysql
>> binlog, postgres WAL) and NoSQL databases that have primary-replica setup
>> (i.e. Mongodb Oplog). Recently CockroachDB introduced a CDC feature as well
>> (and they have master-less replication too).
>>
>> Hope that answers the question. That said, dedupe/ordering/getting full
>> row of data via C* CDC is a hard problem, but may be worth solving for
>> reasons mentioned above. Our proposal is an user approach to solve these
>> problems. Maybe the more sensible thing to do is to build it as part of C*
>> itself, but that's a much bigger discussion. If anyone is building a
>> streaming pipeline for C*, we'd be interested in hearing their approaches
>> as well.
>>
>>
>> On Tue, Sep 11, 2018 at 7:01 AM Rahul Singh 
>> wrote:
>>
>>> You know what they say: Go big or go home.
>>>
>>> Right now candidates are Cassandra itself but embedded or on the side
>>> not on the actual data clusters, zookeeper (yuck) , Kafka (which needs
>>> zookeeper, yuck) , S3 (outside service dependency, so no go. )
>>>
>>> Jeff, Those are great patterns. ESP. Second one. Have used it several
>>> times. Cassandra is a great place to store data in transport.
>>>

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-10 Thread DuyHai Doan
Also using Calvin means having to implement a distributed monotonic
sequence as a primitive, not trivial at all ...

On Mon, Sep 10, 2018 at 3:08 PM, Rahul Singh 
wrote:

> In response to mimicking Advanced replication in DSE. I understand the
> goal. Although DSE advanced replication does one way, those are use cases
> with limited value to me because ultimately it’s still a master slave
> design.
>
> I’m working on a prototype for this for two way replication between
> clusters or databases regardless of dB tech - and every variation I can get
> to comes down to some implementation of the Calvin protocol which basically
> verifies the change in either cluster , sequences it according to impact to
> underlying data, and then schedules the mutation in a predictable manner on
> both clusters / DBS.
>
> All that means is that I need to sequence the change before it happens so
> I can predictably ensure it’s Scheduled for write / Mutation. So I’m
> Back to square one: having a definitive queue / ledger separate from the
> individual commit log of the cluster.
>
>
> Rahul Singh
> Chief Executive Officer
> m 202.905.2818
>
> Anant Corporation
> 1010 Wisconsin Ave NW, Suite 250
> 
> Washington, D.C. 20007
>
> We build and manage digital business technology platforms.
> On Sep 10, 2018, 3:58 AM -0400, Dinesh Joshi ,
> wrote:
>
> On Sep 9, 2018, at 6:08 AM, Jonathan Haddad  wrote:
>
> There may be some use cases for it.. but I'm not sure what they are.  It
> might help if you shared the use cases where the extra complexity is
> required?  When does writing to Cassandra which then dedupes and writes to
> Kafka a preferred design then using Kafka and simply writing to Cassandra?
>
>
> From the reading of the proposal, it seems bring functionality similar to
> MySQL's binlog to Kafka connector. This is useful for many applications
> that want to be notified when certain (or any) rows change in the database
> primarily for a event driven application architecture.
>
> Implementing this in the database layer means there is a standard approach
> to getting a change notification stream. Downstream subscribers can then
> decide which notifications to act on.
>
> LinkedIn's databus is similar in functionality -
> https://github.com/linkedin/databus However it is for heterogenous
> datastores.
>
> On Thu, Sep 6, 2018 at 1:53 PM Joy Gao  wrote:
>
>>
>>
>> We have a *WIP design doc* that goes
>> over this idea in detail.
>>
>> We haven't sort out all the edge cases yet, but would love to get some
>> feedback from the community on the general feasibility of this approach.
>> Any ideas/concerns/questions would be helpful to us. Thanks!
>>
>>
> Interesting idea. I did go over the proposal briefly. I concur with Jon
> about adding more use-cases to clarify this feature's potential use-cases.
>
> Dinesh
>
>


Re: A blog about Cassandra in the IoT arena

2018-08-24 Thread DuyHai Doan
No, what I meant by infinite partitions is not auto sub-partitioning, even at
server-side. Ideally Cassandra should be able to support infinite partition
size and make compaction, repair and streaming of such partitions
manageable:

- compaction: find a way to iterate super efficiently through the whole
partition and merge-sort all sstables containing data of the same
partition.

 - repair: find another approach than Merkle tree because its resolution is
not granular enough. Ideally repair resolution should be at the clustering
level or every xxx clustering values

 - streaming: same idea as repair, in case of error/disconnection the
stream should be resumed at the latest clustering level checkpoint, or at
least should we checkpoint every xxx clustering values

 - partition index: find a way to index efficiently the huge partition.
Right now huge partition has a dramatic impact on partition index. The work
of Michael Kjellman on birch indices is going into the right direction
 (CASSANDRA-9754)

About tombstone, there is recently a research paper about Dotted DB and an
attempt to make delete without using tombstones:
http://haslab.uminho.pt/tome/files/dotteddb_srds.pdf



On Fri, Aug 24, 2018 at 12:38 AM, Rahul Singh 
wrote:

> Agreed. One of the ideas I had on partition size is to automatically
> synthetically shard based on some basic patterns seen in the data.
>
> It could be implemented as a tool that would create a new table with an
> additional part of the key that is an automatic created shard, or it would
> use an existing key and then migrate the data.
>
> The internal automatic shard would adjust as needed and keep
> “Subpartitons” or “rowsets” but return the full partition given some
> special CQL
>
> This is done today at the Data Access layer and he data model design but
> it’s pretty much a step by step process that could be algorithmically done.
>
> Regarding the tombstone — maybe we have another thread dedicated to
> cleaning tombstones - separate from compaction. Depending on the amount of
> tombstones and a threshold, it would be dedicated to deletion. It may be an
> edge case , but people face issues with tombstones all the time because
> they don’t know better.
>
> Rahul
> On Aug 23, 2018, 11:50 AM -0500, DuyHai Doan ,
> wrote:
>
> As I used to tell some people, the day we make :
>
> 1. partition size unlimited, or at least huge partition easily manageable
> (compaction, repair, streaming, partition index file)
> 2. tombstone a non-issue
>
> that day, Cassandra will dominate any other IoT technology out there
>
> Until then ...
>
> On Thu, Aug 23, 2018 at 4:54 PM, Rahul Singh  > wrote:
>
>> Good analysis of how the different key structures affect use cases and
>> performance. I think you could extend this article with potential
>> evaluation of FiloDB which specifically tries to solve the OLAP issue with
>> arbitrary queries.
>>
>> Another option is leveraging Elassandra (index in Elasticsearch
>> collocates with C*) or DataStax (index in Solr collocated with C*)
>>
>> I personally haven’t used SnappyData but that’s another Spark based DB
>> that could be leveraged for performance real-time queries on the OLTP side.
>>
>> Rahul
>> On Aug 23, 2018, 2:48 AM -0500, Affan Syed , wrote:
>>
>> Hi,
>>
>> we wrote a blog about some of the results that engineers from AN10 shared
>> earlier.
>>
>> I am sharing it here for greater comments and discussions.
>>
>> http://www.an10.io/technology/cassandra-and-iot-queries-are-
>> they-a-good-match/
>>
>>
>> Thank you.
>>
>>
>>
>> - Affan
>>
>>
>


Re: A blog about Cassandra in the IoT arena

2018-08-23 Thread DuyHai Doan
As I used to tell some people, the day we make :

1. partition size unlimited, or at least huge partition easily manageable
(compaction, repair, streaming, partition index file)
2. tombstone a non-issue

that day, Cassandra will dominate any other IoT technology out there

Until then ...

On Thu, Aug 23, 2018 at 4:54 PM, Rahul Singh 
wrote:

> Good analysis of how the different key structures affect use cases and
> performance. I think you could extend this article with potential
> evaluation of FiloDB which specifically tries to solve the OLAP issue with
> arbitrary queries.
>
> Another option is leveraging Elassandra (index in Elasticsearch collocates
> with C*) or DataStax (index in Solr collocated with C*)
>
> I personally haven’t used SnappyData but that’s another Spark based DB
> that could be leveraged for performance real-time queries on the OLTP side.
>
> Rahul
> On Aug 23, 2018, 2:48 AM -0500, Affan Syed , wrote:
>
> Hi,
>
> we wrote a blog about some of the results that engineers from AN10 shared
> earlier.
>
> I am sharing it here for greater comments and discussions.
>
> http://www.an10.io/technology/cassandra-and-iot-queries-are-
> they-a-good-match/
>
>
> Thank you.
>
>
>
> - Affan
>
>


Re: full text search on some text columns

2018-07-31 Thread DuyHai Doan
I had SASI in mind before stopping myself from replying to this thread.
Actually the OP needs to index a clustering column and the partition key, and as
far as I remember, I myself opened a JIRA and pushed a patch for SASI to
support indexing composite partition keys, but there are some issues so far
preventing it from being merged into trunk

https://issues.apache.org/jira/browse/CASSANDRA-11734

https://issues.apache.org/jira/browse/CASSANDRA-13228
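
For reference, LIKE queries on a regular column only need a SASI index; a minimal sketch (table and column names below are made up), which does not cover the partition-key case the two JIRAs above try to address:

CREATE TABLE messages (id uuid PRIMARY KEY, author text, body text);

-- CONTAINS mode enables LIKE '%term%' searches on the indexed column
CREATE CUSTOM INDEX messages_body_idx ON messages (body)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};

SELECT * FROM messages WHERE body LIKE '%cassandra%';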

On Tue, Jul 31, 2018 at 5:17 PM, Jordan West  wrote:

>
>
> On Tue, Jul 31, 2018 at 7:45 AM, onmstester onmstester <
> onmstes...@zoho.com> wrote:
>
>> I need to do a full text search (like) on one of my clustering keys and
>> one of partition keys (it use text as data type).
>>
>
> For simple LIKE queries on existing columns you could give SASI (
> https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html)
> a try without having to stand up a separate piece of software. Its
> relatively new and isn’t as battle tested as other parts of Cassandra but
> it has been used in production. There are some performance issues with
> wider-CQL partitions if you have those (https://issues.apache.org/
> jira/browse/CASSANDRA-11990). I hope to address that for 4.0, time
> permitted.
>
> Full disclosure, I was one of the original SASI authors.
>
>
>> The input rate is high so only Cassandra could handle it. Is there any
>> open source project which helps with using Cassandra + Solr or Cassandra +
>> Elastic?
>> Any recommendation on doing this with home-made solutions would be
>> appreciated.
>>
>> Sent using Zoho Mail 
>>
>>
>
>
>


Re: which driver to use with cassandra 3

2018-07-20 Thread DuyHai Doan
Spring Data Cassandra is so-so... It has fewer features (at least at the
time I looked at it) than the default Java driver

For the driver, right now most people are using DataStax's drivers

On Fri, Jul 20, 2018 at 3:36 PM, Vitaliy Semochkin 
wrote:

> Hi,
>
> Which driver to use with cassandra 3
>
> the one that is provided by datastax, netflix or something else.
>
> Spring uses driver from datastax, though is it a reliable solution for
> a long term project, having in mind that datastax and cassandra
> parted?
>
> Regards,
> Vitaliy
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: default_time_to_live vs TTL on insert statement

2018-07-12 Thread DuyHai Doan
To set TTL on a column only and not on the whole CQL row, use UPDATE
instead:

UPDATE <table> USING TTL xxx SET <column> = <value> WHERE partition = yyy
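
For example, with a hypothetical users table (names and values are illustrative):

CREATE TABLE users (id int PRIMARY KEY, email text, bio text);

-- every column written by this INSERT gets a 1-day TTL
INSERT INTO users (id, email, bio) VALUES (1, 'a@b.com', 'hello') USING TTL 86400;

-- only the bio column gets a 1-hour TTL; email keeps whatever TTL it already had
UPDATE users USING TTL 3600 SET bio = 'temporary bio' WHERE id = 1;

-- the remaining TTL can be checked per column
SELECT TTL(email), TTL(bio) FROM users WHERE id = 1;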

On Thu, Jul 12, 2018 at 2:42 PM, Nitan Kainth  wrote:

> Kurt,
>
> It is the same as mentioned in the Apache documentation too; I am not able to find it
> right now.
>
> But my question is:
> How to set TTL for a whole column?
>
> On Wed, Jul 11, 2018 at 11:36 PM, kurt greaves 
> wrote:
>
>> The Datastax documentation is wrong. It won't error, and it shouldn't. If
>> you want to fix that documentation I suggest contacting Datastax.
>>
>> On 11 July 2018 at 19:56, Nitan Kainth  wrote:
>>
>>> Hi DuyHai,
>>>
>>> Could you please explain in what case C* will error based on documented
>>> statement:
>>>
>>> You can set a default TTL for an entire table by setting the table's
>>> default_time_to_live
>>> <https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL>
>>>  property. If you try to set a TTL for a specific column that is longer
>>> than the time defined by the table TTL, Cassandra returns an error.
>>>
>>>
>>>
>>> On Wed, Jul 11, 2018 at 2:34 PM, DuyHai Doan 
>>> wrote:
>>>
>>>> default_time_to_live
>>>> <https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL>
>>>>  property applies if you don't specify any TTL on your CQL statement
>>>>
>>>> However you can always override the default_time_to_live
>>>> <https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL>
>>>>  property by specifying a custom value for each CQL statement
>>>>
>>>> The behavior is correct, nothing wrong here
>>>>
>>>> On Wed, Jul 11, 2018 at 7:31 PM, Nitan Kainth 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> As per document: https://docs.datastax.com/en/cql/3.3/cql/cql_using
>>>>> /useExpireExample.html
>>>>>
>>>>>
>>>>>-
>>>>>
>>>>>You can set a default TTL for an entire table by setting the
>>>>>table's default_time_to_live
>>>>>
>>>>> <https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL>
>>>>> property. If you try to set a TTL for a specific column that is
>>>>>longer than the time defined by the table TTL, Cassandra returns an 
>>>>> error.
>>>>>
>>>>>
>>>>> When I tried to test this statement, i found, we can insert data with
>>>>> TTL greater than default_time_to_live. Is the document needs correction, 
>>>>> or
>>>>> am I mis-understanding it?
>>>>>
>>>>> CREATE TABLE test (
>>>>>
>>>>> name text PRIMARY KEY,
>>>>>
>>>>> description text
>>>>>
>>>>> ) WITH bloom_filter_fp_chance = 0.01
>>>>>
>>>>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>
>>>>> AND comment = ''
>>>>>
>>>>> AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>>
>>>>> AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>
>>>>> AND crc_check_chance = 1.0
>>>>>
>>>>> AND dclocal_read_repair_chance = 0.1
>>>>>
>>>>> AND default_time_to_live = 240
>>>>>
>>>>> AND gc_grace_seconds = 864000
>>>>>
>>>>> AND max_index_interval = 2048
>>>>>
>>>>> AND memtable_flush_period_in_ms = 0
>>>>>
>>>>> AND min_index_interval = 128
>>>>>
>>>>> AND read_repair_chance = 0.0
>>>>>
>>>>> AND speculative_retry = '99PERCENTILE';
>>>>>
>>>>> insert into test (name, description) values ('name5', 'name
>>>>> description5') using ttl 360;
>>>>>
>>>>> select * from test ;
>>>>>
>>>>>
>>>>>  name  | description
>>>>>
>>>>> ---+---
>>>>>
>>>>>  name5 | name description5
>>>>>
>>>>>
>>>>> SELECT TTL (description) from test;
>>>>>
>>>>>
>>>>>  ttl(description)
>>>>>
>>>>> --
>>>>>
>>>>>  351
>>>>>
>>>>> Can someone please clear this for me?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: default_time_to_live vs TTL on insert statement

2018-07-11 Thread DuyHai Doan
The default_time_to_live property applies if you don't specify any TTL on your CQL statement.

However, you can always override the default_time_to_live property by specifying a custom value for each CQL statement.

The behavior is correct, nothing wrong here

On Wed, Jul 11, 2018 at 7:31 PM, Nitan Kainth  wrote:

> Hi,
>
> As per document: https://docs.datastax.com/en/cql/3.3/cql/
> cql_using/useExpireExample.html
>
>
>-
>
>You can set a default TTL for an entire table by setting the table's
>default_time_to_live
>
> 
> property. If you try to set a TTL for a specific column that is
>longer than the time defined by the table TTL, Cassandra returns an error.
>
>
> When I tried to test this statement, I found we can insert data with a TTL
> greater than default_time_to_live. Does the document need correction, or am
> I misunderstanding it?
>
> CREATE TABLE test (
>
> name text PRIMARY KEY,
>
> description text
>
> ) WITH bloom_filter_fp_chance = 0.01
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
> AND comment = ''
>
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 240
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
> insert into test (name, description) values ('name5', 'name description5')
> using ttl 360;
>
> select * from test ;
>
>
>  name  | description
>
> ---+---
>
>  name5 | name description5
>
>
> SELECT TTL (description) from test;
>
>
>  ttl(description)
>
> --
>
>  351
>
> Can someone please clear this for me?
>
>
>
>
>
>


Re: [ANNOUNCE] LDAP Authenticator for Cassandra

2018-07-05 Thread DuyHai Doan
Super great, thank you for this contribution Kurt!

On Thu, Jul 5, 2018 at 1:49 PM, kurt greaves  wrote:

> We've seen a need for an LDAP authentication implementation for Apache
> Cassandra so we've gone ahead and created an open source implementation
> (ALv2) utilising the pluggable auth support in C*.
>
> Now, I'm positive there are multiple implementations floating around that
> haven't been open sourced, and that's understandable given how much of a
> nightmare working with LDAP is, so we've come up with an implementation
> that will hopefully work for the general case, but should be perfectly
> possible to extend, or at least use an example to create your own and maybe
> contribute something back ;). It's by no means perfect, but it seems to
> work, and we're hoping people with actual LDAP environments can test and
> add support/improvements for more weird LDAP based use cases.
>
> You can find the code and setup + configuration instructions on github, and a
> blog that goes into more detail here.
>
> PS: Don't look too closely at the nasty cache hackery in the 3.11 branch,
> I'll fix it in 4.0, I promise. Just be satisfied that it works, I think.
>


Re: Write performance degradation

2018-06-18 Thread DuyHai Doan
Maybe the disk I/O cannot keep up with the high mutation rate ?

Check the number of pending compactions

On Sun, Jun 17, 2018 at 9:24 AM, onmstester onmstester 
wrote:

> Hi,
>
> I was doing 500K inserts + 100K counter update in seconds on my cluster of
> 12 nodes (20 core/128GB ram/4 * 600 HDD 10K) using batch statements
> with no problem.
> I saw a lot of warning show that most of batches not concerning a single
> node, so they should not be in a batch, on the other hand input load of my
> application
> increased by 50%, so i switched to non-batch async inserts and increased
> number of client threads so the load increased by 50%.
> The system worked for 2 days with no problem under a load of 750K inserts +
> 150K counter updates per second, but suddenly a lot of insert timeouts were
> generated in the log files.
> Decreasing input load to previous load, even less than that did not help.
> When i restart my client (after some hours that its been started log
> timeouts and erros) it works with no problem for 20 minutes but again
> starts logging timeout errors.
> CPU load of nodes in cluster is less than 25%.
> How can I solve this problem? I'm saving all JMX metrics of Cassandra with a
> monitoring system. What should I check?
>
> Sent using Zoho Mail 
>
>
>


Re: Data Proxy for Cassandra

2018-06-11 Thread DuyHai Doan
Hello Chidamber

When you said "In addition, the data proxy is distributed based on
consistent hashing and using gossip between data proxy nodes to keep the
cached data unique (per node) and consistent", did you re-implement
Consistent hashing and gossip algorithm from scratch in your proxy layer ?

Regards

On Mon, Jun 11, 2018 at 5:46 PM, Chidamber Kulkarni 
wrote:

> Hello,
>
> We have been working on a distributed data proxy for Cassandra. A data
> proxy is a combination of proxy and caching that also takes care of data
> consistency and invalidation for insert and updates. In addition, the data
> proxy is distributed based on consistent hashing and using gossip between
> data proxy nodes to keep the cached data unique (per node) and consistent.
> Finally, we have also implemented our data proxy on a FPGA-based
> accelerator to achieve lower latency and better throughput numbers.
>
> We have a blog post with more details about our technology and initial
> results here: https://www.reniac.com/2018/04/10/turbocharging-your-
> cassandra-db-with-reniac-data-proxy/
>
> In brief, the main highlights of our results are that we observe a latency
> reduction of almost 9X-10X compared to baseline Cassandra and a throughput
> increase of 3X-4X. Interested to hear thoughts on what kind of benchmarking
> setup you would like to see us use given we are now exploring other
> workloads to benchmark with our engine.
>
> thanks,
> Chidamber
>
>


Re: what's the read cl of list read-on-write operations?

2018-04-20 Thread DuyHai Doan
The inconsistency scenario you describe can occur for sure

Now repair (read repair, consistent read + weekly repair) is there to fix it

"Why Cassandra do not read from cluster with somehow read CL before
updating the list?"

Because read-before-write on the cluster level is an anti-pattern.

The read-before-write at local storage is somehow already an anti-pattern.
That's why it's recommended to avoid using lists as much as possible
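
To make the difference concrete, a small sketch (table name and values are illustrative):

CREATE TABLE playlists (id uuid PRIMARY KEY, songs list<text>);

-- appending does not need to read the existing list
UPDATE playlists SET songs = songs + ['song-d'] WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- setting by index forces each replica to read its local copy of the list first,
-- to map index 1 onto the element it has stored internally
UPDATE playlists SET songs[1] = 'song-b' WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;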




On Fri, Apr 20, 2018 at 1:01 PM, Jinhua Luo <luajit...@gmail.com> wrote:

> Do you confirm it just reads the local storage? If so, I have a question:
>
> Suppose the user reads the list using QUORUM CL, e.g. the value is
> {a,b,c}, and then wants to set the second item, b.
> It sends such a write request to some coordinator, but that coordinator
> has an outdated version in its local storage,
> let's say {a,d}; then the item that finally gets set is not b but d,
> which is unexpected from the perspective of the previous read.
>
> Why Cassandra do not read from cluster with somehow read CL before
> updating the list?
>
>
> 2018-04-20 16:12 GMT+08:00 DuyHai Doan <doanduy...@gmail.com>:
> > The read operation on the list column is done locally on each replica so
> > replication factor does not really apply here
> >
> > On Fri, Apr 20, 2018 at 7:37 AM, Jinhua Luo <luajit...@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> Some list operations, like set by index, needs to read the whole list
> >> before update.
> >> So what's the read consistency level of that read? Use the same cl of
> >> the setting for the normal read?
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: user-h...@cassandra.apache.org
> >>
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: what's the read cl of list read-on-write operations?

2018-04-20 Thread DuyHai Doan
The read operation on the list column is done locally on each replica so
replication factor does not really apply here

On Fri, Apr 20, 2018 at 7:37 AM, Jinhua Luo  wrote:

> Hi All,
>
> Some list operations, like set by index, needs to read the whole list
> before update.
> So what's the read consistency level of that read? Use the same cl of
> the setting for the normal read?
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Does Cassandra supports ACID txn

2018-04-19 Thread DuyHai Doan
No ACID transactions any time soon in Cassandra

On Thu, Apr 19, 2018 at 7:35 AM, Rajesh Kishore 
wrote:

> Hi,
>
> I am bit confused by reading different articles, does recent version of
> Cassandra supports ACID transaction ?
>
> I found BATCH command , but not sure if it supports rollback, consider
> that transaction I am going to perform would be on single partition.
>
> Also, what are the limitations if any?
>
> Thanks,
> Rajesh
>


Re: where does c* store the schema?

2018-04-16 Thread DuyHai Doan
There is a system_schema keyspace to store all the schema information

https://docs.datastax.com/en/cql/3.3/cql/cql_using/useQuerySystem.html#useQuerySystem__table_bhg_1bw_4v
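
For example (keyspace and table names below are placeholders), the schema can be inspected with plain CQL:

SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'my_keyspace';

SELECT column_name, kind, type
FROM system_schema.columns
WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';

As far as I know these tables are local to each node and are kept in sync by schema migrations propagated across the cluster, so the usual replication factor / consistency tuning does not apply to them.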

On Mon, Apr 16, 2018 at 10:48 AM, Jinhua Luo  wrote:

> Hi All,
>
> Does c* use predefined keyspace/tables to store the user defined schema?
> If so, what's the RWN of those meta schema? And what's the procedure
> to update them?
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Can I sort it as a result of group by?

2018-04-09 Thread DuyHai Doan
No, sorting by a column other than a clustering column is not possible

On Mon, Apr 9, 2018 at 11:42 AM, Eunsu Kim  wrote:

> Hello, everyone.
>
> I am using 3.11.0 and I have the following table.
>
> CREATE TABLE summary_5m (
> service_key text,
> hash_key int,
> instance_hash int,
> collected_time timestamp,
> count int,
> PRIMARY KEY ((service_key), hash_key, instance_hash, collected_time)
> )
>
>
> And I can sum count grouping by primary key.
>
> select service_key, hash_key, instance_hash, sum(count) as count_summ
> from apm.ip_summary_5m
> where service_key='ABCED'
> group by service_key, hash_key, instance_hash;
>
>
> But what I want is to get only the top 100 with a high value added.
>
> Like following query is attached … (syntax error, of course)
>
> order by count_sum limit 100;
>
> Anybody have ever solved this problem?
>
> Thank you in advance.
>
>
>


Re: Text or....

2018-04-04 Thread DuyHai Doan
Compressing client-side is better because it will save:

1) a lot of bandwidth on the network
2) a lot of Cassandra CPU because no decompression server-side
3) a lot of Cassandra HEAP because the compressed blob should be relatively
small (text data compress very well) compared to the raw size
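
A sketch of what such a table could look like (names are illustrative); the application gzips the text before binding it as a blob and gunzips it after reading:

CREATE TABLE documents (
    doc_id uuid PRIMARY KEY,
    created_at timestamp,
    body_gz blob   -- gzip-compressed text, compressed/decompressed client-side
);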

On Wed, Apr 4, 2018 at 2:59 PM, Jeronimo de A. Barros <
jeronimo.bar...@gmail.com> wrote:

> Hi,
>
> We use a pseudo file-system table where the chunks are blobs of 64 KB and
> we never had any performance issue.
>
> Primary-key structure is ((file-uuid), chunck-id).
>
> Jero
>
> On Wed, Apr 4, 2018 at 9:25 AM, shalom sagges 
> wrote:
>
>> Hi All,
>>
>> A certain application is writing ~55,000 characters for a single row.
>> Most of these characters are entered to one column with "text" data type.
>>
>> This looks insanely large for one row.
>> Would you suggest to change the data type from "text" to BLOB or any
>> other option that might fit this scenario?
>>
>> Thanks!
>>
>
>


Re: Text or....

2018-04-04 Thread DuyHai Doan
Compress it and store it as a blob.
Unless you ever need to index it, but I guess even with SASI, indexing such a
huge text block is not a good idea

On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges 
wrote:

> Hi All,
>
> A certain application is writing ~55,000 characters for a single row. Most
> of these characters are entered to one column with "text" data type.
>
> This looks insanely large for one row.
> Would you suggest to change the data type from "text" to BLOB or any other
> option that might fit this scenario?
>
> Thanks!
>


Re: Cassandra filter with ordering query modeling

2018-03-01 Thread DuyHai Doan
https://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics
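
For Query #2 in particular, a sketch of one possible query table following the modeling advice quoted below (column types are assumed from the original Java snippet):

CREATE TABLE tablea_by_name (
    name text,
    starttime timestamp,
    id uuid,
    endtime timestamp,
    state text,
    PRIMARY KEY ((name), starttime, id)
) WITH CLUSTERING ORDER BY (starttime DESC, id ASC);

-- Query #2: rows come back already sorted by startTime DESC within the partition
SELECT * FROM tablea_by_name WHERE name = 'test';

Query #3 would need a similar table partitioned by state; the trailing id clustering column is only there to keep entries unique.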

On Thu, Mar 1, 2018 at 3:48 PM, Valentina Crisan  wrote:

> 1) I created another table for Query#2/3. The partition Key was StartTime
> and clustering key was name. When I execute my queries, I get an exception
> saying that I need to ALLOW FILTERING.
>
> *Primary key(startTime,name) - the only queries that can be answered by
> this model are: where startTime = , where startTime IN (value1, value2),
> where startTime = and name = . Clustering keys support =,<,<=,>,>= while
> partition key supports = and IN operators. *
> *Your query was with name first and then startTime so in this case
> Cassandra is telling you that cannot answer this unless you use Allow
> Filtering at the end of the query = which basically is a disaster for
> performance since will bring all data in the coordinator and perform local
> filtering of the data. So, the model is not good for this query. *
>
> 2) I created a table with Name as partitioning key and startTime as
> clustering key. This way I was able to order the data in descending order
> based on startTime. But the problem was that if a row with same "name" was
> inserted, it was overriding the previously inserted row.
>
> *In Cassandra the primary key has 2 main purposes: to answer the queries
> and to provide uniqueness for the entries. This means that every variation
> of ( name, startTime) should be unique otherwise Cassandra will overwrite
> existing values ( actually C* doesn't read before write by default) and
> write the new values. In your case name in combination with different
> starttimes should provide unicity to the entries. If it's likely to have 2
> entries for 1 name and 1 startTime then you need to insert in the primary
> key another column that will provide the uniqueness. This column will be
> last clustering key and you will not need to involve it in queries - the
> role will be only for uniqueness. *
>
>
>  Valentina
>
>
> On Thu, Mar 1, 2018 at 3:26 PM, Behroz Sikander 
> wrote:
>
>> Thank you for your response.
>>
>> I have been through the document and I have tried these techniques but I
>> failed to model my queries correctly.
>>
>> Forexample, I have already tried the following:
>> 1) I created another table for Query#2/3. The partition Key was StartTime
>> and clustering key was name. When I execute my queries, I get an exception
>> saying that I need to ALLOW FILTERING.
>> 2) I created a table with Name as partitioning key and startTime as
>> clustering key. This way I was able to order the data in descending order
>> based on startTime. But the problem was that if a row with same "name" was
>> inserted, it was overriding the previously inserted row.
>>
>> I am not sure how to model such queries.
>>
>>
>> On Thu, Mar 1, 2018 at 2:02 PM, Kyrylo Lebediev > > wrote:
>>
>>> Hi!
>>>
>>>
>>> Partition key (Id in your case) must be in WHERE cause if not using
>>> indexes (but indexes should be used carefully, not like in case of
>>> relational DB's). Also, only columns which belong to primary key ( =
>>> partition key + clustering key) can be used in WHERE in such cases. That's
>>> why 2nd and 3rd are failing.
>>> You might find this useful: http://cassandra.apache.org/do
>>> c/latest/cql/dml.html#the-where-clause
>>>
>>> There are several Cassandra handbooks available on Amazon, maybe it
>>> would be helpful for you to use some of them as starting point to
>>> understand aspects of Cassandra data[query] modeling.
>>>
>>>
>>> Regards,
>>>
>>> Kyrill
>>> --
>>> *From:* Behroz Sikander 
>>> *Sent:* Thursday, March 1, 2018 2:36:28 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Cassandra filter with ordering query modeling
>>>
>>> Hi,
>>> 
>>>
>>> I am new to Cassandra and I am trying to model a table in Cassandra. My
>>> queries look like the following
>>>
>>> Query #1: select * from TableA where Id = "123"
>>> Query #2: select * from TableA where name="test" orderby startTime DESC
>>> Query #3: select * from TableA where state="running" orderby startTime DESC
>>>
>>> I have been able to build the table for Query #1 which looks like
>>>
>>> val tableAStatement = SchemaBuilder.createTable("tableA").ifNotExists.
>>> addPartitionKey(Id, DataType.uuid).
>>> addColumn(Name, DataType.text).
>>> addColumn(StartTime, DataType.timestamp).
>>> addColumn(EndTime, DataType.timestamp).
>>> addColumn(State, DataType.text)
>>>
>>> session.execute(tableAStatement)
>>>
>>> but for Query#2 and 3, I have tried many different things but failed.
>>> Everytime, I get stuck in a different error from cassandra.
>>>
>>> Considering the above queries, what would be the right table model? What
>>> is the right way to model such 

Re: Secondary Indexes C* 3.0

2018-02-22 Thread DuyHai Doan
Read this: http://www.doanduyhai.com/blog/?p=13191




On Thu, Feb 22, 2018 at 6:44 PM, Akash Gangil  wrote:

> To provide more context, I was going through this
> https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#
> useWhenIndex__highCardCol
>
> On Thu, Feb 22, 2018 at 9:35 AM, Akash Gangil 
> wrote:
>
>> Hi,
>>
>> I was wondering if there are recommendations around the cardinality of
>> secondary indexes.
>>
>> As I understand it, an index on a column with many distinct values will be
>> inefficient. Is it because the index would only direct me to the specific
>> sstable, but then it sequentially searches for the target records? So a
>> wide range of the index could lead to a lot of sstable options to traverse?
>>
>> Though what's unclear is the recommended (or benchmarked?) limit: is
>> it that the index must have 100 distinct values, or can it have up to 1000 or
>> 5 distinct values?
>>
>> thanks!
>>
>>
>>
>>
>> --
>> Akash
>>
>
>
>
> --
> Akash
>


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
So before buying any marketing claims from Microsoft or whoever, maybe
you should try to use it extensively?

And talking about backup, have a look at DynamoDB:
http://i68.tinypic.com/n1b6yr.jpg

From my POV, if a multi-billion-dollar company like Amazon doesn't get it right
or can't make it easy for the end user (without involving unwieldy Hadoop
machinery:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBPipeline.html),
what Cassandra offers in terms of backup/restore is more than satisfactory




On Wed, Feb 21, 2018 at 8:56 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

>  Josh,
>
> To say nothing is indifference.  If you care about your community,
> sometimes don't you have to bring up a subject even though you know it's
> also temporarily adding some discomfort?
>
> As to opening a JIRA, I've got a very specific topic to try in mind now.
> An easy one I'll work on and then announce.  Someone else will have to do
> the coding.  A year from now I would probably just knock it out to make
> sure it's as easy as I expect it to be but to be honest, as I've been
> saying, I'm not set up to do that right now.  I've barely looked at any
> Cassandra code; for one; everyone on this list probably codes more than I
> do, secondly; and lastly, it's a good one for someone that wants an easy
> one to start with: vNodes.  I've already seen too many people seeking
> assistance with the vNode setting.
>
> And you can expect as others have been mentioning that there should be
> similar ones on compaction, repair and backup.
>
> Microsoft knows poor usability gives them an easy market to take over. And
> they make it easy to switch.
>
> Beginning at 4:17 in the video, it says the following:
>
> "You don't need to worry about replica sets, quorum or read
> repair.  You can focus on writing correct application logic."
>
> At 4:42, it says:
> "Hopefully this gives you a quick idea of how seamlessly you can
> bring your existing Cassandra applications to Azure Cosmos DB.  No code
> changes are required.  It works with your favorite Cassandra tools and
> drivers including for example native Cassandra driver for Spark. And it
> takes seconds to get going, and it's elastically and globally scalable."
>
> More to come,
>
> Kenneth Brotman
>
> -Original Message-
> From: Josh McKenzie [mailto:jmcken...@apache.org]
> Sent: Wednesday, February 21, 2018 8:28 AM
> To: d...@cassandra.apache.org
> Cc: User
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> There's a disheartening amount of "here's where Cassandra is bad, and
> here's what it needs to do for me for free" happening in this thread.
>
> This is open-source software. Everyone is *strongly encouraged* to submit
> a patch to move the needle on *any* of these things being complained about
> in this thread.
>
> For the Apache Way  to
> work, people need to step up and meaningfully contribute to a project to
> scratch their own itch instead of just waiting for a random
> corporation-subsidized engineer to happen to have interests that align with
> them and contribute that to the project.
>
> Beating a dead horse for things everyone on the project knows are serious
> pain points is not productive.
>
> On Wed, Feb 21, 2018 at 5:45 AM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> > On Mon, Feb 19, 2018 at 10:01 AM, Kenneth Brotman <
> > kenbrot...@yahoo.com.invalid> wrote:
> >
> > >
> > > >> Cluster wide management should be a big theme in any next major
> > release.
> > > >>
> > > >Na. Stability and testing should be a big theme in the next major
> > release.
> > > >
> > >
> > > Double Na on that one Jeff.  I think you have a concern there about
> > > the need to test sufficiently to ensure the stability of the next
> > > major release.  That makes perfect sense.- for every release,
> > > especially the major ones.  Continuous improvement is not a phase of
> > > development for example.  CI should be in everything, in every
> > > phase.  Stability and testing a part of every release not just one.
> > > A major release should be
> > a
> > > nice step from the previous major release though.
> > >
> >
> > I guess what Jeff refers to is the tick-tock release cycle experiment,
> > which has proven to be a complete disaster by popular opinion.
> >
> > There's also the "materialized views" feature which failed to
> > materialize in the end (pun intended) and had to be declared
> > experimental retroactively.
> >
> > Another prominent example is incremental repair which was introduced
> > as the default option in 2.2 and now is not recommended to use because
> > of so many corner cases where it can fail.  So again experimental as an
> afterthought.
> >
> > Not to mention that even if you are aware of the default incremental
> > and go with full repair instead, you're still up for a sad surprise:
> > anti-compaction will be triggered despite the "full" 

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
For UI and interactive data exploration there is already the Cassandra
interpreter for Apache Zeppelin that is more than decent for the job

On Wed, Feb 21, 2018 at 9:19 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> But what does this video really show? That Microsoft managed to run
> Cassandra as a SaaS product with nice UI?
> Google did that years ago with BigTable and Amazon with DynamoDB.
>
> I agree that we need more tools, but not so much for querying (although
> that would also help a bit), but just in general the project feels
> unapproachable right now.
> Besides the excellent DataStax documentation there is little best practice
> knowledge about how to operate and provision Cassandra clusters.
> Having some recipes for Chef, Puppet or Ansible that show the most common
> settings (or some Cloudfoundry/GCP Templates or Helm Charts) would be
> really useful.
> Also a list of all the projects that Cassandra goes well with (like TLP
> Reaper and and Netflix's Priam etc..)
>
> greetings Daniel
>
> On Wed, 21 Feb 2018 at 07:23 Kenneth Brotman 
> wrote:
>
>> If you watch this video through you'll see why usability is so
>> important.  You can't ignore usability issues.
>>
>> Cassandra does not exist in a vacuum.  The competitors are world class.
>>
>> The video is on the New Cassandra API for Azure Cosmos DB:
>> https://www.youtube.com/watch?v=1Sf4McGN1AQ
>>
>> Kenneth Brotman
>>
>> -Original Message-
>> From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com]
>> Sent: Tuesday, February 20, 2018 1:28 AM
>> To: user@cassandra.apache.org; James Briggs
>> Cc: d...@cassandra.apache.org
>> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>>
>> Hi,
>>
>> I have to add my own two cents here as the main thing that keeps me from
>> really running Cassandra is the amount of pain running it incurs.
>> Not so much because it's actually painful but because the tools are so
>> different and the documentation and best practices are scattered across a
>> dozen outdated DataStax articles and this mailing list etc.. We've been
>> hesitant (although our use case is perfect for using Cassandra) to deploy
>> Cassandra to any critical systems as even after a year of running it we
>> still don't have the operational experience to confidently run critical
>> systems with it.
>>
>> Simple things like a foolproof / safe cluster-wide S3 Backup (like
>> Elasticsearch has it) would for example solve a TON of issues for new
>> people. I don't need it auto-scheduled or something, but having to
>> configure cron jobs across the whole cluster is a pain in the ass for small
>> teams.
>> To be honest, even the way snapshots are done right now is already super
>> painful. Every other system I have operated so far will just create one backup
>> folder I can export; in C* the backup is scattered across a bunch of
>> different keyspace folders etc. Needless to say, it took a while until
>> I trusted my backup scripts fully.
>>
>> And especially for a Database I believe Backup/Restore needs to be a
>> non-issue that's documented front and center. If not, smaller teams just
>> don't have the resources to dedicate to learning and building the tools
>> around it.
>>
>> Now that the team is getting larger we could spare the resources to
>> operate these things, but switching from a well-understood RDBMs schema to
>> Cassandra is now incredibly hard and will probably take years.
>>
>> greetings Daniel
>>
>> On Tue, 20 Feb 2018 at 05:56 James Briggs > invalid>
>> wrote:
>>
>> > Kenneth:
>> >
>> > What you said is not wrong.
>> >
>> > Vertica and Riak are examples of distributed databases that don't
>> > require hand-holding.
>> >
>> > Cassandra is for Java-programmer DIYers, or more often Datastax
>> > clients, at this point.
>> > Thanks, James.
>> >
>> > --
>> > *From:* Kenneth Brotman 
>> > *To:* user@cassandra.apache.org
>> > *Cc:* d...@cassandra.apache.org
>> > *Sent:* Monday, February 19, 2018 4:56 PM
>> >
>> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>> >
>> > Jeff, you helped me figure out what I was missing.  It just took me a
>> > day to digest what you wrote.  I’m coming over from another type of
>> > engineering.  I didn’t know and it’s not really documented.  Cassandra
>> > runs in a data center.  Nowadays that means the nodes are going to be
>> > in managed containers, Docker containers, managed by Kubernetes,
>> > Mesos or something, and for that reason anyone operating Cassandra in a
>> > real world setting would not encounter the issues I raised in the way I
>> described.
>> >
>> > Shouldn’t the architectural diagrams people reference indicate that in
>> > some way?  That would have help me.
>> >
>> > Kenneth Brotman
>> >
>> > *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
>> > *Sent:* Monday, February 19, 2018 10:43 AM
>> > *To:* 

Re: LWT broken?

2018-02-11 Thread DuyHai Doan
Mahdi , the issue in your code is here:

else // we lost LWT, fetch the winning value
    existing_id = SELECT id FROM hash_id WHERE hash=computed_hash |
consistency = ONE

You lost LWT, it means that there is a concurrent LWT that has won the
Paxos round and has applied the value using QUORUM/SERIAL.

In the best case, it means that the winning LWT value has been applied to at
least 2 replicas out of 3 (assuming RF=3).
In the worst case, the winning LWT value has not been applied yet, or is still
pending, on any replica.

Now, if you immediately read with CL=ONE, you may:

1) Read a stale value from the 3rd replica, which has not yet received the
correct winning LWT value
2) Or worse, read a stale value because the winning LWT value is still being
applied when the read operation is made

That's the main reason reading with CL=SERIAL is recommended (CL=QUORUM is
not sufficient)

Reading with CL=SERIAL will:

a. like QUORUM, contact a strict majority of replicas
b. unlike QUORUM, look for any validated (but not yet applied) previous Paxos
round value and force-apply it before actually reading the new value
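
To make this concrete, a minimal cqlsh sketch of the pattern (the literal hash
value and the column types are made up for illustration, adapt them to the
real schema):

SERIAL CONSISTENCY SERIAL;    -- Paxos phase of the conditional write
INSERT INTO hash_id (hash, id) VALUES ('abc123', 42) IF NOT EXISTS;
-- if [applied] = false, fetch the winning value at SERIAL instead of ONE
CONSISTENCY SERIAL;
SELECT id FROM hash_id WHERE hash = 'abc123';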




On Sun, Feb 11, 2018 at 5:36 PM, Mahdi Ben Hamida 
wrote:

> Totally understood that it's not worth (or it's rather incorrect) to mix
> serial and non serial operations for LWT tables. It would be highly
> satisfying to my engineer mind if someone can explain why that would cause
> issues in this particular situation. The only explanation I have is that a
> non serial read may cause a read repair to happen and that could interfere
> with a concurrent serial write, although I still can't explain how that
> would cause two different "insert if not exist" transactions to both
> succeed.
>
> --
> Mahdi.
>
> On 2/9/18 2:40 PM, Jonathan Haddad wrote:
>
> If you want consistent reads you have to use the CL that enforces it.
> There’s no way around it.
> On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida 
> wrote:
>
>> In this case, we only write using CAS (code guarantees that). We also
>> never update, just insert if not exist. Once a hash exists, it never
>> changes (it may get deleted later and that'll be a CAS delete as well).
>>
>> --
>> Mahdi.
>>
>> On 2/9/18 1:38 PM, Jeff Jirsa wrote:
>>
>>
>>
>> On Fri, Feb 9, 2018 at 1:33 PM, Mahdi Ben Hamida 
>> wrote:
>>
>>>  Under what circumstances would we be reading inconsistent results ? Is
>>> there a case where we end up reading a value that actually end up not being
>>> written ?
>>>
>>>
>>>
>>
>> If you ever write the same value with CAS and without CAS (different code
>> paths both updating the same value), you're using CAS wrong, and
>> inconsistencies can happen.
>>
>>
>>
>>
>


Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread DuyHai Doan
Or use the new user-defined compaction option recently introduced, provided
you can determine over which SSTables a partition is spread

On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad  wrote:

> Give this a read through:
>
> https://github.com/protectwise/cassandra-util/tree/master/deleting-
> compaction-strategy
>
> Basically you write your own logic for how stuff gets forgotten, then you
> can recompact every sstable with upgradesstables -a.
>
> Jon
>
>
> On Feb 9, 2018, at 8:10 AM, Nicolas Guyomar 
> wrote:
>
> Hi everyone,
>
> Because of GDPR we really face the need to support “Right to Be Forgotten”
> requests => https://gdpr-info.eu/art-17-gdpr/  stating that *"the
> controller shall have the obligation to erase personal data without undue
> delay"*
>
> Because I usually meet customers that do not have that much clients,
> modeling one partition per client is almost always possible, easing
> deletion by partition key.
>
> Then, appart from triggering a manual compaction on impacted tables using
> STCS, I do not see how I can be GDPR compliant.
>
> I'm kind of surprised not to find any thread on that matter on the ML, do
> you guys have any modeling strategy that would make it easier to get rid of
> data ?
>
> Thank you for any given advice
>
> Nicolas
>
>
>


Re: group by select queries

2018-02-01 Thread DuyHai Doan
Worth digging into the source code of GROUP BY but as far as I remember,
using GROUP BY without any aggregation function will lead to C* picking
just the first (or maybe the last, I'm not sure on this point) row at hand.

About ordering, since the grouping is on a component of the partition key, do
not expect any sensible order since only token order matters.
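
For instance, if the goal is one row per security_id with well-defined values,
explicit aggregates remove the ambiguity. A sketch against the wp.position
table quoted below (whether max/sum are the right aggregates depends on the
intent):

SELECT account_id, security_id, max(counter) AS counter, sum(quantity) AS quantity
FROM wp.position
WHERE account_id = 'user_1'
GROUP BY security_id;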

On Thu, Feb 1, 2018 at 6:38 AM, kurt greaves  wrote:

> Seems problematic. Would you be able to create a JIRA ticket with the
> above information/examples?
>
> On 30 January 2018 at 22:41, Modha, Digant 
> wrote:
>
>> It was local quorum.  There’s no difference with CONSISTENCY ALL.
>>
>>
>>
>> Consistency level set to LOCAL_QUORUM.
>>
>> cassandra@cqlsh> select * from wp.position where account_id = 'user_1';
>>
>>  account_id | security_id | counter | avg_exec_price | pending_quantity | quantity | transaction_id | update_time
>> ------------+-------------+---------+----------------+------------------+----------+----------------+---------------------------------
>>      user_1 |        AMZN |       2 |         1239.2 |                0 |     1011 |           null | 2018-01-25 17:18:07.158000+0000
>>      user_1 |        AMZN |       1 |         1239.2 |                0 |     1010 |           null | 2018-01-25 17:18:07.158000+0000
>>
>> (2 rows)
>>
>> cassandra@cqlsh> select * from wp.position where account_id = 'user_1' group by security_id;
>>
>>  account_id | security_id | counter | avg_exec_price | pending_quantity | quantity | transaction_id | update_time
>> ------------+-------------+---------+----------------+------------------+----------+----------------+---------------------------------
>>      user_1 |        AMZN |       1 |         1239.2 |                0 |     1010 |           null | 2018-01-25 17:18:07.158000+0000
>>
>> (1 rows)
>>
>> cassandra@cqlsh> select account_id, security_id, counter, avg_exec_price, quantity, update_time from wp.position where account_id = 'user_1' group by security_id;
>>
>>  account_id | security_id | counter | avg_exec_price | quantity | update_time
>> ------------+-------------+---------+----------------+----------+---------------------------------
>>      user_1 |        AMZN |       2 |         1239.2 |     1011 | 2018-01-25 17:18:07.158000+0000
>>
>> (1 rows)
>>
>> cassandra@cqlsh> consistency all;
>> Consistency level set to ALL.
>>
>> cassandra@cqlsh> select * from wp.position where account_id = 'user_1' group by security_id;
>>
>>  account_id | security_id | counter | avg_exec_price | pending_quantity | quantity | transaction_id | update_time
>> ------------+-------------+---------+----------------+------------------+----------+----------------+---------------------------------
>>      user_1 |        AMZN |       1 |         1239.2 |                0 |     1010 |           null | 2018-01-25 17:18:07.158000+0000
>>
>> (1 rows)
>>
>> cassandra@cqlsh> select account_id, security_id, counter, avg_exec_price, quantity, update_time from wp.position where account_id = 'user_1' group by security_id;
>>
>>  account_id | security_id | counter | avg_exec_price | quantity | update_time
>> ------------+-------------+---------+----------------+----------+---------------------------------
>>      user_1 |        AMZN |       2 |         1239.2 |     1011 | 2018-01-25 17:18:07.158000+0000
>>
>>
>>
>>
>>
>> *From:* kurt greaves [mailto:k...@instaclustr.com]
>> *Sent:* Monday, January 29, 2018 11:03 PM
>> *To:* User
>> *Subject:* Re: group by select queries
>>
>>
>>
>> What consistency were you querying at? Can you retry with CONSISTENCY ALL?
>>
>>
>>
>> ​
>>
>>
>> TD Securities disclaims any liability or losses either direct or
>> consequential caused by the use of this information. This communication is
>> for informational purposes only and is not intended as an offer or
>> solicitation for the purchase or sale of any financial instrument or as an
>> official confirmation of any transaction. TD Securities is neither making
>> any investment recommendation nor providing any professional or advisory
>> services relating to the activities described herein. All market prices,
>> data and other information are not warranted as to completeness or accuracy
>> and are subject to change without notice Any products described herein are
>> (i) not insured by the FDIC, (ii) not a deposit or other obligation of, or
>> guaranteed by, an insured depository institution and (iii) subject to
>> investment risks, including possible loss of the principal amount invested.
>> The information shall not be further distributed or duplicated in whole or
>> in part by any means without the prior written consent of TD Securities. TD
>> Securities is a trademark of The Toronto-Dominion Bank and represents TD
>> Securities (USA) 

Re: Too many tombstones using TTL

2018-01-10 Thread DuyHai Doan
"The question is why Cassandra creates a tombstone for every column instead
of single tombstone per row?"

--> Simply because technically it is possible to set a different TTL value on
each column of a CQL row
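
For example, a quick sketch with the items table from the question: two
separate UPDATEs can give two cells of the same row different TTLs, so
expiration has to be tracked per cell rather than per row:

UPDATE items USING TTL 60  SET c1 = 'C111' WHERE a = 'AAA' AND b = 'BBB';
UPDATE items USING TTL 300 SET c2 = 'C222' WHERE a = 'AAA' AND b = 'BBB';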

On Wed, Jan 10, 2018 at 2:59 PM, Python_Max  wrote:

> Hello, C* users and experts.
>
> I have (one more) question about tombstones.
>
> Consider the following example:
> cqlsh> create keyspace test_ttl with replication = {'class':
> 'SimpleStrategy', 'replication_factor': '1'}; use test_ttl;
> cqlsh> create table items(a text, b text, c1 text, c2 text, c3 text,
> primary key (a, b));
> cqlsh> insert into items(a,b,c1,c2,c3) values('AAA', 'BBB', 'C111',
> 'C222', 'C333') using ttl 60;
> bash$ nodetool flush
> bash$ sleep 60
> bash$ nodetool compact test_ttl items
> bash$ sstabledump mc-2-big-Data.db
>
> [
>   {
> "partition" : {
>   "key" : [ "AAA" ],
>   "position" : 0
> },
> "rows" : [
>   {
> "type" : "row",
> "position" : 58,
> "clustering" : [ "BBB" ],
> "liveness_info" : { "tstamp" : "2018-01-10T13:29:25.777Z", "ttl" :
> 60, "expires_at" : "2018-01-10T13:30:25Z", "expired" : true },
> "cells" : [
>   { "name" : "c1", "deletion_info" : { "local_delete_time" :
> "2018-01-10T13:29:25Z" }
>   },
>   { "name" : "c2", "deletion_info" : { "local_delete_time" :
> "2018-01-10T13:29:25Z" }
>   },
>   { "name" : "c3", "deletion_info" : { "local_delete_time" :
> "2018-01-10T13:29:25Z" }
>   }
> ]
>   }
> ]
>   }
> ]
>
> The question is why Cassandra creates a tombstone for every column instead
> of single tombstone per row?
>
> In production environment I have a table with ~30 columns and It gives me
> a warning for 30k tombstones and 300 live rows. It is 30 times more then it
> could be.
> Can this behavior be tuned in some way?
>
> Thanks.
>
> --
> Best regards,
> Python_Max.
>


Re: CQL Map vs clustering keys

2017-11-15 Thread DuyHai Doan
Yes, your remark is correct.

However, once CASSANDRA-7396 (right now in 4.0 trunk) get released, you
will be able to get a slice of map values using their (sorted) keys

SELECT map[fromKey ... toKey] FROM TABLE ...

Needless to say, it will also be possible to get a single element from the
map by its key with the SELECT map[key] syntax

It will work exactly like clustering columns storage engine-wise.
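
In the meantime, a sketch of what the clustering-key variant (option 2 in the
question below) already allows today; the table name is a placeholder, the
columns come from the question:

SELECT key, val FROM my_table WHERE id = 123e4567-e89b-12d3-a456-426614174000 AND key = 10;
SELECT key, val FROM my_table WHERE id = 123e4567-e89b-12d3-a456-426614174000 AND key >= 10 AND key < 20;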



On Wed, Nov 15, 2017 at 5:12 PM, eugene miretsky 
wrote:

> Hi,
>
> What would be the tradeoffs between using
>
> 1) Map
>
> (
>
> id UUID PRIMARY KEY,
>
> myMap map<int, text>
>
> );
>
> 2) Clustering key
>
> (
>
>  id UUID PRIMARY KEY,
>
> key int,
>
> val text,
>
> PRIMARY KEY (id, key))
>
> );
>
> My understanding is that maps are stored very similarly to clustering
> columns, where the map key is part of the SSTable's column name. The main
> difference seems to be that with maps all the key/value pairs get retrieved
> together, while with clustering keys we can retrieve individual rows, or a
> range of keys.
>
> Cheers,
> Eugene
>


Re: Securing Cassandra database

2017-11-13 Thread DuyHai Doan
You can pass in login/password from the client side and encrypt the client
/ cassandra connection...
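
For example, with authenticator: PasswordAuthenticator enabled in
cassandra.yaml, a minimal sketch (role name, keyspace and password are
placeholders) is to create a dedicated application role instead of reusing the
superuser, then connect with cqlsh -u/-p or the driver's auth provider:

CREATE ROLE app_user WITH PASSWORD = 'change-me' AND LOGIN = true;
GRANT SELECT ON KEYSPACE my_ks TO app_user;
GRANT MODIFY ON KEYSPACE my_ks TO app_user;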

Le 13 nov. 2017 12:16, "Mokkapati, Bhargav (Nokia - IN/Chennai)" <
bhargav.mokkap...@nokia.com> a écrit :

Hi Team,



We are using Apache Cassandra 3.0.13 version.



As part of Cassandra database security, we have created database super user
authentication, but from driver side we have default cql connection syntax
as “cqlsh ” not like “cqlsh  -u username and -p
password”. So cqlsh connection failing from application side.



So we have chosen a firewall method to limit the access to the Cassandra
database with system IP address ranges.



Please suggest if there is any better method than an IP address firewall to
create security for Cassandra.



Thanks,

Bhargav


Re: Cassandra using a ton of native memory

2017-11-03 Thread DuyHai Doan
8GB of RAM is a recommended production setting for most of the workloads out
there. Since you have only 16GB of RAM, and because Cassandra relies a lot
on the system page cache, it should be no surprise that your 16GB is being
eaten up.

On Fri, Nov 3, 2017 at 5:40 PM, Austin Sharp  wrote:

> I’ve investigated further. It appears that the performance issues are
> because Cassandra’s memory-mapped files (*.db files) fill up the physical
> memory and start being swapped to disk. Is this related to recommendations
> to disable swapping on a machine where Cassandra is installed? Should I
> disable memory-mapped IO?
>
> I can see issues in JIRA related to Windows memory-mapped I/O but they all
> appear to be fixed prior to 3.11.
>
>
>
> *From:* Austin Sharp [mailto:austin.sh...@seeq.com]
> *Sent:* Thursday, November 2, 2017 17:51
> *To:* user@cassandra.apache.org
> *Subject:* Cassandra using a ton of native memory
>
>
>
> Hi,
>
>
>
> I have a problem with Cassandra 3.11.0 on Windows. I'm testing a workload
> with a lot of read-then-writes that had no significant problems on
> Cassandra 2.x. However, now when this workload continues for a while
> (perhaps an hour), Cassandra or its JVM effectively use up all of the
> machine's 16GB of memory. Cassandra is started with -Xmx2147M, and JMX
> shows <2GB heap memory and <100MB of off-heap memory. However, when I use
> something like Process Explorer, I see that Cassandra has 10 to 11GB of
> memory in its working set, and Windows shows essentially no free memory
> at all. Once the system has no free memory, other processes suffer long
> sequences of unresponsiveness.
>
>
>
> I can't see anything terribly wrong from JMX metrics or log files - they
> never show more than 1GB of non-heap memory. Where should I look to
> investigate this further?
>
>
>
> Thanks,
>
> Austin
>
>
>


Re: Golang + Cassandra + Text Search

2017-10-24 Thread DuyHai Doan
There is already a full text search index in Cassandra called SASI
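
A minimal sketch (table and column names are made up) of a SASI index in
CONTAINS mode, which enables LIKE '%term%' style text search directly in CQL:

CREATE CUSTOM INDEX ON videos (title)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};

SELECT * FROM videos WHERE title LIKE '%cassandra%';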

On Tue, Oct 24, 2017 at 6:50 AM, Ridley Submission <
ridley.submission2...@gmail.com> wrote:

> Hi,
>
> Quick question, I am wondering if anyone here who works with Go has
> specific recommendations for as simple framework to add text search on top
> of cassandra?
>
> (Apologies if this is off topic—I am not quite sure what forum in the
> cassandra community would be best for this type of question)
>
> Thanks,
> Riley
>


Re: Does NTP affects LWT's ballot UUID?

2017-10-10 Thread DuyHai Doan
The ballot UUID is obtained using QUORUM agreement between replicas for a
given partition key and we use this TimeUUID ballot as write-time for the
mutation.

The only scenario where I can see a problem is if NTP goes backward in
time on a QUORUM of replicas, which would break the contract of
monotonicity. I don't know how likely this event is ...

On Tue, Oct 10, 2017 at 9:07 AM, Daniel Woo  wrote:

> Hi guys,
>
> The ballot UUID should be monotonically increasing on each coordinator,
> but the UUID in cassandra is version 1 (timestamp based), what happens if
> the NTP service adjusts system clock while a two phase paxos prepare/commit
> is in progress?
>
> --
> Thanks & Regards,
> Daniel
>


Re: new question ;-) // RE: understanding batch atomicity

2017-09-29 Thread DuyHai Doan
We should probably replace "atomic" with "automatic retry" because that
reflects exactly the actual guarantees

On Fri, Sep 29, 2017 at 6:10 PM, Jon Haddad <j...@jonhaddad.com> wrote:

> The use of “atomic” for batches is misleading.  Batches will eventually
> complete, that doesn’t make them atomic.  “All or nothing” is also
> incorrect, as you can read them in the middle and get “some parts of it”,
> and without a rollback it’s just “eventually all”.
>
>
> On Sep 29, 2017, at 10:59 AM, DE VITO Dominique <
> dominique.dev...@thalesgroup.com> wrote:
>
> Thanks DuyHai !
>
> Does anyone know if BATCH provides atomicity for all mutations of a given
> partition key for a __single__ table ?
>
> Or if BATCH provides atomicity for all mutations of a given partition key
> for __ALL__ mutated tables into the BATCH ?
>
> That is, in case of :
>
> BEGIN BATCH
> Update table_1 where PartitionKey_table_1 = 1 … => (A) mutation
> Update table_2 where PartitionKey_table_2 = 1 … => (B) mutation
> END BATCH
>
> Here, both mutations occur for the same PartitionKey = 1
> => are mutations (A) & (B) done in an atomic way (all or nothing) ?
>
> Thanks.
>
> Dominique
>
>
>
> [@@ THALES GROUP INTERNAL @@]
>
> *De :* DuyHai Doan [mailto:doanduy...@gmail.com <doanduy...@gmail.com>]
> *Envoyé :* vendredi 29 septembre 2017 17:10
> *À :* user
> *Objet :* Re: understanding batch atomicity
>
> All updates here means all mutations == INSERT/UPDATE or DELETE
>
>
>
> On Fri, Sep 29, 2017 at 5:07 PM, DE VITO Dominique <
> dominique.dev...@thalesgroup.com> wrote:
> Hi,
>
> About BATCH, the Apache doc https://cassandra.apache.
> org/doc/latest/cql/dml.html?highlight=atomicity says :
>
> “*The BATCH statement group multiple modification statements
> (insertions/updates and deletions) into a single statement. It serves
> several purposes:*
> *...*
> *All updates in a BATCH belonging to a given partition key are performed
> in isolation*”
>
> Is “All *updates*” meaning equivalent to “All modifications (whatever
> it’s sources: INSERT or UPDATE statements)” ?
>
> Or, is “*updates*” meaning partition-level isolation *only* for UPDATE
> statements into the batch (w/o taking into isolation the INSERT other
> statements into the batch) ?
>
> Thanks
>
> Regards
> Dominique
>
>
>


Re: understanding batch atomicity

2017-09-29 Thread DuyHai Doan
All updates here means all mutations == INSERT/UPDATE or DELETE



On Fri, Sep 29, 2017 at 5:07 PM, DE VITO Dominique <
dominique.dev...@thalesgroup.com> wrote:

> Hi,
>
>
>
> About BATCH, the Apache doc https://cassandra.apache.org/
> doc/latest/cql/dml.html?highlight=atomicity says :
>
>
>
> “*The BATCH statement group multiple modification statements
> (insertions/updates and deletions) into a single statement. It serves
> several purposes:*
>
> *...*
>
> *All updates in a BATCH belonging to a given partition key are performed
> in isolation*”
>
>
>
> Is “All *updates*” meaning equivalent to “All modifications (whatever
> it’s sources: INSERT or UPDATE statements)” ?
>
>
>
> Or, is “*updates*” meaning partition-level isolation *only* for UPDATE
> statements into the batch (w/o taking into isolation the INSERT other
> statements into the batch) ?
>
>
>
> Thanks
>
>
>
> Regards
>
> Dominique
>
>
>


Re: data loss in different DC

2017-09-28 Thread DuyHai Doan
If you're writing into DC1 with CL = LOCAL_xxx, there is no guarantee that you
will read the same data in DC2. Only repair will help you

On Thu, Sep 28, 2017 at 11:41 AM, Peng Xiao <2535...@qq.com> wrote:

> Dear All,
>
> We have a cluster with one DC1:RF=3,another DC DC2:RF=1 only for ETL,but
> we found that sometimes we can query records in DC1,while not able not find
> the same record in DC2 with local_quorum.How it happens?
> Could anyone please advise?
> looks we can only run repair to fix it.
>
> Thanks,
> Peng Xiao
>


Re: Datastax Driver Mapper & Secondary Indexes

2017-09-26 Thread DuyHai Doan
If you're looking for schema generation from Bean annotations:
https://github.com/doanduyhai/Achilles/wiki/DDL-Scripts-Generation

On Tue, Sep 26, 2017 at 2:50 PM, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> Hi, I also just figured out that there is no schema generation off the
> mapper.
> Thanks for pointing me to the secondary index info. I'll have a look.
>
> greetings Daniel
>
> On Tue, 26 Sep 2017 at 09:42 kurt greaves  wrote:
>
>> If you've created a secondary index you simply query it by specifying it
>> as part of the where clause. Note that you should really understand the
>> drawbacks of secondary indexes before using them, as they might not be
>> incredibly efficient depending on what you need them for.
>> http://www.wentnet.com/blog/?p=77 and https://pantheon.io/blog/
>> cassandra-scale-problem-secondary-indexes might help.
>>
>> On 26 September 2017 at 07:17, Daniel Hölbling-Inzko <
>> daniel.hoelbling-in...@bitmovin.com> wrote:
>>
>>> Hi,
>>> I am currently moving an application from SQL to Cassandra using Java. I
>>> successfully got the DataStax driver and the mapper up and running, but
>>> can't seem to figure out how to set secondary indexes through the mapper.
>>> I also can't seem to find anything related to indexes in the mapper
>>> sources - am I missing something or is this missing from the client library?
>>>
>>> greetings Daniel
>>>
>>
>>


Re: Self-healing data integrity?

2017-09-11 Thread DuyHai Doan
Agree

 A tricky detail about streaming is that:

1) On the sender side, the node just send the SSTable (without any other
components like CRC files, partition index, partition summary etc...)
2) The sender does not even bother to de-serialize the SSTable data, it just
sends the stream of bytes by reading the SSTable content directly from
disk
3) On the receiver side, the node receives the bytes stream and needs to
serialize it in memory to rebuild all the SSTable components (CRC files,
partition index, partition summary ...)

So the consequences are:

a. there is a bottleneck on the receiving side because of serialization
b. if there is bit rot in the SSTables, since CRC files are not sent, there is
no chance to detect it on the receiving side
c. if we want to include CRC checks in the streaming path, it requires a whole
review of the streaming architecture, not just adding a feature

On Sat, Sep 9, 2017 at 10:06 PM, Jeff Jirsa <jji...@gmail.com> wrote:

> (Which isn't to say that someone shouldn't implement this; they should,
> and there's probably a JIRA to do so already written, but it's a project of
> volunteers, and nobody has volunteered to do the work yet)
>
> --
> Jeff Jirsa
>
>
> On Sep 9, 2017, at 12:59 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> There is, but they aren't consulted on the streaming paths (only on normal
> reads)
>
>
> --
> Jeff Jirsa
>
>
> On Sep 9, 2017, at 12:02 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Jeff,
>
>  With default compression enabled on each table, isn't there CRC files
> created along side with SSTables that can help detecting bit-rot ?
>
>
> On Sat, Sep 9, 2017 at 7:50 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Cassandra doesn't do that automatically - it can guarantee consistency on
>> read or write via ConsistencyLevel on each query, and it can run active
>> (AntiEntropy) repairs. But active repairs must be scheduled (by human or
>> cron or by third party script like http://cassandra-reaper.io/), and to
>> be pedantic, repair only fixes consistency issue, there's some work to be
>> done to properly address/support fixing corrupted replicas (for example,
>> repair COULD send a bit flip from one node to all of the others)
>>
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Sep 9, 2017, at 1:07 AM, Ralph Soika <ralph.so...@imixs.com> wrote:
>>
>> Hi,
>>
>> I am searching for a big data storage solution for the Imixs-Workflow
>> project. I started with Hadoop until I became aware of the
>> 'small-file-problem'. So I am considering using Cassandra now.
>>
>> But Hadoop has one important feature for me. The replicator continuously
>> examines whether data blocks are consistent across all datanodes. This will
>> detect disk errors and automatically move data from defective blocks to
>> working blocks. I think this is called 'self-healing mechanism'.
>>
>> Is there a similar feature in Cassandra too?
>>
>>
>> Thanks for help
>>
>> Ralph
>>
>>
>>
>> --
>>
>>
>


Re: Self-healing data integrity?

2017-09-09 Thread DuyHai Doan
Jeff,

 With default compression enabled on each table, aren't there CRC files
created alongside the SSTables that can help detect bit rot?


On Sat, Sep 9, 2017 at 7:50 PM, Jeff Jirsa  wrote:

> Cassandra doesn't do that automatically - it can guarantee consistency on
> read or write via ConsistencyLevel on each query, and it can run active
> (AntiEntropy) repairs. But active repairs must be scheduled (by human or
> cron or by third party script like http://cassandra-reaper.io/), and to
> be pedantic, repair only fixes consistency issue, there's some work to be
> done to properly address/support fixing corrupted replicas (for example,
> repair COULD send a bit flip from one node to all of the others)
>
>
>
> --
> Jeff Jirsa
>
>
> On Sep 9, 2017, at 1:07 AM, Ralph Soika  wrote:
>
> Hi,
>
> I am searching for a big data storage solution for the Imixs-Workflow
> project. I started with Hadoop until I became aware of the
> 'small-file-problem'. So I am considering using Cassandra now.
>
> But Hadoop has one important feature for me. The replicator continuously
> examines whether data blocks are consistent across all datanodes. This will
> detect disk errors and automatically move data from defective blocks to
> working blocks. I think this is called 'self-healing mechanism'.
>
> Is there a similar feature in Cassandra too?
>
>
> Thanks for help
>
> Ralph
>
>
>
> --
>
>


Re: Lightweight transaction in Multi DC

2017-09-09 Thread DuyHai Doan
Using SERIAL is of course much more expensive, but then the trade-off is
that you are guaranteed to have linearizability across data centers.

Please note that when using Lightweight Transactions, there are two distinct
consistency levels to be set:

1) The Paxos phase consistency level: SERIAL or LOCAL_SERIAL (equivalent to
QUORUM/LOCAL_QUORUM)
2) The commit/Cassandra write consistency level: any normal consistency
level

Please see this diagram:
https://www.slideshare.net/doanduyhai/cassandra-introduction-parisjug/90

The Paxos consistency level applies to phases 1 to 3. The commit
consistency level only applies to the last phase.
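
As a sketch of what that means for the counter scenario (table and column
names are placeholders), both the Paxos phase and the subsequent read need to
be cluster-wide serial so that concurrent events landing in different DCs
cannot both observe the same count:

SERIAL CONSISTENCY SERIAL;
UPDATE event_counts SET event_count = 43 WHERE event_type = 'signup' IF event_count = 42;
-- and read the current count back with
CONSISTENCY SERIAL;
SELECT event_count FROM event_counts WHERE event_type = 'signup';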

On Sat, Sep 9, 2017 at 12:49 AM, Charulata Sharma (charshar) <
chars...@cisco.com> wrote:

> Thanks for your reply. I understand that LOCAL_SERIAL is for within a DC ,
> will setting up SERIAL not slow down the operation?
>
> And should I set SERIAL for both read and write phase or just Read phase?
> This is becoming a big problem for us. It happens for a small percentage of
>
> data set but it happens daily because of the highly concurrent and random
> DC routing of the received events.
>
>
>
> We had a counter implementation prior to CAS, which used to fail most of
> the times. CAS provided a big relief but it is still not totally without
> errors.
>
> Would really appreciate the community’s feedback.
>
>
>
> Thanks,
>
> Charu
>
>
>
> *From: *vasu gunja <vasu.no...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, September 8, 2017 at 1:56 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Lightweight transaction in Multi DC
>
>
>
> LOCAL_SERIAL is dc level, SERIAL checks for complete cluster level.
>
>
>
> On Fri, Sep 8, 2017 at 2:33 PM, Charulata Sharma (charshar) <
> chars...@cisco.com> wrote:
>
> Yes …it is with LOCAL_SERIAL. Should I be using SERIAL ?
>
>
>
> Thanks,
>
> Charu
>
>
>
> *From: *DuyHai Doan <doanduy...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, September 8, 2017 at 12:30 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Lightweight transaction in Multi DC
>
>
>
> Are you using CAS with SERIAL consistency level for your multi-DC setup ?
>
>
>
> On Fri, Sep 8, 2017 at 9:27 PM, Charulata Sharma (charshar) <
> chars...@cisco.com> wrote:
>
> Hi,
>
>   We are facing a serious issue with CAS in a multi DC setup and I
> wanted to get some input on it from the forum.
>
>
>
> We have a Column family which stores counts for the number of events our
> application receives. When the counts reach a certain threshold,
>
> there is another process which kicks in. The issue that is happening is,
> sometimes when concurrent events come in 2 Data Centers, both the events
> read the same count
>
> and increment it to the same value which causes the next process to not
> kick in.
>
>
>
> When concurrent events come in the same DC, this does not happen, because
> in this case, the CAS explicitly fails for one of them.
>
> However, in case of Multiple DCs this is not happening. Has anyone faced a
> similar issue and is there any resolution to this?? I was unable to find
> any open Jira on this.
>
>
>
> Thanks,
>
> Charu
>
>
>
>
>
>
>
>
>


Re: Lightweight transaction in Multi DC

2017-09-08 Thread DuyHai Doan
Are you using CAS with SERIAL consistency level for your multi-DC setup ?

On Fri, Sep 8, 2017 at 9:27 PM, Charulata Sharma (charshar) <
chars...@cisco.com> wrote:

> Hi,
>
>   We are facing a serious issue with CAS in a multi DC setup and I
> wanted to get some input on it from the forum.
>
>
>
> We have a Column family which stores counts for the number of events our
> application receives. When the counts reach a certain threshold,
>
> there is another process which kicks in. The issue that is happening is,
> sometimes when concurrent events come in 2 Data Centers, both the events
> read the same count
>
> and increment it to the same value which causes the next process to not
> kick in.
>
>
>
> When concurrent events come in the same DC, this does not happen, because
> in this case, the CAS explicitly fails for one of them.
>
> However, in case of Multiple DCs this is not happening. Has anyone faced a
> similar issue and is there any resolution to this?? I was unable to find
> any open Jira on this.
>
>
>
> Thanks,
>
> Charu
>
>
>
>
>


Re: No columns are defined for Materialized View other than primary key

2017-09-07 Thread DuyHai Doan
"As I described, non-filtered full scans on MV are more efficient than
filtered full scans on a table"

--> But if your MV has the same primary key as your base table, how can it be
possible?

Can you elaborate on what you mean by "non filtered full scan on MV" ?
Please give us some sample SELECT queries

On Thu, Sep 7, 2017 at 5:11 PM, Alex Kotelnikov <
alex.kotelni...@diginetica.com> wrote:

> In this example all tables and materialized views share all columns. What
> is the question?
>
> On 7 September 2017 at 17:26, sha p <shatestt...@gmail.com> wrote:
>
>> There is one more column "data" here in MView?
>>
>> On 7 Sep 2017 7:49 p.m., "DuyHai Doan" <doanduy...@gmail.com> wrote:
>>
>>> The answer of your question is in the error message. For once it's very
>>> clear. The primary key of your materialized view is EXACTLY the same as for
>>> your base table.
>>>
>>> So the question is what's the point creating this materialized view ...
>>>
>>>
>>>
>>> On Thu, Sep 7, 2017 at 4:01 PM, Alex Kotelnikov <
>>> alex.kotelni...@diginetica.com> wrote:
>>>
>>>> Hey. I have a problem creating a materialized view.
>>>>
>>>> My case is quite similar to
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13564
>>>> but discussion in comments there faded, let me describe by case.
>>>>
>>>> I have a table like
>>>> CREATE TABLE users (
>>>>   site_id int,
>>>>   user_id text,
>>>>   n int,
>>>>   data set<frozen>,
>>>>   PRIMARY KEY ((site_id, user_id), n));
>>>>
>>>> user data is updated and read by PK and sometimes I have to fetch all
>>>> user for some specific site_id. It appeared that full scan by
>>>> token(site_id,user_id) filtered by WHERE site_id =  works much
>>>> slower than unfiltered full scan on
>>>> CREATE MATERIALIZED VIEW users_1 AS
>>>> SELECT site_id, user_id, n, data
>>>> FROM users
>>>> WHERE site_id = 1 AND user_id IS NOT NULL AND n IS NOT NULL
>>>> PRIMARY KEY ((site_id, user_id), n);
>>>>
>>>> yes, you have to do so for each site_id, but it makes such bulk fetches
>>>> much faster. (When I do so, I am always puzzled, why I have to put NOT NULL
>>>> for a part of a primary key).
>>>> And just in case, I tried secondary indices on site_id. For such use
>>>> they improve nothing.
>>>>
>>>>
>>>> But things are changing and we realized that we want to get rid of
>>>> clustering key, n.
>>>>
>>>> DROP MATERIALIZED VIEW users_1;
>>>> DROP TABLE users;
>>>>
>>>> CREATE TABLE users (
>>>> site_id int,
>>>> user_id text,
>>>> data set,
>>>> PRIMARY KEY ((site_id, user_id)));
>>>>
>>>> CREATE MATERIALIZED VIEW users_1 AS
>>>> SELECT site_id, user_id, data
>>>> FROM users
>>>> WHERE site_id = 1 AND user_id IS NOT NULL
>>>> PRIMARY KEY ((site_id, user_id));
>>>>
>>>> And here I get the error I listed in the subject.
>>>> InvalidRequest: Error from server: code=2200 [Invalid query]
>>>> message="No columns are defined for Materialized View other than primary
>>>> key"
>>>>
>>>> But why? I still expect scans to be faster with MV. It appears to be
>>>> possible to create a dummy column and using as a clustering key. That's
>>>> ugly.
>>>> --
>>>>
>>>> Best Regards,
>>>>
>>>>
>>>> *Alexander Kotelnikov*
>>>>
>>>> *Team Lead*
>>>>
>>>> DIGINETICA
>>>> Retail Technology Company
>>>>
>>>> m: +7.921.915.06.28 <+7%20921%20915-06-28>
>>>>
>>>> *www.diginetica.com <http://www.diginetica.com/>*
>>>>
>>>
>>>
>
>
> --
>
> Best Regards,
>
>
> *Alexander Kotelnikov*
>
> *Team Lead*
>
> DIGINETICA
> Retail Technology Company
>
> m: +7.921.915.06.28 <+7%20921%20915-06-28>
>
> *www.diginetica.com <http://www.diginetica.com/>*
>


Re: No columns are defined for Materialized View other than primary key

2017-09-07 Thread DuyHai Doan
The answer to your question is in the error message. For once it's very
clear: the primary key of your materialized view is EXACTLY the same as that
of your base table.

So the question is what's the point of creating this materialized view ...
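
If the goal is just to fetch all users of a given site without one view per
site_id, a hedged sketch of an alternative (not from the thread, and it puts
every user of a site in a single partition, so watch partition sizes) is to
re-partition the same primary key columns in the view:

CREATE MATERIALIZED VIEW users_by_site AS
SELECT site_id, user_id, data
FROM users
WHERE site_id IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (site_id, user_id);

SELECT user_id, data FROM users_by_site WHERE site_id = 1;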



On Thu, Sep 7, 2017 at 4:01 PM, Alex Kotelnikov <
alex.kotelni...@diginetica.com> wrote:

> Hey. I have a problem creating a materialized view.
>
> My case is quite similar to
> https://issues.apache.org/jira/browse/CASSANDRA-13564
> but discussion in comments there faded, let me describe by case.
>
> I have a table like
> CREATE TABLE users (
>   site_id int,
>   user_id text,
>   n int,
>   data set,
>   PRIMARY KEY ((site_id, user_id), n));
>
> user data is updated and read by PK and sometimes I have to fetch all user
> for some specific site_id. It appeared that full scan by
> token(site_id,user_id) filtered by WHERE site_id =  works much
> slower than unfiltered full scan on
> CREATE MATERIALIZED VIEW users_1 AS
> SELECT site_id, user_id, n, data
> FROM users
> WHERE site_id = 1 AND user_id IS NOT NULL AND n IS NOT NULL
> PRIMARY KEY ((site_id, user_id), n);
>
> yes, you have to do so for each site_id, but it makes such bulk fetches
> much faster. (When I do so, I am always puzzled, why I have to put NOT NULL
> for a part of a primary key).
> And just in case, I tried secondary indices on site_id. For such use they
> improve nothing.
>
>
> But things are changing and we realized that we want to get rid of
> clustering key, n.
>
> DROP MATERIALIZED VIEW users_1;
> DROP TABLE users;
>
> CREATE TABLE users (
> site_id int,
> user_id text,
> data set,
> PRIMARY KEY ((site_id, user_id)));
>
> CREATE MATERIALIZED VIEW users_1 AS
> SELECT site_id, user_id, data
> FROM users
> WHERE site_id = 1 AND user_id IS NOT NULL
> PRIMARY KEY ((site_id, user_id));
>
> And here I get the error I listed in the subject.
> InvalidRequest: Error from server: code=2200 [Invalid query] message="No
> columns are defined for Materialized View other than primary key"
>
> But why? I still expect scans to be faster with MV. It appears to be
> possible to create a dummy column and using as a clustering key. That's
> ugly.
> --
>
> Best Regards,
>
>
> *Alexander Kotelnikov*
>
> *Team Lead*
>
> DIGINETICA
> Retail Technology Company
>
> m: +7.921.915.06.28 <+7%20921%20915-06-28>
>
> *www.diginetica.com *
>


[ANNOUNCE] Achilles 5.3.0

2017-08-26 Thread DuyHai Doan
Hello Cassandra users

I'm happy to announce the release of Achilles 5.3.0

The new added features are

- Support for Cassandra up to 3.11.0 and Datastax Enterprise up to 5.1.2
- Support for new Duration type (CASSANDRA-11873)
- Support for literal value in (CASSANDRA-10783)
- Support for GROUP BY operations (CASSANDRA-10707)
- Support for immutable entities

All the details are in the wiki: https://github.com/doanduyhai/Achilles/wiki/

Regards


Re: SASI and secondary index simultaniously

2017-07-12 Thread DuyHai Doan
In the original source code, SASI will be chosen instead of the regular secondary index

Le 12 juil. 2017 09:13, "Vlad"  a écrit :

> Hi,
>
> it's possible to create both regular secondary index and SASI on the same
> column:
>
>
>
>
> *CREATE TABLE ks.tb (id int PRIMARY KEY,  name text);CREATE CUSTOM INDEX
> tb_name_idx_1 ON ks.tb (name) USING
> 'org.apache.cassandra.index.sasi.SASIIndex';CREATE INDEX tb_name_idx ON
> ks.tb (name);*
> But which one is used for SELECT? Assuming we have regular index and would
> like to migrate to SASI, can we first create SASI, than drop regular? And
> how can we check then index build is completed?
>
> Thanks.
>
>
>


Re: timeoutexceptions with UDF causing cassandra forceful exits

2017-07-03 Thread DuyHai Doan
Besides the config of user_function_timeout_policy, I would say having a
UDF that times out badly is generally an indication that you should review
your UDF code

On Mon, Jul 3, 2017 at 7:58 PM, Jeff Jirsa  wrote:

>
>
> On 2017-06-29 17:00 (-0700), Akhil Mehra  wrote:
> > By default user_function_timeout_policy is set to die i.e. warn and kill
> the JVM. Please find below a source code snippet that outlines possible
> setting.
>
> (Which also means you can set user_function_timeout_policy to ignore in
> your yaml and just log an error instead of exiting)
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: UDF for sorting

2017-07-03 Thread DuyHai Doan
Plain answer is no you can't

The reason is that a UDF only transforms column values on each row but does
not have the ability to modify row ordering
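
The usual workaround is a second, query-specific table (kept in sync by the
application, or via a materialized view) that clusters on disp_name so rows
come back already sorted. A sketch reusing the columns from the question:

CREATE TABLE ks.cf_by_disp_name (
    pk1 bigint,
    disp_name text,
    cc1 bigint,
    stat_obj text,
    status int,
    PRIMARY KEY (pk1, disp_name, cc1)
) WITH CLUSTERING ORDER BY (disp_name ASC, cc1 ASC);

SELECT * FROM ks.cf_by_disp_name WHERE pk1 = 123;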

On Mon, Jul 3, 2017 at 10:14 PM, techpyaasa .  wrote:

> Hi all,
>
> I have a table like
>
> CREATE TABLE ks.cf ( pk1 bigint, cc1 bigint, disp_name text , stat_obj
> text, status int, PRIMARY KEY (pk1, cc1)) WITH CLUSTERING ORDER BY (cc1 ASC)
>
> CREATE INDEX idx1 on ks.cf(status);
>
> I want to have a queries like
> *select * from ks.cf  where pk1=123 and cc1=345;*
>
> and
> *select * from ks.cf  where pk1=123 and status=1;*
> In this case , I want rows to be sorted based on 'disp_name' (asc/desc) .
>
> Can I achieve the same using UDF or anything else ?? (Sorry If my
> understanding about UDF is wrong).
>
> Thanks in advance
> TechPyaasa
>


Re: SASI index on datetime column does not filter on minutes

2017-06-19 Thread DuyHai Doan
The + part in the date format is the timezone offset (e.g. +0000): it is necessary to specify the timezone explicitly
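
A sketch against the playground.individual table from the question, spelling
out the timezone on both bounds so the literals are not parsed in the
coordinator's local zone:

SELECT lastname, firstname, dateofbirth FROM playground.individual
WHERE dateofbirth > '2000-11-18 17:59:18+0000'
  AND dateofbirth < '2001-01-01 10:00:00+0000';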

On Mon, Jun 19, 2017 at 5:38 PM, Hannu Kröger  wrote:

> Hello,
>
> I tried the same thing with 3.10 which I happened to have at hand and that
> seems to work.
>
> cqlsh:test> select lastname,firstname,dateofbirth from individuals where
> dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';
>
>  lastname | firstname | dateofbirth
> --+---+-
>   Jimmie2 |Lundin | 2000-12-19 17:55:17.00+
>   Jimmie3 |Lundin | 2000-11-18 17:55:18.00+
>Jimmie |Lundin | 2000-11-18 17:55:17.00+
>
> (3 rows)
> cqlsh:test> select lastname,firstname,dateofbirth from individuals where
> dateofbirth < '2001-01-01T10:00:00+' and dateofbirth >
> '2000-11-18T17:59:18+';
>
>  lastname | firstname | dateofbirth
> --+---+-
>   Jimmie2 |Lundin | 2000-12-19 17:55:17.00+
>
> (1 rows)
> cqlsh:test>
>
> Maybe you have timezone issue?
>
> Best Regards,
> Hannu
>
> On 19 June 2017 at 17:09:10, Tobias Eriksson (tobias.eriks...@qvantel.com)
> wrote:
>
> Hi
>
> I have a table like this (Cassandra 3.5)
>
> Table
>
> id uuid,
>
> lastname text,
>
> firstname text,
>
> address_id uuid,
>
> dateofbirth timestamp,
>
>
>
> PRIMARY KEY (id, lastname, firstname)
>
>
>
> And a SASI index like this
>
> create custom index indv_birth ON playground.individual(dateofbirth)
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
> 'SPARSE'};
>
>
>
> The data
>
>
>
> lastname | firstname | dateofbirth
>
> --+---+-
>
>Lundin |Jimmie | 2000-11-18 17:55:17.00+
>
>   Jansson |   Karolin | 2000-12-19 17:55:17.00+
>
> Öberg |Louisa | 2000-11-18 17:55:18.00+
>
>
>
>
>
> Now if I do this
>
> select lastname,firstname,dateofbirth from playground.individual where
> dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18
> 17:59:18';
>
>
>
> I should only get ONE row, right
>
> lastname | firstname | dateofbirth
>
> --+---+-
>
> Jansson |   Karolin | 2000-12-19 17:55:17.00+
>
>
>
>
>
> But instead I get all 3 rows !!!
>
>
>
> Why is that ?
>
>
>
> -Tobias
>
>
>
>
>
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
Sorry, I misread some replies; I had the impression that people were
recommending ES as a primary datastore

On Mon, Jun 12, 2017 at 7:12 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Nobody is promoting ES as a primary datastore in this thread.  Every
> mention of it is to accompany C*.
>
>
>
> On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> For all those promoting ES as a PRIMARY datastore, please read this
>> before:
>>
>> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13
>>
>> There are a lot of warning before recommending ES as a datastore.
>>
>> The answer from Pilato, ES official evangelist:
>>
>>
>>- You absolutely care about your data and you want to be able to
>>reindex in all cases. You need for that a datastore. A datastore can be a
>>filesystem where you store JSON, HDFS, and/or a database you prefer and 
>> you
>>are confident with. About how to inject data in it, you may want to read:
>>http://david.pilato.fr/blog/2015/05/09/advanced-
>>search-for-your-legacy-application/7
>>
>> <http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/>
>>.
>>
>>
>>
>>
>> On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
>>
>>> For queries 1-5 this seems like a potentially good use case for
>>> materialized views. Create one table with the videos stored by ID and the
>>> materialized views for each of the queries.
>>>
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>>
>>>
>>> 2017-06-11 22:40 GMT-04:00 @Nandan@ <nandanpriyadarshi...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> Currently, I am working on data modeling for Video Company in which we
>>>> have different types of users as well as different user functionality.
>>>> But currently, my concern is about Search video module based on
>>>> different fields.
>>>>
>>>> Query patterns are as below:-
>>>> 1) Select video by actor.
>>>> 2) select video by producer.
>>>> 3) select video by music.
>>>> 4) select video by actor and producer.
>>>> 5) select video by actor and music.
>>>>
>>>> Note: - In short, We want to establish an advanced search module by
>>>> which we can search by anyway and get the desired results.
>>>>
>>>> During a search , we need partial search also such that if any user can
>>>> search "Harry" title, then we are able to give them result as all videos
>>>> whose
>>>>  title contains "Harry" at any location.
>>>>
>>>> As per my ideas, I have to create separate tables such as
>>>> video_by_actor, video_by_producer etc.. and implement solr query on all
>>>> tables. Otherwise,
>>>> is there any others way by which we can implement this search module
>>>> effectively.
>>>>
>>>> Please suggest.
>>>>
>>>> Best regards,
>>>>
>>>
>>>
>>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
For all those promoting ES as a PRIMARY datastore, please read this before:

https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13

There are a lot of warnings in there before recommending ES as a datastore.

The answer from Pilato, ES official evangelist:


   - You absolutely care about your data and you want to be able to reindex
   in all cases. You need for that a datastore. A datastore can be a
   filesystem where you store JSON, HDFS, and/or a database you prefer and you
   are confident with. About how to inject data in it, you may want to read:
   http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/




On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior  wrote:

> For queries 1-5 this seems like a potentially good use case for
> materialized views. Create one table with the videos stored by ID and the
> materialized views for each of the queries.
>
> --
> Michael Mior
> mm...@apache.org
>
>
> 2017-06-11 22:40 GMT-04:00 @Nandan@ :
>
>> Hi,
>>
>> Currently, I am working on data modeling for Video Company in which we
>> have different types of users as well as different user functionality.
>> But currently, my concern is about Search video module based on different
>> fields.
>>
>> Query patterns are as below:-
>> 1) Select video by actor.
>> 2) select video by producer.
>> 3) select video by music.
>> 4) select video by actor and producer.
>> 5) select video by actor and music.
>>
>> Note: - In short, We want to establish an advanced search module by which
>> we can search by anyway and get the desired results.
>>
>> During a search , we need partial search also such that if any user can
>> search "Harry" title, then we are able to give them result as all videos
>> whose
>>  title contains "Harry" at any location.
>>
>> As per my ideas, I have to create separate tables such as video_by_actor,
>> video_by_producer etc.. and implement solr query on all tables. Otherwise,
>> is there any others way by which we can implement this search module
>> effectively.
>>
>> Please suggest.
>>
>> Best regards,
>>
>
>


Re: Cassandra & Spark

2017-06-08 Thread DuyHai Doan
Interesting

Tobias, when you said "Instead we transferred the data to Apache Kudu", did
you transfer all Cassandra data into Kudu with a single migration and
then tap into Kudu for aggregation, or did you run data imports every
day/week/month from Cassandra into Kudu?

From my point of view, the difficulty is not to have a static set of data
and run aggregation on it, there are a lot of alternatives out there. The
difficulty is to be able to run analytics on a live/production/changing
dataset with all the data movement & update that it implies.

Regards

On Thu, Jun 8, 2017 at 3:37 PM, Tobias Eriksson  wrote:

> Hi
>
> Something to consider before moving to Apache Spark and Cassandra
>
> I have a background where we have tons of data in Cassandra, and we wanted
> to use Apache Spark to run various jobs
>
> We loved what we could do with Spark, BUT….
>
> We realized soon that we wanted to run multiple jobs in parallel
>
> Some jobs would take 30 minutes and some 45 seconds
>
> Spark is by default arranged so that it will take up all the resources
> there is, this can be tweaked by using Mesos or Yarn
>
> But even with Mesos and Yarn we found it complicated to run multiple jobs
> in parallel.
>
> So eventually we ended up throwing out Spark,
>
> Instead we transferred the data to Apache Kudu, and then we ran our
> analysis on Kudu, and what a difference !
>
> “my two cents!”
>
> -Tobias
>
>
>
>
>
>
>
> *From: *한 승호 
> *Date: *Thursday, 8 June 2017 at 10:25
> *To: *"user@cassandra.apache.org" 
> *Subject: *Cassandra & Spark
>
>
>
> Hello,
>
>
>
> I am Seung-ho and I work as a Data Engineer in Korea. I need some advice.
>
>
>
> My company recently consider replacing RDMBS-based system with Cassandra
> and Hadoop.
>
> The purpose of this system is to analyze Cadssandra and HDFS data with
> Spark.
>
>
>
> It seems many user cases put emphasis on data locality, for instance, both
> Cassandra and Spark executor should be on the same node.
>
>
>
> The thing is, my company's data analyst team wants to analyze
> heterogeneous data source, Cassandra and HDFS, using Spark.
>
> So, I wonder what would be the best practices of using Cassandra and
> Hadoop in such case.
>
>
>
> Plan A: Both HDFS and Cassandra with NodeManager(Spark Executor) on the
> same node
>
>
>
> Plan B: Cassandra + Node Manager / HDFS + NodeManager in each node
> separately but the same cluster
>
>
>
>
>
> Which would be better or correct, or would be a better way?
>
>
>
> I appreciate your advice in advance :)
>
>
>
> Best Regards,
>
> Seung-Ho Han
>
>
>
>
>
> Windows 10용 메일 에서 보냄
>
>
>


Re: Understanding the limitation to only one non-PK column in MV-PK

2017-06-06 Thread DuyHai Doan
All the explanation for why just 1 non PK column can be used as PK for MV
is here:

https://skillsmatter.com/skillscasts/7446-cassandra-udf-and-materialised-views-in-depth

Skip to 19:18 for the explanation

On Mon, May 8, 2017 at 8:08 PM, Fridtjof Sander <
fridtjof.san...@googlemail.com> wrote:

> Hi,
>
> I'm struggling to understand some problems with respect to materialized
> views.
>
> First, I want to understand the example mentioned in
> https://issues.apache.org/jira/browse/CASSANDRA-9928 explaining how
> multiple non-PK columns in the view PK can lead to unrepairable/orphanized
> entries. I understand that only happens if a node dies that pushed an
> "intermediate" state (the result of only one of several updates affecting
> the same entry) to it's view replica. The case mentioned looks like the
> following: initially all nodes have (p=1, a=1, b=1). Then two concurrent
> updates are send: a=2 and b=2. One node gets b=2, deletes view (a=1, b=1,
> p=1) and inserts (a=1, b=2, p=1), then dies. The others get a=1, when is
> why they delete (a=1, b=1, p=1) and insert (a=2, b=1, p=1). Then (a=1,
> b=2, p=1) will never be deleted.
>
> What I don't understand is, why that can never happen with a single
> column. Consider (p=1, a=1) with two updates a=2 and a=3. One node receives
> a=2, deletes view entry (a=1, p=1) and inserts (a=2, p=1), then dies. The
> others get a=3, delete (a=1, p=1) and insert (a=3, p=1). Now, how is (a=2,
> p=1) removed from the view replica that was connected to the dying node? I
> don't get what's different here.
>
> I would really appreciate if someone could share some insight here!
>
> Fridtjof
>


Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
The problem is not that it's not feasible from the Cassandra side; it is feasible.

The problem is when doing arbitrary ORDER BY, Cassandra needs to resort to
in-memory sorting of a potentially huge amount of data --> more pressure on
heap --> impact on cluster stability

Whereas delegating this kind of job to Spark, which has appropriate data
structures to lower heap pressure (DataFrame, Project Tungsten), is a better
idea.

"but in the Top N use case, far more data has to be transferred to the
client when the client has to do the sorting"

--> It is not true if you co-locate your Spark workers with Cassandra
nodes. In this case, Spark reads out of Cassandra nodes are always
node-local



On Tue, Jun 6, 2017 at 6:20 PM, Roger Fischer (CW) <rfis...@brocade.com>
wrote:

> Hi DuyHai,
>
>
>
> this is in response to the other points in your response.
>
>
>
> My application is a real-time application. It monitors devices in the
> network and displays the top N devices for various parameters averaged over
> a time period. A query may involve anywhere from 10 to 50k devices, and
> anywhere from 5 to 2000 intervals. We expect a query to take less than 2
> seconds.
>
>
>
> My impression was that Spark is aimed at larger scale analytics.
>
>
>
> I am ok with the limitation on “group by”. I am intending to use async
> queries and token-aware load balancing to partition the query and execute
> it in parallel on each node.
>
>
>
> Thanks…
>
>
>
> Roger
>
>
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
> *Sent:* Tuesday, June 06, 2017 12:31 AM
> *To:* Roger Fischer (CW) <rfis...@brocade.com>
> *Cc:* user@cassandra.apache.org
> *Subject:* Re: Order by for aggregated values
>
>
>
> First Group By is only allowed on partition keys and clustering columns,
> not on arbitrary column. The internal implementation of group by tries to
> fetch data on clustering order to avoid having to "re-sort" them in memory
> which would be very expensive
>
>
>
> Second, group by works best when restricted to a single partition other
> wise it will force Cassandra to do a range scan so poor performance
>
>
>
>
>
> For all of those reasons I don't expect an "order by" on aggregated values
> to be available any soon
>
>
>
> Furthermore, Cassandra is optimised for real-time transactional scenarios,
> the group by/order by/limit is typically a classical analytics scenario, I
> would recommend to use the appropriate tool like Spark for that
>
>
>
>
>
> Le 6 juin 2017 04:00, "Roger Fischer (CW)" <rfis...@brocade.com> a écrit :
>
> Hello,
>
>
>
> is there any intent to support “order by” and “limit” on aggregated values?
>
>
>
> For time series data, top n queries are quite common. Group-by was the
> first step towards supporting such queries, but ordering by value and
> limiting the results are also required.
>
>
>
> Thanks…
>
>
>
> Roger
>
>
>
>
>
>
>
>
>


Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
First, Group By is only allowed on partition keys and clustering columns,
not on an arbitrary column. The internal implementation of group by tries to
fetch data in clustering order to avoid having to "re-sort" it in memory,
which would be very expensive

Second, group by works best when restricted to a single partition, otherwise
it will force Cassandra to do a range scan and hence poor performance


For all of those reasons I don't expect an "order by" on aggregated values
to be available any soon

Furthermore, Cassandra is optimised for real-time transactional scenarios;
group by/order by/limit is typically a classical analytics scenario, so I
would recommend using an appropriate tool like Spark for that


Le 6 juin 2017 04:00, "Roger Fischer (CW)"  a écrit :

Hello,



is there any intent to support “order by” and “limit” on aggregated values?



For time series data, top n queries are quite common. Group-by was the
first step towards supporting such queries, but ordering by value and
limiting the results are also required.



Thanks…



Roger


Re: Reg:- Generate dummy data for Cassandra Tables

2017-06-05 Thread DuyHai Doan
Personally I'm using https://github.com/Marak/faker.js/ to generate
various kinds of datasets. That's the most comprehensive "free" data
generator I've found so far but it's in JS.

On Mon, Jun 5, 2017 at 7:13 AM, Jeff Jirsa  wrote:

> On 2017-06-04 20:03 (-0700), "@Nandan@" 
> wrote:
> > Hi All,
> >
> > I am building a personal Cassandra project in which I want to
> > insert some random dummy data into my tables.
>
> Jon @ TheLastPickle has a nice tool to do this:
> https://github.com/rustyrazorblade/cdm
>
>
>
>
>


Re: Apache Cassandra - Configuration Management

2017-05-17 Thread DuyHai Doan
For configuration management there are tons of tools out there:

- ansible
- chef
- puppet
- saltstack

I surely forgot a few others


On Wed, May 17, 2017 at 6:33 PM, ZAIDI, ASAD A  wrote:

> Good Morning Folks –
>
>
>
> I’m running a 14-node Cassandra cluster in two data centers; each node
> has roughly 1.5TB. We’re anticipating more load, therefore we’ll be
> expanding the cluster with additional nodes.
>
> At this time, I’m struggling to keep a consistent cassandra.yaml
> file on each server – I’m maintaining the yaml file manually. The
> only tool I’ve got is Splunk, which is only used to ‘monitor‘ threads.
>
>
>
> Would you guys please suggest an open source tool that can help maintain the
> cluster? I’ll really appreciate your reply – Thanks/Asad
>


Re: Reg:- DSE 5.1.0 Issue

2017-05-16 Thread DuyHai Doan
Nandan

Since you have asked questions about DSE many times on this OSS mailing
list, I suggest you contact DataStax directly if you're using their
enterprise edition. Every DataStax customer has access to their support. If
you're a sub-contractor for an end customer that is using DSE, ask your
customer for this support access. On this OSS mailing list we cannot
answer questions related to a commercial product.



On Tue, May 16, 2017 at 1:07 PM, Hannu Kröger  wrote:

> Hello,
>
> DataStax is probably more than happy to answer your DataStax
> Enterprise related questions here (I don’t know if that is 100% the right
> place, but…):
> https://support.datastax.com/hc/en-us
>
> This mailing list is for open source Cassandra, and DSE issues are mostly
> out of scope here. Hadoop is one of the DSE-only features.
>
> Cheers,
> Hannu
>
> On 16 May 2017, at 14:01, @Nandan@  wrote:
>
> Hi ,
> Sorry in advance if I am posting this here.
>
> I am stuck on some particular steps.
>
> I was using DSE 4.8 on a single DC with 3 nodes. Today I upgraded all 3
> nodes to DSE 5.1.
> The issue is that when I am trying to run SERVICE DSE RESTART I am getting
> this error message:
>
> Hadoop functionality has been removed from DSE.
> Please try again without the HADOOP_ENABLED set in /etc/default/dse.
>
> Even in the /etc/default/dse file, HADOOP_ENABLED is set to 0.
>
> For testing, once I changed HADOOP_ENABLED to 1,
>
> I am getting this error:
>
> Found multiple DSE core jar files in /usr/share/dse/lib
> /usr/share/dse/resources/dse/lib /usr/share/dse /usr/share/dse/common .
> Please make sure there is only one.
>
> I searched so many articles, but have not been able to find a solution so far.
> Please help me to get out of this mess.
>
> Thanks and Best Regards,
> Nandan Priyadarshi.
>
>
>


Re: Testing Stratio Index Queries with Cassandra-Stress Tool

2017-04-25 Thread DuyHai Doan
Use Gatling with the CQL plugin: https://github.com/gatling-cql/GatlingCql

On Tue, Apr 25, 2017 at 2:36 PM, Akshay Suresh <
akshay.sur...@unotechsoft.com> wrote:

> Hi
>
> I have a set of tables with Stratio Index.
>
> Is there any way to test Stratio-based SELECT queries using the
> cassandra-stress tool?
>
> Thanks in advance.
>
> --
> Regards,
>
>
> *Akshay Suresh*
> *Unotech Software Pvt. Ltd*
>
>
> M : +91 99309 80509 <+91%2099309%2080509>
> O : +91 (22) 2687 9402 <+91%2022%202687%209402>
> A :  D Wing, 7th floor, 32 Corporate Avenue, Off Mahakali Caves Road,
> Andheri (E), Mumbai, 400093
> W : www.unotechsoft.com
>
> *Disclaimer :* The contents of this e-mail and attachment(s) thereto are
> confidential and intended for the named recipient(s) only. It shall not
> attach any liability on the originator or Unotech Software Pvt. Ltd. or its
> affiliates. Any views or opinions presented in this email are solely those
> of the author and may not necessarily reflect the opinions of Unotech
> Software Pvt. Ltd. or its affiliates. Any form of reproduction,
> dissemination, copying, disclosure, modification, distribution and / or
> publication of this message without the prior written consent of the author
> of this e-mail is strictly prohibited. If you have received this email in
> error please delete it and notify the sender immediately.


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Using an MV and putting id as the partition key is your best bet right now. SASI
would be too expensive for this simple use case

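For illustration, a rough sketch of the MV approach with a made-up table
matching Micha's description (column names are assumptions):

CREATE TABLE items (
    sha256 blob,
    id timeuuid,
    data1 text,
    PRIMARY KEY (sha256)
);

CREATE MATERIALIZED VIEW items_by_id AS
    SELECT * FROM items
    WHERE id IS NOT NULL AND sha256 IS NOT NULL
    PRIMARY KEY (id, sha256);

-- lookups can then go either way:
-- SELECT * FROM items WHERE sha256 = ?;
-- SELECT * FROM items_by_id WHERE id = ?;
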
On Thu, Feb 16, 2017 at 3:21 PM, Micha  wrote:

>
>
> it's like having a table (sha256 blob primary key, id timeuuid, data1
> text, ., )
>
> So both sha256 and id are unique.
> I would like to query *either* with sha256 *or* with id.
>
> I thought this could be done with a SASI index, but it has to be done with
> a second table (the manual way) or with an MV with id as the partition key.
>
> On 16.02.2017 15:11, Benjamin Roth wrote:
> > No matter what has to be indexed here, the preferable way is most
> > probably denormalization instead of another index.
>
> It's rather: manually inserting the data with another partition key, or making
> an MV with the other key.
>
>


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
[image: Inline image 1]

On Thu, Feb 16, 2017 at 3:08 PM, Micha <mich...@fantasymail.de> wrote:

>
>
> On 16.02.2017 14:30, DuyHai Doan wrote:
> > Why indexing BLOB data ? It does not make any sense
>
> My partition key is a secure hash sum,  I don't index a blob.
>
>
>
>
>


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Why index BLOB data? It does not make any sense

"I thought sasi index is globally held, in contrast to the normal secondary
index.." --> Who said that ? It's just wrong

On Thu, Feb 16, 2017 at 1:50 PM, Micha  wrote:

> Hi,
>
>
> my table has (among others) three columns, which are unique blobs.
> So I made the first column the partition key and created two sasi
> indices for the two other columns.
>
> After inserting ca 90m records I'm not able to query a bunch of rows
> (sending 1 selects to the cluster) using only a sasi index. After a
> few seconds I get timeouts.
>
> I have read the documents about the sasi index but I don't get why this
> happens. Is this because I don't include the partition key in the query?
>
> I thought sasi index is globally held, in contrast to the normal
> secondary index..
>
>
> thanks for helping,
>  Michael
>
>


Re: Time series data model and tombstones

2017-02-08 Thread DuyHai Doan
Thanks for the update. Good to know that TWCS gives you more stability

On Wed, Feb 8, 2017 at 6:20 PM, John Sanda <john.sa...@gmail.com> wrote:

> I wanted to provide a quick update. I was able to patch one of the
> environments that is hitting the tombstone problem. It has been running
> TWCS for five days now, and things are stable so far. I also had a patch to
> the application code to implement date partitioning ready to go, but I
> wanted to see how things went with only making the compaction changes.
>
> On Sun, Jan 29, 2017 at 4:05 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> In theory, you're right and Cassandra should possibly skip reading cells
>> having time < 50. But it's all theory, in practice Cassandra read chunks of
>> xxx kilobytes worth of data (don't remember the exact value of xxx, maybe
>> 64k or far less) so you may end up reading tombstones.
>>
>> On Sun, Jan 29, 2017 at 9:24 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>>> Thanks for the clarification. Let's say I have a partition in an SSTable
>>> where the values of time range from 100 to 10 and everything < 50 is
>>> expired. If I do a query with time < 100 and time >= 50, are there
>>> scenarios in which Cassandra will have to read cells where time < 50? In
>>> particular I am wondering if compression might have any affect.
>>>
>>> On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> "Should the data be sorted by my time column regardless of the
>>>> compaction strategy" --> It does
>>>>
>>>> What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
>>>> compacted together with a new chunk of SSTABLE-2 containing fresh data so
>>>> in the new resulting SSTable will contain tombstones AND fresh data inside
>>>> the same partition, but of course sorted by clustering column "time".
>>>>
>>>> On Sun, Jan 29, 2017 at 8:55 PM, John Sanda <john.sa...@gmail.com>
>>>> wrote:
>>>>
>>>> Since STCS does not sort data based on timestamp, your wide partition
>>>> may span over multiple SSTables and inside each SSTable, old data (+
>>>> tombstones) may sit on the same partition as newer data.
>>>>
>>>>
>>>> Should the data be sorted by my time column regardless of the
>>>> compaction strategy? I didn't think that the column timestamp came into
>>>> play with respect to sorting. I have been able to review some SSTables with
>>>> sstablemetadata and I can see that old/expired data is definitely living
>>>> with live data.
>>>>
>>>>
>>>> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>> Ok so give it a try with TWCS. Since STCS does not sort data based on
>>>> timestamp, your wide partition may span over multiple SSTables and inside
>>>> each SSTable, old data (+ tombstones) may sit on the same partition as
>>>> newer data.
>>>>
>>>> When reading by slice, even if you request for fresh data, Cassandra
>>>> has to scan over a lot tombstones to fetch the correct range of data thus
>>>> your issue
>>>>
>>>> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com>
>>>> wrote:
>>>>
>>>> It was with STCS. It was on a 2.x version before TWCS was available.
>>>>
>>>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>> Did you get this Overwhelming tombstonne behavior with STCS or with
>>>> TWCS ?
>>>>
>>>> If you're using DTCS, beware of its weird behavior and tricky
>>>> configuration.
>>>>
>>>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com>
>>>> wrote:
>>>>
>>>> Your partitioning key is text. If you have multiple entries per id you
>>>> are likely hitting older cells that have expired. Descending only affects
>>>> how the data is stored on disk, if you have to read the whole partition to
>>>> find whichever time you are querying for you could potentially hit
>>>> tombstones in other SSTables that contain the same "id". As mentioned
>>>> previously, you need to add a time bucket to your partitioning key and
>>>> definitely use DTCS/TWCS.

Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread DuyHai Doan
The link you posted doesn't say anything about Cassandra
On 7 February 2017 at 11:41, "Kant Kodali"  wrote:

> Why does CockroachDB github website say Cassandra has no Availability on
> datacenter failure?
>
> https://github.com/cockroachdb/cockroach
>


Re: Global TTL vs Insert TTL

2017-02-01 Thread DuyHai Doan
I was referring to this JIRA
https://issues.apache.org/jira/browse/CASSANDRA-3974 when talking about
dropping an entire SSTable at compaction time

But the JIRA is pretty old and it is very possible that the optimization is
no longer there



On Wed, Feb 1, 2017 at 6:53 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> This is incorrect, there's no optimization used that references the table
> level TTL setting.   The max local deletion time is stored in table
> metadata.  See 
> org.apache.cassandra.io.sstable.metadata.StatsMetadata#maxLocalDeletionTime
> in the Cassandra 3.0 branch. The default ttl is stored
> here: org.apache.cassandra.schema.TableParams#defaultTimeToLive and is
> never referenced during compaction.
>
> Here's an example from a table I created without a default TTL, you can
> use the sstablemetadata tool to see:
>
> jhaddad@rustyrazorblade ~/dev/cassandra/data/data/test$
> ../../../tools/bin/sstablemetadata a-7bca6b50e8a511e6869a5596edf4dd35/mc-1-big-Data.db
> .
> SSTable max local deletion time: 1485980862
>
> On Wed, Feb 1, 2017 at 6:59 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Global TTL is better than dynamic runtime TTL
>>
>> Why ?
>>
>>  Because Global TTL is a table property and Cassandra can perform
>> optimization when compacting.
>>
>> For example if it can see that the maxTimestamp of an SSTable is older
>> than the table Global TTL, the SSTable can be entirely dropped during
>> compaction
>>
>> Using dynamic TTL at runtime, since Cassandra doesn't know about and cannot
>> track each individual TTL value, the previous optimization is not possible
>> (even if you always use the SAME TTL for every query, Cassandra is not
>> supposed to know that)
>>
>>
>>
>> On Wed, Feb 1, 2017 at 3:01 PM, Cogumelos Maravilha <
>> cogumelosmaravi...@sapo.pt> wrote:
>>
>> Thank you all, for your answers.
>>
>> On 02/01/2017 01:06 PM, Carlos Rolo wrote:
>>
>> To reinforce Alain statement:
>>
>> "I would say that the unsafe part is more about using C* 3.9" this is
>> key. You would be better on 3.0.x unless you need features on the 3.x
>> series.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>> *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>> Mobile: +351 918 918 100 <+351%20918%20918%20100>
>> www.pythian.com
>>
>> On Wed, Feb 1, 2017 at 8:32 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>> Is it safe to use TWCS in C* 3.9?
>>
>>
>> I would say that the unsafe part is more about using C* 3.9 than using
>> TWCS in C*3.9 :-). I see no reason to say TWCS would be specifically unsafe
>> in C* 3.9, but I might be missing something.
>>
>> Going from STCS to TWCS is often smooth, from LCS you might expect an
>> extra load compacting a lot (all?) of the SSTable from what we saw from the
>> field. In this case, be sure that your compaction options are safe enough
>> to handle this.
>>
>> TWCS is even easier to use on C*3.0.8+ and C*3.8+ as it became the new
>> default replacing DTCS, so no extra jar is needed, you can enable TWCS as
>> any other default compaction strategy.
>>
>> C*heers,
>> ---
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2017-01-31 23:29 GMT+01:00 Cogumelos Maravilha <
>> cogumelosmaravi...@sapo.pt>:
>>
>> Hi Alain,
>>
>> Thanks for your response and the links.
>>
>> I've also checked "Time series data model and tombstones".
>>
>> Is it safe to use TWCS in C* 3.9?
>>
>> Thanks in advance.
>>
>> On 31-01-2017 11:27, Alain RODRIGUEZ wrote:
>>
>> Is there a overhead using line by line option or wasted disk space?
>>
>>  There is a very recent topic about that in the mailing list, look for "Time
>> series data model and tombstones". I believe DuyHai answer your question
>> there with more details :).
>>
>> *tl;dr:*
>>
>> Yes, if you know the TTL in advance, and it is fixed, you might want to
>> go with the table option instead of adding the TTL in each insert. Also you
>> might want consider using TWCS compaction strategy.
>>
>>

Re: Global TTL vs Insert TTL

2017-02-01 Thread DuyHai Doan
Global TTL is better than dynamic runtime TTL

Why ?

 Because Global TTL is a table property and Cassandra can perform
optimization when compacting.

For example if it can see that the maxTimestamp of an SSTable is older than
the table Global TTL, the SSTable can be entirely dropped during compaction

Using dynamic TTL at runtime, since Cassandra doesn't know about and cannot
track each individual TTL value, the previous optimization is not possible
(even if you always use the SAME TTL for every query, Cassandra is not
supposed to know that)

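As a small sketch of the two write paths being compared (the events table is
hypothetical):

CREATE TABLE events (
    id text,
    ts timestamp,
    payload text,
    PRIMARY KEY (id, ts)
) WITH default_time_to_live = 604800;   -- 7 days, visible to Cassandra as a table property

-- inherits the table-level TTL
INSERT INTO events (id, ts, payload) VALUES ('a', toTimestamp(now()), 'x');

-- per-write TTL instead: each cell carries its own TTL, default_time_to_live stays 0
INSERT INTO events (id, ts, payload) VALUES ('a', toTimestamp(now()), 'x') USING TTL 604800;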


On Wed, Feb 1, 2017 at 3:01 PM, Cogumelos Maravilha <
cogumelosmaravi...@sapo.pt> wrote:

> Thank you all, for your answers.
>
> On 02/01/2017 01:06 PM, Carlos Rolo wrote:
>
> To reinforce Alain statement:
>
> "I would say that the unsafe part is more about using C* 3.9" this is key.
> You would be better on 3.0.x unless you need features on the 3.x series.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> *linkedin.com/in/carlosjuzarterolo
> *
> Mobile: +351 918 918 100 <+351%20918%20918%20100>
> www.pythian.com
>
> On Wed, Feb 1, 2017 at 8:32 AM, Alain RODRIGUEZ 
> wrote:
>
>> Is it safe to use TWCS in C* 3.9?
>>
>>
>> I would say that the unsafe part is more about using C* 3.9 than using
>> TWCS in C*3.9 :-). I see no reason to say TWCS would be specifically unsafe
>> in C* 3.9, but I might be missing something.
>>
>> Going from STCS to TWCS is often smooth, from LCS you might expect an
>> extra load compacting a lot (all?) of the SSTable from what we saw from the
>> field. In this case, be sure that your compaction options are safe enough
>> to handle this.
>>
>> TWCS is even easier to use on C*3.0.8+ and C*3.8+ as it became the new
>> default replacing DTCS, so no extra jar is needed, you can enable TWCS as
>> any other default compaction strategy.
>>
>> C*heers,
>> ---
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2017-01-31 23:29 GMT+01:00 Cogumelos Maravilha <
>> cogumelosmaravi...@sapo.pt>:
>>
>>> Hi Alain,
>>>
>>> Thanks for your response and the links.
>>>
>>> I've also checked "Time series data model and tombstones".
>>>
>>> Is it safe to use TWCS in C* 3.9?
>>>
>>> Thanks in advance.
>>>
>>> On 31-01-2017 11:27, Alain RODRIGUEZ wrote:
>>>
>>> Is there an overhead using the line by line option, or wasted disk space?
>>>
>>> There is a very recent topic about that in the mailing list, look for "Time
>>> series data model and tombstones". I believe DuyHai answered your question
>>> there with more details :).
>>>
>>> *tl;dr:*
>>>
>>> Yes, if you know the TTL in advance, and it is fixed, you might want to
>>> go with the table option instead of adding the TTL in each insert. Also you
>>> might want to consider using the TWCS compaction strategy.
>>>
>>> Here are some blogposts my coworkers recently wrote about TWCS, it might
>>> be useful:
>>>
>>> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
>>> http://thelastpickle.com/blog/2017/01/10/twcs-part2.html
>>>
>>> C*heers,
>>> ---
>>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>>
>>>
>>> 2017-01-31 10:43 GMT+01:00 Cogumelos Maravilha <
>>> cogumelosmaravi...@sapo.pt>:
>>>
 Hi I'm just wondering what option is fastest:

 Global: create table xxx (...) AND default_time_to_live = XXX;
 and UPDATE xxx USING TTL XXX;

 Line by line:
 INSERT INTO xxx (...) USING TTL xxx;

 Is there an overhead using the line by line option, or wasted disk space?

 Thanks in advance.


>>>
>>>
>


Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
In theory, you're right and Cassandra should possibly skip reading cells
having time < 50. But it's all theory; in practice Cassandra reads chunks of
xxx kilobytes' worth of data (I don't remember the exact value of xxx, maybe
64k or far less), so you may end up reading tombstones.

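The read granularity being referred to here is most likely the compression
chunk size, a table property that defaults to 64 KB; shrinking it narrows how
much of a partition is decompressed per read (option name shown in the 3.x
syntax, and the 16 KB value is only an example):

ALTER TABLE metrics
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '16'};
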
On Sun, Jan 29, 2017 at 9:24 PM, John Sanda <john.sa...@gmail.com> wrote:

> Thanks for the clarification. Let's say I have a partition in an SSTable
> where the values of time range from 100 to 10 and everything < 50 is
> expired. If I do a query with time < 100 and time >= 50, are there
> scenarios in which Cassandra will have to read cells where time < 50? In
> particular I am wondering if compression might have any affect.
>
> On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> "Should the data be sorted by my time column regardless of the
>> compaction strategy" --> It does
>>
>> What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
>> compacted together with a new chunk of SSTABLE-2 containing fresh data so
>> in the new resulting SSTable will contain tombstones AND fresh data inside
>> the same partition, but of course sorted by clustering column "time".
>>
>> On Sun, Jan 29, 2017 at 8:55 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>> Since STCS does not sort data based on timestamp, your wide partition may
>> span over multiple SSTables and inside each SSTable, old data (+
>> tombstones) may sit on the same partition as newer data.
>>
>>
>> Should the data be sorted by my time column regardless of the compaction
>> strategy? I didn't think that the column timestamp came into play with
>> respect to sorting. I have been able to review some SSTables with
>> sstablemetadata and I can see that old/expired data is definitely living
>> with live data.
>>
>>
>> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> Ok so give it a try with TWCS. Since STCS does not sort data based on
>> timestamp, your wide partition may span over multiple SSTables and inside
>> each SSTable, old data (+ tombstones) may sit on the same partition as
>> newer data.
>>
>> When reading by slice, even if you request for fresh data, Cassandra has
>> to scan over a lot tombstones to fetch the correct range of data thus your
>> issue
>>
>> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>> It was with STCS. It was on a 2.x version before TWCS was available.
>>
>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>>
>> If you're using DTCS, beware of its weird behavior and tricky
>> configuration.
>>
>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>> As I mentioned previously, the UI only queries recent data, e.g., the
>> past hour, past two hours, past day, past week. The UI does not query for
>> anything older than the TTL which is 7 days. My understanding and
>> expectation was that Cassandra would only scan live cells. The UI is a
>> separate application that I do not maintain, so I am not 100% certain about
>> the queries. I have been told that it does not query for anything older
>> than 7 days.
>>
>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves <k...@instaclustr.com>
>> wrote:
>>
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>>
>>
>>
>> --
>>
>> - John
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> - John
>>
>>
>>
>>
>>
>>
>>
>>


Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
"Should the data be sorted by my time column regardless of the compaction
strategy" --> It does

What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
compacted together with a new chunk of SSTABLE-2 containing fresh data, so
the new resulting SSTable will contain tombstones AND fresh data inside
the same partition, but of course sorted by the clustering column "time".

On Sun, Jan 29, 2017 at 8:55 PM, John Sanda <john.sa...@gmail.com> wrote:

> Since STCS does not sort data based on timestamp, your wide partition may
>> span over multiple SSTables and inside each SSTable, old data (+
>> tombstones) may sit on the same partition as newer data.
>
>
> Should the data be sorted by my time column regardless of the compaction
> strategy? I didn't think that the column timestamp came into play with
> respect to sorting. I have been able to review some SSTables with
> sstablemetadata and I can see that old/expired data is definitely living
> with live data.
>
>
> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Ok so give it a try with TWCS. Since STCS does not sort data based on
>> timestamp, your wide partition may span over multiple SSTables and inside
>> each SSTable, old data (+ tombstones) may sit on the same partition as
>> newer data.
>>
>> When reading by slice, even if you request for fresh data, Cassandra has
>> to scan over a lot tombstones to fetch the correct range of data thus your
>> issue
>>
>> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>>> It was with STCS. It was on a 2.x version before TWCS was available.
>>>
>>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> Did you get this Overwhelming tombstonne behavior with STCS or with
>>>> TWCS ?
>>>>
>>>> If you're using DTCS, beware of its weird behavior and tricky
>>>> configuration.
>>>>
>>>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com>
>>>> wrote:
>>>>
>>>> Your partitioning key is text. If you have multiple entries per id you
>>>> are likely hitting older cells that have expired. Descending only affects
>>>> how the data is stored on disk, if you have to read the whole partition to
>>>> find whichever time you are querying for you could potentially hit
>>>> tombstones in other SSTables that contain the same "id". As mentioned
>>>> previously, you need to add a time bucket to your partitioning key and
>>>> definitely use DTCS/TWCS.
>>>>
>>>>
>>>> As I mentioned previously, the UI only queries recent data, e.g., the
>>>> past hour, past two hours, past day, past week. The UI does not query for
>>>> anything older than the TTL which is 7 days. My understanding and
>>>> expectation was that Cassandra would only scan live cells. The UI is a
>>>> separate application that I do not maintain, so I am not 100% certain about
>>>> the queries. I have been told that it does not query for anything older
>>>> than 7 days.
>>>>
>>>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves <k...@instaclustr.com>
>>>> wrote:
>>>>
>>>>
>>>> Your partitioning key is text. If you have multiple entries per id you
>>>> are likely hitting older cells that have expired. Descending only affects
>>>> how the data is stored on disk, if you have to read the whole partition to
>>>> find whichever time you are querying for you could potentially hit
>>>> tombstones in other SSTables that contain the same "id". As mentioned
>>>> previously, you need to add a time bucket to your partitioning key and
>>>> definitely use DTCS/TWCS.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> - John
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
>
> --
>
> - John
>


Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Ok so give it a try with TWCS. Since STCS does not sort data based on
timestamp, your wide partition may span over multiple SSTables and inside
each SSTable, old data (+ tombstones) may sit on the same partition as
newer data.

When reading by slice, even if you request fresh data, Cassandra has to
scan over a lot of tombstones to fetch the correct range of data, hence your
issue

On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote:

> It was with STCS. It was on a 2.x version before TWCS was available.
>
> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>>
>> If you're using DTCS, beware of its weird behavior and tricky
>> configuration.
>>
>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>> As I mentioned previously, the UI only queries recent data, e.g., the
>> past hour, past two hours, past day, past week. The UI does not query for
>> anything older than the TTL which is 7 days. My understanding and
>> expectation was that Cassandra would only scan live cells. The UI is a
>> separate application that I do not maintain, so I am not 100% certain about
>> the queries. I have been told that it does not query for anything older
>> than 7 days.
>>
>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves <k...@instaclustr.com>
>> wrote:
>>
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>>
>>
>>
>> --
>>
>> - John
>>
>>
>>
>>
>>
>>
>>
>>


Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Did you get this overwhelming tombstone behavior with STCS or with TWCS?

If you're using DTCS, beware of its weird behavior and tricky configuration.

On Sun, Jan 29, 2017 at 3:52 PM, John Sanda  wrote:

> Your partitioning key is text. If you have multiple entries per id you are
>> likely hitting older cells that have expired. Descending only affects how
>> the data is stored on disk, if you have to read the whole partition to find
>> whichever time you are querying for you could potentially hit tombstones in
>> other SSTables that contain the same "id". As mentioned previously, you
>> need to add a time bucket to your partitioning key and definitely use
>> DTCS/TWCS.
>
>
> As I mentioned previously, the UI only queries recent data, e.g., the past
> hour, past two hours, past day, past week. The UI does not query for
> anything older than the TTL which is 7 days. My understanding and
> expectation was that Cassandra would only scan live cells. The UI is a
> separate application that I do not maintain, so I am not 100% certain about
> the queries. I have been told that it does not query for anything older
> than 7 days.
>
> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
> wrote:
>
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>
>
>
> --
>
> - John
>


Re: Time series data model and tombstones

2017-01-28 Thread DuyHai Doan
When the data expires (after its TTL of 7 days), at the next compaction it
is transformed into tombstones, which will still stay there during
gc_grace_seconds. After that, the tombstones will be completely
removed at the next compaction, if there is any ...

So doing some maths: supposing that you have left gc_grace_seconds at its
default value of 10 days, you'll have tombstones for 10 days' worth of
data before they eventually get removed...

What is your compaction strategy? I strongly suggest:

1) Setting the TTL directly as a table property (ALTER TABLE) instead of
setting it at query level (INSERT INTO ... USING TTL). When setting the TTL at
table level, Cassandra can perform some optimization and drop some
SSTables entirely without even bothering to compact them

2) Use TimeWindowCompactionStrategy and tune it properly to accommodate your
workload

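As a rough sketch, both suggestions applied to the metrics table from John's
message below (the window unit/size and the 7-day TTL are example values to
tune):

ALTER TABLE metrics
    WITH default_time_to_live = 604800   -- 7 days, set once as a table property
    AND compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
    };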


On Sat, Jan 28, 2017 at 5:30 PM, John Sanda  wrote:

> I have a time series data model that is basically:
>
> CREATE TABLE metrics (
> id text,
> time timeuuid,
> value double,
> PRIMARY KEY (id, time)
> ) WITH CLUSTERING ORDER BY (time DESC);
>
> I do append-only writes, no deletes, and use a TTL of seven days. Data
> points are written every seconds. The UI queries data for the past hour,
> two hours, day, or week. The UI refreshes and executes queries every 30
> seconds. In one test environment I am seeing lots of tombstone threshold
> warnings and Cassandra has even OOME'd. Since I am storing data in
> descending order and always query for recent data, I do not understand why
> I am running into this problem.
>
> I know that it is recommended to do some date partitioning in part to
> ensure partitions do not grow too large. I already have some changes in
> place to partition by day.. Before I make those changes I want to
> understand why I am scanning so many tombstones so that I can be more
> confident that the date partitioning changes will help.
>
> Thanks
>
> - John
>

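For what it's worth, one possible shape of the day-bucketed variant John
mentions (names and the query bounds are illustrative only):

CREATE TABLE metrics_by_day (
    id text,
    day date,
    time timeuuid,
    value double,
    PRIMARY KEY ((id, day), time)
) WITH CLUSTERING ORDER BY (time DESC);

-- a "past hour" style query then touches at most two daily partitions
SELECT value FROM metrics_by_day
WHERE id = 'node-1'
  AND day = '2017-01-28'
  AND time > maxTimeuuid('2017-01-28 11:00+0000');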

Re: implementing a 'sorted set' on top of cassandra

2017-01-14 Thread DuyHai Doan
Sorting on an "incremented" numeric value has always been a nightmare to
do properly in C*

Either use the counter type, but then no sorting is possible since a counter
cannot be used as the type of a clustering column (which is what allows sorting)

Or use a simple numeric type on a clustering column, but then incrementing the
value *concurrently* and *safely* is prohibitive (SELECT to fetch the current
value + UPDATE ... IF value = ) + retry

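A small CQL sketch of the two options (table and column names are made up):

-- Option 1: counters increment safely under concurrency, but a counter column
-- cannot be part of the clustering key, so rows are never stored sorted by score
CREATE TABLE scores (
    set_name text,
    member text,
    score counter,
    PRIMARY KEY (set_name, member)
);
UPDATE scores SET score = score + 1
WHERE set_name = 'top-devices' AND member = 'a';

-- Option 2: a plain numeric column can be guarded with a lightweight transaction,
-- but every increment becomes read + conditional write + retry (and a separate,
-- score-clustered copy would still need a delete + insert to move the row)
CREATE TABLE scores_plain (
    set_name text,
    member text,
    score bigint,
    PRIMARY KEY (set_name, member)
);
SELECT score FROM scores_plain WHERE set_name = 'top-devices' AND member = 'a';
UPDATE scores_plain SET score = 43
WHERE set_name = 'top-devices' AND member = 'a'
IF score = 42;   -- retry the whole cycle when [applied] returns false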


On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
wrote:

> Whether your proposed solution is crazy depends on your needs :)
> It sounds like you can live with not-realtime data. So it is ok to cache
> it. Why preproduce the results if you only need 5% of them? Why not use
> redis as a cache with expiring sorted sets that are filled on demand from
> cassandra partitions with counters?
> So redis has much less to do and can scale much better. And you are not
> limited on keeping all data in ram as cache data is volatile and can be
> evicted on demand.
> Whether this is effective also depends on the size of your sets. C* won't be
> able to sort them by score for you, so you will have to load the complete
> set to redis for caching and / or do sorting in your app on demand. This
> certainly won't work out well with sets with millions of entries.
>
> 2017-01-13 23:14 GMT+01:00 Mike Torra :
>
>> We currently use redis to store sorted sets that we increment many, many
>> times more than we read. For example, only about 5% of these sets are ever
>> read. We are getting to the point where redis is becoming difficult to
>> scale (currently at >20 nodes).
>>
>> We've started using cassandra for other things, and now we are
>> experimenting to see if having a similar 'sorted set' data structure is
>> feasible in cassandra. My approach so far is:
>>
>>1. Use a counter CF to store the values I want to sort by
>>2. Periodically read in all key/values in the counter CF and sort in
>>the client application (~every five minutes or so)
>>3. Write back to a different CF with the ordered keys I care about
>>
>> Does this seem crazy? Is there a simpler way to do this in cassandra?
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


  1   2   3   4   5   6   >