Re: Log4j vulnerability

2021-12-12 Thread Stefan Miklosovic
Hi users,

I would just add that I recently contributed a dependency-check Ant
target which scans our dependencies for CVEs. People can run it
themselves with "ant dependency-check" and it will automatically check
the libraries we ship with Cassandra against the vulnerability database.

Regards

On Sat, 11 Dec 2021 at 18:44, Brandon Williams  wrote:
>
> https://issues.apache.org/jira/browse/CASSANDRA-5883
>
> As that ticket shows, Apache Cassandra has never used log4j2.
>
> On Sat, Dec 11, 2021 at 11:07 AM Abdul Patel  wrote:
> >
> > Hi all,
> >
> > Any idea if any of the open source Cassandra versions are impacted by the log4j 
> > vulnerability which was reported on Dec 9th?


Re: Schema collision results in multiple data directories per table

2021-10-13 Thread Stefan Miklosovic
Hi Tom,

while I am not completely sure what might cause your issue, I just
want to highlight that schema agreement was overhauled quite a lot in
4.0 (1), so your problem may be related to what that ticket was trying
to fix.

Regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-15158

On Fri, 1 Oct 2021 at 18:43, Tom Offermann  wrote:
>
> When adding a datacenter to a keyspace (following the Last Pickle [Data 
> Center Switch][lp] playbook), I ran into a "Configuration exception merging 
> remote schema" error. The nodes in one datacenter didn't converge to the new 
> schema version, and after restarting them, I saw the symptoms described in 
> this Datastax article on [Fixing a table schema collision][ds], where there 
> were two data directories for each table in the keyspace on the nodes that 
> didn't converge. I followed the recovery steps in the Datastax article to 
> move the data from the older directories to the new directories, ran 
> `nodetool refresh`, and that fixed the problem.
>
> [lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
> [ds]: 
> https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html
>
> While the Datastax article was super helpful for helping me recover, I'm left 
> wondering *why* this happened. If anyone can shed some light on that, or 
> offer advice on how I can avoid getting in this situation in the future, I 
> would be most appreciative. I'll describe the steps I took in more detail in 
> the thread.
>
> ## Steps
>
> 1. The day before, I had added the second datacenter ('dc2') to the 
> system_traces, system_distributed, and system_auth keyspaces and ran 
> `nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly 
> with no issues.
>
> 2. For a large keyspace, I added the second datacenter ('dc2') with an `ALTER 
> KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 
> '2', 'dc2': '3'};` statement. Immediately, I saw this error in the log:
> ```
> "ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]"
> "org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
> mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 
> 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> "\tat 
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) 
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) 
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_232]"
> "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_232]"
> "\tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[na:1.8.0_232]"
> "\tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_232]"
> "\tat 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
>  [apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
> ```
>
> I also saw this:
> ```
> "ERROR 16:46:48 Configuration exception merging remote schema"
> "org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
> mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 
> 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> "\tat 
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) 
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) 
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482)
>  ~[apache-cassandra-3.11.5.jar:3.11.5]"
> "\tat 
> 

Some more tooling around Cassandra 4.0

2021-09-30 Thread Stefan Miklosovic
Hi users,

I would like to highlight some tooling we put together at Instaclustr,
now updated for the recent Cassandra 4.0 release.

We wrote a short and descriptive blog about that here (1).

All these tools are completely free of charge and Apache 2.0 licensed.

We hope you find them useful. In case you have any questions, feel
free to reach out to us via GitHub issues.

(1) https://www.instaclustr.com/cassandra-tools-updated-cassandra-4-0/

Regards

Stefan Miklosovic


Re: Change of Cassandra TTL

2021-09-30 Thread Stefan Miklosovic
Hi Raman,

we at Instaclustr have created a CLI tool (1) which can strip TTLs
from your SSTables so that you can import the data back into your node.
Maybe that is something you will find handy.

We had some customers whose data had expired and who wanted to
resurrect it - so they took the SSTables with the expired TTLs, stripped
the TTLs out, and voila, they had the data back. I can imagine you doing
the same and then re-applying a different TTL afterwards.

(1) https://github.com/instaclustr/cassandra-ttl-remover

Regards.

On Tue, 14 Sept 2021 at 16:24, raman gugnani  wrote:
>
> Thanks Eric for the update.
>
> On Tue, 14 Sept 2021 at 16:50, Erick Ramirez  
> wrote:
>>
>> You'll need to write an ETL app (most common case is with Spark) to scan 
>> through the existing data and update it with a new TTL. You'll need to make 
>> sure that the ETL job is throttled down so it doesn't overload your 
>> production cluster. Cheers!
>
>
>
> --
> Raman Gugnani


Re: Cassandra 4 alpha/alpha2

2019-10-31 Thread Stefan Miklosovic
Hi,

I have tested both alpha and alpha2, as well as 3.11.5, on CentOS 7.7.1908 and
all went fine (I have some custom images for my own purposes).

The update from alpha to alpha2 was merely a version bump.

Cheers

On Thu, 31 Oct 2019 at 20:40, Abdul Patel  wrote:
>
> Hey Everyone
>
> Was anyone successful in installing either the alpha or alpha2 version of 
> Cassandra 4.0?
> Found 2 issues:
> 1> cassandra-env.sh:
> JAVA_VERSION variable is not defined.
> Jvm-server.options file is not defined.
>
> This is fixable, and after adding those, the cassandra-env.sh 
> errors went away.
>
> 2> Second and major issue: the cassandra binary, when I try to start it, reports a 
> syntax error.
>
> /bin/cassandra: line 198: exec: : not found.
>
> Anyone has any idea on second issue?
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Disk space utilization by from some Cassandra

2019-08-21 Thread Stefan Miklosovic
Hi,

for example, compaction uses a lot of disk space. It is quite common, so
it is not safe to have your disk utilised at something like 85%, because
compactions would not have room to run and that node would get
stuck. This happens in production quite often; for instance, with
SizeTieredCompactionStrategy a large compaction can temporarily need
roughly as much free space again as the SSTables it is merging.

Hence, keeping utilisation around 50% and having a big buffer for
compaction is a good idea. Once everything is compacted, it should go
back to normal, under 50% (or whatever figure you target).

On Wed, 21 Aug 2019 at 14:33,  wrote:
>
> Good day,
>
>
>
> I’m running the monitoring script for disk space utilization set the 
> benchmark to 50%. Currently am getting the alerts from some of the nodes
>
> About disk space greater than 50%.
>
>
>
> Is there a way, I can quickly figure out why the space has increased and how 
> I can maintain the disk space used by Cassandra to be below the benchmark at 
> all the times.
>
>
>
> Any ideas would be much appreciated.
>
>
>
> Sent from Mail for Windows 10
>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra copy command

2019-08-21 Thread Stefan Miklosovic
Hi Rahul,

how did you add that dc3 to the cluster? The rule of thumb here is to run
a rebuild on each node of the new datacenter, for example as described here:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html

On Wed, 21 Aug 2019 at 12:57, Rahul Reddy  wrote:
>
> Hi sefan,
>
> I'm adding a new DC3 to an existing cluster and see discrepancies of a couple of 
> million in nodetool cfstats in the new DC.
>
> My table size is 50 GB
> and I'm trying to copy the entire table:
>
> Copy table to 'full_tablr.csv' with delimiter ',';
>
> If I run the above command from dc3, does it get the data only from dc3?
>
>
>
> On Wed, Aug 21, 2019, 6:46 AM Stefan Miklosovic 
>  wrote:
>>
>> Hi Rahul,
>>
>> what is your motivation behind this? Why do you want to make sure the
>> count is same? What is the purpose of that? All you should care about
>> is that Cassandra will return you right results. It was designed from
>> the very bottom to do that for you, you should not be bothered too
>> much about such discrepancies, they will be always there in general.
>> But the important fact is that once queried, you can rest assured it
>> is returned (and consequentially repaired if data not match) as they
>> should.
>>
>> What copy command you are talking about precisely, why you cant use just 
>> repair?
>>
>> On Wed, 21 Aug 2019 at 12:14, Rahul Reddy  wrote:
>> >
>> > Hello,
>> >
>> > I have 3 datacenters . Want to make sure record count is same in all dc's 
>> > . If I run copy command node1 in dc1 does it get the data from only dc1? 
>> > Nodetool cfstats I'm seeing discrepancies in partitions count is it 
>> > because we didn't run cleanup after adding few nodes and remove them?. To 
>> > rule out any discripencies I want to run copy command from 3 DC's and 
>> > compare. Please let me know if copy command extracts data from the DC only 
>> > I ran it from?
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra copy command

2019-08-21 Thread Stefan Miklosovic
Hi Rahul,

what is your motivation behind this? Why do you want to make sure the
count is the same? What is the purpose of that? All you should care about
is that Cassandra will return you the right results. It was designed from
the ground up to do that for you, so you should not be bothered too
much about such discrepancies; in general they will always be there.
But the important fact is that once the data is queried, you can rest
assured it is returned (and consequently repaired, if the replicas do
not match) as it should be.

Which copy command are you talking about precisely, and why can't you just use repair?

On Wed, 21 Aug 2019 at 12:14, Rahul Reddy  wrote:
>
> Hello,
>
> I have 3 datacenters and want to make sure the record count is the same in all DCs. 
> If I run the copy command on node1 in dc1, does it get the data from only dc1? 
> In nodetool cfstats I'm seeing discrepancies in the partition counts; is it because 
> we didn't run cleanup after adding a few nodes and removing them? To rule out 
> any discrepancies I want to run the copy command from all 3 DCs and compare. Please 
> let me know if the copy command extracts data only from the DC I ran it from.

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: New column

2019-08-18 Thread Stefan Miklosovic
You basically have to create a new table and include that column either
as part of the partition key or as a clustering column. Avoid ALLOW
FILTERING; it should not be used in production nor in any serious app.
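
For illustration, a minimal sketch of that approach (assuming the DataStax
Java driver 3.x; the keyspace, table and column names are made up): the new
column becomes the last clustering column of a second, query-specific table,
so the full primary key answers the query and no ALLOW FILTERING is needed.

```
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

import java.util.UUID;

public class NewColumnAsClusteringColumn {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks")) {   // "ks" is a made-up keyspace

            // Query table: same partition key as the original table, with the new
            // column appended as the last clustering column.
            session.execute("CREATE TABLE IF NOT EXISTS items_by_new_col ("
                    + " id uuid,"
                    + " ts timeuuid,"
                    + " new_col int,"
                    + " payload text,"
                    + " PRIMARY KEY ((id), ts, new_col))");

            // Writes go to this table as well (or old rows are migrated once).
            session.execute("INSERT INTO items_by_new_col (id, ts, new_col, payload) "
                    + "VALUES (uuid(), now(), 42, 'example')");

            // The WHERE clause covers the full primary key, so the read is served
            // straight from the key: no ALLOW FILTERING, no table scan.
            UUID id = UUID.randomUUID();
            UUID ts = UUIDs.timeBased();
            session.execute("SELECT payload FROM items_by_new_col "
                    + "WHERE id = ? AND ts = ? AND new_col = ?", id, ts, 42);
        }
    }
}
```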

On Sun, 18 Aug 2019 at 21:57, Rahul Reddy  wrote:
>
> Hello,
>
> We have a table and want to add a column and select based on the existing entire 
> primary key plus the new column using ALLOW FILTERING. Since my WHERE clause has 
> the full primary key + new column, does ALLOW FILTERING scan only the 
> partitions which are listed, or does it have to scan the whole table? What is the 
> best approach to add a new column and query it based on the existing primary key plus 
> the new column?

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra DataStax Java Driver in combination with Java EE / EJBs

2019-06-11 Thread Stefan Miklosovic
Hi Ralph,

yes, this is completely fine, even advisable. You can further extend
this idea and have one session per keyspace, for example, if you really
insist, and it could be made injectable based on some qualifier ... that's
up to you.
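
To make that concrete, here is a minimal CDI sketch (the contact point and
keyspace name are placeholders, and the qualifier idea is only hinted at in
the comments): the container owns one application-scoped Session, closes it
on shutdown, and beans simply @Inject it.

```
import javax.enterprise.context.ApplicationScoped;
import javax.enterprise.inject.Disposes;
import javax.enterprise.inject.Produces;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

@ApplicationScoped
public class SessionProducer {

    // One long-lived Session for the whole application; beans use "@Inject Session".
    // For a session-per-keyspace setup you would add a @Qualifier annotation per
    // keyspace and put it on both the producer and the injection points.
    @Produces
    @ApplicationScoped
    public Session produceSession() {
        Cluster cluster = Cluster.builder()
                .addContactPoint("cassandra.example.com")   // placeholder
                .build();
        return cluster.connect("archive");                  // placeholder keyspace
    }

    // Called by the container when the application shuts down.
    public void closeSession(@Disposes Session session) {
        session.getCluster().close();                       // also closes the session
    }
}
```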

On Wed, 12 Jun 2019 at 11:31, John Sanda  wrote:
>
> Hi Ralph,
>
> A session is intended to be a long-lived, i.e., application-scoped object. 
> You only need one session per cluster. I think what you are doing with the 
> @Singleton is fine. In my opinion though, EJB really does not offer much 
> value when working with Cassandra. I would be inclined to just use CDI.
>
> Cheers
>
> John
>
> On Tue, Jun 11, 2019 at 5:38 PM Ralph Soika  wrote:
>>
>> Hi,
>>
>> I have a question concerning the Cassandra DataStax Java Driver in 
>> combination with Java EE and EJBs.
>>
>> I have implemented a Rest Service API based on Java EE8. In my application I 
>> have for example a jax-rs rest resource to write data into cassandra 
>> cluster. My first approach was to create in each method call
>>
>>  a new Cassandra Cluster and Session object,
>>  write my data into cassandra
>>  and finally close the session and the cluster object.
>>
>> This works but it takes a lot of time (2-3 seconds) until the cluster object 
>> / session is opened for each request.
>>
>>  So my second approach is now a @Singleton EJB providing the session object 
>> for my jax-rs resources. My service implementation to hold the Session 
>> object looks something like this:
>>
>>
>> @Singleton
>> public class ClusterService {
>> private Cluster cluster;
>> private Session session;
>>
>> @PostConstruct
>> private void init() throws ArchiveException {
>> cluster=initCluster();
>> session = initArchiveSession();
>> }
>>
>> @PreDestroy
>> private void tearDown() throws ArchiveException {
>> // close session and cluster object
>> if (session != null) {
>> session.close();
>> }
>> if (cluster != null) {
>> cluster.close();
>> }
>> }
>>
>> public Session getSession() {
>> if (session==null) {
>> try {
>> init();
>> } catch (ArchiveException e) {
>> logger.warning("unable to get valid session: " + 
>> e.getMessage());
>> e.printStackTrace();
>> }
>> }
>> return session;
>> }
>>
>>.
>>
>> }
>>
>>
>> And my rest service calls now looking like this:
>>
>>
>> @Path("/archive")
>> @Stateless
>> public class ArchiveRestService {
>>
>> @EJB
>> ClusterService clusterService;
>>
>> @POST
>> @Consumes({ MediaType.APPLICATION_XML, MediaType.TEXT_XML })
>> public Response postData(XMLDocument xmlDocument) {
>> Session session = clusterService.getSession();
>> session.execute();
>> ...
>> }
>> ...
>> }
>>
>>
>> The result is now a super-fast behavior! Seems to be clear because my rest 
>> service no longer need to open a new session for each request.
>>
>> My question is: Is this approach with a @Singleton ClusterService EJB valid 
>> or is there something I should avoid?
>> As far as I can see this works pretty fine and is really fast. I am running 
>> the application on a Wildfly 15 server which is Java EE8.
>>
>> Thanks for your comments
>>
>> Ralph
>>
>>
>>
>>
>> --
>>
>> Imixs Software Solutions GmbH
>> Web: www.imixs.com Phone: +49 (0)89-452136 16
>> Office: Agnes-Pockels-Bogen 1, 80992 München
>> Registergericht: Amtsgericht Muenchen, HRB 136045
>> Geschaeftsführer: Gaby Heinle u. Ralph Soika
>>
>> Imixs is an open source company, read more: www.imixs.org
>
>
>
> --
>
> - John

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cluster schema version choosing

2019-05-20 Thread Stefan Miklosovic
My guess is that the "latest" schema would be chosen, but I am
definitely interested in an in-depth explanation.
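
As a side note, the schema versions themselves are easy to observe while
this is happening; a small sketch with the DataStax Java driver (the contact
point is a placeholder) that reads what each node advertises in the system
tables, plus the driver's own yes/no agreement check:

```
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SchemaVersionCheck {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // Version the coordinator node itself is on.
            Row local = session.execute(
                    "SELECT schema_version FROM system.local").one();
            System.out.println("local: " + local.getUUID("schema_version"));

            // Versions the coordinator has learned about its peers via gossip.
            for (Row peer : session.execute(
                    "SELECT peer, schema_version FROM system.peers")) {
                System.out.println(peer.getInet("peer") + ": "
                        + peer.getUUID("schema_version"));
            }

            // The driver can also answer the agreement question directly.
            System.out.println("in agreement: "
                    + cluster.getMetadata().checkSchemaAgreement());
        }
    }
}
```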

On Tue, 21 May 2019 at 00:28, Alexey Korolkov  wrote:
>
> Hello team,
> In some circumstances, my cluster was split into two schema versions
> (half on one version, and the rest on another).
> In the process of resolving this issue, I restarted some nodes.
> Eventually, the nodes migrated to one schema, but it was not clear why they 
> chose exactly that version of the schema.
> I haven't found any explanation of the factors by which they pick the schema 
> version;
> please help me find the algorithm for choosing the schema, or the classes in the 
> source code responsible for this.
>
>
>
>
>
> --
> Sincerely yours,  Korolkov Aleksey

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Exception while running two CQL queries in Parallel

2019-05-01 Thread Stefan Miklosovic
What are your replication factors for that keyspace? And why are you using
EACH_QUORUM?

This might be handy:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigSerialConsistency.html
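
For context: the error mentions consistency SERIAL because the conditional
(IF EXISTS) part of such an update is decided by a Paxos round, which uses
the statement's serial consistency level rather than the regular one. A
hedged sketch with the DataStax Java driver 3.x (which the spring-cql stack
trace suggests sits underneath; contact point and keyspace are placeholders)
of pinning that round to the local datacenter:

```
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class LwtSerialConsistencyExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("mykeyspace")) {   // placeholder keyspace

            // The IF EXISTS condition is evaluated by Paxos using the *serial*
            // consistency level; LOCAL_SERIAL keeps that round within the local
            // datacenter instead of requiring a quorum across all datacenters.
            Statement update = new SimpleStatement(
                    "UPDATE dir SET bid = ? WHERE repoid = ? IF EXISTS",
                    "value", "06A7490B5CBFA1DE0A494027")
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
                    .setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);

            session.execute(update);
        }
    }
}
```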

On Wed, 1 May 2019 at 17:57, Bhavesh Prajapati
 wrote:
>
> I had two queries run on same row in parallel (that’s a use-case). While 
> Batch Query 2 completed successfully, query 1 failed with exception.
>
> Following are driver logs and sequence of log events.
>
>
>
> QUERY 1: STARTED
>
> 2019-04-30T13:14:50.858+ CQL update "EACH_QUORUM" "UPDATE dir SET 
> bid='value' WHERE repoid='06A7490B5CBFA1DE0A494027' IF EXISTS;"
>
>
>
> QUERY 2: STARTED
>
> 2019-04-30T13:14:51.161+ CQL BEGIN BATCH
>
> 2019-04-30T13:14:51.161+ CQL batch-update "06A7490B5CBFA1DE0A494027"
>
> 2019-04-30T13:14:51.161+ CQL batch-delete "06A7490B5CBFA1DE0A494027"
>
> 2019-04-30T13:14:51.161+ CQL APPLY BATCH
>
> 2019-04-30T13:14:51.165+ Cassandra delete directory call completed 
> successfully for "06A7490B5CBFA1DE0A494027"
>
> QUERY 2: COMPLETED - WITH SUCCESS
>
>
>
> QUERY 1: FAILED
>
> 2019-04-30T13:14:52.311+ CQL 
> "org.springframework.cassandra.support.exception.CassandraWriteTimeoutException"
>  "Cassandra timeout during write query at consistency SERIAL (5 replica were 
> required but only 0 acknowledged the write); nested exception is 
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
> during write query at consistency SERIAL (5 replica were required but only 0 
> acknowledged the write)"
>
> org.springframework.cassandra.support.exception.CassandraWriteTimeoutException:
>  Cassandra timeout during write query at consistency SERIAL (5 replica were 
> required but only 0 acknowledged the write); nested exception is 
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
> during write query at consistency SERIAL (5 replica were required but only 0 
> acknowledged the write)
>
> at 
> org.springframework.cassandra.support.CassandraExceptionTranslator.translateExceptionIfPossible(CassandraExceptionTranslator.java:95)
>  ~[spring-cql-1.5.18.RELEASE.jar!/:?]
>
> at 
> org.springframework.cassandra.core.CqlTemplate.potentiallyConvertRuntimeException(CqlTemplate.java:946)
>  ~[spring-cql-1.5.18.RELEASE.jar!/:?]
>
> at 
> org.springframework.cassandra.core.CqlTemplate.translateExceptionIfPossible(CqlTemplate.java:930)
>  ~[spring-cql-1.5.18.RELEASE.jar!/:?]
>
>
>
> What could have caused this exception ?
>
> How to resolve or handle such situation ?
>
>
>
> Thanks,
>
> Bhavesh

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: gc_grace config for time serie database

2019-04-17 Thread Stefan Miklosovic
I am wrong in this paragraph:

>> On the other hand, a node was down, it was TTLed on healthy nodes and
>> tombstone was created, then you start the first one which was down and
>> as it counts down you hit that node with update.

It does not matter how long that dead node was dead. Once you start
the DB, it will compute the TTL value regardless; it does not suddenly stop
taking the time it was dead into account. It would just mean the data would be
TTLed when it should not be, as the other healthy nodes could have received
updates after they stopped storing hints.

But you say you don't ever update, so that is not applicable here.

It is an interesting question and I won't give you an ultimate answer; maybe
somebody else will give their opinion on this. I am curious what
consequences it has, if any, if you set the two to be equal.



On Wed, 17 Apr 2019 at 23:12, onmstester onmstester
 wrote:
>
> I do not use table default ttl (every row has its own TTL) and also no update 
> occurs to the rows.
> I suppose that (because of immutable nature of everything in cassandra) 
> cassandra would keep only the insertion timestamp + the original ttl and  
> computes ttl of a row using these two and current timestamp of the system 
> whenever needed (when you select ttl or when the compaction occurs).
> So there should be something like this attached to every row: "this row 
> inserted at 4/17/2019 12:20 PM  and should be deleted in 2 months", so 
> whatever happens to the row replicas, my intention of removing it at 6/17 
> should not be changed!
>
> Would you suggest that my idea of "gc_grace = max_hint = 3 hours" for a time 
> series db is not reasonable?
>
> Sent using Zoho Mail
>
>
>
>  On Wed, 17 Apr 2019 17:13:02 +0430 Stefan Miklosovic 
>  wrote 
>
> TTL value is decreasing every second and it is set to original TTL
> value back after some update occurs on that row (see example below).
> Does not it logically imply that if a node is down for some time and
> updates are occurring on live nodes and handoffs are saved for three
> hours and after three hours it stops to do them, your data on other
> nodes would not be deleted as TTLS are reset upon every update and
> countdown starts again, which is correct, but they would be deleted on
> that node which was down because it didnt receive updates so if you
> query that node, data will not be there but they should.
>
> On the other hand, a node was down, it was TTLed on healthy nodes and
> tombstone was created, then you start the first one which was down and
> as it counts down you hit that node with update. So there is not a
> tombstone on the previously dead node but there are tombstones on
> healthy ones and if you delete tombstones after 3 hours, previously
> dead node will never get that info and it your data might actually end
> up being resurrected as they would be replicated to always healthy
> nodes as part of the repair.
>
> Do you see some flaw in my reasoning?
>
> cassandra@cqlsh> DESCRIBE TABLE test.test;
>
> CREATE TABLE test.test (
> id uuid PRIMARY KEY,
> value text
> ) WITH bloom_filter_fp_chance = 0.6
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 60
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
>
>
> cassandra@cqlsh> select ttl(value) from test.test where id =
> 4f860bf0-d793-4408-8330-a809c6cf6375;
>
> ttl(value)
> 
> 25
>
> (1 rows)
> cassandra@cqlsh> UPDATE test.test SET value = 'c' WHERE id =
> 4f860bf0-d793-4408-8330-a809c6cf6375;
> cassandra@cqlsh> select ttl(value) from test.test where id =
> 4f860bf0-d793-4408-8330-a809c6cf6375;
>
> ttl(value)
> 
> 59
>
> (1 rows)
> cassandra@cqlsh> select * from test.test ;
>
> id | value
> --+---
> 4f860bf0-d793-4408-8330-a809c6cf6375 | c
>
>
> On Wed, 17 Apr 2019 at 19:18, fald 1970  wrote:
> >
> >
> >
> > Hi,
> >
> > According to these Facts:
> > 1. If a node is down for longer than max_hint_window_in_ms (3 hours by 
> > default), the coordinator stops writing new hints.
> > 2. The main purpose of gc_grace property is to prevent Zombie data and also 
> > it d

Re: gc_grace config for time serie database

2019-04-17 Thread Stefan Miklosovic
The TTL value decreases every second, and it is set back to the original TTL
value after an update occurs on that row (see the example below).
Does it not logically imply the following: if a node is down for some time
while updates are occurring on the live nodes, and hints are saved for three
hours and then stop, your data on the other nodes would not be deleted, as
the TTLs are reset upon every update and the countdown starts again (which is
correct), but it would be deleted on the node which was down, because it did
not receive the updates. So if you query that node, the data will not be
there, although it should be.

On the other hand: a node was down, the data was TTLed on the healthy nodes
and tombstones were created, then you start the node which was down and, as
its copy counts down, you hit that node with an update. So there is no
tombstone on the previously dead node, but there are tombstones on the
healthy ones, and if you delete tombstones after 3 hours, the previously
dead node will never get that information, and your data might actually end
up being resurrected, as it would be replicated to the always-healthy
nodes as part of a repair.

Do you see some flaw in my reasoning?

cassandra@cqlsh> DESCRIBE TABLE test.test;

CREATE TABLE test.test (
id uuid PRIMARY KEY,
value text
) WITH bloom_filter_fp_chance = 0.6
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 60
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';


cassandra@cqlsh> select ttl(value) from test.test where id =
4f860bf0-d793-4408-8330-a809c6cf6375;

 ttl(value)

 25

(1 rows)
cassandra@cqlsh> UPDATE test.test SET value = 'c' WHERE  id =
4f860bf0-d793-4408-8330-a809c6cf6375;
cassandra@cqlsh> select ttl(value) from test.test where id =
4f860bf0-d793-4408-8330-a809c6cf6375;

 ttl(value)

 59

(1 rows)
cassandra@cqlsh> select * from test.test  ;

 id   | value
--+---
 4f860bf0-d793-4408-8330-a809c6cf6375 | c


On Wed, 17 Apr 2019 at 19:18, fald 1970  wrote:
>
>
>
> Hi,
>
> According to these Facts:
> 1. If a node is down for longer than max_hint_window_in_ms (3 hours by 
> default), the coordinator stops writing new hints.
> 2. The main purpose of gc_grace property is to prevent Zombie data and also 
> it determines for how long the coordinator should keep hinted files
>
> When we use Cassandra for Time series data which:
> A) Every row of data has TTL and there would be no explicit delete so not so 
> much worried about zombies
> B) Every minute there are hundreds of write requests to each node, 
> so if one of the nodes was down for longer than max_hint_window_in_ms, we 
> would run a manual repair on that node, so the hints stored on the 
> coordinator won't be necessary anyway.
>
> So finally the question: is it a good idea to set gc_grace equal to 
> max_hint_window_in_ms (/1000 to convert to seconds),
> for example set them both to 3 hours (why keep the tombstones for 10 
> days when they won't be needed at all)?
>
> Best Regards
> Federica Albertini

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
Lastly, I wonder if that number is the same on every node you point
nodetool at. Do all nodes see a very similar false positive
ratio / number?

On Wed, 17 Apr 2019 at 21:41, Stefan Miklosovic
 wrote:
>
> One thing comes to my mind but my reasoning is questionable as I am
> not an expert in this.
>
> If you think about this, the whole concept of Bloom filter is to check
> if some record is in particular SSTable. False positive mean that,
> obviously, filter thought it was there but in fact it is not. So
> Cassandra did a look unnecessarily. Why does it think that it is there
> in such number of cases? You either make a lot of same requests on
> same partition key over time hence querying same data over and over
> again (but would not that data be cached?) or there was a lot of data
> written with same partition key so it thinks it is there but
> clustering column is different. As ts is of type timeuuid, isnt it
> true that you are doing a lot of queries with some date? It might be
> true that hash is done only on partition keys and not on clustering
> columns so filter gives you "yes" and it goes there, checks it
> clustering column is equal what you queried and its not there. But as
> I say I might be wrong ...
>
> More to it, your read_repair_chance is 0.0 so it will never do a
> repair after successful read (e.g. you have rf 3 and cl quorum so one
> node is somehow behind) so if you dont run repairs maybe it is just
> somehow unsychronized but that is really just my guess.
>
> On Wed, 17 Apr 2019 at 21:39, Martin Mačura  wrote:
> >
> > We cannot run any repairs on these tables.  Whenever we tried it 
> > (incremental or full or partitioner range), it caused a node to run out of 
> > disk space during anticompaction.  We'll try again once Cassandra 4.0 is 
> > released.
> >
> > On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic 
> >  wrote:
> >>
> >> if you invoke nodetool it gets false positives number from this metric
> >>
> >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
> >>
> >> You get high false positives so this accumulates them
> >>
> >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
> >>
> >> If you follow that, that number is computed here
> >>
> >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55
> >>
> >> In order to have that number so high, the difference has to be so big
> >> so lastFalsePositiveCount is imho significantly lower
> >>
> >> False positives are ever increased only in BigTableReader where it get
> >> complicated very quickly and I am not sure why it is called to be
> >> honest.
> >>
> >> Is all fine with db as such? Do you run repairs? Does that number
> >> increses or decreases over time? Has repair or compaction some effect
> >> on it?
> >>
> >> On Wed, 17 Apr 2019 at 20:48, Martin Mačura  wrote:
> >> >
> >> > Both tables use the default bloom_filter_fp_chance of 0.01 ...
> >> >
> >> > CREATE TABLE ... (
> >> >a int,
> >> >b int,
> >> >bucket timestamp,
> >> >ts timeuuid,
> >> >c int,
> >> > ...
> >> >PRIMARY KEY ((a, b, bucket), ts, c)
> >> > ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)
> >> >AND bloom_filter_fp_chance = 0.01
> >> >AND compaction = {'class': 
> >> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> >> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
> >> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> >> > 'false'}
> >> >AND dclocal_read_repair_chance = 0.0
> >> >AND default_time_to_live = 63072000
> >> >AND gc_grace_seconds = 10800
> >> > ...
> >> >AND read_repair_chance = 0.0
> >> >AND speculative_retry = 'NONE';
> >> >
> >> >
> >> > CREATE TABLE ... (
> >> >c int,
> >> >b int,
> >> >bucket timestamp,
> >> >ts timeuuid,
> >> > ...
> >> >PRIMARY KEY ((c, b, bucket), ts)
> >> > ) WITH CLUSTERING ORDER BY (ts DESC)
> >> >AND bloom_filter_fp_chance = 0.01
> >> >AND compaction = {'class': 
> >> >

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
One thing comes to my mind, but my reasoning is questionable as I am
not an expert in this.

If you think about it, the whole point of a Bloom filter is to check
whether some record is in a particular SSTable. A false positive means that,
obviously, the filter thought it was there but in fact it is not, so
Cassandra did a lookup unnecessarily. Why does it think the data is there
in such a number of cases? Either you make a lot of requests for the
same partition key over time, hence querying the same data over and over
again (but would that data not be cached?), or there was a lot of data
written with the same partition key, so the filter thinks it is there but
the clustering column is different. As ts is of type timeuuid, isn't it
true that you are doing a lot of queries for some date? It may be
true that the hash is computed only on partition keys and not on clustering
columns, so the filter says "yes", Cassandra goes to the SSTable, checks
whether the clustering column equals what you queried, and it is not there.
But as I say, I might be wrong ...

On top of that, your read_repair_chance is 0.0, so it will never do a
repair after a successful read (e.g. you have RF 3 and CL QUORUM and one
node is somehow behind), so if you don't run repairs maybe it is just
somehow unsynchronised, but that is really just my guess.

On Wed, 17 Apr 2019 at 21:39, Martin Mačura  wrote:
>
> We cannot run any repairs on these tables.  Whenever we tried it (incremental 
> or full or partitioner range), it caused a node to run out of disk space 
> during anticompaction.  We'll try again once Cassandra 4.0 is released.
>
> On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic 
>  wrote:
>>
>> if you invoke nodetool it gets false positives number from this metric
>>
>> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
>>
>> You get high false positives so this accumulates them
>>
>> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
>>
>> If you follow that, that number is computed here
>>
>> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55
>>
>> In order to have that number so high, the difference has to be so big
>> so lastFalsePositiveCount is imho significantly lower
>>
>> False positives are ever increased only in BigTableReader where it get
>> complicated very quickly and I am not sure why it is called to be
>> honest.
>>
>> Is all fine with db as such? Do you run repairs? Does that number
>> increses or decreases over time? Has repair or compaction some effect
>> on it?
>>
>> On Wed, 17 Apr 2019 at 20:48, Martin Mačura  wrote:
>> >
>> > Both tables use the default bloom_filter_fp_chance of 0.01 ...
>> >
>> > CREATE TABLE ... (
>> >a int,
>> >b int,
>> >bucket timestamp,
>> >ts timeuuid,
>> >c int,
>> > ...
>> >PRIMARY KEY ((a, b, bucket), ts, c)
>> > ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)
>> >AND bloom_filter_fp_chance = 0.01
>> >AND compaction = {'class': 
>> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
>> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
>> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
>> > 'false'}
>> >AND dclocal_read_repair_chance = 0.0
>> >AND default_time_to_live = 63072000
>> >AND gc_grace_seconds = 10800
>> > ...
>> >AND read_repair_chance = 0.0
>> >AND speculative_retry = 'NONE';
>> >
>> >
>> > CREATE TABLE ... (
>> >c int,
>> >b int,
>> >bucket timestamp,
>> >ts timeuuid,
>> > ...
>> >PRIMARY KEY ((c, b, bucket), ts)
>> > ) WITH CLUSTERING ORDER BY (ts DESC)
>> >AND bloom_filter_fp_chance = 0.01
>> >AND compaction = {'class': 
>> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
>> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
>> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
>> > 'false'}
>> >AND dclocal_read_repair_chance = 0.0
>> >AND default_time_to_live = 63072000
>> >AND gc_grace_seconds = 10800
>> > ...
>> >AND read_repair_chance = 0.0
>> >AND speculative_retry = 'NONE';
>> >
>> > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic 
>> >  wrote:
>> >>
>> >> What is your bloom_filt

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
If you invoke nodetool, it gets the false positive number from this metric:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578

You get high false positives, so this accumulates them:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572

If you follow that, the number is computed here:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55

For that number to be so high, the difference has to be big,
so lastFalsePositiveCount is, imho, significantly lower.
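
Paraphrasing the idea behind that linked code (this is only a sketch of the
recent-versus-cumulative bookkeeping, not the actual Cassandra source): the
tracker keeps a cumulative counter and remembers the value from the previous
read, so the "recent" figure is just the delta since it was last asked.

```
import java.util.concurrent.atomic.AtomicLong;

// Sketch of "recent vs. cumulative" false positive counting; the real class
// is org.apache.cassandra.io.sstable.BloomFilterTracker.
class FalsePositiveTrackerSketch {
    private final AtomicLong falsePositiveCount = new AtomicLong();
    private long lastFalsePositiveCount;

    void addFalsePositive() {
        falsePositiveCount.incrementAndGet();
    }

    long getFalsePositiveCount() {          // cumulative since the SSTable was opened
        return falsePositiveCount.get();
    }

    synchronized long getRecentFalsePositiveCount() {   // delta since the last call
        long current = falsePositiveCount.get();
        long recent = current - lastFalsePositiveCount;
        lastFalsePositiveCount = current;
        return recent;
    }
}
```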

False positives are only ever incremented in BigTableReader, where it gets
complicated very quickly, and I am not sure why it is called there, to be
honest.

Is everything fine with the DB as such? Do you run repairs? Does that number
increase or decrease over time? Does repair or compaction have some effect
on it?

On Wed, 17 Apr 2019 at 20:48, Martin Mačura  wrote:
>
> Both tables use the default bloom_filter_fp_chance of 0.01 ...
>
> CREATE TABLE ... (
>a int,
>b int,
>bucket timestamp,
>ts timeuuid,
>c int,
> ...
>PRIMARY KEY ((a, b, bucket), ts, c)
> ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)
>AND bloom_filter_fp_chance = 0.01
>AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
> 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> 'false'}
>AND dclocal_read_repair_chance = 0.0
>AND default_time_to_live = 63072000
>AND gc_grace_seconds = 10800
> ...
>AND read_repair_chance = 0.0
>AND speculative_retry = 'NONE';
>
>
> CREATE TABLE ... (
>c int,
>b int,
>bucket timestamp,
>ts timeuuid,
> ...
>PRIMARY KEY ((c, b, bucket), ts)
> ) WITH CLUSTERING ORDER BY (ts DESC)
>AND bloom_filter_fp_chance = 0.01
>AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
> 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> 'false'}
>AND dclocal_read_repair_chance = 0.0
>AND default_time_to_live = 63072000
>AND gc_grace_seconds = 10800
> ...
>AND read_repair_chance = 0.0
>AND speculative_retry = 'NONE';
>
> On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic 
>  wrote:
>>
>> What is your bloom_filter_fp_chance for either table? I guess it is
>> bigger for the first one, bigger that number is between 0 and 1, less
>> memory it will use (17 MiB against 54.9 Mib) which means more false
>> positives you will get.
>>
>> On Wed, 17 Apr 2019 at 19:59, Martin Mačura  wrote:
>> >
>> > Hi,
>> > I have a table with poor bloom filter false ratio:
>> >SSTable count: 1223
>> >Space used (live): 726.58 GiB
>> >Number of partitions (estimate): 8592749
>> >Bloom filter false positives: 35796352
>> >Bloom filter false ratio: 0.68472
>> >Bloom filter space used: 17.82 MiB
>> >Compacted partition maximum bytes: 386857368
>> >
>> > It's a time series, TWCS compaction, window size 1 day, data partitioned 
>> > in daily buckets, TTL 2 years.
>> >
>> > I have another table with a similar schema, but it is not affected for 
>> > some reason:
>> >SSTable count: 1114
>> >Space used (live): 329.87 GiB
>> >Number of partitions (estimate): 25460768
>> >Bloom filter false positives: 156942
>> >Bloom filter false ratio: 0.00010
>> >Bloom filter space used: 54.9 MiB
>> >Compacted partition maximum bytes: 20924300
>> >
>> > Thanks for any advice,
>> >
>> > Martin
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
What is your bloom_filter_fp_chance for either table? I guess it is
bigger for the first one. The bigger that number is (it lies between 0 and 1),
the less memory the filter will use (17 MiB versus 54.9 MiB) and the more
false positives you will get.
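
To see that trade-off in isolation, here is a small, self-contained sketch
using Google Guava's BloomFilter (an extra dependency used purely as a
stand-in; it is not Cassandra's implementation): the higher the target false
positive probability, the more "might be there" answers you get for keys that
were never inserted.

```
import java.nio.charset.StandardCharsets;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BloomFilterTradeoff {
    public static void main(String[] args) {
        int inserted = 1_000_000;
        int probes = 100_000;

        for (double targetFpp : new double[] {0.01, 0.1, 0.5}) {
            BloomFilter<CharSequence> filter = BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8), inserted, targetFpp);

            for (int i = 0; i < inserted; i++) {
                filter.put("partition-key-" + i);
            }

            // Probe only keys that were never inserted; every "might contain"
            // answer here is a false positive.
            int falsePositives = 0;
            for (int i = 0; i < probes; i++) {
                if (filter.mightContain("missing-key-" + i)) {
                    falsePositives++;
                }
            }
            System.out.printf("target fp chance %.2f -> observed %.4f%n",
                    targetFpp, (double) falsePositives / probes);
        }
    }
}
```

In Cassandra the knob is the per-table bloom_filter_fp_chance; lowering it
makes the filter larger in memory, and the new value only applies to SSTables
written (or rewritten via nodetool upgradesstables) after the change.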

On Wed, 17 Apr 2019 at 19:59, Martin Mačura  wrote:
>
> Hi,
> I have a table with poor bloom filter false ratio:
>SSTable count: 1223
>Space used (live): 726.58 GiB
>Number of partitions (estimate): 8592749
>Bloom filter false positives: 35796352
>Bloom filter false ratio: 0.68472
>Bloom filter space used: 17.82 MiB
>Compacted partition maximum bytes: 386857368
>
> It's a time series, TWCS compaction, window size 1 day, data partitioned in 
> daily buckets, TTL 2 years.
>
> I have another table with a similar schema, but it is not affected for some 
> reason:
>SSTable count: 1114
>Space used (live): 329.87 GiB
>Number of partitions (estimate): 25460768
>Bloom filter false positives: 156942
>Bloom filter false ratio: 0.00010
>Bloom filter space used: 54.9 MiB
>Compacted partition maximum bytes: 20924300
>
> Thanks for any advice,
>
> Martin

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Stefan Miklosovic
>> I have a 3 node cassandra cluster with Replication factor as 2 and 
>> read-write consistency set to QUORUM.

I am not sure what you want to achieve with this. If you have three
nodes and RF 2, each write has two replicas, right ...
A QUORUM of 2 replicas is 2, so if one of the two replicas is down, you will
never reach quorum: one replica up out of two is not a majority. In other
words, if one of your nodes fails and a record lives on that node and one
other, queries for that record fail too, so your cluster is not protected
against the failure of any single node.
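
A quick way to see the arithmetic (a quorum of N replicas is a strict
majority, floor(N / 2) + 1); a tiny sketch:

```
public class QuorumMath {
    // Quorum of N replicas is a strict majority: floor(N / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            int q = quorum(rf);
            // Replicas that may be down while QUORUM reads/writes still succeed.
            int tolerated = rf - q;
            System.out.printf("RF=%d  quorum=%d  tolerates %d down replica(s)%n",
                    rf, q, tolerated);
        }
        // RF=2: quorum=2, tolerates 0 -> any single replica outage fails QUORUM.
        // RF=3: quorum=2, tolerates 1 -> the usual reason RF=3 is recommended.
    }
}
```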

On Tue, 9 Apr 2019 at 23:10, Mahesh Daksha  wrote:
>
> Hello All,
>
> I have a 3 node cassandra cluster with Replication factor as 2 and read-write 
> consistency set to QUORUM. We are using Spring data cassandra. All 
> infrastructure is deployed using kubernetes.
>
> Now in normal use case many records gets inserted to cassandra table. Then we 
> try to modify/update one of the record using save method of repo, like below:
>
> ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta);
>
> After execution of above statement we never see any exception or error. But 
> still this update state goes silent/fail intermittently. That is at times the 
> record in the db gets updated successfully where as other time it fails. Also 
> in the above query when we print tmpRec it contains the updated and correct 
> value every time. Still in the db these updated values doesn't get reflected.
>
> We check the the cassandra transport TRACE logs on all nodes and found the 
> our queries are getting logged there and are being executed also with out any 
> error or exception.
>
> Now another weird observation is this all thing works erfectly fine if I am 
> using single cassandra node (in kubernetes) or if we deploy above infra using 
> ansible (even works for 3 nodes for Ansible).
>
> It looks some issue is specifically with the kubernetes 3 node deployment of 
> cassandra. Primarily looks like replication among nodes causing this.
>
> Please suggest.
>
>
>
> I have a 3 node cassandra cluster with Replication factor as 2 and read-write 
> consistency set to QUORUM. We are using Spring data cassandra. All 
> infrastructure is deployed using kubernetes.
>
> Now in normal use case many records gets inserted to cassandra table. Then we 
> try to modify/update one of the record using save method of repo, like below:
>
> ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta);
>
> After execution of above statement we never see any exception or error. But 
> still this update fail intermittently. That is when we check the record in 
> the db sometime it gets updated successfully where as other time it fails. 
> Also in the above query when we print tmpRec it contains the updated and 
> correct value. Still in the db these updated values doesnt get reflected.
>
> We check the the cassandra transport TRACE logs on all nodes and found the 
> our queries are getting logged there and are being executed also.
>
> Now another weird observation is this all thing works if I am using single 
> cassandra node (in kubernetes) or if we deploy above infra using ansible 
> (even works for 3 nodes for Ansible).
>
> It looks some issue is specifically with the kubernetes 3 node deployment of 
> cassandra. Primarily looks like replication among nodes causing this.
>
> Please suggest.
>
> Below are the contents of  my cassandra Docker file:
>
> FROM ubuntu:16.04
>
> RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils 
> net-tools && apt-get clean && \
> addgroup testuser && useradd -g testuser testuser && usermod --password 
> testuser testuser;
>
> RUN mkdir -p /opt/test && \
> mkdir -p /opt/test/data;
>
> ADD jre8.tar.gz /opt/test/
> ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/
>
> RUN chmod 755 -R /opt/test/jre && \
> ln -s /opt/test/jre/bin/java /usr/bin/java && \
> mv /opt/test/apache-cassandra* /opt/test/cassandra;
>
> RUN mkdir -p /opt/test/cassandra/logs;
>
> ENV JAVA_HOME /opt/test/jre
> RUN export JAVA_HOME
>
> COPY version.txt /opt/test/cassandra/version.txt
>
> WORKDIR /opt/test/cassandra/bin/
>
> RUN mkdir -p /opt/test/data/saved_caches && \
> mkdir -p /opt/test/data/commitlog && \
> mkdir -p /opt/test/data/hints && \
> chown -R testuser:testuser /opt/test/data && \
> chown -R testuser:testuser /opt/test;
>
> USER testuser
>
> CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && perl -p -e 
> 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' 
> ../conf/conf.yml > ../conf/cassandra.yaml && rm ../conf/conf.yml && 
> ./cassandra -f
>
> Please note conf.yml is basically cassandra.yml file having properties 
> related to cassandra.
>
>
> Thanks,
>
> Mahesh Daksha

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: time tracking for down node for nodetool repair

2019-04-08 Thread Stefan Miklosovic
Ah, I see, it is the default for hinted handoff. I was somehow thinking
it was a bigger figure, I do not know why :)

I would say you should run repairs continuously / periodically, so you
would not even have to think about it; ideally it runs
in the background in a scheduled manner.

Regards

On Tue, 9 Apr 2019 at 04:19, Kunal  wrote:
>
> Hello everyone..
>
>
>
> I have a 6 node Cassandra cluster, 3 nodes in each datacenter. If one of 
> the nodes goes down and remains down for more than 3 hrs, I have to run nodetool 
> repair. Just wanted to ask if Cassandra automatically tracks the time when 
> one of the Cassandra nodes goes down, or do I need to write code to track the 
> time and run repair when the node comes back online after 3 hrs?
>
>
> Thanks in anticipation.
>
> Regards,
> Kunal Vaid

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: time tracking for down node for nodetool repair

2019-04-08 Thread Stefan Miklosovic
Hi Kunal,

where do you have that "more than 3 hours" from?

Regards

On Tue, 9 Apr 2019 at 04:19, Kunal  wrote:
>
> Hello everyone..
>
>
>
> I have a 6 node Cassandra cluster, 3 nodes in each datacenter. If one of 
> the nodes goes down and remains down for more than 3 hrs, I have to run nodetool 
> repair. Just wanted to ask if Cassandra automatically tracks the time when 
> one of the Cassandra nodes goes down, or do I need to write code to track the 
> time and run repair when the node comes back online after 3 hrs?
>
>
> Thanks in anticipation.
>
> Regards,
> Kunal Vaid

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Procedures for moving part of a C* cluster to a different datacenter

2019-04-03 Thread Stefan Miklosovic
On Wed, 3 Apr 2019 at 18:38, Oleksandr Shulgin
 wrote:
>
> On Wed, Apr 3, 2019 at 12:28 AM Saleil Bhat (BLOOMBERG/ 731 LEX) 
>  wrote:
>>
>>
>> The standard procedure for doing this seems to be add a 3rd datacenter to 
>> the cluster, stream data to the new datacenter via nodetool rebuild, then 
>> decommission the old datacenter. A more detailed review of this procedure 
>> can be found here:

>> http://thelastpickle.com/blog/2019/02/26/data-center-switch.html
>>
>>

>> However, I see two problems with the above protocol. First, it requires 
>> changes on the application layer because of the datacenter name change; e.g. 
>> all applications referring to the datacenter 'Orlando' will now have to be 
>> changed to refer to 'Tampa'.
>
>
> Alternatively, you may omit DC specification in the client and provide 
> internal network addresses as the contact points.

I am afraid you are mixing two things together. I believe the OP means
that he has to change the local DC in DCAwareRoundRobinPolicy; I am not
sure what contact points have to do with that. As long as at least
one contact point is in a DC nobody removes, all should be fine.

The process in the article is right. Before transitioning to the new DC,
one has to be sure that all writes and reads still target the old DC too,
after you alter a keyspace and add the new DC to it, so you don't miss any
writes if something goes south and you have to switch back. That is
achieved by LOCAL_ONE / LOCAL_QUORUM and a DCAwareRoundRobinPolicy with
localDc pointing to the old one.
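
A minimal sketch of that client-side setup, assuming the DataStax Java driver
3.x (the contact points are placeholders; 'Orlando' and 'Tampa' are the DC
names from the original post):

```
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class LocalDcPinnedClient {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoints("10.0.0.1", "10.0.0.2")          // placeholders
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder()
                                .withLocalDc("Orlando")            // old DC during the transition
                                .build()))
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();

        Session session = cluster.connect();
        // ... application traffic keeps targeting the old DC ...
        // Once `nodetool rebuild` has finished in the new DC, change
        // withLocalDc("Orlando") to withLocalDc("Tampa") and roll the app.
        session.close();
        cluster.close();
    }
}
```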

Then you do the rebuild and restart your app in such a way that the new DC
is in that policy, so new writes and reads go primarily to that DC, and once
all is fine you drop the old one (you can maybe do an additional repair to
be sure). I think a rolling restart of the app is inevitable, but if the
services are in some kind of HA setup I don't see a problem with that. From
the outside it would look like there is no downtime at all.

The OP has a problem with running repair on the nodes, and it is true that it
can be time consuming, even not doable, but there are workarounds for that
which I do not want to go into here. You can speed the process up
significantly if you are smart about it and repair in smaller chunks so you
don't clog your cluster completely; it is called subrange repair.

>> As such, I was wondering what peoples’ thoughts were on the following 
>> alternative procedure:
>> 1) Kill one node in the old datacenter
>> 2) Add a new node in the new datacenter but indicate that it is to REPLACE 
>> the one just shutdown; this node will bootstrap, and all the data which it 
>> is supposed to be responsible for will be streamed to it
>
>
> I don't think this is going to work.  First, I believe streaming for 
> bootstrap or for replacing a node is DC-local, so the first node won't have 
> any peers to stream from.  Even if it would stream from the remote DC, this 
> single node will own 100% of the ring and will most likely die of the load 
> well before it finishes streaming.
>
> Regards,
> --
> Alex
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Multi-DC replication and hinted handoff

2019-04-02 Thread Stefan Miklosovic
Hi Jens,

I am reading "Cassandra: The Definitive Guide"; chapter 9, Reading and
Writing Data, has a section called The Cassandra Write Path with this
sentence in it:

If a replica does not respond within the timeout, it is presumed to be down
and a hint is stored for the write.

So your node might actually be fine, but it just cannot cope
with the load and replies too late, after the coordinator already has
sufficient replies from the other replicas. So the coordinator stores a hint
for that write and for that node. I am not sure how this is related to
turning off handoffs completely. I can do some tests locally, if time
allows, to investigate various scenarios. There might be some subtle differences.

On Wed, 3 Apr 2019 at 07:19, Jens Fischer  wrote:

> Yes, Apache Cassandra 3.11.2 (no DSE).
>
> On 2. Apr 2019, at 19:40, sankalp kohli  wrote:
>
> Are you using OSS C*?
>
> On Fri, Mar 29, 2019 at 1:49 AM Jens Fischer  wrote:
>
>> Hi,
>>
>> I have a Cassandra setup with multiple data centres. The vast majority of
>> writes are LOCAL_ONE writes to data center DC-A. One node (lets call this
>> node A1) in DC-A has accumulated large amounts of hint files (~100 GB). In
>> the logs of this node I see lots of messages like the following:
>>
>> INFO  [HintsDispatcher:26] 2019-03-28 01:49:25,217
>> HintsDispatchExecutor.java:289 - Finished hinted handoff of file
>> db485ac6-8acd-4241-9e21-7a2b540459de-1553419324363-1.hints to endpoint /
>> 10.10.2.55: db485ac6-8acd-4241-9e21-7a2b540459de
>>
>> The node 10.10.2.55 is in DC-B, lets call this node B1. There is no
>> indication whatsoever that B1 was down: Nothing in our monitoring, nothing
>> in the logs of B1, nothing in the logs of A1. Are there any other
>> situations where hints to B1 are stored at A1? Other than A1's failure
>> detection detecting B1 as down I mean. For example could the reason for the
>> hints be that B1 is overloaded and can not handle the intake from the A1?
>> Or that the network connection between DC-A and DC-B is to slow?
>>
>> While researching this I also found the following information on Stack
>> Overflow from Ben Slater regarding hints and multi-dc replication:
>>
>> Another factor here is the consistency level you are using - a LOCAL_*
>> consistency level will only require writes to be written to the local DC
>> for the operation to be considered a success (and hints will be stored for
>> replication to the other DC).
>> (…)
>> The hints are the records of writes that have been made in one DC that
>> are not yet replicated to the other DC (or even nodes within a DC). I think
>> your options to avoid them are: (1) write with ALL or QUOROM (not LOCAL_*)
>> consistency - this will slow down your writes but will ensure writes go
>> into both DCs before the op completes (2) Don't replicate the data to the
>> second DC (by setting the replication factor to 0 for the second DC in the
>> keyspace definition) (3) Increase the capacity of the second DC so it can
>> keep up with the writes (4) Slow down your writes so the second DC can keep
>> up.
>>
>>
>> Source: https://stackoverflow.com/a/37382726
>>
>> This reads like hints are used for “normal” (async) replication between
>> data centres, i.e. hints could show up without any nodes being down
>> whatsoever. This could explain what I am seeing. Does anyone now more about
>> this? Does that mean I will see hints even if I disable hinted handoff?
>>
>> Any pointers or help are greatly appreciated!
>>
>> Thanks in advance
>> Jens
>>
>


Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-14 Thread Stefan Miklosovic
It is just C* in Docker Compose with static IP addresses for as long as all
containers run. I am just killing the Cassandra process and starting it
again in each container.

On Fri, 15 Mar 2019 at 10:47, Jeff Jirsa  wrote:

> Are your IPs changing as you restart the cluster? Kubernetes or Mesos or
> something where your data gets scheduled on different machines? If so, if
> it gets an IP that was previously in the cluster, it’ll stomp on the old
> entry in the gossiper maps
>
>
>
> --
> Jeff Jirsa
>
>
> On Mar 14, 2019, at 3:42 PM, Fd Habash  wrote:
>
> I can conclusively say, none of these commands were run. However, I think
> this is  the likely scenario …
>
>
>
> If you have a cluster of three nodes 1,2,3 …
>
>- If 3 shows as DN
>- Restart C* on 1 & 2
>- Nodetool status should NOT show node 3 IP at all.
>
>
>
> Restarting the cluster while a node is down resets gossip state.
>
>
>
> There is a good chance this is what happened.
>
>
>
> Plausible?
>
>
>
> 
> Thank you
>
>
>
> *From: *Jeff Jirsa 
> *Sent: *Thursday, March 14, 2019 11:06 AM
> *To: *cassandra 
> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
> exist in gossip
>
>
>
> Two things that wouldn't be a bug:
>
>
>
> You could have run removenode
>
> You could have run assassinate
>
>
>
> Also could be some new bug, but that's much less likely.
>
>
>
>
>
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash  wrote:
>
> I have a node which I know for certain was a cluster member last week. It
> showed in nodetool status as DN. When I attempted to replace it today, I
> got this message
>
>
>
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
> encountered during startup
>
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
> because it doesn't exist in gossip
>
> at
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
> ~[apache-cassandra-2.2.8.jar:2.2.8]
>
>
>
>
>
> DN  10.xx.xx.xx  388.43 KB  256  6.9%
> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>
>
>
> Under what conditions does this happen?
>
>
>
>
>
> 
> Thank you
>
>
>
>
>
>

-- 


Stefan Miklosovic, Senior Software Engineer, Instaclustr




Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-14 Thread Stefan Miklosovic
Hi Fd,

I tried this on a 3-node cluster. I killed node2; both node1 and node3
reported node2 as DN. Then I killed node1 and node3, restarted them, and
node2 was reported like this:

[root@spark-master-1 /]# nodetool status
Datacenter: DC1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens   Owns (effective)  Host ID
 Rack
DN  172.19.0.8  ?  256  64.0%
 bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens   Owns (effective)  Host ID
 Rack
UN  172.19.0.5  382.75 KiB  256  64.4%
 2a062140-2428-4092-b48b-7495d083d7f9  rack1
UN  172.19.0.9  171.41 KiB  256  71.6%
 9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3

Prior to killing node1 and node3, node2 was indeed marked as DN, but it
was part of the "Datacenter: dc1" output where both node1 and node3 were.

But after killing both node1 and node3 (so the cluster was totally down)
and restarting them, node2 was reported as shown above.

I do not know what makes the difference here. Is gossip data stored
somewhere on disk? I would say so, otherwise there is no way node1 / node3
could report that node2 is down, but at the same time I don't get why it is
"out of the list" where node1 and node3 are.


On Fri, 15 Mar 2019 at 02:42, Fd Habash  wrote:

> I can conclusively say, none of these commands were run. However, I think
> this is  the likely scenario …
>
>
>
> If you have a cluster of three nodes 1,2,3 …
>
>- If 3 shows as DN
>- Restart C* on 1 & 2
>- Nodetool status should NOT show node 3 IP at all.
>
>
>
> Restarting the cluster while a node is down resets gossip state.
>
>
>
> There is a good chance this is what happened.
>
>
>
> Plausible?
>
>
>
> 
> Thank you
>
>
>
> *From: *Jeff Jirsa 
> *Sent: *Thursday, March 14, 2019 11:06 AM
> *To: *cassandra 
> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
> exist in gossip
>
>
>
> Two things that wouldn't be a bug:
>
>
>
> You could have run removenode
>
> You could have run assassinate
>
>
>
> Also could be some new bug, but that's much less likely.
>
>
>
>
>
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash  wrote:
>
> I have a node which I know for certain was a cluster member last week. It
> showed in nodetool status as DN. When I attempted to replace it today, I
> got this message
>
>
>
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
> encountered during startup
>
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
> because it doesn't exist in gossip
>
> at
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
> ~[apache-cassandra-2.2.8.jar:2.2.8]
>
>
>
>
>
> DN  10.xx.xx.xx  388.43 KB  256  6.9%
> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>
>
>
> Under what conditions does this happen?
>
>
>
>
>
> 
> Thank you
>
>
>
>
>

Stefan Miklosovic


Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Stefan Miklosovic
Hi Leena,

as already suggested in my previous email, you could use Apache Spark and
the Cassandra Spark connector (1). I have checked TTLs and I believe you
should especially read this section (2) about TTLs. That seems to be what
you need: TTLs per row. The workflow would be to read from your source
table, apply a transformation per row (via some mapping), and then save the
result to the new table.

This would import it "all", but while records are still being written to
the original table, I am not sure how to cover "the gap": once you make the
switch, you would miss records created in the first table after you did the
loading. You could maybe leverage Spark streaming (the Cassandra connector
supports that too) to apply the same transformation on the fly to the new
writes as well.

(1) https://github.com/datastax/spark-cassandra-connector
(2)
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#using-a-different-value-for-each-row
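
To make the "per row" part concrete, here is a rough sketch in plain CQL of
what the copy has to do for every row, no matter whether Spark or a
hand-written script drives it (table and column names below are made up,
and TTL/WRITETIME work per regular column, so pick a representative one):

-- read each source row together with its remaining TTL (seconds) and write timestamp
SELECT id, created_at, payload, TTL(payload) AS ttl_left, WRITETIME(payload) AS wt
FROM my_keyspace.table1;

-- write it into the new table, deriving the time bucket from created_at and
-- replaying the TTL and timestamp read above
INSERT INTO my_keyspace.table2 (created_date, bucket, id, payload)
VALUES ('2019-03-13', 10, 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f, 'some value')
USING TTL 86400 AND TIMESTAMP 1552471200000000;

If I read it correctly, the linked section (2) is about getting the
connector to set exactly those two values per row instead of one fixed
value for the whole write.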


On Thu, 14 Mar 2019 at 00:13, Leena Ghatpande 
wrote:

> Understand, 2nd table would be a better approach. So what would be the
> best way to copy 70M rows from current table to the 2nd table with ttl set
> on each record as the first table?
>
> --
> *From:* Durity, Sean R 
> *Sent:* Wednesday, March 13, 2019 8:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] Re: Migrate large volume of data from one table
> to another table within the same cluster when COPY is not an option.
>
>
> Correct, there is no current flag. I think there SHOULD be one.
>
>
>
>
>
> *From:* Dieudonné Madishon NGAYA 
> *Sent:* Tuesday, March 12, 2019 7:17 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to
> another table within the same cluster when COPY is not an option.
>
>
>
> Hi Sean, you can’t set a flag in cassandra.yaml to disallow ALLOW FILTERING;
> the only thing you can do is address it in your data model.
>
> Don’t ask Cassandra to query all the data in a table; the ideal query
> targets a single partition.
>
>
>
> On Tue, Mar 12, 2019 at 6:46 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
> Hi Sean,
>
>
>
> for sure, the best approach would be to create another table which would
> treat just that specific query.
>
>
>
> How do I set the flag for not allowing allow filtering in cassandra.yaml?
> I read a doco and there seems to be nothing about that.
>
>
>
> Regards
>
>
>
> On Wed, 13 Mar 2019 at 06:57, Durity, Sean R 
> wrote:
>
> If there are 2 access patterns, I would consider having 2 tables. The
> first one with the ID, which you say is the majority use case.  Then have a
> second table that uses a time-bucket approach as others have suggested:
>
> (time bucket, id) as primary key
>
> Choose a time bucket (day, week, hour, month, whatever) that would hold
> less than 100 MB of data in the time-bucket partition.
>
>
>
> You could include all relevant data in the second table to meet your
> query. OR, if that data seems too large or too volatile to duplicate, just
> include your primary key and look-up the data in the primary table as
> needed.
>
>
>
> If you use allow filtering, you are setting yourself up for failure to
> scale. I tell my developers, “if you use allow filtering, you are doing it
> wrong.” In fact, I think the Cassandra admin should be able to set a flag
> in cassandra.yaml to not allow filtering at all. The cluster should be able
> to protect itself from bad queries.
>
>
>
>
>
>
>
> *From:* Leena Ghatpande 
> *Sent:* Tuesday, March 12, 2019 9:02 AM
> *To:* Stefan Miklosovic ;
> user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to
> another table within the same cluster when COPY is not an option.
>
>
>
> Our data model cannot be like below as you have recommended as majority of
> the reads need to select the data by the partition key (id) only, not by
> date.
>
> You could remodel your data in such way that you would make primary key
> like this
>
> ((date), hour-minute, id)
>
> or
>
> ((date, hour-minute), id)
>
>
>
>
>
> By adding the date as clustering column, yes the idea was to use the Allow
> Filtering on the date and pull the records. Understand that it is not
> recommended to do this, but we have been doing this on another existing
> large table and have not run into any issue so far. But want to understand
> if there is a better approach to this?
>
>
>
> Thanks
>
>
> --
>
> *From

Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-12 Thread Stefan Miklosovic
Hi Sean,

for sure, the best approach would be to create another table tailored to
just that specific query.

How do I set the flag to disallow ALLOW FILTERING in cassandra.yaml? I read
the docs and there seems to be nothing about that.

Regards

On Wed, 13 Mar 2019 at 06:57, Durity, Sean R 
wrote:

> If there are 2 access patterns, I would consider having 2 tables. The
> first one with the ID, which you say is the majority use case.  Then have a
> second table that uses a time-bucket approach as others have suggested:
>
> (time bucket, id) as primary key
>
> Choose a time bucket (day, week, hour, month, whatever) that would hold
> less than 100 MB of data in the time-bucket partition.
>
>
>
> You could include all relevant data in the second table to meet your
> query. OR, if that data seems too large or too volatile to duplicate, just
> include your primary key and look-up the data in the primary table as
> needed.
>
>
>
> If you use allow filtering, you are setting yourself up for failure to
> scale. I tell my developers, “if you use allow filtering, you are doing it
> wrong.” In fact, I think the Cassandra admin should be able to set a flag
> in cassandra.yaml to not allow filtering at all. The cluster should be able
> to protect itself from bad queries.
>
>
>
>
>
>
>
> *From:* Leena Ghatpande 
> *Sent:* Tuesday, March 12, 2019 9:02 AM
> *To:* Stefan Miklosovic ;
> user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to
> another table within the same cluster when COPY is not an option.
>
>
>
> Our data model cannot be like below as you have recommended as majority of
> the reads need to select the data by the partition key (id) only, not by
> date.
>
> You could remodel your data in such way that you would make primary key
> like this
>
> ((date), hour-minute, id)
>
> or
>
> ((date, hour-minute), id)
>
>
>
>
>
> By adding the date as clustering column, yes the idea was to use the Allow
> Filtering on the date and pull the records. Understand that it is not
> recommended to do this, but we have been doing this on another existing
> large table and have not run into any issue so far. But want to understand
> if there is a better approach to this?
>
>
>
> Thanks
>
>
> --
>
> *From:* Stefan Miklosovic 
> *Sent:* Monday, March 11, 2019 7:12 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Migrate large volume of data from one table to another
> table within the same cluster when COPY is not an option.
>
>
>
> The query which does not work should be like this, I made a mistake there
>
>
>
> cqlsh> SELECT * from my_keyspace.my_table where  number > 2;
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Cannot execute this query as it might involve data filtering and
> thus may have unpredictable performance. If you want to execute this query
> despite the performance unpredictability, use ALLOW FILTERING"
>
>
>
>
>
> On Tue, 12 Mar 2019 at 10:10, Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
> Hi Leena,
>
>
>
> "We are thinking of creating a new table with a date field as a
> clustering column to be able to query for date ranges, but partition key to
> clustering key will be 1-1. Is this a good approach?"
>
>
>
> If you want to select by some time range here, I am wondering how would
> making datetime a clustering column help you here? You still have to
> provide primary key, right?
>
>
>
> E.g. select * from your_keyspace.your_table where id=123 and my_date >
> yesterday and my_date < tomorrow (you got the idea)
>
>
>
> If you make my_date a clustering column, you cannot do the query below, because
> you still have to specify partition key fully and then clustering key
> (optionally) where you can further order and do ranges. But you cant do a
> query without specifying partition key. Well, you can use ALLOW FILTERING
> but you do not want to do this at all in your situation as it would scan
> everything.
>
>
>
> select * from your_keyspace.your_table where my_date > yesterday and
> my_date < tomorrow
>
>
>
> cqlsh> create KEYSPACE my_keyspace WITH replication = {'class':
> 'NetworkTopologyStrategy', 'dc1': '1'};
>
> cqlsh> CREATE TABLE my_keyspace.my_table (id uuid, number int, PRIMARY KEY
> ((id), number));
>
>
>
> cqlsh> SELECT * from my_keyspace.my_table ;
>
>
>
>  id   | number
>
> --+
>
>  6e23f79a-8b67-47

Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-11 Thread Stefan Miklosovic
The query which does not work should look like this; I made a mistake there:

cqlsh> SELECT * from my_keyspace.my_table where  number > 2;
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data filtering and
thus may have unpredictable performance. If you want to execute this query
despite the performance unpredictability, use ALLOW FILTERING"


On Tue, 12 Mar 2019 at 10:10, Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi Leena,
>
> "We are thinking of creating a new table with a date field as a
> clustering column to be able to query for date ranges, but partition key to
> clustering key will be 1-1. Is this a good approach?"
>
> If you want to select by some time range here, I am wondering how would
> making datetime a clustering column help you here? You still have to
> provide primary key, right?
>
> E.g. select * from your_keyspace.your_table where id=123 and my_date >
> yesterday and my_date < tomorrow (you got the idea)
>
> If you make my_date a clustering column, you cannot do the query below, because
> you still have to specify partition key fully and then clustering key
> (optionally) where you can further order and do ranges. But you cant do a
> query without specifying partition key. Well, you can use ALLOW FILTERING
> but you do not want to do this at all in your situation as it would scan
> everything.
>
> select * from your_keyspace.your_table where my_date > yesterday and
> my_date < tomorrow
>
> cqlsh> create KEYSPACE my_keyspace WITH replication = {'class':
> 'NetworkTopologyStrategy', 'dc1': '1'};
> cqlsh> CREATE TABLE my_keyspace.my_table (id uuid, number int, PRIMARY KEY
> ((id), number));
>
> cqlsh> SELECT * from my_keyspace.my_table ;
>
>  id   | number
> --+
>  6e23f79a-8b67-47e0-b8e0-50be78bb1c7f |  3
>  abdc0184-a695-427d-b63b-57cdf7a45f00 |  1
>  90fe112e-0f74-4cbc-8767-67bdc9c8c3b0 |  4
>  8cff3eb7-1aff-4dc7-9969-60190c7e4675 |  2
>
> cqlsh> SELECT * from my_keyspace.my_table where id =
> '6e23f79a-8b67-47e0-b8e0-50be78bb1c7f' and  number > 2;
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid STRING constant (6e23f79a-8b67-47e0-b8e0-50be78bb1c7f) for
> "id" of type uuid"
>
> cqlsh> SELECT * from my_keyspace.my_table where id =
> 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f and  number > 2;
>
>  id   | number
> --+
>  6e23f79a-8b67-47e0-b8e0-50be78bb1c7f |  3
>
> You could remodel your data in such way that you would make primary key
> like this
>
> ((date), hour-minute, id)
>
> or
>
> ((date, hour-minute), id)
>
> I would prefer the second one because if you expect a lot of data per day,
> they would all end up on same set of replicas as hash of partition key
> would be same whole day if you have same date all day so I think you would
> end up with hotspots. You want to have your data spread more evenly so the
> second one seems to be better to me.
>
> You can also investigate how to do this with materialized view but I am
> not sure about the performance here.
>
> If you want to copy data you can do this e.g. by Cassandra Spark
> connector, you would just read table and as you read it you would write to
> another one. That is imho the fastest approach and the least error prone.
> You can do that on live production data and you can just make a "switch"
> afterwards. Not sure about ttls but that should be transparent while
> copying that.
>
> On Tue, 12 Mar 2019 at 03:04, Leena Ghatpande 
> wrote:
>
>> We have a table with over 70M rows with a partition key that is unique.  We
>> have a  created datetime stamp on each record, and we have a need to
>> select all rows created for a date range. Secondary index is not an option
>> as its high cardinality and could slow performance doing a full scan on 70M
>> rows.
>>
>>
>> We are thinking of creating a new table with a date field as a clustering
>> column to be able to query for date ranges, but partition key to clustering
>> key will be 1-1. Is this a good approach?
>>
>> To do this, we need to copy this large volume of data from table1 to
>> table2 within the same cluster, while updates are still happening to
>> table1. We need to do this real time without impacting our customers. COPY
>> is not an option, as we have ttl's on each row on table1 that need to be
>> applied to table2 as well.
>>
>>
>> So what would be the best approach
>>
>>1. To be able select data using date range without impacting
>>performance. This operation will be needed only on adhoc basis and it wont
>>be as frequent .
>>2. Best way to migrate large volume of data with ttl from one table
>>to another within the same cluster.
>>
>>
>> Any other suggestions also will be greatly appreciated.
>>
>>
>>
>
> Stefan Miklosovic
>

Stefan Miklosovic


Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-11 Thread Stefan Miklosovic
Hi Leena,

"We are thinking of creating a new table with a date field as a clustering
column to be able to query for date ranges, but partition key to clustering
key will be 1-1. Is this a good approach?"

If you want to select by some time range here, I am wondering how making
datetime a clustering column would help you. You still have to provide the
partition key, right?

E.g. select * from your_keyspace.your_table where id=123 and my_date >
yesterday and my_date < tomorrow (you got the idea)

If you make my_date a clustering column, you cannot do the query below,
because you still have to specify the partition key fully and only then the
clustering key (optionally), on which you can further order and do ranges.
You cannot run a query without specifying the partition key. Well, you can
use ALLOW FILTERING, but you do not want to do that at all in your
situation as it would scan everything.

select * from your_keyspace.your_table where my_date > yesterday and
my_date < tomorrow

cqlsh> create KEYSPACE my_keyspace WITH replication = {'class':
'NetworkTopologyStrategy', 'dc1': '1'};
cqlsh> CREATE TABLE my_keyspace.my_table (id uuid, number int, PRIMARY KEY
((id), number));

cqlsh> SELECT * from my_keyspace.my_table ;

 id   | number
--+
 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f |  3
 abdc0184-a695-427d-b63b-57cdf7a45f00 |  1
 90fe112e-0f74-4cbc-8767-67bdc9c8c3b0 |  4
 8cff3eb7-1aff-4dc7-9969-60190c7e4675 |  2

cqlsh> SELECT * from my_keyspace.my_table where id =
'6e23f79a-8b67-47e0-b8e0-50be78bb1c7f' and  number > 2;
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Invalid STRING constant (6e23f79a-8b67-47e0-b8e0-50be78bb1c7f) for
"id" of type uuid"

cqlsh> SELECT * from my_keyspace.my_table where id =
6e23f79a-8b67-47e0-b8e0-50be78bb1c7f and  number > 2;

 id   | number
--+
 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f |  3

You could remodel your data in such a way that you make the primary key
look like this

((date), hour-minute, id)

or

((date, hour-minute), id)

I would prefer the second one: if you expect a lot of data per day, then
with the first option it would all end up on the same set of replicas,
because the hash of the partition key stays the same for the whole day, so
I think you would end up with hotspots. You want your data spread more
evenly, so the second one seems better to me.
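
To make the second option concrete, a minimal sketch of such a time-bucket
table (sticking with the table1/table2 naming from the question; the
keyspace and column names are made up, and bucket could be e.g. the hour of
the day, chosen so that a single partition does not grow too large):

CREATE TABLE my_keyspace.table2 (
    created_date date,
    bucket int,
    id uuid,
    payload text,
    PRIMARY KEY ((created_date, bucket), id)
);

-- a date-range query then becomes a handful of single-partition reads
SELECT id, payload FROM my_keyspace.table2
WHERE created_date = '2019-03-11' AND bucket = 10;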

You can also investigate how to do this with a materialized view, but I am
not sure about the performance there.

If you want to copy the data, you can do this e.g. with the Cassandra Spark
connector: you would just read the table and, as you read it, write into
the other one. That is imho the fastest and least error-prone approach. You
can do that on live production data and just make a "switch" afterwards.
Not sure about TTLs, but they should be transparent while copying.
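
If in doubt about the TTLs, it is easy to spot-check after a trial copy
with plain CQL (again with the made-up names from the sketch above): the
remaining TTL of a copied row should roughly match the source row.

SELECT TTL(payload) FROM my_keyspace.table1
WHERE id = 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f;

SELECT TTL(payload) FROM my_keyspace.table2
WHERE created_date = '2019-03-11' AND bucket = 10
  AND id = 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f;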

On Tue, 12 Mar 2019 at 03:04, Leena Ghatpande 
wrote:

> We have a table with over 70M rows with a partition key that is unique.  We
> have a  created datetime stamp on each record, and we have a need to
> select all rows created for a date range. Secondary index is not an option
> as its high cardinality and could slow performance doing a full scan on 70M
> rows.
>
>
> We are thinking of creating a new table with a date field as a clustering
> column to be able to query for date ranges, but partition key to clustering
> key will be 1-1. Is this a good approach?
>
> To do this, we need to copy this large volume of data from table1 to
> table2 within the same cluster, while updates are still happening to
> table1. We need to do this real time without impacting our customers. COPY
> is not an option, as we have ttl's on each row on table1 that need to be
> applied to table2 as well.
>
>
> So what would be the best approach
>
>1. To be able select data using date range without impacting
>performance. This operation will be needed only on adhoc basis and it wont
>be as frequent .
>2. Best way to migrate large volume of data with ttl from one table to
>another within the same cluster.
>
>
> Any other suggestions also will be greatly appreciated.
>
>
>

Stefan Miklosovic


Re: data modelling

2019-03-05 Thread Stefan Miklosovic
Hi Bobbie,

as Kenneth already mentioned, you should model your schema based on the
queries you expect to run, and read the related literature. From what I
see, your table is named "customer_sensor_tagids", so it's quite possible
you want tagids as part of the primary key? Something like:

select * from keyspace.customer_sensor_tagids where tag_id = 11358097.

This implies that you would have as many records per customer and sensor id
as there are tag_ids. If you want to query such a table and you know
customerid and sensorid in advance, you could query like

select * from keyspace.customer_sensor_tagids where customerid = X and
sensorid = Y and tag_id = 11358097

so your primary key would look like (customerid, sensorid, tagid) or
((customerid, sensorid), tagid)

If you do not know customerid or sensorid when doing the query, you would
have to make tag_id the partition key and customerid and sensorid
clustering columns, optionally ordered; that's up to you. Now you may
object that there would be data duplication, as you would have to have "as
many tables as queries", which might be true, but that is not a problem in
general. That's the cost you "pay" for having queries that are super fast
and tailored to your use case.

I suggest reading more about data modelling in general.

On Wed, 6 Mar 2019 at 11:19, Bobbie Haynes  wrote:

> Hi
> Could you help with modelling this use case?
>
> I have the below table. I will update the tagids column (a set<bigint>)
> based on the PK. I have created a secondary index on tagids to query like
> below:
>
> Select * from keyspace.customer_sensor_tagids where tagids CONTAINS
> 11358097;
>
> This query is doing a range scan because of the secondary index and is
> causing performance issues.
>
> If I create an MV on tagids, will I be able to query like the above? Please
> suggest a data model for this scenario. Appreciate your help on this.
>
> ---
>
> ---
> example of Tagids for each row:-
>4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756,
> 8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554,
> 10980255, 11009971, 11043805, 11075379, 11078819, 11167844, 11358097,
> 11479340, 11481769, 11481770, 11481771, 11481772, 11693597, 11709012,
> 12193230, 12421500, 12421516, 12421781, 12422011, 12422368, 12422501,
> 12422512, 12422553, 12422555, 12423381, 12423382
>
>
>  
> ---
>
> ---
>
>CREATE TABLE keyspace.customer_sensor_tagids (
> customerid bigint,
> sensorid bigint,
> XXX frozen,
> XXX frozen,
> XXX text,
> XXX text,
> XXX frozen,
> XXX bigint,
> XXX bigint,
> XXX list>,
> XXX frozen,
> XXX boolean,
> XXX bigint,
> XXX list>,
> XXX frozen,
> XXX bigint,
> XXX bigint,
> XXX list>,
> XXX list>,
> XXX set>,
> XXX set,
> XXX set,
> tagids set,
> XXX bigint,
> XXX list>,
> PRIMARY KEY ((customerid, sensorid))
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids));
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
>


-- 


Stefan Miklosovic, Senior Software Engineer, Instaclustr

