Re: Adding disk to operating C*

2018-03-08 Thread Niclas Hedhman
I am curious about the side comment: "Depending on your use case you may not want to have a data density over 1.5 TB per node." Why is that? I am planning much bigger than that, and now you give me pause... Cheers Niclas On Wed, Mar 7, 2018 at 6:59 PM, Rahul Singh

Re: Adding disk to operating C*

2018-03-08 Thread Eunsu Kim
Thanks for the answer. I never forget to flush and drain before shutting down Cassandra. It is a system that favors light, fast data handling over strict accuracy, so rf = 2 and cl = ONE. Thank you again. > On 9 Mar 2018, at 3:12 PM, Jeff Jirsa wrote: > > There is no shuffling as

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
There is no one size fits all, but take a look at this scenario: Op0 at time T0, a client deletes cell X; Op1 at time T1, a client writes Y to cell X. T0 < T1 in the real world. When using client timestamp T0

Re: Adding disk to operating C*

2018-03-08 Thread Jeff Jirsa
There is no shuffling as the servers go up and down. Cassandra doesn’t do that. However, rf=2 is atypical and sometimes problematic. If you read or write with quorum / two / all, you’ll get unavailables during the restart. If you read or write with cl one, you’ll potentially not see data
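To make that trade-off concrete, here is a minimal sketch using the DataStax Python driver; the contact point, keyspace and table names are made up, and this is an illustration of consistency levels rather than a recommendation:

```python
# Minimal sketch with the DataStax Python driver (cassandra-driver).
# The contact point, keyspace ("demo") and table ("events") are hypothetical.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])
session = cluster.connect("demo")

# With RF=2, QUORUM means 2 of 2 replicas must respond, so one node
# being down for a disk change makes its token ranges unavailable.
strict = SimpleStatement(
    "SELECT value FROM events WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)

# CL=ONE stays available through a rolling restart, but a read may hit
# the replica that missed a write and return stale or absent data.
relaxed = SimpleStatement(
    "SELECT value FROM events WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE)

row = session.execute(relaxed, ("some-id",)).one()
```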

Re: Adding disk to operating C*

2018-03-08 Thread Eunsu Kim
There are currently 5 writes per second. I was worried that the server downtime would be quite long during disk mount operations. If the data shuffling that occurs when a server goes down or comes up works as expected, my concern seems unnecessary. > On 9 Mar 2018, at 2:19 PM, Jeff

Re: Adding disk to operating C*

2018-03-08 Thread Jeff Jirsa
I see no reason to believe you’d lose data doing this - why do you suspect you may? -- Jeff Jirsa > On Mar 8, 2018, at 8:36 PM, Eunsu Kim wrote: > > The auto_snapshot setting is disabled. And the directory architecture on the > five nodes will match exactly. > >

Re: Adding disk to operating C*

2018-03-08 Thread Eunsu Kim
The auto_snapshot setting is disabled. And the directory architecture on the five nodes will match exactly. (Shut down Cassandra/server -> mount disk -> add directory to data_file_directories -> start Cassandra), rolling across all 5 nodes. Is it possible to add disks without losing data by doing the above

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Ben Bromhead
I wouldn't 100% rely on your clients to generate timestamps (actually, don't 100% rely on timestamps at all!). Clients tend to be stateless, scaled up and down, stopped, started; NTP takes time to bring a clock into sync, and clients are more likely to be moved between hypervisors in cloud environments, etc.

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
Agree! When using client timestamps, NTP should be running on the clients as well. On Thu, Mar 8, 2018 at 6:24 PM, Jeff Jirsa wrote: > Clients can race (and go backward), so the more correct answer tends to > be to use LWT/CAS to guarantee state if you have a data model where it >

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Jeff Jirsa
Clients can race (and go backward), so the more correct answer tends to be to use LWT/CAS to guarantee state if you have a data model where it matters. -- Jeff Jirsa > On Mar 8, 2018, at 6:18 PM, Dor Laor wrote: > > While NTP on the servers is important, make sure that
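A minimal sketch of the LWT/CAS approach with the DataStax Python driver; the table and column names are hypothetical:

```python
# Minimal LWT/CAS sketch (DataStax Python driver).
# Keyspace, table and column names are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])
session = cluster.connect("demo")

# Conditional update: only applied if the current value matches, so two
# racing clients cannot both "win" regardless of clock skew.
rows = session.execute(
    "UPDATE accounts SET balance = %s WHERE id = %s IF balance = %s",
    (90, "acct-1", 100))

# The first column of an LWT response is the [applied] boolean.
applied = rows.one()[0]
if not applied:
    print("CAS lost the race; re-read and retry")
```

Note that each LWT runs a Paxos round, so it is usually reserved for the cases where correctness under races actually matters.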

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
While NTP on the servers is important, make sure that you use client timestamps and not server timestamps. Since the last write wins, the data generator should be the one setting its timestamp. On Thu, Mar 8, 2018 at 2:12 PM, Ben Slater wrote: > It is important to make sure you
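For illustration, a minimal sketch of supplying the timestamp from the data generator with the DataStax Python driver (the table name is hypothetical); recent drivers can also attach client-side timestamps automatically via their timestamp generator:

```python
# Sketch of an explicit client-supplied write timestamp.
# Table name is hypothetical; timestamps are microseconds since epoch.
import time
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])
session = cluster.connect("demo")

ts = int(time.time() * 1_000_000)  # generated where the data originates

# Last write wins by timestamp, so the data generator sets it explicitly.
session.execute(
    "INSERT INTO events (id, value) VALUES (%s, %s) USING TIMESTAMP %s",
    ("some-id", "payload", ts))
```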

Re: Cassandra/Spark failing to process large table

2018-03-08 Thread kurt greaves
Note that read repairs only occur for QUORUM/equivalent and higher, and also with a 10% (default) chance on anything less than QUORUM (ONE/LOCAL_ONE). This is configured at the table level through the dclocal_read_repair_chance and read_repair_chance settings (which are going away in 4.0). So if
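As a sketch of the knobs mentioned here (pre-4.0 only, since these table options are removed in 4.0), using the DataStax Python driver with hypothetical keyspace/table names:

```python
# Sketch: adjusting read repair chances and forcing consistent reads
# on a pre-4.0 cluster. Keyspace/table names are hypothetical.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])
session = cluster.connect()

# Raise the chance of background read repair for reads below QUORUM
# (dclocal_read_repair_chance defaults to 0.1).
session.execute(
    "ALTER TABLE demo.events "
    "WITH dclocal_read_repair_chance = 0.5 AND read_repair_chance = 0.0")

# Or read at LOCAL_QUORUM, which repairs any mismatch detected among
# the replicas contacted for that read.
stmt = SimpleStatement(
    "SELECT * FROM demo.events WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
row = session.execute(stmt, ("some-id",)).one()
```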

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Ben Slater
It is important to make sure you are using the same NTP servers across your cluster - we used to see relatively frequent NTP issues across our fleet using default/public NTP servers until (back in 2015) we implemented our own NTP pool (see

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Michael Shuler
As long as your nodes are syncing time using the same method, that should be good. Don't mix daemons, however, since they may sync from different sources. Whether you use ntpd, openntp, ntpsec, chrony isn't really important, since they are all just background daemons to sync the system clock.

Re: Joining a cluster of nodes having multi valued initial_token parameters.

2018-03-08 Thread Oleksandr Shulgin
On Thu, Mar 8, 2018 at 1:41 PM, Mikhail Tsaplin wrote: > Thank you for the answer, are you sure that it is at least safe? > I would test in a lab first of course, but I don't see why it should be a problem. I wonder more why you had tokens listed explicitly on the

Re: Joining a cluster of nodes having multi valued initial_token parameters.

2018-03-08 Thread Mikhail Tsaplin
Thank you for the answer, are you sure that it is at least safe? As I understand it, I will have to specify auto_bootstrap=true too? 2018-03-08 18:16 GMT+07:00 Oleksandr Shulgin : > On Thu, Mar 8, 2018 at 12:09 PM, Mikhail Tsaplin > wrote: > >> Hi, >>

Re: Joining a cluster of nodes having multi valued initial_token parameters.

2018-03-08 Thread Oleksandr Shulgin
On Thu, Mar 8, 2018 at 12:09 PM, Mikhail Tsaplin wrote: > Hi, > > I have a three node Cassandra cluster. Every node has initial_token > configuration parameter holding 256 tokens (looks like randomly > distributed). Now I have to add a fourth node. How could this be done? >

Joining a cluster of nodes having multi valued initial_token parameters.

2018-03-08 Thread Mikhail Tsaplin
Hi, I have a three-node Cassandra cluster. Every node has an initial_token configuration parameter holding 256 tokens (they look randomly distributed). Now I have to add a fourth node. How can this be done? PS. Part of 'nodetool ring' output: 192.168.1.123 rack1 Up Normal 2.84
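Before adding the node, one quick way to confirm how many tokens each existing node actually holds is to read the token sets from the system tables; a rough sketch with the DataStax Python driver (the contact point echoes the ring output above):

```python
# Sketch: count the tokens owned by each existing node before adding
# the fourth one. Contact point taken from the ring output above.
from cassandra.cluster import Cluster

cluster = Cluster(["192.168.1.123"])
session = cluster.connect()

local = session.execute(
    "SELECT broadcast_address, tokens FROM system.local").one()
print(local.broadcast_address, len(local.tokens))

for peer in session.execute("SELECT peer, tokens FROM system.peers"):
    print(peer.peer, len(peer.tokens))
```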

Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Kyrylo Lebediev
Hi! Recently Amazon announced the launch of the Amazon Time Sync Service (https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/) and it is now the AWS-recommended way to sync time on EC2 instances (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html). It's stated
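Whichever sync daemon is used, a rough way to sanity-check that a client host and a coordinator roughly agree on time is to compare wall clocks over CQL; a sketch with the DataStax Python driver (contact point made up; network latency is ignored, so this only catches gross skew and is no substitute for proper NTP monitoring):

```python
# Rough sketch: compare this host's clock with the coordinator's clock.
import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])
session = cluster.connect()

row = session.execute(
    "SELECT toTimestamp(now()) AS server_time FROM system.local").one()
local_time = datetime.datetime.utcnow()

# Driver returns timestamps as naive UTC datetimes, so the subtraction
# gives an approximate client/coordinator offset.
skew = (local_time - row.server_time).total_seconds()
print(f"approximate client/server clock skew: {skew:.3f}s")
```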

Re: Cassandra/Spark failing to process large table

2018-03-08 Thread Faraz Mateen
Hi Ben, That makes sense. I also read about "read repairs": once an inconsistent record is read, Cassandra synchronizes its replicas on the other nodes as well. I ran the same Spark query again, this time with the default consistency level (LOCAL_ONE), and the result was correct. Thanks again for the

Re: backup/restore cassandra data

2018-03-08 Thread Dipan Shah
The commitlog gets truncated once the relevant data is written to sstables, so you can't use it to replay all the data stored on the node. Also, snapshots are not automatic. You need to run the snapshot command on all the nodes of your cluster. Snapshots only get created automatically if you run a

Re: backup/restore cassandra data

2018-03-08 Thread onmstester onmstester
Thanks. But isn't there a method to restore the node as it was before the crash, i.e. from the commitlog and every last piece of data inserted? How often would snapshots be created? Shouldn't they be created manually with nodetool? I haven't created snapshots on the node! Sent using Zoho Mail On