sstable corruption and schema migration issues

2018-10-24 Thread David Payne
Which versions of Cassandra 2.x and 3.x are best for avoiding SSTable 
corruption and schema migration slowness?

Is this a "Cassandra is not a set-it-and-forget-it system" concept?


RE: TWCS: Repair create new buckets with old data

2018-10-24 Thread Meg Mara
Hey Jon,

About table-level TTL -> It wasn’t meant as an optimization, just a suggestion. The 
user had no table-level TTL set; it was at the default of 0. So if an insert comes in 
with no TTL, that row would never expire, and there is no default TTL to fall back 
on in his case. I was just thinking about the possible problems with that setup and 
data never expiring.

About inc repair -> Yup, I agree; I did some extensive research into that 
earlier this year. We’re just waiting for 4.0 as well. But for some of our 
larger clusters, doing a full repair, even if it’s just with the –pr option, is 
simply not feasible. It takes weeks to repair and puts a huge load on the cluster.

So I’m continuing to use incremental repair and keeping an eye out for any issues, 
and I’m using –pr for the smaller clusters. But for tables with TWCS, I wouldn’t do 
a full repair, because it would include all the old and newer sstables in the 
operation, is bound to create new sstables with mixed data, and may eventually 
lead to blocking sstables. It also somewhat defeats the purpose of separating our 
data into neat TWCS windows, and tombstone cleanup would therefore take a while 
longer too.

Thank you,
Meg


From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Wednesday, October 24, 2018 1:42 PM
To: user 
Subject: Re: TWCS: Repair create new buckets with old data

Hey Meg, a couple thoughts.

>   Set a table level TTL with TWCS, and stop setting it with inserts/updates 
> (insert TTL overrides table level TTL). So, that your entire sstable expires 
> at the same time, as opposed to each insert expiring at its own pace. So that 
> for tombstone clean up, the system can just drop the entire sstable at once.

Setting the TTL on a table or the query gives you the same exact result.  
Setting it on the table is just there for convenience.  If it's not set at the 
query level, it uses the default value.  See 
org.apache.cassandra.cql3.Attributes#getTimeToLive.  Generally speaking I'd 
rather set it at the table level as well, but it's just to avoid weird 
application bugs, not as an optimization.
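
For reference, a minimal sketch of what the table-level default looks like, with a 
hypothetical keyspace/table (ks.events) and a roughly 4-month TTL; adjust names and 
values to your own schema:

    # Sketch only: set a table-level default TTL (~120 days) so writes that
    # carry no explicit TTL still expire; "ks.events" is a made-up name.
    cqlsh -e "ALTER TABLE ks.events WITH default_time_to_live = 10368000;"

    # A per-query TTL still overrides the table default, e.g.:
    cqlsh -e "INSERT INTO ks.events (id, payload) VALUES (1, 0x00) USING TTL 86400;"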

>   I’d suggest removing the -pr. Running incremental repair with TWCS is 
> better.

If incremental repair worked correctly I would agree with you, but 
unfortunately it doesn't.  Incremental repair has some subtle bugs that can 
result in massive overstreaming due to the fact that it will sometimes not 
correctly mark data as repaired.  My coworker Alex wrote up a good summary on 
the changes to incremental going into 4.0 to fix these issues, it's worth a 
read.  
http://thelastpickle.com/blog/2018/09/10/incremental-repair-improvements-in-cassandra-4.html.

Reaper (OSS, maintained by us @ TLP, see http://cassandra-reaper.io/) has the 
ability to schedule subrange repairs on one or more tables, or all tables 
except those in a blacklist.  Doing frequent subrange repairs will limit the 
amount of data that will get streamed in and should help keep things pretty 
consistent unless you're dropping a lot of mutations.  It's not perfect but 
should cause less headache than incremental repair will.
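
As an illustration of what a subrange repair looks like at the nodetool level 
(Reaper computes and schedules the actual subranges for you), a rough sketch with 
placeholder tokens and names:

    # Full (non-incremental) repair of one token slice on one table; the
    # start/end tokens and keyspace/table names here are placeholders.
    nodetool repair -full -st -9223372036854775808 -et -4611686018427387904 my_keyspace my_table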

Hope this helps.
Jon



On Thu, Oct 25, 2018 at 4:21 AM Meg Mara <mm...@digitalriver.com> wrote:
Hi Maik,

I have a similar Cassandra env, with similar table requirements. So these would 
be my suggestions:


•   Set a table-level TTL with TWCS, and stop setting it with 
inserts/updates (insert TTL overrides table-level TTL), so that your entire 
sstable expires at the same time, as opposed to each insert expiring at its own 
pace. That way, for tombstone cleanup, the system can just drop the entire 
sstable at once.

•   Since you’re on v3.0.9, the nodetool repair command runs incremental repair 
by default. And with incremental repair, the -pr option is not recommended. (ref. 
link below)

•   I’d suggest removing the -pr. Running incremental repair with TWCS is 
better. (See the example commands after this list.)

•   Here’s why I think so -> Full repair and full repair with the -pr option 
would include all the sstables in the repair process, which means the chance 
of your oldest and newest data mixing is very high.

•   Whereas, if you run incremental repair every 5 days for example, only 
the last five days of data would be included in that repair operation. So the 
maximum ‘damage’ it would do is mixing 5-day-old data into a new sstable.

•   Your table-level TTL would then tombstone this data at the 4-month + 5-day 
mark instead of at the 4-month mark, which shouldn’t be a big concern. At least 
in our case it isn’t!

•   I wouldn’t stop running repairs on our TWCS tables, because we are too 
concerned with data consistency.
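
To make the commands being compared concrete, a minimal sketch for a 3.0.x node 
with a hypothetical keyspace name (ks); not an exact prescription for this cluster:

    # On 3.0.x, plain "repair" runs incremental repair by default:
    nodetool repair ks

    # Full repair of every range this node replicates:
    nodetool repair -full ks

    # Full repair limited to the node's primary ranges (the -pr in question):
    nodetool repair -full -pr ks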


Please read the note here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRepair.html


Thank you,
Meg


From: Caesar, Maik [mailto:maik.cae...@dxc.com]
Sent: Wednesday, October 24, 2018 2:17 AM
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data

Hi Meg,
the ttl (4 

Re: Cassandra trace

2018-10-24 Thread Nate McCall
At this point, query tracing is easier to do from the driver side.
Docs for python and java:
http://datastax.github.io/python-driver/api/cassandra/query.html#
https://github.com/datastax/java-driver/tree/3.x/manual/logging#logging-query-latencies

This has been completely redone in 4.0. For details (which also
include some good discussion on the current limitations) see:
https://issues.apache.org/jira/browse/CASSANDRA-13983
https://issues.apache.org/jira/browse/CASSANDRA-12151
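For a quick server-side look without driver changes, two standard commands can 
help, though they won't show the bound values of prepared statements as clearly as 
driver-side tracing; treat this as a sketch:

    # Trace a small sample of all requests on this node (0.1% here), then
    # inspect the system_traces.sessions and system_traces.events tables:
    nodetool settraceprobability 0.001

    # Or trace interactively for queries you issue yourself:
    #   cqlsh> TRACING ON;
    #   cqlsh> SELECT * FROM system.local;
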
On Tue, Oct 23, 2018 at 5:10 PM Mun Dega  wrote:
>
> Hello,
>
> Does anyone know how I can see queries coming in as prepared statements 
> when tracing is turned on in Cassandra 3.x?
>
> If trace doesn't show, any ideas how I can see these type of queries?

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra running Multiple JVM's

2018-10-24 Thread Jonathan Haddad
Another issue you'll need to consider is how the JVM allocates resources
towards GC, especially if you're using G1 with a pause time goal.
Specifically, if you let it pick its own numbers for ParallelGCThreads &
ConcGCThreads, they'll be based on the total number of CPUs, not the number
you've restricted access to, and you'll end up with GC taking way longer
than it should.  Anything that auto-sizes based on available cores will
also be affected.  Set -XX:ParallelGCThreads=n and -XX:ConcGCThreads=n to
be no more than the number of cores you're going to allocate to each JVM
(by default CMS uses ConcGCThreads=8+(# proc-8)*(5/8)).
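
A rough sketch of what that might look like for a 32-core allocation, e.g. appended 
to each instance's cassandra-env.sh (the values are illustrative, not a 
recommendation):

    # Cap GC threading at the cores actually given to this instance
    # (assuming 32 cores per Cassandra JVM, as in the setup discussed below).
    JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=32"
    JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=8"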

This is also a problem when working with containers: the JVM sees all the
cores, not the resource limit.

Some other good reading on the topic:
https://xmlandmore.blogspot.com/2014/06/oracle-has-published-some-tuning-guides.html

Jon


On Thu, Oct 25, 2018 at 7:51 AM Jeff Jirsa  wrote:

> I don't have time to reply to your stackoverflow post, but what you
> proposed is a great idea for a server that size.
>
> You can use taskset or numactl to bind each JVM to the appropriate
> cores/zones.
> Setup a data directory on each SSD for the data
>
> There are two caveats you need to think about:
>
> 1) You'll need an IP per instance - so bind a second IP to each NIC and
> you should be good to go.
> 2) You need to make sure you dont have 2 replicas on the same host.
> Cassandra has "rack aware" snitches that can help with this, but you need
> to make sure that you treat each server as it's own rack to force replicas
> to land on different physical machines.
>
> If you do both of those things, you'll probably be happy with the result.
>
>
>
>
> On Wed, Oct 24, 2018 at 12:33 PM Bobbie Haynes 
> wrote:
>
>>  I have three Physical servers. I want to run cassandra on multiple JVM's
>> i.e each physical node contains 2 cassandra nodes so that i could able to
>> run 6 node cluster.Could anyone help me pointing setup guide.
>>
>> Each Physical node Configuration:- RAM -256 GB --- I want to assign each
>> JVM(64GB) cores --64 --- 32 cores each node. 6TB SSD .. seperate SSD disk
>> for each node 3TB each.
>>
>>
>>
>> https://stackoverflow.com/questions/52976646/cassandra-running-multiple-jvms
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Cassandra 4.0

2018-10-24 Thread Nate McCall
When it's ready :)

In all seriousness, the past two blog posts include some discussion on
our motivations and current goals with regard to 4.0:
http://cassandra.apache.org/blog/
On Wed, Oct 24, 2018 at 4:49 AM Abdul Patel  wrote:
>
> Hi all,
>
> Any idea when 4.0 is planned to release?

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra running Multiple JVM's

2018-10-24 Thread Jeff Jirsa
I don't have time to reply to your stackoverflow post, but what you
proposed is a great idea for a server that size.

You can use taskset or numactl to bind each JVM to the appropriate
cores/zones.
Set up a data directory on each SSD for the data.

There are two caveats you need to think about:

1) You'll need an IP per instance - so bind a second IP to each NIC and you
should be good to go.
2) You need to make sure you don't have 2 replicas on the same host.
Cassandra has "rack aware" snitches that can help with this, but you need
to make sure that you treat each server as its own rack to force replicas
to land on different physical machines.

If you do both of those things, you'll probably be happy with the result.
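
A very rough sketch of the per-instance binding on one box; the paths, NUMA node
numbers, and rack name are placeholders, and the rack trick assumes
GossipingPropertyFileSnitch:

    # Bind instance 1 to NUMA node 0 and instance 2 to NUMA node 1 (illustrative):
    numactl --cpunodebind=0 --membind=0 /opt/cassandra-instance1/bin/cassandra -f &
    numactl --cpunodebind=1 --membind=1 /opt/cassandra-instance2/bin/cassandra -f &

    # Each instance's cassandra.yaml should use its own IP for
    # listen_address / rpc_address (the second IP bound to the NIC).

    # In each instance's cassandra-rackdc.properties, give both instances on the
    # same physical server the same rack, so replicas land on different machines:
    #   dc=dc1
    #   rack=server-01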




On Wed, Oct 24, 2018 at 12:33 PM Bobbie Haynes 
wrote:

>  I have three Physical servers. I want to run cassandra on multiple JVM's
> i.e each physical node contains 2 cassandra nodes so that i could able to
> run 6 node cluster.Could anyone help me pointing setup guide.
>
> Each Physical node Configuration:- RAM -256 GB --- I want to assign each
> JVM(64GB) cores --64 --- 32 cores each node. 6TB SSD .. seperate SSD disk
> for each node 3TB each.
>
>
>
> https://stackoverflow.com/questions/52976646/cassandra-running-multiple-jvms
>


Cassandra running Multiple JVM's

2018-10-24 Thread Bobbie Haynes
 I have three physical servers. I want to run Cassandra with multiple JVMs,
i.e. each physical node contains 2 Cassandra nodes, so that I can run a 6-node
cluster. Could anyone help me by pointing me to a setup guide?

Each physical node's configuration: 256 GB RAM (I want to assign 64 GB to each
JVM), 64 cores (32 cores for each node), and 6 TB of SSD (a separate 3 TB SSD
disk for each node).


https://stackoverflow.com/questions/52976646/cassandra-running-multiple-jvms


Re: TWCS: Repair create new buckets with old data

2018-10-24 Thread Jonathan Haddad
Hey Meg, a couple thoughts.

>   Set a table level TTL with TWCS, and stop setting it with
inserts/updates (insert TTL overrides table level TTL). So, that your
entire sstable expires at the same time, as opposed to each insert expiring
at its own pace. So that for tombstone clean up, the system can just drop
the entire sstable at once.

Setting the TTL on a table or the query gives you the same exact result.
Setting it on the table is just there for convenience.  If it's not set at
the query level, it uses the default value.
See org.apache.cassandra.cql3.Attributes#getTimeToLive.  Generally speaking
I'd rather set it at the table level as well, but it's just to avoid weird
application bugs, not as an optimization.

>   I’d suggest removing the -pr. Running incremental repair with TWCS is
better.

If incremental repair worked correctly I would agree with you, but
unfortunately it doesn't.  Incremental repair has some subtle bugs that can
result in massive overstreaming due to the fact that it will sometimes not
correctly mark data as repaired.  My coworker Alex wrote up a good summary
on the changes to incremental going into 4.0 to fix these issues, it's
worth a read.
http://thelastpickle.com/blog/2018/09/10/incremental-repair-improvements-in-cassandra-4.html
.

Reaper (OSS, maintained by us @ TLP, see http://cassandra-reaper.io/) has
the ability to schedule subrange repairs on one or more tables, or all
tables except those in a blacklist.  Doing frequent subrange repairs will
limit the amount of data that will get streamed in and should help keep
things pretty consistent unless you're dropping a lot of mutations.  It's
not perfect but should cause less headache than incremental repair will.

Hope this helps.
Jon



On Thu, Oct 25, 2018 at 4:21 AM Meg Mara  wrote:

> Hi Maik,
>
>
>
> I have a similar Cassandra env, with similar table requirements. So these
> would be my suggestions:
>
>
>
> ·   Set a table level TTL with TWCS, and stop setting it with
> inserts/updates (insert TTL overrides table level TTL). So, that your
> entire sstable expires at the same time, as opposed to each insert expiring
> at its own pace. So that for tombstone clean up, the system can just drop
> the entire sstable at once.
>
> ·   Since you’re on v3.0.9, nodetool repair command runs incremental
> repair by default. And with inc repair, -pr option is not recommended.
> (ref. link below)
>
> ·   I’d suggest removing the -pr. Running incremental repair with
> TWCS is better.
>
> ·   Here’s why I think so -> Full repair and Full repair with –PR
> option would include all  the sstables in the repair process, which means
> the chance of your oldest and newest data mixing is very high.
>
> ·   Whereas, if you run incremental repair every 5 days for example,
> only the last five days of data would be included in that repair operation.
> So, the maximum ‘damage’ it would do is mixing 5 day old data in a new
> sstable.
>
> ·   Your table level TTL would then tombstone this data on 4 month +
> 5 day mark instead of on the 4 month mark. Which shouldn’t be a big
> concern. At least in our case it isn’t!
>
> ·   I wouldn’t stop running repairs on our TWCS tables, because we
> are too concerned with data consistency.
>
>
>
>
>
> Please read the note here:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRepair.html
>
>
>
>
>
> Thank you,
>
> Meg
>
>
>
>
>
> From: Caesar, Maik [mailto:maik.cae...@dxc.com]
> Sent: Wednesday, October 24, 2018 2:17 AM
> To: user@cassandra.apache.org
> Subject: RE: TWCS: Repair create new buckets with old data
>
>
>
> Hi Meg,
>
> the ttl (4 month) is set during insert via insert statement with the
> application.
>
> The repair is started each day on one of ten hosts with command : nodetool
> --host hostname_# repair –pr
>
>
>
> Regards
>
> Maik
>
>
>
> From: Meg Mara [mailto:mm...@digitalriver.com]
> Sent: Tuesday, 23 October 2018 17:05
> To: user@cassandra.apache.org
> Subject: RE: TWCS: Repair create new buckets with old data
>
>
>
> Hi Maik,
>
>
>
> I noticed in your table description that your default_time_to_live = 0,
> so where is your TTL property set? At what point do your sstables get
> tombstoned?
>
>
>
> Also, could you please mention what kind of repair you performed on this
> table? (Incremental, Full, Full repair with -pr option)
>
>
>
> Thank you,
>
> Meg
>
>
>
>
>
> From: Caesar, Maik [mailto:maik.cae...@dxc.com]
> Sent: Monday, October 22, 2018 10:17 AM
> To: user@cassandra.apache.org
> Subject: RE: TWCS: Repair create new buckets with old data
>
>
>
> Ok, thanks.
>
> My conclusion:
>
> 1.   I will set unchecked_tombstone_compaction to true to get old
> data with tombstones removed
>
> 2.   I will exclude TWCS tables from repair
>
>
>
> Regarding exclude table from repair, is there any easy way to do this?
> Nodetool repaire do not support excludes.
>
>
>
> Regards
>
> Maik
>
>
>
> From: wxn...@zjqunshuo.com

RE: TWCS: Repair create new buckets with old data

2018-10-24 Thread Meg Mara
Hi Maik,

I have a similar Cassandra env, with similar table requirements. So these would 
be my suggestions:


·   Set a table-level TTL with TWCS, and stop setting it with 
inserts/updates (insert TTL overrides table-level TTL), so that your entire 
sstable expires at the same time, as opposed to each insert expiring at its own 
pace. That way, for tombstone cleanup, the system can just drop the entire 
sstable at once.

·   Since you’re on v3.0.9, the nodetool repair command runs incremental repair 
by default. And with incremental repair, the -pr option is not recommended. 
(ref. link below)

·   I’d suggest removing the -pr. Running incremental repair with TWCS is 
better.

·   Here’s why I think so -> Full repair and full repair with the -pr option 
would include all the sstables in the repair process, which means the chance 
of your oldest and newest data mixing is very high.

·   Whereas, if you run incremental repair every 5 days for example, only 
the last five days of data would be included in that repair operation. So the 
maximum ‘damage’ it would do is mixing 5-day-old data into a new sstable.

·   Your table-level TTL would then tombstone this data at the 4-month + 5-day 
mark instead of at the 4-month mark, which shouldn’t be a big concern. At least 
in our case it isn’t!

·   I wouldn’t stop running repairs on our TWCS tables, because we are too 
concerned with data consistency.


Please read the note here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRepair.html


Thank you,
Meg


From: Caesar, Maik [mailto:maik.cae...@dxc.com]
Sent: Wednesday, October 24, 2018 2:17 AM
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data

Hi Meg,
the TTL (4 months) is set during insert, via the insert statement from the 
application.
The repair is started each day on one of ten hosts with the command: nodetool 
--host hostname_# repair –pr

Regards
Maik

From: Meg Mara [mailto:mm...@digitalriver.com]
Sent: Tuesday, 23 October 2018 17:05
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data

Hi Maik,

I noticed in your table description that your default_time_to_live = 0, so 
where is your TTL property set? At what point do your sstables get tombstoned?

Also, could you please mention what kind of repair you performed on this table? 
(Incremental, Full, Full repair with -pr option)

Thank you,
Meg


From: Caesar, Maik [mailto:maik.cae...@dxc.com]
Sent: Monday, October 22, 2018 10:17 AM
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data

Ok, thanks.
My conclusion:

1.   I will set unchecked_tombstone_compaction to true to get old data with 
tombstones removed (a sketch of the statement is shown after this list)

2.   I will exclude TWCS tables from repair
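
For point 1 above, a minimal sketch of the statement, using the stat.spa definition 
quoted later in this thread. Note that altering compaction replaces the whole 
options map, so the existing TWCS settings are repeated here; verify them against 
your own schema before running anything:

    cqlsh -e "ALTER TABLE stat.spa WITH compaction = {
      'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
      'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
      'max_threshold': '100', 'min_threshold': '4',
      'tombstone_compaction_interval': '86460',
      'unchecked_tombstone_compaction': 'true'};"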

Regarding excluding a table from repair, is there any easy way to do this? 
Nodetool repair does not support excludes.

Regards
Maik

From: wxn...@zjqunshuo.com 
[mailto:wxn...@zjqunshuo.com]
Sent: Friday, 19 October 2018 03:58
To: user <user@cassandra.apache.org>
Subject: RE: TWCS: Repair create new buckets with old data

> Isn't repair necessary to get data files removed from the filesystem?
The answer is no. IMO, Cassandra will remove sstable files automatically once it 
can be sure the sstable files are 100% tombstones and safe to delete. If you use 
TWCS and you have only insertions and no updates, you don't need to run repair 
manually.

-Simon

From: Caesar, Maik
Date: 2018-10-18 20:30
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data
Hello Simon,
Isn't repair necessary to get data files removed from the filesystem? My 
assumption was that only repaired data will be removed after the TTL is reached.

Regards
Maik

From: wxn...@zjqunshuo.com 
[mailto:wxn...@zjqunshuo.com]
Sent: Wednesday, 17 October 2018 09:02
To: user <user@cassandra.apache.org>
Subject: Re: TWCS: Repair create new buckets with old data

Hi Maik,
IMO, when using TWCS, you had better not run repair. With regard to repair, the 
behaviour of TWCS is the same as STCS when merging sstables, and the result is 
sstables spanning multiple time buckets, but maybe I'm wrong. In my use case, I 
don't run repair on tables using TWCS.

-Simon

From: Caesar, Maik
Date: 2018-10-16 17:46
To: user@cassandra.apache.org
Subject: TWCS: Repair create new buckets with old data
Hello,
we work with Cassandra version 3.0.9 and have a problem with a table using TWCS. 
The command “nodetool repair” always creates new files containing old data, 
which prevents the old data from being deleted.
The layout of the Table is following:
cqlsh> desc stat.spa

CREATE TABLE stat.spa (
region int,
id int,
date text,
hour int,
zippedjsonstring blob,

Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)

2018-10-24 Thread Naik, Ninad
Mick, sorry I think I missed your following questions:


- SPECULATIVE_RETRY='ALWAYS'

We saw this issue a couple of times a few years ago; that's why we introduced 
this change, although at that time we were on a Cassandra 1.x version.


- Topology changes:

The only change was that we were adding nodes for 2 months, but that process 
finished 1 month ago. Since then we've run multiple repairs on the table. Other 
than that, no topology changes were made.


From: Mick Semb Wever 
Sent: Wednesday, October 24, 2018 4:12:57 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)


Ninad,

> Here's a bit more information:
>
> -Few rows in this column family can grow quite wide (> 100K columns)
>
> -But we keep seeing this behavior most frequently with rows with just 1 or 
> two columns . The typical behavior is: Machine A adds a new row and a column. 
> 30-60 seconds later Machine B tries to read this row. It doesn't find the 
> row. So the application retries within 500ms. This time it finds the row.


You wrote a lot of useful info in your original post; sorry I missed
it in my first reply.
The only thing there that stands out, apart from the short reads that Jeff has
already pointed out, is the use of `speculative_retry='ALWAYS'`. Have there
been any topology changes in your cluster recently?

Next step would be to try and repeat it with tracing.

regards,
Mick


--
Mick Semb Wever
Australia

The Last Pickle
Apache Cassandra Consulting
http://www.thelastpickle.com

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org




Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)

2018-10-24 Thread Naik, Ninad
Thanks  Mick.


Yeah, we are planning to try tracing and to enable trace-level logs for a 
short duration. I will update this thread with the related details.


One other thing we verified is that these partial reads happen all across the 
cluster; they're not limited to certain Cassandra nodes.


Meanwhile, can following scenarios (however unlikely they might seem) ever 
happen?


- Can a bloom filter malfunction (e.g., for some reason the bloom filter says 
to skip an SSTable which has the key)?


- Can the co-ordinator ever send a read request to a wrong replica (Ex: for 
some reason it calculated a wrong decorated key)?


- Can a replica ever drop a read in case of an exception?


- Can read timeouts for other wide rows (we have a few rows with > 100K 
columns) cause the entire cluster to behave in a delayed manner?


Thanks again for all of your help !



From: Mick Semb Wever 
Sent: Wednesday, October 24, 2018 4:12:57 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)


Ninad,

> Here's a bit more information:
>
> -Few rows in this column family can grow quite wide (> 100K columns)
>
> -But we keep seeing this behavior most frequently with rows with just 1 or 
> two columns . The typical behavior is: Machine A adds a new row and a column. 
> 30-60 seconds later Machine B tries to read this row. It doesn't find the 
> row. So the application retries within 500ms. This time it finds the row.


You wrote a lot of useful info in your original post; sorry I missed
it in my first reply.
The only thing there that stands out, apart from the short reads that Jeff has
already pointed out, is the use of `speculative_retry='ALWAYS'`. Have there
been any topology changes in your cluster recently?

Next step would be to try and repeat it with tracing.

regards,
Mick


--
Mick Semb Wever
Australia

The Last Pickle
Apache Cassandra Consulting
http://www.thelastpickle.com

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org




Re: java.lang.AssertionError: Memory was freed during index rebuild

2018-10-24 Thread Jeff Jirsa
3.5 is probably not a version you should be using in production in 2018 - it 
was a feature release and has had no bug fixes for years. Going up to 3.11.3 
will likely fix many serious bugs you’re not noticing, and maybe the bug below 
that you are noticing.


-- 
Jeff Jirsa


> On Oct 24, 2018, at 5:03 AM, Mark Bryant  wrote:
> 
> Has anyone seen this error or might have an idea what is causing this?
> 
> Version: 3.5
> 
> java.lang.AssertionError: Memory was freed
>at 
> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:103)
>at org.apache.cassandra.io.util.Memory.getLong(Memory.java:260)
>at 
> org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:235)
>at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:170)
>at 
> org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:111)
>at 
> org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:88)
>at 
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66)
>at 
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
>at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:400)
>at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithVIntLength(ByteBufferUtil.java:338)
>at 
> org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:391)
>at 
> org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeOne(ClusteringPrefix.java:480)
>at 
> org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeAll(ClusteringPrefix.java:486)
>at 
> org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeNextClustering(ClusteringPrefix.java:502)
>at 
> org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.readNext(UnfilteredDeserializer.java:210)
>at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.computeNext(SSTableIterator.java:125)
>at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.hasNextInternal(SSTableIterator.java:149)
>at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:354)
>at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:229)
>at 
> org.apache.cassandra.db.columniterator.SSTableIterator.hasNext(SSTableIterator.java:32)
>at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:93)
>at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:25)
>at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>at 
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:419)
>at 
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:279)
>at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:112)
>at 
> org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:549)
>at org.apache.cassandra.db.Keyspace.indexPartition(Keyspace.java:570)
>at 
> org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:70)
>at 
> org.apache.cassandra.db.compaction.CompactionManager$12.run(CompactionManager.java:1472)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>at java.lang.Thread.run(Thread.java:748)
> 
> I ran `nodetool rebuild_index -- keyspace table index` on 6 nodes of a 6 node 
> cluster. After several hours of processing, it failed on 3 of the nodes with 
> the following additional call 

java.lang.AssertionError: Memory was freed during index rebuild

2018-10-24 Thread Mark Bryant
Has anyone seen this error or might have an idea what is causing this?

Version: 3.5

java.lang.AssertionError: Memory was freed
at 
org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:103)
at org.apache.cassandra.io.util.Memory.getLong(Memory.java:260)
at 
org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:235)
at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:170)
at 
org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:111)
at 
org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:88)
at 
org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66)
at 
org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
at 
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:400)
at 
org.apache.cassandra.utils.ByteBufferUtil.readWithVIntLength(ByteBufferUtil.java:338)
at 
org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:391)
at 
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeOne(ClusteringPrefix.java:480)
at 
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeAll(ClusteringPrefix.java:486)
at 
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeNextClustering(ClusteringPrefix.java:502)
at 
org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.readNext(UnfilteredDeserializer.java:210)
at 
org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.computeNext(SSTableIterator.java:125)
at 
org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.hasNextInternal(SSTableIterator.java:149)
at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:354)
at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:229)
at 
org.apache.cassandra.db.columniterator.SSTableIterator.hasNext(SSTableIterator.java:32)
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:93)
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:25)
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:419)
at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:279)
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:112)
at 
org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:549)
at org.apache.cassandra.db.Keyspace.indexPartition(Keyspace.java:570)
at 
org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:70)
at 
org.apache.cassandra.db.compaction.CompactionManager$12.run(CompactionManager.java:1472)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I ran `nodetool rebuild_index -- keyspace table index` on 6 nodes of a 6 node 
cluster. After several hours of processing, it failed on 3 of the nodes with 
the following additional call stacks:

java.lang.AssertionError: Memory was freed
at 
org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:103)
at org.apache.cassandra.io.util.Memory.getLong(Memory.java:260)
at 
org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:235)
at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:170)
at 

Insert from Select - CQL

2018-10-24 Thread Philip Ó Condúin
Hi All,

I have a problem that I'm trying to work out and can't find anything online
that may help me.

I have been asked to delete 4K records from a Column Family that has a
total of 1.8 million rows.  I have been given an excel spreadsheet with a
list of the 4K PRIMARY KEY numbers to be deleted.  Great, the delete will
be easy anyway.

But before I delete them, I want to take a backup of what I'm deleting, so
that if the customer comes along and says they got the wrong numbers, I can
quickly restore one or all of them.
I have been trying to figure out how I can generate inserts from a select,
but it looks like this is not possible.

I'm using CentOS and Cassandra 2.11

Does anyone have any ideas of what I can do to generate inserts based on
primary key numbers in an excel spreadsheet?

Kind Regards,
Phil
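
One possible approach, sketched with made-up names: export the matching rows as
JSON with cqlsh and restore them later with INSERT ... JSON. This assumes a
Cassandra version with JSON support (2.2+) and a table ks.cf keyed by an int
column id; adjust to the real schema:

    # keys.txt: one primary key value per line, exported from the spreadsheet.
    while read -r key; do
      cqlsh -e "SELECT JSON * FROM ks.cf WHERE id = $key;" >> deleted_rows_backup.txt
    done < keys.txt

    # cqlsh prints header and row-count lines as well; strip those, then each
    # remaining JSON line can be restored with:
    #   INSERT INTO ks.cf JSON '<json line from the backup>';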


Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)

2018-10-24 Thread Mick Semb Wever
Ninad,

> Here's a bit more information:
>
> -Few rows in this column family can grow quite wide (> 100K columns)
>
> -But we keep seeing this behavior most frequently with rows with just 1 or 
> two columns . The typical behavior is: Machine A adds a new row and a column. 
> 30-60 seconds later Machine B tries to read this row. It doesn't find the 
> row. So the application retries within 500ms. This time it finds the row.


You wrote a lot of useful info in your original post; sorry I missed
it in my first reply.
The only thing there that stands out, apart from the short reads that Jeff has
already pointed out, is the use of `speculative_retry='ALWAYS'`. Have there
been any topology changes in your cluster recently?

Next step would be to try and repeat it with tracing.

regards,
Mick


-- 
Mick Semb Wever
Australia

The Last Pickle
Apache Cassandra Consulting
http://www.thelastpickle.com

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: TWCS: Repair create new buckets with old data

2018-10-24 Thread Caesar, Maik
Hi Meg,
the TTL (4 months) is set during insert, via the insert statement from the 
application.
The repair is started each day on one of ten hosts with the command: nodetool 
--host hostname_# repair –pr

Regards
Maik

From: Meg Mara [mailto:mm...@digitalriver.com]
Sent: Tuesday, 23 October 2018 17:05
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data

Hi Maik,

I noticed in your table description that your default_time_to_live = 0, so 
where is your TTL property set? At what point do your sstables get tombstoned?

Also, could you please mention what kind of repair you performed on this table? 
(Incremental, Full, Full repair with -pr option)

Thank you,
Meg


From: Caesar, Maik [mailto:maik.cae...@dxc.com]
Sent: Monday, October 22, 2018 10:17 AM
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data

Ok, thanks.
My conclusion:

1.  I will set unchecked_tombstone_compaction to true to get old data with 
tombstones removed

2.  I will exclude TWCS tables from repair

Regarding excluding a table from repair, is there any easy way to do this? 
Nodetool repair does not support excludes.

Regards
Maik

From: wxn...@zjqunshuo.com 
[mailto:wxn...@zjqunshuo.com]
Sent: Friday, 19 October 2018 03:58
To: user <user@cassandra.apache.org>
Subject: RE: TWCS: Repair create new buckets with old data

> Isn't repair necessary to get data files removed from the filesystem?
The answer is no. IMO, Cassandra will remove sstable files automatically once it 
can be sure the sstable files are 100% tombstones and safe to delete. If you use 
TWCS and you have only insertions and no updates, you don't need to run repair 
manually.

-Simon

From: Caesar, Maik
Date: 2018-10-18 20:30
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data
Hello Simon,
Isn't repair necessary to get data files removed from the filesystem? My 
assumption was that only repaired data will be removed after the TTL is reached.

Regards
Maik

From: wxn...@zjqunshuo.com 
[mailto:wxn...@zjqunshuo.com]
Sent: Wednesday, 17 October 2018 09:02
To: user <user@cassandra.apache.org>
Subject: Re: TWCS: Repair create new buckets with old data

Hi Maik,
IMO, when using TWCS, you had better not run repair. With regard to repair, the 
behaviour of TWCS is the same as STCS when merging sstables, and the result is 
sstables spanning multiple time buckets, but maybe I'm wrong. In my use case, I 
don't run repair on tables using TWCS.

-Simon

From: Caesar, Maik
Date: 2018-10-16 17:46
To: user@cassandra.apache.org
Subject: TWCS: Repair create new buckets with old data
Hello,
we work with Cassandra version 3.0.9 and have a problem with a table using TWCS. 
The command “nodetool repair” always creates new files containing old data, 
which prevents the old data from being deleted.
The layout of the Table is following:
cqlsh> desc stat.spa

CREATE TABLE stat.spa (
region int,
id int,
date text,
hour int,
zippedjsonstring blob,
PRIMARY KEY ((region, id), date, hour)
) WITH CLUSTERING ORDER BY (date ASC, hour ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
'max_threshold': '100', 'min_threshold': '4', 'tombstone_compaction_interval': 
'86460'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Currently the oldest data are from 2017/04/15 and will not be removed:

$ for f in *Data.db; do meta=$(sudo sstablemetadata $f); echo -e "Max:" $(date 
--date=@$(echo "$meta" | grep Maximum\ time | cut -d" "  -f3| cut -c 1-10) 
'+%Y/%m/%d %H:%M') "Min:" $(date --date=@$(echo "$meta" | grep Minimum\ time | 
cut -d" "  -f3| cut -c 1-10) '+%Y/%m/%d %H:%M') $(echo "$meta" | grep 
droppable) $(echo "$meta" | grep "Repaired at") ' \t ' $(ls -lh $f | awk 
'{print $5" "$6" "$7" "$8" "$9}'); done | sort
Max: 2017/04/15 12:08 Min: 2017/03/31 13:09 Estimated droppable tombstones: 
1.7731048805815162 Repaired at: 1525685601400 42K May 7 19:56 
mc-22922-big-Data.db
Max: 2017/04/17 13:49 Min: 2017/03/31 13:09 Estimated droppable tombstones: 
1.9600207684319835 Repaired at: 1525685601400 116M May 7 13:31 
mc-15096-big-Data.db
Max: 2017/04/21 13:43 Min: