Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
We get lots of timeouts when we decommission a node.  About 80% of them are
write timeouts and about 20% are read timeouts.

We’ve tried adjusting stream throughput (and compaction throughput, for that
matter) and that doesn’t resolve the issue.

We’ve increased write_request_timeout_in_ms … and read timeout as well.

Is there anything else I should be looking at?

I can’t seem to find the documentation that explains what the heck is
happening.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
Looks like all of this is happening because we’re using CAS operations and
the driver is going to SERIAL consistency level.

SERIAL and LOCAL_SERIAL write failure scenarios¶

 http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html?scroll=concept_ds_umf_5xx_zj__failure-scenarios
 If one of three nodes is down, the Paxos commit fails under the following
 conditions:

- CQL query-configured consistency level of ALL
- Driver-configured serial consistency level of SERIAL
- Replication factor of 3


I don’t understand why this would fail. It seems completely broken in this
situation.

We were having write timeouts at a replication factor of 2, and a lot of
people from the list said “of course,” because 2 nodes with 1 node down
means there’s no quorum, and Paxos needs a quorum… not sure why I missed
that :-P

So we went with 3 replicas and a quorum, but this is new and I didn’t see
it documented.  We set the driver to QUORUM, but then I guess the driver
sees that this is a CAS operation and forces it back to SERIAL?  Doesn’t
this mean that all decommissions result in CAS failures?

This is Cassandra 2.0.9 btw.
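
For reference, here’s a rough sketch (DataStax Java driver 2.x; the keyspace,
table, and column names are made up for illustration) of the two separate
knobs involved: the regular consistency level, which we set to QUORUM, and
the serial consistency level, which governs the Paxos phase and defaults to
SERIAL unless set explicitly:

import com.datastax.driver.core.*;

public class SerialConsistencySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ks");

        // Hypothetical conditional (CAS) update.
        Statement cas = new SimpleStatement(
                "UPDATE t SET val = 'new' WHERE id = 'row1' IF val = 'old'");

        // Governs the commit/learn phase of the write:
        cas.setConsistencyLevel(ConsistencyLevel.QUORUM);
        // Governs the Paxos prepare/propose phase; defaults to SERIAL when unset.
        // LOCAL_SERIAL keeps the Paxos round inside the local data center:
        cas.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);

        session.execute(cas);

        session.close();
        cluster.close();
    }
}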


On Wed, Jul 1, 2015 at 2:22 PM, Kevin Burton bur...@spinn3r.com wrote:

 We get lots of write timeouts when we decommission a node.  About 80% of
 them are write timeout and just about 20% of them are read timeout.

 We’ve tried to adjust streamthroughput (and compaction throughput) for
 that matter and that doesn’t resolve the issue.

 We’ve increased write_request_timeout_in_ms … and read timeout as well.

 Is there anything else I should be looking at?

 I can’t seem to find the documentation that explains what the heck is
 happening.

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Robert Coli
On Wed, Jul 1, 2015 at 2:58 PM, Kevin Burton bur...@spinn3r.com wrote:

 Looks like all of this is happening because we’re using CAS operations and
 the driver is going to SERIAL consistency level.
 ...
 This is Cassandra 2.0.9 btw.


 https://issues.apache.org/jira/browse/CASSANDRA-8640

=Rob
(credit to iamaleksey on IRC for remembering the JIRA #)


Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
WOW.. nice. you rock!!

On Wed, Jul 1, 2015 at 3:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 1, 2015 at 2:58 PM, Kevin Burton bur...@spinn3r.com wrote:

 Looks like all of this is happening because we’re using CAS operations
 and the driver is going to SERIAL consistency level.
 ...
 This is Cassandra 2.0.9 btw.


  https://issues.apache.org/jira/browse/CASSANDRA-8640

 =Rob
 (credit to iamaleksey on IRC for remembering the JIRA #)




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Missing data

2015-06-15 Thread Jean Tremblay
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml; only the IP address for the seeds and the
throughput have been changed.

I loaded my data with simple insert statements. This took a bit more than one 
day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically 
had absolutely no problems.

Reading the log files on the client side, I see no warnings and no errors.
On the node side I see many WARNINGs, all related to tombstones, but there
are no ERRORS.

My problem is that I see *many missing records* in the DB, and I have never
observed this with previous versions.

1) Is this a known problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could 
find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - 
Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for 
key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, 
slices=[388:201001-388:201412:!]


4) Is it possible to have tombstones when we make no DELETE statements?

I’m lost…

Thanks for your help.


Re: Missing data

2015-06-15 Thread Carlos Rolo
Hi Jean,

That warning means you are reading too many tombstones per request.

If you do have tombstones without doing DELETEs, it is probably because you
TTL'ed the data when inserting (by mistake? Or did you set
default_time_to_live on your table?). You can use nodetool cfstats to see
how many tombstones per read slice you have. This is probably also the
cause of your missing data: data that was tombstoned is no longer available.
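
To illustrate the point (a rough sketch with the Java driver; the keyspace and
table names are made up): both a table-level default_time_to_live and a
per-insert TTL turn every cell you write into a tombstone when it expires,
even though no DELETE is ever issued.

import com.datastax.driver.core.*;

public class TtlTombstoneSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Table-level default: every row written here expires one hour after insert.
        session.execute("CREATE TABLE IF NOT EXISTS ks.events "
                + "(id text PRIMARY KEY, payload text) "
                + "WITH default_time_to_live = 3600");

        // A per-insert TTL has the same effect and is easy to set by accident
        // in client code.
        session.execute("INSERT INTO ks.events (id, payload) "
                + "VALUES ('e1', 'data') USING TTL 3600");

        session.close();
        cluster.close();
    }
}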



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay 
jean.tremb...@zen-innovations.com wrote:

  Hi,

  I have reloaded the data in my cluster of 3 nodes RF: 2.
 I have loaded about 2 billion rows in one table.
 I use LeveledCompactionStrategy on my table.
 I use version 2.1.6.
 I use the default cassandra.yaml, only the ip address for seeds and
 throughput has been change.

  I loaded my data with simple insert statements. This took a bit more
 than one day to load the data… and one more day to compact the data on all
 nodes.
 For me this is quite acceptable since I should not be doing this again.
 I have done this with previous versions like 2.1.3 and others and I
 basically had absolutely no problems.

  Now I read the log files on the client side, there I see no warning and
 no errors.
 On the nodes side there I see many WARNING, all related with tombstones,
 but there are no ERRORS.

  My problem is that I see some *many missing records* in the DB, and I
 have never observed this with previous versions.

  1) Is this a know problem?
 2) Do you have any idea how I could track down this problem?
 3) What is the meaning of this WARNING (the only type of ERROR | WARN  I
 could find)?

  WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866
 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in
 gttdata.alltrades_co_rep_pcode for key: D:07 (see
 tombstone_warn_threshold). 5000 columns were requested,
 slices=[388:201001-388:201412:!]


  4) Is it possible to have Tombstone when we make no DELETE statements?

  I’m lost…

  Thanks for your help.



Re: Missing data

2015-06-15 Thread Robert Wille
You can get tombstones from inserting null values. Not sure if that’s the 
problem, but it is another way of getting tombstones in your data.
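
A rough sketch of what I mean (Java driver; table and values are made up): an
explicitly-bound null is stored as a tombstone for that cell.

import com.datastax.driver.core.*;

public class NullTombstoneSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ks");

        PreparedStatement ps = session.prepare(
                "INSERT INTO t (id, a, b) VALUES (?, ?, ?)");

        // 'b' is bound to null, so Cassandra writes a tombstone for that cell
        // even though no DELETE was ever issued.
        session.execute(ps.bind("row1", "some value", null));

        session.close();
        cluster.close();
    }
}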

On Jun 15, 2015, at 10:50 AM, Jean Tremblay 
jean.tremb...@zen-innovations.com
wrote:

Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
Interestingly enough I don’t have the problem with the driver 2.1.4 with C* 
2.1.6.  !!

So as far as I can locate the problem, I would say that the version 2.1.6 of 
the driver is not working properly and is loosing some of my records.!!!

——

As far as my tombstones are concerned I don’t understand their origin.
I removed all location in my code where I delete items, and I do not use TTL 
anywhere ( I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstone beside TTL, and deleting items? Could the 
compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo 
r...@pythian.com wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per 
request.

If you do have Tombstones without doing DELETE it because you probably TTL'ed 
the data when inserting (By mistake? Or did you set default_time_to_live in 
your table?). You can use nodetool cfstats to see how many tombstones per read 
slice you have. This is, probably, also the cause of your missing data. Data 
was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay 
jean.tremb...@zen-innovations.com
wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput 
has been change.

I loaded my data with simple insert statements. This took a bit more than one 
day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically 
had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no 
errors.
On the nodes side there I see many WARNING, all related with tombstones, but 
there are no ERRORS.

My problem is that I see some *many missing records* in the DB, and I have 
never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could 
find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - 
Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for 
key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, 
slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--







Re: Missing data

2015-06-15 Thread Jean Tremblay
Thanks Robert, but I don’t insert NULL values, but thanks anyway.

On 15 Jun 2015, at 19:16 , Robert Wille 
rwi...@fold3.com wrote:

You can get tombstones from inserting null values. Not sure if that’s the 
problem, but it is another way of getting tombstones in your data.

On Jun 15, 2015, at 10:50 AM, Jean Tremblay 
jean.tremb...@zen-innovations.com
wrote:

Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
Interestingly enough I don’t have the problem with the driver 2.1.4 with C* 
2.1.6.  !!

So as far as I can locate the problem, I would say that the version 2.1.6 of 
the driver is not working properly and is loosing some of my records.!!!

——

As far as my tombstones are concerned I don’t understand their origin.
I removed all location in my code where I delete items, and I do not use TTL 
anywhere ( I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstone beside TTL, and deleting items? Could the 
compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo 
r...@pythian.com wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per 
request.

If you do have Tombstones without doing DELETE it because you probably TTL'ed 
the data when inserting (By mistake? Or did you set default_time_to_live in 
your table?). You can use nodetool cfstats to see how many tombstones per read 
slice you have. This is, probably, also the cause of your missing data. Data 
was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay 
jean.tremb...@zen-innovations.com
wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput 
has been change.

I loaded my data with simple insert statements. This took a bit more than one 
day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically 
had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no 
errors.
On the nodes side there I see many WARNING, all related with tombstones, but 
there are no ERRORS.

My problem is that I see some *many missing records* in the DB, and I have 
never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could 
find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - 
Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for 
key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, 
slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--








Re: Missing data

2015-06-15 Thread Jean Tremblay
Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
Interestingly enough, I don’t have the problem with the driver 2.1.4 against
C* 2.1.6!

So as far as I can locate the problem, I would say that version 2.1.6 of the
driver is not working properly and is losing some of my records!

——

As far as my tombstones are concerned, I don’t understand their origin.
I removed all locations in my code where I delete items, and I do not use TTL
anywhere (I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstones besides TTL and deleting items? Could
the compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo 
r...@pythian.com wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per 
request.

If you do have Tombstones without doing DELETE it because you probably TTL'ed 
the data when inserting (By mistake? Or did you set default_time_to_live in 
your table?). You can use nodetool cfstats to see how many tombstones per read 
slice you have. This is, probably, also the cause of your missing data. Data 
was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay 
jean.tremb...@zen-innovations.com
wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput 
has been change.

I loaded my data with simple insert statements. This took a bit more than one 
day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically 
had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no 
errors.
On the nodes side there I see many WARNING, all related with tombstones, but 
there are no ERRORS.

My problem is that I see some *many missing records* in the DB, and I have 
never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could 
find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - 
Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for 
key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, 
slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--






Re: Missing data

2015-06-15 Thread Bryan Holladay
There's your problem: you're using the DataStax Java driver :) I just ran
into this issue in the last week and it was incredibly frustrating. If you
are doing a simple loop on a "select *" query, then the DataStax Java
driver will only process 2^31 rows (i.e. the Java Integer max,
2,147,483,647) before it stops without any error or output in the logs. The
fact that you said you only had about 2 billion rows but you are seeing
missing data is a red flag.

I found the only way around this is to do your "select *" in chunks based
on the token range (see this gist for an example:
https://gist.github.com/baholladay/21eb4c61ea8905302195 )
Just loop for every 100 million rows and make a new query "select * from
TABLE where token(key) > lastToken".
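
Roughly what that loop looks like with the Java driver (a sketch only; it
assumes the Murmur3 partitioner, whose tokens are longs, and uses placeholder
table/column names):

import com.datastax.driver.core.*;

public class TokenRangeScan {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ks");

        long lastToken = Long.MIN_VALUE;    // lowest possible Murmur3 token
        long total = 0;
        while (true) {
            // Read the next chunk, starting just past where the previous one ended.
            ResultSet rs = session.execute(
                    "SELECT key, token(key) FROM my_table WHERE token(key) > ? LIMIT 100000",
                    lastToken);
            if (rs.isExhausted()) {
                break;                      // nothing left anywhere in the ring
            }
            for (Row row : rs) {
                lastToken = row.getLong(1); // remember the last token we saw
                total++;                    // ... process the row here ...
            }
        }
        System.out.println("rows read: " + total);

        session.close();
        cluster.close();
    }
}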

Thanks,
Bryan




On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay 
jean.tremb...@zen-innovations.com wrote:

  Dear all,

  I identified a bit more closely the root cause of my missing data.

  The problem is occurring when I use

 <dependency>
     <groupId>com.datastax.cassandra</groupId>
     <artifactId>cassandra-driver-core</artifactId>
     <version>2.1.6</version>
 </dependency>

  on my client against Cassandra 2.1.6.

  I did not have the problem when I was using the driver 2.1.4 with C*
 2.1.4.
 Interestingly enough I don’t have the problem with the driver 2.1.4 with
 C* 2.1.6.  !!

  So as far as I can locate the problem, I would say that the version
 2.1.6 of the driver is not working properly and is loosing some of my
 records.!!!

  ——

  As far as my tombstones are concerned I don’t understand their origin.
 I removed all location in my code where I delete items, and I do not use
 TTL anywhere ( I don’t need this feature in my project).

  And yet I have many tombstones building up.

  Is there another origin for tombstone beside TTL, and deleting items?
 Could the compaction of LeveledCompactionStrategy be the origin of them?

  @Carlos thanks for your guidance.

  Kind regards

  Jean



  On 15 Jun 2015, at 11:17 , Carlos Rolo r...@pythian.com wrote:

  Hi Jean,

  The problem of that Warning is that you are reading too many tombstones
 per request.

  If you do have Tombstones without doing DELETE it because you probably
 TTL'ed the data when inserting (By mistake? Or did you set
 default_time_to_live in your table?). You can use nodetool cfstats to see
 how many tombstones per read slice you have. This is, probably, also the
 cause of your missing data. Data was tombstoned, so it is not available.



Regards,

  Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

  rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
 http://linkedin.com/in/carlosjuzarterolo*
 Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
 www.pythian.com

 On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay 
 jean.tremb...@zen-innovations.com wrote:

 Hi,

  I have reloaded the data in my cluster of 3 nodes RF: 2.
 I have loaded about 2 billion rows in one table.
 I use LeveledCompactionStrategy on my table.
 I use version 2.1.6.
 I use the default cassandra.yaml, only the ip address for seeds and
 throughput has been change.

  I loaded my data with simple insert statements. This took a bit more
 than one day to load the data… and one more day to compact the data on all
 nodes.
 For me this is quite acceptable since I should not be doing this again.
 I have done this with previous versions like 2.1.3 and others and I
 basically had absolutely no problems.

  Now I read the log files on the client side, there I see no warning and
 no errors.
 On the nodes side there I see many WARNING, all related with tombstones,
 but there are no ERRORS.

  My problem is that I see some *many missing records* in the DB, and I
 have never observed this with previous versions.

  1) Is this a know problem?
 2) Do you have any idea how I could track down this problem?
 3) What is the meaning of this WARNING (the only type of ERROR | WARN  I
 could find)?

  WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866
 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in
 gttdata.alltrades_co_rep_pcode for key: D:07 (see
 tombstone_warn_threshold). 5000 columns were requested,
 slices=[388:201001-388:201412:!]


  4) Is it possible to have Tombstone when we make no DELETE statements?

  I’m lost…

  Thanks for your help.



 --








Re: Missing data

2015-06-15 Thread Jean Tremblay
Thanks Bryan.
I believe I have a different problem with the Datastax 2.1.6 driver.
My problem is not that I make huge selects.
My problem seems to occur on some inserts. I insert MANY rows, and with
version 2.1.6 of the driver I seem to be losing some records.

But thanks anyway; I will remember your mail when I bump into the select
problem.

Cheers

Jean


On 15 Jun 2015, at 19:13 , Bryan Holladay 
holla...@longsight.com wrote:

Theres your problem, you're using the DataStax java driver :) I just ran into 
this issue in the last week and it was incredibly frustrating. If you are doing 
a simple loop on a select *  query, then the DataStax java driver will only 
process 2^31 rows (e.g. the Java Integer Max (2,147,483,647)) before it stops 
w/o any error or output in the logs. The fact that you said you only had about 
2 billion rows but you are seeing missing data is a red flag.

I found the only way around this is to do your select * in chunks based on 
the token range (see this gist for an example: 
https://gist.github.com/baholladay/21eb4c61ea8905302195 )
Just loop for every 100million rows and make a new query select * from TABLE 
where token(key) > lastToken

Thanks,
Bryan




On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay 
jean.tremb...@zen-innovations.com
wrote:
Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
Interestingly enough I don’t have the problem with the driver 2.1.4 with C* 
2.1.6.  !!

So as far as I can locate the problem, I would say that the version 2.1.6 of 
the driver is not working properly and is loosing some of my records.!!!

——

As far as my tombstones are concerned I don’t understand their origin.
I removed all location in my code where I delete items, and I do not use TTL 
anywhere ( I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstone beside TTL, and deleting items? Could the 
compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo 
r...@pythian.com wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per 
request.

If you do have Tombstones without doing DELETE it because you probably TTL'ed 
the data when inserting (By mistake? Or did you set default_time_to_live in 
your table?). You can use nodetool cfstats to see how many tombstones per read 
slice you have. This is, probably, also the cause of your missing data. Data 
was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay 
jean.tremb...@zen-innovations.com
wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput 
has been change.

I loaded my data with simple insert statements. This took a bit more than one 
day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically 
had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no 
errors.
On the nodes side there I see many WARNING, all related with tombstones, but 
there are no ERRORS.

My problem is that I see some *many missing records* in the DB, and I have 
never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could 
find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - 
Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for 
key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, 
slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--








Re: Missing data in range query

2014-10-08 Thread Robert Coli
On Tue, Oct 7, 2014 at 3:11 PM, Owen Kim ohech...@gmail.com wrote:

 Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement
 of that. Though, I didn't intend for the question to be about
 supercolumns.


(Yep, understand tho that if you hadn't been told that advice before, it
would grate a lot less. I will try to remember that Owen Kim has received
this piece of info, and will do my best to not repeat it to you... :D)


 It is possible I'm hitting an odd edge case though I'm having trouble
 reproducing the issue in a controlled environment since there seems to be a
 timing element to it, or at least it's not consistently happening. I
 haven't been able to reproduce it on a single node test cluster. I'm moving
 on to test a larger one now.


Right, my hypothesis is that there is something within the supercolumn
write path which differs from the non-supercolumn write path. In theory
this should be less possible since the 1.2 era supercolumn rewrite.

To be clear, are you reading back via PK? No secondary indexes involved,
right? The only bells your symptoms are ringing are secondary index bugs...

=Rob


Re: Missing data in range query

2014-10-08 Thread Owen Kim
Nope. No secondary index. Just a slice query on the PK.



On Tuesday, October 7, 2014, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Oct 7, 2014 at 3:11 PM, Owen Kim ohech...@gmail.com
 javascript:_e(%7B%7D,'cvml','ohech...@gmail.com'); wrote:

 Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement
 of that. Though, I didn't intend for the question to be about
 supercolumns.


 (Yep, understand tho that if you hadn't been told that advice before, it
 would grate a lot less. I will try to remember that Owen Kim has received
 this piece of info, and will do my best to not repeat it to you... :D)


 It is possible I'm hitting an odd edge case though I'm having trouble
 reproducing the issue in a controlled environment since there seems to be a
 timing element to it, or at least it's not consistently happening. I
 haven't been able to reproduce it on a single node test cluster. I'm moving
 on to test a larger one now.


 Right, my hypothesis is that there is something within the supercolumn
 write path which differs from the non-supercolumn write path. In theory
 this should be less possible since the 1.2 era supercolumn rewrite.

 To be clear, are you reading back via PK? No secondary indexes involved,
 right? The only bells your symptoms are ringing are secondary index bugs...

 =Rob




Missing data in range query

2014-10-07 Thread Owen Kim
Hello,

I'm running Cassandra 1.2.16 with supercolumns and Hector.

create column family CFName
  with column_type = 'Super'
  and comparator = 'UTF8Type'
  and subcomparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.2
  and dclocal_read_repair_chance = 0.0
  and populate_io_cache_on_flush = false
  and gc_grace = 43200
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'KEYS_ONLY';


I'm adding a time-series supercolumn and then doing a slice query over it.
I'm really just trying to see if any data is in the time slice, so I'm doing
a slice query with limit 1. The insert isn't at the data bounds.

However, sometimes, nothing shows up in the time slice, even 8 seconds
after the insert. I'm doing quorum reads and writes so I'd expect
consistent results but the slice query comes up empty, even if there have
been multiple inserts.

I'm not sure what's happening here and trying to narrow down suspects. Can
key caching produce stale results? Do slice queries have different
consistency guarantees?


Re: Missing data in range query

2014-10-07 Thread Robert Coli
On Tue, Oct 7, 2014 at 1:38 PM, Owen Kim ohech...@gmail.com wrote:

 I'm running Cassandra 1.2.16 with supercolumns and Hector.


Slightly non-responsive response :

In general supercolumn use is not recommended. It makes it more difficult
to get support when one uses a feature no one else uses.

=Rob


Re: Missing data in range query

2014-10-07 Thread Owen Kim
I'm aware. I've had the system up since pre-composite columns and haven't
had the cycles to do a major data and schema migration.

And that's not slightly non-responsive.

On Tue, Oct 7, 2014 at 1:49 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Oct 7, 2014 at 1:38 PM, Owen Kim ohech...@gmail.com wrote:

 I'm running Cassandra 1.2.16 with supercolumns and Hector.


 Slightly non-responsive response :

 In general supercolumn use is not recommended. It makes it more difficult
 to get support when one uses a feature no one else uses.

 =Rob




Re: Missing data in range query

2014-10-07 Thread Robert Coli
On Tue, Oct 7, 2014 at 2:03 PM, Owen Kim ohech...@gmail.com wrote:

 I'm aware. I've had the system up since pre-composite columns and haven't
 had the cycles to do a major data and schema migration.

 And that's not slightly non-responsive.


"There may be unknown bugs in the code you're using, especially because no
one else uses it" is in fact slightly responsive. While I'm sure it does
grate to be told that one should not be using a feature one cannot choose
to not-use, I consider "don't use them" responsive to every question about
supercolumns since 2010, unless the asker pre-emptively states they know
this fact. I assure you that my meta-response is infinitely more responsive
than the total non-response you were otherwise likely to receive...

... aaanyway ...

Probably you are just hitting an edge case in the 1.2 era rewrite of
supercolumns which no one else has ever encountered because no one uses
them. For the record, I do not believe either of your hypotheses (key cache
or slice queries having different guarantees) are likely to be implicated.
One of them is trivial to test : create a test CF with the key cache
disabled and try to repro there.

Instead of attempting to debug by yourself, or on the user list (which will
be full of people not-using supercolumns), I suggest filing a JIRA with
reproduction steps, and then mentioning the URL on this thread for future
googlers.

=Rob


Re: Missing data in range query

2014-10-07 Thread Owen Kim
Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement of
that. Though, I didn't intend for the question to be about supercolumns.

It is possible I'm hitting an odd edge case though I'm having trouble
reproducing the issue in a controlled environment since there seems to be a
timing element to it, or at least it's not consistently happening. I
haven't been able to reproduce it on a single node test cluster. I'm moving
on to test a larger one now.

On Tue, Oct 7, 2014 at 2:39 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Oct 7, 2014 at 2:03 PM, Owen Kim ohech...@gmail.com wrote:

 I'm aware. I've had the system up since pre-composite columns and haven't
 had the cycles to do a major data and schema migration.

 And that's not slightly non-responsive.


 There may be unknown bugs in the code you're using, especially because no
 one else uses it is in fact slightly responsive. While I'm sure it does
 grate to be told that one should not be using a feature one cannot choose
 to not-use, I consider don't use them responsive to every question about
 supercolumns since 2010, unless the asker pre-emptively states they know
 this fact. I assure you that my meta-response is infinitely more responsive
 than the total non-response you were otherwise likely to receive...

 ... aaanyway ...

 Probably you are just hitting an edge case in the 1.2 era rewrite of
 supercolumns which no one else has ever encountered because no one uses
 them. For the record, I do not believe either of your hypotheses (key cache
 or slice queries having different guarantees) are likely to be implicated.
 One of them is trivial to test : create a test CF with the key cache
 disabled and try to repro there.

 Instead of attempting to debug by yourself, or on the user list (which
 will be full of people not-using supercolumns) I suggest filing an JIRA
 with reproduction steps, and then mentioning the URL on this thread for
 future googlers.

 =Rob





Re: restoring from snapshot - missing data

2012-05-21 Thread Tyler Hobbs
On Mon, May 21, 2012 at 12:01 AM, Tamar Fraenkel ta...@tok-media.comwrote:

 If I am putting the snapshots on a clean ring, I need to first create the
 data model?


Yes.

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: restoring from snapshot - missing data

2012-05-21 Thread Tamar Fraenkel
Thanks.
After creating the data model and matching the correct snapshot with the
correct new node (same token) all worked fine!

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, May 21, 2012 at 9:06 PM, Tyler Hobbs ty...@datastax.com wrote:

 On Mon, May 21, 2012 at 12:01 AM, Tamar Fraenkel ta...@tok-media.comwrote:

 If I am putting the snapshots on a clean ring, I need to first create the
 data model?


 Yes.

 --
 Tyler Hobbs
 DataStax http://datastax.com/



restoring from snapshot - missing data

2012-05-20 Thread Tamar Fraenkel
Hi!
I am testing backup and restore.
I created the snapshot using parallel ssh on all 3 nodes.
I created a new 3-node ring and used the snapshots to test recovery.
The snapshot from every original node went to one of the new nodes.
When I compare the content of the data dir it seems that all files from the
original cluster exist on the backup cluster.
*But* when I do some cqlsh queries it seems as though about 1/3 of my data
is missing.

Any idea what could be the issue?
I thought that snapshot flushes all in-memory writes to disk, so it can't
be that some data was not on the original snapshot.

Help is much appreciated,
Thanks

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: restoring from snapshot - missing data

2012-05-20 Thread Tyler Hobbs
Did you use the same tokens for the nodes in both clusters?

On Sun, May 20, 2012 at 1:25 PM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 I am testing backup and restore.
 I created the restore using parallel ssh on all 3 nodes.
 I created a new 3 ring setup and used the snapshot to test recover.
 Snapshot from every original node went to one of the new nodes.
 When I compare the content of the data dir it seems that all files from
 the original cluster exist on the backup cluster.
 *But* when I do some cqlsh queries it seems as though about 1/3 of my
 data is missing.

 Any idea what could be the issue?
 I thought that snapshot flushes all in-memory writes to disk, so it can't
 be that some data was not on the original snapshot.

 Help is much appreciated,
 Thanks

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956






-- 
Tyler Hobbs
DataStax http://datastax.com/

Re: restoring from snapshot - missing data

2012-05-20 Thread Tamar Fraenkel
Thanks. I just figured out yesterday that I had switched the snapshots,
mixing up the tokens.
Will try again today.
And another question: if I am putting the snapshots on a clean ring, do I
need to first create the data model?
Thanks
*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, May 21, 2012 at 1:44 AM, Tyler Hobbs ty...@datastax.com wrote:

 Did you use the same tokens for the nodes in both clusters?


 On Sun, May 20, 2012 at 1:25 PM, Tamar Fraenkel ta...@tok-media.comwrote:

 Hi!
 I am testing backup and restore.
 I created the restore using parallel ssh on all 3 nodes.
 I created a new 3 ring setup and used the snapshot to test recover.
 Snapshot from every original node went to one of the new nodes.
 When I compare the content of the data dir it seems that all files from
 the original cluster exist on the backup cluster.
 *But* when I do some cqlsh queries it seems as though about 1/3 of my
 data is missing.

 Any idea what could be the issue?
 I thought that snapshot flushes all in-memory writes to disk, so it
 can't be that some data was not on the original snapshot.

 Help is much appreciated,
 Thanks

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956






 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: commitlog replay missing data

2011-07-13 Thread Aaron Morton
Have you verified that data you expect to see is not in the server after 
shutdown?

WRT the difference between the Memtable data size and SSTable live size,
don't believe everything you read :)

Memtable live size is increased by the serialised byte size of every column 
inserted, and is never decremented. Deletes and overwrites will inflate this 
value. What was your workload like?

As of 0.8 we now have global memory management for cf's that tracks actual JVM 
bytes used by a CF. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2011, at 3:28 PM, Jeffrey Wang jw...@palantir.com wrote:

 Hey all,
 
  
 
 Recently upgraded to 0.8.1 and noticed what seems to be missing data after a 
 commitlog replay on a single-node cluster. I start the node, insert a bunch 
 of stuff (~600MB), stop it, and restart it. There are log messages pertaining 
 to the commitlog replay and no errors, but some of the data is missing. If I 
 flush before stopping the node, everything is fine, and running cfstats in 
 the two cases shows different amounts of data in the SSTables. Moreover, the 
 amount of data that is missing is nondeterministic. Has anyone run into this? 
 Thanks.
 
  
 
 Here is the output of a side-by-side diff between cfstats outputs for a 
 single CF before restarting (left) and after (right). Somehow a 37MB memtable 
 became a 2.9MB SSTable (note the difference in write count as well)?
 
  
 
 Column Family: Blocks                         Column Family: Blocks
 SSTable count: 0                            | SSTable count: 1
 Space used (live): 0                        | Space used (live): 2907637
 Space used (total): 0                       | Space used (total): 2907637
 Memtable Columns Count: 8198                | Memtable Columns Count: 0
 Memtable Data Size: 37550510                | Memtable Data Size: 0
 Memtable Switch Count: 0                    | Memtable Switch Count: 1
 Read Count: 0                                 Read Count: 0
 Read Latency: NaN ms.                         Read Latency: NaN ms.
 Write Count: 8198                           | Write Count: 1526
 Write Latency: 0.018 ms.                    | Write Latency: 0.011 ms.
 Pending Tasks: 0                              Pending Tasks: 0
 Key cache capacity: 20                        Key cache capacity: 20
 Key cache size: 0                             Key cache size: 0
 Key cache hit rate: NaN                       Key cache hit rate: NaN
 Row cache: disabled                           Row cache: disabled
 Compacted row minimum size: 0               | Compacted row minimum size: 1110
 Compacted row maximum size: 0               | Compacted row maximum size: 2299
 Compacted row mean size: 0                  | Compacted row mean size: 1960
 
  
 
 Note that I patched https://issues.apache.org/jira/browse/CASSANDRA-2317 in 
 my version, but there are no deletions involved so I don’t think it’s 
 relevant unless I messed something up while patching.
 
  
 
 -Jeffrey
 


Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller
 Recently upgraded to 0.8.1 and noticed what seems to be missing data after a
 commitlog replay on a single-node cluster. I start the node, insert a bunch
 of stuff (~600MB), stop it, and restart it. There are log messages

If you stop via a kill, make sure you use batch commitlog sync mode
instead of periodic if you want guarantees on individual writes.

(I don't believe you'd expect a significant disk space discrepancy
though, since in practice the delay until write() should be small. But
don't quote me on this, because I'd have to check the code to make sure
that commit log replay isn't dependent on some marker that isn't
written until commit log sync.)

-- 
/ Peter Schuller (@scode on twitter)


Re: commitlog replay missing data

2011-07-13 Thread mcasandra

Peter Schuller wrote:
 
 Recently upgraded to 0.8.1 and noticed what seems to be missing data
 after a
 commitlog replay on a single-node cluster. I start the node, insert a
 bunch
 of stuff (~600MB), stop it, and restart it. There are log messages
 
 If you stop by a kill, make sure you use batched commitlog synch mode
 instead of periodic if you want guarantees on individual writes.
 

What are the other ways to stop Cassandra?

What's the difference between batch vs periodic?

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/commitlog-replay-missing-data-tp6573659p6580886.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller
 # wait for a bit until no one is sending it writes anymore

More accurately, until all other nodes have realized it's down
(nodetool ring on each respective host).

-- 
/ Peter Schuller (@scode on twitter)


commitlog replay missing data

2011-07-11 Thread Jeffrey Wang
Hey all,

 

Recently upgraded to 0.8.1 and noticed what seems to be missing data after a
commitlog replay on a single-node cluster. I start the node, insert a bunch
of stuff (~600MB), stop it, and restart it. There are log messages
pertaining to the commitlog replay and no errors, but some of the data is
missing. If I flush before stopping the node, everything is fine, and
running cfstats in the two cases shows different amounts of data in the
SSTables. Moreover, the amount of data that is missing is nondeterministic.
Has anyone run into this? Thanks.

 

Here is the output of a side-by-side diff between cfstats outputs for a
single CF before restarting (left) and after (right). Somehow a 37MB
memtable became a 2.9MB SSTable (note the difference in write count as
well)?

 

Column Family: Blocks                         Column Family: Blocks
SSTable count: 0                            | SSTable count: 1
Space used (live): 0                        | Space used (live): 2907637
Space used (total): 0                       | Space used (total): 2907637
Memtable Columns Count: 8198                | Memtable Columns Count: 0
Memtable Data Size: 37550510                | Memtable Data Size: 0
Memtable Switch Count: 0                    | Memtable Switch Count: 1
Read Count: 0                                 Read Count: 0
Read Latency: NaN ms.                         Read Latency: NaN ms.
Write Count: 8198                           | Write Count: 1526
Write Latency: 0.018 ms.                    | Write Latency: 0.011 ms.
Pending Tasks: 0                              Pending Tasks: 0
Key cache capacity: 20                        Key cache capacity: 20
Key cache size: 0                             Key cache size: 0
Key cache hit rate: NaN                       Key cache hit rate: NaN
Row cache: disabled                           Row cache: disabled
Compacted row minimum size: 0               | Compacted row minimum size: 1110
Compacted row maximum size: 0               | Compacted row maximum size: 2299
Compacted row mean size: 0                  | Compacted row mean size: 1960

 

Note that I patched https://issues.apache.org/jira/browse/CASSANDRA-2317 in
my version, but there are no deletions involved so I don't think it's
relevant unless I messed something up while patching.

 

-Jeffrey


