Thanks for responding. The problems that we are having are in Cassandra 3.0.3 
and 3.0.4. We upgraded to see if the problem went away.

The values have been out of sync this way for some time, and no query that 
filters directly on the timestamp returns a row with the 1969 timestamp. The 
1969-12-31 19:00 value shows up inconsistently in range queries but seems to 
be tied to the 192.168.10.9 node.
We tried using the writetime function on the time column, but that is not 
allowed because time is part of the primary key. Instead we used it on an 
additional column that is written at the same time (classId); the output is 
below, followed by a sketch of the query we used:
 subscriberId  sensorUnitId  sensorId  time              writetime(classId)
       JASKAN             0          0  2015-05-24 02:09        1458178461272000
       JASKAN             0          0  1969-12-31 19:00        1458178801214000
       JASKAN             0          0  2016-01-21 02:10        1458178801221000
       JASKAN             0          0  2016-01-21 02:10        1458178801226000
       JASKAN             0          0  2016-01-21 02:10        1458178801231000
       JASKAN             0          0  2016-01-21 02:11        1458178801235000
       JASKAN             0          0  2016-01-21 02:22        1458178801241000
       JASKAN             0          0  2016-01-21 02:22        1458178801247000
       JASKAN             0          0  2016-01-21 02:22        1458178801252000
       JASKAN             0          0  2016-01-21 02:22        1458178801258000
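
For reference, the query above was along these lines (a sketch, reconstructed 
from the original query further down with writetime() added on the regular 
column classId):

-- sketch: same predicates as the original query, plus writetime() on classId
select "subscriberId","sensorUnitId","sensorId","time",writetime("classId") 
 from "sensorReadingIndex" 
 where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND "sensorId"=0 
 ORDER BY "time" LIMIT 10;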
Based on the other column values in that row, we confirmed that the row showing 
up with the 1969-12-31 19:00 timestamp actually corresponds to the following 
row (note that its writetime matches the one shown for the 1969 row above):
 subscriberId  sensorUnitId  sensorId  time              writetime(classId)
       JASKAN             0          0  2016-01-21 02:09        1458178801214000

Based on tracing, the 2016-01-21 02:09 row is always present on all nodes when 
queried directly.
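For completeness, the direct check was done from cqlsh with tracing enabled, 
roughly like this (a sketch; the timestamp literal is the one from the row 
above):

TRACING ON;
-- sketch: direct query on the time clustering column using the 02:09 timestamp
select "subscriberId","sensorUnitId","sensorId","time" from "sensorReadingIndex" 
 where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND "sensorId"=0 
 AND "time"='2016-01-21 02:09:00-0500';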
To me it looks as though the timestamp column value is sometimes not being set 
somewhere in the pipeline, so the result comes back as the epoch 0 value 
(1969-12-31 19:00 at our -0500 offset is 1970-01-01 00:00 UTC).
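Since CQL also accepts a timestamp as raw epoch milliseconds, an equivalent 
form of the direct timestamp query from the original message would be the 
following sketch:

-- sketch: 1969-12-31 19:00-0500 is 1970-01-01 00:00 UTC, i.e. epoch 0 ms
select "subscriberId","sensorUnitId","sensorId","time" from "sensorReadingIndex" 
 where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND "sensorId"=0 AND "time"=0;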
Thoughts on how to proceed?
Thanks,

Jason

      From: Tyler Hobbs <ty...@datastax.com>
 To: user@cassandra.apache.org 
 Sent: Wednesday, March 30, 2016 11:31 AM
 Subject: Re: Inconsistent query results and node state
   

org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-4908797801227889951, 4a41534b414e) 
(6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)

That key matches the row you mentioned, so it seems like all of the replicas 
should have converged on the same value for that row.  Do you consistently get 
the 1969-12-31 19:00 timestamp back now?  If not, try selecting both "time" and 
"writetime(time)" from that row and see what write timestamp each of the 
values has.

The ArrayIndexOutOfBoundsException in response to nodetool compact looks like a 
bug.  What version of Cassandra are you running?
 
On Wed, Mar 30, 2016 at 9:59 AM, Kai Wang <dep...@gmail.com> wrote:

Do you have NTP setup on all nodes?

On Tue, Mar 29, 2016 at 11:48 PM, Jason Kania <jason.ka...@ymail.com> wrote:

We have encountered a query inconsistency problem: the query below sporadically 
returns different results, with an invalid value in a timestamp field that 
looks as if the field is uninitialized (a zero timestamp).

Attempts to repair and compact have not changed the results.

select "subscriberId","sensorUnitId","sensorId","time" from 
"sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND 
"sensorId"=0 ORDER BY "time" LIMIT 10;

Invalid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    1969-12-31 19:00
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:11
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22

Valid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:11
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:14

We have confirmed, by running a number of queries, that the 1969-12-31 
timestamp is not present in the data, so it looks like the invalid timestamp 
value is generated by the query itself. The query below returns no rows.

select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND 
"sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500';

Nothing appears in the logs, but the following was observed intermittently in 
the tracing output (it does not correlate with the invalid query results):

 Digest mismatch: org.apache.cassandra.service.DigestMismatchException: 
Mismatch for key DecoratedKey(-7563144029910940626, 
00064a41534b414e000004000000000000040000000000) 
(be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba) 
[ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 |
An error from the debug log that might be related is:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-4908797801227889951, 4a41534b414e) 
(6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)
        at 
org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
~[apache-cassandra-3.0.3.jar:3.0.3]
        at 
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
 ~[apache-cassandra-3.0.3.jar:3.0.3]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_74]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_74]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]

The tracing files are attached and, if we understand them correctly, seem to 
show that in the failing case content is skipped because of tombstones. This 
could be an inconsistency problem on 192.168.10.9. Unfortunately, attempts to 
compact on 192.168.10.9 only give the following error without any stack trace 
detail, and repair does not fix it.

root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact
error: null
-- StackTrace --
java.lang.ArrayIndexOutOfBoundsException
Any suggestions on how to fix or what to search for would be much appreciated.
Thanks,
Jason

-- 
Tyler Hobbs
DataStax