Re: Cassandra upgrade from 2.1 to 3.0
I haven't seen this before, but I have a guess. What client/driver are you
using? Are you using a prepared statement that has every column listed for
the update, and leaving the unset columns as null? If so, the null is being
translated into a delete, which is clearly not what you want.

The differentiation between UNSET and NULL went into 2.2
(https://issues.apache.org/jira/browse/CASSANDRA-7304), and most drivers
have been updated to know the difference
(https://github.com/gocql/gocql/issues/861,
https://datastax-oss.atlassian.net/browse/JAVA-777, etc).

I haven't read the patch for 7304, but I suspect there's some sort of
mix-up along the way (maybe in your driver, or maybe you upgraded the
driver to support 3.0 and picked up a new feature you didn't realize you
picked up, etc).

On Fri, May 11, 2018 at 11:26 AM, kooljava2 wrote:

> After further analyzing the data, I see a pattern. The rows which were
> updated in the last 2-3 weeks have null values in the columns which were
> not part of the update.
>
> Has anyone encountered this issue during the upgrade?
>
> Thank you,
>
>> On Thursday, 10 May 2018, 19:49:50 GMT-7, kooljava2 wrote:
>>
>> Hello Jeff,
>>
>> 2.1.19 to 3.0.15.
>>
>> Thank you.
>>
>>> On Thursday, 10 May 2018, 17:43:58 GMT-7, Jeff Jirsa wrote:
>>>
>>> Which minor version of 3.0?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>> On May 11, 2018, at 2:54 AM, kooljava2 wrote:
>>>>
>>>> Hello,
>>>>
>>>> Upgraded Cassandra 2.1 to 3.0. We see certain data in a few columns
>>>> being set to "null". These null columns were created during row
>>>> creation time.
>>>>
>>>> After looking at the data we see a pattern where an update was done on
>>>> these rows. Rows which were updated have data, but columns which were
>>>> not part of the update are set to null.
>>>>
>>>>  created_on | created_by | id
>>>> ------------+------------+-------
>>>>        null |       null | 12345
>>>>
>>>> sstabledump:
>>>>
>>>> WARN 20:47:38,741 Small cdc volume detected at
>>>> /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 1278.
>>>> You can override this in cassandra.yaml
>>>> [
>>>>   {
>>>>     "partition" : {
>>>>       "key" : [ "12345" ],
>>>>       "position" : 5155159
>>>>     },
>>>>     "rows" : [
>>>>       {
>>>>         "type" : "row",
>>>>         "position" : 5168738,
>>>>         "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" },
>>>>         "cells" : [
>>>>           { "name" : "doc_type", "value" : false, "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "industry", "deletion_info" : { "local_delete_time" : "2018-03-28T20:38:08Z" }, "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "last_modified_by", "value" : "12345", "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "last_modified_date", "value" : "2018-03-28 20:38:08.059Z", "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "locale", "deletion_info" : { "local_delete_time" : "2018-03-28T20:38:08Z" }, "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "postal_code", "deletion_info" : { "local_delete_time" : "2018-03-28T20:38:08Z" }, "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "ticket", "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" } },
>>>>           { "name" : "ticket", "path" : [ "TEMP_DATA" ], "value" : "{\"name\":\"TEMP_DATA\",\"ticket\":\"a42638dae8350e889f2603be1427ac6f5dec5e486d4db164a76bf80820cdf68d635cff5e7d555e6d4eabb9b5b82597b68bec0fcd735fcca\",\"lastRenewedDate\":\"2018-03-28T20:38:08Z\"}", "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "ticket", "path" : [ "TEMP_TEMP2" ], "value" : "{\"name\":\"TEMP_TEMP2\",\"ticket\":\"a4263b7350d1f2683\",\"lastRenewedDate\":\"2018-03-28T20:38:07Z\"}", "tstamp" : "2018-03-28T20:38:08.060Z" },
>>>>           { "name" : "ppstatus_pf", "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" } },
>>>>           { "name" : "ppstatus_pers", "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" } }
>>>>         ]
>>>>       }
>>>>     ]
>>>>   }
>>>> ]
>>>>
>>>> WARN 20:47:41,325 Small cdc volume detected at
>>>> /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 1278.
>>>> You can override this in cassandra.yaml
>>>> [
>>>>   {
>>>>     "partition" : {
>>>>       "key" : [ "12345" ],
>>>>       "position" : 18743072
>>>>     },
>>>>     "rows" : [
>>>>       {
>>>>         "type" : "row",
>>>>         "position" : 18751808,
>>>>         "liveness_info" : { "tstamp" : "2017-10-25T10:22:41.612Z" },
>>>>         "cells" : [
>>>>           { "name" : "created_by", "value" : "12345" },
>>>>           { "name" : "created_on", "value" : "2017-10
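Jeff's distinction between NULL and UNSET can be sketched without a live cluster. The helper below is hypothetical (not a real driver API) and only mimics what a post-CASSANDRA-7304 driver does: a parameter bound as UNSET leaves the existing cell alone, while an explicit null writes a cell tombstone, which is exactly the "columns went null after an update" symptom above. The real DataStax Python driver exposes a similar sentinel as `cassandra.query.UNSET_VALUE`.

```python
# Sketch of NULL vs UNSET semantics (CASSANDRA-7304). Hypothetical helper,
# not a real driver API: an explicit None writes a tombstone (the column
# comes back null), while the UNSET sentinel leaves the cell untouched.
UNSET = object()  # stand-in for a driver's unset sentinel

def apply_update(row, changes):
    """Apply a prepared-statement-style update to an in-memory 'row'."""
    for column, value in changes.items():
        if value is UNSET:
            continue            # unset bind: server skips this column
        elif value is None:
            row[column] = None  # explicit null: delete (tombstone)
        else:
            row[column] = value
    return row

row = {"id": "12345", "created_by": "alice", "industry": "retail"}

# Binding every column and leaving the untouched ones as null wipes them out:
updated = apply_update(dict(row), {"industry": "finance", "created_by": None})
assert updated["created_by"] is None       # explicit null deleted the column

# Binding them as UNSET preserves the existing data:
kept = apply_update(dict(row), {"industry": "finance", "created_by": UNSET})
assert kept["created_by"] == "alice"       # UNSET left it alone
```

This is why a driver upgrade done alongside the 2.1-to-3.0 upgrade matters: older drivers had no way to say "unset", so statements that bound every column sent nulls for the columns they did not mean to touch.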
Re: Error after 3.1.0 to 3.11.2 upgrade
RF of one means all auth requests go to the same node, so they're more
likely to time out if that host is overloaded or restarts. Increasing it
distributes the queries among more hosts.

--
Jeff Jirsa

> On May 12, 2018, at 6:11 AM, Abdul Patel wrote:
>
> Yeah, found that all keyspaces had replication factor 3 and system_auth
> had 1; changed it to 3 now. So was this issue due to the system_auth
> replication factor mismatch?
>
>> On Saturday, May 12, 2018, Hannu Kröger wrote:
>>
>> Hi,
>>
>> Did you check the replication strategy and number of replicas of the
>> system_auth keyspace?
>>
>> Hannu
>>
>>> Abdul Patel kirjoitti 12.5.2018 kello 5.21:
>>>
>>> No, the application isn't impacted, no complaints.
>>> Also it's a 4-node cluster in a lower, non-production environment and
>>> all nodes are on the same version.
>>>
>>>> On Friday, May 11, 2018, Jeff Jirsa wrote:
>>>>
>>>> The read is timing out - is the cluster healthy? Is it fully upgraded
>>>> or mixed versions? Repeated isn't great, but is the application
>>>> impacted?
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>> On May 12, 2018, at 6:17 AM, Abdul Patel wrote:
>>>>>
>>>>> Seems it's coming from 3.10; got a bunch of them today for 3.11.2,
>>>>> so if this is coming repeatedly, what's the solution for this?
>>>>>
>>>>> WARN [Native-Transport-Requests-24] 2018-05-11 16:46:20,938 CassandraAuthorizer.java:96 - CassandraAuthorizer failed to authorize # for
>>>>> ERROR [Native-Transport-Requests-24] 2018-05-11 16:46:20,940 ErrorMessage.java:384 - Unexpected exception during request
>>>>> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>>>>>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
>>>>>     at com.google.common.cache.LocalCache.get(LocalCache.java:3937) ~[guava-18.0.jar:na]
>>>>>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) ~[guava-18.0.jar:na]
>>>>>     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) ~[guava-18.0.jar:na]
>>>>>     at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.service.ClientState.authorize(ClientState.java:439) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:368) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:345) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:332) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:310) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:260) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:221) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:530) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>     at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:507) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>>
>>>>>> On Fri, May 11, 2018 at 8:30 PM, Jeff Jirsa wrote:
>>>>>>
>>>>>> That looks like Cassandra 3.10, not 3.11.2.
>>>>>>
>>>>>> It's also just the auth cache failing to refresh - if it's
>>>>>> transient it's probably not a big deal. If it continues then there
>>>>>> may be an issue with the cache refresher.
>>>>>>
>>>>>> --
>>>>>> Jeff Jirsa
>>>>>>
>>>>>>> On May 12, 2018, at 5:55 AM, Abdul Patel wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Seeing the below stack trace messages in the error log one day
>>>>>>> after the upgrade. One of the blogs said this might be due to old
>>>>>>> drivers, but not sure about it.
>>>>>>>
>>>>>>> FYI:
>>>>>>>
>>>>>>> INFO [HANDSHAKE-/10.152.205.150] 2018-05-09 10:22:27,160 OutboundTcpConnection.java:510 - Handshaking version with /10.152.205.150
>>>>>>> DEBUG [MessagingService-Outgoing-/10.152.205.150-Gossip] 2018-05-09 10:22:27,160 OutboundTcpConnection.java:482 - Done connecting to
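The fix Abdul describes can be sketched as follows. The keyspace name is real; the datacenter name `dc1` and the target factor of 3 are assumptions you would adapt to your own topology (for SimpleStrategy clusters, use `'replication_factor': 3` instead):

```shell
# Inspect the current replication settings of the auth keyspace
cqlsh -e "DESCRIBE KEYSPACE system_auth;"

# Raise the replication factor (adapt 'dc1' and the RF to your cluster)
cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc1': 3};"

# Then repair the keyspace on every node so the new replicas get the data
nodetool repair system_auth
```

Without the repair step, the newly assigned replicas stay empty, and auth reads that land on them can still fail.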
Re: Error after 3.1.0 to 3.11.2 upgrade
Yeah, found that all keyspaces had replication factor 3 and system_auth
had 1; changed it to 3 now. So was this issue due to the system_auth
replication factor mismatch?

On Saturday, May 12, 2018, Hannu Kröger wrote:

> Hi,
>
> Did you check the replication strategy and number of replicas of the
> system_auth keyspace?
>
> Hannu
>
>> Abdul Patel kirjoitti 12.5.2018 kello 5.21:
>>
>> No, the application isn't impacted, no complaints.
>> Also it's a 4-node cluster in a lower, non-production environment and
>> all nodes are on the same version.
>>
>>> On Friday, May 11, 2018, Jeff Jirsa wrote:
>>>
>>> The read is timing out - is the cluster healthy? Is it fully upgraded
>>> or mixed versions? Repeated isn't great, but is the application
>>> impacted?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>> On May 12, 2018, at 6:17 AM, Abdul Patel wrote:
>>>>
>>>> Seems it's coming from 3.10; got a bunch of them today for 3.11.2,
>>>> so if this is coming repeatedly, what's the solution for this?
>>>>
>>>> WARN [Native-Transport-Requests-24] 2018-05-11 16:46:20,938 CassandraAuthorizer.java:96 - CassandraAuthorizer failed to authorize # for
>>>> ERROR [Native-Transport-Requests-24] 2018-05-11 16:46:20,940 ErrorMessage.java:384 - Unexpected exception during request
>>>> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>>>>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
>>>>     at com.google.common.cache.LocalCache.get(LocalCache.java:3937) ~[guava-18.0.jar:na]
>>>>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) ~[guava-18.0.jar:na]
>>>>     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) ~[guava-18.0.jar:na]
>>>>     at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.service.ClientState.authorize(ClientState.java:439) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:368) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:345) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:332) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:310) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:260) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:221) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:530) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>     at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:507) ~[apache-cassandra-3.11.2.jar:3.11.2]
>>>>
>>>>> On Fri, May 11, 2018 at 8:30 PM, Jeff Jirsa wrote:
>>>>>
>>>>> That looks like Cassandra 3.10, not 3.11.2.
>>>>>
>>>>> It's also just the auth cache failing to refresh - if it's transient
>>>>> it's probably not a big deal. If it continues then there may be an
>>>>> issue with the cache refresher.
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>> On May 12, 2018, at 5:55 AM, Abdul Patel wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Seeing the below stack trace messages in the error log one day
>>>>>> after the upgrade. One of the blogs said this might be due to old
>>>>>> drivers, but not sure about it.
>>>>>>
>>>>>> FYI:
>>>>>>
>>>>>> INFO [HANDSHAKE-/10.152.205.150] 2018-05-09 10:22:27,160 OutboundTcpConnection.java:510 - Handshaking version with /10.152.205.150
>>>>>> DEBUG [MessagingService-Outgoing-/10.152.205.150-Gossip] 2018-05-09 10:22:27,160 OutboundTcpConnection.java:482 - Done connecting to /10.152.205.150
>>>>>> ERROR [Native-Transport-Requests-1] 2018-05-09 10:22:29,971 ErrorMessage.java:384 - Unexpected exception during request
>>>>>> com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE
>>>>>>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
>>>>>>     at com.google.common.cache.LocalCache.get(LocalCache.java:3937) ~[guava-18.0.jar:na
Re: Insert-only application repair
In a TTL-only use case with no explicit deletes, if read CL + write CL > RF
you can likely avoid repairs, with a few huge caveats:

1) read repair may mess up your TTL expiration if you're using TWCS
2) if you lose a host you probably need to run repairs or you may not see
   some data after replacement (true in general)

--
Jeff Jirsa

> On May 12, 2018, at 5:27 AM, onmstester onmstester wrote:
>
> Thank you Nitan,
> That's exactly my case (RF > CL). But as long as there is no node outage,
> shouldn't hinted handoff handle data consistency?
>
> Sent using Zoho Mail
>
>> On Sat, 12 May 2018 16:26:13 +0430 Nitan Kainth wrote:
>>
>> If you have RF > CL then repair needs to be run to make sure data is in
>> sync.
>>
>> Sent from my iPhone
>>
>>> On May 12, 2018, at 3:54 AM, onmstester onmstester wrote:
>>>
>>> In an insert-only use case with TTL (6 months), should I run this
>>> command every 5-7 days on all the nodes of the production cluster
>>> (according to this:
>>> http://cassandra.apache.org/doc/latest/operating/repair.html)?
>>>
>>> nodetool repair -pr --full
>>>
>>> When none of the nodes has been down in 4 months (ever since the
>>> cluster was launched) and none of the rows have been deleted, why
>>> should I run nodetool repair?
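The "read CL + write CL > RF" condition Jeff mentions is just quorum overlap: a read is guaranteed to contact at least one replica that acknowledged the write. A small sketch of the arithmetic (the helper names are illustrative, not a Cassandra API):

```python
# Quorum-overlap check: with R replicas contacted per read and W per write,
# R + W > RF guarantees every read quorum intersects every write quorum,
# so at least one contacted replica has the latest acknowledged write.
CL_REPLICAS = {  # replicas contacted per consistency level, given rf
    "ONE": lambda rf: 1,
    "TWO": lambda rf: 2,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}

def quorums_overlap(read_cl, write_cl, rf):
    """True if read and write quorums must intersect for this RF."""
    r = CL_REPLICAS[read_cl](rf)
    w = CL_REPLICAS[write_cl](rf)
    return r + w > rf

# RF=3: QUORUM reads + QUORUM writes overlap (2 + 2 > 3) ...
assert quorums_overlap("QUORUM", "QUORUM", 3)
# ... but ONE + ONE does not (1 + 1 <= 3), which is the RF > CL situation
# in this thread: without overlap, only repair guarantees convergence.
assert not quorums_overlap("ONE", "ONE", 3)
```

This is also why hinted handoff alone isn't a guarantee: hints cover replicas the coordinator knows it missed, but without quorum overlap nothing forces a read to see a replica that got the write.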
Re: Insert-only application repair
Thank you Nitan,
That's exactly my case (RF > CL). But as long as there is no node outage,
shouldn't hinted handoff handle data consistency?

Sent using Zoho Mail

On Sat, 12 May 2018 16:26:13 +0430 Nitan Kainth wrote:

> If you have RF > CL then repair needs to be run to make sure data is in
> sync.
>
> Sent from my iPhone
>
>> On May 12, 2018, at 3:54 AM, onmstester onmstester wrote:
>>
>> In an insert-only use case with TTL (6 months), should I run this
>> command every 5-7 days on all the nodes of the production cluster
>> (according to this:
>> http://cassandra.apache.org/doc/latest/operating/repair.html)?
>>
>> nodetool repair -pr --full
>>
>> When none of the nodes has been down in 4 months (ever since the
>> cluster was launched) and none of the rows have been deleted, why
>> should I run nodetool repair?
Re: Insert-only application repair
If you have RF > CL then repair needs to be run to make sure data is in
sync.

Sent from my iPhone

> On May 12, 2018, at 3:54 AM, onmstester onmstester wrote:
>
> In an insert-only use case with TTL (6 months), should I run this
> command every 5-7 days on all the nodes of the production cluster
> (according to this:
> http://cassandra.apache.org/doc/latest/operating/repair.html)?
>
> nodetool repair -pr --full
>
> When none of the nodes has been down in 4 months (ever since the
> cluster was launched) and none of the rows have been deleted, why
> should I run nodetool repair?
Insert-only application repair
In an insert-only use case with TTL (6 months), should I run this command
every 5-7 days on all the nodes of the production cluster (according to
this: http://cassandra.apache.org/doc/latest/operating/repair.html)?

nodetool repair -pr --full

When none of the nodes has been down in 4 months (ever since the cluster
was launched) and none of the rows have been deleted, why should I run
nodetool repair?