Forensics are gone at this point, so I can't verify the exact errors, but I
wanted to mention that we had seen something similar, both to corroborate
your experience and to warn others.

The version would have been 3.0.15 or 3.11.3 as that is what we were
deploying on our clusters at the time. I think it was more likely 3.0.15.

So sorry for the "vagueness" :(

On Tue, Feb 18, 2020, 8:54 PM Surbhi Gupta <surbhi.gupt...@gmail.com> wrote:

> Jonathan,
> As per https://issues.apache.org/jira/browse/CASSANDRA-13696, the issue
> "Digest mismatch Exception if hints file has UnknownColumnFamily" is fixed
> in 3.0.15. Did you still face this issue on 3.0.15?
>
> Thanks
> Surbhi
>
> On Tue, 18 Feb 2020 at 17:40, Jonathan Koppenhofer <j...@koppedomain.com>
> wrote:
>
>> I believe we had something similar happen on 3.0.15 a while back. We had
>> a cluster where creating MVs on a large existing dataset caused mass havoc.
>> We thought we had stabilized the cluster, but saw issues similar to yours
>> when we dropped the MVs.
>>
>> We interpreted our errors to mean that we should not attempt to write to
>> base tables while also dropping downstream materialized views. We
>> essentially had the app team stop their app, then dropped the views one by
>> one with a pause between each. That then seemed to work fine, but yes, be
>> careful with everything MVs.
>>
>> We now disallow the use of MVs globally.
>>
>> On Tue, Feb 18, 2020, 8:27 PM Surbhi Gupta <surbhi.gupt...@gmail.com>
>> wrote:
>>
>>> We are on Cassandra 3.11, using G1GC with a 16GB heap.
>>>
>>> So we had to drop 7 MVs in production. As soon as we dropped the first
>>> Materialized View, our cluster became unstable and the app started giving
>>> 100% errors. What we noticed:
>>> 1. As soon as the MV was dropped, the cluster became unstable and nodes
>>> were showing each other as down.
>>> 2. We saw the warnings below in system.log, which is understandable:
>>>  WARN [MessagingService-Incoming-/10.X.X.X] 2020-02-18 14:21:47,115
>>> IncomingTcpConnection.java:103 - UnknownColumnFamilyException reading from
>>> socket; closing
>>>
>>> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table 
>>> for cfId 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1. If a table was just created, 
>>> this is likely due to the schema not being fully propagated.  Please wait 
>>> for schema agreement on table creation.
>>>
>>> 3. We noticed below errors as well:
>>>
>>> ERROR [MutationStage-9] 2020-02-18 14:21:47,267 Keyspace.java:593 - 
>>> Attempting to mutate non-existant table 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1
>>>
>>> 4. We noticed messages like below:
>>>
>>> WARN  [BatchlogTasks:1] 2020-02-18 14:21:53,786 BatchlogManager.java:252 - 
>>> Skipped batch replay of a19c6480-5294-11ea-9e09-3948c59ad0f5 due to {}
>>>
>>> 5. Hints file corrupted:
>>>
>>> WARN  [HintsDispatcher:6737] 2020-02-18 14:22:24,932 HintsReader.java:237 - 
>>> Failed to read a hint for /10.X.X.X: f75e58e8-c255-4318-a553-06487b6bbe8c - 
>>> table with id 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1 is unknown in file 
>>> f75e58e8-c255-4318-a553-06487b6bbe8c-1582060925656-1.hints
>>> ERROR [HintsDispatcher:6737] 2020-02-18 14:22:24,933 
>>> HintsDispatchExecutor.java:243 - Failed to dispatch hints file 
>>> f75e58e8-c255-4318-a553-06487b6bbe8c-1582060925656-1.hints: file is 
>>> corrupted ({})
>>> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch 
>>> exception
>>>
>>>
>>> 6. And then Cassandra shut down:
>>>
>>> ERROR [HintsDispatcher:6737] 2020-02-18 14:22:24,937 
>>> StorageService.java:430 - Stopping gossiper
>>> WARN  [HintsDispatcher:6737] 2020-02-18 14:22:24,937 
>>> StorageService.java:321 - Stopping gossip by operator request
>>> INFO  [HintsDispatcher:6737] 2020-02-18 14:22:24,937 Gossiper.java:1530 - 
>>> Announcing shutdown
>>>
>>>
>>> Any views? Below are the related issues I found:
>>>
>>> https://support.datastax.com/hc/en-us/articles/360000368126-Hints-file-with-unknown-CFID-can-cause-nodes-to-fail
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-13696
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6822
>>>
>>> On Wed, 12 Feb 2020 at 19:10, Surbhi Gupta <surbhi.gupt...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Erick ...
>>>> This is helpful...
>>>>
>>>>
>>>> On Wed, 12 Feb 2020 at 17:46, Erick Ramirez <erick.rami...@datastax.com>
>>>> wrote:
>>>>
>>>>> There shouldn't be any negative impact from dropping MVs and there's
>>>>> certainly no risk to the base table if that is your concern. All it will do
>>>>> is remove all the data in the respective views plus drop any pending view
>>>>> mutations from the batch log. If anything, you should see some performance
>>>>> gain since updates to the base table will only trigger 4 view updates
>>>>> instead of the previous 11. Cheers!
>>>>>
>>>>> Erick Ramirez  |  Developer Relations
>>>>>
>>>>> erick.rami...@datastax.com | datastax.com <http://www.datastax.com>
>>>>>
>>>>> On Thu, 13 Feb 2020 at 04:26, Surbhi Gupta <surbhi.gupt...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The application team created 11 materialized views on a base table in
>>>>>> production, and we need to drop 7 of them as they are not in use.
>>>>>> We wanted to understand the impact of dropping the materialized views.
>>>>>> We are on Cassandra 3.11.1, multi-datacenter, with a replication factor
>>>>>> of 3 in each datacenter.
>>>>>> We are using LOCAL_QUORUM for write consistency and LOCAL_ONE for
>>>>>> read consistency.
>>>>>>
>>>>>> Any thoughts or suggestions to keep in mind before dropping the
>>>>>> materialized views?
>>>>>>
>>>>>> Thanks
>>>>>> Surbhi
>>>>>>