Re: KeyCache Harmless Error on Startup
Thanks Erick! I will check with the owners of this keyspace, hoping to find the culprit. If they don't come up with anything, is there a way to read the key cache file? (As I understand it, it's a binary file.)

On another note, there's actually another keyspace I forgot to point out, on which I found some weird behavior (not necessarily related, though):

CREATE KEYSPACE ks3 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks3.tbl4 (
    account_id text,
    consumer_phone_number text,
    channel text,
    event_time_stamp timestamp,
    brand_phone_number text,
    campaign_id bigint,
    engagement_id bigint,
    event_type text,
    PRIMARY KEY ((account_id, consumer_phone_number), channel, event_time_stamp)
);

When I select from this table, I get the following warning:

*cqlsh.py:395: DateOverFlowWarning: Some timestamps are larger than Python datetime can represent. Timestamps are displayed in milliseconds from epoch.*

I don't know if it's related, but it seemed worth pointing out.

 account_id | consumer_phone_number                | channel | event_time_stamp   | brand_phone_number               | campaign_id | engagement_id | event_type
------------+--------------------------------------+---------+--------------------+----------------------------------+-------------+---------------+------------
   12345678 | OIs1HXovJ9W/AJZI+Tm8CSCbAavdVI06qt0c |     sms | *1580305508799000* | PY0yHHItI9BibOtNis8hDuLwN91prPa+ |        null |          null |    opt-out

Thanks!

On Thu, Jan 30, 2020 at 1:43 AM Erick Ramirez wrote:

>> Specifically for the NegativeArraySizeException, what's happening is that
>> the keyLength is so huge that it blows up MAX_UNSIGNED_SHORT so it looks
>> like it's a negative value. Someone will correct me if I got that wrong but
>> the "Key length longer than max" error confirms that.
>
> Is it possible that you have a rogue metric_name value that's impossibly
> long? I'm a bit more convinced now that's what's happening because you said
> it happens on multiple servers which rules out local file corruption at the
> filesystem level. Cheers!
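Side note on the DateOverFlowWarning: a plausible (unconfirmed) explanation is that the writer stored microseconds in a millisecond field. Cassandra timestamps are milliseconds since the epoch, so 1580305508799000 read as milliseconds lands around the year 52050, past what Python's datetime can represent, while read as microseconds it is an ordinary date in January 2020. A quick plain-Python sketch, nothing Cassandra-specific:

```python
from datetime import datetime, timezone

raw = 1580305508799000  # the value cqlsh printed for event_time_stamp

# Read as milliseconds (Cassandra's unit), this is far past datetime.MAXYEAR
# (9999), which is exactly why cqlsh falls back to the raw display:
try:
    datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
    print("representable")
except (OverflowError, OSError, ValueError) as exc:
    print("overflow:", exc)

# Read as *microseconds* (e.g. if the client multiplied by 1000 once too
# often), it is a perfectly ordinary timestamp in January 2020:
print(datetime.fromtimestamp(raw / 1_000_000, tz=timezone.utc))
```

If that's what happened, the warning is a data-quality issue on the writing side and probably unrelated to the key cache errors.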
Re: KeyCache Harmless Error on Startup
> Specifically for the NegativeArraySizeException, what's happening is that
> the keyLength is so huge that it blows up MAX_UNSIGNED_SHORT so it looks
> like it's a negative value. Someone will correct me if I got that wrong but
> the "Key length longer than max" error confirms that.

Is it possible that you have a rogue metric_name value that's impossibly long? I'm a bit more convinced now that's what's happening because you said it happens on multiple servers, which rules out local file corruption at the filesystem level. Cheers!
Re: KeyCache Harmless Error on Startup
Oh, I just saw the example. Never mind. :)
Re: KeyCache Harmless Error on Startup
> Does anyone perhaps have an idea on what could've gone wrong here?
> Could it be just a calculation error on startup?

Specifically for the NegativeArraySizeException, what's happening is that the keyLength is so huge that it blows up MAX_UNSIGNED_SHORT, so it looks like it's a negative value. Someone will correct me if I got that wrong, but the "Key length longer than max" error confirms that.

That's why Jeff is interested in seeing a copy of the cache, so we can see what's in it if your keys are not sensitive/confidential. Maybe you can provide an obfuscated example to give us some clues? Cheers!
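To illustrate the failure mode, here is a toy Python sketch (not Cassandra's actual deserializer): the serialized key length is read as a signed 32-bit Java int, so garbage bytes with the high bit set come back negative (which in Java produces the NegativeArraySizeException when allocating the key buffer), and garbage that happens to stay positive but exceeds the unsigned-short cap trips the "Corrupted key cache" check instead:

```python
import struct

MAX_UNSIGNED_SHORT = 0xFFFF  # 65535, the cap quoted in the log message

def read_key_length(raw4: bytes) -> int:
    """Toy analogue of the length sanity checks when reading a saved key cache."""
    (key_length,) = struct.unpack(">i", raw4)  # big-endian signed 32-bit, like Java
    if key_length < 0:
        # Java's `new byte[key_length]` would throw NegativeArraySizeException here
        raise ValueError(f"negative key length {key_length}")
    if key_length > MAX_UNSIGNED_SHORT:
        raise ValueError(f"Corrupted key cache. Key length of {key_length} "
                         f"is longer than maximum of {MAX_UNSIGNED_SHORT}")
    return key_length

for raw in (struct.pack(">i", 42),          # a sane key length
            b"\x9d\x00\x00\x00",            # high bit set -> negative as a Java int
            struct.pack(">i", 489735494)):  # positive garbage, far over the cap
    try:
        print("key length:", read_key_length(raw))
    except ValueError as exc:
        print("rejected:", exc)
```

Both errors in the original post are consistent with the reader landing on bytes that were never a length field at all, which is why the file is discarded as "harmless": the cache is just rebuilt from scratch.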
Re: KeyCache Harmless Error on Startup
Hi again,

Does anyone perhaps have an idea on what could've gone wrong here? Could it be just a calculation error on startup?

Thanks!

On Sun, Jan 26, 2020 at 5:57 PM Shalom Sagges wrote:

> Hi Jeff,
>
> It is happening on multiple servers and even on different DCs.
> The schema contains two keyspaces as follows:
>
> CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
>
> CREATE TABLE ks1.tbl1 (
>     account_id text,
>     metric_name text,
>     day_of_month int,
>     hour_of_day int,
>     insert_time timestamp,
>     metric_value text,
>     PRIMARY KEY ((account_id, metric_name, day_of_month, hour_of_day), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
> CREATE TABLE ks1.tbl2 (
>     account_id text,
>     lambda_uuid uuid,
>     metric_name text,
>     day_of_month int,
>     insert_time timestamp,
>     hour_of_day int,
>     metric_value text,
>     PRIMARY KEY ((account_id, lambda_uuid, metric_name, day_of_month), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
> CREATE KEYSPACE ks2 WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
>
> CREATE TABLE ks2.tbl3 (
>     account_id text,
>     lambda_uuid uuid,
>     insert_time timestamp,
>     data text,
>     error_count int,
>     info_count int,
>     request_id text,
>     warn_count int,
>     PRIMARY KEY ((account_id, lambda_uuid), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
> I'm afraid I am unable to provide the actual key cache file, but for what
> it's worth, here's a data example from the above tables:
>
> *ks1.tbl1:*
> account_id:   12345678
> metric_name:  com.selfbuilds.faas.system.invocation-finished-successful
> day_of_month: 1
> hour_of_day:  10
> insert_time:  2020-01-01 10:59:03.588000+0000
> metric_value: {"invocationIdentifier":"51176dd4-aa34-4834-bcf5-7e3d5b5494d1","lambdaUUID":"ADD3A5B4-7497-499A-8364-A97108C7400B","event":"","invocationState":"SUCCEEDED","executionTimeInMs":3392}
>
> *ks1.tbl2:*
> account_id:   12345678
> lambda_uuid:  ed7f0629-8d5c-47dc-a66d-4a1197aa01bc
> metric_name:  com.selfbuilds.faas.system.invocation-triggered
> day_of_month: 9
> insert_time:  2020-01-09 13:03:06.653000+0000
> hour_of_day:  13
> metric_value: {"invocationIdentifier":"3631e9ea-b230-4c4d-a051-c7b4241ee231","lambdaUUID":"ED7F0629-8D5C-47DC-A66D-4A1197AA01BC","event":"","lambdaName":"1578574875000","lastModifierOfLambda":"FaaSAll"}
>
> *ks2.tbl3:*
> account_id:  12345678
> lambda_uuid: 9f4eaa4f-42de-4c6e-9c96-02a4749281f3
> insert_time: 2020-01-01 01:30:22.302000+0000
> data:        [{"timestamp":157784458,"level":"Info","message":"Best Logs in town","extras":[]}]
> error_count: 0
> info_count:  1
> request_id:  2966683f-cf37-4ea3-9d82-1de46207d51e
> warn_count:  0
>
> Thanks for your help on this one!
>
> On Thu, Jan 23, 2020 at 5:40 PM Jeff Jirsa wrote:
>
>> 489MB key seems huge. Unlikely it’s really that large.
>>
>> If it’s happening on multiple servers it’d be interesting to see the
>> schema and/or saved cache file if it’s not incredibly sensitive
>>
>> On Jan 23, 2020, at 3:48 AM, Shalom Sagges wrote:
>>
>> Hi All,
>>
>> Cassandra 3.11.4.
>> On one of our clusters, during startup, I see two types of [...]
Re: KeyCache Harmless Error on Startup
Hi Jeff,

It is happening on multiple servers and even on different DCs.
The schema contains two keyspaces as follows:

CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks1.tbl1 (
    account_id text,
    metric_name text,
    day_of_month int,
    hour_of_day int,
    insert_time timestamp,
    metric_value text,
    PRIMARY KEY ((account_id, metric_name, day_of_month, hour_of_day), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;

CREATE TABLE ks1.tbl2 (
    account_id text,
    lambda_uuid uuid,
    metric_name text,
    day_of_month int,
    insert_time timestamp,
    hour_of_day int,
    metric_value text,
    PRIMARY KEY ((account_id, lambda_uuid, metric_name, day_of_month), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;

CREATE KEYSPACE ks2 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks2.tbl3 (
    account_id text,
    lambda_uuid uuid,
    insert_time timestamp,
    data text,
    error_count int,
    info_count int,
    request_id text,
    warn_count int,
    PRIMARY KEY ((account_id, lambda_uuid), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;

I'm afraid I am unable to provide the actual key cache file, but for what it's worth, here's a data example from the above tables:

*ks1.tbl1:*
account_id:   12345678
metric_name:  com.selfbuilds.faas.system.invocation-finished-successful
day_of_month: 1
hour_of_day:  10
insert_time:  2020-01-01 10:59:03.588000+0000
metric_value: {"invocationIdentifier":"51176dd4-aa34-4834-bcf5-7e3d5b5494d1","lambdaUUID":"ADD3A5B4-7497-499A-8364-A97108C7400B","event":"","invocationState":"SUCCEEDED","executionTimeInMs":3392}

*ks1.tbl2:*
account_id:   12345678
lambda_uuid:  ed7f0629-8d5c-47dc-a66d-4a1197aa01bc
metric_name:  com.selfbuilds.faas.system.invocation-triggered
day_of_month: 9
insert_time:  2020-01-09 13:03:06.653000+0000
hour_of_day:  13
metric_value: {"invocationIdentifier":"3631e9ea-b230-4c4d-a051-c7b4241ee231","lambdaUUID":"ED7F0629-8D5C-47DC-A66D-4A1197AA01BC","event":"","lambdaName":"1578574875000","lastModifierOfLambda":"FaaSAll"}

*ks2.tbl3:*
account_id:  12345678
lambda_uuid: 9f4eaa4f-42de-4c6e-9c96-02a4749281f3
insert_time: 2020-01-01 01:30:22.302000+0000
data:        [{"timestamp":157784458,"level":"Info","message":"Best Logs in town","extras":[]}]
error_count: 0
info_count:  1
request_id:  2966683f-cf37-4ea3-9d82-1de46207d51e
warn_count:  0

Thanks for your help on this one!

On Thu, Jan 23, 2020 at 5:40 PM Jeff Jirsa wrote:

> 489MB key seems huge. Unlikely it’s really that large.
>
> If it’s happening on multiple servers it’d be interesting to see the
> schema and/or saved cache file if it’s not incredibly sensitive
>
> On Jan 23, 2020, at 3:48 AM, Shalom Sagges wrote:
>
> Hi All,
>
> Cassandra 3.11.4.
> On one of our clusters, during startup, I see two types of "Harmless
> error" notification regarding the keycache:
>
> *Server 1:*
>
> INFO [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 -
> *Harmless error reading saved cache* /path/to/saved_caches/KeyCache-e.db
> *java.lang.NegativeArraySizeException: null*
>     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>     at [...]
Re: KeyCache Harmless Error on Startup
489MB key seems huge. Unlikely it’s really that large.

If it’s happening on multiple servers it’d be interesting to see the schema and/or saved cache file if it’s not incredibly sensitive

> On Jan 23, 2020, at 3:48 AM, Shalom Sagges wrote:
>
> Hi All,
>
> Cassandra 3.11.4.
> On one of our clusters, during startup, I see two types of "Harmless error"
> notification regarding the keycache:
>
> Server 1:
>
> INFO [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 -
> Harmless error reading saved cache /path/to/saved_caches/KeyCache-e.db
> java.lang.NegativeArraySizeException: null
>     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>     at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:472)
>     at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> INFO [pool-3-thread-1] 2020-01-23 04:34:46,168 AutoSavingCache.java:174 -
> Completed loading (188 ms; 10916 keys) KeyCache cache
>
> Server 2:
>
> INFO [pool-3-thread-1] 2020-01-15 11:18:35,797 AutoSavingCache.java:263 -
> Harmless error reading saved cache /path/to/saved_caches//KeyCache-e.db
> java.io.IOException: Corrupted key cache. Key length of 489735494 is longer
> than maximum of 65535
>     at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:469)
>     at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> INFO [pool-3-thread-1] 2020-01-15 11:18:35,799 AutoSavingCache.java:174 -
> Completed loading (285 ms; 20681 keys) KeyCache cache
>
> The KeyCache size is a lot less than the max default (100MB). So does this
> mean that the key itself is the culprit?
>
> I'd really appreciate some guidance on what this message actually means in
> the logs during startup and what problems may arise from it.
> I see that one of the nodes experiences high CPU load and even gets kernel
> panic from time to time, so I want to understand if the above may cause
> such issues.
>
> Thanks a lot!
KeyCache Harmless Error on Startup
Hi All,

Cassandra 3.11.4.
On one of our clusters, during startup, I see two types of "Harmless error" notification regarding the keycache:

*Server 1:*

INFO [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 - *Harmless error reading saved cache* /path/to/saved_caches/KeyCache-e.db
*java.lang.NegativeArraySizeException: null*
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:472)
    at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
INFO [pool-3-thread-1] 2020-01-23 04:34:46,168 AutoSavingCache.java:174 - Completed loading (188 ms; 10916 keys) KeyCache cache

*Server 2:*

INFO [pool-3-thread-1] 2020-01-15 11:18:35,797 AutoSavingCache.java:263 - Harmless error reading saved cache /path/to/saved_caches//KeyCache-e.db
*java.io.IOException: Corrupted key cache. Key length of 489735494 is longer than maximum of 65535*
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:469)
    at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
INFO [pool-3-thread-1] 2020-01-15 11:18:35,799 AutoSavingCache.java:174 - Completed loading (285 ms; 20681 keys) KeyCache cache

The KeyCache size is a lot less than the max default (100MB). So does this mean that the key itself is the culprit?

I'd really appreciate some guidance on what this message actually means in the logs during startup and what problems may arise from it.
I see that one of the nodes experiences high CPU load and even gets kernel panic from time to time, so I want to understand if the above may cause such issues.

Thanks a lot!
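On the "is there a way to read the key cache file?" question raised elsewhere in the thread: the saved cache format is internal and version-specific, so short of Cassandra's own deserializer the practical option is a raw hex dump to eyeball the bytes and spot obvious garbage near the failing offset. A minimal sketch (pure byte viewer, it knows nothing about the cache format; the path below is the placeholder from the logs):

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / printable-ASCII columns, hexdump-style."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{off:08x}  {hexpart:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)

# Usage against a saved cache file (path is the placeholder from the logs):
# with open("/path/to/saved_caches/KeyCache-e.db", "rb") as f:
#     print(hexdump(f.read(512)))  # the first few hundred bytes are usually enough

print(hexdump(b"example \x00\x00\x01\x90 bytes"))
```

Readable runs in the ASCII column would correspond to text partition-key components (like the metric_name values above), which might help spot an impossibly long key without sharing the whole file.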