Re: KeyCache Harmless Error on Startup
Thanks Erick! I will check with the owners of this keyspace, hoping to find the culprit. If they don't come up with anything, is there a way to read the key cache file? (As I understand it, it's a binary file.)

On another note, there's actually another keyspace I forgot to point out, on which I found some weird behavior (not necessarily related, though):

CREATE KEYSPACE ks3 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks3.tbl4 (
    account_id text,
    consumer_phone_number text,
    channel text,
    event_time_stamp timestamp,
    brand_phone_number text,
    campaign_id bigint,
    engagement_id bigint,
    event_type text,
    PRIMARY KEY ((account_id, consumer_phone_number), channel, event_time_stamp)
);

When I select from this table, I get the following warning:

*cqlsh.py:395: DateOverFlowWarning: Some timestamps are larger than Python datetime can represent. Timestamps are displayed in milliseconds from epoch.*

I don't know if it's related, but it seemed worth pointing out.

 account_id | consumer_phone_number                | channel | event_time_stamp   | brand_phone_number               | campaign_id | engagement_id | event_type
------------+--------------------------------------+---------+--------------------+----------------------------------+-------------+---------------+------------
   12345678 | OIs1HXovJ9W/AJZI+Tm8CSCbAavdVI06qt0c |     sms | *1580305508799000* | PY0yHHItI9BibOtNis8hDuLwN91prPa+ |        null |          null |    opt-out

Thanks!

On Thu, Jan 30, 2020 at 1:43 AM Erick Ramirez wrote:

>> Specifically for the NegativeArraySizeException, what's happening is that
>> the keyLength is so huge that it blows up MAX_UNSIGNED_SHORT so it looks
>> like it's a negative value. Someone will correct me if I got that wrong but
>> the "Key length longer than max" error confirms that.
>
> Is it possible that you have a rogue metric_name value that's impossibly
> long? I'm a bit more convinced now that's what's happening because you said
> it happens on multiple servers which rules out local file corruption at the
> filesystem level. Cheers!
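Side note on the DateOverFlowWarning: a plausible (unconfirmed) explanation is that the writer stored microseconds in a millisecond field. Cassandra timestamps are milliseconds since the epoch, so 1580305508799000 read as milliseconds lands around the year 52050, past what Python's datetime can represent, while read as microseconds it is an ordinary date in January 2020. A quick plain-Python sketch, nothing Cassandra-specific:

```python
from datetime import datetime, timezone

raw = 1580305508799000  # the value cqlsh printed for event_time_stamp

# Read as milliseconds (Cassandra's unit), this is far past datetime.MAXYEAR
# (9999), which is exactly why cqlsh falls back to the raw display:
try:
    datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
    print("representable")
except (OverflowError, OSError, ValueError) as exc:
    print("overflow:", exc)

# Read as *microseconds* (e.g. if the client multiplied by 1000 once too
# often), it is a perfectly ordinary timestamp in January 2020:
print(datetime.fromtimestamp(raw / 1_000_000, tz=timezone.utc))
```

If that's what happened, the warning is a data-quality issue on the writing side and probably unrelated to the key cache errors.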
Re: KeyCache Harmless Error on Startup
> Specifically for the NegativeArraySizeException, what's happening is that
> the keyLength is so huge that it blows up MAX_UNSIGNED_SHORT so it looks
> like it's a negative value. Someone will correct me if I got that wrong but
> the "Key length longer than max" error confirms that.

Is it possible that you have a rogue metric_name value that's impossibly long? I'm a bit more convinced now that's what's happening because you said it happens on multiple servers, which rules out local file corruption at the filesystem level. Cheers!
Re: KeyCache Harmless Error on Startup
Oh, I just saw the example. Never mind. :)
Re: KeyCache Harmless Error on Startup
> Does anyone perhaps have an idea on what could've gone wrong here?
> Could it be just a calculation error on startup?

Specifically for the NegativeArraySizeException, what's happening is that the keyLength is so huge that it blows up MAX_UNSIGNED_SHORT, so it looks like it's a negative value. Someone will correct me if I got that wrong, but the "Key length longer than max" error confirms that.

That's why Jeff is interested in seeing a copy of the cache, so we can see what's in it if your keys are not sensitive/confidential. Maybe you can provide an obfuscated example to give us some clues? Cheers!
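To illustrate the failure mode, here is a toy Python sketch (not Cassandra's actual deserializer): the serialized key length is read as a signed 32-bit Java int, so garbage bytes with the high bit set come back negative (which in Java produces the NegativeArraySizeException when allocating the key buffer), and garbage that happens to stay positive but exceeds the unsigned-short cap trips the "Corrupted key cache" check instead:

```python
import struct

MAX_UNSIGNED_SHORT = 0xFFFF  # 65535, the cap quoted in the log message

def read_key_length(raw4: bytes) -> int:
    """Toy analogue of the length sanity checks when reading a saved key cache."""
    (key_length,) = struct.unpack(">i", raw4)  # big-endian signed 32-bit, like Java
    if key_length < 0:
        # Java's `new byte[key_length]` would throw NegativeArraySizeException here
        raise ValueError(f"negative key length {key_length}")
    if key_length > MAX_UNSIGNED_SHORT:
        raise ValueError(f"Corrupted key cache. Key length of {key_length} "
                         f"is longer than maximum of {MAX_UNSIGNED_SHORT}")
    return key_length

for raw in (struct.pack(">i", 42),          # a sane key length
            b"\x9d\x00\x00\x00",            # high bit set -> negative as a Java int
            struct.pack(">i", 489735494)):  # positive garbage, far over the cap
    try:
        print("key length:", read_key_length(raw))
    except ValueError as exc:
        print("rejected:", exc)
```

Both errors in the original post are consistent with the reader landing on bytes that were never a length field at all, which is why the file is discarded as "harmless": the cache is just rebuilt from scratch.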
Re: KeyCache Harmless Error on Startup
Hi again,

Does anyone perhaps have an idea on what could've gone wrong here? Could it be just a calculation error on startup?

Thanks!

On Sun, Jan 26, 2020 at 5:57 PM Shalom Sagges wrote:

> Hi Jeff,
>
> It is happening on multiple servers and even on different DCs.
> The schema contains two keyspaces as follows:
>
> CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
>
> CREATE TABLE ks1.tbl1 (
>     account_id text,
>     metric_name text,
>     day_of_month int,
>     hour_of_day int,
>     insert_time timestamp,
>     metric_value text,
>     PRIMARY KEY ((account_id, metric_name, day_of_month, hour_of_day), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
> CREATE TABLE ks1.tbl2 (
>     account_id text,
>     lambda_uuid uuid,
>     metric_name text,
>     day_of_month int,
>     insert_time timestamp,
>     hour_of_day int,
>     metric_value text,
>     PRIMARY KEY ((account_id, lambda_uuid, metric_name, day_of_month), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
> CREATE KEYSPACE ks2 WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
>
> CREATE TABLE ks2.tbl3 (
>     account_id text,
>     lambda_uuid uuid,
>     insert_time timestamp,
>     data text,
>     error_count int,
>     info_count int,
>     request_id text,
>     warn_count int,
>     PRIMARY KEY ((account_id, lambda_uuid), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
> I'm afraid I am unable to provide the actual key cache file, but for what
> it's worth, here's a data example from the above tables:
>
> *ks1.tbl1:*
> account_id:   12345678
> metric_name:  com.selfbuilds.faas.system.invocation-finished-successful
> day_of_month: 1
> hour_of_day:  10
> insert_time:  2020-01-01 10:59:03.588000+0000
> metric_value: {"invocationIdentifier":"51176dd4-aa34-4834-bcf5-7e3d5b5494d1","lambdaUUID":"ADD3A5B4-7497-499A-8364-A97108C7400B","event":"","invocationState":"SUCCEEDED","executionTimeInMs":3392}
>
> *ks1.tbl2:*
> account_id:   12345678
> lambda_uuid:  ed7f0629-8d5c-47dc-a66d-4a1197aa01bc
> metric_name:  com.selfbuilds.faas.system.invocation-triggered
> day_of_month: 9
> insert_time:  2020-01-09 13:03:06.653000+0000
> hour_of_day:  13
> metric_value: {"invocationIdentifier":"3631e9ea-b230-4c4d-a051-c7b4241ee231","lambdaUUID":"ED7F0629-8D5C-47DC-A66D-4A1197AA01BC","event":"","lambdaName":"1578574875000","lastModifierOfLambda":"FaaSAll"}
>
> *ks2.tbl3:*
> account_id:  12345678
> lambda_uuid: 9f4eaa4f-42de-4c6e-9c96-02a4749281f3
> insert_time: 2020-01-01 01:30:22.302000+0000
> data:        [{"timestamp":157784458,"level":"Info","message":"Best Logs in town","extras":[]}]
> error_count: 0
> info_count:  1
> request_id:  2966683f-cf37-4ea3-9d82-1de46207d51e
> warn_count:  0
>
> Thanks for your help on this one!
>
> On Thu, Jan 23, 2020 at 5:40 PM Jeff Jirsa wrote:
>
>> 489MB key seems huge. Unlikely it’s really that large.
>>
>> If it’s happening on multiple servers it’d be interesting to see the
>> schema and/or saved cache file if it’s not incredibly sensitive
>>
>> On Jan 23, 2020, at 3:48 AM, Shalom Sagges wrote:
>>
>> Hi All,
>>
>> Cassandra 3.11.4.
>> On one of our clusters, during startup, I see two types of [...]
Re: KeyCache Harmless Error on Startup
Hi Jeff,

It is happening on multiple servers and even on different DCs.
The schema contains two keyspaces as follows:

CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks1.tbl1 (
    account_id text,
    metric_name text,
    day_of_month int,
    hour_of_day int,
    insert_time timestamp,
    metric_value text,
    PRIMARY KEY ((account_id, metric_name, day_of_month, hour_of_day), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;

CREATE TABLE ks1.tbl2 (
    account_id text,
    lambda_uuid uuid,
    metric_name text,
    day_of_month int,
    insert_time timestamp,
    hour_of_day int,
    metric_value text,
    PRIMARY KEY ((account_id, lambda_uuid, metric_name, day_of_month), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;

CREATE KEYSPACE ks2 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks2.tbl3 (
    account_id text,
    lambda_uuid uuid,
    insert_time timestamp,
    data text,
    error_count int,
    info_count int,
    request_id text,
    warn_count int,
    PRIMARY KEY ((account_id, lambda_uuid), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;

I'm afraid I am unable to provide the actual key cache file, but for what it's worth, here's a data example from the above tables:

*ks1.tbl1:*
account_id:   12345678
metric_name:  com.selfbuilds.faas.system.invocation-finished-successful
day_of_month: 1
hour_of_day:  10
insert_time:  2020-01-01 10:59:03.588000+0000
metric_value: {"invocationIdentifier":"51176dd4-aa34-4834-bcf5-7e3d5b5494d1","lambdaUUID":"ADD3A5B4-7497-499A-8364-A97108C7400B","event":"","invocationState":"SUCCEEDED","executionTimeInMs":3392}

*ks1.tbl2:*
account_id:   12345678
lambda_uuid:  ed7f0629-8d5c-47dc-a66d-4a1197aa01bc
metric_name:  com.selfbuilds.faas.system.invocation-triggered
day_of_month: 9
insert_time:  2020-01-09 13:03:06.653000+0000
hour_of_day:  13
metric_value: {"invocationIdentifier":"3631e9ea-b230-4c4d-a051-c7b4241ee231","lambdaUUID":"ED7F0629-8D5C-47DC-A66D-4A1197AA01BC","event":"","lambdaName":"1578574875000","lastModifierOfLambda":"FaaSAll"}

*ks2.tbl3:*
account_id:  12345678
lambda_uuid: 9f4eaa4f-42de-4c6e-9c96-02a4749281f3
insert_time: 2020-01-01 01:30:22.302000+0000
data:        [{"timestamp":157784458,"level":"Info","message":"Best Logs in town","extras":[]}]
error_count: 0
info_count:  1
request_id:  2966683f-cf37-4ea3-9d82-1de46207d51e
warn_count:  0

Thanks for your help on this one!

On Thu, Jan 23, 2020 at 5:40 PM Jeff Jirsa wrote:

> 489MB key seems huge. Unlikely it’s really that large.
>
> If it’s happening on multiple servers it’d be interesting to see the
> schema and/or saved cache file if it’s not incredibly sensitive
>
> On Jan 23, 2020, at 3:48 AM, Shalom Sagges wrote:
>
> Hi All,
>
> Cassandra 3.11.4.
> On one of our clusters, during startup, I see two types of "Harmless
> error" notification regarding the keycache:
>
> *Server 1:*
>
> INFO [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 -
> *Harmless error reading saved cache* /path/to/saved_caches/KeyCache-e.db
> *java.lang.NegativeArraySizeException: null*
>     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>     at [...]
Re: KeyCache Harmless Error on Startup
489MB key seems huge. Unlikely it’s really that large.

If it’s happening on multiple servers it’d be interesting to see the schema and/or saved cache file if it’s not incredibly sensitive

> On Jan 23, 2020, at 3:48 AM, Shalom Sagges wrote:
>
> Hi All,
>
> Cassandra 3.11.4.
> On one of our clusters, during startup, I see two types of "Harmless error"
> notification regarding the keycache:
>
> Server 1:
>
> INFO [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 -
> Harmless error reading saved cache /path/to/saved_caches/KeyCache-e.db
> java.lang.NegativeArraySizeException: null
>     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>     at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:472)
>     at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> INFO [pool-3-thread-1] 2020-01-23 04:34:46,168 AutoSavingCache.java:174 -
> Completed loading (188 ms; 10916 keys) KeyCache cache
>
> Server 2:
>
> INFO [pool-3-thread-1] 2020-01-15 11:18:35,797 AutoSavingCache.java:263 -
> Harmless error reading saved cache /path/to/saved_caches//KeyCache-e.db
> java.io.IOException: Corrupted key cache. Key length of 489735494 is longer
> than maximum of 65535
>     at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:469)
>     at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>     at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> INFO [pool-3-thread-1] 2020-01-15 11:18:35,799 AutoSavingCache.java:174 -
> Completed loading (285 ms; 20681 keys) KeyCache cache
>
> The KeyCache size is a lot less than the max default (100MB). So does this
> mean that the key itself is the culprit?
>
> I'd really appreciate some guidance on what this message actually means in
> the logs during startup and what problems may arise from it.
> I see that one of the nodes experiences high CPU load and even gets kernel
> panic from time to time, so I want to understand if the above may cause
> such issues.
>
> Thanks a lot!
KeyCache Harmless Error on Startup
Hi All,

Cassandra 3.11.4.
On one of our clusters, during startup, I see two types of "Harmless error" notification regarding the keycache:

*Server 1:*

INFO [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 - *Harmless error reading saved cache* /path/to/saved_caches/KeyCache-e.db
*java.lang.NegativeArraySizeException: null*
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:472)
    at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
INFO [pool-3-thread-1] 2020-01-23 04:34:46,168 AutoSavingCache.java:174 - Completed loading (188 ms; 10916 keys) KeyCache cache

*Server 2:*

INFO [pool-3-thread-1] 2020-01-15 11:18:35,797 AutoSavingCache.java:263 - Harmless error reading saved cache /path/to/saved_caches//KeyCache-e.db
*java.io.IOException: Corrupted key cache. Key length of 489735494 is longer than maximum of 65535*
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:469)
    at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
INFO [pool-3-thread-1] 2020-01-15 11:18:35,799 AutoSavingCache.java:174 - Completed loading (285 ms; 20681 keys) KeyCache cache

The KeyCache size is a lot less than the max default (100MB). So does this mean that the key itself is the culprit?

I'd really appreciate some guidance on what this message actually means in the logs during startup and what problems may arise from it.
I see that one of the nodes experiences high CPU load and even gets kernel panic from time to time, so I want to understand if the above may cause such issues.

Thanks a lot!
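On the "is there a way to read the key cache file?" question raised elsewhere in the thread: the saved cache format is internal and version-specific, so short of Cassandra's own deserializer the practical option is a raw hex dump to eyeball the bytes and spot obvious garbage near the failing offset. A minimal sketch (pure byte viewer, it knows nothing about the cache format; the path below is the placeholder from the logs):

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / printable-ASCII columns, hexdump-style."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{off:08x}  {hexpart:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)

# Usage against a saved cache file (path is the placeholder from the logs):
# with open("/path/to/saved_caches/KeyCache-e.db", "rb") as f:
#     print(hexdump(f.read(512)))  # the first few hundred bytes are usually enough

print(hexdump(b"example \x00\x00\x01\x90 bytes"))
```

Readable runs in the ASCII column would correspond to text partition-key components (like the metric_name values above), which might help spot an impossibly long key without sharing the whole file.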