[
https://issues.apache.org/jira/browse/CASSANDRA-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789873#comment-17789873
]
Jon Haddad commented on CASSANDRA-19033:
My understanding of [JEP 158|https://openjdk.org/jeps/158] was that the logging
format was to be unified, which seems like it should reduce what we'd
potentially have to read in, but I'm not actually sure whether it unifies just
the configuration or the log format itself. I originally thought it was the
format, but as I look closer at it, I'm seeing there are quite a few options to
change the format itself. I need to compare the output formats in Java 11, 17
and 21 before I can say for certain.
We may want to consider writing to a C* table, which I like from the simplicity
standpoint, and realistically it would have a pretty trivial overhead. Rather
than reading from the logs, we could make the GCInspector write to the table
after a pause.
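To make the "write after a pause" idea concrete, here is a minimal sketch of capturing collections through the standard JMX GC notifications. It is not GCInspector's actual code: `GcPauseRecorder`, `GcEvent` and `record` are hypothetical names, and an in-memory list stands in for the table write. Note that the duration reported by the bean is the length of the collection, which for concurrent collectors is not necessarily pure stop-the-world time.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

/** Hypothetical recorder; a real implementation would write a table row instead. */
final class GcPauseRecorder implements NotificationListener {

    /** Illustrative row shape, loosely mirroring the columns discussed here. */
    static final class GcEvent {
        final long startTimeMillis; // wall-clock start of the collection (epoch millis)
        final long elapsedMillis;   // duration as reported by the JVM
        final String rawMessage;    // collector name plus cause

        GcEvent(long startTimeMillis, long elapsedMillis, String rawMessage) {
            this.startTimeMillis = startTimeMillis;
            this.elapsedMillis = elapsedMillis;
            this.rawMessage = rawMessage;
        }
    }

    // JVM start as epoch millis; GcInfo start times are relative to this.
    private final long jvmStartMillis = ManagementFactory.getRuntimeMXBean().getStartTime();
    private final List<GcEvent> events = new CopyOnWriteArrayList<>();

    /** Subscribes to every collector bean that can emit notifications. */
    void register() {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (bean instanceof NotificationEmitter) {
                ((NotificationEmitter) bean).addNotificationListener(this, null, null);
            }
        }
    }

    @Override
    public void handleNotification(Notification notification, Object handback) {
        if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                .equals(notification.getType())) {
            return;
        }
        GarbageCollectionNotificationInfo info =
            GarbageCollectionNotificationInfo.from((CompositeData) notification.getUserData());
        GcInfo gc = info.getGcInfo();
        record(gc.getStartTime(), gc.getDuration(), info.getGcName() + ": " + info.getGcCause());
    }

    /** Converts the JVM-relative start time to wall-clock time and stores the event. */
    void record(long startSinceJvmStartMillis, long elapsedMillis, String rawMessage) {
        events.add(new GcEvent(jvmStartMillis + startSinceJvmStartMillis, elapsedMillis, rawMessage));
    }

    /** Returns a snapshot of the recorded events. */
    List<GcEvent> events() {
        return new ArrayList<>(events);
    }
}
```

Calling `new GcPauseRecorder().register()` once at startup would be enough for events to start accumulating. Time to stop threads is not available from this notification, so that column would need another source.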
Thoughts?
Regarding schema, I think it's safe to assume that as a starting point,
we'd want to see the start time, elapsed time, time to stop threads, and the
entire raw message. Generational collectors will have information about eden,
survivor and old gen. Those could either be stored as JSON or as a map with a
UDT. I think the only data available there is the space usage, but I want to
check on newer versions as well as Shenandoah and ZGC to be sure. Regional
collectors like G1 are going to have specific information we'll want to include
as well.
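As a rough illustration of the "map" option for per-generation data: the JMX `GcInfo` object exposes `getMemoryUsageBeforeGc()` and `getMemoryUsageAfterGc()`, both keyed by memory pool name. The sketch below flattens those two maps into one map of pool name to used bytes before and after. `PoolUsageSummary` and `summarize` are hypothetical names, and pool names differ between collectors, so nothing here assumes a fixed set of keys.

```java
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that flattens per-pool usage into a map-friendly shape. */
final class PoolUsageSummary {

    /**
     * Returns pool name -> {usedBeforeBytes, usedAfterBytes} for every pool
     * present in both maps. Pools missing from either side are skipped.
     */
    static Map<String, long[]> summarize(Map<String, MemoryUsage> before,
                                         Map<String, MemoryUsage> after) {
        Map<String, long[]> summary = new TreeMap<>();
        for (Map.Entry<String, MemoryUsage> entry : before.entrySet()) {
            MemoryUsage afterUsage = after.get(entry.getKey());
            if (afterUsage == null) {
                continue;
            }
            summary.put(entry.getKey(),
                        new long[] { entry.getValue().getUsed(), afterUsage.getUsed() });
        }
        return summary;
    }
}
```

Each entry carries only space usage, which matches what I expect to be available, and the same shape could be serialized to JSON instead if that turns out to be easier to query.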
So far all I can say is that at a minimum we'd want the following:
{noformat}
CREATE TABLE gc_history (
    start_time timestamp PRIMARY KEY,
    total_elapsed_ms int,
    stop_thread_time int,
    raw_message text
);
{noformat}
Obviously the above is fairly minimal, but I think it would be pretty useful
even in its limited state. I'd be able to look at all the pauses in a
specific window of time, or find all pauses lasting longer than N ms, which are
the two types of queries I'd do most often. I could also see a simple tool
that renders a histogram of GC pause times over a specific window of time which
would be incredibly helpful even if it doesn't provide additional diagnostic
info.
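To show what those two queries and the histogram tool might look like against rows already read from such a table, here is a small sketch. `GcPauseQueries`, `Pause` and the method names are hypothetical, and the fixed-width buckets with a final overflow bucket are just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side helpers over rows read from a GC history table. */
final class GcPauseQueries {

    /** Minimal row: start time in epoch millis and total elapsed time in ms. */
    static final class Pause {
        final long startTimeMillis;
        final int totalElapsedMs;

        Pause(long startTimeMillis, int totalElapsedMs) {
            this.startTimeMillis = startTimeMillis;
            this.totalElapsedMs = totalElapsedMs;
        }
    }

    /** Pauses whose start time falls in [fromInclusive, toExclusive). */
    static List<Pause> inWindow(List<Pause> pauses, long fromInclusive, long toExclusive) {
        List<Pause> result = new ArrayList<>();
        for (Pause p : pauses) {
            if (p.startTimeMillis >= fromInclusive && p.startTimeMillis < toExclusive) {
                result.add(p);
            }
        }
        return result;
    }

    /** Pauses that lasted strictly longer than thresholdMs. */
    static List<Pause> longerThan(List<Pause> pauses, int thresholdMs) {
        List<Pause> result = new ArrayList<>();
        for (Pause p : pauses) {
            if (p.totalElapsedMs > thresholdMs) {
                result.add(p);
            }
        }
        return result;
    }

    /**
     * Counts pauses per bucket of bucketWidthMs. The last bucket also
     * collects everything that would fall beyond it.
     */
    static int[] histogram(List<Pause> pauses, int bucketWidthMs, int bucketCount) {
        int[] counts = new int[bucketCount];
        for (Pause p : pauses) {
            int index = Math.min(p.totalElapsedMs / bucketWidthMs, bucketCount - 1);
            counts[index]++;
        }
        return counts;
    }
}
```

Chaining `inWindow` and then `histogram` gives the "pause times over a specific window" view. In practice the window and threshold filters would ideally be pushed down to the table rather than applied client side.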
I'll try to get some examples of different log types this week so we can try to
break out other useful fields in the schema.
> Add virtual table with GC pause history
> ---
>
> Key: CASSANDRA-19033
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19033
> Project: Cassandra
> Issue Type: New Feature
> Components: Feature/Virtual Tables
> Reporter: Jon Haddad
> Priority: Normal
>
> We should be able to view GC pause history in a virtual table.
> I think the best approach here is to read from the GC logs. The format was
> unified in Java 9, and we've dropped older JVM support so I think this is
> reasonable. The benefits of using logs are that we can preserve it across
> restarts and we enable GC logs by default.
> The downside is people might not have GC logs configured and it seems weird
> that a feature would just stop working because logs aren't enabled. Maybe
> that's OK if we call it out, or error if people try to read from it and the
> logs aren't enabled. I think if someone disables -Xlog:gc then an error
> might be fine as I don't expect it to happen often. I think I lean towards
> this from a usability perspective, and Microsoft has a
> [project|https://github.com/microsoft/gctoolkit] to parse them, but I haven't
> used it so I'm not sure if it's suitable for us.
> At a minimum, pause time should be its own field so we can query for pauses
> over a specific threshold, but there may be other data we want to explicitly
> split out as well.