Stephen Mallette created CASSANDRA-15821:
--------------------------------------------

             Summary: Metrics Documentation Enhancements
                 Key: CASSANDRA-15821
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15821
             Project: Cassandra
          Issue Type: Improvement
          Components: Documentation/Website
            Reporter: Stephen Mallette


CASSANDRA-15582 covers the quality of metrics, and it was mentioned that 
reviewing and [improving 
documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst]
 around metrics would fall into that scope. Please consider the following 
analysis when determining what improvements to make here:

Please see [this 
spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing]
 that itemizes almost all of Cassandra's metrics and whether they are 
documented or not (and other notes).  That spreadsheet is "almost all" because 
there are some metrics that don't seem to initialize as part of Cassandra 
startup (I was able to trigger some to initialize, but how to trigger the rest 
was not immediately obvious). The missing metrics seem to be related to the 
following:

* ThreadPool metrics - only some initialize at startup; those that do are listed below
* Streaming Metrics
* HintedHandoff Metrics
* HintsService Metrics

Here are the ThreadPool scopes that get listed:

{code}
AntiEntropyStage
CacheCleanupExecutor
CompactionExecutor
GossipStage
HintsDispatcher
MemtableFlushWriter
MemtablePostFlush
MemtableReclaimMemory
MigrationStage
MutationStage
Native-Transport-Requests
PendingRangeCalculator
PerDiskMemtableFlushWriter_0
ReadStage
Repair-Task
RequestResponseStage
Sampler
SecondaryIndexManagement
ValidationExecutor
ViewBuildExecutor
{code}
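
A scope list like the one above could be rebuilt from a dump of JMX object names. The following is only a minimal sketch: it assumes the names follow the {{org.apache.cassandra.metrics:type=...,scope=...,name=...}} layout, the sample names are illustrative rather than an actual dump, and the helper names ({{parse_object_name}}, {{scopes_by_type}}) are hypothetical. The naive split on commas does not handle quoted property values.

```python
from collections import defaultdict


def parse_object_name(name):
    """Split a JMX object name string into (domain, {key: value}).

    Naive parser: assumes no quoted values containing ',' or '='.
    """
    domain, _, props = name.partition(":")
    pairs = (item.split("=", 1) for item in props.split(",") if "=" in item)
    return domain, {key.strip(): value.strip() for key, value in pairs}


def scopes_by_type(object_names):
    """Group the distinct 'scope' values under each metric 'type'."""
    grouped = defaultdict(set)
    for name in object_names:
        domain, props = parse_object_name(name)
        # Only Cassandra metric beans are of interest here.
        if domain != "org.apache.cassandra.metrics":
            continue
        if "type" in props and "scope" in props:
            grouped[props["type"]].add(props["scope"])
    return {metric_type: sorted(scopes) for metric_type, scopes in grouped.items()}


# Illustrative sample only; a real list would come from a JMX client.
sample_names = [
    "org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=GossipStage,name=ActiveTasks",
    "org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=GossipStage,name=PendingTasks",
    "org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks",
    "org.apache.cassandra.metrics:type=DroppedMessage,scope=READ_REQ,name=Dropped",
    "java.lang:type=Memory",
]
result = scopes_by_type(sample_names)
```

Grouping by {{type}} first keeps ThreadPool scopes separate from, say, DroppedMessage scopes, so the same dump can feed each section of the documentation.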

I noticed that Keyspace Metrics have this note: "Most of these metrics are the 
same as the Table Metrics above, only they are aggregated at the Keyspace 
level." I think I've isolated the Table metrics that are not available on 
Keyspace; specifically, they are:

{code}
BloomFilterFalsePositives
BloomFilterFalseRatio
BytesAnticompacted
BytesFlushed
BytesMutatedAnticompaction
BytesPendingRepair
BytesRepaired
BytesUnrepaired
CompactionBytesWritten
CompressionRatio
CoordinatorReadLatency
CoordinatorScanLatency
CoordinatorWriteLatency
EstimatedColumnCountHistogram
EstimatedPartitionCount
EstimatedPartitionSizeHistogram
KeyCacheHitRate
LiveSSTableCount
MaxPartitionSize
MeanPartitionSize
MinPartitionSize
MutatedAnticompactionGauge
PercentRepaired
RowCacheHitOutOfRange
RowCacheHit
RowCacheMiss
SpeculativeSampleLatencyNanos
SyncTime
WaitingOnFreeMemtableSpace
DroppedMutations
{code}

Someone with greater knowledge of this area might consider it worth the effort 
to see if any of these metrics should be aggregated to the keyspace level in 
case they were inadvertently missed. In any case, the documentation could now 
easily state which metric names to expect on Keyspace.
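
The Table-versus-Keyspace comparison described above amounts to a set difference over metric names. A minimal sketch follows, assuming the two name lists have already been collected (for example from JMX); the function name {{table_only_metrics}} is hypothetical and the inputs are a deliberately tiny, illustrative subset rather than the full lists.

```python
def table_only_metrics(table_metrics, keyspace_metrics):
    """Return metric names present on Table but absent on Keyspace, sorted."""
    return sorted(set(table_metrics) - set(keyspace_metrics))


# Hypothetical partial inputs for illustration only.
table_metrics = ["ReadLatency", "WriteLatency", "BloomFilterFalseRatio", "CompressionRatio"]
keyspace_metrics = ["ReadLatency", "WriteLatency"]

missing = table_only_metrics(table_metrics, keyspace_metrics)
print(missing)
```

Running the same difference in the other direction would show any Keyspace-only names, which may also be worth documenting.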

The DroppedMessage metrics have a much larger set of scopes than what is 
currently documented:

{code}
ASYMMETRIC_SYNC_REQ
BATCH_REMOVE_REQ
BATCH_REMOVE_RSP
BATCH_STORE_REQ
BATCH_STORE_RSP
CLEANUP_MSG
COUNTER_MUTATION_REQ
COUNTER_MUTATION_RSP
ECHO_REQ
ECHO_RSP
FAILED_SESSION_MSG
FAILURE_RSP
FINALIZE_COMMIT_MSG
FINALIZE_PROMISE_MSG
FINALIZE_PROPOSE_MSG
GOSSIP_DIGEST_ACK
GOSSIP_DIGEST_ACK2
GOSSIP_DIGEST_SYN
GOSSIP_SHUTDOWN
HINT_REQ
HINT_RSP
INTERNAL_RSP
MUTATION_REQ
MUTATION_RSP
PAXOS_COMMIT_REQ
PAXOS_COMMIT_RSP
PAXOS_PREPARE_REQ
PAXOS_PREPARE_RSP
PAXOS_PROPOSE_REQ
PAXOS_PROPOSE_RSP
PING_REQ
PING_RSP
PREPARE_CONSISTENT_REQ
PREPARE_CONSISTENT_RSP
PREPARE_MSG
RANGE_REQ
RANGE_RSP
READ_REPAIR_REQ
READ_REPAIR_RSP
READ_REQ
READ_RSP
REPAIR_RSP
REPLICATION_DONE_REQ
REPLICATION_DONE_RSP
REQUEST_RSP
SCHEMA_PULL_REQ
SCHEMA_PULL_RSP
SCHEMA_PUSH_REQ
SCHEMA_PUSH_RSP
SCHEMA_VERSION_REQ
SCHEMA_VERSION_RSP
SNAPSHOT_MSG
SNAPSHOT_REQ
SNAPSHOT_RSP
STATUS_REQ
STATUS_RSP
SYNC_REQ
SYNC_RSP
TRUNCATE_REQ
TRUNCATE_RSP
VALIDATION_REQ
VALIDATION_RSP
_SAMPLE
_TEST_1
_TEST_2
_TRACE
{code}

I suppose I may still be missing some metrics, as my knowledge of what's 
available is limited to what I can get from JMX after Cassandra initialization 
(and some initial starting commands) and what's in the documentation. If a 
metric is absent from both, then I won't know it's there.  Anyway, 
perhaps this issue can help build some discussion around the improvements that 
might be made given the analysis that has been provided so far.



