[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17704701#comment-17704701 ] Paulo Motta commented on CASSANDRA-16325: - Now that CASSANDRA-18304 is committed I rebased this patch and resubmitted CI: [!https://ci-cassandra.apache.org/job/Cassandra-devbranch/2392/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/2392/pipeline] Prepared patch is on [this branch|https://github.com/pauloricardomg/cassandra/tree/feature/CASSANDRA-16325]. Also includes a [commit|https://github.com/pauloricardomg/cassandra/commit/43e9f7c7932dbdb6b654be59fecd8300ccdbf2b] to log node configuration on dtests, which allowed to verify that the configuration is being properly set. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 5.0 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17681030#comment-17681030 ] Paulo Motta commented on CASSANDRA-16325: - build failed due to CASSANDRA-18197 Resubmitted CI: !https://ci-cassandra.apache.org/job/Cassandra-devbranch/2234/badge/icon! > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680808#comment-17680808 ] Paulo Motta commented on CASSANDRA-16325: - LGTM, nice work Isaac! Prepared patch for commit [here|https://github.com/pauloricardomg/cassandra/commit/49946329d05b736943678345c2e110ce315940bc] and submitted a final round of CI: * [https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2228/] (queued) ok to merge if CI looks good [~smiklosovic]? > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680378#comment-17680378 ] David Capwell commented on CASSANDRA-16325: --- Looked at the change and LGTM, looks really good now! > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678933#comment-17678933 ] Isaac Reath commented on CASSANDRA-16325: - [~paulo] I've updated the patch to include testing for compressed tables and {{stream_entire_sstables=true}}. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678348#comment-17678348 ] Josh McKenzie commented on CASSANDRA-16325: --- {quote}if processing of downstream streaming listeners turn out to be a bottleneck I think we should send the notifications on a separate thread versus adding special logic to batch metric updates to not make this overly complex. {quote} In the "how do we solve for this case" I 100% agree. From the "we have lots of metrics that are potentially contended and will be queried programmatically by users via vtables" angle / lens, I think there might be a {{NoSpamLogger}} analog worth pursuing here to socket and update metrics when something crosses a certain threshold; trade off memory for reduced contention in general. Definitely wouldn't block here on it but might be a nice defensive paradigm for us to make idiomatic as we add more metrics. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678015#comment-17678015 ] Paulo Motta commented on CASSANDRA-16325: - The new approach looks good to me. Right now, {{testMetricsUpdateIncrementallyWithRepairAndStreamingBetweenNodes}} and {{testMetricsUpdateIncrementallyWithRebuildAndStreamingBetweenNodes}} are only testing the uncompressed/partial stream path. I think we should extend these tests to also test the metrics are working correctly for compressed and {{stream_entire_sstables=true}} I don't expect these more frequent streamed bytes counter updates to have any noticeable impact on streaming performance, but if processing of downstream streaming listeners turn out to be a bottleneck I think we should send the notifications on a separate thread versus adding special logic to batch metric updates to not make this overly complex. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677968#comment-17677968 ] Isaac Reath commented on CASSANDRA-16325: - Thank you [~dcapwell]. I've now updated my PR to update metrics through a {{StreamEventListener}}, overall seems like a much better outcome. In testing I found that the {{CassandraCompressedStreamReader}} seemed to be reporting way more bytes than were written. I did some digging and found that {{cis.bytesRead()}} does not reset after each section is finished causing the delta calculation to be incorrect after we've completed writing a section of an SSTable. Moving that to outside of the {{for (SSTableReader.PartitionPositionBounds section : sections)}} loop made the metrics collection accurate. [~jmckenzie], I'm not sure if moving this logic to a StreamEventListener addresses your concerns about hammering the counters. Moving to a {{StreamEventListener}} makes a threshold counter a bit more complicated (since you will have 1:many relationship between {{StreamEventListener}} and peer, forcing us to track the deltas per peer). Per [~dcapwell] we get a progress event for every 64kb which may still be too often. I kept the patch simple for now but we could move to either a batch-based approach or having a thread to update the counters if you still feel this is too frequent. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676280#comment-17676280 ] David Capwell commented on CASSANDRA-16325: --- Merged to 4.1 and trunk, so events now include the delta > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675685#comment-17675685 ] Isaac Reath commented on CASSANDRA-16325: - This looks good to me. I left a comment about usage of {{bytesWritten}} in {{CassandraEntireSSTableStreamWriter}}. I might be wrong about that usage so I can look into it on my ticket. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17657008#comment-17657008 ] David Capwell commented on CASSANDRA-16325: --- [~isaacreath] mind reviewing the changes in https://github.com/apache/cassandra/pull/2066? Here is the commit https://github.com/apache/cassandra/pull/2066/commits/5e6fd378da0297f09566cf3aafaa20c7600d2249 Most usage actually had the delta... subset needed extra memory to retain previous "progress" > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17657003#comment-17657003 ] Isaac Reath commented on CASSANDRA-16325: - [~dcapwell] yeah that sounds good to me! > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656999#comment-17656999 ] David Capwell commented on CASSANDRA-16325: --- CASSANDRA-18110 is approved and ready to merge. If it would help this patch to move the deltas to the caller I am glad to do it as part of that work so I can remove the cache; let me know [~isaacreath] > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656998#comment-17656998 ] David Capwell commented on CASSANDRA-16325: --- bq. I think if we updated the ProgressInfo object to store the delta from the last time progress was called I just looked into this and it would require light caching to the callers, such as {code} diff --git src/java/org/apache/cassandra/db/streaming/CassandraCompressedStreamReader.java src/java/org/apache/cassandra/db/streaming/CassandraCompressedStreamReader.java index dda874ba40..b9f4566aac 100644 --- src/java/org/apache/cassandra/db/streaming/CassandraCompressedStreamReader.java +++ src/java/org/apache/cassandra/db/streaming/CassandraCompressedStreamReader.java @@ -79,6 +79,7 @@ public class CassandraCompressedStreamReader extends CassandraStreamReader writer = createWriter(cfs, totalSize, repairedAt, pendingRepair, format); String filename = writer.getFilename(); int sectionIdx = 0; +long read = 0; for (SSTableReader.PartitionPositionBounds section : sections) { assert cis.chunkBytesRead() <= totalSize; @@ -93,7 +94,10 @@ public class CassandraCompressedStreamReader extends CassandraStreamReader { writePartition(deserializer, writer); // when compressed, report total bytes of compressed chunks read since remoteFile.size is the sum of chunks transferred -session.progress(filename + '-' + fileSeqNum, ProgressInfo.Direction.IN, cis.chunkBytesRead(), totalSize); +long bytes = cis.chunkBytesRead(); +long delta = bytes - read; +read = bytes; +session.progress(filename + '-' + fileSeqNum, ProgressInfo.Direction.IN, bytes, delta, totalSize); } assert in.getBytesRead() == sectionLength; } {code} So, if the reason to avoid using a listener was tracking progress info, we can compute the delta at the event creation side (non look hard), I personally find that better than embedding your custom metrics listener as it gives benefits to other listeners (such as removing the cache in StreamingState) > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656997#comment-17656997 ] David Capwell commented on CASSANDRA-16325: --- .bq I think if we updated the ProgressInfo object to store the delta from the last time progress was called That would be a good and welcome change... my patch stores a cache of String -> Long just so I can compute the delta... if streaming could do this for me then I can lower the memory of the listener. .bq The only thought I have about moving the counters into the StreamingSession class is if there is a feature flag around per-file progress. As a user, I would still want metrics to report if that flag is enabled. There is a feature flag to disable the stream stats listener, so it makes complete sets not to couple it with that listener; I agree. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656990#comment-17656990 ] Isaac Reath commented on CASSANDRA-16325: - Yes [~dcapwell] this seems to be solving something very similar to what's happening in StreamingState::streamProgress in your patch. Looking at these lines in your patch: [https://github.com/apache/cassandra/pull/2066/files#diff-2c11c59ea199afd7097d89eece180d6ce6f9ee7568ffb141994f5dff9d9c2d27R267-R269] it looks very similar to the approach I initially took in this patch (see: [https://github.com/apache/cassandra/pull/2058#discussion_r1061577063).] We changed our approach to use this different one to avoid needing to keep track of the incoming / outgoing ProgressInfo objects (per [~smiklosovic]'s concerns). I think if we updated the ProgressInfo object to store the delta from the last time progress was called, then we could avoid having to track ProgressInfo objects and still leverage the ProgressEvent. From what I can tell, we have that information most of the time we call StreamSession.progress. The only thought I have about moving the counters into the StreamingSession class is if there is a feature flag around per-file progress. As a user, I would still want metrics to report if that flag is enabled. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656768#comment-17656768 ] David Capwell commented on CASSANDRA-16325: --- Saw this due to CASSANDRA-18110 and the vtable work. Looking at the patch I have 2 main questions: 1) why embed into the calls when we get progress events providing the same details, just batched for every 64k bytes? Using the stream listens we have would resolve [~jmckenzie] comments as they are every 64k; this also drastically shrinks the size and scope of this patch as its just adding a single listener and listening to the FILE_PROGRESS event. 2) it seems that most of this work is to boil down to calling 2 lines: https://github.com/apache/cassandra/pull/2058/files#diff-bc79ba735741b3d997c735ad38ea4793d079d3ca2f2162412a422f26b3efdb40R99-R100. This work has a lot of overlap with the stream vtable which does the same work, just using the events rather than embedding into the call sites; should we move these counters there? > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656759#comment-17656759 ] Stefan Miklosovic commented on CASSANDRA-16325: --- If you want to introduce system property, please do not forget to put it into CassandraRelevantProperties. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656757#comment-17656757 ] Paulo Motta commented on CASSANDRA-16325: - bq. ISTM we might benefit from a simple threshold counter inside CassandraStreamWriter where it'll update the metrics every N multiples of CassandraStreamWriter.DEFAULT_CHUNK_SIZE so we don't spam the metrics updates. Makes sense, but I think we should keep this metric buffering logic inside {{FileStreamMetricsListener}} so it's transparent to different readers/writers. We should also probably make the threshold configurable via a system property. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656750#comment-17656750 ] Josh McKenzie commented on CASSANDRA-16325: --- I'm a little concerned about us hammering those Counters every time we transfer any date: {code:java} // stream each of the required sections of the file for (SSTableReader.PartitionPositionBounds section : sections) { long start = validator == null ? section.lowerPosition : validator.chunkStart(section.lowerPosition); // if the transfer does not start on the valididator's chunk boundary, this is the number of bytes to offset by int transferOffset = (int) (section.lowerPosition - start); if (validator != null) validator.seek(start); // length of the section to read long length = section.upperPosition - start; // tracks write progress long bytesRead = 0; while (bytesRead < length) { int toTransfer = (int) Math.min(bufferSize, length - bytesRead); long lastBytesRead = write(proxy, validator, out, start, transferOffset, toTransfer, bufferSize); start += lastBytesRead; bytesRead += lastBytesRead; progress += (lastBytesRead - transferOffset); session.progress(sstable.descriptor.filenameFor(Component.DATA), ProgressInfo.Direction.OUT, progress, totalSize); fileStreamMetricsListener.onStreamingBytesTransferred(progress); // REVIEWER NOTE: This is where we're going to update the counters on each call transferOffset = 0; }{code} ISTM we might benefit from a simple threshold counter inside {{CassandraStreamWriter}} where it'll update the metrics every N multiples of {{CassandraStreamWriter.DEFAULT_CHUNK_SIZE}} so we don't spam the metrics updates. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656738#comment-17656738 ] Abe Ratnofsky commented on CASSANDRA-16325: --- Wanted to flag potential overlap with CASSANDRA-18110, which also has a PR up right now. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.2 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656504#comment-17656504 ] Stefan Miklosovic commented on CASSANDRA-16325: --- [~isaacreath] thanks for the cooperation! I left couple last comments on the PR. When resolve I am +1. [~paulo] would you mind to take care of the eventual merging, please? > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Time Spent: 10h 10m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654491#comment-17654491 ] Stefan Miklosovic commented on CASSANDRA-16325: --- Ive put some comments in. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Time Spent: 20m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654206#comment-17654206 ] Paulo Motta commented on CASSANDRA-16325: - Submitted CI with updated test: [https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2156/] I think this should be good for a second round of review. [~blerer] [~yifanc] [~stefan.miklosovic] would you be interested in reviewing this? Thanks > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Time Spent: 20m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654203#comment-17654203 ] Paulo Motta commented on CASSANDRA-16325: - The approach looks good to me, but the test was not working in my computer because streaming was completed by the time we checked [if the streaming metrics were updated|https://github.com/apache/cassandra/pull/2058/commits/b217c700892ac1164ebcee0e4333035c1a97ec33#diff-cfc5f5318c05f7dafd83a877597fd56afdadd78e5a48f0d2f9f228289655d01aR183]. Furthermore, since we only [checked that the streaming metrics were updated once|https://github.com/apache/cassandra/pull/2058/commits/b217c700892ac1164ebcee0e4333035c1a97ec33#diff-cfc5f5318c05f7dafd83a877597fd56afdadd78e5a48f0d2f9f228289655d01aR225] this did not guarantee metrics are updated incrementally since this check could be performed *after* the streaming is fully completed. In order to make this a bit more robust, I did the following changes to {{{}testMetricsUpdateIncrementally{}}}: * Update cluster size to 2 and RF=2, since this will make node1 transfer all of its data to node2 during streaming making it easier to reason about test state * Create single 10MB sstable on node1 with compression disabled, so we can ensure streaming of a single file will update streaming metrics incrementally * Check that 3 or more incoming/outgoing bytes are reported during streaming from node1 to node2 * Before, only 2 values are reported: [0, total_streamed_size] * Check that final transferred bytes matches expected sstable size (10MB) * Test both peer incoming/outgoing metrics as well as total incoming/outgoing metrics I added log statements on the updated test and these are the reported bytes before and after this patch: BEFORE: {noformat} OUTGOING [0, 10508890] INCOMING [0, 10508890] {noformat} AFTER {noformat} OUTGOING [0, 65536, 131072, 196608, 262144, 327680, 393216, 458752, 524288, 589824, 655360, 720896, 786432, 851968, 917504, 983040, 1048576, 1114112, 1179648, 1245184, 1310720, 1376256, 1441792, 1507328, 1572864, 1638400, 1703936, 1769472, 1835008, 1900544, 1966080, 2031616, 2097152, 2162688, 2228224, 2293760, 2359296, 2424832, 2490368, 2555904, 2621440, 2686976, 2752512, 2818048, 2883584, 2949120, 3014656, 3080192, 3276800, 3342336, 3538944, 3604480, 3670016, 3735552, 3801088, 3866624, 3932160, 3997696, 4063232, 4128768, 4194304, 4259840, 4325376, 4390912, 4456448, 4521984, 4587520, 4653056, 4718592, 4784128, 4849664, 4915200, 4980736, 5046272, 5111808, 5177344, 5242880, 5308416, 5373952, 5439488, 5505024, 5570560, 5636096, 5701632, 5767168, 5832704, 5898240, 5963776, 6029312, 6094848, 6160384, 6225920, 6291456, 6356992, 6422528, 6488064, 6553600, 6619136, 6684672, 6750208, 6815744, 6881280, 6946816, 7012352, 7077888, 7143424, 7208960, 7274496, 7340032, 7405568, 7471104, 7536640, 7602176, 7667712, 7733248, 7798784, 7864320, 7929856, 7995392, 8060928, 8126464, 8192000, 8257536, 8323072, 8388608, 8454144, 8519680, 8585216, 8650752, 8716288, 8781824, 8847360, 8912896, 8978432, 9043968, 9109504, 9175040, 9240576, 9306112, 9371648, 9437184, 9502720, 9568256, 9633792, 9699328, 9764864, 9830400, 9895936, 9961472, 10027008, 10092544, 10158080, 10223616, 10289152, 10354688, 10420224, 10485760, 10508890] INCOMING [0, 1051, 4204, 12611, 22070, 28372, 31525, 36777, 44134, 48338, 55693, 60948, 65151, 67253, 73559, 78814, 89323, 97730, 106137, 112443, 119799, 128207, 132408, 134510, 138714, 149224, 156581, 164986, 173394, 182852, 196512, 207022, 217530, 226987, 238546, 248004, 255360, 261666, 266919, 267970, 280581, 297395, 316313, 329973, 347837, 356243, 365702, 385669, 393026, 403534, 423497, 443465, 458179, 480245, 499163, 520182, 527538, 545403, 570623, 589538, 597945, 616862, 642081, 655743, 679915, 706187, 727205, 754527, 783954, 797617, 821786, 851212, 868025, 909010, 921620, 949993, 982571, 1009893, 1042470, 1063489, 1089762, 1090813, 1113932, 1141256, 1179088, 1210615, 1244242, 1273666, 1310450, 1351438, 1395577, 1440761, 1494357, 1541647, 1572125, 1605756, 1614161, 1615212, 1638331, 1658299, 1680367, 1703488, 1727652, 1762330, 1775993, 1806466, 1821180, 1837994, 1873722, 183, 1924162, 1947283, 1965147, 1988268, 2026099, 2048168, 2083899, 2097561, 2128036, 2152207, 2162716, 2191090, 2214208, 2227869, 2250988, 2285667, 2293024, 2306687, 2335064, 2359233, 2414932, 2424390, 2425441, 2476934, 2489546, 2528430, 2555754, 2577823, 2613554, 2620909, 2652436, 2686065, 2687116, 2728101, 2752266, 2756469, 2804811, 2817422, 2818473, 2859457, 2882576, 2905696, 2942476, 2948781, 2959291, 3013940, 3046520, 3080149, 3096963, 3138998, 3170524, 3211505, 3229370, 3230421, 3276657, 3330250, 3377539, 3445847, 3501549, 3538331, 3579316, 3632911, 3699116, 3734848, 3801057, 3843094, 3866214, 3867265, 3899844, 3931368, 3980761, 3997575, 4017543, 4062734, 4098462, 4127887, 4166768,
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649444#comment-17649444 ] Isaac Reath commented on CASSANDRA-16325: - Created: [https://github.com/apache/cassandra/pull/2058.] The approach we've taken is to update the metrics every time StreamSession::progress by the number of bytes streamed for a file since the last time progress was called. For testing, we've updated the StreamingMetrics distributed test to kick off a repair on a separate thread and check that the streaming metrics are updated before the repair completes. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Time Spent: 20m > Remaining Estimate: 0h > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648819#comment-17648819 ] Isaac Reath commented on CASSANDRA-16325: - Picking up where Dejan left off to add some tests. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Dejan Gvozdenac >Priority: Normal > Labels: lhf > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527654#comment-17527654 ] Dejan Gvozdenac commented on CASSANDRA-16325: - https://github.com/apache/cassandra/pull/1589 > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Dejan Gvozdenac >Priority: Normal > Labels: lhf > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503042#comment-17503042 ] Paulo Motta commented on CASSANDRA-16325: - Hi [~dgvozdenac]. Thanks for taking this up. It seems like a similar issue happens with compaction metrics. There was some code changes in these areas in recent versions. I think a good start would be to inspect the code in {{trunk}} to check where streaming and compaction size metrics are updated, and if they're being done incrementally or only at the end of the operation. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Dejan Gvozdenac >Priority: Normal > Labels: lhf > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, what doesn't represent the current number of > bytes streamed since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org