wypoon edited a comment on pull request #31451:
URL: https://github.com/apache/spark/pull/31451#issuecomment-810480565


   > > Can you please explain why you changed from passing a completion 
function to `DataSourceRDD` to passing a `Map[String, SQLMetric]`? What is the 
benefit?
   > 
   > It fits better with the current approach of using `SQLMetric`, and it looks clearer to me. We now update custom metrics while consuming the data, instead of at the completion of data consumption.
   
   Thanks for the explanation. This sounds like a change from the API discussed in https://github.com/apache/spark/pull/31476. IIUC, before, the expectation was that `PartitionReader#currentMetricsValues()` is called once, after the partition has been read. Now, the expectation is that `PartitionReader#currentMetricsValues()` is called for every row we iterate through in the reader. Such an expectation should be documented clearly in the API for implementors of custom metrics.

