eejbyfeldt commented on PR #38428:
URL: https://github.com/apache/spark/pull/38428#issuecomment-1380312253
> The PR as such looks reasonable to me - can we add a test to explicitly test for EOF behavior?

@mridulm I added a spec for this in: https://github.com/apache/spark/pull/38428/commits/77e616a910cbe7330c612e7ae8a34707c3bc8fb1

> Could you add a benchmark for your specific cases (lots of small streams)?

I added a benchmark showing that, on current master, using `asIterator.toArray` has overhead compared to just reading the expected number of elements, and that this overhead goes away in this branch. I first committed the results from master with only the benchmark added (branch: https://github.com/eejbyfeldt/spark/tree/SPARK-40912-only-adding-benchmark) in https://github.com/apache/spark/pull/38428/commits/75806338f9f56cb4ef9bbac4035e825ef07b5ab8 and then overwrote them with the results from this branch in https://github.com/apache/spark/pull/38428/commits/bc011c639619be65656b081edf2e1f8bcd91be44
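To make the overhead concrete, here is a minimal sketch in plain Java using only `java.io`. It is not Spark's `DeserializationStream` or Kryo code, and the class and method names are hypothetical. It contrasts two ways of draining a small stream. The first reads until an `EOFException` signals the end, so every stream pays for creating and catching one exception; this assumes, as the benchmark suggests, that the iterator-until-EOF path detects end of stream this way. The second reads a known number of elements and never reads past the end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration (not Spark code): EOF-driven vs. count-driven reads. */
public class EofReadSketch {

    // Serialize ints into a small in-memory stream, standing in for one of many small streams.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Pattern 1: read until the stream is exhausted. End of stream is only
    // discovered when readInt throws EOFException, so each stream costs one exception.
    static int[] readUntilEof(byte[] data) {
        List<Integer> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                try {
                    result.add(in.readInt());
                } catch (EOFException e) {
                    break; // end of stream reached
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int[] out = new int[result.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = result.get(i);
        }
        return out;
    }

    // Pattern 2: the caller already knows how many elements to expect,
    // so no read past the end (and no exception) is needed.
    static int[] readKnownCount(byte[] data, int count) {
        int[] out = new int[count];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                out[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = encode(new int[] {1, 2, 3});
        System.out.println(Arrays.toString(readUntilEof(data)));
        System.out.println(Arrays.toString(readKnownCount(data, 3)));
    }
}
```

Both methods return the same elements. The difference is only in how the end is found, which is why the cost shows up most clearly when there are lots of small streams and the per-stream exception is not amortized.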
