Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2882#issuecomment-60040386
This is nice work overall. I like the thorough tests, especially the
decoupling of the writer / reader tests so that you can test the components
separately and as part of the complete log manager system. I left a few
comments on the code, but I have a couple of high-level questions, too:
1. When is it safe to rotate / delete old logs? In general, it seems like
safe log compaction / deletion is application-specific and that a simple
time-based mechanism might be unsafe.
2. What would happen if Spark Streaming crashed, stayed down for some multiple
of the threshold time, then recovered? Would we read this old portion of the
log or would it be deleted / ignored?
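To make the concern concrete, here is a minimal sketch (hypothetical names, not the actual API in this PR) of a purely time-based segment cleanup, showing how it could delete segments a recovering driver still needs:

```scala
// Hypothetical sketch of time-based write-ahead-log cleanup.
// LogSegment and segmentsToDelete are illustrative names, not Spark's API.
case class LogSegment(path: String, lastModifiedMs: Long)

def segmentsToDelete(segments: Seq[LogSegment],
                     nowMs: Long,
                     thresholdMs: Long): Seq[LogSegment] =
  // Purely age-based: any segment older than the threshold is selected
  // for deletion, regardless of whether its records were ever replayed.
  segments.filter(s => nowMs - s.lastModifiedMs > thresholdMs)

// If the driver stays down longer than thresholdMs, every segment written
// before the crash looks "old" when it comes back up, so this policy would
// discard exactly the data that recovery needs to replay.
```

This is why tying deletion to an application-level signal (e.g. "these records have been checkpointed / processed") rather than to wall-clock age seems necessary for safety.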