HeartSaVioR edited a comment on issue #27694: [SPARK-30946][SS] Serde entry with UnsafeRow on FileStream(Source/Sink)Log with LZ4 compression
URL: https://github.com/apache/spark/pull/27694#issuecomment-590903965

Honestly, I have been thinking about larger changes, such as:

* avoid rewriting all entries on every compact operation
* support retention (ideally designed together with the item above, since we won't be able to read all existing entries on a compact operation)
* use a tree structure, or at least two kinds of entry, "directory" and "file", to greatly reduce the path strings stored in entries
* streamline the compaction: instead of loading all entries before doing the next operation, loop over "load an entry -> transform/filter -> store the entry if not filtered out". This would help reduce driver memory usage during compaction.

However, I would like to prioritize by (fewer changes & bigger impact) and make changes incrementally. This patch makes the smallest change but has a great impact on performance. The items above are orthogonal to this improvement, so they can be addressed on demand later.
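To illustrate the streamlined compaction idea ("load an entry -> transform/filter -> store the entry if not filtered out"), here is a minimal sketch in Python. It is not Spark code: the function names (`read_entries`, `compact_streaming`, `drop_deleted`) are hypothetical, and the entry layout (one JSON object per line with `path` and `action` fields) is an assumption made only for illustration. The point it shows is that only one entry is held in memory at a time.

```python
import io
import json
from typing import Callable, Iterator, Optional, TextIO

Entry = dict  # assumed entry shape: {"path": ..., "action": ...}


def read_entries(src: TextIO) -> Iterator[Entry]:
    """Lazily yield one entry per non-empty line of the source log."""
    for line in src:
        line = line.strip()
        if line:
            yield json.loads(line)


def compact_streaming(
    src: TextIO,
    dst: TextIO,
    transform: Callable[[Entry], Optional[Entry]],
) -> int:
    """Copy entries from src to dst one at a time.

    `transform` returns the (possibly modified) entry to keep,
    or None to filter the entry out. Returns the number kept.
    """
    kept = 0
    for entry in read_entries(src):      # load an entry
        result = transform(entry)        # transform/filter
        if result is None:
            continue
        dst.write(json.dumps(result, sort_keys=True) + "\n")  # store
        kept += 1
    return kept


def drop_deleted(entry: Entry) -> Optional[Entry]:
    """Hypothetical filter: discard entries marked as deleted."""
    return None if entry.get("action") == "delete" else entry


# Usage: in-memory streams stand in for the old and new compact files.
source = io.StringIO(
    '{"path": "/out/part-0", "action": "add"}\n'
    '{"path": "/out/part-1", "action": "delete"}\n'
    '{"path": "/out/part-2", "action": "add"}\n'
)
target = io.StringIO()
kept = compact_streaming(source, target, drop_deleted)
```

Because `read_entries` is a generator, memory use stays roughly constant regardless of how many entries the log holds, which is the driver-memory benefit described in the last bullet.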
