shahidki31 edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS
URL: https://github.com/apache/spark/pull/26821#issuecomment-564417301

> From what I tested locally, filtering by lines roughly takes about 30 seconds for file sizes ranging from 10MB to 400MB, while skipping by bytes only takes 2 ms for a 400MB file.

Hi @oopDaniel, I actually tried both approaches, and skipping bytes seems more complicated because we need to handle more edge cases. I also tested this PR with a 2GB event log file, and I think loading the UI took around 2 seconds (?), including filtering and replaying. That said, it is not difficult to add byte skipping: all we need to do is track a bytes-read parameter instead of a lines-read parameter and handle the edge cases.

@HeartSaVioR I think the approach you are working on is great, as it handles restarting the SHS. But if we can review this PR for incremental parsing first, extending it to snapshotting would be easier, I guess. If there is a working PR for that, I can close this one.
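
To make the two resume strategies discussed above concrete, here is a minimal Python sketch. It is not the code from this PR or from Spark; the function names `read_from_line` and `read_from_offset` are hypothetical, and the log is assumed to be newline-delimited with one event per line. The byte-based version also shows one of the edge cases mentioned: a trailing line that the writer has not finished yet.

```python
import io


def read_from_line(stream, lines_already_read):
    """Line-based resume: re-scan from the start and discard replayed lines.

    Cost grows with the whole file, because every earlier line is read again.
    """
    new_lines = []
    for index, line in enumerate(stream):
        if index >= lines_already_read:
            new_lines.append(line)
    return new_lines


def read_from_offset(stream, bytes_already_read):
    """Byte-based resume: seek straight to the saved offset.

    Returns the complete new lines and the offset to store for next time.
    A trailing partial line (no newline yet) is left unread so that it is
    picked up whole on the next call.
    """
    stream.seek(bytes_already_read)
    new_lines = []
    offset = bytes_already_read
    for line in stream:
        if not line.endswith(b"\n"):
            break  # incomplete line: the writer is still appending
        new_lines.append(line)
        offset += len(line)
    return new_lines, offset


# Hypothetical log contents: two complete events and one partial event.
first_event = b'{"Event":"A"}\n'
log_bytes = first_event + b'{"Event":"B"}\n' + b'{"Event":"C'

by_line = read_from_line(io.BytesIO(log_bytes), 1)
by_offset, next_offset = read_from_offset(io.BytesIO(log_bytes), len(first_event))
```

In this sketch the line-based reader also returns the partial last line, while the byte-based reader stops before it and reports the offset where the next read should begin. A real implementation would additionally need to deal with compressed logs, where a plain seek on the underlying file is not enough.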
