[GitHub] [spark] shahidki31 edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS

GitBox Tue, 10 Dec 2019 23:44:47 -0800

URL: https://github.com/apache/spark/pull/26821#issuecomment-564417301
 
 
   > From what I tested locally, filtering by lines roughly takes about 30 
seconds for file sizes ranging from 10MB to 400MB, while skipping by bytes only 
takes 2 ms for a 400MB file.
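The gap in the quoted numbers comes from how each strategy finds the resume point. The sketch below is a hypothetical Python illustration, not the SHS code (which is Scala), and the function names are made up. Resuming by line count has to read and discard every earlier line, so its cost grows with the file. Resuming by byte offset is a single `seek`. Both functions assume the log contains only complete, newline-terminated lines.

```python
import itertools


def resume_by_lines(path, lines_read):
    """Hypothetical: skip `lines_read` lines by scanning them (cost grows with file size)."""
    with open(path, "rb") as f:
        # islice still reads every skipped line to locate the newlines
        new_lines = list(itertools.islice(f, lines_read, None))
    return new_lines, lines_read + len(new_lines)


def resume_by_bytes(path, bytes_read):
    """Hypothetical: jump directly to `bytes_read` with one seek (constant cost to resume)."""
    with open(path, "rb") as f:
        f.seek(bytes_read)
        new_lines = f.readlines()
        # the new offset is wherever reading stopped
        return new_lines, f.tell()
```

Both calls return the same new events. Only the work needed to reach them differs.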
   
   Hi @oopDaniel, I actually tried both approaches, and skipping bytes seems
more complicated because there are more edge cases to handle. I also tested
this PR with a 2GB event log file, and I think loading the UI took around
2 seconds (?), including filtering and replaying. That said, adding byte
skipping is not difficult: all we need to do is track a bytes-read parameter
instead of the lines-read parameter and handle the edge cases.

   @HeartSaVioR I think the approach you are taking is great, since it handles
SHS restarts. But if we can review this PR for incremental parsing first, I
guess extending it to snapshotting would be easier. If there is already a
working PR for that, I can close this one.
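The comment does not list the edge cases of the bytes-read approach. As an assumption, two that commonly come up when tailing an append-only log are a final line that is still being written and a file that is shorter than the saved offset because it was replaced or truncated. A hypothetical sketch of handling both (the helper name and the reset-to-zero policy are illustrative, not taken from the PR):

```python
import os


def read_complete_lines(path, offset):
    """Hypothetical helper: return (complete_new_lines, new_offset) for a growing log."""
    if offset > os.path.getsize(path):
        # file was truncated or replaced: re-parse from the start
        # (a real caller would also have to discard previously replayed state)
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # only consume up to the last newline; a partial trailing line is left
    # for the next call (end == 0 when no complete line is available yet)
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].splitlines(keepends=True)
    return lines, offset + end
```

Advancing the offset only past complete lines means a half-written JSON event is never parsed, at the cost of re-reading that fragment on the next call.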


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

