[GitHub] [spark] HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS
HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS
URL: https://github.com/apache/spark/pull/26821#issuecomment-564422644

@shahidki31 No, I didn't intend to persuade you to close this. I just wanted to make sure we have a clear picture of the full implementation before dealing with each part, but it's OK with me if you'd like to proceed with the current solution, as I think I can handle extending it with snapshotting. I can take a look at the current solution, but you still need to persuade at least one committer to push this forward.

Btw, we'd better clarify the performance test in detail. It should include at least:

* size of the event log file for the initial load
* elapsed time for the initial load
* count/size of the additional events (mostly the size)
* elapsed time for loading the additional events

(And of course it would be nicer if you could experiment across a matrix of conditions, with at least a couple of tests varying the size of the event log file. As you said, a huge event log file isn't just a couple of GBs; it's tens of GBs.)

To me, the statement in the PR description reads as though skipping (via read and drop) 2 GB takes around 2 seconds, which is still not ideal (as we know how to do it better), though I agree that's still a huge improvement.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
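The four measurements requested in the comment above (initial size, initial load time, size of the added events, and incremental load time) could be collected with a small harness like the one below. This is only a sketch in plain Python, not Spark or SHS code: the JSON-lines event format, the `Hypothetical` event name, and every function name here are assumptions made for illustration. It also contrasts the two ways of reaching the previously parsed offset that the comment alludes to, reading and dropping bytes versus seeking directly, without claiming that either matches what the PR actually does.

```python
import json
import os
import tempfile
import time


def write_events(path, start, count):
    """Append `count` made-up JSON-lines events; return the bytes added."""
    before = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as f:
        for i in range(start, start + count):
            line = json.dumps({"Event": "Hypothetical", "id": i}) + "\n"
            f.write(line.encode("utf-8"))
    return os.path.getsize(path) - before


def parse_with_seek(path, offset=0):
    """Jump straight to `offset`, parse the rest, return (events, end offset)."""
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            events.append(json.loads(line))
        return events, f.tell()


def parse_with_read_and_drop(path, offset, chunk=1 << 20):
    """Reach `offset` by reading and discarding bytes, then parse the rest."""
    events = []
    with open(path, "rb") as f:
        remaining = offset
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:  # file shorter than expected
                break
            remaining -= len(data)
        for line in f:
            events.append(json.loads(line))
        return events, f.tell()


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


def run_benchmark(initial_count, added_count):
    """Measure one cell of the matrix: initial load, then incremental load."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "eventlog")
        initial_size = write_events(path, 0, initial_count)
        (initial, offset), t_initial = timed(parse_with_seek, path, 0)

        added_size = write_events(path, initial_count, added_count)
        (by_seek, _), t_seek = timed(parse_with_seek, path, offset)
        (by_drop, _), t_drop = timed(parse_with_read_and_drop, path, offset)

        return {
            "initial_size_bytes": initial_size,
            "initial_events": len(initial),
            "initial_load_secs": t_initial,
            "added_size_bytes": added_size,
            "added_events_seek": len(by_seek),
            "added_events_drop": len(by_drop),
            "incremental_seek_secs": t_seek,
            "incremental_drop_secs": t_drop,
        }


if __name__ == "__main__":
    # Vary the initial size to cover more than one point in the matrix.
    for n in (10_000, 100_000):
        print(run_benchmark(n, 1_000))
```

Running the loop over several initial sizes gives one row per size, which is the "couple of tests around the size of event log file" the comment asks for. Timings from a local temporary file say nothing about HDFS or compressed logs, so the numbers are only meaningful relative to each other.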
[GitHub] [spark] HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS
HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS
URL: https://github.com/apache/spark/pull/26821#issuecomment-563502972

Thanks for cc'ing me, @dongjoon-hyun. I'll take a look.

Btw, I think we have another JIRA issue for supporting incremental parsing, [SPARK-28870](https://issues.apache.org/jira/browse/SPARK-28870), which has a broader goal: to run with any implementation of KVStore. At first glance, this patch could cover SPARK-29261, and together with SPARK-29111 it may resolve SPARK-28870 altogether, though we struggled with the details previously, so I need some time to go through it in depth.

@shahidki31 I assume you've been following the previous discussions and efforts that @Ngone51 and I, along with @vanzin and @squito, have made (#25577 and #25943, and the relevant Google docs in the relevant JIRA issues/PRs). If not, it would be worth going through them, as we've discussed this in detail.