[GitHub] [spark] HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS

2019-12-11 Thread GitBox
HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support 
Incremental parsing of event logs in SHS
URL: https://github.com/apache/spark/pull/26821#issuecomment-564422644
 
 
   @shahidki31 
   No, I didn't intend to persuade you to close this. I just wanted to make 
sure we have a clear picture of the full implementation before dealing with 
each part, but it's OK with me if you'd like to go ahead with the current 
solution, as I think I can extend it with snapshotting.
   
   I can take a look at the current solution, but you still need to persuade 
at least one committer to push this forward.
   
   Btw, we'd better clarify the performance test in detail. It should 
include at least...
   
   * size of event log file for initial load
   * elapsed time for initial load
   * count/size of events for addition (mostly about size)
   * elapsed time for loading additional events
   
   (And of course it would be nicer if you could experiment with a matrix of 
configurations, at least a couple of tests varying the size of the event log 
file. As you said, a huge event log file isn't just a couple of GBs; it's 10s 
of GBs.)
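
A minimal sketch of how the four measurements listed above could be collected, assuming an uncompressed JSON-lines log on local disk. The event format and the helper names (`append_events`, `load_events`, `run_benchmark`) are hypothetical and not part of Spark or of this PR.

```python
import json
import os
import tempfile
import time


def append_events(path, count, start_id=0):
    """Append `count` JSON-lines events (hypothetical format) to the log."""
    with open(path, "ab") as f:
        for i in range(start_id, start_id + count):
            line = json.dumps({"Event": "Dummy", "id": i}) + "\n"
            f.write(line.encode("utf-8"))


def load_events(path, offset=0):
    """Parse events starting at byte `offset` and return metrics for the run."""
    started = time.perf_counter()
    events = 0
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            json.loads(line)  # stand-in for real event replay
            events += 1
        end_offset = f.tell()
    return {
        "bytes": end_offset - offset,
        "events": events,
        "elapsed_sec": time.perf_counter() - started,
        "end_offset": end_offset,
    }


def run_benchmark(initial_events, additional_events):
    """Measure the initial load, then the load of the appended events only."""
    fd, path = tempfile.mkstemp(suffix=".eventlog")
    os.close(fd)
    try:
        append_events(path, initial_events)
        initial = load_events(path)
        append_events(path, additional_events, start_id=initial_events)
        incremental = load_events(path, offset=initial["end_offset"])
        return initial, incremental
    finally:
        os.remove(path)
```

Running `run_benchmark` over several `initial_events` values would give the size matrix mentioned above; each result reports bytes, event count, and elapsed time for both the initial and the additional load.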
   
   To me, the statement in the PR description reads as: skipping 2 GB (via 
read and drop) takes around 2 secs, which is still not ideal (as we know how 
to do it better), though I agree that's still a huge improvement.
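
To illustrate the difference between skipping via read and drop and jumping directly to a known position, here is a small sketch. It assumes an uncompressed, seekable local file; a compressed log generally cannot be skipped with a plain seek. The function names are hypothetical and this is not the code from the PR.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # bytes per read while discarding


def skip_by_read_and_drop(f, n):
    """Advance by n bytes by reading and discarding; cost grows with n."""
    remaining = n
    while remaining > 0:
        chunk = f.read(min(CHUNK_SIZE, remaining))
        if not chunk:  # reached end of file early
            break
        remaining -= len(chunk)


def skip_by_seek(f, n):
    """Advance by n bytes with a single seek; no data is read."""
    f.seek(n, os.SEEK_CUR)


def read_new_lines(path, offset, skip):
    """Skip the already-processed prefix, then return the remaining lines."""
    with open(path, "rb") as f:
        skip(f, offset)
        return f.readlines()


def demo():
    """Build a log with an old prefix and two new events; read with both skips."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"old event\n" * 100000)
            offset = f.tell()  # everything before this was already parsed
            f.write(b"new event 1\nnew event 2\n")
        dropped = read_new_lines(path, offset, skip_by_read_and_drop)
        seeked = read_new_lines(path, offset, skip_by_seek)
        return dropped, seeked
    finally:
        os.remove(path)
```

Both strategies return the same new events; the difference is that read and drop still pays I/O proportional to the skipped prefix, while a seek does not.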


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS

2019-12-10 Thread GitBox
HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support 
Incremental parsing of event logs in SHS
URL: https://github.com/apache/spark/pull/26821#issuecomment-564422644
 
 





[GitHub] [spark] HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support Incremental parsing of event logs in SHS

2019-12-09 Thread GitBox
HeartSaVioR edited a comment on issue #26821: [SPARK-20656][CORE]Support 
Incremental parsing of event logs in SHS
URL: https://github.com/apache/spark/pull/26821#issuecomment-563502972
 
 
   Thanks for cc'ing me, @dongjoon-hyun. I'll take a look.
   
   Btw, I think we have another JIRA issue for supporting incremental parsing, 
[SPARK-28870](https://issues.apache.org/jira/browse/SPARK-28870), which has a 
broader goal: running with any implementation of KVStore.
   
   At first glance, this patch could cover SPARK-29261, and together with 
SPARK-29111 it may resolve SPARK-28870 altogether, though we struggled with 
the details previously, so I need some time to go through it in depth.
   
   @shahidki31 
   I assume you've been following the previous discussions/efforts that 
@Ngone51 and I, as well as @vanzin and @squito, have made (#25577 and #25943, 
and the relevant Google docs in the relevant JIRA issues/PRs). If not, it 
would be worth going through them, as we've discussed this in detail.

