[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-29 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-664719022







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-26 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-664048482


   @attilapiros Thanks again for a great review. I have incorporated all of 
your feedback. The test is less hacky now :-) 






[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-24 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-663695810


   > It is extremely late here so I have to postpone the last round at least 
for Monday.
   
   Thank you so much for working on this. I will post my update soon. No 
rush!






[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-21 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-662207447


   > Oh hmm, I remember seeing the GH action failure with that suite. It looks 
like it's putting two of the block partitions on the same block manager, so 
when we decommission the other executor it behaves weirdly. Probably the 
easiest fix would be bumping up `numParts` to 6 to decrease the chance of that 
happening.
   
   Done. It turns out that `numParts = 6` made the other two tests in that 
suite fail :-), so I made `numParts` configurable per test. Updated the 
`BlockManagerDecommissionIntegrationSuite`.
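
   As a rough illustration of why more partitions should reduce the chance of a 
lopsided placement, here is a minimal sketch. It assumes each partition is 
placed independently and uniformly at random, which is a simplification and not 
how Spark actually places blocks. The function name and the executor count of 2 
are hypothetical and chosen only for the example.

```python
from math import comb


def prob_some_executor_empty(num_parts: int, num_executors: int) -> float:
    """Probability that at least one executor receives no partition.

    Assumes each partition is placed independently and uniformly at random.
    This is a simplifying assumption, not Spark's real placement logic.
    """
    k = num_executors
    # Inclusion-exclusion over the number of executors that end up empty.
    return sum(
        (-1) ** (j + 1) * comb(k, j) * ((k - j) / k) ** num_parts
        for j in range(1, k)
    )


if __name__ == "__main__":
    # Compare a smaller and a larger partition count on two executors.
    for num_parts in (3, 6):
        print(num_parts, prob_some_executor_empty(num_parts, 2))
```

   Under that assumption the probability drops geometrically as the partition 
count grows, which matches the intuition behind the suggestion above.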






[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-21 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-662150889


   I can't reproduce the test failure in 
`org.apache.spark.storage.BlockManagerDecommissionIntegrationSuite` even after 
running it locally hundreds of times. I reread the test code and I think it is 
orthogonal to my change, despite the name.






[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-21 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-662026486


   > Sorry, I still need time to reach the end of these changes and I would 
like to run some tests too.
   
   Thanks a lot for your careful review and for helping me improve the code. 
Please take your time reviewing it. I have incorporated your feedback so far.






[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-18 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660540565


   The test failure in 
`org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite` looks 
unrelated and I cannot reproduce it locally.






[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-14 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-658455874


   I cannot reproduce the failures in the `ExecutorAllocationManagerSuite`: I 
ran 300 back-to-back invocations in SBT on my laptop and they all passed. I 
suspect that the test is flaky. I cannot see any interaction with the 
decommissioning code path.






[GitHub] [spark] agrawaldevesh commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-09 Thread GitBox


agrawaldevesh commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-656474975


   @holdenk, @jiangxb1987, @cloud-fan, @Ngone51: This PR is ready for your 
review. Thanks!


