Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi David/Martin, If you agree with Kalyan's fix for this issue, could one of you please sponsor the push. Thanks, Sandeep On Dec 23, 2013, at 11:17 AM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Hi David/Martin, could any one of you sponsor this change for me? --- Thanks kalyan On 12/20/2013 10:28 PM, David Holmes wrote: On 21/12/2013 4:19 AM, srikalyan wrote: Hi David, i retained only the changes to ITERS, ProbleMList.txt and upstream changes by Doug Lea(as pointed by Martin), could you please review the new change available here http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/ . Ok. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On 12/19/13, 8:10 PM, David Holmes wrote: Sorry Kalyan but I don't see the need for all the incidental changes if the primary change is to just increase the iterations. I also don't see why you need to do anything for BrokenBarrierException as it is not expected to happen and the test should just fail if it does. David On 10/12/2013 6:15 AM, srikalyan wrote: Hi David/Martin a gentle reminder for review. -- Thanks kalyan Ph: (408)-585-8040 On 12/2/13, 11:21 AM, srikalyan wrote: Hi David, Thanks for the review, the new webrev is hosted at http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . Please see inline text. On 11/20/13, 6:35 PM, David Holmes wrote: On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. Made the necessary changes for atomic update. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? I retained it in the fix for completeness in updating the numbers, please let me know if you still think otherwise. need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. Fixed the spelling error and styling issues. And I don't think this change is sufficient to claim co-author status with Doug either ;-) Removed the claim :) The additional tracing may be useful but seems stylistically different from the rest of the code. Retained the tracking to understand if it is again the timing issue which is the cause in an event of a failure, however i can remove it if you think it is not necessary (OR) include an alternate solution as you may want to suggest. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Here the time out is not about waiting for threads to complete something but to be interrupted before being considered done, so we decreased the timeout. However we now chose to increase the number of iterations to 500 from 100(thanks to tristan for the suggestion) instead of decreasing the timeout as done earlier because the increasing iterations ensures the threads are busy for long time curtailing the need to touch the timeout. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi David/Martin, could any one of you sponsor this change for me? --- Thanks kalyan On 12/20/2013 10:28 PM, David Holmes wrote: On 21/12/2013 4:19 AM, srikalyan wrote: Hi David, i retained only the changes to ITERS, ProbleMList.txt and upstream changes by Doug Lea(as pointed by Martin), could you please review the new change available here http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/ . Ok. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On 12/19/13, 8:10 PM, David Holmes wrote: Sorry Kalyan but I don't see the need for all the incidental changes if the primary change is to just increase the iterations. I also don't see why you need to do anything for BrokenBarrierException as it is not expected to happen and the test should just fail if it does. David On 10/12/2013 6:15 AM, srikalyan wrote: Hi David/Martin a gentle reminder for review. -- Thanks kalyan Ph: (408)-585-8040 On 12/2/13, 11:21 AM, srikalyan wrote: Hi David, Thanks for the review, the new webrev is hosted at http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . Please see inline text. On 11/20/13, 6:35 PM, David Holmes wrote: On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. Made the necessary changes for atomic update. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? I retained it in the fix for completeness in updating the numbers, please let me know if you still think otherwise. need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. Fixed the spelling error and styling issues. And I don't think this change is sufficient to claim co-author status with Doug either ;-) Removed the claim :) The additional tracing may be useful but seems stylistically different from the rest of the code. Retained the tracking to understand if it is again the timing issue which is the cause in an event of a failure, however i can remove it if you think it is not necessary (OR) include an alternate solution as you may want to suggest. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Here the time out is not about waiting for threads to complete something but to be interrupted before being considered done, so we decreased the timeout. However we now chose to increase the number of iterations to 500 from 100(thanks to tristan for the suggestion) instead of decreasing the timeout as done earlier because the increasing iterations ensures the threads are busy for long time curtailing the need to touch the timeout. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi David, i retained only the changes to ITERS, ProbleMList.txt and upstream changes by Doug Lea(as pointed by Martin), could you please review the new change available here http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/ . -- Thanks kalyan Ph: (408)-585-8040 On 12/19/13, 8:10 PM, David Holmes wrote: Sorry Kalyan but I don't see the need for all the incidental changes if the primary change is to just increase the iterations. I also don't see why you need to do anything for BrokenBarrierException as it is not expected to happen and the test should just fail if it does. David On 10/12/2013 6:15 AM, srikalyan wrote: Hi David/Martin a gentle reminder for review. -- Thanks kalyan Ph: (408)-585-8040 On 12/2/13, 11:21 AM, srikalyan wrote: Hi David, Thanks for the review, the new webrev is hosted at http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . Please see inline text. On 11/20/13, 6:35 PM, David Holmes wrote: On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. Made the necessary changes for atomic update. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? I retained it in the fix for completeness in updating the numbers, please let me know if you still think otherwise. need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. Fixed the spelling error and styling issues. And I don't think this change is sufficient to claim co-author status with Doug either ;-) Removed the claim :) The additional tracing may be useful but seems stylistically different from the rest of the code. Retained the tracking to understand if it is again the timing issue which is the cause in an event of a failure, however i can remove it if you think it is not necessary (OR) include an alternate solution as you may want to suggest. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Here the time out is not about waiting for threads to complete something but to be interrupted before being considered done, so we decreased the timeout. However we now chose to increase the number of iterations to 500 from 100(thanks to tristan for the suggestion) instead of decreasing the timeout as done earlier because the increasing iterations ensures the threads are busy for long time curtailing the need to touch the timeout. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically after running several 100 times on multiple configurations and did not cause failures. -- Thanks kalyan Ph: (408)-585-8040
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
On 21/12/2013 4:19 AM, srikalyan wrote: Hi David, i retained only the changes to ITERS, ProbleMList.txt and upstream changes by Doug Lea(as pointed by Martin), could you please review the new change available here http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/ . Ok. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On 12/19/13, 8:10 PM, David Holmes wrote: Sorry Kalyan but I don't see the need for all the incidental changes if the primary change is to just increase the iterations. I also don't see why you need to do anything for BrokenBarrierException as it is not expected to happen and the test should just fail if it does. David On 10/12/2013 6:15 AM, srikalyan wrote: Hi David/Martin a gentle reminder for review. -- Thanks kalyan Ph: (408)-585-8040 On 12/2/13, 11:21 AM, srikalyan wrote: Hi David, Thanks for the review, the new webrev is hosted at http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . Please see inline text. On 11/20/13, 6:35 PM, David Holmes wrote: On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. Made the necessary changes for atomic update. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? I retained it in the fix for completeness in updating the numbers, please let me know if you still think otherwise. need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. Fixed the spelling error and styling issues. And I don't think this change is sufficient to claim co-author status with Doug either ;-) Removed the claim :) The additional tracing may be useful but seems stylistically different from the rest of the code. Retained the tracking to understand if it is again the timing issue which is the cause in an event of a failure, however i can remove it if you think it is not necessary (OR) include an alternate solution as you may want to suggest. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Here the time out is not about waiting for threads to complete something but to be interrupted before being considered done, so we decreased the timeout. However we now chose to increase the number of iterations to 500 from 100(thanks to tristan for the suggestion) instead of decreasing the timeout as done earlier because the increasing iterations ensures the threads are busy for long time curtailing the need to touch the timeout. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically after running several 100 times on multiple configurations and did not cause failures. -- Thanks kalyan Ph: (408)-585-8040
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Sorry Kalyan but I don't see the need for all the incidental changes if the primary change is to just increase the iterations. I also don't see why you need to do anything for BrokenBarrierException as it is not expected to happen and the test should just fail if it does. David On 10/12/2013 6:15 AM, srikalyan wrote: Hi David/Martin a gentle reminder for review. -- Thanks kalyan Ph: (408)-585-8040 On 12/2/13, 11:21 AM, srikalyan wrote: Hi David, Thanks for the review, the new webrev is hosted at http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . Please see inline text. On 11/20/13, 6:35 PM, David Holmes wrote: On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. Made the necessary changes for atomic update. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? I retained it in the fix for completeness in updating the numbers, please let me know if you still think otherwise. need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. Fixed the spelling error and styling issues. And I don't think this change is sufficient to claim co-author status with Doug either ;-) Removed the claim :) The additional tracing may be useful but seems stylistically different from the rest of the code. Retained the tracking to understand if it is again the timing issue which is the cause in an event of a failure, however i can remove it if you think it is not necessary (OR) include an alternate solution as you may want to suggest. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Here the time out is not about waiting for threads to complete something but to be interrupted before being considered done, so we decreased the timeout. However we now chose to increase the number of iterations to 500 from 100(thanks to tristan for the suggestion) instead of decreasing the timeout as done earlier because the increasing iterations ensures the threads are busy for long time curtailing the need to touch the timeout. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically after running several 100 times on multiple configurations and did not cause failures. -- Thanks kalyan Ph: (408)-585-8040
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi David/Martin a gentle reminder for review. -- Thanks kalyan Ph: (408)-585-8040 On 12/2/13, 11:21 AM, srikalyan wrote: Hi David, Thanks for the review, the new webrev is hosted at http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . Please see inline text. On 11/20/13, 6:35 PM, David Holmes wrote: On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. Made the necessary changes for atomic update. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? I retained it in the fix for completeness in updating the numbers, please let me know if you still think otherwise. need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. Fixed the spelling error and styling issues. And I don't think this change is sufficient to claim co-author status with Doug either ;-) Removed the claim :) The additional tracing may be useful but seems stylistically different from the rest of the code. Retained the tracking to understand if it is again the timing issue which is the cause in an event of a failure, however i can remove it if you think it is not necessary (OR) include an alternate solution as you may want to suggest. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Here the time out is not about waiting for threads to complete something but to be interrupted before being considered done, so we decreased the timeout. However we now chose to increase the number of iterations to 500 from 100(thanks to tristan for the suggestion) instead of decreasing the timeout as done earlier because the increasing iterations ensures the threads are busy for long time curtailing the need to touch the timeout. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically after running several 100 times on multiple configurations and did not cause failures. -- Thanks kalyan Ph: (408)-585-8040
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi David, Thanks for the review, the new webrev is hosted at http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . Please see inline text. On 11/20/13, 6:35 PM, David Holmes wrote: On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. Made the necessary changes for atomic update. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? I retained it in the fix for completeness in updating the numbers, please let me know if you still think otherwise. need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. Fixed the spelling error and styling issues. And I don't think this change is sufficient to claim co-author status with Doug either ;-) Removed the claim :) The additional tracing may be useful but seems stylistically different from the rest of the code. Retained the tracking to understand if it is again the timing issue which is the cause in an event of a failure, however i can remove it if you think it is not necessary (OR) include an alternate solution as you may want to suggest. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Here the time out is not about waiting for threads to complete something but to be interrupted before being considered done, so we decreased the timeout. However we now chose to increase the number of iterations to 500 from 100(thanks to tristan for the suggestion) instead of decreasing the timeout as done earlier because the increasing iterations ensures the threads are busy for long time curtailing the need to touch the timeout. Thanks, David -- Thanks kalyan Ph: (408)-585-8040 On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically after running several 100 times on multiple configurations and did not cause failures. -- Thanks kalyan Ph: (408)-585-8040
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com mailto:srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically after running several 100 times on multiple configurations and did not cause failures. -- Thanks kalyan Ph: (408)-585-8040
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi David , webrev is hosted here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ . -- Thanks kalyan Ph: (408)-585-8040 On 11/19/13, 11:03 AM, David Holmes wrote: Hi, Attachments are stripped. Please post on cr.openjdk.java.net (get a colleague to host this if you don't have an account yet.) Thanks, David On 19/11/2013 4:12 PM, srikalyan chandrashekar wrote: Hi all, I am working on bug JDK-6772009 https://bugs.openjdk.java.net/browse/JDK-6772009 . Root Cause: The timeout value gives too much grace period on faster machines on which the TO BE INTERRUPTED threads complete faster before being interrupted at right time. Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines , and ensures the threads will be canceled when running and anyways there is a Barrier to ensure the test threads all complete gracefully. Miscellaneous fixes b) Convert result from int to long(possible integer overflow otherwise) c) Catch BrokenBarrierException in more granular fashion in ReentrantLockLoop to update and print the results (which will be missed otherwise) Add more diagnostics d) Assign names to threads e) Print which threads update the 'completed' variable. I have attached the webrev for convenience as the changes are interleaved and is best viewed as sdiff. Please let me know if you have any comments or suggestions. Thank you
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
On 21/11/2013 10:28 AM, Martin Buchholz wrote: I again tried and failed to reproduce a failure. Even if I go whole hog and multiply TIMEOUT by 100 and divide ITERS by 100, the test continues to PASS. Is it just me?! I think you are going the wrong way Martin - you want the timeout to be smaller than the time it takes to execute ITERS. I don't think there's any reason to make result long. It's not even used except to inhibit hotspot optimizations. +private volatile long result = 17;//Can get int overflow,so using long Further the subsequent use of += is incorrect as it is not an atomic operation. Even if we don't care about the value, it looks bad. I'm not sure result must be updated if we get a BrokenBarrierException either. Probably harmless, but necessary? need to fix spelling and spacing below. +barrier.await();//If a BrokeBarrierException happens here(due to There are a number of style issues with spacing around comments. And I don't think this change is sufficient to claim co-author status with Doug either ;-) The additional tracing may be useful but seems stylistically different from the rest of the code. Overall I'm suspicious that the changed timeout actually fixes anything - normally we need to add longer timeouts not shorter ones. Does this fail on a range of machines or only specific ones? Have we verified that the clocks/timers are behaving properly on those systems? Thanks, David On Wed, Nov 20, 2013 at 11:52 AM, srikalyan srikalyan.chandrashe...@oracle.com wrote: Hi Martin , apologies for the delay , was trying to get help for hosting my webrev. . Please see inline text. On 11/19/13, 10:35 PM, Martin Buchholz wrote: Hi Kalyan, None of us can review your changes yet because you haven't given us a URL of your webrev. It is located here http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ I've tried to make the jsr166 copy of CancelledLockLoops fail by adjusting ITERS and TIMEOUT radically up and down, but the test just keeps on passing for me. Hints appreciated. Bump up the timeout to 500ms and you will see a failure (i can see it consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8 latest any promoted build). On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com wrote: Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines This doesn't look like a permanent fix - it just makes the failing case rarer. Thats true , the other way is to make the main thread wait on TIMEOUT after firing the interrupts instead of other way round, but that would be over-optimization which probably is not desirable as well. The 50 ms was arrived at empirically after running several 100 times on multiple configurations and did not cause failures. -- Thanks kalyan Ph: (408)-585-8040
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi, Attachments are stripped. Please post on cr.openjdk.java.net (get a colleague to host this if you don't have an account yet.) Thanks, David On 19/11/2013 4:12 PM, srikalyan chandrashekar wrote: Hi all, I am working on bug JDK-6772009 https://bugs.openjdk.java.net/browse/JDK-6772009 . Root Cause: The timeout value gives too much grace period on faster machines on which the TO BE INTERRUPTED threads complete faster before being interrupted at right time. Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines , and ensures the threads will be canceled when running and anyways there is a Barrier to ensure the test threads all complete gracefully. Miscellaneous fixes b) Convert result from int to long(possible integer overflow otherwise) c) Catch BrokenBarrierException in more granular fashion in ReentrantLockLoop to update and print the results (which will be missed otherwise) Add more diagnostics d) Assign names to threads e) Print which threads update the 'completed' variable. I have attached the webrev for convenience as the changes are interleaved and is best viewed as sdiff. Please let me know if you have any comments or suggestions. Thank you
Re: RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'
Hi Martin, i incorporated the recent changes from the pointer as well. I have reproduced the failure, the logs of which are attached to the bug JDK-6772009 https://bugs.openjdk.java.net/browse/JDK-6772009 . The failed log is especially interesting . -- Thanks kalyan On 11/18/13 10:15 PM, Martin Buchholz wrote: Thanks for working on this. There have been some recent upstream changes to this test as well. Please incorporate them. http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java?view=co The jsr166 maintainers haven't been able to reproduce any failures in this test. Do you have any hints that might help us? On Mon, Nov 18, 2013 at 10:12 PM, srikalyan chandrashekar srikalyan.chandrashe...@oracle.com mailto:srikalyan.chandrashe...@oracle.com wrote: Hi all, I am working on bug JDK-6772009 https://bugs.openjdk.java.net/browse/JDK-6772009 . Root Cause: The timeout value gives too much grace period on faster machines on which the TO BE INTERRUPTED threads complete faster before being interrupted at right time. Suggested Fix: a) Decrease the timeout from 100 to 50ms which will ensure that the test will pass even on faster machines , and ensures the threads will be canceled when running and anyways there is a Barrier to ensure the test threads all complete gracefully. Miscellaneous fixes b) Convert result from int to long(possible integer overflow otherwise) c) Catch BrokenBarrierException in more granular fashion in ReentrantLockLoop to update and print the results (which will be missed otherwise) Add more diagnostics d) Assign names to threads e) Print which threads update the 'completed' variable. I have attached the webrev for convenience as the changes are interleaved and is best viewed as sdiff. Please let me know if you have any comments or suggestions. Thank you -- -- Thanks kalyan