Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-17 Thread Ullrich Hafner


> Am 17.11.2021 um 16:36 schrieb 'Devin Nusbaum' via Jenkins Developers 
> :
> 
> > As far as I understood the problem in JENKINS-67145 
> >  my step is somehow 
> > serialized during the execution in lines 205-215.
> 
> Kind of, but I was talking about the publishIssues step, not the 
> scanForIssues step. I think the scanForIssues steps in the parallel branches 
> have already completed when the problem manifests and can be ignored based on 
> the information in the Jira ticket.
> 
> > Or can I view the code in a step as an atomic unit? 
> 
> You only need to worry about the serialized state of the Pipeline, and in 
> this case I think you just need to worry about local variables in the 
> Pipeline (e.g. AnnotatedReport objects returned by scanForIssues) and the 
> StepExecution object for currently-running steps (e.g. publishIssues). Local 
> variables in PublishIssuesStep.Execution.run and other Java code can be 
> ignored. Also, since you are using SynchronousNonBlockingStepExecution, 
> PublishIssuesStep.Execution is not resumable 
> 
>  and you may as well mark Execution.step as transient (but I don't think that 
> would fix the issue!).
> 

Ah ok, this is what I thought initially.  Then I can continue using my approach 
here :-) I totally misunderstood the problem.

> In your case, I think the main issue is with the AnnotatedReport objects that 
> are returned by scanForIssues that end up being local variables in the 
> Pipeline. You need to be careful about the ways in which you modify those 
> objects in the pubishIssues step, since local Groovy variables are serialized 
> as part of the Pipeline's state, and PublishIssuesStep.Execution.run is 
> running on a background thread. My original message describes the only way 
> that I noticed that publishIssues might modify one of those existing 
> AnnotatedReport objects (on a background thread separate from the CPS VM 
> thread that will serialize the Pipeline) that was returned by a previous 
> scanForIssues step.
> 
> My best guess as to minimally fixing the problem is to introduce a copy 
> constructor for FileStatistics and use it here 
> 
>  like this:
> 
> statisticsMapping.merge(additionalStatistics.getFileName(), new 
> FileStatistics(additionalStatistics), this::merge);
> 
> This way, if one of the FileStatistics from one of the other AnnotatedReport 
> instances returned by a scanIssues step has the same file name, 
> RepositoryStatistics.merge will only end up modifying the copied 
> FileStatistics rather than the original object.
> 

This is the problem, thanks! I thought that I am creating a new object but 
actually I am doing not. I’m changing the existing ones, that is a design flaw 
that I did overlook. I think a simple solution is to create new objects as you 
already mentioned. Thanks!   

> A more robust change would be to make AnnotatedReport and all of its fields 
> (and their fields recursively) immutable like Jesse mentioned.
> 
> On Wed, Nov 17, 2021 at 10:02 AM Ullrich Hafner  > wrote:
> 
> 
>> Am 17.11.2021 um 14:42 schrieb 'Jesse Glick' via Jenkins Developers 
>> mailto:jenkinsci-dev@googlegroups.com>>:
>> 
>> On Wed, Nov 17, 2021 at 5:57 AM Ullrich Hafner > > wrote:
>> […] is stored in a CopyOnWriteArrayList. If your background thread 
>> serializes my instance during the loop it can happen that the loop variable 
>> is already set, but the list with the results is not.
>> 
>> Hard to follow what you are saying without a code reference, but a 
>> `CopyOnWriteArrayList` is designed to be safe to save from another thread; 
>> that is its whole purpose.
>> 
>> I need some lock around the whole block of variables
>> 
>> Block of fields? Again hard to discuss without a concrete example, but in 
>> general if there is a block of data that should be replaced atomically, best 
>> to keep it all in its own object with `final` fields, and refer to that one 
>> object with a `volatile` field. 
>> 
> 
> Actually I do not have fields anymore in the step:
> https://github.com/jenkinsci/warnings-ng-plugin/blob/089ad6c3aac292bc926503e644b892371f609e88/plugin/src/main/java/io/jenkins/plugins/analysis/core/steps/ScanForIssuesStep.java#L205-L215
>  
> 
> 
> As far as I understood the problem in JENKINS-67145 
>  my step is 

Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-17 Thread 'Devin Nusbaum' via Jenkins Developers
> As far as I understood the problem in JENKINS-67145
 my step is somehow
serialized during the execution in lines 205-215.

Kind of, but I was talking about the publishIssues step, not the
scanForIssues step. I think the scanForIssues steps in the
parallel branches have already completed when the problem manifests and can
be ignored based on the information in the Jira ticket.

> Or can I view the code in a step as an atomic unit?

You only need to worry about the serialized state of the Pipeline, and in
this case I think you just need to worry about local variables in the
Pipeline (e.g. AnnotatedReport objects returned by scanForIssues) and the
StepExecution object for currently-running steps (e.g. publishIssues).
Local variables in PublishIssuesStep.Execution.run and other Java code can
be ignored. Also, since you are using SynchronousNonBlockingStepExecution,
PublishIssuesStep.Execution is not resumable

and you may as well mark Execution.step as transient (but I don't think
that would fix the issue!).

In your case, I think the main issue is with the AnnotatedReport objects
that are returned by scanForIssues that end up being local variables in the
Pipeline. You need to be careful about the ways in which you modify those
objects in the pubishIssues step, since local Groovy variables are
serialized as part of the Pipeline's state, and
PublishIssuesStep.Execution.run is running on a background thread. My
original message describes the only way that I noticed that publishIssues
might modify one of those existing AnnotatedReport objects (on a background
thread separate from the CPS VM thread that will serialize the Pipeline)
that was returned by a previous scanForIssues step.

My best guess as to minimally fixing the problem is to introduce a copy
constructor for FileStatistics and use it here

like this:

statisticsMapping.merge(additionalStatistics.getFileName(), new
FileStatistics(additionalStatistics), this::merge);

This way, if one of the FileStatistics from one of the other
AnnotatedReport instances returned by a scanIssues step has the same file
name, RepositoryStatistics.merge will only end up modifying the copied
FileStatistics rather than the original object.

A more robust change would be to make AnnotatedReport and all of its fields
(and their fields recursively) immutable like Jesse mentioned.

On Wed, Nov 17, 2021 at 10:02 AM Ullrich Hafner 
wrote:

>
>
> Am 17.11.2021 um 14:42 schrieb 'Jesse Glick' via Jenkins Developers <
> jenkinsci-dev@googlegroups.com>:
>
> On Wed, Nov 17, 2021 at 5:57 AM Ullrich Hafner 
> wrote:
>
>> […] is stored in a CopyOnWriteArrayList. If your background thread
>> serializes my instance during the loop it can happen that the loop variable
>> is already set, but the list with the results is not.
>>
>
> Hard to follow what you are saying without a code reference, but a
> `CopyOnWriteArrayList` is designed to be safe to save from another thread;
> that is its whole purpose.
>
> I need some lock around the whole block of variables
>>
>
> Block of *fields*? Again hard to discuss without a concrete example, but
> in general if there is a block of data that should be replaced atomically,
> best to keep it all in its own object with `final` fields, and refer to
> that one object with a `volatile` field.
>
>
> Actually I do not have fields anymore in the step:
>
> https://github.com/jenkinsci/warnings-ng-plugin/blob/089ad6c3aac292bc926503e644b892371f609e88/plugin/src/main/java/io/jenkins/plugins/analysis/core/steps/ScanForIssuesStep.java#L205-L215
>
> As far as I understood the problem in JENKINS-67145
>  my step is somehow
> serialized during the execution in lines 205-215. Or can I view the code in
> a step as an atomic unit?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to jenkinsci-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/CANfRfr0ouBxhg1arOFwosdrit-O9WdBeiiAbgSAA00%3DPnbBkPQ%40mail.gmail.com
> 
> .
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to 

Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-17 Thread 'Jesse Glick' via Jenkins Developers
On Wed, Nov 17, 2021 at 10:02 AM Ullrich Hafner 
wrote:

> I do not have fields anymore in the step
>
>
Any class which could be referenced directly or indirectly as part of the
program state (object graph rooted in Pipeline virtual threads) must be
written to expect concurrent access from a serializer thread (JBoss
Marshalling into `program.dat`). The code you linked to is uninteresting in
this context except insofar as an `AnnotatedReport` object might be passed
eventually back to the program as the return value of a step. Therefore
that object should either be immutable, or all mutations must have been
finished by the time the step completes, or (if necessary, but suspicious)
all fields referring to mutable structures must be thread-safe. The same
applies to classes referenced from `build.xml` (typically via `Action`s)
since these would be serialized via XStream. Other Java objects created
transiently during the course of a step’s execution need not be considered.

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/CANfRfr0COr5os0QsOep-Wr7K6CyipWewWPFSYdy3uqqCDMPxKA%40mail.gmail.com.


Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-17 Thread Ullrich Hafner


> Am 17.11.2021 um 14:42 schrieb 'Jesse Glick' via Jenkins Developers 
> :
> 
> On Wed, Nov 17, 2021 at 5:57 AM Ullrich Hafner  > wrote:
> […] is stored in a CopyOnWriteArrayList. If your background thread serializes 
> my instance during the loop it can happen that the loop variable is already 
> set, but the list with the results is not.
> 
> Hard to follow what you are saying without a code reference, but a 
> `CopyOnWriteArrayList` is designed to be safe to save from another thread; 
> that is its whole purpose.
> 
> I need some lock around the whole block of variables
> 
> Block of fields? Again hard to discuss without a concrete example, but in 
> general if there is a block of data that should be replaced atomically, best 
> to keep it all in its own object with `final` fields, and refer to that one 
> object with a `volatile` field. 
> 

Actually I do not have fields anymore in the step:
https://github.com/jenkinsci/warnings-ng-plugin/blob/089ad6c3aac292bc926503e644b892371f609e88/plugin/src/main/java/io/jenkins/plugins/analysis/core/steps/ScanForIssuesStep.java#L205-L215
 


As far as I understood the problem in JENKINS-67145 
 my step is somehow serialized 
during the execution in lines 205-215. Or can I view the code in a step as an 
atomic unit? 

> -- 
> You received this message because you are subscribed to the Google Groups 
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to jenkinsci-dev+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/jenkinsci-dev/CANfRfr0ouBxhg1arOFwosdrit-O9WdBeiiAbgSAA00%3DPnbBkPQ%40mail.gmail.com
>  
> .

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/80549ED3-122A-49E6-9327-0B2D3A376EB9%40gmail.com.


Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-17 Thread 'Jesse Glick' via Jenkins Developers
On Wed, Nov 17, 2021 at 5:57 AM Ullrich Hafner 
wrote:

> […] is stored in a CopyOnWriteArrayList. If your background thread
> serializes my instance during the loop it can happen that the loop variable
> is already set, but the list with the results is not.
>

Hard to follow what you are saying without a code reference, but a
`CopyOnWriteArrayList` is designed to be safe to save from another thread;
that is its whole purpose.

I need some lock around the whole block of variables
>

Block of *fields*? Again hard to discuss without a concrete example, but in
general if there is a block of data that should be replaced atomically,
best to keep it all in its own object with `final` fields, and refer to
that one object with a `volatile` field.

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/CANfRfr0ouBxhg1arOFwosdrit-O9WdBeiiAbgSAA00%3DPnbBkPQ%40mail.gmail.com.


Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-17 Thread Ullrich Hafner
After thinking about the problem in more detail have an additional question:

I can replace all of those list modifying operations with a 
CopyOnWriteArrayList. What is still not clear for me is, how this works if 
multiple objects are affected? E.g. I have a loop over a number of files that 
should be post-processed. The result of the post-processing is stored in a 
CopyOnWriteArrayList. If your background thread serializes my instance during 
the loop it can happen that the loop variable is already set, but the list with 
the results is not. That means I need some lock around the whole block of 
variables, or how is this handled?

> Am 16.11.2021 um 18:09 schrieb Ullrich Hafner :
> 
> Thanks for this in depth description of the problem! I see how much work is 
> required in order to rewrite that part of the code.
> 
>> Am 16.11.2021 um 17:42 schrieb 'Devin Nusbaum' via Jenkins Developers 
>> mailto:jenkinsci-dev@googlegroups.com>>:
>> 
>> The CPS VM thread is responsible for saving the Pipeline's execution state, 
>> so if you are using a non-blocking step execution (and it looks like you are 
>> ),
>>  it is possible that your step is executing on a background thread while the 
>> Pipeline program is being saved on the CPS VM thread. You should account for 
>> this in any mutable data reachable from non-blocking step executions that is 
>> part of the execution's serialized state, for example by using 
>> CopyOnWriteArrayList, replacing fields rather than mutating them, using 
>> writeReplace, etc.
>> 
>> The exception in the Jira ticket suggests that one of the objects inside of 
>> one of the AnnotatedReport instances returned by a scanIssues step, which is 
>> stored a local variable in the user's Pipeline, is being modified. Perhaps 
>> the issue is that there is a filename match in the FileStatistics between 
>> the issues collected by the user's different parallel branches, so when this 
>> code 
>> 
>>  runs for the first time inside of AnnotatedReport.addAll in 
>> PublishIssuesStep.Execution.Run it uses the exact FileStatistics instance 
>> from an existing AnnotatedReport instance, but when it runs after that it 
>> modifies the previous FileStatistics instance here 
>> ,
>>  and then since the FileStatistics instance is also reachable via one of the 
>> AnnotatedReport local variables in the Pipeline you have a potential 
>> ConcurrentModificationError depending on the serialization timing.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/56D693ED-8A35-427F-8BDF-AA50057AB7DC%40gmail.com.


Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-16 Thread Ullrich Hafner
Thanks for this in depth description of the problem! I see how much work is 
required in order to rewrite that part of the code.

> Am 16.11.2021 um 17:42 schrieb 'Devin Nusbaum' via Jenkins Developers 
> :
> 
> The CPS VM thread is responsible for saving the Pipeline's execution state, 
> so if you are using a non-blocking step execution (and it looks like you are 
> ),
>  it is possible that your step is executing on a background thread while the 
> Pipeline program is being saved on the CPS VM thread. You should account for 
> this in any mutable data reachable from non-blocking step executions that is 
> part of the execution's serialized state, for example by using 
> CopyOnWriteArrayList, replacing fields rather than mutating them, using 
> writeReplace, etc.
> 
> The exception in the Jira ticket suggests that one of the objects inside of 
> one of the AnnotatedReport instances returned by a scanIssues step, which is 
> stored a local variable in the user's Pipeline, is being modified. Perhaps 
> the issue is that there is a filename match in the FileStatistics between the 
> issues collected by the user's different parallel branches, so when this code 
> 
>  runs for the first time inside of AnnotatedReport.addAll in 
> PublishIssuesStep.Execution.Run it uses the exact FileStatistics instance 
> from an existing AnnotatedReport instance, but when it runs after that it 
> modifies the previous FileStatistics instance here 
> ,
>  and then since the FileStatistics instance is also reachable via one of the 
> AnnotatedReport local variables in the Pipeline you have a potential 
> ConcurrentModificationError depending on the serialization timing.

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/5D420359-52FD-4F40-BD54-E02EA1AD2F14%40gmail.com.


Re: ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-16 Thread 'Devin Nusbaum' via Jenkins Developers
The CPS VM thread is responsible for saving the Pipeline's execution state,
so if you are using a non-blocking step execution (and it looks like you are
),
it is possible that your step is executing on a background thread while the
Pipeline program is being saved on the CPS VM thread. You should account
for this in any mutable data reachable from non-blocking step executions
that is part of the execution's serialized state, for example by using
CopyOnWriteArrayList, replacing fields rather than mutating them, using
writeReplace, etc.

The exception in the Jira ticket suggests that one of the objects inside of
one of the AnnotatedReport instances returned by a scanIssues step, which
is stored a local variable in the user's Pipeline, is being modified.
Perhaps the issue is that there is a filename match in the FileStatistics
between the issues collected by the user's different parallel branches, so
when this code

runs
for the first time inside of AnnotatedReport.addAll in
PublishIssuesStep.Execution.Run it uses the exact FileStatistics instance
from an existing AnnotatedReport instance, but when it runs after that it
modifies the previous FileStatistics instance here
,
and then since the FileStatistics instance is also reachable via one of the
AnnotatedReport local variables in the Pipeline you have a potential
ConcurrentModificationError depending on the serialization timing.

On Tue, Nov 16, 2021 at 8:09 AM Ullrich Hafner 
wrote:

> As far as I understand the concept of CPS pipelines, steps may be
> serialized to disk during runtime if the durability level is set to
> MAX_SURVIVABILITY. It seems that one of my steps causes a
> ConcurrentModificationException while the step’s memory is written to disk
> and in parallel the fields (or local variables?) are modified by my step
> (see [1] for the bug report). Is there anything I need to do in my plugin
> to avoid such problems?
>
> [1] https://issues.jenkins.io/browse/JENKINS-67145
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to jenkinsci-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/6F19E406-45C9-4D4F-82C9-D2712E996199%40gmail.com
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/CAF73ShAM%2ByRivnQBcW7XegcvKRjbHFCH5haF%3D2afOkYuP-tsig%40mail.gmail.com.


ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

2021-11-16 Thread Ullrich Hafner
As far as I understand the concept of CPS pipelines, steps may be serialized to 
disk during runtime if the durability level is set to MAX_SURVIVABILITY. It 
seems that one of my steps causes a ConcurrentModificationException while the 
step’s memory is written to disk and in parallel the fields (or local 
variables?) are modified by my step (see [1] for the bug report). Is there 
anything I need to do in my plugin to avoid such problems? 

[1] https://issues.jenkins.io/browse/JENKINS-67145

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/6F19E406-45C9-4D4F-82C9-D2712E996199%40gmail.com.