Re: Hadoop Windows Build

2024-04-28 Thread Gautham Banasandra
Yeah, I just noticed that. May I know how I can abort all the jobs at once?
I only see a way to cancel the jobs one by one.
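
One way to cancel everything in the queue at once is the Jenkins Script
Console (Manage Jenkins -> Script Console). This is only a sketch and assumes
admin access; the job-name filter is illustrative, not verified against the
actual ci-hadoop job configuration:

```groovy
// Sketch for the Jenkins Script Console; assumes admin permissions.
// The job-name filter below is illustrative.
import jenkins.model.Jenkins

def queue = Jenkins.get().queue
queue.items.findAll { item ->
    item.task.fullDisplayName.contains('hadoop-multibranch-windows-10')
}.each { item ->
    queue.cancel(item)   // cancels the queued (not yet running) item
}
```

This only clears queued items; builds already running on an executor would
still have to be aborted separately.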

Thanks,
--Gautham

On 2024/04/28 15:19:13 Ayush Saxena wrote:
> Thanks Gautham for chasing this.
> 
> I think there are still some 119 builds in the queue; see the Build Queue
> panel on the left at [1]. They are all stuck on "Waiting for next
> available executor on Windows".
> 
> If you aborted them all previously and they have now shown up again, then
> something is still wrong with the configuration and the pipeline is being
> triggered for the existing PRs (not just new ones); if you didn't abort
> them earlier, then you may need to abort all the ones in the queue to
> free up the resources.
> 
> One example of a build that (as of now) has been waiting on a resource
> for the past 7 hours: [2]
> 
> Let me know if you are stuck, we can figure things out together :-)
> 
> -Ayush
> 
> 
> [1] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/job/PR-6423/2/console
> 
> On Sun, 28 Apr 2024 at 13:43, Gautham Banasandra  wrote:
> 
> > Hi folks,
> >
> > I apologize for the inconvenience caused. I've now applied the mitigation
> > described in [3].
> >
> > Unfortunately, there are only 12 Windows nodes in the entire fleet of
> > Jenkins build nodes, so the backlog starved the other projects of
> > Windows nodes.
> >
> > I reached out to the infra team several months ago and asked them to add
> > more Windows nodes, but the request was turned down. I'm not sure
> > there's a way around this other than getting more Windows nodes.
> >
> > Thanks,
> > --Gautham
> >
> > On 2024/04/28 04:53:32 Ayush Saxena wrote:
> > > Found this on dev@hadoop -> moving to common-dev (the ML we use).
> > >
> > > I think there was an initiative to enable the Windows pre-commit build
> > > for every PR, and that seems to have gone wild: either the number of
> > > PRs raised is far more than the capacity the nodes can handle, or
> > > something got misconfigured in the job itself so that the build is
> > > being triggered for all open PRs, not just new ones, which is leading
> > > to resource starvation.
> > >
> > > To the best of my knowledge, @Gautham Banasandra / @Iñigo Goiri
> > > <elgo...@gmail.com> are driving the initiative; can you folks help
> > > check?
> > >
> > > There are concerns raised by the Infra team here [1] on dev@hadoop.
> > >
> > > Most probably something got messed up while configuring the
> > > hadoop-multibranch-windows job; it shows some 613 PRs scheduled [2]. I
> > > think it scheduled builds for all the open ones. Something similar
> > > happened long ago when we were doing migrations; we can fetch pointers
> > > from [3].
> > >
> > > [1] https://lists.apache.org/thread/7nsyd0vtpb87fhm0fpv8frh6dzk3b3tl
> > > [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > > [3] https://lists.apache.org/thread/8pxf2yon3r9g61zgv9cf120qnhrs8q23
> > >
> > > -Ayush
> > >
> > >
> > > On 2024/04/26 16:59:04 Wei-Chiu Chuang wrote:
> > > > I'm not familiar with the Windows build, but you may have better
> > > > luck reaching out to Apache Infra (mailing list, Jira, or even
> > > > Slack):
> > > > https://infra.apache.org/contact.html
> > > >
> > > > On Fri, Apr 26, 2024 at 9:42 AM Cesar Hernandez 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > > An option that could be implemented in the Hadoop pipeline [1] is
> > > > > to set a timeout [2] on critical stages within the pipeline, for
> > > > > example on the "Windows 10" stage.
> > > > > As for the error the CI build is logging [3] in the
> > > > > hadoop-multibranch jobs reported by Chris, the issue seems to be in
> > > > > the Post (cleanup) part of the pipeline. My two cents: use
> > > > > cleanWs() instead of deleteDir(), as documented at
> > > > > https://plugins.jenkins.io/ws-cleanup/
> > > > >
> > > > > [1] https://github.com/apache/hadoop/blob/trunk/dev-support/jenkinsfile-windows-10
> > > > > [2] https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit
> > > > > [3]
> > > > > (log from https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-1137/1/console)
> > > > > Still waiting to schedule task
> > > > > Waiting for next available executor on 'Windows'
> > > > > [Pipeline] // node
> > > > > [Pipeline] stage
> > > > > [Pipeline] { (Declarative: Post Actions)
> > > > > [Pipeline] script
> > > > > [Pipeline] {
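
A minimal declarative-pipeline sketch of the stage-timeout idea from [2]
above; the stage name, agent label, and limit are illustrative, and this is
not the actual dev-support/jenkinsfile-windows-10:

```groovy
pipeline {
    agent none
    stages {
        stage('Windows 10') {
            agent { label 'Windows' }
            options {
                // Abort the stage if it runs longer than the limit,
                // instead of holding a Windows executor indefinitely.
                timeout(time: 5, unit: 'HOURS')
            }
            steps {
                bat 'echo build steps go here'
            }
        }
    }
}
```

Note that a stage-level timeout limits runaway builds; depending on Jenkins
version it may or may not count time the build spends waiting in the queue
for an agent.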

Re: Hadoop Windows Build

2024-04-28 Thread Ayush Saxena
Thanks Gautham for chasing this.

I think there are still some 119 builds in the queue; see the Build Queue
panel on the left at [1]. They are all stuck on "Waiting for next
available executor on Windows".

If you aborted them all previously and they have now shown up again, then
something is still wrong with the configuration and the pipeline is being
triggered for the existing PRs (not just new ones); if you didn't abort
them earlier, then you may need to abort all the ones in the queue to free
up the resources.

One example of a build that (as of now) has been waiting on a resource for
the past 7 hours: [2]

Let me know if you are stuck, we can figure things out together :-)

-Ayush


[1] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
[2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/job/PR-6423/2/console

On Sun, 28 Apr 2024 at 13:43, Gautham Banasandra  wrote:

> Hi folks,
>
> I apologize for the inconvenience caused. I've now applied the mitigation
> described in [3].
>
> Unfortunately, there are only 12 Windows nodes in the entire fleet of
> Jenkins build nodes, so the backlog starved the other projects of Windows
> nodes.
>
> I reached out to the infra team several months ago and asked them to add
> more Windows nodes, but the request was turned down. I'm not sure there's
> a way around this other than getting more Windows nodes.
>
> Thanks,
> --Gautham
>
> On 2024/04/28 04:53:32 Ayush Saxena wrote:
> > Found this on dev@hadoop -> moving to common-dev (the ML we use).
> >
> > I think there was an initiative to enable the Windows pre-commit build
> > for every PR, and that seems to have gone wild: either the number of PRs
> > raised is far more than the capacity the nodes can handle, or something
> > got misconfigured in the job itself so that the build is being triggered
> > for all open PRs, not just new ones, which is leading to resource
> > starvation.
> >
> > To the best of my knowledge, @Gautham Banasandra / @Iñigo Goiri
> > <elgo...@gmail.com> are driving the initiative; can you folks help
> > check?
> >
> > There are concerns raised by the Infra team here [1] on dev@hadoop.
> >
> > Most probably something got messed up while configuring the
> > hadoop-multibranch-windows job; it shows some 613 PRs scheduled [2]. I
> > think it scheduled builds for all the open ones. Something similar
> > happened long ago when we were doing migrations; we can fetch pointers
> > from [3].
> >
> > [1] https://lists.apache.org/thread/7nsyd0vtpb87fhm0fpv8frh6dzk3b3tl
> > [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > [3] https://lists.apache.org/thread/8pxf2yon3r9g61zgv9cf120qnhrs8q23
> >
> > -Ayush
> >
> >
> > On 2024/04/26 16:59:04 Wei-Chiu Chuang wrote:
> > > I'm not familiar with the Windows build, but you may have better luck
> > > reaching out to Apache Infra (mailing list, Jira, or even Slack):
> > > https://infra.apache.org/contact.html
> > >
> > > On Fri, Apr 26, 2024 at 9:42 AM Cesar Hernandez 
> > > wrote:
> > >
> > > > Hello,
> > > > An option that could be implemented in the Hadoop pipeline [1] is to
> > > > set a timeout [2] on critical stages within the pipeline, for
> > > > example on the "Windows 10" stage.
> > > > As for the error the CI build is logging [3] in the
> > > > hadoop-multibranch jobs reported by Chris, the issue seems to be in
> > > > the Post (cleanup) part of the pipeline. My two cents: use cleanWs()
> > > > instead of deleteDir(), as documented at
> > > > https://plugins.jenkins.io/ws-cleanup/
> > > >
> > > > [1] https://github.com/apache/hadoop/blob/trunk/dev-support/jenkinsfile-windows-10
> > > > [2] https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit
> > > > [3]
> > > > (log from https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-1137/1/console)
> > > > Still waiting to schedule task
> > > > Waiting for next available executor on 'Windows'
> > > > [Pipeline] // node
> > > > [Pipeline] stage
> > > > [Pipeline] { (Declarative: Post Actions)
> > > > [Pipeline] script
> > > > [Pipeline] {
> > > > [Pipeline] deleteDir
> > > > [Pipeline] }
> > > > [Pipeline] // script
> > > > Error when executing cleanup post condition:
> > > > Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId:
> > > > ca1b7f2f-ec16-4bde-ac51-85f964794e37

Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2024-04-28 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/

No changes




-1 overall


The following subsystems voted -1:
asflicense hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.TestFileUtil
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
   hadoop.hdfs.TestDFSInotifyEventInputStream
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
   hadoop.hdfs.server.federation.router.TestRouterQuota
   hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
   hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
   hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
   hadoop.mapreduce.lib.input.TestLineRecordReader
   hadoop.mapred.TestLineRecordReader
   hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
   hadoop.resourceestimator.service.TestResourceEstimatorService
   hadoop.resourceestimator.solver.impl.TestLpSolver
   hadoop.yarn.sls.TestSLSRunner
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
   hadoop.yarn.server.resourcemanager.TestClientRMService
   hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker

   cc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/diff-compile-javac-root.txt
  [488K]

   checkstyle:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/diff-checkstyle-root.txt
  [14M]

   hadolint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   mvnsite:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-mvnsite-root.txt
  [568K]

   pathlen:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/diff-patch-shellcheck.txt
  [72K]

   whitespace:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/whitespace-eol.txt
  [12M]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-javadoc-root.txt
  [36K]

   unit:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [220K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [452K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
  [36K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [16K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
  [104K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt
  [20K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-tools_hadoop-resourceestimator.txt
  [16K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1376/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt
  [28K]
   

Re: Hadoop Windows Build

2024-04-28 Thread Gautham Banasandra
Hi folks,

I apologize for the inconvenience caused. I've now applied the mitigation
described in [3].

Unfortunately, there are only 12 Windows nodes in the entire fleet of
Jenkins build nodes, so the backlog starved the other projects of Windows
nodes.

I reached out to the infra team several months ago and asked them to add
more Windows nodes, but the request was turned down. I'm not sure there's a
way around this other than getting more Windows nodes.

Thanks,
--Gautham

On 2024/04/28 04:53:32 Ayush Saxena wrote:
> Found this on dev@hadoop -> moving to common-dev (the ML we use).
> 
> I think there was an initiative to enable the Windows pre-commit build for
> every PR, and that seems to have gone wild: either the number of PRs
> raised is far more than the capacity the nodes can handle, or something
> got misconfigured in the job itself so that the build is being triggered
> for all open PRs, not just new ones, which is leading to resource
> starvation.
> 
> To the best of my knowledge, @Gautham Banasandra / @Iñigo Goiri are
> driving the initiative; can you folks help check?
> 
> There are concerns raised by the Infra team here [1] on dev@hadoop.
> 
> Most probably something got messed up while configuring the
> hadoop-multibranch-windows job; it shows some 613 PRs scheduled [2]. I
> think it scheduled builds for all the open ones. Something similar
> happened long ago when we were doing migrations; we can fetch pointers
> from [3].
> 
> [1] https://lists.apache.org/thread/7nsyd0vtpb87fhm0fpv8frh6dzk3b3tl
> [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> [3] https://lists.apache.org/thread/8pxf2yon3r9g61zgv9cf120qnhrs8q23
> 
> -Ayush
> 
> 
> On 2024/04/26 16:59:04 Wei-Chiu Chuang wrote:
> > I'm not familiar with the Windows build, but you may have better luck
> > reaching out to Apache Infra (mailing list, Jira, or even Slack):
> > https://infra.apache.org/contact.html
> >
> > On Fri, Apr 26, 2024 at 9:42 AM Cesar Hernandez 
> > wrote:
> >
> > > Hello,
> > > An option that could be implemented in the Hadoop pipeline [1] is to
> > > set a timeout [2] on critical stages within the pipeline, for example
> > > on the "Windows 10" stage.
> > > As for the error the CI build is logging [3] in the hadoop-multibranch
> > > jobs reported by Chris, the issue seems to be in the Post (cleanup)
> > > part of the pipeline. My two cents: use cleanWs() instead of
> > > deleteDir(), as documented at https://plugins.jenkins.io/ws-cleanup/
> > >
> > > [1] https://github.com/apache/hadoop/blob/trunk/dev-support/jenkinsfile-windows-10
> > > [2] https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit
> > > [3]
> > > (log from https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-1137/1/console)
> > > Still waiting to schedule task
> > > Waiting for next available executor on 'Windows'
> > > [Pipeline] // node
> > > [Pipeline] stage
> > > [Pipeline] { (Declarative: Post Actions)
> > > [Pipeline] script
> > > [Pipeline] {
> > > [Pipeline] deleteDir
> > > [Pipeline] }
> > > [Pipeline] // script
> > > Error when executing cleanup post condition:
> > > Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId:
> > > ca1b7f2f-ec16-4bde-ac51-85f964794e37
> > > org.jenkinsci.plugins.workflow.steps.MissingContextVariableException:
> > > Required context class hudson.FilePath is missing
> > > Perhaps you forgot to surround the code with a step that provides
> > > this, such as: node
> > >         at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:265)
> > >         at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:300)
> > >         at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
> > >         at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
> > >         at jdk.internal.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
> > >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > >         at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
> > >         at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
> > >         at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
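
The MissingContextVariableException in the trace above means the cleanup
step ran without a workspace (hudson.FilePath) context, i.e. outside any
node block. A sketch combining the two suggestions from this thread (run
the cleanup inside a node, and use cleanWs() from the ws-cleanup plugin
instead of deleteDir()); the stage name and label are illustrative, not the
actual job configuration:

```groovy
pipeline {
    agent none
    stages {
        stage('Windows 10') {
            agent { label 'Windows' }
            steps {
                bat 'echo build steps go here'
            }
        }
    }
    post {
        cleanup {
            script {
                // With "agent none" the post section has no node of its
                // own, so deleteDir()/cleanWs() would fail with the
                // FilePath error above; node(...) provides the context.
                node('Windows') {
                    cleanWs()   // ws-cleanup plugin
                }
            }
        }
    }
}
```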