Re: [E] Yetus is failing with Java unable to create threads

2020-12-21 Thread Ahmed Hussein
> In most cases, the OOM occurs when closing MiniDFSCluster.

@Akira Ajisaka, I see some usage of parallelStream() in RouterRPCServer.
If we consider the pool executors in the code together with parallelStream(),
could the MiniCluster and MiniRouterCluster be creating so many threads that
the JVM crashes?
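
One way to test this kind of hypothesis is to measure how many extra live
threads a piece of code adds at its peak. The following is only a minimal
sketch, not Hadoop code: the class ThreadCountProbe and its methods are
hypothetical names, and the blocked fixed-size pool merely stands in for
whatever MiniDFSCluster or MiniRouterCluster actually start. Note that
parallelStream() by default runs on the common ForkJoinPool, whose parallelism
defaults to the number of available processors minus one, so explicitly
created executors are often the larger contributor; whether that holds here is
an assumption to verify.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: reports how many threads a workload adds at its peak. */
public class ThreadCountProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Runs the workload and returns peak live threads minus the starting count. */
    static int peakThreadsAdded(Runnable workload) {
        THREADS.resetPeakThreadCount();          // peak now equals the current live count
        int baseline = THREADS.getThreadCount();
        workload.run();
        return THREADS.getPeakThreadCount() - baseline;
    }

    /** Stand-in workload: n pool threads that are all alive at the same time. */
    static void runBlockedPool(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CountDownLatch allStarted = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                allStarted.countDown();
                try {
                    allStarted.await();          // hold the thread until all n have started
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int added = peakThreadsAdded(() -> runBlockedPool(16));
        System.out.println("Peak threads added by workload: " + added);
    }
}
```

Wrapping the start and shutdown of a mini cluster in peakThreadsAdded(...)
inside a test would give a rough per-test thread cost, which can then be
multiplied by the number of tests running in parallel.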

On Thu, Dec 17, 2020 at 9:13 PM Akira Ajisaka  wrote:

> Thank you for your reply.
>
> > Will reducing the number of threads not increase the build time?
> Yes, but the difference is 30 ~ 60 mins. Not so much, I think.
>
> > Can we not ask for more resources?
> Now the machines in https://ci-hadoop.apache.org/ are physical, and
> the memory size is fixed.
> (They are donated by Y!:
> https://cwiki.apache.org/confluence/display/INFRA/Build+nodes+-+node+name+to+hostname+mappings )
>
> I'll ask the infrastructure team how much memory we can use. If the
> size is not 20GB, we can update in
> https://github.com/apache/hadoop/pull/2560
>
> > I think RBF builds are quite stable
> Actually not:
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/357/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
> I attached the log because it will be deleted in 2 weeks.
>
> -Akira

Re: [E] Yetus is failing with Java unable to create threads

2020-12-17 Thread Ayush Saxena
Hi Akira,
Will reducing the number of threads not increase the build time? I guess it
takes 2.5-3.5 hrs in general in the present scenario. Moreover, the thread count
hasn’t been increased recently. Would that be the root of all evil?

Can we not ask for more resources?

Anyway, I think RBF builds are quite stable; I don’t remember seeing OOM there.
So, in case we decide to reduce the thread count, maybe we can keep RBF as is?

-Ayush

> On 17-Dec-2020, at 2:15 PM, Akira Ajisaka  wrote:
> 
> Sorry, now I think the above comment is wrong. Please ignore.
> In hadoop-common, hadoop-hdfs, and hadoop-hdfs-rbf, the unit tests are
> executed in parallel. I'd like to reduce the number of tests running
> at the same time to avoid OOM. Filed
> https://issues.apache.org/jira/browse/HDFS-15731

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [E] Yetus is failing with Java unable to create threads

2020-12-17 Thread Akira Ajisaka
Sorry, now I think the above comment is wrong. Please ignore.
In hadoop-common, hadoop-hdfs, and hadoop-hdfs-rbf, the unit tests are
executed in parallel. I'd like to reduce the number of tests running
at the same time to avoid OOM. Filed
https://issues.apache.org/jira/browse/HDFS-15731
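
The effect of capping how many tests run at once can be illustrated in
isolation. This is a hedged sketch only: the real change lives in the build
configuration tracked by HDFS-15731 and is not reproduced here. The class
BoundedConcurrency is a hypothetical name, and the sleeping tasks merely stand
in for unit tests. It shows that a fixed-size pool keeps the number of
simultaneously running tasks at or below the chosen limit, which in turn
bounds how many mini clusters (and their threads) can exist at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: a fixed-size pool caps how many tasks run at once. */
public class BoundedConcurrency {

    /** Runs `tasks` short jobs with at most `limit` in flight; returns the observed peak. */
    static int maxConcurrent(int tasks, int limit) {
        ExecutorService pool = Executors.newFixedThreadPool(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);   // record the highest overlap seen
                try {
                    Thread.sleep(20);                    // stand-in for a unit test's work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrent tasks: " + maxConcurrent(20, 4));
    }
}
```

The trade-off is the one raised in this thread: a lower limit reduces peak
resource usage but lengthens the total run time.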

On Thu, Dec 17, 2020 at 4:17 PM Akira Ajisaka  wrote:
>
> In most cases, the OOM occurs when closing MiniDFSCluster.
> Added a detailed comment in
> https://issues.apache.org/jira/browse/HDFS-13579 and created a PR:
> https://github.com/apache/hadoop/pull/2555
>
> -Akira




Re: [E] Yetus is failing with Java unable to create threads

2020-12-16 Thread Akira Ajisaka
In most cases, the OOM occurs when closing MiniDFSCluster.
Added a detailed comment in
https://issues.apache.org/jira/browse/HDFS-13579 and created a PR:
https://github.com/apache/hadoop/pull/2555

-Akira

On Fri, Dec 4, 2020 at 12:43 AM Ahmed Hussein  wrote:
>
> I remember this error being there for more than 6 months. It significantly
> slows down collaboration.
> Eventually, the community will develop another habit of ignoring the
> prebuilds (out of despair).
>
> I am willing to help get this fixed.
> Does anyone know who owns the Yetus environment and has experience with it?




Re: [E] Yetus is failing with Java unable to create threads

2020-12-03 Thread Ahmed Hussein
I remember this error being there for more than 6 months. It significantly
slows down collaboration.
Eventually, the community will develop another habit of ignoring the
prebuilds (out of despair).

I am willing to help get this fixed.
Does anyone know who owns the Yetus environment and has experience with it?

On Wed, Dec 2, 2020 at 4:43 PM Jim Brennan 
wrote:

> This is still happening.
> Latest build:
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/343/#showFailuresLink
>
> Looks like we are running out of threads in the containers where the unit
> tests run. Does anyone know where this is set up?

-- 
Best Regards,

*Ahmed Hussein, PhD*


Re: [E] Yetus is failing with Java unable to create threads

2020-12-02 Thread Jim Brennan
This is still happening.
Latest build:
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/343/#showFailuresLink

Looks like we are running out of threads in the containers where the unit
tests run. Does anyone know where this is set up?

On Wed, Oct 21, 2020 at 5:51 PM Ahmed Hussein  wrote:

> Hey folks,
>
> Yetus has been failing miserably over the last couple of days.
> In the latest qbt-report
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__ci-2Dhadoop.apache.org_job_hadoop-2Dqbt-2Dtrunk-2Djava8-2Dlinux-2Dx86-5F64_301_artifact_out_patch-2Dunit-2Dhadoop-2Dhdfs-2Dproject-5Fhadoop-2Dhdfs.txt=DwIBaQ=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY=7Imi06B91L3gbxmt5ChzH4cwlA2_f2tmXh3OXmV9MLw=HChJ3ymJ0kOlFyiTBsyRZLs9qcTOQD864ZFb8g7y2CA=N-PB427UiouJCuX_U3UbUXvIh2HQTt7VdM2Bs_4XILI=>,
> hundreds of JUnit tests fail after Java failed to acquire resources
> to create new threads.
>
> [ERROR] testRecoverAllDataBlocks1(org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy)
>   Time elapsed: 8.509 s  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
>
>
> Any thoughts on what could trigger that in the last few days? Do we need
> more resources for the image?
>
> --
> Best Regards,
>
> *Ahmed Hussein, PhD*
>


Yetus is failing with Java unable to create threads

2020-10-21 Thread Ahmed Hussein
Hey folks,

Yetus has been failing miserably over the last couple of days.
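In the latest qbt-report
<https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/301/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt>,
hundreds of JUnit tests fail after Java failed to acquire resources
to create new threads.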

[ERROR] testRecoverAllDataBlocks1(org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy)
  Time elapsed: 8.509 s  <<< ERROR!
java.lang.OutOfMemoryError: unable to create new native thread


Any thoughts on what could trigger that in the last few days? Do we need
more resources for the image?

-- 
Best Regards,

*Ahmed Hussein, PhD*