[jira] [Updated] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21693:
-
Description: 
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in Spark. I have asked for this on my account a few times before, but 
it looks like we can't keep increasing this time limit again and again.

I could identify two things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (roughly 10 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.
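
For reference, a minimal sketch of what caching the Maven dependencies could look 
like in appveyor.yml if the cache were usable; the repository path below is an 
assumption about the AppVeyor image, and {{-> pom.xml}} is AppVeyor's documented way 
to invalidate a cache entry when pom.xml changes:

{code}
# appveyor.yml (sketch, untested): cache the local Maven repository so
# dependencies are not re-downloaded on every build. The path is an
# assumption about where Maven keeps its repository on the build image.
cache:
  - C:\Users\appveyor\.m2 -> pom.xml
{code}

As the quote above says, though, the cache is not saved in pull request builds, so 
this would only help non-PR builds.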


2. "MLlib classification algorithms" tests (30-35ish mins)

This test below looks taking 30-35ish mins.

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark 
package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a (I think) last resort, we could make a matrix for this test alone, so that 
we run the other tests after a build and then run this test after another 
build, for example, I run Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix 
with 7 build and test each).
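
For illustration, a hypothetical sketch of such a split in appveyor.yml; 
{{TEST_GROUP}} and {{run-tests.R}} are made-up names for this sketch, not the real 
scripts:

{code}
# appveyor.yml (sketch, untested): split the long-running test into its
# own matrix entry so that each entry stays under the time limit.
environment:
  matrix:
    - TEST_GROUP: mllib_classification
    - TEST_GROUP: other_sparkr_tests

test_script:
  # TEST_GROUP and run-tests.R are hypothetical placeholders.
  - cmd: Rscript run-tests.R %TEST_GROUP%
{code}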

I am also checking and testing other ways.


  was:
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in Spark. I have asked for this on my account a few times before, but 
it looks like we can't keep increasing this time limit again and again.

I could identify two things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).

I am also checking and testing other ways.



> AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
> -
>
> Key: SPARK-21693
> URL: https://issues.apache.org/jira/browse/SPARK-21693
> Project: Spark
>  Issue Type: Test
>  Components: Build, SparkR
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>

[jira] [Updated] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21693:
-
Description: 
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in Spark. I have asked for this on my account a few times before, but 
it looks like we can't keep increasing this time limit again and again.

I could identify two things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).

I am also checking and testing other ways.


  was:
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify two things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).

I am also checking and testing other ways.




[jira] [Updated] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21693:
-
Description: 
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify two things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).

I am also checking and testing other ways.


  was:
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify three things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).


3. Disabled {{spark.sparkr.use.daemon}} on Windows due to the limitation of 
{{mcfork}}

See [this 
code|https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362-L392].
 We disabled this feature, and we currently fork processes from Java, which is 
expensive. I haven't tested this yet, but maybe reducing 
{{spark.sql.shuffle.partitions}} could be a way to work around this. Currently, if I 
understand correctly, it is 200 by default in R tests, which ends up with 200 Java 
processes for every shuffle.
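
For illustration, an untested sketch of lowering the shuffle partitions when the 
SparkR tests create their session; the value 10 and {{local[2]}} are arbitrary 
assumptions, not the tests' actual settings:

{code}
# Sketch, untested: start a SparkR session with fewer shuffle partitions,
# so that far fewer worker processes are forked per shuffle than the
# default of 200. The master and the value 10 are arbitrary assumptions.
library(SparkR)
sparkR.session(
  master = "local[2]",
  sparkConfig = list(spark.sql.shuffle.partitions = "10")
)
{code}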





[jira] [Updated] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21693:
-
Description: 
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify three things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).


3. Disabled {{spark.sparkr.use.daemon}} on Windows due to the limitation of 
{{mcfork}}

See [this 
code|https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362-L392].
 We disabled this feature, and we currently fork processes from Java, which is 
expensive. I haven't tested this yet, but maybe reducing 
{{spark.sql.shuffle.partitions}} could be a way to work around this. Currently, if I 
understand correctly, it is 200 by default in R tests, which ends up with 200 Java 
processes for every shuffle.



  was:
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify three things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).


3. Disabled {{spark.sparkr.use.daemon}} on Windows due to the limitation of 
{{mcfork}}

See [this 
code|https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362-L392].
 We disabled this feature, and we currently fork processes from Java, which is 
expensive. I haven't tested this yet, but maybe reducing 
{{spark.sql.shuffle.partitions}} could be a way to work around this. Currently, if I 
understand correctly, it is 200 by default in R tests, which ends up with 200 Java 
processes for every shuffle.





[jira] [Updated] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21693:
-
Description: 
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify three things that look like they take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (10-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).


3. Disabled {{spark.sparkr.use.daemon}} on Windows due to the limitation of 
{{mcfork}}

See [this 
code|https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362-L392].
 We disabled this feature, and we currently fork processes from Java, which is 
expensive. I haven't tested this yet, but maybe reducing 
{{spark.sql.shuffle.partitions}} could be a way to work around this. Currently, if I 
understand correctly, it is 200 by default in R tests, which ends up with 200 Java 
processes for every shuffle.



  was:
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify three things that take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (15-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).


3. Disabled {{spark.sparkr.use.daemon}} on Windows due to the limitation of 
{{mcfork}}

See [this 
code|https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362-L392].
 We disabled this feature, and we currently fork processes from Java, which is 
expensive. I haven't tested this yet, but maybe reducing 
{{spark.sql.shuffle.partitions}} could be a way to work around this. Currently, if I 
understand correctly, it is 200 by default in R tests, which ends up with 200 Java 
processes for every shuffle.





[jira] [Updated] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21693:
-
Description: 
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify three things that take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (15-20 minutes)


https://www.appveyor.com/docs/build-cache/

{quote}
Note: Saving cache is disabled in Pull Request builds.
{quote}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).


3. Disabled {{spark.sparkr.use.daemon}} on Windows due to the limitation of 
{{mcfork}}

See [this 
code|https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362-L392].
 We disabled this feature, and we currently fork processes from Java, which is 
expensive. I haven't tested this yet, but maybe reducing 
{{spark.sql.shuffle.partitions}} could be a way to work around this. Currently, if I 
understand correctly, it is 200 by default in R tests, which ends up with 200 Java 
processes for every shuffle.



  was:
We finally sometimes reach the time limit, 1.5 hours:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
I requested to increase this from an hour to 1.5 hours before, but it looks like we 
should fix this in AppVeyor. I have asked for this on my account a few times before, 
but it looks like we can't keep increasing this time limit again and again.

I could identify three things that take quite a bit of time:


1. Disabled cache feature in the pull request builder, which ends up downloading the 
Maven dependencies on every build (15-20 minutes)


https://www.appveyor.com/docs/build-cache/

{code}
Note: Saving cache is disabled in Pull Request builds.
{code}

and also see 
http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working

This seems difficult to fix within Spark.


2. "MLlib classification algorithms" tests (30-35 minutes)

The test below looks like it takes 30-35 minutes:

{code}
MLlib classification algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
..
{code}

As a last resort (I think), we could make a matrix entry for this test alone, so that 
we run the other tests after one build and then run this test after another build. For 
example, I run the Scala tests by this workaround - 
https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix with 7 
entries, each with its own build and test run).


3. Disabled {{spark.sparkr.use.daemon}} on Windows due to the limitation of 
{{mcfork}}

See [this 
code|https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362-L392].
 We disabled this feature, and we currently fork processes from Java, which is 
expensive. I haven't tested this yet, but maybe reducing 
{{spark.sql.shuffle.partitions}} could be a way to work around this. Currently, if I 
understand correctly, it is 200 by default in R tests, which ends up with 200 Java 
processes for every shuffle.



