Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-28 Thread Felix Cheung
+1

Tested R on Linux and Windows

The previous issue with building vignettes on Windows (stack overflow in ALS)
still reproduces, but as confirmed it was already present in 2.1.0, so this isn't
a regression (and hoping for the best on CRAN):
https://issues.apache.org/jira/browse/SPARK-20402


From: Denny Lee 
Sent: Friday, April 28, 2017 10:13:41 AM
To: Kazuaki Ishizaki; Michael Armbrust
Cc: dev@spark.apache.org
Subject: Re: [VOTE] Apache Spark 2.1.1 (RC4)

+1

On Fri, Apr 28, 2017 at 9:17 AM Kazuaki Ishizaki 
> wrote:
+1 (non-binding)

I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for core 
have passed..

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 package 
install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Total number of tests run: 1788
Suites: completed 198, aborted 0
Tests: succeeded 1788, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:30 min
[INFO] Finished at: 2017-04-29T01:02:29+09:00
[INFO] Final Memory: 54M/576M
[INFO] ------------------------------------------------------------------------

Regards,
Kazuaki Ishizaki,



From: Michael Armbrust
To: "dev@spark.apache.org"
Date: 2017/04/27 09:30
Subject: [VOTE] Apache Spark 2.1.1 (RC4)




Please vote on releasing the following candidate as Apache Spark version 2.1.1. 
The vote is open until Sat, April 29th, 2017 at 18:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.1
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.1-rc4 (267aca5bd5042303a718d10635bc0d1a1596853f)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1232/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.1.1?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.1.2 or 2.2.0.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.0.

What happened to RC1?

There were issues with the release packaging and as a result it was skipped.



Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-28 Thread Koert Kuipers
we have been testing the 2.2.0 snapshots in the last few weeks for in-house
unit tests, integration tests and real workloads and we are very happy with
it. the only issue i had so far (some encoders not being serializable anymore)
has already been dealt with by wenchen.

On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen  wrote:

> By the way the RC looks good. Sigs and license are OK, tests pass with
> -Phive -Pyarn -Phadoop-2.7. +1 from me.
>
> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc1 (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>
>> List of JIRA tickets resolved can be found with this filter.
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1.
>>
>


Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-28 Thread Kazuaki Ishizaki
+1 (non-binding)

I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for 
core have passed..

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 
1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 
package install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Run completed in 15 minutes, 45 seconds.
Total number of tests run: 1937
Suites: completed 205, aborted 0
Tests: succeeded 1937, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 17:26 min
[INFO] Finished at: 2017-04-29T02:23:08+09:00
[INFO] Final Memory: 53M/491M
[INFO] 


Kazuaki Ishizaki,



From:   Michael Armbrust 
To: "dev@spark.apache.org" 
Date:   2017/04/28 03:32
Subject: [VOTE] Apache Spark 2.2.0 (RC1)



Please vote on releasing the following candidate as Apache Spark version 
2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes 
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.0-rc1 (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1235/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then 
reporting any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked 
on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release 
unless the bug in question is a regression from 2.1.1.




Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-28 Thread Denny Lee
+1

On Fri, Apr 28, 2017 at 9:17 AM Kazuaki Ishizaki 
wrote:

> +1 (non-binding)
>
> I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for
> core have passed..
>
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build
> 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
> $ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7
> package install
> $ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
> ...
> Total number of tests run: 1788
> Suites: completed 198, aborted 0
> Tests: succeeded 1788, failed 0, canceled 4, ignored 8, pending 0
> All tests passed.
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time: 16:30 min
> [INFO] Finished at: 2017-04-29T01:02:29+09:00
> [INFO] Final Memory: 54M/576M
> [INFO]
> 
>
> Regards,
> Kazuaki Ishizaki,
>
>
>
> From: Michael Armbrust
> To: "dev@spark.apache.org"
> Date: 2017/04/27 09:30
> Subject: [VOTE] Apache Spark 2.1.1 (RC4)
> --
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.1. The vote is open until Sat, April 29th, 2017 at 18:00 PST and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.1
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see *http://spark.apache.org/*
> 
>
> The tag to be voted on is *v2.1.1-rc4* (267aca5bd5042303a718d10635bc0d1a1596853f)
>
> List of JIRA tickets resolved can be found *with this filter*.
>
> The release files, including signatures, digests, etc. can be found at:
> *http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-bin/*
> 
>
> Release artifacts are signed with the following key:
> *https://people.apache.org/keys/committer/pwendell.asc*
> 
>
> The staging repository for this release can be found at:
> *https://repository.apache.org/content/repositories/orgapachespark-1232/*
> 
>
> The documentation corresponding to this release can be found at:
> *http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-docs/*
> 
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.1?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.0.
>
> *What happened to RC1?*
>
> There were issues with the release packaging and as a result it was skipped.
>
>


Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-28 Thread Koert Kuipers
this is about column names containing dots that do not target fields inside
structs? so not a.b as in field b inside struct a, but somehow a field
literally called a.b? i didn't even know it is supported at all. it's something
i would never try because it sounds like a bad idea to go there...

On Fri, Apr 28, 2017 at 12:17 PM, Andrew Ash  wrote:

> -1 due to regression from 2.1.1
>
> In 2.2.0-rc1 we bumped the Parquet version from 1.8.1 to 1.8.2 in commit
> 26a4cba3ff .  Parquet
> 1.8.2 includes a backport from 1.9.0: PARQUET-389
>  in commit 2282c22c
> 
>
> This backport caused a regression in Spark, where filtering on columns
> containing dots in the column name pushes the filter down into Parquet
> where Parquet incorrectly handles the predicate.  Spark pushes the String
> "col.dots" as the column name, but Parquet interprets this as
> "struct.field" where the predicate is on a field of a struct.  The ultimate
> result is that the predicate always returns zero results, causing a data
> correctness issue.
>
> This issue is filed in Spark as SPARK-20364
>  and has a PR fix up
> at PR #17680 .
>
> I nominate SPARK-20364  as
> a release blocker due to the data correctness regression.
>
> Thanks!
> Andrew
>
> On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen  wrote:
>
>> By the way the RC looks good. Sigs and license are OK, tests pass with
>> -Phive -Pyarn -Phadoop-2.7. +1 from me.
>>
>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST
>>> and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc1 (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>>
>>> List of JIRA tickets resolved can be found with this filter.
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1.
>>>
>>
>


Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-28 Thread Andrew Ash
-1 due to regression from 2.1.1

In 2.2.0-rc1 we bumped the Parquet version from 1.8.1 to 1.8.2 in commit
26a4cba3ff. Parquet 1.8.2 includes a backport from 1.9.0: PARQUET-389 in
commit 2282c22c.

This backport caused a regression in Spark, where filtering on columns
containing dots in the column name pushes the filter down into Parquet
where Parquet incorrectly handles the predicate.  Spark pushes the String
"col.dots" as the column name, but Parquet interprets this as
"struct.field" where the predicate is on a field of a struct.  The ultimate
result is that the predicate always returns zero results, causing a data
correctness issue.
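
A minimal sketch of the kind of query that triggers this (the path and column
name here are illustrative, not taken from the original report):

```
// Hypothetical repro sketch for the dotted-column pushdown issue (SPARK-20364).
// The path and column name are made up; only the shape of the query matters.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("dotted-column-repro").getOrCreate()
import spark.implicits._

// A column whose name literally contains a dot -- not a struct field.
Seq(1, 2, 3).toDF("col.dots").write.mode("overwrite").parquet("/tmp/dotted")

// When the filter is pushed down, Parquet 1.8.2 reads "col.dots" as field
// "dots" inside a struct "col", so the predicate matches no rows.
val result = spark.read.parquet("/tmp/dotted").filter(col("`col.dots`") > 1)
result.show()  // expected: 2 rows; with the regression: 0 rows
```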

This issue is filed in Spark as SPARK-20364 and has a fix up at PR #17680.

I nominate SPARK-20364  as
a release blocker due to the data correctness regression.

Thanks!
Andrew

On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen  wrote:

> By the way the RC looks good. Sigs and license are OK, tests pass with
> -Phive -Pyarn -Phadoop-2.7. +1 from me.
>
> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc1 (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>
>> List of JIRA tickets resolved can be found with this filter.
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1.
>>
>


Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-28 Thread Kazuaki Ishizaki
+1 (non-binding)

I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for 
core have passed..

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 
1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 
package install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Total number of tests run: 1788
Suites: completed 198, aborted 0
Tests: succeeded 1788, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 16:30 min
[INFO] Finished at: 2017-04-29T01:02:29+09:00
[INFO] Final Memory: 54M/576M
[INFO] 


Regards,
Kazuaki Ishizaki, 



From:   Michael Armbrust 
To: "dev@spark.apache.org" 
Date:   2017/04/27 09:30
Subject: [VOTE] Apache Spark 2.1.1 (RC4)



Please vote on releasing the following candidate as Apache Spark version 
2.1.1. The vote is open until Sat, April 29th, 2017 at 18:00 PST and 
passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.1
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.1-rc4 (267aca5bd5042303a718d10635bc0d1a1596853f)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1232/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then 
reporting any regressions.

What should happen to JIRA tickets still targeting 2.1.1?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked 
on immediately. Everything else please retarget to 2.1.2 or 2.2.0.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release 
unless the bug in question is a regression from 2.1.0.

What happened to RC1?

There were issues with the release packaging and as a result it was skipped.




Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-28 Thread Tom Graves
+1
Tom Graves 

On Thursday, April 27, 2017 5:37 PM, vaquar khan  
wrote:
 

+1
Regards,
Vaquar khan
On Apr 27, 2017 4:11 PM, "Holden Karau"  wrote:

+1 (non-binding)
PySpark packaging issue from the earlier RC seems to have been fixed.
On Thu, Apr 27, 2017 at 1:23 PM, Dong Joon Hyun  wrote:

+1
I’ve got the same result (Scala/R test) on JDK 1.8.0_131 at this time.
Bests,
Dongjoon.
From: Reynold Xin 
Date: Thursday, April 27, 2017 at 1:06 PM
To: Michael Armbrust , "dev@spark.apache.org" 

Subject: Re: [VOTE] Apache Spark 2.1.1 (RC4)

+1
On Thu, Apr 27, 2017 at 11:59 AM Michael Armbrust  
wrote:

I'll also +1
On Thu, Apr 27, 2017 at 4:20 AM, Sean Owen  wrote:

+1, same result as with the last RC. All checks out for me.

On Thu, Apr 27, 2017 at 1:29 AM Michael Armbrust  wrote:

Please vote on releasing the following candidate as Apache Spark version 2.1.1.
The vote is open until Sat, April 29th, 2017 at 18:00 PST and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.1-rc4 (267aca5bd5042303a718d10635bc0d1a1596853f)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1232/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-docs/

FAQ
How can I help test this release?
If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.
What should happen to JIRA tickets still targeting 2.1.1?
Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.1.2 or 2.2.0.
But my bug isn't fixed!??!
In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.0.
What happened to RC1?
There were issues with the release packaging and as a result it was skipped.







-- 
Cell: 425-233-8271
Twitter: https://twitter.com/holdenkarau



   

RandomForest caching

2017-04-28 Thread madhu phatak
Hi,

I am testing RandomForestClassification with 50 GB of data which is cached
in memory. I have 64 GB of RAM, of which 28 GB is used for caching the
original dataset.

When I run random forest, it caches around 300 GB of intermediate data, which
un-caches the original dataset. This caching is triggered by the following code
in RandomForest.scala:

```
val baggedInput = BaggedPoint
  .convertToBaggedRDD(treeInput, strategy.subsamplingRate,
    numTrees, withReplacement, seed)
  .persist(StorageLevel.MEMORY_AND_DISK)
```

As I don't have control over the storage level, I cannot make sure the original
dataset stays in memory for other interactive tasks while random forest is
running.

Is it a good idea to make this storage level a user parameter (a rough sketch
follows below)? If so, I can open a JIRA issue and submit a PR for it.
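
For illustration only, a rough sketch of what such a parameter could look like
(this is hypothetical, not an existing Spark API; the trait and param names are
made up):

```
// Hypothetical sketch only -- not the actual Spark API. One way to expose the
// intermediate storage level as a user-settable ML Param.
import org.apache.spark.ml.param.{Param, Params}
import org.apache.spark.storage.StorageLevel

trait HasIntermediateStorageLevel extends Params {

  // Accepts any name understood by StorageLevel.fromString, e.g. "MEMORY_AND_DISK".
  final val intermediateStorageLevel: Param[String] = new Param[String](
    this, "intermediateStorageLevel",
    "storage level for intermediate datasets such as the bagged input",
    (value: String) => scala.util.Try(StorageLevel.fromString(value)).isSuccess)

  setDefault(intermediateStorageLevel -> "MEMORY_AND_DISK")

  final def getIntermediateStorageLevel: String = $(intermediateStorageLevel)
}

// RandomForest could then persist the bagged input with the configured level:
//   .persist(StorageLevel.fromString(getIntermediateStorageLevel))
```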

-- 
Regards,
Madhukara Phatak
http://datamantra.io/