Re: Build speed

2016-07-22 Thread Michael Allman
I use sbt. Rebuilds are super fast.

Michael

> On Jul 22, 2016, at 7:54 AM, Mikael Ståldal  wrote:
> 
> Is there any way to speed up an incremental build of Spark?
> 
> For me it takes 8 minutes to build the project with just a few code changes.
> 
> -- 
>  
> 
> Mikael Ståldal
> Senior software developer 
> 
> Magine TV
> mikael.stal...@magine.com 
> Grev Turegatan 3 | 114 46 Stockholm, Sweden | www.magine.com
> 
> 
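
For context, the usual way to make sbt rebuilds fast is to keep one long-lived interactive session so the JVM and incremental-compiler state stay warm. A minimal sketch, assuming a standard Spark checkout (the module name and profiles are illustrative):

    # Start one interactive sbt shell and leave it running
    ./build/sbt -Pyarn -Phadoop-2.7

    # Inside the sbt shell: build only the module you are editing, and let
    # the ~ prefix re-run compilation whenever a source file changes
    project core
    ~compile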



Re: Build speed

2016-07-22 Thread Ted Yu
I assume you have enabled Zinc.

Cheers

On Fri, Jul 22, 2016 at 7:54 AM, Mikael Ståldal wrote:

> Is there any way to speed up an incremental build of Spark?
>
> For me it takes 8 minutes to build the project with just a few code
> changes.
>
> --
>
> *Mikael Ståldal*
> Senior software developer
>
> *Magine TV*
> mikael.stal...@magine.com
> Grev Turegatan 3 | 114 46 Stockholm, Sweden | www.magine.com
>
>
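
For context, Zinc is a long-running server for the Scala incremental compiler that keeps a warm JVM between Maven invocations. A minimal sketch of using it with the Spark build, with the caveat that the Zinc path below is illustrative (build/mvn can also download and start Zinc automatically):

    # Start the Zinc server once; it listens on port 3030 by default
    ./build/zinc-0.3.9/bin/zinc -start

    # Subsequent Maven builds then reuse the warm incremental compiler
    ./build/mvn -Pyarn -Phadoop-2.7 -DskipTests package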


Build speed

2016-07-22 Thread Mikael Ståldal
Is there any way to speed up an incremental build of Spark?

For me it takes 8 minutes to build the project with just a few code changes.

-- 

*Mikael Ståldal*
Senior software developer

*Magine TV*
mikael.stal...@magine.com
Grev Turegatan 3 | 114 46 Stockholm, Sweden | www.magine.com



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Matei Zaharia
+1

Tested on Mac.

Matei

> On Jul 22, 2016, at 11:18 AM, Joseph Bradley  wrote:
> 
> +1
> 
> Mainly tested ML/Graph/R.  Perf tests from Tim Hunter showed minor speedups 
> from 1.6 for common ML algorithms.
> 
> On Thu, Jul 21, 2016 at 9:41 AM, Ricardo Almeida wrote:
> +1 (non-binding)
> 
> Tested PySpark Core, DataFrame/SQL, MLlib and Streaming on a standalone 
> cluster
> 
> On 21 July 2016 at 05:24, Reynold Xin wrote:
> +1
> 
> 
> On Wednesday, July 20, 2016, Krishna Sankar wrote:
> +1 (non-binding, of course)
> 
> 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
>  mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
> 2. Tested pyspark, mllib (iPython 4.0)
> 2.0 Spark version is 2.0.0 
> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
> 2.2. Linear/Ridge/Lasso Regression OK 
> 2.3. Classification : Decision Tree, Naive Bayes OK
> 2.4. Clustering : KMeans OK
>Center And Scale OK
> 2.5. RDD operations OK
>   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>Model evaluation/optimization (rank, numIter, lambda) with itertools OK
> 3. Scala - MLlib
> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
> 3.2. LinearRegressionWithSGD OK
> 3.3. Decision Tree OK
> 3.4. KMeans OK
> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
> 3.6. saveAsParquetFile OK
> 3.7. Read and verify the 3.6 save(above) - sqlContext.parquetFile, 
> registerTempTable, sql OK
> 3.8. result = sqlContext.sql("SELECT 
> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER 
> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
> 4.0. Spark SQL from Python OK
> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
> 5.0. Packages
> 5.1. com.databricks.spark.csv - read/write OK (--packages 
> com.databricks:spark-csv_2.10:1.4.0)
> 6.0. DataFrames 
> 6.1. cast,dtypes OK
> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
> 6.3. All joins,sql,set operations,udf OK
> [DataFrame operations are very fast: from 11 secs to 3 secs, to 1.8 secs, to
> 1.5 secs! Good work!!!]
> 7.0. GraphX/Scala
> 7.1. Create Graph (small and bigger dataset) OK
> 7.2. Structure APIs - OK
> 7.3. Social Network/Community APIs - OK
> 7.4. Algorithms : PageRank of 2 datasets, aggregateMessages() - OK
> 
> Cheers
> 
> 
> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes 
> if a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
> 
> 
> The tag to be voted on is v2.0.0-rc5 
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
> 
> This release candidate resolves ~2500 issues: 
> https://s.apache.org/spark-2.0.0-jira 
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/ 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 
> 
> 
> 
> =========================================
> How can I help test this release?
> =========================================
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload and running on this release candidate, then reporting 
> any regressions from 1.x.
> 
> =========================================
> What justifies a -1 vote for this release?
> =========================================
> Critical bugs impacting major functionalities.
> 
> Bugs already present in 1.x, missing features, or bugs related to new 
> features will not necessarily block this release. Note that historically 
> Spark documentation has been published on the website separately from the 
> main release so we do not need to block the release due to documentation 
> errors either.
> 
> 
> 
> 
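
For anyone verifying the artifacts before voting, a minimal sketch of checking a downloaded tarball against the signing key above (the file names are illustrative):

    # Import the release manager's public key
    curl -s https://people.apache.org/keys/committer/pwendell.asc | gpg --import

    # Verify the artifact against its detached signature
    gpg --verify spark-2.0.0-bin-hadoop2.7.tgz.asc spark-2.0.0-bin-hadoop2.7.tgz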



Re: ml ALS.fit(..) issue

2016-07-22 Thread VG
Dev team,

Can someone please help me here.

-VG

On Fri, Jul 22, 2016 at 8:30 PM, VG  wrote:

> Can someone please help here.
>
> I tried both scala 2.10 and 2.11 on the system
>
>
>
> On Fri, Jul 22, 2016 at 7:59 PM, VG  wrote:
>
>> I am using version 2.0.0-preview
>>
>>
>>
>> On Fri, Jul 22, 2016 at 7:47 PM, VG  wrote:
>>
>>> I am running into the following error when running ALS
>>>
>>> Exception in thread "main" java.lang.NoSuchMethodError:
>>> scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
>>> at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452)
>>> at yelp.TestUser.main(TestUser.java:101)
>>>
>>> Line 101 in the stack trace above corresponds to the following code:
>>>
>>> ALSModel model = als.fit(training);
>>>
>>>
>>> Does anyone have a suggestion about what is going on here and where I might be
>>> going wrong?
>>> Please advise.
>>>
>>> -VG
>>>
>>
>>
>


Error in running JavaALSExample example from spark examples

2016-07-22 Thread VG
I am getting the following error

Exception in thread "main" java.lang.NoSuchMethodError:
scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452)

Any suggestions on how to resolve this?

VG
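
For reference, a minimal Java sketch of an ALS run along the lines of JavaALSExample (the column names and data path are illustrative). The NoSuchMethodError above surfaces at the fit() call when the application's Scala version differs from the one Spark was compiled against, so the code itself is usually not at fault:

    import org.apache.spark.ml.recommendation.ALS;
    import org.apache.spark.ml.recommendation.ALSModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ALSDemo {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("ALSDemo").getOrCreate();

        // Ratings need user, item and rating columns matching the setters below
        Dataset<Row> training = spark.read().json("data/ratings.json");

        ALS als = new ALS()
            .setMaxIter(5)
            .setRegParam(0.01)
            .setUserCol("userId")
            .setItemCol("movieId")
            .setRatingCol("rating");

        // The NoSuchMethodError reported in this thread is thrown here when
        // Scala versions are mismatched, before any actual training happens
        ALSModel model = als.fit(training);

        spark.stop();
      }
    }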


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Joseph Bradley
+1

Mainly tested ML/Graph/R.  Perf tests from Tim Hunter showed minor speedups
from 1.6 for common ML algorithms.

On Thu, Jul 21, 2016 at 9:41 AM, Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:

> +1 (non-binding)
>
> Tested PySpark Core, DataFrame/SQL, MLlib and Streaming on a standalone
> cluster
>
> On 21 July 2016 at 05:24, Reynold Xin  wrote:
>
>> +1
>>
>>
>> On Wednesday, July 20, 2016, Krishna Sankar  wrote:
>>
>>> +1 (non-binding, of course)
>>>
>>> 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
>>>  mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
>>> 2. Tested pyspark, mllib (iPython 4.0)
>>> 2.0 Spark version is 2.0.0
>>> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
>>> 2.2. Linear/Ridge/Lasso Regression OK
>>> 2.3. Classification : Decision Tree, Naive Bayes OK
>>> 2.4. Clustering : KMeans OK
>>>Center And Scale OK
>>> 2.5. RDD operations OK
>>>   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
>>> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>>Model evaluation/optimization (rank, numIter, lambda) with
>>> itertools OK
>>> 3. Scala - MLlib
>>> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
>>> 3.2. LinearRegressionWithSGD OK
>>> 3.3. Decision Tree OK
>>> 3.4. KMeans OK
>>> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>> 3.6. saveAsParquetFile OK
>>> 3.7. Read and verify the 3.6 save(above) - sqlContext.parquetFile,
>>> registerTempTable, sql OK
>>> 3.8. result = sqlContext.sql("SELECT
>>> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
>>> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
>>> 4.0. Spark SQL from Python OK
>>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'")
>>> OK
>>> 5.0. Packages
>>> 5.1. com.databricks.spark.csv - read/write OK (--packages
>>> com.databricks:spark-csv_2.10:1.4.0)
>>> 6.0. DataFrames
>>> 6.1. cast,dtypes OK
>>> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
>>> 6.3. All joins,sql,set operations,udf OK
>>> [DataFrame operations are very fast: from 11 secs to 3 secs, to 1.8 secs,
>>> to 1.5 secs! Good work!!!]
>>> 7.0. GraphX/Scala
>>> 7.1. Create Graph (small and bigger dataset) OK
>>> 7.2. Structure APIs - OK
>>> 7.3. Social Network/Community APIs - OK
>>> 7.4. Algorithms : PageRank of 2 datasets, aggregateMessages() - OK
>>>
>>> Cheers
>>> 
>>>
>>> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT
 and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.0.0
 [ ] -1 Do not release this package because ...


 The tag to be voted on is v2.0.0-rc5
 (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

 This release candidate resolves ~2500 issues:
 https://s.apache.org/spark-2.0.0-jira

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1195/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


 =========================================
 How can I help test this release?
 =========================================
 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running on this release candidate, then
 reporting any regressions from 1.x.

 =========================================
 What justifies a -1 vote for this release?
 =========================================
 Critical bugs impacting major functionalities.

 Bugs already present in 1.x, missing features, or bugs related to new
 features will not necessarily block this release. Note that historically
 Spark documentation has been published on the website separately from the
 main release so we do not need to block the release due to documentation
 errors either.


>>>
>


Re: Error in running JavaALSExample example from spark examples

2016-07-22 Thread VG
I am using 2.0.0-preview with Maven, so all dependencies should be correct, I
guess.


<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0-preview</version>
    <scope>provided</scope>
</dependency>


I see in the Maven dependencies that this brings in scala-reflect 2.11.4,
scala-compiler 2.11.0, and so on.



On Fri, Jul 22, 2016 at 11:04 PM, Aaron Ilovici wrote:

> What version of Spark/Scala are you running?
>
>
>
> -Aaron
>


Re: ml ALS.fit(..) issue

2016-07-22 Thread Benjamin Fradet
It seems there is an incompatibility between your program's Scala version and
the Scala version Spark was compiled against.
Either you're using Scala 2.11 while your Spark installation was built with
2.10, or the other way around.

On Fri, Jul 22, 2016 at 11:06 PM, Pedro Rodriguez wrote:

> The dev list is meant for working on the development of Spark, not as a way of
> escalating an issue, just FYI.
>
> If someone hasn't replied on the user list, either you haven't given it
> enough time or no one has a fix for you. I've definitely gotten replies
> from committers multiple times to many questions, so it's definitely *not*
> the case that they don't care.
>
> On Fri, Jul 22, 2016 at 10:18 AM, VG  wrote:
>
>> Dev team,
>>
>> Can someone please help me here.
>>
>> -VG
>>
>> On Fri, Jul 22, 2016 at 8:30 PM, VG  wrote:
>>
>>> Can someone please help here.
>>>
>>> I tried both scala 2.10 and 2.11 on the system
>>>
>>>
>>>
>>> On Fri, Jul 22, 2016 at 7:59 PM, VG  wrote:
>>>
 I am using version 2.0.0-preview



 On Fri, Jul 22, 2016 at 7:47 PM, VG  wrote:

> I am running into the following error when running ALS
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
> at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452)
> at yelp.TestUser.main(TestUser.java:101)
>
> Line 101 in the stack trace above corresponds to the following code:
>
> ALSModel model = als.fit(training);
>
>
> Does anyone have a suggestion about what is going on here and where I might
> be going wrong?
> Please advise.
>
> -VG
>


>>>
>>
>
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>


-- 
Ben Fradet.
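
A quick way to rule that out is to pin every Spark artifact in the pom to one Scala binary version. A minimal sketch assuming Scala 2.11 (the mllib dependency is shown for illustration, since ALS lives there):

    <!-- All Spark modules must share one Scala suffix (_2.11 here); mixing
         _2.10 and _2.11 artifacts is what produces NoSuchMethodError on
         scala.reflect at runtime -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.0-preview</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.0.0-preview</version>
        <scope>provided</scope>
    </dependency>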


Re: Spark jdbc update SaveMode

2016-07-22 Thread Maciej Bryński
2016-07-22 23:05 GMT+02:00 Ramon Rosa da Silva :
> Hi Folks,
>
>
>
> What do you think about allowing an update SaveMode via
> DataFrame.write.mode(“update”)?
>
> Right now, Spark just has JDBC insert.

I'm working on a patch that creates a new mode: 'upsert'.
For MySQL it will use the 'REPLACE INTO' command.

M.
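
For context, MySQL's REPLACE INTO is delete-then-insert keyed on the table's unique keys, so an 'upsert' mode along those lines would emit statements like the following (the table and columns are illustrative):

    -- With a PRIMARY KEY on id: the row with id = 1 is deleted and re-inserted
    -- if it exists, or simply inserted otherwise
    REPLACE INTO users (id, name, updated_at)
    VALUES (1, 'alice', '2016-07-22 23:05:00');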




Spark jdbc update SaveMode

2016-07-22 Thread Ramon Rosa da Silva
Hi Folks,

What do you think about allowing an update SaveMode via
DataFrame.write.mode("update")?
Right now, Spark just has JDBC insert.




Re: ml ALS.fit(..) issue

2016-07-22 Thread Pedro Rodriguez
The dev list is meant for working on the development of Spark, not as a way of
escalating an issue, just FYI.

If someone hasn't replied on the user list, either you haven't given it
enough time or no one has a fix for you. I've definitely gotten replies
from committers multiple times to many questions, so it's definitely *not*
the case that they don't care.

On Fri, Jul 22, 2016 at 10:18 AM, VG  wrote:

> Dev team,
>
> Can someone please help me here.
>
> -VG
>
> On Fri, Jul 22, 2016 at 8:30 PM, VG  wrote:
>
>> Can someone please help here.
>>
>> I tried both scala 2.10 and 2.11 on the system
>>
>>
>>
>> On Fri, Jul 22, 2016 at 7:59 PM, VG  wrote:
>>
>>> I am using version 2.0.0-preview
>>>
>>>
>>>
>>> On Fri, Jul 22, 2016 at 7:47 PM, VG  wrote:
>>>
 I am running into the following error when running ALS

 Exception in thread "main" java.lang.NoSuchMethodError:
 scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
 at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452)
 at yelp.TestUser.main(TestUser.java:101)

 Line 101 in the stack trace above corresponds to the following code:

 ALSModel model = als.fit(training);


 Does anyone have a suggestion about what is going on here and where I might be
 going wrong?
 Please advise.

 -VG

>>>
>>>
>>
>


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Luciano Resende
+1 (non-binding)

Found a minor issue when trying to run some of the docker tests, but
nothing blocking the release. Will create a JIRA for that.

On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =========================================
> How can I help test this release?
> =========================================
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> =========================================
> What justifies a -1 vote for this release?
> =========================================
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Holden Karau
+1 (non-binding)

Built locally on Ubuntu 14.04, basic pyspark sanity checking & tested with
a simple structured streaming project (spark-structured-streaming-ml) &
spark-testing-base & high-performance-spark-examples (minor changes
required from preview version but seem intentional & jetty conflicts with
out of date testing library - but not a Spark problem).

On Fri, Jul 22, 2016 at 12:45 PM, Luciano Resende wrote:

> +1 (non-binding)
>
> Found a minor issue when trying to run some of the docker tests, but
> nothing blocking the release. Will create a JIRA for that.
>
> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.0.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> The tag to be voted on is v2.0.0-rc5
>> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>>
>> This release candidate resolves ~2500 issues:
>> https://s.apache.org/spark-2.0.0-jira
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1195/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>>
>>
>> =========================================
>> How can I help test this release?
>> =========================================
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions from 1.x.
>>
>> =========================================
>> What justifies a -1 vote for this release?
>> =========================================
>> Critical bugs impacting major functionalities.
>>
>> Bugs already present in 1.x, missing features, or bugs related to new
>> features will not necessarily block this release. Note that historically
>> Spark documentation has been published on the website separately from the
>> main release so we do not need to block the release due to documentation
>> errors either.
>>
>>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Suresh Thalamati
+1 (non-binding)

Tested the data source API and JDBC data sources.


> On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes 
> if a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
> 
> 
> The tag to be voted on is v2.0.0-rc5 
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
> 
> This release candidate resolves ~2500 issues: 
> https://s.apache.org/spark-2.0.0-jira 
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/ 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 
> 
> 
> 
> =========================================
> How can I help test this release?
> =========================================
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload and running on this release candidate, then reporting 
> any regressions from 1.x.
> 
> =========================================
> What justifies a -1 vote for this release?
> =========================================
> Critical bugs impacting major functionalities.
> 
> Bugs already present in 1.x, missing features, or bugs related to new 
> features will not necessarily block this release. Note that historically 
> Spark documentation has been published on the website separately from the 
> main release so we do not need to block the release due to documentation 
> errors either.
> 



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Michael Armbrust
+1

On Fri, Jul 22, 2016 at 2:42 PM, Holden Karau  wrote:

> +1 (non-binding)
>
> Built locally on Ubuntu 14.04, basic pyspark sanity checking & tested with
> a simple structured streaming project (spark-structured-streaming-ml) &
> spark-testing-base & high-performance-spark-examples (minor changes
> required from preview version but seem intentional & jetty conflicts with
> out of date testing library - but not a Spark problem).
>
> On Fri, Jul 22, 2016 at 12:45 PM, Luciano Resende wrote:
>
>> +1 (non-binding)
>>
>> Found a minor issue when trying to run some of the docker tests, but
>> nothing blocking the release. Will create a JIRA for that.
>>
>> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
>>> if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.0.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> The tag to be voted on is v2.0.0-rc5
>>> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>>>
>>> This release candidate resolves ~2500 issues:
>>> https://s.apache.org/spark-2.0.0-jira
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1195/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>>>
>>>
>>> =========================================
>>> How can I help test this release?
>>> =========================================
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions from 1.x.
>>>
>>> =========================================
>>> What justifies a -1 vote for this release?
>>> =========================================
>>> Critical bugs impacting major functionalities.
>>>
>>> Bugs already present in 1.x, missing features, or bugs related to new
>>> features will not necessarily block this release. Note that historically
>>> Spark documentation has been published on the website separately from the
>>> main release so we do not need to block the release due to documentation
>>> errors either.
>>>
>>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Felix Cheung
+1

Tested on Ubuntu; ran a bunch of SparkR tests and found a broken link in the
docs, but it's not a blocker.


_________________________________
From: Michael Armbrust
Sent: Friday, July 22, 2016 3:18 PM
Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC5)
To:
Cc: Reynold Xin


+1

On Fri, Jul 22, 2016 at 2:42 PM, Holden Karau wrote:
+1 (non-binding)

Built locally on Ubuntu 14.04, basic pyspark sanity checking & tested with a 
simple structured streaming project (spark-structured-streaming-ml) & 
spark-testing-base & high-performance-spark-examples (minor changes required 
from preview version but seem intentional & jetty conflicts with out of date 
testing library - but not a Spark problem).

On Fri, Jul 22, 2016 at 12:45 PM, Luciano Resende wrote:
+1 (non-binding)

Found a minor issue when trying to run some of the docker tests, but nothing 
blocking the release. Will create a JIRA for that.

On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin wrote:
Please vote on releasing the following candidate as Apache Spark version 2.0.0. 
The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5 (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues: 
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=========================================
How can I help test this release?
=========================================
If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions from 1.x.

=========================================
What justifies a -1 vote for this release?
=========================================
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new features 
will not necessarily block this release. Note that historically Spark 
documentation has been published on the website separately from the main 
release so we do not need to block the release due to documentation errors 
either.




--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/



--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau





Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Ewan Leith
I think this new issue in JIRA blocks the release, unfortunately:

https://issues.apache.org/jira/browse/SPARK-16664 - Persist call on data frames
with more than 200 columns is wiping out the data

Otherwise there will need to be a 2.0.1 pretty much right after?

Thanks,
Ewan
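
For anyone trying to confirm the issue, a minimal Java sketch of the kind of reproduction the JIRA title suggests (the column count and values are illustrative):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.lit;

    public class PersistWideFrame {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("SPARK-16664").getOrCreate();

        // Widen a small frame past the 200-column threshold in the report
        Dataset<Row> df = spark.range(10).toDF("c0");
        for (int i = 1; i <= 210; i++) {
          df = df.withColumn("c" + i, lit(i));
        }

        // Per the report, rows read back after persist() lose their data
        df.persist();
        df.show();

        spark.stop();
      }
    }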

On 23 Jul 2016 03:46, Xiao Li  wrote:
+1

2016-07-22 19:32 GMT-07:00 Kousuke Saruta:

+1 (non-binding)

Tested on my cluster with three slave nodes.


On 2016/07/23 10:25, Suresh Thalamati wrote:
+1 (non-binding)

Tested the data source API and JDBC data sources.


On Jul 19, 2016, at 7:35 PM, Reynold Xin wrote:

Please vote on releasing the following candidate as Apache Spark version 2.0.0. 
The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5 (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues: 
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=========================================
How can I help test this release?
=========================================
If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions from 1.x.

=========================================
What justifies a -1 vote for this release?
=========================================
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new features 
will not necessarily block this release. Note that historically Spark 
documentation has been published on the website separately from the main 
release so we do not need to block the release due to documentation errors 
either.







Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Kousuke Saruta

+1 (non-binding)

Tested on my cluster with three slave nodes.

On 2016/07/23 10:25, Suresh Thalamati wrote:

+1 (non-binding)

Tested the data source API and JDBC data sources.


On Jul 19, 2016, at 7:35 PM, Reynold Xin wrote:


Please vote on releasing the following candidate as Apache Spark 
version 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 
PDT and passes if a majority of at least 3 +1 PMC votes are cast.


[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5 
(13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).


This release candidate resolves ~2500 issues: 
https://s.apache.org/spark-2.0.0-jira


The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 



Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 




=========================================
How can I help test this release?
=========================================
If you are a Spark user, you can help us test this release by taking 
an existing Spark workload and running on this release candidate, 
then reporting any regressions from 1.x.


=========================================
What justifies a -1 vote for this release?
=========================================
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new 
features will not necessarily block this release. Note that 
historically Spark documentation has been published on the website 
separately from the main release so we do not need to block the 
release due to documentation errors either.








Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Xiao Li
+1

2016-07-22 19:32 GMT-07:00 Kousuke Saruta :

> +1 (non-binding)
>
> Tested on my cluster with three slave nodes.
>
> On 2016/07/23 10:25, Suresh Thalamati wrote:
>
> +1 (non-binding)
>
> Tested the data source API and JDBC data sources.
>
>
> On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =========================================
> How can I help test this release?
> =========================================
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> =========================================
> What justifies a -1 vote for this release?
> =========================================
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
>
>


Re: Build error

2016-07-22 Thread Jacek Laskowski
Hi,

Fixed now. git pull and start over.

https://github.com/apache/spark/commit/e1bd70f44b11141b000821e9754efeabc14f24a5
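
A minimal sketch of picking up that commit and rebuilding from a clean state (the profile flags are illustrative):

    # Fetch the fix referenced above, then rebuild cleanly
    git pull
    ./build/mvn -Pyarn -Phadoop-2.7 -DskipTests clean package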


Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Jul 22, 2016 at 1:55 PM, Mikael Ståldal wrote:

> I get this error when trying to build from Git master branch:
>
> [ERROR] Failed to execute goal
> net.alchim31.maven:scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) on
> project spark-catalyst_2.11: MavenReportException: Error while creating
> archive: wrap: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
>
> --
>
> *Mikael Ståldal*
> Senior software developer
>
> *Magine TV*
> mikael.stal...@magine.com
> Grev Turegatan 3 | 114 46 Stockholm, Sweden | www.magine.com
>
>


Build error

2016-07-22 Thread Mikael Ståldal
I get this error when trying to build from Git master branch:

[ERROR] Failed to execute goal
net.alchim31.maven:scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) on
project spark-catalyst_2.11: MavenReportException: Error while creating
archive: wrap: Process exited with an error: 1 (Exit value: 1) -> [Help 1]

-- 

*Mikael Ståldal*
Senior software developer

*Magine TV*
mikael.stal...@magine.com
Grev Turegatan 3 | 114 46 Stockholm, Sweden | www.magine.com
