Re: KEYS file?

2016-07-11 Thread Sean Owen
Aha, that's landed. OK I'll figure it out tomorrow and push my update
to verify it all works.

On Mon, Jul 11, 2016 at 8:54 PM, Reynold Xin  wrote:
> It's related to this apparently:
> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-12055
>




Re: KEYS file?

2016-07-11 Thread Sean Owen
Eh, to anyone else who's ever pushed to the SVN-hosted
spark.apache.org site: are you able to commit anything right now? This
error is brand-new and has stumped me:

svn: E195023: Changing file
'/Users/srowen/Documents/asf-spark-site/downloads.md' is forbidden by
the server
svn: E175013: Access to
'/repos/asf/!svn/txr/1752209-12gpm/spark/downloads.md' forbidden

Maybe my perms got messed up, so I'm first checking to see if it affects
anyone else. FWIW this is all I'm trying to change; anyone is welcome
to commit this:


Index: downloads.md
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
--- downloads.md (revision 1752185)
+++ downloads.md (revision )
@@ -31,7 +31,7 @@

 4. Download Spark: 

-5. Verify this release using the .
+5. Verify this release using the  and [project release
KEYS](https://www.apache.org/dist/spark/KEYS).

 _Note: Scala 2.11 users should download the Spark source package and build
 [with Scala 2.11
support](http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211)._
Index: site/downloads.html
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
--- site/downloads.html (revision 1752185)
+++ site/downloads.html (revision )
@@ -213,7 +213,7 @@
 Download Spark: 
   
   
-Verify this release using the .
+Verify this release using the  and <a href="https://www.apache.org/dist/spark/KEYS">project release
KEYS</a>.
   
 




On Mon, Jul 11, 2016 at 5:43 PM, Sean Owen  wrote:
> Yeah the canonical place for a project's KEYS file for ASF projects is
>
> http://www.apache.org/dist/{project}/KEYS
>
> and so you can indeed find this key among:
>
> http://www.apache.org/dist/spark/KEYS
>
> I'll put a link to this info on the downloads page because it is important 
> info.
>
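For reference, verifying a release against that KEYS file boils down to importing the keys and checking the detached signature. A rough sketch of the flow, driven from Scala via sys.process; it assumes gpg and curl are on the PATH, and the 1.6.2 artifact names are only illustrative:

  // Sketch only: fetch and import the Spark release-signing keys, then verify
  // a detached signature. gpg/curl on the PATH and the artifact names are
  // assumptions for illustration.
  import scala.sys.process._

  object VerifyRelease extends App {
    ("curl -fsSL https://www.apache.org/dist/spark/KEYS" #| "gpg --import").!
    val exit = "gpg --verify spark-1.6.2.tgz.asc spark-1.6.2.tgz".!
    println(if (exit == 0) "signature OK" else s"verification FAILED ($exit)")
  }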
> On Mon, Jul 11, 2016 at 4:48 AM, Shuai Lin  wrote:
>>> at least links to the keys used to sign releases on the
>>> download page
>>
>>
>> +1 for that.
>>
>> On Mon, Jul 11, 2016 at 3:35 AM, Phil Steitz  wrote:
>>>
>>> On 7/10/16 10:57 AM, Shuai Lin wrote:
>>> > Not sure where you see " 0x7C6C105FFC8ED089". I
>>>
>>> That's the key ID for the key below.
>>> > think the release is signed with the
>>> > key https://people.apache.org/keys/committer/pwendell.asc .
>>>
>>> Thanks!  That key matches.  The project should publish a KEYS file
>>> [1] or at least links to the keys used to sign releases on the
>>> download page.  Could be there is one somewhere and I just can't
>>> find it.
>>>
>>> Phil
>>>
>>> [1] http://www.apache.org/dev/release-signing.html#keys-policy
>>> >
>>> > I think this tutorial can be
>>> > helpful: http://www.apache.org/info/verification.html
>>> >
>>> > On Mon, Jul 11, 2016 at 12:57 AM, Phil Steitz
>>> > > wrote:
>>> >
>>> > I can't seem to find a link to the Spark KEYS file.  I am
>>> > trying to
>>> > validate the sigs on the 1.6.2 release artifacts and I need to
>>> > import 0x7C6C105FFC8ED089.  Is there a KEYS file available for
>>> > download somewhere?  Apologies if I am just missing an obvious
>>> > link.
>>> >
>>> > Phil
>>> >
>>> >
>>> >
>>> > -
>>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> > 
>>> >
>>> >
>>>
>>>
>>




Re: branch-2.0 is now 2.0.1-SNAPSHOT?

2016-07-11 Thread Reynold Xin
I just bumped master branch version to 2.1.0-SNAPSHOT
https://github.com/apache/spark/commit/ffcb6e055a28f36208ed058a42df09c154555332

We used to have a problem with the binary compatibility check not having the
2.0.0 base version in Maven (because 2.0.0 hasn't been released yet), but I
figured out a way to work around it yesterday.
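For context, MiMa needs an already-released artifact to diff the current build against. A generic sbt sketch of that idea (not Spark's actual wiring, which lives in project/MimaBuild.scala; the setting name follows recent sbt-mima-plugin releases and is an assumption here):

  // project/plugins.sbt would need: addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "<version>")
  // build.sbt: point MiMa at a released base version to compare the current
  // build against -- once 2.0.0 exists it becomes the base for branch-2.0.
  mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "2.0.0")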

On Mon, Jul 11, 2016 at 2:19 AM, Sean Owen  wrote:

> You are right, this is messed up a bit now.
>
> branch-2.0 should really still be 2.0.0-SNAPSHOT, technically. I think
> that was accidentally updated in the RC release. It won't matter a
> whole lot except for people who consume snapshots, but, you can always
> roll your own. After 2.0.0 it should be 2.0.1-SNAPSHOT anyway.
>
> Master isn't done yet because of a hiccup in the API checking
> component, MiMa. It should really be on 2.1.0-SNAPSHOT. At the latest
> it will be so after 2.0.0 is released but it sorta looks like Reynold
> maybe has an answer as of a few hours ago?
>
> Sean
>
> On Mon, Jul 11, 2016 at 10:15 AM, Dmitry Zhukov
>  wrote:
> > So, as I understand the correct git branch to maven version mapping
> should
> > be the following:
> >
> > branch-2.0 -> 2.0.0-SNAPSHOT
> > master -> 2.1.0-SNAPSHOT
> >
> > but the current is
> >
> > branch-2.0 -> 2.0.1-SNAPSHOT
> > master -> 2.0.0-SNAPSHOT
> >
> >
> > We are starting to play with Spark 2.0 in TransferWise and find the
> > versioning of the development branches very confusing. Any plans to fix
> it?
> >
> > Thanks!
> >
> > On Sat, Jul 2, 2016 at 11:07 PM, Koert Kuipers 
> wrote:
> >>
> >> that helps, now i know i simply need to look at master
> >>
> >> On Sat, Jul 2, 2016 at 1:37 PM, Sean Owen  wrote:
> >>>
> >>> So, on the one hand I think branch-2.0 should really still be on
> >>> 2.0.0-SNAPSHOT but is on 2.0.1-SNAPSHOT, and while master should
> >>> technically be on 2.1.0-SNAPSHOT but we can't quite because of MiMa
> >>> right now, I do see that both snapshots are being produced still:
> >>>
> >>>
> >>>
> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.11/
> >>>
> >>> 2.0.0-SNAPSHOT is actually from master, kinda confusingly. Not sure if
> >>> that helps.
> >>>
> >>> On Sat, Jul 2, 2016 at 5:25 PM, Koert Kuipers 
> wrote:
> >>> > You do, snapshots for spark 2.0.0-SNAPSHOT are updated daily on the
> >>> > apache
> >>> > snapshot repo. I use them in our own unit tests to find regressions
> >>> > etc. in
> >>> > spark and report them back
> >>> >
> >>> > On Jul 2, 2016 3:35 AM, "Sean Owen"  wrote:
> >>> >>
> >>> >> Yeah, interesting question about whether it should be 2.0.1-SNAPSHOT
> >>> >> at this stage because 2.0.0 is not yet released. But I'm not sure we
> >>> >> publish snapshots anyway?
> >>> >>
> >>> >> On Sat, Jul 2, 2016 at 5:41 AM, Koert Kuipers 
> >>> >> wrote:
> >>> >> > is that correct?
> >>> >> > where do i get the latest 2.0.0-SNAPSHOT?
> >>> >> > thanks,
> >>> >> > koert
> >>
> >>
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Spark 2.0.0 performance; potential large Spark core regression

2016-07-11 Thread Adam Roberts
Ted,

That bug was https://issues.apache.org/jira/browse/SPARK-15822, and it was only
found as part of running an sql-flights application (not with the unit
tests). I don't know if it has anything to do with the regression we're
seeing.

One update: we see the same ballpark regression for 1.6.2 vs 2.0 with
HiBench (large profile, 25g executor memory, 4g driver). Again, we will be
carefully checking how these benchmarks are being run and what difference
the options and configurations can make.

Cheers,




From:   Ted Yu 
To: Adam Roberts/UK/IBM@IBMGB
Cc: Michael Allman , dev 
Date:   08/07/2016 17:26
Subject: Re: Spark 2.0.0 performance; potential large Spark core
regression



bq. we turned it off when fixing a bug

Adam:
Can you refer to the bug JIRA?

Thanks

On Fri, Jul 8, 2016 at 9:22 AM, Adam Roberts  wrote:
Thanks Michael, we can give your options a try and aim for a 2.0.0 tuned
vs 2.0.0 default vs 1.6.2 default comparison. For future reference, the
defaults in Spark 2 RC2 look to be:

sql.shuffle.partitions: 200 
Tungsten enabled: true 
Executor memory: 1 GB (we set to 18 GB) 
kryo buffer max: 64mb 
WholeStageCodegen: on, I think; we turned it off when fixing a bug
offHeap.enabled: false 
offHeap.size: 0 

Cheers, 




From: Michael Allman
To: Adam Roberts/UK/IBM@IBMGB
Cc: dev
Date: 08/07/2016 17:05
Subject: Re: Spark 2.0.0 performance; potential large Spark core
regression



Here are some settings we use for some very large GraphX jobs. These are 
based on using EC2 c3.8xl workers: 

.set("spark.sql.shuffle.partitions", "1024")
   .set("spark.sql.tungsten.enabled", "true")
   .set("spark.executor.memory", "24g")
   .set("spark.kryoserializer.buffer.max","1g")
   .set("spark.sql.codegen.wholeStage", "true")
   .set("spark.memory.offHeap.enabled", "true")
   .set("spark.memory.offHeap.size", "25769803776") // 24 GB

Some of these are in fact default configurations. Some are not. 

Michael


On Jul 8, 2016, at 9:01 AM, Michael Allman  wrote: 

Hi Adam, 

From our experience we've found the default Spark 2.0 configuration to be
highly suboptimal. I don't know if this affects your benchmarks, but I 
would consider running some tests with tuned and alternate configurations. 


Michael 


On Jul 8, 2016, at 8:58 AM, Adam Roberts  wrote: 

Hi Michael, the two Spark configuration files aren't very exciting 

spark-env.sh 
Same as the template apart from a JAVA_HOME setting 

spark-defaults.conf 
spark.io.compression.codec lzf 

config.py has the Spark home set and uses Spark standalone mode; we run
and prep the Spark tests only, with an 8g driver, 16g executor memory, Kryo
serialization, a 0.66 memory fraction, and 100 trials
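For reference, a rough SparkConf equivalent of that run configuration; exactly which Spark properties config.py maps these options onto is an assumption here:

  // Sketch of the run described above; property names are assumptions about
  // how spark-perf's config.py options translate to Spark settings.
  import org.apache.spark.SparkConf

  val benchConf = new SparkConf()
    .set("spark.driver.memory", "8g")
    .set("spark.executor.memory", "16g")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.memory.fraction", "0.66")     // the 0.66 memory fraction above
    .set("spark.io.compression.codec", "lzf") // from spark-defaults.conf above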

We can post the 1.6.2 comparison early next week, running lots of 
iterations over the weekend once we get the dedicated time again 

Cheers, 





From: Michael Allman
To: Adam Roberts/UK/IBM@IBMGB
Cc: dev
Date: 08/07/2016 16:44
Subject: Re: Spark 2.0.0 performance; potential large Spark core
regression



Hi Adam, 

Do you have your spark confs and your spark-env.sh somewhere where we can 
see them? If not, can you make them available? 

Cheers, 

Michael 

On Jul 8, 2016, at 3:17 AM, Adam Roberts  wrote: 

Hi, we've been testing the performance of Spark 2.0 compared to previous
releases; unfortunately there are no Spark 2.0 compatible versions of
HiBench and SparkPerf apart from those I'm working on (see
https://github.com/databricks/spark-perf/issues/108).

With the Spark 2.0 version of SparkPerf we've noticed a 30% geomean 
regression with a very small scale factor and so we've generated a couple 
of profiles comparing 1.5.2 vs 2.0.0. Same JDK version and same platform. 
We will gather a 1.6.2 comparison and increase the scale factor. 

Has anybody noticed a similar problem? My changes for SparkPerf and Spark
2.0 are very limited and AFAIK don't interfere with Spark core
functionality, so any feedback on the changes would be much appreciated
and welcome; I'd much prefer it if my changes were the problem.

A summary for your convenience follows (this matches what I've mentioned 
on the SparkPerf issue above) 

1. spark-perf/config/config.py : SCALE_FACTOR=0.05
No. Of Workers: 1
Executor per Worker : 1
Executor Memory: 18G
Driver Memory : 8G
Serializer: kryo 

2. $SPARK_HOME/conf/spark-defaults.conf: executor Java Options: 
-Xdisableexplicitgc -Xcompressedrefs 

Main changes I made for the benchmark itself 
Use Scala 2.11.8 and Spark 2.0.0 RC2 on our local filesystem 
MLAlgorithmTests use Vectors.fromML 
For streaming-tests in HdfsRecoveryTest we use wordStream.foreachRDD not 
wordStream.foreach 
KVDataTest uses awaitTerminationOrTimeout in a 
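To make the Vectors.fromML and foreachRDD items above concrete, a minimal sketch of that kind of change (not the actual spark-perf diff):

  // Sketch only: convert the new ml vectors back to mllib ones, and use
  // foreachRDD in place of the old DStream.foreach.
  import org.apache.spark.ml.{linalg => ml}
  import org.apache.spark.mllib.{linalg => mllib}
  import org.apache.spark.streaming.dstream.DStream

  object MigrationSketch {
    val newVec: ml.Vector = ml.Vectors.dense(1.0, 2.0, 3.0)
    val oldVec: mllib.Vector = mllib.Vectors.fromML(newVec) // added in 2.0

    def consume(wordStream: DStream[String]): Unit =
      wordStream.foreachRDD { rdd =>
        rdd.foreach(println) // illustrative action only
      }
  }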

Re: Spark performance regression test suite

2016-07-11 Thread Adam Roberts
Agreed, this is something that we do regularly when producing our own
Spark distributions at IBM, so it will be beneficial to share updates
with the wider community. So far it looks like Spark 1.6.2 is the best out
of the box on spark-perf and HiBench (of course this may vary for real
workloads, individual applications and tuning efforts), but we have more
2.0 tests to perform, and we're not aware of any regressions between
previous versions except perhaps for the one in the Spark 2.0.0 post I made.

I'm looking for testing and feedback from any Spark gurus with my 2.0 
changes for spark-perf (have a look at the open issue Holden's mentioned: 
https://github.com/databricks/spark-perf/issues/108) and the same goes for 
HiBench (FWIW we see the same regression on HiBench too: 
https://github.com/intel-hadoop/HiBench/issues/221).

One idea for us is that the benchmarking could be run optionally as part
of the existing contribution process. An ideal solution IMO would involve
an additional parameter for the Jenkins job that, when ticked, results
in a performance run being done with and without the change. As we don't
have direct access to the Jenkins build button in the community, users
contributing a change could mark it with something like
@performance or "jenkins performance test this please".

Alternatively, influential Spark folk could notice a change with a
potential performance impact and have it tested accordingly. While
microbenchmarks are useful, it will be important to test the whole of
Spark. Then there's also the use of tags in JIRA - lots for us to work
with if we wanted this.

This probably means the addition, and therefore maintenance, of dedicated
machines in the build farm, although it would highlight any regressions
FAST as opposed to later in the development cycle.

If there is indeed a regression, we may have the fun task of binary
chopping commits between 1.6.2 and now... again TBC, but a real possibility,
so I'm interested to see if anybody else is doing regression testing and if
they see a similar problem.

If we don't go down the "benchmark as you contribute" route, having such a
suite would be perfect - it would clone the latest versions of each
benchmark, build them for the current version of Spark (it can identify the
release from the pom), run the benchmarks we care about (let's say in
Spark standalone mode with a couple of executors) and produce a geomean
score, highlighting any significant deviations.
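A minimal sketch of that scoring step - reading the release from the pom and computing a geometric mean over per-benchmark ratios; all the names here are illustrative, not an existing tool:

  // Sketch only; needs the scala-xml module on Scala 2.11+.
  object BenchScore {
    // identify the release from the pom, as suggested above
    def sparkVersion(pomPath: String = "pom.xml"): String =
      (scala.xml.XML.loadFile(pomPath) \ "version").head.text

    // geometric mean of per-benchmark ratios (e.g. candidate time / baseline time)
    def geomean(ratios: Seq[Double]): Double =
      math.exp(ratios.map(math.log).sum / ratios.size)

    // highlight significant deviations, e.g. anything more than 10% slower
    def regressions(ratios: Map[String, Double], threshold: Double = 1.10): Map[String, Double] =
      ratios.filter { case (_, r) => r >= threshold }
  }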

I'm happy to help with designing/reviewing this.

Cheers,







From:   Michael Gummelt 
To: Eric Liang 
Cc: Holden Karau , Ted Yu , 
Michael Allman , dev 
Date:   11/07/2016 17:00
Subject: Re: Spark performance regression test suite



I second any effort to update, automate, and communicate the results of 
spark-perf (https://github.com/databricks/spark-perf)

On Fri, Jul 8, 2016 at 12:28 PM, Eric Liang  wrote:
Something like speed.pypy.org or the Chrome performance dashboards would 
be very useful.

On Fri, Jul 8, 2016 at 9:50 AM Holden Karau  wrote:
There are also the spark-perf and spark-sql-perf projects in the 
Databricks github (although I see an open issue for Spark 2.0 support in 
one of them).

On Friday, July 8, 2016, Ted Yu  wrote:
Found a few issues:

[SPARK-6810] Performance benchmarks for SparkR
[SPARK-2833] performance tests for linear regression
[SPARK-15447] Performance test for ALS in Spark 2.0

Haven't found one for Spark core.

On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman  
wrote:
Hello,

I've seen a few messages on the mailing list regarding Spark performance 
concerns, especially regressions from previous versions. It got me 
thinking that perhaps an automated performance regression suite would be a 
worthwhile contribution? Is anyone working on this? Do we have a Jira 
issue for it?

I cannot commit to taking charge of such a project. I just thought it 
would be a great contribution for someone who does have the time and the 
chops to build it.

Cheers,

Michael
-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau




-- 
Michael Gummelt
Software Engineer
Mesosphere



Re: Spark performance regression test suite

2016-07-11 Thread Michael Gummelt
I second any effort to update, automate, and communicate the results of
spark-perf (https://github.com/databricks/spark-perf)

On Fri, Jul 8, 2016 at 12:28 PM, Eric Liang  wrote:

> Something like speed.pypy.org or the Chrome performance dashboards would
> be very useful.
>
> On Fri, Jul 8, 2016 at 9:50 AM Holden Karau  wrote:
>
>> There are also the spark-perf and spark-sql-perf projects in the
>> Databricks github (although I see an open issue for Spark 2.0 support in
>> one of them).
>>
>> On Friday, July 8, 2016, Ted Yu  wrote:
>>
>>> Found a few issues:
>>>
>>> [SPARK-6810] Performance benchmarks for SparkR
>>> [SPARK-2833] performance tests for linear regression
>>> [SPARK-15447] Performance test for ALS in Spark 2.0
>>>
>>> Haven't found one for Spark core.
>>>
>>> On Fri, Jul 8, 2016 at 8:58 AM, Michael Allman 
>>> wrote:
>>>
 Hello,

 I've seen a few messages on the mailing list regarding Spark
 performance concerns, especially regressions from previous versions. It got
 me thinking that perhaps an automated performance regression suite would be
 a worthwhile contribution? Is anyone working on this? Do we have a Jira
 issue for it?

 I cannot commit to taking charge of such a project. I just thought it
 would be a great contribution for someone who does have the time and the
 chops to build it.

 Cheers,

 Michael
 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>>


-- 
Michael Gummelt
Software Engineer
Mesosphere


Re: Stability of branch-2.0

2016-07-11 Thread Sean Owen
I agree -- Wenchen/Reynold, do you know what the theory is there?

TBH I think that there has not been a 'real' release candidate yet.
It's not that big a deal if these first two have been speculative RCs
to get more feedback earlier for a major release, and if in fact
people want to let this bake somewhat longer than the RC would imply.
As long as it's converging towards fewer, more critical changes.

Excepting these merges, I think that had generally been happening. It's
*mostly* critical stuff now. But yeah this won't actually get released
until blockers are resolved and merges slow down to what belongs in a
maintenance branch.


On Mon, Jul 11, 2016 at 11:08 AM, Pete Robbins  wrote:
> It looks like the vote on 2.0-rc2 will not pass so there will be a new RC
> from the 2.0 branch. With a project management hat on I would expect to see
> only fixes to the remaining blocker issues or high priority bug fixes going
> into the 2.0 branch as defect burn down. However, I see several new
> functional PRs which were originally targeted at 2.1 being merged into
> branch-2.0 (eg children of https://issues.apache.org/jira/browse/SPARK-16275
> ) and these will now be in the upcoming 2.0-RC3.
>
> I assume these are zero risk changes that will not further delay a 2.0
> release.
>
> Cheers,




Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-11 Thread Sean Owen
Yeah there were already other blockers when the RC was released. This
one was already noted in this thread. There will be another RC soon, I'm
sure. I guess it would be ideal if the remaining blockers were
resolved one way or the other before that, to make it possible that
RC3 could be the final release:

SPARK-14808 Spark MLlib, GraphX, SparkR 2.0 QA umbrella
SPARK-14812 ML, Graph 2.0 QA: API: Experimental, DeveloperApi, final,
sealed audit
SPARK-14813 ML 2.0 QA: API: Python API coverage
SPARK-14816 Update MLlib, GraphX, SparkR websites for 2.0
SPARK-14817 ML, Graph, R 2.0 QA: Programming guide update and migration guide
SPARK-15124 R 2.0 QA: New R APIs and API docs
SPARK-15623 2.0 python coverage ml.feature
SPARK-15630 2.0 python coverage ml root module

These are possibly all or mostly resolved already and have been
knocking around a while.

In any event, even a DoA RC3 might be useful if it kept up the testing.

Sean

On Mon, Jul 11, 2016 at 11:12 AM, Sun Rui  wrote:
> -1
> https://issues.apache.org/jira/browse/SPARK-16379
>
> On Jul 6, 2016, at 19:28, Maciej Bryński  wrote:
>
> -1
> https://issues.apache.org/jira/browse/SPARK-16379
>
>




Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-11 Thread Sun Rui
-1
https://issues.apache.org/jira/browse/SPARK-16379 


> On Jul 6, 2016, at 19:28, Maciej Bryński  wrote:
> 
> -1
> https://issues.apache.org/jira/browse/SPARK-16379 
> 


Stability of branch-2.0

2016-07-11 Thread Pete Robbins
It looks like the vote on 2.0-rc2 will not pass so there will be a new RC
from the 2.0 branch. With a project management hat on I would expect to see
only fixes to the remaining blocker issues or high priority bug fixes going
into the 2.0 branch as defect burn down. However, I see several new
functional PRs which were originally targeted at 2.1 being merged into
branch-2.0 (eg children of https://issues.apache.org/jira/browse/SPARK-16275
) and these will now be in the upcoming 2.0-RC3.

I assume these are zero risk changes that will not further delay a 2.0
release.

Cheers,


Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-11 Thread Dmitry Zhukov
Sorry for bringing this topic up. Any updates here?

Really looking forward to the upcoming RC.

Thanks!

On Wed, Jul 6, 2016 at 6:19 PM, Ted Yu  wrote:

> Running the following command:
> build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr
> -Dhadoop.version=2.7.0 package
>
> The build stopped with this test failure:
>
> - SPARK-9757 Persist Parquet relation with decimal column *** FAILED
> ***
>
>
> On Wed, Jul 6, 2016 at 6:25 AM, Sean Owen  wrote:
>
>> Yeah we still have some blockers; I agree SPARK-16379 is a blocker
>> which came up yesterday. We also have 5 existing blockers, all doc
>> related:
>>
>> SPARK-14808 Spark MLlib, GraphX, SparkR 2.0 QA umbrella
>> SPARK-14812 ML, Graph 2.0 QA: API: Experimental, DeveloperApi, final,
>> sealed audit
>> SPARK-14816 Update MLlib, GraphX, SparkR websites for 2.0
>> SPARK-14817 ML, Graph, R 2.0 QA: Programming guide update and migration
>> guide
>> SPARK-15124 R 2.0 QA: New R APIs and API docs
>>
>> While we'll almost surely need another RC, this one is well worth
>> testing. It's much closer than even the last one.
>>
>> The sigs/hashes check out, and I successfully built with Ubuntu 16 /
>> Java 8 with -Pyarn -Phadoop-2.7 -Phive. Tests pass except for:
>>
>> DirectKafkaStreamSuite:
>> - offset recovery *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 196
>> times over 10.028979855 seconds. Last failure message:
>> strings.forall({
>> ((x$1: Any) => DirectKafkaStreamSuite.collectedData.contains(x$1))
>>   }) was false. (DirectKafkaStreamSuite.scala:250)
>> - Direct Kafka stream report input information
>>
>> I know we've seen this before and tried to fix it but it may need another
>> look.
>>
>> On Wed, Jul 6, 2016 at 6:35 AM, Reynold Xin  wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.0.0. The vote is open until Friday, July 8, 2016 at 23:00 PDT and
>> passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.0.0
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > The tag to be voted on is v2.0.0-rc2
>> > (4a55b2326c8cf50f772907a8b73fd5e7b3d1aa06).
>> >
>> > This release candidate resolves ~2500 issues:
>> > https://s.apache.org/spark-2.0.0-jira
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc2-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1189/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc2-docs/
>> >
>> >
>> > =
>> > How can I help test this release?
>> > =
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions from 1.x.
>> >
>> > ==
>> > What justifies a -1 vote for this release?
>> > ==
>> > Critical bugs impacting major functionalities.
>> >
>> > Bugs already present in 1.x, missing features, or bugs related to new
>> > features will not necessarily block this release. Note that historically
>> > Spark documentation has been published on the website separately from
>> the
>> > main release so we do not need to block the release due to documentation
>> > errors either.
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Re: branch-2.0 is now 2.0.1-SNAPSHOT?

2016-07-11 Thread Sean Owen
You are right, this is messed up a bit now.

branch-2.0 should really still be 2.0.0-SNAPSHOT, technically. I think
that was accidentally updated in the RC release. It won't matter a
whole lot except for people who consume snapshots, but, you can always
roll your own. After 2.0.0 it should be 2.0.1-SNAPSHOT anyway.
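For anyone consuming those snapshots, a minimal build.sbt sketch; the resolver is the Apache snapshots repository linked below, and the version string is whichever snapshot you want to track:

  // Sketch only: depend on the nightly snapshot builds.
  resolvers += "Apache Snapshots" at "https://repository.apache.org/content/groups/snapshots/"

  libraryDependencies +=
    "org.apache.spark" %% "spark-core" % "2.0.0-SNAPSHOT" % "provided"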

Master isn't done yet because of a hiccup in the API checking
component, MiMa. It should really be on 2.1.0-SNAPSHOT. At the latest
it will be so after 2.0.0 is released but it sorta looks like Reynold
maybe has an answer as of a few hours ago?

Sean

On Mon, Jul 11, 2016 at 10:15 AM, Dmitry Zhukov
 wrote:
> So, as I understand the correct git branch to maven version mapping should
> be the following:
>
> branch-2.0 -> 2.0.0-SNAPSHOT
> master -> 2.1.0-SNAPSHOT
>
> but the current is
>
> branch-2.0 -> 2.0.1-SNAPSHOT
> master -> 2.0.0-SNAPSHOT
>
>
> We are starting to play with Spark 2.0 in TransferWise and find the
> versioning of the development branches very confusing. Any plans to fix it?
>
> Thanks!
>
> On Sat, Jul 2, 2016 at 11:07 PM, Koert Kuipers  wrote:
>>
>> that helps, now i know i simply need to look at master
>>
>> On Sat, Jul 2, 2016 at 1:37 PM, Sean Owen  wrote:
>>>
>>> So, on the one hand I think branch-2.0 should really still be on
>>> 2.0.0-SNAPSHOT but is on 2.0.1-SNAPSHOT, and while master should
>>> technically be on 2.1.0-SNAPSHOT but we can't quite because of MiMa
>>> right now, I do see that both snapshots are being produced still:
>>>
>>>
>>> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.11/
>>>
>>> 2.0.0-SNAPSHOT is actually from master, kinda confusingly. Not sure if
>>> that helps.
>>>
>>> On Sat, Jul 2, 2016 at 5:25 PM, Koert Kuipers  wrote:
>>> > You do, snapshots for spark 2.0.0-SNAPSHOT are updated daily on the
>>> > apache
>>> > snapshot repo. I use them in our own unit tests to find regressions
>>> > etc. in
>>> > spark and report them back
>>> >
>>> > On Jul 2, 2016 3:35 AM, "Sean Owen"  wrote:
>>> >>
>>> >> Yeah, interesting question about whether it should be 2.0.1-SNAPSHOT
>>> >> at this stage because 2.0.0 is not yet released. But I'm not sure we
>>> >> publish snapshots anyway?
>>> >>
>>> >> On Sat, Jul 2, 2016 at 5:41 AM, Koert Kuipers 
>>> >> wrote:
>>> >> > is that correct?
>>> >> > where do i get the latest 2.0.0-SNAPSHOT?
>>> >> > thanks,
>>> >> > koert
>>
>>
>




SPARK-15465 - AnalysisException: cannot cast StructType to VectorUDT

2016-07-11 Thread Dmitry Zhukov
Hi!

I want to bring up this Spark 2.0 issue here:
https://issues.apache.org/jira/browse/SPARK-15465.
It looks quite major (I would even say critical) to me. Should it be fixed
within the RC?

I would also like to contribute myself but am struggling to find a place
to start...

Thanks!

--
Dmitry Zhukov
TransferWise