Re: [VOTE] Release Apache Spark 2.4.2

2019-05-01 Thread Felix Cheung
: Tuesday, April 30, 2019 1:52:53 PM To: Reynold Xin Cc: Jungtaek Lim; Dongjoon Hyun; Wenchen Fan; Michael Heuer; Terry Kim; dev; Xiao Li Subject: Re: [VOTE] Release Apache Spark 2.4.2 FWIW I'm OK with this even though I proposed the backport PR for discussion. It really is a tough

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Dongjoon Hyun
Hi all, and Xiao (as the next release manager). In any case, can the release manager officially include information about the release script used as part of the VOTE email? That information will be very helpful for reproducing the Spark build (in the downstream environment). Currently, it's not clearly

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Wenchen Fan
> it could just be fixed in master rather than back-port and re-roll the RC I don't think the release script is part of the released product. That said, we can just fix the release script in branch 2.4 without creating a new RC. We can even create a new repo for the release script, like spark-web

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Sean Owen
I think this is a reasonable idea; I know @vanzin had suggested it was simpler to use the latest in case a bug was found in the release script and then it could just be fixed in master rather than back-port and re-roll the RC. That said I think we did / had to already drop the ability to build <= 2

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Michael Heuer
park that includes > Scala versioned artifacts in our release. Our python library on PyPI depends > on pyspark, our Bioconda recipe depends on the pyspark Conda recipe, and our > Homebrew formula depends on the apache-spark Homebrew formula. > > Using Scala 2.12 in the binary distr

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
e, and our Homebrew formula depends on the apache-spark Homebrew > formula. > > Using Scala 2.12 in the binary distribution for Spark 2.4.2 was > unintentional and never voted on. There was a successful vote to default > to Scala 2.12 in Spark version 3.0. > >michael > > &

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Michael Heuer
rmula. Using Scala 2.12 in the binary distribution for Spark 2.4.2 was unintentional and never voted on. There was a successful vote to default to Scala 2.12 in Spark version 3.0. michael > On Apr 26, 2019, at 9:52 AM, Sean Owen wrote: > > To be clear, what's the nature of the pr

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
To be clear, what's the nature of the problem there... just Pyspark apps that are using a Scala-based library? Trying to make sure we understand what is and isn't a problem here. On Fri, Apr 26, 2019 at 9:44 AM Michael Heuer wrote: > This will also cause problems in Conda builds that depend on p

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Michael Heuer
This will also cause problems in Conda builds that depend on pyspark https://anaconda.org/conda-forge/pyspark and Homebrew builds that depend on apache-spark, as that also uses the binary distribution. https://formulae.brew.sh/formula/apache-spark#def

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
Re: .NET, what's the particular issue that it's causing there? 2.4.2 still builds for 2.11. I'd imagine you'd be pulling dependencies from Maven central (?) or if needed can build for 2.11 from source. I'm more concerned about pyspark because it builds in 2.12 jars. On Fri, Apr 26, 2019 at 1:36

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Reynold Xin
I do feel it'd be better to not switch default Scala versions in a minor release. I don't know how much downstream this impacts. Dotnet is a good data point. Anybody else hit this issue? On Thu, Apr 25, 2019 at 11:36 PM, Terry Kim < yumin...@gmail.com > wrote: > > > > Very much interested in

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-25 Thread Terry Kim
Very much interested in hearing what you folks decide. We currently have a couple asking us questions at https://github.com/dotnet/spark/issues. Thanks, Terry

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-22 Thread Shixiong(Ryan) Zhu
release. >> >> On Thu, Apr 18, 2019 at 9:51 PM Wenchen Fan wrote: >> > >> > Please vote on releasing the following candidate as Apache Spark >> version 2.4.2. >> > >> > The vote is open until April 23 PST and passes if a majority +1 PMC >> vot

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Wenchen Fan
vote is open until April 23 PST and passes if a majority +1 PMC > votes are cast, with > > a minimum of 3 +1 votes. > > > > [ ] +1 Release this package as Apache Spark 2.4.2 > > [ ] -1 Do not release this package because ... > > > > To learn more abo

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Sean Owen
with > a minimum of 3 +1 votes. > > [ ] +1 Release this package as Apache Spark 2.4.2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see http://spark.apache.org/ > > The tag to be voted on is v2.4.2-rc1 (commit > a44880ba74c

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread vaquar khan
* Wenchen Fan > *Cc:* Spark dev list > *Subject:* Re: [VOTE] Release Apache Spark 2.4.2 > > +1 from me too. > > It seems like there is support for merging the Jackson change into > 2.4.x (and, I think, a few more minor dependency updates) but this > doesn't have to go

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Felix Cheung
+1 R tests, package tests on r-hub. Manually check commits under R, doc etc From: Sean Owen Sent: Saturday, April 20, 2019 11:27 AM To: Wenchen Fan Cc: Spark dev list Subject: Re: [VOTE] Release Apache Spark 2.4.2 +1 from me too. It seems like there is

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-20 Thread Sean Owen
1 Release this package as Apache Spark 2.4.2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see http://spark.apache.org/ > > The tag to be voted on is v2.4.2-rc1 (commit > a44880ba74caab7a987128cb09c4bee41617770a): > https://github.c

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-19 Thread shane knapp
Wenchen Fan wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.4.2. >> >> The vote is open until April 23 PST and passes if a majority +1 PMC votes >> are cast, with >> a minimum of 3 +1 votes. >> >> [ ] +1 Release th

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-19 Thread Michael Armbrust
> a minimum of 3 +1 votes. > > [ ] +1 Release this package as Apache Spark 2.4.2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see http://spark.apache.org/ > > The tag to be voted on is v2.4.2-rc1 (commit > a44880ba74caab7a

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
heung > >> wrote: > >>> > >>> Re shading - same argument I’ve made earlier today in a PR... > >>> > >>> (Context- in many cases Spark has light or indirect dependencies but > >>> bringing them into the process breaks users cod

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
e shading - same argument I’ve made earlier today in a PR... >>> >>> (Context- in many cases Spark has light or indirect dependencies but >>> bringing them into the process breaks users code easily) >>> >>> >>> ____ >

Re: Spark 2.4.2

2019-04-19 Thread Driesprong, Fokko
>> >>> (Context- in many cases Spark has light or indirect dependencies but >>> bringing them into the process breaks users code easily) >>> >>> >>> -- >>> *From:* Michael Heuer >>> *Sent:* Thursday, Ap

Re: Spark 2.4.2

2019-04-19 Thread Arun Mahadevan
bringing them into the process breaks users code easily) >> >> >> -- >> *From:* Michael Heuer >> *Sent:* Thursday, April 18, 2019 6:41 AM >> *To:* Reynold Xin >> *Cc:* Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen >

Re: Spark 2.4.2

2019-04-18 Thread Wenchen Fan
pendencies but > bringing them into the process breaks users code easily) > > > -- > *From:* Michael Heuer > *Sent:* Thursday, April 18, 2019 6:41 AM > *To:* Reynold Xin > *Cc:* Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen >

[VOTE] Release Apache Spark 2.4.2

2019-04-18 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 2.4.2. The vote is open until April 23 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.2 [ ] -1 Do not release this package because ... To

Re: Spark 2.4.2

2019-04-18 Thread Felix Cheung
Xin Cc: Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen Fan; Xiao Li Subject: Re: Spark 2.4.2 +100 On Apr 18, 2019, at 1:48 AM, Reynold Xin wrote: We should have shaded all Spark’s dependencies :( On Wed, Apr 17, 2019 at 11:47 PM Sea

Re: Spark 2.4.2

2019-04-18 Thread Michael Heuer
+100 > On Apr 18, 2019, at 1:48 AM, Reynold Xin wrote: > > We should have shaded all Spark’s dependencies :( > > On Wed, Apr 17, 2019 at 11:47 PM Sean Owen > wrote: > For users that would inherit Jackson and use it directly, or whose > dependencies do. Spark itself (w

Re: Spark 2.4.2

2019-04-17 Thread Reynold Xin
We should have shaded all Spark’s dependencies :( On Wed, Apr 17, 2019 at 11:47 PM Sean Owen wrote: > For users that would inherit Jackson and use it directly, or whose > dependencies do. Spark itself (with modifications) should be OK with > the change. > It's risky and normally wouldn't backpor

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
For users that would inherit Jackson and use it directly, or whose dependencies do. Spark itself (with modifications) should be OK with the change. It's risky and normally wouldn't backport, except that I've heard a few times about concerns about CVEs affecting Databind, so wondering who else out t

Re: Spark 2.4.2

2019-04-17 Thread Reynold Xin
For Jackson - are you worrying about JSON parsing for users or internal Spark functionality breaking? On Wed, Apr 17, 2019 at 6:02 PM Sean Owen wrote: > There's only one other item on my radar, which is considering updating > Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
There's only one other item on my radar, which is considering updating Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up a few times now that there are a number of CVEs open for 2.6.7. Cons: not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson behavior non-triv

Re: Spark 2.4.2

2019-04-17 Thread Wenchen Fan
I volunteer to be the release manager for 2.4.2, as I was also going to propose 2.4.2 because of the revert of SPARK-25250. Are there any other ongoing bug fixes we want to include in 2.4.2? If not, I'd like to start the release process today (CST). Thanks, Wenchen On Thu, Apr 18, 2019 at 3:44 AM

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
I think the 'only backport bug fixes to branches' principle remains sound. But what's a bug fix? Something that changes behavior to match what is explicitly supposed to happen, or implicitly supposed to happen -- implied by what other similar things do, by reasonable user expectations, or simply ho

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Thanks Ryan. To me the "test" for putting things in a maintenance release is really a trade-off between benefit and risk (along with some caveats, like user facing surface should not grow). The benefits here are fairly large (now it is possible to plug in partition aware data sources) and the risk

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Spark has a lot of strange behaviors already that we don't fix in patch releases. And bugs aren't usually fixed with a configuration flag to turn on the fix. That said, I don't have a problem with this commit making it into a patch release. This is a small change and looks safe enough to me. I was

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
I would argue that it's confusing enough to a user for options from DataFrameWriter to be silently dropped when instantiating the data source to consider this a bug. They asked for partitioning to occur, and we are doing nothing (not even telling them we can't). I was certainly surprised by this b

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Is this a bug fix? It looks like a new feature to me. On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust wrote: > Hello All, > > I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 > I was wondering if it > might make sense to

Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Hello All, I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 I was wondering if it might make sense to follow up quickly with 2.4.2. Without this fix it's very hard to build a datasource that correctly handles partitioning w