Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Marcelo Vanzin
e.org/foundation/voting.html. > > Please vote before July 6th at noon: > > [ ] +1: Accept the proposal as an official SPIP > [ ] +0 > [ ] -1: I don't think this is a good idea because ... > > I will start the voting off with a +1 from myself. > > Cheers,

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-10 Thread Marcelo Vanzin
=== > In order to make timely releases, we will typically not hold the > release unless the bug in question is a regression from the previous > release. That being said, if there is something which is a regression > that has not been correctly targeted please ping me or a committer to > help target the issue. > > > Note: I fully expect this RC to fail. > > > > -- Marcelo Vanzin van...@gmail.com "Life's too short to drink cheap beer"

Re: Keytab, Proxy User & Principal

2020-03-12 Thread Marcelo Vanzin
this feels more like something better taken care of in Livy (e.g. by using KRB5CCNAME when running spark-submit). -- Marcelo Vanzin van...@gmail.com "Life's too short to drink cheap beer"
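A minimal sketch of the KRB5CCNAME idea mentioned above, assuming the launcher (Livy, in this thread) already holds a file-based Kerberos credential cache for the target user. The helper name and the cache path are hypothetical and are not taken from Livy or Spark; KRB5CCNAME itself is the standard environment variable that Kerberos libraries read to locate the credential cache.

```python
import os


def spark_submit_env(ccache_path, base_env=None):
    """Return a copy of the environment with KRB5CCNAME set.

    Hypothetical helper: ccache_path is assumed to point at a file-based
    credential cache that was already populated for the target user.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["KRB5CCNAME"] = "FILE:" + ccache_path
    return env


# The returned mapping would be handed to whatever launches spark-submit,
# for example as the env= argument of subprocess.run(...).
env = spark_submit_env("/tmp/krb5cc_example_user", base_env={"PATH": "/usr/bin"})
```

Building a fresh mapping instead of mutating os.environ keeps one user's cache from leaking into other submissions started by the same process.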

Re: Jenkins looks hosed

2019-12-23 Thread Marcelo Vanzin
Knapp wrote: > > > > > > checking it now. > > > > > > On Mon, Dec 23, 2019 at 11:27 AM Marcelo Vanzin > > > wrote: > > > > > > > > Just in the off-chance that someone with admin access to the Jenkins > > > > servers is around t

Jenkins looks hosed

2019-12-23 Thread Marcelo Vanzin
Just in the off-chance that someone with admin access to the Jenkins servers is around this week... they seem to be in a pretty unhappy state, I can't even load the UI. FYI in case you're waiting for your PR tests to finish (or even start running). -- Marcelo

Re: Do we need to finally update Guava?

2019-12-16 Thread Marcelo Vanzin
Great that Hadoop has done it (which, btw, probably means that Spark won't work with that version of Hadoop yet), but Hive also depends on Guava, and last time I tried, even Hive 3.x did not work with Guava 27. (Newer Hadoop versions also have a new artifact that shades a lot of dependencies,

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
the > author name if that's the case, or just use python 3. > > On Fri, Nov 8, 2019 at 12:20 PM Marcelo Vanzin wrote: > > > > Something related to non-ASCII characters. Worked fine with python 3. > > > > git branch -D PR_TOOL_MERGE_PR_26426_MASTER > > Traceback (mo

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
helped it > still work with Python 2: > https://github.com/apache/spark/commit/2ec3265ae76fc1e136e44c240c476ce572b679df#diff-c321b6c82ebb21d8fd225abea9b7b74c > > Hasn't otherwise changed in a while. What's the error? > > On Fri, Nov 8, 2019 at 11:37 AM Marcelo Vanzin > wrote:

dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
Hey all, Something broke that script when running with python 2. I know we want to deprecate python 2, but in that case, scripts should at least be changed to use "python3" in the shebang line... -- Marcelo - To unsubscribe

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
+1 On Mon, Aug 26, 2019 at 1:28 PM Kazuaki Ishizaki wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.4. > > The vote is open until August 29th 2PM PST and passes if a majority +1 PMC > votes are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
(Ah, and the 2.4 RC has the same issue.) On Wed, Aug 28, 2019 at 2:23 PM Marcelo Vanzin wrote: > > Just noticed something before I started to run some tests. The output > of "spark-submit --version" is a little weird, in that it's missing > information (see end of e-mail).

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
Just noticed something before I started to run some tests. The output of "spark-submit --version" is a little weird, in that it's missing information (see end of e-mail). Personally I don't think a lot of that output is super useful (like "Compiled by" or the repo URL), but the branch and

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread Marcelo Vanzin
+1 On Tue, Aug 27, 2019 at 4:06 PM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.4. > > The vote is open until August 30th 5PM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this

Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
dependencies in the classpath, is that correct? > > On Fri, Apr 5, 2019 at 10:57 AM Marcelo Vanzin wrote: >> >> The hadoop-3 profile doesn't really work yet, not even on master. >> That's being worked on still. >> >> On Fri, Apr 5, 2019 at 10:53 AM akirillov >

Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
The hadoop-3 profile doesn't really work yet, not even on master. That's being worked on still. On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote: > > Hi there! I'm trying to run Spark unit tests with the following profiles: > > And 'core' module fails with the following test failing with >

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread Marcelo Vanzin
(Anybody knows what's the deal with all the .invalid e-mail addresses?) Anyway. ASF has voting rules, and some things like releases follow specific rules: https://www.apache.org/foundation/voting.html#ReleaseVotes So, for releases, ultimately, the only votes that "count" towards the final tally

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-14 Thread Marcelo Vanzin
take more than 15-20 mins, following which i will re-enable builds. >> >> On Wed, Mar 13, 2019 at 12:17 PM shane knapp wrote: >>> >>> ok awesome. let's shoot for 3pm PST. >>> >>> On Wed, Mar 13, 2019 at 11:59 AM Marcelo Vanzin wrote: >>

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-14 Thread Marcelo Vanzin
to launch the k8s integration tests. > >>>>> > >>>>> On Wed, Mar 13, 2019 at 2:55 PM shane knapp wrote: > >>>>>> > >>>>>> okie dokie! the time approacheth! > >>>>>> > >>>>>>

Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Marcelo Vanzin
Go for it. I would do it now, instead of waiting, since there's been enough time for them to take action. On Wed, Mar 13, 2019 at 4:32 PM Hyukjin Kwon wrote: > > Looks this bot keeps working. I am going to open a INFRA JIRA to block this > bot in few days. > Please let me know if you guys have

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
Sounds good. On Wed, Mar 13, 2019 at 12:17 PM shane knapp wrote: > > ok awesome. let's shoot for 3pm PST. > > On Wed, Mar 13, 2019 at 11:59 AM Marcelo Vanzin wrote: >> >> On Wed, Mar 13, 2019 at 11:53 AM shane knapp wrote: >> > On Wed, Mar 13, 2019 at 11:

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
On Wed, Mar 13, 2019 at 11:53 AM shane knapp wrote: > On Wed, Mar 13, 2019 at 11:49 AM Marcelo Vanzin wrote: >> >> Do the upgraded minikube/k8s versions break the current master client >> version too? >> > yes. Ah, so that part kinda sucks. Let's do this: sin

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
Do the upgraded minikube/k8s versions break the current master client version too? I'm not super concerned about 2.4 integration tests being broken for a little bit. It's very uncommon for new PRs to be open against branch-2.4 that would affect k8s. But I really don't want master to break. So if

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Marcelo Vanzin
> Should we create a new rc7? > > DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, > Inc > > > On Mar 8, 2019, at 10:54 AM, Marcelo Vanzin > > wrote: > > > > I personally find it a little weird to not have the commit in branch-2.

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Marcelo Vanzin
I personally find it a little weird to not have the commit in branch-2.4. Not that this would happen, but if the v2.4.1-rc6 tag is overwritten (e.g. accidentally) then you lose the reference to that commit, and then the exact commit from which the rc was generated is lost. On Fri, Mar 8, 2019 at

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread Marcelo Vanzin
Just wanted to point out that https://issues.apache.org/jira/browse/SPARK-26859 is not in this RC, and is marked as a correctness bug. (The fix is in the 2.4 branch, just not in rc2.) On Wed, Feb 20, 2019 at 12:07 PM DB Tsai wrote: > > Please vote on releasing the following candidate as Apache

Re: merge script stopped working; Python 2/3 input() issue?

2019-02-15 Thread Marcelo Vanzin
BTW the main script has this that the website script does not: if sys.version < '3': input = raw_input # noqa On Fri, Feb 15, 2019 at 3:55 PM Sean Owen wrote: > > I'm seriously confused on this one. The spark-website merge script > just stopped working for me. It fails on the call to
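The shim quoted above can be sketched as follows. This is an illustration only, not the code in either merge script: the function name resolve_prompt is hypothetical, and the version check uses sys.version_info rather than the string comparison quoted in the message. On Python 2, input() evaluates what the user types, so scripts that need the raw string must use raw_input(); on Python 3, input() already returns the raw string and raw_input no longer exists.

```python
import sys


def resolve_prompt():
    """Return the built-in that reads a line from stdin as a plain string."""
    if sys.version_info[0] < 3:
        # Only reached on Python 2, where raw_input is defined.
        return raw_input  # noqa: F821
    return input


# Scripts can call prompt("Continue? (y/n): ") on either major version.
prompt = resolve_prompt()
```

Resolving the function once at import time keeps the rest of the script free of version checks.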

Re: merge script stopped working; Python 2/3 input() issue?

2019-02-15 Thread Marcelo Vanzin
You're talking about the spark-website script, right? The main repo's script has been working for me, the website one is broken. I think it was caused by this dude changing raw_input to input recently: commit 8b6e7dceaf5d73de3f92907ceeab8925a2586685 Author: Sean Owen Date: Sat Jan 19 19:02:30

Re: building docker images for GPU

2019-02-12 Thread Marcelo Vanzin
I think I remember someone mentioning a thread about this on the PR discussion, and digging a bit I found this: http://apache-spark-developers-list.1001551.n3.nabble.com/Toward-an-quot-API-quot-for-spark-images-used-by-the-Kubernetes-back-end-td23622.html It started a discussion but I haven't

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-11 Thread Marcelo Vanzin
+1. Ran our regression tests for YARN and Hive, all look good. On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.3. > > The vote is open until February 8 6:00PM (PST) and passes if a majority +1 > PMC votes

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Marcelo Vanzin
Hi Takeshi, Since we only really have one +1 binding vote, do you want to extend this vote a bit? I've been stuck on a few things but plan to test this (setting things up now), but it probably won't happen before the deadline. On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro wrote: > > Please

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread Marcelo Vanzin
-1 too. I just upgraded https://issues.apache.org/jira/browse/SPARK-26682 to blocker. It's a small fix and we should make it in 2.3.3. On Thu, Jan 17, 2019 at 6:49 PM Takeshi Yamamuro wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.3. > > The vote is

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Marcelo Vanzin
+1 to that. HIVE-16391 by itself means we're giving up things like Hadoop 3, and we're also putting the burden on the Hive folks to fix a problem that we created. The current PR is basically a Spark-side fix for that bug. It does mean also upgrading Hive (which gives us Hadoop 3, yay!), but I

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Marcelo Vanzin
The metastore interactions in Spark are currently based on APIs that are in the Hive exec jar; so that makes it not possible to have Spark work with Hadoop 3 until the exec jar is upgraded. It could be possible to re-implement those interactions based solely on the metastore client Hive

Re: Spark History UI + Keycloak Integration

2019-01-04 Thread Marcelo Vanzin
On Fri, Jan 4, 2019 at 3:25 AM G, Ajay (Nokia - IN/Bangalore) wrote: ... > Added session handler for all context - > contextHandler.setSessionHandler(new SessionHandler()) ... > Keycloak authentication seems to work, Is this the right approach ? If it is > fine I can submit a PR. I don't

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Marcelo Vanzin
Hmm, it also seems that github comments are being sync'ed to jira. That's gonna get old very quickly, we should probably ask infra to disable that (if we can't do it ourselves). On Mon, Dec 10, 2018 at 9:13 AM Sean Owen wrote: > > Update for committers: now that my user ID is synced, I can >

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-16 Thread Marcelo Vanzin
Now that the switch to 2.12 by default has been made, it might be good to have a serious discussion about dropping 2.11 altogether. Many of the main arguments have already been talked about. But I don't remember anyone mentioning how easy it would be to break the 2.11 build now. For example, the

Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Marcelo Vanzin
On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan wrote: > Recently I updated the MiMa exclusion rules, and found MiMa tracks some > private classes/methods unexpectedly. Could you clarify what you mean here? Mima has some known limitations such as not handling "private[blah]" very well (because that

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Marcelo Vanzin
+user@ >> -- Forwarded message - >> From: Wenchen Fan >> Date: Thu, Nov 8, 2018 at 10:55 PM >> Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0 >> To: Spark dev list >> >> >> Hi all, >> >> Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds >> Barrier

Re: Test and support only LTS JDK release?

2018-11-06 Thread Marcelo Vanzin
https://www.oracle.com/technetwork/java/javase/eol-135779.html On Tue, Nov 6, 2018 at 2:56 PM Felix Cheung wrote: > > Is there a list of LTS release that I can reference? > > > > From: Ryan Blue > Sent: Tuesday, November 6, 2018 1:28 PM > To: sn...@snazy.de > Cc:

Re: Test and support only LTS JDK release?

2018-11-06 Thread Marcelo Vanzin
+1, that's always been my view. Although, to be fair, and as Sean mentioned, the jump from jdk8 is probably the harder part. After that it's less likely (hopefully?) that we'll run into issues in non-LTS releases. And even if we don't officially support them, trying to keep up with breaking

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Marcelo Vanzin
+1 On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.0. > > The vote is open until November 1 PST and passes if a majority +1 PMC votes > are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release this package

Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-16 Thread Marcelo Vanzin
Might be good to take a look at things marked "@DeveloperApi" and whether they should stay that way. e.g. I was looking at SparkHadoopUtil and I've always wanted to just make it private to Spark. I don't see why apps would need any of those methods. On Tue, Oct 16, 2018 at 10:18 AM Sean Owen

Re: Remove Flume support in 3.0.0?

2018-10-10 Thread Marcelo Vanzin
BTW, although I did not file a bug for that, I think we should also consider getting rid of the kafka-0.8 connector. That would leave only kafka-0.10 as the single remaining dstream connector in Spark, though. (If you ignore kinesis which we can't ship in binary form or something like that?) On

Re: moving the spark jenkins job builder repo from dbricks --> spark

2018-10-10 Thread Marcelo Vanzin
Thanks for doing this. The more things we have accessible to the project members in general the better! (Now there's that hive fork repo somewhere, but let's not talk about that.) On Wed, Oct 10, 2018 at 9:30 AM shane knapp wrote: >> > * the JJB templates are able to be run by anyone w/jenkins

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-08 Thread Marcelo Vanzin
On Mon, Oct 8, 2018 at 6:36 AM Rob Vesse wrote: > Since connectivity back to the client is a potential stumbling block for > cluster mode I wander if it would be better to think in reverse i.e. rather > than having the driver pull from the client have the client push to the > driver pod? > >

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Marcelo Vanzin
On Fri, Oct 5, 2018 at 7:54 AM Rob Vesse wrote: > Ideally this would all just be handled automatically for users in the way > that all other resource managers do I think you're giving other resource managers too much credit. In cluster mode, only YARN really distributes local dependencies,

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Marcelo Vanzin
You can log in to https://repository.apache.org and see what's wrong. Just find that staging repo and look at the messages. In your case it seems related to your signature. failureMessageNo public key: Key with id: () was not able to be located on http://gpg-keyserver.de/. Upload your public

Re: data source api v2 refactoring

2018-09-04 Thread Marcelo Vanzin
Same here, I don't see anything from Wenchen... just replies to him. On Sat, Sep 1, 2018 at 9:31 PM Mridul Muralidharan wrote: > > > Is it only me or are all others getting Wenchen’s mails ? (Obviously Ryan did > :-) ) > I did not see it in the mail thread I received or in archives ... [1] >

Re: Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-08-31 Thread Marcelo Vanzin
I think there still might be an active job publishing stuff. Here's a pretty recent build from master: https://dist.apache.org/repos/dist/dev/spark/2.4.0-SNAPSHOT-2018_08_31_12_02-32da87d-docs/_site/index.html But it seems only docs are being published, which makes me think it's those builds

Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-08-31 Thread Marcelo Vanzin
I like the questions (aside maybe from the cost one which perhaps does not matter much here), especially since they encourage explaining things in a more plain language than generally used by specs. But I don't think we can ignore design aspects; it's been my observation that a good portion of

Re: [VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Marcelo Vanzin
your code > needs this init; I had understood the use cases to be more like "establish > some local config and init for this particular thing I'm doing for this > legacy system". > > On Tue, Aug 28, 2018 at 11:35 AM Marcelo Vanzin wrote: >> >> +1 >> >>

Re: [VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Marcelo Vanzin
+1 Class init is not enough because there is nowhere for you to force a random class to be initialized. This is basically adding that mechanism, instead of forcing people to add hacks using e.g. mapPartitions which don't even cover all scenarios. On Tue, Aug 28, 2018 at 7:09 AM, Sean Owen

Re: Persisting driver logs in yarn client mode (SPARK-25118)

2018-08-24 Thread Marcelo Vanzin
I think this would be useful, but I also share Saisai's and Marco's concern about the extra step when shutting down the application. If that could be minimized this would be a much more interesting feature. e.g. you could upload logs incrementally to HDFS, asynchronously, while the app is

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Marcelo Vanzin
On Wed, Aug 15, 2018 at 1:35 PM, shane knapp wrote: > in fact, i don't see us getting rid of all of the centos machines until EOY > (see my above comment, re docs, release etc). these are the builds that > will remain on centos for the near future: >

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-08-13 Thread Marcelo Vanzin
On this topic... when I worked on 2.3.1 and caused this breakage by deleting an old release, I tried to write some code to make this more automatic: https://github.com/vanzin/spark/tree/SPARK-24532 I just found that the code was a little too large and hacky for what it does (find out the latest

[RESULT] [VOTE] Spark 2.1.3 (RC2)

2018-06-29 Thread Marcelo Vanzin
The vote passes. Thanks to all who helped with the release! I'll start publishing everything today, and an announcement will be sent when artifacts have propagated to the mirrors (probably early next week). +1 (* = binding): - Marcelo Vanzin * - Sean Owen * - Felix Cheung * - Tom Graves * +0

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
to older branches. On Thu, Jun 28, 2018 at 11:30 AM, Felix Cheung wrote: > If I recall we stop releasing Hadoop 2.3 or 2.4 in newer releases (2.2+?) - > that might be why they are not the release script. > > > ____ > From: Marcelo Vanzin > Sent: Thurs

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
Alright, uploaded the missing packages. I'll send a PR to update the release scripts just in case... On Thu, Jun 28, 2018 at 10:08 AM, Sean Owen wrote: > If it's easy enough to produce them, I agree you can just add them to the RC > dir. > > On Thu, Jun 28, 2018 at 11:56 AM Ma

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
a new RC. On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.3. > > The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a > majority +1 PMC votes are cast, with a minimu

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
BTW that would be a great fix in the docs now that we'll have a 2.3.2 being prepared. On Thu, Jun 28, 2018 at 9:17 AM, Felix Cheung wrote: > Exactly... > > > From: Marcelo Vanzin > Sent: Thursday, June 28, 2018 9:16:08 AM > To: Tom Graves >

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555 > > Since it isn’t a regression I’d say +1 from me. > > > ________ > From: Tom Graves > Sent: Thursday, June 28, 2018 6:56:16 AM > To: Marcelo Vanzin

Re: Time for 2.3.2?

2018-06-28 Thread Marcelo Vanzin
28, 2018 at 12:56 PM Saisai Shao >>>>> wrote: >>>>> >>>>>> +1, like mentioned by Marcelo, these issues seems quite severe. >>>>>> >>>>>> I can work on the release if short of hands :). >>>>>> >>>>

Re: Time for 2.3.2?

2018-06-27 Thread Marcelo Vanzin
+1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes for those out. (Those are what delayed 2.2.2 and 2.1.3 for those watching...) On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan wrote: > Hi all, > > Spark 2.3.1 was released just a while ago, but unfortunately we discovered > and

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
lakes: https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/ (Look for the 2.1 branch jobs.) > ____ > From: Marcelo Vanzin > Sent: Wednesday, June 27, 2018 6:55 PM > To: Felix Cheung > Cc: Marcelo Vanzin; Tom Graves; dev > > Sub

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
mes in code not in docs: > singular.ok > Mismatches in argument names: > Position: 16 Code: singular.ok Docs: contrasts > Position: 17 Code: contrasts Docs: ... > > > From: Sean Owen > Sent: Wednesday, June 27, 2018 5:02:37 AM > To: Marcelo Van

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Marcelo Vanzin
+1 Checked sigs + ran a bunch of tests on the hadoop-2.7 binary package. On Wed, Jun 27, 2018 at 1:30 PM, Tom Graves wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.2. > > The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a > majority

[VOTE] Spark 2.1.3 (RC2)

2018-06-26 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.1.3. The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.1.3 [ ] -1 Do not release this

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-26 Thread Marcelo Vanzin
Starting with my own +1. On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.3. > > The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a > majority +1 PMC votes are cast, with

Re: Time for 2.1.3

2018-06-19 Thread Marcelo Vanzin
12, 2018 at 4:27 PM, Marcelo Vanzin wrote: > Hey all, > > There are some fixes that went into 2.1.3 recently that probably > deserve a release. So as usual, please take a look if there's anything > else you'd like on that release, otherwise I'd like to start with the > process

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Marcelo Vanzin
tured-streaming > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams > Follow me at https://twitter.com/jaceklaskowski > > On Mon, Jun 11, 2018 at 9:47 PM, Marcelo Vanzin wrote: >> >> We are happy to announce the availability of Spark 2.3.1! >> >>

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Marcelo Vanzin
Yes, my bad. The code in session.py needs to also catch TypeError like before. On Thu, Jun 14, 2018 at 11:03 AM, Li Jin wrote: > Sounds good. Thanks all for the quick reply. > > https://issues.apache.org/jira/browse/SPARK-24563 > > > On Thu, Jun 14, 2018 at 12:19 PM, Xiao Li wrote: >> >> Thanks

Time for 2.1.3

2018-06-12 Thread Marcelo Vanzin
Hey all, There are some fixes that went into 2.1.3 recently that probably deserve a release. So as usual, please take a look if there's anything else you'd like on that release, otherwise I'd like to start with the process by early next week. I'll go through jira to see what's the status of

[ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-11 Thread Marcelo Vanzin
We are happy to announce the availability of Spark 2.3.1! Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3 maintenance branch of Spark. We strongly recommend all 2.3.x users to upgrade to this stable release. To download Spark 2.3.1, head over to the download page:

[VOTE] [RESULT] Spark 2.3.1 (RC4)

2018-06-08 Thread Marcelo Vanzin
The vote passes. Thanks to all who helped with the release! I'll follow up later with a release announcement once everything is published. +1 (* = binding): - Marcelo Vanzin * - Reynold Xin * - Sean Owen * - Denny Lee - Dongjoon Hyun - Ricardo Almeida - Hyukjin Kwon - John Zhuge - Mark Hamstra

Re: Time for 2.2.2 release

2018-06-07 Thread Marcelo Vanzin
Took a look at our branch and most of the stuff that is not already in 2.2 are flaky test fixes, so +1. On Wed, Jun 6, 2018 at 7:54 AM, Tom Graves wrote: > Hello all, > > I think its time for another 2.2 release. > I took a look at Jira and I don't see anything explicitly targeted for 2.2.2 >

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Marcelo Vanzin
ork for me > either (even building with -Phadoop-2.7). I guess I’ve been relying on an > unsupported pattern and will need to figure something else out going forward > in order to use s3a://. > > > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin wrote: >> >> I have

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
ct) and figure > out what I need to change (as due diligence for Flintrock’s users). > > Nick > > > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin wrote: >> >> Using the hadoop-aws package is probably going to be a little more >> complicated than that. The best

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
= local-m2-cache: tried > > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar > > I’d guess I’m probably using the wrong version of hadoop-aws, but I called > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to try. > >

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Starting with my own +1 (binding). On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a few people to be busy with Spark Summit next > week, I'm taking the lib

[VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. Given that I expect at least a few people to be busy with Spark Summit next week, I'm taking the liberty of setting an extended voting period. The vote will be open until Friday, June 8th, at 19:00 UTC (that's 12:00

Re: [VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Marcelo Vanzin
. On Fri, Jun 1, 2018 at 1:20 PM, Xiao Li wrote: > Sorry, I need to say -1 > > This morning, just found a regression in 2.3.1 and reverted > https://github.com/apache/spark/pull/21443 > > Xiao > > 2018-06-01 13:09 GMT-07:00 Marcelo Vanzin : >> >> Please vote

[VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. Given that I expect at least a few people to be busy with Spark Summit next week, I'm taking the liberty of setting an extended voting period. The vote will be open until Friday, June 8th, at 19:00 UTC (that's 12:00

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-25 Thread Marcelo Vanzin
up creating throwaway RCs that are just overhead. On Tue, May 22, 2018 at 12:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > The vote is open until Friday, May 25, at 20:00 UTC and passes

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Marcelo Vanzin
this fix to Spark 2.0, 2.1, 2.2, and then we can >> > discuss if we should do a new release for 2.0, 2.1, 2.2 later. >> > >> > Thanks, >> > Wenchen >> > >> > On Wed, May 23, 2018 at 9:54 PM, Sean Owen >> >> > srowen@ >> >

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-22 Thread Marcelo Vanzin
Starting with my own +1. Did the same testing as RC1. On Tue, May 22, 2018 at 12:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > The vote is open until Friday, May 25, at 20:00 UTC and pa

[VOTE] Spark 2.3.1 (RC2)

2018-05-22 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. The vote is open until Friday, May 25, at 20:00 UTC and passes if at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.3.1 [ ] -1 Do not release this package because ... To learn more

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-21 Thread Marcelo Vanzin
FYI the fix for the blocker has just been committed. I'll prepare RC2 tomorrow morning assuming jenkins is reasonably happy with the current state of the branch. On Fri, May 18, 2018 at 10:39 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Just to give folks an update. > > In c

Re: Running lint-java during PR builds?

2018-05-21 Thread Marcelo Vanzin
backed up since - if I recall - all of ASF shares one queue. > > At the number of PRs Spark has this could be a big issue. > > > ________ > From: Marcelo Vanzin <van...@cloudera.com> > Sent: Monday, May 21, 2018 9:08:28 AM > To: Hyukjin Kwon >

Re: Running lint-java during PR builds?

2018-05-21 Thread Marcelo Vanzin
e give a change to this? >>> >>> Bests, >>> Dongjoon. >>> >>> On 2016-11-15 13:40 (-0800), "Shixiong(Ryan) Zhu" >>> <shixi...@databricks.com> wrote: >>> > I remember it's because you need to run `mvn install` before running

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-18 Thread Marcelo Vanzin
309 which is > pretty serious. I've marked it a blocker, I think it should go into 2.3.1. > I'll also take a closer look comparing to the behavior of the old listener > bus. > > On Thu, May 17, 2018 at 12:18 PM, Marcelo Vanzin <van...@cloudera.com> > wrote: >> >>

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-17 Thread Marcelo Vanzin
Wenchen reviewed and pushed that change, so he's the most qualified to make that decision. I plan to cut a new RC tomorrow so hopefully he'll see this by then. On Thu, May 17, 2018 at 10:13 AM, Artem Rudoy wrote: > Can we include

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread Marcelo Vanzin
lease. I >> think such a release should contain only bugfixes. >> >> 2018-05-16 12:11 GMT+02:00 kant kodali <kanth...@gmail.com>: >>> >>> Can this https://issues.apache.org/jira/browse/SPARK-23406 be part of >>> 2.3.1? >>> >>> On Tue, Ma

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
It's in. That link is only a list of the currently open bugs. On Tue, May 15, 2018 at 2:02 PM, Justin Miller <justin.mil...@protectwise.com> wrote: > Did SPARK-24067 not make it in? I don’t see it in https://s.apache.org/Q3Uo. > > Thanks, > Justin > > On May 15, 2018, at

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
. Still learning the ropes. Also, if you plan on doing this in the future, *do not* do "svn co" on the dist.apache.org repo. The ASF Infra folks will not be very kind to you. I'll update our RM docs later. On Tue, May 15, 2018 at 2:00 PM, Marcelo Vanzin <van...@cloudera.com> wrot

[VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. The vote is open until Friday, May 18, at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.3.1 [ ] -1 Do not release this package because ... To

Time for 2.3.1?

2018-05-10 Thread Marcelo Vanzin
Hello all, It's been a while since we shipped 2.3.0 and lots of important bug fixes have gone into the branch since then. I took a look at Jira and it seems there's not a lot of things explicitly targeted at 2.3.1 - the only potential blocker (a parquet issue) is being worked on since a new

Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava wrote: > I've found a KVStore wrapper which stores all the metrics in a LevelDb > store. This KVStore wrapper is available as a spark-dependency but we cannot > access the metrics directly from spark since they are

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
On Thu, Apr 5, 2018 at 10:30 AM, Matei Zaharia wrote: > Sorry, but just to be clear here, this is the 2.12 API issue: > https://issues.apache.org/jira/browse/SPARK-14643, with more details in this > doc: >

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
I remember seeing somewhere that Scala still has some issues with Java 9/10 so that might be hard... But on that topic, it might be better to shoot for Java 11 compatibility. 9 and 10, following the new release model, aren't really meant to be long-term releases. In general, agree with Sean

Re: Hadoop 3 support

2018-04-02 Thread Marcelo Vanzin
On Mon, Apr 2, 2018 at 2:58 PM, Reynold Xin <r...@databricks.com> wrote: > Is it difficult to upgrade Hive execution version to the latest version? The > metastore used to be an issue but now that part had been separated from the > execution part. > > > On Mon,
