Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Marcelo Vanzin
I reviewed the docs and PRs from well before an SPIP was explicitly
requested, so I'm comfortable giving a +1 even if I haven't fully
read the new document.

On Wed, Jul 1, 2020 at 6:05 PM Holden Karau  wrote:
>
> Hi Spark Devs,
>
> I think discussion has settled on the SPIP doc at 
> https://docs.google.com/document/d/1EOei24ZpVvR7_w0BwBjOnrWRy4k-qTdIlx60FsHZSHA/edit?usp=sharing
> , the design doc at 
> https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit,
> and the JIRA https://issues.apache.org/jira/browse/SPARK-20624, and I've received 
> a request to put the SPIP up for a VOTE quickly. The discussion thread on the 
> mailing list is at 
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-SPIP-Graceful-Decommissioning-td29650.html.
>
> Normally this vote would be open for 72 hours; however, since it's a long 
> weekend in the US, where many of the PMC members are, this vote will not close 
> before July 6th at noon Pacific time.
>
> The SPIP procedures are documented at: 
> https://spark.apache.org/improvement-proposals.html. The ASF's voting guide 
> is at https://www.apache.org/foundation/voting.html.
>
> Please vote before July 6th at noon:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don't think this is a good idea because ...
>
> I will start the voting off with a +1 from myself.
>
> Cheers,
>
> Holden



-- 
Marcelo Vanzin
van...@gmail.com
"Life's too short to drink cheap beer"

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-10 Thread Marcelo Vanzin
-0.5, mostly because this requires extra things not in the default
packaging...

But if you add the hadoop-aws libraries and dependencies to Spark built
with Hadoop 3, things don't work:

$ ./bin/spark-shell --jars s3a://blah
20/04/10 16:28:32 WARN Utils: Your hostname, vanzin-t480 resolves to a
loopback address: 127.0.1.1; using 192.168.2.14 instead (on interface
wlp3s0)
20/04/10 16:28:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind
to another address
20/04/10 16:28:32 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
20/04/10 16:28:32 WARN MetricsConfig: Cannot locate configuration:
tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
at
org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:816)
at
org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:792)
at
org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:747)
at
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.<init>(SimpleAWSCredentialsProvider.java:58)
at
org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:600)
at
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at
org.apache.spark.deploy.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:191)

That's because Hadoop 3.2 is using Guava 27 and Spark still ships Guava 14
(which is ok for Hadoop 2).


On Tue, Mar 31, 2020 at 8:05 PM Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.0.0.
>
> The vote is open until 11:59pm Pacific time Fri Apr 3, and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.0.0-rc1 (commit
> 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1):
> https://github.com/apache/spark/tree/v3.0.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1341/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/
>
> The list of bug fixes going into 3.0.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>
> This release is using the release script of the tag v3.0.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala, you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.0.0?
> ===
> The current list of open tickets targeted at 3.0.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.0.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said

Re: Keytab, Proxy User & Principal

2020-03-12 Thread Marcelo Vanzin
On Fri, Feb 28, 2020 at 6:21 AM Lars Francke  wrote:

> Can we not allow specifying a keytab and principal together with proxy
> user, where those are only used for the initial login to submit the job and
> are not shipped to the cluster? This way jobs wouldn't need to rely on the
> operating system.
>

I'm not sure I 100% understand your use case (even if multiple services are
using the credential cache, why would that be a problem?), but from Spark's
side, the only issue with this is making it clear to the user when things
are being submitted one way or another.

But frankly this feels more like something better taken care of in Livy
(e.g. by using KRB5CCNAME when running spark-submit).
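As an illustration only (this is not Livy code, and the cache path, user, class,
and jar names below are made up), driving spark-submit with a per-job Kerberos
credential cache via KRB5CCNAME could look roughly like this:

# Hedged sketch, not Livy code: point Kerberos at a per-job credential cache
# and submit on behalf of a proxy user.
import os
import subprocess

env = dict(os.environ)
env["KRB5CCNAME"] = "FILE:/tmp/krb5cc_job_1234"   # hypothetical per-job cache

subprocess.check_call(
    ["spark-submit",
     "--proxy-user", "alice",            # hypothetical end user
     "--class", "org.example.App",       # hypothetical application class
     "app.jar"],                         # hypothetical application jar
    env=env)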

-- 
Marcelo Vanzin
van...@gmail.com
"Life's too short to drink cheap beer"


Re: Jenkins looks hosed

2019-12-23 Thread Marcelo Vanzin
Enjoy your break, Shane. And at least for my part, don't even bother
checking in - I'm sure we can survive a few days if Jenkins
misbehaves.

On Mon, Dec 23, 2019 at 2:01 PM Shane Knapp  wrote:
>
> i'll be out of the country and on vacation until early january, but
> i'll make a point to check in every couple of days to ensure that
> jenkins is happy.
>
> On Mon, Dec 23, 2019 at 12:25 PM Shane Knapp  wrote:
> >
> > yep, it was most definitely wedged.  restarted the service and it's back up!
> >
> > On Mon, Dec 23, 2019 at 12:23 PM Shane Knapp  wrote:
> > >
> > > checking it now.
> > >
> > > On Mon, Dec 23, 2019 at 11:27 AM Marcelo Vanzin
> > >  wrote:
> > > >
> > > > Just in the off-chance that someone with admin access to the Jenkins
> > > > servers is around this week... they seem to be in a pretty unhappy
> > > > state, I can't even load the UI.
> > > >
> > > > FYI in case you're waiting for your PR tests to finish (or even start 
> > > > running).
> > > >
> > > > --
> > > > Marcelo
> > > >
> > > > -
> > > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > > >
> > >
> > >
> > > --
> > > Shane Knapp
> > > Computer Guy / Voice of Reason
> > > UC Berkeley EECS Research / RISELab Staff Technical Lead
> > > https://rise.cs.berkeley.edu
> >
> >
> >
> > --
> > Shane Knapp
> > Computer Guy / Voice of Reason
> > UC Berkeley EECS Research / RISELab Staff Technical Lead
> > https://rise.cs.berkeley.edu
>
>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Jenkins looks hosed

2019-12-23 Thread Marcelo Vanzin
Just in the off-chance that someone with admin access to the Jenkins
servers is around this week... they seem to be in a pretty unhappy
state, I can't even load the UI.

FYI in case you're waiting for your PR tests to finish (or even start running).

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Do we need to finally update Guava?

2019-12-16 Thread Marcelo Vanzin
Great that Hadoop has done it (which, btw, probably means that Spark
won't work with that version of Hadoop yet), but Hive also depends on
Guava, and last time I tried, even Hive 3.x did not work with Guava
27.

(Newer Hadoop versions also have a new artifact that shades a lot of
dependencies, which would be great for Spark. But since Spark uses
some test artifacts from Hadoop, that may be a bit tricky, since I
don't believe those are shaded.)

On Sun, Dec 15, 2019 at 8:08 AM Sean Owen  wrote:
>
> See for example:
>
> https://github.com/apache/spark/pull/25932#issuecomment-565822573
> https://issues.apache.org/jira/browse/SPARK-23897
>
> This is a dicey dependency that we have been reluctant to update as a)
> Hadoop used an old version and b) Guava versions are incompatible
> after a few releases.
>
> But Hadoop is going all the way from 11 to 27 in Hadoop 3.2.1. Time to
> match that? I haven't assessed how much internal change it requires.
> If it's a lot, well, that makes it hard, as we need to stay compatible
> with Hadoop 2 / Guava 11-14. But then that causes a problem updating
> past Hadoop 3.2.0.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
I remember merging PRs with non-ascii chars in the past...

Anyway, for these scripts, it might be easier to just use python3 for
everything, instead of trying to keep them working on two different
versions.

On Fri, Nov 8, 2019 at 10:28 AM Sean Owen  wrote:
>
> Ah OK. I think it's the same type of issue that the last change
> actually was trying to fix for Python 2. Here it seems like the author
> name might have non-ASCII chars?
> I don't immediately know enough to know how to resolve that for Python
> 2. Something with how raw_input works, I take it. You could 'fix' the
> author name if that's the case, or just use python 3.
>
> On Fri, Nov 8, 2019 at 12:20 PM Marcelo Vanzin  wrote:
> >
> > Something related to non-ASCII characters. Worked fine with python 3.
> >
> > git branch -D PR_TOOL_MERGE_PR_26426_MASTER
> > Traceback (most recent call last):
> >   File "./dev/merge_spark_pr.py", line 577, in <module>
> > main()
> >   File "./dev/merge_spark_pr.py", line 552, in main
> > merge_hash = merge_pr(pr_num, target_ref, title, body, pr_repo_desc)
> >   File "./dev/merge_spark_pr.py", line 147, in merge_pr
> > distinct_authors[0])
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in
> > position 65: ordinal not in range(128)
> > M   docs/running-on-kubernetes.md
> > Already on 'master'
> > Your branch is up to date with 'apache-github/master'.
> > error: cannot pull with rebase: Your index contains uncommitted changes.
> > error: please commit or stash them.
> >
> > On Fri, Nov 8, 2019 at 10:17 AM Sean Owen  wrote:
> > >
> > > Hm, the last change was on Oct 1, and should have actually helped it
> > > still work with Python 2:
> > > https://github.com/apache/spark/commit/2ec3265ae76fc1e136e44c240c476ce572b679df#diff-c321b6c82ebb21d8fd225abea9b7b74c
> > >
> > > Hasn't otherwise changed in a while. What's the error?
> > >
> > > On Fri, Nov 8, 2019 at 11:37 AM Marcelo Vanzin
> > >  wrote:
> > > >
> > > > Hey all,
> > > >
> > > > Something broke that script when running with python 2.
> > > >
> > > > I know we want to deprecate python 2, but in that case, scripts should
> > > > at least be changed to use "python3" in the shebang line...
> > > >
> > > > --
> > > > Marcelo
> > > >
> > > > -
> > > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > > >
> >
> >
> >
> > --
> > Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
Something related to non-ASCII characters. Worked fine with python 3.

git branch -D PR_TOOL_MERGE_PR_26426_MASTER
Traceback (most recent call last):
  File "./dev/merge_spark_pr.py", line 577, in 
main()
  File "./dev/merge_spark_pr.py", line 552, in main
merge_hash = merge_pr(pr_num, target_ref, title, body, pr_repo_desc)
  File "./dev/merge_spark_pr.py", line 147, in merge_pr
distinct_authors[0])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in
position 65: ordinal not in range(128)
M   docs/running-on-kubernetes.md
Already on 'master'
Your branch is up to date with 'apache-github/master'.
error: cannot pull with rebase: Your index contains uncommitted changes.
error: please commit or stash them.
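For reference, a minimal sketch of the failure mode (the author name below is
made up, and this is not code from merge_spark_pr.py): under Python 2, a unicode
value containing a character such as u'\xf8' blows up as soon as it is implicitly
encoded with the ASCII codec, while Python 3 keeps strings as unicode end to end.

# -*- coding: utf-8 -*-
# Hedged sketch of the UnicodeEncodeError above; the author name is made up.
author = u"J\xf8rgen"                      # non-ASCII author name (u'\xf8' is 'ø')

try:
    author.encode("ascii")                 # what Python 2 ends up doing implicitly
except UnicodeEncodeError as err:
    print("ascii encode fails: %s" % err)

# Encoding explicitly as UTF-8 (or just running the script under python3)
# avoids the error.
print(author.encode("utf-8"))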

On Fri, Nov 8, 2019 at 10:17 AM Sean Owen  wrote:
>
> Hm, the last change was on Oct 1, and should have actually helped it
> still work with Python 2:
> https://github.com/apache/spark/commit/2ec3265ae76fc1e136e44c240c476ce572b679df#diff-c321b6c82ebb21d8fd225abea9b7b74c
>
> Hasn't otherwise changed in a while. What's the error?
>
> On Fri, Nov 8, 2019 at 11:37 AM Marcelo Vanzin
>  wrote:
> >
> > Hey all,
> >
> > Something broke that script when running with python 2.
> >
> > I know we want to deprecate python 2, but in that case, scripts should
> > at least be changed to use "python3" in the shebang line...
> >
> > --
> > Marcelo
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
Hey all,

Something broke that script when running with python 2.

I know we want to deprecate python 2, but in that case, scripts should
at least be changed to use "python3" in the shebang line...
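For reference, the shebang change being suggested would look something like this
one-line sketch (assuming python3 is on the PATH):

#!/usr/bin/env python3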

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
+1

On Mon, Aug 26, 2019 at 1:28 PM Kazuaki Ishizaki  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.4.
>
> The vote is open until August 29th 2PM PST and passes if a majority +1 PMC 
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.4
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v2.3.4-rc1 (commit 
> 8c6f8150f3c6298ff4e1c7e06028f12d7eaf0210):
> https://github.com/apache/spark/tree/v2.3.4-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.4-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1331/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.4-rc1-docs/
>
> The list of bug fixes going into 2.3.4 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12344844
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala, you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.4?
> ===
>
> The current list of open tickets targeted at 2.3.4 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.4
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
(Ah, and the 2.4 RC has the same issue.)

On Wed, Aug 28, 2019 at 2:23 PM Marcelo Vanzin  wrote:
>
> Just noticed something before I started to run some tests. The output
> of "spark-submit --version" is a little weird, in that it's missing
> information (see end of e-mail).
>
> Personally I don't think a lot of that output is super useful (like
> "Compiled by" or the repo URL), but the branch and revision are.
>
> Could be an artifact of the docker-based build scripts. I don't think
> it should block the release, but we should fix that.
>
>
> $ spark2-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.4
>       /_/
>
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_111
> Branch
> Compiled by user  on 2019-08-26T08:09:33Z
> Revision
> Url
> Type --help for more information.
>
> On Mon, Aug 26, 2019 at 1:28 PM Kazuaki Ishizaki  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version 
> > 2.3.4.
> >
> > The vote is open until August 29th 2PM PST and passes if a majority +1 PMC 
> > votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.3.4
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/
> >
> > The tag to be voted on is v2.3.4-rc1 (commit 
> > 8c6f8150f3c6298ff4e1c7e06028f12d7eaf0210):
> > https://github.com/apache/spark/tree/v2.3.4-rc1
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.3.4-rc1-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1331/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.3.4-rc1-docs/
> >
> > The list of bug fixes going into 2.3.4 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12344844
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks; in Java/Scala, you
> > can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 2.3.4?
> > ===
> >
> > The current list of open tickets targeted at 2.3.4 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target 
> > Version/s" = 2.3.4
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
Just noticed something before I started to run some tests. The output
of "spark-submit --version" is a little weird, in that it's missing
information (see end of e-mail).

Personally I don't think a lot of that output is super useful (like
"Compiled by" or the repo URL), but the branch and revision are.

Could be an artifact of the docker-based build scripts. I don't think
it should block the release, but we should fix that.


$ spark2-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.4
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_111
Branch
Compiled by user  on 2019-08-26T08:09:33Z
Revision
Url
Type --help for more information.

On Mon, Aug 26, 2019 at 1:28 PM Kazuaki Ishizaki  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.4.
>
> The vote is open until August 29th 2PM PST and passes if a majority +1 PMC 
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.4
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v2.3.4-rc1 (commit 
> 8c6f8150f3c6298ff4e1c7e06028f12d7eaf0210):
> https://github.com/apache/spark/tree/v2.3.4-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.4-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1331/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.4-rc1-docs/
>
> The list of bug fixes going into 2.3.4 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12344844
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala, you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.4?
> ===
>
> The current list of open tickets targeted at 2.3.4 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.4
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread Marcelo Vanzin
+1

On Tue, Aug 27, 2019 at 4:06 PM Dongjoon Hyun  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.4.
>
> The vote is open until August 30th 5PM PST and passes if a majority +1 PMC 
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.4
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.4-rc3 (commit 
> 7955b3962ac46b89564e0613db7bea98a1478bf2):
> https://github.com/apache/spark/tree/v2.4.4-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.4-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1332/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.4-rc3-docs/
>
> The list of bug fixes going into 2.4.4 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12345466
>
> This release is using the release script of the tag v2.4.4-rc3.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala, you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.4?
> ===
>
> The current list of open tickets targeted at 2.4.4 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.4
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
You can always try. But Hadoop 3 is not yet supported by Spark.

On Fri, Apr 5, 2019 at 11:13 AM Anton Kirillov
 wrote:
>
> Marcelo, Sean, thanks for the clarification. So in order to support Hadoop 3+ 
> the preferred way would be to use Hadoop-free builds and provide Hadoop 
> dependencies in the classpath, is that correct?
>
> On Fri, Apr 5, 2019 at 10:57 AM Marcelo Vanzin  wrote:
>>
>> The hadoop-3 profile doesn't really work yet, not even on master.
>> That's being worked on still.
>>
>> On Fri, Apr 5, 2019 at 10:53 AM akirillov  
>> wrote:
>> >
>> > Hi there! I'm trying to run Spark unit tests with the following profiles:
>> >
>> > And 'core' module fails with the following test failing with
>> > NoClassDefFoundError:
>> >
>> > In the meantime building a distribution works fine when running:
>> >
>> > Also, there are no problems with running tests using Hadoop 2.7 profile.
>> > Does this issue look familiar? Any help appreciated!
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>>
>> --
>> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
The hadoop-3 profile doesn't really work yet, not even on master.
That's being worked on still.

On Fri, Apr 5, 2019 at 10:53 AM akirillov  wrote:
>
> Hi there! I'm trying to run Spark unit tests with the following profiles:
>
> And 'core' module fails with the following test failing with
> NoClassDefFoundError:
>
> In the meantime building a distribution works fine when running:
>
> Also, there are no problems with running tests using Hadoop 2.7 profile.
> Does this issue look familiar? Any help appreciated!
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread Marcelo Vanzin
(Anybody know what's the deal with all the .invalid e-mail addresses?)

Anyway. ASF has voting rules, and some things like releases follow
specific rules:
https://www.apache.org/foundation/voting.html#ReleaseVotes

So, for releases, ultimately, the only votes that "count" towards the
final tally are PMC votes. But everyone is welcome to vote, especially
if they have a reason to -1 a release. PMC members can use that to
guide how they vote, or the RM can use that to drop the RC
unilaterally if he agrees with the reason.


On Thu, Mar 28, 2019 at 3:47 PM Jonatan Jäderberg
 wrote:
>
> +1 (user vote)
>
> btw, what do we call a vote that is not pmc or committer?
> Some people use "non-binding", but nobody says "my vote is binding", and if 
> some vote is important to me, I still need to look up the who's-who of the 
> project to be able to tally the votes.
> I like `user vote` for someone who has their say but is not speaking with any 
> authority (i.e., not pmc/committer). wdyt?
>
> Also, let’s get this release out the door!
>
> cheers,
> Jonatan
>
> On 28 Mar 2019, at 21:31, DB Tsai  wrote:
>
> +1 from myself
>
> On Thu, Mar 28, 2019 at 3:14 AM Mihaly Toth  
> wrote:
>>
>> +1 (non-binding)
>>
>> Thanks, Misi
>>
>>> Sean Owen  wrote (on Thu, Mar 28, 2019, at 0:19):
>>>
>>> +1 from me - same as last time.
>>>
>>> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai  wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark version 
>>> > 2.4.1.
>>> >
>>> > The vote is open until March 30 PST and passes if a majority +1 PMC votes 
>>> > are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.4.1
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.4.1-rc9 (commit 
>>> > 58301018003931454e93d8a309c7149cf84c279e):
>>> > https://github.com/apache/spark/tree/v2.4.1-rc9
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1319/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-docs/
>>> >
>>> > The list of bug fixes going into 2.4.1 can be found at the following URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks; in Java/Scala, you
>>> > can add the staging repository to your project's resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out-of-date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.4.1?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.4.1 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target 
>>> > Version/s" = 2.4.1
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>> >
>>> >
>>> > DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   
>>> > Apple, Inc
>>> >
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
> --
> - DB Sent from my iPhone
>
>


-- 
Marcelo

-
To unsubscribe e-mail: 

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-14 Thread Marcelo Vanzin
Not sure if anyone else is having the same issues, but jenkins seems
to be in a weird state.

I can't connect to the web UI, and it doesn't seem to be responding to
test requests.

On Wed, Mar 13, 2019 at 3:15 PM shane knapp  wrote:
>
> upgrade completed, jenkins building again...  master PR merged, waiting for 
> the 2.4.1 PR to launch the k8s integration tests.
>
> On Wed, Mar 13, 2019 at 2:55 PM shane knapp  wrote:
>>
>> okie dokie!  the time approacheth!
>>
>> i'll pause jenkins @ 3pm to not accept new jobs.  i don't expect the upgrade 
>> to take more than 15-20 mins, following which i will re-enable builds.
>>
>> On Wed, Mar 13, 2019 at 12:17 PM shane knapp  wrote:
>>>
>>> ok awesome.  let's shoot for 3pm PST.
>>>
>>> On Wed, Mar 13, 2019 at 11:59 AM Marcelo Vanzin  wrote:
>>>>
>>>> On Wed, Mar 13, 2019 at 11:53 AM shane knapp  wrote:
>>>> > On Wed, Mar 13, 2019 at 11:49 AM Marcelo Vanzin  
>>>> > wrote:
>>>> >>
>>>> >> Do the upgraded minikube/k8s versions break the current master client
>>>> >> version too?
>>>> >>
>>>> > yes.
>>>>
>>>> Ah, so that part kinda sucks.
>>>>
>>>> Let's do this: since the master PR is good to go pending the minikube
>>>> upgrade, let's try to synchronize things. Set a time to do the
>>>> minikube upgrade this PM, if that works for you, and I'll merge that
>>>> PR once it's done. Then I'll take care of backporting it to 2.4 and
>>>> make sure it passes the integration tests.
>>>>
>>>> --
>>>> Marcelo
>>>
>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-14 Thread Marcelo Vanzin
IMO we can vote on rc8; that upgrade should not block it. If it fails
for some other reason, then rc9 will have it.

Supported versions in k8s-land are really weird and don't match well
to our release schedule. There isn't a good solution to this at the
moment... good thing k8s support is still marked as experimental. :-)

On Thu, Mar 14, 2019 at 10:56 AM DB Tsai  wrote:
>
> Since rc8 was already cut without the k8s client upgrade and the build is
> ready to vote, and since including the k8s client upgrade in 2.4.1 implies that
> we will drop the old-but-not-that-old K8s versions as Sean mentioned,
> should we include this upgrade in 2.4.2?
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 42E5B25A8F7A82C1
>
> On Thu, Mar 14, 2019 at 9:48 AM shane knapp  wrote:
> >
> > thanks everyone, both PRs are merged.  :)
> >
> > On Wed, Mar 13, 2019 at 3:51 PM shane knapp  wrote:
> >>
> >> btw, let's wait and see if the non-k8s PRB tests pass before merging 
> >> https://github.com/apache/spark/pull/23993 in to 2.4.1
> >>
> >> On Wed, Mar 13, 2019 at 3:42 PM shane knapp  wrote:
> >>>
> >>> 2.4.1 k8s integration test passed:
> >>>
> >>> https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8875/
> >>>
> >>> thanks everyone!  :)
> >>>
> >>> On Wed, Mar 13, 2019 at 3:24 PM shane knapp  wrote:
> >>>>
> >>>> 2.4.1 integration tests running:  
> >>>> https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8875/
> >>>>
> >>>> On Wed, Mar 13, 2019 at 3:15 PM shane knapp  wrote:
> >>>>>
> >>>>> upgrade completed, jenkins building again...  master PR merged, waiting 
> >>>>> for the 2.4.1 PR to launch the k8s integration tests.
> >>>>>
> >>>>> On Wed, Mar 13, 2019 at 2:55 PM shane knapp  wrote:
> >>>>>>
> >>>>>> okie dokie!  the time approacheth!
> >>>>>>
> >>>>>> i'll pause jenkins @ 3pm to not accept new jobs.  i don't expect the 
> >>>>>> upgrade to take more than 15-20 mins, following which i will re-enable 
> >>>>>> builds.
> >>>>>>
> >>>>>> On Wed, Mar 13, 2019 at 12:17 PM shane knapp  
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> ok awesome.  let's shoot for 3pm PST.
> >>>>>>>
> >>>>>>> On Wed, Mar 13, 2019 at 11:59 AM Marcelo Vanzin  
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> On Wed, Mar 13, 2019 at 11:53 AM shane knapp  
> >>>>>>>> wrote:
> >>>>>>>> > On Wed, Mar 13, 2019 at 11:49 AM Marcelo Vanzin 
> >>>>>>>> >  wrote:
> >>>>>>>> >>
> >>>>>>>> >> Do the upgraded minikube/k8s versions break the current master 
> >>>>>>>> >> client
> >>>>>>>> >> version too?
> >>>>>>>> >>
> >>>>>>>> > yes.
> >>>>>>>>
> >>>>>>>> Ah, so that part kinda sucks.
> >>>>>>>>
> >>>>>>>> Let's do this: since the master PR is good to go pending the minikube
> >>>>>>>> upgrade, let's try to synchronize things. Set a time to do the
> >>>>>>>> minikube upgrade this PM, if that works for you, and I'll merge that
> >>>>>>>> PR once it's done. Then I'll take care of backporting it to 2.4 and
> >>>>>>>> make sure it passes the integration tests.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Marcelo
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Shane Knapp
> >>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
> >>>>>>> https://rise.cs.berkeley.edu
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Shane Knapp
> >>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
> >>>>>> https://rise.cs.berkeley.edu
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Shane Knapp
> >>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
> >>>>> https://rise.cs.berkeley.edu
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Shane Knapp
> >>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
> >>>> https://rise.cs.berkeley.edu
> >>>
> >>>
> >>>
> >>> --
> >>> Shane Knapp
> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
> >>> https://rise.cs.berkeley.edu
> >>
> >>
> >>
> >> --
> >> Shane Knapp
> >> UC Berkeley EECS Research / RISELab Staff Technical Lead
> >> https://rise.cs.berkeley.edu
> >
> >
> >
> > --
> > Shane Knapp
> > UC Berkeley EECS Research / RISELab Staff Technical Lead
> > https://rise.cs.berkeley.edu



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Marcelo Vanzin
Go for it. I would do it now, instead of waiting, since there's been
enough time for them to take action.

On Wed, Mar 13, 2019 at 4:32 PM Hyukjin Kwon  wrote:
>
> Looks like this bot keeps working. I am going to open an INFRA JIRA to block this 
> bot in a few days.
> Please let me know if you guys have a different idea to prevent this.
>
> On Wed, Mar 13, 2019 at 8:16 AM, Hyukjin Kwon wrote:
>>
>> Hi whom it may concern in Thincrs
>>
>>
>>
>> I am still observing this bot misusing Apache Spark’s JIRA board (see 
>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=Thincrs)
>>
>> I contacted you once before but I haven’t received any response about
>> it. Still, this bot from this specific company appears to be misusing the
>> Apache JIRA board.
>> If it continues, I think we should block this bot. Could you please stop 
>> the bot's misuse?
>>
>>
>>
>> From: Hyukjin Kwon 
>> Date: Tuesday, January 8, 2019 at 11:18 AM
>> To: "h...@thincrs.com" 
>> Subject: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark
>>
>>
>>
>> Hi all,
>>
>>
>>
>>
>>
>> We, the Apache Spark community, recently noticed a bot named ‘Thincrs’ in Apache 
>> Spark’s JIRA:  https://issues.apache.org/jira/issues/?jql=text%20~%20Thincrs
>>
>>
>>
>> Looks like this is a bot and it keeps leaving some comments such as:
>>
>>
>>
>>   A user of thincrs has selected this issue. Deadline: Xxx, Xxx X,  XX:XX
>>
>>
>>
>>
>>
>> This creates noise for Apache Spark maintainers, committers, contributors, 
>> and users. It was raised (by me) on Spark’s dev mailing list before:
>>
>>
>>
>>   
>> http://apache-spark-developers-list.1001551.n3.nabble.com/A-user-of-thincrs-has-selected-this-issue-Deadline-Xxx-Xxx-X--XX-XX-td25836.html
>>
>>
>>
>> And, if I am not mistaken, one of the Apache Spark PMC members contacted 
>> you to ask that this bot be stopped.
>>
>>
>>
>>
>>
>> Lately, I noticed this bot left a comment again, as below:
>>
>>
>>
>>   Thincrs commented on SPARK-25823:
>>
>>   -
>>
>>
>>
>>   A user of thincrs has selected this issue. Deadline: Mon, Jan 14, 2019 
>> 10:32 PM
>>
>>
>>
>>
>>
>> This comment is not visible for now (it was hidden by one of the Spark 
>> committers), but leaving comments there sends emails to all the people 
>> participating in the JIRA.
>>
>>
>>
>> Could you please stop this bot if it belongs to Thincrs?
>>
>>
>>
>>
>>
>> Thanks.



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
Sounds good.

On Wed, Mar 13, 2019 at 12:17 PM shane knapp  wrote:
>
> ok awesome.  let's shoot for 3pm PST.
>
> On Wed, Mar 13, 2019 at 11:59 AM Marcelo Vanzin  wrote:
>>
>> On Wed, Mar 13, 2019 at 11:53 AM shane knapp  wrote:
>> > On Wed, Mar 13, 2019 at 11:49 AM Marcelo Vanzin  
>> > wrote:
>> >>
>> >> Do the upgraded minikube/k8s versions break the current master client
>> >> version too?
>> >>
>> > yes.
>>
>> Ah, so that part kinda sucks.
>>
>> Let's do this: since the master PR is good to go pending the minikube
>> upgrade, let's try to synchronize things. Set a time to do the
>> minikube upgrade this PM, if that works for you, and I'll merge that
>> PR once it's done. Then I'll take care of backporting it to 2.4 and
>> make sure it passes the integration tests.
>>
>> --
>> Marcelo
>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
On Wed, Mar 13, 2019 at 11:53 AM shane knapp  wrote:
> On Wed, Mar 13, 2019 at 11:49 AM Marcelo Vanzin  wrote:
>>
>> Do the upgraded minikube/k8s versions break the current master client
>> version too?
>>
> yes.

Ah, so that part kinda sucks.

Let's do this: since the master PR is good to go pending the minikube
upgrade, let's try to synchronize things. Set a time to do the
minikube upgrade this PM, if that works for you, and I'll merge that
PR once it's done. Then I'll take care of backporting it to 2.4 and
make sure it passes the integration tests.

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
Do the upgraded minikube/k8s versions break the current master client
version too?

I'm not super concerned about 2.4 integration tests being broken for a
little bit. It's very uncommon for new PRs to be open against
branch-2.4 that would affect k8s.

But I really don't want master to break. So if we can upgrade minikube
first, even if that breaks k8s integration tests on branch-2.4 for a
little bit, that would be optimal IMO.

On Wed, Mar 13, 2019 at 11:26 AM shane knapp  wrote:
>
> hey everyone...  i wanted to break this discussion out of the mega-threads 
> for the 2.4.1 RC candidates.
>
> the TL;DR is that we've been trying to update the k8s client libs to 
> something much more modern.  however, for us to do this, we need to update 
> our very old k8s and minikube versions.
>
> the problem here lies in the fact that if we update the client libs on 
> master, but not the 2.4 branch, then the 2.4 branch k8s integration tests 
> will fail if we update our backend minikube/k8s versions.
>
> i've done all of the testing locally for the new k8s client libs, and am 
> ready to pull the trigger on the infrastructure upgrade (which will take all 
> of ~15 mins).
>
> for this to happen, two PRs will need to be merged...  one for 2.4.1 and one 
> for master.
>
> is there a chance that we can get https://github.com/apache/spark/pull/23993 
> merged in for the 2.4.1 release?  this will also require 
> https://github.com/apache/spark/pull/24002 (for master) to be merged 
> simultaneously.
>
> both of those PRs are ready to go (tho 23993 was closed w/o merge and i'm not 
> entirely sure why).
>
> here's the primary jira we're using to track this upgrade:
> https://issues.apache.org/jira/browse/SPARK-26742
>
> thanks in advance,
>
> shane
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Marcelo Vanzin
I'd be more comfortable with an rc7. Either that or manually fix the
branch with a force push, but that's a bit risky; it's easy to mess up
force pushes (if we can even do that?).

It's very possible that there is a bug in the script; IIRC it should
create the commits in the right branch when you generate the rc the
first time. Perhaps you missed some error in the command line in that
invocation (tag was created but commits not pushed to the branch, for
example).

On Fri, Mar 8, 2019 at 11:39 AM DB Tsai  wrote:
>
> I was using `./do-release-docker.sh` to create a release. But since the gpg 
> validation failed a couple of times when the script tried to publish the jars into 
> Nexus, I re-ran the script multiple times without creating a new rc. I was 
> wondering if the script overwrote the v2.4.1-rc6 tag instead of reusing 
> the same commit, causing this issue.
>
> Should we create a new rc7?
>
> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
> Inc
>
> > On Mar 8, 2019, at 10:54 AM, Marcelo Vanzin  
> > wrote:
> >
> > I personally find it a little weird to not have the commit in branch-2.4.
> >
> > Not that this would happen, but if the v2.4.1-rc6 tag is overwritten
> > (e.g. accidentally) then you lose the reference to that commit, and
> > then the exact commit from which the rc was generated is lost.
> >
> > On Fri, Mar 8, 2019 at 7:49 AM Sean Owen  wrote:
> >>
> >> That's weird. I see the commit but can't find it in the branch. Was it 
> >> pushed, or lost in a force push of 2.4 along the way? The change is there, 
> >> just under a different commit in the 2.4 branch.
> >>
> >> It doesn't necessarily invalidate the RC as it is a valid public tagged 
> >> commit and all that. I just want to be sure we do have the code from that 
> >> commit in these tarballs. It looks like it.
> >>
> >> On Fri, Mar 8, 2019, 4:14 AM Mihály Tóth  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am not sure how problematic it is but v2.4.1-rc6 is not on branch-2.4. 
> >>> Release related commits I have seen so far were also part of the branch.
> >>>
> >>> I guess the "Preparing Spark release v2.4.1-rc6" and "Preparing 
> >>> development version 2.4.2-SNAPSHOT" commits were simply not pushed to 
> >>> spark-2.4 just the tag itself was pushed. I dont know what is the 
> >>> practice in such cases but one solution is to rebase branch-2.4 changes 
> >>> after 3336a21 onto these commits and do a (sorry) force push. In this 
> >>> case there is no impact on this RC.
> >>>
> >>> Best Regards,
> >>>
> >>> Misi
> >>>
> >>> DB Tsai  wrote (on Fri, Mar 8, 2019, at 1:15):
> >>>>
> >>>> Please vote on releasing the following candidate as Apache Spark version 
> >>>> 2.4.1.
> >>>>
> >>>> The vote is open until March 11 PST and passes if a majority +1 PMC 
> >>>> votes are cast, with
> >>>> a minimum of 3 +1 votes.
> >>>>
> >>>> [ ] +1 Release this package as Apache Spark 2.4.1
> >>>> [ ] -1 Do not release this package because ...
> >>>>
> >>>> To learn more about Apache Spark, please see http://spark.apache.org/
> >>>>
> >>>> The tag to be voted on is v2.4.1-rc6 (commit 
> >>>> 201ec8c9b46f9d037cc2e3a5d9c896b9840ca1bc):
> >>>> https://github.com/apache/spark/tree/v2.4.1-rc6
> >>>>
> >>>> The release files, including signatures, digests, etc. can be found at:
> >>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc6-bin/
> >>>>
> >>>> Signatures used for Spark RCs can be found in this file:
> >>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>>
> >>>> The staging repository for this release can be found at:
> >>>> https://repository.apache.org/content/repositories/orgapachespark-1308/
> >>>>
> >>>> The documentation corresponding to this release can be found at:
> >>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc6-docs/
> >>>>
> >>>> The list of bug fixes going into 2.4.1 can be found at the following URL:
> >>>> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> >>>>
> >>>> FAQ
> >>>>
> >>>> =
> &

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Marcelo Vanzin
I personally find it a little weird to not have the commit in branch-2.4.

Not that this would happen, but if the v2.4.1-rc6 tag is overwritten
(e.g. accidentally) then you lose the reference to that commit, and
then the exact commit from which the rc was generated is lost.

On Fri, Mar 8, 2019 at 7:49 AM Sean Owen  wrote:
>
> That's weird. I see the commit but can't find it in the branch. Was it 
> pushed, or lost in a force push of 2.4 along the way? The change is there, 
> just under a different commit in the 2.4 branch.
>
> It doesn't necessarily invalidate the RC as it is a valid public tagged 
> commit and all that. I just want to be sure we do have the code from that 
> commit in these tarballs. It looks like it.
>
> On Fri, Mar 8, 2019, 4:14 AM Mihály Tóth  wrote:
>>
>> Hi,
>>
>> I am not sure how problematic it is, but v2.4.1-rc6 is not on branch-2.4. 
>> Release-related commits I have seen so far were also part of the branch.
>>
>> I guess the "Preparing Spark release v2.4.1-rc6" and "Preparing development 
>> version 2.4.2-SNAPSHOT" commits were simply not pushed to branch-2.4; just the 
>> tag itself was pushed. I don't know what the practice is in such cases, but 
>> one solution is to rebase the branch-2.4 changes after 3336a21 onto these 
>> commits and do a (sorry) force push. In this case there is no impact on this 
>> RC.
>>
>> Best Regards,
>>
>>   Misi
>>
>> DB Tsai  wrote (on Fri, Mar 8, 2019, at 1:15):
>>>
>>> Please vote on releasing the following candidate as Apache Spark version 
>>> 2.4.1.
>>>
>>> The vote is open until March 11 PST and passes if a majority +1 PMC votes 
>>> are cast, with
>>> a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.4.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.4.1-rc6 (commit 
>>> 201ec8c9b46f9d037cc2e3a5d9c896b9840ca1bc):
>>> https://github.com/apache/spark/tree/v2.4.1-rc6
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc6-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1308/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc6-docs/
>>>
>>> The list of bug fixes going into 2.4.1 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala, you
>>> can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.4.1?
>>> ===
>>>
>>> The current list of open tickets targeted at 2.4.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>>> Version/s" = 2.4.1
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
>>> Inc
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread Marcelo Vanzin
Just wanted to point out that
https://issues.apache.org/jira/browse/SPARK-26859 is not in this RC,
and is marked as a correctness bug. (The fix is in the 2.4 branch,
just not in rc2.)

On Wed, Feb 20, 2019 at 12:07 PM DB Tsai  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.1.
>
> The vote is open until Feb 24 PST and passes if a majority +1 PMC votes are 
> cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.1-rc2 (commit 
> 229ad524cfd3f74dd7aa5fc9ba841ae223caa960):
> https://github.com/apache/spark/tree/v2.4.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1299/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-docs/
>
> The list of bug fixes going into 2.4.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.1?
> ===
>
> The current list of open tickets targeted at 2.4.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
> Inc
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: merge script stopped working; Python 2/3 input() issue?

2019-02-15 Thread Marcelo Vanzin
BTW the main script has this that the website script does not:

if sys.version < '3':
   input = raw_input  # noqa

On Fri, Feb 15, 2019 at 3:55 PM Sean Owen  wrote:
>
> I'm seriously confused on this one. The spark-website merge script
> just stopped working for me. It fails on the call to input() that
> expects a y/n response, saying 'y' isn't defined.
>
> Indeed, it seems like Python 2's input() tries to evaluate the input,
> rather than return a string. Python 3's input() returns a string, as
> does Python 2's raw_input().
>
> But the script clearly requires Python 2 as it imports urllib2, and my
> local "python" is Python 2.
>
> And nothing has changed recently and this has worked for a long time.
> The main spark merge script does the same.
>
> How on earth has this worked?
>
> I could replace input() with raw_input(), or just go ahead and fix the
> merge scripts to work with / require Python 3. But am I missing
> something basic?
>
> If not, which change would people be OK with?
>
> Sean
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: merge script stopped working; Python 2/3 input() issue?

2019-02-15 Thread Marcelo Vanzin
You're talking about the spark-website script, right? The main repo's
script has been working for me, the website one is broken.

I think it was caused by this dude changing raw_input to input recently:

commit 8b6e7dceaf5d73de3f92907ceeab8925a2586685
Author: Sean Owen 
Date:   Sat Jan 19 19:02:30 2019 -0600

   More minor style fixes for merge script

On Fri, Feb 15, 2019 at 3:55 PM Sean Owen  wrote:
>
> I'm seriously confused on this one. The spark-website merge script
> just stopped working for me. It fails on the call to input() that
> expects a y/n response, saying 'y' isn't defined.
>
> Indeed, it seems like Python 2's input() tries to evaluate the input,
> rather than return a string. Python 3's input() returns a string, as
> does Python 2's raw_input().
>
> But the script clearly requires Python 2 as it imports urllib2, and my
> local "python" is Python 2.
>
> And nothing has changed recently and this has worked for a long time.
> The main spark merge script does the same.
>
> How on earth has this worked?
>
> I could replace input() with raw_input(), or just go ahead and fix the
> merge scripts to work with / require Python 3. But am I missing
> something basic?
>
> If not, which change would people be OK with?
>
> Sean
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: building docker images for GPU

2019-02-12 Thread Marcelo Vanzin
I think I remember someone mentioning a thread about this on the PR
discussion, and digging a bit I found this:
http://apache-spark-developers-list.1001551.n3.nabble.com/Toward-an-quot-API-quot-for-spark-images-used-by-the-Kubernetes-back-end-td23622.html

It started a discussion but I haven't really found any conclusion.

In my view here the discussion is the same: what is the contract
between the Spark code that launches the driver / executor pods, and
the images?

Right now the contract is defined by the code, which makes it a little
awkward for people to have their own customized images. They need to
kinda follow what the images in the repo do and hope they get it
right.

If instead you define the contract and make the code follow it, then
it becomes easier for people to provide whatever image they want.

Matt also filed SPARK-24655, which has seen no progress nor discussion.

Someone else filed SPARK-26773, which is similar.

And another person filed SPARK-26597, which is also in the same vein,
and also suggests something that in the end I agree with: Spark
shouldn't be opinionated about the image and what it has; it should
tell the container to run a Spark command to start the driver or
executor, which should be in the image's path, and shouldn't require
an entry point at all.

Anyway, just wanted to point out that this discussion isn't as simple
as "GPU vs. not GPU", but it's a more fundamental discussion about
what should the container image look like, so that people can
customize it easily. After all, that's one of the main points of using
container images, right?

On Mon, Feb 11, 2019 at 11:53 AM Matt Cheah  wrote:
>
> I will reiterate some feedback I left on the PR. Firstly, it’s not 
> immediately clear if we should be opinionated around supporting GPUs in the 
> Docker image in a first class way.
>
>
>
> First, there’s the question of how we arbitrate the kinds of customizations 
> we support moving forward. For example if we say we support GPUs now, what’s 
> to say that we should not also support FPGAs?
>
>
>
> Also what kind of testing can we add to CI to ensure what we’ve provided in 
> this Dockerfile works?
>
>
>
> Instead we can make the Spark images have bare minimum support for basic 
> Spark applications, and then provide detailed instructions for how to build 
> custom Docker images (mostly just needing to make sure the custom image has 
> the right entry point).
>
>
>
> -Matt Cheah
>
>
>
> From: Rong Ou 
> Date: Friday, February 8, 2019 at 2:28 PM
> To: "dev@spark.apache.org" 
> Subject: building docker images for GPU
>
>
>
> Hi spark dev,
>
>
>
> I created a JIRA issue a while ago 
> (https://issues.apache.org/jira/browse/SPARK-26398) to 
> add GPU support to Spark docker images, and sent a PR 
> (https://github.com/apache/spark/pull/23347) that went through 
> several iterations. It was suggested that it should be discussed on the dev 
> mailing list, so here we are. Please chime in if you have any questions or 
> concerns.
>
>
>
> A little more background. I mainly looked at running XGBoost on Spark using 
> GPUs. Preliminary results have shown that there is potential for significant 
> speedup in training time. This seems like a popular use case for Spark. In 
> any event, it'd be nice for Spark to have better support for GPUs. Building 
> gpu-enabled docker images seems like a useful first step.
>
>
>
> Thanks,
>
>
>
> Rong
>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-11 Thread Marcelo Vanzin
+1. Ran our regression tests for YARN and Hive, all look good.

On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.3.
>
> The vote is open until February 8 6:00PM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.3-rc2 (commit 
> 66fd9c34bf406a4b5f86605d06c9607752bd637a):
> https://github.com/apache/spark/tree/v2.3.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1298/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-docs/
>
> The list of bug fixes going into 2.3.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343759
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.3?
> ===
>
> The current list of open tickets targeted at 2.3.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> P.S.
> I checked all the tests passed in the Amazon Linux 2 AMI;
> $ java -version
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> $ ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Psparkr 
> test
>
> --
> ---
> Takeshi Yamamuro



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Marcelo Vanzin
Hi Takeshi,

Since we only really have one +1 binding vote, do you want to extend
this vote a bit?

I've been stuck on a few things but plan to test this (setting things
up now), but it probably won't happen before the deadline.

On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.3.
>
> The vote is open until February 8 6:00PM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.3-rc2 (commit 
> 66fd9c34bf406a4b5f86605d06c9607752bd637a):
> https://github.com/apache/spark/tree/v2.3.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1298/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-docs/
>
> The list of bug fixes going into 2.3.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343759
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.3?
> ===
>
> The current list of open tickets targeted at 2.3.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> P.S.
> I checked all the tests passed in the Amazon Linux 2 AMI;
> $ java -version
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> $ ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Psparkr 
> test
>
> --
> ---
> Takeshi Yamamuro



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread Marcelo Vanzin
-1 too.

I just upgraded https://issues.apache.org/jira/browse/SPARK-26682 to
blocker. It's a small fix and we should make it in 2.3.3.

On Thu, Jan 17, 2019 at 6:49 PM Takeshi Yamamuro  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.3.
>
> The vote is open until January 20 8:00PM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.3-rc1 (commit 
> b5ea9330e3072e99841270b10dc1d2248127064b):
> https://github.com/apache/spark/tree/v2.3.3-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1297
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc1-docs/
>
> The list of bug fixes going into 2.3.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343759
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.3?
> ===
>
> The current list of open tickets targeted at 2.3.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> --
> ---
> Takeshi Yamamuro



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Marcelo Vanzin
+1 to that. HIVE-16391 by itself means we're giving up things like
Hadoop 3, and we're also putting the burden on the Hive folks to fix a
problem that we created.

The current PR is basically a Spark-side fix for that bug. It does
mean also upgrading Hive (which gives us Hadoop 3, yay!), but I think
it's really the right path to take here.

On Tue, Jan 15, 2019 at 6:32 PM Hyukjin Kwon  wrote:
>
> Resolving HIVE-16391 means Hive releasing a 1.2.x that contains the fixes from
> our Hive fork (correct me if I am mistaken).
>
> To be honest, and as a personal opinion, that basically asks
> Hive to take care of Spark's dependency.
> Hive looks to be moving ahead with 3.1.x, and no one would use a newer release of
> 1.2.x. In practice, Spark doesn't make 1.6.x releases anymore, for instance.
>
> Frankly, my impression was that it's our mistake to fix. Since the
> Spark community is big enough, I was thinking we should try to fix it
> ourselves first.
> I am not saying upgrading is the only way to get through this, but I think we
> should at least try first, and see what's next.
>
> It does, yes, sound riskier to upgrade it on our side, but I think it's
> worth checking and trying to see if it's possible.
> I think upgrading the dependency is a more standard approach than using the
> fork or asking the Hive side to release another 1.2.x.
>
> If we fail to upgrade it for critical or unavoidable reasons, yes, we
> could find an alternative, but that basically means
> we're going to stay on 1.2.x for, at least, a long time (say, until Spark
> 4.0.0?).
>
> I know this has turned out to be a sensitive topic, but to be honest with
> myself, I think we should give it a try.
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Marcelo Vanzin
The metastore interactions in Spark are currently based on APIs that
are in the Hive exec jar; so that makes it not possible to have Spark
work with Hadoop 3 until the exec jar is upgraded.

It could be possible to re-implement those interactions based solely
on the metastore client Hive publishes; but that would be a lot of
work IIRC.

I can't comment on how many people use Hive serde tables (although I
do know they use it, just not how extensively), but that's not the
only reason why Spark currently requires the hive-exec jar.

On Tue, Jan 15, 2019 at 10:03 AM Xiao Li  wrote:
>
> Let me take my words back. To read/write a table, Spark users do not use the 
> Hive execution JARs unless they explicitly create Hive serde tables. 
> Actually, I want to understand the motivation and use cases: why do your usage 
> scenarios need Hive serde tables instead of our Spark native tables?
>
> BTW, we are still using the Hive metastore as our metadata store. This does not 
> require the Hive execution JAR upgrade, based on my understanding. Users can 
> upgrade to a newer version of the Hive metastore.
>
> Felix Cheung  wrote on Tue, Jan 15, 2019 at 9:56 AM:
>>
>> And we are super 100% dependent on Hive...
>>
>>
>> 
>> From: Ryan Blue 
>> Sent: Tuesday, January 15, 2019 9:53 AM
>> To: Xiao Li
>> Cc: Yuming Wang; dev
>> Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4
>>
>> How do we know that most Spark users are not using Hive? I wouldn't be 
>> surprised either way, but I do want to make sure we aren't making decisions 
>> based on any one person's (or one company's) experience about what "most" 
>> Spark users do.
>>
>> On Tue, Jan 15, 2019 at 9:44 AM Xiao Li  wrote:
>>>
>>> Hi, Yuming,
>>>
>>> Thank you for your contributions! The community aims at reducing the 
>>> dependence on Hive. Currently, most Spark users are not using Hive. The 
>>> changes look risky to me.
>>>
>>> To support Hadoop 3.x, we just need to resolve this JIRA: 
>>> https://issues.apache.org/jira/browse/HIVE-16391
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>> Yuming Wang  wrote on Tue, Jan 15, 2019 at 8:41 AM:

 Dear Spark Developers and Users,



 Hyukjin and I plan to upgrade the built-in Hive from 1.2.1-spark2 to 2.3.4 
 to solve some critical issues, such as supporting Hadoop 3.x and fixing some ORC 
 and Parquet issues. This is the list:

 Hive issues:

 [SPARK-26332][HIVE-10790] Spark sql write orc table on viewFS throws 
 exception

 [SPARK-25193][HIVE-12505] insert overwrite doesn't throw exception when 
 drop old data fails

 [SPARK-26437][HIVE-13083] Decimal data becomes bigint to query, unable to 
 query

 [SPARK-25919][HIVE-11771] Date value corrupts when tables are 
 "ParquetHiveSerDe" formatted and target table is Partitioned

 [SPARK-12014][HIVE-11100] Spark SQL query containing semicolon is broken 
 in Beeline



 Spark issues:

 [SPARK-23534] Spark run on Hadoop 3.0.0

 [SPARK-20202] Remove references to org.spark-project.hive

 [SPARK-18673] Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop 
 version

 [SPARK-24766] CreateHiveTableAsSelect and InsertIntoHiveDir won't generate 
 decimal column stats in parquet





 Since the code for the hive-thriftserver module has changed too much for 
 this upgrade, I split it into two PRs for easy review.

 The first PR does not contain the changes of hive-thriftserver. Please 
 ignore the failed test in hive-thriftserver.

 The second PR is complete changes.



 I have created a Spark distribution for Apache Hadoop 2.7; you can 
 download it via Google Drive or Baidu Pan.

 Please help review and test. Thanks.
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark History UI + Keycloak Integration

2019-01-04 Thread Marcelo Vanzin
On Fri, Jan 4, 2019 at 3:25 AM G, Ajay (Nokia - IN/Bangalore)
 wrote:
...
> Added a session handler for all contexts -
> contextHandler.setSessionHandler(new SessionHandler())
...
> Keycloak authentication seems to work. Is this the right approach? If it is 
> fine, I can submit a PR.

I don't remember many details about servlet session management, and
whether it can be enabled some other way, but that seems ok. I'd just
make it a new config, since otherwise Spark doesn't need the extra
overhead.
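
A rough sketch of what "make it a new config" could look like; the key name
spark.ui.allowSessionHandler is made up for illustration and is not an actual
Spark config:

import org.eclipse.jetty.server.session.SessionHandler
import org.eclipse.jetty.servlet.ServletContextHandler

object UiSessionSupport {
  // Hypothetical config key, for illustration only.
  private val SessionKey = "spark.ui.allowSessionHandler"

  def maybeEnableSessions(handler: ServletContextHandler, conf: Map[String, String]): Unit = {
    val enabled = conf.getOrElse(SessionKey, "false").toBoolean
    if (enabled) {
      // Only pay the session-tracking overhead when an auth filter (e.g. Keycloak) needs it.
      handler.setSessionHandler(new SessionHandler())
    }
  }
}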

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Marcelo Vanzin
Hmm, it also seems that github comments are being sync'ed to jira.
That's gonna get old very quickly; we should probably ask infra to
disable that (if we can't do it ourselves).
On Mon, Dec 10, 2018 at 9:13 AM Sean Owen  wrote:
>
> Update for committers: now that my user ID is synced, I can
> successfully push to remote https://github.com/apache/spark directly.
> Use that as the 'apache' remote (if you like; gitbox also works). I
> confirmed the sync works both ways.
>
> As a bonus you can directly close pull requests when needed instead of
> using "Close Stale PRs" pull requests.
>
> On Mon, Dec 10, 2018 at 10:30 AM Sean Owen  wrote:
> >
> > Per the thread last week, the Apache Spark repos have migrated from
> > https://git-wip-us.apache.org/repos/asf to
> > https://gitbox.apache.org/repos/asf
> >
> >
> > Non-committers:
> >
> > This just means repointing any references to the old repository to the
> > new one. It won't affect you if you were already referencing
> > https://github.com/apache/spark .
> >
> >
> > Committers:
> >
> > Follow the steps at https://reference.apache.org/committer/github to
> > fully sync your ASF and Github accounts, and then wait up to an hour
> > for it to finish.
> >
> > Then repoint your git-wip-us remotes to gitbox in your git checkouts.
> > For our standard setup that works with the merge script, that should
> > be your 'apache' remote. For example here are my current remotes:
> >
> > $ git remote -v
> > apache https://gitbox.apache.org/repos/asf/spark.git (fetch)
> > apache https://gitbox.apache.org/repos/asf/spark.git (push)
> > apache-github git://github.com/apache/spark (fetch)
> > apache-github git://github.com/apache/spark (push)
> > origin https://github.com/srowen/spark (fetch)
> > origin https://github.com/srowen/spark (push)
> > upstream https://github.com/apache/spark (fetch)
> > upstream https://github.com/apache/spark (push)
> >
> > In theory we also have read/write access to github.com now too, but
> > right now it hasn't yet worked for me. It may need to sync. This note
> > just makes sure anyone knows how to keep pushing commits right now to
> > the new ASF repo.
> >
> > Report any problems here!
> >
> > Sean
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-16 Thread Marcelo Vanzin
Now that the switch to 2.12 by default has been made, it might be good
to have a serious discussion about dropping 2.11 altogether. Many of
the main arguments have already been talked about. But I don't
remember anyone mentioning how easy it would be to break the 2.11
build now.

For example, the following works fine in 2.12 but breaks in 2.11:

java.util.Arrays.asList("hi").stream().forEach(println)
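
The one-liner above compiles on 2.12 because Scala 2.12 applies SAM conversion
to Java functional interfaces such as java.util.function.Consumer, which 2.11
does not do by default. A sketch that compiles on both versions spells the
Consumer out explicitly:

import java.util.Arrays
import java.util.function.Consumer

object StreamForEach {
  def main(args: Array[String]): Unit = {
    // Works on both 2.11 and 2.12: no SAM conversion is required.
    Arrays.asList("hi").stream().forEach(new Consumer[String] {
      override def accept(s: String): Unit = println(s)
    })
  }
}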

We had a similar issue when we supported java 1.6 but the builds were
all on 1.7 by default. Every once in a while something would silently
break, because PR builds only check the default. And the jenkins
builds, which are less monitored, would stay broken for a while.

On Tue, Nov 6, 2018 at 11:13 AM DB Tsai  wrote:
>
> We made Scala 2.11 the default Scala version in Spark 2.0. Now the next Spark 
> version will be 3.0, so it's a great time to discuss whether we should make Scala 
> 2.12 the default Scala version in Spark 3.0.
>
> Scala 2.11 is EOL, and it came out 4.5 years ago; as a result, JDK 11 is unlikely to 
> be supported in Scala 2.11 unless we're willing to sponsor the needed work, 
> per the discussion in the Scala community: 
> https://github.com/scala/scala-dev/issues/559#issuecomment-436160166
>
> We have initial support for Scala 2.12 in Spark 2.4. If we decide to make 
> Scala 2.12 the default for Spark 3.0 now, we will have ample time to work on 
> bugs and issues that we may run into.
>
> What do you think?
>
> Thanks,
>
> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
> Inc
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Marcelo Vanzin
On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan  wrote:
> Recently I updated the MiMa exclusion rules, and found MiMa tracks some 
> private classes/methods unexpectedly.

Could you clarify what you mean here? Mima has some known limitations
such as not handling "private[blah]" very well (because that means
public in Java). Spark has (had?) this tool to generate an exclusions
file for Mima, but I'm not sure how up-to-date it is.
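
To make the private[blah] limitation concrete, a small sketch (all names are
made up): package-qualified visibility is erased to public in the emitted
bytecode, so MiMa treats such members as API unless they are explicitly
excluded.

package org.apache.spark.example

// "private[spark]" restricts access in Scala source only; in the .class file both
// the class and the method are public, which is why MiMa can flag them.
private[spark] class InternalHelper {
  private[spark] def internalMethod(): Int = 1
}

// A matching MimaExcludes-style filter (names made up) would look roughly like:
//   ProblemFilters.exclude[DirectMissingMethodProblem](
//     "org.apache.spark.example.InternalHelper.internalMethod")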

> AFAIK, we have several rules:
> 1. everything which is really private that end users can't access, e.g. 
> package private classes, private methods, etc.
> 2. classes under certain packages. I don't know if we have a list, the 
> catalyst package is considered as a private package.
> 3. everything which has a @Private annotation.

That's my understanding of the scope of the rules.

(2) to me means "things that show up in the public API docs". That's,
AFAIK, tracked in SparkBuild.scala; seems like it's tracked by a bunch
of exclusions in the Unidoc object (I remember that being different in
the past).

(3) might be a limitation of the doc generation tool? Not sure if it's
easy to say "do not document classes that have @Private". At the very
least, that annotation seems to be missing the "@Documented"
annotation, which would make that info present in the javadoc. I do
not know if the scala doc tool handles that.

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Marcelo Vanzin
+user@

>> -- Forwarded message -
>> From: Wenchen Fan 
>> Date: Thu, Nov 8, 2018 at 10:55 PM
>> Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0
>> To: Spark dev list 
>>
>>
>> Hi all,
>>
>> Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds 
>> Barrier Execution Mode for better integration with deep learning frameworks, 
>> introduces 30+ built-in and higher-order functions to deal with complex data 
>> types more easily, improves the K8s integration, and adds experimental Scala 
>> 2.12 support. Other major updates include the built-in Avro data source, the 
>> Image data source, flexible streaming sinks, elimination of the 2GB block 
>> size limitation during transfer, and Pandas UDF improvements. In addition, this 
>> release continues to focus on usability, stability, and polish while 
>> resolving around 1100 tickets.
>>
>> We'd like to thank our contributors and users for their contributions and 
>> early feedback to this release. This release would not have been possible 
>> without you.
>>
>> To download Spark 2.4.0, head over to the download page: 
>> http://spark.apache.org/downloads.html
>>
>> To view the release notes: 
>> https://spark.apache.org/releases/spark-release-2-4-0.html
>>
>> Thanks,
>> Wenchen
>>
>> PS: If you see any issues with the release notes, webpage or published 
>> artifacts, please contact me directly off-list.



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Test and support only LTS JDK release?

2018-11-06 Thread Marcelo Vanzin
https://www.oracle.com/technetwork/java/javase/eol-135779.html
On Tue, Nov 6, 2018 at 2:56 PM Felix Cheung  wrote:
>
> Is there a list of LTS release that I can reference?
>
>
> 
> From: Ryan Blue 
> Sent: Tuesday, November 6, 2018 1:28 PM
> To: sn...@snazy.de
> Cc: Spark Dev List; cdelg...@apple.com
> Subject: Re: Test and support only LTS JDK release?
>
> +1 for supporting LTS releases.
>
> On Tue, Nov 6, 2018 at 11:48 AM Robert Stupp  wrote:
>>
>> +1 on supporting LTS releases.
>>
>> VM distributors (RedHat, Azul - to name two) want to provide patches to LTS 
>> versions (i.e. into http://hg.openjdk.java.net/jdk-updates/jdk11u/). How 
>> that will play out in reality ... I don't know. Whether Oracle will 
>> contribute to that repo for 8 after it's EOL and 11 after the 6 month cycle 
>> ... we will see. Most Linux distributions promised(?) long-term support for 
>> Java 11 in their LTS releases (e.g. Ubuntu 18.04). I am not sure what that 
>> exactly means ... whether they will actively provide patches to OpenJDK or 
>> whether they just build from source.
>>
>> But considering that, I think it's definitely worth to at least keep an eye 
>> on Java 12 and 13 - even if those are just EA. Java 12 for example does 
>> already forbid some "dirty tricks" that are still possible in Java 11.
>>
>>
>> On 11/6/18 8:32 PM, DB Tsai wrote:
>>
>> OpenJDK will follow Oracle's release cycle, 
>> https://openjdk.java.net/projects/jdk/, a strict six-month model. I'm not 
>> familiar with other non-Oracle VMs and Redhat support.
>>
>> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
>> Inc
>>
>> On Nov 6, 2018, at 11:26 AM, Reynold Xin  wrote:
>>
>> What does OpenJDK do and other non-Oracle VMs? I know there was a lot of 
>> discussions from Redhat etc to support.
>>
>>
>> On Tue, Nov 6, 2018 at 11:24 AM DB Tsai  wrote:
>>>
>>> Given Oracle's new 6-month release model, I feel the only realistic option 
>>> is to only test and support LTS JDKs such as JDK 11 and future LTS releases. 
>>> I would like to have a discussion on this in the Spark community.
>>>
>>> Thanks,
>>>
>>> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
>>> Inc
>>>
>>
>> --
>> Robert Stupp
>> @snazy
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Test and support only LTS JDK release?

2018-11-06 Thread Marcelo Vanzin
+1, that's always been my view.

Although, to be fair, and as Sean mentioned, the jump from jdk8 is
probably the harder part. After that it's less likely (hopefully?)
that we'll run into issues in non-LTS releases. And even if we don't
officially support them, trying to keep up with breaking changes might
make it easier to support the following LTS.

(Just as an example, both jdk9 and jdk10 are already EOL.)

On Tue, Nov 6, 2018 at 11:24 AM DB Tsai  wrote:
>
> Given Oracle's new 6-month release model, I feel the only realistic option is 
> to only test and support LTS JDKs such as JDK 11 and future LTS releases. I 
> would like to have a discussion on this in the Spark community.
>
> Thanks,
>
> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
> Inc
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Marcelo Vanzin
+1
On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.0.
>
> The vote is open until November 1 PST and passes if a majority +1 PMC votes 
> are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc5 (commit 
> 0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
> https://github.com/apache/spark/tree/v2.4.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1291
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.0?
> ===
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-16 Thread Marcelo Vanzin
Might be good to take a look at things marked "@DeveloperApi" and
whether they should stay that way.

e.g. I was looking at SparkHadoopUtil and I've always wanted to just
make it private to Spark. I don't see why apps would need any of those
methods.
On Tue, Oct 16, 2018 at 10:18 AM Sean Owen  wrote:
>
> There was already agreement to delete deprecated things like Flume and
> Kafka 0.8 support in master. I've got several more on my radar, and
> wanted to highlight them and solicit general opinions on where we
> should accept breaking changes.
>
> For example how about removing accumulator v1?
> https://github.com/apache/spark/pull/22730
>
> Or using the standard Java Optional?
> https://github.com/apache/spark/pull/22383
>
> Or cleaning up some old workarounds and APIs while at it?
> https://github.com/apache/spark/pull/22729 (still in progress)
>
> I think I talked myself out of replacing Java function interfaces with
> java.util.function because...
> https://issues.apache.org/jira/browse/SPARK-25369
>
> There are also, say, old json and csv and avro reading method
> deprecated since 1.4. Remove?
> Anything deprecated since 2.0.0?
>
> Interested in general thoughts on these.
>
> Here are some more items targeted to 3.0:
> https://issues.apache.org/jira/browse/SPARK-17875?jql=project%3D%22SPARK%22%20AND%20%22Target%20Version%2Fs%22%3D%223.0.0%22%20ORDER%20BY%20priority%20ASC
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Remove Flume support in 3.0.0?

2018-10-10 Thread Marcelo Vanzin
BTW, although I did not file a bug for that, I think we should also
consider getting rid of the kafka-0.8 connector.

That would leave only kafka-0.10 as the single remaining dstream
connector in Spark, though. (If you ignore kinesis which we can't ship
in binary form or something like that?)
On Wed, Oct 10, 2018 at 1:32 PM Sean Owen  wrote:
>
> Marcelo makes an argument that Flume support should be removed in
> 3.0.0 at https://issues.apache.org/jira/browse/SPARK-25598
>
> I tend to agree. Is there an argument that it needs to be supported,
> and can this move to Bahir if so?
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: moving the spark jenkins job builder repo from dbricks --> spark

2018-10-10 Thread Marcelo Vanzin
Thanks for doing this. The more things we have accessible to the
project members in general the better!

(Now there's that hive fork repo somewhere, but let's not talk about that.)

On Wed, Oct 10, 2018 at 9:30 AM shane knapp  wrote:
>> > * the JJB templates are able to be run by anyone w/jenkins login access 
>> > without the need to commit changes to the repo.  this means there's a 
>> > non-zero potential for bad actors to change the build configs.  since we 
>> > will only be managing test and compile jobs through this, the chances for 
>> > Real Bad Stuff[tm] is minimized.  i will also have a local server, not on 
>> > the jenkins network, run a nightly cron job that grabs the latest configs 
>> > from github and syncs them to jenkins.

Not sure if that's what you meant; but it should be ok for the jenkins
servers to manually sync with master after you (or someone else) have
verified the changes. That should prevent inadvertent breakages since
I don't expect it to be easy to test those scripts without access to
some test jenkins server.

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-08 Thread Marcelo Vanzin
On Mon, Oct 8, 2018 at 6:36 AM Rob Vesse  wrote:
> Since connectivity back to the client is a potential stumbling block for 
> cluster mode, I wonder if it would be better to think in reverse, i.e. rather 
> than having the driver pull from the client, have the client push to the 
> driver pod?
>
> You can do this manually yourself via kubectl cp so it should be possible to 
> programmatically do this since it looks like this is just a tar piped into a 
> kubectl exec.   This would keep the relevant logic in the Kubernetes specific 
> client which may/may not be desirable depending on whether we’re looking to 
> just fix this for K8S or more generally.  Of course there is probably a fair 
> bit of complexity in making this work but does that sound like something 
> worth exploring?

That sounds like a good solution especially if there's a programmatic
API for it, instead of having to fork a sub-process to upload the
files.

>  I hadn’t really considered the HA aspect

When you say HA here what do you mean exactly? I don't really expect
two drivers for the same app running at the same time, so my first
guess is you mean "reattempts" just like YARN supports - re-running
the driver if the first one fails?

That can be tricky without some shared storage mechanism, because in
cluster mode the submission client doesn't need to stay alive after
the application starts. Or at least it doesn't with other cluster
managers.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Marcelo Vanzin
On Fri, Oct 5, 2018 at 7:54 AM Rob Vesse  wrote:
> Ideally this would all just be handled automatically for users in the way 
> that all other resource managers do

I think you're giving other resource managers too much credit. In
cluster mode, only YARN really distributes local dependencies, because
YARN has that feature (its distributed cache) and Spark just uses it.

Standalone doesn't do it (see SPARK-4160) and I don't remember seeing
anything similar on the Mesos side.

There are things that could be done; e.g. if you have HDFS you could
do a restricted version of what YARN does (upload files to HDFS, and
change the "spark.jars" and "spark.files" URLs to point to HDFS
instead). Or you could turn the submission client into a file server
that the cluster-mode driver downloads files from - although that
requires connectivity from the driver back to the client.

Neither is great, but better than not having that feature.
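
A sketch of that first option (the restricted, YARN-like approach); the helper
below is illustrative, not an existing Spark API: copy the local dependencies
to a shared filesystem and hand the rewritten URIs to spark.jars / spark.files.

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object StageLocalDeps {
  // Copy local jars/files into a staging directory on a shared FS and return the
  // rewritten URIs, which submission code could substitute into spark.jars/spark.files.
  def stageTo(stagingDir: String, localPaths: Seq[String]): Seq[String] = {
    val fs = FileSystem.get(new URI(stagingDir), new Configuration())
    localPaths.map { p =>
      val src = new Path(p)
      val dst = new Path(stagingDir, src.getName)
      fs.copyFromLocalFile(false /* delSrc */, true /* overwrite */, src, dst)
      dst.toUri.toString // e.g. hdfs://namenode/user/me/.staging/my-app.jar
    }
  }
}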

Just to be clear: in client mode things work right? (Although I'm not
really familiar with how client mode works in k8s - never tried it.)

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Marcelo Vanzin
You can log in to https://repository.apache.org and see what's wrong.
Just find that staging repo and look at the messages. In your case it
seems related to your signature.

failureMessage: No public key: Key with id: () was not able to be
located on http://gpg-keyserver.de/. Upload your public key and try
the operation again.
On Sun, Sep 16, 2018 at 10:00 PM Wenchen Fan  wrote:
>
> I confirmed that 
> https://repository.apache.org/content/repositories/orgapachespark-1285 is not 
> accessible. I did it via ./dev/create-release/do-release-docker.sh -d 
> /my/work/dir -s publish , not sure what's going wrong. I didn't see any error 
> message during it.
>
> Any insights are appreciated! So that I can fix it in the next RC. Thanks!
>
> On Mon, Sep 17, 2018 at 11:31 AM Sean Owen  wrote:
>>
>> I think one build is enough, but haven't thought it through. The
>> Hadoop 2.6/2.7 builds are already nearly redundant. 2.12 is probably
>> best advertised as a 'beta'. So maybe publish a no-hadoop build of it?
>> Really, whatever's the easy thing to do.
>> On Sun, Sep 16, 2018 at 10:28 PM Wenchen Fan  wrote:
>> >
>> > Ah I missed the Scala 2.12 build. Do you mean we should publish a Scala 
>> > 2.12 build this time? Currently for Scala 2.11 we have 3 builds: with hadoop 
>> > 2.7, with hadoop 2.6, and without hadoop. Shall we do the same thing for Scala 
>> > 2.12?
>> >
>> > On Mon, Sep 17, 2018 at 11:14 AM Sean Owen  wrote:
>> >>
>> >> A few preliminary notes:
>> >>
>> >> Wenchen, for some weird reason, when I hit your key in gpg --import, it
>> >> asks for a passphrase. When I skip it, it's fine; gpg can still verify
>> >> the signature. No issue there really.
>> >>
>> >> The staging repo gives a 404:
>> >> https://repository.apache.org/content/repositories/orgapachespark-1285/
>> >> 404 - Repository "orgapachespark-1285 (staging: open)"
>> >> [id=orgapachespark-1285] exists but is not exposed.
>> >>
>> >> The (revamped) licenses are OK, though there are some minor glitches
>> >> in the final release tarballs (my fault): there's an extra directory,
>> >> and the source release has both binary and source licenses. I'll fix
>> >> that. Not strictly necessary to reject the release over those.
>> >>
>> >> Last, when I check the staging repo I'll get my answer, but, were you
>> >> able to build 2.12 artifacts as well?
>> >>
>> >> On Sun, Sep 16, 2018 at 9:48 PM Wenchen Fan  wrote:
>> >> >
>> >> > Please vote on releasing the following candidate as Apache Spark 
>> >> > version 2.4.0.
>> >> >
>> >> > The vote is open until September 20 PST and passes if a majority +1 PMC 
>> >> > votes are cast, with
>> >> > a minimum of 3 +1 votes.
>> >> >
>> >> > [ ] +1 Release this package as Apache Spark 2.4.0
>> >> > [ ] -1 Do not release this package because ...
>> >> >
>> >> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >> >
>> >> > The tag to be voted on is v2.4.0-rc1 (commit 
>> >> > 1220ab8a0738b5f67dc522df5e3e77ffc83d207a):
>> >> > https://github.com/apache/spark/tree/v2.4.0-rc1
>> >> >
>> >> > The release files, including signatures, digests, etc. can be found at:
>> >> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc1-bin/
>> >> >
>> >> > Signatures used for Spark RCs can be found in this file:
>> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >
>> >> > The staging repository for this release can be found at:
>> >> > https://repository.apache.org/content/repositories/orgapachespark-1285/
>> >> >
>> >> > The documentation corresponding to this release can be found at:
>> >> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc1-docs/
>> >> >
>> >> > The list of bug fixes going into 2.4.0 can be found at the following 
>> >> > URL:
>> >> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.0
>> >> >
>> >> > FAQ
>> >> >
>> >> > =
>> >> > How can I help test this release?
>> >> > =
>> >> >
>> >> > If you are a Spark user, you can help us test this release by taking
>> >> > an existing Spark workload and running on this release candidate, then
>> >> > reporting any regressions.
>> >> >
>> >> > If you're working in PySpark you can set up a virtual env and install
>> >> > the current RC and see if anything important breaks. In Java/Scala,
>> >> > you can add the staging repository to your project's resolvers and test
>> >> > with the RC (make sure to clean up the artifact cache before/after so
>> >> > you don't end up building with an out-of-date RC going forward).
>> >> >
>> >> > ===
>> >> > What should happen to JIRA tickets still targeting 2.4.0?
>> >> > ===
>> >> >
>> >> > The current list of open tickets targeted at 2.4.0 can be found at:
>> >> > https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> >> > Version/s" = 2.4.0
>> >> >
>> >> > Committers should look at those and triage. Extremely important bug
>> >> > fixes, 

Re: data source api v2 refactoring

2018-09-04 Thread Marcelo Vanzin
Same here, I don't see anything from Wenchen... just replies to him.
On Sat, Sep 1, 2018 at 9:31 PM Mridul Muralidharan  wrote:
>
>
> Is it only me, or are others getting Wenchen's mails? (Obviously Ryan did 
> :-) )
> I did not see it in the mail thread I received or in the archives ... [1] 
> Wondering which other senders were getting dropped (if any).
>
> Regards
> Mridul
>
> [1] 
> http://apache-spark-developers-list.1001551.n3.nabble.com/data-source-api-v2-refactoring-td24848.html
>
>
> On Sat, Sep 1, 2018 at 8:58 PM Ryan Blue  wrote:
>>
>> Thanks for clarifying, Wenchen. I think that's what I expected.
>>
>> As for the abstraction, here's the way that I think about it: there are two 
>> important parts of a scan: the definition of what will be read, and task 
>> sets that actually perform the read. In batch, there's one definition of the 
>> scan and one task set so it makes sense that there's one scan object that 
>> encapsulates both of these concepts. For streaming, we need to separate the 
>> two into the definition of what will be read (the stream or streaming read) 
>> and the task sets that are run (scans). That way, the streaming read behaves 
>> like a factory for scans, producing scans that handle the data either in 
>> micro-batches or using continuous tasks.
>>
>> To address Jungtaek's question, I think that this does work with continuous. 
>> In continuous mode, the query operators keep running and send data to one 
>> another directly. The API still needs a streaming read layer because it may 
>> still produce more than one continuous scan. That would happen when the 
>> underlying source changes and Spark needs to reconfigure. I think the 
>> example here is when partitioning in a Kafka topic changes and Spark needs 
>> to re-map Kafka partitions to continuous tasks.
>>
>> rb
>>
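A rough Scala sketch of the layering Ryan describes; all trait and method names
below are invented to show the shape of the abstraction, not the actual
interfaces under discussion:

// Illustrative only: invented names, not the proposed API.
trait ScanConfig
trait ScanConfigBuilder { def build(): ScanConfig }
trait Offset
trait ReadTask

trait Scan {
  def planTasks(): Seq[ReadTask]                      // one concrete task set that performs a read
}

trait Stream {
  def createScan(start: Offset, end: Offset): Scan    // factory for per-micro-batch (or continuous) scans
}

trait Table {
  def newScanConfigBuilder(): ScanConfigBuilder       // operator pushdown is captured while building this
  def createBatchScan(config: ScanConfig): Scan       // batch: one definition, one task set
  def createStream(config: ScanConfig): Stream        // streaming: the configured "what will be read"
}
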
>> On Fri, Aug 31, 2018 at 5:12 PM Wenchen Fan  wrote:
>>>
>>> Hi Ryan,
>>>
>>> Sorry, I may have used the wrong wording. The pushdown is done with ScanConfig, 
>>> which is not table/stream/scan, but something between them. The table 
>>> creates the ScanConfigBuilder, and then creates the stream/scan with the ScanConfig. 
>>> For a streaming source, the stream is the one to take care of the pushdown 
>>> result. For a batch source, it's the scan.
>>>
>>> It's a little tricky because the stream is an abstraction for streaming sources 
>>> only. Better ideas are welcome!
>>>
>>>
>>> On Sat, Sep 1, 2018 at 7:26 AM Ryan Blue  wrote:

 Thanks, Reynold!

 I think your API sketch looks great. I appreciate having the Table level 
 in the abstraction to plug into as well. I think this makes it clear what 
 everything does, particularly having the Stream level that represents a 
 configured (by ScanConfig) streaming read and can act as a factory for 
 individual batch scans or for continuous scans.

 Wenchen, I'm not sure what you mean by doing pushdown at the table level. 
 It seems to mean that pushdown is specific to a batch scan or streaming 
 read, which seems to be what you're saying as well. Wouldn't the pushdown 
 happen to create a ScanConfig, which is then used as Reynold suggests? 
 Looking forward to seeing this PR when you get it posted. Thanks for all 
 of your work on this!

 rb

 On Fri, Aug 31, 2018 at 3:52 PM Wenchen Fan  wrote:
>
> Thank Reynold for writing this and starting the discussion!
>
> Data source v2 was started with batch only, so we didn't pay much 
> attention to the abstraction and just followed the v1 API. Now we are 
> designing the streaming API and catalog integration, the abstraction 
> becomes super important.
>
> I like this proposed abstraction and have successfully prototyped it to 
> make sure it works.
>
> During prototyping, I had to work around the issue that the current 
> streaming engine does query optimization/planning for each micro batch. 
> With this abstraction, the operator pushdown is only applied once 
> per-query. In my prototype, I do the physical planning up front to get 
> the pushdown result, and
> add a logical linking node that wraps the resulting physical plan node 
> for the data source, and then swap that logical linking node into the 
> logical plan for each batch. In the future we should just let the 
> streaming engine do query optimization/planning only once.
>
> About pushdown, I think we should do it at the table level. The table 
> should create a new pushdown handler to apply operator pushdown for each 
> scan/stream, and create the scan/stream with the pushdown result. The 
> rationale is, a table should have the same pushdown behavior regardless of 
> the scan node.
>
> Thanks,
> Wenchen
>
>
>
>
>
> On Fri, Aug 31, 2018 at 2:00 PM Reynold Xin  wrote:
>>
>> I spent some time last week looking at the current data source v2 apis, 
>> and I thought we should be a 

Re: Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-08-31 Thread Marcelo Vanzin
I think there still might be an active job publishing stuff. Here's a
pretty recent build from master:

https://dist.apache.org/repos/dist/dev/spark/2.4.0-SNAPSHOT-2018_08_31_12_02-32da87d-docs/_site/index.html

But it seems only docs are being published, which makes me think it's
those builds that Shane mentioned in a recent e-mail.

On Fri, Aug 31, 2018 at 1:29 PM Sean Owen  wrote:
>
> There are some builds there, but they're not recent:
>
> https://people.apache.org/~pwendell/spark-nightly/
>
> We can either get the jobs running again, or just knock this on the head and 
> remove it.
>
> Anyone know how to get it running again and want to? I have a feeling Shane 
> knows, if anyone does. Or does anyone know if we even need these at this point? If 
> nobody has complained in about a year, it's unlikely.
>
> On Fri, Aug 31, 2018 at 3:15 PM Cody Koeninger  wrote:
>>
>> Just got a question about this on the user list as well.
>>
>> Worth removing that link to pwendell's directory from the docs?
>>
>> On Sun, Jan 21, 2018 at 12:13 PM, Jacek Laskowski  wrote:
>> > Hi,
>> >
>> > http://spark.apache.org/developer-tools.html#nightly-builds reads:
>> >
>> >> Spark nightly packages are available at:
>> >> Latest master build:
>> >> https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest
>> >
>> > but the URL gives 404. Is this intended?
>> >
>> > Pozdrawiam,
>> > Jacek Laskowski
>> > 
>> > https://about.me/JacekLaskowski
>> > Mastering Spark SQL https://bit.ly/mastering-spark-sql
>> > Spark Structured Streaming https://bit.ly/spark-structured-streaming
>> > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
>> > Follow me at https://twitter.com/jaceklaskowski
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-08-31 Thread Marcelo Vanzin
I like the questions (aside maybe from the cost one, which perhaps does
not matter much here), especially since they encourage explaining
things in plainer language than is generally used in specs.

But I don't think we can ignore design aspects; it's been my
observation that a good portion of SPIPs, when proposed, already have
at the very least some sort of implementation (even if it's a barely
working p.o.c.), so it would also be good to have that information up
front if it's available.

(So I guess I'm just repeating Sean's reply.)

On Fri, Aug 31, 2018 at 11:23 AM Reynold Xin  wrote:
>
> I helped craft the current SPIP template last year. I was recently 
> (re-)introduced to the Heilmeier Catechism, a set of questions DARPA 
> developed to evaluate proposals. The set of questions are:
>
> - What are you trying to do? Articulate your objectives using absolutely no 
> jargon.
> - How is it done today, and what are the limits of current practice?
> - What is new in your approach and why do you think it will be successful?
> - Who cares? If you are successful, what difference will it make?
> - What are the risks?
> - How much will it cost?
> - How long will it take?
> - What are the mid-term and final “exams” to check for success?
>
> When I read the above list, it resonates really well because they are almost 
> always the same set of questions I ask myself and others before I decide 
> whether something is worth doing. In some ways, our SPIP template tries to 
> capture some of these (e.g. target persona), but are not as explicit and well 
> articulated.
>
> What do people think about replacing the current SPIP template with the above?
>
> At a high level, I think the Heilmeier's Catechism emphasizes less about the 
> "how", and more the "why" and "what", which is what I'd argue SPIPs should be 
> about. The hows should be left in design docs for larger projects.
>
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Marcelo Vanzin
The issue is: where do you name the class you want to initialize? In your job?

What if the plugin has nothing to do with your job, e.g. it's some
monitoring tool? Are you going to modify all your jobs so you include
initialization of that particular plugin? Forcing everybody to change
their code, builds, re-deploy stuff, etc?

What if a task in an early stage of your job, the one that initializes
the plugin, runs in executor A, and a task in a further stage runs on
executor B?

It's just way more complicated, if possible at all, to write this
feature by asking everybody to change their application code.
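
For illustration only, here is a rough sketch of the kind of executor-side hook being proposed, where a monitoring plugin is loaded by the executor itself rather than by any job. The trait name, method names and config key below are assumptions made for this sketch, not the final API.

    // Illustrative plugin contract: an assumed shape, not the actual Spark interface.
    trait ExecutorPlugin {
      def init(): Unit        // called once when the executor starts
      def shutdown(): Unit    // called when the executor shuts down
    }

    // A monitoring-style plugin that requires no change to any job's code.
    class MemoryMonitorPlugin extends ExecutorPlugin {
      @volatile private var running = true

      private val monitor = new Thread(new Runnable {
        override def run(): Unit = {
          while (running) {
            val rt = Runtime.getRuntime
            val usedMb = (rt.totalMemory - rt.freeMemory) / (1024 * 1024)
            println(s"executor heap in use: $usedMb MB") // a real plugin would log this
            Thread.sleep(10000)
          }
        }
      }, "memory-monitor")

      override def init(): Unit = { monitor.setDaemon(true); monitor.start() }
      override def shutdown(): Unit = running = false
    }

The executor would instantiate plugins named in a configuration entry (something along the lines of a spark.executor.plugins list), so the monitoring tool is enabled by deployment configuration rather than by application code.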


On Tue, Aug 28, 2018 at 9:39 AM, Sean Owen  wrote:
> I should be able to force a class to init by just naming it. The issue is
> that this means doing that in any job that needs init? not that it doesn't
> work, right? I concede that could be annoying if just about all your code
> needs this init; I had understood the use cases to be more like "establish
> some local config and init for this particular thing I'm doing for this
> legacy system".
>
> On Tue, Aug 28, 2018 at 11:35 AM Marcelo Vanzin  wrote:
>>
>> +1
>>
>> Class init is not enough because there is nowhere for you to force a
>> random class to be initialized. This is basically adding that
>> mechanism, instead of forcing people to add hacks using e.g.
>> mapPartitions which don't even cover all scenarios.
>>
>> On Tue, Aug 28, 2018 at 7:09 AM, Sean Owen  wrote:
>> > Still +0 on the idea, as I am still not sure it does much over simple
>> > JVM
>> > mechanisms like a class init. More comments on the JIRA. I can't say
>> > it's a
>> > bad idea though, so would not object to it.
>> >
>> > On Tue, Aug 28, 2018 at 8:50 AM Imran Rashid
>> > 
>> > wrote:
>> >>
>> >> There has been discussion on jira & the PR, all generally positive, so
>> >> I'd
>> >> like to call a vote for this spip.
>> >>
>> >> I'll start with own +1.
>> >>
>> >> On Fri, Aug 3, 2018 at 11:59 AM Imran Rashid 
>> >> wrote:
>> >>>
>> >>> I'd like to propose adding a plugin api for Executors, primarily for
>> >>> instrumentation and debugging
>> >>> (https://issues.apache.org/jira/browse/SPARK-24918).  The changes are
>> >>> small,
>> >>> but as it's adding a new API, it might be SPIP-worthy.  I mentioned it
>> >>> as
>> >>> well in a recent email I sent about memory monitoring
>> >>>
>> >>> The spip proposal is here (and attached to the jira as well):
>> >>>
>> >>> https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/edit?usp=sharing
>> >>>
>> >>> There are already some comments on the jira and pr, and I hope to get
>> >>> more thoughts and opinions on it.
>> >>>
>> >>> thanks,
>> >>> Imran
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Marcelo Vanzin
+1

Class init is not enough because there is nowhere for you to force a
random class to be initialized. This is basically adding that
mechanism, instead of forcing people to add hacks using e.g.
mapPartitions which don't even cover all scenarios.

On Tue, Aug 28, 2018 at 7:09 AM, Sean Owen  wrote:
> Still +0 on the idea, as I am still not sure it does much over simple JVM
> mechanisms like a class init. More comments on the JIRA. I can't say it's a
> bad idea though, so would not object to it.
>
> On Tue, Aug 28, 2018 at 8:50 AM Imran Rashid 
> wrote:
>>
>> There has been discussion on jira & the PR, all generally positive, so I'd
>> like to call a vote for this spip.
>>
>> I'll start with own +1.
>>
>> On Fri, Aug 3, 2018 at 11:59 AM Imran Rashid  wrote:
>>>
>>> I'd like to propose adding a plugin api for Executors, primarily for
>>> instrumentation and debugging
>>> (https://issues.apache.org/jira/browse/SPARK-24918).  The changes are small,
>>> but as it's adding a new API, it might be SPIP-worthy.  I mentioned it as
>>> well in a recent email I sent about memory monitoring
>>>
>>> The spip proposal is here (and attached to the jira as well):
>>> https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/edit?usp=sharing
>>>
>>> There are already some comments on the jira and pr, and I hope to get
>>> more thoughts and opinions on it.
>>>
>>> thanks,
>>> Imran



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Persisting driver logs in yarn client mode (SPARK-25118)

2018-08-24 Thread Marcelo Vanzin
I think this would be useful, but I also share Saisai's and Marco's
concern about the extra step when shutting down the application. If
that could be minimized this would be a much more interesting feature.

e.g. you could upload logs incrementally to HDFS, asynchronously,
while the app is running. Or you could pipe them to the YARN AM over
Spark's RPC (losing some logs at the beginning and end of the driver
execution). Or maybe something else.

There is also the issue of shell logs being at "warn" level by
default, so even if you write these to a file, they're not really that
useful for debugging. So a solution that keeps that behavior, but
writes INFO logs to this new sink, would be great.

If you can come up with a solution to those problems I think this
could be a good feature.
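
As a rough sketch of that last point — keeping the console at WARN while persisting INFO output to a separate driver log that could later be shipped to HDFS — using the log4j 1.x API Spark used at the time; the file path is a made-up example:

    import org.apache.log4j.{ConsoleAppender, FileAppender, Level, Logger, PatternLayout}

    val root = Logger.getRootLogger
    root.setLevel(Level.INFO)

    // What the user sees in the shell stays at WARN ...
    val console = new ConsoleAppender(new PatternLayout("%d %p %c: %m%n"))
    console.setThreshold(Level.WARN)
    root.addAppender(console)

    // ... while everything at INFO and above goes to the new sink.
    val driverLog = new FileAppender(
      new PatternLayout("%d %p %c: %m%n"),
      "/tmp/driver-app_1234.log") // hypothetical local path
    driverLog.setThreshold(Level.INFO)
    root.addAppender(driverLog)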


On Wed, Aug 22, 2018 at 10:01 AM, Ankur Gupta
 wrote:
> Thanks for your responses Saisai and Marco.
>
> I agree that "rename" operation can be time-consuming on object storage,
> which can potentially delay the shutdown.
>
> I also agree that customers/users have a way to use log appenders to write
> log files and then send them along with Yarn application logs but I still
> think it is a cumbersome process. Also, there is the issue that customers
> cannot easily identify which logs belong to which application, without
> reading the log file. And if users run multiple applications with default
> log4j configurations on the same host, then they can end up writing to the
> same log file.
>
> Because of the issues mentioned above, we can maybe think of this as an
> optional feature, which will be disabled by default but turned on by
> customers. This will solve the problems mentioned above, reduce the overhead
> on users/customers while adding a bit of overhead during the shutdown phase
> of Spark Application.
>
> Thanks,
> Ankur
>
> On Wed, Aug 22, 2018 at 1:36 AM Marco Gaido  wrote:
>>
>> I agree with Saisai. You can also configure log4j to append anywhere else
>> other than the console. Many companies have their system for collecting and
>> monitoring logs and they just customize the log4j configuration. I am not
>> sure how needed this change would be.
>>
>> Thanks,
>> Marco
>>
>> Il giorno mer 22 ago 2018 alle ore 04:31 Saisai Shao
>>  ha scritto:
>>>
>>> One issue I can think of is that this "moving the driver log" in the
>>> application end is quite time-consuming, which will significantly delay the
>>> shutdown. We already suffered such "rename" problem for event log on object
>>> store, the moving of driver log will make the problem severe.
>>>
>>> For a vanilla Spark on yarn client application, I think user could
>>> redirect the console outputs to log and provides both driver log and yarn
>>> application log to the customers, this seems not a big overhead.
>>>
>>> Just my two cents.
>>>
>>> Thanks
>>> Saisai
>>>
>>> Ankur Gupta  wrote on Wed, Aug 22, 2018 at 5:19 AM:

 Hi all,

 I want to highlight a problem that we face here at Cloudera and start a
 discussion on how to go about solving it.

 Problem Statement:
 Our customers reach out to us when they face problems in their Spark
 Applications. Those problems can be related to Spark, environment issues,
 their own code or something else altogether. A lot of times these customers
 run their Spark Applications in Yarn Client mode, which as we all know, 
 uses
 a ConsoleAppender to print logs to the console. These customers usually 
 send
 their Yarn logs to us to troubleshoot. As you may have figured, these logs
 do not contain driver logs and makes it difficult for us to troubleshoot 
 the
 issue. In that scenario our customers end up running the application again,
 piping the output to a log file or using a local log appender and then
 sending over that file.

 I believe that there are other users in the community who also face
 similar problem, where the central team managing Spark clusters face
 difficulty in helping the end users because they ran their application in
 shell or yarn client mode (I am not sure what is the equivalent in Mesos).

 Additionally, there may be teams who want to capture all these logs so
 that they can be analyzed at some later point in time and the fact that
 driver logs are not a part of Yarn Logs causes them to capture only partial
 logs or makes it difficult to capture all the logs.

 Proposed Solution:
 One "low touch" approach will be to create an ApplicationListener which
 listens for Application Start and Application End events. On Application
 Start, this listener will append a Log Appender which writes to a local or
 remote (eg:hdfs) log file in an application specific directory and moves
 this to Yarn's Remote Application Dir (or equivalent Mesos Dir) on
 application end. This way the logs will be available as part of Yarn Logs.
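
As a very rough sketch of that proposal (class names, paths and the registration mechanism are illustrative assumptions; the real listener would presumably be wired in by Spark itself, and it assumes a log4j file appender is already writing the driver log locally):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

    // On application end, copy the locally written driver log next to the
    // aggregated YARN logs so it survives the driver host.
    class DriverLogUploader(sc: SparkContext, localLog: String, remoteDir: String)
        extends SparkListener {

      override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
        val fs = FileSystem.get(sc.hadoopConfiguration)
        val dest = new Path(remoteDir, s"${sc.applicationId}-driver.log")
        fs.copyFromLocalFile(new Path(localLog), dest)
      }
    }

    // Hypothetical registration, e.g.:
    // sc.addSparkListener(new DriverLogUploader(sc, "/tmp/driver.log", "hdfs:///tmp/logs"))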

 I am also interested in hearing about other ideas that 

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Marcelo Vanzin
On Wed, Aug 15, 2018 at 1:35 PM, shane knapp  wrote:
> in fact, i don't see us getting rid of all of the centos machines until EOY
> (see my above comment, re docs, release etc).  these are the builds that
> will remain on centos for the near future:
> https://rise.cs.berkeley.edu/jenkins/label/spark-release/
> https://rise.cs.berkeley.edu/jenkins/label/spark-packaging/
> https://rise.cs.berkeley.edu/jenkins/label/spark-docs/

What is the current purpose of these builds?

- spark-release hasn't been triggered in a long time.
- spark-packaging seems to have been failing miserably for the last 10 months.
- spark-docs seems to be building the docs, is that the only place
where the docs build is tested?

In the last many releases we've moved away from using jenkins jobs for
preparing the packages, and the scripts have changed a lot to be
friendlier to people running them locally (they even support docker
now, and have flags to run "test" builds that don't require
credentials such as GPG keys).

Perhaps we should think about revamping these jobs instead of keeping
them as is.

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-08-13 Thread Marcelo Vanzin
On this topic... when I worked on 2.3.1 and caused this breakage by
deleting and old release, I tried to write some code to make this more
automatic:

https://github.com/vanzin/spark/tree/SPARK-24532

I just found that the code was a little too large and hacky for what
it does (find out the latest releases on each branch). But maybe it
would be worth doing that?

In any case, agree with Mark that checking signatures would be good, eventually.
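
A small sketch of what that automation could look like: scrape the dist listing and keep the highest maintenance release on each branch. The URL is the one discussed in this thread; the regex and overall structure are assumptions made for illustration.

    import scala.io.Source

    object LatestReleases {
      def main(args: Array[String]): Unit = {
        val listing = Source.fromURL(
          "https://dist.apache.org/repos/dist/release/spark/").mkString
        // Pull out every "spark-x.y.z" directory name from the listing.
        val versions = """spark-(\d+)\.(\d+)\.(\d+)""".r
          .findAllMatchIn(listing)
          .map(m => (m.group(1).toInt, m.group(2).toInt, m.group(3).toInt))
          .toSeq
        // Keep only the newest maintenance release on each x.y branch.
        val latestPerBranch = versions
          .groupBy { case (major, minor, _) => (major, minor) }
          .values.map(_.maxBy(_._3))
          .toSeq.sorted
        latestPerBranch.foreach { case (a, b, c) => println(s"$a.$b.$c") }
      }
    }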


On Sun, Jul 15, 2018 at 1:51 PM, Sean Owen  wrote:
> Yesterday I cleaned out old Spark releases from the mirror system -- we're
> supposed to only keep the latest release from active branches out on
> mirrors. (All releases are available from the Apache archive site.)
>
> Having done so I realized quickly that the HiveExternalCatalogVersionsSuite
> relies on the versions it downloads being available from mirrors. It has
> been flaky, as sometimes mirrors are unreliable. I think now it will not
> work for any versions except 2.3.1, 2.2.2, 2.1.3.
>
> Because we do need to clean those releases out of the mirrors soon anyway,
> and because they're flaky sometimes, I propose adding logic to the test to
> fall back on downloading from the Apache archive site.
>
> ... and I'll do that right away to unblock HiveExternalCatalogVersionsSuite
> runs. I think it needs to be backported to other branches as they will still
> be testing against potentially non-current Spark releases.
>
> Sean



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[RESULT] [VOTE] Spark 2.1.3 (RC2)

2018-06-29 Thread Marcelo Vanzin
The vote passes. Thanks to all who helped with the release!

I'll start publishing everything today, and an announcement will
be sent when artifacts have propagated to the mirrors (probably
early next week).

+1 (* = binding):
- Marcelo Vanzin *
- Sean Owen *
- Felix Cheung *
- Tom Graves *

+0: None

-1: None


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
Yep, that's right. There were a bunch of things that were removed from
those scripts that made it tricky to build 2.1 (like Scala 2.10
support). I think it's good to keep the scripts working for older
releases since that allows is to fix things / add features to them
without having to backport to older branches.

On Thu, Jun 28, 2018 at 11:30 AM, Felix Cheung
 wrote:
> If I recall we stop releasing Hadoop 2.3 or 2.4 in newer releases (2.2+?) -
> that might be why they are not the release script.
>
>
> ____
> From: Marcelo Vanzin 
> Sent: Thursday, June 28, 2018 11:12:45 AM
> To: Sean Owen
> Cc: Marcelo Vanzin; dev
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> Alright, uploaded the missing packages.
>
> I'll send a PR to update the release scripts just in case...
>
> On Thu, Jun 28, 2018 at 10:08 AM, Sean Owen  wrote:
>> If it's easy enough to produce them, I agree you can just add them to the
>> RC
>> dir.
>>
>> On Thu, Jun 28, 2018 at 11:56 AM Marcelo Vanzin
>>  wrote:
>>>
>>> I just noticed this RC is missing builds for hadoop 2.3 and 2.4, which
>>> existed in the previous version:
>>> https://dist.apache.org/repos/dist/release/spark/spark-2.1.2/
>>>
>>> How important do we think are those? I think I can just build them and
>>> publish them to the RC directory without having to create a new RC.
>>>
>>> On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin 
>>> wrote:
>>> > Please vote on releasing the following candidate as Apache Spark
>>> > version
>>> > 2.1.3.
>>> >
>>> > The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if
>>> > a
>>> > majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.1.3
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
>>> > https://github.com/apache/spark/tree/v2.1.3-rc2
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1275/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>>> >
>>> > The list of bug fixes going into 2.1.3 can be found at the following
>>> > URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12341660
>>> >
>>> > Notes:
>>> >
>>> > - RC1 was not sent for a vote. I had trouble building it, and by the
>>> > time I got
>>> >   things fixed, there was a blocker bug filed. It was already tagged in
>>> > git
>>> >   at that time.
>>> >
>>> > - If testing the source package, I recommend using Java 8, even though
>>> > 2.1
>>> >   supports Java 7 (and the RC was built with JDK 7). This is because
>>> > Maven
>>> >   Central has updated some configuration that makes the default Java 7
>>> > SSL
>>> >   config not work.
>>> >
>>> > - There are Maven artifacts published for Scala 2.10, but binary
>>> > releases are only
>>> >   available for Scala 2.11. This matches the previous release (2.1.2),
>>> > but if there's
>>> >   a need / desire to have pre-built distributions for Scala 2.10, I can
>>> > probably
>>> >   amend the RC without having to create a new one.
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the 

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
Alright, uploaded the missing packages.

I'll send a PR to update the release scripts just in case...

On Thu, Jun 28, 2018 at 10:08 AM, Sean Owen  wrote:
> If it's easy enough to produce them, I agree you can just add them to the RC
> dir.
>
> On Thu, Jun 28, 2018 at 11:56 AM Marcelo Vanzin
>  wrote:
>>
>> I just noticed this RC is missing builds for hadoop 2.3 and 2.4, which
>> existed in the previous version:
>> https://dist.apache.org/repos/dist/release/spark/spark-2.1.2/
>>
>> How important do we think are those? I think I can just build them and
>> publish them to the RC directory without having to create a new RC.
>>
>> On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin 
>> wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.1.3.
>> >
>> > The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if
>> > a
>> > majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.1.3
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
>> > https://github.com/apache/spark/tree/v2.1.3-rc2
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1275/
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>> >
>> > The list of bug fixes going into 2.1.3 can be found at the following
>> > URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12341660
>> >
>> > Notes:
>> >
>> > - RC1 was not sent for a vote. I had trouble building it, and by the
>> > time I got
>> >   things fixed, there was a blocker bug filed. It was already tagged in
>> > git
>> >   at that time.
>> >
>> > - If testing the source package, I recommend using Java 8, even though
>> > 2.1
>> >   supports Java 7 (and the RC was built with JDK 7). This is because
>> > Maven
>> >   Central has updated some configuration that makes the default Java 7
>> > SSL
>> >   config not work.
>> >
>> > - There are Maven artifacts published for Scala 2.10, but binary
>> > releases are only
>> >   available for Scala 2.11. This matches the previous release (2.1.2),
>> > but if there's
>> >   a need / desire to have pre-built distributions for Scala 2.10, I can
>> > probably
>> >   amend the RC without having to create a new one.
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your projects resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with a out of date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.1.3?
>> > ===
>> >
>> > The current list of open tickets targeted at 2.1.3 can be found at:
>> > https://s.apache.org/spark-2.1.3
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>> >
>> >
>> > --
>> > Marcelo
>>
>>
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
I just noticed this RC is missing builds for hadoop 2.3 and 2.4, which
existed in the previous version:
https://dist.apache.org/repos/dist/release/spark/spark-2.1.2/

How important do we think are those? I think I can just build them and
publish them to the RC directory without having to create a new RC.

On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.1.3.
>
> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.1.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
> https://github.com/apache/spark/tree/v2.1.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1275/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>
> The list of bug fixes going into 2.1.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>
> Notes:
>
> - RC1 was not sent for a vote. I had trouble building it, and by the time I 
> got
>   things fixed, there was a blocker bug filed. It was already tagged in git
>   at that time.
>
> - If testing the source package, I recommend using Java 8, even though 2.1
>   supports Java 7 (and the RC was built with JDK 7). This is because Maven
>   Central has updated some configuration that makes the default Java 7 SSL
>   config not work.
>
> - There are Maven artifacts published for Scala 2.10, but binary
> releases are only
>   available for Scala 2.11. This matches the previous release (2.1.2),
> but if there's
>   a need / desire to have pre-built distributions for Scala 2.10, I can 
> probably
>   amend the RC without having to create a new one.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.1.3?
> ===
>
> The current list of open tickets targeted at 2.1.3 can be found at:
> https://s.apache.org/spark-2.1.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
BTW that would be a great fix in the docs now that we'll have a 2.3.2
being prepared.

On Thu, Jun 28, 2018 at 9:17 AM, Felix Cheung  wrote:
> Exactly...
>
> 
> From: Marcelo Vanzin 
> Sent: Thursday, June 28, 2018 9:16:08 AM
> To: Tom Graves
> Cc: Felix Cheung; dev
>
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> Yeah, we should be more careful with that in general. Like we state
> that "Spark runs on Java 8+"...
>
> On Thu, Jun 28, 2018 at 9:13 AM, Tom Graves  wrote:
>> Right we say we support R3.1+ but we never actually did, so agree its a
>> bug
>> but its not a regression since we never really supported them or tested
>> with
>> them and its not a logic or security bug that ends in corruptions or bad
>> behavior so in my opinion its not a blocker.   Again I'm fine with adding
>> it
>> though if others agree.   Maybe we should really change our documentation
>> to
>> state more clearly what versions we know it works with and have tested
>> with
>> since someone could read R3.1+ as it works with R4 (once released) which
>> very well might not be the case.
>>
>>
>> I'm +1 on the release.
>>
>> Tom
>>
>> On Thursday, June 28, 2018, 10:28:21 AM CDT, Felix Cheung
>>  wrote:
>>
>>
>> Not pushing back, but our support message has always been R 3.1+ so it a
>> bit
>> off to say we don’t support newer releases.
>>
>> https://spark.apache.org/docs/2.1.2/
>>
>> But looking back, this was found during 2.1.2 RC2 and didn’t fix (in time)
>> for 2.1.2?
>>
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
>>
>> Since it isn’t a regression I’d say +1 from me.
>>
>>
>> 
>> From: Tom Graves 
>> Sent: Thursday, June 28, 2018 6:56:16 AM
>> To: Marcelo Vanzin; Felix Cheung
>> Cc: dev
>> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>>
>> If this is just supporting newer versions of R that 2.1 never supported
>> then
>> I would say its not a blocker. But if you feel its useful enough then I
>> would say its up to Marcelo if he wants to pull in and spin another rc.
>>
>> Tom
>>
>> On Wednesday, June 27, 2018, 8:57:25 PM CDT, Felix Cheung
>>  wrote:
>>
>>
>> Yes, this is broken with newer version of R.
>>
>> We check explicitly for warning for the R check which should fail the test
>> run.
>>
>> 
>> From: Marcelo Vanzin 
>> Sent: Wednesday, June 27, 2018 6:55 PM
>> To: Felix Cheung
>> Cc: Marcelo Vanzin; Tom Graves; dev
>> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>>
>> Not sure I understand that bug. Is it a compatibility issue with new
>> versions of R?
>>
>> It's at least marked as fixed in 2.2(.1).
>>
>> We do run jenkins on these branches, but that seems like just a
>> warning, which would not fail those builds...
>>
>> On Wed, Jun 27, 2018 at 6:12 PM, Felix Cheung 
>> wrote:
>>> (I don’t want to block the release(s) per se...)
>>>
>>> We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)
>>>
>>> This is fixed in 2.3 back in Nov 2017
>>>
>>>
>>> https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6
>>>
>>> Perhaps we don't get Jenkins run on these branches? It should have been
>>> detected.
>>>
>>> * checking for code/documentation mismatches ... WARNING
>>> Codoc mismatches from documentation object 'attach':
>>> attach
>>> Code: function(what, pos = 2L, name = deparse(substitute(what),
>>> backtick = FALSE), warn.conflicts = TRUE)
>>> Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>>> warn.conflicts = TRUE)
>>> Mismatches in argument default values:
>>> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
>>> deparse(substitute(what))
>>>
>>> Codoc mismatches from documentation object 'glm':
>>> glm
>>> Code: function(formula, family = gaussian, data, weights, subset,
>>> na.action, start = NULL, etastart, mustart, offset,
>>> control = list(...), model = TRUE, method = "glm.fit",
>>> x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>>> NULL, ...)
>>> Docs: function(formula, family = gaussian, data, weights, subset,
>>> na.action, start = NULL, etastart, mustart, of

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
Yeah, we should be more careful with that in general. Like we state
that "Spark runs on Java 8+"...

On Thu, Jun 28, 2018 at 9:13 AM, Tom Graves  wrote:
> Right, we say we support R3.1+ but we never actually did, so agree it's a bug,
> but it's not a regression since we never really supported them or tested with
> them, and it's not a logic or security bug that ends in corruption or bad
> behavior, so in my opinion it's not a blocker.   Again I'm fine with adding it
> though if others agree.   Maybe we should really change our documentation to
> state more clearly what versions we know it works with and have tested with
> since someone could read R3.1+ as it works with R4 (once released) which
> very well might not be the case.
>
>
> I'm +1 on the release.
>
> Tom
>
> On Thursday, June 28, 2018, 10:28:21 AM CDT, Felix Cheung
>  wrote:
>
>
> Not pushing back, but our support message has always been R 3.1+ so it's a bit
> off to say we don’t support newer releases.
>
> https://spark.apache.org/docs/2.1.2/
>
> But looking back, this was found during 2.1.2 RC2 and didn’t fix (in time)
> for 2.1.2?
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
>
> Since it isn’t a regression I’d say +1 from me.
>
>
> ________
> From: Tom Graves 
> Sent: Thursday, June 28, 2018 6:56:16 AM
> To: Marcelo Vanzin; Felix Cheung
> Cc: dev
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> If this is just supporting newer versions of R that 2.1 never supported then
> I would say its not a blocker. But if you feel its useful enough then I
> would say its up to Marcelo if he wants to pull in and spin another rc.
>
> Tom
>
> On Wednesday, June 27, 2018, 8:57:25 PM CDT, Felix Cheung
>  wrote:
>
>
> Yes, this is broken with newer version of R.
>
> We check explicitly for warning for the R check which should fail the test
> run.
>
> 
> From: Marcelo Vanzin 
> Sent: Wednesday, June 27, 2018 6:55 PM
> To: Felix Cheung
> Cc: Marcelo Vanzin; Tom Graves; dev
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> Not sure I understand that bug. Is it a compatibility issue with new
> versions of R?
>
> It's at least marked as fixed in 2.2(.1).
>
> We do run jenkins on these branches, but that seems like just a
> warning, which would not fail those builds...
>
> On Wed, Jun 27, 2018 at 6:12 PM, Felix Cheung 
> wrote:
>> (I don’t want to block the release(s) per se...)
>>
>> We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)
>>
>> This is fixed in 2.3 back in Nov 2017
>>
>> https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6
>>
>> Perhaps we don't get Jenkins run on these branches? It should have been
>> detected.
>>
>> * checking for code/documentation mismatches ... WARNING
>> Codoc mismatches from documentation object 'attach':
>> attach
>> Code: function(what, pos = 2L, name = deparse(substitute(what),
>> backtick = FALSE), warn.conflicts = TRUE)
>> Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>> warn.conflicts = TRUE)
>> Mismatches in argument default values:
>> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
>> deparse(substitute(what))
>>
>> Codoc mismatches from documentation object 'glm':
>> glm
>> Code: function(formula, family = gaussian, data, weights, subset,
>> na.action, start = NULL, etastart, mustart, offset,
>> control = list(...), model = TRUE, method = "glm.fit",
>> x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>> NULL, ...)
>> Docs: function(formula, family = gaussian, data, weights, subset,
>> na.action, start = NULL, etastart, mustart, offset,
>> control = list(...), model = TRUE, method = "glm.fit",
>> x = FALSE, y = TRUE, contrasts = NULL, ...)
>> Argument names in code not in docs:
>> singular.ok
>> Mismatches in argument names:
>> Position: 16 Code: singular.ok Docs: contrasts
>> Position: 17 Code: contrasts Docs: ...
>>
>> 
>> From: Sean Owen 
>> Sent: Wednesday, June 27, 2018 5:02:37 AM
>> To: Marcelo Vanzin
>> Cc: dev
>> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>>
>> +1 from me too for the usual reasons.
>>
>> On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin
>> 
>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.1.3.
>>>
>>> The vote is open until Fri, June 2

Re: Time for 2.3.2?

2018-06-28 Thread Marcelo Vanzin
Could you mark that bug as blocker and set the target version, in that case?

On Thu, Jun 28, 2018 at 8:46 AM, Felix Cheung 
wrote:

> +1
>
> I’d like to fix SPARK-24535 first though
>
> --
> *From:* Stavros Kontopoulos 
> *Sent:* Thursday, June 28, 2018 3:50:34 AM
> *To:* Marco Gaido
> *Cc:* Takeshi Yamamuro; Xingbo Jiang; Wenchen Fan; Spark dev list; Saisai
> Shao; van...@cloudera.com.invalid
> *Subject:* Re: Time for 2.3.2?
>
> +1 makes sense.
>
> On Thu, Jun 28, 2018 at 12:07 PM, Marco Gaido 
> wrote:
>
>> +1 too, I'd consider also to include SPARK-24208 if we can solve it
>> timely...
>>
>> 2018-06-28 8:28 GMT+02:00 Takeshi Yamamuro :
>>
>>> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.
>>>
>>> On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang 
>>> wrote:
>>>
>>>> +1
>>>>
>>>> Wenchen Fan wrote on Thu, Jun 28, 2018 at 2:06 PM:
>>>>
>>>>> Hi Saisai, that's great! please go ahead!
>>>>>
>>>>> On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao 
>>>>> wrote:
>>>>>
>>>>>> +1, like mentioned by Marcelo, these issues seems quite severe.
>>>>>>
>>>>>> I can work on the release if short of hands :).
>>>>>>
>>>>>> Thanks
>>>>>> Jerry
>>>>>>
>>>>>>
>>>>>> Marcelo Vanzin  wrote on Thu, Jun 28, 2018 at 11:40 AM:
>>>>>>
>>>>>>> +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
>>>>>>> for those out.
>>>>>>>
>>>>>>> (Those are what delayed 2.2.2 and 2.1.3 for those watching...)
>>>>>>>
>>>>>>> On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan 
>>>>>>> wrote:
>>>>>>> > Hi all,
>>>>>>> >
>>>>>>> > Spark 2.3.1 was released just a while ago, but unfortunately we
>>>>>>> discovered
>>>>>>> > and fixed some critical issues afterward.
>>>>>>> >
>>>>>>> > SPARK-24495: SortMergeJoin may produce wrong result.
>>>>>>> > This is a serious correctness bug, and is easy to hit: have
>>>>>>> duplicated join
>>>>>>> > key from the left table, e.g. `WHERE t1.a = t2.b AND t1.a = t2.c`,
>>>>>>> and the
>>>>>>> > join is a sort merge join. This bug is only present in Spark 2.3.
>>>>>>> >
>>>>>>> > SPARK-24588: stream-stream join may produce wrong result
>>>>>>> > This is a correctness bug in a new feature of Spark 2.3: the
>>>>>>> stream-stream
>>>>>>> > join. Users can hit this bug if one of the join side is
>>>>>>> partitioned by a
>>>>>>> > subset of the join keys.
>>>>>>> >
>>>>>>> > SPARK-24552: Task attempt numbers are reused when stages are
>>>>>>> retried
>>>>>>> > This is a long-standing bug in the output committer that may
>>>>>>> introduce data
>>>>>>> > corruption.
>>>>>>> >
>>>>>>> > SPARK-24542: UDFXPath allow users to pass carefully crafted
>>>>>>> XML to
>>>>>>> > access arbitrary files
>>>>>>> > This is a potential security issue if users build access control
>>>>>>> module upon
>>>>>>> > Spark.
>>>>>>> >
>>>>>>> > I think we need a Spark 2.3.2 to address these issues(especially
>>>>>>> the
>>>>>>> > correctness bugs) ASAP. Any thoughts?
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Wenchen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Marcelo
>>>>>>>
>>>>>>> 
>>>>>>> -
>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>
>>>>>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>
>
> --
> Stavros Kontopoulos
>
> *Senior Software Engineer *
> *Lightbend, Inc. *
>
*p:  +30 6977967274*
> *e: stavros.kontopou...@lightbend.com* 
>
>
>


-- 
Marcelo


Re: Time for 2.3.2?

2018-06-27 Thread Marcelo Vanzin
+1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
for those out.

(Those are what delayed 2.2.2 and 2.1.3 for those watching...)

On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan  wrote:
> Hi all,
>
> Spark 2.3.1 was released just a while ago, but unfortunately we discovered
> and fixed some critical issues afterward.
>
> SPARK-24495: SortMergeJoin may produce wrong result.
> This is a serious correctness bug, and is easy to hit: have duplicated join
> key from the left table, e.g. `WHERE t1.a = t2.b AND t1.a = t2.c`, and the
> join is a sort merge join. This bug is only present in Spark 2.3.
>
> SPARK-24588: stream-stream join may produce wrong result
> This is a correctness bug in a new feature of Spark 2.3: the stream-stream
> join. Users can hit this bug if one of the join side is partitioned by a
> subset of the join keys.
>
> SPARK-24552: Task attempt numbers are reused when stages are retried
> This is a long-standing bug in the output committer that may introduce data
> corruption.
>
> SPARK-24542: UDFXPath allows users to pass carefully crafted XML to
> access arbitrary files
> This is a potential security issue if users build an access control module on
> top of Spark.
>
> I think we need a Spark 2.3.2 to address these issues(especially the
> correctness bugs) ASAP. Any thoughts?
>
> Thanks,
> Wenchen
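
For reference, the SPARK-24495 trigger shape described above looks roughly like the following. Table and column names are made up (t1 and t2 are assumed to be registered views); this only illustrates the pattern of one left-side column used in two equi-join conditions with a sort merge join.

    // Disable broadcast joins so the planner picks a sort merge join.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Duplicated join key from the left table, as described for SPARK-24495.
    val result = spark.sql(
      """SELECT *
        |FROM t1 JOIN t2
        |  ON t1.a = t2.b AND t1.a = t2.c""".stripMargin)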



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
On Wed, Jun 27, 2018 at 6:57 PM, Felix Cheung  wrote:
> Yes, this is broken with newer version of R.
>
> We check explicitly for warning for the R check which should fail the test
> run.

Hmm, something is missing somewhere then, because Jenkins seems mostly
happy aside from a few flakes:
https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/

(Look for the 2.1 branch jobs.)


> ____
> From: Marcelo Vanzin 
> Sent: Wednesday, June 27, 2018 6:55 PM
> To: Felix Cheung
> Cc: Marcelo Vanzin; Tom Graves; dev
>
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> Not sure I understand that bug. Is it a compatibility issue with new
> versions of R?
>
> It's at least marked as fixed in 2.2(.1).
>
> We do run jenkins on these branches, but that seems like just a
> warning, which would not fail those builds...
>
> On Wed, Jun 27, 2018 at 6:12 PM, Felix Cheung 
> wrote:
>> (I don’t want to block the release(s) per se...)
>>
>> We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)
>>
>> This is fixed in 2.3 back in Nov 2017
>>
>> https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6
>>
>> Perhaps we don't get Jenkins run on these branches? It should have been
>> detected.
>>
>> * checking for code/documentation mismatches ... WARNING
>> Codoc mismatches from documentation object 'attach':
>> attach
>> Code: function(what, pos = 2L, name = deparse(substitute(what),
>> backtick = FALSE), warn.conflicts = TRUE)
>> Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>> warn.conflicts = TRUE)
>> Mismatches in argument default values:
>> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
>> deparse(substitute(what))
>>
>> Codoc mismatches from documentation object 'glm':
>> glm
>> Code: function(formula, family = gaussian, data, weights, subset,
>> na.action, start = NULL, etastart, mustart, offset,
>> control = list(...), model = TRUE, method = "glm.fit",
>> x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>> NULL, ...)
>> Docs: function(formula, family = gaussian, data, weights, subset,
>> na.action, start = NULL, etastart, mustart, offset,
>> control = list(...), model = TRUE, method = "glm.fit",
>> x = FALSE, y = TRUE, contrasts = NULL, ...)
>> Argument names in code not in docs:
>> singular.ok
>> Mismatches in argument names:
>> Position: 16 Code: singular.ok Docs: contrasts
>> Position: 17 Code: contrasts Docs: ...
>>
>> 
>> From: Sean Owen 
>> Sent: Wednesday, June 27, 2018 5:02:37 AM
>> To: Marcelo Vanzin
>> Cc: dev
>> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>>
>> +1 from me too for the usual reasons.
>>
>> On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin
>> 
>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.1.3.
>>>
>>> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.1.3
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
>>> https://github.com/apache/spark/tree/v2.1.3-rc2
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1275/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>>>
>>> The list of bug fixes going into 2.1.3 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>>>
>>> Notes:
>>>
>>> - RC1 was not sent for a vote. I had trouble building it, and by the time
>>> I got
>>> things fixed, there was a blocker bug filed. It was already tagged in
>>> git
>>> at that time.
>>>
>>> - If testing the s

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
Not sure I understand that bug. Is it a compatibility issue with new
versions of R?

It's at least marked as fixed in 2.2(.1).

We do run jenkins on these branches, but that seems like just a
warning, which would not fail those builds...

On Wed, Jun 27, 2018 at 6:12 PM, Felix Cheung  wrote:
> (I don’t want to block the release(s) per se...)
>
> We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)
>
> This is fixed in 2.3 back in Nov 2017
> https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6
>
> Perhaps we don't get Jenkins run on these branches? It should have been
> detected.
>
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
> Code: function(what, pos = 2L, name = deparse(substitute(what),
> backtick = FALSE), warn.conflicts = TRUE)
> Docs: function(what, pos = 2L, name = deparse(substitute(what)),
> warn.conflicts = TRUE)
> Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
> deparse(substitute(what))
>
> Codoc mismatches from documentation object 'glm':
> glm
> Code: function(formula, family = gaussian, data, weights, subset,
> na.action, start = NULL, etastart, mustart, offset,
> control = list(...), model = TRUE, method = "glm.fit",
> x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
> NULL, ...)
> Docs: function(formula, family = gaussian, data, weights, subset,
> na.action, start = NULL, etastart, mustart, offset,
> control = list(...), model = TRUE, method = "glm.fit",
> x = FALSE, y = TRUE, contrasts = NULL, ...)
> Argument names in code not in docs:
> singular.ok
> Mismatches in argument names:
> Position: 16 Code: singular.ok Docs: contrasts
> Position: 17 Code: contrasts Docs: ...
>
> 
> From: Sean Owen 
> Sent: Wednesday, June 27, 2018 5:02:37 AM
> To: Marcelo Vanzin
> Cc: dev
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> +1 from me too for the usual reasons.
>
> On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin 
> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.3.
>>
>> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.3
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
>> https://github.com/apache/spark/tree/v2.1.3-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1275/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>>
>> The list of bug fixes going into 2.1.3 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>>
>> Notes:
>>
>> - RC1 was not sent for a vote. I had trouble building it, and by the time
>> I got
>>   things fixed, there was a blocker bug filed. It was already tagged in
>> git
>>   at that time.
>>
>> - If testing the source package, I recommend using Java 8, even though 2.1
>>   supports Java 7 (and the RC was built with JDK 7). This is because Maven
>>   Central has updated some configuration that makes the default Java 7 SSL
>>   config not work.
>>
>> - There are Maven artifacts published for Scala 2.10, but binary
>> releases are only
>>   available for Scala 2.11. This matches the previous release (2.1.2),
>> but if there's
>>   a need / desire to have pre-built distributions for Scala 2.10, I can
>> probably
>>   amend the RC without having to create a new one.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env a

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Marcelo Vanzin
+1

Checked sigs + ran a bunch of tests on the hadoop-2.7 binary package.

On Wed, Jun 27, 2018 at 1:30 PM, Tom Graves
 wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.2.2.
>
> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.2.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.2-rc2 (commit
> fc28ba3db7185e84b6dbd02ad8ef8f1d06b9e3c6):
> https://github.com/apache/spark/tree/v2.2.2-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1276/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-docs/
>
> The list of bug fixes going into 2.2.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342171
>
>
> Notes:
>
> - RC1 was not sent for a vote. I had trouble building it, and by the time I
> got
>   things fixed, there was a blocker bug filed. It was already tagged in git
>   at that time.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.2.2?
> ===
>
> The current list of open tickets targeted at 2.2.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.2.2
>
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Tom Graves



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] Spark 2.1.3 (RC2)

2018-06-26 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.1.3.

The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.1.3
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
https://github.com/apache/spark/tree/v2.1.3-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1275/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/

The list of bug fixes going into 2.1.3 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12341660

Notes:

- RC1 was not sent for a vote. I had trouble building it, and by the time I got
  things fixed, there was a blocker bug filed. It was already tagged in git
  at that time.

- If testing the source package, I recommend using Java 8, even though 2.1
  supports Java 7 (and the RC was built with JDK 7). This is because Maven
  Central has updated some configuration that makes the default Java 7 SSL
  config not work.

- There are Maven artifacts published for Scala 2.10, but binary
releases are only
  available for Scala 2.11. This matches the previous release (2.1.2),
but if there's
  a need / desire to have pre-built distributions for Scala 2.10, I can probably
  amend the RC without having to create a new one.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
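
For example, a Java/Scala project can be pointed at this RC's staging repository (listed above) with an sbt configuration along these lines — a sketch, assuming an sbt build and the spark-sql artifact:

    // build.sbt
    resolvers += "Apache Spark 2.1.3 RC2 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1275/"

    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.3"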

===
What should happen to JIRA tickets still targeting 2.1.3?
===

The current list of open tickets targeted at 2.1.3 can be found at:
https://s.apache.org/spark-2.1.3

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-26 Thread Marcelo Vanzin
Starting with my own +1.

On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.1.3.
>
> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.1.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
> https://github.com/apache/spark/tree/v2.1.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1275/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>
> The list of bug fixes going into 2.1.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>
> Notes:
>
> - RC1 was not sent for a vote. I had trouble building it, and by the time I 
> got
>   things fixed, there was a blocker bug filed. It was already tagged in git
>   at that time.
>
> - If testing the source package, I recommend using Java 8, even though 2.1
>   supports Java 7 (and the RC was built with JDK 7). This is because Maven
>   Central has updated some configuration that makes the default Java 7 SSL
>   config not work.
>
> - There are Maven artifacts published for Scala 2.10, but binary
> releases are only
>   available for Scala 2.11. This matches the previous release (2.1.2),
> but if there's
>   a need / desire to have pre-built distributions for Scala 2.10, I can 
> probably
>   amend the RC without having to create a new one.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.1.3?
> ===
>
> The current list of open tickets targeted at 2.1.3 can be found at:
> https://s.apache.org/spark-2.1.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Time for 2.1.3

2018-06-19 Thread Marcelo Vanzin
Quick update for everybody: I was trying to deal with the release
scripts to get them to work with 2.1; there were some fixes needed,
and on top of that Maven Central changed something over the weekend
which made Java 7 unhappy.
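
(For anyone hitting the same thing locally: my understanding -- not verified
against this exact build -- is that Maven Central now requires TLS 1.2, which
JDK 7 does not enable by default. Forcing it along these lines may help:

    export MAVEN_OPTS="-Dhttps.protocols=TLSv1.2 $MAVEN_OPTS"

Building with JDK 8 side-steps the issue entirely.)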

I actually was able to create an RC1 after many tries and tweaking,
but there's also currently at least one blocker that I think we should
pick up. So right now I'm waiting for that.

So heads up that RC2 will be the first one put up for a vote (I'll
explain that in the vote e-mail so people who missed this one don't
scratch their heads for too long).


On Tue, Jun 12, 2018 at 4:27 PM, Marcelo Vanzin  wrote:
> Hey all,
>
> There are some fixes that went into 2.1.3 recently that probably
> deserve a release. So as usual, please take a look if there's anything
> else you'd like on that release, otherwise I'd like to start with the
> process by early next week.
>
> I'll go through jira to see what's the status of things targeted at
> that release, but last I checked there wasn't anything on the radar.
>
> Thanks!
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Marcelo Vanzin
Hi Jacek,

I seriously have no idea... I don't even know who owns that account (I
hope they have some connection with the PMC?).

But it seems whoever owns it already sent something.

On Thu, Jun 14, 2018 at 12:31 AM, Jacek Laskowski  wrote:
> Hi Marcelo,
>
> How to announce it on twitter @ https://twitter.com/apachespark? How to make
> it part of the release process?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
> On Mon, Jun 11, 2018 at 9:47 PM, Marcelo Vanzin  wrote:
>>
>> We are happy to announce the availability of Spark 2.3.1!
>>
>> Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3
>> maintenance branch of Spark. We strongly recommend all 2.3.x users to
>> upgrade to this stable release.
>>
>> To download Spark 2.3.1, head over to the download page:
>> http://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-2-3-1.html
>>
>> We would like to acknowledge all community members for contributing to
>> this release. This release would not have been possible without you.
>>
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Marcelo Vanzin
Yes, my bad. The code in session.py needs to also catch TypeError like before.
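
Roughly what I mean, as a sketch rather than the actual patch (the names
follow the traceback quoted below; treat the details as illustrative):

    from py4j.protocol import Py4JError
    from pyspark import SparkContext
    from pyspark.sql import SparkSession

    def _create_shell_session():
        try:
            # Probe for Hive support; this is the call that currently blows up
            # with TypeError when Spark was built without Hive (e.g. a plain
            # "build/sbt package" instead of building with -Phive).
            SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
            return SparkSession.builder.enableHiveSupport().getOrCreate()
        except (Py4JError, TypeError):
            # Fall back to a non-Hive session instead of failing at startup,
            # which is the old behavior people are asking to restore.
            return SparkSession.builder.getOrCreate()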

On Thu, Jun 14, 2018 at 11:03 AM, Li Jin  wrote:
> Sounds good. Thanks all for the quick reply.
>
> https://issues.apache.org/jira/browse/SPARK-24563
>
>
> On Thu, Jun 14, 2018 at 12:19 PM, Xiao Li  wrote:
>>
>> Thanks for catching this. Please feel free to submit a PR. I do not think
>> Vanzin wants to introduce the behavior changes in that PR. We should do the
>> code review more carefully.
>>
>> Xiao
>>
>> 2018-06-14 9:18 GMT-07:00 Li Jin :
>>>
>>> Are there objection to restore the behavior for PySpark users? I am happy
>>> to submit a patch.
>>>
>>> On Thu, Jun 14, 2018 at 12:15 PM Reynold Xin  wrote:

 The behavior change is not good...

 On Thu, Jun 14, 2018 at 9:05 AM Li Jin  wrote:
>
> Ah, looks like it's this change:
>
> https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5
>
> It seems strange that by default Spark doesn't build with Hive but by
> default PySpark requires it...
>
> This might also be a behavior change to PySpark users that build Spark
> without Hive. The old behavior is "fall back to non-hive support" and the
> new behavior is "program won't start".
>
> On Thu, Jun 14, 2018 at 11:51 AM, Sean Owen  wrote:
>>
>> I think you would have to build with the 'hive' profile? but if so
>> that would have been true for a while now.
>>
>>
>> On Thu, Jun 14, 2018 at 10:38 AM Li Jin  wrote:
>>>
>>> Hey all,
>>>
>>> I just did a clean checkout of github.com/apache/spark but failed to
>>> start PySpark, this is what I did:
>>>
>>> git clone g...@github.com:apache/spark.git; cd spark; build/sbt
>>> package; bin/pyspark
>>>
>>>
>>> And got this exception:
>>>
>>> (spark-dev) Lis-MacBook-Pro:spark icexelloss$ bin/pyspark
>>>
>>> Python 3.6.3 |Anaconda, Inc.| (default, Nov  8 2017, 18:10:31)
>>>
>>> [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
>>>
>>> Type "help", "copyright", "credits" or "license" for more
>>> information.
>>>
>>> 18/06/14 11:34:14 WARN NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>>
>>> Using Spark's default log4j profile:
>>> org/apache/spark/log4j-defaults.properties
>>>
>>> Setting default log level to "WARN".
>>>
>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
>>> setLogLevel(newLevel).
>>>
>>>
>>> /Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py:45:
>>> UserWarning: Failed to initialize Spark session.
>>>
>>>   warnings.warn("Failed to initialize Spark session.")
>>>
>>> Traceback (most recent call last):
>>>
>>>   File
>>> "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py", 
>>> line
>>> 41, in 
>>>
>>> spark = SparkSession._create_shell_session()
>>>
>>>   File
>>> "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/sql/session.py",
>>> line 564, in _create_shell_session
>>>
>>> SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
>>>
>>> TypeError: 'JavaPackage' object is not callable
>>>
>>>
>>> I also tried to delete hadoop deps from my ivy2 cache and reinstall
>>> them but no luck. I wonder:
>>>
>>> I have not seen this before, could this be caused by recent change to
>>> head?
>>> Am I doing something wrong in the build process?
>>>
>>>
>>> Thanks much!
>>> Li
>>>
>
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Time for 2.1.3

2018-06-12 Thread Marcelo Vanzin
Hey all,

There are some fixes that went into 2.1.3 recently that probably
deserve a release. So as usual, please take a look if there's anything
else you'd like on that release, otherwise I'd like to start with the
process by early next week.

I'll go through jira to see what's the status of things targeted at
that release, but last I checked there wasn't anything on the radar.

Thanks!

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-11 Thread Marcelo Vanzin
We are happy to announce the availability of Spark 2.3.1!

Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3
maintenance branch of Spark. We strongly recommend all 2.3.x users to
upgrade to this stable release.

To download Spark 2.3.1, head over to the download page:
http://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-2-3-1.html

We would like to acknowledge all community members for contributing to
this release. This release would not have been possible without you.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] [RESULT] Spark 2.3.1 (RC4)

2018-06-08 Thread Marcelo Vanzin
The vote passes. Thanks to all who helped with the release!

I'll follow up later with a release announcement once everything is published.

+1 (* = binding):
- Marcelo Vanzin *
- Reynold Xin *
- Sean Owen *
- Denny Lee
- Dongjoon Hyun
- Ricardo Almeida
- Hyukjin Kwon
- John Zhuge
- Mark Hamstra *
- Joseph Bradley *
- Bryan Cutler
- Henry Robinson
- Xiao Li *

+0: None

-1: None


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Time for 2.2.2 release

2018-06-07 Thread Marcelo Vanzin
Took a look at our branch, and most of the stuff that is not already in
2.2 is flaky test fixes, so +1.

On Wed, Jun 6, 2018 at 7:54 AM, Tom Graves  wrote:
> Hello all,
>
> I think its time for another 2.2 release.
> I took a look at Jira and I don't see anything explicitly targeted for 2.2.2
> that is not yet complete.
>
> So I'd like to propose to release 2.2.2 soon. If there are important
> fixes that should go into the release, please let those be known (by
> replying here or updating the bug in Jira), otherwise I'm volunteering
> to prepare the first RC soon-ish (by early next week since Spark Summit is
> this week).
>
> Thanks!
> Tom Graves
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Marcelo Vanzin
If you're building your own Spark, definitely try the hadoop-cloud
profile. Then you don't even need to pull anything at runtime,
everything is already packaged with Spark.
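
For example, something along these lines when building the distribution
(illustrative only; adjust the profiles to your Hadoop version and needs):

    ./dev/make-distribution.sh --name hadoop2.7-cloud --tgz \
      -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Pyarn

That should put hadoop-aws and a matching AWS SDK into the distribution's
jars directory, so s3a:// works without any --packages flag.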

On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
 wrote:
> pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn't work for me
> either (even building with -Phadoop-2.7). I guess I've been relying on an
> unsupported pattern and will need to figure something else out going forward
> in order to use s3a://.
>
>
> On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin  wrote:
>>
>> I have personally never tried to include hadoop-aws that way. But at
>> the very least, I'd try to use the same version of Hadoop as the Spark
>> build (2.7.3 IIRC). I don't really expect a different version to work,
>> and if it did in the past it definitely was not by design.
>>
>> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>>  wrote:
>> > Building with -Phadoop-2.7 didn't help, and if I remember correctly,
>> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>> > so
>> > it appears something has changed since then.
>> >
>> > I wasn't familiar with -Phadoop-cloud, but I can try that.
>> >
>> > My goal here is simply to confirm that this release of Spark works with
>> > hadoop-aws like past releases did, particularly for Flintrock users who
>> > use
>> > Spark with S3A.
>> >
>> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>> > with
>> > every Spark release. If the -hadoop2.7 release build won't work with
>> > hadoop-aws anymore, are there plans to provide a new build type that
>> > will?
>> >
>> > Apologies if the question is poorly formed. I'm batting a bit outside my
>> > league here. Again, my goal is simply to confirm that I/my users still
>> > have
>> > a way to use s3a://. In the past, that way was simply to call pyspark
>> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
>> > If
>> > that will no longer work, I'm trying to confirm that the change of
>> > behavior
>> > is intentional or acceptable (as a review for the Spark project) and
>> > figure
>> > out what I need to change (as due diligence for Flintrock's users).
>> >
>> > Nick
>> >
>> >
>> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin 
>> > wrote:
>> >>
>> >> Using the hadoop-aws package is probably going to be a little more
>> >> complicated than that. The best bet is to use a custom build of Spark
>> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> looking at some nasty dependency issues, especially if you end up
>> >> mixing different versions of Hadoop.
>> >>
>> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >>  wrote:
>> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>> >> > using
>> >> > Flintrock. However, trying to load the hadoop-aws package gave me
>> >> > some
>> >> > errors.
>> >> >
>> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >
>> >> > 
>> >> >
>> >> > :: problems summary ::
>> >> >  WARNINGS
>> >> > [NOT FOUND  ]
>> >> > com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >  local-m2-cache: tried
>> >> >
>> >> >
>> >> >
>> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> > [NOT FOUND  ]
>> >> > com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >  local-m2-cache: tried
>> >> >
>> >> >
>> >> >
>> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> > [NOT FOUND  ]
>> >> > org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >  local-m2-cache: tried
>> >> >
>> >> >
>> >> >
>> >> > file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> > [NOT FOUND  ]
>> >> > com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> 

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
I have personally never tried to include hadoop-aws that way. But at
the very least, I'd try to use the same version of Hadoop as the Spark
build (2.7.3 IIRC). I don't really expect a different version to work,
and if it did in the past it definitely was not by design.

On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
 wrote:
> Building with -Phadoop-2.7 didn't help, and if I remember correctly,
> building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release, so
> it appears something has changed since then.
>
> I wasn't familiar with -Phadoop-cloud, but I can try that.
>
> My goal here is simply to confirm that this release of Spark works with
> hadoop-aws like past releases did, particularly for Flintrock users who use
> Spark with S3A.
>
> We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds with
> every Spark release. If the -hadoop2.7 release build won't work with
> hadoop-aws anymore, are there plans to provide a new build type that will?
>
> Apologies if the question is poorly formed. I'm batting a bit outside my
> league here. Again, my goal is simply to confirm that I/my users still have
> a way to use s3a://. In the past, that way was simply to call pyspark
> --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar. If
> that will no longer work, I'm trying to confirm that the change of behavior
> is intentional or acceptable (as a review for the Spark project) and figure
> out what I need to change (as due diligence for Flintrock's users).
>
> Nick
>
>
> On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin  wrote:
>>
>> Using the hadoop-aws package is probably going to be a little more
>> complicated than that. The best bet is to use a custom build of Spark
>> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> looking at some nasty dependency issues, especially if you end up
>> mixing different versions of Hadoop.
>>
>> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>>  wrote:
>> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>> > using
>> > Flintrock. However, trying to load the hadoop-aws package gave me some
>> > errors.
>> >
>> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >
>> > 
>> >
>> > :: problems summary ::
>> >  WARNINGS
>> > [NOT FOUND  ]
>> > com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >  local-m2-cache: tried
>> >
>> >
>> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> > [NOT FOUND  ]
>> > com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >  local-m2-cache: tried
>> >
>> >
>> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> > [NOT FOUND  ]
>> > org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >  local-m2-cache: tried
>> >
>> >
>> > file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> > [NOT FOUND  ]
>> > com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >  local-m2-cache: tried
>> >
>> >
>> > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >
>> > I'd guess I'm probably using the wrong version of hadoop-aws, but I
>> > called
>> > make-distribution.sh with -Phadoop-2.8 so I'm not sure what else to try.
>> >
>> > Any quick pointers?
>> >
>> > Nick
>> >
>> >
>> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin 
>> > wrote:
>> >>
>> >> Starting with my own +1 (binding).
>> >>
>> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin 
>> >> wrote:
>> >> > Please vote on releasing the following candidate as Apache Spark
>> >> > version
>> >> > 2.3.1.
>> >> >
>> >> > Given that I expect at least a few people to be busy with Spark
>> >> > Summit
>> >> > next
>> >> > week, I'm taking the liberty of setting an extended voting period.
>> >> > The
>> >> > vote
>> >> > will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).
>> >> >
>> >> > It passes with a majority of +1 votes, which must include at least 3
>> >> > +1
&

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Using the hadoop-aws package is probably going to be a little more
complicated than that. The best bet is to use a custom build of Spark
that includes it (use -Phadoop-cloud). Otherwise you're probably
looking at some nasty dependency issues, especially if you end up
mixing different versions of Hadoop.

On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
 wrote:
> I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4 using
> Flintrock. However, trying to load the hadoop-aws package gave me some
> errors.
>
> $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>
> 
>
> :: problems summary ::
>  WARNINGS
> [NOT FOUND  ]
> com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>  local-m2-cache: tried
>
> file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
> [NOT FOUND  ]
> com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>  local-m2-cache: tried
>
> file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
> [NOT FOUND  ]
> org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>  local-m2-cache: tried
>
> file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
> [NOT FOUND  ]
> com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>  local-m2-cache: tried
>
> file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>
> I'd guess I'm probably using the wrong version of hadoop-aws, but I called
> make-distribution.sh with -Phadoop-2.8 so I'm not sure what else to try.
>
> Any quick pointers?
>
> Nick
>
>
> On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin  wrote:
>>
>> Starting with my own +1 (binding).
>>
>> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin 
>> wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.3.1.
>> >
>> > Given that I expect at least a few people to be busy with Spark Summit
>> > next
>> > week, I'm taking the liberty of setting an extended voting period. The
>> > vote
>> > will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).
>> >
>> > It passes with a majority of +1 votes, which must include at least 3 +1
>> > votes
>> > from the PMC.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.3.1
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
>> > https://github.com/apache/spark/tree/v2.3.1-rc4
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1272/
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>> >
>> > The list of bug fixes going into 2.3.1 can be found at the following
>> > URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12342432
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your projects resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with a out of date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.3.1?
>> > ===
>> >
>> > The current list of open tickets targeted at 2.3.1 can be found at:
>> > 

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Starting with my own +1 (binding).

On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.1.
>
> Given that I expect at least a few people to be busy with Spark Summit next
> week, I'm taking the liberty of setting an extended voting period. The vote
> will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).
>
> It passes with a majority of +1 votes, which must include at least 3 +1 votes
> from the PMC.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
> https://github.com/apache/spark/tree/v2.3.1-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1272/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.1?
> ===
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1.

Given that I expect at least a few people to be busy with Spark Summit next
week, I'm taking the liberty of setting an extended voting period. The vote
will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).

It passes with a majority of +1 votes, which must include at least 3 +1 votes
from the PMC.

[ ] +1 Release this package as Apache Spark 2.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
https://github.com/apache/spark/tree/v2.3.1-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1272/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/

The list of bug fixes going into 2.3.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342432

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
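
For example (a rough sketch; the pyspark tarball name is a placeholder --
use the actual artifact from the RC "bin" directory linked above):

    # PySpark: install the RC into a throwaway virtual env
    python -m venv /tmp/spark-2.3.1-rc4 && source /tmp/spark-2.3.1-rc4/bin/activate
    pip install /path/to/pyspark-2.3.1.tar.gz

    # Java/Scala (sbt): add the staging repository above to your resolvers
    resolvers += "Spark 2.3.1 RC4 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1272/"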

===
What should happen to JIRA tickets still targeting 2.3.1?
===

The current list of open tickets targeted at 2.3.1 can be found at:
https://s.apache.org/Q3Uo

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Marcelo Vanzin
Xiao,

This is the third time in this release cycle that this is happening.
Sorry to single you guys out, but can you please do three things:

- do not merge things in 2.3 you're not absolutely sure about
- make sure that things you backport to 2.3 are not causing problems
- let the RM know about these things as soon as you discover them, not
when they send the next RC for voting.

Even though I was in the middle of preparing the rc, I could have
easily aborted that and skipped this whole thread.

This vote is canceled. I'll prepare a new RC right away. I hope this
does not happen again.


On Fri, Jun 1, 2018 at 1:20 PM, Xiao Li  wrote:
> Sorry, I need to say -1
>
> This morning, just found a regression in 2.3.1 and reverted
> https://github.com/apache/spark/pull/21443
>
> Xiao
>
> 2018-06-01 13:09 GMT-07:00 Marcelo Vanzin :
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.3.1.
>>
>> Given that I expect at least a few people to be busy with Spark Summit
>> next
>> week, I'm taking the liberty of setting an extended voting period. The
>> vote
>> will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).
>>
>> It passes with a majority of +1 votes, which must include at least 3 +1
>> votes
>> from the PMC.
>>
>> [ ] +1 Release this package as Apache Spark 2.3.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.3.1-rc3 (commit 1cc5f68b):
>> https://github.com/apache/spark/tree/v2.3.1-rc3
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1271/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc3-docs/
>>
>> The list of bug fixes going into 2.3.1 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.1?
>> ===
>>
>> The current list of open tickets targeted at 2.3.1 can be found at:
>> https://s.apache.org/Q3Uo
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1.

Given that I expect at least a few people to be busy with Spark Summit next
week, I'm taking the liberty of setting an extended voting period. The vote
will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).

It passes with a majority of +1 votes, which must include at least 3 +1 votes
from the PMC.

[ ] +1 Release this package as Apache Spark 2.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.1-rc3 (commit 1cc5f68b):
https://github.com/apache/spark/tree/v2.3.1-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1271/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc3-docs/

The list of bug fixes going into 2.3.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342432

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 2.3.1?
===

The current list of open tickets targeted at 2.3.1 can be found at:
https://s.apache.org/Q3Uo

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-25 Thread Marcelo Vanzin
This vote fails.

There is currently one open blocker, so I'll be waiting to prepare the
next RC. That will probably happen after the (US) long weekend.

Committers: *please* triage bugs you're looking at and mark them as
target 2.3.1 / blocker if you think they must be in 2.3.1. Otherwise
we end up creating throwaway RCs that are just overhead.


On Tue, May 22, 2018 at 12:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.1.
>
> The vote is open until Friday, May 25, at 20:00 UTC and passes if
> at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc2 (commit 93258d80):
> https://github.com/apache/spark/tree/v2.3.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1270/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.1?
> ===
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Marcelo Vanzin
Sure. Also, I'd appreciate it if these bugs were properly triaged and
targeted, so that we could avoid creating RCs when we know there are
blocking bugs that will prevent the RC vote from succeeding.

On Wed, May 23, 2018 at 9:02 AM, Xiao Li <gatorsm...@gmail.com> wrote:
> -1
>
> Yeah, we should fix it in Spark 2.3.1.
> https://issues.apache.org/jira/browse/SPARK-24257 is a correctness bug. The
> PR can be merged soon. Thus, let us have another RC?
>
> Thanks,
>
> Xiao
>
>
>> 2018-05-23 8:04 GMT-07:00 chenliang613 <chenliang6...@gmail.com>:
>>
>> Hi
>>
>> Agree with Wenchen, it is better to fix this issue.
>>
>> Regards
>> Liang
>>
>>
>> cloud0fan wrote
>> > We found a critical bug in tungsten that can lead to silent data
>> > corruption: https://github.com/apache/spark/pull/21311
>> >
>> > This is a long-standing bug that starts with Spark 2.0(not a
>> > regression),
>> > but since we are going to release 2.3.1, I think it's a good chance to
>> > include this fix.
>> >
>> > We will also backport this fix to Spark 2.0, 2.1, 2.2, and then we can
>> > discuss if we should do a new release for 2.0, 2.1, 2.2 later.
>> >
>> > Thanks,
>> > Wenchen
>> >
>> > On Wed, May 23, 2018 at 9:54 PM, Sean Owen <srowen@...> wrote:
>> >
>> >> +1 Same result for me as with RC1.
>> >>
>> >>
>> >> On Tue, May 22, 2018 at 2:45 PM Marcelo Vanzin <vanzin@...> wrote:
>> >>
>> >>> Please vote on releasing the following candidate as Apache Spark
>> >>> version
>> >>> 2.3.1.
>> >>>
>> >>> The vote is open until Friday, May 25, at 20:00 UTC and passes if
>> >>> at least 3 +1 PMC votes are cast.
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 2.3.1
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>> To learn more about Apache Spark, please see http://spark.apache.org/
>> >>>
>> >>> The tag to be voted on is v2.3.1-rc2 (commit 93258d80):
>> >>> https://github.com/apache/spark/tree/v2.3.1-rc2
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found
>> >>> at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-bin/
>> >>>
>> >>> Signatures used for Spark RCs can be found in this file:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>>
>> >>> https://repository.apache.org/content/repositories/orgapachespark-1270/
>> >>>
>> >>> The documentation corresponding to this release can be found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-docs/
>> >>>
>> >>> The list of bug fixes going into 2.3.1 can be found at the following
>> >>> URL:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>> >>>
>> >>> FAQ
>> >>>
>> >>> =
>> >>> How can I help test this release?
>> >>> =
>> >>>
>> >>> If you are a Spark user, you can help us test this release by taking
>> >>> an existing Spark workload and running on this release candidate, then
>> >>> reporting any regressions.
>> >>>
>> >>> If you're working in PySpark you can set up a virtual env and install
>> >>> the current RC and see if anything important breaks, in the Java/Scala
>> >>> you can add the staging repository to your projects resolvers and test
>> >>> with the RC (make sure to clean up the artifact cache before/after so
>> >>> you don't end up building with a out of date RC going forward).
>> >>>
>> >>> ===
>> >>> What should happen to JIRA tickets still targeting 2.3.1?
>> >>> ===
>> >>>
>> >>> The current list of open tickets targeted at 2.3.1 can be found at:
>> >>> https://s.apache.org/Q3Uo
>> >>>
>> >>> Committers should look at those and triage. Extremely important bug
>> >>> fixes, documentation, and API tweaks that impact compatibility should
>> >>> be worked on immediately. Everything else please retarget to an
>> >>> appropriate release.
>> >>>
>> >>> ==
>> >>> But my bug isn't fixed?
>> >>> ==
>> >>>
>> >>> In order to make timely releases, we will typically not hold the
>> >>> release unless the bug in question is a regression from the previous
>> >>> release. That being said, if there is something which is a regression
>> >>> that has not been correctly targeted please ping me or a committer to
>> >>> help target the issue.
>> >>>
>> >>>
>> >>> --
>> >>> Marcelo
>> >>>
>> >>> -
>> >>> To unsubscribe e-mail:
>>
>> > dev-unsubscribe@.apache
>>
>> >>>
>> >>>
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-22 Thread Marcelo Vanzin
Starting with my own +1. Did the same testing as RC1.

On Tue, May 22, 2018 at 12:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.1.
>
> The vote is open until Friday, May 25, at 20:00 UTC and passes if
> at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc2 (commit 93258d80):
> https://github.com/apache/spark/tree/v2.3.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1270/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.1?
> ===
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] Spark 2.3.1 (RC2)

2018-05-22 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1.

The vote is open until Friday, May 25, at 20:00 UTC and passes if
at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.1-rc2 (commit 93258d80):
https://github.com/apache/spark/tree/v2.3.1-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1270/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-docs/

The list of bug fixes going into 2.3.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342432

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 2.3.1?
===

The current list of open tickets targeted at 2.3.1 can be found at:
https://s.apache.org/Q3Uo

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-21 Thread Marcelo Vanzin
FYI the fix for the blocker has just been committed. I'll prepare RC2
tomorrow morning assuming jenkins is reasonably happy with the current
state of the branch.

On Fri, May 18, 2018 at 10:39 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Just to give folks an update.
>
> In case you haven't followed the thread, this vote failed.
>
> I'll cut a new RC once the current blocker is addressed.
>
>
> On Thu, May 17, 2018 at 2:59 PM, Imran Rashid <iras...@cloudera.com> wrote:
>> I just found https://issues.apache.org/jira/browse/SPARK-24309 which is
>> pretty serious. I've marked it a blocker, I think it should go into 2.3.1.
>> I'll also take a closer look comparing to the behavior of the old listener
>> bus.
>>
>> On Thu, May 17, 2018 at 12:18 PM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>>>
>>> Wenchen reviewed and pushed that change, so he's the most qualified to
>>> make that decision.
>>>
>>> I plan to cut a new RC tomorrow so hopefully he'll see this by then.
>>>
>>> On Thu, May 17, 2018 at 10:13 AM, Artem Rudoy <artem.ru...@gmail.com>
>>> wrote:
>>> > Can we include https://issues.apache.org/jira/browse/SPARK-22371 as well
>>> > please?
>>> >
>>> > Artem
>>> >
>>> >
>>> >
>>> > --
>>> > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>
>
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Running lint-java during PR builds?

2018-05-21 Thread Marcelo Vanzin
Is there a way to trigger it conditionally? e.g. only if the diff
touches java files.
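
Something like this in the PR build script would probably do it (untested,
just to illustrate the idea):

    # Only run the Java style checks when the PR actually touches Java files.
    if git diff --name-only master...HEAD | grep -q '\.java$'; then
      ./dev/lint-java
    fi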

On Mon, May 21, 2018 at 9:17 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> One concern is with the volume of test runs on Travis.
>
> In ASF projects Travis could get significantly
> backed up since - if I recall - all of ASF shares one queue.
>
> At the number of PRs Spark has this could be a big issue.
>
>
> ________
> From: Marcelo Vanzin <van...@cloudera.com>
> Sent: Monday, May 21, 2018 9:08:28 AM
> To: Hyukjin Kwon
> Cc: Dongjoon Hyun; dev
> Subject: Re: Running lint-java during PR builds?
>
> I'm fine with it. I tried to use the existing checkstyle sbt plugin
> (trying to fix SPARK-22269), but it depends on an ancient version of
> checkstyle, and I don't know sbt enough to figure out how to hack
> classpaths and class loaders when applying rules, so gave up.
>
> On Mon, May 21, 2018 at 1:47 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>> I am going to open an INFRA JIRA if there's no explicit objection in few
>> days.
>>
>> 2018-05-21 13:09 GMT+08:00 Hyukjin Kwon <gurwls...@gmail.com>:
>>>
>>> I would like to revive this proposal. Travis CI. Shall we give this try?
>>> I
>>> think it's worth trying it.
>>>
>>> 2016-11-17 3:50 GMT+08:00 Dongjoon Hyun <dongj...@apache.org>:
>>>>
>>>> Hi, Marcelo and Ryan.
>>>>
>>>> That was the main purpose of my proposal about Travis.CI.
>>>> IMO, that is the only way to achieve that without any harmful
>>>> side-effect
>>>> on Jenkins infra.
>>>>
>>>> Spark is already ready for that. Like AppVoyer, if one of you files an
>>>> INFRA jira issue to enable that, they will turn on that. Then, we can
>>>> try it
>>>> and see the result. Also, you can turn off easily again if you don't
>>>> want.
>>>>
>>>> Without this, we will consume more community efforts. For example, we
>>>> merged lint-java error fix PR seven hours ago, but the master branch
>>>> still
>>>> has one lint-java error.
>>>>
>>>> https://travis-ci.org/dongjoon-hyun/spark/jobs/176351319
>>>>
>>>> Actually, I've been monitoring the history here. (It's synced every 30
>>>> minutes.)
>>>>
>>>> https://travis-ci.org/dongjoon-hyun/spark/builds
>>>>
>>>> Could we give a change to this?
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>> On 2016-11-15 13:40 (-0800), "Shixiong(Ryan) Zhu"
>>>> <shixi...@databricks.com> wrote:
>>>> > I remember it's because you need to run `mvn install` before running
>>>> > lint-java if the maven cache is empty, and `mvn install` is pretty
>>>> > heavy.
>>>> >
>>>> > On Tue, Nov 15, 2016 at 1:21 PM, Marcelo Vanzin <van...@cloudera.com>
>>>> > wrote:
>>>> >
>>>> > > Hey all,
>>>> > >
>>>> > > Is there a reason why lint-java is not run during PR builds? I see
>>>> > > it
>>>> > > seems to be maven-only, is it really expensive to run after an sbt
>>>> > > build?
>>>> > >
>>>> > > I see a lot of PRs coming in to fix Java style issues, and those all
>>>> > > seem a little unnecessary. Either we're enforcing style checks or
>>>> > > we're not, and right now it seems we aren't.
>>>> > >
>>>> > > --
>>>> > > Marcelo
>>>> > >
>>>> > >
>>>> > > -
>>>> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>> > >
>>>> > >
>>>> >
>>>>
>>>> -
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
>>>
>>
>
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Running lint-java during PR builds?

2018-05-21 Thread Marcelo Vanzin
I'm fine with it. I tried to use the existing checkstyle sbt plugin
(trying to fix SPARK-22269), but it depends on an ancient version of
checkstyle, and I don't know sbt enough to figure out how to hack
classpaths and class loaders when applying rules, so gave up.

On Mon, May 21, 2018 at 1:47 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> I am going to open an INFRA JIRA if there's no explicit objection in few
> days.
>
> 2018-05-21 13:09 GMT+08:00 Hyukjin Kwon <gurwls...@gmail.com>:
>>
>> I would like to revive this proposal. Travis CI. Shall we give this try? I
>> think it's worth trying it.
>>
>> 2016-11-17 3:50 GMT+08:00 Dongjoon Hyun <dongj...@apache.org>:
>>>
>>> Hi, Marcelo and Ryan.
>>>
>>> That was the main purpose of my proposal about Travis.CI.
>>> IMO, that is the only way to achieve that without any harmful side-effect
>>> on Jenkins infra.
>>>
>>> Spark is already ready for that. Like AppVoyer, if one of you files an
>>> INFRA jira issue to enable that, they will turn on that. Then, we can try it
>>> and see the result. Also, you can turn off easily again if you don't want.
>>>
>>> Without this, we will consume more community efforts. For example, we
>>> merged lint-java error fix PR seven hours ago, but the master branch still
>>> has one lint-java error.
>>>
>>> https://travis-ci.org/dongjoon-hyun/spark/jobs/176351319
>>>
>>> Actually, I've been monitoring the history here. (It's synced every 30
>>> minutes.)
>>>
>>> https://travis-ci.org/dongjoon-hyun/spark/builds
>>>
>>> Could we give a change to this?
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On 2016-11-15 13:40 (-0800), "Shixiong(Ryan) Zhu"
>>> <shixi...@databricks.com> wrote:
>>> > I remember it's because you need to run `mvn install` before running
>>> > lint-java if the maven cache is empty, and `mvn install` is pretty
>>> > heavy.
>>> >
>>> > On Tue, Nov 15, 2016 at 1:21 PM, Marcelo Vanzin <van...@cloudera.com>
>>> > wrote:
>>> >
>>> > > Hey all,
>>> > >
>>> > > Is there a reason why lint-java is not run during PR builds? I see it
>>> > > seems to be maven-only, is it really expensive to run after an sbt
>>> > > build?
>>> > >
>>> > > I see a lot of PRs coming in to fix Java style issues, and those all
>>> > > seem a little unnecessary. Either we're enforcing style checks or
>>> > > we're not, and right now it seems we aren't.
>>> > >
>>> > > --
>>> > > Marcelo
>>> > >
>>> > > -
>>> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >
>>> > >
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-18 Thread Marcelo Vanzin
Just to give folks an update.

In case you haven't followed the thread, this vote failed.

I'll cut a new RC once the current blocker is addressed.


On Thu, May 17, 2018 at 2:59 PM, Imran Rashid <iras...@cloudera.com> wrote:
> I just found https://issues.apache.org/jira/browse/SPARK-24309 which is
> pretty serious. I've marked it a blocker, I think it should go into 2.3.1.
> I'll also take a closer look comparing to the behavior of the old listener
> bus.
>
> On Thu, May 17, 2018 at 12:18 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>>
>> Wenchen reviewed and pushed that change, so he's the most qualified to
>> make that decision.
>>
>> I plan to cut a new RC tomorrow so hopefully he'll see this by then.
>>
>> On Thu, May 17, 2018 at 10:13 AM, Artem Rudoy <artem.ru...@gmail.com>
>> wrote:
>> > Can we include https://issues.apache.org/jira/browse/SPARK-22371 as well
>> > please?
>> >
>> > Artem
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-17 Thread Marcelo Vanzin
Wenchen reviewed and pushed that change, so he's the most qualified to
make that decision.

I plan to cut a new RC tomorrow so hopefully he'll see this by then.

On Thu, May 17, 2018 at 10:13 AM, Artem Rudoy  wrote:
> Can we include https://issues.apache.org/jira/browse/SPARK-22371 as well
> please?
>
> Artem
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread Marcelo Vanzin
This is actually in 2.3; the JIRA is just missing the version.

https://github.com/apache/spark/pull/20765

On Wed, May 16, 2018 at 2:14 PM, kant kodali <kanth...@gmail.com> wrote:
> I am not sure how SPARK-23406 is a new feature, since streaming joins are
> already part of Spark 2.3.0. The self-joins didn't work because of a bug and
> it is now fixed, but I can understand if it touches some other code paths.
>
> On Wed, May 16, 2018 at 3:22 AM, Marco Gaido <marcogaid...@gmail.com> wrote:
>>
>> I'd be against having a new feature in a minor maintenance release. I
>> think such a release should contain only bugfixes.
>>
>> 2018-05-16 12:11 GMT+02:00 kant kodali <kanth...@gmail.com>:
>>>
>>> Can this https://issues.apache.org/jira/browse/SPARK-23406 be part of
>>> 2.3.1?
>>>
>>> On Tue, May 15, 2018 at 2:07 PM, Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>>>
>>>> Bummer. People should still feel welcome to test the existing RC so we
>>>> can rule out other issues.
>>>>
>>>> On Tue, May 15, 2018 at 2:04 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>>>> > -1
>>>> >
>>>> > We have a correctness bug fix that was merged after 2.3 RC1. It would
>>>> > be
>>>> > nice to have that in the Spark 2.3.1 release.
>>>> >
>>>> > https://issues.apache.org/jira/browse/SPARK-24259
>>>> >
>>>> > Xiao
>>>> >
>>>> >
>>>> > 2018-05-15 14:00 GMT-07:00 Marcelo Vanzin <van...@cloudera.com>:
>>>> >>
>>>> >> Please vote on releasing the following candidate as Apache Spark
>>>> >> version
>>>> >> 2.3.1.
>>>> >>
>>>> >> The vote is open until Friday, May 18, at 21:00 UTC and passes if
>>>> >> a majority of at least 3 +1 PMC votes are cast.
>>>> >>
>>>> >> [ ] +1 Release this package as Apache Spark 2.3.1
>>>> >> [ ] -1 Do not release this package because ...
>>>> >>
>>>> >> To learn more about Apache Spark, please see http://spark.apache.org/
>>>> >>
>>>> >> The tag to be voted on is v2.3.1-rc1 (commit cc93bc95):
>>>> >> https://github.com/apache/spark/tree/v2.3.1-rc1
>>>> >>
>>>> >> The release files, including signatures, digests, etc. can be found
>>>> >> at:
>>>> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-bin/
>>>> >>
>>>> >> Signatures used for Spark RCs can be found in this file:
>>>> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>> >>
>>>> >> The staging repository for this release can be found at:
>>>> >>
>>>> >> https://repository.apache.org/content/repositories/orgapachespark-1269/
>>>> >>
>>>> >> The documentation corresponding to this release can be found at:
>>>> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-docs/
>>>> >>
>>>> >> The list of bug fixes going into 2.3.1 can be found at the following
>>>> >> URL:
>>>> >> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>>>> >>
>>>> >> FAQ
>>>> >>
>>>> >> =
>>>> >> How can I help test this release?
>>>> >> =
>>>> >>
>>>> >> If you are a Spark user, you can help us test this release by taking
>>>> >> an existing Spark workload and running it on this release candidate,
>>>> >> then reporting any regressions.
>>>> >>
>>>> >> If you're working in PySpark you can set up a virtual env and install
>>>> >> the current RC and see if anything important breaks; in Java/Scala,
>>>> >> you can add the staging repository to your project's resolvers and
>>>> >> test with the RC (make sure to clean up the artifact cache before/after
>>>> >> so you don't end up building with an out-of-date RC going forward).
>>>> >>
>>>> >> ===
>>>> >> What should happen to JIRA tickets still targeting 2.3.1?

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
It's in. That link is only a list of the currently open bugs.

On Tue, May 15, 2018 at 2:02 PM, Justin Miller
<justin.mil...@protectwise.com> wrote:
> Did SPARK-24067 not make it in? I don’t see it in https://s.apache.org/Q3Uo.
>
> Thanks,
> Justin
>
> On May 15, 2018, at 3:00 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.3.1.
>
> The vote is open until Friday, May 18, at 21:00 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc1 (commit cc93bc95):
> https://github.com/apache/spark/tree/v2.3.1-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1269/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.1?
> ===
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
I'll start with my +1 (binding). I've run unit tests and a bunch of
integration tests on the hadoop-2.7 package.

Please note that there are still a few flaky tests. Please check JIRA
before you decide to send a -1 because of a flaky test.

Also, apologies for the delay in getting the RC ready. Still learning
the ropes. If you plan on doing this in the future, *do not* do
"svn co" on the dist.apache.org repo. The ASF Infra folks will not be
very kind to you. I'll update our RM docs later.


On Tue, May 15, 2018 at 2:00 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.1.
>
> The vote is open until Friday, May 18, at 21:00 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc1 (commit cc93bc95):
> https://github.com/apache/spark/tree/v2.3.1-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1269/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.1?
> ===
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1.

The vote is open until Friday, May 18, at 21:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.1-rc1 (commit cc93bc95):
https://github.com/apache/spark/tree/v2.3.1-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1269/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-docs/

The list of bug fixes going into 2.3.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342432

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running it on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
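
For the Java/Scala route, here is a minimal sketch of what that could look
like in an sbt build (the spark-sql dependency is only an example; point it
at whatever Spark artifacts your project actually uses):

  // build.sbt -- illustrative sketch only
  resolvers += "Spark 2.3.1 RC1 staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1269/"

  libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"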

===
What should happen to JIRA tickets still targeting 2.3.1?
===

The current list of open tickets targeted at 2.3.1 can be found at:
https://s.apache.org/Q3Uo

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Time for 2.3.1?

2018-05-10 Thread Marcelo Vanzin
Hello all,

It's been a while since we shipped 2.3.0, and lots of important bug
fixes have gone into the branch since then. I took a look at JIRA and
it seems there aren't a lot of things explicitly targeted at 2.3.1 -
the only potential blocker (a Parquet issue) is being worked on, since
a new Parquet version with the fix was just released.

So I'd like to propose to release 2.3.1 soon. If there are important
fixes that should go into the release, please let those be known (by
replying here or updating the bug in Jira), otherwise I'm volunteering
to prepare the first RC soon-ish (around the weekend).

Thanks!


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava
 wrote:
> I've found a KVStore wrapper which stores all the metrics in a LevelDB
> store. This KVStore wrapper is available as a Spark dependency, but we cannot
> access the metrics directly from Spark since they are all private.

I'm not sure what it is you're trying to do exactly, but there's a
public REST API that exposes all the data Spark keeps about
applications. There's also a programmatic status tracker
(SparkContext.statusTracker) that's easier to use from within the
running Spark app, but has a lot less info.
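
A minimal sketch of the status tracker route, assuming a spark-shell session
where `sc` is already defined (the polling logic is just an illustration):

  val tracker = sc.statusTracker
  // Print coarse progress for whatever jobs are currently active.
  tracker.getActiveJobIds().foreach { jobId =>
    tracker.getJobInfo(jobId).foreach { job =>
      job.stageIds().foreach { stageId =>
        tracker.getStageInfo(stageId).foreach { stage =>
          println(s"job $jobId stage $stageId: " +
            s"${stage.numCompletedTasks()}/${stage.numTasks()} tasks done")
        }
      }
    }
  }

The REST API is served under /api/v1 on the driver's UI port (4040 by
default) if you'd rather poll from outside the application.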

> Can we use this store to store our own metrics?

No.

> Also can we retrieve these metrics based on timestamp?

Only if the REST API has that feature; I don't remember off the top of my head.


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
On Thu, Apr 5, 2018 at 10:30 AM, Matei Zaharia  wrote:
> Sorry, but just to be clear here, this is the 2.12 API issue: 
> https://issues.apache.org/jira/browse/SPARK-14643, with more details in this 
> doc: 
> https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit.
>
> Basically, if we are allowed to change Spark’s API a little to have only one 
> version of methods that are currently overloaded between Java and Scala, we 
> can get away with a single source tree for all Scala versions and Java ABI 
> compatibility against any type of Spark (whether using Scala 2.11 or 2.12).

Fair enough. To play devil's advocate, most of those methods seem to
be marked "Experimental / Evolving", which could be used as a reason
to change them for this purpose in a minor release.

Not all of them are, though (e.g. foreach / foreachPartition are not
experimental).
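
For anyone not following the doc, the overloads in question look roughly like
this in the Dataset API (the snippet only illustrates the shape of the
problem; `ds` is assumed to be some existing Dataset):

  // Dataset[T] exposes both a Scala-function and a Java-functional-interface
  // variant of the same method:
  //   def foreach(f: T => Unit): Unit
  //   def foreach(func: ForeachFunction[T]): Unit
  // Under 2.11 a lambda only matches the first; under 2.12, where lambdas can
  // satisfy any SAM type, calls like this are the ones the doc worries may
  // become ambiguous:
  ds.foreach(x => println(x))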

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
I remember seeing somewhere that Scala still has some issues with Java
9/10 so that might be hard...

But on that topic, it might be better to shoot for Java 11
compatibility. 9 and 10, following the new release model, aren't
really meant to be long-term releases.

In general, I agree with Sean here. Doesn't look like 2.12 support
requires unexpected API breakages. So unless there's a really good
reason to break / remove a bunch of existing APIs...

On Thu, Apr 5, 2018 at 9:04 AM, Marco Gaido  wrote:
> Hi all,
>
> I also agree with Mark that we should add Java 9/10 support to an eventual
> Spark 3.0 release. Supporting Java 9 is not a trivial task, since we are
> using some internal APIs for memory management which have changed: either we
> find a solution that works on both (but I am not sure that is feasible), or
> we have to switch between two implementations according to the Java version.
> So I'd rather avoid doing this in a non-major release.
>
> Thanks,
> Marco
>
>
> 2018-04-05 17:35 GMT+02:00 Mark Hamstra :
>>
>> As with Sean, I'm not sure that this will require a new major version, but
>> we should also be looking at Java 9 & 10 support -- particularly with regard
>> to their better functionality in a containerized environment (memory limits
>> from cgroups, not sysconf; support for cpusets). In that regard, we should
>> also be looking at using the latest Scala 2.11.x maintenance release in
>> current Spark branches.
>>
>> On Thu, Apr 5, 2018 at 5:45 AM, Sean Owen  wrote:
>>>
>>> On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin  wrote:

 The primary motivating factor IMO for a major version bump is to support
 Scala 2.12, which requires minor API breaking changes to Spark’s APIs.
 Similar to Spark 2.0, I think there are also opportunities for other 
 changes
 that we know have been biting us for a long time but can’t be changed in
 feature releases (to be clear, I’m actually not sure they are all good
 ideas, but I’m writing them down as candidates for consideration):
>>>
>>>
>>> IIRC from looking at this, it is possible to support 2.11 and 2.12
>>> simultaneously. The cross-build already works now in 2.3.0. Barring some big
>>> change needed to get 2.12 fully working -- and that may be the case -- it
>>> nearly works that way now.
>>>
>>> Compiling vs 2.11 and 2.12 does however result in some APIs that differ
>>> in byte code. However Scala itself isn't mutually compatible between 2.11
>>> and 2.12 anyway; that's never been promised as compatible.
>>>
>>> (Interesting question about what *Java* users should expect; they would
>>> see a difference in 2.11 vs 2.12 Spark APIs, but that has always been true.)
>>>
>>> I don't disagree with shooting for Spark 3.0, just saying I don't know if
>>> 2.12 support requires moving to 3.0. But, Spark 3.0 could consider dropping
>>> 2.11 support if needed to make supporting 2.12 less painful.
>>
>>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Hadoop 3 support

2018-04-02 Thread Marcelo Vanzin
I haven't looked at it in detail...

Somebody's been trying to do that in
https://github.com/apache/spark/pull/20659, but that's kind of a huge
change.

The parts where I'd be concerned are:
- using Hive's original hive-exec package brings in a bunch of shaded
dependencies, which may break Spark in weird ways. HIVE-16391 was
supposed to fix that but nothing has really been done as part of that
bug.
- the hive-exec "core" package avoids the shaded dependencies but used
to have issues of its own. Maybe it's better now; I haven't looked.
- what about the current thrift server which is basically a fork of
the Hive 1.2 source code?
- when using Hadoop 3 + an old metastore client that doesn't know
about Hadoop 3, things may break.

The latter one has two possible fixes: say that Hadoop 3 builds of
Spark don't support old metastores; or add code so that Spark loads a
separate copy of Hadoop libraries in that case (search for
"sharesHadoopClasses" in IsolatedClientLoader for where to start with
that).
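
For reference, the isolated client is what users already get when they point
Spark at a non-builtin metastore, roughly like this (version and jar path are
placeholders, not a recommendation):

  // Sketch: ask Spark to load an older Hive metastore client through the
  // isolated client loader; adjust version/jars to your environment.
  val spark = org.apache.spark.sql.SparkSession.builder()
    .enableHiveSupport()
    .config("spark.sql.hive.metastore.version", "1.2.1")
    .config("spark.sql.hive.metastore.jars", "/path/to/hive-1.2.1/lib/*")
    .getOrCreate()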

If we try to update Hive, it would be good to avoid having to fork it
as is done currently. But I'm not sure that will be possible given the
current hive-exec packaging.

On Mon, Apr 2, 2018 at 2:58 PM, Reynold Xin <r...@databricks.com> wrote:
> Is it difficult to upgrade the Hive execution version to the latest one? The
> metastore used to be an issue, but now that part has been separated from the
> execution part.
>
>
> On Mon, Apr 2, 2018 at 1:57 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> Saisai filed SPARK-23534, but the main blocking issue is really
>> SPARK-18673.
>>
>>
>> On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin <r...@databricks.com> wrote:
>> > Does anybody know what needs to be done in order for Spark to support
>> > Hadoop
>> > 3?
>> >
>>
>>
>>
>> --
>> Marcelo
>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


