Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
ing of SPARK-25250. Is there any other > ongoing bug fixes we want to include in 2.4.2? If no I'd like to start the > release process today (CST). > > Thanks, > Wenchen > > On Thu, Apr 18, 2019 at 3:44 AM Sean Owen wrote: >> >> I think the 'only backport

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
ng who else out there might have an opinion. I'm not pushing for it necessarily. On Wed, Apr 17, 2019 at 6:18 PM Reynold Xin wrote: > > For Jackson - are you worrying about JSON parsing for users or internal Spark > functionality breaking? > > On Wed, Apr 17, 2019 at 6:02 PM S

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
e shading - same argument I’ve made earlier today in a PR... >>> >>> (Context- in many cases Spark has light or indirect dependencies but >>> bringing them into the process breaks users code easily) >>> >>> >>> >

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
org/jira/browse/SPARK-27469 https://issues.apache.org/jira/browse/SPARK-27470 On Fri, Apr 19, 2019 at 11:13 AM Sean Owen wrote: > > All: here is the backport of changes to update to 2.9.8 from master back to > 2.4. > https://github.com/apache/spark/pull/24418 > > master has

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-20 Thread Sean Owen
+1 from me too. It seems like there is support for merging the Jackson change into 2.4.x (and, I think, a few more minor dependency updates) but this doesn't have to go into 2.4.2. That said, if there is another RC for any reason, I think we could include it. Otherwise can wait for 2.4.3. On Thu,

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Sean Owen
One minor comment: for 2.4.1 we had a couple JIRAs marked 'release-notes': https://issues.apache.org/jira/browse/SPARK-27198?jql=project%20%3D%20SPARK%20and%20fixVersion%20%20in%20(2.4.1%2C%202.4.2)%20and%20labels%20%3D%20%27release-notes%27 They should be mentioned in https://spark.apache.org/rel

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
Re: .NET, what's the particular issue in there that it's causing? 2.4.2 still builds for 2.11. I'd imagine you'd be pulling dependencies from Maven central (?) or if needed can build for 2.11 from source. I'm more concerned about pyspark because it builds in 2.12 jars. On Fri, Apr 26, 2019 at 1:36

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
To be clear, what's the nature of the problem there... just Pyspark apps that are using a Scala-based library? Trying to make sure we understand what is and isn't a problem here. On Fri, Apr 26, 2019 at 9:44 AM Michael Heuer wrote: > This will also cause problems in Conda builds that depend on p

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
e, and our Homebrew formula depends on the apache-spark Homebrew > formula. > > Using Scala 2.12 in the binary distribution for Spark 2.4.2 was > unintentional and never voted on. There was a successful vote to default > to Scala 2.12 in Spark version 3.0. > > michael > > &

Re: Spark build can't find javac

2019-04-29 Thread Sean Owen
Your JAVA_HOME is pointing to a JRE rather than JDK installation. Or you've actually installed the JRE. Only the JDK has javac, etc. On Mon, Apr 29, 2019 at 4:36 PM Shmuel Blitz wrote: > Hi, > > Trying to build Spark on Manjaro with OpenJDK version 1.8.0_212, and I'm > getting the following erro
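
A quick way to confirm which you have, sketched here from the JVM side (any of the usual shell checks of JAVA_HOME works just as well): javax.tools.ToolProvider returns null for the system compiler when only a JRE is present.

    import javax.tools.ToolProvider

    // Non-null only when a JDK (which bundles javac) is installed and on JAVA_HOME.
    val compiler = ToolProvider.getSystemJavaCompiler
    println(if (compiler == null) "JRE only -- javac not available" else "JDK detected")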

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Sean Owen
I think this is a reasonable idea; I know @vanzin had suggested it was simpler to use the latest in case a bug was found in the release script and then it could just be fixed in master rather than back-port and re-roll the RC. That said I think we did / had to already drop the ability to build <= 2

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-01 Thread Sean Owen
+1 from me. There is little change from 2.4.2 anyway, except for the important change to the build script that should build pyspark with Scala 2.11 jars. I verified that the package contains the _2.11 Spark jars, but have a look! I'm still getting this weird error from the Kafka module when testin

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-03 Thread Sean Owen
Hadoop 3 has not been supported in 2.4.x. 2.12 has been since 2.4.0, and 2.12 artifacts have always been released where available. What are you referring to? On Fri, May 3, 2019 at 9:28 AM antonkulaga wrote: > > Can you prove release version for Hadoop 3 and Scala 2.12 this time? > -

Re: SparkR latest API docs missing?

2019-05-08 Thread Sean Owen
I think the SparkR release always trails a little bit due to the additional CRAN processes. On Wed, May 8, 2019 at 11:23 AM Shivaram Venkataraman wrote: > > I just noticed that the SparkR API docs are missing at > https://spark.apache.org/docs/latest/api/R/index.html --- It looks > like they were

Interesting implications of supporting Scala 2.13

2019-05-10 Thread Sean Owen
While that's not happening soon (2.13 isn't out), note that some of the changes to collections will be fairly breaking changes. https://issues.apache.org/jira/browse/SPARK-25075 https://docs.scala-lang.org/overviews/core/collections-migration-213.html Some of this may impact a public API, so may
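
As one concrete illustration of the kind of breakage meant here (a sketch, not an exhaustive list): in 2.13 the default scala.Seq becomes scala.collection.immutable.Seq rather than scala.collection.Seq, so code that passes a mutable collection where Seq is expected stops compiling.

    import scala.collection.mutable.ArrayBuffer

    def describe(xs: Seq[Int]): String = xs.mkString(",")

    val buf = ArrayBuffer(1, 2, 3)
    describe(buf)        // compiles on 2.12 (Seq = collection.Seq); type error on 2.13
    describe(buf.toSeq)  // compiles on both; explicit conversions like this are typical of the migration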

Re: Interesting implications of supporting Scala 2.13

2019-05-10 Thread Sean Owen
d Xin wrote: > Looks like a great idea to make changes in Spark 3.0 to prepare for Scala > 2.13 upgrade. > > Are there breaking changes that would require us to have two different > source code for 2.12 vs 2.13? > > > On Fri, May 10, 2019 at 11:41 AM, Sean Owen wrote:

Re: Interesting implications of supporting Scala 2.13

2019-05-11 Thread Sean Owen
t; specific version. We'll have to deal with all the other points you raised >> when we do cross that bridge, but hopefully those are things we can cover in >> a minor release. >> >> On Fri, May 10, 2019 at 2:31 PM Sean Owen wrote: >>> >>> I real

Re: adding shutdownmanagerhook to spark.

2019-05-13 Thread Sean Owen
Spark just adds a hook to the mechanism that Hadoop exposes. You can do the same. You shouldn't use Spark's. On Mon, May 13, 2019 at 6:11 PM Nasrulla Khan Haris wrote: > > HI All, > > > > I am trying to add shutdown hook, but looks like shutdown manager object > requires the package to be spark
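
For reference, a minimal sketch of registering a hook through Hadoop's public ShutdownHookManager, which is the mechanism referred to above; the priority value is an arbitrary example.

    import org.apache.hadoop.util.ShutdownHookManager

    // Hooks run at JVM shutdown in priority order (higher priority runs earlier).
    ShutdownHookManager.get().addShutdownHook(new Runnable {
      override def run(): Unit = {
        // flush buffers, close connections, etc.
        println("running application shutdown hook")
      }
    }, 50)

A plain Runtime.getRuntime.addShutdownHook(new Thread(...)) also works if ordering relative to Hadoop's own hooks does not matter.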

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Sean Owen
I gave up looking through JIRAs a long time ago, so, big respect for continuing to try to triage them. I am afraid we're missing a few important bug reports in the torrent, but most JIRAs are not well-formed, just questions, stale, or simply things that won't be added. I do think it's important to

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Sean Owen
ions (and resolve the >> issues as "Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label >> makes it easier to audit what was closed, simplifying the process of >> identifying and re-opening valid issues caught in our dragnet. >> >>

Re: Access to live data of cached dataFrame

2019-05-17 Thread Sean Owen
A cached DataFrame isn't supposed to change, by definition. You can re-read each time or consider setting up a streaming source on the table which provides a result that updates as new data comes in. On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos wrote: > > Hello, > > I have a cached dataframe: >
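
A minimal sketch of the re-read option, assuming a hypothetical table named events:

    val df = spark.table("events").cache()
    df.count()                          // materializes the cache; this snapshot will not change

    // later, after new data has landed in the events table:
    df.unpersist()
    val fresh = spark.table("events")   // re-reads the source and picks up the new rows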

Re: Signature verification failed

2019-05-18 Thread Sean Owen
Moving to dev@ Ah, looks like it was added to https://dist.apache.org/repos/dist/dev/spark/KEYS but not the final dist KEYS file. I just copied it over now. On Sat, May 18, 2019 at 7:12 AM Andreas Költringer wrote: > > Hi, > > I wanted to download and verify (via signature) an Apache Spark packag

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Sean Owen
ense in this way then. >>>> Yes, I am good with 'Incomplete' too. >>>> >>>> On Thu, May 16, 2019 at 11:24 AM, Hyukjin Kwon wrote: >>>> >>>>> I actually recently used 'Incomplete' a bit when the JIRA is >>>>>

Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-20 Thread Sean Owen
Re: 1), I think we tried to fix that on the build side and it requires flags that not all tar versions (e.g., on OS X) have. But that's tangential. I think the Avro + Parquet dependency situation is generally problematic -- see JIRA for some details. But yes I'm not surprised if Spark has a different

Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-21 Thread Sean Owen
distro, but avro-1.8.2.jar is not. i tried to fix it but i am > not too familiar with the pom file. > > regarding jline you only run into this if you use spark-shell (and it isnt > always reproducible it seems). see SPARK-25783 > best, > koert > > > > > On

Re: RDD object Out of scope.

2019-05-21 Thread Sean Owen
I'm not clear what you're asking. An RDD itself is just an object in the JVM. It will be garbage collected if there are no references. What else would there be to clean up in your case? ContextCleaner handles cleanup of persisted RDDs, etc. On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris w

Re: Interesting implications of supporting Scala 2.13

2019-05-29 Thread Sean Owen
I think the particular issue here isn't resolved by scala-collection-compat: TraversableOnce goes away. However I hear that maybe Scala 2.13 retains it as a deprecated alias, which might help. On Wed, May 29, 2019 at 4:59 PM antonkulaga wrote: > > There is https://github.com/scala/scala-collectio

Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread Sean Owen
Deprecated -- certainly and sooner than later. I don't have a good sense of the overhead of continuing to support Python 2; is it large enough to consider dropping it in Spark 3.0? On Wed, May 29, 2019 at 11:47 PM Xiangrui Meng wrote: > > Hi all, > > I want to revive this old thread since no acti

Master maven build failing for 6 days -- may need some more eyes

2019-05-30 Thread Sean Owen
I might need some help figuring this out. The master Maven build has been failing for almost a week, and I'm having trouble diagnosing why. Of course, the PR builder has been fine. First one seems to be: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master

jQuery 3.4.1 update

2019-06-14 Thread Sean Owen
Just surfacing this change as it's probably pretty good to go, but, a) I'm not a jQuery / JS expert and b) we don't have comprehensive UI tests. https://github.com/apache/spark/pull/24843 I'd like to get us up to a modern jQuery for 3.0, to keep up with security fixes (which was the minor motivat

Re: Spark 2.4.3 source download is a dead link

2019-06-18 Thread Sean Owen
Huh, I don't know how long that's been a bug, but the JS that creates the filename with .replace doesn't seem to have ever worked? https://github.com/apache/spark-website/pull/207 On Tue, Jun 18, 2019 at 4:07 AM Olivier Girardot wrote: > > Hi everyone, > FYI the spark source download link on spar

Re: Ask for ARM CI for spark

2019-06-19 Thread Sean Owen
I'd begin by reporting and fixing ARM-related issues in the build. If they're small, of course we should do them. If it requires significant modifications, we can discuss how much Spark can support ARM. I don't think it's yet necessary for the Spark project to run these CI builds until that point,

Re: sparkmaster-test-sbt-hadoop-2.7 failing RAT check

2019-06-24 Thread Sean Owen
(We have two PRs to patch it up anyway already) On Mon, Jun 24, 2019 at 11:39 AM shane knapp wrote: > > i'm aware and will be looking in to this later today. > > see: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/6043/console > > -- > Shane Knapp > UC Berkeley EECS

Re: Java version for building Spark

2019-06-24 Thread Sean Owen
"The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.5.4 and Java 8." It doesn't depend on a particular version of Java 8. Installing it is platform-dependent. On Mon, Jun 24, 2019 at 6:43 PM Valeriy Trofimov wrote: > > Hi All, > > What Ja

Re: Ask for ARM CI for spark

2019-06-26 Thread Sean Owen
ffort to fix the >>>>> ARM-related issues. I'd be happy to help if you like. And could you give >>>>> the trace link of this issue, then I can check it is fixed or not, thank >>>>> you. >>>>> As far as I know the old versions of spar

Re: Jackson version updation

2019-06-28 Thread Sean Owen
https://github.com/apache/spark/blob/branch-2.4/pom.xml#L161 Correct, because it would introduce behavior changes. On Fri, Jun 28, 2019 at 3:54 AM Pavithra R wrote: > In spark master branch, the version of Jackson jars have been upgraded to > 2.9.9 > > > https://github.com/apache/spark/commit/bd

Re: Timeline for Spark 3.0

2019-06-28 Thread Sean Owen
That's a good question. Although we had penciled in 'middle of the year' I don't think we're in sight of a QA phase just yet, as I believe some key items are still in progress. I'm thinking of the Hive update, and DS v2 work (?). I'm also curious to hear what broad TODOs people see for 3.0? we pro

Re: Disabling `Merge Commits` from GitHub Merge Button

2019-07-01 Thread Sean Owen
I'm using the merge script in both repos. I think that was the best practice? So, sure, I'm fine with disabling it. On Mon, Jul 1, 2019 at 3:53 PM Dongjoon Hyun wrote: > > Hi, Apache Spark PMC members and committers. > > We are using GitHub `Merge Button` in `spark-website` repository > because i

Re: Sample date_trunc error for webpage (https://spark.apache.org/docs/2.3.0/api/sql/#date_trunc )

2019-07-07 Thread Sean Owen
binggan1989, I don't see any problem in that snippet. What are you referring to? On Sun, Jul 7, 2019, 2:22 PM Chris Lambertus wrote: > Spark, > > We received this message. I have not ACKd it. > > -Chris > INFRA > > > Begin forwarded message: > > *From: *"binggan1989" > *Subject: **Sample date_t

Re: Sample date_trunc error for webpage (https://spark.apache.org/docs/2.3.0/api/sql/#date_trunc )

2019-07-07 Thread Sean Owen
Yes. Thanks! On Sun, Jul 7, 2019 at 1:56 PM Russell Spitzer wrote: > The args look like they are in the wrong order in the doc > > On Sun, Jul 7, 2019, 1:50 PM Sean Owen wrote: > >> binggan1989, I don't see any problem in that snippet. What are you >> referring to? &
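
For reference, a sketch of the corrected usage (the function takes the truncation level first, then the timestamp); the example value follows the pattern used in the SQL function docs.

    // Truncates to the start of the year; expected result: 2015-01-01 00:00:00
    spark.sql("SELECT date_trunc('YEAR', '2015-03-05T09:32:05')").show()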

Opinions wanted: how much to match PostgreSQL semantics?

2019-07-08 Thread Sean Owen
See the particular issue / question at https://github.com/apache/spark/pull/24872#issuecomment-509108532 and the larger umbrella at https://issues.apache.org/jira/browse/SPARK-27764 -- Dongjoon rightly suggests this is a broader question. ---

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-09 Thread Sean Owen
We will certainly want a 2.4.4 release eventually. In fact I'd expect 2.4.x gets maintained for longer than the usual 18 months, as it's the last 2.x branch. It doesn't need to happen before 3.0, but could. Usually maintenance releases happen 3-4 months apart and the last one was 2 months ago. If t

Re: Ask for ARM CI for spark

2019-07-17 Thread Sean Owen
On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang wrote: > Two failed and the reason is 'Can't find 1 executors before 1 > milliseconds elapsed', see below, then we try increase timeout the tests > passed, so wonder if we can increase the timeout? and here I have another > question about > htt

Re: Ask for ARM CI for spark

2019-07-26 Thread Sean Owen
000 >> ms to 3(even 2)ms, >> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764 >> >> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792 >>

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-26 Thread Sean Owen
Generally speaking, I think we want to encourage more training and tutorial content out there, for sure, so, the more the merrier. My reservation here is that as an Apache project, it might appear to 'bless' one set of materials as authoritative over all the others out there. And there are already

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-26 Thread Sean Owen
On Fri, Jul 26, 2019 at 4:01 PM Lars Francke wrote: > I understand why it might be seen that way and we need to make sure to point > out that we have no intention of becoming "The official Apache Spark > training" because that's not our intention at all. Of course that's the intention; the prob

Re: Ask for ARM CI for spark

2019-07-27 Thread Sean Owen
Great thanks - we can take this to JIRAs now. I think it's worth changing the implementation of atanh if the test value just reflects what Spark does, and there's evidence it is a little bit inaccurate. There's an equivalent formula which seems to have better accuracy. On Fri, Jul 26, 2019 at 10:02 P
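
To make the accuracy point concrete, a sketch of two algebraically equivalent forms (illustrative only, not necessarily the exact change adopted):

    // Naive form: loses precision for x near zero, because 1 + x and 1 - x
    // round away the low-order bits of x before the logarithm is taken.
    def atanhNaive(x: Double): Double = 0.5 * math.log((1.0 + x) / (1.0 - x))

    // Equivalent form using log1p, which stays accurate for small arguments:
    // atanh(x) = 0.5 * (log1p(x) - log1p(-x))
    def atanhStable(x: Double): Double = 0.5 * (math.log1p(x) - math.log1p(-x))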

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-29 Thread Sean Owen
TL;DR is: take the below as feedback to consider, and proceed as you see fit. Nobody's suggesting you can't do this. On Mon, Jul 29, 2019 at 2:58 AM Lars Francke wrote: > The way I read your point is that anyone can publish material (which includes > source code) under the ALv2 outside of the AS

Fwd: The result of Math.log(3.0) is different on x86_64 and aarch64?

2019-07-29 Thread Sean Owen
x86_64 and aarch64? To: Sean Owen Sorry to disturb you, I forward the jdk-dev email to you, maybe you are interested :) -- Forwarded message - From: Pengfei Li (Arm Technology China) Date: Mon, Jul 29, 2019 at 5:52 PM Subject: RE: The result of Math.log(3.0) is different on x

Re: Recognizing non-code contributions

2019-08-01 Thread Sean Owen
n Thu, Aug 1, 2019 at 6:13 PM Hyukjin Kwon wrote: >>>> >>>> I agree with Sean in general, in particular, commit bit. >>>> >>>> Personal thought: >>>> I think committer should at least be used to the dev at some degree as >>>> primary. &

Re: Recognizing non-code contributions

2019-08-02 Thread Sean Owen
o. I personally am not sure it adds enough to justify the process, and may wade too deeply into controversies about whether this is just extra gatekeeping vs something helpful. On Thu, Aug 1, 2019 at 11:09 PM Sean Owen wrote: > > (Let's move this thread to dev@ now as it is a general a

Re: Recognizing non-code contributions

2019-08-04 Thread Sean Owen
On Sun, Aug 4, 2019 at 11:21 AM Myrle Krantz wrote: > Let me make a guess at what you are trying to accomplish with it. Correct me > please if I'm wrong: > * You want to encourage contributions that aren't just code contributions. > You recognize for example that good documentation is critical

Re: Recognizing non-code contributions

2019-08-04 Thread Sean Owen
Oops, I also failed to copy dev@ On Sun, Aug 4, 2019 at 3:06 PM Sean Owen wrote: > > On Sun, Aug 4, 2019 at 1:54 PM Myrle Krantz wrote: > >> No, I think the position here was docs-only contributors _could_ be > >> committers. The "only coders can be c

Re: Recognizing non-code contributions

2019-08-05 Thread Sean Owen
On Mon, Aug 5, 2019 at 3:50 AM Myrle Krantz wrote: > So... events coordinators? I'd still make them committers. I guess I'm > still struggling to understand what problem making people VIP's without > giving them committership is trying to solve. We may just agree to disagree, which is fine, b

Re: Recognizing non-code contributions

2019-08-06 Thread Sean Owen
On Tue, Aug 6, 2019 at 1:14 AM Myrle Krantz wrote: > If someone makes a commit who you are not expecting to make a commit, or in > an area you weren't expecting changes in, you'll notice that, right? Not counterarguments, but just more color on the hesitation: - Probably, but it's less obvious

Re: Recognizing non-code contributions

2019-08-06 Thread Sean Owen
On Tue, Aug 6, 2019 at 10:46 AM Myrle Krantz wrote: >> You can tell there's a range of opinions here. I'm probably less >> 'conservative' about adding committers than most on the PMC, right or >> wrong, but more conservative than some at the ASF. I think there's >> room to inch towards the middle

Re: Recognizing non-code contributions

2019-08-06 Thread Sean Owen
On Tue, Aug 6, 2019 at 11:45 AM Myrle Krantz wrote: > I had understood your position to be that you would be willing to make at > least some non-coding contributors to committers but that your "line" is > somewhat different than my own. My response to you assumed that position on > your part.

Re: [SPARK-23207] Repro

2019-08-09 Thread Sean Owen
Interesting but I'd put this on the JIRA, and also test vs master first. It's entirely possible this is something else that was subsequently fixed, and maybe even backported for 2.4.4. (I can't quite reproduce it - just makes the second job fail, which is also puzzling) On Fri, Aug 9, 2019 at 8:11

Re: My curation of pending structured streaming PRs to review

2019-08-13 Thread Sean Owen
General tips: - dev@ is not usually the right place to discuss _specific_ changes except once in a while to call attention - Ping the authors of the code being changed directly - Tighten the change if possible - Tests, reproductions, docs, etc help prove the change - Bugs are more important than n

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Sean Owen
Seems fine to me if there are enough valuable fixes to justify another release. If there are any other important fixes imminent, it's fine to wait for those. On Tue, Aug 13, 2019 at 6:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today (

Re: Ask for ARM CI for spark

2019-08-15 Thread Sean Owen
penlab/spark/pull/17/ , there are several things I > want to talk about: > > First, about the failed tests: > 1.we have fixed some problems like > https://github.com/apache/spark/pull/25186 and > https://github.com/apache/spark/pull/25279, thanks sean owen and others > to help

Re: Release Apache Spark 2.4.4

2019-08-15 Thread Sean Owen
While we're on the topic: In theory, branch 2.3 is meant to be unsupported as of right about now. There are 69 fixes in branch 2.3 since 2.3.3 was released in February: https://issues.apache.org/jira/projects/SPARK/versions/12344844 Some look moderately important. Should we also, or first, cut

Re: Ask for ARM CI for spark

2019-08-15 Thread Sean Owen
master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull > > Best regards > > ZhaoBo

Re: Release Spark 2.3.4

2019-08-16 Thread Sean Owen
I think it's fine to do these in parallel, yes. Go ahead if you are willing. On Fri, Aug 16, 2019 at 9:48 AM Kazuaki Ishizaki wrote: > > Hi, All. > > Spark 2.3.3 was released six months ago (15th February, 2019) at > http://spark.apache.org/news/spark-2-3-3-released.html. And, about 18 months >

Re: [VOTE] Release Apache Spark 2.4.4 (RC1)

2019-08-19 Thread Sean Owen
Things are looking pretty good so far, but a few notes: I thought we might need this PR to make the 2.12 build of 2.4.x not try to build Kafka 0.8 support, but, I'm not seeing that 2.4.x + 2.12 builds or tests it? https://github.com/apache/spark/pull/25482 I can merge this to 2.4 shortly anyway, b

Re: [VOTE] Release Apache Spark 2.4.4 (RC1)

2019-08-20 Thread Sean Owen
Sounds fine, we probably needed SPARK-28775 anyway. I merged that and SPARK-28749. It looks like it's just the one you're talking about right now, SPARK-28699. The rest of the tests seemed to pass OK, release looks good, but bears more testing by everyone out there before a next RC. On Tue, Aug 20

Unmarking most things as experimental, evolving for 3.0?

2019-08-21 Thread Sean Owen
There are currently about 130 things marked as 'experimental' in Spark, and some have been around since Spark 1.x. A few may be legitimately still experimental (e.g. barrier mode), but, would it be safe to say most of these annotations should be removed for 3.0? What's the theory for evolving vs e

Re: Unmarking most things as experimental, evolving for 3.0?

2019-08-22 Thread Sean Owen
> >> +1 for unmarking old ones (made in `2.3.x` and before). >> Thank you, Sean. >> >> Bests, >> Dongjoon. >> >> On Wed, Aug 21, 2019 at 6:46 PM Sean Owen wrote: >>> >>> There are currently about 130 things marked as 'experime

Re: How to load Python Pickle File in Spark Data frame

2019-08-26 Thread Sean Owen
Yes, this does not read raw pickle files. It reads files written in the standard Spark/Hadoop form for binary objects (SequenceFiles) but uses Python pickling for the serialization. See the docs, which say this reads what saveAsPickleFile() writes. On Mon, Aug 26, 2019 at 12:23 AM hxngillani wrot

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Sean Owen
Bringing a side conversation back to main: good news / bad news. We most definitely want one build to run on JDK 8 and JDK 11. That is actually what both of the JDK 11 jobs do right now, so I believe the passing Jenkins job suggests that already works. The downside is I think we haven't necessari

Re: [VOTE] Release Apache Spark 2.4.4 (RC2)

2019-08-26 Thread Sean Owen
+1 as per response to RC1. The existing issues identified there seem to have been fixed. On Mon, Aug 26, 2019 at 2:45 AM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.4. > > The vote is open until August 29th 1AM PST and passes if a majo

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-27 Thread Sean Owen
+1 - license and signature looks OK, the docs look OK, the artifacts seem to be in order. Tests passed for me when building from source with most common profiles set. On Mon, Aug 26, 2019 at 3:28 PM Kazuaki Ishizaki wrote: > > Please vote on releasing the following candidate as Apache Spark versi

Re: [DISCUSSION]JDK11 for Apache 2.x?

2019-08-27 Thread Sean Owen
I think one of the key problems here are the required dependency upgrades. It would mean many minor breaking changes and a few bigger ones, notably around Hive, and forces a scala 2.12-only update. I think my question is whether that even makes sense as a minor release? it wouldn't be backwards com

Re: [DISCUSSION]JDK11 for Apache 2.x?

2019-08-27 Thread Sean Owen
mited, so, by the > time we evolve to Spark 3, we could combine it with Java 11. > > On the other hand, not everybody may think this way and it may slow down the > adoption of Spark 3… > > However, I concur with Sean, I don’t think another 2.x is needed for Java 11. > > > On

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread Sean Owen
+1 from me again. On Tue, Aug 27, 2019 at 6:06 PM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.4. > > The vote is open until August 30th 5PM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1

Standardizing test build config

2019-08-28 Thread Sean Owen
I'm surfacing this to dev@ as the right answers may depend on a lot of historical decisions that I don't know about. See https://issues.apache.org/jira/browse/SPARK-28900 for a summary of how the different build configs are set up, and why we might need to standardize them to fully test with JDK 1

Re: Providing a namespace for third-party configurations

2019-08-30 Thread Sean Owen
It's possible, but pretty unlikely to have an exact namespace collision. It's probably a best practice to clearly separate settings, etc. that are downstream add-ons into a separate namespace, and I don't mind a sentence in a doc somewhere suggesting a convention, but I really think it's up to down
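
A sketch of the convention being suggested, with entirely hypothetical setting names: keep add-on configuration under its own prefix rather than reusing spark.sql.* or another built-in namespace.

    import org.apache.spark.SparkConf

    // Settings for a hypothetical add-on, kept under their own "spark.myplugin." prefix
    // so they cannot collide with or shadow built-in Spark configuration keys.
    val conf = new SparkConf()
      .set("spark.myplugin.endpoint", "https://example.com/api")
      .set("spark.myplugin.timeout", "30s")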

Re: [DISCUSSION]JDK11 for Apache 2.x?

2019-09-01 Thread Sean Owen
blog/2019/05/22/openjdk-8-and-11-still-in-safe-hands/ > ). > > On Tue, Aug 27, 2019 at 2:22 PM Sean Owen wrote: >> >> Spark 3 will not require Java 11; it will work with Java 8 too. I >> think the question is whether someone who _wants_ Java 11 should have >> a 2.x

Re: Why two netty libs?

2019-09-03 Thread Sean Owen
It was for historical reasons; some other transitive dependencies needed it. I actually was just able to exclude Netty 3 last week from master. Spark uses Netty 4. On Tue, Sep 3, 2019 at 6:59 AM Jacek Laskowski wrote: > > Hi, > > Just noticed that Spark 2.4.x uses two netty deps of different vers

Re: maven 3.6.1 removed from apache maven repo

2019-09-03 Thread Sean Owen
It's because build/mvn only queries ASF mirrors, and they remove non-current releases from mirrors regularly (we do the same). This may help avoid this in the future: https://github.com/apache/spark/pull/25667 On Tue, Sep 3, 2019 at 1:41 PM Xiao Li wrote: > Hi, Tom, > > To unblock the build, I m

Re: Schema inference for nested case class issue

2019-09-04 Thread Sean Owen
user@ is the right place for these types of questions. As the error says, you have a case class that defines a schema including columns like 'fix' but these don't appear to be in your DataFrame. It needs to match. On Wed, Sep 4, 2019 at 6:44 AM El Houssain ALLAMI wrote: > > Hi , > > i have nested
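
A minimal sketch of the kind of mismatch being described (names are hypothetical): every field of the case class, including nested ones, must correspond to a column in the DataFrame.

    case class Fix(lat: Double, lon: Double)
    case class Record(id: Long, fix: Fix)

    import spark.implicits._
    // Succeeds only if df has an `id` column and a struct column `fix` with
    // fields `lat` and `lon`; otherwise the encoder reports the missing column.
    val ds = df.as[Record]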

Re: Why two netty libs?

2019-09-04 Thread Sean Owen
nything, it ends up on the CP > > On Tue, Sep 3, 2019 at 5:18 PM Shixiong(Ryan) Zhu > wrote: >> >> Yep, historical reasons. And Netty 4 is under another namespace, so we can >> use Netty 3 and Netty 4 in the same JVM. >> >> On Tue, Sep 3, 2019 at 6:15 AM

Re: DataFrameReader bottleneck in DataSource#checkAndGlobPathIfNecessary when reading S3 files

2019-09-06 Thread Sean Owen
I think the problem is calling globStatus to expand all 300K files. This is a general problem for object stores and huge numbers of files. Steve L. may have better thoughts on real solutions. But you might consider, if possible, running a lot of .csv jobs in parallel to query subsets of all the fil

Re: Resolving all JIRAs affecting EOL releases

2019-09-08 Thread Sean Owen
I think simply closing old issues with no activity in a long time is OK. The "Affected Version" is somewhat noisy, so not even particularly important to also query, but yeah I see some value in trying to limit the scope this way. On Sat, Sep 7, 2019 at 10:15 PM Hyukjin Kwon wrote: > > HI all, > >

Thoughts on Spark 3 release, or a preview release

2019-09-11 Thread Sean Owen
I'm curious what current feelings are about ramping down towards a Spark 3 release. It feels close to ready. There is no fixed date, though in the past we had informally tossed around "back end of 2019". For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect Spark 2 to last longer,

Re: Ask for ARM CI for spark

2019-09-12 Thread Sean Owen
iew/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull >>> >>> >>> Best regards >>> >>> ZhaoBo

Re: Thoughts on Spark 3 release, or a preview release

2019-09-13 Thread Sean Owen
in early 2020 >>>>>> >>>>>> For JDK11 clean-up, it will meet the timeline and `3.0.0-preview` helps >>>>>> it a lot. >>>>>> >>>>>> After this discussion, can we have some timeline for `Spark 3.0 Release >

Re: Thoughts on Spark 3 release, or a preview release

2019-09-14 Thread Sean Owen
I don't think this suggests anything is finalized, including APIs. I would not guess there will be major changes from here though. On Fri, Sep 13, 2019 at 4:27 PM Andrew Melo wrote: > > Hi Spark Aficionados- > > On Fri, Sep 13, 2019 at 15:08 Ryan Blue wrote: >> >> +1 for a preview release. >> >>

Re: [build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Sean Owen
That's super weird; can you just delete ~/.m2 and let it download the internet again? or at least blow away the downloaded Kafka dir? Turning it on and off, so to speak, often works. On Tue, Sep 17, 2019 at 2:41 PM Shane Knapp wrote: > > a bunch of the PRB builds are now failing w/various permuta

Re: Spark 3.0 preview release on-going features discussion

2019-09-20 Thread Sean Owen
Is this a list of items that might be focused on for the final 3.0 release? At least, Scala 2.13 support shouldn't be on that list. The others look plausible, or are already done, but there are probably more. As for the 3.0 preview, I wouldn't necessarily block on any particular feature, though, y

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Sean Owen
Narrowly on Java 11: the problem is that it'll take some breaking changes, more than would be usually appropriate in a minor release, I think. I'm still not convinced there is a burning need to use Java 11 but stay on 2.4, after 3.0 is out, and at least the wheels are in motion there. Java 8 is sti

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Sean Owen
I don't know enough about DSv2 to comment on this part, but, any theoretical 2.5 is still a ways off. Does waiting for 3.0 to 'stabilize' it as much as is possible help? I say that because re: Java 11, the main breaking change is probably the Hive 2 / Hadoop 3 dependency, JPMML (minor), as well as

Re: Urgent : Changes required in the archive

2019-09-26 Thread Sean Owen
The message in question has already been public, and copied to mirrors the ASF does not control, for a year and a half. There is a process for requesting modification to ASF archives, but this case does not qualify: https://www.apache.org/foundation/public-archives.html On Thu, Sep 26, 2019 at 1:5

Committers: if you can't log into JIRA...

2019-09-26 Thread Sean Owen
I hit a few snags with the JIRA LDAP update. In case this saves anyone time: - You have to use your ASF LDAP password now - If your JIRA and ASF IDs aren't the same, file an INFRA JIRA - If it still won't let you log in after answering captchas, try logging in once in Chrome incognito mode - And/o

Re: [build system] maven master branch builds timing out en masse...

2019-10-07 Thread Sean Owen
Moving the conversation here -- yes, why on earth are they taking this long all of the sudden? we'll have to look again when they come back online. The last successful build took 6 hours, of which 4:45 were the unit tests themselves. It's mostly SQL tests; SQLQuerySuite is approaching an hour. ht

Re: Auto-closing PRs when there are no feedback or response from its author

2019-10-08 Thread Sean Owen
I'm generally all for closing pretty old PRs. They can be reopened easily. Closing a PR (a particular proposal for how to resolve an issue) is less drastic than closing a JIRA (a description of an issue). Closing them just delivers the reality, that nobody is going to otherwise revisit it, and can

Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Sean Owen
I'd have the same question on the PR - why does this need to be in the Apache Spark project vs where it is now? Yes, it's not a Spark package per se, but it seems like this is a tool for K8S to use Spark rather than a core Spark tool. Yes of course all the packages, licenses, etc have to be overha

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Sean Owen
See the JIRA - this is too open-ended and not obviously just due to choices in data representation, what you're trying to do, etc. It's correctly closed IMHO. However, identifying the issue more narrowly, and something that looks ripe for optimization, would be useful. On Thu, Oct 10, 2019 at 12:3

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

2019-10-15 Thread Sean Owen
I do not have a very informed opinion here, so take this with a grain of salt. I'd say that we need to either commit a coherent version of this for Spark 3, or not at all. If it doesn't have support, I'd back out the existing changes. I was initially skeptical about how much this needs to be in Sp

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-16 Thread Sean Owen
I don't think we would want to cut 'branch-3.0' right now, which would imply that master is 3.1. We don't want to merge every new change into two branches. It may still be useful to have `branch-3.0-preview` as a short-lived branch just used to manage the preview release, as we will need to let dev
