Re: Automated formatting

2018-11-21 Thread DB Tsai
I like the idea of checking only the diff. Even I am sometimes confused about the right style in Spark since I am working on multiple projects with slightly different coding styles. On Wed, Nov 21, 2018 at 1:36 PM Sean Owen wrote: > I know the PR builder runs SBT, but I presume this would just b

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-21 Thread DB Tsai
+1 on removing Scala 2.11 support for 3.0 given Scala 2.11 is already EOL. On Tue, Nov 20, 2018 at 2:53 PM Sean Owen wrote: > PS: pull request at https://github.com/apache/spark/pull/23098 > Not going to merge it until there's clear agreement. > > On Tue, Nov 20, 2018 at 10:16 AM Ryan Blue wro

Re: Automated formatting

2018-11-21 Thread Sean Owen
I know the PR builder runs SBT, but I presume this would just be a separate mvn job that runs. If it doesn't take long and only checks the right diff, seems worth a shot. What's the invocation that Shane could add (after this change goes in)? On Wed, Nov 21, 2018 at 3:27 PM Cody Koeninger wrote: >

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
There's a mvn plugin (sbt as well, but it requires sbt 1.0+), so it should be runnable from the PR builder. Super basic example with a minimal config that's close to the current style guide here: https://github.com/apache/spark/compare/master...koeninger:scalafmt I imagine tracking down the corner cas

Re: Automated formatting

2018-11-21 Thread Sean Owen
Yeah fair, maybe mostly consistent in broad strokes but not in the details. Is this something that can be just run in the PR builder? If the rules are simple and not too hard to maintain, seems like a win. On Wed, Nov 21, 2018 at 2:26 PM Cody Koeninger wrote: > > Definitely not suggesting a mass r

Re: Scala lint failing unexpectedly

2018-11-21 Thread Shmuel Blitz
Hi Sean. Thanks for the very fast response. Absolutely. I'm on master, and I couldn't find any Await on line 269 either. That's what's so weird. Shmuel On Wed, Nov 21, 2018 at 10:21 PM Sean Owen wrote: > I don't see any of the CI builds failing like this. There's an > Await.result in the file

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
Definitely not suggesting a mass reformat, just on a per-PR basis. `scalafmt --diff` will reformat only the files that differ from git head; `scalafmt --test --diff` won't modify files, just throw an exception if they don't match format. I don't think code is consistently formatted now. I tried scalaf
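
The two invocations quoted in the message above could be wrapped for a PR-builder step roughly as follows. This is only a sketch: the helper names `scalafmt_command` and `run_format_check` are hypothetical, it assumes a `scalafmt` executable is on the PATH, and it assumes a format mismatch is reported through a non-zero exit status. Only the `--diff` and `--test` flags come from the thread itself.

```python
import shutil
import subprocess
from typing import List, Optional


def scalafmt_command(check_only: bool = True) -> List[str]:
    """Build the scalafmt invocation quoted in the thread (hypothetical helper)."""
    cmd = ["scalafmt"]
    if check_only:
        # --test: report files that do not match the format without rewriting them
        cmd.append("--test")
    # --diff: per the thread, restrict the run to files that differ in git
    cmd.append("--diff")
    return cmd


def run_format_check() -> Optional[int]:
    """Run the check-only command; return its exit code, or None if scalafmt is absent."""
    if shutil.which("scalafmt") is None:
        return None  # assumption: scalafmt is expected on PATH; skip when it is not
    return subprocess.run(scalafmt_command(check_only=True)).returncode


if __name__ == "__main__":
    # Show the command a CI job would run, without executing it.
    print(" ".join(scalafmt_command(check_only=True)))
```

Calling `scalafmt_command(check_only=False)` gives the reformatting variant, which a contributor might run locally before pushing, while the check-only variant is the one a PR builder would use.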

Re: Scala lint failing unexpectedly

2018-11-21 Thread Sean Owen
I don't see any of the CI builds failing like this. There's an Await.result in the file, but it's suppressed already, and I don't see it at line 269. I don't see an issue like this in recent branches either. You're sure you are working off, say, master, and/or you're looking at the code that it's

Scala lint failing unexpectedly

2018-11-21 Thread Shmuel Blitz
Hi, These are my first steps in building and testing Spark locally. After successfully building Spark locally, I ran `./build/run-tests`, which starts by running the linters. The Scala lint fails with: Scalastyle checks failed at following occurrences: ``` [error] /usr/dev/spark/core/src/main/sc

Re: Double pass over ORC data files even after supplying schema and setting inferSchema = false

2018-11-21 Thread Thakrar, Jayesh
Thank you for the quick reply, Dongjoon. This sounds interesting and it might be the resolution for our issue. Let me do some tests and I will update the thread. Thanks, Jayesh From: Dongjoon Hyun Date: Wednesday, November 21, 2018 at 11:46 AM To: "Thakrar, Jayesh" Cc: dev Subject: Re: Double pa

Re: Automated formatting

2018-11-21 Thread Sean Owen
I think reformatting the whole code base might be too much. If there are some more targeted cleanups, sure. We do have some links to style guides buried somewhere in the docs, although the conventions are pretty industry standard. I *think* the code is pretty consistently formatted now, and would

Re: Double pass over ORC data files even after supplying schema and setting inferSchema = false

2018-11-21 Thread Dongjoon Hyun
Hi, Thakrar. Which version are you using now? If it's below Spark 2.4.0, please try to use 2.4.0. There was an improvement related to that. https://issues.apache.org/jira/browse/SPARK-25126 Bests, Dongjoon. On Wed, Nov 21, 2018 at 6:17 AM Thakrar, Jayesh < jthak...@conversantmedia.com> wrote:

Automated formatting

2018-11-21 Thread Cody Koeninger
Is there any appetite for revisiting automating formatting? I know over the years various people have expressed opposition to it as unnecessary churn in diffs, but having every new contributor greeted with "nit: 4 space indentation for argument lists" isn't very welcoming.

Double pass over ORC data files even after supplying schema and setting inferSchema = false

2018-11-21 Thread Thakrar, Jayesh
Hi All, We have some batch processing where we read hundreds of thousands of ORC files. What I found is that this was taking too much time AND that there was a long pause between the point where the read begins in the code and the point where the executors get into action. That period is about 1.5+ hours where only the d

Re: Some PRs not automatically linked to JIRAs

2018-11-21 Thread Hyukjin Kwon
This issue is still persistent. https://issues.apache.org/jira/browse/SPARK-26132 https://issues.apache.org/jira/browse/SPARK-26129 https://issues.apache.org/jira/browse/SPARK-26127 https://issues.apache.org/jira/browse/SPARK-26109 https://issues.apache.org/jira/browse/SPARK-26106 https://issues.a