svn commit: r31018 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_20_57-4b7f7ef-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Wed Nov 21 05:10:40 2018
New Revision: 31018

Log:
Apache Spark 3.0.0-SNAPSHOT-2018_11_20_20_57-4b7f7ef docs


[This commit notification would consist of 1755 parts, which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r31015 - in /dev/spark/2.4.1-SNAPSHOT-2018_11_20_18_56-d8e05d2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Wed Nov 21 03:10:31 2018
New Revision: 31015

Log:
Apache Spark 2.4.1-SNAPSHOT-2018_11_20_18_56-d8e05d2 docs


[This commit notification would consist of 1476 parts, which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured Streaming R tests

Repository: spark
Updated Branches:
  refs/heads/branch-2.4 3bb9fff68 -> d8e05d23a


[SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured 
Streaming R tests

## What changes were proposed in this pull request?

Stop the streaming query in `Specify a schema by using a DDL-formatted string 
when reading` to avoid outputting annoying logs.
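
For context on the leak: a query started in a test keeps running, and logging, until it is explicitly stopped. A minimal Scala sketch of the discipline the R fix enforces (the `rate` source and `console` sink are illustrative choices, not from the patch):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("query-leak-demo").getOrCreate()

// Start a trivial streaming query, mirroring the read.stream/write.stream pair in the R test.
val query = spark.readStream.format("rate").load()
  .writeStream.format("console").start()

try {
  query.processAllAvailable()  // drain pending data, like callJMethod(q@ssq, "processAllAvailable")
} finally {
  query.stop()  // the Scala analogue of the stopQuery(q) line added below
}
```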

## How was this patch tested?

Jenkins

Closes #23089 from zsxwing/SPARK-26120.

Authored-by: Shixiong Zhu 
Signed-off-by: hyukjinkwon 
(cherry picked from commit 4b7f7ef5007c2c8a5090f22c6e08927e9f9a407b)
Signed-off-by: hyukjinkwon 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8e05d23
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d8e05d23
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d8e05d23

Branch: refs/heads/branch-2.4
Commit: d8e05d23a046eee559b0c71bcfba5b9809c3d9eb
Parents: 3bb9fff
Author: Shixiong Zhu 
Authored: Wed Nov 21 09:31:12 2018 +0800
Committer: hyukjinkwon 
Committed: Wed Nov 21 09:31:34 2018 +0800

--
 R/pkg/tests/fulltests/test_streaming.R | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d8e05d23/R/pkg/tests/fulltests/test_streaming.R
--
diff --git a/R/pkg/tests/fulltests/test_streaming.R b/R/pkg/tests/fulltests/test_streaming.R
index bfb1a04..6f0d2ae 100644
--- a/R/pkg/tests/fulltests/test_streaming.R
+++ b/R/pkg/tests/fulltests/test_streaming.R
@@ -127,6 +127,7 @@ test_that("Specify a schema by using a DDL-formatted string when reading", {
   expect_false(awaitTermination(q, 5 * 1000))
   callJMethod(q@ssq, "processAllAvailable")
   expect_equal(head(sql("SELECT count(*) FROM people3"))[[1]], 3)
+  stopQuery(q)
 
   expect_error(read.stream(path = parquetPath, schema = "name stri"),
"DataType stri is not supported.")





spark git commit: [SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured Streaming R tests

Repository: spark
Updated Branches:
  refs/heads/master 2df34db58 -> 4b7f7ef50


[SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured 
Streaming R tests

## What changes were proposed in this pull request?

Stop the streaming query in `Specify a schema by using a DDL-formatted string 
when reading` to avoid outputting annoying logs.

## How was this patch tested?

Jenkins

Closes #23089 from zsxwing/SPARK-26120.

Authored-by: Shixiong Zhu 
Signed-off-by: hyukjinkwon 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b7f7ef5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b7f7ef5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b7f7ef5

Branch: refs/heads/master
Commit: 4b7f7ef5007c2c8a5090f22c6e08927e9f9a407b
Parents: 2df34db
Author: Shixiong Zhu 
Authored: Wed Nov 21 09:31:12 2018 +0800
Committer: hyukjinkwon 
Committed: Wed Nov 21 09:31:12 2018 +0800

--
 R/pkg/tests/fulltests/test_streaming.R | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4b7f7ef5/R/pkg/tests/fulltests/test_streaming.R
--
diff --git a/R/pkg/tests/fulltests/test_streaming.R b/R/pkg/tests/fulltests/test_streaming.R
index bfb1a04..6f0d2ae 100644
--- a/R/pkg/tests/fulltests/test_streaming.R
+++ b/R/pkg/tests/fulltests/test_streaming.R
@@ -127,6 +127,7 @@ test_that("Specify a schema by using a DDL-formatted string when reading", {
   expect_false(awaitTermination(q, 5 * 1000))
   callJMethod(q@ssq, "processAllAvailable")
   expect_equal(head(sql("SELECT count(*) FROM people3"))[[1]], 3)
+  stopQuery(q)
 
   expect_error(read.stream(path = parquetPath, schema = "name stri"),
"DataType stri is not supported.")





spark git commit: [SPARK-26122][SQL] Support encoding for multiLine in CSV datasource

Repository: spark
Updated Branches:
  refs/heads/master 47851056c -> 2df34db58


[SPARK-26122][SQL] Support encoding for multiLine in CSV datasource

## What changes were proposed in this pull request?

In this PR, I propose passing the CSV option `encoding`/`charset` to the `uniVocity`
parser, so that CSV files in different encodings can be parsed when `multiLine` is
enabled. The value of the option is passed to the `beginParsing` method of
`CSVParser`.
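
As a hedged usage sketch (the path and charset below are made-up examples), the option pair this patch wires through to `beginParsing` looks like:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("csv-encoding").getOrCreate()

// With multiLine enabled, the whole file is handed to uniVocity together with
// the declared encoding, instead of being split into lines as UTF-8 first.
val df = spark.read
  .option("multiLine", true)
  .option("encoding", "cp1251")   // "charset" names the same option
  .option("header", true)
  .csv("/tmp/people-cp1251.csv")  // hypothetical input file
```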

## How was this patch tested?

Added a new test to `CSVSuite` covering different encodings with the header enabled and disabled.

Closes #23091 from MaxGekk/csv-miltiline-encoding.

Authored-by: Maxim Gekk 
Signed-off-by: hyukjinkwon 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2df34db5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2df34db5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2df34db5

Branch: refs/heads/master
Commit: 2df34db586bec379e40b5cf30021f5b7a2d79271
Parents: 4785105
Author: Maxim Gekk 
Authored: Wed Nov 21 09:29:22 2018 +0800
Committer: hyukjinkwon 
Committed: Wed Nov 21 09:29:22 2018 +0800

--
 .../sql/catalyst/csv/UnivocityParser.scala  | 12 ++-
 .../datasources/csv/CSVDataSource.scala |  6 --
 .../execution/datasources/csv/CSVSuite.scala| 21 
 3 files changed, 32 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2df34db5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
index 46ed58e..ed19693 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
@@ -271,11 +271,12 @@ private[sql] object UnivocityParser {
   def tokenizeStream(
   inputStream: InputStream,
   shouldDropHeader: Boolean,
-  tokenizer: CsvParser): Iterator[Array[String]] = {
+  tokenizer: CsvParser,
+  encoding: String): Iterator[Array[String]] = {
 val handleHeader: () => Unit =
   () => if (shouldDropHeader) tokenizer.parseNext
 
-convertStream(inputStream, tokenizer, handleHeader)(tokens => tokens)
+convertStream(inputStream, tokenizer, handleHeader, encoding)(tokens => tokens)
   }
 
   /**
@@ -297,7 +298,7 @@ private[sql] object UnivocityParser {
 val handleHeader: () => Unit =
   () => headerChecker.checkHeaderColumnNames(tokenizer)
 
-convertStream(inputStream, tokenizer, handleHeader) { tokens =>
+convertStream(inputStream, tokenizer, handleHeader, parser.options.charset) { tokens =>
   safeParser.parse(tokens)
 }.flatten
   }
@@ -305,9 +306,10 @@ private[sql] object UnivocityParser {
   private def convertStream[T](
   inputStream: InputStream,
   tokenizer: CsvParser,
-  handleHeader: () => Unit)(
+  handleHeader: () => Unit,
+  encoding: String)(
   convert: Array[String] => T) = new Iterator[T] {
-tokenizer.beginParsing(inputStream)
+tokenizer.beginParsing(inputStream, encoding)
 
 // We can handle header here since here the stream is open.
 handleHeader()

http://git-wip-us.apache.org/repos/asf/spark/blob/2df34db5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
index 4808e8e..554baaf 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
@@ -192,7 +192,8 @@ object MultiLineCSVDataSource extends CSVDataSource {
   UnivocityParser.tokenizeStream(
 CodecStreams.createInputStreamWithCloseResource(lines.getConfiguration, path),
 shouldDropHeader = false,
-new CsvParser(parsedOptions.asParserSettings))
+new CsvParser(parsedOptions.asParserSettings),
+encoding = parsedOptions.charset)
 }.take(1).headOption match {
   case Some(firstRow) =>
 val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
@@ -203,7 +204,8 @@ object MultiLineCSVDataSource extends CSVDataSource {
   lines.getConfiguration,
   new Path(lines.getPath())),
 parsedOptions.headerFlag,
-new 

svn commit: r31014 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_16_52-4785105-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Wed Nov 21 01:05:38 2018
New Revision: 31014

Log:
Apache Spark 3.0.0-SNAPSHOT-2018_11_20_16_52-4785105 docs


[This commit notification would consist of 1755 parts, which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [SPARK-26124][BUILD] Update plugins to latest versions

Repository: spark
Updated Branches:
  refs/heads/master 23bcd6ce4 -> 47851056c


[SPARK-26124][BUILD] Update plugins to latest versions

## What changes were proposed in this pull request?

Update many of the plugins we use to their latest versions, especially MiMa, which
entails excluding some new errors it reports on old changes.
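
The `project/MimaExcludes.scala` hunk below is truncated, so here is a hedged illustration of the kind of exclude rule a MiMa upgrade typically forces (the class and member names are hypothetical, not taken from this commit):

```scala
import com.typesafe.tools.mima.core._

// MimaExcludes.scala style: silence a binary-compatibility report for an
// intentional change, keyed by the fully qualified member name.
lazy val extraExcludes = Seq(
  ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.removedMethod")
)
```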

## How was this patch tested?

N/A

Closes #23087 from srowen/Plugins.

Authored-by: Sean Owen 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47851056
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47851056
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47851056

Branch: refs/heads/master
Commit: 47851056c20c5d981b1ca66bac3f00c19a882727
Parents: 23bcd6c
Author: Sean Owen 
Authored: Tue Nov 20 18:05:39 2018 -0600
Committer: Sean Owen 
Committed: Tue Nov 20 18:05:39 2018 -0600

--
 pom.xml| 40 
 project/MimaExcludes.scala | 10 +-
 project/plugins.sbt| 14 +++---
 3 files changed, 40 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/47851056/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 9130773..08a29d2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1977,7 +1977,7 @@
 
   org.apache.maven.plugins
   maven-enforcer-plugin
-  3.0.0-M1
+  3.0.0-M2
   
 
   enforce-versions
@@ -2077,7 +2077,7 @@
 
   org.apache.maven.plugins
   maven-compiler-plugin
-  3.7.0
+  3.8.0
   
 ${java.version}
 ${java.version}
@@ -2094,7 +2094,7 @@
 
   org.apache.maven.plugins
   maven-surefire-plugin
-  2.22.0
+  3.0.0-M1
   
   
 
@@ -2148,7 +2148,7 @@
 
   org.scalatest
   scalatest-maven-plugin
-  1.0
+  2.0.0
   
   
 
${project.build.directory}/surefire-reports
@@ -2195,7 +2195,7 @@
 
   org.apache.maven.plugins
   maven-jar-plugin
-  3.0.2
+  3.1.0
 
 
   org.apache.maven.plugins
@@ -2222,7 +2222,7 @@
 
   org.apache.maven.plugins
   maven-clean-plugin
-  3.0.0
+  3.1.0
   
 
   
@@ -2240,9 +2240,12 @@
 
   org.apache.maven.plugins
   maven-javadoc-plugin
-  3.0.0-M1
+  3.0.1
   
--Xdoclint:all -Xdoclint:-missing
+
+  -Xdoclint:all
+  -Xdoclint:-missing
+
 
   
 example
@@ -2293,7 +2296,7 @@
 
   org.apache.maven.plugins
   maven-shade-plugin
-  3.2.0
+  3.2.1
   
 
   org.ow2.asm
@@ -2310,12 +2313,12 @@
 
   org.apache.maven.plugins
   maven-install-plugin
-  2.5.2
+  3.0.0-M1
 
 
   org.apache.maven.plugins
   maven-deploy-plugin
-  2.8.2
+  3.0.0-M1
 
 
   org.apache.maven.plugins
@@ -2361,7 +2364,7 @@
   
 org.apache.maven.plugins
 maven-jar-plugin
-[2.6,)
+3.1.0
 
   test-jar
 
@@ -2518,12 +2521,17 @@
   
 org.apache.maven.plugins
 maven-checkstyle-plugin
-2.17
+3.0.0
 
   false
   true
-  
${basedir}/src/main/java,${basedir}/src/main/scala
-  ${basedir}/src/test/java
+  
+${basedir}/src/main/java
+${basedir}/src/main/scala
+  
+  
+${basedir}/src/test/java
+  
   dev/checkstyle.xml
   ${basedir}/target/checkstyle-output.xml
   ${project.build.sourceEncoding}
@@ -2533,7 +2541,7 @@
   
 com.puppycrawl.tools
 checkstyle
-8.2
+8.14
   
 
 

http://git-wip-us.apache.org/repos/asf/spark/blob/47851056/project/MimaExcludes.scala
--
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index e35e74a..b750535 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -36,7 +36,15 @@ object MimaExcludes {
 
   // Exclude rules for 3.0.x
   lazy val v30excludes = v24excludes ++ Seq(
-// [SPARK-26090] Resolve most miscellaneous deprecation and build warnings 
for 

spark git commit: [SPARK-26043][HOTFIX] Hotfix a change to SparkHadoopUtil that doesn't work in 2.11

Repository: spark
Updated Branches:
  refs/heads/master 42c48387c -> 23bcd6ce4


[SPARK-26043][HOTFIX] Hotfix a change to SparkHadoopUtil that doesn't work in 
2.11

## What changes were proposed in this pull request?

Hotfix a change to `SparkHadoopUtil` that doesn't work in Scala 2.11: the lambda passed to `Arrays.sort` relies on SAM conversion, which 2.11 does not perform, so it is replaced with an explicit `Comparator`.
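
The incompatibility is the SAM conversion that Scala 2.12 performs and 2.11 does not; a small standalone sketch of the two spellings (my example, not the patch itself):

```scala
import java.util.{Arrays, Comparator}

val files: Array[java.lang.Long] = Array(3L, 1L, 2L).map(Long.box)

// Scala 2.12 turns a lambda into a java.util.Comparator automatically, so
//   Arrays.sort(files, (a: java.lang.Long, b: java.lang.Long) => a.compareTo(b))
// compiles there but not under 2.11. The 2.11-safe spelling, which this
// hotfix adopts in SparkHadoopUtil, is an explicit anonymous class:
Arrays.sort(files, new Comparator[java.lang.Long] {
  override def compare(a: java.lang.Long, b: java.lang.Long): Int = a.compareTo(b)
})
```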

## How was this patch tested?

Existing tests.

Closes #23097 from srowen/SPARK-26043.2.

Authored-by: Sean Owen 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/23bcd6ce
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/23bcd6ce
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/23bcd6ce

Branch: refs/heads/master
Commit: 23bcd6ce458f1e49f307c89ca2794dc9a173077c
Parents: 42c4838
Author: Sean Owen 
Authored: Tue Nov 20 18:03:54 2018 -0600
Committer: Sean Owen 
Committed: Tue Nov 20 18:03:54 2018 -0600

--
 .../scala/org/apache/spark/deploy/SparkHadoopUtil.scala | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/23bcd6ce/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
index 217e514..7bb2a41 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -20,7 +20,7 @@ package org.apache.spark.deploy
 import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, File, IOException}
 import java.security.PrivilegedExceptionAction
 import java.text.DateFormat
-import java.util.{Arrays, Date, Locale}
+import java.util.{Arrays, Comparator, Date, Locale}
 
 import scala.collection.JavaConverters._
 import scala.collection.immutable.Map
@@ -269,10 +269,11 @@ private[spark] class SparkHadoopUtil extends Logging {
 name.startsWith(prefix) && !name.endsWith(exclusionSuffix)
   }
 })
-  Arrays.sort(fileStatuses,
-(o1: FileStatus, o2: FileStatus) => {
+  Arrays.sort(fileStatuses, new Comparator[FileStatus] {
+override def compare(o1: FileStatus, o2: FileStatus): Int = {
   Longs.compare(o1.getModificationTime, o2.getModificationTime)
-})
+}
+  })
   fileStatuses
 } catch {
   case NonFatal(e) =>





svn commit: r31013 - in /dev/spark/2.4.1-SNAPSHOT-2018_11_20_14_51-3bb9fff-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Tue Nov 20 23:07:56 2018
New Revision: 31013

Log:
Apache Spark 2.4.1-SNAPSHOT-2018_11_20_14_51-3bb9fff docs


[This commit notification would consist of 1476 parts, which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r31012 - in /dev/spark/2.3.3-SNAPSHOT-2018_11_20_14_51-0fb830c-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Tue Nov 20 23:06:40 2018
New Revision: 31012

Log:
Apache Spark 2.3.3-SNAPSHOT-2018_11_20_14_51-0fb830c docs


[This commit notification would consist of 1443 parts, which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r31009 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_12_48-42c4838-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Tue Nov 20 21:00:38 2018
New Revision: 31009

Log:
Apache Spark 3.0.0-SNAPSHOT-2018_11_20_12_48-42c4838 docs


[This commit notification would consist of 1755 parts, which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [BUILD] refactor dev/lint-python into something readable

Repository: spark
Updated Branches:
  refs/heads/master db136d360 -> 42c48387c


[BUILD] refactor dev/lint-python into something readable

## What changes were proposed in this pull request?

`dev/lint-python` is a mess of nearly unreadable bash. I would like to fix that as best as I can.

## How was this patch tested?

The build system will test this.

Closes #22994 from shaneknapp/lint-python-refactor.

Authored-by: shane knapp 
Signed-off-by: shane knapp 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/42c48387
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/42c48387
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/42c48387

Branch: refs/heads/master
Commit: 42c48387c047d96154bcfeb95fcb816a43e60d7c
Parents: db136d3
Author: shane knapp 
Authored: Tue Nov 20 12:38:40 2018 -0800
Committer: shane knapp 
Committed: Tue Nov 20 12:38:40 2018 -0800

--
 dev/lint-python | 359 +++
 1 file changed, 220 insertions(+), 139 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/42c48387/dev/lint-python
--
diff --git a/dev/lint-python b/dev/lint-python
index 27d87f6..0681693 100755
--- a/dev/lint-python
+++ b/dev/lint-python
@@ -1,5 +1,4 @@
 #!/usr/bin/env bash
-
 #
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
@@ -16,160 +15,242 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+# define test binaries + versions
+PYDOCSTYLE_BUILD="pydocstyle"
+MINIMUM_PYDOCSTYLE="3.0.0"
 
-SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
-SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-# Exclude auto-generated configuration file.
-PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" )"
-DOC_PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" | grep -vF 'functions.py' )"
-PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
-PYDOCSTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-report.txt"
-PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt"
-PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt"
-
-PYDOCSTYLEBUILD="pydocstyle"
-MINIMUM_PYDOCSTYLEVERSION="3.0.0"
-
-FLAKE8BUILD="flake8"
+FLAKE8_BUILD="flake8"
 MINIMUM_FLAKE8="3.5.0"
 
-SPHINXBUILD=${SPHINXBUILD:=sphinx-build}
-SPHINX_REPORT_PATH="$SPARK_ROOT_DIR/dev/sphinx-report.txt"
+PYCODESTYLE_BUILD="pycodestyle"
+MINIMUM_PYCODESTYLE="2.4.0"
 
-cd "$SPARK_ROOT_DIR"
+SPHINX_BUILD="sphinx-build"
 
-# compileall: https://docs.python.org/2/library/compileall.html
-python -B -m compileall -q -l $PATHS_TO_CHECK > "$PYCODESTYLE_REPORT_PATH"
-compile_status="${PIPESTATUS[0]}"
+function compile_python_test {
+local COMPILE_STATUS=
+local COMPILE_REPORT=
+
+if [[ ! "$1" ]]; then
+echo "No python files found!  Something is very wrong -- exiting."
+exit 1;
+fi
 
-# Get pycodestyle at runtime so that we don't rely on it being installed on the build server.
-# See: https://github.com/apache/spark/pull/1744#issuecomment-50982162
-# Updated to the latest official version of pep8. pep8 is formally renamed to pycodestyle.
-PYCODESTYLE_VERSION="2.4.0"
-PYCODESTYLE_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-$PYCODESTYLE_VERSION.py"
-PYCODESTYLE_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/PyCQA/pycodestyle/$PYCODESTYLE_VERSION/pycodestyle.py"
+# compileall: https://docs.python.org/2/library/compileall.html
+echo "starting python compilation test..."
+COMPILE_REPORT=$( (python -B -mcompileall -q -l $1) 2>&1)
+COMPILE_STATUS=$?
+
+if [ $COMPILE_STATUS -ne 0 ]; then
+echo "Python compilation failed with the following errors:"
+echo "$COMPILE_REPORT"
+echo "$COMPILE_STATUS"
+exit "$COMPILE_STATUS"
+else
+echo "python compilation succeeded."
+echo
+fi
+}
 
-if [ ! -e "$PYCODESTYLE_SCRIPT_PATH" ]; then
-curl --silent -o "$PYCODESTYLE_SCRIPT_PATH" "$PYCODESTYLE_SCRIPT_REMOTE_PATH"
-curl_status="$?"
+function pycodestyle_test {
+local PYCODESTYLE_STATUS=
+local PYCODESTYLE_REPORT=
+local RUN_LOCAL_PYCODESTYLE=
+local VERSION=
+local EXPECTED_PYCODESTYLE=
+local PYCODESTYLE_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-$MINIMUM_PYCODESTYLE.py"
+local PYCODESTYLE_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/PyCQA/pycodestyle/$MINIMUM_PYCODESTYLE/pycodestyle.py"
 
-if [ "$curl_status" -ne 0 ]; then
-echo "Failed to download pycodestyle.py from \"$PYCODESTYLE_SCRIPT_REMOTE_PATH\"."
-exit "$curl_status"
+if [[ ! "$1" ]]; then
+echo "No python files found!  Something is 

spark git commit: [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

Repository: spark
Updated Branches:
  refs/heads/branch-2.3 90e4dd1cb -> 0fb830c49


[SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

## What changes were proposed in this pull request?

This PR fixes an exception in `AggregateExpression.references` called on 
unresolved expressions. It implements the solution proposed in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084), a minor 
refactoring that removes the unnecessary dependence on `AttributeSet.toSeq`, 
which requires expression IDs and, therefore, can only execute successfully for 
resolved expressions.

The refactored implementation is both simpler and faster, eliminating the 
conversion of a `Set` to a
`Seq` and back to `Set`.
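
A minimal sketch of the failing call, reconstructed from the description (the new test file is truncated below, so the exact assertions in the patch may differ):

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.{Add, AttributeSet}
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Sum}

// x and y are unresolved, so they carry no expression IDs yet.
val x = UnresolvedAttribute("x")
val y = UnresolvedAttribute("y")
val agg = AggregateExpression(Sum(Add(x, y)), Complete, isDistinct = false)

// Before the fix, references went through AttributeSet.toSeq, which needs
// expression IDs and threw here; now the child references are returned directly.
assert(agg.references == AttributeSet(x :: y :: Nil))
```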

## How was this patch tested?

Added a new test based on the failing case in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084).

hvanhovell

Closes #23075 from ssimeonov/ss_SPARK-26084.

Authored-by: Simeon Simeonov 
Signed-off-by: Herman van Hovell 
(cherry picked from commit db136d360e54e13f1d7071a0428964a202cf7e31)
Signed-off-by: Herman van Hovell 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0fb830c4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0fb830c4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0fb830c4

Branch: refs/heads/branch-2.3
Commit: 0fb830c49a09d292249496fc379d130e7097526e
Parents: 90e4dd1
Author: Simeon Simeonov 
Authored: Tue Nov 20 21:29:56 2018 +0100
Committer: Herman van Hovell 
Committed: Tue Nov 20 21:31:39 2018 +0100

--
 .../expressions/aggregate/interfaces.scala  |  8 ++---
 .../aggregate/AggregateExpressionSuite.scala| 34 
 2 files changed, 37 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0fb830c4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
index e1d16a2..56c2ee6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
@@ -128,12 +128,10 @@ case class AggregateExpression(
   override def nullable: Boolean = aggregateFunction.nullable
 
   override def references: AttributeSet = {
-val childReferences = mode match {
-  case Partial | Complete => aggregateFunction.references.toSeq
-  case PartialMerge | Final => aggregateFunction.aggBufferAttributes
+mode match {
+  case Partial | Complete => aggregateFunction.references
+  case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes)
 }
-
-AttributeSet(childReferences)
   }
 
   override def toString: String = {

http://git-wip-us.apache.org/repos/asf/spark/blob/0fb830c4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
--
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
new file mode 100644
index 0000000..8e9c997
--- /dev/null
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+import org.apache.spark.sql.catalyst.expressions.{Add, AttributeSet}
+
+class AggregateExpressionSuite extends 

spark git commit: [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

Repository: spark
Updated Branches:
  refs/heads/branch-2.4 c28a27a25 -> 3bb9fff68


[SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

## What changes were proposed in this pull request?

This PR fixes an exception in `AggregateExpression.references` called on 
unresolved expressions. It implements the solution proposed in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084), a minor 
refactoring that removes the unnecessary dependence on `AttributeSet.toSeq`, 
which requires expression IDs and, therefore, can only execute successfully for 
resolved expressions.

The refactored implementation is both simpler and faster, eliminating the 
conversion of a `Set` to a
`Seq` and back to `Set`.

## How was this patch tested?

Added a new test based on the failing case in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084).

hvanhovell

Closes #23075 from ssimeonov/ss_SPARK-26084.

Authored-by: Simeon Simeonov 
Signed-off-by: Herman van Hovell 
(cherry picked from commit db136d360e54e13f1d7071a0428964a202cf7e31)
Signed-off-by: Herman van Hovell 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3bb9fff6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3bb9fff6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3bb9fff6

Branch: refs/heads/branch-2.4
Commit: 3bb9fff687a1701b75552bae6a4f8bee3fa6460b
Parents: c28a27a
Author: Simeon Simeonov 
Authored: Tue Nov 20 21:29:56 2018 +0100
Committer: Herman van Hovell 
Committed: Tue Nov 20 21:31:11 2018 +0100

--
 .../expressions/aggregate/interfaces.scala  |  8 ++---
 .../aggregate/AggregateExpressionSuite.scala| 34 
 2 files changed, 37 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3bb9fff6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
index e1d16a2..56c2ee6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
@@ -128,12 +128,10 @@ case class AggregateExpression(
   override def nullable: Boolean = aggregateFunction.nullable
 
   override def references: AttributeSet = {
-val childReferences = mode match {
-  case Partial | Complete => aggregateFunction.references.toSeq
-  case PartialMerge | Final => aggregateFunction.aggBufferAttributes
+mode match {
+  case Partial | Complete => aggregateFunction.references
+  case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes)
 }
-
-AttributeSet(childReferences)
   }
 
   override def toString: String = {

http://git-wip-us.apache.org/repos/asf/spark/blob/3bb9fff6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
--
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
new file mode 100644
index 0000000..8e9c997
--- /dev/null
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+import org.apache.spark.sql.catalyst.expressions.{Add, AttributeSet}
+
+class AggregateExpressionSuite extends 

spark git commit: [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

Repository: spark
Updated Branches:
  refs/heads/master ab61ddb34 -> db136d360


[SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

## What changes were proposed in this pull request?

This PR fixes an exception in `AggregateExpression.references` called on 
unresolved expressions. It implements the solution proposed in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084), a minor 
refactoring that removes the unnecessary dependence on `AttributeSet.toSeq`, 
which requires expression IDs and, therefore, can only execute successfully for 
resolved expressions.

The refactored implementation is both simpler and faster, eliminating the 
conversion of a `Set` to a
`Seq` and back to `Set`.

## How was this patch tested?

Added a new test based on the failing case in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084).

hvanhovell

Closes #23075 from ssimeonov/ss_SPARK-26084.

Authored-by: Simeon Simeonov 
Signed-off-by: Herman van Hovell 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/db136d36
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/db136d36
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/db136d36

Branch: refs/heads/master
Commit: db136d360e54e13f1d7071a0428964a202cf7e31
Parents: ab61ddb
Author: Simeon Simeonov 
Authored: Tue Nov 20 21:29:56 2018 +0100
Committer: Herman van Hovell 
Committed: Tue Nov 20 21:29:56 2018 +0100

--
 .../expressions/aggregate/interfaces.scala  |  8 ++---
 .../aggregate/AggregateExpressionSuite.scala| 34 
 2 files changed, 37 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/db136d36/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
index e1d16a2..56c2ee6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
@@ -128,12 +128,10 @@ case class AggregateExpression(
   override def nullable: Boolean = aggregateFunction.nullable
 
   override def references: AttributeSet = {
-val childReferences = mode match {
-  case Partial | Complete => aggregateFunction.references.toSeq
-  case PartialMerge | Final => aggregateFunction.aggBufferAttributes
+mode match {
+  case Partial | Complete => aggregateFunction.references
+  case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes)
 }
-
-AttributeSet(childReferences)
   }
 
   override def toString: String = {

http://git-wip-us.apache.org/repos/asf/spark/blob/db136d36/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
--
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
new file mode 100644
index 0000000..8e9c997
--- /dev/null
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+import org.apache.spark.sql.catalyst.expressions.{Add, AttributeSet}
+
+class AggregateExpressionSuite extends SparkFunSuite {
+
+  test("test references from unresolved aggregate functions") {
+val x = 

svn commit: r31003 - in /dev/spark/2.4.1-SNAPSHOT-2018_11_20_10_44-c28a27a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Tue Nov 20 18:58:34 2018
New Revision: 31003

Log:
Apache Spark 2.4.1-SNAPSHOT-2018_11_20_10_44-c28a27a docs


[This commit notification would consist of 1476 parts, which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r31001 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_08_39-ab61ddb-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

Author: pwendell
Date: Tue Nov 20 16:51:54 2018
New Revision: 31001

Log:
Apache Spark 3.0.0-SNAPSHOT-2018_11_20_08_39-ab61ddb docs


[This commit notification would consist of 1755 parts, which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP requestHeaderSize

Repository: spark
Updated Branches:
  refs/heads/master c34c42234 -> ab61ddb34


[SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP 
requestHeaderSize

## What changes were proposed in this pull request?

Introducing `spark.ui.requestHeaderSize` for configuring Jetty's HTTP requestHeaderSize.
This way a long authorization field no longer leads to HTTP 413.
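
A hedged example of setting the new key programmatically (the value here is arbitrary; the default is `8k`, and the string is parsed as bytes):

```scala
import org.apache.spark.SparkConf

// Raise Jetty's request-header limit for the Spark UI / history server so
// requests carrying long Authorization headers are not rejected.
val conf = new SparkConf().set("spark.ui.requestHeaderSize", "16k")
```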

## How was this patch tested?

Manually with curl (version 7.55 or later is required).

With the original default value (8k limit):

```bash
# Starting history server with default requestHeaderSize
$ ./sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to 
/Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out

# Creating huge header
$ echo -n "X-Custom-Header: " > cookie
$ printf 'A%.0s' {1..9500} >> cookie

# HTTP GET with huge header fails with 431
$ curl -H @cookie http://458apiros-MBP.lan:18080/
Bad Message 431
reason: Request Header Fields Too Large

# The log contains the error
$ tail -1 
/Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out
18/11/19 21:24:28 WARN HttpParser: Header is too large 8193>8192
```

After:

```bash
# Creating the history properties file with the increased requestHeaderSize
$ echo spark.ui.requestHeaderSize=10000 > history.properties

# Starting Spark History Server with the settings
$ ./sbin/start-history-server.sh --properties-file history.properties
starting org.apache.spark.deploy.history.HistoryServer, logging to 
/Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out

# HTTP GET with huge header gives back HTML5 (I have added here only just a 
part of the response)
$ curl -H @cookie http://458apiros-MBP.lan:18080/
  ...
  History Server
  ...
```

Closes #23090 from attilapiros/JettyHeaderSize.

Authored-by: “attilapiros” 
Signed-off-by: Imran Rashid 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ab61ddb3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ab61ddb3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ab61ddb3

Branch: refs/heads/master
Commit: ab61ddb34d58ab5701191c8fd3a24a62f6ebf37b
Parents: c34c422
Author: “attilapiros” 
Authored: Tue Nov 20 08:56:22 2018 -0600
Committer: Imran Rashid 
Committed: Tue Nov 20 08:56:22 2018 -0600

--
 .../scala/org/apache/spark/internal/config/package.scala | 6 ++
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 6 --
 docs/configuration.md| 8 
 3 files changed, 18 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ab61ddb3/core/src/main/scala/org/apache/spark/internal/config/package.scala
--
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index ab2b872..9cc48f6 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -570,6 +570,12 @@ package object config {
   .stringConf
   .createOptional
 
+  private[spark] val UI_REQUEST_HEADER_SIZE =
+ConfigBuilder("spark.ui.requestHeaderSize")
+  .doc("Value for HTTP request header size in bytes.")
+  .bytesConf(ByteUnit.BYTE)
+  .createWithDefaultString("8k")
+
   private[spark] val EXTRA_LISTENERS = ConfigBuilder("spark.extraListeners")
 .doc("Class names of listeners to add to SparkContext during initialization.")
 .stringConf

http://git-wip-us.apache.org/repos/asf/spark/blob/ab61ddb3/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 52a9551..316af9b 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -356,13 +356,15 @@ private[spark] object JettyUtils extends Logging {
 
 (connector, connector.getLocalPort())
   }
+  val httpConfig = new HttpConfiguration()
+  httpConfig.setRequestHeaderSize(conf.get(UI_REQUEST_HEADER_SIZE).toInt)
 
   // If SSL is configured, create the secure connector first.
   val securePort = sslOptions.createJettySslContextFactory().map { factory =>
 val securePort = sslOptions.port.getOrElse(if (port > 0) Utils.userPort(port, 400) else 0)
 

spark git commit: [SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP requestHeaderSize

Repository: spark
Updated Branches:
  refs/heads/branch-2.4 096e0d8f0 -> c28a27a25


[SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP 
requestHeaderSize

## What changes were proposed in this pull request?

Introducing `spark.ui.requestHeaderSize` for configuring Jetty's HTTP requestHeaderSize.
This way a long authorization field no longer leads to HTTP 413.

## How was this patch tested?

Manually with curl (version 7.55 or later is required).

With the original default value (8k limit):

```bash
# Starting history server with default requestHeaderSize
$ ./sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to 
/Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out

# Creating huge header
$ echo -n "X-Custom-Header: " > cookie
$ printf 'A%.0s' {1..9500} >> cookie

# HTTP GET with huge header fails with 431
$ curl -H @cookie http://458apiros-MBP.lan:18080/
Bad Message 431
reason: Request Header Fields Too Large

# The log contains the error
$ tail -1 
/Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out
18/11/19 21:24:28 WARN HttpParser: Header is too large 8193>8192
```

After:

```bash
# Creating the history properties file with the increased requestHeaderSize
$ echo spark.ui.requestHeaderSize=10000 > history.properties

# Starting Spark History Server with the settings
$ ./sbin/start-history-server.sh --properties-file history.properties
starting org.apache.spark.deploy.history.HistoryServer, logging to 
/Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out

# HTTP GET with huge header gives back HTML5 (I have added here only just a 
part of the response)
$ curl -H @cookie http://458apiros-MBP.lan:18080/
  ...
  History Server
  ...
```

Closes #23090 from attilapiros/JettyHeaderSize.

Authored-by: “attilapiros” 
Signed-off-by: Imran Rashid 
(cherry picked from commit ab61ddb34d58ab5701191c8fd3a24a62f6ebf37b)
Signed-off-by: Imran Rashid 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c28a27a2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c28a27a2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c28a27a2

Branch: refs/heads/branch-2.4
Commit: c28a27a2546ebbe0c001662126625638fcbb1100
Parents: 096e0d8
Author: “attilapiros” 
Authored: Tue Nov 20 08:56:22 2018 -0600
Committer: Imran Rashid 
Committed: Tue Nov 20 08:56:39 2018 -0600

--
 .../scala/org/apache/spark/internal/config/package.scala | 6 ++
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 6 --
 docs/configuration.md| 8 
 3 files changed, 18 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c28a27a2/core/src/main/scala/org/apache/spark/internal/config/package.scala
--
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index bde0995..3b3c45f 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -528,6 +528,12 @@ package object config {
   .stringConf
   .createOptional
 
+  private[spark] val UI_REQUEST_HEADER_SIZE =
+ConfigBuilder("spark.ui.requestHeaderSize")
+  .doc("Value for HTTP request header size in bytes.")
+  .bytesConf(ByteUnit.BYTE)
+  .createWithDefaultString("8k")
+
   private[spark] val EXTRA_LISTENERS = ConfigBuilder("spark.extraListeners")
 .doc("Class names of listeners to add to SparkContext during initialization.")
 .stringConf

http://git-wip-us.apache.org/repos/asf/spark/blob/c28a27a2/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 52a9551..316af9b 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -356,13 +356,15 @@ private[spark] object JettyUtils extends Logging {
 
 (connector, connector.getLocalPort())
   }
+  val httpConfig = new HttpConfiguration()
+  httpConfig.setRequestHeaderSize(conf.get(UI_REQUEST_HEADER_SIZE).toInt)
 
   // If SSL is configured, create the secure connector first.
   val securePort = sslOptions.createJettySslContextFactory().map { factory =>

spark git commit: [SPARK-26076][BUILD][MINOR] Revise ambiguous error message from load-spark-env.sh

Repository: spark
Updated Branches:
  refs/heads/master a00aaf649 -> c34c42234


[SPARK-26076][BUILD][MINOR] Revise ambiguous error message from 
load-spark-env.sh

## What changes were proposed in this pull request?

When I tried to run scripts (e.g. `start-master.sh`/`start-history-server.sh`) on the latest master, I got this error:
```
Presence of build for multiple Scala versions detected.
Either clean one of them or, export SPARK_SCALA_VERSION in spark-env.sh.
```

The error message is quite confusing. Without reading `load-spark-env.sh`, I didn't
know which directory to remove, or where to find and edit `spark-env.sh`.

This PR makes the error message clearer. It also restructures the script so it needs
less maintenance when we add or drop Scala versions in the future.
With https://github.com/apache/spark/pull/22967 in place, the error message can be
revised as follows (in my local setup):

```
Presence of build for multiple Scala versions detected 
(/Users/gengliangwang/IdeaProjects/spark/assembly/target/scala-2.12 and 
/Users/gengliangwang/IdeaProjects/spark/assembly/target/scala-2.11).
Remove one of them or, export SPARK_SCALA_VERSION=2.12 in 
/Users/gengliangwang/IdeaProjects/spark/conf/spark-env.sh.
Visit 
https://spark.apache.org/docs/latest/configuration.html#environment-variables 
for more details about setting environment variables in spark-env.sh.
```

## How was this patch tested?

Manual test

Closes #23049 from gengliangwang/reviseEnvScript.

Authored-by: Gengliang Wang 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c34c4223
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c34c4223
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c34c4223

Branch: refs/heads/master
Commit: c34c42234f308872ebe9c7cdaee32000c0726eea
Parents: a00aaf6
Author: Gengliang Wang 
Authored: Tue Nov 20 08:29:59 2018 -0600
Committer: Sean Owen 
Committed: Tue Nov 20 08:29:59 2018 -0600

--
 bin/load-spark-env.sh | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c34c4223/bin/load-spark-env.sh
--
diff --git a/bin/load-spark-env.sh b/bin/load-spark-env.sh
index 0b5006d..0ada5d8 100644
--- a/bin/load-spark-env.sh
+++ b/bin/load-spark-env.sh
@@ -26,15 +26,17 @@ if [ -z "${SPARK_HOME}" ]; then
   source "$(dirname "$0")"/find-spark-home
 fi
 
+SPARK_ENV_SH="spark-env.sh"
 if [ -z "$SPARK_ENV_LOADED" ]; then
   export SPARK_ENV_LOADED=1
 
   export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"${SPARK_HOME}"/conf}"
 
-  if [ -f "${SPARK_CONF_DIR}/spark-env.sh" ]; then
+  SPARK_ENV_SH="${SPARK_CONF_DIR}/${SPARK_ENV_SH}"
+  if [[ -f "${SPARK_ENV_SH}" ]]; then
 # Promote all variable declarations to environment (exported) variables
 set -a
-. "${SPARK_CONF_DIR}/spark-env.sh"
+. ${SPARK_ENV_SH}
 set +a
   fi
 fi
@@ -42,19 +44,22 @@ fi
 # Setting SPARK_SCALA_VERSION if not already set.
 
 if [ -z "$SPARK_SCALA_VERSION" ]; then
+  SCALA_VERSION_1=2.12
+  SCALA_VERSION_2=2.11
 
-  ASSEMBLY_DIR2="${SPARK_HOME}/assembly/target/scala-2.11"
-  ASSEMBLY_DIR1="${SPARK_HOME}/assembly/target/scala-2.12"
-
-  if [[ -d "$ASSEMBLY_DIR2" && -d "$ASSEMBLY_DIR1" ]]; then
-echo -e "Presence of build for multiple Scala versions detected." 1>&2
-echo -e 'Either clean one of them or, export SPARK_SCALA_VERSION in spark-env.sh.' 1>&2
+  ASSEMBLY_DIR_1="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_1}"
+  ASSEMBLY_DIR_2="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_2}"
+  ENV_VARIABLE_DOC="https://spark.apache.org/docs/latest/configuration.html#environment-variables"
+  if [[ -d "$ASSEMBLY_DIR_1" && -d "$ASSEMBLY_DIR_2" ]]; then
+echo "Presence of build for multiple Scala versions detected ($ASSEMBLY_DIR_1 and $ASSEMBLY_DIR_2)." 1>&2
+echo "Remove one of them or, export SPARK_SCALA_VERSION=$SCALA_VERSION_1 in ${SPARK_ENV_SH}." 1>&2
+echo "Visit ${ENV_VARIABLE_DOC} for more details about setting environment variables in spark-env.sh." 1>&2
 exit 1
   fi
 
-  if [ -d "$ASSEMBLY_DIR2" ]; then
-export SPARK_SCALA_VERSION="2.11"
+  if [[ -d "$ASSEMBLY_DIR_1" ]]; then
+export SPARK_SCALA_VERSION=${SCALA_VERSION_1}
   else
-export SPARK_SCALA_VERSION="2.12"
+export SPARK_SCALA_VERSION=${SCALA_VERSION_2}
   fi
 fi





spark git commit: [MINOR][YARN] Make memLimitExceededLogMessage cleaner

Repository: spark
Updated Branches:
  refs/heads/master a09d5ba88 -> a00aaf649


[MINOR][YARN] Make memLimitExceededLogMessage cleaner

## What changes were proposed in this pull request?
Current `memLimitExceededLogMessage`:

[Screenshot: https://user-images.githubusercontent.com/5399861/48467789-ec8e1000-e824-11e8-91fc-280d342e1bf3.png]

It's not very clear, because the physical memory limit is what was exceeded, yet the
suggestion refers to the virtual memory config. This PR makes the message clearer and
replaces the deprecated config `spark.yarn.executor.memoryOverhead`.
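
Restating the new diagnostics handling from the diff below as a standalone sketch (the sample diagnostics string is invented):

```scala
val MEM_REGEX = "[0-9.]+ [KMG]B"
val diagnostics = "5.1 GB of 5 GB physical memory used"  // example YARN diagnostics

// scala.util.matching.Regex replaces the old java.util.regex.Pattern fields.
val pmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX physical memory used".r
val diag = pmemExceededPattern.findFirstIn(diagnostics).map(_.concat(".")).getOrElse("")
// diag == "5.1 GB of 5 GB physical memory used."
val message = s"Container killed by YARN for exceeding physical memory limits. $diag"
```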
## How was this patch tested?

manual tests

Closes #23030 from wangyum/EXECUTOR_MEMORY_OVERHEAD.

Authored-by: Yuming Wang 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a00aaf64
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a00aaf64
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a00aaf64

Branch: refs/heads/master
Commit: a00aaf649cb5a14648102b2980ce21393804f2c7
Parents: a09d5ba
Author: Yuming Wang 
Authored: Tue Nov 20 08:27:57 2018 -0600
Committer: Sean Owen 
Committed: Tue Nov 20 08:27:57 2018 -0600

--
 .../spark/deploy/yarn/YarnAllocator.scala   | 33 +---
 .../spark/deploy/yarn/YarnAllocatorSuite.scala  | 12 ---
 2 files changed, 14 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a00aaf64/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
--
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index ebdcf45..9497530 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -20,7 +20,6 @@ package org.apache.spark.deploy.yarn
 import java.util.Collections
 import java.util.concurrent._
 import java.util.concurrent.atomic.AtomicInteger
-import java.util.regex.Pattern
 
 import scala.collection.JavaConverters._
 import scala.collection.mutable
@@ -598,13 +597,21 @@ private[yarn] class YarnAllocator(
 (false, s"Container ${containerId}${onHostStr} was preempted.")
   // Should probably still count memory exceeded exit codes towards task failures
   case VMEM_EXCEEDED_EXIT_CODE =>
-(true, memLimitExceededLogMessage(
-  completedContainer.getDiagnostics,
-  VMEM_EXCEEDED_PATTERN))
+val vmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX virtual memory used".r
+val diag = vmemExceededPattern.findFirstIn(completedContainer.getDiagnostics)
+  .map(_.concat(".")).getOrElse("")
+val message = "Container killed by YARN for exceeding virtual memory limits. " +
+  s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key} or boosting " +
+  s"${YarnConfiguration.NM_VMEM_PMEM_RATIO} or disabling " +
+  s"${YarnConfiguration.NM_VMEM_CHECK_ENABLED} because of YARN-4714."
+(true, message)
   case PMEM_EXCEEDED_EXIT_CODE =>
-(true, memLimitExceededLogMessage(
-  completedContainer.getDiagnostics,
-  PMEM_EXCEEDED_PATTERN))
+val pmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX physical memory used".r
+val diag = pmemExceededPattern.findFirstIn(completedContainer.getDiagnostics)
+  .map(_.concat(".")).getOrElse("")
+val message = "Container killed by YARN for exceeding physical memory limits. " +
+  s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
+(true, message)
   case _ =>
 // all the failures which not covered above, like:
 // disk failure, kill by app master or resource manager, ...
@@ -735,18 +742,6 @@ private[yarn] class YarnAllocator(
 
 private object YarnAllocator {
   val MEM_REGEX = "[0-9.]+ [KMG]B"
-  val PMEM_EXCEEDED_PATTERN =
-Pattern.compile(s"$MEM_REGEX of $MEM_REGEX physical memory used")
-  val VMEM_EXCEEDED_PATTERN =
-Pattern.compile(s"$MEM_REGEX of $MEM_REGEX virtual memory used")
   val VMEM_EXCEEDED_EXIT_CODE = -103
   val PMEM_EXCEEDED_EXIT_CODE = -104
-
-  def memLimitExceededLogMessage(diagnostics: String, pattern: Pattern): String = {
-val matcher = pattern.matcher(diagnostics)
-val diag = if (matcher.find()) " " + matcher.group() + "." else ""
-s"Container killed by YARN for exceeding memory limits. $diag " +
-  "Consider boosting spark.yarn.executor.memoryOverhead or " +
-  "disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714."