[spark] Git Push Summary

2018-02-27 Thread sameerag
Repository: spark
Updated Tags:  refs/tags/v2.3.0-rc5 [deleted] 992447fb3




[spark] Git Push Summary

2018-02-27 Thread sameerag
Repository: spark
Updated Tags:  refs/tags/v2.3.0 [created] 992447fb3




svn commit: r25325 - in /dev/spark: 2.3.0-SNAPSHOT-2017_12_09_08_01-ab1b6ee-docs/ 2.3.0-SNAPSHOT-2017_12_11_00_01-4289ac9-docs/ 2.3.0-SNAPSHOT-2017_12_11_08_01-6cc7021-docs/ 2.3.0-SNAPSHOT-2017_12_11_

2018-02-27 Thread sameerag
Author: sameerag
Date: Wed Feb 28 07:28:47 2018
New Revision: 25325

Log:
Removing extraneous doc snapshot uploads

Removed:
dev/spark/2.3.0-SNAPSHOT-2017_12_09_08_01-ab1b6ee-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_11_00_01-4289ac9-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_11_08_01-6cc7021-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_11_12_01-bf20abb-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_11_16_01-3d82f6e-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_11_20_01-a400265-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_12_00_01-ecc179e-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_12_04_01-bc8933f-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_12_12_01-7a51e71-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_12_16_01-17cdabb-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_12_20_01-c7d0148-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_13_00_01-682eb4f-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_13_04_01-7453ab0-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_13_08_02-58f7c82-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_13_12_01-1abcbed-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_13_16_01-ef92999-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_13_20_01-2a29a60-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_14_00_01-c3dd2a2-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_14_04_01-7d8e2ca-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_14_12_01-6d99940-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_14_16_01-0ea2d8c-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_15_00_01-3775dd3-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_15_12_01-4677623-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_15_20_01-0c8fca4-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_17_00_01-c2aeddf-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_17_12_01-7f6d10a-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_18_16_01-0609dcc-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_18_20_01-d4e6959-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_19_08_01-b779c93-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_19_12_01-6129ffa-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_20_00_01-9962390-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_20_12_02-c89b431-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_20_16_01-b176014-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_20_20_01-d3ae3e1-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_21_00_01-cb9fc8d-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_21_04_01-59d5263-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_21_08_01-0abaf31-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_21_12_01-fe65361-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_21_16_01-7beb375-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_21_20_01-a36b78b-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_22_00_01-8df1da3-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_22_04_01-13190a4-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_22_16_01-d23dc5b-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_23_00_01-8941a4a-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_23_12_01-1219d7a-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_24_12_01-0bf1a74-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_25_00_01-fba0313-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_25_04_39-12d20dd-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_25_20_01-be03d3a-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_26_00_01-0e68330-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_26_08_01-ff48b1b-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_26_12_01-91d1b30-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_26_20_01-6674acd-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_27_04_01-b8bfce5-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_27_08_01-774715d-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_27_20_01-753793b-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_28_00_01-ded6d27-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_28_04_01-1eebfbe-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_28_08_01-5536f31-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_28_12_01-8f6d573-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_28_16_01-ffe6fd7-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_28_20_01-c745730-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_29_00_01-224375c-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_29_04_01-cc30ef8-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_29_08_01-11a849b-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_29_12_01-66a7d6b-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_29_16_01-ccda75b-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_29_20_01-8169630-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_30_00_01-14c4a62-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_30_08_01-234d943-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_30_12_01-ea0a5ee-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_31_00_01-cfbe11e-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_31_08_01-028ee40-docs/
dev/spark/2.3.0-SNAPSHOT-2017_12_31_16_01-994065d-docs/
dev/spark/2.3.0-SNAPSHOT-2018_01_01_08_01-f5b7714-docs/
dev/spark/2.3.0-SNAPSHOT-2018_01_01_20_01-e0c090f-docs/
dev/spark/2.3.0-SNAPSHOT-2018_01_02_08_01-a6fc300-docs/
dev/spark/2.3.0-SNAPSHOT-2018_01_03_08_01-a66fe36-docs/
dev/spark/2.3.0-SNAPSHOT-2018_01_03_12_01-9a2b65a-docs/
dev/spark/2.3.0-SNAPSHOT-2018_01_03_13_34-79f7263-docs/
dev/spark/2.3.0-SNAPSHOT-2018_01_03_16_01-b297029-docs/

svn commit: r25324 - /dev/spark/v2.3.0-rc5-bin/ /release/spark/spark-2.3.0/

2018-02-27 Thread yhuai
Author: yhuai
Date: Wed Feb 28 07:25:53 2018
New Revision: 25324

Log:
Releasing Apache Spark 2.3.0

Added:
release/spark/spark-2.3.0/
  - copied from r25323, dev/spark/v2.3.0-rc5-bin/
Removed:
dev/spark/v2.3.0-rc5-bin/





svn commit: r25323 - in /dev/spark/2.3.1-SNAPSHOT-2018_02_27_22_01-fe9cb4a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-02-27 Thread pwendell
Author: pwendell
Date: Wed Feb 28 06:16:02 2018
New Revision: 25323

Log:
Apache Spark 2.3.1-SNAPSHOT-2018_02_27_22_01-fe9cb4a docs


[This commit notification would consist of 1443 parts, which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r25321 - in /dev/spark/2.4.0-SNAPSHOT-2018_02_27_20_01-b14993e-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-02-27 Thread pwendell
Author: pwendell
Date: Wed Feb 28 04:17:40 2018
New Revision: 25321

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_02_27_20_01-b14993e docs


[This commit notification would consist of 1444 parts, which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document

2018-02-27 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 30242b664 -> fe9cb4afe


[SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document

## What changes were proposed in this pull request?

Clarify JSON and CSV reader behavior in the documentation.

JSON does not support partial results for corrupted records.
CSV supports partial results only for records with fewer or more tokens than the schema expects.
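
A minimal sketch of the documented behavior, assuming a local `SparkSession`; the paths, schema, and column names below are illustrative and not part of this patch:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder().master("local[*]").appName("corrupt-records").getOrCreate()

// Keep corrupt records by declaring a string field for them in a user-defined schema.
val schema = new StructType()
  .add("a", IntegerType)
  .add("b", IntegerType)
  .add("_corrupt_record", StringType)

// JSON: a corrupted record yields nulls for every data field plus the raw text
// in the corrupt-record column -- never a partially parsed row.
val jsonDF = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json("/tmp/records.json")

// CSV: only records with fewer or more tokens than the schema can come back as
// partial results; other malformed records behave like the JSON case.
val csvDF = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .csv("/tmp/records.csv")
```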

## How was this patch tested?

Pass existing tests.

Author: Liang-Chi Hsieh 

Closes #20666 from viirya/SPARK-23448-2.

(cherry picked from commit b14993e1fcb68e1c946a671c6048605ab4afdf58)
Signed-off-by: hyukjinkwon 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe9cb4af
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe9cb4af
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fe9cb4af

Branch: refs/heads/branch-2.3
Commit: fe9cb4afe39944b394b39cc25622e311554ae187
Parents: 30242b6
Author: Liang-Chi Hsieh 
Authored: Wed Feb 28 11:00:54 2018 +0900
Committer: hyukjinkwon 
Committed: Wed Feb 28 11:01:12 2018 +0900

--
 python/pyspark/sql/readwriter.py| 30 +++-
 python/pyspark/sql/streaming.py | 30 +++-
 .../spark/sql/catalyst/json/JacksonParser.scala |  3 ++
 .../org/apache/spark/sql/DataFrameReader.scala  | 22 +++---
 .../datasources/csv/UnivocityParser.scala   |  5 
 .../spark/sql/streaming/DataStreamReader.scala  | 22 +++---
 6 files changed, 64 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fe9cb4af/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 49af1bc..9d05ac7 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -209,13 +209,13 @@ class DataFrameReader(OptionUtils):
 :param mode: allows a mode for dealing with corrupt records during 
parsing. If None is
  set, it uses the default value, ``PERMISSIVE``.
 
-* ``PERMISSIVE`` : sets other fields to ``null`` when it meets 
a corrupted \
- record, and puts the malformed string into a field configured 
by \
- ``columnNameOfCorruptRecord``. To keep corrupt records, an 
user can set \
- a string type field named ``columnNameOfCorruptRecord`` in an 
user-defined \
- schema. If a schema does not have the field, it drops corrupt 
records during \
- parsing. When inferring a schema, it implicitly adds a \
- ``columnNameOfCorruptRecord`` field in an output schema.
+* ``PERMISSIVE`` : when it meets a corrupted record, puts the 
malformed string \
+  into a field configured by ``columnNameOfCorruptRecord``, 
and sets other \
+  fields to ``null``. To keep corrupt records, an user can set 
a string type \
+  field named ``columnNameOfCorruptRecord`` in an user-defined 
schema. If a \
+  schema does not have the field, it drops corrupt records 
during parsing. \
+  When inferring a schema, it implicitly adds a 
``columnNameOfCorruptRecord`` \
+  field in an output schema.
 *  ``DROPMALFORMED`` : ignores the whole corrupted records.
 *  ``FAILFAST`` : throws an exception when it meets corrupted 
records.
 
@@ -393,13 +393,15 @@ class DataFrameReader(OptionUtils):
 :param mode: allows a mode for dealing with corrupt records during 
parsing. If None is
  set, it uses the default value, ``PERMISSIVE``.
 
-* ``PERMISSIVE`` : sets other fields to ``null`` when it meets 
a corrupted \
-  record, and puts the malformed string into a field 
configured by \
-  ``columnNameOfCorruptRecord``. To keep corrupt records, an 
user can set \
-  a string type field named ``columnNameOfCorruptRecord`` in 
an \
-  user-defined schema. If a schema does not have the field, it 
drops corrupt \
-  records during parsing. When a length of parsed CSV tokens 
is shorter than \
-  an expected length of a schema, it sets `null` for extra 
fields.
+* ``PERMISSIVE`` : when it meets a corrupted record, puts the 
malformed string \
+  into a field configured by ``columnNameOfCorruptRecord``, 
and sets other \
+  fields to ``null``. To keep corrupt records, an user can set 
a string type \
+  field named ``columnNameOfCorruptRecord`` in an 

spark git commit: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document

2018-02-27 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/master 23ac3aaba -> b14993e1f


[SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document

## What changes were proposed in this pull request?

Clarify JSON and CSV reader behavior in the documentation.

JSON does not support partial results for corrupted records.
CSV supports partial results only for records with fewer or more tokens than the schema expects.

## How was this patch tested?

Pass existing tests.

Author: Liang-Chi Hsieh 

Closes #20666 from viirya/SPARK-23448-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b14993e1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b14993e1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b14993e1

Branch: refs/heads/master
Commit: b14993e1fcb68e1c946a671c6048605ab4afdf58
Parents: 23ac3aa
Author: Liang-Chi Hsieh 
Authored: Wed Feb 28 11:00:54 2018 +0900
Committer: hyukjinkwon 
Committed: Wed Feb 28 11:00:54 2018 +0900

--
 python/pyspark/sql/readwriter.py| 30 +++-
 python/pyspark/sql/streaming.py | 30 +++-
 .../spark/sql/catalyst/json/JacksonParser.scala |  3 ++
 .../org/apache/spark/sql/DataFrameReader.scala  | 22 +++---
 .../datasources/csv/UnivocityParser.scala   |  5 
 .../spark/sql/streaming/DataStreamReader.scala  | 22 +++---
 6 files changed, 64 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b14993e1/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 49af1bc..9d05ac7 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -209,13 +209,13 @@ class DataFrameReader(OptionUtils):
 :param mode: allows a mode for dealing with corrupt records during 
parsing. If None is
  set, it uses the default value, ``PERMISSIVE``.
 
-* ``PERMISSIVE`` : sets other fields to ``null`` when it meets 
a corrupted \
- record, and puts the malformed string into a field configured 
by \
- ``columnNameOfCorruptRecord``. To keep corrupt records, an 
user can set \
- a string type field named ``columnNameOfCorruptRecord`` in an 
user-defined \
- schema. If a schema does not have the field, it drops corrupt 
records during \
- parsing. When inferring a schema, it implicitly adds a \
- ``columnNameOfCorruptRecord`` field in an output schema.
+* ``PERMISSIVE`` : when it meets a corrupted record, puts the 
malformed string \
+  into a field configured by ``columnNameOfCorruptRecord``, 
and sets other \
+  fields to ``null``. To keep corrupt records, an user can set 
a string type \
+  field named ``columnNameOfCorruptRecord`` in an user-defined 
schema. If a \
+  schema does not have the field, it drops corrupt records 
during parsing. \
+  When inferring a schema, it implicitly adds a 
``columnNameOfCorruptRecord`` \
+  field in an output schema.
 *  ``DROPMALFORMED`` : ignores the whole corrupted records.
 *  ``FAILFAST`` : throws an exception when it meets corrupted 
records.
 
@@ -393,13 +393,15 @@ class DataFrameReader(OptionUtils):
 :param mode: allows a mode for dealing with corrupt records during 
parsing. If None is
  set, it uses the default value, ``PERMISSIVE``.
 
-* ``PERMISSIVE`` : sets other fields to ``null`` when it meets 
a corrupted \
-  record, and puts the malformed string into a field 
configured by \
-  ``columnNameOfCorruptRecord``. To keep corrupt records, an 
user can set \
-  a string type field named ``columnNameOfCorruptRecord`` in 
an \
-  user-defined schema. If a schema does not have the field, it 
drops corrupt \
-  records during parsing. When a length of parsed CSV tokens 
is shorter than \
-  an expected length of a schema, it sets `null` for extra 
fields.
+* ``PERMISSIVE`` : when it meets a corrupted record, puts the 
malformed string \
+  into a field configured by ``columnNameOfCorruptRecord``, 
and sets other \
+  fields to ``null``. To keep corrupt records, an user can set 
a string type \
+  field named ``columnNameOfCorruptRecord`` in an user-defined 
schema. If a \
+  schema does not have the field, it drops corrupt records 
during parsing. \
+ 

spark git commit: [SPARK-23417][PYTHON] Fix the build instructions supplied by exception messages in python streaming tests

2018-02-27 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/master 598446b74 -> 23ac3aaba


[SPARK-23417][PYTHON] Fix the build instructions supplied by exception messages 
in python streaming tests

## What changes were proposed in this pull request?

Fix the build instructions supplied by exception messages in python streaming 
tests.

I also added -DskipTests to the Maven instructions to avoid the 170 minutes of Scala tests that run each time one wants to add a jar to the assembly directory.

## How was this patch tested?

- clone branch
- run build/sbt package
- run python/run-tests --modules "pyspark-streaming", expect error message
- follow instructions in error message, i.e., run build/sbt assembly/package streaming-kafka-0-8-assembly/assembly
- rerun python tests, expect error message
- follow instructions in error message, i.e., run build/sbt -Pflume assembly/package streaming-flume-assembly/assembly
- rerun python tests, see success
- repeat all of the above for the mvn version of the process

Author: Bruce Robbins 

Closes #20638 from bersprockets/SPARK-23417_propa.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/23ac3aab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/23ac3aab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/23ac3aab

Branch: refs/heads/master
Commit: 23ac3aaba4a33bc3d31d01f21e93c4681ef6de03
Parents: 598446b
Author: Bruce Robbins 
Authored: Wed Feb 28 09:25:02 2018 +0900
Committer: hyukjinkwon 
Committed: Wed Feb 28 09:25:02 2018 +0900

--
 python/pyspark/streaming/tests.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/23ac3aab/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py 
b/python/pyspark/streaming/tests.py
index 5b86c1c..71f8101 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -1477,8 +1477,8 @@ def search_kafka_assembly_jar():
 raise Exception(
 ("Failed to find Spark Streaming kafka assembly jar in %s. " % 
kafka_assembly_dir) +
 "You need to build Spark with "
-"'build/sbt assembly/package 
streaming-kafka-0-8-assembly/assembly' or "
-"'build/mvn -Pkafka-0-8 package' before running this test.")
+"'build/sbt -Pkafka-0-8 assembly/package 
streaming-kafka-0-8-assembly/assembly' or "
+"'build/mvn -DskipTests -Pkafka-0-8 package' before running this 
test.")
 elif len(jars) > 1:
 raise Exception(("Found multiple Spark Streaming Kafka assembly JARs: 
%s; please "
  "remove all but one") % (", ".join(jars)))
@@ -1494,8 +1494,8 @@ def search_flume_assembly_jar():
 raise Exception(
 ("Failed to find Spark Streaming Flume assembly jar in %s. " % 
flume_assembly_dir) +
 "You need to build Spark with "
-"'build/sbt assembly/assembly streaming-flume-assembly/assembly' 
or "
-"'build/mvn -Pflume package' before running this test.")
+"'build/sbt -Pflume assembly/package 
streaming-flume-assembly/assembly' or "
+"'build/mvn -DskipTests -Pflume package' before running this 
test.")
 elif len(jars) > 1:
 raise Exception(("Found multiple Spark Streaming Flume assembly JARs: 
%s; please "
 "remove all but one") % (", ".join(jars)))





svn commit: r25320 - in /dev/spark/2.3.1-SNAPSHOT-2018_02_27_14_01-30242b6-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-02-27 Thread pwendell
Author: pwendell
Date: Tue Feb 27 22:16:01 2018
New Revision: 25320

Log:
Apache Spark 2.3.1-SNAPSHOT-2018_02_27_14_01-30242b6 docs


[This commit notification would consist of 1443 parts, which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r25318 - in /dev/spark/2.4.0-SNAPSHOT-2018_02_27_12_01-598446b-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-02-27 Thread pwendell
Author: pwendell
Date: Tue Feb 27 20:15:52 2018
New Revision: 25318

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_02_27_12_01-598446b docs


[This commit notification would consist of 1444 parts, which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [SPARK-23501][UI] Refactor AllStagesPage in order to avoid redundant code

2018-02-27 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/master ecb8b383a -> 598446b74


[SPARK-23501][UI] Refactor AllStagesPage in order to avoid redundant code

As suggested in #20651, the code in `AllStagesPage` is very redundant and modifying it requires copy-and-paste work. We should avoid such a pattern, which is error prone, and adopt a cleaner solution that avoids code redundancy. The refactoring idea is sketched below.
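
A minimal, self-contained sketch of that idea, using stand-in types (`Stage`/`Status` here are illustrative, not the real `StageData`/`StageStatus`): iterate over the statuses instead of hand-writing one filter block per status.

```scala
sealed trait Status
case object Active extends Status
case object Pending extends Status
case object Complete extends Status
case object Skipped extends Status
case object Failed extends Status

final case class Stage(id: Int, status: Status)

// Before: five nearly identical `val activeStages = allStages.filter(...)` blocks.
// After: one generic pass over the list of statuses.
def stagesByStatus(allStages: Seq[Stage]): Map[Status, Seq[Stage]] = {
  val allStatuses = Seq(Active, Pending, Complete, Skipped, Failed)
  allStatuses.map(status => status -> allStages.filter(_.status == status)).toMap
}
```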

existing UTs

Author: Marco Gaido 

Closes #20663 from mgaido91/SPARK-23475_followup.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/598446b7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/598446b7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/598446b7

Branch: refs/heads/master
Commit: 598446b74b61fee272d3aee3a2e9a3fc90a70d6a
Parents: ecb8b38
Author: Marco Gaido 
Authored: Tue Feb 27 11:33:10 2018 -0800
Committer: Marcelo Vanzin 
Committed: Tue Feb 27 11:35:36 2018 -0800

--
 .../apache/spark/ui/jobs/AllStagesPage.scala| 261 ---
 1 file changed, 102 insertions(+), 159 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/598446b7/core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala
index 38450b9..4658aa1 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala
@@ -19,46 +19,20 @@ package org.apache.spark.ui.jobs
 
 import javax.servlet.http.HttpServletRequest
 
-import scala.xml.{Node, NodeSeq}
+import scala.xml.{Attribute, Elem, Node, NodeSeq, Null, Text}
 
 import org.apache.spark.scheduler.Schedulable
-import org.apache.spark.status.PoolData
-import org.apache.spark.status.api.v1._
+import org.apache.spark.status.{AppSummary, PoolData}
+import org.apache.spark.status.api.v1.{StageData, StageStatus}
 import org.apache.spark.ui.{UIUtils, WebUIPage}
 
 /** Page showing list of all ongoing and recently finished stages and pools */
 private[ui] class AllStagesPage(parent: StagesTab) extends WebUIPage("") {
   private val sc = parent.sc
+  private val subPath = "stages"
   private def isFairScheduler = parent.isFairScheduler
 
   def render(request: HttpServletRequest): Seq[Node] = {
-val allStages = parent.store.stageList(null)
-
-val activeStages = allStages.filter(_.status == StageStatus.ACTIVE)
-val pendingStages = allStages.filter(_.status == StageStatus.PENDING)
-val skippedStages = allStages.filter(_.status == StageStatus.SKIPPED)
-val completedStages = allStages.filter(_.status == StageStatus.COMPLETE)
-val failedStages = allStages.filter(_.status == StageStatus.FAILED).reverse
-
-val numFailedStages = failedStages.size
-val subPath = "stages"
-
-val activeStagesTable =
-  new StageTableBase(parent.store, request, activeStages, "active", 
"activeStage",
-parent.basePath, subPath, parent.isFairScheduler, parent.killEnabled, 
false)
-val pendingStagesTable =
-  new StageTableBase(parent.store, request, pendingStages, "pending", 
"pendingStage",
-parent.basePath, subPath, parent.isFairScheduler, false, false)
-val completedStagesTable =
-  new StageTableBase(parent.store, request, completedStages, "completed", 
"completedStage",
-parent.basePath, subPath, parent.isFairScheduler, false, false)
-val skippedStagesTable =
-  new StageTableBase(parent.store, request, skippedStages, "skipped", 
"skippedStage",
-parent.basePath, subPath, parent.isFairScheduler, false, false)
-val failedStagesTable =
-  new StageTableBase(parent.store, request, failedStages, "failed", 
"failedStage",
-parent.basePath, subPath, parent.isFairScheduler, false, true)
-
 // For now, pool information is only accessible in live UIs
 val pools = sc.map(_.getAllPools).getOrElse(Seq.empty[Schedulable]).map { 
pool =>
   val uiPool = 
parent.store.asOption(parent.store.pool(pool.name)).getOrElse(
@@ -67,152 +41,121 @@ private[ui] class AllStagesPage(parent: StagesTab) 
extends WebUIPage("") {
 }.toMap
 val poolTable = new PoolTable(pools, parent)
 
-val shouldShowActiveStages = activeStages.nonEmpty
-val shouldShowPendingStages = pendingStages.nonEmpty
-val shouldShowCompletedStages = completedStages.nonEmpty
-val shouldShowSkippedStages = skippedStages.nonEmpty
-val shouldShowFailedStages = failedStages.nonEmpty
+val allStatuses = Seq(StageStatus.ACTIVE, StageStatus.PENDING, 
StageStatus.COMPLETE,
+  StageStatus.SKIPPED, StageStatus.FAILED)
 
+val allStages = 

spark git commit: [SPARK-23365][CORE] Do not adjust num executors when killing idle executors.

2018-02-27 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 6eee545f9 -> 30242b664


[SPARK-23365][CORE] Do not adjust num executors when killing idle executors.

The ExecutorAllocationManager should not adjust the target number of
executors when killing idle executors, as it has already adjusted the
target number down based on the task backlog.

The name `replace` was misleading with DynamicAllocation on, as the target number of executors is changed outside of the call to `killExecutors`, so I adjusted that name. I also separated out the logic of `countFailures`, as you don't always want that tied to `replace`.

While I was there I made two changes that weren't directly related to this:
1) Fixed `countFailures` in a couple of cases where it was getting an incorrect value since it used to be tied to `replace`, e.g. when killing executors on a blacklisted node.
2) Hard error if you call `sc.killExecutors` with dynamic allocation on, since that's another way the ExecutorAllocationManager and the CoarseGrainedSchedulerBackend would get out of sync.

Added a unit test case which verifies that the calls to 
ExecutorAllocationClient do not adjust the number of executors.
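
As a sketch of the call shape after this change (argument names follow the new signature in the diff below; `ExecutorAllocationClient` is `private[spark]`, so the illustrative caller lives in that package):

```scala
package org.apache.spark

// Sketch only: `client` stands for whatever backend implements ExecutorAllocationClient.
object IdleExecutorCleanup {
  def removeIdleExecutors(client: ExecutorAllocationClient, idleIds: Seq[String]): Seq[String] = {
    // The allocation manager has already lowered its target based on the task
    // backlog, so the target must not be adjusted again here; an idle executor
    // runs no tasks, so nothing should count toward task-failure limits.
    client.killExecutors(
      idleIds,
      adjustTargetNumExecutors = false,
      countFailures = false)
  }
}
```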

Author: Imran Rashid 

Closes #20604 from squito/SPARK-23365.

(cherry picked from commit ecb8b383af1cf1b67f3111c148229e00c9c17c40)
Signed-off-by: Marcelo Vanzin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30242b66
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30242b66
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30242b66

Branch: refs/heads/branch-2.3
Commit: 30242b664a05120c01b54ba9ac0ebed114f0d54e
Parents: 6eee545
Author: Imran Rashid 
Authored: Tue Feb 27 11:12:32 2018 -0800
Committer: Marcelo Vanzin 
Committed: Tue Feb 27 11:12:54 2018 -0800

--
 .../apache/spark/ExecutorAllocationClient.scala | 15 ++---
 .../spark/ExecutorAllocationManager.scala   | 20 --
 .../scala/org/apache/spark/SparkContext.scala   | 13 +++-
 .../spark/scheduler/BlacklistTracker.scala  |  3 +-
 .../cluster/CoarseGrainedSchedulerBackend.scala | 22 ---
 .../spark/ExecutorAllocationManagerSuite.scala  | 66 +++-
 .../StandaloneDynamicAllocationSuite.scala  |  3 +-
 .../spark/scheduler/BlacklistTrackerSuite.scala | 14 ++---
 8 files changed, 121 insertions(+), 35 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/30242b66/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala 
b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
index 9112d93..63d87b4 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
@@ -55,18 +55,18 @@ private[spark] trait ExecutorAllocationClient {
   /**
* Request that the cluster manager kill the specified executors.
*
-   * When asking the executor to be replaced, the executor loss is considered 
a failure, and
-   * killed tasks that are running on the executor will count towards the 
failure limits. If no
-   * replacement is being requested, then the tasks will not count towards the 
limit.
-   *
* @param executorIds identifiers of executors to kill
-   * @param replace whether to replace the killed executors with new ones, 
default false
+   * @param adjustTargetNumExecutors whether the target number of executors 
will be adjusted down
+   * after these executors have been killed
+   * @param countFailures if there are tasks running on the executors when 
they are killed, whether
+* to count those failures toward task failure limits
* @param force whether to force kill busy executors, default false
* @return the ids of the executors acknowledged by the cluster manager to 
be removed.
*/
   def killExecutors(
 executorIds: Seq[String],
-replace: Boolean = false,
+adjustTargetNumExecutors: Boolean,
+countFailures: Boolean,
 force: Boolean = false): Seq[String]
 
   /**
@@ -81,7 +81,8 @@ private[spark] trait ExecutorAllocationClient {
* @return whether the request is acknowledged by the cluster manager.
*/
   def killExecutor(executorId: String): Boolean = {
-val killedExecutors = killExecutors(Seq(executorId))
+val killedExecutors = killExecutors(Seq(executorId), 
adjustTargetNumExecutors = true,
+  countFailures = false)
 killedExecutors.nonEmpty && killedExecutors(0).equals(executorId)
   }
 }


spark git commit: [SPARK-23365][CORE] Do not adjust num executors when killing idle executors.

2018-02-27 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/master 414ee867b -> ecb8b383a


[SPARK-23365][CORE] Do not adjust num executors when killing idle executors.

The ExecutorAllocationManager should not adjust the target number of
executors when killing idle executors, as it has already adjusted the
target number down based on the task backlog.

The name `replace` was misleading with DynamicAllocation on, as the target number of executors is changed outside of the call to `killExecutors`, so I adjusted that name. I also separated out the logic of `countFailures`, as you don't always want that tied to `replace`.

While I was there I made two changes that weren't directly related to this:
1) Fixed `countFailures` in a couple of cases where it was getting an incorrect value since it used to be tied to `replace`, e.g. when killing executors on a blacklisted node.
2) Hard error if you call `sc.killExecutors` with dynamic allocation on, since that's another way the ExecutorAllocationManager and the CoarseGrainedSchedulerBackend would get out of sync.

Added a unit test case which verifies that the calls to 
ExecutorAllocationClient do not adjust the number of executors.

Author: Imran Rashid 

Closes #20604 from squito/SPARK-23365.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ecb8b383
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ecb8b383
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ecb8b383

Branch: refs/heads/master
Commit: ecb8b383af1cf1b67f3111c148229e00c9c17c40
Parents: 414ee86
Author: Imran Rashid 
Authored: Tue Feb 27 11:12:32 2018 -0800
Committer: Marcelo Vanzin 
Committed: Tue Feb 27 11:12:32 2018 -0800

--
 .../apache/spark/ExecutorAllocationClient.scala | 15 ++---
 .../spark/ExecutorAllocationManager.scala   | 20 --
 .../scala/org/apache/spark/SparkContext.scala   | 13 +++-
 .../spark/scheduler/BlacklistTracker.scala  |  3 +-
 .../cluster/CoarseGrainedSchedulerBackend.scala | 22 ---
 .../spark/ExecutorAllocationManagerSuite.scala  | 66 +++-
 .../StandaloneDynamicAllocationSuite.scala  |  3 +-
 .../spark/scheduler/BlacklistTrackerSuite.scala | 14 ++---
 8 files changed, 121 insertions(+), 35 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ecb8b383/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala 
b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
index 9112d93..63d87b4 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
@@ -55,18 +55,18 @@ private[spark] trait ExecutorAllocationClient {
   /**
* Request that the cluster manager kill the specified executors.
*
-   * When asking the executor to be replaced, the executor loss is considered 
a failure, and
-   * killed tasks that are running on the executor will count towards the 
failure limits. If no
-   * replacement is being requested, then the tasks will not count towards the 
limit.
-   *
* @param executorIds identifiers of executors to kill
-   * @param replace whether to replace the killed executors with new ones, 
default false
+   * @param adjustTargetNumExecutors whether the target number of executors 
will be adjusted down
+   * after these executors have been killed
+   * @param countFailures if there are tasks running on the executors when 
they are killed, whether
+* to count those failures toward task failure limits
* @param force whether to force kill busy executors, default false
* @return the ids of the executors acknowledged by the cluster manager to 
be removed.
*/
   def killExecutors(
 executorIds: Seq[String],
-replace: Boolean = false,
+adjustTargetNumExecutors: Boolean,
+countFailures: Boolean,
 force: Boolean = false): Seq[String]
 
   /**
@@ -81,7 +81,8 @@ private[spark] trait ExecutorAllocationClient {
* @return whether the request is acknowledged by the cluster manager.
*/
   def killExecutor(executorId: String): Boolean = {
-val killedExecutors = killExecutors(Seq(executorId))
+val killedExecutors = killExecutors(Seq(executorId), 
adjustTargetNumExecutors = true,
+  countFailures = false)
 killedExecutors.nonEmpty && killedExecutors(0).equals(executorId)
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/ecb8b383/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
--
diff --git 

spark git commit: [SPARK-23523][SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery

2018-02-27 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master eac0b0672 -> 414ee867b


[SPARK-23523][SQL] Fix the incorrect result caused by the rule 
OptimizeMetadataOnlyQuery

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.
```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery`: it should respect the attribute order of the original leaf node. This PR fixes it.
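
For reference, a standalone sketch of the lookup pattern the fix uses; the column names below are made up, and only the case-insensitive, order-preserving resolution mirrors the patch shown further down:

```scala
import java.util.Locale

val output = Seq("cOl3", "cOl1", "cOl5")                // relation output names
val partitionColumnNames = Seq("CoL1", "CoL5", "CoL3")  // partition columns to resolve

// Build a case-insensitive map over the leaf node's output, then resolve each
// partition column through it, keeping the partition-column order and failing
// loudly when a name cannot be found.
val attrMap = output.map(a => a.toLowerCase(Locale.ROOT) -> a).toMap
val partitionAttrs = partitionColumnNames.map { colName =>
  attrMap.getOrElse(colName.toLowerCase(Locale.ROOT),
    throw new IllegalArgumentException(
      s"Unable to find the column `$colName` given [${output.mkString(", ")}]"))
}
// partitionAttrs == Seq("cOl1", "cOl5", "cOl3")
```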

## How was this patch tested?
Added a test case

Author: gatorsmile 

Closes #20684 from gatorsmile/optimizeMetadataOnly.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/414ee867
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/414ee867
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/414ee867

Branch: refs/heads/master
Commit: 414ee867ba0835b97aae2e8d4e489e1879c251dd
Parents: eac0b06
Author: gatorsmile 
Authored: Tue Feb 27 08:44:25 2018 -0800
Committer: gatorsmile 
Committed: Tue Feb 27 08:44:25 2018 -0800

--
 .../catalyst/plans/logical/LocalRelation.scala  |  9 
 .../execution/OptimizeMetadataOnlyQuery.scala   | 12 +--
 .../datasources/HadoopFsRelation.scala  |  3 +++
 .../OptimizeMetadataOnlyQuerySuite.scala| 22 
 4 files changed, 40 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/414ee867/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala
index d73d7e7..b05508d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala
@@ -43,10 +43,11 @@ object LocalRelation {
   }
 }
 
-case class LocalRelation(output: Seq[Attribute],
- data: Seq[InternalRow] = Nil,
- // Indicates whether this relation has data from a 
streaming source.
- override val isStreaming: Boolean = false)
+case class LocalRelation(
+output: Seq[Attribute],
+data: Seq[InternalRow] = Nil,
+// Indicates whether this relation has data from a streaming source.
+override val isStreaming: Boolean = false)
   extends LeafNode with analysis.MultiInstanceRelation {
 
   // A local relation must have resolved output.

http://git-wip-us.apache.org/repos/asf/spark/blob/414ee867/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
index 18f6f69..0613d90 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
@@ -17,6 +17,9 @@
 
 package org.apache.spark.sql.execution
 
+import java.util.Locale
+
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.catalog.{HiveTableRelation, 
SessionCatalog}
 import org.apache.spark.sql.catalyst.expressions._
@@ -80,8 +83,13 @@ case class OptimizeMetadataOnlyQuery(catalog: 
SessionCatalog) extends Rule[Logic
   private def getPartitionAttrs(
   partitionColumnNames: Seq[String],
   relation: LogicalPlan): Seq[Attribute] = {
-val partColumns = partitionColumnNames.map(_.toLowerCase).toSet
-relation.output.filter(a => partColumns.contains(a.name.toLowerCase))
+val attrMap = 
relation.output.map(_.name.toLowerCase(Locale.ROOT)).zip(relation.output).toMap
+partitionColumnNames.map { colName =>
+  attrMap.getOrElse(colName.toLowerCase(Locale.ROOT),
+throw new AnalysisException(s"Unable to find the column `$colName` " +
+  s"given [${relation.output.map(_.name).mkString(", ")}]")
+  )
+}
   }
 
   /**


svn commit: r25316 - in /dev/spark/2.4.0-SNAPSHOT-2018_02_27_08_01-eac0b06-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-02-27 Thread pwendell
Author: pwendell
Date: Tue Feb 27 16:18:08 2018
New Revision: 25316

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_02_27_08_01-eac0b06 docs


[This commit notification would consist of 1444 parts, which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive offsets

2018-02-27 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 649ed9c57 -> eac0b0672


[SPARK-17147][STREAMING][KAFKA] Allow non-consecutive offsets

## What changes were proposed in this pull request?

Add a configuration spark.streaming.kafka.allowNonConsecutiveOffsets to allow 
streaming jobs to proceed on compacted topics (or other situations involving 
gaps between offsets in the log).
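
A minimal sketch of where that flag would be set, assuming a Kafka 0.10 direct-stream job; the broker address, topic, and group id below are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val conf = new SparkConf()
  .setAppName("compacted-topic-stream")
  // Allow gaps between offsets, e.g. on a compacted topic.
  .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")

val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "demo-group")

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("compacted-topic"), kafkaParams))
```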

## How was this patch tested?

Added new unit test

justinrmiller has been testing this branch in production for a few weeks

Author: cody koeninger 

Closes #20572 from koeninger/SPARK-17147.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eac0b067
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eac0b067
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eac0b067

Branch: refs/heads/master
Commit: eac0b067222a3dfa52be20360a453cb7bd420bf2
Parents: 649ed9c
Author: cody koeninger 
Authored: Tue Feb 27 08:21:11 2018 -0600
Committer: Sean Owen 
Committed: Tue Feb 27 08:21:11 2018 -0600

--
 .../kafka010/CachedKafkaConsumer.scala  |  55 -
 .../spark/streaming/kafka010/KafkaRDD.scala | 236 +--
 .../streaming/kafka010/KafkaRDDSuite.scala  | 106 +
 .../streaming/kafka010/KafkaTestUtils.scala |  25 +-
 .../kafka010/mocks/MockScheduler.scala  |  96 
 .../streaming/kafka010/mocks/MockTime.scala |  51 
 6 files changed, 487 insertions(+), 82 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/eac0b067/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
--
diff --git 
a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
 
b/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
index fa3ea61..aeb8c1d 100644
--- 
a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
+++ 
b/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
@@ -22,10 +22,8 @@ import java.{ util => ju }
 import org.apache.kafka.clients.consumer.{ ConsumerConfig, ConsumerRecord, 
KafkaConsumer }
 import org.apache.kafka.common.{ KafkaException, TopicPartition }
 
-import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
 
-
 /**
  * Consumer of single topicpartition, intended for cached reuse.
  * Underlying consumer is not threadsafe, so neither is this,
@@ -38,7 +36,7 @@ class CachedKafkaConsumer[K, V] private(
   val partition: Int,
   val kafkaParams: ju.Map[String, Object]) extends Logging {
 
-  assert(groupId == kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG),
+  require(groupId == kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG),
 "groupId used for cache key must match the groupId in kafkaParams")
 
   val topicPartition = new TopicPartition(topic, partition)
@@ -53,7 +51,7 @@ class CachedKafkaConsumer[K, V] private(
 
   // TODO if the buffer was kept around as a random-access structure,
   // could possibly optimize re-calculating of an RDD in the same batch
-  protected var buffer = ju.Collections.emptyList[ConsumerRecord[K, 
V]]().iterator
+  protected var buffer = ju.Collections.emptyListIterator[ConsumerRecord[K, 
V]]()
   protected var nextOffset = -2L
 
   def close(): Unit = consumer.close()
@@ -71,7 +69,7 @@ class CachedKafkaConsumer[K, V] private(
 }
 
 if (!buffer.hasNext()) { poll(timeout) }
-assert(buffer.hasNext(),
+require(buffer.hasNext(),
   s"Failed to get records for $groupId $topic $partition $offset after 
polling for $timeout")
 var record = buffer.next()
 
@@ -79,17 +77,56 @@ class CachedKafkaConsumer[K, V] private(
   logInfo(s"Buffer miss for $groupId $topic $partition $offset")
   seek(offset)
   poll(timeout)
-  assert(buffer.hasNext(),
+  require(buffer.hasNext(),
 s"Failed to get records for $groupId $topic $partition $offset after 
polling for $timeout")
   record = buffer.next()
-  assert(record.offset == offset,
-s"Got wrong record for $groupId $topic $partition even after seeking 
to offset $offset")
+  require(record.offset == offset,
+s"Got wrong record for $groupId $topic $partition even after seeking 
to offset $offset " +
+  s"got offset ${record.offset} instead. If this is a compacted topic, 
consider enabling " +
+  "spark.streaming.kafka.allowNonConsecutiveOffsets"
+  )
 }
 
 nextOffset = offset + 1
 record
   }
 
+  /**
+   * Start a batch on a compacted topic
+   */
+  def compactedStart(offset: Long, timeout: Long): Unit = {
+

spark git commit: [SPARK-23509][BUILD] Upgrade commons-net from 2.2 to 3.1

2018-02-27 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 8077bb04f -> 649ed9c57


[SPARK-23509][BUILD] Upgrade commons-net from 2.2 to 3.1

## What changes were proposed in this pull request?

This PR avoids version conflicts of `commons-net` by upgrading commons-net from 2.2 to 3.1. We are seeing the following message during the build with sbt:

```
[warn] Found version conflict(s) in library dependencies; some are suspected to 
be binary incompatible:
...
[warn]  * commons-net:commons-net:3.1 is selected over 2.2
[warn]  +- org.apache.hadoop:hadoop-common:2.6.5              (depends on 3.1)
[warn]  +- org.apache.spark:spark-core_2.11:2.4.0-SNAPSHOT    (depends on 2.2)
[warn]
```

[Here](https://commons.apache.org/proper/commons-net/changes-report.html) is a 
release history.

[Here](https://commons.apache.org/proper/commons-net/migration.html) is a 
migration guide from 2.x to 3.0.

## How was this patch tested?

Existing tests

Author: Kazuaki Ishizaki 

Closes #20672 from kiszk/SPARK-23509.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/649ed9c5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/649ed9c5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/649ed9c5

Branch: refs/heads/master
Commit: 649ed9c5732f85ef1306576fdd3a9278a2a6410c
Parents: 8077bb0
Author: Kazuaki Ishizaki 
Authored: Tue Feb 27 08:18:41 2018 -0600
Committer: Sean Owen 
Committed: Tue Feb 27 08:18:41 2018 -0600

--
 dev/deps/spark-deps-hadoop-2.6 | 2 +-
 dev/deps/spark-deps-hadoop-2.7 | 2 +-
 pom.xml| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/649ed9c5/dev/deps/spark-deps-hadoop-2.6
--
diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6
index ed31050..c3d1dd4 100644
--- a/dev/deps/spark-deps-hadoop-2.6
+++ b/dev/deps/spark-deps-hadoop-2.6
@@ -48,7 +48,7 @@ commons-lang-2.6.jar
 commons-lang3-3.5.jar
 commons-logging-1.1.3.jar
 commons-math3-3.4.1.jar
-commons-net-2.2.jar
+commons-net-3.1.jar
 commons-pool-1.5.4.jar
 compress-lzf-1.0.3.jar
 core-1.1.2.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/649ed9c5/dev/deps/spark-deps-hadoop-2.7
--
diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index 04dec04..2908670 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -48,7 +48,7 @@ commons-lang-2.6.jar
 commons-lang3-3.5.jar
 commons-logging-1.1.3.jar
 commons-math3-3.4.1.jar
-commons-net-2.2.jar
+commons-net-3.1.jar
 commons-pool-1.5.4.jar
 compress-lzf-1.0.3.jar
 core-1.1.2.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/649ed9c5/pom.xml
--
diff --git a/pom.xml b/pom.xml
index ac30107..b839616 100644
--- a/pom.xml
+++ b/pom.xml
@@ -579,7 +579,7 @@
   
 commons-net
 commons-net
-2.2
+3.1
   
   
 io.netty





svn commit: r25303 - in /dev/spark/2.4.0-SNAPSHOT-2018_02_27_00_01-8077bb0-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-02-27 Thread pwendell
Author: pwendell
Date: Tue Feb 27 08:16:51 2018
New Revision: 25303

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_02_27_00_01-8077bb0 docs


[This commit notification would consist of 1444 parts, which exceeds the limit of 50, so it was shortened to this summary.]
