GitHub user thinkborm opened a pull request:
https://github.com/apache/spark/pull/12407
Branch 1.6
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12407.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12407
----
commit 04e868b63bfda5afe5cb1a0d6387fb873ad393ba
Author: Yanbo Liang <[email protected]>
Date: 2015-12-16T20:59:22Z
[SPARK-12364][ML][SPARKR] Add ML example for SparkR
We have DataFrame example for SparkR, we also need to add ML example under
```examples/src/main/r```.
cc mengxr jkbradley shivaram
Author: Yanbo Liang <[email protected]>
Closes #10324 from yanboliang/spark-12364.
(cherry picked from commit 1a8b2a17db7ab7a213d553079b83274aeebba86f)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 552b38f87fc0f6fab61b1e5405be58908b7f5544
Author: Davies Liu <[email protected]>
Date: 2015-12-16T23:48:11Z
[SPARK-12380] [PYSPARK] use SQLContext.getOrCreate in mllib
MLlib should use SQLContext.getOrCreate() instead of creating a new
SQLContext.
Author: Davies Liu <[email protected]>
Closes #10338 from davies/create_context.
(cherry picked from commit 27b98e99d21a0cc34955337f82a71a18f9220ab2)
Signed-off-by: Davies Liu <[email protected]>
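The getOrCreate pattern this patch adopts can be sketched outside Spark; a minimal Python analogy (the `Context` class is a hypothetical stand-in, not Spark's API):

```python
class Context:
    """Stand-in for a heavyweight context object such as SQLContext."""
    _instance = None  # cached singleton

    @classmethod
    def get_or_create(cls):
        # Reuse the existing context instead of constructing a new one,
        # which is the change SPARK-12380 makes in MLlib.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

a = Context.get_or_create()
b = Context.get_or_create()
assert a is b  # every caller shares the same context
```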
commit 638b89bc3b1c421fe11cbaf52649225662d3d3ce
Author: Andrew Or <[email protected]>
Date: 2015-12-17T00:13:48Z
[MINOR] Add missing interpolation in NettyRPCEnv
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled
by spark.rpc.askTimeout
at
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
Author: Andrew Or <[email protected]>
Closes #10334 from andrewor14/rpc-typo.
(cherry picked from commit 861549acdbc11920cde51fc57752a8bc241064e5)
Signed-off-by: Shixiong Zhu <[email protected]>
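The bug behind the message above is a Scala string literal missing its `s` interpolator, so `${timeout.duration}` was printed verbatim. A Python analogy, where the `f` prefix plays the role of Scala's `s`:

```python
duration = "120 seconds"

# Forgetting the prefix leaves the placeholder literal -- the analogue of
# the missing `s` interpolator in NettyRPCEnv's error message.
broken = "Cannot receive any reply in {duration}."
fixed = f"Cannot receive any reply in {duration}."

assert broken == "Cannot receive any reply in {duration}."
assert fixed == "Cannot receive any reply in 120 seconds."
```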
commit fb02e4e3bcc50a8f823dfecdb2eef71287225e7b
Author: Imran Rashid <[email protected]>
Date: 2015-12-17T03:01:05Z
[SPARK-10248][CORE] track exceptions in dagscheduler event loop in tests
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to
process more events, from other jobs). However, this is not desirable in the
tests -- the tests should be able to easily detect any exception, and also
shouldn't silently succeed if there is an exception.
This was suggested by mateiz on https://github.com/apache/spark/pull/7699.
It may have already turned up an issue in "zero split job".
Author: Imran Rashid <[email protected]>
Closes #8466 from squito/SPARK-10248.
(cherry picked from commit 38d9795a4fa07086d65ff705ce86648345618736)
Signed-off-by: Andrew Or <[email protected]>
commit 4af64385b085002d94c54d11bbd144f9f026bbd8
Author: tedyu <[email protected]>
Date: 2015-12-17T03:02:12Z
[SPARK-12365][CORE] Use ShutdownHookManager where
Runtime.getRuntime.addShutdownHook() is called
SPARK-9886 fixed ExternalBlockStore.scala
This PR fixes the remaining references to
Runtime.getRuntime.addShutdownHook()
Author: tedyu <[email protected]>
Closes #10325 from ted-yu/master.
(cherry picked from commit f590178d7a06221a93286757c68b23919bee9f03)
Signed-off-by: Andrew Or <[email protected]>
Conflicts:
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
commit 154567dca126d4992c9c9b08d71d22e9af43c995
Author: Rohit Agarwal <[email protected]>
Date: 2015-12-17T03:04:33Z
[SPARK-12186][WEB UI] Send the complete request URI including the query
string when redirecting.
Author: Rohit Agarwal <[email protected]>
Closes #10180 from mindprince/SPARK-12186.
(cherry picked from commit fdb38227564c1af40cbfb97df420b23eb04c002b)
Signed-off-by: Andrew Or <[email protected]>
commit 4ad08035d28b8f103132da9779340c5e64e2d1c2
Author: Marcelo Vanzin <[email protected]>
Date: 2015-12-17T03:47:49Z
[SPARK-12386][CORE] Fix NPE when spark.executor.port is set.
Author: Marcelo Vanzin <[email protected]>
Closes #10339 from vanzin/SPARK-12386.
(cherry picked from commit d1508dd9b765489913bc948575a69ebab82f217b)
Signed-off-by: Andrew Or <[email protected]>
commit d509194b81abc3c7bf9563d26560d596e1415627
Author: Yin Huai <[email protected]>
Date: 2015-12-17T07:18:53Z
[SPARK-12057][SQL] Prevent failure on corrupt JSON records
This PR makes the JSON parser and schema inference handle more cases where
we have unparseable records. It is based on #10043. The last commit fixes the
failed test and updates the logic of schema inference.
Regarding the schema inference change, if we have something like
```
{"f1":1}
[1,2,3]
```
originally, we would get a DataFrame without any columns.
After this change, we get a DataFrame with columns `f1` and
`_corrupt_record`. For the second row, `[1,2,3]` becomes the value of
`_corrupt_record`.
When merging this PR, please make sure that the author is simplyianm.
JIRA: https://issues.apache.org/jira/browse/SPARK-12057
Closes #10043
Author: Ian Macalinao <[email protected]>
Author: Yin Huai <[email protected]>
Closes #10288 from yhuai/handleCorruptJson.
(cherry picked from commit 9d66c4216ad830812848c657bbcd8cd50949e199)
Signed-off-by: Reynold Xin <[email protected]>
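The new behaviour can be sketched with a plain Python loop; this is an illustration of the rule the commit describes, not Spark's actual parser:

```python
import json

def parse_records(lines, corrupt_col="_corrupt_record"):
    """Parse JSON lines; route unparseable rows into a corrupt-record
    column instead of discarding every column (sketch of the SPARK-12057
    behaviour)."""
    rows = []
    for line in lines:
        try:
            obj = json.loads(line)
            if not isinstance(obj, dict):  # e.g. a bare array like [1,2,3]
                raise ValueError("not a JSON object")
            rows.append(obj)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            rows.append({corrupt_col: line})
    return rows

rows = parse_records(['{"f1":1}', '[1,2,3]'])
assert rows == [{"f1": 1}, {"_corrupt_record": "[1,2,3]"}]
```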
commit da7542f2408140a9a3b7ea245350976ac18676a5
Author: echo2mei <[email protected]>
Date: 2015-12-17T15:59:17Z
Once driver register successfully, stop it to connect to master.
This commit is to resolve SPARK-12396.
Author: echo2mei <[email protected]>
Closes #10354 from echoTomei/master.
(cherry picked from commit 5a514b61bbfb609c505d8d65f2483068a56f1f70)
Signed-off-by: Davies Liu <[email protected]>
commit a8466489ab01e59fe07ba20adfc3983ec6928157
Author: Davies Liu <[email protected]>
Date: 2015-12-17T16:01:59Z
Revert "Once driver register successfully, stop it to connect to master."
This reverts commit da7542f2408140a9a3b7ea245350976ac18676a5.
commit 1ebedb20f2c5b781eafa9bf2b5ab092d744cc4fd
Author: Davies Liu <[email protected]>
Date: 2015-12-17T16:04:11Z
[SPARK-12395] [SQL] fix resulting columns of outer join
For the API DataFrame.join(right, usingColumns, joinType), if the joinType
is right_outer or full_outer, the resulting join columns could be wrong (they
will be null).
The order of columns has been changed to match that of MySQL and
PostgreSQL [1].
This PR also fixes the nullability of the output for outer joins.
[1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html
Author: Davies Liu <[email protected]>
Closes #10353 from davies/fix_join.
(cherry picked from commit a170d34a1b309fecc76d1370063e0c4f44dc2142)
Signed-off-by: Davies Liu <[email protected]>
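The key point is that in a full outer join the `usingColumns` column must be coalesced from both sides so it is non-null whenever either side matched. A toy Python sketch of that rule (dict-based, not Spark's implementation):

```python
def full_outer_join(left, right, key):
    """Join two lists of dicts on `key`, coalescing the join column so it
    is non-null whenever either side matched -- the behaviour SPARK-12395
    restores for right_outer/full_outer joins."""
    keys = {r[key] for r in left} | {r[key] for r in right}
    lmap = {r[key]: r for r in left}
    rmap = {r[key]: r for r in right}
    out = []
    for k in sorted(keys):
        row = {key: k}  # coalesce(left.key, right.key): never null here
        row.update({f"l_{c}": v for c, v in lmap.get(k, {}).items() if c != key})
        row.update({f"r_{c}": v for c, v in rmap.get(k, {}).items() if c != key})
        out.append(row)
    return out

res = full_outer_join([{"id": 1, "a": "x"}], [{"id": 2, "b": "y"}], "id")
assert res == [{"id": 1, "l_a": "x"}, {"id": 2, "r_b": "y"}]
```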
commit 41ad8aced2fc6c694c15e9465cfa34517b2395e8
Author: Yanbo Liang <[email protected]>
Date: 2015-12-17T17:19:46Z
[SQL] Update SQLContext.read.text doc
Since we renamed the column from ```text``` to ```value``` for DataFrames
loaded by ```SQLContext.read.text```, we need to update the doc.
Author: Yanbo Liang <[email protected]>
Closes #10349 from yanboliang/text-value.
(cherry picked from commit 6e0771665b3c9330fc0a5b2c7740a796b4cd712e)
Signed-off-by: Reynold Xin <[email protected]>
commit 1fbca41200d6e73cb276d5949b894881c700323f
Author: Shixiong Zhu <[email protected]>
Date: 2015-12-17T17:55:37Z
[SPARK-12220][CORE] Make Utils.fetchFile support files that contain special
characters
This PR encodes and decodes the file name to fix the issue.
Author: Shixiong Zhu <[email protected]>
Closes #10208 from zsxwing/uri.
(cherry picked from commit 86e405f357711ae93935853a912bc13985c259db)
Signed-off-by: Shixiong Zhu <[email protected]>
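The encode-then-decode approach can be illustrated with Python's standard URL quoting; this mirrors the idea of the fix rather than Spark's actual `Utils.fetchFile` code:

```python
from urllib.parse import quote, unquote

# File names with spaces, '+' or '%' must be percent-encoded before being
# placed in a URI, and decoded back when the file is fetched.
name = "my jar+file%1.jar"
encoded = quote(name, safe="")
assert "%20" in encoded            # the space was encoded
assert unquote(encoded) == name    # and the name survives the round trip
```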
commit 881f2544e13679c185a7c34ddb82e885aaa79813
Author: Iulian Dragos <[email protected]>
Date: 2015-12-17T18:19:31Z
[SPARK-12345][MESOS] Properly filter out SPARK_HOME in the Mesos REST server
Fix problem with #10332, this one should fix Cluster mode on Mesos
Author: Iulian Dragos <[email protected]>
Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.
(cherry picked from commit 8184568810e8a2e7d5371db2c6a0366ef4841f70)
Signed-off-by: Kousuke Saruta <[email protected]>
commit 88bbb5429dd3efcff6b2835a70143247b08ae6b2
Author: Andrew Or <[email protected]>
Date: 2015-12-17T04:01:47Z
[SPARK-12390] Clean up unused serializer parameter in BlockManager
No change in functionality is intended. This only changes internal API.
Author: Andrew Or <[email protected]>
Closes #10343 from andrewor14/clean-bm-serializer.
Conflicts:
core/src/main/scala/org/apache/spark/storage/BlockManager.scala
commit c0ab14fbeab2a81d174c3643a4fcc915ff2902e8
Author: Shixiong Zhu <[email protected]>
Date: 2015-12-17T21:23:48Z
[SPARK-12410][STREAMING] Fix places that use '.' and '|' directly in split
String.split accepts a regular expression, so we should escape "." and "|".
Author: Shixiong Zhu <[email protected]>
Closes #10361 from zsxwing/reg-bug.
(cherry picked from commit 540b5aeadc84d1a5d61bda4414abd6bf35dc7ff9)
Signed-off-by: Shixiong Zhu <[email protected]>
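The pitfall is the same in Python's `re.split` as in Java's `String.split`: the separator is a regular expression, so an unescaped "." matches every character. (Python keeps the empty fields that Java trims from the end.)

```python
import re

line = "a.b.c"
# Unescaped "." is a regex wildcard: every character becomes a separator.
assert re.split(r".", line) == ["", "", "", "", "", ""]
# Escaping it splits on the literal dot.
assert re.split(r"\.", line) == ["a", "b", "c"]
# re.escape (cf. Pattern.quote on the JVM) escapes metacharacters safely.
assert re.split(re.escape("."), line) == ["a", "b", "c"]
```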
commit 48dcee48416d87bf9572ace0a82285bacfcbf46e
Author: Reynold Xin <[email protected]>
Date: 2015-12-17T22:16:49Z
[SPARK-12397][SQL] Improve error messages for data sources when they are
not found
Point users to spark-packages.org to find them.
Author: Reynold Xin <[email protected]>
Closes #10351 from rxin/SPARK-12397.
(cherry picked from commit e096a652b92fc64a7b3457cd0766ab324bcc980b)
Signed-off-by: Michael Armbrust <[email protected]>
commit 4df1dd403441a4e4ca056d294385d8d0d8a0c65d
Author: Evan Chen <[email protected]>
Date: 2015-12-17T22:22:30Z
[SPARK-12376][TESTS] Spark Streaming Java8APISuite fails in
assertOrderInvariantEquals method
org.apache.spark.streaming.Java8APISuite.java is failing because it tries
to sort an immutable list in the assertOrderInvariantEquals method.
Author: Evan Chen <[email protected]>
Closes #10336 from evanyc15/SPARK-12376-StreamingJavaAPISuite.
commit 9177ea383a29653f0591a59e1ee2dff6b87d5a1c
Author: jhu-chang <[email protected]>
Date: 2015-12-18T01:53:15Z
[SPARK-11749][STREAMING] Duplicate creating the RDD in file stream when
recovering from checkpoint data
Add a transient flag `DStream.restoredFromCheckpointData` to control
restore processing in DStream and avoid duplicate work:
`DStream.restoreCheckpointData` checks this flag first and runs the restore
process only when it is `false`.
Author: jhu-chang <[email protected]>
Closes #9765 from jhu-chang/SPARK-11749.
(cherry picked from commit f4346f612b6798517153a786f9172cf41618d34d)
Signed-off-by: Shixiong Zhu <[email protected]>
commit df0231952e5542e9870f8dde9ecbd7ad9a50f847
Author: Michael Gummelt <[email protected]>
Date: 2015-12-18T11:18:00Z
[SPARK-12413] Fix Mesos ZK persistence
I believe this fixes SPARK-12413. I'm currently running an integration
test to verify.
Author: Michael Gummelt <[email protected]>
Closes #10366 from mgummelt/fix-zk-mesos.
(cherry picked from commit 2bebaa39d9da33bc93ef682959cd42c1968a6a3e)
Signed-off-by: Kousuke Saruta <[email protected]>
commit 1dc71ec777ff7cac5d3d7adb13f2d63ffe8909b6
Author: Yin Huai <[email protected]>
Date: 2015-12-18T18:52:14Z
[SPARK-12218][SQL] Invalid splitting of nested AND expressions in Data
Source filter API
JIRA: https://issues.apache.org/jira/browse/SPARK-12218
When creating filters for Parquet/ORC, we should not push nested AND
expressions partially.
Author: Yin Huai <[email protected]>
Closes #10362 from yhuai/SPARK-12218.
(cherry picked from commit 41ee7c57abd9f52065fd7ffb71a8af229603371d)
Signed-off-by: Yin Huai <[email protected]>
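The rule the fix enforces can be sketched in a few lines: when translating predicates into data source filters, an AND may be pushed down only if both children translate; pushing one side alone would silently drop the other condition. A toy Python sketch (tuple trees, not Spark's DataSourceStrategy):

```python
def convert(pred):
    """Translate a toy predicate tree into a 'source filter', returning
    None when unsupported -- a sketch of the SPARK-12218 rule."""
    op = pred[0]
    if op == "and":
        left, right = convert(pred[1]), convert(pred[2])
        # Push the AND only if BOTH sides convert.
        if left is not None and right is not None:
            return ("and", left, right)
        return None
    if op == "eq":
        return pred  # supported leaf
    return None      # unsupported leaf (e.g. a UDF)

assert convert(("and", ("eq", "a", 1), ("eq", "b", 2))) is not None
assert convert(("and", ("eq", "a", 1), ("udf", "f"))) is None  # not split
```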
commit 3b903e44b912cd36ec26e9e95444656eee7b0c46
Author: Andrew Or <[email protected]>
Date: 2015-12-18T20:56:03Z
Revert "[SPARK-12365][CORE] Use ShutdownHookManager where
Runtime.getRuntime.addShutdownHook() is called"
This reverts commit 4af64385b085002d94c54d11bbd144f9f026bbd8.
commit bd33d4ee847973289a58032df35375f03e9f9865
Author: Kousuke Saruta <[email protected]>
Date: 2015-12-18T22:05:06Z
[SPARK-12404][SQL] Ensure objects passed to StaticInvoke is Serializable
Now `StaticInvoke` receives `Any` as an object, and while `StaticInvoke`
itself can be serialized, the object passed in is sometimes not serializable.
For example, the following code raises an exception because
`RowEncoder#extractorsFor`, invoked indirectly, creates a `StaticInvoke`.
```
case class TimestampContainer(timestamp: java.sql.Timestamp)
val rdd = sc.parallelize(1 to 2).map(_ =>
TimestampContainer(System.currentTimeMillis))
val df = rdd.toDF
val ds = df.as[TimestampContainer]
val rdd2 = ds.rdd // <----------------- invokes extractorsFor indirectly
```
I'll add test cases.
Author: Kousuke Saruta <[email protected]>
Author: Michael Armbrust <[email protected]>
Closes #10357 from sarutak/SPARK-12404.
(cherry picked from commit 6eba655259d2bcea27d0147b37d5d1e476e85422)
Signed-off-by: Michael Armbrust <[email protected]>
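The failure mode is generic to any serialization framework: a serializable wrapper that captures an arbitrary object breaks as soon as the captured object itself cannot be serialized. A Python analogy using pickle (illustration only, not Spark's code):

```python
import pickle

class Wrapper:
    """Toy analogue of StaticInvoke: a serializable wrapper that captures
    an arbitrary object typed as `Any`."""
    def __init__(self, target):
        self.target = target

# A serializable target round-trips fine...
ok = pickle.loads(pickle.dumps(Wrapper("java.sql.Timestamp")))
assert ok.target == "java.sql.Timestamp"

# ...but capturing something unserializable (here, a lambda) fails at
# serialization time, the failure mode SPARK-12404 guards against.
try:
    pickle.dumps(Wrapper(lambda x: x))
    raised = False
except Exception:
    raised = True
assert raised
```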
commit eca401ee5d3ae683cbee531c1f8bc981f9603fc8
Author: Burak Yavuz <[email protected]>
Date: 2015-12-18T23:24:41Z
[SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs
- Provide example on `message handler`
- Provide bit on KPL record de-aggregation
- Fix typos
Author: Burak Yavuz <[email protected]>
Closes #9970 from brkyvz/kinesis-docs.
(cherry picked from commit 2377b707f25449f4557bf048bb384c743d9008e5)
Signed-off-by: Shixiong Zhu <[email protected]>
commit d6a519ff20652494ac3aeba477526ad1fd810a3c
Author: Yanbo Liang <[email protected]>
Date: 2015-12-19T08:34:30Z
[SQL] Fix mistake doc of join type for dataframe.join
Fix a mistake in the join type doc for ```dataframe.join```.
Author: Yanbo Liang <[email protected]>
Closes #10378 from yanboliang/leftsemi.
(cherry picked from commit a073a73a561e78c734119c8b764d37a4e5e70da4)
Signed-off-by: Reynold Xin <[email protected]>
commit c754a08793458813d608e48ad1b158da770cd992
Author: pshearer <[email protected]>
Date: 2015-12-21T22:04:59Z
Doc typo: ltrim = trim from left end, not right
Author: pshearer <[email protected]>
Closes #10414 from pshearer/patch-1.
(cherry picked from commit fc6dbcc7038c2b030ef6a2dc8be5848499ccee1c)
Signed-off-by: Andrew Or <[email protected]>
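The distinction the doc fix makes, shown with Python's equivalents of ltrim/rtrim:

```python
s = "  padded  "
assert s.lstrip() == "padded  "   # ltrim: trims the left end only
assert s.rstrip() == "  padded"   # rtrim: trims the right end only
```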
commit ca3998512dd7801379c96c9399d3d053ab7472cd
Author: Andrew Or <[email protected]>
Date: 2015-12-21T22:09:04Z
[SPARK-12466] Fix harmless NPE in tests
```
[info] ReplayListenerSuite:
[info] - Simple replay (58 milliseconds)
java.lang.NullPointerException
at
org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
at
org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
```
https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
This was introduced in #10284. It's harmless because the NPE is caused by a
race that occurs mainly in `local-cluster` tests (but doesn't actually fail
the tests).
Tested locally to verify that the NPE is gone.
Author: Andrew Or <[email protected]>
Closes #10417 from andrewor14/fix-harmless-npe.
(cherry picked from commit d655d37ddf59d7fb6db529324ac8044d53b2622a)
Signed-off-by: Andrew Or <[email protected]>
commit 4062cda3087ae42c6c3cb24508fc1d3a931accdf
Author: Patrick Wendell <[email protected]>
Date: 2015-12-22T01:50:29Z
Preparing Spark release v1.6.0-rc4
commit 5b19e7cfded0e2e41b6f427b4c3cfc3f06f85466
Author: Patrick Wendell <[email protected]>
Date: 2015-12-22T01:50:36Z
Preparing development version 1.6.0-SNAPSHOT
commit 309ef355fc511b70765983358d5c92b5f1a26bce
Author: Shixiong Zhu <[email protected]>
Date: 2015-12-22T06:28:18Z
[MINOR] Fix typos in JavaStreamingContext
Author: Shixiong Zhu <[email protected]>
Closes #10424 from zsxwing/typo.
(cherry picked from commit 93da8565fea42d8ac978df411daced4a9ea3a9c8)
Signed-off-by: Reynold Xin <[email protected]>
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]