GitHub user witgo reopened a pull request:
https://github.com/apache/spark/pull/1208
SPARK-1470: Use the scala-logging wrapper instead of the slf4j API directly
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark SPARK-1470
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1208.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1208
----
commit 8fade8973e5fc97f781de5344beb66b90bd6e524
Author: Cheng Lian <[email protected]>
Date: 2014-06-25T07:14:34Z
[SPARK-2263][SQL] Support inserting MAP<K, V> to Hive tables
JIRA issue: [SPARK-2263](https://issues.apache.org/jira/browse/SPARK-2263)
Map objects were not converted to Hive types before inserting into Hive
tables.
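For illustration, a minimal sketch (hypothetical table names, assuming the Spark 1.0-era `hql` API and an existing SparkContext `sc`) of the kind of statement this fixes:
~~~
// Hedged sketch: before this fix, inserting a MAP value into a Hive table
// failed because Map objects were not first converted to Hive types.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS dst (m MAP<INT, STRING>)")
hiveContext.hql("INSERT OVERWRITE TABLE dst SELECT MAP(key, value) FROM src")
~~~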
Author: Cheng Lian <[email protected]>
Closes #1205 from liancheng/spark-2263 and squashes the following commits:
c7a4373 [Cheng Lian] Addressed @concretevitamin's comment
784940b [Cheng Lian] SPARK-2263: support inserting MAP<K, V> to Hive tables
commit 22036aeb1b2cac7f48cd60afea925b42a5318631
Author: Cheng Lian <[email protected]>
Date: 2014-06-25T07:17:28Z
[BUGFIX][SQL] Should match java.math.BigDecimal when unwrapping Hive output
The `BigDecimal` branch in `unwrap` matches `scala.math.BigDecimal`
rather than `java.math.BigDecimal`.
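A hedged sketch of the distinction (illustrative helper, not the actual `unwrap` code):
~~~
// Hive hands back java.math.BigDecimal; a branch matching scala.math.BigDecimal
// therefore never fires for Hive output.
def unwrapSketch(v: Any): Any = v match {
  case d: java.math.BigDecimal => BigDecimal(d) // the type Hive actually returns
  // case d: scala.math.BigDecimal => d         // the branch that never matched
  case other => other
}
~~~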
Author: Cheng Lian <[email protected]>
Closes #1199 from liancheng/javaBigDecimal and squashes the following
commits:
e9bb481 [Cheng Lian] Should match java.math.BigDecimal when unwrapping Hive
output
commit acc01ab3265c317f36a4fca28d3b9d72b0096c12
Author: CodingCat <[email protected]>
Date: 2014-06-25T07:23:32Z
SPARK-2038: rename "conf" parameters in the saveAsHadoop functions with
source-compatibility
https://issues.apache.org/jira/browse/SPARK-2038
to differentiate them from the SparkConf object while keeping source-level
compatibility
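A hedged sketch (illustrative names, not the actual Spark signatures) of why the parameter name matters for source compatibility:
~~~
// Renaming a public parameter breaks any caller that passes it by name,
// which is why the old name must remain usable.
def saveAsHadoopDataset(conf: org.apache.hadoop.mapred.JobConf): Unit = ???
// A caller using a named argument depends on the parameter's name:
// rdd.saveAsHadoopDataset(conf = jobConf)  // breaks if "conf" is renamed
~~~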
Author: CodingCat <[email protected]>
Closes #1137 from CodingCat/SPARK-2038 and squashes the following commits:
11abeba [CodingCat] revise the comments
7ee5712 [CodingCat] to keep the source-compatibility
763975f [CodingCat] style fix
d91288d [CodingCat] rename "conf" parameters in the saveAsHadoop functions
commit ac06a85da59db8f2654cdf6601d186348da09c01
Author: Reynold Xin <[email protected]>
Date: 2014-06-25T08:01:23Z
Replace doc reference to Shark with Spark SQL.
commit 5603e4c47f1dc1b87336f57ed4d6bd9e88f5abcc
Author: Andrew Or <[email protected]>
Date: 2014-06-25T17:47:22Z
[SPARK-2242] HOTFIX: pyspark shell hangs on simple job
This reverts a change introduced in
3870248740d83b0292ccca88a494ce19783847f0, which redirected all stderr to the OS
pipe instead of directly to the `bin/pyspark` shell output. This causes a
simple job to hang in two ways:
1. If the cluster is not configured correctly or does not have enough
resources, the job hangs without producing any output, because the relevant
warning messages are masked.
2. If the stderr volume is large, this could lead to a deadlock if we
redirect everything to the OS pipe. From the [python
docs](https://docs.python.org/2/library/subprocess.html):
```
Note: Do not use stdout=PIPE or stderr=PIPE with this function as that can
deadlock based on the child process output volume. Use Popen with the
communicate() method when you need pipes.
```
Note that we cannot remove `stdout=PIPE` in a similar way, because we
currently use it to communicate the py4j port. However, it should be fine (as
it has been for a long time) because we do not produce a ton of traffic through
`stdout`.
That commit was not merged in branch-1.0, so this fix is for master only.
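A hedged Scala analogue of the hazard (not Spark's actual launcher code, which is in Python and bash):
~~~
// Inherited stderr goes straight to our output and cannot deadlock; a pipe
// that nobody drains can fill its OS buffer and block the child process.
import java.lang.ProcessBuilder.Redirect

val pb = new ProcessBuilder("bin/pyspark")  // command is illustrative
pb.redirectError(Redirect.INHERIT)          // safe: stderr passes straight through
// pb.redirectError(Redirect.PIPE)          // risky unless the pipe is kept drained
val proc = pb.start()
~~~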
Author: Andrew Or <[email protected]>
Closes #1178 from andrewor14/fix-python and squashes the following commits:
e68e870 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
fix-python
20849a8 [Andrew Or] Tone down stdout interference message
a09805b [Andrew Or] Return more than 1 line of error message to user
6dfbd1e [Andrew Or] Don't swallow original exception
0d1861f [Andrew Or] Provide more helpful output if stdout is garbled
21c9d7c [Andrew Or] Do not mask stderr from output
commit 9aa603296c285e1acf4bde64583f203008ba3e91
Author: Andrew Or <[email protected]>
Date: 2014-06-25T19:23:08Z
[SPARK-2258 / 2266] Fix a few worker UI bugs
**SPARK-2258.** Worker UI displays zombie processes if the executor throws
an exception before a process is launched. This is because we only inform the
Worker of the change if the process is already launched, which in this case it
isn't.
**SPARK-2266.** We expose "Some(app-id)" on the log page. This is fairly
minor.
Author: Andrew Or <[email protected]>
Closes #1213 from andrewor14/fix-worker-ui and squashes the following
commits:
c1223fe [Andrew Or] Fix worker UI bugs
commit 7ff2c754f340ba4c4077b0ff6285876eb7871c7b
Author: Reynold Xin <[email protected]>
Date: 2014-06-25T19:43:22Z
[SPARK-2270] Kryo cannot serialize results returned by asJavaIterable
(and thus groupBy/cogroup are broken in Java APIs when Kryo is used).
@pwendell this should be merged into 1.0.1.
Thanks @sorenmacbeth for reporting this & helping out with the fix.
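An illustrative sketch of the failing shape (not the fix itself):
~~~
// The Java API wraps Scala collections via JavaConversions.asJavaIterable,
// and Kryo could not serialize the resulting wrapper class when such values
// were shuffled by groupBy/cogroup.
import scala.collection.JavaConversions.asJavaIterable

val wrapped: java.lang.Iterable[Int] = asJavaIterable(Seq(1, 2, 3))
// Shipping `wrapped` through Kryo failed before this fix.
~~~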
Author: Reynold Xin <[email protected]>
Closes #1206 from rxin/kryo-iterable-2270 and squashes the following
commits:
09da0aa [Reynold Xin] Updated the comment.
009bf64 [Reynold Xin] [SPARK-2270] Kryo cannot serialize results returned
by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo
is used).
commit 1132e472eca1a00c2ce10d2f84e8f0e79a5193d3
Author: Sebastien Rainville <[email protected]>
Date: 2014-06-25T20:21:18Z
[SPARK-2204] Launch tasks on the proper executors in mesos fine-grained mode
The scheduler for Mesos in fine-grained mode launches tasks on the wrong
executors. `MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer])`
is assuming that `TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer])` is
returning task lists in the same order as the offers it was passed, but in the
current implementation `TaskSchedulerImpl.resourceOffers` shuffles the offers
to avoid assigning the tasks always to the same executors. The result is that
the tasks are launched on the wrong executors. The jobs are sometimes able to
complete, but most of the time they fail. It seems that as soon as something
goes wrong with a task for some reason Spark is not able to recover since it's
mistaken as to where the tasks are actually running. Also, it seems that the
more the cluster is under load the more likely the job is to fail because
there's a higher probability that Spark is trying to launch a task on a slave
that doesn't actually have enough resources, again because it's using
the wrong offers.
The solution is to not assume that the order in which the tasks are
returned is the same as the offers, and simply launch the tasks on the executor
decided by `TaskSchedulerImpl.resourceOffers`. What I am not sure about is that
I considered slaveId and executorId to be the same, which is true at least in
my setup, but I don't know if that is always true.
I tested this on top of the 1.0.0 release and it seems to work fine on our
cluster.
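A hedged sketch of the fix's idea, with a stand-in Offer type (the real code uses org.apache.mesos.Protos.Offer):
~~~
// Look offers up by slave ID instead of assuming that
// TaskSchedulerImpl.resourceOffers returns task lists in offer order.
case class Offer(slaveId: String, cores: Int)

val offers = Seq(Offer("slave-1", 4), Offer("slave-2", 8))
val offerIndex: Map[String, Int] =
  offers.zipWithIndex.map { case (o, i) => (o.slaveId, i) }.toMap
// A task assigned to executor "slave-2" is launched against the matching offer:
val idx = offerIndex("slave-2")
~~~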
Author: Sebastien Rainville <[email protected]>
Closes #1140 from sebastienrainville/fine-grained-mode-fix-master and
squashes the following commits:
a98b0e0 [Sebastien Rainville] Use a HashMap to retrieve the offer indices
d6ffe54 [Sebastien Rainville] Launch tasks on the proper executors in mesos
fine-grained mode
commit 9d824fed8c62dd6c87b4c855c2fea930c01b58f4
Author: Zongheng Yang <[email protected]>
Date: 2014-06-26T01:06:33Z
[SQL] SPARK-1800 Add broadcast hash join operator & associated hints.
This PR is based off Michael's [PR
734](https://github.com/apache/spark/pull/734) and includes a bunch of cleanups.
Moreover, this PR also
- makes `SparkLogicalPlan` take a `tableName: String`, which facilitates
testing.
- moves join-related tests to a single file.
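For intuition, a conceptual RDD-level sketch of a broadcast (map-side) hash join, not the PR's actual SQL operator (assumes pair RDDs `smallSide` and `largeSide` and a SparkContext `sc`):
~~~
// Collect the small side into a hash map, broadcast it, then stream the
// large side and probe the map locally, avoiding a shuffle entirely.
val smallMap = sc.broadcast(smallSide.collectAsMap()) // build side
val joined = largeSide.flatMap { case (k, v) =>
  smallMap.value.get(k).map(w => (k, (v, w)))         // probe side
}
~~~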
Author: Zongheng Yang <[email protected]>
Author: Michael Armbrust <[email protected]>
Closes #1163 from concretevitamin/auto-broadcast-hash-join and squashes the
following commits:
d0f4991 [Zongheng Yang] Fix bug in broadcast hash join & add test to cover
it.
af080d7 [Zongheng Yang] Fix in joinIterators()'s next().
440d277 [Zongheng Yang] Fixes to imports; add back
requiredChildDistribution (lost when merging)
208d5f6 [Zongheng Yang] Make LeftSemiJoinHash mix in HashJoin.
ad6c7cc [Zongheng Yang] Minor cleanups.
814b3bf [Zongheng Yang] Merge branch 'master' into auto-broadcast-hash-join
a8a093e [Zongheng Yang] Minor cleanups.
6fd8443 [Zongheng Yang] Cut down size estimation related stuff.
a4267be [Zongheng Yang] Add test for broadcast hash join and related
necessary refactorings:
0e64b08 [Zongheng Yang] Scalastyle fix.
91461c2 [Zongheng Yang] Merge branch 'master' into auto-broadcast-hash-join
7c7158b [Zongheng Yang] Prototype of auto conversion to broadcast hash join.
0ad122f [Zongheng Yang] Merge branch 'master' into auto-broadcast-hash-join
3e5d77c [Zongheng Yang] WIP: giant and messy WIP.
a92ed0c [Michael Armbrust] Formatting.
76ca434 [Michael Armbrust] A simple strategy that broadcasts tables only
when they are found in a configuration hint.
cf6b381 [Michael Armbrust] Split out generic logic for hash joins and
create two concrete physical operators: BroadcastHashJoin and ShuffledHashJoin.
a8420ca [Michael Armbrust] Copy records in executeCollect to avoid issues
with mutable rows.
commit 7f196b009d26d4aed403b3c694f8b603601718e3
Author: Cheng Lian <[email protected]>
Date: 2014-06-26T01:41:47Z
[SPARK-2283][SQL] Reset test environment before running PruningSuite
JIRA issue: [SPARK-2283](https://issues.apache.org/jira/browse/SPARK-2283)
If `PruningSuite` is run right after `HiveCompatibilitySuite`, the first
test case fails because `srcpart` table is cached in-memory by
`HiveCompatibilitySuite`, but column pruning is not implemented for
`InMemoryColumnarTableScan` operator yet.
Author: Cheng Lian <[email protected]>
Closes #1221 from liancheng/spark-2283 and squashes the following commits:
dc0b663 [Cheng Lian] SPARK-2283: reset test environment before running
PruningSuite
commit b88a59a66845b8935b22f06fc96d16841ed20c94
Author: Mark Hamstra <[email protected]>
Date: 2014-06-26T03:57:48Z
[SPARK-1749] Job cancellation when SchedulerBackend does not implement
killTask
This is a fixed up version of #686 (cc @markhamstra @pwendell). The last
commit (the only one I authored) reflects the changes I made from Mark's
original patch.
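A hedged sketch (simplified signatures, not Spark's actual code) of the approach named in the commits below:
~~~
// A backend that cannot kill tasks signals this with
// UnsupportedOperationException; the caller catches it instead of crashing.
trait SchedulerBackendSketch {
  def killTask(taskId: Long, executorId: String): Unit =
    throw new UnsupportedOperationException
}

def cancel(backend: SchedulerBackendSketch, taskId: Long, executorId: String): Unit =
  try backend.killTask(taskId, executorId)
  catch {
    case _: UnsupportedOperationException =>
      println(s"Task $taskId could not be killed: backend does not implement killTask")
  }
~~~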
Author: Mark Hamstra <[email protected]>
Author: Kay Ousterhout <[email protected]>
Closes #1219 from kayousterhout/mark-SPARK-1749 and squashes the following
commits:
42dfa7e [Kay Ousterhout] Got rid of terrible double-negative name
80b3205 [Kay Ousterhout] Don't notify listeners of job failure if it wasn't
successfully cancelled.
d156d33 [Mark Hamstra] Do nothing in no-kill submitTasks
9312baa [Mark Hamstra] code review update
cc353c8 [Mark Hamstra] scalastyle
e61f7f8 [Mark Hamstra] Catch UnsupportedOperationException when
DAGScheduler tries to cancel a job on a SchedulerBackend that does not
implement killTask
commit 4a346e242c3f241c575f35536220df01ad724e23
Author: Reynold Xin <[email protected]>
Date: 2014-06-26T05:35:03Z
[SPARK-2284][UI] Mark all failed tasks as failures.
Previously, only tasks that failed with an ExceptionFailure reason were marked
as failures.
Author: Reynold Xin <[email protected]>
Closes #1224 from rxin/SPARK-2284 and squashes the following commits:
be79dbd [Reynold Xin] [SPARK-2284][UI] Mark all failed tasks as failures.
commit 441cdcca64ba0b3cbaae4d4f25ebe4c4ebd46aae
Author: Szul, Piotr <[email protected]>
Date: 2014-06-26T04:55:49Z
[SPARK-2172] PySpark cannot import mllib modules in YARN-client mode
Include pyspark/mllib python sources as resources in the mllib.jar.
This way they will be included in the final assembly.
Author: Szul, Piotr <[email protected]>
Closes #1223 from piotrszul/branch-1.0 and squashes the following commits:
69d5174 [Szul, Piotr] Removed unused resource directory src/main/resource
from mllib pom
f8c52a0 [Szul, Piotr] [SPARK-2172] PySpark cannot import mllib modules in
YARN-client mode Include pyspark/mllib python sources as resources in the jar
(cherry picked from commit fa167194ce1b5898e4d7232346c9f86b2897a722)
Signed-off-by: Reynold Xin <[email protected]>
commit e4899a253728bfa7c78709a37a4837f74b72bd61
Author: Takuya UESHIN <[email protected]>
Date: 2014-06-26T06:55:31Z
[SPARK-2254] [SQL] ScalaReflection should mark primitive types as
non-nullable.
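An illustrative expectation (sketch, not the actual test code):
~~~
// Primitives can never be null, so schemaFor should report them as
// non-nullable, unlike reference and Option-wrapped types.
case class Record(i: Int, s: String, o: Option[Double])
// Expected schema (sketch): i -> IntegerType, nullable = false
//                           s -> StringType,  nullable = true
//                           o -> DoubleType,  nullable = true
~~~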
Author: Takuya UESHIN <[email protected]>
Closes #1193 from ueshin/issues/SPARK-2254 and squashes the following
commits:
cfd6088 [Takuya UESHIN] Modify ScalaReflection.schemaFor method to return
nullability of Scala Type.
commit 48a82a827c99526b165c78d7e88faec43568a37a
Author: Kay Ousterhout <[email protected]>
Date: 2014-06-26T13:20:27Z
Remove use of spark.worker.instances
spark.worker.instances was added as part of this commit:
https://github.com/apache/spark/commit/1617816090e7b20124a512a43860a21232ebf511
My understanding is that SPARK_WORKER_INSTANCES is supported for backwards
compatibility, but spark.worker.instances is never used (SparkSubmit.scala
sets spark.executor.instances), so it should not have been added.
@sryza @pwendell @tgravescs LMK if I'm understanding this correctly
Author: Kay Ousterhout <[email protected]>
Closes #1214 from kayousterhout/yarn_config and squashes the following
commits:
3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances
commit 32a1ad75313472b1b098f7ec99335686d3fe4fc3
Author: Takuya UESHIN <[email protected]>
Date: 2014-06-26T20:37:19Z
[SPARK-2295] [SQL] Make JavaBeans nullability stricter.
Author: Takuya UESHIN <[email protected]>
Closes #1235 from ueshin/issues/SPARK-2295 and squashes the following
commits:
201c508 [Takuya UESHIN] Make JavaBeans nullability stricter.
commit 6587ef7c1783961e6ef250afa387271a1bd6e277
Author: Reynold Xin <[email protected]>
Date: 2014-06-26T21:00:45Z
[SPARK-2286][UI] Report exception/errors for failed tasks that are not
ExceptionFailure
Also added inline doc for each TaskEndReason.
Author: Reynold Xin <[email protected]>
Closes #1225 from rxin/SPARK-2286 and squashes the following commits:
6a7959d [Reynold Xin] Fix unit test failure.
cf9d5eb [Reynold Xin] Merge branch 'master' into SPARK-2286
a61fae1 [Reynold Xin] Move to line above ...
38c7391 [Reynold Xin] [SPARK-2286][UI] Report exception/errors for failed
tasks that are not ExceptionFailure.
commit 62d4a0fa9947e64c1533f66ae577557bcfb271c9
Author: Zichuan Ye <[email protected]>
Date: 2014-06-26T22:21:29Z
Fixing AWS instance type information based upon current EC2 data
Fixed a problem in the previous file in which some information regarding AWS
instance types was wrong. That information was updated based upon current AWS
EC2 data.
Author: Zichuan Ye <[email protected]>
Closes #1156 from jerry86/master and squashes the following commits:
ff36e95 [Zichuan Ye] Fixing AWS instance type information based upon
current EC2 data
commit f1f7385a5087a80c936d419699e3f5232455f189
Author: Patrick Wendell <[email protected]>
Date: 2014-06-27T00:09:24Z
Strip '@' symbols when merging pull requests.
Currently, all of the commits that mention a user as '@X' cause person X to
receive e-mails every time someone makes a public fork of Spark.
cc marmbrus, who requested this change.
Author: Patrick Wendell <[email protected]>
Closes #1239 from pwendell/strip and squashes the following commits:
22e5a97 [Patrick Wendell] Strip '@' symbols when merging pull requests.
commit 981bde9b056ef5e91aed553e0b5930f12e1ff797
Author: Cheng Hao <[email protected]>
Date: 2014-06-27T02:18:11Z
[SQL]Extract the joinkeys from join condition
Extract the join keys from equality conditions so that the join can be
evaluated as an equi-join.
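A toy sketch of the idea with a stand-in expression AST (not Catalyst's actual classes):
~~~
// Split a conjunctive condition into conjuncts; equality conjuncts become
// join keys, everything else stays behind as a post-join filter.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Eq(left: Expr, right: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

def splitConjuncts(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other     => Seq(other)
}

val cond = And(Eq(Attr("a.id"), Attr("b.id")), GreaterThan(Attr("a.v"), Attr("b.v")))
val (joinKeys, filters) = splitConjuncts(cond).partition(_.isInstanceOf[Eq])
~~~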
Author: Cheng Hao <[email protected]>
Closes #1190 from chenghao-intel/extract_join_keys and squashes the
following commits:
4a1060a [Cheng Hao] Fix some of the small issues
ceb4924 [Cheng Hao] Remove the redundant pattern of join keys extraction
cec34e8 [Cheng Hao] Update the code style issues
dcc4584 [Cheng Hao] Extract the joinkeys from join condition
commit bf578deaf2493081ceeb78dfd7617def5699a06e
Author: Reynold Xin <[email protected]>
Date: 2014-06-27T04:12:16Z
Removed throwable field from FetchFailedException and added
MetadataFetchFailedException
FetchFailedException used to have a Throwable field, but in reality we
never propagate any of the throwable/exceptions back to the driver because
Executor explicitly looks for FetchFailedException and then sends FetchFailed
as the TaskEndReason.
This pull request removes the throwable and adds a
MetadataFetchFailedException that extends FetchFailedException (so now
MapOutputTracker throws MetadataFetchFailedException instead).
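A sketch of the reshaped hierarchy described above (simplified constructors):
~~~
// MetadataFetchFailedException extends FetchFailedException, so existing
// handlers keep working; no Throwable is carried back to the driver.
class FetchFailedException(message: String) extends Exception(message)
class MetadataFetchFailedException(message: String)
  extends FetchFailedException(message)
~~~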
Author: Reynold Xin <[email protected]>
Closes #1227 from rxin/metadataFetchException and squashes the following
commits:
5cb1e0a [Reynold Xin] MetadataFetchFailedException extends
FetchFailedException.
8861ee2 [Reynold Xin] Throw MetadataFetchFailedException in
MapOutputTracker.
commit d1636dd72fc4966413baeb97ba55b313dc1da63d
Author: Reynold Xin <[email protected]>
Date: 2014-06-27T04:13:26Z
[SPARK-2297][UI] Make task attempt and speculation more explicit in UI.
New UI: (screenshot attached to the original pull request)
Author: Reynold Xin <[email protected]>
Closes #1236 from rxin/ui-task-attempt and squashes the following commits:
3b645dd [Reynold Xin] Expose attemptId in Stage.
c0474b1 [Reynold Xin] Beefed up unit test.
c404bdd [Reynold Xin] Fix ReplayListenerSuite.
f56be4b [Reynold Xin] Fixed JsonProtocolSuite.
e29e0f7 [Reynold Xin] Minor update.
5e4354a [Reynold Xin] [SPARK-2297][UI] Make task attempt and speculation
more explicit in UI.
commit c23f5db32b3bd4d965d56e5df684a3b814a91cd6
Author: Xiangrui Meng <[email protected]>
Date: 2014-06-27T04:46:55Z
[SPARK-2251] fix concurrency issues in random sampler
The following code is very likely to throw an exception:
~~~
val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1)
rdd.zip(rdd).count()
~~~
because the same random number generator is shared across partition
computations.
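A self-contained sketch of the hazard (plain Scala, not the actual patch):
~~~
// Two passes over "the same" sampled data share one RNG, so they draw
// different random numbers and select different elements; zip then sees
// iterators of different lengths and throws.
import java.util.Random

def sample(data: Seq[Int], rng: Random): Seq[Int] =
  data.filter(_ => rng.nextDouble() < 0.1)

val shared = new Random(42L)
val first  = sample(0 until 111, shared)
val second = sample(0 until 111, shared)
// first != second with overwhelming probability; a per-partition RNG,
// reseeded deterministically, avoids this.
~~~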
Author: Xiangrui Meng <[email protected]>
Closes #1229 from mengxr/fix-sample and squashes the following commits:
f1ee3d7 [Xiangrui Meng] fix concurrency issues in random sampler
commit 18f29b96c7e0948f5f504e522e5aa8a8d1ab163e
Author: witgo <[email protected]>
Date: 2014-06-27T04:59:21Z
SPARK-2181: The keys for sorting the columns of the Executor page in the
SparkUI are incorrect
Author: witgo <[email protected]>
Closes #1135 from witgo/SPARK-2181 and squashes the following commits:
39dad90 [witgo] The keys for sorting the columns of Executor page in
SparkUI are incorrect
commit 21e0f77b6321590ed86223a60cdb8ae08ea4057f
Author: Andrew Or <[email protected]>
Date: 2014-06-27T22:23:25Z
[SPARK-2307] SparkUI - storage tab displays incorrect RDDs
The issue here is that the `StorageTab` listens for updates from the
`StorageStatusListener`, but when a block is kicked out of the cache,
`StorageStatusListener` removes it from its list. Thus, there is no way for the
`StorageTab` to know whether a block has been dropped.
This issue was introduced in #1080, which was itself a bug fix. Here we
revert that PR and offer a different fix for the original bug (SPARK-2144).
Author: Andrew Or <[email protected]>
Closes #1249 from andrewor14/storage-ui-fix and squashes the following
commits:
af019ce [Andrew Or] Fix SPARK-2307
commit f17510e371dfbeaada3c72b884d70c36503ea30a
Author: Andrew Or <[email protected]>
Date: 2014-06-27T23:11:31Z
[SPARK-2259] Fix highly misleading docs on cluster / client deploy modes
The existing docs are highly misleading. For standalone mode, for example,
it encourages the user to use standalone-cluster mode, which is not officially
supported. The safeguards have been added in Spark submit itself to prevent bad
documentation from leading users down the wrong path in the future.
This PR is prompted by countless headaches users of Spark have run into on
the mailing list.
Author: Andrew Or <[email protected]>
Closes #1200 from andrewor14/submit-docs and squashes the following commits:
5ea2460 [Andrew Or] Rephrase cluster vs client explanation
c827f32 [Andrew Or] Clarify spark submit messages
9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards
commit 0e0686d3ef88e024fcceafe36a0cdbb953f5aeae
Author: Matthew Farrellee <[email protected]>
Date: 2014-06-28T01:20:33Z
[SPARK-2003] Fix python SparkContext example
Author: Matthew Farrellee <[email protected]>
Closes #1246 from mattf/SPARK-2003 and squashes the following commits:
b12e7ca [Matthew Farrellee] [SPARK-2003] Fix python SparkContext example
commit b8f2e13aec715e038bd6d1d07b607683f138ac83
Author: Guillaume Ballet <[email protected]>
Date: 2014-06-28T20:07:12Z
[SPARK-2233] make-distribution script should list the git hash in the
RELEASE file
This patch adds the git revision hash (short version) to the RELEASE file.
It uses git instead of simply checking for the existence of .git, so as to make
sure that this is a functional repository.
Author: Guillaume Ballet <[email protected]>
Closes #1216 from gballet/master and squashes the following commits:
eabc50f [Guillaume Ballet] Refactored the script to take comments into
account.
d93e5e8 [Guillaume Ballet] [SPARK 2233] make-distribution script now lists
the git hash tag in the RELEASE file.
commit 3c104c79d24425786cec0034f269ba19cf465b31
Author: Matthew Farrellee <[email protected]>
Date: 2014-06-29T01:39:27Z
[SPARK-1394] Remove SIGCHLD handler in worker subprocess
It should not be the responsibility of the worker subprocess, which
does not intentionally fork, to try to clean up child processes. Doing
so is complex and interferes with operations such as
platform.system().
If it is desirable to have tighter control over subprocesses, then
namespaces should be used and it should be the manager's responsibility
to handle cleanup.
Author: Matthew Farrellee <[email protected]>
Closes #1247 from mattf/SPARK-1394 and squashes the following commits:
c36f308 [Matthew Farrellee] [SPARK-1394] Remove SIGCHLD handler in worker
subprocess
commit 2053d793cc2e8e5f5776e6576ddc6f8e6168e60c
Author: Reynold Xin <[email protected]>
Date: 2014-06-29T04:05:03Z
Improve MapOutputTracker error logging.
Author: Reynold Xin <[email protected]>
Closes #1258 from rxin/mapOutputTracker and squashes the following commits:
a7c95b6 [Reynold Xin] Improve MapOutputTracker error logging.
----