GitHub user hxquangnhat opened a pull request:
https://github.com/apache/spark/pull/6635
Branch 1.3
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hxquangnhat/spark branch-1.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6635.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6635
----
commit 53068f56f40bf03b7fc52e5980fb7e205903fc8b
Author: Patrick Wendell <[email protected]>
Date: 2015-02-11T06:45:03Z
Preparing Spark release v1.3.0-snapshot1
commit ba12b793f1f4f432e71439e2a7ebacce74d9c472
Author: Patrick Wendell <[email protected]>
Date: 2015-02-11T06:45:03Z
Preparing development version 1.3.1-SNAPSHOT
commit 0386fc4d6b8ede2d7e4a962b0e3c2569e273e7ec
Author: Patrick Wendell <[email protected]>
Date: 2015-02-11T07:39:21Z
HOTFIX: Adding Junit to Hive tests for Maven build
commit 3a503839ffbcf367578d2148a27ad4300c124646
Author: Patrick Wendell <[email protected]>
Date: 2015-02-11T07:46:02Z
Revert "Preparing development version 1.3.1-SNAPSHOT"
This reverts commit ba12b793f1f4f432e71439e2a7ebacce74d9c472.
commit 6a91d5993380e266441ccb27b1b06b528a968dec
Author: Patrick Wendell <[email protected]>
Date: 2015-02-11T07:46:04Z
Revert "Preparing Spark release v1.3.0-snapshot1"
This reverts commit 53068f56f40bf03b7fc52e5980fb7e205903fc8b.
commit d97bfc6f28ec4b7acfb36410c7c167d8d3c145ec
Author: Patrick Wendell <[email protected]>
Date: 2015-02-11T07:47:02Z
Preparing Spark release v1.3.0-snapshot1
commit e57c81b8c1a6581c2588973eaf30d3c7ae90ed0c
Author: Patrick Wendell <[email protected]>
Date: 2015-02-11T07:47:03Z
Preparing development version 1.3.1-SNAPSHOT
commit 811d1798d7a76a70bc684f11854031763faadd42
Author: cody koeninger <[email protected]>
Date: 2015-02-11T08:13:27Z
[SPARK-4964] [Streaming] refactor createRDD to take leaders via map instead
of array
Author: cody koeninger <[email protected]>
Closes #4511 from koeninger/kafkaRdd-leader-to-broker and squashes the
following commits:
f7151d4 [cody koeninger] [SPARK-4964] test refactoring
6f8680b [cody koeninger] [SPARK-4964] add test of the scala api for
KafkaUtils.createRDD
f81e016 [cody koeninger] [SPARK-4964] leave KafkaStreamSuite host and port
as private
5173f3f [cody koeninger] [SPARK-4964] test the Java variations of createRDD
e9cece4 [cody koeninger] [SPARK-4964] pass leaders as a map to ensure 1
leader per TopicPartition
(cherry picked from commit 658687b25491047f30ee8558733d11e5a0572070)
Signed-off-by: Tathagata Das <[email protected]>
commit 476b6d77b401143bd44441a75131232fdf6efff8
Author: Sean Owen <[email protected]>
Date: 2015-02-11T08:13:51Z
SPARK-5728 [STREAMING] MQTTStreamSuite leaves behind ActiveMQ database files
Use temp dir for ActiveMQ database
Author: Sean Owen <[email protected]>
Closes #4517 from srowen/SPARK-5728 and squashes the following commits:
1d3aeb8 [Sean Owen] Use temp dir for ActiveMQ database
(cherry picked from commit da89720bf4023392436e75b6ed5e10ed8588a132)
Signed-off-by: Sean Owen <[email protected]>
commit 057ec4f3342fbffea497e06e7e43591da2ce1a20
Author: Sean Owen <[email protected]>
Date: 2015-02-11T08:30:16Z
SPARK-5727 [BUILD] Deprecate Debian packaging
This just adds a deprecation message. It's intended for backporting to
branch 1.3 but can go in master too, to be followed by another PR that removes
it for 1.4.
Author: Sean Owen <[email protected]>
Closes #4516 from srowen/SPARK-5727.1 and squashes the following commits:
d48989f [Sean Owen] Refer to Spark 1.4
6c1c8b3 [Sean Owen] Deprecate Debian packaging
(cherry picked from commit bd0d6e0cc3a329c4a1c08451a6d8a9281a422958)
Signed-off-by: Sean Owen <[email protected]>
commit 864dccd7077b30f486e19a846ba5af828d1dc234
Author: guliangliang <[email protected]>
Date: 2015-02-11T15:55:49Z
[SPARK-5733] Error Link in Pagination of HistroyPage when showing
Incomplete Applications
The pagination links on HistroyPage are wrong when showing Incomplete
Applications.
If "2" is clicked on the page
"http://history-server:18080/?page=1&showIncomplete=true", it goes to
"http://history-server:18080/?page=2" instead of
"http://history-server:18080/?page=2&showIncomplete=true".
Author: guliangliang <[email protected]>
Closes #4523 from marsishandsome/Spark5733 and squashes the following
commits:
9d7b593 [guliangliang] [SPARK-5733] Error Link in Pagination of HistroyPage
when showing Incomplete Applications
(cherry picked from commit 1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c)
Signed-off-by: Sean Owen <[email protected]>
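The idea behind the SPARK-5733 pagination fix can be sketched in Python as
follows. This is only an illustration, not the Scala change made to the
history server; the function name `page_link` and its parameters are
hypothetical.

```python
from urllib.parse import urlencode


def page_link(base_url, page, show_incomplete):
    """Build a pagination link that preserves the showIncomplete flag."""
    params = {"page": page}
    if show_incomplete:
        # Carry the flag forward so the next page still lists
        # incomplete applications.
        params["showIncomplete"] = "true"
    return base_url + "/?" + urlencode(params)
```

Calling `page_link("http://history-server:18080", 2, True)` yields the link
with `showIncomplete=true` kept, which is the behaviour the commit describes.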
commit d66aae21798503cb1eedb4469fe19a4475a45209
Author: Davies Liu <[email protected]>
Date: 2015-02-11T20:13:16Z
[SPARK-5677] [SPARK-5734] [SQL] [PySpark] Python DataFrame API remaining
tasks
1. DataFrame.renameColumn
2. DataFrame.show() and _repr_
3. Use simpleString() rather than jsonValue in DataFrame.dtypes
4. createDataFrame from local Python data, including pandas.DataFrame
Author: Davies Liu <[email protected]>
Closes #4528 from davies/df3 and squashes the following commits:
014acea [Davies Liu] fix typo
6ba526e [Davies Liu] fix tests
46f5f95 [Davies Liu] address comments
6cbc154 [Davies Liu] dataframe.show() and improve dtypes
6f94f25 [Davies Liu] create DataFrame from local Python data
(cherry picked from commit b694eb9c2fefeaa33891d3e61f9bea369bc09984)
Signed-off-by: Reynold Xin <[email protected]>
commit 72adfc59563143ed70f563eb3f84714cb8a61d3b
Author: Daniel Darabos <[email protected]>
Date: 2015-02-11T20:24:17Z
Remove outdated remark about take(n).
Looking at the code, I believe this remark about `take(n)` computing
partitions on the driver is no longer correct. Apologies if I'm wrong.
This came up in http://stackoverflow.com/q/28436559/3318517.
Author: Daniel Darabos <[email protected]>
Closes #4533 from darabos/patch-2 and squashes the following commits:
cc80f3a [Daniel Darabos] Remove outdated remark about take(n).
(cherry picked from commit 03bf704bf442ac7dd960795295b51957ce972491)
Signed-off-by: Sean Owen <[email protected]>
commit 1bb3631ef0db2aa1e2f3aa5ddbe6b93920d28e39
Author: Michael Armbrust <[email protected]>
Date: 2015-02-11T20:31:56Z
[SPARK-5454] More robust handling of self joins
Also fixes a bunch of bad output in test cases.
Author: Michael Armbrust <[email protected]>
Closes #4520 from marmbrus/selfJoin and squashes the following commits:
4f4a85c [Michael Armbrust] comments
49c8e26 [Michael Armbrust] fix tests
6fc38de [Michael Armbrust] fix style
55d64b3 [Michael Armbrust] fix dataframe selfjoins
(cherry picked from commit a60d2b70adff3a8fb3bdfac226b1d86fdb443da4)
Signed-off-by: Michael Armbrust <[email protected]>
commit e136f477ebafa6047051a90ad344fe64ad451f7e
Author: tianyi <[email protected]>
Date: 2015-02-11T20:50:17Z
[SPARK-3688][SQL]LogicalPlan can't resolve column correctlly
This PR fixes the name-resolution problem described in
https://issues.apache.org/jira/browse/SPARK-3688
```
CREATE TABLE t1(x INT);
CREATE TABLE t2(a STRUCT<x: INT>, k INT);
SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;
```
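A simplified Python sketch of the "table.column first" rule may help show why
the query above was ambiguous. It is not Catalyst's actual resolution code:
the `resolve` function and the dictionary layout are assumptions made for
illustration, and only one level of struct nesting is handled.

```python
def resolve(name, tables):
    """Resolve a dotted name, trying the table.column pattern first.

    `tables` maps a table alias to {column name: set of struct field names};
    a non-struct column has an empty set.
    """
    parts = name.split(".")
    # Step 1: prefer the table.column reading, e.g. "a.x" with alias "a".
    if len(parts) == 2:
        alias, column = parts
        if column in tables.get(alias, {}):
            return ("column", alias, column)
    # Step 2: otherwise read the name as column or column.field.
    matches = []
    for alias, columns in tables.items():
        if parts[0] not in columns:
            continue
        if len(parts) == 1:
            matches.append(("column", alias, parts[0]))
        elif len(parts) == 2 and parts[1] in columns[parts[0]]:
            matches.append(("field", alias, parts[0], parts[1]))
    if len(matches) != 1:
        raise ValueError(f"cannot resolve {name!r}: {len(matches)} candidates")
    return matches[0]


# Mirrors the query above: t1 is aliased as "a", t2 as "b".
tables = {"a": {"x": set()}, "b": {"a": {"x"}, "k": set()}}
```

With step 1 in place, `resolve("a.x", tables)` picks column `x` of the table
aliased `a`; without it, the same name would match field `x` of the struct
column `a` in `t2`.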
Author: tianyi <[email protected]>
Closes #4524 from tianyi/SPARK-3688 and squashes the following commits:
237a256 [tianyi] resolve a name with table.column pattern first.
(cherry picked from commit 44b2311d946981c8251cb7807d70c8e99db5bbed)
Signed-off-by: Michael Armbrust <[email protected]>
commit 08ab3d236309b2434dacdea54362b3970fd466b4
Author: Reynold Xin <[email protected]>
Date: 2015-02-11T23:26:31Z
[SPARK-3688][SQL] More inline comments for LogicalPlan.
As a follow-up to https://github.com/apache/spark/pull/4524
Author: Reynold Xin <[email protected]>
Closes #4539 from rxin/SPARK-3688 and squashes the following commits:
5ac56c7 [Reynold Xin] exists
da8eea4 [Reynold Xin] [SPARK-3688][SQL] More inline comments for
LogicalPlan.
(cherry picked from commit fa6bdc6e819f9338248b952ec578bcd791ddbf6d)
Signed-off-by: Reynold Xin <[email protected]>
commit bcb13827c684ef2e0e2d76832a3b736b35682ba6
Author: Reynold Xin <[email protected]>
Date: 2015-02-12T02:32:48Z
[SQL] Two DataFrame fixes.
- Removed DataFrame.apply for projection & filtering since they are
extremely confusing.
- Added implicits for RDD[Int], RDD[Long], and RDD[String]
Author: Reynold Xin <[email protected]>
Closes #4543 from rxin/df-cleanup and squashes the following commits:
81ec915 [Reynold Xin] [SQL] More DataFrame fixes.
(cherry picked from commit d931b01dcaaf009dcf68dcfe83428bd7f9e857cc)
Signed-off-by: Reynold Xin <[email protected]>
commit 3c1b9bf65290cc1fd4444690a5c5c252667e4576
Author: Michael Armbrust <[email protected]>
Date: 2015-02-12T03:05:49Z
[SQL] Make dataframe more tolerant of being serialized
Eases use in the spark-shell.
Author: Michael Armbrust <[email protected]>
Closes #4545 from marmbrus/serialization and squashes the following commits:
04748e6 [Michael Armbrust] @scala.annotation.varargs
b36e219 [Michael Armbrust] moreFixes
(cherry picked from commit a38e23c30fb5d12f8f46a119d91a0620036e6800)
Signed-off-by: Michael Armbrust <[email protected]>
commit e23c8f5c8953bcb9a509b8521ca0cb49c5181079
Author: Andrew Rowson <[email protected]>
Date: 2015-02-12T18:41:39Z
[SPARK-5655] Don't chmod700 application files if running in YARN
[Was previously PR4507]
As per SPARK-5655, recently committed code chmod 700s all application files
created on the local fs by a spark executor. This is both unnecessary and
broken on YARN, where files created in the nodemanager's working directory are
already owned by the user running the job and the 'yarn' group. Group read
permission is also needed for the auxiliary shuffle service to be able to read
the files, as this is running as the 'yarn' user.
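As a rough Python sketch of the behaviour described above (the real change is
in Spark's Scala `Utils`; the function name and the `running_in_yarn`
parameter here are hypothetical), the directory is restricted to its owner
only when not running on YARN:

```python
import os


def get_or_create_local_dir(path, running_in_yarn):
    """Create a local directory, restricting it to the owner unless on YARN."""
    os.makedirs(path, exist_ok=True)
    if not running_in_yarn:
        # Outside YARN: restrict access to the owning user only (rwx------).
        os.chmod(path, 0o700)
    # On YARN: leave the permissions alone so group read access is preserved.
    return path
```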
Author: Andrew Rowson <[email protected]>
Closes #4509 from growse/master and squashes the following commits:
7ca993c [Andrew Rowson] Moved chmod700 functionality into
Utils.getOrCreateLocalRootDirs
f57ce6b [Andrew Rowson] [SPARK-5655] Don't chmod700 application files if
running in a YARN container
(cherry picked from commit 466b1f671b21f575d28f9c103f51765790914fe3)
Signed-off-by: Sean Owen <[email protected]>
commit e26c14990c477249241b429c1bb877c3d9339744
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-12T18:48:13Z
[SPARK-5757][MLLIB] replace SQL JSON usage in model import/export by json4s
This PR detaches MLlib model import/export code from SQL's JSON support,
and hence unblocks #4544. yhuai
Author: Xiangrui Meng <[email protected]>
Closes #4555 from mengxr/SPARK-5757 and squashes the following commits:
b0415e8 [Xiangrui Meng] replace SQL JSON usage by json4s
(cherry picked from commit 99bd5006650bb15ec5465ffee1ebaca81354a3df)
Signed-off-by: Xiangrui Meng <[email protected]>
commit cbd659e5fc0e4413334bb4cb5ab8e42bbd5aa8c5
Author: Antonio Navarro Perez <[email protected]>
Date: 2015-02-12T20:46:17Z
[SQL][DOCS] Update sql documentation
Updated examples using the new api and added DataFrame concept
Author: Antonio Navarro Perez <[email protected]>
Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the
following commits:
82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to
SQLContext.
8d5376a [Antonio Navarro Perez] fixed typo
8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
(cherry picked from commit 6a1be026cf37e4c8bf39133dfb4a73f7caedcc26)
Signed-off-by: Reynold Xin <[email protected]>
commit e3a975d45a960ddbfe03051a6ae8b614e63cde6b
Author: Michael Armbrust <[email protected]>
Date: 2015-02-12T21:11:28Z
[SQL] Improve error messages
Author: Michael Armbrust <[email protected]>
Author: wangfei <[email protected]>
Closes #4558 from marmbrus/errorMessages and squashes the following commits:
5e5ab50 [Michael Armbrust] Merge pull request #15 from scwf/errorMessages
fa38881 [wangfei] fix for grouping__id
f279a71 [wangfei] make right references for ScriptTransformation
d29fbde [Michael Armbrust] extra case
1a797b4 [Michael Armbrust] comments
d4e9015 [Michael Armbrust] add comment
af9e668 [Michael Armbrust] no braces
34eb3a4 [Michael Armbrust] more work
6197cd5 [Michael Armbrust] [SQL] Better error messages for analysis failures
(cherry picked from commit aa4ca8b873fd83e64e5faea6f7febcc830e30b02)
Signed-off-by: Michael Armbrust <[email protected]>
commit 74f34bb8bb7080c7ae669a6b541e9418cfa1fc9f
Author: Kay Ousterhout <[email protected]>
Date: 2015-02-12T22:35:44Z
[SPARK-5645] Added local read bytes/time to task metrics
ksakellis I stumbled on your JIRA for this yesterday; I know it's assigned
to you but I'd already done this for my own uses a while ago so thought I could
help save you the work of doing it! Hopefully this doesn't duplicate any work
you've already done.
Here's a screenshot of what the UI looks like:

Based on a discussion with pwendell, I put the data read remotely in as an
additional metric rather than showing it in brackets as you'd suggested,
Kostas. The assumption here is that the average user doesn't care about the
differentiation between local / remote data, so it's better not to pollute the
UI.
I also added data about the local read time, which I've found very helpful
for debugging, but I didn't put it in the UI because I think it's probably
something not a ton of people will need to use.
With this change, the total read time and total write time shown in the UI
will be equal, fixing a long-term source of user confusion:

Author: Kay Ousterhout <[email protected]>
Closes #4510 from kayousterhout/SPARK-5645 and squashes the following
commits:
4a0182c [Kay Ousterhout] oops
5f5da1b [Kay Ousterhout] Small style fix
5da04cf [Kay Ousterhout] Addressed more comments from Kostas
ba05149 [Kay Ousterhout] Remove parens
a9dc685 [Kay Ousterhout] Kostas comment, test fix
33d2e2d [Kay Ousterhout] Merge remote-tracking branch 'upstream/master'
into SPARK-5645
347e2cd [Kay Ousterhout] [SPARK-5645] Added local read bytes/time to task
metrics
(cherry picked from commit 893d6fd7049daf3c4d01eb6a960801cd064d5f73)
Signed-off-by: Andrew Or <[email protected]>
commit 9a1de4b20fcfa756f228b263f2a778534f6ca90d
Author: Venkata Ramana Gollamudi <[email protected]>
Date: 2015-02-12T22:44:21Z
[SPARK-5765][Examples]Fixed word split problem in run-example and
compute-classpath
Author: Venkata Ramana G <ramana.gollamudihuawei.com>
Author: Venkata Ramana Gollamudi <[email protected]>
Closes #4561 from gvramana/word_split and squashes the following commits:
285c8d4 [Venkata Ramana Gollamudi] Fixed word split problem in run-example
and compute-classpath
(cherry picked from commit 629d0143eeb3c153dac9c65e7b556723c6b4bfc7)
Signed-off-by: Andrew Or <[email protected]>
commit 0040fc50918cf5e53554b0dc8053528af58e6ba8
Author: Kay Ousterhout <[email protected]>
Date: 2015-02-12T22:46:37Z
[SPARK-5762] Fix shuffle write time for sort-based shuffle
mateiz, was excluding the time to write this final file from the shuffle
write time intentional?
Author: Kay Ousterhout <[email protected]>
Closes #4559 from kayousterhout/SPARK-5762 and squashes the following
commits:
5c6f3d9 [Kay Ousterhout] Use foreach
94e4237 [Kay Ousterhout] Removed open time metrics added inadvertently
ace156c [Kay Ousterhout] Moved metrics to finally block
d773276 [Kay Ousterhout] Use nano time
5a59906 [Kay Ousterhout] [SPARK-5762] Fix shuffle write time for sort-based
shuffle
(cherry picked from commit 47c73d410ab533c3196184d2b6004081e79daeaa)
Signed-off-by: Andrew Or <[email protected]>
commit 11d108030516b1a0bd45f36312f6210dc9a577b0
Author: Andrew Or <[email protected]>
Date: 2015-02-12T22:47:52Z
[SPARK-5760][SPARK-5761] Fix standalone rest protocol corner cases + revamp
tests
The changes are summarized in the commit message. Test or test-related code
accounts for 90% of the lines changed.
Author: Andrew Or <[email protected]>
Closes #4557 from andrewor14/rest-tests and squashes the following commits:
b4dc980 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
rest-tests
b55e40f [Andrew Or] Add test for unknown fields
cc96993 [Andrew Or] private[spark] -> private[rest]
578cf45 [Andrew Or] Clean up test code a little
d82d971 [Andrew Or] v1 -> serverVersion
ea48f65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
rest-tests
00999a8 [Andrew Or] Revamp tests + fix a few corner cases
(cherry picked from commit 1d5663e92cdaaa3dabfa58fdd7aede7e4fa4ec63)
Signed-off-by: Andrew Or <[email protected]>
commit 02d5b32bbebc055c1b4cde4f08a8194397921aa9
Author: lianhuiwang <[email protected]>
Date: 2015-02-12T22:50:16Z
[SPARK-5759][Yarn]ExecutorRunnable should catch YarnException while
NMClient start contain...
For various reasons, NMClient sometimes throws an exception while starting
containers. For example, if spark_shuffle is not configured on some machines,
it throws:
java.lang.Error:
org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The
auxService:spark_shuffle does not exist.
Because YarnAllocator uses a ThreadPoolExecutor to start containers, we cannot
tell which container or hostname threw the exception. I think we should catch
YarnException in ExecutorRunnable when starting a container, so that when an
exception occurs we know the container ID or hostname of the failed container.
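The pattern proposed here, catching the failure and re-raising it with the
container details attached, can be sketched in Python. This is an
illustration only: the actual change is in the Scala ExecutorRunnable, and
`ContainerStartError`, `start_container`, and `launch` are hypothetical names.

```python
class ContainerStartError(Exception):
    """Hypothetical stand-in for the wrapping exception."""


def start_container(container_id, hostname, launch):
    """Run `launch` and attach container details to any failure."""
    try:
        launch()
    except Exception as exc:
        # Re-raise with the container ID and host so the failure can be traced.
        raise ContainerStartError(
            f"Exception while starting container {container_id} "
            f"on host {hostname}"
        ) from exc
```

Chaining with `from exc` keeps the original exception available as the cause,
so the underlying error message is not lost.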
Author: lianhuiwang <[email protected]>
Closes #4554 from lianhuiwang/SPARK-5759 and squashes the following commits:
caf5a99 [lianhuiwang] use SparkException to warp exception
c02140f [lianhuiwang] ExecutorRunnable should catch YarnException while
NMClient start container
(cherry picked from commit 947b8bd82ec0f4c45910e6d781df4661f56e4587)
Signed-off-by: Andrew Or <[email protected]>
commit 11a0d5b6dce49c2beac8fd7eae2ccadf59a1e030
Author: David Y. Ross <[email protected]>
Date: 2015-02-12T22:52:38Z
SPARK-5747: Fix wordsplitting bugs in make-distribution.sh
The `$MVN` command variable may contain spaces, so references to it must be
wrapped in quotes.
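The word-splitting problem can be imitated in Python with `shlex`, which
follows POSIX shell splitting rules. This is a sketch of the general issue,
not of make-distribution.sh itself, and the Maven path below is a made-up
example.

```python
import shlex

mvn = "/opt/apache maven/bin/mvn"  # hypothetical path containing a space

# Unquoted, as in `$MVN clean`: the path is split into two words.
unquoted = shlex.split(f"{mvn} clean")

# Quoted, as in `"$MVN" clean`: the path stays a single word.
quoted = shlex.split(f"{shlex.quote(mvn)} clean")
```

Here `unquoted` has three words, so the shell would try to run
`/opt/apache`, while `quoted` keeps the full path as the command.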
Author: David Y. Ross <[email protected]>
Closes #4540 from dyross/dyr-fix-make-distribution2 and squashes the
following commits:
5a41596 [David Y. Ross] SPARK-5747: Fix wordsplitting bugs in
make-distribution.sh
(cherry picked from commit 26c816e7388eaa336a59183029f86548f1cc279c)
Signed-off-by: Andrew Or <[email protected]>
commit bf0d15c5255f054d2fb70d82ca96797a3665f058
Author: Davies Liu <[email protected]>
Date: 2015-02-12T22:54:38Z
[SPARK-5780] [PySpark] Mute the logging during unit tests
There is a lot of logging coming from the driver and workers. It is noisy and
alarming, and it contains many exceptions, so people are confused about
whether the tests are failing or not.
This PR mutes the logging during tests and only shows it if a test
fails.
Author: Davies Liu <[email protected]>
Closes #4572 from davies/mute and squashes the following commits:
1e9069c [Davies Liu] mute the logging during python tests
(cherry picked from commit 0bf031582588723dd5a4ca42e6f9f36bc2da1a0b)
Signed-off-by: Andrew Or <[email protected]>
commit b0c79daf4a24739963726dfecedff9a4b129f3c0
Author: Yin Huai <[email protected]>
Date: 2015-02-12T23:17:25Z
[SPARK-5758][SQL] Use LongType as the default type for integers in JSON
schema inference.
Author: Yin Huai <[email protected]>
Closes #4544 from yhuai/jsonUseLongTypeByDefault and squashes the following
commits:
6e2ffc2 [Yin Huai] Use LongType as the default type for integers in JSON
schema inference.
(cherry picked from commit c352ffbdb9112714c176a747edff6115e9369e58)
Signed-off-by: Michael Armbrust <[email protected]>
----