[GitHub] spark pull request: Branch 1.3

hxquangnhat Thu, 04 Jun 2015 00:48:02 -0700

GitHub user hxquangnhat opened a pull request:

    https://github.com/apache/spark/pull/6635


    Branch 1.3

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hxquangnhat/spark branch-1.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6635.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6635
    
----
commit 53068f56f40bf03b7fc52e5980fb7e205903fc8b
Author: Patrick Wendell <[email protected]>
Date:   2015-02-11T06:45:03Z

    Preparing Spark release v1.3.0-snapshot1

commit ba12b793f1f4f432e71439e2a7ebacce74d9c472
Author: Patrick Wendell <[email protected]>
Date:   2015-02-11T06:45:03Z

    Preparing development version 1.3.1-SNAPSHOT

commit 0386fc4d6b8ede2d7e4a962b0e3c2569e273e7ec
Author: Patrick Wendell <[email protected]>
Date:   2015-02-11T07:39:21Z

    HOTFIX: Adding Junit to Hive tests for Maven build

commit 3a503839ffbcf367578d2148a27ad4300c124646
Author: Patrick Wendell <[email protected]>
Date:   2015-02-11T07:46:02Z

    Revert "Preparing development version 1.3.1-SNAPSHOT"
    
    This reverts commit ba12b793f1f4f432e71439e2a7ebacce74d9c472.

commit 6a91d5993380e266441ccb27b1b06b528a968dec
Author: Patrick Wendell <[email protected]>
Date:   2015-02-11T07:46:04Z

    Revert "Preparing Spark release v1.3.0-snapshot1"
    
    This reverts commit 53068f56f40bf03b7fc52e5980fb7e205903fc8b.

commit d97bfc6f28ec4b7acfb36410c7c167d8d3c145ec
Author: Patrick Wendell <[email protected]>
Date:   2015-02-11T07:47:02Z

    Preparing Spark release v1.3.0-snapshot1

commit e57c81b8c1a6581c2588973eaf30d3c7ae90ed0c
Author: Patrick Wendell <[email protected]>
Date:   2015-02-11T07:47:03Z

    Preparing development version 1.3.1-SNAPSHOT

commit 811d1798d7a76a70bc684f11854031763faadd42
Author: cody koeninger <[email protected]>
Date:   2015-02-11T08:13:27Z

    [SPARK-4964] [Streaming] refactor createRDD to take leaders via map instead of array
    
    Author: cody koeninger <[email protected]>
    
    Closes #4511 from koeninger/kafkaRdd-leader-to-broker and squashes the following commits:
    
    f7151d4 [cody koeninger] [SPARK-4964] test refactoring
    6f8680b [cody koeninger] [SPARK-4964] add test of the scala api for KafkaUtils.createRDD
    f81e016 [cody koeninger] [SPARK-4964] leave KafkaStreamSuite host and port as private
    5173f3f [cody koeninger] [SPARK-4964] test the Java variations of createRDD
    e9cece4 [cody koeninger] [SPARK-4964] pass leaders as a map to ensure 1 leader per TopicPartition
    
    (cherry picked from commit 658687b25491047f30ee8558733d11e5a0572070)
    Signed-off-by: Tathagata Das <[email protected]>
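
The point of passing leaders as a map can be illustrated with a minimal Python sketch. `TopicAndPartition`, `Broker`, and `leaders_from_pairs` are hypothetical stand-ins defined here for illustration only, not Spark's API: a list of pairs can name two different leaders for the same partition, while a map keyed by topic and partition holds at most one.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class TopicAndPartition(NamedTuple):
    """Hypothetical stand-in for a (topic, partition) key."""
    topic: str
    partition: int


class Broker(NamedTuple):
    """Hypothetical stand-in for a leader's host and port."""
    host: str
    port: int


def leaders_from_pairs(
    pairs: Iterable[Tuple[TopicAndPartition, Broker]],
) -> Dict[TopicAndPartition, Broker]:
    """Build a leader map, rejecting conflicting leaders for one partition."""
    leaders: Dict[TopicAndPartition, Broker] = {}
    for tp, broker in pairs:
        if tp in leaders and leaders[tp] != broker:
            raise ValueError(f"two leaders given for {tp}")
        leaders[tp] = broker
    return leaders
```

Once the data is in the dictionary, the "one leader per partition" property holds by construction, so callers no longer need to check for duplicates.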

commit 476b6d77b401143bd44441a75131232fdf6efff8
Author: Sean Owen <[email protected]>
Date:   2015-02-11T08:13:51Z

    SPARK-5728 [STREAMING] MQTTStreamSuite leaves behind ActiveMQ database files
    
    Use temp dir for ActiveMQ database
    
    Author: Sean Owen <[email protected]>
    
    Closes #4517 from srowen/SPARK-5728 and squashes the following commits:
    
    1d3aeb8 [Sean Owen] Use temp dir for ActiveMQ database
    
    (cherry picked from commit da89720bf4023392436e75b6ed5e10ed8588a132)
    Signed-off-by: Sean Owen <[email protected]>

commit 057ec4f3342fbffea497e06e7e43591da2ce1a20
Author: Sean Owen <[email protected]>
Date:   2015-02-11T08:30:16Z

    SPARK-5727 [BUILD] Deprecate Debian packaging
    
    This just adds a deprecation message. It's intended for backporting to branch 1.3 but can go in master too, to be followed by another PR that removes it for 1.4.
    
    Author: Sean Owen <[email protected]>
    
    Closes #4516 from srowen/SPARK-5727.1 and squashes the following commits:
    
    d48989f [Sean Owen] Refer to Spark 1.4
    6c1c8b3 [Sean Owen] Deprecate Debian packaging
    
    (cherry picked from commit bd0d6e0cc3a329c4a1c08451a6d8a9281a422958)
    Signed-off-by: Sean Owen <[email protected]>

commit 864dccd7077b30f486e19a846ba5af828d1dc234
Author: guliangliang <[email protected]>
Date:   2015-02-11T15:55:49Z

    [SPARK-5733] Error Link in Pagination of HistroyPage when showing Incomplete Applications
    
    The pagination links on HistoryPage are wrong when showing incomplete applications.
    
    If "2" is clicked on the page "http://history-server:18080/?page=1&showIncomplete=true", it goes to "http://history-server:18080/?page=2" instead of "http://history-server:18080/?page=2&showIncomplete=true".
    
    Author: guliangliang <[email protected]>
    
    Closes #4523 from marsishandsome/Spark5733 and squashes the following commits:
    
    9d7b593 [guliangliang] [SPARK-5733] Error Link in Pagination of HistroyPage when showing Incomplete Applications
    
    (cherry picked from commit 1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c)
    Signed-off-by: Sean Owen <[email protected]>
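
The behaviour described in SPARK-5733 can be sketched in Python as follows. `page_link` is a hypothetical helper, not the History Server's code; it only shows the idea of carrying the `showIncomplete` flag into every pagination link.

```python
from urllib.parse import urlencode


def page_link(base_url: str, page: int, show_incomplete: bool) -> str:
    """Build a pagination link that preserves the showIncomplete flag."""
    params = {"page": page}
    if show_incomplete:
        # Without this parameter the link falls back to completed applications.
        params["showIncomplete"] = "true"
    return f"{base_url}/?{urlencode(params)}"
```

For example, `page_link("http://history-server:18080", 2, True)` keeps the user on the incomplete-applications view when moving to page 2.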

commit d66aae21798503cb1eedb4469fe19a4475a45209
Author: Davies Liu <[email protected]>
Date:   2015-02-11T20:13:16Z

    [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Python DataFrame API remaining tasks
    
    1. DataFrame.renameColumn
    
    2. DataFrame.show() and _repr_
    
    3. Use simpleString() rather than jsonValue in DataFrame.dtypes
    
    4. createDataFrame from local Python data, including pandas.DataFrame
    
    Author: Davies Liu <[email protected]>
    
    Closes #4528 from davies/df3 and squashes the following commits:
    
    014acea [Davies Liu] fix typo
    6ba526e [Davies Liu] fix tests
    46f5f95 [Davies Liu] address comments
    6cbc154 [Davies Liu] dataframe.show() and improve dtypes
    6f94f25 [Davies Liu] create DataFrame from local Python data
    
    (cherry picked from commit b694eb9c2fefeaa33891d3e61f9bea369bc09984)
    Signed-off-by: Reynold Xin <[email protected]>

commit 72adfc59563143ed70f563eb3f84714cb8a61d3b
Author: Daniel Darabos <[email protected]>
Date:   2015-02-11T20:24:17Z

    Remove outdated remark about take(n).
    
    Looking at the code, I believe this remark about `take(n)` computing partitions on the driver is no longer correct. Apologies if I'm wrong.
    
    This came up in http://stackoverflow.com/q/28436559/3318517.
    
    Author: Daniel Darabos <[email protected]>
    
    Closes #4533 from darabos/patch-2 and squashes the following commits:
    
    cc80f3a [Daniel Darabos] Remove outdated remark about take(n).
    
    (cherry picked from commit 03bf704bf442ac7dd960795295b51957ce972491)
    Signed-off-by: Sean Owen <[email protected]>

commit 1bb3631ef0db2aa1e2f3aa5ddbe6b93920d28e39
Author: Michael Armbrust <[email protected]>
Date:   2015-02-11T20:31:56Z

    [SPARK-5454] More robust handling of self joins
    
    Also I fix a bunch of bad output in test cases.
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #4520 from marmbrus/selfJoin and squashes the following commits:
    
    4f4a85c [Michael Armbrust] comments
    49c8e26 [Michael Armbrust] fix tests
    6fc38de [Michael Armbrust] fix style
    55d64b3 [Michael Armbrust] fix dataframe selfjoins
    
    (cherry picked from commit a60d2b70adff3a8fb3bdfac226b1d86fdb443da4)
    Signed-off-by: Michael Armbrust <[email protected]>

commit e136f477ebafa6047051a90ad344fe64ad451f7e
Author: tianyi <[email protected]>
Date:   2015-02-11T20:50:17Z

    [SPARK-3688][SQL]LogicalPlan can't resolve column correctlly
    
    This PR fixes the column resolution problem described in https://issues.apache.org/jira/browse/SPARK-3688
    ```
    CREATE TABLE t1(x INT);
    CREATE TABLE t2(a STRUCT<x: INT>, k INT);
    SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;
    ```
    
    Author: tianyi <[email protected]>
    
    Closes #4524 from tianyi/SPARK-3688 and squashes the following commits:
    
    237a256 [tianyi] resolve a name with table.column pattern first.
    
    (cherry picked from commit 44b2311d946981c8251cb7807d70c8e99db5bbed)
    Signed-off-by: Michael Armbrust <[email protected]>
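
The rule "resolve a name with table.column pattern first" can be sketched in Python. This is a simplified illustration under stated assumptions, not the LogicalPlan code: `resolve` and the `Tables` mapping are hypothetical, and struct fields are returned as a plain list of remaining name parts.

```python
from typing import Dict, List, Tuple

# Maps a table alias to its columns: column name -> type description.
Tables = Dict[str, Dict[str, str]]


def resolve(name: str, tables: Tables) -> Tuple[str, str, List[str]]:
    """Return (table alias, column, nested field path) for a dotted name."""
    parts = name.split(".")
    # First try to read the name as alias.column[.field...].
    if len(parts) >= 2 and parts[1] in tables.get(parts[0], {}):
        return parts[0], parts[1], parts[2:]
    # Otherwise read it as column[.field...], which must match exactly one table.
    owners = [alias for alias, cols in tables.items() if parts[0] in cols]
    if len(owners) != 1:
        raise LookupError(f"cannot resolve {name!r}: {len(owners)} candidate tables")
    return owners[0], parts[0], parts[1:]


# The query from the commit message: t1 is aliased as a, t2 as b.
TABLES: Tables = {"a": {"x": "INT"}, "b": {"a": "STRUCT<x: INT>", "k": "INT"}}
```

With these tables, `resolve("a.x", TABLES)` picks column `x` of the table aliased `a` rather than field `x` of the struct column `b.a`, because the alias.column reading is tried first.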

commit 08ab3d236309b2434dacdea54362b3970fd466b4
Author: Reynold Xin <[email protected]>
Date:   2015-02-11T23:26:31Z

    [SPARK-3688][SQL] More inline comments for LogicalPlan.
    
    As a follow-up to https://github.com/apache/spark/pull/4524
    
    Author: Reynold Xin <[email protected]>
    
    Closes #4539 from rxin/SPARK-3688 and squashes the following commits:
    
    5ac56c7 [Reynold Xin] exists
    da8eea4 [Reynold Xin] [SPARK-3688][SQL] More inline comments for LogicalPlan.
    
    (cherry picked from commit fa6bdc6e819f9338248b952ec578bcd791ddbf6d)
    Signed-off-by: Reynold Xin <[email protected]>

commit bcb13827c684ef2e0e2d76832a3b736b35682ba6
Author: Reynold Xin <[email protected]>
Date:   2015-02-12T02:32:48Z

    [SQL] Two DataFrame fixes.
    
    - Removed DataFrame.apply for projection & filtering since they are extremely confusing.
    - Added implicits for RDD[Int], RDD[Long], and RDD[String]
    
    Author: Reynold Xin <[email protected]>
    
    Closes #4543 from rxin/df-cleanup and squashes the following commits:
    
    81ec915 [Reynold Xin] [SQL] More DataFrame fixes.
    
    (cherry picked from commit d931b01dcaaf009dcf68dcfe83428bd7f9e857cc)
    Signed-off-by: Reynold Xin <[email protected]>

commit 3c1b9bf65290cc1fd4444690a5c5c252667e4576
Author: Michael Armbrust <[email protected]>
Date:   2015-02-12T03:05:49Z

    [SQL] Make dataframe more tolerant of being serialized
    
    Eases use in the spark-shell.
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #4545 from marmbrus/serialization and squashes the following commits:
    
    04748e6 [Michael Armbrust] @scala.annotation.varargs
    b36e219 [Michael Armbrust] moreFixes
    
    (cherry picked from commit a38e23c30fb5d12f8f46a119d91a0620036e6800)
    Signed-off-by: Michael Armbrust <[email protected]>

commit e23c8f5c8953bcb9a509b8521ca0cb49c5181079
Author: Andrew Rowson <[email protected]>
Date:   2015-02-12T18:41:39Z

    [SPARK-5655] Don't chmod700 application files if running in YARN
    
    [Was previously PR4507]
    
    As per SPARK-5655, recently committed code chmod 700s all application files created on the local fs by a spark executor. This is both unnecessary and broken on YARN, where files created in the nodemanager's working directory are already owned by the user running the job and the 'yarn' group. Group read permission is also needed for the auxiliary shuffle service to be able to read the files, as this is running as the 'yarn' user.
    
    Author: Andrew Rowson <[email protected]>
    
    Closes #4509 from growse/master and squashes the following commits:
    
    7ca993c [Andrew Rowson] Moved chmod700 functionality into Utils.getOrCreateLocalRootDirs
    f57ce6b [Andrew Rowson] [SPARK-5655] Don't chmod700 application files if running in a YARN container
    
    (cherry picked from commit 466b1f671b21f575d28f9c103f51765790914fe3)
    Signed-off-by: Sean Owen <[email protected]>
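
A minimal Python sketch of the permission logic described in SPARK-5655. `restrict_local_dir` and its `running_in_yarn` flag are hypothetical; how Spark detects a YARN container is not shown in this message, so the caller supplies that as an assumption.

```python
import os
import stat


def restrict_local_dir(path: str, running_in_yarn: bool) -> int:
    """Apply chmod 700 unless running in YARN; return the resulting mode bits."""
    if not running_in_yarn:
        # Outside YARN, make the directory private to the owning user.
        os.chmod(path, 0o700)
    # In YARN the existing mode is kept, so group read access survives for
    # a service running as a different user in the same group.
    return stat.S_IMODE(os.stat(path).st_mode)
```

The design choice here is to skip the permission change entirely in the YARN case rather than choose a different mode, leaving whatever the node manager set up untouched.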

commit e26c14990c477249241b429c1bb877c3d9339744
Author: Xiangrui Meng <[email protected]>
Date:   2015-02-12T18:48:13Z

    [SPARK-5757][MLLIB] replace SQL JSON usage in model import/export by json4s
    
    This PR detaches MLlib model import/export code from SQL's JSON support, and hence unblocks #4544. yhuai
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #4555 from mengxr/SPARK-5757 and squashes the following commits:
    
    b0415e8 [Xiangrui Meng] replace SQL JSON usage by json4s
    
    (cherry picked from commit 99bd5006650bb15ec5465ffee1ebaca81354a3df)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit cbd659e5fc0e4413334bb4cb5ab8e42bbd5aa8c5
Author: Antonio Navarro Perez <[email protected]>
Date:   2015-02-12T20:46:17Z

    [SQL][DOCS] Update sql documentation
    
    Updated examples using the new api and added DataFrame concept
    
    Author: Antonio Navarro Perez <[email protected]>
    
    Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
    
    82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
    8d5376a [Antonio Navarro Perez] fixed typo
    8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
    
    (cherry picked from commit 6a1be026cf37e4c8bf39133dfb4a73f7caedcc26)
    Signed-off-by: Reynold Xin <[email protected]>

commit e3a975d45a960ddbfe03051a6ae8b614e63cde6b
Author: Michael Armbrust <[email protected]>
Date:   2015-02-12T21:11:28Z

    [SQL] Improve error messages
    
    Author: Michael Armbrust <[email protected]>
    Author: wangfei <[email protected]>
    
    Closes #4558 from marmbrus/errorMessages and squashes the following commits:
    
    5e5ab50 [Michael Armbrust] Merge pull request #15 from scwf/errorMessages
    fa38881 [wangfei] fix for grouping__id
    f279a71 [wangfei] make right references for ScriptTransformation
    d29fbde [Michael Armbrust] extra case
    1a797b4 [Michael Armbrust] comments
    d4e9015 [Michael Armbrust] add comment
    af9e668 [Michael Armbrust] no braces
    34eb3a4 [Michael Armbrust] more work
    6197cd5 [Michael Armbrust] [SQL] Better error messages for analysis failures
    
    (cherry picked from commit aa4ca8b873fd83e64e5faea6f7febcc830e30b02)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 74f34bb8bb7080c7ae669a6b541e9418cfa1fc9f
Author: Kay Ousterhout <[email protected]>
Date:   2015-02-12T22:35:44Z

    [SPARK-5645] Added local read bytes/time to task metrics
    
    ksakellis I stumbled on your JIRA for this yesterday; I know it's assigned to you but I'd already done this for my own uses a while ago so thought I could help save you the work of doing it! Hopefully this doesn't duplicate any work you've already done.
    
    Here's a screenshot of what the UI looks like:
    
![image](https://cloud.githubusercontent.com/assets/1108612/6135352/c03e7276-b11c-11e4-8f11-c6aefe1f35b9.png)
    Based on a discussion with pwendell, I put the data read remotely in as an additional metric rather than showing it in brackets as you'd suggested, Kostas. The assumption here is that the average user doesn't care about the differentiation between local / remote data, so it's better not to pollute the UI.
    
    I also added data about the local read time, which I've found very helpful for debugging, but I didn't put it in the UI because I think it's probably something not a ton of people will need to use.
    
    With this change, the total read time and total write time shown in the UI will be equal, fixing a long-term source of user confusion:
    
![image](https://cloud.githubusercontent.com/assets/1108612/6135399/25f14490-b11d-11e4-8086-20be5f4002e6.png)
    
    Author: Kay Ousterhout <[email protected]>
    
    Closes #4510 from kayousterhout/SPARK-5645 and squashes the following commits:
    
    4a0182c [Kay Ousterhout] oops
    5f5da1b [Kay Ousterhout] Small style fix
    5da04cf [Kay Ousterhout] Addressed more comments from Kostas
    ba05149 [Kay Ousterhout] Remove parens
    a9dc685 [Kay Ousterhout] Kostas comment, test fix
    33d2e2d [Kay Ousterhout] Merge remote-tracking branch 'upstream/master' into SPARK-5645
    347e2cd [Kay Ousterhout] [SPARK-5645] Added local read bytes/time to task metrics
    
    (cherry picked from commit 893d6fd7049daf3c4d01eb6a960801cd064d5f73)
    Signed-off-by: Andrew Or <[email protected]>

commit 9a1de4b20fcfa756f228b263f2a778534f6ca90d
Author: Venkata Ramana Gollamudi <[email protected]>
Date:   2015-02-12T22:44:21Z

    [SPARK-5765][Examples]Fixed word split problem in run-example and compute-classpath
    
    Author: Venkata Ramana G <ramana.gollamudihuawei.com>
    
    Author: Venkata Ramana Gollamudi <[email protected]>
    
    Closes #4561 from gvramana/word_split and squashes the following commits:
    
    285c8d4 [Venkata Ramana Gollamudi] Fixed word split problem in run-example and compute-classpath
    
    (cherry picked from commit 629d0143eeb3c153dac9c65e7b556723c6b4bfc7)
    Signed-off-by: Andrew Or <[email protected]>

commit 0040fc50918cf5e53554b0dc8053528af58e6ba8
Author: Kay Ousterhout <[email protected]>
Date:   2015-02-12T22:46:37Z

    [SPARK-5762] Fix shuffle write time for sort-based shuffle
    
    mateiz, was excluding the time to write this final file from the shuffle write time intentional?
    
    Author: Kay Ousterhout <[email protected]>
    
    Closes #4559 from kayousterhout/SPARK-5762 and squashes the following commits:
    
    5c6f3d9 [Kay Ousterhout] Use foreach
    94e4237 [Kay Ousterhout] Removed open time metrics added inadvertently
    ace156c [Kay Ousterhout] Moved metrics to finally block
    d773276 [Kay Ousterhout] Use nano time
    5a59906 [Kay Ousterhout] [SPARK-5762] Fix shuffle write time for sort-based shuffle
    
    (cherry picked from commit 47c73d410ab533c3196184d2b6004081e79daeaa)
    Signed-off-by: Andrew Or <[email protected]>
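
The squashed commits above mention using nano time and moving the metrics update into a finally block. A minimal Python sketch of that pattern follows; `WriteMetrics`, `write_final_file`, and the injectable `clock` parameter are hypothetical names for illustration, not Spark's classes.

```python
import time
from typing import BinaryIO, Callable, Iterable


class WriteMetrics:
    """Hypothetical holder for the accumulated write time in nanoseconds."""

    def __init__(self) -> None:
        self.write_time_ns = 0


def write_final_file(
    chunks: Iterable[bytes],
    sink: BinaryIO,
    metrics: WriteMetrics,
    clock: Callable[[], int] = time.perf_counter_ns,
) -> None:
    """Write all chunks and count the elapsed time toward the write metric."""
    start = clock()
    try:
        for chunk in chunks:
            sink.write(chunk)
    finally:
        # Runs whether or not a write fails, so the time is always recorded.
        metrics.write_time_ns += clock() - start
```

Passing the clock as a parameter is only a convenience for testing; by default the monotonic nanosecond counter is used.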

commit 11d108030516b1a0bd45f36312f6210dc9a577b0
Author: Andrew Or <[email protected]>
Date:   2015-02-12T22:47:52Z

    [SPARK-5760][SPARK-5761] Fix standalone rest protocol corner cases + revamp tests
    
    The changes are summarized in the commit message. Test or test-related code accounts for 90% of the lines changed.
    
    Author: Andrew Or <[email protected]>
    
    Closes #4557 from andrewor14/rest-tests and squashes the following commits:
    
    b4dc980 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests
    b55e40f [Andrew Or] Add test for unknown fields
    cc96993 [Andrew Or] private[spark] -> private[rest]
    578cf45 [Andrew Or] Clean up test code a little
    d82d971 [Andrew Or] v1 -> serverVersion
    ea48f65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests
    00999a8 [Andrew Or] Revamp tests + fix a few corner cases
    
    (cherry picked from commit 1d5663e92cdaaa3dabfa58fdd7aede7e4fa4ec63)
    Signed-off-by: Andrew Or <[email protected]>

commit 02d5b32bbebc055c1b4cde4f08a8194397921aa9
Author: lianhuiwang <[email protected]>
Date:   2015-02-12T22:50:16Z

    [SPARK-5759][Yarn]ExecutorRunnable should catch YarnException while NMClient start contain...
    
    Sometimes, for various reasons, an exception is thrown while NMClient starts containers. For example, if spark_shuffle is not configured on some machines, it throws an exception:
    java.lang.Error: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist.
    Because YarnAllocator uses a ThreadPoolExecutor to start containers, we cannot tell which container or hostname threw the exception. I think we should catch YarnException in ExecutorRunnable when starting a container, so that if an exception occurs we know the container ID or hostname of the failed container.
    
    Author: lianhuiwang <[email protected]>
    
    Closes #4554 from lianhuiwang/SPARK-5759 and squashes the following commits:
    
    caf5a99 [lianhuiwang] use SparkException to warp exception
    c02140f [lianhuiwang] ExecutorRunnable should catch YarnException while NMClient start container
    
    (cherry picked from commit 947b8bd82ec0f4c45910e6d781df4661f56e4587)
    Signed-off-by: Andrew Or <[email protected]>
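
The idea in SPARK-5759 can be sketched in Python with a thread pool. `ContainerStartError`, `start_container`, and the `launch` callback are hypothetical names, not the YARN or Spark API; the sketch only shows wrapping a failure with the container ID and hostname so that the context survives the hop through the pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class ContainerStartError(RuntimeError):
    """Hypothetical wrapper that records which container failed to start."""


def start_container(
    container_id: str, hostname: str, launch: Callable[[str, str], None]
) -> None:
    """Run launch and attach the container ID and hostname to any failure."""
    try:
        launch(container_id, hostname)
    except Exception as exc:
        raise ContainerStartError(
            f"Exception while starting container {container_id} on host {hostname}"
        ) from exc


def start_all(containers, launch, max_workers: int = 4):
    """Submit every (container_id, hostname) pair and collect the failures."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(start_container, cid, host, launch)
                   for cid, host in containers]
        return [f.exception() for f in futures if f.exception() is not None]
```

Each collected error names the container and host in its message, and the original exception remains available through `__cause__`.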

commit 11a0d5b6dce49c2beac8fd7eae2ccadf59a1e030
Author: David Y. Ross <[email protected]>
Date:   2015-02-12T22:52:38Z

    SPARK-5747: Fix wordsplitting bugs in make-distribution.sh
    
    The `$MVN` command variable may contain spaces, so references to it must be wrapped in quotes.
    
    Author: David Y. Ross <[email protected]>
    
    Closes #4540 from dyross/dyr-fix-make-distribution2 and squashes the following commits:
    
    5a41596 [David Y. Ross] SPARK-5747: Fix wordsplitting bugs in make-distribution.sh
    
    (cherry picked from commit 26c816e7388eaa336a59183029f86548f1cc279c)
    Signed-off-by: Andrew Or <[email protected]>

commit bf0d15c5255f054d2fb70d82ca96797a3665f058
Author: Davies Liu <[email protected]>
Date:   2015-02-12T22:54:38Z

    [SPARK-5780] [PySpark] Mute the logging during unit tests
    
    There is a lot of logging coming from the driver and workers; it is noisy and alarming and contains many exceptions, so people are confused about whether the tests are failing or not.
    
    This PR mutes the logging during tests and shows it only if a test fails.
    
    Author: Davies Liu <[email protected]>
    
    Closes #4572 from davies/mute and squashes the following commits:
    
    1e9069c [Davies Liu] mute the logging during python tests
    
    (cherry picked from commit 0bf031582588723dd5a4ca42e6f9f36bc2da1a0b)
    Signed-off-by: Andrew Or <[email protected]>
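
A minimal Python sketch of "mute the logging, show it only on failure". `run_quietly` and the logger name `"demo"` are hypothetical and do not reflect how the PySpark test runner is implemented; the sketch buffers log output while a test runs and returns it only when the test raises.

```python
import io
import logging
from typing import Callable, Tuple


def run_quietly(test_fn: Callable[[], None], logger_name: str = "demo") -> Tuple[bool, str]:
    """Run test_fn with its logger captured; return (passed, captured log)."""
    logger = logging.getLogger(logger_name)
    buffer = io.StringIO()
    saved_handlers, saved_propagate = logger.handlers[:], logger.propagate
    # Route this logger's records into the buffer instead of the console.
    logger.handlers, logger.propagate = [logging.StreamHandler(buffer)], False
    try:
        test_fn()
        return True, ""
    except Exception:
        # Only a failing test surfaces the log output collected so far.
        return False, buffer.getvalue()
    finally:
        logger.handlers, logger.propagate = saved_handlers, saved_propagate
```

A caller would print the second element of the result only when the first is `False`, which keeps passing runs silent.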

commit b0c79daf4a24739963726dfecedff9a4b129f3c0
Author: Yin Huai <[email protected]>
Date:   2015-02-12T23:17:25Z

    [SPARK-5758][SQL] Use LongType as the default type for integers in JSON schema inference.
    
    Author: Yin Huai <[email protected]>
    
    Closes #4544 from yhuai/jsonUseLongTypeByDefault and squashes the following commits:
    
    6e2ffc2 [Yin Huai] Use LongType as the default type for integers in JSON schema inference.
    
    (cherry picked from commit c352ffbdb9112714c176a747edff6115e9369e58)
    Signed-off-by: Michael Armbrust <[email protected]>
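
What "LongType as the default type for integers" means can be sketched in Python. `infer_type` and `infer_schema` are hypothetical helpers that return type names as strings; Spark's actual inference handles more cases (nested values, type merging) that are deliberately left out here.

```python
import json
from typing import Dict, Iterable


def infer_type(value: object) -> str:
    """Map a parsed JSON scalar to a type name; every integer becomes LongType."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "BooleanType"
    if isinstance(value, int):
        return "LongType"  # no narrower integer type is chosen for small values
    if isinstance(value, float):
        return "DoubleType"
    if isinstance(value, str):
        return "StringType"
    if value is None:
        return "NullType"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_schema(lines: Iterable[str]) -> Dict[str, str]:
    """Infer a flat schema from JSON lines, using the first type seen per field."""
    schema: Dict[str, str] = {}
    for line in lines:
        for field, value in json.loads(line).items():
            schema.setdefault(field, infer_type(value))
    return schema
```

Because small and large integers map to the same type in this sketch, a field whose first value is `1` and whose later value is `3000000000` gets one consistent type.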

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

