GitHub user lin0Xu opened a pull request:

    https://github.com/apache/spark/pull/10098

    Branch 1.5

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10098.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10098
    
----
commit 3ccd2e647a3c3039b1959a1e39c24cbe4fc6d9c5
Author: Dharmesh Kakadia <[email protected]>
Date:   2015-08-28T08:38:35Z

    typo in comment
    
    Author: Dharmesh Kakadia <[email protected]>
    
    Closes #8497 from dharmeshkakadia/patch-2.
    
    (cherry picked from commit 71a077f6c16c8816eae13341f645ba50d997f63d)
    Signed-off-by: Sean Owen <[email protected]>

commit 0cd49bacc8ec344a60bc2f5bf4c90cfd8c79abed
Author: Yuhao Yang <[email protected]>
Date:   2015-08-28T15:00:44Z

    [SPARK-9890] [DOC] [ML] User guide for CountVectorizer
    
    jira: https://issues.apache.org/jira/browse/SPARK-9890
    
    document with Scala and Java examples
    
    Author: Yuhao Yang <[email protected]>
    
    Closes #8487 from hhbyyh/cvDoc.
    
    (cherry picked from commit e2a843090cb031f6aa774f6d9c031a7f0f732ee1)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 0abbc181380e644374f4217ee84b76fae035aee2
Author: Luciano Resende <[email protected]>
Date:   2015-08-28T16:13:21Z

    [SPARK-8952] [SPARKR] - Wrap normalizePath calls with suppressWarnings
    
    This is based on davies' comment on SPARK-8952, which suggests only calling normalizePath() when the path starts with '~'.
    
    Author: Luciano Resende <[email protected]>
    
    Closes #8343 from lresende/SPARK-8952.
    
    (cherry picked from commit 499e8e154bdcc9d7b2f685b159e0ddb4eae48fe4)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit ccda27a9beb97b11c2522a0700165fd849af44b1
Author: Josh Rosen <[email protected]>
Date:   2015-08-28T18:51:42Z

    [SPARK-10325] Override hashCode() for public Row
    
    This commit fixes an issue where the public SQL `Row` class did not override `hashCode`, causing it to violate the hashCode() + equals() contract. To fix this, I simply ported the `hashCode` implementation from the 1.4.x version of `Row`.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #8500 from JoshRosen/SPARK-10325 and squashes the following commits:
    
    51ffea1 [Josh Rosen] Override hashCode() for public Row.
    
    (cherry picked from commit d3f87dc39480f075170817bbd00142967a938078)
    Signed-off-by: Michael Armbrust <[email protected]>
    
    Conflicts:
        sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
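    The commit above restores the hashCode() + equals() contract for the SQL `Row` class. As background, the contract itself can be illustrated with a minimal, hypothetical Python analog (a `RowLike` class invented for this sketch, not Spark code): the hash must be derived from the same fields that equality compares, or equal objects land in different hash buckets.

    ```python
    # Hypothetical sketch of the hashCode()/equals() contract, not Spark code.
    class RowLike:
        def __init__(self, values):
            self.values = tuple(values)

        def __eq__(self, other):
            # Two rows are equal when their field values are equal.
            return isinstance(other, RowLike) and self.values == other.values

        def __hash__(self):
            # The contract: equal objects must produce equal hashes, so the
            # hash is computed from the same fields that __eq__ compares.
            return hash(self.values)

    a, b = RowLike([1, "x"]), RowLike([1, "x"])
    assert a == b and hash(a) == hash(b)
    assert len({a, b}) == 1  # equal rows collapse to one set entry
    ```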

commit 9c58f6441e353fcb4402e1f36d826937d4801298
Author: Shuo Xiang <[email protected]>
Date:   2015-08-28T20:09:13Z

    [SPARK-10336][example] fix not being able to set intercept in LR example
    
    `fitIntercept` is a command line option but not set in the main program.
    
    dbtsai
    
    Author: Shuo Xiang <[email protected]>
    
    Closes #8510 from coderxiang/intercept and squashes the following commits:
    
    57c9b7d [Shuo Xiang] fix not being able to set intercept in LR example
    
    (cherry picked from commit 45723214e694b9a440723e9504c562e6393709f3)
    Signed-off-by: DB Tsai <[email protected]>

commit 7f014809de25d1d491c46e09fd88ef6c3d5d0e1b
Author: Xiangrui Meng <[email protected]>
Date:   2015-08-28T20:53:31Z

    [SPARK-9671] [MLLIB] re-org user guide and add migration guide
    
    This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.
    
    * merge migration guide for `spark.mllib` and `spark.ml` packages
    * remove dependency section from `spark.ml` guide
    * move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
    * move Sam's talk to footnote to make the section focus on dependencies
    
    Minor changes to code examples and other wording will be in a separate PR.
    
    jkbradley srowen feynmanliang
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #8498 from mengxr/SPARK-9671.
    
    (cherry picked from commit 88032ecaf0455886aed7a66b30af80dae7f6cff7)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 02e10d2df40e18a14c4c388c41699b5b258e57ac
Author: Davies Liu <[email protected]>
Date:   2015-08-28T21:38:20Z

    [SPARK-10323] [SQL] fix nullability of In/InSet/ArrayContain
    
    After this PR, In/InSet/ArrayContain will return null if the value is null, instead of false. They will also return null when there is a null in the set/array.
    
    Author: Davies Liu <[email protected]>
    
    Closes #8492 from davies/fix_in.
    
    (cherry picked from commit bb7f35239385ec74b5ee69631b5480fbcee253e4)
    Signed-off-by: Davies Liu <[email protected]>
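    The null semantics the commit above adopts follow SQL's three-valued logic. A minimal sketch of that behavior (plain Python for illustration, not the Catalyst implementation) might look like:

    ```python
    def sql_in(value, items):
        """Three-valued SQL semantics for `value IN (items)` (illustrative only).

        Returns True, False, or None (standing in for SQL NULL).
        """
        if value is None:
            return None                      # null IN (...) -> null, not false
        if any(x == value for x in items if x is not None):
            return True                      # a definite match
        if any(x is None for x in items):
            return None                      # no match, but a null might have matched
        return False

    assert sql_in(1, [1, 2]) is True
    assert sql_in(3, [1, 2]) is False
    assert sql_in(None, [1, 2]) is None      # returned False before the fix
    assert sql_in(3, [1, None]) is None      # null in the set: unknown, not False
    ```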

commit df4a2e68ad4117ffadd63f5032ff8ca4972c2772
Author: Marcelo Vanzin <[email protected]>
Date:   2015-08-28T22:57:27Z

    [SPARK-10326] [YARN] Fix app submission on windows.
    
    Author: Marcelo Vanzin <[email protected]>
    
    Closes #8493 from vanzin/SPARK-10326.

commit b7aab1d1838bdffdf29923fc0f18eb04e582957e
Author: felixcheung <[email protected]>
Date:   2015-08-29T01:35:01Z

    [SPARK-9803] [SPARKR] Add subset and transform + tests
    
    Add subset and transform
    Also reorganize `[` & `[[` to subset instead of select
    
    Note: for transform, transform is very similar to mutate. Spark doesn't seem to replace an existing column with the same name in mutate (i.e. `mutate(df, age = df$age + 2)` returns a DataFrame with 2 columns named 'age'), so we are not doing that for now in transform, even though it is clearly stated that it should replace the column with the matching name (should I open a JIRA for mutate/transform?).
    
    Author: felixcheung <[email protected]>
    
    Closes #8503 from felixcheung/rsubset_transform.
    
    (cherry picked from commit 2a4e00ca4d4e7a148b4ff8ce0ad1c6d517cee55f)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 69d856527d50d01624feaf1461af2d7bff03a668
Author: martinzapletal <[email protected]>
Date:   2015-08-29T04:03:48Z

    [SPARK-9910] [ML] User guide for train validation split
    
    Author: martinzapletal <[email protected]>
    
    Closes #8377 from zapletal-martin/SPARK-9910.
    
    (cherry picked from commit e8ea5bafee9ca734edf62021145d0c2d5491cba8)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit d17316f8b6286c0f8b40cfd86b14abebba6ac6af
Author: GuoQiang Li <[email protected]>
Date:   2015-08-29T20:20:22Z

    [SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide
    
    Author: GuoQiang Li <[email protected]>
    
    Closes #8520 from witgo/SPARK-10350.
    
    (cherry picked from commit 5369be806848f43cb87c76504258c4e7de930c90)
    Signed-off-by: Michael Armbrust <[email protected]>

commit a49ad67a5458fc88e7faefa50fa88783d8fbe3c6
Author: Michael Armbrust <[email protected]>
Date:   2015-08-29T20:26:01Z

    [SPARK-10344] [SQL] Add tests for extraStrategies
    
    Actually using this API requires access to a lot of classes that we might make private by accident. I've added some tests to prevent this.
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #8516 from marmbrus/extraStrategiesTests.
    
    (cherry picked from commit 5c3d16a9b91bb9a458d3ba141f7bef525cf3d285)
    Signed-off-by: Yin Huai <[email protected]>

commit 7c65078948c48ed6339452191fcf71b564ad0e8d
Author: wangwei <[email protected]>
Date:   2015-08-29T20:29:50Z

    [SPARK-10226] [SQL] Fix exclamation mark issue in SparkSQL
    
    When I tested the latest version of Spark with an exclamation mark, I got some errors. Then I rolled back Spark versions and found that commit "a2409d1c8e8ddec04b529ac6f6a12b5993f0eeda" introduced the bug. With the jline version changing from 0.9.94 to 2.12 in that commit, the exclamation mark is treated as a special character in ConsoleReader.
    
    Author: wangwei <[email protected]>
    
    Closes #8420 from small-wang/jline-SPARK-10226.
    
    (cherry picked from commit 277148b285748e863f2b9fdf6cf12963977f91ca)
    Signed-off-by: Michael Armbrust <[email protected]>

commit d178e1e77f6d19ae9dafc7b0e26ae5784b288e42
Author: Josh Rosen <[email protected]>
Date:   2015-08-29T20:36:25Z

    [SPARK-10330] Use SparkHadoopUtil TaskAttemptContext reflection methods in more places
    
    SparkHadoopUtil contains methods that use reflection to work around TaskAttemptContext binary incompatibilities between Hadoop 1.x and 2.x. We should use these methods in more places.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #8499 from JoshRosen/use-hadoop-reflection-in-more-places.
    
    (cherry picked from commit 6a6f3c91ee1f63dd464eb03d156d02c1a5887d88)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 3a61e103b8c480596fe38e558d7e4449ec0dc391
Author: Yin Huai <[email protected]>
Date:   2015-08-29T23:39:40Z

    [SPARK-10339] [SPARK-10334] [SPARK-10301] [SQL] Partitioned table scan can OOM driver and throw a better error message when users need to enable parquet schema merging
    
    This fixes the problem that scanning a partitioned table puts the driver under high memory pressure and takes down the cluster. Also, with this fix, we will be able to correctly show the query plan of a query consuming partitioned tables.
    
    https://issues.apache.org/jira/browse/SPARK-10339
    https://issues.apache.org/jira/browse/SPARK-10334
    
    Finally, this PR squeezes in a "quick fix" for SPARK-10301. It is not a real fix; it just throws a better error message to let users know what to do.
    
    Author: Yin Huai <[email protected]>
    
    Closes #8515 from yhuai/partitionedTableScan.
    
    (cherry picked from commit 097a7e36e0bf7290b1879331375bacc905583bd3)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 8071f6edff98e803e4ae6a07973fc0311aa6fef6
Author: Xiangrui Meng <[email protected]>
Date:   2015-08-30T06:26:23Z

    [SPARK-10348] [MLLIB] updates ml-guide
    
    * replace `ML Dataset` by `DataFrame` to unify the abstraction
    * ML algorithms -> pipeline components to describe the main concept
    * remove Scala API doc links from the main guide
    * `Section Title` -> `Section title` to be consistent with other section titles in MLlib guide
    * modified lines break at 100 chars or periods
    
    jkbradley feynmanliang
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #8517 from mengxr/SPARK-10348.
    
    (cherry picked from commit 905fbe498bdd29116468628e6a2a553c1fd57165)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 1d40136ee68d8f2e008c7eed93495d6690ce40ba
Author: Xiangrui Meng <[email protected]>
Date:   2015-08-30T06:57:09Z

    [SPARK-10331] [MLLIB] Update example code in ml-guide
    
    * The example code was added in 1.2, before `createDataFrame`. This PR switches to `createDataFrame`. Java code still uses JavaBean.
    * assume `sqlContext` is available
    * fix some minor issues from previous code review
    
    jkbradley srowen feynmanliang
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #8518 from mengxr/SPARK-10331.
    
    (cherry picked from commit ca69fc8efda8a3e5442ffa16692a2b1eb86b7673)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 42a81a608be369fe94c3f7af61bd2281f3d1e6b9
Author: Burak Yavuz <[email protected]>
Date:   2015-08-30T19:21:15Z

    [SPARK-10353] [MLLIB] BLAS gemm not scaling when beta = 0.0 for some subset of matrix multiplications
    
    mengxr jkbradley rxin
    
    It would be great if this fix made it into RC3!
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #8525 from brkyvz/blas-scaling.
    
    (cherry picked from commit 8d2ab75d3b71b632f2394f2453af32f417cb45e5)
    Signed-off-by: Xiangrui Meng <[email protected]>
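    The gemm contract at issue above is C := alpha*A*B + beta*C, where beta = 0.0 must mean "overwrite C" rather than "scale C". A plain-Python sketch of that special case (illustrative only, not Spark's BLAS wrapper):

    ```python
    def gemm(alpha, a, b, beta, c):
        """Naive C := alpha*A*B + beta*C on lists of lists (illustrative only)."""
        rows, inner, cols = len(a), len(b), len(b[0])
        for i in range(rows):
            for j in range(cols):
                acc = sum(a[i][p] * b[p][j] for p in range(inner))
                if beta == 0.0:
                    # Overwrite: the old c[i][j] must be ignored entirely,
                    # so stale or NaN values cannot leak into the result.
                    c[i][j] = alpha * acc
                else:
                    c[i][j] = alpha * acc + beta * c[i][j]
        return c

    c = [[float("nan")]]
    gemm(1.0, [[3.0]], [[2.0]], 0.0, c)
    assert c == [[6.0]]  # NaN did not propagate, because beta == 0.0 overwrites
    ```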

commit bf5b2f26b8f440ca734b06845d6e9c67cd28f4fd
Author: Xiangrui Meng <[email protected]>
Date:   2015-08-31T06:20:03Z

    [SPARK-10354] [MLLIB] fix some apparent memory issues in k-means|| initialization
    
    * do not cache first cost RDD
    * change following cost RDD cache level to MEMORY_AND_DISK
    * remove Vector wrapper to save an object per instance
    
    Further improvements will be addressed in SPARK-10329
    
    cc: yu-iskw HuJiayin
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #8526 from mengxr/SPARK-10354.
    
    (cherry picked from commit f0f563a3c43fc9683e6920890cce44611c0c5f4b)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 33ce274cdf7538b5816f1a400b2fad394ec2a056
Author: zsxwing <[email protected]>
Date:   2015-08-31T19:19:11Z

    [SPARK-10369] [STREAMING] Don't remove ReceiverTrackingInfo when deregistering a receiver since we may reuse it later
    
    `deregisterReceiver` should not remove `ReceiverTrackingInfo`. Otherwise, it will throw `java.util.NoSuchElementException: key not found` when restarting the receiver.
    
    Author: zsxwing <[email protected]>
    
    Closes #8538 from zsxwing/SPARK-10369.
    
    (cherry picked from commit 4a5fe091658b1d06f427e404a11a84fc84f953c5)
    Signed-off-by: Tathagata Das <[email protected]>

commit 1c752b8b5c7090936b5c2ca94e8fb47c4f570d69
Author: Davies Liu <[email protected]>
Date:   2015-08-31T22:55:22Z

    [SPARK-10341] [SQL] fix memory starving in unsafe SMJ
    
    In SMJ, the first ExternalSorter could consume all the memory before spilling, and then the second cannot even acquire its first page.
    
    Before we have a better memory allocator, SMJ should call prepare() before calling any compute() of its children.
    
    cc rxin JoshRosen
    
    Author: Davies Liu <[email protected]>
    
    Closes #8511 from davies/smj_memory.
    
    (cherry picked from commit 540bdee93103a73736d282b95db6a8cda8f6a2b1)
    Signed-off-by: Reynold Xin <[email protected]>

commit 908e37bcc10132bb2aa7f80ae694a9df6e40f31a
Author: Patrick Wendell <[email protected]>
Date:   2015-08-31T22:57:42Z

    Preparing Spark release v1.5.0-rc3

commit 2b270a166d6bd5b42399400924c576c9996bfc10
Author: Patrick Wendell <[email protected]>
Date:   2015-08-31T22:57:49Z

    Preparing development version 1.5.1-SNAPSHOT

commit d19bccd872ccf22b43d3d1e66709413e8d44ec9d
Author: Sean Owen <[email protected]>
Date:   2015-09-01T19:06:01Z

    [SPARK-10398] [DOCS] Migrate Spark download page to use new lua mirroring scripts
    
    Migrate Apache download closer.cgi refs to new closer.lua
    
    This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately.
    
    Author: Sean Owen <[email protected]>
    
    Closes #8557 from srowen/SPARK-10398.
    
    (cherry picked from commit 3f63bd6023edcc9af268933a235f34e10bc3d2ba)
    Signed-off-by: Sean Owen <[email protected]>

commit 30efa96af8132cd0616859fdf440a5b50bdfad3b
Author: 0x0FFF <[email protected]>
Date:   2015-09-01T21:58:49Z

    [SPARK-10392] [SQL] Pyspark - Wrong DateType support on JDBC connection
    
    This PR addresses issue [SPARK-10392](https://issues.apache.org/jira/browse/SPARK-10392).
    The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of a `datetime.date`, due to the implementation of its return statement.
    
    Issue reproduction on master:
    ```
    >>> from pyspark.sql.types import *
    >>> a = DateType()
    >>> a.fromInternal(0)
    0
    >>> a.fromInternal(1)
    datetime.date(1970, 1, 2)
    ```
    
    Author: 0x0FFF <[email protected]>
    
    Closes #8556 from 0x0FFF/SPARK-10392.
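    The repro above is consistent with a truthiness bug in the conversion from the internal integer (days since the epoch) to `datetime.date`: `0 and x` short-circuits to 0. A hedged sketch of the failure mode and fix, using hypothetical helper names rather than the exact PySpark source:

    ```python
    import datetime

    EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()

    def from_internal_buggy(v):
        # `v and ...` short-circuits for v == 0, so day zero of the epoch
        # comes back as the int 0 instead of datetime.date(1970, 1, 1).
        return v and datetime.date.fromordinal(v + EPOCH_ORDINAL)

    def from_internal_fixed(v):
        # Guard explicitly against None so 0 (a valid day count) converts.
        if v is None:
            return None
        return datetime.date.fromordinal(v + EPOCH_ORDINAL)

    assert from_internal_buggy(0) == 0                          # the bug
    assert from_internal_fixed(0) == datetime.date(1970, 1, 1)  # the fix
    assert from_internal_fixed(1) == datetime.date(1970, 1, 2)
    ```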

commit 2fce5d880974c0a3cd07536d1ce226df8e635bb4
Author: Yin Huai <[email protected]>
Date:   2015-09-03T04:00:13Z

    [SPARK-10422] [SQL] String column in InMemoryColumnarCache needs to override clone method
    
    https://issues.apache.org/jira/browse/SPARK-10422
    
    Author: Yin Huai <[email protected]>
    
    Closes #8578 from yhuai/SPARK-10422.
    
    (cherry picked from commit 03f3e91ff21707d8a1c7057a00f1b1cd8b743e3f)
    Signed-off-by: Davies Liu <[email protected]>

commit b846a9dc3f74af235111b6313900016c6ccac1b9
Author: Davies Liu <[email protected]>
Date:   2015-09-03T05:15:54Z

    [SPARK-10379] preserve first page in UnsafeShuffleExternalSorter
    
    Author: Davies Liu <[email protected]>
    
    Closes #8543 from davies/preserve_page.
    
    (cherry picked from commit 62b4690d6b3016f41292b640ac28644ef31e299d)
    Signed-off-by: Andrew Or <[email protected]>

commit 94404ee53f382afae345ce2a30c0df657f00eee5
Author: zsxwing <[email protected]>
Date:   2015-09-03T05:17:39Z

    [SPARK-10411] [SQL] Move visualization above explain output and hide explain by default
    
    New screenshots after this fix:
    
    <img width="627" alt="s1" 
src="https://cloud.githubusercontent.com/assets/1000778/9625782/4b2dba36-518b-11e5-9104-c713ff026e3d.png";>
    
    Default:
    <img width="462" alt="s2" 
src="https://cloud.githubusercontent.com/assets/1000778/9625817/92366e50-518b-11e5-9981-cdfb774d66b8.png";>
    
    After clicking `+details`:
    <img width="377" alt="s3" 
src="https://cloud.githubusercontent.com/assets/1000778/9625784/4ba24342-518b-11e5-8522-846a16a95d44.png";>
    
    Author: zsxwing <[email protected]>
    
    Closes #8570 from zsxwing/SPARK-10411.
    
    (cherry picked from commit 0349b5b4383cf813bea4e1053bcc4e0268603743)
    Signed-off-by: Andrew Or <[email protected]>

commit f01a96713a6ebb580c83e88652bc6d361aaec6f4
Author: Holden Karau <[email protected]>
Date:   2015-09-03T08:30:54Z

    [SPARK-10332] [CORE] Fix yarn spark executor validation
    
    From Jira:
    Running spark-submit with yarn with number-executors equal to 0 when not using dynamic allocation should error out.
    In spark 1.5.0 it continues and ends up hanging.
    yarn.ClientArguments still has the check so something else must have changed.
    spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 0 ....
    spark 1.4.1 errors with:
    java.lang.IllegalArgumentException:
    Number of executors was 0, but must be at least 1
    (or 0 if dynamic executor allocation is enabled).
    
    Author: Holden Karau <[email protected]>
    
    Closes #8580 from holdenk/SPARK-10332-spark-submit-to-yarn-executors-0-message.
    
    (cherry picked from commit 67580f1f574d272af3712fd91458f3c87368c2e4)
    Signed-off-by: Sean Owen <[email protected]>
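    The check described in the Jira text above boils down to a simple argument validation. A hypothetical standalone sketch of it (plain Python for illustration, not the actual yarn.ClientArguments code):

    ```python
    def validate_num_executors(num_executors, dynamic_allocation_enabled):
        """Fail fast when --num-executors is too low (illustrative only)."""
        minimum = 0 if dynamic_allocation_enabled else 1
        if num_executors < minimum:
            raise ValueError(
                "Number of executors was %d, but must be at least 1 "
                "(or 0 if dynamic executor allocation is enabled)." % num_executors)

    validate_num_executors(1, False)   # ok
    validate_num_executors(0, True)    # ok with dynamic allocation
    try:
        validate_num_executors(0, False)
    except ValueError:
        pass  # 1.4.1 behavior: error out instead of hanging
    else:
        raise AssertionError("expected a ValueError")
    ```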

commit f945b641c70790a82c864ec752b673b89bb4310f
Author: robbins <[email protected]>
Date:   2015-09-03T20:48:35Z

    [SPARK-9869] [STREAMING] Wait for all event notifications before asserting results
    
    Author: robbins <[email protected]>
    
    Closes #8589 from robbinspg/InputStreamSuite-fix.
    
    (cherry picked from commit 754f853b02e9fd221f138c2446445fd56e3f3fb3)
    Signed-off-by: Andrew Or <[email protected]>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
