GitHub user advaitraut opened a pull request:

    https://github.com/apache/spark/pull/15753

    Dev advait

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before 
opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/advaitraut/spark dev-advait

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15753
    
----
commit 8950482ee5e9132d11dc5b5d41132bb1fe1e7ba2
Author: felixcheung <[email protected]>
Date:   2016-01-05T03:09:58Z

    [SPARKR][DOC] minor doc update for version in migration guide
    
    Checked that the change is in Spark 1.6.0.
    /cc shivaram
    
    Author: felixcheung <[email protected]>
    
    Closes #10574 from felixcheung/rwritemodedoc.
    
    (cherry picked from commit 8896ec9f02a6747917f3ae42a517ff0e3742eaf6)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit d9e4438b5c7b3569662a50973164955332463d05
Author: Michael Armbrust <[email protected]>
Date:   2016-01-05T07:23:41Z

    [SPARK-12568][SQL] Add BINARY to Encoders
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #10516 from marmbrus/datasetCleanup.
    
    (cherry picked from commit 53beddc5bf04a35ab73de99158919c2fdd5d4508)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 5afa62b20090e763ba10d9939ec214a11466087b
Author: Pete Robbins <[email protected]>
Date:   2016-01-05T21:10:21Z

    [SPARK-12647][SQL] Fix o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator
    
    change expected partition sizes
    
    Author: Pete Robbins <[email protected]>
    
    Closes #10599 from robbinspg/branch-1.6.

commit f31d0fd9ea12bfe94434671fbcfe3d0e06a4a97d
Author: Shixiong Zhu <[email protected]>
Date:   2016-01-05T21:10:46Z

    [SPARK-12617] [PYSPARK] Clean up the leak sockets of Py4J
    
    This patch adds Py4jCallbackConnectionCleaner to clean up Py4J's leaked
    sockets every 30 seconds. This is a workaround until Py4J fixes the leak
    issue: https://github.com/bartdag/py4j/issues/187
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10579 from zsxwing/SPARK-12617.
    
    (cherry picked from commit 047a31bb1042867b20132b347b1e08feab4562eb)
    Signed-off-by: Davies Liu <[email protected]>
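
The workaround pattern, sketched in Python (illustrative only; the real Py4jCallbackConnectionCleaner lives inside PySpark's internals):

```python
import threading

class PeriodicCleaner:
    """Daemon timer that runs a cleanup callback every interval_secs,
    the same shape as the 30-second socket cleaner described above."""

    def __init__(self, cleanup_fn, interval_secs=30.0):
        self._cleanup_fn = cleanup_fn      # e.g. close leaked callback sockets
        self._interval = interval_secs
        self._timer = None

    def start(self):
        def _tick():
            self._cleanup_fn()
            self.start()                   # reschedule the next run
        self._timer = threading.Timer(self._interval, _tick)
        self._timer.daemon = True          # never block interpreter shutdown
        self._timer.start()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()
```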

commit 83fe5cf9a2621d7e53b5792a7c7549c9da7f130a
Author: Shixiong Zhu <[email protected]>
Date:   2016-01-05T21:48:47Z

    [SPARK-12511] [PYSPARK] [STREAMING] Make sure 
PythonDStream.registerSerializer is called only once
    
    There is an issue where Py4J's PythonProxyHandler.finalize blocks forever
    (https://github.com/bartdag/py4j/pull/184).

    Py4J creates a PythonProxyHandler in Java for "transformer_serializer" when
    "registerSerializer" is called. If we call "registerSerializer" twice, the
    second PythonProxyHandler overrides the first one; the first is then GCed
    and triggers "PythonProxyHandler.finalize". To avoid that, we should not
    call "registerSerializer" more than once, so that the Java-side
    "PythonProxyHandler" won't be GCed.
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10514 from zsxwing/SPARK-12511.
    
    (cherry picked from commit 6cfe341ee89baa952929e91d33b9ecbca73a3ea0)
    Signed-off-by: Davies Liu <[email protected]>
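
The run-once guard can be sketched like this (hypothetical names; the real change lives in PythonDStream):

```python
import threading

_lock = threading.Lock()
_serializer_registered = False

def register_serializer_once(register_fn):
    """Invoke register_fn at most once per process.

    A second registration would create a second PythonProxyHandler on the
    Java side and let the first be GCed, triggering the blocking finalize
    described above.
    """
    global _serializer_registered
    with _lock:
        if not _serializer_registered:
            register_fn()
            _serializer_registered = True
```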

commit 0afad6678431846a6eebda8d5891da9115884915
Author: RJ Nowling <[email protected]>
Date:   2016-01-05T23:05:04Z

    [SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans
    
    SPARK-12450. Un-persist broadcasted variables in KMeans.
    
    Author: RJ Nowling <[email protected]>
    
    Closes #10415 from rnowling/spark-12450.
    
    (cherry picked from commit 78015a8b7cc316343e302eeed6fe30af9f2961e8)
    Signed-off-by: Joseph K. Bradley <[email protected]>
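
The pattern applied inside KMeans, sketched with the public PySpark API (hypothetical data; the actual change is in MLlib's Scala implementation):

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
centers = sc.broadcast([(0.0, 0.0), (1.0, 1.0)])  # hypothetical cluster centers

def nearest(point):
    # squared Euclidean distance to each broadcast center
    dists = [sum((p - c) ** 2 for p, c in zip(point, ctr))
             for ctr in centers.value]
    return dists.index(min(dists))

assignments = sc.parallelize([(0.1, 0.2), (0.9, 1.1)]).map(nearest).collect()
centers.unpersist()  # the fix: release broadcast blocks once no longer needed
```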

commit bf3dca2df4dd3be264691be1321e0c700d4f4e32
Author: BrianLondon <[email protected]>
Date:   2016-01-05T23:15:07Z

    [SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk
    
    Successfully ran the Kinesis demo on a live, AWS-hosted Kinesis stream
    against the master and 1.6 branches. For reasons I don't entirely
    understand, it required a manual merge to 1.5, which I did as shown here:
    https://github.com/BrianLondon/spark/commit/075c22e89bc99d5e99be21f40e0d72154a1e23a2

    The demo ran successfully on the 1.5 branch as well.

    According to `mvn dependency:tree` it is still pulling a fairly old
    version of the aws-java-sdk (1.9.37), but this appears to have fixed the
    Kinesis regression in 1.5.2.
    
    Author: BrianLondon <[email protected]>
    
    Closes #10492 from BrianLondon/remove-only.
    
    (cherry picked from commit ff89975543b153d0d235c0cac615d45b34aa8fe7)
    Signed-off-by: Sean Owen <[email protected]>

commit c3135d02176cdd679b4a0e4883895b9e9f001a55
Author: Yanbo Liang <[email protected]>
Date:   2016-01-06T06:35:41Z

    [SPARK-12393][SPARKR] Add read.text and write.text for SparkR
    
    Add ```read.text``` and ```write.text``` for SparkR.
    cc sun-rui felixcheung shivaram
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #10348 from yanboliang/spark-12393.
    
    (cherry picked from commit d1fea41363c175a67b97cb7b3fe89f9043708739)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 175681914af953b7ce1b2971fef83a2445de1f94
Author: zero323 <[email protected]>
Date:   2016-01-06T19:58:33Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
    
    If the initial model passed to GMM is not empty, it causes a
    `net.razorvine.pickle.PickleException`. This can be fixed by converting
    `initialModel.weights` to a `list`.
    
    Author: zero323 <[email protected]>
    
    Closes #9986 from zero323/SPARK-12006.
    
    (cherry picked from commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04)
    Signed-off-by: Joseph K. Bradley <[email protected]>
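
A minimal sketch of the conversion (the stand-in model class below is hypothetical; in PySpark the weights live on a GaussianMixtureModel):

```python
import numpy as np

def plain_weights(initial_model):
    # numpy arrays trip the Pyrolite unpickler on the JVM side, so coerce
    # the weights to a plain Python list before they are pickled across.
    return list(initial_model.weights)

class _FakeGMM:                      # hypothetical stand-in for the model
    weights = np.array([0.4, 0.6])

assert plain_weights(_FakeGMM()) == [0.4, 0.6]
```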

commit d821fae0ecca6393d3632977797d72ba594d26a9
Author: Shixiong Zhu <[email protected]>
Date:   2016-01-06T20:03:01Z

    [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming
    
    Move Py4jCallbackConnectionCleaner to Streaming because the callback server 
starts only in StreamingContext.
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10621 from zsxwing/SPARK-12617-2.
    
    (cherry picked from commit 1e6648d62fb82b708ea54c51cd23bfe4f542856e)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386
Author: huangzhaowei <[email protected]>
Date:   2016-01-06T20:48:57Z

    [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default 
root path to gain the streaming batch url.
    
    Author: huangzhaowei <[email protected]>
    
    Closes #10617 from SaintBacchus/SPARK-12672.

commit 39b0a348008b6ab532768b90fd578b77711af98c
Author: Shixiong Zhu <[email protected]>
Date:   2016-01-06T21:53:25Z

    Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of 
default root path to gain the streaming batch url."
    
    This reverts commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386. Will merge 
#10618 instead.

commit 11b901b22b1cdaa6d19b1b73885627ac601be275
Author: Liang-Chi Hsieh <[email protected]>
Date:   2015-12-14T17:59:42Z

    [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in 
pyspark
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-12016
    
    We should not use Word2VecModel directly in pyspark. We need to wrap it in
    a Word2VecModelWrapper when loading it.
    
    Author: Liang-Chi Hsieh <[email protected]>
    
    Closes #10100 from viirya/fix-load-py-wordvecmodel.
    
    (cherry picked from commit b51a4cdff3a7e640a8a66f7a9c17021f3056fd34)
    Signed-off-by: Joseph K. Bradley <[email protected]>
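
The wrapper idea in outline (a sketch only; the real Word2VecModelWrapper is part of PySpark's MLlib bindings):

```python
class Word2VecModelWrapper:
    """Adapts a raw JVM Word2VecModel to Python-side calling conventions."""

    def __init__(self, java_model):
        self._java_model = java_model   # py4j handle to the JVM object

    def transform(self, word):
        # delegate to the JVM model; the real wrapper also converts the
        # returned JVM types into Python ones
        return self._java_model.transform(word)
```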

commit 94af69c9be70b9d2cd95c26288e2af9599d61e5c
Author: jerryshao <[email protected]>
Date:   2016-01-07T05:28:29Z

    [SPARK-12673][UI] Add missing uri prepending for job description
    
    Otherwise the URL will fail to proxy to the right one in YARN mode. Here
    is the screenshot:
    
    ![screen shot 2016-01-06 at 5 28 26 
pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png)
    
    Author: jerryshao <[email protected]>
    
    Closes #10618 from jerryshao/SPARK-12673.
    
    (cherry picked from commit 174e72ceca41a6ac17ad05d50832ee9c561918c0)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit d061b852274c12784f3feb96c0cdcab39989f8e7
Author: Guillaume Poulin <[email protected]>
Date:   2016-01-07T05:34:46Z

    [SPARK-12678][CORE] MapPartitionsRDD clearDependencies
    
    MapPartitionsRDD was keeping a reference to `prev` after a call to
    `clearDependencies`, which could lead to a memory leak.
    
    Author: Guillaume Poulin <[email protected]>
    
    Closes #10623 from gpoulin/map_partition_deps.
    
    (cherry picked from commit b6738520374637347ab5ae6c801730cdb6b35daa)
    Signed-off-by: Reynold Xin <[email protected]>
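
The essence of the fix, sketched in Python (the actual change is in the Scala MapPartitionsRDD):

```python
class MapPartitionsNode:
    """Toy stand-in for an RDD that keeps a reference to its parent."""

    def __init__(self, prev, fn):
        self.prev = prev        # direct reference to the parent dataset
        self.fn = fn

    def clear_dependencies(self):
        # the fix: also drop the direct reference, or the entire upstream
        # lineage stays reachable and cannot be garbage collected
        self.prev = None
```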

commit 34effc46cd54735cc660d8b43f0a190e91747a06
Author: Yin Huai <[email protected]>
Date:   2016-01-07T06:03:31Z

    Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not 
None"
    
    This reverts commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04.
    
    Author: Yin Huai <[email protected]>
    
    Closes #10632 from yhuai/pythonStyle.
    
    (cherry picked from commit e5cde7ab11a43334fa01b1bb8904da5c0774bc62)
    Signed-off-by: Yin Huai <[email protected]>

commit 47a58c799206d011587e03178a259974be47d3bc
Author: zzcclp <[email protected]>
Date:   2016-01-07T07:06:21Z

    [DOC] fix 'spark.memory.offHeap.enabled' default value to false
    
    Correct the documented default value of 'spark.memory.offHeap.enabled' to false.
    
    Author: zzcclp <[email protected]>
    
    Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.
    
    (cherry picked from commit 84e77a15df18ba3f1cc871a3c52c783b46e52369)
    Signed-off-by: Reynold Xin <[email protected]>
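
For reference, off-heap memory stays disabled unless explicitly enabled, as the corrected docs now say; turning it on looks like this:

```python
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.memory.offHeap.enabled", "true")   # default is false
        .set("spark.memory.offHeap.size", "2g"))       # must be set when enabled
```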

commit 69a885a71cfe7c62179e784e7d9eee023d3bb6eb
Author: zero323 <[email protected]>
Date:   2016-01-07T18:32:56Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
    
    If the initial model passed to GMM is not empty, it causes a
    `net.razorvine.pickle.PickleException`. This can be fixed by converting
    `initialModel.weights` to a `list`.
    
    Author: zero323 <[email protected]>
    
    Closes #10644 from zero323/SPARK-12006.
    
    (cherry picked from commit 592f64985d0d58b4f6a0366bf975e04ca496bdbe)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 017b73e69693cd151516f92640a95a4a66e02dff
Author: Sameer Agarwal <[email protected]>
Date:   2016-01-07T18:37:15Z

    [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping 
splits
    
    https://issues.apache.org/jira/browse/SPARK-12662
    
    cc yhuai
    
    Author: Sameer Agarwal <[email protected]>
    
    Closes #10626 from sameeragarwal/randomsplit.
    
    (cherry picked from commit f194d9911a93fc3a78be820096d4836f22d09976)
    Signed-off-by: Reynold Xin <[email protected]>
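
The contract the fix restores, shown with the public API (a sketch using the modern SparkSession entry point):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000)
train, test = df.randomSplit([0.8, 0.2], seed=42)

# With the fix, the splits partition the input: no row lands in both.
assert train.intersect(test).count() == 0
assert train.count() + test.count() == df.count()
```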

commit 6ef823544dfbc8c9843bdedccfda06147a1a74fe
Author: Darek Blasiak <[email protected]>
Date:   2016-01-07T21:15:40Z

    [SPARK-12598][CORE] bug in setMinPartitions
    
    There is a bug in the calculation of ```maxSplitSize```.  The 
```totalLen``` should be divided by ```minPartitions``` and not by 
```files.size```.
    
    Author: Darek Blasiak <[email protected]>
    
    Closes #10546 from datafarmer/setminpartitionsbug.
    
    (cherry picked from commit 8346518357f4a3565ae41e9a5ccd7e2c3ed6c468)
    Signed-off-by: Sean Owen <[email protected]>
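
The corrected arithmetic, as a sketch mirroring the computation described above:

```python
import math

def max_split_size(total_len, min_partitions):
    # divide the combined input length by the requested partition count,
    # not by the number of files (the pre-fix behaviour)
    return int(math.ceil(total_len / max(min_partitions, 1)))

assert max_split_size(total_len=1000, min_partitions=4) == 250
```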

commit a7c36362fb9532183b7b6a0ad5020f02b816a9b3
Author: Shixiong Zhu <[email protected]>
Date:   2016-01-08T01:37:46Z

    [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and 
allowBatching configurations for Streaming
    
    /cc tdas brkyvz
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10453 from zsxwing/streaming-conf.
    
    (cherry picked from commit c94199e977279d9b4658297e8108b46bdf30157b)
    Signed-off-by: Tathagata Das <[email protected]>

commit 0d96c54534d8bfca191c892b98397a176bc46152
Author: Shixiong Zhu <[email protected]>
Date:   2016-01-08T10:02:06Z

    [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 
1.6)
    
    backport #10609 to branch 1.6
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10656 from zsxwing/SPARK-12591-branch-1.6.

commit fe2cf342e2eddd7414bacf9f5702042a20c6d50f
Author: Jeff Zhang <[email protected]>
Date:   2016-01-08T19:38:46Z

    [DOCUMENTATION] doc fix of job scheduling
    
    `spark.shuffle.service.enabled` is a Spark application-level configuration;
    it is not necessary to set it in yarn-site.xml.
    
    Author: Jeff Zhang <[email protected]>
    
    Closes #10657 from zjffdu/doc-fix.
    
    (cherry picked from commit 00d9261724feb48d358679efbae6889833e893e0)
    Signed-off-by: Marcelo Vanzin <[email protected]>
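
As the corrected docs indicate, the flag is set per application rather than in yarn-site.xml, for example:

```python
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.enabled", "true"))  # its usual companion
```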

commit e4227cb3e19afafe3a7b5a2847478681db2f2044
Author: Udo Klein <[email protected]>
Date:   2016-01-08T20:32:37Z

    fixed numVertices in transitive closure example
    
    Author: Udo Klein <[email protected]>
    
    Closes #10642 from udoklein/patch-2.
    
    (cherry picked from commit 8c70cb4c62a353bea99f37965dfc829c4accc391)
    Signed-off-by: Sean Owen <[email protected]>

commit faf094c7c35baf0e73290596d4ca66b7d083ed5b
Author: Thomas Graves <[email protected]>
Date:   2016-01-08T20:38:19Z

    [SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
    
    https://issues.apache.org/jira/browse/SPARK-12654
    
    The bug here is that WholeTextFileRDD.getPartitions has:
    
        val conf = getConf
    
    In getConf, if cloneConf=true, it creates a new Hadoop Configuration and
    then uses that to create a new JobContext. The new JobContext will copy
    credentials around, but credentials are only present in a JobConf, not in
    a Hadoop Configuration. So when the Hadoop configuration is cloned, it
    changes from a JobConf to a Configuration, dropping the credentials that
    were there. NewHadoopRDD just uses the conf passed in for getPartitions
    (not getConf), which is why it works.
    
    Author: Thomas Graves <[email protected]>
    
    Closes #10651 from tgravescs/SPARK-12654.
    
    (cherry picked from commit 553fd7b912a32476b481fd3f80c1d0664b6c6484)
    Signed-off-by: Tom Graves <[email protected]>
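
A minimal sketch of the failing scenario (the HDFS path is hypothetical, and the Kerberos-secured cluster setup is omitted):

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.hadoop.cloneConf", "true")
sc = SparkContext(conf=conf)

# Before the fix, cloning the configuration dropped the JobConf credentials,
# so this call failed on a secure cluster.
files = sc.wholeTextFiles("hdfs:///secure/input").collect()
```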

commit a6190508b20673952303eff32b3a559f0a264d03
Author: Michael Armbrust <[email protected]>
Date:   2016-01-08T23:43:11Z

    [SPARK-12696] Backport Dataset Bug fixes to 1.6
    
    We've fixed a lot of bugs in master, and since this is experimental in 1.6
    we should consider backporting the fixes. The only thing that is obviously
    risky to me is 0e07ed3; we might try to remove that.
    
    Author: Wenchen Fan <[email protected]>
    Author: gatorsmile <[email protected]>
    Author: Liang-Chi Hsieh <[email protected]>
    Author: Cheng Lian <[email protected]>
    Author: Nong Li <[email protected]>
    
    Closes #10650 from marmbrus/dataset-backports.

commit 8b5f23043322254c725c703c618ba3d3cc4a4240
Author: Yanbo Liang <[email protected]>
Date:   2016-01-09T06:59:51Z

    [SPARK-12645][SPARKR] SparkR support hash function
    
    Add ```hash``` function for SparkR ```DataFrame```.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #10597 from yanboliang/spark-12645.
    
    (cherry picked from commit 3d77cffec093bed4d330969f1a996f3358b9a772)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 7903b0610283a91c47f5df1aab069cf8930b4f27
Author: Josh Rosen <[email protected]>
Date:   2016-01-10T22:49:45Z

    [SPARK-10359][PROJECT-INFRA] Backport dev/test-dependencies script to 
branch-1.6
    
    This patch backports the `dev/test-dependencies` script (from #10461) to 
branch-1.6.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #10680 from JoshRosen/test-deps-16-backport.

commit 43b72d83e1d0c426d00d29e54ab7d14579700330
Author: Josh Rosen <[email protected]>
Date:   2016-01-11T08:36:52Z

    [SPARK-12734][BUILD] Backport Netty exclusion + Maven enforcer fixes to 
branch-1.6
    
    This patch backports the Netty exclusion fixes from #10672 to branch-1.6.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #10691 from JoshRosen/netty-exclude-16-backport.

commit d4cfd2acd62f2b0638a12bbbb48a38263c04eaf8
Author: Udo Klein <[email protected]>
Date:   2016-01-11T09:30:08Z

    removed lambda from sortByKey()
    
    According to the documentation, the sortByKey method does not take a
    lambda as an argument, so the example was flawed. Removed the argument
    completely, as this defaults to an ascending sort.
    
    Author: Udo Klein <[email protected]>
    
    Closes #10640 from udoklein/patch-1.
    
    (cherry picked from commit bd723bd53d9a28239b60939a248a4ea13340aad8)
    Signed-off-by: Sean Owen <[email protected]>
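
The corrected usage: the first positional parameter of sortByKey is the ascending flag, so passing a lambda there was misleading.

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
pairs = sc.parallelize([(3, "c"), (1, "a"), (2, "b")])

print(pairs.sortByKey().collect())                  # ascending by default
print(pairs.sortByKey(ascending=False).collect())   # explicit descending
```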

----

