GitHub user liumingning opened a pull request:

    https://github.com/apache/spark/pull/11302

    Branch 1.4

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11302
    
----
commit 3a62569afb8fcd3d1610b4ede0f2c5e595acb9b9
Author: Shivaram Venkataraman <[email protected]>
Date:   2015-06-11T20:22:08Z

    [SPARK-8310] [EC2] Update spark-ec2 branch to 1.4
    
    cc pwendell  -- We should probably update our release guidelines to change 
this when we cut a release branch ?
    
    Author: Shivaram Venkataraman <[email protected]>
    
    Closes #6765 from shivaram/SPARK-8310-14 and squashes the following commits:
    
    066e44e [Shivaram Venkataraman] Update spark-ec2 branch to 1.4

commit 8b25f62bf19b02042675aa1d4e4b58cc4deb3e26
Author: Marcelo Vanzin <[email protected]>
Date:   2015-06-11T22:29:03Z

    [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
    
    Author: Marcelo Vanzin <[email protected]>
    
    Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:
    
    49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in 
hadoop-provided docs.
    
    (cherry picked from commit 9cbdf31ec1399d4d43a1863c15688ce78b6dfd92)
    Signed-off-by: Reynold Xin <[email protected]>

commit 141eab71ee3aa05da899ecfc6bae40b3798a4665
Author: Mark Smith <[email protected]>
Date:   2015-06-12T17:28:30Z

    [SPARK-8322] [EC2] Added spark 1.4.0 into the VALID_SPARK_VERSIONS and SPARK_TACHYON_MAP
    
    Author: Mark Smith <[email protected]>
    
    Closes #6777 from markmsmith/branch-1.4 and squashes the following commits:
    
    a218cfa [Mark Smith] [SPARK-8322][EC2] Fixed tachyon map entry to point to 
0.6.4
    90d1655 [Mark Smith] [SPARK-8322][EC2] Added spark 1.4.0 into the 
VALID_SPARK_VERSIONS and SPARK_TACHYON_MAP

commit 76083734196a7571de314df79e88759b650ed1f3
Author: Andrew Or <[email protected]>
Date:   2015-06-12T18:14:55Z

    [SPARK-8330] DAG visualization: trim whitespace from input
    
    Safeguard against DOM rewriting.
    
    Author: Andrew Or <[email protected]>
    
    Closes #6787 from andrewor14/dag-viz-trim and squashes the following 
commits:
    
    0fb4afe [Andrew Or] Trim input metadata from DOM
    
    (cherry picked from commit 88604051511c788d7abb41a49e3eb3a8330c09a9)
    Signed-off-by: Andrew Or <[email protected]>

commit 7c11ccf3913ac6a5d178994704d8b0983829b43b
Author: Tathagata Das <[email protected]>
Date:   2015-06-12T22:22:59Z

    [SPARK-7284] [STREAMING] Updated streaming documentation
    
    - Kinesis API updated
    - Kafka version updated, and Python API for Direct Kafka added
    - Added SQLContext.getOrCreate()
    - Added information on how to get partitionId in foreachRDD
    
    Author: Tathagata Das <[email protected]>
    
    Closes #6781 from tdas/SPARK-7284 and squashes the following commits:
    
    aac7be0 [Tathagata Das] Added information on how to get partition id
    a66ec22 [Tathagata Das] Completed the incomplete line,
    a92ca39 [Tathagata Das] Updated streaming documentation
    
    (cherry picked from commit e9471d3414d327c7d0853e18f1844ab1bd09c8ed)
    Signed-off-by: Tathagata Das <[email protected]>

commit 1ca431e83f070f9737b4cc3b7918188ad5dd3d36
Author: Michael Armbrust <[email protected]>
Date:   2015-06-13T06:11:16Z

    [SPARK-8329][SQL] Allow _ in DataSource options
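    
    As an illustrative aside (a hedged sketch; "some_option" is a made-up key), this change lets option keys passed to a data source contain underscores:
    
    ```scala
    // Hypothetical key name: after this change, '_' is accepted in option keys.
    val df = sqlContext.read
      .format("parquet")
      .option("some_option", "true")
      .load("/path/to/data")
    ```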
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #6786 from marmbrus/optionsParser and squashes the following commits:
    
    e7d18ef [Michael Armbrust] add dots
    99a3452 [Michael Armbrust] [SPARK-8329][SQL] Allow _ in DataSource options
    
    (cherry picked from commit 4aed66f299a67f5a594da9316b6bf4c345838216)
    Signed-off-by: Reynold Xin <[email protected]>

commit 187a3d5385e778c188d0c1c2adc755ac2d25e8e8
Author: Mike Dusenberry <[email protected]>
Date:   2015-06-14T04:22:46Z

    [Spark-8343] [Streaming] [Docs] Improve Spark Streaming Guides.
    
    This improves the Spark Streaming Guides by fixing broken links, rewording 
confusing sections, fixing typos, adding missing words, etc.
    
    Author: Mike Dusenberry <[email protected]>
    
    Closes #6801 from 
dusenberrymw/SPARK-8343_Improve_Spark_Streaming_Guides_MERGED and squashes the 
following commits:
    
    6688090 [Mike Dusenberry] Improvements to the Spark Streaming Custom 
Receiver Guide, including slight rewording of confusing sections, and fixing 
typos & missing words.
    436fbd8 [Mike Dusenberry] Bunch of improvements to the Spark Streaming 
Guide, including fixing broken links, slight rewording of confusing sections, 
fixing typos & missing words, etc.
    
    (cherry picked from commit 35d1267cf8e918032c92a206b22bb301bf0c806e)
    Signed-off-by: Reynold Xin <[email protected]>

commit 4634be5a7db4f2fd82cfb5c602b79129d1d9e246
Author: Josh Rosen <[email protected]>
Date:   2015-06-14T16:34:35Z

    [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch 
space in UnsafeFixedWidthAggregationMap
    
    UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when 
allocating row conversion scratch space: we take a size requirement, measured 
in bytes, then allocate a long array of that size.  This means that we end up 
allocating 8x too much conversion space.
    
    This patch fixes this by allocating a `byte[]` array instead.  This doesn't 
impose any new limitations on the maximum sizes of UnsafeRows, since 
UnsafeRowConverter already used integers when calculating the size requirements 
for rows.
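    
    A minimal sketch of the failure mode (names simplified; not the actual map code):
    
    ```scala
    // The size requirement is measured in bytes, but it was used as the
    // *length* of a long[]; each Long occupies 8 bytes, so 8x the needed
    // space was reserved.
    val requiredBytes = 1024
    val buggy = new Array[Long](requiredBytes) // reserves 8 * 1024 bytes
    val fixed = new Array[Byte](requiredBytes) // reserves exactly 1024 bytes
    ```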
    
    Author: Josh Rosen <[email protected]>
    
    Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the 
following commits:
    
    6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is 
constrained by max byte[] size
    
    (cherry picked from commit ea7fd2ff6454e8d819a39bf49901074e49b5714e)
    Signed-off-by: Josh Rosen <[email protected]>

commit 2805d145e30e4cabd11a7d33c4f80edbc54cc54a
Author: Michael Armbrust <[email protected]>
Date:   2015-06-14T18:21:42Z

    [SPARK-8358] [SQL] Wait for child resolution when resolving generators
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #6811 from marmbrus/aliasExplodeStar and squashes the following 
commits:
    
    fbd2065 [Michael Armbrust] more style
    806a373 [Michael Armbrust] fix style
    7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when 
resolving generators
    
    (cherry picked from commit 9073a426e444e4bc6efa8608e54e0a986f38a270)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 0ffbf085190b9d4dc13a8b6545e4e1022083bd35
Author: Peter Hoffmann <[email protected]>
Date:   2015-06-14T18:41:16Z

    fix read/write mixup
    
    Author: Peter Hoffmann <[email protected]>
    
    Closes #6815 from hoffmann/patch-1 and squashes the following commits:
    
    2abb6da [Peter Hoffmann] fix read/write mixup
    
    (cherry picked from commit f3f2a4397da164f0ddfa5d60bf441099296c4346)
    Signed-off-by: Reynold Xin <[email protected]>

commit fff8d7ee6c7e88ed96c29260480e8228e7fb1435
Author: tedyu <[email protected]>
Date:   2015-06-16T00:00:38Z

    SPARK-8336 Fix NullPointerException with functions.rand()
    
    This PR fixes the problem reported by Justin Yip in the thread 
'NullPointerException with functions.rand()'
    
    Tested using spark-shell and verified that the following works:
    
    ```scala
    sqlContext.createDataFrame(Seq((1, 2), (3, 100))).withColumn("index", rand(30)).show()
    ```
    
    Author: tedyu <[email protected]>
    
    Closes #6793 from tedyu/master and squashes the following commits:
    
    62fd97b [tedyu] Create RandomSuite
    750f92c [tedyu] Add test for Rand() with seed
    a1d66c5 [tedyu] Fix NullPointerException with functions.rand()
    
    (cherry picked from commit 1a62d61696a0481508d83a07d19ab3701245ac20)
    Signed-off-by: Reynold Xin <[email protected]>

commit f287f7ea141fa7a3e9f8b7d3a2180b63cd77088d
Author: huangzhaowei <[email protected]>
Date:   2015-06-16T06:16:09Z

    [SPARK-8367] [STREAMING] Add a limit for `spark.streaming.blockInterval`, 
since it can cause a data loss bug.
    
    The bug was reported in JIRA 
[SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367).
    The resolution is to limit the configuration `spark.streaming.blockInterval` 
to a positive number.
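    
    A hedged sketch of the check (the config key is real; the surrounding code is assumed):
    
    ```scala
    import org.apache.spark.SparkConf
    
    val conf = new SparkConf()
    // Fail fast on non-positive values instead of silently losing data.
    val blockIntervalMs = conf.getTimeAsMs("spark.streaming.blockInterval", "200ms")
    require(blockIntervalMs > 0,
      "'spark.streaming.blockInterval' should be a positive value")
    ```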
    
    Author: huangzhaowei <[email protected]>
    Author: huangzhaowei <[email protected]>
    
    Closes #6818 from SaintBacchus/SPARK-8367 and squashes the following 
commits:
    
    c9d1927 [huangzhaowei] Update BlockGenerator.scala
    bd3f71a [huangzhaowei] Use require instead of if
    3d17796 [huangzhaowei] [SPARK-8367][Streaming] Add a limit for 
'spark.streaming.blockInterval' since it can cause a data loss bug.
    
    (cherry picked from commit ccf010f27bc62f7e7f409c6eef7488ab476de609)
    Signed-off-by: Sean Owen <[email protected]>

commit 1378bdc4a9a974b40c7c509f4af7f07bdc892e14
Author: Moussa Taifi <[email protected]>
Date:   2015-06-16T19:59:22Z

    [SPARK-DOCS] [SPARK-SQL] Update sql-programming-guide.md
    
    Typo in thriftserver section
    
    Author: Moussa Taifi <[email protected]>
    
    Closes #6847 from moutai/patch-1 and squashes the following commits:
    
    1bd29df [Moussa Taifi] Update sql-programming-guide.md
    
    (cherry picked from commit dc455b88330f79b1181a585277ea9ed3e0763703)
    Signed-off-by: Sean Owen <[email protected]>

commit 4da068650800bdf1fa488790049993896d0edc32
Author: Radek Ostrowski <[email protected]>
Date:   2015-06-16T20:04:26Z

    [SQL] [DOC] improved a comment
    
    [SQL][DOC] I found it a bit confusing when I came across it for the first 
time in the docs
    
    Author: Radek Ostrowski <[email protected]>
    Author: radek <[email protected]>
    
    Closes #6332 from radek1st/master and squashes the following commits:
    
    dae3347 [Radek Ostrowski] fixed typo
    c76bb3a [radek] improved a comment
    
    (cherry picked from commit 4bd10fd5090fb5f4f139267b82e9f2fc15659796)
    Signed-off-by: Sean Owen <[email protected]>

commit b9e5d3cadd0f07c211623b045466220c39abdc56
Author: Marcelo Vanzin <[email protected]>
Date:   2015-06-16T20:10:18Z

    [SPARK-8126] [BUILD] Make sure temp dir exists when running tests.
    
    If you ran "clean" at the top-level sbt project, the temp dir would
    go away, so running "test" without restarting sbt would fail. This
    fixes that by making sure the temp dir exists before running tests.
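    
    The idea, as a minimal sketch (the directory name is an assumption):
    
    ```scala
    // Recreate the temp dir if a prior "clean" removed it, before tests run.
    val tmpDir = new java.io.File("target/tmp")
    if (!tmpDir.isDirectory) tmpDir.mkdirs()
    ```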
    
    Author: Marcelo Vanzin <[email protected]>
    
    Closes #6805 from vanzin/SPARK-8126-fix and squashes the following commits:
    
    12d7768 [Marcelo Vanzin] [SPARK-8126] [build] Make sure temp dir exists 
when running tests.
    
    (cherry picked from commit cebf2411847706a98dc8df9c754ef53d6d12a87c)
    Signed-off-by: Sean Owen <[email protected]>

commit 15d973f2d9c2512dd5a882b6b65fb494de526643
Author: Yanbo Liang <[email protected]>
Date:   2015-06-16T21:30:30Z

    [SPARK-7916] [MLLIB] MLlib Python doc parity check for classification and 
regression
    
    Check and update the MLlib Python classification and regression docs to be as 
complete as the Scala docs.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #6460 from yanboliang/spark-7916 and squashes the following commits:
    
    f8deda4 [Yanbo Liang] trigger jenkins
    6dc4d99 [Yanbo Liang] address comments
    ce2a43e [Yanbo Liang] truncate too long line and remove extra sparse
    3eaf6ad [Yanbo Liang] MLlib Python doc parity check for classification and 
regression
    
    (cherry picked from commit ca998757e8ff2bdca2c7e88055c389161521d604)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 877deb046862bff8200c517674f9e1100ab09b9a
Author: Punya Biswal <[email protected]>
Date:   2015-06-17T05:31:49Z

    Fix break introduced by backport
    
    rxin this is the fix you requested for the break introduced by backporting 
#6793
    
    Author: Punya Biswal <[email protected]>
    
    Closes #6850 from punya/feature/fix-backport-break and squashes the 
following commits:
    
    fdc3693 [Punya Biswal] Fix break introduced by backport

commit a5f602efcffea3da03f0cf828045b4e1b862fde8
Author: Vyacheslav Baranov <[email protected]>
Date:   2015-06-17T08:42:29Z

    [SPARK-8309] [CORE] Support for more than 12M items in OpenHashMap
    
    The problem occurs because the position mask `0xEFFFFFF` is incorrect: its 
25th bit is zero, so when the capacity grows beyond 2^24, `OpenHashMap` computes 
incorrect indices into the `_values` array.
    
    I've also added a size check in `rehash()`, so that it fails instead of 
reporting invalid item indices.
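    
    To see why the cleared bit matters once capacity passes 2^24 (values illustrative):
    
    ```scala
    val badMask = 0xEFFFFFF   // the 25th bit (index 24) is zero
    val pos     = 1 << 24     // the first position that needs that bit
    println(pos & badMask)    // 0 -- the index is silently truncated
    println(pos & 0xFFFFFFF)  // 16777216 -- a mask with all bits set keeps it
    ```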
    
    Author: Vyacheslav Baranov <[email protected]>
    
    Closes #6763 from SlavikBaranov/SPARK-8309 and squashes the following 
commits:
    
    8557445 [Vyacheslav Baranov] Resolved review comments
    4d5b954 [Vyacheslav Baranov] Resolved review comments
    eaf1e68 [Vyacheslav Baranov] Fixed failing test
    f9284fd [Vyacheslav Baranov] Resolved review comments
    3920656 [Vyacheslav Baranov] SPARK-8309: Support for more than 12M items in 
OpenHashMap
    
    (cherry picked from commit c13da20a55b80b8632d547240d2c8f97539969a1)
    Signed-off-by: Sean Owen <[email protected]>

commit 320c4420b9cf5d1a4669dc3bb63c63f43dcd9079
Author: Sean Owen <[email protected]>
Date:   2015-06-17T20:31:10Z

    [SPARK-8395] [DOCS] start-slave.sh docs incorrect
    
    start-slave.sh no longer takes a worker # param in 1.4+
    
    Author: Sean Owen <[email protected]>
    
    Closes #6855 from srowen/SPARK-8395 and squashes the following commits:
    
    300278e [Sean Owen] start-slave.sh no longer takes a worker # param in 1.4+
    
    (cherry picked from commit f005be02730db315e2a6d4dbecedfd2562b9ef1f)
    Signed-off-by: Andrew Or <[email protected]>

commit a7f6979d0fecec948c25427bdeb01b4fe296ca41
Author: Punya Biswal <[email protected]>
Date:   2015-06-17T20:37:20Z

    [SPARK-7515] [DOC] Update documentation for PySpark on YARN with cluster 
mode
    
    Now PySpark on YARN with cluster mode is supported so let's update doc.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #6040 from sarutak/update-doc-for-pyspark-on-yarn and squashes the 
following commits:
    
    ad9f88c [Kousuke Saruta] Brushed up sentences
    469fd2e [Kousuke Saruta] Merge branch 'master' of 
https://github.com/apache/spark into update-doc-for-pyspark-on-yarn
    fcfdb92 [Kousuke Saruta] Updated doc for PySpark on YARN with cluster mode
    
    Author: Punya Biswal <[email protected]>
    Author: Kousuke Saruta <[email protected]>
    
    Closes #6842 from punya/feature/SPARK-7515 and squashes the following 
commits:
    
    0b83648 [Punya Biswal] Merge remote-tracking branch 'origin/branch-1.4' 
into feature/SPARK-7515
    de025cd [Kousuke Saruta] [SPARK-7515] [DOC] Update documentation for 
PySpark on YARN with cluster mode

commit d75c53d88d4d8d176975e499788a43dda2a62476
Author: Mingfei <[email protected]>
Date:   2015-06-17T20:40:07Z

    [SPARK-8161] Set externalBlockStoreInitialized to be true, after 
ExternalBlockStore is initialized
    
    externalBlockStoreInitialized is never set to true, which means blocks stored 
in ExternalBlockStore can never be removed.
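    
    A self-contained sketch of the described fix (names follow the description, not the actual class):
    
    ```scala
    object ExternalBlockStoreSketch {
      var externalBlockStoreInitialized = false
    
      def init(): Unit = {
        // ... connect to the external block store (e.g. Tachyon) ...
        externalBlockStoreInitialized = true // the assignment that was missing
      }
    
      def removeBlock(id: String): Unit = {
        // Removal is guarded by the flag, so it never ran before the fix.
        if (externalBlockStoreInitialized) {
          // ... actually remove the block ...
        }
      }
    }
    ```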
    
    Author: Mingfei <[email protected]>
    
    Closes #6702 from shimingfei/SetTrue and squashes the following commits:
    
    add61d8 [Mingfei] Set externalBlockStoreInitialized to be true, after 
ExternalBlockStore is initialized
    
    (cherry picked from commit 7ad8c5d869555b1bf4b50eafdf80e057a0175941)
    Signed-off-by: Andrew Or <[email protected]>

commit f0513733d4f6fc34f86feffd3062600cbbd56a28
Author: Carson Wang <[email protected]>
Date:   2015-06-17T20:41:36Z

    [SPARK-8372] History server shows incorrect information for application not 
started
    
    The history server may show an incorrect App ID for an incomplete 
application, e.g. <App ID>.inprogress. This app info never disappears, even 
after the app completes.
    
![incorrectappinfo](https://cloud.githubusercontent.com/assets/9278199/8156147/2a10fdbe-137d-11e5-9620-c5b61d93e3c1.png)
    
    The cause of the issue is that the log path name is used as the app ID when 
the app ID cannot be obtained during replay.
    
    Author: Carson Wang <[email protected]>
    
    Closes #6827 from carsonwang/SPARK-8372 and squashes the following commits:
    
    cdbb089 [Carson Wang] Fix code style
    3e46b35 [Carson Wang] Update code style
    90f5dde [Carson Wang] Add a unit test
    d8c9cd0 [Carson Wang] Replaying events only returns information when the app 
is started
    
    (cherry picked from commit 2837e067099921dd4ab6639ac5f6e89f789d4ff4)
    Signed-off-by: Andrew Or <[email protected]>

commit 5e7973df0ec21c4fd8ae0a26290088def231d26c
Author: zsxwing <[email protected]>
Date:   2015-06-17T20:59:39Z

    [SPARK-8373] [PYSPARK] Add emptyRDD to pyspark and fix the issue when 
calling sum on an empty RDD
    
    This PR fixes the sum issue and also adds `emptyRDD` so that it's easy to 
create a test case.
    
    Author: zsxwing <[email protected]>
    
    Closes #6826 from zsxwing/python-emptyRDD and squashes the following 
commits:
    
    b36993f [zsxwing] Update the return type to JavaRDD[T]
    71df047 [zsxwing] Add emptyRDD to pyspark and fix the issue when calling 
sum on an empty RDD
    
    (cherry picked from commit 0fc4b96f3e3bf81724ac133a6acc97c1b77271b4)
    Signed-off-by: Andrew Or <[email protected]>

commit 5aedfa2ceb5f9a9d22994a5709f663ee6d9a607e
Author: zsxwing <[email protected]>
Date:   2015-06-17T22:00:03Z

    [SPARK-8404] [STREAMING] [TESTS] Use thread-safe collections to make the 
tests more reliable
    
    KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and 
JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in 
one thread and check it in another, which may cause the tests to fail.
    
    This PR changes them to thread-safe collections.
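    
    For instance (a generic sketch, not the suites' exact code):
    
    ```scala
    import java.util.concurrent.ConcurrentLinkedQueue
    
    // One thread (the receiver) appends while the test thread polls;
    // a plain ArrayBuffer is not safe here, a concurrent queue is.
    val received = new ConcurrentLinkedQueue[String]()
    received.add("event-1")    // from the collecting thread
    assert(received.size == 1) // from the checking thread
    ```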
    
    Note: I cannot reproduce the test failures in my environment. But at least, 
this PR should make the tests more reliable.
    
    Author: zsxwing <[email protected]>
    
    Closes #6852 from zsxwing/fix-KafkaStreamSuite and squashes the following 
commits:
    
    d464211 [zsxwing] Use thread-safe collections to make the tests more 
reliable
    
    (cherry picked from commit a06d9c8e76bb904d48764802aa3affff93b00baa)
    Signed-off-by: Tathagata Das <[email protected]>

commit 73cf5def0687bbe556542646e2b1bd569c59cd59
Author: Yin Huai <[email protected]>
Date:   2015-06-17T21:52:43Z

    [SPARK-8306] [SQL] AddJar command needs to set the new class loader to the 
HiveConf inside executionHive.state.
    
    https://issues.apache.org/jira/browse/SPARK-8306
    
    I will try to add a test later.
    
    marmbrus aarondav
    
    Author: Yin Huai <[email protected]>
    
    Closes #6758 from yhuai/SPARK-8306 and squashes the following commits:
    
    1292346 [Yin Huai] [SPARK-8306] AddJar command needs to set the new class 
loader to the HiveConf inside executionHive.state.
    
    (cherry picked from commit 302556ff999ba9a1960281de6932e0d904197204)
    Signed-off-by: Michael Armbrust <[email protected]>
    
    Conflicts:
        
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala

commit 67ad12d793a8f0f8137d0a2e0c0d80bd1b5284f2
Author: xutingjun <[email protected]>
Date:   2015-06-18T05:31:01Z

    [SPARK-8392] RDDOperationGraph: getting cached nodes is slow
    
    ```scala
    def getAllNodes: Seq[RDDOperationNode] =
      _childNodes ++ _childClusters.flatMap(_.childNodes)
    ```
    
    When `_childClusters` has many nodes, this process hangs. I think we can 
improve the efficiency here.
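    
    A self-contained analogue of the "put the filter inside" change (not the actual RDDOperationGraph code):
    
    ```scala
    case class Node(cached: Boolean)
    case class Cluster(nodes: Seq[Node], children: Seq[Cluster]) {
      def allNodes: Seq[Node] = nodes ++ children.flatMap(_.allNodes)
    
      // Slow: materializes every node, then filters the combined result.
      def cachedNodesSlow: Seq[Node] = allNodes.filter(_.cached)
    
      // Faster: drop uncached nodes while walking the tree.
      def cachedNodesFast: Seq[Node] =
        nodes.filter(_.cached) ++ children.flatMap(_.cachedNodesFast)
    }
    ```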
    
    Author: xutingjun <[email protected]>
    
    Closes #6839 from XuTingjun/DAGImprove and squashes the following commits:
    
    53b03ea [xutingjun] change code to more concise and easier to read
    f98728b [xutingjun] fix words: node -> nodes
    f87c663 [xutingjun] put the filter inside
    81f9fd2 [xutingjun] put the filter inside
    
    (cherry picked from commit e2cdb0568b14df29bbdb1ee9a13ee361c9ddad9c)
    Signed-off-by: Andrew Or <[email protected]>

commit 9dabc129368aba7c1255328974bf849b4c3340c2
Author: Burak Yavuz <[email protected]>
Date:   2015-06-18T05:33:37Z

    [SPARK-8095] Resolve dependencies of --packages in local ivy cache
    
    Dependencies of artifacts in the local ivy cache were not being resolved or 
picked up properly. Now they should be.
    
    cc andrewor14
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #6788 from brkyvz/local-ivy-fix and squashes the following commits:
    
    2875bf4 [Burak Yavuz] fix temp dir bug
    48cc648 [Burak Yavuz] improve deletion
    a69e3e6 [Burak Yavuz] delete cache before test as well
    0037197 [Burak Yavuz] fix merge conflicts
    f60772c [Burak Yavuz] use different folder for m2 cache during testing
    b6ef038 [Burak Yavuz] [SPARK-8095] Resolve dependencies of Spark Packages 
in local ivy cache
    
    Conflicts:
        core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala

commit ca23c3b0147de9bcc22e3b9c7b74d20df6402137
Author: Davies Liu <[email protected]>
Date:   2015-06-18T20:45:58Z

    [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark
    
    The batch size during external sort grows up to a maximum of 10000, then 
shrinks down to zero, causing an infinite loop.
    Since the items usually have similar sizes, we don't need to adjust the 
batch size after the first spill.
    
    cc JoshRosen rxin angelini
    
    Author: Davies Liu <[email protected]>
    
    Closes #6714 from davies/batch_size and squashes the following commits:
    
    b170dfb [Davies Liu] update test
    b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark 
into batch_size
    6ade745 [Davies Liu] update test
    5c21777 [Davies Liu] Update shuffle.py
    e746aec [Davies Liu] fix batch size during sort

commit c1da5cf02983d04257f3a3b666a7755de1f79b36
Author: Josh Rosen <[email protected]>
Date:   2015-06-18T22:10:09Z

    [SPARK-8353] [DOCS] Show anchor links when hovering over documentation 
headers
    
    This patch uses [AnchorJS](https://bryanbraun.github.io/anchorjs/) to show 
deep anchor links when hovering over headers in the Spark documentation. For 
example:
    
    
![image](https://cloud.githubusercontent.com/assets/50748/8240800/1502f85c-15ba-11e5-819a-97b231370a39.png)
    
    This makes it easier for users to link to specific sections of the 
documentation.
    
    I also removed some dead JavaScript which isn't used in our current docs 
(it was introduced for the old AMPCamp training, but isn't used anymore).
    
    Author: Josh Rosen <[email protected]>
    
    Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits:
    
    e59d8a7 [Josh Rosen] Suppress underline on hover
    f518b6a [Josh Rosen] Turn on for all headers, since we use H1s in a bunch 
of places
    a9fec01 [Josh Rosen] Add anchor links when hovering over headers; remove 
some dead JS code
    
    (cherry picked from commit 44c931f006194a833f09517c9e35fb3cdf5852b1)
    Signed-off-by: Josh Rosen <[email protected]>

commit 9f293a9eb69d4dac13683edcbd7286a56696cbbb
Author: zsxwing <[email protected]>
Date:   2015-06-18T23:00:27Z

    [SPARK-8376] [DOCS] Add common lang3 to the Spark Flume Sink doc
    
    Commons Lang 3 has been a dependency of Spark Flume Sink since #5703. This 
PR updates the doc accordingly.
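    
    For example, with sbt (the version shown is an assumption; match it to your Spark build):
    
    ```scala
    // commons-lang3 must be on the classpath alongside the Flume sink.
    libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.3.2"
    ```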
    
    Author: zsxwing <[email protected]>
    
    Closes #6829 from zsxwing/flume-sink-dep and squashes the following commits:
    
    f8617f0 [zsxwing] Add common lang3 to the Spark Flume Sink doc
    
    (cherry picked from commit 24e53793b4b100317d59ea16acb42f55d10a9575)
    Signed-off-by: Tathagata Das <[email protected]>

----

