GitHub user JamieZZZ opened a pull request:

    https://github.com/apache/spark/pull/11185

    Branch 1.6

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11185
    
----
commit 8f784b8642441d00f12835736109b2560eab0de6
Author: Tathagata Das <[email protected]>
Date:   2015-12-04T09:42:29Z

    [SPARK-12122][STREAMING] Prevent batches from being submitted twice after 
recovering StreamingContext from checkpoint
    
    Author: Tathagata Das <[email protected]>
    
    Closes #10127 from tdas/SPARK-12122.
    
    (cherry picked from commit 4106d80fb6a16713a6cd2f15ab9d60f2527d9be5)
    Signed-off-by: Tathagata Das <[email protected]>

commit 3fd757c8896df8cc3b184522c8d11da0be5ebbc3
Author: Nong <[email protected]>
Date:   2015-12-04T18:01:20Z

    [SPARK-12089] [SQL] Fix memory corrupt due to freeing a page being 
referenced
    
    When the spillable sort iterator was spilled, it was mistakenly keeping
    the last page in memory rather than the current page. This causes the
    current record to get corrupted.
    
    Author: Nong <[email protected]>
    
    Closes #10142 from nongli/spark-12089.
    
    (cherry picked from commit 95296d9b1ad1d9e9396d7dfd0015ef27ce1cf341)
    Signed-off-by: Davies Liu <[email protected]>

commit 39d5cc8adbb09e2d76fe85ccd51c3ffcf3d5b9f5
Author: Burak Yavuz <[email protected]>
Date:   2015-12-04T20:08:42Z

    [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests
    
    Python tests require access to the `KinesisTestUtils` file. When this file 
exists under src/test, python can't access it, since it is not available in the 
assembly jar.
    
    However, if we move KinesisTestUtils to src/main, we need to add the 
KinesisProducerLibrary as a dependency. To avoid this, I moved KinesisTestUtils 
to src/main and extended it with ExtendedKinesisTestUtils, which lives under 
src/test and adds support for the KPL.
    
    cc zsxwing tdas
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #10050 from brkyvz/kinesis-py.

commit 57d16403edcb4f770174404f8ed7f5697e4fdc26
Author: Sun Rui <[email protected]>
Date:   2015-12-05T23:49:51Z

    [SPARK-11774][SPARKR] Implement struct(), encode(), decode() functions in 
SparkR.
    
    Author: Sun Rui <[email protected]>
    
    Closes #9804 from sun-rui/SPARK-11774.
    
    (cherry picked from commit c8d0e160dadf3b23c5caa379ba9ad5547794eaa0)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 664694b289a7847807a2be022985c9ed39dbe142
Author: felixcheung <[email protected]>
Date:   2015-12-06T00:00:12Z

    [SPARK-11715][SPARKR] Add R support for corr for Column Aggregation
    
    Need to match existing method signature
    
    Author: felixcheung <[email protected]>
    
    Closes #9680 from felixcheung/rcorr.
    
    (cherry picked from commit 895b6c474735d7e0a38283f92292daa5c35875ee)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 04dfaa6d58bd9ce18a141a976a4a96218e5ee9e0
Author: Yanbo Liang <[email protected]>
Date:   2015-12-06T00:39:01Z

    [SPARK-12115][SPARKR] Change numPartitions() to getNumPartitions() to be 
consistent with Scala/Python
    
    Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent 
with Scala/Python.
    <del>Note: If we cannot catch up with the 1.6 release, it will be a breaking 
change for 1.7 that we also need to explain in the release notes.</del>
    
    cc sun-rui felixcheung shivaram
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #10123 from yanboliang/spark-12115.
    
    (cherry picked from commit 6979edf4e1a93caafa8d286692097dd377d7616d)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 2feac49fbca2e2f309c857f10511be2b2c1948cc
Author: Yanbo Liang <[email protected]>
Date:   2015-12-06T06:51:05Z

    [SPARK-12044][SPARKR] Fix usage of isnan, isNaN
    
    1, Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have 
three related variable functions: ```isNaN, isNull, isNotNull```.
    2, Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR 
side, because ```DataFrame.isNaN``` has been deprecated and will be removed in 
Spark 2.0.
    <del>3, Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` 
should have two related functions: ```isnan, isnull```.</del>
    
    cc shivaram sun-rui felixcheung
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #10037 from yanboliang/spark-12044.
    
    (cherry picked from commit b6e8e63a0dbe471187a146c96fdaddc6b8a8e55e)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit c8747a9db718deefa5f61cc4dc692c439d4d5ab6
Author: gcc <[email protected]>
Date:   2015-12-06T16:27:40Z

    [SPARK-12048][SQL] Prevent closing JDBC resources twice
    
    Author: gcc <[email protected]>
    
    Closes #10101 from rh99/master.
    
    (cherry picked from commit 04b6799932707f0a4aa4da0f2fc838bdb29794ce)
    Signed-off-by: Sean Owen <[email protected]>

commit 82a71aba043a0b1ed50168d2b5b312c79b8c8fa3
Author: gatorsmile <[email protected]>
Date:   2015-12-06T19:15:02Z

    [SPARK-12138][SQL] Escape \u in the generated comments of codegen
    
    When \u appears in a comment block (i.e. in /**/), code gen will break. So, 
in Expression and CodegenFallback, we escape \u to \\u.
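    The escaping idea can be sketched as a minimal standalone snippet 
(hypothetical helper name `escape_unicode_in_comment`; the actual fix is in 
the Scala codegen, but the string transformation is the same):
    
    ```python
    def escape_unicode_in_comment(text):
        # javac interprets \u escapes everywhere in a source file, including
        # inside /* ... */ comments, so a stray "\u" in a generated comment
        # can break compilation. Doubling the backslash keeps it literal.
        return text.replace("\\u", "\\\\u")
    ```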
    
    yhuai Please review it. I did reproduce it and it works after the fix. 
Thanks!
    
    Author: gatorsmile <[email protected]>
    
    Closes #10155 from gatorsmile/escapeU.
    
    (cherry picked from commit 49efd03bacad6060d99ed5e2fe53ba3df1d1317e)
    Signed-off-by: Yin Huai <[email protected]>

commit c54b698ecc284bce9b80c40ba46008bd6321c812
Author: Burak Yavuz <[email protected]>
Date:   2015-12-07T08:21:55Z

    [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when 
Jenkins load is high
    
    We need to make sure that the last entry is indeed the last entry in the 
queue.
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #10110 from brkyvz/batch-wal-test-fix.
    
    (cherry picked from commit 6fd9e70e3ed43836a0685507fff9949f921234f4)
    Signed-off-by: Tathagata Das <[email protected]>

commit 3f230f7b331cf6d67426cece570af3f1340f526e
Author: Sun Rui <[email protected]>
Date:   2015-12-07T18:38:17Z

    [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.
    
    This PR:
    1. Suppress all known warnings.
    2. Cleanup test cases and fix some errors in test cases.
    3. Fix errors in HiveContext related test cases. These test cases are 
actually not run previously due to a bug of creating TestHiveContext.
    4. Support 'testthat' package version 0.11.0 which prefers that test cases 
be under 'tests/testthat'
    5. Make sure the default Hadoop file system is local when running test 
cases.
    6. Turn on warnings into errors.
    
    Author: Sun Rui <[email protected]>
    
    Closes #10030 from sun-rui/SPARK-12034.
    
    (cherry picked from commit 39d677c8f1ee7ebd7e142bec0415cf8f90ac84b6)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit fed453821d81470b9035d33e36fa6ef1df99c0de
Author: Davies Liu <[email protected]>
Date:   2015-12-07T19:00:25Z

    [SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler
    
    Currently, the current line is not cleared by Ctrl-C
    
    After this patch
    ```
    >>> asdfasdf^C
    Traceback (most recent call last):
      File "~/spark/python/pyspark/context.py", line 225, in signal_handler
        raise KeyboardInterrupt()
    KeyboardInterrupt
    ```
    
    It's still worse than 1.5 (and before).
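    The fix described above can be sketched as a standalone snippet 
(hypothetical helper name `install_sigint_handler`; the real change lives in 
`pyspark/context.py`):
    
    ```python
    import signal
    
    def install_sigint_handler():
        # Replace the default SIGINT behavior with a handler that raises
        # KeyboardInterrupt explicitly, so the interrupted REPL line yields
        # a clean traceback instead of being left half-entered.
        def signal_handler(signum, frame):
            raise KeyboardInterrupt()
        signal.signal(signal.SIGINT, signal_handler)
    ```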
    
    Author: Davies Liu <[email protected]>
    
    Closes #10134 from davies/fix_cltrc.
    
    (cherry picked from commit ef3f047c07ef0ac4a3a97e6bc11e1c28c6c8f9a0)
    Signed-off-by: Davies Liu <[email protected]>

commit 539914f1a8d3a0f59e67c178f86e741927e7a658
Author: Tathagata Das <[email protected]>
Date:   2015-12-07T19:03:59Z

    [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner 
not present
    
    The reason is that TrackStateRDDs generated by trackStateByKey expect the 
previous batch's TrackStateRDDs to have a partitioner. However, when recovery 
from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a 
partitioner attached to it. This is because RDD checkpoints do not preserve the 
partitioner (SPARK-12004).
    
    While #9983 solves SPARK-12004 by preserving the partitioner through RDD 
checkpoints, there may be a non-zero chance that the saving and recovery fails. 
To be resilient, this PR repartitions the previous state RDD if the partitioner 
is not detected.
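    The recovery-time guard described above can be sketched with hypothetical 
names (`ensure_partitioned`, and a stub `FakeRDD` standing in for the real RDD 
API):
    
    ```python
    class FakeRDD:
        """Minimal stand-in for an RDD that may or may not carry a partitioner."""
    
        def __init__(self, data, partitioner=None):
            self.data = data
            self.partitioner = partitioner
    
        def partition_by(self, partitioner):
            # Return a copy carrying the given partitioner, loosely
            # mimicking RDD.partitionBy.
            return FakeRDD(self.data, partitioner)
    
    def ensure_partitioned(prev_state_rdd, expected_partitioner):
        # If the RDD recovered from a checkpoint lost its partitioner
        # (SPARK-12004), repartition it; otherwise reuse it as-is.
        if prev_state_rdd.partitioner is None:
            return prev_state_rdd.partition_by(expected_partitioner)
        return prev_state_rdd
    ```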
    
    Author: Tathagata Das <[email protected]>
    
    Closes #9988 from tdas/SPARK-11932.
    
    (cherry picked from commit 5d80d8c6a54b2113022eff31187e6d97521bd2cf)
    Signed-off-by: Tathagata Das <[email protected]>

commit c8aa5f2011cf30a360d5206ee45202c4b1d61e21
Author: Xusen Yin <[email protected]>
Date:   2015-12-07T21:16:47Z

    [SPARK-11963][DOC] Add docs for QuantileDiscretizer
    
    https://issues.apache.org/jira/browse/SPARK-11963
    
    Author: Xusen Yin <[email protected]>
    
    Closes #9962 from yinxusen/SPARK-11963.
    
    (cherry picked from commit 871e85d9c14c6b19068cc732951a8ae8db61b411)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit cdeb89b34614fb39062976c4796d187992333c88
Author: Andrew Ray <[email protected]>
Date:   2015-12-07T23:01:00Z

    [SPARK-12184][PYTHON] Make python api doc for pivot consistent with scala 
doc
    
    In SPARK-11946 the API for pivot was changed a bit and got updated doc, the 
doc changes were not made for the python api though. This PR updates the python 
doc to be consistent.
    
    Author: Andrew Ray <[email protected]>
    
    Closes #10176 from aray/sql-pivot-python-doc.
    
    (cherry picked from commit 36282f78b888743066843727426c6d806231aa97)
    Signed-off-by: Yin Huai <[email protected]>

commit 115bfbdae82b1c2804ea501ffd420d0aa17aac45
Author: Joseph K. Bradley <[email protected]>
Date:   2015-12-08T00:37:09Z

    [SPARK-12160][MLLIB] Use SQLContext.getOrCreate in MLlib
    
    Switched from using SQLContext constructor to using getOrCreate, mainly in 
model save/load methods.
    
    This covers all instances in spark.mllib.  There were no uses of the 
constructor in spark.ml.
    
    CC: mengxr yhuai
    
    Author: Joseph K. Bradley <[email protected]>
    
    Closes #10161 from jkbradley/mllib-sqlcontext-fix.
    
    (cherry picked from commit 3e7e05f5ee763925ed60410d7de04cf36b723de1)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 3c683ed5ffe704a6fec7c6d434eeed784276470d
Author: somideshmukh <[email protected]>
Date:   2015-12-08T07:26:34Z

    [SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using 
include_example
    
    Made a new patch containing only the markdown examples, moved to the 
example/ folder.
    Only three Java code examples were not shifted, since they contained 
compilation errors; these classes are
    1)StandardScale 2)NormalizerExample 3)VectorIndexer
    
    Author: Xusen Yin <[email protected]>
    Author: somideshmukh <[email protected]>
    
    Closes #10002 from somideshmukh/SomilBranch1.33.
    
    (cherry picked from commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 8652fc03c21f79b41ce13f41991feba11fc7b29c
Author: Takahashi Hiroshi <[email protected]>
Date:   2015-12-08T07:46:55Z

    [SPARK-10259][ML] Add @since annotation to ml.classification
    
    Add since annotation to ml.classification
    
    Author: Takahashi Hiroshi <[email protected]>
    
    Closes #8534 from taishi-oss/issue10259.
    
    (cherry picked from commit 7d05a624510f7299b3dd07f87c203db1ff7caa3e)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 5c8216920b4110d8fc4329e1fe52543ee17c4a54
Author: Yanbo Liang <[email protected]>
Date:   2015-12-08T07:50:57Z

    [SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example 
code
    
    Add ```SQLTransformer``` user guide, example code and make Scala API doc 
more clear.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #10006 from yanboliang/spark-11958.
    
    (cherry picked from commit 4a39b5a1bee28cec792d509654f6236390cafdcb)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit c9e5274ae3d8e6967bee240ec0b7ba17cd15d34e
Author: cody koeninger <[email protected]>
Date:   2015-12-08T11:02:35Z

    [SPARK-12103][STREAMING][KAFKA][DOC] document that K means Key and V means 
Value
    
    Author: cody koeninger <[email protected]>
    
    Closes #10132 from koeninger/SPARK-12103.
    
    (cherry picked from commit 48a9804b2ad89b3fb204c79f0dbadbcfea15d8dc)
    Signed-off-by: Sean Owen <[email protected]>

commit 870f435628b7c0eac5f6c45fa19b14ab5289c657
Author: Jeff Zhang <[email protected]>
Date:   2015-12-08T11:05:06Z

    [SPARK-12166][TEST] Unset hadoop related environment in testing
    
    Author: Jeff Zhang <[email protected]>
    
    Closes #10172 from zjffdu/SPARK-12166.
    
    (cherry picked from commit 708129187a460aca30790281e9221c0cd5e271df)
    Signed-off-by: Sean Owen <[email protected]>

commit 8a791a3273039602f91ae311b612eeaeca10ddc7
Author: Cheng Lian <[email protected]>
Date:   2015-12-08T11:18:59Z

    [SPARK-11551][DOC][EXAMPLE] Revert PR #10002
    
    This reverts PR #10002, commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819.
    
    The original PR wasn't tested on Jenkins before being merged.
    
    Author: Cheng Lian <[email protected]>
    
    Closes #10200 from liancheng/revert-pr-10002.
    
    (cherry picked from commit da2012a0e152aa078bdd19a5c7f91786a2dd7016)
    Signed-off-by: Cheng Lian <[email protected]>

commit c8f9eb749afb825b99a04b0e8f1e9311c5c6c944
Author: Sean Owen <[email protected]>
Date:   2015-12-08T14:34:47Z

    [SPARK-11652][CORE] Remote code execution with InvokerTransformer
    
    Fix commons-collection group ID to commons-collections for version 3.x
    
    Patches earlier PR at https://github.com/apache/spark/pull/9731
    
    Author: Sean Owen <[email protected]>
    
    Closes #10198 from srowen/SPARK-11652.2.
    
    (cherry picked from commit e3735ce1602826f0a8e0ca9e08730923843449ee)
    Signed-off-by: Sean Owen <[email protected]>

commit 8ef33aa1f6d3dc8772c9277a5372a991765af1b3
Author: Wenchen Fan <[email protected]>
Date:   2015-12-08T18:13:40Z

    [SPARK-12201][SQL] add type coercion rule for greatest/least
    
    checked with hive, greatest/least should cast their children to the 
tightest common type,
    i.e. `(int, long) => long`, `(int, string) => error`, `(decimal(10,5), 
decimal(5, 10)) => error`
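    The rule above can be illustrated with a small sketch (hypothetical 
function and precedence table, not the actual Catalyst code):
    
    ```python
    # Assumed widening order, narrow to wide; only same-family types widen.
    NUMERIC_PRECEDENCE = {"int": 1, "long": 2}
    
    def tightest_common_type(a, b):
        # Equal types trivially share a tightest common type; numeric types
        # widen to the larger of the two; everything else is an error (None).
        if a == b:
            return a
        if a in NUMERIC_PRECEDENCE and b in NUMERIC_PRECEDENCE:
            return a if NUMERIC_PRECEDENCE[a] >= NUMERIC_PRECEDENCE[b] else b
        return None
    ```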
    
    Author: Wenchen Fan <[email protected]>
    
    Closes #10196 from cloud-fan/type-coercion.
    
    (cherry picked from commit 381f17b540d92507cc07adf18bce8bc7e5ca5407)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 9eeb0f25abd4bd528a5363fda60b1cd1eb34d05b
Author: gatorsmile <[email protected]>
Date:   2015-12-08T18:15:58Z

    [SPARK-12195][SQL] Adding BigDecimal, Date and Timestamp into Encoder
    
    This PR is to add three more data types into Encoder, including 
`BigDecimal`, `Date` and `Timestamp`.
    
    marmbrus cloud-fan rxin Could you take a quick look at these three types? 
Not sure if it can be merged to 1.6. Thank you very much!
    
    Author: gatorsmile <[email protected]>
    
    Closes #10188 from gatorsmile/dataTypesinEncoder.
    
    (cherry picked from commit c0b13d5565c45ae2acbe8cfb17319c92b6a634e4)
    Signed-off-by: Michael Armbrust <[email protected]>

commit be0fe9b450f1bb87b9ce2e0ea153dc496d66a664
Author: gatorsmile <[email protected]>
Date:   2015-12-08T18:25:57Z

    [SPARK-12188][SQL] Code refactoring and comment correction in Dataset APIs
    
    This PR contains the following updates:
    
    - Created a new private variable `boundTEncoder` that can be shared by 
multiple functions, `RDD`, `select` and `collect`.
    - Replaced all occurrences of `queryExecution.analyzed` with the function 
call `logicalPlan`
    - Fixed a few API comments that used wrong class names (e.g., `DataFrame`) 
or parameter names (e.g., `n`)
    - Fixed a few wrong API descriptions (e.g., `mapPartitions`)
    
    marmbrus rxin cloud-fan Could you take a look and check if they are 
appropriate? Thank you!
    
    Author: gatorsmile <[email protected]>
    
    Closes #10184 from gatorsmile/datasetClean.
    
    (cherry picked from commit 5d96a710a5ed543ec81e383620fc3b2a808b26a1)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 1c8451b5e71508b974db965502db6dc3b1b4b4c0
Author: Yuhao Yang <[email protected]>
Date:   2015-12-08T18:29:51Z

    [SPARK-10393] use ML pipeline in LDA example
    
    jira: https://issues.apache.org/jira/browse/SPARK-10393
    
    Since the logic of the text processing part has been moved to ML 
estimators/transformers, replace the related code in LDA Example with the ML 
pipeline.
    
    Author: Yuhao Yang <[email protected]>
    Author: yuhaoyang <[email protected]>
    
    Closes #8551 from hhbyyh/ldaExUpdate.
    
    (cherry picked from commit 872a2ee281d84f40a786f765bf772cdb06e8c956)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 9145bfb814c5f53c5e9c0de7e0d6b7aca99c7341
Author: Andrew Ray <[email protected]>
Date:   2015-12-08T18:52:17Z

    [SPARK-12205][SQL] Pivot fails Analysis when aggregate is UnresolvedFunction
    
    Delays application of ResolvePivot until all aggregates are resolved, to 
prevent problems with UnresolvedFunction, and adds a unit test
    
    Author: Andrew Ray <[email protected]>
    
    Closes #10202 from aray/sql-pivot-unresolved-function.
    
    (cherry picked from commit 4bcb894948c1b7294d84e2bf58abb1d79e6759c6)
    Signed-off-by: Yin Huai <[email protected]>

commit 7e45feb005966f6cdf66c4d19223286acf92cc28
Author: Yuhao Yang <[email protected]>
Date:   2015-12-08T19:46:26Z

    [SPARK-11605][MLLIB] ML 1.6 QA: API: Java compatibility, docs
    
    jira: https://issues.apache.org/jira/browse/SPARK-11605
    Check Java compatibility for MLlib for this release.
    
    fix:
    
    1. `StreamingTest.registerStream` needs java friendly interface.
    
    2. `GradientBoostedTreesModel.computeInitialPredictionAndError` and 
`GradientBoostedTreesModel.updatePredictionError` have a Java compatibility 
issue. Mark them as `DeveloperApi`.
    
    TBD:
    [updated] no fix for now per discussion.
    `org.apache.spark.mllib.classification.LogisticRegressionModel`
    `public scala.Option<java.lang.Object> getThreshold();` has the wrong 
return type for Java invocation.
    `SVMModel` has a similar issue.
    
    Yet adding a `scala.Option<java.lang.Double> getThreshold()` would result 
in an overloading error due to the same function signature, and adding a new 
function with a different name seems unnecessary.
    
    cc jkbradley feynmanliang
    
    Author: Yuhao Yang <[email protected]>
    
    Closes #10102 from hhbyyh/javaAPI.
    
    (cherry picked from commit 5cb4695051e3dac847b1ea14d62e54dcf672c31c)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 3e31e7e245dba2c16be044e2f13b786e8608bd07
Author: BenFradet <[email protected]>
Date:   2015-12-08T20:45:34Z

    [SPARK-12159][ML] Add user guide section for IndexToString transformer
    
    Documentation regarding the `IndexToString` label transformer with code 
snippets in Scala/Java/Python.
    
    Author: BenFradet <[email protected]>
    
    Closes #10166 from BenFradet/SPARK-12159.
    
    (cherry picked from commit 06746b3005e5e9892d0314bee3bfdfaebc36d3d4)
    Signed-off-by: Joseph K. Bradley <[email protected]>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
