GitHub user SeekerResource opened a pull request:
https://github.com/apache/spark/pull/6930
Branch 1.4
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.4
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6930.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6930
----
commit 51d98b0e97c97a7eca2d4ff2fc14b9cfe9af9e2f
Author: MechCoder <[email protected]>
Date: 2015-05-26T20:21:00Z
[SPARK-7844] [MLLIB] Fix broken tests in KernelDensity
The densities in KernelDensity are scaled down by
(number of parallel processes × number of points), when they should be scaled
by just the number of samples. This resulted in broken tests in
KernelDensitySuite, which weren't actually testing the values properly.
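As a plain-Python sketch (hypothetical code, not the actual MLlib implementation), a correctly normalized Gaussian KDE divides by the number of samples alone, so the estimated density integrates to one:

```python
import math

def gaussian_kde(samples, points, bandwidth):
    """Evaluate a Gaussian kernel density estimate at `points`.

    The normalization divides by len(samples) only. Dividing by
    (num_partitions * len(points)) as well -- the bug described
    above -- would shrink every density estimate.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-((x - s) ** 2) / (2 * bandwidth ** 2))
                   for s in samples)
        for x in points
    ]

# A correctly normalized KDE integrates to ~1 over its support.
densities = gaussian_kde([0.0, 1.0, 2.0],
                         [i * 0.01 for i in range(-500, 700)],
                         bandwidth=0.5)
area = sum(d * 0.01 for d in densities)  # Riemann sum, step 0.01
```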
Author: MechCoder <[email protected]>
Closes #6383 from MechCoder/spark-7844 and squashes the following commits:
ab81302 [MechCoder] Math->math
9b8ed50 [MechCoder] Make one pass to update count
a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity
(cherry picked from commit 61664732b25b35f94be35a42cde651cbfd0e02b7)
Signed-off-by: Xiangrui Meng <[email protected]>
commit d014a447a361f65db8986e839e2f9a7a7a356f57
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-05-26T22:01:27Z
[SPARK-3674] YARN support in Spark EC2
This corresponds to https://github.com/mesos/spark-ec2/pull/116 in the
spark-ec2 repo. The only change required in the spark_ec2.py script is to open
the RM port.
cc andrewor14
Author: Shivaram Venkataraman <[email protected]>
Closes #6376 from shivaram/spark-ec2-yarn and squashes the following
commits:
961504a [Shivaram Venkataraman] Merge branch 'master' of
https://github.com/apache/spark into spark-ec2-yarn
152c94c [Shivaram Venkataraman] Open 8088 for YARN in EC2
(cherry picked from commit 2e9a5f229e1a2ccffa74fa59fa6a55b2704d9c1a)
Signed-off-by: Andrew Or <[email protected]>
commit b5ee7eefdb4162d46eaf538ef38019a775d0c70a
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-26T22:51:31Z
[SPARK-7748] [MLLIB] Graduate spark.ml from alpha
With decent coverage of feature transformers, algorithms, and model tuning
support, it is time to graduate `spark.ml` from alpha. This PR changes all
`AlphaComponent` annotations to either `DeveloperApi` or `Experimental`,
depending on whether we expect a class/method to be used by end users (who use
the pipeline API to assemble/tune their ML pipelines but not to create new
pipeline components). `UnaryTransformer` becomes a `DeveloperApi` in this PR.
jkbradley harsha2010
Author: Xiangrui Meng <[email protected]>
Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:
effbccd [Xiangrui Meng] organize imports
c15028e [Xiangrui Meng] added missing docs
1b2e5f8 [Xiangrui Meng] update package doc
73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)
(cherry picked from commit 836a75898fdc4b10d4d00676ef29e24cc96f09fd)
Signed-off-by: Xiangrui Meng <[email protected]>
commit f9dfa4d0f075fe44f09e5aa52c7ab5b0515c9e69
Author: Andrew Or <[email protected]>
Date: 2015-05-26T23:31:34Z
[SPARK-7864] [UI] Do not kill innocent stages from visualization
**Reproduction.** Run a long-running job, go to the job page, expand the
DAG visualization, and click into a stage. Your stage is now killed. Why? This
is because the visualization code just reaches into the stage table and grabs
the first link it finds. In our case, this first link happens to be the kill
link instead of the one to the stage page.
**Fix.** Use proper CSS selectors to avoid ambiguity.
This is an alternative to #6407. Thanks carsonwang for catching this.
Author: Andrew Or <[email protected]>
Closes #6419 from andrewor14/fix-ui-viz-kill and squashes the following
commits:
25203bd [Andrew Or] Do not kill innocent stages
(cherry picked from commit 8f2082426828c15704426ebca1d015bf956c6841)
Signed-off-by: Andrew Or <[email protected]>
commit 311fcf67e0229f7664e8ee8ee0da4966ccb979f4
Author: Mike Dusenberry <[email protected]>
Date: 2015-05-27T01:08:57Z
[SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in
MLlib Collaborative Filtering documentation.
Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering
documentation to match one of the possible ALS.trainImplicit function
signatures.
Author: Mike Dusenberry <[email protected]>
Closes #6422 from
dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the
following commits:
36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib
Collaborative Filtering documentation to match one of the possible
ALS.trainImplicit function signatures.
(cherry picked from commit 0463428b6e8f364f0b1f39445a60cd85ae7c07bc)
Signed-off-by: Xiangrui Meng <[email protected]>
commit faadbd4d99c51eabf22277430b5e3939b1606cdb
Author: Josh Rosen <[email protected]>
Date: 2015-05-27T03:24:35Z
[SPARK-7858] [SQL] Use output schema, not relation schema, for data source
input conversion
In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as
the target schema for converting incoming rows into Catalyst rows. However, we
should be using the output schema instead, since our scan might return a subset
of the relation's columns.
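A toy illustration of why this matters (hypothetical plain Python, not Spark's Catalyst code): when a scan prunes columns, rows must be interpreted against the pruned output schema, not the full relation schema:

```python
relation_schema = ["id", "name", "age", "email"]  # full relation

def convert_rows(rows, output_schema):
    """Convert raw tuples into dicts keyed by the *output* schema.

    Using the full `relation_schema` here instead would mis-label
    columns whenever the scan returns only a subset of them.
    """
    return [dict(zip(output_schema, row)) for row in rows]

# A pruned scan returns only two of the four relation columns:
pruned = convert_rows([(1, "alice"), (2, "bob")], ["id", "name"])
```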
This patch incorporates #6414 by liancheng, which fixes an issue in
`SimpleTestRelation` that prevented this bug from being caught by our old tests:
> In `SimpleTextRelation`, we set `needsConversion` to `true`,
indicating that values produced by this testing relation should be of Scala
types, and need to be converted to Catalyst types when necessary. However, we
also used `Cast` to convert strings to expected data types. And `Cast` always
produces values of Catalyst types, thus no conversion is done at all. This PR
makes `SimpleTextRelation` produce Scala values so that data conversion code
paths can be properly tested.
Closes #5986.
Author: Josh Rosen <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Cheng Lian <[email protected]>
Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits:
e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into
61664732b25b35f94be35a42cde651cbfd0e02b7
9968fba [Cheng Lian] Tests the data type conversion code paths
(cherry picked from commit 0c33c7b4a66e47f6246f1b7f2b96f2c33126ec63)
Signed-off-by: Yin Huai <[email protected]>
commit d0bd68ff8a1dcfbff8e6d40573ca049d208ab2de
Author: Cheng Lian <[email protected]>
Date: 2015-05-27T03:48:56Z
[SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation
So that potential partial/corrupted data files left by failed tasks/jobs
won't affect normal data scans.
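The idea can be sketched in plain Python (a hypothetical filter, not the actual HadoopFsRelation code): any path containing a `_temporary` directory, or other underscore/dot-prefixed metadata entries, is excluded from the scan:

```python
def visible_data_files(paths):
    """Filter out files under `_temporary` directories (and other
    underscore/dot-prefixed metadata entries such as `_SUCCESS`), so
    partial output left by failed tasks is never scanned as data."""
    def is_hidden(path):
        return any(part.startswith(("_", "."))
                   for part in path.split("/"))
    return [p for p in paths if not is_hidden(p)]

files = visible_data_files([
    "table/part-00000.parquet",
    "table/_temporary/0/task_1/part-00000.parquet",
    "table/_SUCCESS",
])
```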
Author: Cheng Lian <[email protected]>
Closes #6411 from liancheng/spark-7868 and squashes the following commits:
273ea36 [Cheng Lian] Ignores _temporary directories
(cherry picked from commit b463e6d618e69c535297e51f41eca4f91bd33cc8)
Signed-off-by: Yin Huai <[email protected]>
commit 34e233f9ce8d5fa616ce981e0e842b4026fb9824
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-27T06:51:32Z
[SPARK-7535] [.1] [MLLIB] minor changes to the pipeline API
1. removed `Params.validateParams(extra)`
2. added `Evaluate.evaluate(dataset, paramPairs*)`
3. updated `RegressionEvaluator` doc
jkbradley
Author: Xiangrui Meng <[email protected]>
Closes #6392 from mengxr/SPARK-7535.1 and squashes the following commits:
5ff5af8 [Xiangrui Meng] add unit test for CV.validateParams
f1f8369 [Xiangrui Meng] update CV.validateParams() to test
estimatorParamMaps
607445d [Xiangrui Meng] merge master
8716f5f [Xiangrui Meng] specify default metric name in RegressionEvaluator
e4e5631 [Xiangrui Meng] update RegressionEvaluator doc
801e864 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into
SPARK-7535.1
fcbd3e2 [Xiangrui Meng] Merge branch 'master' into SPARK-7535.1
2192316 [Xiangrui Meng] remove validateParams(extra); add evaluate(dataset,
extra*)
(cherry picked from commit a9f1c0c57b9be586dbada09dab91dcfce31141d9)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 4e12cec8a18631b6de5d9bb6a4467178444d66a9
Author: Cheolsoo Park <[email protected]>
Date: 2015-05-27T07:18:42Z
[SPARK-7850][BUILD] Hive 0.12.0 profile in POM should be removed
I grep'ed hive-0.12.0 in the source code and removed all the profiles and
doc references.
Author: Cheolsoo Park <[email protected]>
Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM
and build docs
(cherry picked from commit 6dd645870d34d97ac992032bfd6cf39f20a0c50f)
Signed-off-by: Reynold Xin <[email protected]>
commit 01c3ef536d60f21f4a63c76ffe4dad2fecaa797e
Author: Liang-Chi Hsieh <[email protected]>
Date: 2015-05-27T07:27:39Z
[SPARK-7697][SQL] Use LongType for unsigned int in JDBCRDD
JIRA: https://issues.apache.org/jira/browse/SPARK-7697
The reported problem case is MySQL, but the H2 database used in tests has
no unsigned int type, so a corresponding test could not be added.
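The motivation is easy to check numerically (plain Python, using the bounds of signed 32-bit and 64-bit integers, which correspond to Spark SQL's `IntegerType` and `LongType`):

```python
# MySQL's INT UNSIGNED ranges over [0, 2**32 - 1], which overflows a
# signed 32-bit IntegerType; a signed 64-bit LongType holds it safely.
SIGNED_INT_MAX = 2**31 - 1      # IntegerType upper bound
UNSIGNED_INT_MAX = 2**32 - 1    # largest MySQL INT UNSIGNED value
SIGNED_LONG_MAX = 2**63 - 1     # LongType upper bound

def fits_signed_32(v):
    return -2**31 <= v <= SIGNED_INT_MAX

def fits_signed_64(v):
    return -2**63 <= v <= SIGNED_LONG_MAX
```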
Author: Liang-Chi Hsieh <[email protected]>
Closes #6229 from viirya/unsignedint_as_long and squashes the following
commits:
dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master'
into unsignedint_as_long
608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
(cherry picked from commit 4f98d7a7f1715273bc91f1903bb7e0f287cc7394)
Signed-off-by: Reynold Xin <[email protected]>
commit e5357132baa471604c189b06a10344c38b4a3fec
Author: Reynold Xin <[email protected]>
Date: 2015-05-27T08:13:57Z
[SQL] Rename MathematicalExpression UnaryMathExpression, and specify
BinaryMathExpression's output data type as DoubleType.
Two minor changes.
cc brkyvz
Author: Reynold Xin <[email protected]>
Closes #6428 from rxin/math-func-cleanup and squashes the following commits:
5910df5 [Reynold Xin] [SQL] Rename MathematicalExpression
UnaryMathExpression, and specify BinaryMathExpression's output data type as
DoubleType.
(cherry picked from commit 3e7d7d6b3d6e07b52b1a138f7aa2ef628597fe05)
Signed-off-by: Reynold Xin <[email protected]>
commit 90525c9ba1ca8528567ea30e611511251d55f685
Author: scwf <[email protected]>
Date: 2015-05-27T14:12:18Z
[CORE] [TEST] HistoryServerSuite failed due to timezone issue
follow up for #6377
Change time to the equivalent in GMT
/cc squito
Author: scwf <[email protected]>
Closes #6425 from scwf/fix-HistoryServerSuite and squashes the following
commits:
4d37935 [scwf] fix HistoryServerSuite
(cherry picked from commit 4615081d7a10b023491e25478d19b8161e030974)
Signed-off-by: Imran Rashid <[email protected]>
commit a25ce91f9685604cfb567a6860182ba467ceed8d
Author: Cheng Lian <[email protected]>
Date: 2015-05-27T17:09:12Z
[SPARK-7847] [SQL] Fixes dynamic partition directory escaping
Please refer to [SPARK-7847] [1] for details.
[1]: https://issues.apache.org/jira/browse/SPARK-7847
Author: Cheng Lian <[email protected]>
Closes #6389 from liancheng/spark-7847 and squashes the following commits:
935c652 [Cheng Lian] Adds test case for writing various data types as
dynamic partition value
f4fc398 [Cheng Lian] Converts partition columns to Scala type when writing
dynamic partitions
d0aeca0 [Cheng Lian] Fixes dynamic partition directory escaping
(cherry picked from commit 15459db4f6867e95076cf53fade2fca833c4cf4e)
Signed-off-by: Yin Huai <[email protected]>
commit 13044b0460e866804e6e3f058ebe38c0d005c1ff
Author: Kousuke Saruta <[email protected]>
Date: 2015-05-27T18:41:35Z
[SPARK-7864] [UI] Fix the logic grabbing the link from table in AllJobPage
This issue is related to #6419 .
AllJobPage doesn't currently have a "kill link", but I think we should fix the
issue mentioned in #6419 here as well, just in case, to avoid accidents in the
future. So it's a minor issue for now, and I haven't filed it in JIRA.
Author: Kousuke Saruta <[email protected]>
Closes #6432 from sarutak/remove-ambiguity-of-link and squashes the
following commits:
cd1a503 [Kousuke Saruta] Fixed ambiguity link issue in AllJobPage
(cherry picked from commit 0db76c90ad5f84d7a5640c41de74876b906ddc90)
Signed-off-by: Andrew Or <[email protected]>
commit 0468d57a6fe42a7f06ccd4ac1faad59c4dcc4c68
Author: Reynold Xin <[email protected]>
Date: 2015-05-27T18:54:35Z
Removed Guava dependency from JavaTypeInference's type signature.
This should also close #6243.
Author: Reynold Xin <[email protected]>
Closes #6431 from rxin/JavaTypeInference-guava and squashes the following
commits:
e58df3c [Reynold Xin] Removed Guava dependency from JavaTypeInference's
type signature.
(cherry picked from commit 6fec1a9409b34d8ce58ea1c330b52cc7ef3e7e7e)
Signed-off-by: Reynold Xin <[email protected]>
commit d33142fd8c045350bc949f3e55ab4f6d3fab6363
Author: Daoyuan Wang <[email protected]>
Date: 2015-05-27T19:42:13Z
[SPARK-7790] [SQL] date and decimal conversion for dynamic partition key
Author: Daoyuan Wang <[email protected]>
Closes #6318 from adrian-wang/dynpart and squashes the following commits:
ad73b61 [Daoyuan Wang] not use sqlTestUtils for try catch because dont have
sqlcontext here
6c33b51 [Daoyuan Wang] fix according to liancheng
f0f8074 [Daoyuan Wang] some specific types as dynamic partition
(cherry picked from commit 8161562eabc1eff430cfd9d8eaf413a8c4ef2cfb)
Signed-off-by: Yin Huai <[email protected]>
commit 89fe93fc3b93009f1741b59dda6a4a9005128d1e
Author: Cheng Lian <[email protected]>
Date: 2015-05-27T20:09:33Z
[SPARK-7684] [SQL] Refactoring MetastoreDataSourcesSuite to workaround
SPARK-7684
As stated in SPARK-7684, currently `TestHive.reset` has some execution
order specific bug, which makes running specific test suites locally pretty
frustrating. This PR refactors `MetastoreDataSourcesSuite` (which relies on
`TestHive.reset` heavily) using various `withXxx` utility methods in
`SQLTestUtils` to ask each test case to cleanup their own mess so that we can
avoid calling `TestHive.reset`.
Author: Cheng Lian <[email protected]>
Author: Yin Huai <[email protected]>
Closes #6353 from liancheng/workaround-spark-7684 and squashes the
following commits:
26939aa [Yin Huai] Move the initialization of jsonFilePath to beforeAll.
a423d48 [Cheng Lian] Fixes Scala style issue
dfe45d0 [Cheng Lian] Refactors MetastoreDataSourcesSuite to workaround
SPARK-7684
92a116d [Cheng Lian] Fixes minor styling issues
(cherry picked from commit b97ddff000b99adca3dd8fe13d01054fd5014fa0)
Signed-off-by: Yin Huai <[email protected]>
commit e07b71560cb791c701ad28adff02f5db6b490136
Author: Cheng Hao <[email protected]>
Date: 2015-05-27T21:21:00Z
[SPARK-7853] [SQL] Fixes a class loader issue in Spark SQL
This PR is based on PR #6396 authored by chenghao-intel. Essentially, Spark
SQL should use context classloader to load SerDe classes.
yhuai helped update the test case, and I fixed a bug in the original
`CliSuite`: while testing the CLI tool with `runCliWithin`, we didn't append
`\n` to the last query, so the last query was never executed.
Original PR description is pasted below.
----
```
bin/spark-sql --jars
./sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
CREATE TABLE t1(a string, b string) ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe';
```
Throws exception like
```
15/05/26 00:16:33 ERROR SparkSQLDriver: Failed in [CREATE TABLE t1(a
string, b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe']
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot
validate serde: org.apache.hive.hcatalog.data.JsonSerDe
at
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333)
at
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310)
at
org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139)
at
org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310)
at
org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300)
at
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:457)
at
org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:922)
at
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:922)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:727)
at
org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:57)
```
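The `runCliWithin` fix mentioned above can be modeled with a hypothetical line-buffered reader in plain Python (not the actual test code): a line-oriented CLI only executes input terminated by a newline, so the last query must get one too:

```python
def feed_queries(queries):
    """Join queries into CLI stdin, appending a newline to every
    query -- including the last one. Without the trailing newline,
    a line-oriented REPL never sees the final query as complete."""
    return "".join(q + "\n" for q in queries)

def executed_lines(stdin_text):
    """Model of a line-buffered reader: only text terminated by a
    newline counts as a complete, executable command."""
    terminated, _, _unterminated = stdin_text.rpartition("\n")
    return terminated.split("\n") if terminated else []
```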
Author: Cheng Hao <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Yin Huai <[email protected]>
Closes #6435 from liancheng/classLoader and squashes the following commits:
d4c4845 [Cheng Lian] Fixes CliSuite
75e80e2 [Yin Huai] Update the fix.
fd26533 [Cheng Hao] scalastyle
dd78775 [Cheng Hao] workaround for classloader of IsolatedClientLoader
(cherry picked from commit db3fd054f240c7e38aba0732e471df65cd14011a)
Signed-off-by: Yin Huai <[email protected]>
commit b4ecbce65c9329e2ed549b04752358a903ad983a
Author: Liang-Chi Hsieh <[email protected]>
Date: 2015-05-28T01:51:36Z
[SPARK-7897][SQL] Use DecimalType to represent unsigned bigint in JDBCRDD
JIRA: https://issues.apache.org/jira/browse/SPARK-7897
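As with SPARK-7697 above, the bounds check motivating this change can be done in plain Python (the largest unsigned 64-bit value overflows a signed `LongType` but fits exactly in a 20-digit decimal):

```python
from decimal import Decimal

# MySQL's BIGINT UNSIGNED ranges over [0, 2**64 - 1], which overflows
# even a signed 64-bit LongType, so a decimal type with 20 digits of
# precision is needed to hold it exactly.
UNSIGNED_BIGINT_MAX = 2**64 - 1
SIGNED_LONG_MAX = 2**63 - 1

overflows_long = UNSIGNED_BIGINT_MAX > SIGNED_LONG_MAX
as_decimal = Decimal(UNSIGNED_BIGINT_MAX)  # exactly representable
```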
Author: Liang-Chi Hsieh <[email protected]>
Closes #6438 from viirya/jdbc_unsigned_bigint and squashes the following
commits:
ccb3c3f [Liang-Chi Hsieh] Use DecimalType to represent unsigned bigint.
(cherry picked from commit a1e092eae57172909ff2af06d8b461742595734c)
Signed-off-by: Reynold Xin <[email protected]>
commit bd9173c14c4a25b6f87797eae348634e7aa7f7ac
Author: Yin Huai <[email protected]>
Date: 2015-05-28T03:04:29Z
[SPARK-7907] [SQL] [UI] Rename tab ThriftServer to SQL.
This PR has three changes:
1. Renaming the tab name from `ThriftServer` to `SQL`;
2. Renaming the title of the tab from `ThriftServer` to `JDBC/ODBC Server`;
and
3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC
Session`.
https://issues.apache.org/jira/browse/SPARK-7907
Author: Yin Huai <[email protected]>
Closes #6448 from yhuai/JDBCServer and squashes the following commits:
eadcc3d [Yin Huai] Update test.
9168005 [Yin Huai] Use SQL as the tab name.
221831e [Yin Huai] Rename ThriftServer to JDBCServer.
(cherry picked from commit 3c1f1baaf003d50786d3eee1e288f4bac69096f2)
Signed-off-by: Yin Huai <[email protected]>
commit 9da4b6bcbb0340fe6f81698451348feb2d87f0ba
Author: Josh Rosen <[email protected]>
Date: 2015-05-28T03:19:53Z
[SPARK-7873] Allow KryoSerializerInstance to create multiple streams at the
same time
This is a somewhat obscure bug, but I think that it will seriously impact
KryoSerializer users who use custom registrators that disable auto-reset.
When auto-reset is disabled, then this breaks things in some of our shuffle
paths which actually end up creating multiple OutputStreams from the same
shared SerializerInstance (which is unsafe).
This was introduced by a patch (SPARK-3386) which enables serializer re-use
in some of the shuffle paths, since constructing new serializer instances is
actually pretty costly for KryoSerializer. We had already fixed another
corner-case (SPARK-7766) bug related to this, but missed this one.
I think that the root problem here is that KryoSerializerInstance can be
used in a way which is unsafe even within a single thread, e.g. by creating
multiple open OutputStreams from the same instance or by interleaving
deserialize and deserializeStream calls. I considered a smaller patch which
adds assertions to guard against this type of "misuse" but abandoned that
approach after I realized how convoluted the Scaladoc became.
This patch fixes this bug by making it legal to create multiple streams
from the same KryoSerializerInstance. Internally, KryoSerializerInstance now
implements a `borrowKryo()` / `releaseKryo()` API that's backed by a "pool" of
capacity 1. Each call to a KryoSerializerInstance method will borrow the Kryo,
do its work, then release the serializer instance back to the pool. If the pool
is empty and we need an instance, it will allocate a new Kryo on-demand. This
makes it safe for multiple OutputStreams to be opened from the same serializer.
If we try to release a Kryo back to the pool but the pool already contains a
Kryo, then we'll just discard the new Kryo. I don't think there's a clear
benefit to having a larger pool since our usages tend to fall into two cases,
a) where we only create a single OutputStream and b) where we create a huge
number of OutputStreams with the same lifecycle, then destroy the
KryoSerializerInstance (this is what's happening in the bypassMergeSort code
path that my regression test hits).
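The borrow/release pool described above can be sketched in plain Python (hypothetical code mirroring the description, not the Scala implementation): a capacity-1 pool that allocates on demand when empty and discards extras on release:

```python
class PooledSerializerInstance:
    """Capacity-1 borrow/release pool, as described above.

    - borrow(): takes the pooled object, or allocates a fresh one
      on demand if the pool is empty.
    - release(): returns an object to the pool; if the pool already
      holds one, the extra is simply discarded.
    """

    def __init__(self, factory):
        self._factory = factory
        self._pooled = factory()  # initialized eagerly, capacity 1
        self.allocations = 1

    def borrow(self):
        if self._pooled is not None:
            obj, self._pooled = self._pooled, None
            return obj
        self.allocations += 1     # pool empty: allocate on demand
        return self._factory()

    def release(self, obj):
        if self._pooled is None:
            self._pooled = obj    # pool was empty: keep it
        # else: pool already full -- discard the extra instance

pool = PooledSerializerInstance(factory=object)
a = pool.borrow()   # takes the pooled instance
b = pool.borrow()   # pool empty: second instance allocated on demand
pool.release(a)     # pool empty again: a is kept
pool.release(b)     # pool already full: b is discarded
```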
Author: Josh Rosen <[email protected]>
Closes #6415 from JoshRosen/SPARK-7873 and squashes the following commits:
00b402e [Josh Rosen] Initialize eagerly to fix a failing test
ba55d20 [Josh Rosen] Add explanatory comments
3f1da96 [Josh Rosen] Guard against duplicate close()
ab457ca [Josh Rosen] Sketch a loan/release based solution.
9816e8f [Josh Rosen] Add a failing test showing how deserialize() and
deserializeStream() can interfere.
7350886 [Josh Rosen] Add failing regression test for SPARK-7873
(cherry picked from commit 852f4de2d3d0c5fff2fa66000a7a3088bb3dbe74)
Signed-off-by: Patrick Wendell <[email protected]>
commit d83c2ee84894b554aab0d88bf99ea2902f482176
Author: Sandy Ryza <[email protected]>
Date: 2015-05-28T05:23:22Z
[SPARK-7896] Allow ChainedBuffer to store more than 2 GB
Author: Sandy Ryza <[email protected]>
Closes #6440 from sryza/sandy-spark-7896 and squashes the following commits:
49d8a0d [Sandy Ryza] Fix bug introduced when reading over record boundaries
6006856 [Sandy Ryza] Fix overflow issues
006b4b2 [Sandy Ryza] Fix scalastyle by removing non ascii characters
8b000ca [Sandy Ryza] Add ascii art to describe layout of data in metaBuffer
f2053c0 [Sandy Ryza] Fix negative overflow issue
0368c78 [Sandy Ryza] Initialize size as 0
a5a4820 [Sandy Ryza] Use explicit types for all numbers in ChainedBuffer
b7e0213 [Sandy Ryza] SPARK-7896. Allow ChainedBuffer to store more than 2 GB
(cherry picked from commit bd11b01ebaf62df8b0d8c0b63b51b66e58f50960)
Signed-off-by: Patrick Wendell <[email protected]>
commit 4983dfc878cc58d182d0e51c8adc3d00c985362a
Author: Patrick Wendell <[email protected]>
Date: 2015-05-28T05:36:23Z
Preparing Spark release v1.4.0-rc3
commit 7c342bdd9377945337b1bf22344e50ac44d14986
Author: Patrick Wendell <[email protected]>
Date: 2015-05-28T05:36:30Z
Preparing development version 1.4.0-SNAPSHOT
commit 63be026da3ebf6b77f37f2e950e3b8f516bdfcaa
Author: Matt Wise <[email protected]>
Date: 2015-05-28T05:39:19Z
[DOCS] Fix typo in documentation for Java UDF registration
This contribution is my original work and I license the work to the project
under the project's open source license
Author: Matt Wise <[email protected]>
Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and
squashes the following commits:
e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
(cherry picked from commit 35410614deb7feea1c9d5cca00a6fa7970404f21)
Signed-off-by: Reynold Xin <[email protected]>
commit bd568df22445a1ca5183ce357410ef7a76f5bb81
Author: zuxqoj <[email protected]>
Date: 2015-05-28T06:13:13Z
[SPARK-7782] fixed sort arrow issue
Current behavior: the sort arrow renders incorrectly. (Screenshots of the
Spark UI, YARN, and JIRA were attached to the original PR but are not
reproduced here.)
Author: zuxqoj <[email protected]>
Closes #6437 from zuxqoj/SPARK-7782_PR and squashes the following commits:
cd068b9 [zuxqoj] [SPARK-7782] fixed sort arrow issue
(cherry picked from commit e838a25bdb5603ef05e779225704c972ce436145)
Signed-off-by: Reynold Xin <[email protected]>
commit ab62d73ddb973c25de043e8e9ade7800adf244e8
Author: zsxwing <[email protected]>
Date: 2015-05-28T16:04:12Z
[SPARK-7895] [STREAMING] [EXAMPLES] Move Kafka examples from scala-2.10/src
to src
Since `spark-streaming-kafka` now is published for both Scala 2.10 and
2.11, we can move `KafkaWordCount` and `DirectKafkaWordCount` from
`examples/scala-2.10/src/` to `examples/src/` so that they will appear in
`spark-examples-***-jar` for Scala 2.11.
Author: zsxwing <[email protected]>
Closes #6436 from zsxwing/SPARK-7895 and squashes the following commits:
c6052f1 [zsxwing] Update examples/pom.xml
0bcfa87 [zsxwing] Fix the sleep time
b9d1256 [zsxwing] Move Kafka examples from scala-2.10/src to src
(cherry picked from commit 000df2f0d6af068bb188e81bbb207f0c2f43bf16)
Signed-off-by: Patrick Wendell <[email protected]>
commit 7b5dffb80288cb491cd9de9da653a78d800be55b
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-28T19:03:46Z
[SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize)
being called multiple times
~~A PythonUDT shouldn't be serialized into external Scala types in
PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL
UDT/UDF in PySpark.~~
The fix above didn't work, so I added a workaround: if a Python UDF is applied
to a Python UDT, this will pass the Python SQL types as inputs. Still
incorrect, but at least it doesn't throw exceptions on the Scala side.
davies harsha2010
Author: Xiangrui Meng <[email protected]>
Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:
c257d2a [Xiangrui Meng] add a workaround for VectorUDT
(cherry picked from commit 530efe3e80c62b25c869b85167e00330eb1ddea6)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 4485283981e4592dd817fc8956b4a6faea06d817
Author: Li Yao <[email protected]>
Date: 2015-05-28T20:39:39Z
[MINOR] Fix a minor bug in the PageRank example.
Fix the bug where passing only one argument causes an array-out-of-bounds
exception in the PageRank example.
Author: Li Yao <[email protected]>
Closes #6455 from lastland/patch-1 and squashes the following commits:
de06128 [Li Yao] Fix the bug that entering only 1 arg will cause array out
of bounds exception.
(cherry picked from commit c771589c96403b2a518fb77d5162eca8f495f37b)
Signed-off-by: Andrew Or <[email protected]>
commit 0a65224aed9d2bb780e0d3e70d2a7ba34f30219b
Author: Mike Dusenberry <[email protected]>
Date: 2015-05-28T21:15:10Z
[DOCS] Fixing broken "IDE setup" link in the Building Spark documentation.
The location of the IDE setup information has changed, so this just updates
the link on the Building Spark page.
Author: Mike Dusenberry <[email protected]>
Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and
squashes the following commits:
75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building
Spark documentation by pointing to new location.
(cherry picked from commit 3e312a5ed0154527c66eeeee0d2cc3bfce0a820e)
Signed-off-by: Sean Owen <[email protected]>
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]