GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/6965

    [SPARK-8578] [SQL] Should ignore user defined output committer when appending data (branch 1.4)

    This is https://github.com/apache/spark/pull/6964 for branch 1.4.
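
    For context, a rough Scala sketch of the idea in the title, not the actual patch: when data is being appended to an existing directory, fall back to Hadoop's default FileOutputCommitter instead of a user-configured committer class, since a custom committer may move or overwrite files that already exist at the destination. The helper name and Option-typed parameter below are illustrative.

    ~~~
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapreduce.TaskAttemptContext
    import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

    // Illustrative helper, not Spark's code: pick the committer for a write job.
    def chooseOutputCommitter(
        outputPath: Path,
        context: TaskAttemptContext,
        userCommitterClass: Option[Class[_ <: FileOutputCommitter]],
        isAppend: Boolean): FileOutputCommitter = {
      if (isAppend) {
        // Appending: ignore any user-defined committer and use the default one.
        new FileOutputCommitter(outputPath, context)
      } else {
        userCommitterClass match {
          case Some(cls) =>
            cls.getConstructor(classOf[Path], classOf[TaskAttemptContext])
              .newInstance(outputPath, context)
          case None =>
            new FileOutputCommitter(outputPath, context)
        }
      }
    }
    ~~~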

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark SPARK-8578-branch-1.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6965.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6965
    
----
commit bd9173c14c4a25b6f87797eae348634e7aa7f7ac
Author: Yin Huai <[email protected]>
Date:   2015-05-28T03:04:29Z

    [SPARK-7907] [SQL] [UI] Rename tab ThriftServer to SQL.
    
    This PR has three changes:
    1. Renaming the `ThriftServer` tab to `SQL`;
    2. Renaming the title of the tab from `ThriftServer` to `JDBC/ODBC Server`; and
    3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`.
    
    https://issues.apache.org/jira/browse/SPARK-7907
    
    Author: Yin Huai <[email protected]>
    
    Closes #6448 from yhuai/JDBCServer and squashes the following commits:
    
    eadcc3d [Yin Huai] Update test.
    9168005 [Yin Huai] Use SQL as the tab name.
    221831e [Yin Huai] Rename ThriftServer to JDBCServer.
    
    (cherry picked from commit 3c1f1baaf003d50786d3eee1e288f4bac69096f2)
    Signed-off-by: Yin Huai <[email protected]>

commit 9da4b6bcbb0340fe6f81698451348feb2d87f0ba
Author: Josh Rosen <[email protected]>
Date:   2015-05-28T03:19:53Z

    [SPARK-7873] Allow KryoSerializerInstance to create multiple streams at the same time
    
    This is a somewhat obscure bug, but I think that it will seriously impact KryoSerializer users who use custom registrators which disable auto-reset. When auto-reset is disabled, this breaks things in some of our shuffle paths, which end up creating multiple OutputStreams from the same shared SerializerInstance (which is unsafe).
    
    This was introduced by a patch (SPARK-3386) which enables serializer re-use in some of the shuffle paths, since constructing new serializer instances is actually pretty costly for KryoSerializer. We had already fixed another corner-case bug related to this (SPARK-7766), but missed this one.
    
    I think that the root problem here is that KryoSerializerInstance can be used in a way that is unsafe even within a single thread, e.g. by creating multiple open OutputStreams from the same instance or by interleaving deserialize and deserializeStream calls. I considered a smaller patch which adds assertions to guard against this type of "misuse", but abandoned that approach after I realized how convoluted the Scaladoc became.
    
    This patch fixes the bug by making it legal to create multiple streams from the same KryoSerializerInstance. Internally, KryoSerializerInstance now implements a `borrowKryo()` / `releaseKryo()` API that's backed by a "pool" of capacity 1. Each call to a KryoSerializerInstance method will borrow the Kryo, do its work, then release the serializer instance back to the pool. If the pool is empty and we need an instance, it will allocate a new Kryo on demand. This makes it safe for multiple OutputStreams to be opened from the same serializer. If we try to release a Kryo back to the pool but the pool already contains a Kryo, then we'll just discard the new one. I don't think there's a clear benefit to having a larger pool, since our usages tend to fall into two cases: a) we only create a single OutputStream, and b) we create a huge number of OutputStreams with the same lifecycle and then destroy the KryoSerializerInstance (this is what's happening in the bypassMergeSort code path that my regression test hits).
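    
    A simplified Scala sketch of the capacity-1 pool described above (not Spark's actual KryoSerializerInstance; the class and method names are illustrative):
    
    ~~~
    import com.esotericsoftware.kryo.Kryo

    class PooledKryoInstance(newKryo: () => Kryo) {
      // Holds at most one cached Kryo between calls.
      private var cached: Option[Kryo] = None

      private def borrowKryo(): Kryo = cached match {
        case Some(k) =>
          cached = None
          k
        case None =>
          newKryo()                         // pool empty: allocate on demand
      }

      private def releaseKryo(k: Kryo): Unit = {
        if (cached.isEmpty) cached = Some(k) // otherwise just discard k
      }

      def serialize[T](value: T)(f: (Kryo, T) => Array[Byte]): Array[Byte] = {
        val kryo = borrowKryo()
        try f(kryo, value) finally releaseKryo(kryo)
      }
    }
    ~~~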
    
    Author: Josh Rosen <[email protected]>
    
    Closes #6415 from JoshRosen/SPARK-7873 and squashes the following commits:
    
    00b402e [Josh Rosen] Initialize eagerly to fix a failing test
    ba55d20 [Josh Rosen] Add explanatory comments
    3f1da96 [Josh Rosen] Guard against duplicate close()
    ab457ca [Josh Rosen] Sketch a loan/release based solution.
    9816e8f [Josh Rosen] Add a failing test showing how deserialize() and deserializeStream() can interfere.
    7350886 [Josh Rosen] Add failing regression test for SPARK-7873
    
    (cherry picked from commit 852f4de2d3d0c5fff2fa66000a7a3088bb3dbe74)
    Signed-off-by: Patrick Wendell <[email protected]>

commit d83c2ee84894b554aab0d88bf99ea2902f482176
Author: Sandy Ryza <[email protected]>
Date:   2015-05-28T05:23:22Z

    [SPARK-7896] Allow ChainedBuffer to store more than 2 GB
    
    Author: Sandy Ryza <[email protected]>
    
    Closes #6440 from sryza/sandy-spark-7896 and squashes the following commits:
    
    49d8a0d [Sandy Ryza] Fix bug introduced when reading over record boundaries
    6006856 [Sandy Ryza] Fix overflow issues
    006b4b2 [Sandy Ryza] Fix scalastyle by removing non ascii characters
    8b000ca [Sandy Ryza] Add ascii art to describe layout of data in metaBuffer
    f2053c0 [Sandy Ryza] Fix negative overflow issue
    0368c78 [Sandy Ryza] Initialize size as 0
    a5a4820 [Sandy Ryza] Use explicit types for all numbers in ChainedBuffer
    b7e0213 [Sandy Ryza] SPARK-7896. Allow ChainedBuffer to store more than 2 GB
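    
    To illustrate the kind of change involved, here is a small Scala sketch, not the actual ChainedBuffer code: a chain of fixed-size byte chunks addressed by a Long position, so that totals beyond 2 GB do not overflow an Int.
    
    ~~~
    import scala.collection.mutable.ArrayBuffer

    class ChunkedBytes(chunkSizeBits: Int = 22) {        // 4 MiB chunks
      private val chunkSize: Int = 1 << chunkSizeBits
      private val chunks = ArrayBuffer.empty[Array[Byte]]
      private var totalSize: Long = 0L                   // Long, so > 2 GB is fine

      def write(pos: Long, bytes: Array[Byte]): Unit = {
        var written = 0
        while (written < bytes.length) {
          val p: Long = pos + written
          val chunkIdx = (p >> chunkSizeBits).toInt      // which chunk
          val offset = (p & (chunkSize - 1)).toInt       // offset inside that chunk
          while (chunks.size <= chunkIdx) chunks += new Array[Byte](chunkSize)
          val n = math.min(bytes.length - written, chunkSize - offset)
          System.arraycopy(bytes, written, chunks(chunkIdx), offset, n)
          written += n
        }
        totalSize = math.max(totalSize, pos + bytes.length)
      }

      def size: Long = totalSize
    }
    ~~~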
    
    (cherry picked from commit bd11b01ebaf62df8b0d8c0b63b51b66e58f50960)
    Signed-off-by: Patrick Wendell <[email protected]>

commit 4983dfc878cc58d182d0e51c8adc3d00c985362a
Author: Patrick Wendell <[email protected]>
Date:   2015-05-28T05:36:23Z

    Preparing Spark release v1.4.0-rc3

commit 7c342bdd9377945337b1bf22344e50ac44d14986
Author: Patrick Wendell <[email protected]>
Date:   2015-05-28T05:36:30Z

    Preparing development version 1.4.0-SNAPSHOT

commit 63be026da3ebf6b77f37f2e950e3b8f516bdfcaa
Author: Matt Wise <[email protected]>
Date:   2015-05-28T05:39:19Z

    [DOCS] Fix typo in documentation for Java UDF registration
    
    This contribution is my original work and I license the work to the project under the project's open source license.
    
    Author: Matt Wise <[email protected]>
    
    Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and 
squashes the following commits:
    
    e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
    
    (cherry picked from commit 35410614deb7feea1c9d5cca00a6fa7970404f21)
    Signed-off-by: Reynold Xin <[email protected]>

commit bd568df22445a1ca5183ce357410ef7a76f5bb81
Author: zuxqoj <[email protected]>
Date:   2015-05-28T06:13:13Z

    [SPARK-7782] fixed sort arrow issue
    
    Current behaviour:
    In Spark UI
    ![screen shot 2015-05-27 at 3 27 51 pm](https://cloud.githubusercontent.com/assets/3919211/7837541/47d330ba-04a5-11e5-89d1-e5b11da1a513.png)
    
    In YARN
    ![screen shot 2015-05-27 at 3](https://cloud.githubusercontent.com/assets/3919211/7837594/aebd1d36-04a5-11e5-8216-86e03c07d2bd.png)
    
    In JIRA
    ![screen shot 2015-05-27 at 3_2](https://cloud.githubusercontent.com/assets/3919211/7837616/d3fedce2-04a5-11e5-9e68-960ed54e5d83.png)
    
    Author: zuxqoj <[email protected]>
    
    Closes #6437 from zuxqoj/SPARK-7782_PR and squashes the following commits:
    
    cd068b9 [zuxqoj] [SPARK-7782] fixed sort arrow issue
    
    (cherry picked from commit e838a25bdb5603ef05e779225704c972ce436145)
    Signed-off-by: Reynold Xin <[email protected]>

commit ab62d73ddb973c25de043e8e9ade7800adf244e8
Author: zsxwing <[email protected]>
Date:   2015-05-28T16:04:12Z

    [SPARK-7895] [STREAMING] [EXAMPLES] Move Kafka examples from scala-2.10/src to src
    
    Since `spark-streaming-kafka` is now published for both Scala 2.10 and 2.11, we can move `KafkaWordCount` and `DirectKafkaWordCount` from `examples/scala-2.10/src/` to `examples/src/` so that they will appear in `spark-examples-***-jar` for Scala 2.11.
    
    Author: zsxwing <[email protected]>
    
    Closes #6436 from zsxwing/SPARK-7895 and squashes the following commits:
    
    c6052f1 [zsxwing] Update examples/pom.xml
    0bcfa87 [zsxwing] Fix the sleep time
    b9d1256 [zsxwing] Move Kafka examples from scala-2.10/src to src
    
    (cherry picked from commit 000df2f0d6af068bb188e81bbb207f0c2f43bf16)
    Signed-off-by: Patrick Wendell <[email protected]>

commit 7b5dffb80288cb491cd9de9da653a78d800be55b
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-28T19:03:46Z

    [SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times
    
    ~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~
    
    The fix above didn't work, so I added a workaround: if a Python UDF is applied to a Python UDT, this will pass the Python SQL types as inputs. Still incorrect, but at least it doesn't throw exceptions on the Scala side.
    
    davies harsha2010
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:
    
    c257d2a [Xiangrui Meng] add a workaround for VectorUDT
    
    (cherry picked from commit 530efe3e80c62b25c869b85167e00330eb1ddea6)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 4485283981e4592dd817fc8956b4a6faea06d817
Author: Li Yao <[email protected]>
Date:   2015-05-28T20:39:39Z

    [MINOR] Fix a minor bug in the PageRank example.
    
    Fix the bug that entering only 1 arg will cause an array-out-of-bounds exception in the PageRank example.
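    
    A hedged sketch of the kind of guard such a fix needs (the object name and defaults below are illustrative, not the example's exact code): default the iteration count when only the input file is given, instead of indexing args(1) unconditionally.
    
    ~~~
    object PageRankExample {
      def main(args: Array[String]): Unit = {
        if (args.length < 1) {
          System.err.println("Usage: PageRankExample <file> [<iterations>]")
          sys.exit(1)
        }
        val inputFile = args(0)
        // Avoid ArrayIndexOutOfBoundsException when only one arg is given.
        val iterations = if (args.length > 1) args(1).toInt else 10
        println(s"Running $iterations PageRank iterations over $inputFile")
      }
    }
    ~~~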
    
    Author: Li Yao <[email protected]>
    
    Closes #6455 from lastland/patch-1 and squashes the following commits:
    
    de06128 [Li Yao] Fix the bug that entering only 1 arg will cause array out of bounds exception.
    
    (cherry picked from commit c771589c96403b2a518fb77d5162eca8f495f37b)
    Signed-off-by: Andrew Or <[email protected]>

commit 0a65224aed9d2bb780e0d3e70d2a7ba34f30219b
Author: Mike Dusenberry <[email protected]>
Date:   2015-05-28T21:15:10Z

    [DOCS] Fixing broken "IDE setup" link in the Building Spark documentation.
    
    The location of the IDE setup information has changed, so this just updates 
the link on the Building Spark page.
    
    Author: Mike Dusenberry <[email protected]>
    
    Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and 
squashes the following commits:
    
    75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building 
Spark documentation by pointing to new location.
    
    (cherry picked from commit 3e312a5ed0154527c66eeeee0d2cc3bfce0a820e)
    Signed-off-by: Sean Owen <[email protected]>

commit b9bdf12a1c2ea81cfaae7df540670c34d028838d
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-28T23:32:51Z

    [SPARK-7198] [MLLIB] VectorAssembler should output ML attributes
    
    `VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases (see the sketch after the list):
    
    1. DoubleType with ML attribute: carry over
    2. DoubleType without ML attribute: numeric value
    3. Scalar type: numeric value
    4. VectorType with all ML attributes: carry over and update names
    5. VectorType with only the number of ML attributes known: assume all numeric
    6. VectorType without ML attributes: check the first row and get the number of attributes
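    
    A minimal Scala sketch of inspecting the attributes that `VectorAssembler` attaches to its output column (assuming a Spark 1.4 ML setup; `df` and its column names are hypothetical):
    
    ~~~
    import org.apache.spark.ml.attribute.AttributeGroup
    import org.apache.spark.ml.feature.VectorAssembler

    val assembler = new VectorAssembler()
      .setInputCols(Array("age", "income", "features"))
      .setOutputCol("assembled")
    val output = assembler.transform(df)   // df is a hypothetical DataFrame

    // Attributes carried over (or assumed numeric) end up on the output field.
    val group = AttributeGroup.fromStructField(output.schema("assembled"))
    group.attributes.foreach(_.foreach { attr =>
      println(attr.name.getOrElse("<unnamed numeric>"))
    })
    ~~~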
    
    jkbradley
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #6452 from mengxr/SPARK-7198 and squashes the following commits:
    
    a9d2469 [Xiangrui Meng] add space
    facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes
    
    (cherry picked from commit 7859ab659eecbcf2d8b9a274a4e9e4f5186a528c)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 9c2c6b4a676ea1fdfecd9cd450d43d4081c77385
Author: Reynold Xin <[email protected]>
Date:   2015-05-28T23:56:59Z

    Remove SizeEstimator from o.a.spark package.
    
    See comments on https://github.com/apache/spark/pull/3913
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6471 from rxin/sizeestimator and squashes the following commits:
    
    c057095 [Reynold Xin] Fixed import.
    2da478b [Reynold Xin] Remove SizeEstimator from o.a.spark package.
    
    (cherry picked from commit 0077af22ca5fcb2e50dcf7daa4f6804ae722bfbe)
    Signed-off-by: Reynold Xin <[email protected]>

commit 8f4a86eaa1cad9a2a7607fd5446105c93e5e424e
Author: Yin Huai <[email protected]>
Date:   2015-05-29T00:12:30Z

    [SPARK-7853] [SQL] Fix HiveContext in Spark Shell
    
    https://issues.apache.org/jira/browse/SPARK-7853
    
    This fixes the problem introduced by my change in https://github.com/apache/spark/pull/6435, which caused HiveContext creation to fail in the Spark shell because of a class loader issue.
    
    Author: Yin Huai <[email protected]>
    
    Closes #6459 from yhuai/SPARK-7853 and squashes the following commits:
    
    37ad33e [Yin Huai] Do not use hiveQlTable at all.
    47cdb6d [Yin Huai] Move hiveconf.set to the end of setConf.
    005649b [Yin Huai] Update comment.
    35d86f3 [Yin Huai] Access TTable directly to make sure Hive will not 
internally use any metastore utility functions.
    3737766 [Yin Huai] Recursively find all jars.
    
    (cherry picked from commit 572b62cafe4bc7b1d464c9dcfb449c9d53456826)
    Signed-off-by: Yin Huai <[email protected]>

commit 7bb445a38ca37e72d0b11ad1c4448632b679eda6
Author: Xusen Yin <[email protected]>
Date:   2015-05-29T00:30:12Z

    [SPARK-7577] [ML] [DOC] add bucketizer doc
    
    CC jkbradley
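    
    For reference, a minimal usage sketch of the feature being documented (assuming the Spark 1.4 ML API; `dataFrame` is a hypothetical DataFrame with a Double column named "features"):
    
    ~~~
    import org.apache.spark.ml.feature.Bucketizer

    // Split points must cover the full range of the input values.
    val splits = Array(Double.NegativeInfinity, -0.5, 0.0, 0.5, Double.PositiveInfinity)

    val bucketizer = new Bucketizer()
      .setInputCol("features")
      .setOutputCol("bucketedFeatures")
      .setSplits(splits)

    val bucketed = bucketizer.transform(dataFrame)  // adds the bucket index column
    ~~~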
    
    Author: Xusen Yin <[email protected]>
    
    Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits:
    
    e2dc32e [Xusen Yin] rename columns
    e350e49 [Xusen Yin] add all demos
    006ddf1 [Xusen Yin] add java test
    3238481 [Xusen Yin] add bucketizer
    
    (cherry picked from commit 1bd63e82fdb6ee57c61051430d63685b801df016)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit f4b135337c5032dcd224ebd14e134aa8de0c1667
Author: Reynold Xin <[email protected]>
Date:   2015-05-29T00:55:22Z

    [SPARK-7927] whitespace fixes for streaming.
    
    So we can enable a whitespace enforcement rule in the style checker to save 
code review time.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6475 from rxin/whitespace-streaming and squashes the following 
commits:
    
    810dae4 [Reynold Xin] Fixed tests.
    89068ad [Reynold Xin] [SPARK-7927] whitespace fixes for streaming.
    
    (cherry picked from commit 3af0b3136e4b7dea52c413d640653ccddc638574)
    Signed-off-by: Reynold Xin <[email protected]>

commit 3b38c06f0d19bd0d15df768d6ae0037f6c04b88d
Author: Reynold Xin <[email protected]>
Date:   2015-05-29T01:08:56Z

    [SPARK-7927] whitespace fixes for Hive and ThriftServer.
    
    So we can enable a whitespace enforcement rule in the style checker to save 
code review time.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6478 from rxin/whitespace-hive and squashes the following commits:
    
    e01b0e0 [Reynold Xin] Fixed tests.
    a3bba22 [Reynold Xin] [SPARK-7927] whitespace fixes for Hive and 
ThriftServer.
    
    (cherry picked from commit ee6a0e12fb76e4d5c24175900e5bf6a8cb35e2b0)
    Signed-off-by: Reynold Xin <[email protected]>

commit 3479e6a127d0b93ef38533fdad02a49850716583
Author: Kay Ousterhout <[email protected]>
Date:   2015-05-29T02:04:32Z

    [SPARK-7933] Remove Patrick's username/pw from merge script
    
    Looks like this was added by accident when pwendell merged a commit back in 
September: fe2b1d6a209db9fe96b1c6630677955b94bd48c9
    
    Author: Kay Ousterhout <[email protected]>
    
    Closes #6485 from kayousterhout/SPARK-7933 and squashes the following 
commits:
    
    7c6164a [Kay Ousterhout] [SPARK-7933] Remove Patrick's username/pw from 
merge script
    
    (cherry picked from commit 66c49ed60dcef48a6b38ae2d2c4c479933f3aa19)
    Signed-off-by: Patrick Wendell <[email protected]>

commit 0c05115063df39e6058c9c8ea90dd10724a7366d
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-29T03:09:12Z

    [SPARK-7927] [MLLIB] Enforce whitespace for more tokens in style checker
    
    rxin
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #6481 from mengxr/mllib-scalastyle and squashes the following 
commits:
    
    3ca4d61 [Xiangrui Meng] revert scalastyle config
    30961ba [Xiangrui Meng] adjust spaces in mllib/test
    571b5c5 [Xiangrui Meng] fix spaces in mllib
    
    (cherry picked from commit 04616b1a2f5244710b07ecbb404384ded893292c)
    Signed-off-by: Reynold Xin <[email protected]>

commit 9b97e95e86f0d11e8ae3ba55432c726cec79d5bc
Author: Reynold Xin <[email protected]>
Date:   2015-05-29T03:10:21Z

    [SPARK-7927] whitespace fixes for SQL core.
    
    So we can enable a whitespace enforcement rule in the style checker to save 
code review time.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6477 from rxin/whitespace-sql-core and squashes the following 
commits:
    
    ce6e369 [Reynold Xin] Fixed tests.
    6095fed [Reynold Xin] [SPARK-7927] whitespace fixes for SQL core.
    
    (cherry picked from commit ff44c711abc7ca545dfa1e836279c00fe7539c18)
    Signed-off-by: Reynold Xin <[email protected]>

commit 142ae52d4800fdb966b14b8f0753ba7567c55204
Author: Reynold Xin <[email protected]>
Date:   2015-05-29T03:11:04Z

    [SPARK-7929] Remove Bagel examples & whitespace fix for examples.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6480 from rxin/whitespace-example and squashes the following 
commits:
    
    8a4a3d4 [Reynold Xin] [SPARK-7929] Remove Bagel examples & whitespace fix 
for examples.
    
    (cherry picked from commit 2881d14cbedc14f1cd8ae5078446dba1a8d39086)
    Signed-off-by: Reynold Xin <[email protected]>

commit 22e42e3fee21fc1adcb4a4fb515197be6e1a36b0
Author: Reynold Xin <[email protected]>
Date:   2015-05-29T03:11:57Z

    [SPARK-7927] whitespace fixes for Catalyst module.
    
    So we can enable a whitespace enforcement rule in the style checker to save 
code review time.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6476 from rxin/whitespace-catalyst and squashes the following 
commits:
    
    650409d [Reynold Xin] Fixed tests.
    51a9e5d [Reynold Xin] [SPARK-7927] whitespace fixes for Catalyst module.
    
    (cherry picked from commit 8da560d7de9b3c9a3e3ff197eeb10a3d7023f10d)
    Signed-off-by: Reynold Xin <[email protected]>
    
    Conflicts:
        sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala

commit e3dd2802f6dd8b2df9fb73d8e9901c4e6e4d6b84
Author: Reynold Xin <[email protected]>
Date:   2015-05-29T03:15:52Z

    [SPARK-7927] whitespace fixes for core.
    
    So we can enable a whitespace enforcement rule in the style checker to save 
code review time.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6473 from rxin/whitespace-core and squashes the following commits:
    
    058195d [Reynold Xin] Fixed tests.
    fce11e9 [Reynold Xin] [SPARK-7927] whitespace fixes for core.
    
    (cherry picked from commit 7f7505d8db7759ea46e904f767c23130eff1104a)
    Signed-off-by: Reynold Xin <[email protected]>

commit b3a590061da09674cb0ff868c808985ea846145e
Author: Reynold Xin <[email protected]>
Date:   2015-05-29T03:17:16Z

    [SPARK-7927] whitespace fixes for GraphX.
    
    So we can enable a whitespace enforcement rule in the style checker to save 
code review time.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #6474 from rxin/whitespace-graphx and squashes the following commits:
    
    4d3cd26 [Reynold Xin] Fixed tests.
    869dde4 [Reynold Xin] [SPARK-7927] whitespace fixes for GraphX.
    
    (cherry picked from commit b069ad23d9b6cbfb3a8bf245547add4816669075)
    Signed-off-by: Reynold Xin <[email protected]>

commit 6e99dd5d042e8a3e49937769a846bef8a66214f8
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-29T04:20:54Z

    [SPARK-7926] [PYSPARK] use the official Pyrolite release
    
    Switch to the official Pyrolite release from the one published under `org.spark-project`. Thanks irmen for making the releases on Maven Central. We didn't upgrade to 4.6 because we don't have enough time for QA. I excluded `serpent` from its dependencies because we don't use it in Spark.
    ~~~
    [info]   +-net.jpountz.lz4:lz4:1.3.0
    [info]   +-net.razorvine:pyrolite:4.4
    [info]   +-net.sf.py4j:py4j:0.8.2.1
    ~~~
    
    davies
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #6472 from mengxr/SPARK-7926 and squashes the following commits:
    
    7b3c6bf [Xiangrui Meng] use the official Pyrolite release
    
    (cherry picked from commit c45d58c143d68cb807186acc9d060daa8549dd5c)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 1d49d8c3fd297f7a6269693fbec623ddec96b279
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-29T04:26:43Z

    [MINOR] fix RegressionEvaluator doc
    
    `make clean html` under `python/doc` returns
    ~~~
    /Users/meng/src/spark/python/pyspark/ml/evaluation.py:docstring of 
pyspark.ml.evaluation.RegressionEvaluator.setParams:3: WARNING: Definition list 
ends without a blank line; unexpected unindent.
    ~~~
    
    harsha2010
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #6469 from mengxr/fix-regression-evaluator-doc and squashes the 
following commits:
    
    91e2dad [Xiangrui Meng] fix RegressionEvaluator doc
    
    (cherry picked from commit 834e699524583a7ebfe9e83b3900ec503150deca)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit aee046dfa111b4323edd5f4ccb36075449492952
Author: Kay Ousterhout <[email protected]>
Date:   2015-05-29T05:09:49Z

    [SPARK-7932] Fix misleading scheduler delay visualization
    
    The existing code rounds down to the nearest percent when computing the proportion of a task's time that was spent on each phase of execution, and then computes the scheduler delay proportion as 100 - sum(all other proportions). As a result, a few extra percent can end up in the scheduler delay. This commit eliminates the rounding so that the time visualizations correspond properly to the real times.
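    
    A toy Scala illustration of the rounding problem described above (the numbers are made up; this is not the UI code): with floor-to-percent, the leftover that gets labelled scheduler delay is inflated.
    
    ~~~
    val total  = 700.0                      // ms for the whole task
    val phases = Seq(233.0, 233.0, 233.0)   // ms spent in three phases (699 ms total)

    val floored = phases.map(t => math.floor(t / total * 100).toInt)  // List(33, 33, 33)
    val delayWithRounding = 100 - floored.sum   // 1 (%), i.e. ~7 ms blamed on scheduler delay

    val exact = phases.map(_ / total * 100)     // List(33.28..., 33.28..., 33.28...)
    val delayWithoutRounding = 100 - exact.sum  // ~0.14 %, matching the true 1 ms of delay
    ~~~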
    
    sarutak If you could take a look at this, that would be great! Not sure if there's a good reason to round here that I missed.
    
    cc shivaram
    
    Author: Kay Ousterhout <[email protected]>
    
    Closes #6484 from kayousterhout/SPARK-7932 and squashes the following 
commits:
    
    1723cc4 [Kay Ousterhout] [SPARK-7932] Fix misleading scheduler delay 
visualization
    
    (cherry picked from commit 04ddcd4db7801abefa9c9effe5d88413b29d713b)
    Signed-off-by: Kay Ousterhout <[email protected]>

commit f7cb272b7c77de42681287925922d41248efca46
Author: Tathagata Das <[email protected]>
Date:   2015-05-29T05:28:13Z

    [SPARK-7930] [CORE] [STREAMING] Fixed shutdown hook priorities
    
    The shutdown hook for temp directories had priority 100 while SparkContext's was 50, so the local root directory was deleted before SparkContext was shut down. This leads to scary errors for running jobs at shutdown time, and is especially a problem when running streaming examples, where Ctrl-C is the only way to shut down.
    
    The fix in this PR is to make the temp directory shutdown priority lower than SparkContext's, so that the temp dirs are the last thing to get deleted, after the SparkContext has been shut down. Also, the DiskBlockManager shutdown priority is changed from the default 100 to temp_dir_prio + 1, so that it gets invoked just before all temp dirs are cleared.
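    
    A generic Scala sketch of the ordering being relied on, not Spark's actual ShutdownHookManager (the priority constants are illustrative; only the relative ordering matches the description above): hooks run from highest priority to lowest, so giving temp-dir cleanup a lower priority than SparkContext's hook makes the temp dirs the last thing deleted.
    
    ~~~
    import scala.collection.mutable

    object PriorityShutdownHooks {
      private val hooks = mutable.ArrayBuffer.empty[(Int, () => Unit)]

      def add(priority: Int)(body: => Unit): Unit = synchronized {
        hooks += ((priority, () => body))
      }

      Runtime.getRuntime.addShutdownHook(new Thread {
        override def run(): Unit =
          hooks.sortBy(-_._1).foreach { case (_, hook) => hook() }  // highest priority first
      })
    }

    object ShutdownOrderingDemo extends App {
      val SparkContextPriority = 50
      val TempDirPriority      = 25   // lower than SparkContext, so it runs last

      PriorityShutdownHooks.add(SparkContextPriority) { println("stop SparkContext") }
      PriorityShutdownHooks.add(TempDirPriority + 1)  { println("DiskBlockManager cleanup") }
      PriorityShutdownHooks.add(TempDirPriority)      { println("delete temp dirs") }
    }
    ~~~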
    
    Author: Tathagata Das <[email protected]>
    
    Closes #6482 from tdas/SPARK-7930 and squashes the following commits:
    
    d7cbeb5 [Tathagata Das] Removed unnecessary line
    1514d0b [Tathagata Das] Fixed shutdown hook priorities
    
    (cherry picked from commit cd3d9a5c0c3e77098a72c85dffe4a27737009ae7)
    Signed-off-by: Patrick Wendell <[email protected]>

commit 68559423ac2ffc2c9dfcbe95a8efa4868757c4bf
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-29T05:38:38Z

    [SPARK-7922] [MLLIB] use DataFrames for user/item factors in ALSModel
    
    Expose user/item factors in DataFrames. This is to be more consistent with the pipeline API. It also helps maintain consistent APIs across languages. This PR also removed fitting params from `ALSModel`.
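    
    A brief usage sketch of the change (assuming the Spark 1.4 `ml.recommendation` API; `ratings` is a hypothetical DataFrame with userId/itemId/rating columns): the fitted model exposes its factors as DataFrames.
    
    ~~~
    import org.apache.spark.ml.recommendation.ALS

    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")

    val model = als.fit(ratings)    // ratings is a hypothetical DataFrame

    model.userFactors.show()        // DataFrame of (id, features)
    model.itemFactors.show()        // DataFrame of (id, features)
    ~~~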
    
    coderxiang
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #6468 from mengxr/SPARK-7922 and squashes the following commits:
    
    7bfb1d5 [Xiangrui Meng] update ALSModel in PySpark
    1ba5607 [Xiangrui Meng] use DataFrames for user/item factors in ALS
    
    (cherry picked from commit db9513789756da4f16bb1fe8cf1d19500f231f54)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 7a52fdf25f8d635ba05796abb0c491454d7869cf
Author: Tathagata Das <[email protected]>
Date:   2015-05-29T05:39:21Z

    [SPARK-7931] [STREAMING] Do not restart receiver when stopped
    
    Attempts to restart the socket receiver when it is supposed to be stopped cause undesirable error messages.
    
    Author: Tathagata Das <[email protected]>
    
    Closes #6483 from tdas/SPARK-7931 and squashes the following commits:
    
    09aeee1 [Tathagata Das] Do not restart receiver when stopped

----

