[GitHub] spark pull request: Branch 1.5

bigballofmud Tue, 04 Aug 2015 10:19:58 -0700

GitHub user bigballofmud opened a pull request:

    https://github.com/apache/spark/pull/7938


    Branch 1.5

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7938.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7938
    
----
commit 4de833e9e81415832c0556d8f1b9e3c3ae48cafa
Author: Joseph Batchik <[email protected]>
Date:   2015-08-03T18:17:38Z

    [SPARK-9511] [SQL] Fixed Table Name Parsing
    
    The issue was that the tokenizer was parsing "1one" into the numeric 1
    using the code on line 110. I added another case to accept strings that
    start with a number and also contain a letter somewhere after it.
    
    Author: Joseph Batchik <[email protected]>
    
    Closes #7844 from JDrit/parse_error and squashes the following commits:
    
    b8ca12f [Joseph Batchik] fixed parsing issue by adding another case
    
    (cherry picked from commit dfe7bd168d9bcf8c53f993f459ab473d893457b0)
    Signed-off-by: Michael Armbrust <[email protected]>
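
The tokenizer problem described in SPARK-9511 can be illustrated with a small sketch. This is not the Spark lexer (which is written in Scala); the patterns and the `next_token` helper below are hypothetical and only show why an extra case for "digits followed by a letter" has to be tried before the plain numeric case.

```python
import re

# Hypothetical token patterns, for illustration only.
# The second alternative is the "extra case": digits followed by at least one
# letter (or underscore), so that "1one" is treated as a single identifier.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+[A-Za-z_][A-Za-z0-9_]*")
NUMBER = re.compile(r"[0-9]+")


def next_token(text, pos=0):
    """Return (kind, lexeme) for the token that starts at position pos."""
    # Try the identifier pattern first; otherwise "1one" would be cut at "1".
    match = IDENTIFIER.match(text, pos)
    if match:
        return ("identifier", match.group())
    match = NUMBER.match(text, pos)
    if match:
        return ("number", match.group())
    raise ValueError("unexpected character at position %d" % pos)
```

With this ordering, `next_token("1one")` yields an identifier token, while a purely numeric string such as `"123"` still falls through to the numeric case.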

commit 5452e93f03bc308282cb8f189f65bb1b258d8813
Author: Reynold Xin <[email protected]>
Date:   2015-08-03T18:22:02Z

    [SQL][minor] Simplify UnsafeRow.calculateBitSetWidthInBytes.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #7897 from rxin/calculateBitSetWidthInBytes and squashes the 
following commits:
    
    2e73b3a [Reynold Xin] [SQL][minor] Simplify 
UnsafeRow.calculateBitSetWidthInBytes.
    
    (cherry picked from commit 7a9d09f0bb472a1671d3457e1f7108f4c2eb4121)
    Signed-off-by: Reynold Xin <[email protected]>

commit 6d46e9b7c8ffde5d3cc3d86b005c40c51934e56b
Author: Cheng Lian <[email protected]>
Date:   2015-08-03T19:06:58Z

    [SPARK-9554] [SQL] Enables in-memory partition pruning by default
    
    Author: Cheng Lian <[email protected]>
    
    Closes #7895 from liancheng/spark-9554/enable-in-memory-partition-pruning 
and squashes the following commits:
    
    67c403e [Cheng Lian] Enables in-memory partition pruning by default
    
    (cherry picked from commit 703e44bff19f4c394f6f9bff1ce9152cdc68c51e)
    Signed-off-by: Reynold Xin <[email protected]>

commit b3117d312332af3b4bd416857f632cacb5230feb
Author: Joseph K. Bradley <[email protected]>
Date:   2015-08-03T19:17:46Z

    [SPARK-5133] [ML] Added featureImportance to RandomForestClassifier and 
Regressor
    
    Added featureImportance to RandomForestClassifier and Regressor.
    
    This follows the scikit-learn implementation here: 
[https://github.com/scikit-learn/scikit-learn/blob/a95203b249c1cf392f86d001ad999e29b2392739/sklearn/tree/_tree.pyx#L3341]
    
    CC: yanboliang  Would you mind taking a look?  Thanks!
    
    Author: Joseph K. Bradley <[email protected]>
    Author: Feynman Liang <[email protected]>
    
    Closes #7838 from jkbradley/dt-feature-importance and squashes the 
following commits:
    
    72a167a [Joseph K. Bradley] fixed unit test
    86cea5f [Joseph K. Bradley] Modified RF featuresImportances to return 
Vector instead of Map
    5aa74f0 [Joseph K. Bradley] finally fixed unit test for real
    33df5db [Joseph K. Bradley] fix unit test
    42a2d3b [Joseph K. Bradley] fix unit test
    fe94e72 [Joseph K. Bradley] modified feature importance unit tests
    cc693ee [Feynman Liang] Add classifier tests
    79a6f87 [Feynman Liang] Compare dense vectors in test
    21d01fc [Feynman Liang] Added failing SKLearn test
    ac0b254 [Joseph K. Bradley] Added featureImportance to 
RandomForestClassifier/Regressor.  Need to add unit tests
    
    (cherry picked from commit ff9169a002f1b75231fd25b7d04157a912503038)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 444058d9158d426ae455208f07bf9c202e8f9925
Author: Kousuke Saruta <[email protected]>
Date:   2015-08-03T19:53:44Z

    [SPARK-9558][DOCS]Update docs to follow the increase of memory defaults.
    
    Now the memory defaults of the master and slave in Standalone mode and of
    the History Server are 1g, not 512m, so let's update the docs.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #7896 from sarutak/update-doc-for-daemon-memory and squashes the 
following commits:
    
    a77626c [Kousuke Saruta] Fix docs to follow the update of increase of 
memory defaults
    
    (cherry picked from commit ba1c4e138de2ea84b55def4eed2bd363e60aea4d)
    Signed-off-by: Reynold Xin <[email protected]>

commit dc0c8c982825c3c58b7c6c4570c03ba97dba608b
Author: Xiangrui Meng <[email protected]>
Date:   2015-08-03T20:59:35Z

    [SPARK-9544] [MLLIB] add Python API for RFormula
    
    Add Python API for RFormula. Similar to other feature transformers in 
Python. This is just a thin wrapper over the Scala implementation. ericl 
MechCoder
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #7879 from mengxr/SPARK-9544 and squashes the following commits:
    
    3d5ff03 [Xiangrui Meng] add an doctest for . and -
    5e969a5 [Xiangrui Meng] fix pydoc
    1cd41f8 [Xiangrui Meng] organize imports
    3c18b10 [Xiangrui Meng] add Python API for RFormula
    
    (cherry picked from commit e4765a46833baff1dd7465c4cf50e947de7e8f21)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit e7329ab31323a89d1e07c808927e5543876e3ce3
Author: Yanbo Liang <[email protected]>
Date:   2015-08-03T20:58:00Z

    [SPARK-9191] [ML] [Doc] Add ml.PCA user guide and code examples
    
    Add ml.PCA user guide document and code examples for Scala/Java/Python.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #7522 from yanboliang/ml-pca-md and squashes the following commits:
    
    60dec05 [Yanbo Liang] address comments
    f992abe [Yanbo Liang] Add ml.PCA doc and examples
    
    (cherry picked from commit 8ca287ebbd58985a568341b08040d0efa9d3641a)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 29756ff11c7bea73436153f37af631cbe5e58250
Author: Andrew Or <[email protected]>
Date:   2015-08-03T21:22:07Z

    [SPARK-8735] [SQL] Expose memory usage for shuffles, joins and aggregations
    
    This patch exposes the memory used by internal data structures on the
    SparkUI. This tracks memory used by all spilling operations and SQL operators
    backed by Tungsten, e.g. `BroadcastHashJoin`, `ExternalSort`,
    `GeneratedAggregate` etc. The metric exposed is "peak execution memory", which
    broadly refers to the peak in-memory sizes of each of these data structures.
    
    A separate patch will extend this by linking the new information to the SQL 
operators themselves.
    
    <img width="950" alt="screen shot 2015-07-29 at 7 43 17 pm" src="https://cloud.githubusercontent.com/assets/2133137/8974776/b90fc980-362a-11e5-9e2b-842da75b1641.png">
    <img width="802" alt="screen shot 2015-07-29 at 7 43 05 pm" src="https://cloud.githubusercontent.com/assets/2133137/8974777/baa76492-362a-11e5-9b77-e364a6a6b64e.png">
    
    <!-- Reviewable:start -->
    [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/7770)
    <!-- Reviewable:end -->
    
    Author: Andrew Or <[email protected]>
    
    Closes #7770 from andrewor14/expose-memory-metrics and squashes the 
following commits:
    
    9abecb9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    f5b0d68 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    d7df332 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    8eefbc5 [Andrew Or] Fix non-failing tests
    9de2a12 [Andrew Or] Fix tests due to another logical merge conflict
    876bfa4 [Andrew Or] Fix failing test after logical merge conflict
    361a359 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    40b4802 [Andrew Or] Fix style?
    d0fef87 [Andrew Or] Fix tests?
    b3b92f6 [Andrew Or] Address comments
    0625d73 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    c00a197 [Andrew Or] Fix potential NPEs
    10da1cd [Andrew Or] Fix compile
    17f4c2d [Andrew Or] Fix compile?
    a87b4d0 [Andrew Or] Fix compile?
    d70874d [Andrew Or] Fix test compile + address comments
    2840b7d [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    6aa2f7a [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    b889a68 [Andrew Or] Minor changes: comments, spacing, style
    663a303 [Andrew Or] UnsafeShuffleWriter: update peak memory before close
    d090a94 [Andrew Or] Fix style
    2480d84 [Andrew Or] Expand test coverage
    5f1235b [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    1ecf678 [Andrew Or] Minor changes: comments, style, unused imports
    0b6926c [Andrew Or] Oops
    111a05e [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    a7a39a5 [Andrew Or] Strengthen presence check for accumulator
    a919eb7 [Andrew Or] Add tests for unsafe shuffle writer
    23c845d [Andrew Or] Add tests for SQL operators
    a757550 [Andrew Or] Address comments
    b5c51c1 [Andrew Or] Re-enable test in JavaAPISuite
    5107691 [Andrew Or] Add tests for internal accumulators
    59231e4 [Andrew Or] Fix tests
    9528d09 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    5b5e6f3 [Andrew Or] Add peak execution memory to summary table + tooltip
    92b4b6b [Andrew Or] Display peak execution memory on the UI
    eee5437 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    d9b9015 [Andrew Or] Track execution memory in unsafe shuffles
    770ee54 [Andrew Or] Track execution memory in broadcast joins
    9c605a4 [Andrew Or] Track execution memory in GeneratedAggregate
    9e824f2 [Andrew Or] Add back execution memory tracking for *ExternalSort
    4ef4cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
expose-memory-metrics
    e6c3e2f [Andrew Or] Move internal accumulators creation to Stage
    a417592 [Andrew Or] Expose memory metrics in UnsafeExternalSorter
    3c4f042 [Andrew Or] Track memory usage in ExternalAppendOnlyMap / 
ExternalSorter
    bd7ab3f [Andrew Or] Add internal accumulators to TaskContext
    
    (cherry picked from commit 702aa9d7fb16c98a50e046edfd76b8a7861d0391)
    Signed-off-by: Josh Rosen <[email protected]>

commit db5832708267f4a8413b0ad19c6a454c93f7800e
Author: Reynold Xin <[email protected]>
Date:   2015-08-03T21:51:36Z

    Revert "[SPARK-9372] [SQL] Filter nulls in join keys"
    
    This reverts commit 687c8c37150f4c93f8e57d86bb56321a4891286b.

commit 6bd12e819451dbec602f1f2bbfc4c4bebc881e72
Author: Steve Loughran <[email protected]>
Date:   2015-08-03T22:24:34Z

    [SPARK-8064] [SQL] Build against Hive 1.2.1
    
    Cherry picked the parts of the initial SPARK-8064 WiP branch needed to get 
sql/hive to compile against hive 1.2.1. That's the ASF release packaged under 
org.apache.hive, not any fork.
    
    Tests not run yet: that's what the machines are for
    
    Author: Steve Loughran <[email protected]>
    Author: Cheng Lian <[email protected]>
    Author: Michael Armbrust <[email protected]>
    Author: Patrick Wendell <[email protected]>
    
    Closes #7191 from steveloughran/stevel/feature/SPARK-8064-hive-1.2-002 and 
squashes the following commits:
    
    7556d85 [Cheng Lian] Updates .q files and corresponding golden files
    ef4af62 [Steve Loughran] Merge commit 
'6a92bb09f46a04d6cd8c41bdba3ecb727ebb9030' into 
stevel/feature/SPARK-8064-hive-1.2-002
    6a92bb0 [Cheng Lian] Overrides HiveConf time vars
    dcbb391 [Cheng Lian] Adds com.twitter:parquet-hadoop-bundle:1.6.0 for Hive 
Parquet SerDe
    0bbe475 [Steve Loughran] SPARK-8064 scalastyle rejects the standard Hadoop 
ASF license header...
    fdf759b [Steve Loughran] SPARK-8064 classpath dependency suite to be in 
sync with shading in final (?) hive-exec spark
    7a6c727 [Steve Loughran] SPARK-8064 switch to second staging repo of the 
spark-hive artifacts. This one has the protobuf-shaded hive-exec jar
    376c003 [Steve Loughran] SPARK-8064 purge duplicate protobuf declaration
    2c74697 [Steve Loughran] SPARK-8064 switch to the protobuf shaded hive-exec 
jar with tests to chase it down
    cc44020 [Steve Loughran] SPARK-8064 remove hadoop.version from runtest.py, 
as profile will fix that automatically.
    6901fa9 [Steve Loughran] SPARK-8064 explicit protobuf import
    da310dc [Michael Armbrust] Fixes for Hive tests.
    a775a75 [Steve Loughran] SPARK-8064 cherry-pick-incomplete
    7404f34 [Patrick Wendell] Add spark-hive staging repo
    832c164 [Steve Loughran] SPARK-8064 try to supress compiler warnings on 
Complex.java pasted-thrift-code
    312c0d4 [Steve Loughran] SPARK-8064  maven/ivy dependency purge; calcite 
declaration needed
    fa5ae7b [Steve Loughran] HIVE-8064 fix up hive-thriftserver dependencies 
and cut back on evicted references in the hive- packages; this keeps mvn and 
ivy resolution compatible, as the reconciliation policy is "by hand"
    c188048 [Steve Loughran] SPARK-8064 manage the Hive depencencies to that 
-things that aren't needed are excluded -sql/hive built with ivy is in sync 
with the maven reconciliation policy, rather than latest-first
    4c8be8d [Cheng Lian] WIP: Partial fix for Thrift server and CLI tests
    314eb3c [Steve Loughran] SPARK-8064 deprecation warning  noise in one of 
the tests
    17b0341 [Steve Loughran] SPARK-8064 IDE-hinted cleanups of Complex.java to 
reduce compiler warnings. It's all autogenerated code, so still ugly.
    d029b92 [Steve Loughran] SPARK-8064 rely on unescaping to have already 
taken place, so go straight to map of serde options
    23eca7e [Steve Loughran] HIVE-8064 handle raw and escaped property tokens
    54d9b06 [Steve Loughran] SPARK-8064 fix compilation regression surfacing 
from rebase
    0b12d5f [Steve Loughran] HIVE-8064 use subset of hive complex type whose 
types deserialize
    fce73b6 [Steve Loughran] SPARK-8064 poms rely implicitly on the version of 
kryo chill provides
    fd3aa5d [Steve Loughran] SPARK-8064 version of hive to d/l from ivy is 1.2.1
    dc73ece [Steve Loughran] SPARK-8064 revert to master's determinstic 
pushdown strategy
    d3c1e4a [Steve Loughran] SPARK-8064 purge UnionType
    051cc21 [Steve Loughran] SPARK-8064 switch to an unshaded version of 
hive-exec-core, which must have been built with Kryo 2.21. This currently looks 
for a (locally built) version 1.2.1.spark
    6684c60 [Steve Loughran] SPARK-8064 ignore RTE raised in blocking 
process.exitValue() call
    e6121e5 [Steve Loughran] SPARK-8064 address review comments
    aa43dc6 [Steve Loughran] SPARK-8064  more robust teardown on 
JavaMetastoreDatasourcesSuite
    f2bff01 [Steve Loughran] SPARK-8064 better takeup of asynchronously caught 
error text
    8b1ef38 [Steve Loughran] SPARK-8064: on failures executing spark-submit in 
HiveSparkSubmitSuite, print command line and all logged output.
    5a9ce6b [Steve Loughran] SPARK-8064 add explicit reason for kv split 
failure, rather than array OOB. *does not address the issue*
    642b63a [Steve Loughran] SPARK-8064 reinstate something cut briefly during 
rebasing
    97194dc [Steve Loughran] SPARK-8064 add extra logging to the 
YarnClusterSuite classpath test. There should be no reason why this is failing 
on jenkins, but as it is (and presumably its CP-related), improve the logging 
including any exception raised.
    335357f [Steve Loughran] SPARK-8064 fail fast on thrive process spawning 
tests on exit codes and/or error string patterns seen in log.
    3ed872f [Steve Loughran] SPARK-8064 rename field double to  dbl
    bca55e5 [Steve Loughran] SPARK-8064 missed one of the `date` escapes
    41d6479 [Steve Loughran] SPARK-8064 wrap tests with withTable() calls to 
avoid table-exists exceptions
    2bc29a4 [Steve Loughran] SPARK-8064 ParquetSuites to escape `date` field 
name
    1ab9bc4 [Steve Loughran] SPARK-8064 TestHive to use 
sered2.thrift.test.Complex
    bf3a249 [Steve Loughran] SPARK-8064: more resubmit than fix; tighten 
startup timeout to 60s. Still no obvious reason why jersey server code in 
spark-assembly isn't being picked up -it hasn't been shaded
    c829b8f [Steve Loughran] SPARK-8064: reinstate yarn-rm-server dependencies 
to hive-exec to ensure that jersey server is on classpath on hadoop versions < 
2.6
    0b0f738 [Steve Loughran] SPARK-8064: thrift server startup to fail fast on 
any exception in the main thread
    13abaf1 [Steve Loughran] SPARK-8064 Hive compatibilty tests sin sync with 
explain/show output from Hive 1.2.1
    d14d5ea [Steve Loughran] SPARK-8064: DATE is now a predicate; you can't use 
it as a field in select ops
    26eef1c [Steve Loughran] SPARK-8064: HIVE-9039 renamed TOK_UNION => 
TOK_UNIONALL while adding TOK_UNIONDISTINCT
    3d64523 [Steve Loughran] SPARK-8064 improve diagns on uknown token; fix 
scalastyle failure
    d0360f6 [Steve Loughran] SPARK-8064: delicate merge in of the branch 
vanzin/hive-1.1
    1126e5a [Steve Loughran] SPARK-8064: name of unrecognized file format 
wasn't appearing in error text
    8cb09c4 [Steve Loughran] SPARK-8064: test resilience/assertion 
improvements. Independent of the rest of the work; can be backported to earlier 
versions
    dec12cb [Steve Loughran] SPARK-8064: when a CLI suite test fails include 
the full output text in the raised exception; this ensures that the 
stdout/stderr is included in jenkins reports, so it becomes possible to 
diagnose the cause.
    463a670 [Steve Loughran] SPARK-8064 run-tests.py adds a hadoop-2.6 profile, 
and changes info messages to say "w/Hive 1.2.1" in console output
    2531099 [Steve Loughran] SPARK-8064 successful attempt to get rid of 
pentaho as a transitive dependency of hive-exec
    1d59100 [Steve Loughran] SPARK-8064 (unsuccessful) attempt to get rid of 
pentaho as a transitive dependency of hive-exec
    75733fc [Steve Loughran] SPARK-8064 change thrift binary startup message to 
"Starting ThriftBinaryCLIService on port"
    3ebc279 [Steve Loughran] SPARK-8064 move strings used to check for http/bin 
thrift services up into constants
    c80979d [Steve Loughran] SPARK-8064: SparkSQLCLIDriver drops remote mode 
support. CLISuite Tests pass instead of timing out: undetected regression?
    27e8370 [Steve Loughran] SPARK-8064 fix some style & IDE warnings
    00e50d6 [Steve Loughran] SPARK-8064 stop excluding hive shims from 
dependency (commented out , for now)
    cb4f142 [Steve Loughran] SPARK-8054 cut pentaho dependency from calcite
    f7aa9cb [Steve Loughran] SPARK-8064 everything compiles with some 
commenting and moving of classes into a hive package
    6c310b4 [Steve Loughran] SPARK-8064 subclass  Hive ServerOptionsProcessor 
to make it public again
    f61a675 [Steve Loughran] SPARK-8064 thrift server switched to Hive 1.2.1, 
though it doesn't compile everywhere
    4890b9d [Steve Loughran] SPARK-8064, build against Hive 1.2.1
    
    (cherry picked from commit a2409d1c8e8ddec04b529ac6f6a12b5993f0eeda)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 35264204b8e06c37ca99dd5c769aac20bdab161b
Author: Patrick Wendell <[email protected]>
Date:   2015-08-03T23:37:27Z

    Preparing Spark release v1.5.0-snapshot-20150803

commit 73fab8849f6288f36101f52d663a6e7339b6576e
Author: Patrick Wendell <[email protected]>
Date:   2015-08-03T23:37:34Z

    Preparing development version 1.5.0-SNAPSHOT

commit acda9d9546fa3f54676e48d76a2b66016d204074
Author: MechCoder <[email protected]>
Date:   2015-08-03T23:44:25Z

    [SPARK-8874] [ML] Add missing methods in Word2Vec
    
    Add missing methods
    
    1. getVectors
    2. findSynonyms
    
    to W2Vec scala and python API
    
    mengxr
    
    Author: MechCoder <[email protected]>
    
    Closes #7263 from MechCoder/missing_methods_w2vec and squashes the 
following commits:
    
    149d5ca [MechCoder] minor doc
    69d91b7 [MechCoder] [SPARK-8874] [ML] Add missing methods in Word2Vec
    
    (cherry picked from commit 13675c742a71cbdc8324701c3694775ce1dd5c62)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 4c4f638c7333b44049c75ae34486148ab74db333
Author: Patrick Wendell <[email protected]>
Date:   2015-08-03T23:54:50Z

    Preparing Spark release v1.5.0-snapshot-20150803

commit bc49ca468d3abe4949382a32de92f963f454d36a
Author: Patrick Wendell <[email protected]>
Date:   2015-08-03T23:54:56Z

    Preparing development version 1.5.0-SNAPSHOT

commit 7e7147f3b8fee3ac4f2f1d14c3e6776a4d76bb3a
Author: Patrick Wendell <[email protected]>
Date:   2015-08-03T23:59:13Z

    Preparing Spark release v1.5.0-snapshot-20150803

commit 74792e71cb0584637041cb81660ec3ac4ea10c0b
Author: Patrick Wendell <[email protected]>
Date:   2015-08-03T23:59:19Z

    Preparing development version 1.5.0-SNAPSHOT

commit 73c863ac8e8f6cf664f51c64da1da695f341b273
Author: Matthew Brandyberry <[email protected]>
Date:   2015-08-04T00:36:56Z

    [SPARK-9483] Fix UTF8String.getPrefix for big-endian.
    
    Previous code assumed little-endian.
    
    Author: Matthew Brandyberry <[email protected]>
    
    Closes #7902 from mtbrandy/SPARK-9483 and squashes the following commits:
    
    ec31df8 [Matthew Brandyberry] [SPARK-9483] Changes from review comments.
    17d54c6 [Matthew Brandyberry] [SPARK-9483] Fix UTF8String.getPrefix for 
big-endian.
    
    (cherry picked from commit b79b4f5f2251ed7efeec1f4b26e45a8ea6b85a6a)
    Signed-off-by: Davies Liu <[email protected]>
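
Why byte order matters for SPARK-9483 can be shown with a short sketch. It is not the Spark code; it assumes the prefix is meant to be an integer whose ordering mirrors the byte-wise ordering of the string, and the function names are hypothetical.

```python
def sort_prefix(data: bytes) -> int:
    """Pack the first 8 bytes so that integer order matches byte-wise order."""
    padded = data[:8].ljust(8, b"\x00")
    # Explicit big-endian packing puts the first byte in the most significant
    # position, independent of the byte order of the host machine.
    return int.from_bytes(padded, "big")


def raw_prefix(data: bytes, byteorder: str) -> int:
    """Interpret the first 8 bytes the way a raw word read would on a host
    with the given byte order ("little" or "big")."""
    padded = data[:8].ljust(8, b"\x00")
    return int.from_bytes(padded, byteorder)
```

Under these assumptions, `sort_prefix(b"ab") < sort_prefix(b"b")` holds, matching byte-wise order, whereas `raw_prefix(b"ab", "little") > raw_prefix(b"b", "little")`. A raw word read therefore yields different prefixes on little-endian and big-endian hosts, so code written with one byte order in mind gives wrong results on the other.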

commit 34335719a372c1951fdb4dd25b75b086faf1076f
Author: Burak Yavuz <[email protected]>
Date:   2015-08-04T00:42:03Z

    [SPARK-9263] Added flags to exclude dependencies when using --packages
    
    While the functionality is there to exclude packages, there are no flags 
that allow users to exclude dependencies, in case of dependency conflicts. We 
should provide users with a flag to add dependency exclusions in case the 
packages are not resolved properly (or not available due to licensing).
    
    The flag I added was --packages-exclude, but I'm open on renaming it. I 
also added property flags in case people would like to use a conf file to 
provide dependencies, which is possible if there is a long list of dependencies 
or exclusions.
    
    cc andrewor14 vanzin pwendell
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #7599 from brkyvz/packages-exclusions and squashes the following 
commits:
    
    636f410 [Burak Yavuz] addressed nits
    6e54ede [Burak Yavuz] is this the culprit
    b5e508e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into 
packages-exclusions
    154f5db [Burak Yavuz] addressed initial comments
    1536d7a [Burak Yavuz] Added flags to exclude packages using 
--packages-exclude
    
    (cherry picked from commit 1633d0a2612d94151f620c919425026150e69ae1)
    Signed-off-by: Marcelo Vanzin <[email protected]>

commit 93076ae39b58ba8c4a459f2b3a8590c492dc5c4e
Author: CodingCat <[email protected]>
Date:   2015-08-04T01:20:40Z

    [SPARK-8416] highlight and topping the executor threads in thread dumping 
page
    
    https://issues.apache.org/jira/browse/SPARK-8416
    
    To facilitate debugging, I made this patch with three changes:
    
    * render the executor-thread and non executor-thread entries with different 
background colors
    
    * put the executor threads on the top of the list
    
    * sort the threads alphabetically
    
    Author: CodingCat <[email protected]>
    
    Closes #7808 from CodingCat/SPARK-8416 and squashes the following commits:
    
    34fc708 [CodingCat] fix className
    d7b79dd [CodingCat] lowercase threadName
    d032882 [CodingCat] sort alphabetically and change the css class name
    f0513b1 [CodingCat] change the color & group threads by name
    2da6e06 [CodingCat] small fix
    3fc9f36 [CodingCat] define classes in webui.css
    8ee125e [CodingCat] highlight and put on top the executor threads in thread 
dumping page
    
    (cherry picked from commit 3b0e44490aebfba30afc147e4a34a63439d985c6)
    Signed-off-by: Josh Rosen <[email protected]>
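
The ordering described for the thread dump page (executor threads first, then alphabetical) can be sketched as a single sort with a two-part key. The marker string and the function name are assumptions made for illustration; the actual UI code is Scala and may identify executor threads differently.

```python
def order_threads(thread_names, executor_marker="Executor task launch"):
    """Sort thread names with executor threads on top, then alphabetically.

    executor_marker is a hypothetical substring used here to recognise
    executor threads.
    """
    return sorted(
        thread_names,
        # False sorts before True, so names containing the marker come first;
        # ties are broken by the lowercased name.
        key=lambda name: (executor_marker not in name, name.lower()),
    )
```

For example, given `["main", "Executor task launch worker-1", "dispatcher", "Executor task launch worker-0"]`, the two executor threads come first in name order, followed by `dispatcher` and `main`.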

commit ebe42b98c8fa0cac6ec267e895402cebe8a670a9
Author: Reynold Xin <[email protected]>
Date:   2015-08-04T01:47:02Z

    [SPARK-9577][SQL] Surface concrete iterator types in various sort classes.
    
    We often return abstract iterator types in various sort-related classes 
(e.g. UnsafeKVExternalSorter). It is actually better to return a more concrete 
type, so the callsite uses that type and JIT can inline the iterator calls.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #7911 from rxin/surface-concrete-type and squashes the following 
commits:
    
    0422add [Reynold Xin] [SPARK-9577][SQL] Surface concrete iterator types in 
various sort classes.
    
    (cherry picked from commit 5eb89f67e323dcf9fa3d5b30f9b5cb8f10ca1e8c)
    Signed-off-by: Reynold Xin <[email protected]>

commit 1f7dbcd6fdeee22c7b670ea98dcb4e794f84a8cd
Author: Sean Owen <[email protected]>
Date:   2015-08-04T04:48:22Z

    [SPARK-9521] [DOCS] Addendum. Require Maven 3.3.3+ in the build
    
    Follow on for #7852: Building Spark doc needs to refer to new Maven 
requirement too
    
    Author: Sean Owen <[email protected]>
    
    Closes #7905 from srowen/SPARK-9521.2 and squashes the following commits:
    
    73285df [Sean Owen] Follow on for #7852: Building Spark doc needs to refer 
to new Maven requirement too
    
    (cherry picked from commit 0afa6fbf525723e97c6dacfdba3ad1762637ffa9)
    Signed-off-by: Sean Owen <[email protected]>

commit 29f2d5a065254e7ed44fb204a1deecf9d44d338c
Author: Ankur Dave <[email protected]>
Date:   2015-08-04T06:07:32Z

    [SPARK-3190] [GRAPHX] Fix VertexRDD.count() overflow regression
    
    SPARK-3190 was originally fixed by 
96df92906978c5f58e0cc8ff5eebe5b35a08be3b, but 
a5ef58113667ff73562ce6db381cff96a0b354b0 introduced a regression during 
refactoring. This commit fixes the regression.
    
    Author: Ankur Dave <[email protected]>
    
    Closes #7923 from ankurdave/SPARK-3190-reopening and squashes the following 
commits:
    
    a3e1b23 [Ankur Dave] Fix VertexRDD.count() overflow regression
    
    (cherry picked from commit 9e952ecbce670e9b532a1c664a4d03b66e404112)
    Signed-off-by: Reynold Xin <[email protected]>
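
The commit message only says that `VertexRDD.count()` overflowed. The sketch below assumes the usual cause, namely per-partition sizes being summed in a 32-bit signed integer, and emulates that wraparound in Python. The helper names are hypothetical and this is not the GraphX code.

```python
def wrap_int32(value):
    """Emulate two's-complement wraparound of a 32-bit signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def count_with_int32(partition_sizes):
    """Sum partition sizes in a simulated 32-bit accumulator (overflows)."""
    total = 0
    for size in partition_sizes:
        total = wrap_int32(total + size)
    return total


def count_with_wide_ints(partition_sizes):
    """Sum partition sizes without a 32-bit limit (the safe variant)."""
    return sum(partition_sizes)
```

With two partitions of 1,500,000,000 vertices each, the 32-bit accumulator wraps to a negative number, while the wide sum gives 3,000,000,000. Widening each partition size before summing is the usual way to avoid this.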

commit 5ae675360d883483e509788b8867c1c98b4820fd
Author: Sean Owen <[email protected]>
Date:   2015-08-04T11:02:26Z

    [SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of 
build warnings, 1.5.0 edition
    
    Enable most javac lint warnings; fix a lot of build warnings. In a few 
cases, touch up surrounding code in the process.
    
    I'll explain several of the changes inline in comments.
    
    Author: Sean Owen <[email protected]>
    
    Closes #7862 from srowen/SPARK-9534 and squashes the following commits:
    
    ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build 
warnings. In a few cases, touch up surrounding code in the process.
    
    (cherry picked from commit 76d74090d60f74412bd45487e8db6aff2e8343a2)
    Signed-off-by: Sean Owen <[email protected]>

commit bd9b7521343c34c42be40ee05a01c8a976ed2307
Author: tedyu <[email protected]>
Date:   2015-08-04T11:22:53Z

    [SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was 
accidentally reverted
    
    This PR removes the dependency reduced POM hack brought back by #7191
    
    Author: tedyu <[email protected]>
    
    Closes #7919 from tedyu/master and squashes the following commits:
    
    1bfbd7b [tedyu] [BUILD] Remove dependency reduced POM hack
    
    (cherry picked from commit b211cbc7369af5eb2cb65d93c4c57c4db7143f47)
    Signed-off-by: Sean Owen <[email protected]>

commit 45c8d2bb872bb905a402cf3aa78b1c4efaac07cf
Author: Carson Wang <[email protected]>
Date:   2015-08-04T13:12:30Z

    [SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page
    
    Add pagination for the RDD page to avoid unresponsive UI when the number of 
the RDD partitions is large.
    Before:
    
![rddpagebefore](https://cloud.githubusercontent.com/assets/9278199/8951533/3d9add54-3601-11e5-99d0-5653b473c49b.png)
    After:
    
![rddpageafter](https://cloud.githubusercontent.com/assets/9278199/8951536/439d66e0-3601-11e5-9cee-1b380fe6620d.png)
    
    Author: Carson Wang <[email protected]>
    
    Closes #7692 from carsonwang/SPARK-2016 and squashes the following commits:
    
    03c7168 [Carson Wang] Fix style issues
    612c18c [Carson Wang] RDD partition table pagination for the RDD Page
    
    (cherry picked from commit cb7fa0aa93dae5a25a8e7e387dbd6b55a5a23fb0)
    Signed-off-by: Kousuke Saruta <[email protected]>
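
The idea behind paginating the partition table is to render only one slice of the rows per request. A minimal sketch follows, with a hypothetical `paginate` helper and 1-based page numbers as an assumption; the real page is rendered by Spark's Scala web UI code.

```python
def paginate(rows, page, page_size):
    """Return the rows belonging to the given 1-based page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    # Slicing past the end simply yields a shorter (or empty) list.
    return rows[start:start + page_size]
```

Only `page_size` rows are materialised for display, so the cost of rendering no longer grows with the total number of partitions.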

commit f44b27a2b92da2325ed9389cd27b6e2cfd9ec486
Author: Marcelo Vanzin <[email protected]>
Date:   2015-08-04T13:19:11Z

    [SPARK-9583] [BUILD] Do not print mvn debug messages to stdout.
    
    This allows build/mvn to be used by make-distribution.sh.
    
    Author: Marcelo Vanzin <[email protected]>
    
    Closes #7915 from vanzin/SPARK-9583 and squashes the following commits:
    
    6469e60 [Marcelo Vanzin] [SPARK-9583] [build] Do not print mvn debug 
messages to stdout.
    
    (cherry picked from commit d702d53732b44e8242448ce5302738bd130717d8)
    Signed-off-by: Kousuke Saruta <[email protected]>

commit 945da3534762a73fe7ffc52c868ff07a0783502b
Author: Tarek Auel <[email protected]>
Date:   2015-08-04T15:59:42Z

    [SPARK-8244] [SQL] string function: find in set
    
    This PR is based on #7186 (just fixes the conflict), thanks to tarekauel.
    
    find_in_set(string str, string strList): int
    
    Returns the first occurrence of str in strList where strList is a
    comma-delimited string. Returns null if either argument is null. Returns 0 if
    the first argument contains any commas. For example, find_in_set('ab',
    'abc,b,ab,c,def') returns 3.
    
    Only add this to SQL, not DataFrame.
    
    Closes #7186
    
    Author: Tarek Auel <[email protected]>
    Author: Davies Liu <[email protected]>
    
    Closes #7900 from davies/find_in_set and squashes the following commits:
    
    4334209 [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
find_in_set
    8f00572 [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
find_in_set
    243ede4 [Tarek Auel] [SPARK-8244][SQL] hive compatibility
    1aaf64e [Tarek Auel] [SPARK-8244][SQL] unit test fix
    e4093a4 [Tarek Auel] [SPARK-8244][SQL] final modifier for COMMA_UTF8
    0d05df5 [Tarek Auel] Merge branch 'master' into SPARK-8244
    208d710 [Tarek Auel] [SPARK-8244] address comments & bug fix
    71b2e69 [Tarek Auel] [SPARK-8244] find_in_set
    66c7fda [Tarek Auel] Merge branch 'master' into SPARK-8244
    61b8ca2 [Tarek Auel] [SPARK-8224] removed loop and split; use unsafe String 
comparison
    4f75a65 [Tarek Auel] Merge branch 'master' into SPARK-8244
    e3b20c8 [Tarek Auel] [SPARK-8244] added type check
    1c2bbb7 [Tarek Auel] [SPARK-8244] findInSet
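
The `find_in_set` semantics stated above can be written down as a short reference sketch. It is not the Spark implementation (the commit list notes that the real code avoids a loop over `split` and compares bytes directly), and returning 0 when the string is not found is an assumption that the message does not state explicitly.

```python
def find_in_set(s, str_list):
    """Return the 1-based position of s in the comma-delimited str_list."""
    # Null in, null out.
    if s is None or str_list is None:
        return None
    # A search string containing a comma can never match a single element.
    if "," in s:
        return 0
    for index, item in enumerate(str_list.split(","), start=1):
        if item == s:
            return index
    # Assumption: "not found" is reported as 0.
    return 0
```

For the example from the description, `find_in_set('ab', 'abc,b,ab,c,def')` evaluates to 3, because `'ab'` is the third element of the list.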

commit b42e13dca38c6e9ff9cf879bcb52efa681437120
Author: Davies Liu <[email protected]>
Date:   2015-08-04T16:07:09Z

    [SPARK-8246] [SQL] Implement get_json_object
    
    This is based on #7485, thanks to NathanHowell.
    
    Tests were copied from Hive, but do not seem to be super comprehensive. 
I've generally replicated Hive's unusual behavior rather than following a 
JSONPath reference, except for one case (as noted in the comments). I don't 
know if there is a way of fully replicating Hive's behavior without a slower 
TreeNode implementation, so I've erred on the side of performance instead.
    
    Author: Davies Liu <[email protected]>
    Author: Yin Huai <[email protected]>
    Author: Nathan Howell <[email protected]>
    
    Closes #7901 from davies/get_json_object and squashes the following commits:
    
    3ace9b9 [Davies Liu] Merge branch 'get_json_object' of 
github.com:davies/spark into get_json_object
    98766fc [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
get_json_object
    a7dc6d0 [Davies Liu] Update JsonExpressionsSuite.scala
    c818519 [Yin Huai] new results.
    18ce26b [Davies Liu] fix tests
    6ac29fb [Yin Huai] Golden files.
    25eebef [Davies Liu] use HiveQuerySuite
    e0ac6ec [Yin Huai] Golden answer files.
    940c060 [Davies Liu] tweat code style
    44084c5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
get_json_object
    9192d09 [Nathan Howell] Match Hive's behavior for unwrapping arrays of one element
    8dab647 [Nathan Howell] [SPARK-8246] [SQL] Implement get_json_object
    
    (cherry picked from commit 73dedb589d06f7c7a525cc4f07721a77f480c434)
    Signed-off-by: Davies Liu <[email protected]>

commit d875368edd7265cedf808c921c0af0deb4895a67
Author: Yijie Shen <[email protected]>
Date:   2015-08-04T16:09:52Z

    [SPARK-9541] [SQL] DataTimeUtils cleanup
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-9541
    
    Author: Yijie Shen <[email protected]>
    
    Closes #7870 from yjshen/datetime_cleanup and squashes the following 
commits:
    
    9203e33 [Yijie Shen] revert getMonth & getDayOfMonth
    5cad119 [Yijie Shen] rebase code
    7d62a74 [Yijie Shen] remove tmp tuple inside split date
    e98aaac [Yijie Shen] DataTimeUtils cleanup
    
    (cherry picked from commit b5034c9c59947f20423faa46bc6606aad56836b0)
    Signed-off-by: Davies Liu <[email protected]>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

