GitHub user bigballofmud opened a pull request:
https://github.com/apache/spark/pull/7938
Branch 1.5
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.5
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7938.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7938
----
commit 4de833e9e81415832c0556d8f1b9e3c3ae48cafa
Author: Joseph Batchik <[email protected]>
Date: 2015-08-03T18:17:38Z
[SPARK-9511] [SQL] Fixed Table Name Parsing
The issue was that the tokenizer was parsing "1one" into the numeric 1
using the code on line 110. I added another case to accept strings that start
with a number and then have a letter somewhere else in it as well.
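A minimal sketch of the distinction described above, written in Python rather than in the Scala lexer the patch touches. The function name `classify_token` and the regular expressions are assumptions made for illustration and are not taken from Spark: a token made only of digits is numeric, while a token that starts with a digit and also contains a letter is treated as an identifier.

```python
import re

# Hypothetical patterns for illustration; not Spark's lexer rules.
NUMERIC = re.compile(r"\d+\Z")
# Starts with a digit and contains at least one letter somewhere after it.
DIGIT_LED_IDENT = re.compile(r"\d\w*[A-Za-z]\w*\Z")
# Ordinary identifier: starts with a letter or underscore.
IDENT = re.compile(r"[A-Za-z_]\w*\Z")


def classify_token(token: str) -> str:
    """Classify a whole token as 'numeric', 'identifier', or 'unknown'."""
    if NUMERIC.match(token):
        return "numeric"
    if DIGIT_LED_IDENT.match(token) or IDENT.match(token):
        return "identifier"
    return "unknown"
```

With these assumed rules, `classify_token("1one")` yields an identifier instead of stopping at the leading `1`.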
Author: Joseph Batchik <[email protected]>
Closes #7844 from JDrit/parse_error and squashes the following commits:
b8ca12f [Joseph Batchik] fixed parsing issue by adding another case
(cherry picked from commit dfe7bd168d9bcf8c53f993f459ab473d893457b0)
Signed-off-by: Michael Armbrust <[email protected]>
commit 5452e93f03bc308282cb8f189f65bb1b258d8813
Author: Reynold Xin <[email protected]>
Date: 2015-08-03T18:22:02Z
[SQL][minor] Simplify UnsafeRow.calculateBitSetWidthInBytes.
Author: Reynold Xin <[email protected]>
Closes #7897 from rxin/calculateBitSetWidthInBytes and squashes the
following commits:
2e73b3a [Reynold Xin] [SQL][minor] Simplify
UnsafeRow.calculateBitSetWidthInBytes.
(cherry picked from commit 7a9d09f0bb472a1671d3457e1f7108f4c2eb4121)
Signed-off-by: Reynold Xin <[email protected]>
commit 6d46e9b7c8ffde5d3cc3d86b005c40c51934e56b
Author: Cheng Lian <[email protected]>
Date: 2015-08-03T19:06:58Z
[SPARK-9554] [SQL] Enables in-memory partition pruning by default
Author: Cheng Lian <[email protected]>
Closes #7895 from liancheng/spark-9554/enable-in-memory-partition-pruning
and squashes the following commits:
67c403e [Cheng Lian] Enables in-memory partition pruning by default
(cherry picked from commit 703e44bff19f4c394f6f9bff1ce9152cdc68c51e)
Signed-off-by: Reynold Xin <[email protected]>
commit b3117d312332af3b4bd416857f632cacb5230feb
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-03T19:17:46Z
[SPARK-5133] [ML] Added featureImportance to RandomForestClassifier and
Regressor
Added featureImportance to RandomForestClassifier and Regressor.
This follows the scikit-learn implementation here:
[https://github.com/scikit-learn/scikit-learn/blob/a95203b249c1cf392f86d001ad999e29b2392739/sklearn/tree/_tree.pyx#L3341]
CC: yanboliang Would you mind taking a look? Thanks!
Author: Joseph K. Bradley <[email protected]>
Author: Feynman Liang <[email protected]>
Closes #7838 from jkbradley/dt-feature-importance and squashes the
following commits:
72a167a [Joseph K. Bradley] fixed unit test
86cea5f [Joseph K. Bradley] Modified RF featuresImportances to return
Vector instead of Map
5aa74f0 [Joseph K. Bradley] finally fixed unit test for real
33df5db [Joseph K. Bradley] fix unit test
42a2d3b [Joseph K. Bradley] fix unit test
fe94e72 [Joseph K. Bradley] modified feature importance unit tests
cc693ee [Feynman Liang] Add classifier tests
79a6f87 [Feynman Liang] Compare dense vectors in test
21d01fc [Feynman Liang] Added failing SKLearn test
ac0b254 [Joseph K. Bradley] Added featureImportance to
RandomForestClassifier/Regressor. Need to add unit tests
(cherry picked from commit ff9169a002f1b75231fd25b7d04157a912503038)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 444058d9158d426ae455208f07bf9c202e8f9925
Author: Kousuke Saruta <[email protected]>
Date: 2015-08-03T19:53:44Z
[SPARK-9558][DOCS]Update docs to follow the increase of memory defaults.
The default memory for the master and slave daemons in Standalone mode and
for the History Server is now 1g, not 512m, so update the docs to match.
Author: Kousuke Saruta <[email protected]>
Closes #7896 from sarutak/update-doc-for-daemon-memory and squashes the
following commits:
a77626c [Kousuke Saruta] Fix docs to follow the update of increase of
memory defaults
(cherry picked from commit ba1c4e138de2ea84b55def4eed2bd363e60aea4d)
Signed-off-by: Reynold Xin <[email protected]>
commit dc0c8c982825c3c58b7c6c4570c03ba97dba608b
Author: Xiangrui Meng <[email protected]>
Date: 2015-08-03T20:59:35Z
[SPARK-9544] [MLLIB] add Python API for RFormula
Add Python API for RFormula. Similar to other feature transformers in
Python. This is just a thin wrapper over the Scala implementation. ericl
MechCoder
Author: Xiangrui Meng <[email protected]>
Closes #7879 from mengxr/SPARK-9544 and squashes the following commits:
3d5ff03 [Xiangrui Meng] add a doctest for . and -
5e969a5 [Xiangrui Meng] fix pydoc
1cd41f8 [Xiangrui Meng] organize imports
3c18b10 [Xiangrui Meng] add Python API for RFormula
(cherry picked from commit e4765a46833baff1dd7465c4cf50e947de7e8f21)
Signed-off-by: Xiangrui Meng <[email protected]>
commit e7329ab31323a89d1e07c808927e5543876e3ce3
Author: Yanbo Liang <[email protected]>
Date: 2015-08-03T20:58:00Z
[SPARK-9191] [ML] [Doc] Add ml.PCA user guide and code examples
Add ml.PCA user guide document and code examples for Scala/Java/Python.
Author: Yanbo Liang <[email protected]>
Closes #7522 from yanboliang/ml-pca-md and squashes the following commits:
60dec05 [Yanbo Liang] address comments
f992abe [Yanbo Liang] Add ml.PCA doc and examples
(cherry picked from commit 8ca287ebbd58985a568341b08040d0efa9d3641a)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 29756ff11c7bea73436153f37af631cbe5e58250
Author: Andrew Or <[email protected]>
Date: 2015-08-03T21:22:07Z
[SPARK-8735] [SQL] Expose memory usage for shuffles, joins and aggregations
This patch exposes the memory used by internal data structures on the
SparkUI. This tracks memory used by all spilling operations and SQL operators
backed by Tungsten, e.g. `BroadcastHashJoin`, `ExternalSort`,
`GeneratedAggregate` etc. The metric exposed is "peak execution memory", which
broadly refers to the peak in-memory sizes of each of these data structures.
A separate patch will extend this by linking the new information to the SQL
operators themselves.
Author: Andrew Or <[email protected]>
Closes #7770 from andrewor14/expose-memory-metrics and squashes the
following commits:
9abecb9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
f5b0d68 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
d7df332 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
8eefbc5 [Andrew Or] Fix non-failing tests
9de2a12 [Andrew Or] Fix tests due to another logical merge conflict
876bfa4 [Andrew Or] Fix failing test after logical merge conflict
361a359 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
40b4802 [Andrew Or] Fix style?
d0fef87 [Andrew Or] Fix tests?
b3b92f6 [Andrew Or] Address comments
0625d73 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
c00a197 [Andrew Or] Fix potential NPEs
10da1cd [Andrew Or] Fix compile
17f4c2d [Andrew Or] Fix compile?
a87b4d0 [Andrew Or] Fix compile?
d70874d [Andrew Or] Fix test compile + address comments
2840b7d [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
6aa2f7a [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
b889a68 [Andrew Or] Minor changes: comments, spacing, style
663a303 [Andrew Or] UnsafeShuffleWriter: update peak memory before close
d090a94 [Andrew Or] Fix style
2480d84 [Andrew Or] Expand test coverage
5f1235b [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
1ecf678 [Andrew Or] Minor changes: comments, style, unused imports
0b6926c [Andrew Or] Oops
111a05e [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
a7a39a5 [Andrew Or] Strengthen presence check for accumulator
a919eb7 [Andrew Or] Add tests for unsafe shuffle writer
23c845d [Andrew Or] Add tests for SQL operators
a757550 [Andrew Or] Address comments
b5c51c1 [Andrew Or] Re-enable test in JavaAPISuite
5107691 [Andrew Or] Add tests for internal accumulators
59231e4 [Andrew Or] Fix tests
9528d09 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
5b5e6f3 [Andrew Or] Add peak execution memory to summary table + tooltip
92b4b6b [Andrew Or] Display peak execution memory on the UI
eee5437 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
d9b9015 [Andrew Or] Track execution memory in unsafe shuffles
770ee54 [Andrew Or] Track execution memory in broadcast joins
9c605a4 [Andrew Or] Track execution memory in GeneratedAggregate
9e824f2 [Andrew Or] Add back execution memory tracking for *ExternalSort
4ef4cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
expose-memory-metrics
e6c3e2f [Andrew Or] Move internal accumulators creation to Stage
a417592 [Andrew Or] Expose memory metrics in UnsafeExternalSorter
3c4f042 [Andrew Or] Track memory usage in ExternalAppendOnlyMap /
ExternalSorter
bd7ab3f [Andrew Or] Add internal accumulators to TaskContext
(cherry picked from commit 702aa9d7fb16c98a50e046edfd76b8a7861d0391)
Signed-off-by: Josh Rosen <[email protected]>
commit db5832708267f4a8413b0ad19c6a454c93f7800e
Author: Reynold Xin <[email protected]>
Date: 2015-08-03T21:51:36Z
Revert "[SPARK-9372] [SQL] Filter nulls in join keys"
This reverts commit 687c8c37150f4c93f8e57d86bb56321a4891286b.
commit 6bd12e819451dbec602f1f2bbfc4c4bebc881e72
Author: Steve Loughran <[email protected]>
Date: 2015-08-03T22:24:34Z
[SPARK-8064] [SQL] Build against Hive 1.2.1
Cherry picked the parts of the initial SPARK-8064 WiP branch needed to get
sql/hive to compile against hive 1.2.1. That's the ASF release packaged under
org.apache.hive, not any fork.
Tests not run yet: that's what the machines are for
Author: Steve Loughran <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Michael Armbrust <[email protected]>
Author: Patrick Wendell <[email protected]>
Closes #7191 from steveloughran/stevel/feature/SPARK-8064-hive-1.2-002 and
squashes the following commits:
7556d85 [Cheng Lian] Updates .q files and corresponding golden files
ef4af62 [Steve Loughran] Merge commit
'6a92bb09f46a04d6cd8c41bdba3ecb727ebb9030' into
stevel/feature/SPARK-8064-hive-1.2-002
6a92bb0 [Cheng Lian] Overrides HiveConf time vars
dcbb391 [Cheng Lian] Adds com.twitter:parquet-hadoop-bundle:1.6.0 for Hive
Parquet SerDe
0bbe475 [Steve Loughran] SPARK-8064 scalastyle rejects the standard Hadoop
ASF license header...
fdf759b [Steve Loughran] SPARK-8064 classpath dependency suite to be in
sync with shading in final (?) hive-exec spark
7a6c727 [Steve Loughran] SPARK-8064 switch to second staging repo of the
spark-hive artifacts. This one has the protobuf-shaded hive-exec jar
376c003 [Steve Loughran] SPARK-8064 purge duplicate protobuf declaration
2c74697 [Steve Loughran] SPARK-8064 switch to the protobuf shaded hive-exec
jar with tests to chase it down
cc44020 [Steve Loughran] SPARK-8064 remove hadoop.version from runtest.py,
as profile will fix that automatically.
6901fa9 [Steve Loughran] SPARK-8064 explicit protobuf import
da310dc [Michael Armbrust] Fixes for Hive tests.
a775a75 [Steve Loughran] SPARK-8064 cherry-pick-incomplete
7404f34 [Patrick Wendell] Add spark-hive staging repo
832c164 [Steve Loughran] SPARK-8064 try to suppress compiler warnings on
Complex.java pasted-thrift-code
312c0d4 [Steve Loughran] SPARK-8064 maven/ivy dependency purge; calcite
declaration needed
fa5ae7b [Steve Loughran] HIVE-8064 fix up hive-thriftserver dependencies
and cut back on evicted references in the hive- packages; this keeps mvn and
ivy resolution compatible, as the reconciliation policy is "by hand"
c188048 [Steve Loughran] SPARK-8064 manage the Hive dependencies so that
-things that aren't needed are excluded -sql/hive built with ivy is in sync
with the maven reconciliation policy, rather than latest-first
4c8be8d [Cheng Lian] WIP: Partial fix for Thrift server and CLI tests
314eb3c [Steve Loughran] SPARK-8064 deprecation warning noise in one of
the tests
17b0341 [Steve Loughran] SPARK-8064 IDE-hinted cleanups of Complex.java to
reduce compiler warnings. It's all autogenerated code, so still ugly.
d029b92 [Steve Loughran] SPARK-8064 rely on unescaping to have already
taken place, so go straight to map of serde options
23eca7e [Steve Loughran] HIVE-8064 handle raw and escaped property tokens
54d9b06 [Steve Loughran] SPARK-8064 fix compilation regression surfacing
from rebase
0b12d5f [Steve Loughran] HIVE-8064 use subset of hive complex type whose
types deserialize
fce73b6 [Steve Loughran] SPARK-8064 poms rely implicitly on the version of
kryo chill provides
fd3aa5d [Steve Loughran] SPARK-8064 version of hive to d/l from ivy is 1.2.1
dc73ece [Steve Loughran] SPARK-8064 revert to master's deterministic
pushdown strategy
d3c1e4a [Steve Loughran] SPARK-8064 purge UnionType
051cc21 [Steve Loughran] SPARK-8064 switch to an unshaded version of
hive-exec-core, which must have been built with Kryo 2.21. This currently looks
for a (locally built) version 1.2.1.spark
6684c60 [Steve Loughran] SPARK-8064 ignore RTE raised in blocking
process.exitValue() call
e6121e5 [Steve Loughran] SPARK-8064 address review comments
aa43dc6 [Steve Loughran] SPARK-8064 more robust teardown on
JavaMetastoreDatasourcesSuite
f2bff01 [Steve Loughran] SPARK-8064 better takeup of asynchronously caught
error text
8b1ef38 [Steve Loughran] SPARK-8064: on failures executing spark-submit in
HiveSparkSubmitSuite, print command line and all logged output.
5a9ce6b [Steve Loughran] SPARK-8064 add explicit reason for kv split
failure, rather than array OOB. *does not address the issue*
642b63a [Steve Loughran] SPARK-8064 reinstate something cut briefly during
rebasing
97194dc [Steve Loughran] SPARK-8064 add extra logging to the
YarnClusterSuite classpath test. There should be no reason why this is failing
on jenkins, but as it is (and presumably its CP-related), improve the logging
including any exception raised.
335357f [Steve Loughran] SPARK-8064 fail fast on thrive process spawning
tests on exit codes and/or error string patterns seen in log.
3ed872f [Steve Loughran] SPARK-8064 rename field double to dbl
bca55e5 [Steve Loughran] SPARK-8064 missed one of the `date` escapes
41d6479 [Steve Loughran] SPARK-8064 wrap tests with withTable() calls to
avoid table-exists exceptions
2bc29a4 [Steve Loughran] SPARK-8064 ParquetSuites to escape `date` field
name
1ab9bc4 [Steve Loughran] SPARK-8064 TestHive to use
sered2.thrift.test.Complex
bf3a249 [Steve Loughran] SPARK-8064: more resubmit than fix; tighten
startup timeout to 60s. Still no obvious reason why jersey server code in
spark-assembly isn't being picked up -it hasn't been shaded
c829b8f [Steve Loughran] SPARK-8064: reinstate yarn-rm-server dependencies
to hive-exec to ensure that jersey server is on classpath on hadoop versions <
2.6
0b0f738 [Steve Loughran] SPARK-8064: thrift server startup to fail fast on
any exception in the main thread
13abaf1 [Steve Loughran] SPARK-8064 Hive compatibility tests in sync with
explain/show output from Hive 1.2.1
d14d5ea [Steve Loughran] SPARK-8064: DATE is now a predicate; you can't use
it as a field in select ops
26eef1c [Steve Loughran] SPARK-8064: HIVE-9039 renamed TOK_UNION =>
TOK_UNIONALL while adding TOK_UNIONDISTINCT
3d64523 [Steve Loughran] SPARK-8064 improve diagnostics on unknown token; fix
scalastyle failure
d0360f6 [Steve Loughran] SPARK-8064: delicate merge in of the branch
vanzin/hive-1.1
1126e5a [Steve Loughran] SPARK-8064: name of unrecognized file format
wasn't appearing in error text
8cb09c4 [Steve Loughran] SPARK-8064: test resilience/assertion
improvements. Independent of the rest of the work; can be backported to earlier
versions
dec12cb [Steve Loughran] SPARK-8064: when a CLI suite test fails include
the full output text in the raised exception; this ensures that the
stdout/stderr is included in jenkins reports, so it becomes possible to
diagnose the cause.
463a670 [Steve Loughran] SPARK-8064 run-tests.py adds a hadoop-2.6 profile,
and changes info messages to say "w/Hive 1.2.1" in console output
2531099 [Steve Loughran] SPARK-8064 successful attempt to get rid of
pentaho as a transitive dependency of hive-exec
1d59100 [Steve Loughran] SPARK-8064 (unsuccessful) attempt to get rid of
pentaho as a transitive dependency of hive-exec
75733fc [Steve Loughran] SPARK-8064 change thrift binary startup message to
"Starting ThriftBinaryCLIService on port"
3ebc279 [Steve Loughran] SPARK-8064 move strings used to check for http/bin
thrift services up into constants
c80979d [Steve Loughran] SPARK-8064: SparkSQLCLIDriver drops remote mode
support. CLISuite Tests pass instead of timing out: undetected regression?
27e8370 [Steve Loughran] SPARK-8064 fix some style & IDE warnings
00e50d6 [Steve Loughran] SPARK-8064 stop excluding hive shims from
dependency (commented out , for now)
cb4f142 [Steve Loughran] SPARK-8054 cut pentaho dependency from calcite
f7aa9cb [Steve Loughran] SPARK-8064 everything compiles with some
commenting and moving of classes into a hive package
6c310b4 [Steve Loughran] SPARK-8064 subclass Hive ServerOptionsProcessor
to make it public again
f61a675 [Steve Loughran] SPARK-8064 thrift server switched to Hive 1.2.1,
though it doesn't compile everywhere
4890b9d [Steve Loughran] SPARK-8064, build against Hive 1.2.1
(cherry picked from commit a2409d1c8e8ddec04b529ac6f6a12b5993f0eeda)
Signed-off-by: Michael Armbrust <[email protected]>
commit 35264204b8e06c37ca99dd5c769aac20bdab161b
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:37:27Z
Preparing Spark release v1.5.0-snapshot-20150803
commit 73fab8849f6288f36101f52d663a6e7339b6576e
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:37:34Z
Preparing development version 1.5.0-SNAPSHOT
commit acda9d9546fa3f54676e48d76a2b66016d204074
Author: MechCoder <[email protected]>
Date: 2015-08-03T23:44:25Z
[SPARK-8874] [ML] Add missing methods in Word2Vec
Add missing methods
1. getVectors
2. findSynonyms
to W2Vec scala and python API
mengxr
Author: MechCoder <[email protected]>
Closes #7263 from MechCoder/missing_methods_w2vec and squashes the
following commits:
149d5ca [MechCoder] minor doc
69d91b7 [MechCoder] [SPARK-8874] [ML] Add missing methods in Word2Vec
(cherry picked from commit 13675c742a71cbdc8324701c3694775ce1dd5c62)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 4c4f638c7333b44049c75ae34486148ab74db333
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:54:50Z
Preparing Spark release v1.5.0-snapshot-20150803
commit bc49ca468d3abe4949382a32de92f963f454d36a
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:54:56Z
Preparing development version 1.5.0-SNAPSHOT
commit 7e7147f3b8fee3ac4f2f1d14c3e6776a4d76bb3a
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:59:13Z
Preparing Spark release v1.5.0-snapshot-20150803
commit 74792e71cb0584637041cb81660ec3ac4ea10c0b
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:59:19Z
Preparing development version 1.5.0-SNAPSHOT
commit 73c863ac8e8f6cf664f51c64da1da695f341b273
Author: Matthew Brandyberry <[email protected]>
Date: 2015-08-04T00:36:56Z
[SPARK-9483] Fix UTF8String.getPrefix for big-endian.
Previous code assumed little-endian.
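To illustrate why byte order matters here, a small Python sketch (not the Java code from the patch). It assumes the prefix is the first eight bytes of the string packed into a 64-bit integer, so that comparing prefixes as unsigned numbers agrees with comparing the strings byte by byte. The helper name `prefix_from_native_read` is hypothetical.

```python
import struct


def prefix_from_native_read(data: bytes, little_endian: bool) -> int:
    """Pack the first 8 bytes into an unsigned 64-bit sort prefix.

    Simulates loading a native-order 64-bit word from memory and then
    normalising it so the first byte of the string is most significant.
    """
    padded = data[:8].ljust(8, b"\x00")
    # What a raw 8-byte load would return on each kind of machine.
    word = struct.unpack("<Q" if little_endian else ">Q", padded)[0]
    if little_endian:
        # The first byte landed in the low bits, so reverse the byte order.
        word = int.from_bytes(word.to_bytes(8, "little"), "big")
    # On a big-endian machine the load already has the right order.
    return word
```

Under these assumptions, applying the byte reversal unconditionally on a big-endian machine would move the first byte into the least significant position and break the ordering.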
Author: Matthew Brandyberry <[email protected]>
Closes #7902 from mtbrandy/SPARK-9483 and squashes the following commits:
ec31df8 [Matthew Brandyberry] [SPARK-9483] Changes from review comments.
17d54c6 [Matthew Brandyberry] [SPARK-9483] Fix UTF8String.getPrefix for
big-endian.
(cherry picked from commit b79b4f5f2251ed7efeec1f4b26e45a8ea6b85a6a)
Signed-off-by: Davies Liu <[email protected]>
commit 34335719a372c1951fdb4dd25b75b086faf1076f
Author: Burak Yavuz <[email protected]>
Date: 2015-08-04T00:42:03Z
[SPARK-9263] Added flags to exclude dependencies when using --packages
While the functionality is there to exclude packages, there are no flags
that allow users to exclude dependencies, in case of dependency conflicts. We
should provide users with a flag to add dependency exclusions in case the
packages are not resolved properly (or not available due to licensing).
The flag I added was --packages-exclude, but I'm open to renaming it. I
also added property flags in case people would like to use a conf file to
provide dependencies, which is possible if there is a long list of dependencies
or exclusions.
cc andrewor14 vanzin pwendell
Author: Burak Yavuz <[email protected]>
Closes #7599 from brkyvz/packages-exclusions and squashes the following
commits:
636f410 [Burak Yavuz] addressed nits
6e54ede [Burak Yavuz] is this the culprit
b5e508e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into
packages-exclusions
154f5db [Burak Yavuz] addressed initial comments
1536d7a [Burak Yavuz] Added flags to exclude packages using
--packages-exclude
(cherry picked from commit 1633d0a2612d94151f620c919425026150e69ae1)
Signed-off-by: Marcelo Vanzin <[email protected]>
commit 93076ae39b58ba8c4a459f2b3a8590c492dc5c4e
Author: CodingCat <[email protected]>
Date: 2015-08-04T01:20:40Z
[SPARK-8416] highlight and topping the executor threads in thread dumping
page
https://issues.apache.org/jira/browse/SPARK-8416
To facilitate debugging, I made this patch with three changes:
* render the executor-thread and non executor-thread entries with different
background colors
* put the executor threads on the top of the list
* sort the threads alphabetically
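The ordering in the list above can be sketched as a single sort key. This is a Python illustration, not the Scala UI code. The name prefix used to recognise executor threads is an assumption, and `order_threads` is a hypothetical helper.

```python
# Assumed prefix identifying executor task threads (compared in lowercase).
EXECUTOR_PREFIX = "executor task launch worker"


def order_threads(thread_names):
    """Put executor threads first, then sort each group alphabetically."""
    def sort_key(name):
        lowered = name.lower()
        # False sorts before True, so executor threads come first.
        return (not lowered.startswith(EXECUTOR_PREFIX), lowered)

    return sorted(thread_names, key=sort_key)
```

Using a tuple key keeps the two requirements (grouping and alphabetical order) in one stable sort.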
Author: CodingCat <[email protected]>
Closes #7808 from CodingCat/SPARK-8416 and squashes the following commits:
34fc708 [CodingCat] fix className
d7b79dd [CodingCat] lowercase threadName
d032882 [CodingCat] sort alphabetically and change the css class name
f0513b1 [CodingCat] change the color & group threads by name
2da6e06 [CodingCat] small fix
3fc9f36 [CodingCat] define classes in webui.css
8ee125e [CodingCat] highlight and put on top the executor threads in thread
dumping page
(cherry picked from commit 3b0e44490aebfba30afc147e4a34a63439d985c6)
Signed-off-by: Josh Rosen <[email protected]>
commit ebe42b98c8fa0cac6ec267e895402cebe8a670a9
Author: Reynold Xin <[email protected]>
Date: 2015-08-04T01:47:02Z
[SPARK-9577][SQL] Surface concrete iterator types in various sort classes.
We often return abstract iterator types in various sort-related classes
(e.g. UnsafeKVExternalSorter). It is actually better to return a more concrete
type, so the callsite uses that type and JIT can inline the iterator calls.
Author: Reynold Xin <[email protected]>
Closes #7911 from rxin/surface-concrete-type and squashes the following
commits:
0422add [Reynold Xin] [SPARK-9577][SQL] Surface concrete iterator types in
various sort classes.
(cherry picked from commit 5eb89f67e323dcf9fa3d5b30f9b5cb8f10ca1e8c)
Signed-off-by: Reynold Xin <[email protected]>
commit 1f7dbcd6fdeee22c7b670ea98dcb4e794f84a8cd
Author: Sean Owen <[email protected]>
Date: 2015-08-04T04:48:22Z
[SPARK-9521] [DOCS] Addendum. Require Maven 3.3.3+ in the build
Follow on for #7852: Building Spark doc needs to refer to new Maven
requirement too
Author: Sean Owen <[email protected]>
Closes #7905 from srowen/SPARK-9521.2 and squashes the following commits:
73285df [Sean Owen] Follow on for #7852: Building Spark doc needs to refer
to new Maven requirement too
(cherry picked from commit 0afa6fbf525723e97c6dacfdba3ad1762637ffa9)
Signed-off-by: Sean Owen <[email protected]>
commit 29f2d5a065254e7ed44fb204a1deecf9d44d338c
Author: Ankur Dave <[email protected]>
Date: 2015-08-04T06:07:32Z
[SPARK-3190] [GRAPHX] Fix VertexRDD.count() overflow regression
SPARK-3190 was originally fixed by
96df92906978c5f58e0cc8ff5eebe5b35a08be3b, but
a5ef58113667ff73562ce6db381cff96a0b354b0 introduced a regression during
refactoring. This commit fixes the regression.
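A sketch of the kind of overflow involved, in Python. It assumes the count is the sum of per-partition sizes and that the regression accumulated them in a 32-bit integer; the commit message does not spell this out, and all function names here are hypothetical.

```python
def wrap_int32(value: int) -> int:
    """Wrap an integer the way 32-bit two's-complement arithmetic would."""
    return (value + 2**31) % 2**32 - 2**31


def count_with_int32_sum(partition_sizes):
    """Sum partition sizes in a simulated 32-bit accumulator (overflows)."""
    total = 0
    for size in partition_sizes:
        total = wrap_int32(total + size)
    return total


def count_with_wide_sum(partition_sizes):
    """Sum partition sizes in a wide accumulator (no overflow)."""
    return sum(partition_sizes)
```

Two partitions of 1,500,000,000 vertices each are enough to make the 32-bit sum go negative, while the wide sum gives 3,000,000,000.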
Author: Ankur Dave <[email protected]>
Closes #7923 from ankurdave/SPARK-3190-reopening and squashes the following
commits:
a3e1b23 [Ankur Dave] Fix VertexRDD.count() overflow regression
(cherry picked from commit 9e952ecbce670e9b532a1c664a4d03b66e404112)
Signed-off-by: Reynold Xin <[email protected]>
commit 5ae675360d883483e509788b8867c1c98b4820fd
Author: Sean Owen <[email protected]>
Date: 2015-08-04T11:02:26Z
[SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of
build warnings, 1.5.0 edition
Enable most javac lint warnings; fix a lot of build warnings. In a few
cases, touch up surrounding code in the process.
I'll explain several of the changes inline in comments.
Author: Sean Owen <[email protected]>
Closes #7862 from srowen/SPARK-9534 and squashes the following commits:
ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build
warnings. In a few cases, touch up surrounding code in the process.
(cherry picked from commit 76d74090d60f74412bd45487e8db6aff2e8343a2)
Signed-off-by: Sean Owen <[email protected]>
commit bd9b7521343c34c42be40ee05a01c8a976ed2307
Author: tedyu <[email protected]>
Date: 2015-08-04T11:22:53Z
[SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was
accidentally reverted
This PR removes the dependency reduced POM hack brought back by #7191
Author: tedyu <[email protected]>
Closes #7919 from tedyu/master and squashes the following commits:
1bfbd7b [tedyu] [BUILD] Remove dependency reduced POM hack
(cherry picked from commit b211cbc7369af5eb2cb65d93c4c57c4db7143f47)
Signed-off-by: Sean Owen <[email protected]>
commit 45c8d2bb872bb905a402cf3aa78b1c4efaac07cf
Author: Carson Wang <[email protected]>
Date: 2015-08-04T13:12:30Z
[SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page
Add pagination for the RDD page to avoid unresponsive UI when the number of
the RDD partitions is large.
Author: Carson Wang <[email protected]>
Closes #7692 from carsonwang/SPARK-2016 and squashes the following commits:
03c7168 [Carson Wang] Fix style issues
612c18c [Carson Wang] RDD partition table pagination for the RDD Page
(cherry picked from commit cb7fa0aa93dae5a25a8e7e387dbd6b55a5a23fb0)
Signed-off-by: Kousuke Saruta <[email protected]>
commit f44b27a2b92da2325ed9389cd27b6e2cfd9ec486
Author: Marcelo Vanzin <[email protected]>
Date: 2015-08-04T13:19:11Z
[SPARK-9583] [BUILD] Do not print mvn debug messages to stdout.
This allows build/mvn to be used by make-distribution.sh.
Author: Marcelo Vanzin <[email protected]>
Closes #7915 from vanzin/SPARK-9583 and squashes the following commits:
6469e60 [Marcelo Vanzin] [SPARK-9583] [build] Do not print mvn debug
messages to stdout.
(cherry picked from commit d702d53732b44e8242448ce5302738bd130717d8)
Signed-off-by: Kousuke Saruta <[email protected]>
commit 945da3534762a73fe7ffc52c868ff07a0783502b
Author: Tarek Auel <[email protected]>
Date: 2015-08-04T15:59:42Z
[SPARK-8244] [SQL] string function: find in set
This PR is based on #7186 (just fix the conflict), thanks to tarekauel .
find_in_set(string str, string strList): int
Returns the first occurrence of str in strList where strList is a
comma-delimited string. Returns null if either argument is null. Returns 0 if
the first argument contains any commas. For example, find_in_set('ab',
'abc,b,ab,c,def') returns 3.
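The semantics described above can be sketched in Python as follows. This is a reference model only: it uses `split` for clarity, whereas the squashed commits mention removing the loop and split in the actual implementation. Returning 0 when str is not found is an assumption based on the Hive function this mirrors; the text above does not state it.

```python
from typing import Optional


def find_in_set(s: Optional[str], str_list: Optional[str]) -> Optional[int]:
    """Return the 1-based position of s in a comma-delimited list."""
    # Null in, null out.
    if s is None or str_list is None:
        return None
    # A needle containing a comma can never match a single list item.
    if "," in s:
        return 0
    for position, item in enumerate(str_list.split(","), start=1):
        if item == s:
            return position
    # Assumption: no match yields 0, as in Hive.
    return 0
```

For example, `find_in_set('ab', 'abc,b,ab,c,def')` gives 3, matching the case quoted above.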
Only add this to SQL, not DataFrame.
Closes #7186
Author: Tarek Auel <[email protected]>
Author: Davies Liu <[email protected]>
Closes #7900 from davies/find_in_set and squashes the following commits:
4334209 [Davies Liu] Merge branch 'master' of github.com:apache/spark into
find_in_set
8f00572 [Davies Liu] Merge branch 'master' of github.com:apache/spark into
find_in_set
243ede4 [Tarek Auel] [SPARK-8244][SQL] hive compatibility
1aaf64e [Tarek Auel] [SPARK-8244][SQL] unit test fix
e4093a4 [Tarek Auel] [SPARK-8244][SQL] final modifier for COMMA_UTF8
0d05df5 [Tarek Auel] Merge branch 'master' into SPARK-8244
208d710 [Tarek Auel] [SPARK-8244] address comments & bug fix
71b2e69 [Tarek Auel] [SPARK-8244] find_in_set
66c7fda [Tarek Auel] Merge branch 'master' into SPARK-8244
61b8ca2 [Tarek Auel] [SPARK-8224] removed loop and split; use unsafe String
comparison
4f75a65 [Tarek Auel] Merge branch 'master' into SPARK-8244
e3b20c8 [Tarek Auel] [SPARK-8244] added type check
1c2bbb7 [Tarek Auel] [SPARK-8244] findInSet
commit b42e13dca38c6e9ff9cf879bcb52efa681437120
Author: Davies Liu <[email protected]>
Date: 2015-08-04T16:07:09Z
[SPARK-8246] [SQL] Implement get_json_object
This is based on #7485 , thanks to NathanHowell
Tests were copied from Hive, but do not seem to be super comprehensive.
I've generally replicated Hive's unusual behavior rather than following a
JSONPath reference, except for one case (as noted in the comments). I don't
know if there is a way of fully replicating Hive's behavior without a slower
TreeNode implementation, so I've erred on the side of performance instead.
Author: Davies Liu <[email protected]>
Author: Yin Huai <[email protected]>
Author: Nathan Howell <[email protected]>
Closes #7901 from davies/get_json_object and squashes the following commits:
3ace9b9 [Davies Liu] Merge branch 'get_json_object' of
github.com:davies/spark into get_json_object
98766fc [Davies Liu] Merge branch 'master' of github.com:apache/spark into
get_json_object
a7dc6d0 [Davies Liu] Update JsonExpressionsSuite.scala
c818519 [Yin Huai] new results.
18ce26b [Davies Liu] fix tests
6ac29fb [Yin Huai] Golden files.
25eebef [Davies Liu] use HiveQuerySuite
e0ac6ec [Yin Huai] Golden answer files.
940c060 [Davies Liu] tweak code style
44084c5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into
get_json_object
9192d09 [Nathan Howell] Match Hive's behavior for unwrapping arrays of
one element
8dab647 [Nathan Howell] [SPARK-8246] [SQL] Implement get_json_object
(cherry picked from commit 73dedb589d06f7c7a525cc4f07721a77f480c434)
Signed-off-by: Davies Liu <[email protected]>
commit d875368edd7265cedf808c921c0af0deb4895a67
Author: Yijie Shen <[email protected]>
Date: 2015-08-04T16:09:52Z
[SPARK-9541] [SQL] DateTimeUtils cleanup
JIRA: https://issues.apache.org/jira/browse/SPARK-9541
Author: Yijie Shen <[email protected]>
Closes #7870 from yjshen/datetime_cleanup and squashes the following
commits:
9203e33 [Yijie Shen] revert getMonth & getDayOfMonth
5cad119 [Yijie Shen] rebase code
7d62a74 [Yijie Shen] remove tmp tuple inside split date
e98aaac [Yijie Shen] DateTimeUtils cleanup
(cherry picked from commit b5034c9c59947f20423faa46bc6606aad56836b0)
Signed-off-by: Davies Liu <[email protected]>
----