GitHub user nssalian opened a pull request:
https://github.com/apache/spark/pull/6861
Adding Python code for SPARK-8320
Added Python code to the Level of Parallelism in Data Receiving section of
https://spark.apache.org/docs/latest/streaming-programming-guide.html.
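For reference, a minimal sketch of the kind of Python snippet this adds (illustrative only, not the exact code in the PR): several Kafka receivers are created and then unioned into a single DStream to increase receiving parallelism. The ZooKeeper address, consumer group, and topic name below are placeholders.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="ParallelReceiving")
    ssc = StreamingContext(sc, batchDuration=2)

    # Create several receivers so data is ingested in parallel,
    # then union them into a single DStream for downstream processing.
    numStreams = 5
    kafkaStreams = [
        KafkaUtils.createStream(ssc, "zk-host:2181", "consumer-group", {"topic": 1})
        for _ in range(numStreams)
    ]
    unifiedStream = ssc.union(*kafkaStreams)
    unifiedStream.pprint()

    ssc.start()
    ssc.awaitTermination()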
Please review and let me know if any additional changes are needed.
Thank you.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nssalian/spark SPARK-8320
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6861.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6861
----
commit 82a396c2f594bade276606dcd0c0545a650fb838
Author: Holden Karau <[email protected]>
Date: 2015-05-29T21:59:18Z
[SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd
Author: Holden Karau <[email protected]>
Closes #6464 from
holdenk/SPARK-7910-expose-partitioner-information-in-javardd and squashes the
following commits:
de1e644 [Holden Karau] Fix the test to get the partitioner
bdb31cc [Holden Karau] Add Mima exclude for the new method
347ef4c [Holden Karau] Add a quick little test for the partitioner JavaAPI
f49dca9 [Holden Karau] Add partitioner information to JavaRDDLike and fix
some whitespace
commit 5fb97dca9bcfc29ac33823554c8783997e811b99
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-05-29T22:08:30Z
[SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init
cc davies
Author: Shivaram Venkataraman <[email protected]>
Closes #6507 from shivaram/sparkr-init and squashes the following commits:
6fdd169 [Shivaram Venkataraman] Create SparkContext in sparkRSQL init
commit dbf8ff38de0f95f467b874a5b527dcf59439efe8
Author: Ram Sriharsha <[email protected]>
Date: 2015-05-29T22:22:26Z
[SPARK-6013] [ML] Add more Python ML examples for spark.ml
Author: Ram Sriharsha <[email protected]>
Closes #6443 from harsha2010/SPARK-6013 and squashes the following commits:
732506e [Ram Sriharsha] Code Review Feedback
121c211 [Ram Sriharsha] python style fix
5f9b8c3 [Ram Sriharsha] python style fixes
925ca86 [Ram Sriharsha] Simple Params Example
8b372b1 [Ram Sriharsha] GBT Example
965ec14 [Ram Sriharsha] Random Forest Example
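For context, a hedged sketch of a simple spark.ml Python example in the spirit of the ones this commit adds (a "simple params"-style example; not the exact code from the PR):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row
    from pyspark.ml.classification import LogisticRegression
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext(appName="SimpleParamsSketch")
    sqlContext = SQLContext(sc)

    # A tiny labeled dataset as a DataFrame with "label" and "features" columns.
    training = sqlContext.createDataFrame([
        Row(label=1.0, features=Vectors.dense(0.0, 1.1, 0.1)),
        Row(label=0.0, features=Vectors.dense(2.0, 1.0, -1.0)),
        Row(label=0.0, features=Vectors.dense(2.0, 1.3, 1.0)),
        Row(label=1.0, features=Vectors.dense(0.0, 1.2, -0.5))])

    # Set estimator parameters explicitly, fit, then inspect predictions.
    lr = LogisticRegression(maxIter=10, regParam=0.01)
    model = lr.fit(training)
    model.transform(training).select("label", "prediction").show()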
commit 8c9979337f193c72fd2f1a891909283de53777e3
Author: Andrew Or <[email protected]>
Date: 2015-05-29T22:26:49Z
[HOTFIX] [SQL] Maven test compilation issue
Tests compile in SBT but not Maven.
commit a4f24123d8857656524c9138c7c067a4b1033a5e
Author: Andrew Or <[email protected]>
Date: 2015-05-30T00:19:46Z
[HOT FIX] [BUILD] Fix maven build failures
This patch fixes a build break in maven caused by #6441.
Note that this patch reverts the changes in flume-sink because
this module does not currently depend on Spark core, but the
tests require it. There is not an easy way to make this work
because mvn test dependencies are not transitive (MNG-1378).
For now, we will leave the one test suite in flume-sink out
until we figure out a better solution. This patch is mainly
intended to unbreak the maven build.
Author: Andrew Or <[email protected]>
Closes #6511 from andrewor14/fix-build-mvn and squashes the following
commits:
3d53643 [Andrew Or] [HOT FIX #6441] Fix maven build failures
commit 3792d25836e1e521da64c5a62ca1b6cca1bcb6b9
Author: Taka Shinagawa <[email protected]>
Date: 2015-05-30T03:35:14Z
[DOCS][Tiny] Added a missing dash(-) in docs/configuration.md
The first line had only two dashes (--) instead of three (---). Because of
this missing dash, the 'jekyll build' command was not converting
configuration.md to _site/configuration.html.
Author: Taka Shinagawa <[email protected]>
Closes #6513 from mrt/docfix3 and squashes the following commits:
c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from
converting configuration.md to html format
commit 7ed06c39922ac90acab3a78ce0f2f21184ed68a5
Author: Burak Yavuz <[email protected]>
Date: 2015-05-30T05:19:15Z
[SPARK-7957] Preserve partitioning when using randomSplit
cc JoshRosen
Thanks for noticing this!
Author: Burak Yavuz <[email protected]>
Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:
497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using
randomSplit
commit 609c4923f98c188bce60ae35c1c8a08a8dfd95f1
Author: Andrew Or <[email protected]>
Date: 2015-05-30T05:57:46Z
[SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
This is a follow-up patch to #6441.
Author: Andrew Or <[email protected]>
Closes #6510 from andrewor14/extends-funsuite-check and squashes the
following commits:
6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check
99d02ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into
extends-funsuite-check
48874dd [Andrew Or] Guard against direct uses of FunSuite / FunSuiteLike
commit 193dba01c77ef1bb63e3f617213eb257960f8d2f
Author: Andrew Or <[email protected]>
Date: 2015-05-30T06:08:47Z
[TRIVIAL] Typo fix for last commit
commit da2112aef28e63c452f592e0abd007141787877d
Author: Octavian Geagla <[email protected]>
Date: 2015-05-30T06:55:19Z
[SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for
ElementwiseProduct
Author: Octavian Geagla <[email protected]>
Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following
commits:
4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review
feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide
doc/example for ElementwiseProduct.
commit 78657d53d71b9d3e86b675cc519868f99e2ffa01
Author: Timothy Chen <[email protected]>
Date: 2015-05-30T06:56:18Z
[SPARK-7962] [MESOS] Fix master url parsing in rest submission client.
Only parse standalone master url when master url starts with spark://
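As an illustration of that rule, a small Python sketch (not the actual Scala RestSubmissionClient code): only a URL with the spark:// scheme is treated as a standalone master.

    def parse_standalone_master(master_url):
        # Return (host, port) for a standalone master URL, or None for other
        # schemes such as mesos:// or yarn.
        if not master_url.startswith("spark://"):
            return None
        host, _, port = master_url[len("spark://"):].partition(":")
        # Fall back to the standalone default port if none was given (illustrative).
        return host, int(port) if port else 7077

    print(parse_standalone_master("spark://master-host:6066"))      # ('master-host', 6066)
    print(parse_standalone_master("mesos://zk://host:2181/mesos"))  # None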
Author: Timothy Chen <[email protected]>
Closes #6517 from tnachen/fix_mesos_client and squashes the following
commits:
61a1198 [Timothy Chen] Fix master url parsing in rest submission client.
commit e3a43748338b02ef6864ca62de40e218e5677506
Author: Octavian Geagla <[email protected]>
Date: 2015-05-30T07:00:36Z
[SPARK-7459] [MLLIB] ElementwiseProduct Java example
Author: Octavian Geagla <[email protected]>
Closes #6008 from ogeagla/elementwise-prod-doc and squashes the following
commits:
72e6dc0 [Octavian Geagla] [SPARK-7459] [MLLIB] Java example import.
cf2afbd [Octavian Geagla] [SPARK-7459] [MLLIB] Update description of
example.
b66431b [Octavian Geagla] [SPARK-7459] [MLLIB] Add override annotation to
java example, make scala example use same data as java.
6b26b03 [Octavian Geagla] [SPARK-7459] [MLLIB] Fix line which is too long.
79af020 [Octavian Geagla] [SPARK-7459] [MLLIB] Actually don't use Java 8.
9d5b31a [Octavian Geagla] [SPARK-7459] [MLLIB] Don't use Java 8
4f0c92f [Octavian Geagla] [SPARK-7459] [MLLIB] ElementwiseProduct Java
example.
commit 0978aec9cd47dc0618e47b74a99e1cc2266be424
Author: Wenchen Fan <[email protected]>
Date: 2015-05-30T07:26:46Z
[SPARK-7964][SQL] remove unnecessary type coercion rule
We have already defined this logic in `Cast`, so I think we should remove
this rule.
Author: Wenchen Fan <[email protected]>
Closes #6516 from cloud-fan/tmp2 and squashes the following commits:
d5035a4 [Wenchen Fan] remove useless rule
commit 8c8de3ed863985554e84fd07d1cdcaeca7e3375c
Author: Sean Owen <[email protected]>
Date: 2015-05-30T11:59:27Z
[SPARK-7890] [DOCS] Document that Spark 2.11 now supports Kafka
Remove caveat about Kafka / JDBC not being supported for Scala 2.11
Author: Sean Owen <[email protected]>
Closes #6470 from srowen/SPARK-7890 and squashes the following commits:
4652634 [Sean Owen] One more rewording
7b7f3c8 [Sean Owen] Restore note about JDBC component
126744d [Sean Owen] Remove caveat about Kafka / JDBC not being supported
for Scala 2.11
commit 9d8aadb72bbc86595e253fe30201cda6a8db877e
Author: WangTaoTheTonic <[email protected]>
Date: 2015-05-30T12:04:27Z
[SPARK-7945] [CORE] Do trim to values in properties file
https://issues.apache.org/jira/browse/SPARK-7945
Currently, applications submitted by org.apache.spark.launcher.Main read the
properties file without trimming the values in it.
If a user leaves a space after a value (say spark.driver.extraClassPath), it
can affect global behavior (for example, a jar might not be included in the
classpath), so we should trim values the same way Utils.getPropertiesFromFile does.
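A hedged Python sketch of the trimming behavior described here (the actual fix lives in the Scala launcher and mirrors Utils.getPropertiesFromFile):

    def load_properties(path):
        # Parse a simple key=value properties file, trimming whitespace around
        # both keys and values so a trailing space after e.g.
        # spark.driver.extraClassPath does not corrupt the classpath.
        props = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                key, _, value = line.partition("=")
                props[key.strip()] = value.strip()
        return props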
Author: WangTaoTheTonic <[email protected]>
Author: Tao Wang <[email protected]>
Closes #6496 from WangTaoTheTonic/SPARK-7945 and squashes the following
commits:
bb41b4b [Tao Wang] indent 4 to 2
6dd1cf2 [WangTaoTheTonic] use a simpler way
2c053a1 [WangTaoTheTonic] Do trim to values in properties file
commit 2b35c99c7e73d22e82aef90b675709ae7f8d3b4a
Author: zhichao.li <[email protected]>
Date: 2015-05-30T12:06:11Z
[SPARK-7717] [WEBUI] Only showing total memory and cores for alive workers
Author: zhichao.li <[email protected]>
Closes #6317 from zhichao-li/workers and squashes the following commits:
d68bf11 [zhichao.li] change prefix
99b6768 [zhichao.li] remove extra space and add 'Alive' prefix
1e8eb06 [zhichao.li] only showing alive workers
commit 3ab71eb9d5e3fe21af7720421eafa51f6da9b63f
Author: Taka Shinagawa <[email protected]>
Date: 2015-05-30T12:25:21Z
[DOCS] [MINOR] Update for the Hadoop versions table with hadoop-2.6
Updated the doc for the hadoop-2.6 profile, which is new to Spark 1.4
Author: Taka Shinagawa <[email protected]>
Closes #6450 from mrt/docfix2 and squashes the following commits:
db1c43b [Taka Shinagawa] Updated the hadoop versions for hadoop-2.6 profile
323710e [Taka Shinagawa] The hadoop-2.6 profile is added to the Hadoop
versions table
commit d34b43bd5964e1feb03a17937de87a3f718806a5
Author: Reynold Xin <[email protected]>
Date: 2015-05-30T19:06:38Z
Closes #4685
commit 6e3f0c7810a6721698b0ed51cfbd41a0cd07a4a3
Author: Cheng Lian <[email protected]>
Date: 2015-05-30T19:16:09Z
[SPARK-7849] [SQL] [Docs] Updates SQL programming guide for 1.4
Author: Cheng Lian <[email protected]>
Closes #6520 from liancheng/spark-7849 and squashes the following commits:
705264b [Cheng Lian] Updates SQL programming guide for 1.4
commit 7716a5a1ec8ff8dc24e0146f8ead2f51da6512ad
Author: Reynold Xin <[email protected]>
Date: 2015-05-30T21:57:23Z
Updated SQL programming guide's Hive connectivity section.
commit a6430028ecd7a6130f1eb15af9ec00e242c46725
Author: Josh Rosen <[email protected]>
Date: 2015-05-30T22:27:51Z
[SPARK-7855] Move bypassMergeSort-handling from ExternalSorter to own
component
Spark's `ExternalSorter` writes shuffle output files during sort-based
shuffle. Sort-shuffle contains a configuration,
`spark.shuffle.sort.bypassMergeThreshold`, which causes ExternalSorter to skip
sorting and merging and simply write separate files per partition, which are
then concatenated together to form the final map output file.
The code paths used during this bypass are almost completely separate from
ExternalSorter's other code paths, so refactoring them into a separate file can
significantly simplify the code.
In addition to re-arranging code, this patch deletes a bunch of dead code.
The main entry point into ExternalSorter is `insertAll()` and in SPARK-4479 /
#3422 this method was modified to completely bypass in-memory buffering of
records when `bypassMergeSort` takes effect. As a result, some of the spilling
and merging code paths will no longer be called when `bypassMergeSort` is used,
so we should be able to safely remove that code.
There's an open JIRA
([SPARK-6026](https://issues.apache.org/jira/browse/SPARK-6026)) for removing
the `bypassMergeThreshold` parameter and code paths; I have not done that here,
but the changes in this patch will make removing that parameter significantly
easier if we ever decide to do that.
This patch also makes several improvements to shuffle-related tests and
adds more defensive checks to certain shuffle classes:
- DiskBlockObjectWriter now throws an exception if `fileSegment()` is
called before `commitAndClose()` has been called.
- DiskBlockObjectWriter's close methods are now idempotent, so calling any
of the close methods twice in a row will no longer result in incorrect shuffle
write metrics changes. Calling `revertPartialWritesAndClose()` on a closed
DiskBlockObjectWriter now has no effect (before, it might mess up the metrics).
- The end-to-end shuffle record count metrics tests have been moved from
InputOutputMetricsSuite to ShuffleSuite. This means that these tests will now
be run against all shuffle implementations rather than just the default shuffle
configuration.
- The end-to-end metrics tests now include a test of a job which performs
aggregation in the shuffle.
- Our tests now check that `shuffleBytesWritten == totalShuffleBytesRead`.
- FileSegment now throws IllegalArgumentException if it is constructed with
a negative length or offset.
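The DiskBlockObjectWriter guarantees listed above boil down to two defensive checks: fileSegment() must not be usable before commitAndClose(), and closing must be idempotent. A rough Python sketch of that pattern (names are illustrative, not Spark's actual Scala class):

    import os

    class SketchBlockWriter:
        def __init__(self, path):
            self._file = open(path, "wb")
            self._committed = False
            self._closed = False

        def write(self, data):
            if self._closed:
                raise ValueError("cannot write after close")
            self._file.write(data)

        def commit_and_close(self):
            # Idempotent: a second call is a no-op, so metrics are not double-counted.
            if not self._closed:
                self._file.flush()
                self._file.close()
                self._committed = True
                self._closed = True

        def file_segment(self):
            # Guard: the segment is only well defined once writes are committed.
            if not self._committed:
                raise RuntimeError("file_segment() called before commit_and_close()")
            return (self._file.name, 0, os.path.getsize(self._file.name))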
Author: Josh Rosen <[email protected]>
Closes #6397 from JoshRosen/external-sorter-bypass-cleanup and squashes the
following commits:
bf3f3f6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into
external-sorter-bypass-cleanup
8b216c4 [Josh Rosen] Guard against negative offsets and lengths in
FileSegment
03f35a4 [Josh Rosen] Minor fix to cleanup logic.
b5cc35b [Josh Rosen] Move shuffle metrics tests to ShuffleSuite.
8b8fb9e [Josh Rosen] Add more tests + defensive programming to
DiskBlockObjectWriter.
16564eb [Josh Rosen] Guard against calling fileSegment() before
commitAndClose() has been called.
96811b4 [Josh Rosen] Remove confusing taskMetrics.shuffleWriteMetrics()
optional call
8522b6a [Josh Rosen] Do not perform a map-side sort unless we're also doing
map-side aggregation
08e40f3 [Josh Rosen] Remove excessively clever (and wrong) implementation
of newBuffer()
d7f9938 [Josh Rosen] Add missing overrides; fix compilation
71d76ff [Josh Rosen] Update Javadoc
bf0d98f [Josh Rosen] Add comment to clarify confusing factory code
5197f73 [Josh Rosen] Add missing private[this]
30ef2c8 [Josh Rosen] Convert BypassMergeSortShuffleWriter to Java
bc1a820 [Josh Rosen] Fix bug when aggregator is used but map-side combine
is disabled
0d3dcc0 [Josh Rosen] Remove unnecessary overloaded methods
25b964f [Josh Rosen] Rename SortShuffleSorter to SortShuffleFileWriter
0d9848c [Josh Rosen] Make it more clear that curWriteMetrics is now only
used for spill metrics
7af7aea [Josh Rosen] Combine spill() and spillToMergeableFile()
6320112 [Josh Rosen] Add missing negation in deletion success check.
d267e0d [Josh Rosen] Fix style issue
7f15f7b [Josh Rosen] Back out extra cleanup-handling code, since this is
already covered in stop()
25aa3bd [Josh Rosen] Make sure to delete outputFile after errors.
931ca68 [Josh Rosen] Refactor tests.
6a35716 [Josh Rosen] Refactor logic for deciding when to bypass
4b03539 [Josh Rosen] Move conf prior to first use
1265b25 [Josh Rosen] Fix some style errors and comments.
02355ef [Josh Rosen] More simplification
d4cb536 [Josh Rosen] Delete more unused code
bb96678 [Josh Rosen] Add missing interface file
b6cc1eb [Josh Rosen] Realize that bypass never buffers; proceed to delete
tons of code
6185ee2 [Josh Rosen] WIP towards moving bypass code into own file.
8d0678c [Josh Rosen] Move diskBytesSpilled getter next to variable
19bccd6 [Josh Rosen] Remove duplicated buffer creation code.
18959bb [Josh Rosen] Move comparator methods closer together.
commit 1617363fbb9b22a2eb09e7bab98c8d05f9508761
Author: Yanbo Liang <[email protected]>
Date: 2015-05-30T23:24:07Z
[SPARK-7918] [MLLIB] MLlib Python doc parity check for evaluation and
feature
Check and update the MLlib Python evaluation and feature docs to be as
complete as the Scala docs.
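A hedged sketch of the kind of Python evaluation API the parity check covers (using pyspark.mllib.evaluation roughly as it existed in Spark 1.4; illustrative only):

    from pyspark import SparkContext
    from pyspark.mllib.evaluation import MulticlassMetrics

    sc = SparkContext(appName="EvaluationDocSketch")
    # A small hand-made RDD of (prediction, label) pairs.
    predictionAndLabels = sc.parallelize(
        [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)])
    metrics = MulticlassMetrics(predictionAndLabels)
    print(metrics.precision())  # overall precision
    print(metrics.recall())     # overall recall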
Author: Yanbo Liang <[email protected]>
Closes #6461 from yanboliang/spark-7918 and squashes the following commits:
940e3f1 [Yanbo Liang] truncate too long line and remove extra sparse
a80ae58 [Yanbo Liang] MLlib Python doc parity check for evaluation and
feature
commit 1281a3518802bfa624618236e6b9b59bc0e78585
Author: Mike Dusenberry <[email protected]>
Date: 2015-05-30T23:50:59Z
[SPARK-7920] [MLLIB] Make MLlib ChiSqSelector Serializable (& Fix Related
Documentation Example).
The MLlib ChiSqSelector class is not serializable, and so the example in
the ChiSqSelector documentation fails. Also, that example is missing the import
of ChiSqSelector.
This PR makes ChiSqSelector extend Serializable in MLlib, and adds the
ChiSqSelector import statement to the associated example in the documentation.
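For reference, a hedged Python analogue of the documentation example being fixed (the fixed example itself is Scala; this assumes the PySpark ChiSqSelector API available around Spark 1.4):

    from pyspark import SparkContext
    from pyspark.mllib.feature import ChiSqSelector
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint

    sc = SparkContext(appName="ChiSqSelectorSketch")
    # Discrete-valued features, as required by the chi-squared test.
    data = sc.parallelize([
        LabeledPoint(0.0, Vectors.dense([0.0, 0.0, 1.0])),
        LabeledPoint(1.0, Vectors.dense([1.0, 0.0, 0.0])),
        LabeledPoint(1.0, Vectors.dense([0.0, 1.0, 0.0]))])
    model = ChiSqSelector(numTopFeatures=1).fit(data)
    filtered = model.transform(data.map(lambda lp: lp.features))
    print(filtered.collect())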
Author: Mike Dusenberry <[email protected]>
Closes #6462 from
dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and
squashes the following commits:
9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs
Scala example.
commit 66a53a69643e0004742667e140bad2aa8dae44e4
Author: Josh Rosen <[email protected]>
Date: 2015-05-30T23:52:34Z
[HOTFIX] Replace FunSuite with SparkFunSuite.
This fixes a build break introduced by merging
a6430028ecd7a6130f1eb15af9ec00e242c46725,
which fails the new style checks that ensure that we use SparkFunSuite
instead
of FunSuite.
commit 2b258e1c0784c8ca958bf94cd9e75fa17f104448
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-31T00:21:41Z
[SPARK-5610] [DOC] update genjavadocSettings to use the patched version of
genjavadoc
This PR updates `genjavadocSettings` to use a patched version of
`genjavadoc-plugin` that hides package private classes/methods/interfaces in
the generated Java API doc. The patch can be found at:
https://github.com/typesafehub/genjavadoc/compare/master...mengxr:spark-1.4.
It wasn't merged into the main repo because there exist corner cases where
a package private Scala class has to be a Java public class in order to
compile. This doesn't seem to apply to the Spark codebase. So we release a
patched version under `org.spark-project` and use it in the Spark build. brkyvz
is publishing the artifacts to Maven Central.
We need more people to audit the generated APIs and make sure we don't have
false negatives.
Classes currently listed under `org.apache.spark.rdd`:

After this PR:

cc: pwendell rxin srowen
Author: Xiangrui Meng <[email protected]>
Closes #6506 from mengxr/SPARK-5610 and squashes the following commits:
489c785 [Xiangrui Meng] update genjavadocSettings to use the patched
version of genjavadoc
commit 14b314dc2cad7bbf23976347217c676d338e0a2d
Author: Reynold Xin <[email protected]>
Date: 2015-05-31T02:50:52Z
[SQL] Tighten up visibility for JavaDoc.
I went through all the JavaDocs and tightened up visibility.
Author: Reynold Xin <[email protected]>
Closes #6526 from rxin/sql-1.4-visibility-for-docs and squashes the
following commits:
bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc.
commit c63e1a742b3e87e79a4466e9bd0b927a24645756
Author: Reynold Xin <[email protected]>
Date: 2015-05-31T02:51:53Z
[SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods
Scala deprecated annotation actually doesn't show up in JavaDoc.
Author: Reynold Xin <[email protected]>
Closes #6523 from rxin/df-deprecated-javadoc and squashes the following
commits:
26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for
deprecated DataFrame methods.
commit 00a7137900d45188673da85cbcef4f02b7a266c1
Author: Reynold Xin <[email protected]>
Date: 2015-05-31T03:10:02Z
Update documentation for the new DataFrame reader/writer interface.
Author: Reynold Xin <[email protected]>
Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:
c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame
reader/writer interface.
commit f7fe9e474417a68635a5ed1aa819d81a9be40895
Author: Cheng Lian <[email protected]>
Date: 2015-05-31T04:56:41Z
[SQL] [MINOR] Fixes a minor comment mistake in IsolatedClientLoader
Author: Cheng Lian <[email protected]>
Closes #6521 from liancheng/classloader-comment-fix and squashes the
following commits:
fc09606 [Cheng Lian] Addresses @srowen's comment
59945c5 [Cheng Lian] Fixes a minor comment mistake in IsolatedClientLoader
commit 084fef76e90116c6465cd6fad7c0197c3e4d4313
Author: Reynold Xin <[email protected]>
Date: 2015-05-31T06:36:32Z
[SPARK-7976] Add style checker to disallow overriding finalize.
Author: Reynold Xin <[email protected]>
Closes #6528 from rxin/style-finalizer and squashes the following commits:
a2211ca [Reynold Xin] [SPARK-7976] Enable NoFinalizeChecker.
----