GitHub user nssalian opened a pull request:
https://github.com/apache/spark/pull/6860
Addition of Python example for SPARK-8320
Added Python code to the "Level of Parallelism in Data Receiving" section of
https://spark.apache.org/docs/latest/streaming-programming-guide.html.
Please review and let me know if there are any additional changes that are
needed.
Thank you.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nssalian/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6860.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6860
----
commit 5a1a1075a607be683f008ef92fa227803370c45f
Author: Andrew Or <[email protected]>
Date: 2015-05-04T16:17:55Z
[MINOR] Fix python test typo?
I suspect we haven't been using anaconda in tests in a while. I wonder if this
change actually does anything, but this line as it stands looks strictly less
correct.
Author: Andrew Or <[email protected]>
Closes #5883 from andrewor14/fix-run-tests-typo and squashes the following
commits:
a3ad720 [Andrew Or] Fix typo?
commit e0833c5958bbd73ff27cfe6865648d7b6e5a99bc
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-04T18:28:59Z
[SPARK-5956] [MLLIB] Pipeline components should be copyable.
This PR added `copy(extra: ParamMap): Params` to `Params`, which makes a
copy of the current instance with a randomly generated uid and some extra param
values. With this change, we only need to implement `fit` and `transform`
without extra param values given the default implementation of `fit(dataset,
extra)`:
~~~scala
def fit(dataset: DataFrame, extra: ParamMap): Model = {
copy(extra).fit(dataset)
}
~~~
Inside `fit` and `transform`, since only the embedded values are used, I
added `$` as an alias for `getOrDefault` to make the code easier to read. For
example, in `LinearRegression.fit` we have:
~~~scala
val effectiveRegParam = $(regParam) / yStd
val effectiveL1RegParam = $(elasticNetParam) * effectiveRegParam
val effectiveL2RegParam = (1.0 - $(elasticNetParam)) * effectiveRegParam
~~~
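The `$`/`getOrDefault` idea above is easy to mimic outside Scala. A minimal Python sketch of get-or-default parameter resolution plus `copy(extra)` (class and method names here are illustrative, not the actual spark.ml API):

```python
class Params:
    """Sketch of get-or-default param resolution and copy(extra)."""
    def __init__(self, defaults):
        self._defaults = dict(defaults)   # default param values
        self._values = {}                 # explicitly set param values

    def set(self, name, value):
        self._values[name] = value
        return self

    def get_or_default(self, name):
        # Explicitly set values win; otherwise fall back to the default.
        if name in self._values:
            return self._values[name]
        return self._defaults[name]

    def copy(self, extra):
        # Like copy(extra: ParamMap): a copy with extra values overlaid.
        c = Params(self._defaults)
        c._values = {**self._values, **extra}
        return c

lr = Params({"regParam": 0.0, "elasticNetParam": 0.0})
fitted = lr.copy({"regParam": 0.1})
print(fitted.get_or_default("regParam"))         # 0.1
print(fitted.get_or_default("elasticNetParam"))  # 0.0
```

This is why `fit(dataset, extra)` can default to `copy(extra).fit(dataset)`: the copy already carries the extra values, so the core `fit` only ever reads embedded params.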
Meta-algorithms like `Pipeline` implement their own `copy(extra)`, so the
fitted pipeline model stores all copied stages (whether each stage is a
transformer or a model).
Other changes:
* `Params$.inheritValues` is moved to `Params.copyValues` and returns the
target instance.
* `fittingParamMap` was removed because the `parent` carries this
information.
* `validate` was renamed to `validateParams` to be more precise.
TODOs:
* [x] add tests for newly added methods
* [ ] update documentation
jkbradley dbtsai
Author: Xiangrui Meng <[email protected]>
Closes #5820 from mengxr/SPARK-5956 and squashes the following commits:
7bef88d [Xiangrui Meng] address comments
05229c3 [Xiangrui Meng] assert -> assertEquals
b2927b1 [Xiangrui Meng] organize imports
f14456b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into
SPARK-5956
93e7924 [Xiangrui Meng] add tests for hasParam & copy
463ecae [Xiangrui Meng] merge master
2b954c3 [Xiangrui Meng] update Binarizer
465dd12 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into
SPARK-5956
282a1a8 [Xiangrui Meng] fix test
819dd2d [Xiangrui Meng] merge master
b642872 [Xiangrui Meng] example code runs
5a67779 [Xiangrui Meng] examples compile
c76b4d1 [Xiangrui Meng] fix all unit tests
0f4fd64 [Xiangrui Meng] fix some tests
9286a22 [Xiangrui Meng] copyValues to trained models
53e0973 [Xiangrui Meng] move inheritValues to Params and rename it to
copyValues
9ee004e [Xiangrui Meng] merge copy and copyWith; rename validate to
validateParams
d882afc [Xiangrui Meng] test compile
f082a31 [Xiangrui Meng] make Params copyable and simply handling of extra
params in all spark.ml components
commit f32e69ecc333867fc966f65cd0aeaeddd43e0945
Author: 云峤 <[email protected]>
Date: 2015-05-04T19:08:38Z
[SPARK-7319][SQL] Improve the output from DataFrame.show()
Author: 云峤 <[email protected]>
Closes #5865 from kaka1992/df.show and squashes the following commits:
c79204b [云峤] Update
a1338f6 [云峤] Update python dataFrame show test and add empty df unit
test.
734369c [云峤] Update python dataFrame show test and add empty df unit
test.
84aec3e [云峤] Update python dataFrame show test and add empty df unit
test.
159b3d5 [云峤] update
03ef434 [云峤] update
7394fd5 [云峤] update test show
ced487a [云峤] update pep8
b6e690b [云峤] Merge remote-tracking branch 'upstream/master' into df.show
30ac311 [云峤] [SPARK-7294] ADD BETWEEN
7d62368 [云峤] [SPARK-7294] ADD BETWEEN
baf839b [云峤] [SPARK-7294] ADD BETWEEN
d11d5b9 [云峤] [SPARK-7294] ADD BETWEEN
commit fc8b58195afa67fbb75b4c8303e022f703cbf007
Author: Andrew Or <[email protected]>
Date: 2015-05-04T23:21:36Z
[SPARK-6943] [SPARK-6944] DAG visualization on SparkUI
This patch adds the functionality to display the RDD DAG on the SparkUI.
This DAG describes the relationships between
- an RDD and its dependencies,
- an RDD and its operation scopes, and
- an RDD's operation scopes and the stage / job hierarchy
An operation scope here refers to the existing public APIs that created the
RDDs (e.g. `textFile`, `treeAggregate`). In the future, we can expand this to
include higher level operations like SQL queries.
*Note: This blatantly stole a few lines of HTML and JavaScript from #5547
(thanks shroffpradyumn!)*
Here's what the job page looks like:
<img
src="https://issues.apache.org/jira/secure/attachment/12730286/job-page.png"
width="700px"/>
and the stage page:
<img
src="https://issues.apache.org/jira/secure/attachment/12730287/stage-page.png"
width="300px"/>
Author: Andrew Or <[email protected]>
Closes #5729 from andrewor14/viz2 and squashes the following commits:
666c03b [Andrew Or] Round corners of RDD boxes on stage page (minor)
01ba336 [Andrew Or] Change RDD cache color to red (minor)
6f9574a [Andrew Or] Add tests for RDDOperationScope
1c310e4 [Andrew Or] Wrap a few more RDD functions in an operation scope
3ffe566 [Andrew Or] Restore "null" as default for RDD name
5fdd89d [Andrew Or] children -> child (minor)
0d07a84 [Andrew Or] Fix python style
afb98e2 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
viz2
0d7aa32 [Andrew Or] Fix python tests
3459ab2 [Andrew Or] Fix tests
832443c [Andrew Or] Merge branch 'master' of github.com:apache/spark into
viz2
429e9e1 [Andrew Or] Display cached RDDs on the viz
b1f0fd1 [Andrew Or] Rename OperatorScope -> RDDOperationScope
31aae06 [Andrew Or] Extract visualization logic from listener
83f9c58 [Andrew Or] Implement a programmatic representation of operator
scopes
5a7faf4 [Andrew Or] Rename references to viz scopes to viz clusters
ee33d52 [Andrew Or] Separate HTML generating code from listener
f9830a2 [Andrew Or] Refactor + clean up + document JS visualization code
b80cc52 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
viz2
0706992 [Andrew Or] Add link from jobs to stages
deb48a0 [Andrew Or] Translate stage boxes taking into account the width
5c7ce16 [Andrew Or] Connect RDDs across stages + update style
ab91416 [Andrew Or] Introduce visualization to the Job Page
5f07e9c [Andrew Or] Remove more return statements from scopes
5e388ea [Andrew Or] Fix line too long
43de96e [Andrew Or] Add parent IDs to StageInfo
6e2cfea [Andrew Or] Remove all return statements in `withScope`
d19c4da [Andrew Or] Merge branch 'master' of github.com:apache/spark into
viz2
7ef957c [Andrew Or] Fix scala style
4310271 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
viz2
aa868a9 [Andrew Or] Ensure that HadoopRDD is actually serializable
c3bfcae [Andrew Or] Re-implement scopes using closures instead of
annotations
52187fc [Andrew Or] Rat excludes
09d361e [Andrew Or] Add ID to node label (minor)
71281fa [Andrew Or] Embed the viz in the UI in a toggleable manner
8dd5af2 [Andrew Or] Fill in documentation + miscellaneous minor changes
fe7816f [Andrew Or] Merge branch 'master' of github.com:apache/spark into
viz
205f838 [Andrew Or] Reimplement rendering with dagre-d3 instead of viz.js
5e22946 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
viz
6a7cdca [Andrew Or] Move RDD scope util methods and logic to its own file
494d5c2 [Andrew Or] Revert a few unintended style changes
9fac6f3 [Andrew Or] Re-implement scopes through annotations instead
f22f337 [Andrew Or] First working implementation of visualization with
vis.js
2184348 [Andrew Or] Translate RDD information to dot file
5143523 [Andrew Or] Expose the necessary information in RDDInfo
a9ed4f9 [Andrew Or] Add a few missing scopes to certain RDD methods
6b3403b [Andrew Or] Scope all RDD methods
commit 80554111703c08e2bedbe303e04ecd162ec119e1
Author: Burak Yavuz <[email protected]>
Date: 2015-05-05T00:02:49Z
[SPARK-7243][SQL] Contingency Tables for DataFrames
Computes a pair-wise frequency table of the given columns. Also known as
cross-tabulation.
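The computation itself is simple to state. A plain-Python sketch of a pair-wise frequency table over two columns (this is not the DataFrame API; `crosstab` on a real DataFrame returns the counts as a DataFrame):

```python
from collections import Counter

def crosstab(rows, col1, col2):
    """Pair-wise frequency table (cross-tabulation) of two columns."""
    counts = Counter((r[col1], r[col2]) for r in rows)
    vals1 = sorted({r[col1] for r in rows})
    vals2 = sorted({r[col2] for r in rows})
    # One row per distinct value of col1, one column per distinct value of
    # col2; Counter returns 0 for pairs that never occur.
    return {a: {b: counts[(a, b)] for b in vals2} for a in vals1}

data = [{"age": 20, "dept": "x"},
        {"age": 20, "dept": "y"},
        {"age": 30, "dept": "x"}]
print(crosstab(data, "age", "dept"))
# {20: {'x': 1, 'y': 1}, 30: {'x': 1, 'y': 0}}
```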
cc mengxr rxin
Author: Burak Yavuz <[email protected]>
Closes #5842 from brkyvz/df-cont and squashes the following commits:
a07c01e [Burak Yavuz] addressed comments v4.1
ae9e01d [Burak Yavuz] fix test
9106585 [Burak Yavuz] addressed comments v4.0
bced829 [Burak Yavuz] fix merge conflicts
a63ad00 [Burak Yavuz] addressed comments v3.0
a0cad97 [Burak Yavuz] addressed comments v3.0
6805df8 [Burak Yavuz] addressed comments and fixed test
939b7c4 [Burak Yavuz] lint python
7f098bc [Burak Yavuz] add crosstab pyTest
fd53b00 [Burak Yavuz] added python support for crosstab
27a5a81 [Burak Yavuz] implemented crosstab
commit 678c4da0fa1bbfb6b5a0d3aced7aefa1bbbc193c
Author: Reynold Xin <[email protected]>
Date: 2015-05-05T01:03:07Z
[SPARK-7266] Add ExpectsInputTypes to expressions when possible.
This should give us better analysis-time error messages (rather than at
runtime) and automatic type casting.
Author: Reynold Xin <[email protected]>
Closes #5796 from rxin/expected-input-types and squashes the following
commits:
c900760 [Reynold Xin] [SPARK-7266] Add ExpectsInputTypes to expressions
when possible.
commit 8aa5aea7fee0ae9cd34e16c30655ee02b8747455
Author: Bryan Cutler <[email protected]>
Date: 2015-05-05T01:29:22Z
[SPARK-7236] [CORE] Fix to prevent AkkaUtils askWithReply from sleeping on
final attempt
Added a check so that if `AkkaUtils.askWithReply` is on the final attempt,
it will not sleep for the `retryInterval`. This should also prevent the thread
from sleeping for `Int.Max` when using `askWithReply` with default values for
`maxAttempts` and `retryInterval`.
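The fix amounts to skipping the sleep once no retry will follow. A hedged Python sketch of the same retry pattern (function and parameter names are illustrative, not the AkkaUtils API):

```python
import time

def ask_with_retry(send, max_attempts, retry_interval_s):
    """Retry `send` up to max_attempts times, but never sleep after the
    final attempt -- the sleep only buys time before another try."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception as e:
            last_error = e
            if attempt < max_attempts:   # the fix: no sleep on the last attempt
                time.sleep(retry_interval_s)
    raise TimeoutError(f"no reply after {max_attempts} attempts") from last_error

# Fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("no reply yet")
    return "pong"

print(ask_with_retry(flaky, max_attempts=3, retry_interval_s=0.01))  # pong
```

Without the `attempt < max_attempts` guard, a huge `retry_interval` (the `Int.Max` case mentioned above) would stall the thread even after the last, already-failed attempt.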
Author: Bryan Cutler <[email protected]>
Closes #5896 from BryanCutler/askWithReply-sleep-7236 and squashes the
following commits:
653a07b [Bryan Cutler] [SPARK-7236] Fix to prevent AkkaUtils askWithReply
from sleeping on final attempt
commit e9b16e67c636a8a91ab9fb0f4ef98146abbde1e9
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-05T06:52:42Z
[SPARK-7314] [SPARK-3524] [PYSPARK] upgrade Pyrolite to 4.4
This PR upgrades Pyrolite to 4.4, which contains the bug fix for SPARK-3524
and some other performance improvements (e.g., SPARK-6288). The artifact is
still under `org.spark-project` on Maven Central since there is no official
release published there.
Author: Xiangrui Meng <[email protected]>
Closes #5850 from mengxr/SPARK-7314 and squashes the following commits:
2ed4a95 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into
SPARK-7314
da3c2dd [Xiangrui Meng] remove my repo
fe7e29b [Xiangrui Meng] switch to maven central
6ddac0e [Xiangrui Meng] reverse the machine code for float/double
d2d5b5b [Xiangrui Meng] change back to 4.4
7824a9c [Xiangrui Meng] use Pyrolite 3.1
cc3903a [Xiangrui Meng] upgrade Pyrolite to 4.4-0 for testing
commit da738cffa8f7e12545b47f31dcb051f2927e4149
Author: Niccolo Becchi <[email protected]>
Date: 2015-05-05T07:54:42Z
[MINOR] Renamed variables in SparkKMeans.scala, LocalKMeans.scala and
kmeans.py to simplify readability
With the previous syntax it could look as if the reduceByKey sums the
abscissas and ordinates of some 2D points separately. This renaming should
make the example easier to understand, especially for someone just starting
with functional programming, like me.
Author: Niccolo Becchi <[email protected]>
Author: pippobaudos <[email protected]>
Closes #5875 from pippobaudos/patch-1 and squashes the following commits:
3bb3a47 [pippobaudos] renamed variables in LocalKMeans.scala and kmeans.py
to simplify readability
2c2a7a2 [Niccolo Becchi] Update SparkKMeans.scala
commit c5790a2f772168351c18bb0da51a124cee89a06f
Author: Marcelo Vanzin <[email protected]>
Date: 2015-05-05T07:56:16Z
[MINOR] [BUILD] Declare ivy dependency in root pom.
Without this, any dependency that pulls in ivy transitively may override
the version and potentially cause issues. On my machine, the hive tests
were pulling an old version of ivy, and subsequently failing with a
"NoSuchMethodError".
Author: Marcelo Vanzin <[email protected]>
Closes #5893 from vanzin/ivy-dep-fix and squashes the following commits:
ea2112d [Marcelo Vanzin] [minor] [build] Declare ivy dependency in root pom.
commit 1854ac326a9cc6014817d8df30ed0458eee5d7d1
Author: Tathagata Das <[email protected]>
Date: 2015-05-05T08:45:19Z
[SPARK-7139] [STREAMING] Allow received block metadata to be saved to WAL
and recovered on driver failure
- Enabled ReceivedBlockTracker WAL by default
- Stored block metadata in the WAL
- Optimized WALBackedBlockRDD by skipping block fetch when the block is
known to not exist in Spark
Author: Tathagata Das <[email protected]>
Closes #5732 from tdas/SPARK-7139 and squashes the following commits:
575476e [Tathagata Das] Added more tests to get 100% coverage of the
WALBackedBlockRDD
19668ba [Tathagata Das] Merge remote-tracking branch 'apache-github/master'
into SPARK-7139
685fab3 [Tathagata Das] Addressed comments in PR
637bc9c [Tathagata Das] Changed segment to handle
466212c [Tathagata Das] Merge remote-tracking branch 'apache-github/master'
into SPARK-7139
5f67a59 [Tathagata Das] Fixed HdfsUtils to handle append in local file
system
1bc5bc3 [Tathagata Das] Fixed bug on unexpected recovery
d06fa21 [Tathagata Das] Enabled ReceivedBlockTracker by default, stored
block metadata and optimized block fetching in WALBackedBlockRDD
commit 8776fe0b93b6e6d718738bcaf9838a2196e12c8a
Author: Tathagata Das <[email protected]>
Date: 2015-05-05T08:58:51Z
[HOTFIX] [TEST] Ignoring flaky tests
org.apache.spark.DriverSuite.driver should exit after finishing without
cleanup (SPARK-530)
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2267/
org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through
--jars
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2271/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/
org.apache.spark.streaming.flume.FlumePollingStreamSuite.flume polling test
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2269/
Author: Tathagata Das <[email protected]>
Closes #5901 from tdas/ignore-flaky-tests and squashes the following
commits:
9cd8667 [Tathagata Das] Ignoring tests.
commit 8436f7e98e674020007a9175973c6a1095b6774f
Author: jerryshao <[email protected]>
Date: 2015-05-05T09:01:06Z
[SPARK-7113] [STREAMING] Support input information reporting for Direct
Kafka stream
Author: jerryshao <[email protected]>
Closes #5879 from jerryshao/SPARK-7113 and squashes the following commits:
b0b506c [jerryshao] Address the comments
0babe66 [jerryshao] Support input information reporting for Direct Kafka
stream
commit 4d29867ede9a87b160c3d715c1fb02067feef449
Author: zsxwing <[email protected]>
Date: 2015-05-05T09:15:39Z
[SPARK-7341] [STREAMING] [TESTS] Fix the flaky test:
org.apache.spark.stre...
...aming.InputStreamsSuite.socket input stream
Remove non-deterministic "Thread.sleep" and use deterministic strategies to
fix the flaky failure:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2127/testReport/junit/org.apache.spark.streaming/InputStreamsSuite/socket_input_stream/
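Replacing `Thread.sleep` with something the test can wait on deterministically is the usual cure for this kind of flakiness. A minimal Python sketch of a `BatchCounter`-style helper (the class name comes from the commit log below; the rest is illustrative):

```python
import threading

class BatchCounter:
    """Count completed batches and let a test wait for a target count
    deterministically, instead of guessing a duration with sleep()."""
    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def batch_completed(self):
        # Called by the system under test whenever a batch finishes.
        with self._cond:
            self._count += 1
            self._cond.notify_all()

    def wait_until(self, target, timeout_s=10.0):
        # Returns True as soon as the count reaches target, False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._count >= target, timeout_s)

counter = BatchCounter()
threading.Timer(0.05, counter.batch_completed).start()
print(counter.wait_until(1))  # True, once the batch callback fires
```

The timeout is only a safety net; unlike a fixed sleep, the wait returns the moment the condition holds, so the test is both faster and deterministic.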
Author: zsxwing <[email protected]>
Closes #5891 from zsxwing/SPARK-7341 and squashes the following commits:
611157a [zsxwing] Add wait methods to BatchCounter and use BatchCounter in
InputStreamsSuite
014b58f [zsxwing] Use withXXX to clean up the resources
c9bf746 [zsxwing] Move 'waitForStart' into the 'start' method and fix the
code style
9d0de6d [zsxwing] [SPARK-7341][Streaming][Tests] Fix the flaky test:
org.apache.spark.streaming.InputStreamsSuite.socket input stream
commit fc8feaa8e94e1e611d2abb1e5e38de512961502b
Author: shekhar.bansal <[email protected]>
Date: 2015-05-05T10:09:51Z
[SPARK-6653] [YARN] New config to specify port for sparkYarnAM actor system
Author: shekhar.bansal <[email protected]>
Closes #5719 from zuxqoj/master and squashes the following commits:
5574ff7 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for
sparkYarnAM actor system
5117258 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for
sparkYarnAM actor system
9de5330 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for
sparkYarnAM actor system
456a592 [shekhar.bansal] [SPARK-6653][yarn] New configuration property to
specify port for sparkYarnAM actor system
803e93e [shekhar.bansal] [SPARK-6653][yarn] New configuration property to
specify port for sparkYarnAM actor system
commit 4222da68dc5360b7a2a8b8bdce231e887ac2f044
Author: Sandy Ryza <[email protected]>
Date: 2015-05-05T11:38:46Z
[SPARK-5112] Expose SizeEstimator as a developer api
"The best way to size the amount of memory consumption your dataset will
require is to create an RDD, put it into cache, and look at the SparkContext
logs on your driver program. The logs will tell you how much memory each
partition is consuming, which you can aggregate to get the total size of the
RDD."
-the Tuning Spark page
This is a pain. It would be much nicer to simply expose functionality for
understanding the memory footprint of a Java object.
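The idea SizeEstimator exposes can be approximated in pure Python by walking an object graph with `sys.getsizeof` (a rough sketch of the concept, not Spark's JVM-layout-aware estimator):

```python
import sys

def estimate_size(obj, _seen=None):
    """Rough deep size of a Python object graph, in bytes.
    Shared objects are counted once, as a real estimator must do."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:          # don't double-count shared references
        return 0
    _seen.add(id(obj))
    size = sys.getsizeof(obj)     # shallow size of this object alone
    if isinstance(obj, dict):
        size += sum(estimate_size(k, _seen) + estimate_size(v, _seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(estimate_size(x, _seen) for x in obj)
    return size

row = {"id": 1, "features": [0.1, 0.2, 0.3]}
print(estimate_size(row) > sys.getsizeof(row))  # True: deep size > shallow size
```

This is exactly the "how big is my dataset, per object" question the quoted Tuning page answers today by caching an RDD and reading driver logs.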
Author: Sandy Ryza <[email protected]>
Closes #3913 from sryza/sandy-spark-5112 and squashes the following commits:
8d9e082 [Sandy Ryza] Add SizeEstimator in org.apache.spark
2e1a906 [Sandy Ryza] Revert "Move SizeEstimator out of util"
93f4cd0 [Sandy Ryza] Move SizeEstimator out of util
e21c1f4 [Sandy Ryza] Remove unused import
798ab88 [Sandy Ryza] Update documentation and add to SparkContext
34c523c [Sandy Ryza] SPARK-5112. Expose SizeEstimator as a developer api
commit 51f462003b416eac92feb5a6725f6c2994389010
Author: Jihong MA <[email protected]>
Date: 2015-05-05T11:40:41Z
[SPARK-7357] Improving HBaseTest example
Author: Jihong MA <[email protected]>
Closes #5904 from JihongMA/SPARK-7357 and squashes the following commits:
7d6153a [Jihong MA] SPARK-7357 Improving HBaseTest example
commit d49735800db27239c11478aac4b0f2ec9df91a3f
Author: Imran Rashid <[email protected]>
Date: 2015-05-05T12:25:40Z
[SPARK-3454] separate json endpoints for data in the UI
Exposes data available in the UI as json over http. Key points:
* new endpoints, handled independently of existing XyzPage classes. Root
entrypoint is `JsonRootResource`
* Uses jersey + jackson for routing & converting POJOs into json
* tests against known results in `HistoryServerSuite`
* also fixes some minor issues w/ the UI -- synchronizing on access to
`StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/
the way we handle retained jobs & stages.
Author: Imran Rashid <[email protected]>
Closes #4435 from squito/SPARK-3454 and squashes the following commits:
da1e35f [Imran Rashid] typos etc.
5e78b4f [Imran Rashid] fix rendering problems
5ae02ad [Imran Rashid] Merge branch 'master' into SPARK-3454
f016182 [Imran Rashid] change all constructors json-pojo class constructors
to be private[spark] to protect us from mima-false-positives if we add fields
3347b72 [Imran Rashid] mark EnumUtil as @Private
ec140a2 [Imran Rashid] create @Private
cc1febf [Imran Rashid] add docs on the metrics-as-json api
cbaf287 [Imran Rashid] Merge branch 'master' into SPARK-3454
56db31e [Imran Rashid] update tests for mulit-attempt
7f3bc4e [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier
to start & stop http servers in sbt"
67008b4 [Imran Rashid] rats
9e51400 [Imran Rashid] style
c9bae1c [Imran Rashid] handle multiple attempts per app
b87cd63 [Imran Rashid] add sbt-revolved plugin, to make it easier to start
& stop http servers in sbt
188762c [Imran Rashid] multi-attempt
2af11e5 [Imran Rashid] Merge branch 'master' into SPARK-3454
befff0c [Imran Rashid] review feedback
14ac3ed [Imran Rashid] jersey-core needs to be explicit; move version &
scope to parent pom.xml
f90680e [Imran Rashid] Merge branch 'master' into SPARK-3454
dc8a7fe [Imran Rashid] style, fix errant comments
acb7ef6 [Imran Rashid] fix indentation
7bf1811 [Imran Rashid] move MetricHelper so mima doesnt think its exposed;
comments
9d889d6 [Imran Rashid] undo some unnecessary changes
f48a7b0 [Imran Rashid] docs
52bbae8 [Imran Rashid] StorageListener & StorageStatusListener needs to
synchronize internally to be thread-safe
31c79ce [Imran Rashid] asm no longer needed for SPARK_PREPEND_CLASSES
b2f8b91 [Imran Rashid] @DeveloperApi
2e19be2 [Imran Rashid] lazily convert ApplicationInfo to avoid memory
overhead
ba3d9d2 [Imran Rashid] upper case enums
39ac29c [Imran Rashid] move EnumUtil
d2bde77 [Imran Rashid] update error handling & scoping
4a234d3 [Imran Rashid] avoid jersey-media-json-jackson b/c of potential
version conflicts
a157a2f [Imran Rashid] style
7bd4d15 [Imran Rashid] delete security test, since it doesnt do anything
a325563 [Imran Rashid] style
a9c5cf1 [Imran Rashid] undo changes superceeded by master
0c6f968 [Imran Rashid] update deps
1ed0d07 [Imran Rashid] Merge branch 'master' into SPARK-3454
4c92af6 [Imran Rashid] style
f2e63ad [Imran Rashid] Merge branch 'master' into SPARK-3454
c22b11f [Imran Rashid] fix compile error
9ea682c [Imran Rashid] go back to good ol' java enums
cf86175 [Imran Rashid] style
d493b38 [Imran Rashid] Merge branch 'master' into SPARK-3454
f05ae89 [Imran Rashid] add in ExecutorSummaryInfo for MiMa :(
101a698 [Imran Rashid] style
d2ef58d [Imran Rashid] revert changes that had HistoryServer refresh the
application listing more often
b136e39b [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier
to start & stop http servers in sbt"
e031719 [Imran Rashid] fixes from review
1f53a66 [Imran Rashid] style
b4a7863 [Imran Rashid] fix compile error
2c8b7ee [Imran Rashid] rats
1578a4a [Imran Rashid] doc
674f8dc [Imran Rashid] more explicit about total numbers of jobs & stages
vs. number retained
9922be0 [Imran Rashid] Merge branch 'master' into stage_distributions
f5a5196 [Imran Rashid] undo removal of renderJson from MasterPage, since
there is no substitute yet
db61211 [Imran Rashid] get JobProgressListener directly from UI
fdfc181 [Imran Rashid] stage/taskList
63eb4a6 [Imran Rashid] tests for taskSummary
ad27de8 [Imran Rashid] error handling on quantile values
b2efcaf [Imran Rashid] cleanup, combine stage-related paths into one
resource
aaba896 [Imran Rashid] wire up task summary
a4b1397 [Imran Rashid] stage metric distributions
e48ba32 [Imran Rashid] rename
eaf3bbb [Imran Rashid] style
25cd894 [Imran Rashid] if only given day, assume GMT
51eaedb [Imran Rashid] more visibility fixes
9f28b7e [Imran Rashid] ack, more cleanup
99764e1 [Imran Rashid] Merge branch 'SPARK-3454_w_jersey' into SPARK-3454
a61a43c [Imran Rashid] oops, remove accidental checkin
a066055 [Imran Rashid] set visibility on a lot of classes
1f361c8 [Imran Rashid] update rat-excludes
0be5120 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
2382bef [Imran Rashid] switch to using new "enum"
fef6605 [Imran Rashid] some utils for working w/ new "enum" format
dbfc7bf [Imran Rashid] style
b86bcb0 [Imran Rashid] update test to look at one stage attempt
5f9df24 [Imran Rashid] style
7fd156a [Imran Rashid] refactor jsonDiff to avoid code duplication
73f1378 [Imran Rashid] test json; also add test cases for cleaned stages &
jobs
97d411f [Imran Rashid] json endpoint for one job
0c96147 [Imran Rashid] better error msgs for bad stageId vs bad attemptId
dddbd29 [Imran Rashid] stages have attempt; jobs are sorted; resource for
all attempts for one stage
190c17a [Imran Rashid] StagePage should distinguish no task data, from
unknown stage
84cd497 [Imran Rashid] AllJobsPage should still report correct completed &
failed job count, even if some have been cleaned, to make it consistent w/
AllStagesPage
36e4062 [Imran Rashid] SparkUI needs to know about startTime, so it can
list its own applicationInfo
b4c75ed [Imran Rashid] fix merge conflicts; need to widen visibility in a
few cases
e91750a [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
56d2fc7 [Imran Rashid] jersey needs asm for SPARK_PREPEND_CLASSES to work
f7df095 [Imran Rashid] add test for accumulables, and discover that I need
update after all
9c0c125 [Imran Rashid] add accumulableInfo
00e9cc5 [Imran Rashid] more style
3377e61 [Imran Rashid] scaladoc
d05f7a9 [Imran Rashid] dont use case classes for status api POJOs, since
they have binary compatibility issues
654cecf [Imran Rashid] move all the status api POJOs to one file
b86e2b0 [Imran Rashid] style
18a8c45 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
5598f19 [Imran Rashid] delete some unnecessary code, more to go
56edce0 [Imran Rashid] style
017c755 [Imran Rashid] add in metrics now available
1b78cb7 [Imran Rashid] fix some import ordering
0dc3ea7 [Imran Rashid] if app isnt found, reload apps from FS before giving
up
c7d884f [Imran Rashid] fix merge conflicts
0c12b50 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
b6a96a8 [Imran Rashid] compare json by AST, not string
cd37845 [Imran Rashid] switch to using java.util.Dates for times
a4ab5aa [Imran Rashid] add in explicit dependency on jersey 1.9 -- maven
wasn't happy before this
4fdc39f [Imran Rashid] refactor case insensitive enum parsing
cba1ef6 [Imran Rashid] add security (maybe?) for metrics json
f0264a7 [Imran Rashid] switch to using jersey for metrics json
bceb3a9 [Imran Rashid] set http response code on error, some testing
e0356b6 [Imran Rashid] put new test expectation files in rat excludes (is
this OK?)
b252e7a [Imran Rashid] small cleanup of accidental changes
d1a8c92 [Imran Rashid] add sbt-revolved plugin, to make it easier to start
& stop http servers in sbt
4b398d0 [Imran Rashid] expose UI data as json in new endpoints
commit b83091ae4589feea78b056827bc3b7659d271e41
Author: Liang-Chi Hsieh <[email protected]>
Date: 2015-05-05T13:44:02Z
[MINOR] Minor update for document
Two minor doc errors in `BytesToBytesMap` and `UnsafeRow`.
Author: Liang-Chi Hsieh <[email protected]>
Closes #5906 from viirya/minor_doc and squashes the following commits:
27f9089 [Liang-Chi Hsieh] Minor update for doc.
commit 5ffc73e68b3a6ea30c25931e9e0495a4c7e5654c
Author: zsxwing <[email protected]>
Date: 2015-05-05T14:04:14Z
[SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map stage
failure' in DAGSchedulerSuite
Test failure:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/2240/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/run_shuffle_with_map_stage_failure/
This is because many tests share the same `JobListener`, and the `scheduler`
isn't stopped after each test, so it is actually still running. When running
the test `run shuffle with map stage failure`, some previous test may trigger
[ResubmitFailedStages](https://github.com/apache/spark/blob/ebc25a4ddfe07a67668217cec59893bc3b8cf730/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1120)
logic, report `jobFailed`, and override the global `failure` variable.
This PR uses `after` to call `scheduler.stop()` for each test.
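The same hygiene applies in any suite with a long-lived component. A Python `unittest` sketch of the stop-after-every-test pattern (the `Scheduler` class here is a stand-in, not Spark's DAGScheduler):

```python
import unittest

class Scheduler:
    """Stand-in for a component that keeps running between tests."""
    def __init__(self):
        self.running = True
        self.failures = []

    def report_failure(self, msg):
        if self.running:              # a live leftover scheduler can still
            self.failures.append(msg)  # report failures into later tests

    def stop(self):
        self.running = False

class SchedulerSuite(unittest.TestCase):
    def setUp(self):
        self.scheduler = Scheduler()

    def tearDown(self):
        # The fix: stop the scheduler after *every* test so a leftover
        # instance cannot override shared state seen by the next test.
        self.scheduler.stop()

    def test_map_stage_failure_is_isolated(self):
        self.scheduler.report_failure("map stage failed")
        self.assertEqual(self.scheduler.failures, ["map stage failed"])

# Run the suite programmatically (avoids unittest.main()'s sys.exit).
result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(SchedulerSuite))
print(result.wasSuccessful())  # True
```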
Author: zsxwing <[email protected]>
Closes #5903 from zsxwing/SPARK-5074 and squashes the following commits:
1e6f13e [zsxwing] Fix the flakey test 'run shuffle with map stage failure'
in DAGSchedulerSuite
commit c6d1efba29a4235130024fee9f118e6b2cb89ce1
Author: zsxwing <[email protected]>
Date: 2015-05-05T14:09:58Z
[SPARK-7350] [STREAMING] [WEBUI] Attach the Streaming tab when calling
ssc.start()
It's meaningless to display the Streaming tab before `ssc.start()`. So we
should attach it in the `ssc.start` method.
Author: zsxwing <[email protected]>
Closes #5898 from zsxwing/SPARK-7350 and squashes the following commits:
e676487 [zsxwing] Attach the Streaming tab when calling ssc.start()
commit 5ab652cdb8bef10214edd079502a7f49017579aa
Author: MechCoder <[email protected]>
Date: 2015-05-05T14:53:11Z
[SPARK-7202] [MLLIB] [PYSPARK] Add SparseMatrixPickler to SerDe
Utilities for pickling and unpickling SparseMatrices using SerDe
Author: MechCoder <[email protected]>
Closes #5775 from MechCoder/spark-7202 and squashes the following commits:
7e689dc [MechCoder] [SPARK-7202] Add SparseMatrixPickler to SerDe
commit 5995ada96b661546a80657f2c5ed20604593e4aa
Author: Hrishikesh Subramonian <[email protected]>
Date: 2015-05-05T14:57:39Z
[SPARK-6612] [MLLIB] [PYSPARK] Python KMeans parity
The following items are added to Python kmeans:
kmeans - setEpsilon, setInitializationSteps
KMeansModel - computeCost, k
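`computeCost` is just the within-set sum of squared distances (the k-means objective). A pure-Python sketch of what the new API computes (not the MLlib implementation itself):

```python
def compute_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its
    nearest cluster center (the k-means objective, a.k.a. WSSSE)."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total

points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0)]
centers = [(0.5, 0.5), (9.0, 8.0)]
print(compute_cost(points, centers))  # 1.0
```

A lower cost for the same k means a tighter clustering, which is why the method is useful for comparing runs or choosing k.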
Author: Hrishikesh Subramonian <[email protected]>
Closes #5647 from FlytxtRnD/newPyKmeansAPI and squashes the following
commits:
b9e451b [Hrishikesh Subramonian] set seed to fixed value in doc test
5fd3ced [Hrishikesh Subramonian] doc test corrections
20b3c68 [Hrishikesh Subramonian] python 3 fixes
4d4e695 [Hrishikesh Subramonian] added arguments in python tests
21eb84c [Hrishikesh Subramonian] Python Kmeans - setEpsilon,
setInitializationSteps, k and computeCost added.
commit 9d250e64dac263bcbbad6b023382ac7b5b592408
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-05T15:00:31Z
Closes #5591
Closes #5878
commit d4cb38aeb7412a353c6cbca2a9b8f9729afbaba7
Author: Alain <[email protected]>
Date: 2015-05-05T15:47:34Z
[MLLIB] [TREE] Verify size of input rdd > 0 when building meta data
Require a non-empty input RDD so that we can take the first LabeledPoint
and get the feature size.
Author: Alain <[email protected]>
Author: [email protected] <[email protected]>
Closes #5810 from AiHe/decisiontree-issue and squashes the following
commits:
3b1d08a [[email protected]] [MLLIB][tree] merge the assertion into the
evaluation of numFeatures
cf2e567 [Alain] [MLLIB][tree] Use a rdd api to verify size of input rdd > 0
when building meta data
b448f47 [Alain] [MLLIB][tree] Verify size of input rdd > 0 when building
meta data
commit 1fdabf8dcdb31391fec3952d312eb0ac59ece43b
Author: Andrew Or <[email protected]>
Date: 2015-05-05T16:37:04Z
[SPARK-7237] Many user provided closures are not actually cleaned
Note: ~140 lines are tests.
In a nutshell, we never cleaned closures the user provided through the
following operations:
- sortBy
- keyBy
- mapPartitions
- mapPartitionsWithIndex
- aggregateByKey
- foldByKey
- foreachAsync
- one of the aliases for runJob
- runApproximateJob
For more details on a reproduction and why they were not cleaned, please
see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237).
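The reason cleaning matters: a task that references a field through `this` drags the whole enclosing object along when serialized. A Python/pickle analogy of the gotcha (Spark's ClosureCleaner nulls such references on the JVM side; this sketch only illustrates the cost, not the mechanism):

```python
import pickle
from functools import partial

class Job:
    def __init__(self):
        self.huge_state = list(range(100_000))  # not needed by the task
        self.factor = 2

    def multiply(self, x):       # references self, so the whole Job ships
        return x * self.factor

def multiply_by(factor, x):      # module-level: only `factor` ships
    return x * factor

job = Job()
uncleaned = pickle.dumps(job.multiply)               # drags huge_state along
cleaned = pickle.dumps(partial(multiply_by, job.factor))
print(len(uncleaned) > 100 * len(cleaned))  # True: the uncleaned task is huge
```

An uncleaned closure in any of the operations listed above pays this serialization tax on every task, or fails outright if the enclosing object isn't serializable at all.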
Author: Andrew Or <[email protected]>
Closes #5787 from andrewor14/clean-more and squashes the following commits:
2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
clean-more
7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
clean-more
df3caa3 [Andrew Or] Address comments
7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
clean-more
6498f44 [Andrew Or] Add missing test for groupBy
e83699e [Andrew Or] Clean one more
8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app
9ac5f9b [Andrew Or] Clean closures that are not currently cleaned
19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
commit 57e9f29e17d97ed9d0f110fb2ce5a075b854a841
Author: Andrew Or <[email protected]>
Date: 2015-05-05T16:37:49Z
[SPARK-7318] [STREAMING] DStream cleans objects that are not closures
I added a check in `ClosureCleaner#clean` to fail fast if this is detected
in the future. tdas
Author: Andrew Or <[email protected]>
Closes #5860 from andrewor14/streaming-closure-cleaner and squashes the
following commits:
8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
5ee4e25 [Andrew Or] Fix tests
eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
streaming-closure-cleaner
67eeff4 [Andrew Or] Add tests
a4fa768 [Andrew Or] Clean the closure, not the RDD
commit 9f1f9b1037ee003a07ff09d60bb360cf32c8a564
Author: jerryshao <[email protected]>
Date: 2015-05-05T16:43:49Z
[SPARK-7007] [CORE] Add a metric source for ExecutorAllocationManager
Add a metric source to expose the internal status of
ExecutorAllocationManager, to better monitor the resource usage of executors
when dynamic allocation is enabled. Please help to review, thanks a lot.
Author: jerryshao <[email protected]>
Closes #5589 from jerryshao/dynamic-allocation-source and squashes the
following commits:
104d155 [jerryshao] rebase and address the comments
c501a2c [jerryshao] Address the comments
d237ba5 [jerryshao] Address the comments
2c3540f [jerryshao] Add a metric source for ExecutorAllocationManager
commit 18340d7be55a6834918956555bf820c96769aa52
Author: Burak Yavuz <[email protected]>
Date: 2015-05-05T18:01:25Z
[SPARK-7243][SQL] Reduce size for Contingency Tables in DataFrames
Reduced take size from 1e8 to 1e6.
cc rxin
Author: Burak Yavuz <[email protected]>
Closes #5900 from brkyvz/df-cont-followup and squashes the following
commits:
c11e762 [Burak Yavuz] fix grammar
b30ace2 [Burak Yavuz] address comments
a417ba5 [Burak Yavuz] [SPARK-7243][SQL] Reduce size for Contingency Tables
in DataFrames
commit ee374e89cd1f08730fed9d50b742627d5b19d241
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-05T18:45:37Z
[SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark
This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API,
which is a simple wrapper of the Scala implementation. oefirouz
Author: Xiangrui Meng <[email protected]>
Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:
25d7451 [Xiangrui Meng] fix tests in python 3
babdde7 [Xiangrui Meng] fix doc
cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark
----