GitHub user AnkitBhardwaj12 opened a pull request:
https://github.com/apache/spark/pull/1298
Bounded priority queue and kryo issue
[SPARK-2306] : BoundedPriorityQueue is private and not registered with Kryo
Because BoundedPriorityQueue is not registered with Kryo, RDD.top currently
does not work when the Kryo serializer is in use.
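For context, a bounded priority queue keeps only the k largest elements seen so far; a top-k operation can build one such queue per partition and then merge them. The Python sketch below is only an illustration under that assumption: `BoundedTopK` and `top_k` are hypothetical names, not Spark's classes, and nothing here touches Kryo. It is meant to show why the queue objects themselves end up being passed around as intermediate results, which is presumably where serializer registration starts to matter.

```python
import heapq


class BoundedTopK:
    """Hypothetical bounded priority queue: retains at most `maxsize` largest items."""

    def __init__(self, maxsize):
        if maxsize <= 0:
            raise ValueError("maxsize must be positive")
        self.maxsize = maxsize
        self._heap = []  # min-heap; the smallest retained item sits at index 0

    def add(self, item):
        if len(self._heap) < self.maxsize:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            # Evict the current minimum in favour of the larger item.
            heapq.heapreplace(self._heap, item)

    def merge(self, other):
        for item in other._heap:
            self.add(item)
        return self

    def result(self):
        return sorted(self._heap, reverse=True)


def top_k(partitions, k):
    # One queue per partition; in a distributed setting each of these queue
    # objects would be serialized and sent back before being merged.
    queues = []
    for part in partitions:
        queue = BoundedTopK(k)
        for x in part:
            queue.add(x)
        queues.append(queue)

    merged = BoundedTopK(k)
    for queue in queues:
        merged.merge(queue)
    return merged.result()
```

Calling `top_k([[5, 1, 9], [7, 3], [8, 2, 6]], 3)` merges three per-partition queues into a single result of at most three elements.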
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AnkitBhardwaj12/spark BoundedPriorityQueueAndKryoIssue
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1298.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1298
----
commit 88f1da3e212cb41388562bb42bf5d7364e0d3180
Author: Patrick Wendell <[email protected]>
Date: 2014-05-15T06:48:03Z
HOTFIX: Don't build Javadoc in Maven when creating releases.
Because we've added java package descriptions in some packages that don't
have any Java files, running the Javadoc target hits this issue:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4492654
To fix this I've simply removed the javadoc target when publishing
releases.
commit aa5f989a537ec616b30ce8f7e134959eb1bbdc11
Author: Andrew Ash <[email protected]>
Date: 2014-05-15T18:05:39Z
SPARK-1846 Ignore logs directory in RAT checks
https://issues.apache.org/jira/browse/SPARK-1846
Author: Andrew Ash <[email protected]>
Closes #793 from ash211/SPARK-1846 and squashes the following commits:
3f50db5 [Andrew Ash] SPARK-1846 Ignore logs directory in RAT checks
(cherry picked from commit 3abe2b734a5578966f671c34f1de34b4446b90f1)
Signed-off-by: Patrick Wendell <[email protected]>
commit 7515367e361c910ea81bae65e42e32a5a6763a5e
Author: Takuya UESHIN <[email protected]>
Date: 2014-05-15T18:20:21Z
[SPARK-1845] [SQL] Use AllScalaRegistrar for SparkSqlSerializer to register
serializers of Scala collections.
When I execute `orderBy` or `limit` for `SchemaRDD` including `ArrayType`
or `MapType`, `SparkSqlSerializer` throws the following exception:
```
com.esotericsoftware.kryo.KryoException: Class cannot be created (missing
no-arg constructor): scala.collection.immutable.$colon$colon
```
or
```
com.esotericsoftware.kryo.KryoException: Class cannot be created (missing
no-arg constructor): scala.collection.immutable.Vector
```
or
```
com.esotericsoftware.kryo.KryoException: Class cannot be created (missing
no-arg constructor): scala.collection.immutable.HashMap$HashTrieMap
```
and so on.
This is because `SparkSqlSerializer` is missing serializer registrations for
the concrete collection classes.
I believe it should use `AllScalaRegistrar`.
`AllScalaRegistrar` registers serializers for many concrete `Seq` and `Map`
classes, which are needed for `ArrayType` and `MapType`.
Author: Takuya UESHIN <[email protected]>
Closes #790 from ueshin/issues/SPARK-1845 and squashes the following
commits:
d1ed992 [Takuya UESHIN] Use AllScalaRegistrar for SparkSqlSerializer to
register serializers of Scala collections.
(cherry picked from commit db8cc6f28abe4326cea6f53feb604920e4867a27)
Signed-off-by: Reynold Xin <[email protected]>
commit f9eeddccbd42064f5d1234b323ac74bb2a39e0aa
Author: Takuya UESHIN <[email protected]>
Date: 2014-05-15T18:21:33Z
[SPARK-1819] [SQL] Fix GetField.nullable.
`GetField.nullable` should be `true` not only when `field.nullable` is
`true` but also when `child.nullable` is `true`.
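The rule stated above can be sketched as a plain boolean function. This is an illustrative Python rendering, not the Catalyst code; `get_field_nullable` and its parameter names are hypothetical.

```python
def get_field_nullable(child_nullable, field_nullable):
    # Reading a field out of a null struct yields null, so the result can be
    # null if either the struct expression or the field itself can be null.
    return child_nullable or field_nullable


# Truth table as (child_nullable, field_nullable, result) tuples.
truth_table = [
    (child, field, get_field_nullable(child, field))
    for child in (False, True)
    for field in (False, True)
]
```

The only combination that is not nullable is the one where neither the child nor the field is nullable.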
Author: Takuya UESHIN <[email protected]>
Closes #757 from ueshin/issues/SPARK-1819 and squashes the following
commits:
8781a11 [Takuya UESHIN] Modify a test to use named parameters.
5bfc77d [Takuya UESHIN] Fix GetField.nullable.
(cherry picked from commit 94c9d6f59859ebc77fae112c2c42c64b7a4d7f83)
Signed-off-by: Reynold Xin <[email protected]>
commit bc9a96e2e97d4a9b4a2075fb026be320b96bd08b
Author: Xiangrui Meng <[email protected]>
Date: 2014-05-15T18:59:59Z
[SPARK-1741][MLLIB] add predict(JavaRDD) to RegressionModel,
ClassificationModel, and KMeans
`model.predict` returns an RDD of a Scala primitive type (Int/Double), which
Java sees as Object. Adding predict(JavaRDD) makes life easier for Java
users.
Added tests for KMeans, LinearRegression, and NaiveBayes.
Will update examples after https://github.com/apache/spark/pull/653 gets
merged.
cc: @srowen
Author: Xiangrui Meng <[email protected]>
Closes #670 from mengxr/predict-javardd and squashes the following commits:
b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd
43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel,
ClassificationModel, and KMeans
(cherry picked from commit d52761d67f42ad4d2ff02d96f0675fb3ab709f38)
Signed-off-by: Patrick Wendell <[email protected]>
commit 35870574a6e33a39c139139c8739a82796af5ebb
Author: Sandy Ryza <[email protected]>
Date: 2014-05-15T23:35:39Z
SPARK-1851. Upgrade Avro dependency to 1.7.6 so Spark can read Avro files
Author: Sandy Ryza <[email protected]>
Closes #795 from sryza/sandy-spark-1851 and squashes the following commits:
79c8227 [Sandy Ryza] SPARK-1851. Upgrade Avro dependency to 1.7.6 so Spark
can read Avro files
(cherry picked from commit 08e7606a964e3d1ac1d565f33651ff0035c75044)
Signed-off-by: Patrick Wendell <[email protected]>
commit 22f261a1a3efbd466ca0588cc77beb92fb14b6a3
Author: Stevo Slavić <[email protected]>
Date: 2014-05-15T23:44:14Z
SPARK-1803 Replaced colon in filenames with a dash
This patch replaces the colon in several filenames with a dash to make these
filenames Windows-compatible.
Author: Stevo Slavić <[email protected]>
Author: Stevo Slavic <[email protected]>
Closes #739 from sslavic/SPARK-1803 and squashes the following commits:
3ec66eb [Stevo Slavic] Removed extra empty line which was causing test to
fail
b967cc3 [Stevo Slavić] Aligned tests and names of test resources
2b12776 [Stevo Slavić] Fixed a typo in file name
1c5dfff [Stevo Slavić] Replaced colon in file name with dash
8f5bf7f [Stevo Slavić] Replaced colon in file name with dash
c5b5083 [Stevo Slavić] Replaced colon in file name with dash
a49801f [Stevo Slavić] Replaced colon in file name with dash
401d99e [Stevo Slavić] Replaced colon in file name with dash
40a9621 [Stevo Slavić] Replaced colon in file name with dash
4774580 [Stevo Slavić] Replaced colon in file name with dash
004f8bb [Stevo Slavić] Replaced colon in file name with dash
d6a3e2c [Stevo Slavić] Replaced colon in file name with dash
b585126 [Stevo Slavić] Replaced colon in file name with dash
028e48a [Stevo Slavić] Replaced colon in file name with dash
ece0507 [Stevo Slavić] Replaced colon in file name with dash
84f5d2f [Stevo Slavić] Replaced colon in file name with dash
2fc7854 [Stevo Slavić] Replaced colon in file name with dash
9e1467d [Stevo Slavić] Replaced colon in file name with dash
(cherry picked from commit e66e31be51f396c8f6b7a45119b8b31c4d8cdf79)
Signed-off-by: Reynold Xin <[email protected]>
commit ffa9c49d44ec62762736427be8c37e59d72a5c6b
Author: Michael Armbrust <[email protected]>
Date: 2014-05-15T23:50:42Z
[SQL] Fix tiny/small ints from HiveMetastore.
Author: Michael Armbrust <[email protected]>
Closes #797 from marmbrus/smallInt and squashes the following commits:
2db9dae [Michael Armbrust] Fix tiny/small ints from HiveMetastore.
(cherry picked from commit a4aafe5f9fb191533400caeafddf04986492c95f)
Signed-off-by: Reynold Xin <[email protected]>
commit 2e418f517e29aa7447d67c495af6198d9e163f53
Author: Prashant Sharma <[email protected]>
Date: 2014-05-15T23:58:37Z
Fixes a misplaced comment.
Fixes a misplaced comment from #785.
@pwendell
Author: Prashant Sharma <[email protected]>
Closes #788 from ScrapCodes/patch-1 and squashes the following commits:
3ef6a69 [Prashant Sharma] Update package-info.java
67d9461 [Prashant Sharma] Update package-info.java
(cherry picked from commit e1e3416c4e5f6f32983597d74866dbb809cf6a5e)
Signed-off-by: Reynold Xin <[email protected]>
commit a2742d8506463dd0c7bbab06abcd68a0ae44c8e5
Author: Huajian Mao <[email protected]>
Date: 2014-05-16T01:20:16Z
Typos in Spark
Author: Huajian Mao <[email protected]>
Closes #798 from huajianmao/patch-1 and squashes the following commits:
208a454 [Huajian Mao] A typo in Task
1b515af [Huajian Mao] A typo in the message
(cherry picked from commit 94c5139607ec876782e594012a108ebf55fa97db)
Signed-off-by: Reynold Xin <[email protected]>
commit 54414716ba9d3f02cfcaccf292d6254783617f78
Author: Aaron Davidson <[email protected]>
Date: 2014-05-16T04:37:58Z
SPARK-1860: Do not cleanup application work/ directories by default
This causes an unrecoverable error for applications that are running for
longer than 7 days that have jars added to the SparkContext, as the jars are
cleaned up even though the application is still running.
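The failure mode can be sketched as follows, assuming the cleanup decides purely by directory age. This is a simplified Python model, not the worker's actual code; `dirs_to_delete`, its parameters, and the directory names are hypothetical.

```python
SEVEN_DAYS = 7 * 24 * 60 * 60  # TTL in seconds, matching the 7 days mentioned above


def dirs_to_delete(last_modified, now, ttl=SEVEN_DAYS, cleanup_enabled=False):
    """Return the work directories an age-based sweep would remove.

    `last_modified` maps a directory name to its last-modified time in seconds.
    Nothing here checks whether the owning application is still running,
    which is the hazard described above.
    """
    if not cleanup_enabled:
        return []
    return sorted(name for name, mtime in last_modified.items() if now - mtime > ttl)


# An application started 8 days ago whose jars were written at start-up,
# next to one whose directory was touched 12 hours ago.
now = 8 * 24 * 60 * 60
work_dirs = {"app-long-running": 0, "app-recent": now - 12 * 60 * 60}
swept = dirs_to_delete(work_dirs, now, cleanup_enabled=True)
kept_by_default = dirs_to_delete(work_dirs, now)
```

With cleanup enabled, the long-running application's directory is selected for deletion even though the application is still alive; with the default of `cleanup_enabled=False`, nothing is removed.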
Author: Aaron Davidson <[email protected]>
Closes #800 from aarondav/shitty-defaults and squashes the following
commits:
a573fbb [Aaron Davidson] SPARK-1860: Do not cleanup application work/
directories by default
(cherry picked from commit bb98ecafce196ecc5bc3a1e4cc9264df7b752c6a)
Signed-off-by: Patrick Wendell <[email protected]>
commit eac4ee89021b3929d129c94a3116040e9281a636
Author: Cheng Hao <[email protected]>
Date: 2014-05-16T05:12:34Z
[Spark-1461] Deferred Expression Evaluation (short-circuit evaluation)
This patch unifies the foldable and nullable interfaces for Expression.
1) Non-deterministic UDFs (like Rand()) cannot be folded.
2) Short-circuiting significantly improves the performance of expression
evaluation; however, a stateful UDF must not be skipped by short-circuit
evaluation (e.g. in the expression col1 > 0 and row_sequence() < 1000,
row_sequence() cannot be skipped even if col1 > 0 is false).
I borrowed the concept of DeferredObject from Hive, which has two kinds of
child classes (EagerResult / DeferredResult): the former requires the
evaluation to be triggered before it is created, while the latter triggers the
evaluation when its get() method is first called.
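A minimal Python sketch of the eager/deferred distinction follows. The class names and `get()` mirror the description above, but the code is an illustration rather than the patch's Scala implementation; `and_short_circuit`, `and_stateful`, and `run` are hypothetical helpers.

```python
class EagerResult:
    """Holds a value that was computed before the wrapper was created."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class DeferredResult:
    """Wraps a zero-argument callable and evaluates it on the first get()."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._evaluated = False
        self._value = None

    def get(self):
        if not self._evaluated:
            self._value = self._thunk()
            self._evaluated = True
        return self._value


def and_short_circuit(left, right):
    # Pure operands: the right side is never evaluated when the left is false.
    return left.get() and right.get()


def and_stateful(left, right):
    # Stateful right operand: force it first so its side effect runs per row.
    right_value = right.get()
    return left.get() and right_value


def run(col1_values, combine):
    """Evaluate `col1 > 0 and row_sequence() < 1000` once per row."""
    counter = {"n": 0}

    def row_sequence():
        counter["n"] += 1
        return counter["n"]

    results = [
        combine(EagerResult(col1 > 0), DeferredResult(lambda: row_sequence() < 1000))
        for col1 in col1_values
    ]
    return results, counter["n"]
```

Running both combinators over the same rows gives the same boolean results, but `row_sequence()` is invoked only for rows where `col1 > 0` under `and_short_circuit`, and for every row under `and_stateful`.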
Author: Cheng Hao <[email protected]>
Closes #446 from chenghao-intel/expression_deferred_evaluation and squashes
the following commits:
d2729de [Cheng Hao] Fix the codestyle issues
a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation
af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF
b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation
(cherry picked from commit a20fea98811d98958567780815fcf0d4fb4e28d4)
Signed-off-by: Reynold Xin <[email protected]>
commit eec4dd884264eda0130fcc2922f6b281b273a95b
Author: Patrick Wendell <[email protected]>
Date: 2014-05-16T06:31:43Z
SPARK-1862: Support for MapR in the Maven build.
Author: Patrick Wendell <[email protected]>
Closes #803 from pwendell/mapr-support and squashes the following commits:
8df60e4 [Patrick Wendell] SPARK-1862: Support for MapR in the Maven build.
(cherry picked from commit 17702e280c4b0b030870962fcb3d50c3085ae862)
Signed-off-by: Patrick Wendell <[email protected]>
commit a16f46f0d004df748a695cc8f2c01f5950b928cf
Author: Patrick Wendell <[email protected]>
Date: 2014-05-16T07:09:43Z
Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.
commit 610615b9d3d0340e88d339a03bf6e8829246bcae
Author: Patrick Wendell <[email protected]>
Date: 2014-05-16T07:09:48Z
Revert "[maven-release-plugin] prepare release v1.0.0-rc7"
This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc
Author: Patrick Wendell <[email protected]>
Date: 2014-05-16T08:18:53Z
[maven-release-plugin] prepare release v1.0.0-rc8
commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3
Author: Patrick Wendell <[email protected]>
Date: 2014-05-16T08:19:00Z
[maven-release-plugin] prepare for next development iteration
commit ff47cdc0cefed7c40da0f4be39770adfa7b4371f
Author: Zhen Peng <[email protected]>
Date: 2014-05-16T18:37:18Z
bugfix: overflow of graphx Edge compare function
Author: Zhen Peng <[email protected]>
Closes #769 from zhpengg/bugfix-graphx-edge-compare and squashes the
following commits:
8a978ff [Zhen Peng] add ut for graphx Edge.lexicographicOrdering.compare
413c258 [Zhen Peng] there may be an overflow when subtracting two Longs
(cherry picked from commit fa6de408a131a3e84350a60af74a92c323dfc5eb)
Signed-off-by: Reynold Xin <[email protected]>
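The overflow mentioned in this commit is the general hazard of comparing two 64-bit values by subtracting them. The Python sketch below is an illustration, not the GraphX code: Python integers never overflow, so `wrap_int64` emulates two's-complement wraparound, and all function names are hypothetical.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def wrap_int64(n):
    """Emulate 64-bit two's-complement wraparound of an arbitrary integer."""
    return (n + 2**63) % 2**64 - 2**63


def compare_by_subtraction(a, b):
    # Mimics a comparator that returns the 64-bit difference a - b.
    # The sign is wrong whenever the true difference does not fit in 64 bits.
    return wrap_int64(a - b)


def compare_safely(a, b):
    # Comparison-based: returns -1, 0, or 1 without any arithmetic on a and b.
    return (a > b) - (a < b)


# INT64_MAX is clearly greater than -1, yet the wrapped difference is negative.
broken = compare_by_subtraction(INT64_MAX, -1)
fixed = compare_safely(INT64_MAX, -1)
```

For small values the two comparators agree in sign; they diverge only near the extremes of the 64-bit range, which is why this kind of bug tends to go unnoticed.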
commit 386b31cbc5dd9ef1e9d989a3c6a3ac587c3684c1
Author: Michael Armbrust <[email protected]>
Date: 2014-05-16T18:47:00Z
[SQL] Implement between in hql
Author: Michael Armbrust <[email protected]>
Closes #804 from marmbrus/between and squashes the following commits:
ae24672 [Michael Armbrust] add golden answer.
d9997ef [Michael Armbrust] Implement between in hql.
9bd4433 [Michael Armbrust] Better error on parse failures.
(cherry picked from commit 032d6632ad4ab88c97c9e568b63169a114220a02)
Signed-off-by: Reynold Xin <[email protected]>
commit 2ba6711efb652da65999b2e5ba6eadd3e522aac7
Author: Matei Zaharia <[email protected]>
Date: 2014-05-17T00:35:05Z
Tweaks to Mesos docs
- Mention Apache downloads first
- Shorten some wording
Author: Matei Zaharia <[email protected]>
Closes #806 from mateiz/doc-update and squashes the following commits:
d9345cd [Matei Zaharia] typo
a179f8d [Matei Zaharia] Tweaks to Mesos docs
(cherry picked from commit fed6303f29250bd5e656dbdd731b38938c933a61)
Signed-off-by: Matei Zaharia <[email protected]>
commit a16a19fbd382e1d39cdf403246ad215666f1f402
Author: Michael Armbrust <[email protected]>
Date: 2014-05-17T03:25:10Z
SPARK-1864 Look in spark conf instead of system properties when propagating
configuration to executors.
Author: Michael Armbrust <[email protected]>
Closes #808 from marmbrus/confClasspath and squashes the following commits:
4c31d57 [Michael Armbrust] Look in spark conf instead of system properties
when propagating configuration to executors.
(cherry picked from commit a80a6a139e729ee3f81ec4f0028e084d2d9f7e82)
Signed-off-by: Patrick Wendell <[email protected]>
commit 9cd12f33df6e56d34ff3019c714bddfe298fe5c7
Author: Patrick Wendell <[email protected]>
Date: 2014-05-17T04:42:14Z
Version bump of spark-ec2 scripts
This will allow us to change things in spark-ec2 related to the 1.0 release.
Author: Patrick Wendell <[email protected]>
Closes #809 from pwendell/spark-ec2 and squashes the following commits:
59117fb [Patrick Wendell] Version bump of spark-ec2 scripts
(cherry picked from commit c0ab85d7320cea90e6331fb03a70349bc804c1b1)
Signed-off-by: Patrick Wendell <[email protected]>
commit 318739a0794c9d2994901a5d3b16c4c133d293c6
Author: Andrew Or <[email protected]>
Date: 2014-05-17T05:34:38Z
[SPARK-1808] Route bin/pyspark through Spark submit
**Problem.** For `bin/pyspark`, the only way to specify Spark configuration
properties is currently through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`.
However, this mechanism is supposedly deprecated. Instead, `bin/pyspark` needs
to pick up configurations explicitly specified in
`conf/spark-defaults.conf`.
**Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its
counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This
has the additional benefit of making the invocation of all the user facing
Spark scripts consistent.
**Details.** `bin/pyspark` inherently handles two cases: (1) running Python
applications and (2) running the Python shell. For (1), Spark submit already
handles running Python applications. For cases in which `bin/pyspark` is given
a Python file, we can simply pass the file directly to Spark submit and
let it handle the rest.
For case (2), `bin/pyspark` starts a python process as before, which
launches the JVM as a sub-process. The existing code already provides a code
path to do this. All we needed to change is to use `bin/spark-submit` instead
of `spark-class` to launch the JVM. This requires modifications to Spark submit
to handle the pyspark shell as a special case.
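For case (2), the launch step can be sketched roughly as below. This is a hedged illustration rather than the code in the patch: `build_submit_command` and its parameters are hypothetical, and only the `PYSPARK_SUBMIT_ARGS` variable and the use of `shlex` come from the commit list that follows.

```python
import os
import shlex


def build_submit_command(spark_home, environ=None):
    """Assemble the argv used to launch the JVM through bin/spark-submit."""
    environ = os.environ if environ is None else environ

    # Default to an empty string: passing None to shlex.split() makes older
    # Python versions read from standard input, which looks like a hang.
    submit_args = environ.get("PYSPARK_SUBMIT_ARGS", "")

    script = os.path.join(spark_home, "bin", "spark-submit")
    # shlex.split honours shell-style quoting, so quoted values stay intact.
    return [script] + shlex.split(submit_args)


command = build_submit_command(
    "/opt/spark",
    {"PYSPARK_SUBMIT_ARGS": '--master local[4] --name "my app"'},
)
```

The resulting list can be handed to `subprocess.Popen` without going through a shell, so an application name containing spaces remains a single argument.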
This has been tested locally (OSX and Windows 7), on a standalone cluster,
and on a YARN cluster. Running IPython also works as before, except now it
takes in Spark submit arguments too.
Author: Andrew Or <[email protected]>
Closes #799 from andrewor14/pyspark-submit and squashes the following
commits:
bf37e36 [Andrew Or] Minor changes
01066fa [Andrew Or] bin/pyspark for Windows
c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
1866f85 [Andrew Or] Windows is not cooperating
456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is
not set
7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
pyspark-submit
b7ba0d8 [Andrew Or] Address a few comments (minor)
06eb138 [Andrew Or] Use shlex instead of writing our own parser
05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into
pyspark-submit
a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
6fba412 [Andrew Or] Deal with quotes + address various comments
fe4c8a7 [Andrew Or] Update --help for bin/pyspark
afe47bf [Andrew Or] Fix spark shell
f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into
pyspark-submit
a371d26 [Andrew Or] Route bin/pyspark through Spark submit
(cherry picked from commit 4b8ec6fcfd7a7ef0857d5b21917183c181301c95)
Signed-off-by: Patrick Wendell <[email protected]>
commit 03b4242630600f010bf9ddada0e6008ba9141d6b
Author: Andrew Or <[email protected]>
Date: 2014-05-17T05:36:23Z
[SPARK-1824] Remove <master> from Python examples
A recent PR (#552) fixed this for all Scala / Java examples. We need to do
it for python too.
Note that this blocks on #799, which makes `bin/pyspark` go through Spark
submit. With only the changes in this PR, the only way to run these examples is
through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them
too. For example,
```
bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
```
Author: Andrew Or <[email protected]>
Closes #802 from andrewor14/python-examples and squashes the following
commits:
cf50b9f [Andrew Or] De-indent python comments (minor)
50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
c362f69 [Andrew Or] Update docs to use spark-submit for python applications
7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into
python-examples
427a5f0 [Andrew Or] Update docs
d32072c [Andrew Or] Remove <master> from examples + update usages
(cherry picked from commit cf6cbe9f76c3b322a968c836d039fc5b70d4ce43)
Signed-off-by: Patrick Wendell <[email protected]>
commit 3b3d7c8ec4d2ddf632d9fd46a45c87586a8db174
Author: Patrick Wendell <[email protected]>
Date: 2014-05-17T05:58:47Z
Make deprecation warning less severe
Just a small change. I think it's good not to scare people who are using
the old options.
Author: Patrick Wendell <[email protected]>
Closes #810 from pwendell/warnings and squashes the following commits:
cb8a311 [Patrick Wendell] Make deprecation warning less severe
(cherry picked from commit 442808a7482b81c8de887c901b424683da62022e)
Signed-off-by: Patrick Wendell <[email protected]>
commit e98bc194bd694e81d7403d011bcbe2b623cb30e4
Author: Patrick Wendell <[email protected]>
Date: 2014-05-17T06:10:46Z
Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
commit 80889110aad54866f113b18f206694148f715a05
Author: Patrick Wendell <[email protected]>
Date: 2014-05-17T06:10:53Z
Revert "[maven-release-plugin] prepare release v1.0.0-rc8"
This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
commit 920f947eb5a22a679c0c3186cf69ee75f6041c75
Author: Patrick Wendell <[email protected]>
Date: 2014-05-17T06:37:50Z
[maven-release-plugin] prepare release v1.0.0-rc9
commit f8e611955096c5c1c7db5764b9d2851b1d295f0d
Author: Patrick Wendell <[email protected]>
Date: 2014-05-17T06:37:58Z
[maven-release-plugin] prepare for next development iteration
commit e06e4b0affc00bc15498313a36edbc9b7e2aaae2
Author: Neville Li <[email protected]>
Date: 2014-05-18T20:31:23Z
Fix spark-submit path in spark-shell & pyspark
Author: Neville Li <[email protected]>
Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits:
0dc33ed [Neville Li] Fix spark-submit path in pyspark
becec64 [Neville Li] Fix spark-submit path in spark-shell
----