Repository: spark
Updated Branches:
refs/heads/branch-0.9 c37db1537 - 7e4a0e1a0
[SPARK-2547]: The clustering documentation example provided for Spark 0.9
I fixed a trivial mistake in the MLlib documentation.
I checked that the Python sample code for k-means clustering can correctly
Repository: spark
Updated Branches:
refs/heads/branch-1.0 1a0a2f81a - 2693035ba
[SPARK-2580] [PySpark] keep silent in worker if JVM close the socket
During rdd.take(n), the JVM will close the socket once it has received enough data, and the
Python worker should keep silent in this case.
At the same time,
Repository: spark
Updated Branches:
refs/heads/branch-1.0 2693035ba - e0bc72eb7
[SPARK-791] [PySpark] fix pickle itemgetter with cloudpickle
Fix the problem with pickling operator.itemgetter with multiple indices.
Author: Davies Liu davies@gmail.com
Closes #1627 from davies/itemgetter and
[SPARK-2024] Add saveAsSequenceFile to PySpark
JIRA issue: https://issues.apache.org/jira/browse/SPARK-2024
This PR is a followup to #455 and adds capabilities for saving PySpark RDDs
using SequenceFile or any Hadoop OutputFormats.
* Added RDD methods ```saveAsSequenceFile```,
Repository: spark
Updated Branches:
refs/heads/master 437dc8c5b - 94d1f46fc
http://git-wip-us.apache.org/repos/asf/spark/blob/94d1f46f/python/pyspark/tests.py
--
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
with an
incorrect ClassTag by wrapping it and overriding its ClassTag. This should be
okay for cases where the Scala code that calls collect() knows what type of
array should be allocated, which is the case in the MLlib wrappers.
Author: Josh Rosen joshro...@apache.org
Closes #1639 from JoshRosen/SPARK
merged.
Both of these fixes are useful when backporting changes.
Author: Josh Rosen joshro...@apache.org
Closes #1668 from JoshRosen/pr-script-improvements and squashes the following
commits:
ff4f33a [Josh Rosen] Default SPARK_HOME to cwd(); detect missing JIRA
credentials.
ed5bc57 [Josh Rosen
Repository: spark
Updated Branches:
refs/heads/master e02136214 - cc820502f
Docs: monitoring, streaming programming guide
Fix several awkward wordings and grammatical issues in the following
documents:
* docs/monitoring.md
* docs/streaming-programming-guide.md
Author: kballou
Repository: spark
Updated Branches:
refs/heads/master e139e2be6 - 55349f9fe
[SPARK-1740] [PySpark] kill the python worker
Kill only the Python worker related to the cancelled tasks.
The daemon will start a background thread to monitor all the opened sockets for
all workers. If the socket is
Repository: spark
Updated Branches:
refs/heads/master e053c5581 - 59f84a953
[SPARK-1687] [PySpark] picklable namedtuple
Add a hook to replace the original namedtuple with a picklable one, so that
namedtuple can be used in RDDs.
PS: pyspark should be imported BEFORE `from collections import namedtuple`
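A minimal sketch of the import order described above (the app name is illustrative):
```python
import pyspark                       # import pyspark first so its namedtuple hook is installed
from collections import namedtuple   # namedtuple instances are now picklable

from pyspark import SparkContext

Point = namedtuple("Point", ["x", "y"])

sc = SparkContext("local", "namedtuple-example")   # illustrative app name
rdd = sc.parallelize([Point(1, 2), Point(3, 4)])
print(rdd.map(lambda p: p.x + p.y).collect())      # [3, 7]
sc.stop()
```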
Repository: spark
Updated Branches:
refs/heads/branch-1.1 3823f6d25 - bfd2f3958
[SPARK-1687] [PySpark] picklable namedtuple
Add a hook to replace the original namedtuple with a picklable one, so that
namedtuple can be used in RDDs.
PS: pyspark should be imported BEFORE `from collections import namedtuple`
Repository: spark
Updated Branches:
refs/heads/branch-1.1 aa7a48ee9 - 2225d18a7
[SPARK-1687] [PySpark] fix unit tests related to picklable namedtuple
The serializer module is imported multiple times during doctests, so it's better to make
_hijack_namedtuple() safe to call multiple times.
Author:
Repository: spark
Updated Branches:
refs/heads/master 8e7d5ba1a - 9fd82dbbc
[SPARK-1687] [PySpark] fix unit tests related to picklable namedtuple
The serializer module is imported multiple times during doctests, so it's better to make
_hijack_namedtuple() safe to call multiple times.
Author:
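The real function lives in PySpark's serializer module; the sketch below only illustrates the guard pattern that makes such a hijack safe to call repeatedly (all names are illustrative, not the actual implementation):
```python
import collections

_original_namedtuple = None  # None means the hook has not been installed yet

def _hijack_namedtuple():
    """Replace collections.namedtuple with a wrapper; safe to call multiple times."""
    global _original_namedtuple
    if _original_namedtuple is not None:
        return  # already hijacked; a second call is a no-op
    _original_namedtuple = collections.namedtuple

    def hijacked(*args, **kwargs):
        cls = _original_namedtuple(*args, **kwargs)
        # ... the real implementation would register pickling support for cls here ...
        return cls

    collections.namedtuple = hijacked
```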
Repository: spark
Updated Branches:
refs/heads/master 1d03a26a4 - 28dcbb531
[SPARK-2898] [PySpark] fix bugs in daemon.py
1. Do not use a signal handler for SIGCHLD; it can easily cause deadlocks.
2. Handle EINTR during accept().
3. Pass errno into the JVM.
4. Handle EAGAIN during fork().
Now, it can
Repository: spark
Updated Branches:
refs/heads/branch-1.1 bb23b118e - 92daffed4
[SPARK-2898] [PySpark] fix bugs in daemon.py
1. Do not use a signal handler for SIGCHLD; it can easily cause deadlocks.
2. Handle EINTR during accept().
3. Pass errno into the JVM.
4. Handle EAGAIN during fork().
Now, it
TestOutputFormat.test_newhadoop on Python 2.6 until SPARK-2951 is fixed.
- Fix MLlib _deserialize_double on Python 2.6.
Closes #1868. Closes #1042.
Author: Josh Rosen joshro...@apache.org
Closes #1874 from JoshRosen/python2.6 and squashes the following commits:
983d259 [Josh Rosen] [SPARK-2954] Fix
is to reset currentLocalityIndex
after recomputing the locality levels.
Thanks to kayousterhout, mridulm, and lirui-intel for helping me to debug this.
Author: Josh Rosen joshro...@apache.org
Closes #1896 from JoshRosen/SPARK-2931 and squashes the following commits:
48b60b5 [Josh Rosen] Move
here is to reset currentLocalityIndex
after recomputing the locality levels.
Thanks to kayousterhout, mridulm, and lirui-intel for helping me to debug this.
Author: Josh Rosen joshro...@apache.org
Closes #1896 from JoshRosen/SPARK-2931 and squashes the following commits:
48b60b5 [Josh Rosen
an opportunity
to clean this up later if we sever the circular dependencies between
BlockManager and other components and pass those components to BlockManager's
constructor.
Author: Josh Rosen joshro...@apache.org
Closes #1976 from JoshRosen/SPARK-2977 and squashes the following commits:
a9cd1e1 [Josh
to clean this up later if we sever the circular dependencies between
BlockManager and other components and pass those components to BlockManager's
constructor.
Author: Josh Rosen joshro...@apache.org
Closes #1976 from JoshRosen/SPARK-2977 and squashes the following commits:
a9cd1e1 [Josh Rosen
Repository: spark
Updated Branches:
refs/heads/branch-1.1 8c7957446 - bd3ce2ffb
[SPARK-2677] BasicBlockFetchIterator#next can wait forever
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Closes #1632 from sarutak/SPARK-2677 and squashes the following commits:
cddbc7b [Kousuke Saruta]
Repository: spark
Updated Branches:
refs/heads/branch-1.1 a12d3ae32 - 721f2fdc9
[SPARK-3035] Wrong example with SparkContext.addFile
https://issues.apache.org/jira/browse/SPARK-3035
Fix for the wrong example in the documentation.
Author: iAmGhost kdh7...@gmail.com
Closes #1942 from iAmGhost/master and squashes
Repository: spark
Updated Branches:
refs/heads/master ac6411c6e - 379e7585c
[SPARK-3035] Wrong example with SparkContext.addFile
https://issues.apache.org/jira/browse/SPARK-3035
Fix for the wrong example in the documentation.
Author: iAmGhost kdh7...@gmail.com
Closes #1942 from iAmGhost/master and squashes the
Repository: spark
Updated Branches:
refs/heads/branch-1.1 721f2fdc9 - 5dd571c29
[SPARK-1065] [PySpark] improve support for large broadcasts
Passing large objects through Py4J is very slow (and costs a lot of memory), so pass broadcast
objects via files (similar to parallelize()).
Add an option to keep
Repository: spark
Updated Branches:
refs/heads/master 2fc8aca08 - bc95fe08d
Cancel the ackTimeoutMonitor in the stop method of ConnectionManager
cc JoshRosen sarutak
Author: GuoQiang Li wi...@qq.com
Closes #1989 from witgo/cancel_ackTimeoutMonitor and squashes the following
commits
Repository: spark
Updated Branches:
refs/heads/branch-1.1 5dd571c29 - f02e327f0
Cancel the ackTimeoutMonitor in the stop method of ConnectionManager
cc JoshRosen sarutak
Author: GuoQiang Li wi...@qq.com
Closes #1989 from witgo/cancel_ackTimeoutMonitor and squashes the following
commits
Repository: spark
Updated Branches:
refs/heads/master 3a5962f0f - d1d0ee41c
[SPARK-3103] [PySpark] fix saveAsTextFile() with UTF-8
Bugfix: it raised an exception when trying to encode non-ASCII byte strings into
unicode. It should only encode unicode strings as UTF-8.
Author: Davies Liu
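A sketch of the encoding rule the fix enforces (not the actual patch; a Python 2/3 shim is included only for illustration):
```python
import sys

# On Python 2 "text" means unicode; on Python 3 it means str.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

def to_utf8(x):
    # Only text (unicode) strings need encoding; byte strings are written as-is,
    # avoiding the implicit ASCII decode that raised on non-ASCII bytes.
    if isinstance(x, text_type):
        return x.encode("utf-8")
    return x
```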
Repository: spark
Updated Branches:
refs/heads/branch-1.1 cc4015d2f - e08333463
[SPARK-3103] [PySpark] fix saveAsTextFile() with UTF-8
Bugfix: it raised an exception when trying to encode non-ASCII byte strings into
unicode. It should only encode unicode strings as UTF-8.
Author: Davies Liu
tests,
irrespective of whether SparkSQL itself has been modified. It also includes
Davies' fix for the bug.
Closes #2026.
Author: Josh Rosen joshro...@apache.org
Author: Davies Liu davies@gmail.com
Closes #2027 from JoshRosen/pyspark-sql-fix and squashes the following commits:
9af2708
Repository: spark
Updated Branches:
refs/heads/master 7eb9cbc27 - cbfc26ba4
[SPARK-3089] Fix meaningless error message in ConnectionManager
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Closes #2000 from sarutak/SPARK-3089 and squashes the following commits:
02dfdea [Kousuke Saruta]
Repository: spark
Updated Branches:
refs/heads/master 31f0b071e - 94053a7b7
SPARK-2333 - spark_ec2 script should allow option for existing security group
- Uses the name tag to identify machines in a cluster.
- Allows overriding the security group name so it doesn't need to coincide
Repository: spark
Updated Branches:
refs/heads/branch-1.1 04a320862 - c3952b092
SPARK-2333 - spark_ec2 script should allow option for existing security group
- Uses the name tag to identify machines in a cluster.
- Allows overriding the security group name so it doesn't need to
Repository: spark
Updated Branches:
refs/heads/branch-1.1 c3952b092 - f6b4ab83c
Move a bracket in validateSettings of SparkConf
Move a bracket in validateSettings of SparkConf
Author: hzw19900416 carlmartin...@gmail.com
Closes #2012 from hzw19900416/codereading and squashes the following
Repository: spark
Updated Branches:
refs/heads/master 94053a7b7 - 76eaeb452
Move a bracket in validateSettings of SparkConf
Move a bracket in validateSettings of SparkConf
Author: hzw19900416 carlmartin...@gmail.com
Closes #2012 from hzw19900416/codereading and squashes the following
Repository: spark
Updated Branches:
refs/heads/master 76eaeb452 - d7e80c259
[SPARK-2790] [PySpark] fix zip with serializers which have different batch sizes.
If two RDDs have different batch sizes in their serializers, zip() will try to
re-serialize the one with the smaller batch size, then call
Repository: spark
Updated Branches:
refs/heads/master f3d65cd0b - 76bb044b9
[Minor] fix typo
Fix a typo in comment.
Author: Liang-Chi Hsieh vii...@gmail.com
Closes #2105 from viirya/fix_typo and squashes the following commits:
6596a80 [Liang-Chi Hsieh] fix typo.
Project:
Repository: spark
Updated Branches:
refs/heads/master db436e36c - 8df4dad49
[SPARK-2871] [PySpark] add approx API for RDD
RDD.countApprox(self, timeout, confidence=0.95)
:: Experimental ::
Approximate version of count() that returns a potentially incomplete
result
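A brief usage sketch (assuming an existing SparkContext `sc`; the timeout is in milliseconds and the result may be partial):
```python
rdd = sc.parallelize(range(1000), 10)
# Wait at most 1000 ms; may return a partial count at the requested confidence.
approx = rdd.countApprox(1000, confidence=0.95)
print(approx)
```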
Repository: spark
Updated Branches:
refs/heads/master cc40a709c - fd8ace2d9
[FIX] fix error message in sendMessageReliably
rxin
Author: Xiangrui Meng m...@databricks.com
Closes #2120 from mengxr/sendMessageReliably and squashes the following commits:
b14400c [Xiangrui Meng] fix error
Repository: spark
Updated Branches:
refs/heads/master 8856c3d86 - 3cedc4f4d
[SPARK-2871] [PySpark] add histogram() API
RDD.histogram(buckets)
Compute a histogram using the provided buckets. The buckets
are all open to the right except for the last which is closed.
e.g.
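A short usage sketch (assuming an existing SparkContext `sc`; the counts in the comments are what I'd expect, not quoted from the commit):
```python
rdd = sc.parallelize(range(51))
# Explicit buckets: [0, 25) and [25, 50]; the last bucket is closed on the right.
print(rdd.histogram([0, 25, 50]))   # expected: ([0, 25, 50], [25, 26])
# An integer asks for that many evenly spaced buckets.
print(rdd.histogram(2))             # expected: ([0, 25, 50], [25, 26])
```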
Repository: spark
Updated Branches:
refs/heads/branch-1.1 3a9d874d7 - 83d273023
[SPARK-2871] [PySpark] add histogram() API
RDD.histogram(buckets)
Compute a histogram using the provided buckets. The buckets
are all open to the right except for the last which is closed.
Repository: spark
Updated Branches:
refs/heads/master be043e3f2 - d8345471c
Fix unclosed HTML tag in Yarn docs.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8345471
Tree:
Repository: spark
Updated Branches:
refs/heads/master 48f42781d - 4fa2fda88
[SPARK-2871] [PySpark] add RDD.lookup(key)
RDD.lookup(key)
Return the list of values in the RDD for key `key`. This operation
is done efficiently if the RDD has a known partitioner by only
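Usage sketch (assuming an existing SparkContext `sc`); with a known partitioner only the partition that the key maps to is scanned:
```python
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)]).partitionBy(4)
print(pairs.lookup("a"))   # [1, 3]
print(pairs.lookup("z"))   # []
```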
Repository: spark
Updated Branches:
refs/heads/master 4fa2fda88 - 7faf755ae
Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More Like This"
... copy the spark_cluster_tag from spot instance requests over to the
instances.
Author: Vida Ha v...@databricks.com
Author: Vida Ha v...@databricks.com
Repository: spark
Updated Branches:
refs/heads/branch-1.0 edea1efe0 - 31de05b08
[SPARK-3150] Fix NullPointerException in Spark recovery: initialize
default values in DriverInfo.init()
The issue happens when Spark is run standalone on a cluster.
When master and driver fall
Repository: spark
Updated Branches:
refs/heads/branch-1.1 f8f7a0c9d - fd98020a9
[SPARK-3150] Fix NullPointerException in Spark recovery: initialize
default values in DriverInfo.init()
The issue happens when Spark is run standalone on a cluster.
When master and driver fall
Repository: spark
Updated Branches:
refs/heads/master 39012452d - 96df92906
[SPARK-3190] Avoid overflow in VertexRDD.count()
VertexRDDs with more than 4 billion elements are counted incorrectly due to
integer overflow when summing partition sizes. This PR fixes the issue by
converting
Repository: spark
Updated Branches:
refs/heads/branch-1.0 31de05b08 - 5481196ab
[SPARK-3190] Avoid overflow in VertexRDD.count()
VertexRDDs with more than 4 billion elements are counted incorrectly due to
integer overflow when summing partition sizes. This PR fixes the issue by
converting
Repository: spark
Updated Branches:
refs/heads/master 665e71d14 - 27df6ce6a
[SPARK-3279] Remove useless field variable in ApplicationMaster
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Closes #2177 from sarutak/SPARK-3279 and squashes the following commits:
2955edc [Kousuke Saruta]
Repository: spark
Updated Branches:
refs/heads/master 27df6ce6a - e248328b3
[SPARK-3307] [PySpark] Fix doc string of SparkContext.broadcast()
remove invalid docs
Author: Davies Liu davies@gmail.com
Closes #2202 from davies/keep and squashes the following commits:
aa3b44f [Davies Liu]
Repository: spark
Updated Branches:
refs/heads/branch-1.1 c71b5c6db - 98d0716a1
[SPARK-3307] [PySpark] Fix doc string of SparkContext.broadcast()
remove invalid docs
Author: Davies Liu davies@gmail.com
Closes #2202 from davies/keep and squashes the following commits:
aa3b44f [Davies
Repository: spark
Updated Branches:
refs/heads/master 0f16b23cd - 32ec0a8cd
SPARK-3331 [BUILD] PEP8 tests fail because they check unzipped py4j code
PEP8 tests run on files under ./python, but unzipped py4j code is found at
./python/build/py4j. Py4J code fails style checks and can fail
instances and
logging warnings, or maybe using another mechanism to group instances into
clusters. For the 1.1.0 release, though, I propose that we just revert this
patch.
Author: Josh Rosen joshro...@apache.org
Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the
following
Repository: spark
Updated Branches:
refs/heads/master c5cbc4923 - 7c6e71f05
[SPARK-2435] Add shutdown hook to pyspark
Author: Matthew Farrellee m...@redhat.com
Closes #2183 from mattf/SPARK-2435 and squashes the following commits:
ee0ee99 [Matthew Farrellee] [SPARK-2435] Add shutdown hook
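My understanding is that this registers an exit hook so the SparkContext is stopped when the interpreter exits; a sketch of that pattern (not the exact patch, app name illustrative):
```python
import atexit
from pyspark import SparkContext

sc = SparkContext("local", "shutdown-hook-example")  # illustrative app name
# Stop the context automatically when the Python process exits.
atexit.register(lambda: sc.stop())
```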
Repository: spark
Updated Branches:
refs/heads/master 62c557609 - 7ff8c45d7
[SPARK-3399][PySpark] Test for PySpark should ignore HADOOP_CONF_DIR and
YARN_CONF_DIR
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Closes #2270 from sarutak/SPARK-3399 and squashes the following commits:
Repository: spark
Updated Branches:
refs/heads/master baff7e936 - da35330e8
Spark-3406 add a default storage level to python RDD persist API
Author: Holden Karau hol...@pigscanfly.ca
Closes #2280 from
holdenk/SPARK-3406-Python-RDD-persist-api-does-not-have-default-storage-level
and
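With this change persist() can be called without an explicit StorageLevel; a usage sketch (assuming an existing SparkContext `sc`; which level is the default is not quoted here):
```python
from pyspark import StorageLevel

rdd = sc.parallelize(range(1000))
rdd.persist()                        # uses the new default storage level

# Passing an explicit level still works as before.
rdd2 = sc.parallelize(range(1000)).persist(StorageLevel.MEMORY_ONLY_SER)
```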
Repository: spark
Updated Branches:
refs/heads/master 21a1e1bb8 - 110fb8b24
[SPARK-2334] fix AttributeError when calling PipelineRDD.id()
The underlying JavaRDD for PipelineRDD is created lazily; its creation is delayed until
_jrdd is accessed.
The id of the JavaRDD is cached as `_id`, which saves an RPC call through Py4J
Repository: spark
Updated Branches:
refs/heads/master e2614038e - ecfa76cdf
[SPARK-3415] [PySpark] removes SerializingAdapter code
This code removes the SerializingAdapter code that was copied from PiCloud
Author: Ward Viaene ward.via...@bigdatapartnership.com
Closes #2287 from
Repository: spark
Updated Branches:
refs/heads/master 16a73c247 - 386bc24eb
Provide a default PYSPARK_PYTHON for python/run_tests
Without this the version of python used in the test is not
recorded. The error is,
Testing with Python version:
./run-tests: line 57: --version: command not
Repository: spark
Updated Branches:
refs/heads/master f0c87dc86 - 26503fdf2
[HOTFIX] Fix scala style issue introduced by #2276.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/26503fdf
Tree:
Repository: spark
Updated Branches:
refs/heads/master ed1980ffa - 1ef656ea8
[SPARK-3047] [PySpark] add an option to use str in textFileRDD
str is much more efficient than unicode (both CPU and memory), so it's better to use
str in textFileRDD. To keep compatibility, unicode is used by default.
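A usage sketch of the option (the path is illustrative; use_unicode=True remains the default):
```python
# Return byte strings (str on Python 2), which is cheaper on CPU and memory.
fast_lines = sc.textFile("/tmp/input.txt", use_unicode=False)

# Default behaviour is unchanged: lines are decoded to unicode.
lines = sc.textFile("/tmp/input.txt")
```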
Repository: spark
Updated Branches:
refs/heads/master 71af030b4 - 885d1621b
[SPARK-3500] [SQL] use JavaSchemaRDD as SchemaRDD._jschema_rdd
Currently, SchemaRDD._jschema_rdd is a SchemaRDD, so the Scala API (coalesce(),
repartition()) cannot be called easily from Python; there is no way to
Repository: spark
Updated Branches:
refs/heads/branch-1.1 6cbf83c05 - 9c06c7230
[SPARK-3500] [SQL] use JavaSchemaRDD as SchemaRDD._jschema_rdd
Currently, SchemaRDD._jschema_rdd is a SchemaRDD, so the Scala API (coalesce(),
repartition()) cannot be called easily from Python; there is no way to
Repository: spark
Updated Branches:
refs/heads/master 0f8c4edf4 - 2aea0da84
[SPARK-3030] [PySpark] Reuse Python worker
Reuse Python workers to avoid the overhead of forking a Python process for each
task. It also tracks the broadcasts for each worker, avoiding sending repeated
broadcasts.
This
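If I recall correctly this behaviour is controlled by the spark.python.worker.reuse configuration key (an assumption, not quoted from the commit); a sketch of setting it explicitly:
```python
from pyspark import SparkConf, SparkContext

# "spark.python.worker.reuse" is assumed to be the config key for this feature.
conf = SparkConf().set("spark.python.worker.reuse", "true")
sc = SparkContext(conf=conf)
```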
Repository: spark
Updated Branches:
refs/heads/master 2aea0da84 - 4e3fbe8cd
[SPARK-3463] [PySpark] aggregate and show spilled bytes in Python
Aggregate the number of bytes spilled to disk during aggregation or sorting,
and show them in the Web UI.
Repository: spark
Updated Branches:
refs/heads/master da33acb8b - 60050f428
[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.
Also made some cosmetic cleanups.
Author: Aaron Staple aaron.sta...@gmail.com
Closes #2385 from staple/SPARK-1087 and squashes the
Repository: spark
Updated Branches:
refs/heads/master 008a5ed48 - 983609a4d
[Docs] Correct spark.files.fetchTimeout default value
change the value of spark.files.fetchTimeout
Author: viper-kun xukun...@huawei.com
Closes #2406 from viper-kun/master and squashes the following commits:
Repository: spark
Updated Branches:
refs/heads/master 9306297d1 - e77fa81a6
[SPARK-3554] [PySpark] use broadcast automatically for large closure
Py4J cannot handle large strings efficiently, so we should use broadcast for
large closures automatically. (Broadcast uses the local filesystem to pass
Repository: spark
Updated Branches:
refs/heads/master a48956f58 - be0c7563e
[SPARK-1701] Clarify slice vs partition in the programming guide
This is a partial solution to SPARK-1701, only addressing the
documentation confusion.
Additional work can be to actually change the numSlices
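For clarity, "slices" and "partitions" refer to the same thing; a small sketch (assuming an existing SparkContext `sc`):
```python
rdd = sc.parallelize(range(100), 4)   # numSlices=4, i.e. 4 partitions
print(rdd.getNumPartitions())         # 4
```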
Repository: spark
Updated Branches:
refs/heads/master be0c7563e - a03e5b81e
[SPARK-1701] [PySpark] remove slice terminology from python examples
Author: Matthew Farrellee m...@redhat.com
Closes #2304 from mattf/SPARK-1701-partition-over-slice-for-python-examples and
squashes the following
Repository: spark
Updated Branches:
refs/heads/master 78d4220fa - c32c8538e
Fix Java example in Streaming Programming Guide
`val conf` was used instead of `SparkConf conf` in a Java snippet.
Author: Santiago M. Mola sa...@mola.io
Closes #2472 from smola/patch-1 and squashes the following commits:
Repository: spark
Updated Branches:
refs/heads/master c32c8538e - 5f8833c67
[PySpark] remove unnecessary use of numSlices from pyspark tests
Author: Matthew Farrellee m...@redhat.com
Closes #2467 from mattf/master-pyspark-remove-numslices-from-tests and squashes
the following commits:
Repository: spark
Updated Branches:
refs/heads/master 50f863365 - c854b9fcb
[SPARK-3634] [PySpark] User's module should take precedence over system modules
Python modules added through addPyFile should take precedence over system
modules.
This patch puts the paths for user-added modules in the
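Usage sketch of addPyFile (the path, module name, and transform() function are illustrative); after this change the user's copy shadows any system-installed module of the same name on the workers:
```python
sc.addPyFile("/path/to/mymodule.py")   # illustrative path

def use_module(x):
    import mymodule                    # resolves to the user-supplied copy on workers
    return mymodule.transform(x)       # hypothetical function in that module

print(sc.parallelize(range(4)).map(use_module).collect())
```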
Repository: spark
Updated Branches:
refs/heads/master c3f2a8588 - 9b56e249e
[SPARK-3690] When closing shuffle writers we swallow a more important exception
Author: epahomov pahomov.e...@gmail.com
Closes #2537 from epahomov/SPARK-3690 and squashes the following commits:
a0b7de4 [epahomov]
Repository: spark
Updated Branches:
refs/heads/branch-1.1 a8c6e82de - 06b96d4a3
SPARK-3745 - fix check-license to properly download and check jar
for details, see: https://issues.apache.org/jira/browse/SPARK-3745
Author: shane knapp incompl...@gmail.com
Closes #2596 from
any more.
cc JoshRosen, sorry for these stupid bugs.
Author: Davies Liu davies@gmail.com
Closes #2603 from davies/fix_broadcast and squashes the following commits:
080a743 [Davies Liu] fix bugs in broadcast large closure of RDD
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Repository: spark
Updated Branches:
refs/heads/master abf588f47 - dcb2f73f1
SPARK-2626 [DOCS] Stop SparkContext in all examples
Call SparkContext.stop() in all examples (and touch up minor nearby code style
issues while at it)
Author: Sean Owen so...@cloudera.com
Closes #2575 from
Repository: spark
Updated Branches:
refs/heads/branch-1.1 24ee61625 - c52c231c7
SPARK-3638 | Forced a compatible version of http client in kinesis-asl profile
This patch forces use of commons http client 4.2 in Kinesis-asl profile so that
the AWS SDK does not run into dependency conflicts
Repository: spark
Updated Branches:
refs/heads/master 93861a5e8 - 29c351320
[SPARK-3446] Expose underlying job ids in FutureAction.
FutureAction is the only type exposed through the async APIs, so
for job IDs to be useful they need to be exposed there. The complication
is that some async jobs
Repository: spark
Updated Branches:
refs/heads/master 6e27cb630 - 5b4a5b1ac
[SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to 1 and
PYSPARK_PYTHON unset
### Problem
The section Using the shell in Spark Programming Guide
Repository: spark
Updated Branches:
refs/heads/master c9ae79fba - 20ea54cc7
[SPARK-2461] [PySpark] Add a toString method to GeneralizedLinearModel
Add a toString method to GeneralizedLinearModel, and also change `__str__` to
`__repr__` for some classes, to provide a better message in repr.
This
Repository: spark
Updated Branches:
refs/heads/master 20ea54cc7 - 4f01265f7
[SPARK-3786] [PySpark] speedup tests
This patch tries to speed up the PySpark tests by reusing the SparkContext in
tests.py and mllib/tests.py to reduce the overhead of creating a SparkContext,
and removes some test cases, which
Repository: spark
Updated Branches:
refs/heads/master 2300eb58a - 69c3f441a
[SPARK-3479] [Build] Report failed test category
This PR allows SparkQA (i.e. Jenkins) to report in its posts to GitHub what
category of test failed, if one can be determined.
The failure categories are:
* general
Repository: spark
Updated Branches:
refs/heads/master 70e824f75 - d65fd554b
[SPARK-3827] Very long RDD names are not rendered properly in web UI
With Spark SQL we generate very long RDD names. These names are not properly
rendered in the web UI.
This PR fixes the rendering issue.
Repository: spark
Updated Branches:
refs/heads/branch-1.1 964e3aa48 - 82ab4a796
[SPARK-3827] Very long RDD names are not rendered properly in web UI
With Spark SQL we generate very long RDD names. These names are not properly
rendered in the web UI.
This PR fixes the rendering issue.
Repository: spark
Updated Branches:
refs/heads/branch-1.1 267c7be3b - 553183024
[SPARK-3731] [PySpark] fix memory leak in PythonRDD
The parent.getOrCompute() of PythonRDD is executed in a separate thread; it
should finally release the memory reserved for shuffle and unrolling.
Author:
Repository: spark
Updated Branches:
refs/heads/master b32bb72e8 - 5912ca671
[SPARK-3398] [EC2] Have spark-ec2 intelligently wait for specific cluster states
Instead of waiting arbitrary amounts of time for the cluster to reach a
specific state, this patch lets `spark-ec2` explicitly wait for
Repository: spark
Updated Branches:
refs/heads/master b69c9fb6f - 798ed22c2
[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs
Retire Epydoc, use Sphinx to generate API docs.
Refine Sphinx docs, also convert some docstrings into Sphinx style.
It looks like:
![api
Repository: spark
Updated Branches:
refs/heads/master bcb1ae049 - f706823b7
Fetch from branch v4 in Spark EC2 script.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f706823b
Tree:
Repository: spark
Updated Branches:
refs/heads/master 1e0aa4deb - 73bf3f2e0
[SPARK-3741] Make ConnectionManager propagate errors properly and add more logs to avoid Executors swallowing errors
This PR made the following changes:
* Register a callback to `Connection` so that the error
Repository: spark
Updated Branches:
refs/heads/master 2c8851343 - e7edb723d
[SPARK-3868][PySpark] Hard to recognize which module is tested from
unit-tests.log
The ./python/run-tests script displays messages about which test it is currently
running on stdout, but does not write them to unit-tests.log.
(to avoid breaking
existing example programs).
There are more details in a block comment in `bin/pyspark`.
Author: Josh Rosen joshro...@apache.org
Closes #2651 from JoshRosen/SPARK-3772 and squashes the following commits:
7b8eb86 [Josh Rosen] More changes to PySpark python executable
Repository: spark
Updated Branches:
refs/heads/master 90f73fcc4 - 72f36ee57
[SPARK-3886] [PySpark] use AutoBatchedSerializer by default
Use AutoBatchedSerializer by default, which chooses the proper batch size based
on the size of serialized objects, letting the size of a serialized batch fall in
Repository: spark
Updated Branches:
refs/heads/master 0e8203f4f - 81015a2ba
[SPARK-3867][PySpark] ./python/run-tests failed when run with Python 2.6 and
unittest2 is not installed
./python/run-tests searches for a Python 2.6 executable on the PATH and uses it if
available.
When using Python 2.6, it
Repository: spark
Updated Branches:
refs/heads/master c86c97603 - fc616d51a
[SPARK-3121] Wrong implementation of implicit bytesWritableConverter
val path = ... //path to seq file with BytesWritable as type of both key and
value
val file = sc.sequenceFile[Array[Byte],Array[Byte]](path)
Repository: spark
Updated Branches:
refs/heads/branch-1.1 5a21e3e7e - 0e3257906
[SPARK-3121] Wrong implementation of implicit bytesWritableConverter
val path = ... //path to seq file with BytesWritable as type of both key and
value
val file = sc.sequenceFile[Array[Byte],Array[Byte]](path)
Repository: spark
Updated Branches:
refs/heads/branch-1.0 b539b0e98 - dc18167ee
[SPARK-3121] Wrong implementation of implicit bytesWritableConverter
val path = ... //path to seq file with BytesWritable as type of both key and
value
val file = sc.sequenceFile[Array[Byte],Array[Byte]](path)
Repository: spark
Updated Branches:
refs/heads/branch-1.1 0e3257906 - a36116c19
[SPARK-3905][Web UI] The keys for sorting the columns of the Executor page, Stage
page, and Storage page are incorrect
Author: GuoQiang Li wi...@qq.com
Closes #2763 from witgo/SPARK-3905 and squashes the following
Repository: spark
Updated Branches:
refs/heads/master b4a7fa7a6 - d8b8c2107
Add echo Run streaming tests ...
Author: Ken Takagiwa ugw.gi.wo...@gmail.com
Closes #2778 from giwa/patch-2 and squashes the following commits:
a59f9a1 [Ken Takagiwa] Add echo Run streaming tests ...
Project:
Repository: spark
Updated Branches:
refs/heads/master e7f4ea8a5 - 56fd34af5
[SPARK-3741] Add afterExecute for handleConnectExecutor
Sorry. I found that I forgot to add `afterExecute` for `handleConnectExecutor`
in #2593.
Author: zsxwing zsxw...@gmail.com
Closes #2794 from
1 - 100 of 928 matches