Repository: spark
Updated Branches:
refs/heads/master 3a5962f0f - d1d0ee41c
[SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8
Bugfix: an exception was raised when trying to encode non-ASCII byte strings into
unicode. Only unicode strings should be encoded as UTF-8.
Author: Davies Liu
Repository: spark
Updated Branches:
refs/heads/branch-1.1 cc4015d2f - e08333463
[SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8
Bugfix: an exception was raised when trying to encode non-ASCII byte strings into
unicode. Only unicode strings should be encoded as UTF-8.
Author: Davies Liu
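The essence of the fix can be sketched as follows (written in Python 3 terms, where `bytes` plays the role of Python 2's `str`; `to_utf8` is an illustrative helper name, not PySpark's actual function):

```python
def to_utf8(x):
    """Encode only text to UTF-8; pass byte strings through untouched.

    The bug: already-encoded non-ASCII byte strings were re-encoded, which
    in Python 2 triggered an implicit ASCII decode and raised an exception.
    """
    if isinstance(x, bytes):
        return x                  # already encoded bytes: leave as-is
    return x.encode("utf-8")      # unicode text: encode explicitly

print(to_utf8("caf\u00e9"))      # b'caf\xc3\xa9'
print(to_utf8(b"caf\xc3\xa9"))   # unchanged: b'caf\xc3\xa9'
```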
tests,
irrespective of whether SparkSQL itself has been modified. It also includes
Davies' fix for the bug.
Closes #2026.
Author: Josh Rosen joshro...@apache.org
Author: Davies Liu davies@gmail.com
Closes #2027 from JoshRosen/pyspark-sql-fix and squashes the following commits:
9af2708
an opportunity
to clean this up later if we sever the circular dependencies between
BlockManager and other components and pass those components to BlockManager's
constructor.
Author: Josh Rosen joshro...@apache.org
Closes #1976 from JoshRosen/SPARK-2977 and squashes the following commits:
a9cd1e1 [Josh
Repository: spark
Updated Branches:
refs/heads/branch-1.1 8c7957446 - bd3ce2ffb
[SPARK-2677] BasicBlockFetchIterator#next can wait forever
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Closes #1632 from sarutak/SPARK-2677 and squashes the following commits:
cddbc7b [Kousuke Saruta]
Repository: spark
Updated Branches:
refs/heads/branch-1.1 a12d3ae32 - 721f2fdc9
[SPARK-3035] Wrong example with SparkContext.addFile
https://issues.apache.org/jira/browse/SPARK-3035
Fix for incorrect documentation.
Author: iAmGhost kdh7...@gmail.com
Closes #1942 from iAmGhost/master and squashes
Repository: spark
Updated Branches:
refs/heads/master ac6411c6e - 379e7585c
[SPARK-3035] Wrong example with SparkContext.addFile
https://issues.apache.org/jira/browse/SPARK-3035
Fix for incorrect documentation.
Author: iAmGhost kdh7...@gmail.com
Closes #1942 from iAmGhost/master and squashes the
Repository: spark
Updated Branches:
refs/heads/branch-1.1 721f2fdc9 - 5dd571c29
[SPARK-1065] [PySpark] improve supporting for large broadcast
Passing large objects through Py4J is very slow (and uses a lot of memory), so
broadcast objects are passed via files instead (similar to parallelize()).
Add an option to keep
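The file-based approach can be sketched like this (the function names are illustrative, not PySpark's actual internals):

```python
import os
import pickle
import tempfile

def dump_broadcast(value):
    """Serialize a large broadcast value to a temporary file and return the
    path, rather than pushing the serialized bytes through the Py4J socket."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path

def load_broadcast(path):
    """Read the value back on the receiving side."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Only the short file path crosses the Py4J bridge; the payload itself moves through the filesystem, which is the same trick parallelize() uses for large collections.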
Repository: spark
Updated Branches:
refs/heads/master 2fc8aca08 - bc95fe08d
Cancel the ackTimeoutMonitor in the stop method of ConnectionManager
cc JoshRosen sarutak
Author: GuoQiang Li wi...@qq.com
Closes #1989 from witgo/cancel_ackTimeoutMonitor and squashes the following
commits
Repository: spark
Updated Branches:
refs/heads/branch-1.1 5dd571c29 - f02e327f0
Cancel the ackTimeoutMonitor in the stop method of ConnectionManager
cc JoshRosen sarutak
Author: GuoQiang Li wi...@qq.com
Closes #1989 from witgo/cancel_ackTimeoutMonitor and squashes the following
commits
TestOutputFormat.test_newhadoop on Python 2.6 until SPARK-2951 is fixed.
- Fix MLlib _deserialize_double on Python 2.6.
Closes #1868. Closes #1042.
Author: Josh Rosen joshro...@apache.org
Closes #1874 from JoshRosen/python2.6 and squashes the following commits:
983d259 [Josh Rosen] [SPARK-2954] Fix
is to reset currentLocalityIndex
after recomputing the locality levels.
Thanks to kayousterhout, mridulm, and lirui-intel for helping me to debug this.
Author: Josh Rosen joshro...@apache.org
Closes #1896 from JoshRosen/SPARK-2931 and squashes the following commits:
48b60b5 [Josh Rosen] Move
Repository: spark
Updated Branches:
refs/heads/master 1d03a26a4 - 28dcbb531
[SPARK-2898] [PySpark] fix bugs in daemon.py
1. do not use a signal handler for SIGCHLD, since it can easily cause deadlocks
2. handle EINTR during accept()
3. pass errno into JVM
4. handle EAGAIN during fork()
Now, it can
Repository: spark
Updated Branches:
refs/heads/branch-1.1 bb23b118e - 92daffed4
[SPARK-2898] [PySpark] fix bugs in daemon.py
1. do not use a signal handler for SIGCHLD, since it can easily cause deadlocks
2. handle EINTR during accept()
3. pass errno into JVM
4. handle EAGAIN during fork()
Now, it
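Items 2 and 4 describe the same defensive pattern: retry syscall-backed operations that fail transiently. A generic sketch (not the actual daemon.py code):

```python
import errno
import time

def retry_interrupted(call, *args, **kwargs):
    """Retry a syscall-backed call that fails with EINTR (interrupted by a
    signal, e.g. during accept()) or EAGAIN (transient resource shortage,
    e.g. during fork()). Any other error propagates unchanged."""
    while True:
        try:
            return call(*args, **kwargs)
        except OSError as e:
            if e.errno not in (errno.EINTR, errno.EAGAIN):
                raise          # real errors still surface
            time.sleep(0.01)   # brief pause before retrying
```

A call such as `retry_interrupted(sock.accept)` then survives signal delivery instead of dying with an interrupted-system-call error.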
Repository: spark
Updated Branches:
refs/heads/master e053c5581 - 59f84a953
[SPARK-1687] [PySpark] picklable namedtuple
Add a hook to replace the original namedtuple with a picklable one, so that
namedtuple can be used in RDDs.
PS: pyspark should be imported BEFORE from collections import
Repository: spark
Updated Branches:
refs/heads/branch-1.1 3823f6d25 - bfd2f3958
[SPARK-1687] [PySpark] picklable namedtuple
Add a hook to replace the original namedtuple with a picklable one, so that
namedtuple can be used in RDDs.
PS: pyspark should be imported BEFORE from collections import
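The core trick is to give each dynamically created namedtuple class a `__reduce__` that rebuilds it from its name and fields. A minimal sketch, assuming the real hook wraps `collections.namedtuple` itself (`make_picklable` and `_rebuild` are illustrative names):

```python
import pickle
from collections import namedtuple

def _rebuild(name, fields, values):
    """Module-level rebuild function, so pickle can locate it by name."""
    return namedtuple(name, fields)(*values)

def make_picklable(cls):
    """Attach a __reduce__ that reconstructs the dynamically created class
    from its name and field list at unpickling time."""
    name, fields = cls.__name__, cls._fields
    cls.__reduce__ = lambda self: (_rebuild, (name, fields, tuple(self)))
    return cls

Point = make_picklable(namedtuple("Point", "x y"))
p = pickle.loads(pickle.dumps(Point(1, 2)))
```

Without the hook, pickling fails whenever the namedtuple class was defined somewhere the unpickling process cannot import, which is exactly the situation inside RDD closures.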
Repository: spark
Updated Branches:
refs/heads/branch-1.1 aa7a48ee9 - 2225d18a7
[SPARK-1687] [PySpark] fix unit tests related to picklable namedtuple
The serializer module is imported multiple times during doctests, so it's better
to make _hijack_namedtuple() safe to call multiple times.
Author:
Repository: spark
Updated Branches:
refs/heads/master 8e7d5ba1a - 9fd82dbbc
[SPARK-1687] [PySpark] fix unit tests related to picklable namedtuple
The serializer module is imported multiple times during doctests, so it's better
to make _hijack_namedtuple() safe to call multiple times.
Author:
Repository: spark
Updated Branches:
refs/heads/master e139e2be6 - 55349f9fe
[SPARK-1740] [PySpark] kill the python worker
Kill only the Python workers related to cancelled tasks.
The daemon will start a background thread to monitor all the open sockets for
all workers. If a socket is
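The monitoring idea can be sketched with a thread that blocks until the peer closes its end of the socket, then fires a kill callback (`watch_socket` and `kill_worker` are illustrative names, not the daemon's actual API):

```python
import socket
import threading

def watch_socket(sock, kill_worker):
    """Start a daemon thread that waits for the peer to close the socket,
    then invokes the kill callback for the corresponding worker."""
    def run():
        try:
            if sock.recv(1) == b"":   # recv returns b"" on EOF
                kill_worker()
        except OSError:
            kill_worker()             # treat errors as a closed connection
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```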
merged.
Both of these fixes are useful when backporting changes.
Author: Josh Rosen joshro...@apache.org
Closes #1668 from JoshRosen/pr-script-improvements and squashes the following
commits:
ff4f33a [Josh Rosen] Default SPARK_HOME to cwd(); detect missing JIRA
credentials.
ed5bc57 [Josh Rosen
Repository: spark
Updated Branches:
refs/heads/master e02136214 - cc820502f
Docs: monitoring, streaming programming guide
Fix several awkward wordings and grammatical issues in the following
documents:
* docs/monitoring.md
* docs/streaming-programming-guide.md
Author: kballou
[SPARK-2024] Add saveAsSequenceFile to PySpark
JIRA issue: https://issues.apache.org/jira/browse/SPARK-2024
This PR is a followup to #455 and adds capabilities for saving PySpark RDDs
using SequenceFile or any Hadoop OutputFormats.
* Added RDD methods ```saveAsSequenceFile```,
Repository: spark
Updated Branches:
refs/heads/master 437dc8c5b - 94d1f46fc
http://git-wip-us.apache.org/repos/asf/spark/blob/94d1f46f/python/pyspark/tests.py
--
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
with an
incorrect ClassTag by wrapping it and overriding its ClassTag. This should be
okay for cases where the Scala code that calls collect() knows what type of
array should be allocated, which is the case in the MLlib wrappers.
Author: Josh Rosen joshro...@apache.org
Closes #1639 from JoshRosen/SPARK
Repository: spark
Updated Branches:
refs/heads/branch-1.0 1a0a2f81a - 2693035ba
[SPARK-2580] [PySpark] keep silent in worker if JVM close the socket
During rdd.take(n), the JVM will close the socket once it has received enough
data, so the Python worker should keep silent in this case.
At the same time,
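"Keeping silent" amounts to swallowing the broken-pipe error on write and exiting quietly instead of printing a traceback. A sketch of the behavior (not PySpark's actual worker code):

```python
import errno

def write_quietly(outfile, data):
    """Write worker output, but stay silent if the JVM has already closed
    its end of the socket, as happens once rdd.take(n) has enough data."""
    try:
        outfile.write(data)
        outfile.flush()
    except OSError as e:
        if e.errno not in (errno.EPIPE, errno.ESHUTDOWN):
            raise              # unrelated errors still surface
        # peer closed first: return quietly instead of logging a traceback
```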
Repository: spark
Updated Branches:
refs/heads/branch-1.0 2693035ba - e0bc72eb7
[SPARK-791] [PySpark] fix pickle itemgetter with cloudpickle
Fix pickling of operator.itemgetter built with multiple indices.
Author: Davies Liu davies@gmail.com
Closes #1627 from davies/itemgetter and
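The multi-index case is the subtlety: a picklable stand-in must record all of the indices `itemgetter` was built with, not just the first. An illustrative class (modern Python pickles `operator.itemgetter` natively, so this only demonstrates the shape of the fix):

```python
import pickle

class PicklableItemgetter:
    """Mimics operator.itemgetter while pickling via the default instance
    dict, preserving every index - the detail the cloudpickle fix restored."""
    def __init__(self, *items):
        self.items = items
    def __call__(self, obj):
        if len(self.items) == 1:
            return obj[self.items[0]]
        return tuple(obj[i] for i in self.items)

g = pickle.loads(pickle.dumps(PicklableItemgetter(2, 0)))
```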
Repository: spark
Updated Branches:
refs/heads/branch-0.9 c37db1537 - 7e4a0e1a0
[SPARK-2547]: The clustering documentation example provided for Spark 0.9
I fixed a trivial mistake in the MLlib documentation.
I checked that the Python sample code for k-means clustering can correctly