http://git-wip-us.apache.org/repos/asf/spark/blob/04e44b37/python/pyspark/sql/_types.py
--
diff --git a/python/pyspark/sql/_types.py b/python/pyspark/sql/_types.py
new file mode 100644
index 000..492c0cb
--- /dev/null
+++ b/pyt
in
Python may be GCed; then the broadcast will be destroyed in the JVM before the
PythonRDD.
This PR changes the code to use PythonRDD to track the lifecycle of the broadcast
object. It also refactors getNumPartitions() to avoid unnecessary creation
of PythonRDD, which could be heavy.
cc JoshRo
Repository: spark
Updated Branches:
refs/heads/branch-1.2 964f54478 -> 8e9fc27aa
Revert "[SPARK-5634] [core] Show correct message in HS when no incomplete apps
f..."
This reverts commit 5845a62361c39eb97df5de01c982821c8858de76.
This was reverted because it broke compilation for branch-1.2.
Repository: spark
Updated Branches:
refs/heads/master 4d4b24927 -> a76b921a9
Revert "[SPARK-6352] [SQL] Add DirectParquetOutputCommitter"
This reverts commit b29663eeea440b1d1a288d41b5ddf67e77c5bd54.
I'm reverting this because it broke test compilation for the Hadoop 1.x
profiles.
Project:
tps://github.com/xerial/snappy-java/issues/100).
Author: Josh Rosen
Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits:
f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7.
(cherry picked from commit 6adb8bcbf0a1a7bfe2990de18c59c66cd7a0aeb8)
Signed-off-by: Josh Rosen
Confli
Repository: spark
Updated Branches:
refs/heads/branch-1.3 ea13948b9 -> 8d4176132
[SPARK-6677] [SQL] [PySpark] fix cached classes
It's possible to have two DataType objects with the same id (memory address) at
different times, so we should check the cached classes to verify that they were
generated by give
Repository: spark
Updated Branches:
refs/heads/master 0cc8fcb4c -> 5d8f7b9e8
[SPARK-6677] [SQL] [PySpark] fix cached classes
It's possible to have two DataType objects with the same id (memory address) at
different times, so we should check the cached classes to verify that they were
generated by given da
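The pitfall behind this fix can be sketched in plain Python. The cache layout and `make_class` below are hypothetical stand-ins, not PySpark's actual internals; the point is that a cache keyed on `id()` must also verify object identity, because `id()` values are recycled after garbage collection.

```python
_class_cache = {}

def make_class(datatype):
    # Hypothetical factory standing in for PySpark's class generation.
    return type("GeneratedRow", (), {"datatype": datatype})

def cached_class(datatype):
    # id() values can be recycled once an object is garbage-collected, so
    # keying on id(datatype) alone may return a class that was generated
    # for a different, already-dead DataType. Storing the datatype next to
    # the class lets us check the entry belongs to this exact live object.
    entry = _class_cache.get(id(datatype))
    if entry is not None and entry[0] is datatype:
        return entry[1]
    cls = make_class(datatype)
    _class_cache[id(datatype)] = (datatype, cls)
    return cls
```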
Repository: spark
Updated Branches:
refs/heads/master 5c2844c51 -> dea5dacc5
[HOTFIX] Add explicit return types to fix lint errors
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dea5dacc
Tree: http://git-wip-us.apache.org
Repository: spark
Updated Branches:
refs/heads/branch-1.2 7a1583917 -> daec1c635
[SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.
The samples should always be sorted in ascending order, because
bisect.bisect_left is used on it. The reverse order of the result is already
achieved i
Repository: spark
Updated Branches:
refs/heads/branch-1.3 ec3e76f1e -> 48321b83d
[SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.
The samples should always be sorted in ascending order, because
bisect.bisect_left is used on it. The reverse order of the result is already
achieved i
Repository: spark
Updated Branches:
refs/heads/master 0375134f4 -> 4740d6a15
[SPARK-6216] [PySpark] check the python version in worker
Author: Davies Liu
Closes #5404 from davies/check_version and squashes the following commits:
e559248 [Davies Liu] add tests
ec33b5f [Davies Liu] check the
Repository: spark
Updated Branches:
refs/heads/master b9baa4cd9 -> 0375134f4
[SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.
The samples should always be sorted in ascending order, because
bisect.bisect_left is used on it. The reverse order of the result is already
achieved in ra
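The constraint the fix relies on is visible directly with `bisect` (the partition bounds here are made up): `bisect_left` assumes an ascending sequence, so the sample bounds must stay sorted ascending, and the descending order is applied to the result afterwards.

```python
import bisect

# Range-partition bounds must be ascending for bisect_left to be correct.
bounds = [10, 20, 30]

def partition_of(key):
    # Index of the first bound >= key: the partition this key belongs to.
    return bisect.bisect_left(bounds, key)
```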
hat subclass ShuffleSuite.scala. This commit fixes that
problem.
JoshRosen would be great if you could take a look at this, since you wrote this
test originally.
Author: Kay Ousterhout
Closes #5401 from kayousterhout/SPARK-6753 and squashes the following commits:
368c540 [Kay Ousterhout] [SPARK-6
Repository: spark
Updated Branches:
refs/heads/branch-1.3 cdef7d080 -> e967ecaca
[SPARK-6506] [pyspark] Do not try to retrieve SPARK_HOME when not needed...
In particular, the current behavior makes pyspark in yarn-cluster mode fail unless
SPARK_HOME is set, even though it is not really needed.
Author: Marcelo
Repository: spark
Updated Branches:
refs/heads/master 15e0d2bd1 -> f7e21dd1e
[SPARK-6506] [pyspark] Do not try to retrieve SPARK_HOME when not needed...
In particular, the current behavior makes pyspark in yarn-cluster mode fail unless
SPARK_HOME is set, even though it is not really needed.
Author: Marcelo Van
ta structures.
Author: Josh Rosen
Closes #5397 from JoshRosen/SPARK-6737 and squashes the following commits:
af3b02f [Josh Rosen] Consolidate stage completion handling code in a single
method.
e96ce3a [Josh Rosen] Consolidate stage completion handling code in a single
method.
3052aea [Josh Rosen] C
Repository: spark
Updated Branches:
refs/heads/branch-1.3 1cde04f21 -> ab1b8edb8
[SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
The spark_ec2.py script uses public_dns_name everywhere in the script except
for testing ssh availability, which is done using the public ip address
Repository: spark
Updated Branches:
refs/heads/master a0846c4b6 -> 6f0d55d76
[SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
The spark_ec2.py script uses public_dns_name everywhere in the script except
for testing ssh availability, which is done using the public ip address of
PI by metrics users, but it's probably okay to
do this in a major release as long as we document it in the release notes.
Author: Josh Rosen
Closes #5372 from JoshRosen/driver-id-fix and squashes the following commits:
42d3c10 [Josh Rosen] Clarify comment
0c5d04b [Josh Rosen] Add backward
ng a bug reproduction.
This patch fixes this issue by ensuring proper cleanup of these resources. It
also adds logging for unexpected error cases.
(See #4944 for the corresponding PR for 1.3/1.4).
Author: Josh Rosen
Closes #5174 from JoshRosen/executorclassloaderleak-branch-1.2 and squa
Repository: spark
Updated Branches:
refs/heads/master 6e1c1ec67 -> 440ea31b7
[SPARK-6621][Core] Fix the bug that calling EventLoop.stop in
EventLoop.onReceive/onError/onStart doesn't call onStop
Author: zsxwing
Closes #5280 from zsxwing/SPARK-6621 and squashes the following commits:
521125
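Spark's `EventLoop` is Scala; the contract being fixed can be sketched in Python (all names below are illustrative, not Spark's API): `stop()` called from inside `on_receive` must still result in exactly one `on_stop()` call.

```python
import queue
import threading

class EventLoop:
    """Minimal sketch: events are handled on a dedicated thread, and
    on_stop runs exactly once even if stop() is called from on_receive."""

    def __init__(self):
        self._queue = queue.Queue()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def post(self, event):
        self._queue.put(event)

    def stop(self):
        # Only signal here; on_stop is invoked by the loop thread as it
        # exits, so calling stop() from inside on_receive cannot skip it.
        if not self._stopped.is_set():
            self._stopped.set()
            self._queue.put(None)  # wake the loop if it is blocked

    def join(self):
        self._thread.join()

    def _run(self):
        while not self._stopped.is_set():
            event = self._queue.get()
            if event is None:
                break
            self.on_receive(event)
        self.on_stop()

    def on_receive(self, event):
        pass

    def on_stop(self):
        pass
```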
Repository: spark
Updated Branches:
refs/heads/branch-1.3 d21f77988 -> ac705aa83
[SPARK-6621][Core] Fix the bug that calling EventLoop.stop in
EventLoop.onReceive/onError/onStart doesn't call onStop
Author: zsxwing
Closes #5280 from zsxwing/SPARK-6621 and squashes the following commits:
52
Repository: spark
Updated Branches:
refs/heads/branch-1.2 a73055f7f -> 8fa09a480
SPARK-6414: Spark driver failed with NPE on job cancelation
Use Option for ActiveJob.properties to avoid NPE bug
Author: Hung Lin
Closes #5124 from hunglin/SPARK-6414 and squashes the following commits:
2290b6
Repository: spark
Updated Branches:
refs/heads/branch-1.3 a6664dcd8 -> 58e2b3fcd
SPARK-6414: Spark driver failed with NPE on job cancelation
Use Option for ActiveJob.properties to avoid NPE bug
Author: Hung Lin
Closes #5124 from hunglin/SPARK-6414 and squashes the following commits:
2290b6
Repository: spark
Updated Branches:
refs/heads/master 0cce5451a -> e3202aa2e
SPARK-6414: Spark driver failed with NPE on job cancelation
Use Option for ActiveJob.properties to avoid NPE bug
Author: Hung Lin
Closes #5124 from hunglin/SPARK-6414 and squashes the following commits:
2290b6b [H
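The actual fix is Scala (`Option(job.properties)`); here is a Python analogue of the same defensive pattern, with hypothetical names, showing why wrapping the possibly-missing value avoids the crash:

```python
class ActiveJob:
    def __init__(self, properties=None):
        # May legitimately be None, e.g. for a job being cancelled.
        self.properties = properties

def job_group(job):
    # Treat properties as optional instead of dereferencing it directly,
    # mirroring Option(job.properties) in the Scala fix.
    props = job.properties or {}
    return props.get("spark.jobGroup.id")
```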
ive operation if
there are many (e.g. thousands) of retained jobs.
This patch adds a new map to `JobProgressListener` in order to speed up these
lookups.
Author: Josh Rosen
Closes #4830 from JoshRosen/statustracker-job-group-indexing and squashes the
following commits:
e39c5c7 [Josh Rosen] Addr
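The idea of the patch (field names below are hypothetical, not the actual JobProgressListener members) is to maintain a secondary index so job-group lookups stop scanning every retained job:

```python
from collections import defaultdict

jobs = {}                         # job_id -> job info
jobs_by_group = defaultdict(set)  # job group -> set of job ids

def add_job(job_id, group, info):
    jobs[job_id] = info
    jobs_by_group[group].add(job_id)

def job_ids_for_group(group):
    # O(1) lookup instead of scanning thousands of retained jobs.
    return jobs_by_group.get(group, set())
```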
Repository: spark
Updated Branches:
refs/heads/branch-1.2 758ebf77d -> a73055f7f
[SPARK-6667] [PySpark] remove setReuseAddress
The reused address on the server side caused the server to fail to acknowledge
connected connections, so remove it.
This PR will retry once after timeout, it also add
Repository: spark
Updated Branches:
refs/heads/branch-1.3 1160cc9e1 -> ee2bd70a4
[SPARK-6667] [PySpark] remove setReuseAddress
The reused address on the server side caused the server to fail to acknowledge
connected connections, so remove it.
This PR will retry once after timeout, it also add
Repository: spark
Updated Branches:
refs/heads/master 424e987df -> 0cce5451a
[SPARK-6667] [PySpark] remove setReuseAddress
The reused address on the server side caused the server to fail to acknowledge
connected connections, so remove it.
This PR will retry once after timeout, it also add a ti
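A minimal sketch of the two sides of the fix, using local loopback sockets as a stand-in for the JVM-side server: the server socket is created without `SO_REUSEADDR`, and the client retries once after a timeout instead.

```python
import socket

# Server side: deliberately NOT setting socket.SO_REUSEADDR.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def connect_with_retry(port, timeout=2.0, retries=1):
    # Client side: retry once after a timeout rather than relying on
    # address reuse on the server.
    for attempt in range(retries + 1):
        try:
            return socket.create_connection(("127.0.0.1", port),
                                            timeout=timeout)
        except socket.timeout:
            if attempt == retries:
                raise
```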
Repository: spark
Updated Branches:
refs/heads/master 86b439935 -> 757b2e917
[SPARK-6553] [pyspark] Support functools.partial as UDF
Use `f.__repr__()` instead of `f.__name__` when instantiating
`UserDefinedFunction`s, so `functools.partial`s may be used.
Author: ksonj
Closes #5206 from ks
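The reason for the change is easy to demonstrate: `functools.partial` objects carry no `__name__` attribute, while every object has a `repr`.

```python
import functools

def add(a, b):
    return a + b

add_one = functools.partial(add, 1)

# partial objects have no __name__, so f.__name__ raises AttributeError;
# repr(f) works for any callable, which is why __repr__() is used instead.
has_name = hasattr(add_one, "__name__")
label = repr(add_one)
```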
Repository: spark
Updated Branches:
refs/heads/branch-1.3 bc04fa2e2 -> 98f72dfc1
[SPARK-6553] [pyspark] Support functools.partial as UDF
Use `f.__repr__()` instead of `f.__name__` when instantiating
`UserDefinedFunction`s, so `functools.partial`s may be used.
Author: ksonj
Closes #5206 fro
potting this issue.
Author: Josh Rosen
Closes #5276 from JoshRosen/SPARK-6614 and squashes the following commits:
d532ba7 [Josh Rosen] Check whether failed task was authorized committer
cbb3784 [Josh Rosen] Add regression test for SPARK-6614
Project: http://git-wip-us.apache.org/repos/asf/spark
to this bug.
Author: Josh Rosen
Closes #5050 from JoshRosen/javardd-si-8905-fix and squashes the following
commits:
2feb068 [Josh Rosen] Use intermediate abstract classes to work around SPARK-3266
d5f3e5d [Josh Rosen] Add failing regression tests for SPARK-3266
(cherry picked from com
Repository: spark
Updated Branches:
refs/heads/master 3b5aaa6a5 -> f17d43b03
[SPARK-6219] [Build] Check that Python code compiles
This PR expands the Python lint checks so that they check for obvious
compilation errors in our Python code.
For example:
```
$ ./dev/lint-python
Python lint che
Repository: spark
Updated Branches:
refs/heads/master 3db138742 -> 540b2a4ea
[SPARK-6394][Core] cleanup BlockManager companion object and improve the
getCacheLocs method in DAGScheduler
The current implementation includes searching a HashMap many times; we can avoid
this.
Actually if you look
Repository: spark
Updated Branches:
refs/heads/branch-1.3 29e39e178 -> febb12308
[SPARK-6313] Add config option to disable file locks/fetchFile cache to ...
...support NFS mounts.
This is a workaround for now, with the goal of finding a more permanent solution.
https://issues.apache.org/jira/bro
Repository: spark
Updated Branches:
refs/heads/branch-1.2 9ebd6f12e -> a2a94a154
[SPARK-6313] Add config option to disable file locks/fetchFile cache to ...
...support NFS mounts.
This is a workaround for now, with the goal of finding a more permanent solution.
https://issues.apache.org/jira/bro
Repository: spark
Updated Branches:
refs/heads/master 0f673c21f -> 4cca3917d
[SPARK-6313] Add config option to disable file locks/fetchFile cache to ...
...support NFS mounts.
This is a workaround for now, with the goal of finding a more permanent solution.
https://issues.apache.org/jira/browse/
Repository: spark
Updated Branches:
refs/heads/master f149b8b5e -> e3f315ac3
[SPARK-6327] [PySpark] fix launch spark-submit from python
SparkSubmit should be launched without setting PYSPARK_SUBMIT_ARGS
cc JoshRosen , this mode is actually used by python unit test, so I will not
add m
ory leak in
collect(), which may consume lots of memory in JVM.
This PR changes the way we send collected data back into Python, from a local
file to a socket, which avoids any disk IO during collect and also avoids any
referrers of Java objects in Python.
cc JoshRosen
Author: Davies Liu
Clo
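The mechanism can be sketched with a local socketpair standing in for the JVM-to-Python connection; the length-prefix framing shown is illustrative, not Spark's actual wire protocol.

```python
import pickle
import socket

def recv_exact(sock, n):
    # Sockets may return partial reads; loop until n bytes have arrived.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed early")
        buf += chunk
    return buf

jvm_side, py_side = socket.socketpair()

# "JVM" side: stream the collected rows with a length prefix, no temp file.
payload = pickle.dumps([1, 2, 3])
jvm_side.sendall(len(payload).to_bytes(4, "big") + payload)
jvm_side.close()

# Python side: read the frame straight off the socket, no disk IO.
size = int.from_bytes(recv_exact(py_side, 4), "big")
rows = pickle.loads(recv_exact(py_side, size))
```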
changed the code that reads the environment
variable to do so via `SparkConf.getenv`, then used a custom SparkConf subclass
to mock the environment variable (this pattern is used elsewhere in Spark's
tests).
Author: Josh Rosen
Closes #4903 from JoshRosen/SPARK-6175 and squashes the foll
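The test pattern described translates to a small Python sketch (class names are illustrative): route every environment read through the conf object so a test subclass can substitute values without touching the real environment.

```python
import os

class Conf:
    def getenv(self, name, default=None):
        # All environment reads funnel through this one method.
        return os.environ.get(name, default)

class StubConf(Conf):
    """Test double that serves canned values instead of the environment."""

    def __init__(self, env):
        self._env = dict(env)

    def getenv(self, name, default=None):
        return self._env.get(name, default)

def public_dns(conf):
    # Example consumer: behavior depends only on conf.getenv, so tests can
    # inject SPARK_PUBLIC_DNS without mutating os.environ.
    return conf.getenv("SPARK_PUBLIC_DNS", "localhost")
```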
k references
here anyways, since this map is cleared at the end of each task.
Author: Josh Rosen
Closes #4835 from JoshRosen/SPARK-6075 and squashes the following commits:
4f4b5b2 [Josh Rosen] Remove defensive assertions that caused test failures in
code unrelated to this change
120c7b0 [Josh Rose
erSchema (avoid the unnecessary
converter of object).
cc pwendell JoshRosen
Author: Davies Liu
Closes #4808 from davies/leak and squashes the following commits:
6a322a4 [Davies Liu] tests refactor
3da44fc [Davies Liu] fix __eq__ of Singleton
534ac90 [Davies Liu] add more checks
46999dc [Davies
Repository: spark
Updated Branches:
refs/heads/branch-1.1 814934da6 -> 91d0effb3
[SPARK-6055] [PySpark] fix incorrect DataType.__eq__ (for 1.1)
The __eq__ of DataType is not correct: the class cache is not used correctly (a
created class can not be found by its dataType), so it will create lots of classes
Repository: spark
Updated Branches:
refs/heads/branch-1.2 17b7cc733 -> 576fc54e5
[SPARK-6055] [PySpark] fix incorrect DataType.__eq__ (for 1.2)
The __eq__ of DataType is not correct: the class cache is not used correctly (a
created class can not be found by its dataType), so it will create lots of classes
Repository: spark
Updated Branches:
refs/heads/master cfff397f0 -> 7fa960e65
[SPARK-5363] Fix bug in PythonRDD: remove() inside iterator is not safe
Removing elements from a mutable HashSet while iterating over it can cause the
iteration to incorrectly skip over entries that were not removed.
Repository: spark
Updated Branches:
refs/heads/branch-1.2 015895ab5 -> cc7313d09
[SPARK-5363] Fix bug in PythonRDD: remove() inside iterator is not safe
Removing elements from a mutable HashSet while iterating over it can cause the
iteration to incorrectly skip over entries that were not remov
Repository: spark
Updated Branches:
refs/heads/branch-1.3 dafb3d210 -> 5d309ad6c
[SPARK-5363] Fix bug in PythonRDD: remove() inside iterator is not safe
Removing elements from a mutable HashSet while iterating over it can cause the
iteration to incorrectly skip over entries that were not remov
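The same hazard exists in Python, where mutating a set during iteration raises an error rather than silently skipping entries; the safe pattern in either language is the same: collect first, then remove.

```python
items = {1, 2, 3, 4, 5}

# Collect the elements to drop, then remove them after iteration finishes.
# (Scala's mutable.HashSet may silently skip entries if mutated mid-loop;
# CPython raises RuntimeError, but the fix is identical.)
to_remove = [x for x in items if x % 2 == 0]
for x in to_remove:
    items.remove(x)
```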
Repository: spark
Updated Branches:
refs/heads/master e4f9d03d7 -> 95cd643aa
[SPARK-3885] Provide mechanism to remove accumulators once they are no longer
used
Instead of storing a strong reference to accumulators, I've replaced this with
a weak reference and updated any code that uses these
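The weak-reference idea in Python form (the registry layout is hypothetical): the registry no longer keeps an accumulator alive once user code drops its last reference.

```python
import weakref

class Accumulator:
    def __init__(self, value):
        self.value = value

_registry = {}

def register(acc_id, acc):
    # A weak reference lets the accumulator be collected once the user's
    # last strong reference is gone, instead of leaking forever.
    _registry[acc_id] = weakref.ref(acc)

def lookup(acc_id):
    ref = _registry.get(acc_id)
    return ref() if ref is not None else None
```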
Repository: spark
Updated Branches:
refs/heads/master 275b1bef8 -> e4f9d03d7
[SPARK-911] allow efficient queries for a range if RDD is partitioned wi...
...th RangePartitioner
Author: Aaron Josephs
Closes #1381 from aaronjosephs/PLAT-911 and squashes the following commits:
e30ade5 [Aaron J
Repository: spark
Updated Branches:
refs/heads/branch-1.3 07a401a7b -> 7e5e4d82b
[SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs()
This method is performance-sensitive and this change wasn't necessary.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: ht
Repository: spark
Updated Branches:
refs/heads/master d46d6246d -> a51fc7ef9
[SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs()
This method is performance-sensitive and this change wasn't necessary.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http:/
Repository: spark
Updated Branches:
refs/heads/branch-1.3 07d8ef9e7 -> 81202350a
[SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark
Currently, PySpark does not support narrow dependency during cogroup/join when
the two RDDs have the same partitioner, so another unnecessary shuffle s
Repository: spark
Updated Branches:
refs/heads/master 117121a4e -> c3d2b90bd
[SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark
Currently, PySpark does not support narrow dependency during cogroup/join when
the two RDDs have the same partitioner, so another unnecessary shuffle stage
Repository: spark
Updated Branches:
refs/heads/master de4836f8f -> 445a755b8
[SPARK-4172] [PySpark] Progress API in Python
This patch brings the pull-based progress API into Python, along with an
example in Python.
Author: Davies Liu
Closes #3027 from davies/progress_api and squashes the following
Repository: spark
Updated Branches:
refs/heads/branch-1.3 e65dc1fd5 -> 35e23ff14
[SPARK-4172] [PySpark] Progress API in Python
This patch brings the pull-based progress API into Python, along with an
example in Python.
Author: Davies Liu
Closes #3027 from davies/progress_api and squashes the follo
Repository: spark
Updated Branches:
refs/heads/branch-1.3 b8da5c390 -> aeb85cdee
Revert "[SPARK-5363] [PySpark] check ending mark in non-block way"
This reverts commits ac6fe67e1d8bf01ee565f9cc09ad48d88a275829 and
c06e42f2c1e5fcf123b466efd27ee4cb53bbed3f.
Project: http://git-wip-us.apache.o
Repository: spark
Updated Branches:
refs/heads/branch-1.2 432ceca2a -> 6be36d5a8
Revert "[SPARK-5363] [PySpark] check ending mark in non-block way"
This reverts commits ac6fe67e1d8bf01ee565f9cc09ad48d88a275829 and
c06e42f2c1e5fcf123b466efd27ee4cb53bbed3f.
Project: http://git-wip-us.apache.o
Repository: spark
Updated Branches:
refs/heads/master a65766bf0 -> ee6e3eff0
Revert "[SPARK-5363] [PySpark] check ending mark in non-block way"
This reverts commits ac6fe67e1d8bf01ee565f9cc09ad48d88a275829 and
c06e42f2c1e5fcf123b466efd27ee4cb53bbed3f.
Project: http://git-wip-us.apache.org/r
the ending mark from Python in a non-blocking way, so it will not be
blocked by the Python process.
There is a small chance that the ending mark has been sent by the Python process
but is not available right now; then the Python process will not be used.
cc JoshRosen pwendell
Author: Davies Liu
Closes #4601 from davies/fre
Repository: spark
Updated Branches:
refs/heads/branch-1.2 f468688f1 -> a39da171c
[SPARK-5395] [PySpark] fix python process leak while coalesce()
Currently, the Python process is released into the pool only after the task has
finished, which causes many processes to be forked if coalesce() is called.
This PR
Repository: spark
Updated Branches:
refs/heads/branch-1.2 6f47114d9 -> f468688f1
[SPARK-5788] [PySpark] capture the exception in python write thread
An exception in the Python writer thread will shut down the executor.
Author: Davies Liu
Closes #4577 from davies/exception and squashes the following
Repository: spark
Updated Branches:
refs/heads/master 1294a6e01 -> b1bd1dd32
[SPARK-5788] [PySpark] capture the exception in python write thread
An exception in the Python writer thread will shut down the executor.
Author: Davies Liu
Closes #4577 from davies/exception and squashes the following com
Repository: spark
Updated Branches:
refs/heads/branch-1.3 52994d83b -> c2a9a6176
[SPARK-5788] [PySpark] capture the exception in python write thread
An exception in the Python writer thread will shut down the executor.
Author: Davies Liu
Closes #4577 from davies/exception and squashes the following
Repository: spark
Updated Branches:
refs/heads/branch-1.2 1af7ca15f -> 6f47114d9
[SPARK-5361]Multiple Java RDD <-> Python RDD conversions not working correctly
This is found through reading an RDD from `sc.newAPIHadoopRDD` and writing it back
using `rdd.saveAsNewAPIHadoopFile` in pyspark.
It tu
Repository: spark
Updated Branches:
refs/heads/branch-1.2 7f19c7c1b -> 1af7ca15f
[SPARK-5441][pyspark] Make SerDeUtil PairRDD to Python conversions more robust
SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD now both support empty
RDDs by checking the result of take(1) instead of call
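The robustness trick generalizes; `FakeRDD` below is a stand-in, not PySpark's RDD, but it shows why `take(1)` is preferred: `first()` raises on an empty collection, while `take(1)` returns an empty list that can be checked.

```python
class FakeRDD:
    """Minimal stand-in mimicking the two RDD methods compared here."""

    def __init__(self, data):
        self._data = list(data)

    def take(self, n):
        return self._data[:n]

    def first(self):
        if not self._data:
            raise ValueError("RDD is empty")
        return self._data[0]

def is_empty(rdd):
    # take(1) on an empty RDD returns [] instead of raising, so emptiness
    # can be tested without a try/except around first().
    return len(rdd.take(1)) == 0
```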
are
manually set based off of ManualClock; this eliminates many Thread.sleep calls.
- Update these tests to use the withStreamingContext fixture.
Author: Josh Rosen
Closes #4633 from JoshRosen/spark-1600-b12-backport and squashes the following
commits:
e5d3dc4 [Josh Rosen] [SPARK-1600] Refacto
osen
Closes #4603 from JoshRosen/SPARK-2313 and squashes the following commits:
6a7740b [Josh Rosen] Remove EchoOutputThread since it's no longer needed
0db501f [Josh Rosen] Use select() so that we don't block if GatewayServer dies.
9bdb4b6 [Josh Rosen] Handle case where getListening
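The `select()` idea, sketched with a local socketpair (the real code watches the Py4J GatewayServer's output): wait with a timeout rather than blocking on a read that may never return if the child process dies.

```python
import select
import socket

parent, child = socket.socketpair()

# Child side announces readiness (stands in for GatewayServer's stdout).
child.sendall(b"ready")

# Parent side: select() with a timeout, so a dead child (no data, closed
# socket) is noticed instead of blocking forever inside recv().
readable, _, _ = select.select([parent], [], [], 5.0)
banner = parent.recv(5) if readable else None
```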
Repository: spark
Updated Branches:
refs/heads/branch-1.2 97541b22e -> 63eee523e
[SPARK-4905][STREAMING] FlumeStreamSuite fix.
Using String constructor instead of CharsetDecoder to see if it fixes the issue
of empty strings in Flume test output.
Author: Hari Shreedharan
Closes #4371 from h
Repository: spark
Updated Branches:
refs/heads/master 6fe70d843 -> 0765af9b2
[SPARK-4905][STREAMING] FlumeStreamSuite fix.
Using String constructor instead of CharsetDecoder to see if it fixes the issue
of empty strings in Flume test output.
Author: Hari Shreedharan
Closes #4371 from haris
Repository: spark
Updated Branches:
refs/heads/branch-1.3 6a0144c63 -> 18c5a999b
[SPARK-4905][STREAMING] FlumeStreamSuite fix.
Using String constructor instead of CharsetDecoder to see if it fixes the issue
of empty strings in Flume test output.
Author: Hari Shreedharan
Closes #4371 from h
Repository: spark
Updated Branches:
refs/heads/branch-1.0 a1425db96 -> 4b9234905
[HOTFIX] use --driver-java-options instead of --conf for branch-1.0
This fixes a build-break caused by b78422ae170b89fa09e8910e247cbfecc23442f8,
a previous hotfix.
Project: http://git-wip-us.apache.org/repos/asf
Repository: spark
Updated Branches:
refs/heads/branch-1.2 d89964f86 -> 4bad85485
SPARK-5425: Use synchronised methods in system properties to create SparkConf
SPARK-5425: Fixed usages of system properties
This patch fixes a few problems caused by the fact that the Scala wrapper over
system pro
the hadoop-2.3 or hadoop-2.4 profiles.
The jets3t release notes can be found at
http://www.jets3t.org/RELEASE_NOTES.html
Author: Josh Rosen
Closes #4454 from JoshRosen/SPARK-5671 and squashes the following commits:
fa6cb3e [Josh Rosen] [SPARK-5671] Upgrade jets3t to 0.9.2 in hadoop-2.3 and
Repository: spark
Updated Branches:
refs/heads/master 0e23ca9f8 -> e772b4e4e
SPARK-5403: Ignore UserKnownHostsFile in SSH calls
See https://issues.apache.org/jira/browse/SPARK-5403
Author: Grzegorz Dubicki
Closes #4196 from grzegorz-dubicki/SPARK-5403 and squashes the following
commits:
a
Repository: spark
Updated Branches:
refs/heads/branch-1.3 11b28b9b4 -> 3d99741b2
SPARK-5403: Ignore UserKnownHostsFile in SSH calls
See https://issues.apache.org/jira/browse/SPARK-5403
Author: Grzegorz Dubicki
Closes #4196 from grzegorz-dubicki/SPARK-5403 and squashes the following
commits
Repository: spark
Updated Branches:
refs/heads/branch-1.3 87e0f0dc6 -> 1d3234165
SPARK-5633 pyspark saveAsTextFile support for compression codec
See https://issues.apache.org/jira/browse/SPARK-5633 for details
Author: Vladimir Vladimirov
Closes #4403 from smartkiwi/master and squashes the f
Repository: spark
Updated Branches:
refs/heads/master 65181b751 -> b3872e00d
SPARK-5633 pyspark saveAsTextFile support for compression codec
See https://issues.apache.org/jira/browse/SPARK-5633 for details
Author: Vladimir Vladimirov
Closes #4403 from smartkiwi/master and squashes the follo
Repository: spark
Updated Branches:
refs/heads/master 3d3ecd774 -> 0f3a36071
[SPARK-4983] Insert waiting time before tagging EC2 instances
The boto API doesn't support tagging EC2 instances in the same call that launches
them.
We add a five-second wait so EC2 has enough time to propagate the info
Repository: spark
Updated Branches:
refs/heads/branch-1.2 09da688b0 -> 36f70de83
[SPARK-4983] Insert waiting time before tagging EC2 instances
The boto API doesn't support tagging EC2 instances in the same call that launches
them.
We add a five-second wait so EC2 has enough time to propagate the
Repository: spark
Updated Branches:
refs/heads/branch-1.3 2ef9853e7 -> 2872d8344
[SPARK-4983] Insert waiting time before tagging EC2 instances
The boto API doesn't support tagging EC2 instances in the same call that launches
them.
We add a five-second wait so EC2 has enough time to propagate the