GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/5173
[WIP] [SPARK-4897] [PySpark] Python 3 support
This PR updates PySpark to support Python 3 (tested with 3.4).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark python3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5173.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5173
----
commit 7c5b4ceaa82e5bc015a6cb60f9797f1c3764cefa
Author: Josh Rosen <[email protected]>
Date: 2014-12-19T07:35:10Z
Remove Python 3 check in shell.py.
commit 854be27d17c9cc2e46a23fe1e976e7896ca675dc
Author: Josh Rosen <[email protected]>
Date: 2014-12-19T07:44:28Z
Run `futurize` on Python code:
futurize --stage1 -w python
futurize --stage1 -w ec2
This is the raw output from `futurize`, with no editing.
commit 2adb42dcf252bebec6d50ca487f62219730f7613
Author: Josh Rosen <[email protected]>
Date: 2014-12-19T07:52:05Z
Fix up some import differences between Python 2 and 3
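A common way to handle this kind of import difference is to try the Python 2 module name first and fall back to the Python 3 one. This is only an illustrative sketch. The modules shown are examples and are not necessarily the ones this commit touched.

```python
# Illustrative try/except import fallbacks (example modules, assumed for illustration).
try:
    import cPickle as pickle  # Python 2: C-accelerated pickle
except ImportError:
    import pickle  # Python 3: uses the C accelerator automatically when available

try:
    import Queue as queue  # Python 2 module name
except ImportError:
    import queue  # Python 3 renamed the module to lowercase

try:
    from itertools import izip as lazy_zip  # Python 2: lazy zip lives in itertools
except ImportError:
    lazy_zip = zip  # Python 3: the built-in zip is already lazy
```

After these blocks run, the rest of the module uses `pickle`, `queue`, and `lazy_zip` without caring which interpreter it is on.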
commit 79de9d0366089ffc9b3cd15a94fd2dad24be6988
Author: twneale <[email protected]>
Date: 2014-12-19T21:13:07Z
Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
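The sketch below shows the Python 3 side of this change. The helper names are hypothetical. A check that only accepts `io.TextIOWrapper` matches files opened in text mode and misses binary-mode files. Those binary files are still covered by the broader `io.IOBase`.

```python
import io

# Python 2 had a built-in `file` type; Python 3 does not. open() in text mode
# returns io.TextIOWrapper, while binary modes return io.BufferedReader,
# io.BufferedWriter, or io.BufferedRandom.
def is_text_file(obj):
    return isinstance(obj, io.TextIOWrapper)

# io.IOBase is the common base class of all Python 3 stream types, so it is a
# broader stand-in for isinstance(obj, file) when binary streams must match too.
def is_stream(obj):
    return isinstance(obj, io.IOBase)
```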
commit e1042153ce8d2ec37a629b8ae7a7090b3c744df8
Author: twneale <[email protected]>
Date: 2014-12-19T21:13:51Z
Replaces 2.7 types.InstanceType with 3.4 `object`. This could be horribly
wrong depending on how types.InstanceType is used elsewhere in the package; see
http://bugs.python.org/issue8206
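The concern in this commit message can be demonstrated directly. On Python 3, every value is an instance of `object`. A check that once picked out instances of old-style classes therefore matches everything, including plain integers. The following is a hypothetical sketch. `is_instance_of_user_class` and `Point` are made up for illustration.

```python
import types

# Python 2 exposed types.InstanceType, the type of every instance of an
# old-style class. Python 3 has only new-style classes, so the attribute is gone.
# Hypothetical shim mirroring the substitution described in the commit:
InstanceType = getattr(types, "InstanceType", object)


class Point:  # implicitly a subclass of object in Python 3
    pass


def is_instance_of_user_class(obj):
    # On Python 2 this matched only old-style instances; on Python 3 it is
    # isinstance(obj, object), which is True for every value.
    return isinstance(obj, InstanceType)
```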
commit f40d925d728643901cab91e95c1db590e145b754
Author: twneale <[email protected]>
Date: 2014-12-19T21:30:01Z
xrange --> range
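This rename is behavior-preserving on Python 3 because Python 3's `range` is lazy, like Python 2's `xrange`. On Python 2, however, plain `range` builds a full list, so code that must still run there will materialize the whole sequence. The following sketch illustrates the laziness.

```python
# Python 3's range is lazy like Python 2's xrange: it stores only start, stop,
# and step, yet supports len(), indexing, slicing, and fast membership tests
# without building a list.
evens_to_a_billion = range(0, 10**9, 2)
```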
commit b69ccdfdf5a58810725b3e100dd8d39ed1a8bcb3
Author: twneale <[email protected]>
Date: 2014-12-19T21:33:02Z
Uses the pure-Python pickle._Pickler instead of the C-extension
_pickle.Pickler. It appears PySpark on Python 2.7 uses the pure-Python pickler as well,
so this shouldn't degrade pickling performance (?).
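One reason to subclass the pure-Python pickler is its per-type `dispatch` table. This table is an ordinary dict that a subclass can copy and extend. The C-accelerated `_pickle.Pickler` does not consult it. This is the kind of hook that cloudpickle-style picklers rely on. Note that `pickle._Pickler` is a private CPython class. The sketch below is hypothetical: `CountingPickler` and `dumps_counting` are made up for illustration and are not PySpark code.

```python
import io
import pickle


class CountingPickler(pickle._Pickler):
    """Hypothetical pickler that counts the str objects it serializes.

    The `dispatch` dict maps a type to the method that saves it; copying it
    leaves the base class table untouched.
    """

    dispatch = pickle._Pickler.dispatch.copy()

    def __init__(self, file, protocol=None):
        super().__init__(file, protocol)
        self.str_count = 0

    def save_str_counted(self, obj):
        self.str_count += 1
        pickle._Pickler.save_str(self, obj)  # delegate to the stock str writer

    # Route str objects through the counting method.
    dispatch[str] = save_str_counted


def dumps_counting(obj):
    """Pickle obj and return (pickled bytes, number of str objects saved)."""
    buf = io.BytesIO()
    pickler = CountingPickler(buf)
    pickler.dump(obj)
    return buf.getvalue(), pickler.str_count
```

The output is ordinary pickle data, so `pickle.loads` reads it back unchanged.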
commit 735437187bbc41aa536325b4e73d1da259bc2099
Author: twneale <[email protected]>
Date: 2014-12-19T21:35:38Z
buffer --> memoryview. I'm not sure this is a valid change, but the
2.7 docs recommend using memoryview over buffer where possible, so I'm hoping it will
work.
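Python 2's `buffer(obj, offset, size)` gave a view into part of an object without copying it. A slice of a `memoryview` does the same on Python 3. The sketch below is hypothetical. `buffer_compat` is a made-up helper and is not taken from the PR. Also note that a view over a mutable `bytearray` is writable, while a view over `bytes` is read-only.

```python
def buffer_compat(obj, offset=0, size=-1):
    """Hypothetical stand-in for Python 2's buffer(obj, offset, size).

    Slicing a memoryview shares memory with obj instead of copying it.
    A negative size means "to the end", as when buffer() gets no size.
    """
    view = memoryview(obj)
    return view[offset:] if size < 0 else view[offset:offset + size]
```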
commit 1aa5e8f89df61c6e08a8a07e811dbe471651a9bb
Author: twneale <[email protected]>
Date: 2014-12-19T21:36:38Z
It turned out that `pickle.DictionaryType is dict` is True, so swapped it out for `dict`
commit 2fb2db348b8f724a06c95a75836fa77dcd642d0f
Author: Josh Rosen <[email protected]>
Date: 2014-12-31T22:01:28Z
Guard more changes behind sys.version; still doesn't run
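A minimal sketch of this kind of version guard is shown below. The names `string_types`, `integer_types`, and `is_string` are hypothetical. The example compares the `sys.version_info` tuple, which avoids parsing the `sys.version` string. The Python 2 branch never executes on Python 3, so the names that exist only on Python 2 are never looked up there.

```python
import sys

# Hypothetical example of guarding version-specific names behind the version.
if sys.version_info[0] >= 3:
    string_types = (str,)
    integer_types = (int,)
else:  # Python 2 only
    string_types = (basestring,)  # noqa: F821 (builtin exists only on Python 2)
    integer_types = (int, long)  # noqa: F821 (builtin exists only on Python 2)


def is_string(value):
    # On Python 3, bytes is not a text string, so b"x" is rejected.
    return isinstance(value, string_types)
```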
commit 6e3c21d19004e62a65f3477924c0863d340ea9c3
Author: Davies Liu <[email protected]>
Date: 2015-03-20T00:14:48Z
make cloudpickle work with Python3
commit 1eebac24c1c86c680f0172e8f9f8d5459427881b
Author: Davies Liu <[email protected]>
Date: 2015-03-20T00:37:13Z
fix conflict in ec2/spark_ec2.py
commit 35f48fe6f784e1978f080822c457b575c72b1e22
Author: Davies Liu <[email protected]>
Date: 2015-03-20T00:41:18Z
run future again
commit 24b2f2ee9d70536e6f378c270bc31b3aede69973
Author: Davies Liu <[email protected]>
Date: 2015-03-23T22:04:09Z
pass all RDD tests
commit 78901a71c0cb884cb78a22998d767f3cff25f9e8
Author: Davies Liu <[email protected]>
Date: 2015-03-23T22:49:09Z
fix hash of serializer in Python 3
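The underlying Python 3 rule is documented behavior. A class that defines `__eq__` but not `__hash__` gets `__hash__` set to `None`, which makes its instances unhashable. Serializers are compared for equality, so a hash has to be defined explicitly. The following is a hypothetical sketch, and PySpark's actual fix may differ. `Serializer` and `EqOnly` here are minimal stand-ins.

```python
class Serializer(object):
    """Hypothetical minimal serializer base, for illustration only."""

    def __eq__(self, other):
        # Serializers compare equal when they share a class and configuration.
        return isinstance(other, self.__class__) and self.__dict__ == other.__dict__

    def __ne__(self, other):  # Python 2 does not derive __ne__ from __eq__
        return not self.__eq__(other)

    def __repr__(self):
        return "%s()" % self.__class__.__name__

    # Without this, Python 3 would set __hash__ = None because __eq__ is defined.
    # Equal serializers have equal reprs here, so their hashes agree.
    def __hash__(self):
        return hash(str(self))


class EqOnly(object):
    """Defines __eq__ only, so Python 3 makes it unhashable."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)
```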
commit 431a8de00b0e79e1a65f62cd266fe00cefc4c514
Author: Davies Liu <[email protected]>
Date: 2015-03-23T23:13:25Z
streaming tests pass
commit 6cc42a99133baa128c586fb1a99813736e271c70
Author: Davies Liu <[email protected]>
Date: 2015-03-24T05:27:12Z
rename
commit 375ea17f102aeadd1fd9ef517779cf6ba035b9da
Author: Davies Liu <[email protected]>
Date: 2015-03-24T00:57:58Z
SQL tests pass
commit d737924481885100c17fdf016eda6529cb4142e0
Author: Davies Liu <[email protected]>
Date: 2015-03-24T05:47:58Z
pass ml tests
commit 7f4476eb7fd2ff7f93cda0df6ba42088a9f149da
Author: Davies Liu <[email protected]>
Date: 2015-03-24T17:11:43Z
mllib tests passed
commit 814c77bf55e4d58d9f6f2690fecc6e831d579e21
Author: Davies Liu <[email protected]>
Date: 2015-03-24T17:29:50Z
run unittests with python 3
commit a39167eaa29d62932d9924bd3a9b7a2e0a822ba0
Author: Davies Liu <[email protected]>
Date: 2015-03-24T18:19:21Z
support custom classes defined in __main__
commit 70b6b73453e639c7471b61c2c53c623885f4e342
Author: Davies Liu <[email protected]>
Date: 2015-03-24T20:26:24Z
compile ec2/spark_ec2.py in python 3
commit f53e1f0519a987d967d6191162396cef56a45761
Author: Davies Liu <[email protected]>
Date: 2015-03-24T21:00:42Z
fix tests
commit 8662d5b41f1eca18826530806e55e38298965553
Author: Davies Liu <[email protected]>
Date: 2015-03-24T21:04:17Z
Merge branch 'master' of github.com:apache/spark into python3
Conflicts:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
----