Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Hi Gaël,

I know virtually nothing about string/unicode handling and compression, but I do know what I want to work... I'm happy to open a PR and add the (failing) tests, but someone with more expertise in those fields would have to actually get this working, or at least provide me with extensive clues. Any takers? Given that pickle doesn't seem to work across Python versions, I think this could be very valuable to the wider community!

Juan.

--
New Year. New Location. New Benefits. New Data Center in Ashburn, VA. GigeNET is offering a free month of service with a new server in Ashburn. Choose from 2 high performing configs, both with 100TB of bandwidth. Higher redundancy. Lower latency. Increased capacity. Completely compliant.
http://p.sf.net/sfu/gigenet
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
They all sound related to the Py3k handling of Unicode, in which case I'm guessing a search should find cases of this issue elsewhere. I'm glad joblib worked in the end, but maybe it's worth leaving an issue on the joblib project so that it could be appropriately tested or documented.
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
They all sound related to the Py3k handling of Unicode, in which case I'm guessing a search should find cases of this issue elsewhere. I'm glad joblib worked in the end, but maybe it's worth leaving an issue on the joblib project so that it could be appropriately tested or documented.

joblib doesn't warrant in any way that objects stored in one environment can be restored in another (maybe that should be better documented).

In a sense, I am not against work on better support for this, but it will require quite a complex test suite. I don't see myself investing resources in that in the near future (things like better parallelism and cache replacement are higher on my list of priorities).

If someone wants to work on this, that person should first demonstrate an automated test suite (working on Travis). The reason is that if we cannot test such behavior, I don't think we can maintain it.
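Gaël's condition above, an automated test suite before any portability work, could start from something like the sketch below. It is only an illustration and not joblib's test suite: it uses the standard-library pickle module rather than joblib, and the helper names (write_fixture, check_fixtures) and the fixture file naming are made up here. The idea is that each interpreter on the CI matrix writes fixtures tagged with its own version, the files are kept, and every interpreter then tries to load all of them.

```python
import glob
import os
import pickle
import sys


def write_fixture(obj, directory, protocol=2):
    """Pickle obj into a file tagged with the running Python version."""
    tag = 'py%d%d' % sys.version_info[:2]
    path = os.path.join(directory, 'fixture_%s_p%d.pkl' % (tag, protocol))
    with open(path, 'wb') as fout:  # binary mode on both Python 2 and 3
        pickle.dump(obj, fout, protocol)
    return path


def check_fixtures(directory, expected):
    """Try to load every fixture; return {path: reason} for each failure."""
    failures = {}
    for path in sorted(glob.glob(os.path.join(directory, 'fixture_*.pkl'))):
        try:
            with open(path, 'rb') as fin:
                loaded = pickle.load(fin)
        except Exception as exc:  # record any load error instead of stopping
            failures[path] = repr(exc)
            continue
        if loaded != expected:
            failures[path] = 'loaded object differs from expected'
    return failures
```

To exercise joblib itself, joblib.dump and joblib.load could presumably be swapped in for the pickle calls, with one fixture per compress level as well as per interpreter.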
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Hi everyone,

Using joblib with compress=0 worked! Is it a joblib bug that compress=3 is not portable?

Joel, here are the tracebacks from standard Python pickles of increasing protocols (0, 1, 2), saved in Python 2 and attempting to load them in Python 3:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

TypeError: 'str' does not support the buffer interface

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-7-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 595: invalid start byte

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Thanks again everyone!
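A note on the three plain-pickle tracebacks in this message: all of them open the file with mode 'r'. In Python 3 that is text mode, so the pickle bytes are either handed to pickle as str (the TypeError) or fail UTF-8 decoding first (the two UnicodeDecodeErrors). Opening with 'rb' avoids both. Python 3's pickle.load also takes an encoding argument that controls how Python 2 str objects are decoded; 'latin1' is the value usually suggested for pickles containing numpy arrays. The sketch below shows the mechanism on a tiny hand-written Python 2 pickle. The byte string and the helper name load_py2_pickle are assumptions for illustration, and whether this is enough to load a full RandomForest has not been checked here.

```python
import os
import pickle
import tempfile

# A protocol-2 pickle of the one-byte Python 2 str '\xff', written out by
# hand so that the sketch runs without a Python 2 interpreter.
PY2_PICKLE = b'\x80\x02U\x01\xffq\x00.'


def load_py2_pickle(path, encoding='latin1'):
    """Load a Python 2 pickle under Python 3 (hypothetical helper)."""
    with open(path, 'rb') as fin:  # binary mode, not 'r'
        return pickle.load(fin, encoding=encoding)


def demo():
    fd, path = tempfile.mkstemp(suffix='.pck')
    with os.fdopen(fd, 'wb') as fout:
        fout.write(PY2_PICKLE)
    try:
        # With the default encoding (ASCII), the non-ASCII byte cannot be
        # decoded, even though the file is opened in binary mode.
        try:
            with open(path, 'rb') as fin:
                pickle.load(fin)
            default_ok = True
        except UnicodeDecodeError:
            default_ok = False
        as_text = load_py2_pickle(path)            # decoded as latin1 text
        as_bytes = load_py2_pickle(path, 'bytes')  # kept as raw bytes
        return default_ok, as_text, as_bytes
    finally:
        os.remove(path)
```

The 'bytes' encoding keeps Python 2 str values as bytes objects, which can be the safer choice when the payload is binary data rather than text.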
[Scikit-learn-general] Sharing objects between Python 2 and 3
Hi all,

I'm working on a project that depends on sklearn. I've been upping test coverage (which includes saving a RandomForest, so far using joblib serialization), and now I wanted to make the project Python 3-compatible. However, the final roadblock is the sharing of RF objects: I can't load the Python 2-serialized RFs with Python 3 tests. Of course, the test outcome depends on the exact RF that was created a while back. Is there any way around this?

Thanks!

Juan.
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Hi Sebastian,

Thanks for the response, but actually joblib doesn't work either:

In [1]: from sklearn.externals import joblib

In [2]: rf = joblib.load('rf-1.joblib')
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-3-2c47f0ec1d5b> in <module>()
----> 1 rf = joblib.load('rf-1.joblib')

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
    417                           'ignoring mmap_mode %(mmap_mode)s flag passed'
    418                           % locals(), Warning, stacklevel=2)
--> 419             unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
    420         else:
    421             unpickler = NumpyUnpickler(filename, file_handle=file_handle,

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
    306         NumpyUnpickler.__init__(self, filename,
    307                                 file_handle,
--> 308                                 mmap_mode=None)
    309
    310     def _open_pickle(self, file_handle):

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
    264         self._dirname = os.path.dirname(filename)
    265         self.mmap_mode = mmap_mode
--> 266         self.file_handle = self._open_pickle(file_handle)
    267         Unpickler.__init__(self, self.file_handle)
    268         try:

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
    309
    310     def _open_pickle(self, file_handle):
--> 311         return BytesIO(read_zfile(file_handle))
    312
    313

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
     66     # We use the known length of the data to tell Zlib the size of the
     67     # buffer to allocate.
---> 68     data = zlib.decompress(file_handle.read(), 15, length)
     69     assert len(data) == length, (
     70         Incorrect data length while decompressing %s.

error: Error -3 while decompressing data: incorrect header check

The very same commands work fine in Py2:

In [1]: from sklearn.externals import joblib

In [2]: rf1 = joblib.load('rf-1.joblib')

In [3]:

Is this unexpected?
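For readers unfamiliar with the message "Error -3 while decompressing data: incorrect header check": zlib raises it when the bytes it is given do not begin with a valid zlib stream header. The sketch below reproduces that class of error with a made-up container format, a fixed-width ASCII length field followed by the compressed stream. This is not joblib's actual file layout, and the names write_blob, read_blob and HEADER_WIDTH are hypothetical. It only illustrates that a reader that starts decompressing one byte off fails at the header check, while the compressed stream itself is unchanged.

```python
import zlib

HEADER_WIDTH = 16  # hypothetical fixed-width length field, not joblib's layout


def write_blob(data, level=3):
    """Prefix zlib-compressed data with its uncompressed length as ASCII."""
    header = str(len(data)).encode('ascii').ljust(HEADER_WIDTH)
    return header + zlib.compress(data, level)


def read_blob(blob, header_width=HEADER_WIDTH):
    """Read the length field, then decompress whatever follows it."""
    length = int(blob[:header_width].decode('ascii'))
    data = zlib.decompress(blob[header_width:], 15, length)
    assert len(data) == length, 'Incorrect data length while decompressing'
    return data


original = b'random forest bytes ' * 10
blob = write_blob(original)

# Reader and writer agree on the header width: the round trip works.
round_trip = read_blob(blob)

# Reader is off by one byte: zlib sees a padding space where the stream
# header should be and refuses to decompress.
try:
    read_blob(blob, header_width=HEADER_WIDTH - 1)
    message = None
except zlib.error as exc:
    message = str(exc)
```

If something similar happens in the real file, the place to look would be how the bytes before the compressed payload are written and parsed on each Python version, rather than zlib itself. That is a guess, not a diagnosis.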
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Hi, Juan,

It's been some time, but I remember that I had similar issues. I think it has to do with the numpy arrays that specifically cause problems in pickle (http://bugs.python.org/issue6784).

You could try to use joblib (which should also be more efficient):

    from sklearn.externals import joblib
    joblib.dump(clf, 'filename.pkl')
    clf = joblib.load('filename.pkl')

(http://scikit-learn.org/stable/modules/model_persistence.html)

Best,
Sebastian
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Nope, the Py2 RF was saved with joblib! The SO response might work for standard pickling though, I'll give that a try, thanks!
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Could you provide the traceback when using pickle? The joblib error is about zipping, which should not be applicable there...
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Joel, *sorry*, I should probably have mentioned this earlier: joblib.dump takes a compress kwarg, which I used, probably 3 as recommended by the docstring, so that I wouldn't have a bajillion files representing my RF. So the zipping error makes perfect sense, except that I wouldn't expect gzip to change between Python versions. ;)

I haven't tried using compress=0, but would like to avoid that if possible! (These test RFs are in my repo.)

I'm on a different computer right now so will submit pickle traceback later... But hoping there's a good joblib-based solution! =)

Juan.
Re: [Scikit-learn-general] Sharing objects between Python 2 and 3
Sorry, I think my previous message was a little bit ambiguous. What I would try is:

1) Unpickle the original pickle file in Python 2
2) Pickle it via joblib
3) Load it in Python 3

(I think you only did step 3, right? Sorry for the confusion.)

I also just saw a related SO post that might be very helpful:
http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3

Best,
Sebastian

On Jan 22, 2015, at 5:10 PM, jni.s...@gmail.com wrote:

Hi Sebastian,

Thanks for the response, but actually joblib doesn't work either:

    In [1]: from sklearn.externals import joblib

    In [2]: rf = joblib.load('rf-1.joblib')
    ---------------------------------------------------------------------------
    error                                     Traceback (most recent call last)
    <ipython-input-3-2c47f0ec1d5b> in <module>()
    ----> 1 rf = joblib.load('rf-1.joblib')

    /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
        417 'ignoring mmap_mode %(mmap_mode)s flag passed'
        418 % locals(), Warning, stacklevel=2)
    --> 419 unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
        420 else:
        421 unpickler = NumpyUnpickler(filename, file_handle=file_handle,

    /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
        306 NumpyUnpickler.__init__(self, filename,
        307 file_handle,
    --> 308 mmap_mode=None)
        309
        310 def _open_pickle(self, file_handle):

    /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
        264 self._dirname = os.path.dirname(filename)
        265 self.mmap_mode = mmap_mode
    --> 266 self.file_handle = self._open_pickle(file_handle)
        267 Unpickler.__init__(self, self.file_handle)
        268 try:

    /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
        309
        310 def _open_pickle(self, file_handle):
    --> 311 return BytesIO(read_zfile(file_handle))
        312
        313

    /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
         66 # We use the known length of the data to tell Zlib the size of the
         67 # buffer to allocate.
    ---> 68 data = zlib.decompress(file_handle.read(), 15, length)
         69 assert len(data) == length, (
         70 Incorrect data length while decompressing %s.

    error: Error -3 while decompressing data: incorrect header check

The very same commands work fine in Py2:

    In [1]: from sklearn.externals import joblib

    In [2]: rf1 = joblib.load('rf-1.joblib')

    In [3]:

Is this unexpected?

On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka se.rasc...@gmail.com wrote:

Hi, Juan,

It's been some time, but I remember that I had similar issues. I think it has to do with the numpy arrays that specifically cause problems in pickle. (http://bugs.python.org/issue6784)

You could try to use joblib (which should also be more efficient):

    from sklearn.externals import joblib
    joblib.dump(clf, 'filename.pkl')
    clf = joblib.load('filename.pkl')

(http://scikit-learn.org/stable/modules/model_persistence.html)

Best,
Sebastian

On Jan 22, 2015, at 8:50 AM, jni.s...@gmail.com wrote:

Hi all,

I'm working on a project that depends on sklearn. I've been upping test coverage (which includes saving a RandomForest, so far using joblib serialization), and now I wanted to make the project Python 3-compatible. However, the final roadblock is the sharing of RF objects: I can't load the Python 2-serialized RFs with Python 3 tests. Of course, the test outcome depends on the exact RF that was created a while back. Is there any way around this?

Thanks!
Juan.
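For readers hitting the plain-pickle side of this problem (Python 2 pickles that refuse to load under Python 3), here is a minimal sketch of one commonly suggested workaround: retrying the load with `encoding="latin1"`, which is a documented argument of Python 3's `pickle.loads`. The byte string `PY2_PICKLE` and the helper name `load_legacy_pickle` are made up for illustration and are not part of this thread. This only addresses string decoding in the standard pickle format; it is not claimed to fix the zlib "incorrect header check" error that joblib raises above.

```python
import pickle

# Hand-written bytes in the format Python 2 uses for a str under
# protocol 2 (hypothetical sample data): the four bytes 'caf\xe9'.
# The non-ASCII byte is what trips up Python 3's default decoding.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def load_legacy_pickle(data):
    """Load pickle bytes, retrying with latin1 if ASCII decoding fails.

    Python 3 decodes Python 2 str objects as ASCII by default, so any
    byte >= 0x80 raises UnicodeDecodeError. latin1 maps every byte to a
    code point, so the retry cannot fail on decoding.
    """
    try:
        return pickle.loads(data)
    except UnicodeDecodeError:
        return pickle.loads(data, encoding="latin1")


value = load_legacy_pickle(PY2_PICKLE)
print(repr(value))
```

The same `encoding` keyword is accepted by `pickle.load` for file objects. Whether the decoded result is meaningful depends on what the Python 2 side actually stored, so treat this as a starting point for experiments and not as a guaranteed fix.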