Hi everyone,
Using joblib with compress=0 worked! Is it a joblib bug that files saved with
compress=3 are not portable between Python 2 and 3?
Joel, here are the tracebacks for standard Python pickles of increasing
protocol (0, 1, 2), each saved in Python 2 and then loaded in Python 3:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-100e36105a73> in <module>()
1 with open('rf-1.pck', 'r') as fin:
----> 2 rf1 = pck.load(fin)
3
TypeError: 'str' does not support the buffer interface
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-100e36105a73> in <module>()
1 with open('rf-1.pck', 'r') as fin:
----> 2 rf1 = pck.load(fin)
3
/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
311 # decode input (taking the buffer into account)
312 data = self.buffer + input
--> 313 (result, consumed) = self._buffer_decode(data, self.errors, final)
314 # keep undecoded input until the next call
315 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 595: invalid start byte
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-5-100e36105a73> in <module>()
1 with open('rf-1.pck', 'r') as fin:
----> 2 rf1 = pck.load(fin)
3
/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
311 # decode input (taking the buffer into account)
312 data = self.buffer + input
--> 313 (result, consumed) = self._buffer_decode(data, self.errors, final)
314 # keep undecoded input until the next call
315 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
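All three tracebacks come from opening the file in text mode ('r'): Python 3 then either hands pickle a decoded str (the TypeError) or fails while decoding the raw bytes as UTF-8 (the UnicodeDecodeError raised in codecs.py). A minimal sketch of the usual fix, assuming a plain pickle/cPickle file rather than a joblib one: open in binary mode and pass encoding='latin1', along the lines of the Stack Overflow thread Sebastian linked. The Python 3 pickle documentation states that encoding='latin1' is required for unpickling NumPy arrays pickled by Python 2. The helper name load_py2_pickle and the file name are hypothetical.

```python
import os
import pickle
import tempfile


def load_py2_pickle(path):
    """Load a plain pickle written by Python 2 into Python 3 (sketch).

    Assumes a pickle/cPickle file, not joblib's compressed format.
    """
    # Pickles are byte streams, so open in binary mode ('rb'), not 'r'.
    with open(path, 'rb') as fin:
        # latin1 maps bytes 0-255 one-to-one onto code points, so every
        # Python 2 str (e.g. the raw buffer of a numpy array) decodes.
        return pickle.load(fin, encoding='latin1')


if __name__ == '__main__':
    # Hand-built protocol-2 pickle of the Python 2 str '\xff\x00':
    # PROTO 2, SHORT_BINSTRING (length 2), STOP.
    py2_bytes = b'\x80\x02U\x02\xff\x00.'
    path = os.path.join(tempfile.mkdtemp(), 'example.pck')  # hypothetical name
    with open(path, 'wb') as fout:
        fout.write(py2_bytes)
    # The default encoding='ASCII' would raise UnicodeDecodeError on 0xff.
    print(load_py2_pickle(path) == '\xff\x00')  # True
```

pickle.load also accepts encoding='bytes', which leaves every Python 2 str as a bytes object instead of decoding it.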
Thanks again everyone!
On Friday, Jan 23, 2015 at 1:49 pm, Juan Nunez-Iglesias <jni.s...@gmail.com> wrote:
Joel, *sorry*, I should probably have mentioned this earlier:
joblib.dump takes a "compress" kwarg, which I used (probably with 3, as the
docstring recommends) so that I wouldn't have a bajillion files representing my
RF. So the zipping error makes perfect sense, except that I wouldn't expect zlib
compression to change between Python versions. ;) I haven't tried compress=0,
but I'd like to avoid it if possible! (These test RFs are in my repo.)
I'm on a different computer right now, so I'll send the pickle tracebacks
later... But I'm hoping there's a good joblib-based solution! =)
Juan.
On Fri, Jan 23, 2015 at 1:38 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
Could you provide the traceback when using pickle? The joblib error is about
zipping, which should not be applicable there...
On 23 January 2015 at 13:30, Juan Nunez-Iglesias <jni.s...@gmail.com> wrote:
Nope, the Py2 RF was saved with joblib!
The SO response might work for standard pickling, though; I'll give that a
try. Thanks!
On Fri, Jan 23, 2015 at 11:18 AM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:
Sorry, I think my previous message was a little bit ambiguous.
What I would try is:
1) Unpickle the original pickle file in Python 2
2) Pickle it via joblib
3) Load it in Python 3
(I think you only did step 3, right? Sorry for the confusion.)
I also just saw a related SO post that might be very helpful:
http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
Best,
Sebastian
On Jan 22, 2015, at 5:10 PM, jni.s...@gmail.com wrote:
Hi Sebastian,
Thanks for the response, but actually joblib doesn't work either:
In [1]: from sklearn.externals import joblib
In [2]: rf = joblib.load('rf-1.joblib')
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-3-2c47f0ec1d5b> in <module>()
----> 1 rf = joblib.load('rf-1.joblib')
/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
417 'ignoring mmap_mode "%(mmap_mode)s" flag passed'
418 % locals(), Warning, stacklevel=2)
--> 419 unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
420 else:
421 unpickler = NumpyUnpickler(filename, file_handle=file_handle,
/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
306 NumpyUnpickler.__init__(self, filename,
307 file_handle,
--> 308 mmap_mode=None)
309
310 def _open_pickle(self, file_handle):
/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
264 self._dirname = os.path.dirname(filename)
265 self.mmap_mode = mmap_mode
--> 266 self.file_handle = self._open_pickle(file_handle)
267 Unpickler.__init__(self, self.file_handle)
268 try:
/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
309
310 def _open_pickle(self, file_handle):
--> 311 return BytesIO(read_zfile(file_handle))
312
313
/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
66 # We use the known length of the data to tell Zlib the size of the
67 # buffer to allocate.
---> 68 data = zlib.decompress(file_handle.read(), 15, length)
69 assert len(data) == length, (
70 "Incorrect data length while decompressing %s."
error: Error -3 while decompressing data: incorrect header check
The very same commands work fine in Py2:
In [1]: from sklearn.externals import joblib
In [2]: rf1 = joblib.load('rf-1.joblib')
In [3]:
Is this unexpected?
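zlib raises "incorrect header check" when the first two bytes it is handed are not a valid zlib header. Since the same file loads fine in Python 2, one plausible reading of the joblib traceback (not confirmed anywhere in this thread) is that the Python 3 reader and the Python 2 writer disagree about the bytes joblib places in front of the compressed data, so zlib.decompress starts at the wrong offset; that would not mean zlib itself changed between Python versions. A stdlib-only sketch of the symptom, with two arbitrary stray bytes standing in for the misalignment; the helper decompress_or_error is hypothetical and joblib's real file layout is not modelled:

```python
import zlib


def decompress_or_error(blob):
    """Return the decompressed bytes, or the zlib error message as a str."""
    try:
        return zlib.decompress(blob)
    except zlib.error as exc:
        return str(exc)


original = b'0123456789' * 100
stream = zlib.compress(original)

# Intact stream: the 2-byte zlib header sits at offset 0, so this round-trips.
print(decompress_or_error(stream) == original)  # True

# Two stray bytes in front stand in for a reader that is misaligned with the
# writer's framing (hypothetical; not joblib's actual header format).
print(decompress_or_error(b'\xff\xff' + stream))
# Error -3 while decompressing data: incorrect header check
```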
On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
Hi, Juan,
It's been some time, but I remember having similar issues. I think it has
to do with numpy arrays specifically, which cause problems in pickle
(http://bugs.python.org/issue6784).
You could try joblib instead (which should also be more efficient):
>>> from sklearn.externals import joblib
>>> joblib.dump(clf, 'filename.pkl')
>>> clf = joblib.load('filename.pkl')
(http://scikit-learn.org/stable/modules/model_persistence.html)
Best,
Sebastian
> On Jan 22, 2015, at 8:50 AM, jni.s...@gmail.com wrote:
>
> Hi all,
>
> I'm working on a project that depends on sklearn. I've been building up test coverage
> (which includes saving a RandomForest, so far using joblib serialization),
> and now I wanted to make the project Python 3-compatible. However, the final
> roadblock is the sharing of RF objects: I can't load the Python 2-serialized
> RFs with Python 3 tests. Of course, the test outcome depends on the exact RF
> that was created a while back. Is there any way around this?
>
> Thanks!
>
> Juan.
>
>
> ------------------------------------------------------------------------------
> New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
> GigeNET is offering a free month of service with a new server in Ashburn.
> Choose from 2 high performing configs, both with 100TB of bandwidth.
> Higher redundancy.Lower latency.Increased capacity.Completely compliant.
> http://p.sf.net/sfu/gigenet_______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general