Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-24 Thread jni . soma
Hi Gaël,

I know virtually nothing about string/Unicode handling and compression, but I
do know what I want to work... I'm happy to open a PR and add the (failing)
tests, but someone with more expertise in those areas would have to actually
get this working, or at least give me extensive clues. Any takers?

Given that pickle doesn't seem to work across Python versions, I think this
could be very valuable to the wider community!

Juan.

--
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-24 Thread Joel Nothman
They all sound related to the Py3k handling of Unicode, in which case I'm
guessing a search should find cases of this issue elsewhere. I'm glad
joblib worked in the end, but maybe it's worth leaving an issue on the
joblib project so that it could be appropriately tested or documented.

Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-24 Thread Gael Varoquaux
 They all sound related to the Py3k handling of Unicode, in which case I'm
 guessing a search should find cases of this issue elsewhere. I'm glad joblib
 worked in the end, but maybe it's worth leaving an issue on the joblib project
 so that it could be appropriately tested or documented.


joblib doesn't guarantee in any way that an object stored in one environment
can be restored in another (maybe that should be better documented).

In a sense, I am not against work on better support for this, but it will
require quite a complex test suite.

I don't see myself investing resources in that in the near future (things
like better parallelism and cache replacement are higher on my list of
priorities). If someone wants to work on this, that person should first
demonstrate an automated test suite (running on Travis). The reason is that
if we cannot test such behavior, I don't think we can maintain it.
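For illustration only, a cross-version regression test of the kind described above might start out like the sketch below. It assumes the fixtures are Python 2 pickles committed to the repository as byte literals; the three byte strings here are hand-written to mimic what Python 2's pickle.dumps('\xff', protocol) emits for protocols 0, 1 and 2, and PY2_FIXTURES and the test names are hypothetical. A real suite would pickle actual estimators under each supported Python version, and this sketch does not exercise joblib's own file format.

```python
import pickle
import unittest

# Hypothetical fixtures: hand-written stand-ins for Python 2's
# pickle.dumps('\xff', protocol) with protocols 0, 1 and 2.
PY2_FIXTURES = {
    0: b"S'\\xff'\np0\n.",         # STRING opcode, PUT 0, STOP
    1: b"U\x01\xffq\x00.",         # SHORT_BINSTRING, BINPUT 0, STOP
    2: b"\x80\x02U\x01\xffq\x00.",  # PROTO 2 + the same opcodes
}


class TestPy2PicklesLoadOnPy3(unittest.TestCase):
    def test_fixtures_load_with_latin1(self):
        for protocol, data in sorted(PY2_FIXTURES.items()):
            with self.subTest(protocol=protocol):
                # latin1 maps each byte of a Python 2 str to one code point.
                self.assertEqual(pickle.loads(data, encoding="latin1"), "\xff")

    def test_default_encoding_fails(self):
        # Documents the failure mode: Python 3 decodes Python 2 str as ASCII
        # by default, which rejects the byte 0xff.
        for protocol, data in sorted(PY2_FIXTURES.items()):
            with self.subTest(protocol=protocol):
                with self.assertRaises(UnicodeDecodeError):
                    pickle.loads(data)


if __name__ == "__main__":
    unittest.main()
```

Committing byte fixtures like these is what lets the check run on a single interpreter in CI, without needing Python 2 installed alongside Python 3.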



Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-23 Thread jni . soma
Hi everyone,

Using joblib with compress=0 worked! Is it a joblib bug that compress=3 is not
portable?

Joel, here are the tracebacks from standard Python pickles of increasing
protocols (0, 1, 2), saved in Python 2 and loaded in Python 3:

---
TypeError                                 Traceback (most recent call last)
<ipython-input-6-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

TypeError: 'str' does not support the buffer interface

---
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-7-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 595: invalid start byte

---
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
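The three tracebacks above appear to share one cause: the file is opened in text mode ('r'), so under Python 3 pickle receives str data (protocol 0) or the file object tries to decode the binary pickle as UTF-8 (protocols 1 and 2). Below is a minimal sketch of the usual workaround, assuming the file was written by Python 2's pickle module: open it in binary mode and tell pickle how to decode Python 2 8-bit str objects. The Python 3 pickle documentation states that encoding='latin1' is required for unpickling NumPy arrays pickled by Python 2. The helper name load_py2_pickle and the byte string used in the demo are hypothetical.

```python
import os
import pickle
import tempfile


def load_py2_pickle(path, encoding="latin1"):
    """Load a pickle written under Python 2 from Python 3 (hypothetical helper)."""
    # Binary mode: a pickle is a byte stream, so never open it with 'r'.
    with open(path, "rb") as fin:
        # `encoding` says how Python 2 8-bit `str` objects are decoded;
        # 'latin1' maps each byte 0-255 to the same code point, so it cannot
        # fail, while 'bytes' would keep them as bytes objects instead.
        return pickle.load(fin, encoding=encoding)


if __name__ == "__main__":
    # Hand-written stand-in for Python 2's pickle.dumps('\xff', 2):
    # PROTO 2, SHORT_BINSTRING holding the single byte 0xff, BINPUT 0, STOP.
    py2_bytes = b"\x80\x02U\x01\xffq\x00."
    with tempfile.NamedTemporaryFile(suffix=".pck", delete=False) as tmp:
        tmp.write(py2_bytes)
    try:
        print(ord(load_py2_pickle(tmp.name)))  # prints 255
    finally:
        os.remove(tmp.name)
```

This only addresses plain pickle files; it does not touch the zlib error that joblib raised for the file saved with compress=3.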

Thanks again everyone!

[Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-22 Thread jni . soma
Hi all,


I'm working on a project that depends on sklearn. I've been upping test coverage
(which includes saving a RandomForest, so far using joblib serialization), and
now I want to make the project Python 3-compatible. However, the final
roadblock is sharing RF objects: I can't load the Python 2-serialized RFs in
the Python 3 tests. Of course, the test outcome depends on the exact RF that
was created a while back. Is there any way around this?


Thanks!


Juan.


Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-22 Thread jni . soma
Hi Sebastian,

Thanks for the response, but actually joblib doesn't work either:

In [1]: from sklearn.externals import joblib

In [2]: rf = joblib.load('rf-1.joblib')
---
error                                     Traceback (most recent call last)
<ipython-input-3-2c47f0ec1d5b> in <module>()
----> 1 rf = joblib.load('rf-1.joblib')

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
    417                               'ignoring mmap_mode %(mmap_mode)s flag passed'
    418                               % locals(), Warning, stacklevel=2)
--> 419             unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
    420         else:
    421             unpickler = NumpyUnpickler(filename, file_handle=file_handle,

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
    306         NumpyUnpickler.__init__(self, filename,
    307                                 file_handle,
--> 308                                 mmap_mode=None)
    309
    310     def _open_pickle(self, file_handle):

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
    264         self._dirname = os.path.dirname(filename)
    265         self.mmap_mode = mmap_mode
--> 266         self.file_handle = self._open_pickle(file_handle)
    267         Unpickler.__init__(self, self.file_handle)
    268         try:

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
    309
    310     def _open_pickle(self, file_handle):
--> 311         return BytesIO(read_zfile(file_handle))
    312
    313

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
     66     # We use the known length of the data to tell Zlib the size of the
     67     # buffer to allocate.
---> 68     data = zlib.decompress(file_handle.read(), 15, length)
     69     assert len(data) == length, (
     70         "Incorrect data length while decompressing %s."

error: Error -3 while decompressing data: incorrect header check

The very same commands work fine in Py2:

In [1]: from sklearn.externals import joblib

In [2]: rf1 = joblib.load('rf-1.joblib')

In [3]:

Is this unexpected?
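On the question above: "Error -3 while decompressing data: incorrect header check" is what zlib reports when the bytes passed to zlib.decompress do not begin with a valid zlib header. One plausible route to it, offered as an assumption rather than a diagnosis of joblib, is a reader that starts decompressing at the wrong offset, for example because the writer's header was one byte wider than the reader expects. The sketch reproduces that with a made-up container; MAGIC, LEN_WIDTH, write_container and read_container are hypothetical and are not joblib's real file format.

```python
import zlib

MAGIC = b"ZF"   # hypothetical 2-byte magic prefix
LEN_WIDTH = 8   # hypothetical fixed width of the hex length field


def write_container(payload, extra_pad=False):
    """Frame zlib-compressed bytes behind a magic prefix and a hex length."""
    length_field = format(len(payload), "x").ljust(LEN_WIDTH).encode("ascii")
    header = MAGIC + length_field
    if extra_pad:
        # Simulate a writer whose header is one byte wider than expected.
        header += b" "
    return header + zlib.compress(payload)


def read_container(blob):
    """Parse the header assuming exactly LEN_WIDTH length bytes, then inflate."""
    if not blob.startswith(MAGIC):
        raise ValueError("wrong magic prefix")
    start = len(MAGIC) + LEN_WIDTH
    length = int(blob[len(MAGIC):start].decode("ascii"), 16)
    # Same call shape as in the traceback: wbits=15, bufsize=expected length.
    data = zlib.decompress(blob[start:], 15, length)
    if len(data) != length:
        raise ValueError("incorrect data length")
    return data


if __name__ == "__main__":
    payload = b"random forest bytes " * 50
    print(read_container(write_container(payload)) == payload)  # prints True
    try:
        read_container(write_container(payload, extra_pad=True))
    except zlib.error as exc:
        # prints: Error -3 while decompressing data: incorrect header check
        print(exc)
```

The zlib stream itself is identical in both cases; only the framing around it differs, which is consistent with the error appearing at the very first bytes handed to zlib.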


Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-22 Thread Sebastian Raschka
Hi, Juan,

It's been some time, but I remember having similar issues. I think it has
to do with NumPy arrays, which specifically cause problems in pickle
(http://bugs.python.org/issue6784).

You could try using joblib (which should also be more efficient):

 # clf is an already-fitted estimator
 from sklearn.externals import joblib  # newer versions: import joblib
 joblib.dump(clf, 'filename.pkl')
 clf = joblib.load('filename.pkl')

(http://scikit-learn.org/stable/modules/model_persistence.html) 
 

Best,
Sebastian



Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-22 Thread Juan Nunez-Iglesias
Nope, the Py2 RF was saved with joblib!

The SO response might work for standard pickling, though; I'll give that a try.
Thanks!

On Fri, Jan 23, 2015 at 11:18 AM, Sebastian Raschka se.rasc...@gmail.com
wrote:

 Sorry, I think my previous message was a little bit ambiguous.
 What I would try is:
 1) Unpickle the original pickle file in Python 2
 2) Pickle it via joblib
 3) Load it in Python 3
 (I think you only did step 3, right? Sorry for the confusion.)
 I also just saw a related SO post that might be very helpful:
 http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
 Best,
 Sebastian
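The three steps quoted above could be scripted roughly as below. This is a stdlib-only sketch that swaps joblib for plain pickle: it re-saves with protocol 2 (the highest protocol Python 2 can write) and uses binary file modes throughout; resave_pickle is a hypothetical helper. Steps 1 and 2 would run under Python 2 and step 3 under Python 3. If the original file was written by joblib, step 1 would use joblib.load instead of pickle.load.

```python
import os
import pickle
import tempfile


def resave_pickle(src_path, dst_path, protocol=2):
    """Steps 1 and 2: load an existing pickle and write it out again.

    Meant to run under Python 2; protocol 2 is the highest protocol
    Python 2 supports, and Python 3 can read it.
    """
    with open(src_path, "rb") as fin:   # step 1: unpickle the original
        obj = pickle.load(fin)
    with open(dst_path, "wb") as fout:  # step 2: re-serialize, binary mode
        pickle.dump(obj, fout, protocol=protocol)
    return obj


if __name__ == "__main__":
    # Round trip on the current interpreter, standing in for steps 1-3.
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "original.pck")
    dst = os.path.join(tmpdir, "resaved.pck")
    with open(src, "wb") as f:
        pickle.dump({"n_estimators": 10}, f)
    resave_pickle(src, dst)
    with open(dst, "rb") as f:          # step 3: load the re-saved file
        print(pickle.load(f))           # prints {'n_estimators': 10}
```

Under Python 3, step 3 may still need pickle.load(f, encoding='latin1') if the object contains NumPy arrays pickled by Python 2.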

Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-22 Thread Joel Nothman
Could you provide the traceback when using pickle? The joblib error is
about zipping, which should not be applicable there...

Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-22 Thread Juan Nunez-Iglesias
Joel, *sorry*, I should probably have mentioned this earlier:

joblib.dump takes a compress kwarg, which I used (probably with 3, as
recommended by the docstring) so that I wouldn't have a bajillion files
representing my RF. So the zipping error makes perfect sense, except that I
wouldn't expect zlib compression to change between Python versions. ;) I
haven't tried compress=0, but I would like to avoid it if possible! (These
test RFs are in my repo.)

I'm on a different computer right now, so I'll send the pickle traceback
later... But I'm hoping there's a good joblib-based solution! =)

Juan.

On Fri, Jan 23, 2015 at 1:38 PM, Joel Nothman joel.noth...@gmail.com
wrote:

 Could you provide the traceback when using pickle? The joblib error is
 about zipping, which should not be applicable there...
 On 23 January 2015 at 13:30, Juan Nunez-Iglesias jni.s...@gmail.com wrote:
 Nope, the Py2 RF was saved with joblib!

 The SO response might work for standard pickling though, I'll give that a
 try, thanks!




Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

2015-01-22 Thread Sebastian Raschka
Sorry, I think my previous message was a little bit ambiguous.

What I would try is:

1) Unpickle the original pickle file in Python 2
2) Pickle it via joblib
3) Load it in Python 3

(I think you only did step 3, right? Sorry for the confusion.)

I also just saw a related SO post that might be very helpful: 
http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
 
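For the plain-pickle route, the workaround usually suggested for Python 2 pickles containing NumPy arrays (and, as far as I can tell, the one discussed in the SO thread above) is to pass an explicit encoding when loading under Python 3. A minimal sketch, assuming a tiny hand-built protocol-2 pickle as a stand-in for a real Python 2 file; load_py2_pickle is a hypothetical helper, while the encoding argument of pickle.loads is standard library. This only addresses the str/bytes mismatch in plain pickles and would not by itself explain the joblib decompression error.

```python
import pickle

# Hand-built protocol-2 pickle standing in for a file written by Python 2:
# PROTO 2, then SHORT_BINSTRING ('U') of length 3 holding non-ASCII bytes,
# then STOP. Python 2 pickles its 8-bit str objects (roughly how NumPy
# arrays carry their raw data buffer there) with this kind of opcode.
PY2_PICKLE = b"\x80\x02U\x03\xff\xfe\xfd."


def load_py2_pickle(data, encoding="latin1"):
    """Hypothetical helper: unpickle Python 2 data under Python 3.

    'latin1' maps each byte to exactly one code point, so no byte is lost;
    'bytes' returns Python 2 str objects as bytes instead.
    """
    return pickle.loads(data, encoding=encoding)


try:
    pickle.loads(PY2_PICKLE)  # default encoding is "ASCII"
except UnicodeDecodeError as exc:
    print("default load fails:", exc.reason)

print(ascii(load_py2_pickle(PY2_PICKLE)))           # '\xff\xfe\xfd'
print(ascii(load_py2_pickle(PY2_PICKLE, "bytes")))  # b'\xff\xfe\xfd'
```

With a real file, the equivalent call is pickle.load(f, encoding='latin1') on a file opened in binary mode ('rb').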

Best,
Sebastian


 On Jan 22, 2015, at 5:10 PM, jni.s...@gmail.com wrote:
 
 Hi Sebastian,
 
 Thanks for the response, but actually joblib doesn't work either:
 
 In [1]: from sklearn.externals import joblib
 
 In [2]: rf = joblib.load('rf-1.joblib')
 ---
 error Traceback (most recent call last)
 <ipython-input-3-2c47f0ec1d5b> in <module>()
 ----> 1 rf = joblib.load('rf-1.joblib')
 
 /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
  in load(filename, mmap_mode)
 417   'ignoring mmap_mode %(mmap_mode)s flag passed'
 418   % locals(), Warning, stacklevel=2)
 --> 419 unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
 420 else:
 421 unpickler = NumpyUnpickler(filename, file_handle=file_handle,
 
 /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
  in __init__(self, filename, file_handle)
 306 NumpyUnpickler.__init__(self, filename,
 307 file_handle,
 --> 308 mmap_mode=None)
 309
 310 def _open_pickle(self, file_handle):
 
 /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
  in __init__(self, filename, file_handle, mmap_mode)
 264 self._dirname = os.path.dirname(filename)
 265 self.mmap_mode = mmap_mode
 --> 266 self.file_handle = self._open_pickle(file_handle)
 267 Unpickler.__init__(self, self.file_handle)
 268 try:
 
 /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
  in _open_pickle(self, file_handle)
 309
 310 def _open_pickle(self, file_handle):
 --> 311 return BytesIO(read_zfile(file_handle))
 312
 313
 
 /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
  in read_zfile(file_handle)
  66 # We use the known length of the data to tell Zlib the size of the
  67 # buffer to allocate.
 ---> 68 data = zlib.decompress(file_handle.read(), 15, length)
  69 assert len(data) == length, (
  70 Incorrect data length while decompressing %s.
 
 error: Error -3 while decompressing data: incorrect header check
 
 
 The very same commands work fine in Py2:
 
 In [1]: from sklearn.externals import joblib
 
 In [2]: rf1 = joblib.load('rf-1.joblib')
 
 In [3]:
 
 
 Is this unexpected?
 
 
 
 
 On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka se.rasc...@gmail.com
 wrote:
 
 Hi, Juan, 
 
 It's been some time, but I remember that I had similar issues. I think it has 
 to do with the numpy arrays that specifically cause problems in pickle. 
 (http://bugs.python.org/issue6784) 
 
 You could try to use joblib (which should also be more efficient): 
 
  from sklearn.externals import joblib 
  joblib.dump(clf, 'filename.pkl') 
  clf = joblib.load('filename.pkl') 
 
 (http://scikit-learn.org/stable/modules/model_persistence.html)   
 
 
 Best, 
 Sebastian 
 
  On Jan 22, 2015, at 8:50 AM, jni.s...@gmail.com wrote: 
  
  Hi all, 
  
  I'm working on a project that depends on sklearn. I've been upping test 
  coverage (which includes saving a RandomForest, so far using joblib 
  serialization), and now I wanted to make the project Python 3-compatible. 
  However, the final roadblock is the sharing of RF objects: I can't load the 
  Python 2-serialized RFs with Python 3 tests. Of course, the test outcome 
  depends on the exact RF that was created a while back. Is there any way 
  around this? 
  
  Thanks! 
  
  Juan. 
  
  