[scikit-learn] Categorical handling

2017-08-17 Thread Georg Heiler
Hi, how can I properly handle categorical values in scikit-learn? https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 Goals: - scikit-learn-style fit/transform methods to encode labels of categorical features of X - should handl
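
A minimal sketch of what such an encoder could look like, assuming the truncated requirement concerns levels that were not seen during fit (as the title of the linked question suggests). The class name `SafeLabelEncoder` and the reserved code `-1` are hypothetical choices for illustration, not part of the scikit-learn API.

```python
class SafeLabelEncoder:
    """Hypothetical per-column encoder: unseen levels map to a reserved code."""

    def __init__(self, unknown=-1):
        self.unknown = unknown

    def fit(self, rows):
        # rows: list of equal-length sequences, one entry per categorical column
        n_cols = len(rows[0])
        self.mappings_ = []
        for col in range(n_cols):
            # Sort the levels so the assigned codes are reproducible
            levels = sorted({row[col] for row in rows})
            self.mappings_.append({level: code for code, level in enumerate(levels)})
        return self

    def transform(self, rows):
        # Levels missing from the fitted mapping fall back to self.unknown
        return [
            [self.mappings_[col].get(value, self.unknown) for col, value in enumerate(row)]
            for row in rows
        ]


X_train = [["red", "S"], ["blue", "M"], ["red", "L"]]
encoder = SafeLabelEncoder().fit(X_train)
encoded = encoder.transform([["blue", "S"], ["green", "M"]])
print(encoded)  # "green" was never seen during fit, so it becomes -1
```

Whether a single reserved code is acceptable depends on the downstream classifier; it is only one possible convention.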

Re: [scikit-learn] caching transformers during hyper parameter optimization

2017-08-16 Thread Georg Heiler
e the transform is costly? Or is it > more a matter of you wanting to store the transformed data at each step? > > There are custom ways to do this sort of thing generically with a mixin if > you really want. > > On 16 August 2017 at 21:28, Georg Heiler > wrote: > >>

[scikit-learn] caching transformers during hyper parameter optimization

2017-08-16 Thread Georg Heiler
There is a new option in the pipeline: http://scikit-learn.org/stable/modules/pipeline.html#pipeline-cache How can I use this to also store the transformed data? I only want to compute the last step, i.e. the estimator, during hyperparameter tuning, and not the transform methods of the clean steps. Is
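
A minimal sketch of how the `memory` option of `Pipeline` can be combined with `GridSearchCV`, assuming only the final estimator's parameters are tuned. The dataset, the choice of `PCA` and `LogisticRegression`, and the step names `reduce` and `clf` are illustrative assumptions, not taken from the thread.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, used only to make the example self-contained
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Fitted transformers are cached in this temporary directory
cachedir = mkdtemp()
pipe = Pipeline(
    [("reduce", PCA(n_components=5)), ("clf", LogisticRegression())],
    memory=cachedir,
)

# Only the last step is tuned, so the PCA fit can be reused per training fold
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
best_C = search.best_params_["clf__C"]

# Remove the cache directory once it is no longer needed
rmtree(cachedir)
```

The cache is keyed on the transformer's parameters and its input data, so reuse happens across parameter candidates that share the same training fold; the final estimator itself is not cached.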

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
To my understanding pandas.factorize only works for the static case where no unseen variables can occur. Georg Heiler wrote on Mon, 7 Aug 2017 at 08:40: > I will need to look into factorize. Here is the result from profiling the > transform method on a single new observation &
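
A short sketch of the limitation described above, together with one possible workaround: `pandas.factorize` learns its levels from the data it is given, so the learned `uniques` have to be reused explicitly for new data. Passing them as fixed categories to `pandas.Categorical` is an assumption about how one might do that, not something proposed in the thread.

```python
import pandas as pd

train = pd.Series(["a", "b", "a", "c"])

# factorize assigns codes in order of first appearance
codes, uniques = pd.factorize(train)

# Reuse the learned levels for new data; levels outside `uniques` get code -1
new = pd.Series(["b", "d"])
new_codes = pd.Categorical(new, categories=uniques).codes

print(list(codes))      # codes for the training data
print(list(new_codes))  # "d" was not seen during factorize
```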

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
all possible values that could > occur, do the transformation, and then only pass the 1 transformed sample > to the classifier. I guess that could be even slow though ... > > Best, > Sebastian > > > On Aug 6, 2017, at 6:30 AM, Georg Heiler > wrote: > >

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
there's no >> way around doing this manually; for example you could create mapping >> dictionaries for that (most conveniently done in pandas). >> >> Best, >> Sebastian >> >> > On Aug 5, 2017, at 5:10 AM, Georg Heiler >> wrote: >> > >

[scikit-learn] transform categorical data to numerical representation

2017-08-05 Thread Georg Heiler
Hi, the LabelEncoder is only meant for a single column, i.e. the target variable. Is the DictVectorizer or a manual chaining of multiple LabelEncoders (one per categorical column) the desired way to get values which can be fed into a subsequent classifier? Is there some way I have overlooked which w
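
A minimal sketch of the DictVectorizer route mentioned above, assuming each row is available as a dict of string-valued features. Note that it produces one-hot columns rather than integer labels, and that feature values not seen during fit are silently ignored at transform time. The feature names `color` and `size` are made up for illustration.

```python
from sklearn.feature_extraction import DictVectorizer

train = [{"color": "red", "size": "S"}, {"color": "blue", "size": "M"}]

# sparse=False returns a dense NumPy array, which is easier to inspect
vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train)

# "color=green" was not seen during fit, so it contributes no column
X_new = vec.transform([{"color": "green", "size": "S"}])

print(vec.vocabulary_)  # maps "feature=value" strings to column indices
print(X_new)
```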

Re: [scikit-learn] Broken c dependencies

2017-05-10 Thread Georg Heiler
May 10, 2017 at 5:17 AM, Georg Heiler > wrote: > > Hi Matthew, > > > > indeed, that works fine. But what was the Problem? Installation from > source > > should have worked fine? > > Yes, it should, and I don't know what the problem is. > > I just compi

Re: [scikit-learn] Broken c dependencies

2017-05-09 Thread Georg Heiler
> On Tue, May 9, 2017 at 6:27 PM, Georg Heiler > wrote: > >> Yes just like that. > > > > Hum - you shouldn't get what I got, because I was installing for > > Python 3.5, and there is a wheel for Python 3.5. I now see there > > isn't a wheel for OSX Pyth

Re: [scikit-learn] Broken c dependencies

2017-05-09 Thread Georg Heiler
Yes, just like that. Even when completely removing the Python library folder, the error persists. Meanwhile I set up a conda environment that works, but I would prefer a plain pip installation. Matthew Brett wrote on Tue, 9 May 2017 at 19:17: > Hi, > > On Tue, May 9, 2017 at 6:00 PM, Geo

Re: [scikit-learn] Broken c dependencies

2017-05-09 Thread Georg Heiler
cked that it > is available? E.g. Via xcode-select -p > BTW does NumPy / SciPy work on your install or is it just sklearn? > > Best, > Sebastian > > > > Sent from my iPhone > On May 9, 2017, at 11:36 AM, Georg Heiler > wrote: > > Hi, > > unfortunately, the c

[scikit-learn] Broken c dependencies

2017-05-09 Thread Georg Heiler
Hi, unfortunately, the C dependencies of my scikit-learn installation broke and I get the following error on OS X: dlopen(/usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so, 2): Symbol not found: __ZdlPvm Referenced from: /usr/local/lib/python3.6/site-packages/sklear