What do you mean? It's pretty trivial to implement a one-hot encoding. The
issue is that if you use a non-sparse format then you'll end up with a
matrix which is far too large to be practical for anything but trivial
examples.
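
To make that concrete, here is a rough sketch of a one-hot encoder that
keeps the result sparse. It is not the scikit-learn implementation:
`one_hot_sparse` is a made-up helper, and it assumes the categorical columns
are already integer-coded. The last lines estimate what the (32769, 16600)
matrix mentioned below would cost if stored densely as float64.

```python
import numpy as np
import scipy.sparse as sp


def one_hot_sparse(X):
    """One-hot encode an integer-coded categorical array into a CSR matrix.

    X has shape (n_samples, n_features); each column is treated as one
    categorical variable. Hypothetical helper, not scikit-learn code.
    """
    X = np.asarray(X)
    n_samples, n_features = X.shape
    col_indices = []
    offset = 0
    for j in range(n_features):
        # Map the distinct values of column j to codes 0..k-1.
        categories, codes = np.unique(X[:, j], return_inverse=True)
        # Shift the codes so each feature gets its own block of columns.
        col_indices.append(codes + offset)
        offset += len(categories)
    # Exactly one nonzero per (sample, feature) pair.
    rows = np.repeat(np.arange(n_samples), n_features)
    cols = np.column_stack(col_indices).ravel()
    data = np.ones(n_samples * n_features, dtype=np.float32)
    return sp.csr_matrix((data, (rows, cols)), shape=(n_samples, offset))


# Back-of-the-envelope size of the same matrix stored densely as float64.
n_samples, n_columns = 32769, 16600
dense_bytes = n_samples * n_columns * 8
print("dense float64 size: %.2f GB" % (dense_bytes / 1e9))
```

With 8 categorical features the sparse version stores only 8 nonzeros per
row, whatever the number of output columns ends up being.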
On Fri, Jun 21, 2013 at 10:46 AM, Maheshakya Wijewardena <
pmaheshak...@gmail.com> wrote:
> I'd like to analyse it a bit and encode my data using that method so that
> it works with random forests in scikit-learn.
>
>
> On Fri, Jun 21, 2013 at 2:08 PM, Peter Prettenhofer <
> peter.prettenho...@gmail.com> wrote:
>
>> ? you already use one-hot encoding in your example (
>> preprocessing.OneHotEncoder)
>>
>>
>> 2013/6/21 Maheshakya Wijewardena <pmaheshak...@gmail.com>
>>
>>> Can anyone give me a sample algorithm for the one-hot encoding used in
>>> scikit-learn?
>>>
>>>
>>> On Thu, Jun 20, 2013 at 8:37 PM, Peter Prettenhofer <
>>> peter.prettenho...@gmail.com> wrote:
>>>
>>>> you can try an ordinal encoding instead - just map each categorical
>>>> value to an integer so that you end up with 8 numerical features - if you
>>>> use enough trees and grow them deep it may work
>>>>
>>>>
>>>> 2013/6/20 Maheshakya Wijewardena <pmaheshak...@gmail.com>
>>>>
>>>>> And yes Gilles, it is the Amazon challenge :D
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 8:21 PM, Maheshakya Wijewardena <
>>>>> pmaheshak...@gmail.com> wrote:
>>>>>
>>>>>> The shape of X after encoding is (32769, 16600). That seems too big to
>>>>>> be converted into a dense matrix. Can a random forest handle this
>>>>>> number of features?
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 20, 2013 at 7:31 PM, Olivier Grisel <
>>>>>> olivier.gri...@ensta.org> wrote:
>>>>>>
>>>>>>> 2013/6/20 Lars Buitinck <l.j.buiti...@uva.nl>:
>>>>>>> > 2013/6/20 Olivier Grisel <olivier.gri...@ensta.org>:
>>>>>>> >>> Actually twice as much, even on a 32-bit platform (float size is
>>>>>>> >>> always 64 bits).
>>>>>>> >>
>>>>>>> >> The decision tree code always uses 32-bit floats:
>>>>>>> >>
>>>>>>> >> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L38
>>>>>>> >>
>>>>>>> >> but you have to cast your data to `dtype=np.float32` in Fortran
>>>>>>> >> layout ahead of time to avoid the memory copy.
>>>>>>> >
>>>>>>> > OneHot produces np.float, though, which is float64.
>>>>>>>
>>>>>>> Alright, but you could convert it to np.float32 before calling
>>>>>>> toarray. In any case, I think this level of sparsity is unsuitable
>>>>>>> for random forests.
>>>>>>>
>>>>>>> --
>>>>>>> Olivier
>>>>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> This SF.net email is sponsored by Windows:
>>>>>>>
>>>>>>> Build for Windows Store.
>>>>>>>
>>>>>>> http://p.sf.net/sfu/windows-dev2dev
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Peter Prettenhofer
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Peter Prettenhofer
>>
>>
>
>