On Sat, Apr 13, 2013 at 7:05 AM, Ken Krugler <[email protected]> wrote:
> On Apr 12, 2013, at 11:55pm, Ted Dunning wrote:
>
> > The first thing to try is feature hashing to reduce your feature
> > vector size.
>
> Unfortunately LibLinear takes feature indices directly (assumes they're
> sequential ints from 0..n-1), so I don't think feature hashing will help
> here.

I am sure that it would. The feature indices that you give to liblinear
don't have to be your original indices.

The simplest level of feature hashing would be to take the original feature
indices and use multiple hashing to get 1, 2 or more new feature index
values for each original index. Then take these modulo the new feature
vector size (which can be much smaller than your original). There will be
some collisions, but the result here is a linear transformation of the
original space, and if you use multiple indexes for each original feature,
you will lose very little, if anything. The SVM will almost always be able
to learn around the effects of collisions.

> If I constructed a minimal perfect hash function then I could skip
> storing the mapping from feature to index, but that's not what's taking
> most of the memory; it's the n x m array of weights used by LibLinear.

Don't bother with a perfect hash. Just use a decent hash. That n x m array
of weights will be 10x smaller if you have n/10.

> > With multiple probes and possibly with random weights you might be
> > able to drop the size by 10x.
>
> More details here would be great, sometime when you're not trying to
> type on an iPhone :)
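To make the idea concrete, here is a minimal sketch of the multiple-probe
hashing described above. All names (`hash_probes`, `hash_vector`), the seed
scheme, and the choice of md5 are illustrative assumptions, not anything
from liblinear itself; any decent hash family would do, per the advice in
the email:

```python
import hashlib


def hash_probes(feature_index, num_probes=2, new_dim=10_000):
    """Map one original feature index to `num_probes` indices in a smaller
    space of size `new_dim` (collisions are expected and tolerated).
    Illustrative sketch: md5 with a per-probe seed stands in for any
    decent hash family."""
    probes = []
    for seed in range(num_probes):
        digest = hashlib.md5(f"{seed}:{feature_index}".encode()).digest()
        probes.append(int.from_bytes(digest[:8], "big") % new_dim)
    return probes


def hash_vector(sparse_vec, num_probes=2, new_dim=10_000):
    """Re-index a sparse vector {original_index: value} into the hashed
    space, splitting each value evenly across its probe positions.
    Colliding features simply add, which is the linear transformation
    the email refers to."""
    out = {}
    for idx, val in sparse_vec.items():
        for p in hash_probes(idx, num_probes, new_dim):
            out[p] = out.get(p, 0.0) + val / num_probes
    return out
```

You would run every training and test vector through `hash_vector` before
handing it to liblinear, so the weight array shrinks from n x m to
new_dim x m, and no feature-to-index dictionary needs to be stored at all.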
