Sorry -- I should add that I'm pointing to shogun because I suspect
its implementation of a bag-of-words-like kernel uses the kernel
trick, so you won't have to map all of your data explicitly into some
huge feature space that would exhaust your memory.
I'm n
Hi,
These suggestions still require you to explicitly compute your feature
space or kernel matrix first, which might kill you memory-wise.
You might consider taking a look at the shogun toolbox:
http://www.shogun-toolbox.org/
With some digging, I'm pretty sure you'll find a bag-of-words type of
] SVM. How to use categorical attributes?
Sorry, I forgot to mention the following: all I wrote is only valid as long
as your number of samples is smaller than the number of different words. If
the number of samples exceeds the total number of different words, you
should instead use the explicit matrix representation and use some kernel
(e.
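A minimal sketch of the explicit matrix representation mentioned above, in base R. The queries are made up for illustration, the matrix is dense only to keep the example self-contained (a sparse matrix would be the natural choice for real data), and the linear kernel is an assumption, since the message is cut off before it names one.

```r
# Hypothetical search queries, one character vector per sample.
queries <- list(
  c("how", "to", "grow", "tree"),
  c("how", "to", "plant", "a", "tree"),
  c("cheap", "flights", "to", "paris")
)

# Vocabulary: every distinct word seen in any query.
vocab <- sort(unique(unlist(queries)))

# Explicit bag-of-words matrix: one row per sample, one column per word,
# entries are word counts. vapply() returns words x samples, hence t().
X <- t(vapply(
  queries,
  function(q) as.numeric(table(factor(q, levels = vocab))),
  numeric(length(vocab))
))
colnames(X) <- vocab

# Linear kernel on the explicit representation (assumed choice of kernel).
K <- tcrossprod(X)
```

X has one row per sample and one column per distinct word, so it stays small when the vocabulary is smaller than the number of samples, whereas the kernel matrix K always has one row and one column per sample.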
Alex,
To avoid the memory issue, you can directly use a "bag of words" kernel
(which corresponds to using the linear kernel on the sparse bag of words
matrix Steve suggested). Here is a small toy example of how this is done for two
:
> x1 <- c("how", "to", "grow", "tree")
> x2 <- c("where", "to", "go
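Since the example above is cut off, here is a rough sketch of what a bag-of-words kernel could look like in base R. It is not the code from the original message: `bow_kernel` is a hypothetical helper name, and `q2` and `q3` are made-up queries. The kernel value is the inner product of the two word-count vectors, computed from the shared words only, so no full feature matrix is ever built.

```r
# Bag-of-words kernel between two queries given as character vectors.
# Equivalent to a linear kernel on the word-count vectors.
bow_kernel <- function(a, b) {
  ta <- table(a)                      # word counts in a
  tb <- table(b)                      # word counts in b
  shared <- intersect(names(ta), names(tb))
  sum(as.numeric(ta[shared]) * as.numeric(tb[shared]))
}

q1 <- c("how", "to", "grow", "tree")
q2 <- c("how", "to", "plant", "a", "tree")    # hypothetical query
q3 <- c("cheap", "flights", "to", "paris")    # hypothetical query

k12 <- bow_kernel(q1, q2)   # shared words: "how", "to", "tree"

# Full kernel matrix over a list of queries (samples x samples).
queries <- list(q1, q2, q3)
K <- outer(
  seq_along(queries), seq_along(queries),
  Vectorize(function(i, j) bow_kernel(queries[[i]], queries[[j]]))
)
```

A matrix like K could then be handed to an SVM implementation that accepts a precomputed kernel matrix; which function to use for that depends on the package and is not shown here.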
://stats.stackexchange.com/questions/25355/multi-value-categorical-attributes-how-r
Thank you,
-Alex
From: Steve Lianoglou [mailinglist.honey...@gmail.com]
Sent: 27 March 2012 21:47
To: Alekseiy Beloshitskiy
Cc: r-help@r-project.org
Subject: Re: [R] SVM. How to use categorical
Hi,
On Tue, Mar 27, 2012 at 6:05 AM, Alekseiy Beloshitskiy
wrote:
> Hi All,
>
> Here is the case. I want to build a classification model (SVM). Some of
> the variables for this model are categorical attributes which represent words
> (usually 3-10 words - a query for a Google search). For example:
>
Hi All,
Here is the case. I want to build a classification model (SVM). Some of the
variables for this model are categorical attributes which represent words
(usually 3-10 words - a query for a Google search). For example:
search_id | query_words|..| result
---+