Hi All,
As you may already know, Vertica announced that it now supports user-defined R
functions. Has anybody tried this already, or does anyone have more info than the
Vertica site?
http://www.vertica.com/content/vertica-an-hp-company-enables-users-to-connect-access-analyze-and-manage-any-data-anywhere/
Hi Triss,
If you need to stem just one text in the Corpus, use a[[n]] <- stemDocument
Best,
-Alex
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of
Triss.Ashton [triss.ash...@unt.edu]
Sent: 02 May 2012 21:09
To:
Dear All,
I need to estimate the level of similarity of two strings. For example:
string1 <- c("depending", "audience", "research", "school");
string2 <- c("audience", "push", "drama", "button", "depending");
The words in the strings may occur in a different order, though. What function would
you recommend to use to estimate
Hi, Taisa,
It depends on many factors, e.g. the nature of your data, the volume of the data
set, etc.
The analog of SAS FASTCLUS in R is kmeans (for a practical example, check slide #35
here:
http://www.slideshare.net/whitish/textmining-with-r)
Check also k-medoids (pam) and hclust.
Good luck,
-Alex
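As a rough illustration of the three options mentioned above, here is a minimal sketch on the built-in iris measurements. The data set, the choice of 3 clusters, and all variable names are assumptions made for the example, not part of the original advice.

```r
library(cluster)  # provides pam(); a recommended package shipped with R

# Illustrative data: the four numeric iris columns, standardized
x <- scale(iris[, 1:4])

set.seed(42)                                # kmeans starts from random centers
km <- kmeans(x, centers = 3, nstart = 10)   # k-means
pm <- pam(x, k = 3)                         # k-medoids
hc <- hclust(dist(x), method = "ward.D2")   # hierarchical clustering
hc_groups <- cutree(hc, k = 3)              # cut the tree into 3 groups

# Cluster sizes for each method
table(km$cluster)
table(pm$clustering)
table(hc_groups)
```

Which of the three is appropriate still depends on the data: kmeans scales best to large data sets, while pam and hclust need pairwise distances and get expensive quickly.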
Thank you, Michael,
Right, I'm looking for an R implementation of Levenshtein distance or any other
similar approach. Will try it.
Thank you again!
-Alex
From: R. Michael Weylandt [michael.weyla...@gmail.com]
Sent: 19 April 2012 19:01
Cc: Alekseiy Beloshitskiy
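For reference, base R already ships a Levenshtein-type distance in adist() (package utils). The sketch below shows it on a toy pair of words, and adds an order-insensitive word-overlap measure (Jaccard similarity) applied to the two example vectors from the original question. The jaccard() helper is a hypothetical name introduced here, and using Jaccard at all is an assumption about what "similar approaches" could mean, not something proposed in the thread.

```r
# Character-level edit distance with base R's adist() (generalized Levenshtein)
d <- adist("kitten", "sitting")
d[1, 1]   # 3 edits

# Word-level overlap that ignores word order: Jaccard similarity of two sets
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

string1 <- c("depending", "audience", "research", "school")
string2 <- c("audience", "push", "drama", "button", "depending")
jaccard(string1, string2)   # 2 shared words out of 7 distinct words, i.e. 2/7
```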
Check this
slideshare.net/whitish/textmining-with-r
Best,
-Alex
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of
Deborah H. Deng [deborah.d...@alumni.utexas.net]
Sent: 13 April 2012 10:27
To: r-help@r-project.org
Subject: [R]
I would perform the data pre-processing before loading the data into R.
Best,
-Alex
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of
efulas [ef_u...@hotmail.com]
Sent: 13 April 2012 14:32
To: r-help@r-project.org
Subject: [R] R Large
Hi, Vinod,
Hope this will help you:
library(RJDBC)
# specify your JDBC driver (here: Vertica)
drv <- JDBC("com.vertica.Driver", "../vertica_3.5_jdk_5.jar")
# specify your connection string
conn <- dbConnect(drv, "jdbc:postgres://IP:port/dbname", "login",
"password")
# list tables
dbListTables(conn)
#get your distances
Hello, Jessica,
Can you please elaborate on what you're trying to find with that function?
If, e.g., you want to find the best parameters C and gamma for an RBF-kernel SVM, you
can use grid.py; check here:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Function tune.control() in package e1071 is an R interface
Hello, Alexandre,
In R you can specify whether or not to use scaling with the parameter scale, e.g.:
model <- svm(class_var ~ ., data=trainset, scale=FALSE);
Don't forget to disable it if you have already scaled your data with libsvm's
svm-scale.
Best,
-Alex
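A minimal sketch of the point above, assuming package e1071 is installed. The data (two iris species) and the manual scale() call stand in for data that was scaled beforehand; they are illustrative assumptions, not the poster's data.

```r
library(e1071)  # assumed to be installed; provides svm()

# Illustrative data: two iris species, features standardized by hand
d <- droplevels(subset(iris, Species != "setosa"))
d[, 1:4] <- scale(d[, 1:4])

# The features are already scaled, so switch off svm()'s internal scaling
model <- svm(Species ~ ., data = d, scale = FALSE)
pred <- predict(model, d)
mean(pred == d$Species)   # training accuracy
```

With the default scale = TRUE, svm() would standardize the features a second time, which is harmless for data standardized like this but can matter when the external scaling used a different range, such as [-1, 1].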
Dear All,
I'm just curious whether it is normal that SVM takes several hours to build the
model from a training data set with ~1 million observations and 7 columns (6
variables + 1 class variable). Here is my code:
model <- svm(result ~ ., data=trainset)
where 'trainset' has 997594 obs. and 7
Dear All,
I'm trying to build an SVM model with a large dataset. However, svm from package
e1071 is slow (it takes hours) to build a model for about 1 million
observations. I was thinking about CVM (Core Vector Machines:
://stats.stackexchange.com/questions/25355/multi-value-categorical-attributes-how-r
Thank you,
-Alex
From: Steve Lianoglou [mailinglist.honey...@gmail.com]
Sent: 27 March 2012 21:47
To: Alekseiy Beloshitskiy
Cc: r-help@r-project.org
Subject: Re: [R] SVM. How to use categorical
Thank you so much, Ulrich,
Will play with this.
Best,
-Alex
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of
Ulrich Bodenhofer [bodenho...@bioinf.jku.at]
Sent: 28 March 2012 14:40
To: r-help@r-project.org
Subject: Re: [R]
From: Jessica Streicher [j.streic...@micromata.de]
Sent: 27 March 2012 11:18
To: Alekseiy Beloshitskiy
Subject: Re: [R] normalization of multi-value string variable
Well, not sure what you mean with scaling and normalizing strings, but if you
want to represent
Hi All,
Here is the case. I want to build a classification model (SVM). Some of the variables
for this model are categorical attributes which represent words (usually 3-10
words, i.e. a Google search query). For example:
search_id | query_words|..| result
:)
Thank you,
-Alex
From: Jessica Streicher [j.streic...@micromata.de]
Sent: 27 March 2012 15:24
To: Alekseiy Beloshitskiy
Cc: r-help@r-project.org
Subject: Re: [R] normalization of multi-value string variable
Hm.. so what you need is either
- one new feature
Guys, let me add my 5 coins into your interesting discussion.
I have a ~10 GB txt file with training data for my model. It has about 150 million
rows for 12 variables.
When I load it into memory (running just one line!):
train <- read.table(file="/training.txt")
while loading it takes ~28 GB of RAM (It
Hello, I didn't quite understand what you need, but maybe you can have a look
here:
www.slideshare.net/whitish/textmining-with-r
R code fragments are in appendixes of the presentation.
Hope this will help,
-Alex
From: r-help-boun...@r-project.org
Hi All,
I need to scale a variable of tokens (several words per observation) to be able
to use it for svm().
For example, I have the variable x4=(how,grow,tree)
Any ideas how to scale it for use in svm()?
Thank you,
-Alex
Hi All,
I need to normalize/scale a string variable which represents the interests of
customers (e.g., 'cycling, rollerblading, swimming' etc).
Does anybody know how to do this? I then want to use it along with other numeric
variables for SVM classification.
I'd appreciate any advice.
-Alex
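One common way to make such a variable usable next to numeric features is to turn it into 0/1 indicator columns, one per distinct interest. The sketch below is only one possible reading of "normalize/scale" and is not an answer taken from the thread; the interests vector and all variable names are hypothetical.

```r
# Hypothetical data: one comma-separated interests string per customer
interests <- c("cycling, rollerblading, swimming",
               "swimming",
               "cycling, chess")

# Split each string into trimmed tokens and collect the vocabulary
tokens <- lapply(strsplit(interests, ","), trimws)
vocab  <- sort(unique(unlist(tokens)))

# One 0/1 column per distinct interest, one row per customer
onehot <- t(vapply(tokens,
                   function(tk) as.integer(vocab %in% tk),
                   integer(length(vocab))))
colnames(onehot) <- vocab
onehot
```

The resulting matrix can be bound to the other numeric columns with cbind() before fitting the model. For large vocabularies a sparse representation would be preferable.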
Dear All,
Let's suppose there's a case when you want to make a prediction using a range of
variables. Some variables are represented as a set of words (tokens). For example,
there is a training set:
x1,x2,..,x7, y
where y is the value to be predicted (regardless of the model to be used for prediction), and
let's
22 matches