[R] Who knows how to use it in Vertica 6

2012-06-04 Thread Alekseiy Beloshitskiy
Hi All, As you may already know, Vertica announced that it now supports user-defined R functions. Has anybody tried this already, or does anyone have more info than the Vertica site? http://www.vertica.com/content/vertica-an-hp-company-enables-users-to-connect-access-analyze-and-manage-any-data-anywhere/

Re: [R] Help with stemDocument

2012-05-10 Thread Alekseiy Beloshitskiy
Hi Triss, If you need to stem just one text in the corpus, use a[[n]] <- stemDocument(a[[n]]) Best, -Alex From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Triss.Ashton [triss.ash...@unt.edu] Sent: 02 May 2012 21:09 To:

[R] Compare String Similarity

2012-04-19 Thread Alekseiy Beloshitskiy
Dear All, I need to estimate the level of similarity of two strings. For example: string1 <- c("depending", "audience", "research", "school"); string2 <- c("audience", "push", "drama", "button", "depending"); The words in a string may occur in a different order, though. What function would you recommend to use to estimate
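Since word order does not matter in the question above, one simple option is to treat the two vectors as sets and compute the Jaccard similarity. This is only a sketch in base R; the helper name jaccard is made up for illustration and is not something proposed in the thread.

```r
# Word lists from the post, treated as unordered sets
string1 <- c("depending", "audience", "research", "school")
string2 <- c("audience", "push", "drama", "button", "depending")

# Jaccard similarity: shared words divided by all distinct words
# (hypothetical helper, not part of any package)
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

similarity <- jaccard(string1, string2)
print(similarity)
```

The result is 1 for identical word sets and 0 when no word is shared; here two of the seven distinct words appear in both vectors.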

Re: [R] Cluster Analysis

2012-04-19 Thread Alekseiy Beloshitskiy
Hi, Taisa, It depends on many factors, e.g. the nature of your data, the volume of the data set, etc. The analog of SAS fastclus in R is kmeans (for a practical example check slide #35 here: http://www.slideshare.net/whitish/textmining-with-r). Check also k-medoids (pam) and hclust. Good luck, -Alex
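A minimal sketch of two of the functions named above, kmeans and hclust, both from the stats package that ships with R. The toy data, the choice of two clusters, and the average linkage are assumptions for illustration only; pam lives in the separate cluster package and is left out here.

```r
set.seed(42)  # kmeans starts from random centers

# Toy data: two well-separated groups of 2-D points (made up for illustration)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2)
)

# k-means with the number of clusters chosen in advance
km <- kmeans(x, centers = 2, nstart = 10)

# Hierarchical clustering on Euclidean distances, cut into 2 groups
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 2)

# Cross-tabulate the two labelings to see how far they agree
print(table(kmeans = km$cluster, hclust = hc_groups))
```

On data this cleanly separated the two methods should agree; on real data the choice of k, distance, and linkage matters much more.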

Re: [R] Compare String Similarity

2012-04-19 Thread Alekseiy Beloshitskiy
Thank you, Michael, Right, I'm looking for an R implementation of Levenshtein or any other similar approach. Will try it. Thank you again! -Alex From: R. Michael Weylandt [michael.weyla...@gmail.com] Sent: 19 April 2012 19:01 Cc: Alekseiy Beloshitskiy
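For reference, base R (package utils) provides adist(), which computes generalized Levenshtein distances. The sketch below is not necessarily the function suggested in the thread; the word vectors are the ones from the original question, and the "kitten"/"sitting" pair is the usual textbook example.

```r
# Edit distance between two single strings
d <- adist("kitten", "sitting")
print(d[1, 1])

# Pairwise distances between every word of two vectors
string1 <- c("depending", "audience", "research", "school")
string2 <- c("audience", "push", "drama", "button", "depending")
dist_matrix <- adist(string1, string2)
dimnames(dist_matrix) <- list(string1, string2)
print(dist_matrix)
```

A distance of 0 means the two words are identical, so exact matches show up as zeros in the matrix.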

Re: [R] Help with stemDocument

2012-04-13 Thread Alekseiy Beloshitskiy
Check this slideshare.net/whitish/textmining-with-r Best, -Alex From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Deborah H. Deng [deborah.d...@alumni.utexas.net] Sent: 13 April 2012 10:27 To: r-help@r-project.org Subject: [R]

Re: [R] R Large Dataset Problem

2012-04-13 Thread Alekseiy Beloshitskiy
I would pre-process the data before loading it into R. Best, -Alex From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of efulas [ef_u...@hotmail.com] Sent: 13 April 2012 14:32 To: r-help@r-project.org Subject: [R] R Large

Re: [R] Constructing Distance matrix for hclust

2012-04-04 Thread Alekseiy Beloshitskiy
Hi, Vinod, Hope this will help you:
library(RJDBC)
# specify your JDBC driver (here Vertica's)
drv <- JDBC("com.vertica.Driver", "../vertica_3.5_jdk_5.jar")
# specify your connection string
conn <- dbConnect(drv, "jdbc:postgres://IP:port/dbname", "login", "password")
# list tables
dbListTables(conn)
# get your distances

Re: [R] e1071 tune.control() random parameter

2012-04-03 Thread Alekseiy Beloshitskiy
Hello, Jessica, Can you please elaborate on what you're trying to find with that function? If, e.g., you want to find the best parameters C and gamma for an RBF model (SVM), you can use grid.py; check here: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ Function tune.control() in package e1071 is an R interface

Re: [R] TR: [e1071] Load an SVM model exported with write.svm

2012-04-03 Thread Alekseiy Beloshitskiy
Hello, Alexandre, In R you can specify whether or not to use scaling with the parameter scale, e.g.: model <- svm(class_var ~ ., data = trainset, scale = FALSE); Don't forget to disable it if you already scaled your data with the libsvm svm-scale tool. Best, -Alex
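To illustrate what "already scaled" means when scale = FALSE is used, here is a base R sketch that maps each column to a fixed range using minima and maxima learned from the training data only, then reuses them for new data. The [-1, 1] target range, the helper names fit_range and apply_range, and the toy matrices are all assumptions for illustration; this is not the svm-scale implementation itself.

```r
# Learn per-column minima and maxima from the training data only
# (hypothetical helper)
fit_range <- function(x) {
  list(min = apply(x, 2, min), max = apply(x, 2, max))
}

# Map columns to [-1, 1] with previously learned ranges
# (a constant column would divide by zero; not handled in this sketch)
apply_range <- function(x, r) {
  centered <- sweep(x, 2, r$min, "-")
  unit <- sweep(centered, 2, r$max - r$min, "/")
  2 * unit - 1
}

# Made-up numeric predictors
trainset <- cbind(a = c(1, 2, 3, 4), b = c(10, 30, 20, 50))
newdata <- cbind(a = c(2.5, 5), b = c(10, 40))

r <- fit_range(trainset)
train_scaled <- apply_range(trainset, r)
new_scaled <- apply_range(newdata, r)
print(train_scaled)
print(new_scaled)
```

New observations can fall outside [-1, 1] when they lie beyond the training range, which is expected; the point is that training and prediction data go through the same transformation.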

[R] SVM works slowly with 1M observations

2012-03-29 Thread Alekseiy Beloshitskiy
Dear All, I'm just curious whether it is normal that SVM takes several hours to build a model from a training data set with ~1 million observations and 7 columns (6 variables + 1 class variable). Here is my code: model <- svm(result ~ ., data = trainset) where 'trainset' has 997594 obs. and 7

[R] SVM performance/optimization

2012-03-29 Thread Alekseiy Beloshitskiy
Dear All, I'm trying to build an SVM model with a large dataset. However, svm from package e1071 is slow (it takes hours) to build a model for about 1 million observations. I was thinking about CVM (Core Vector Machines:

Re: [R] SVM. How to use categorical attributes?

2012-03-28 Thread Alekseiy Beloshitskiy
://stats.stackexchange.com/questions/25355/multi-value-categorical-attributes-how-r Thank you, -Alex From: Steve Lianoglou [mailinglist.honey...@gmail.com] Sent: 27 March 2012 21:47 To: Alekseiy Beloshitskiy Cc: r-help@r-project.org Subject: Re: [R] SVM. How to use categorical

Re: [R] SVM. How to use categorical attributes?

2012-03-28 Thread Alekseiy Beloshitskiy
Thank you so much, Ulrich, Will play with this. Best, -Alex From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Ulrich Bodenhofer [bodenho...@bioinf.jku.at] Sent: 28 March 2012 14:40 To: r-help@r-project.org Subject: Re: [R]

Re: [R] normalization of multi-value string variable

2012-03-27 Thread Alekseiy Beloshitskiy
From: Jessica Streicher [j.streic...@micromata.de] Sent: 27 March 2012 11:18 To: Alekseiy Beloshitskiy Subject: Re: [R] normalization of multi-value string variable Well, not sure what you mean with scaling and normalizing strings, but if you want to represent

[R] SVM. How to use categorical attributes?

2012-03-27 Thread Alekseiy Beloshitskiy
Hi All, Here is the case. I want to build a classification model (SVM). Some of the variables for this model are categorical attributes which represent words (usually 3-10 words, a query for a Google search). For example: search_id | query_words|..| result

Re: [R] normalization of multi-value string variable

2012-03-27 Thread Alekseiy Beloshitskiy
:) Thank you, -Alex From: Jessica Streicher [j.streic...@micromata.de] Sent: 27 March 2012 15:24 To: Alekseiy Beloshitskiy Cc: r-help@r-project.org Subject: Re: [R] normalization of multi-value string variable Hm.. so what you need is either - one new feature

Re: [R] Memory Utilization on R

2012-03-27 Thread Alekseiy Beloshitskiy
Guys, let me add my two cents to your interesting discussion. I have a ~10 GB txt file with training data for my model. It has about 150 million rows for 12 variables. When I load it into memory (running just one line!): train <- read.table(file = "/training.txt") it takes ~28 GB of RAM while loading (It
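The read.table() help page notes that declaring column types and an approximate row count reduces memory use during loading. A small sketch of that idea follows; the stand-in file, its three columns, and their types are made up for illustration, since the layout of the real file is not given in the post.

```r
# Write a small stand-in file; the real file in the post is ~10 GB
path <- tempfile(fileext = ".txt")
toy <- data.frame(
  id = 1:5,
  score = c(0.1, 0.5, 0.9, 0.3, 0.7),
  label = c("a", "b", "a", "b", "a")
)
write.table(toy, path, row.names = FALSE, col.names = FALSE, quote = FALSE)

# Declaring column types and the row count spares read.table() from guessing
train <- read.table(
  path,
  colClasses = c("integer", "numeric", "character"),
  nrows = 5,
  comment.char = ""
)
str(train)
print(object.size(train), units = "Kb")
```

Whether this is enough for a file of the size described is not something the sketch can show; it only demonstrates the arguments.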

Re: [R] how to cluster rows of words in a text file

2012-03-26 Thread Alekseiy Beloshitskiy
Hello, I didn't quite understand what you need, but maybe you can have a look here: www.slideshare.net/whitish/textmining-with-r R code fragments are in appendixes of the presentation. Hope this will help, -Alex From: r-help-boun...@r-project.org

[R] how to scale tokens

2012-03-26 Thread Alekseiy Beloshitskiy
Hi All, I need to scale a variable of tokens (several words per observation) to be able to use it for svm(). For example, I have a variable x4 = ("how", "grow", "tree"). Any ideas how to scale it for use in svm()? Thank you, -Alex

[R] normalization of multi-value string variable

2012-03-26 Thread Alekseiy Beloshitskiy
Hi All, I need to normalize/scale a string variable which represents the interests of customers (e.g., 'cycling, rollerblading, swimming', etc.). Does anybody know how to do this? I then want to use it along with other numeric variables for SVM classification. I would appreciate any advice. -Alex
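One common way to make such a variable numeric is to turn it into 0/1 indicator columns, one per distinct interest. The sketch below uses only base R; the three customers are made up, and this is one possible encoding rather than the answer given in the thread.

```r
# Made-up customers, each with a comma-separated list of interests
interests <- c(
  "cycling, rollerblading, swimming",
  "swimming",
  "cycling, chess"
)

# Split each string into trimmed tokens
tokens <- lapply(strsplit(interests, ","), trimws)

# One column per distinct interest: 1 if the customer has it, 0 otherwise
vocab <- sort(unique(unlist(tokens)))
indicators <- t(sapply(tokens, function(tk) as.integer(vocab %in% tk)))
colnames(indicators) <- vocab
print(indicators)
```

The resulting matrix can be bound to the other numeric predictors with cbind(); with a large vocabulary it becomes wide and sparse, which is worth keeping in mind.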

[R] How do you scale variables which consist of tokens

2012-03-23 Thread Alekseiy Beloshitskiy
Dear All, Let's suppose there's a case where you want to make a prediction using a range of variables. Some variables are represented as a set of words (tokens). For example, there is a training set: x1,x2,..,x7, y where y is to be predicted (regardless of the model to be used for prediction), and let's