Re: [R] Using R for Production - Discussion

2010-11-02 Thread Saeed Abu Nimeh
I worked on a project where we used a random forest classifier to
predict a binary response. We trained a model in the EC2 cloud with 3
million observations and 44 features, and stored the model that R
generated using save(mymodel, file = "model.Rdata"). Now we use
model.Rdata locally to predict new observations.
In our local system, we built a parser in Perl to generate the CSV
representation of the observation we want to predict, and then used
RSPerl to communicate between Perl and R. But there is a catch:
instead of loading the random forest model (model.Rdata) every time we
want to predict a new observation, we keep an R console running as a
daemon with model.Rdata already loaded. Then we send the
observation to be predicted from Perl to R. If anyone else has better
solutions/ideas, please feel free to share.
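For readers who want to try the "keep the model loaded" idea without RSPerl, here is a minimal sketch of a long-running R loop that reads one CSV-formatted observation per line and writes one prediction per line. It assumes the Perl side talks to the R process over its stdin/stdout (for example via Perl's IPC::Open2); the function name serve_predictions() and the column names are hypothetical, and this is not the setup described above.

```r
# "Load once, predict many": one long-running R process answers many requests.
# serve_predictions() is a hypothetical name used for illustration only.
serve_predictions <- function(model, input, output = stdout(), col_names) {
  if (!isOpen(input)) open(input, "r")   # keep the connection open between reads
  repeat {
    line <- readLines(input, n = 1)
    if (length(line) == 0) break         # end of input: the client closed the pipe
    newdata <- read.csv(text = line, header = FALSE, col.names = col_names)
    pred <- predict(model, newdata = newdata)
    writeLines(format(pred), output)     # one prediction per output line
    flush(output)                        # push the answer out immediately
  }
  invisible(NULL)
}

## Assumed production use (file name from the post, column names hypothetical):
## mymodel <- get(load("model.Rdata"))   # the expensive load happens only once
## serve_predictions(mymodel, file("stdin"), stdout(),
##                   col_names = paste0("f", 1:44))
```

The design choice is simply that load() runs once at start-up, so each request only pays for read.csv() and predict(). With a random forest, factor columns in the new data need the same levels as in training, which this sketch does not handle.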
Thanks,
Saeed

On Mon, Nov 1, 2010 at 9:04 PM, Santosh Srinivas
santosh.srini...@gmail.com wrote:
 Hello Group,

 This is an open-ended question.

 I am quite fascinated by the things I can do and the control I have over my
 activities since I started using R.
 I have basically been using it for analytics-related work on my desktop.
 My experience has been quite good, and most issues I need to
 investigate and solve are typical ones related to data errors, format
 corruption, etc., not necessarily to R.

 Complementing this with Python gives enough firepower to do lots of
 production (analytics-related) work in the cloud (from my research I
 see that every innovative technology provider seems to support Python:
 Google, Amazon, etc.).

 Question on using R for Production activities:
 Q1) Does anyone have experience using R scripts etc. for production-related
 activities? E.g. serving a computational/analytical/simulation
 environment from a web portal, with the analytical processing done
 in R.
 I've seen that most useful things for normal (not rocket science) business
 (80-20 rule) can be done just as well in R as with tools like
 SAS, MATLAB, etc.

 Q2) I haven't tried the processing routines on much larger datasets,
 assuming size is not a constraint nowadays.
 I know that I should try it out, but any forewarnings would help. Is
 something that works for my desktop dataset likely to work
 when scaled up to a cloud dataset, assuming that I clear out
 unused objects, avoid infinite loops, etc.?

 i.e. is there any problem with the fundamental architecture of R itself
 (as press articles often say)?


 Q3) There are big fans of the SAS and MATLAB (MathWorks) environments out there;
 does anyone have a comparison of how R fares?
 From my experience R is quite neat and low level, so overheads should be
 quite low.
 Most slowness comes from lack of knowledge (see my code: using the
 wrong structures, functions, loops, etc.) rather than from something wrong with
 R itself.
 Perhaps there is no commercial focus on performance-related issues,
 but my guess is that it is just a matter of time until the community evolves
 the language to score higher on that too,
 and perhaps develops documentation with performance tips to help users
 with that challenge (the "ten commandments" type).

 Q4) You must have heard about the latest comment from James Goodnight of SAS
 ... We haven't noticed that a lot. Most of our companies need industrial
 strength software that has been tested, put through every possible scenario
 or failure to make sure everything works correctly.
 My gut feeling is that random passionate geeks (playing part-time) do better
 testing than an army of professionals (but I have no empirical evidence
 here).

 I am not taking a side here (although I appreciate those who do!), but I am
 looking for objective reasoning.

 Thanks,
 S

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Help with time in R

2010-07-21 Thread Saeed Abu Nimeh
You can use strptime to specify the format of the date and time you want, e.g.

> x1 <- strptime(x, "%Y-%m-%d %H:%M:%S")
> x1
[1] "2010-04-02 12:00:05"
> str(x1)
 POSIXlt[1:1], format: "2010-04-02 12:00:05"
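Since the original question (quoted below) is about MM:SS.xyz values that were read in as a factor, here is a minimal base-R sketch of converting them to seconds. It assumes every value contains exactly one colon; the function name to_seconds() and the sample values are made up. Converting to character first matters because as.numeric() on a factor returns its internal integer level codes, which would explain the "totally different numbers".

```r
# Made-up MM:SS.xyz times, stored as a factor as read.table() might do
x <- factor(c("01:30.250", "00:05.500", "12:00.000"))

# as.numeric(x) would give the factor's integer level codes, not times,
# so go through as.character() first
to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

secs <- to_seconds(x)   # minutes * 60 + seconds, as plain numbers
secs
```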

On Wed, Jul 21, 2010 at 8:02 AM, Aaditya Nanduri
aaditya.nand...@gmail.com wrote:
 Ms. Chisholm,

 If you could tell us how you plan to use the variables, we will have a
 better understanding of what you are looking for and will be able to help
 you.
 Are you looking for the time in seconds? In that case, do as Mr. Holfman
 says. He just skipped the part about converting the factors to characters.
 You can do that by:
 y <- as.character(x), where x is the vector of factors.

 Are you looking for a list of hours, minutes and seconds? That can be
 done too, although it would be much easier to just have hours and min.sec.

 On Tue, Jul 20, 2010 at 7:33 AM, Sarah Chisholm sarah.chisholm...@ucl.ac.uk
 wrote:

 Hi,

 I have a problem with the time formatting in R. I have entered times in the
 format MM:SS.xyz and R has automatically classified them as a factor, but
 I need them as numbers. However, when I use as.numeric() it gives me totally
 different numbers. Is there any way I can tell R to read this input as a
 number?

 Thank you very much





 --
 Aaditya Nanduri
 aaditya.nand...@gmail.com



Re: [R] Figures in Latex

2010-07-23 Thread Saeed Abu Nimeh
http://nixtricks.wordpress.com/2009/11/09/latex-multiple-figures-under-the-same-caption-using-subfigure/
The example there creates two rows of subfigures with two subfigures on each row.

On Fri, Jul 23, 2010 at 6:43 AM, li li hannah@gmail.com wrote:
 Hi all,
   I want to add 6 plots in the format of 2 columns and 3 rows as one
 figure in LaTeX. The plots are .eps files.
 I know how to add 2 plots side by side, but could not figure out how to do
 multiple rows.
  I know this may not be the right place to ask such a question, but I do
 not know whom to ask, so I'm just trying my
 luck here.
  Thank you in advance.
                                                      Hannah



Re: [R] transforming dates into years

2010-08-13 Thread Saeed Abu Nimeh
myFrame$year <- years(strptime(x))
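Note that years() is not part of base R (it appears to come from an add-on package such as chron), and strptime() requires a format argument. A base-R alternative that needs neither zoo nor chron, sketched with a made-up data frame:

```r
# Made-up data frame with a Date column
myFrame <- data.frame(date = as.Date(c("2009-12-31", "2010-08-13")))

# Format each date as a four-digit year and convert it to a number
myFrame$year <- as.numeric(format(myFrame$date, "%Y"))

# Equivalent: POSIXlt stores the year as years since 1900
year2 <- as.POSIXlt(myFrame$date)$year + 1900
myFrame
```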

On Fri, Aug 13, 2010 at 12:36 PM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:
 Hello!

 If I have in my data frame MyFrame a variable saved as a Date and want
 to translate it into years, I currently do it like this using zoo:

 library(zoo)
 as.year <- function(x) as.numeric(floor(as.yearmon(x)))
 myFrame$year <- as.year(myFrame$date)

 Is there a function that would do it directly - like as.yearmon -
 but for years?

 Thank you!


 --
 Dimitri Liakhovitski
 Ninah Consulting
 www.ninah.com



[R] Importance of levels in a factor variable

2010-08-26 Thread Saeed Abu Nimeh
I have a dataset of multiple variables and a response. For example,
> str(x)
'data.frame':   3557238 obs. of  44 variables:
 $ response :  Factor w/ 2 levels
 $ var2: Factor w/5000 levels


If var2, for example, is a factor with 5000 levels, what is the best
approach to determine which of these levels are the most important to
include in building the model, and which ones to discard? Assuming
there is a way to do that, is it correct to include only the
important levels and discard the rest for that variable when building
the model?
Thanks,
Saeed

---
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu
32 GB RAM



Re: [R] Looking for an image (R 64-bit on Linux 64-bit) on Amazon EC2

2010-08-26 Thread Saeed Abu Nimeh
No need to do that. They have some instances that run 64-bit Ubuntu.
If I remember correctly, we had to install 64-bit R from the Debian
packages on the Ubuntu instance.

On Wed, Aug 25, 2010 at 6:12 PM, noclue_ tim@netzero.net wrote:


 You have a 64 bit Linux?  If so...

Download the sources

 Do you mean download Linux kernel source code and then compile it on Amazon
 EC2?




Re: [R] Importance of levels in a factor variable

2010-08-27 Thread Saeed Abu Nimeh
Thanks Greg. Actually, we do have 5000 levels; it is not an import
problem. I looked into combine.levels in the Hmisc package. The
problem with this approach is that it uses the frequency of the levels
and combines infrequent levels into one level called Others. If you
apply this to the complete dataset (positive and negative samples)
and the number of negative samples is much greater than the number of
positive ones, then most of the levels of the positive samples will go
into the Others level in the final result. That's why I was wondering
if there is a more accurate way to remove the unimportant levels.

On Thu, Aug 26, 2010 at 3:47 PM, Greg Snow greg.s...@imail.org wrote:
 A factor with 5000 levels looks like it may be a numeric variable that was 
 accidentally coded as a factor (functions like read.table will do this if there 
 is a non-numeric character in with the numbers).

 If you really have a 5000 level factor, which levels can be discarded or 
 combined is a question for the subject specific scientist, not the 
 statistician.

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111




[R] Collapsing levels of categorical variables

2010-08-31 Thread Saeed Abu Nimeh
In this paper [1] the author mentions a procedure by M. Greenacre that
can be used to collapse the levels of a categorical variable: set up a
table with the frequency of each level and the proportion of the
target value in each level, then collapse the table level by level,
looking at the change in chi-square as the table is
collapsed. Does anyone know if such a procedure is available in R?

[1] http://www2.sas.com/proceedings/sugi31/079-31.pdf
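A minimal sketch of the kind of procedure described above, assuming a binary response; collapse_levels() is a hypothetical name, and this is not the exact procedure from the paper. For a k x 2 table, Pearson's chi-square equals sum_i n_i (p_i - p)^2 / (p (1 - p)), where n_i is the count of level i, p_i its proportion of the target value, and p the overall proportion. Merging levels i and j therefore lowers chi-square by n_i n_j (p_i - p_j)^2 / ((n_i + n_j) p (1 - p)), so the sketch repeatedly merges the pair with the smallest loss.

```r
# Greedy chi-square-based collapsing of factor levels for a binary response.
# Each step merges the two groups whose merge loses the least Pearson chi-square.
collapse_levels <- function(f, y, n_groups) {
  f <- droplevels(as.factor(f))
  y <- as.integer(as.factor(y)) - 1L     # 0/1 coding of the response
  n <- as.vector(table(f))               # count per level
  pos <- as.vector(tapply(y, f, sum))    # number of 1 responses per level
  members <- as.list(levels(f))          # original levels making up each group
  while (length(n) > n_groups) {
    p <- pos / n
    # chi-square lost by merging groups i and j, up to the constant 1/(p(1-p))
    loss <- outer(n, n, function(a, b) a * b / (a + b)) * outer(p, p, "-")^2
    diag(loss) <- Inf
    ij <- which(loss == min(loss), arr.ind = TRUE)[1, ]
    i <- min(ij)
    j <- max(ij)
    n[i] <- n[i] + n[j]                  # fold group j into group i
    pos[i] <- pos[i] + pos[j]
    members[[i]] <- c(members[[i]], members[[j]])
    n <- n[-j]
    pos <- pos[-j]
    members <- members[-j]
  }
  # map every original level to the number of its group
  map <- rep(seq_along(members), lengths(members))
  names(map) <- unlist(members)
  factor(map[as.character(f)])
}

# Toy example: levels a and b always have response 1, c and d never do
f <- factor(c("a", "a", "b", "b", "c", "c", "d", "d"))
y <- c(1, 1, 1, 1, 0, 0, 0, 0)
table(collapse_levels(f, y, n_groups = 2), y)
```

The all-pairs loss matrix is k x k, so with 5000 levels each step is expensive; a cheaper heuristic would be to sort the groups by p_i and consider only neighboring pairs. How many groups to keep (n_groups) is left to the user.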



[R] Communicating with an R daemon from perl

2010-10-08 Thread Saeed Abu Nimeh
Is there a way to communicate with a running R daemon from Perl? I
tried RSPerl, but its functions start a new R instance first. I
would like to keep an R instance running in the background and
communicate with it from Perl.
The problem is a large object that we need, which has to be
loaded every time an R instance is initialized:
load(file = "model.Rdata").
Thanks,
Saeed

---
R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu



Re: [R] can not print probabilities in svm of e1071

2010-04-29 Thread Saeed Abu Nimeh
> library(e1071)
> svm.model <- svm(y ~ ., data = dataset, probability = TRUE)
> svm.pred <- predict(svm.model, test.set, decision.values = TRUE,
+   probability = TRUE)
> library(ROCR)
> svm.roc <- prediction(attributes(svm.pred)$decision.values, test.set$y)
> svm.auc <- performance(svm.roc, 'tpr', 'fpr')
> plot(svm.auc)


On Thu, Apr 29, 2010 at 4:17 PM, Changbin Du changb...@gmail.com wrote:
 x <- train[, c(2:18, 20:21, 24, 27:31)]
 y <- train$out

 svm.pr <- svm(x, y, probability = TRUE, method = "C-classification",
 kernel = "radial", cost = bestc, gamma = bestg, cross = 10)

 pred <- predict(svm.pr, valid[, c(2:18, 20:21, 24, 27:31)],
 decision.values = TRUE, probability = TRUE)
      attr(pred, "decision.values")[1:4, ]
        16         23         43         52
 1.08157648 0.51241842 0.06234319 1.20656580
      attr(pred, "probabilities")[1:4, ]
 NULL


 Hi, dear David and R community,

 I am trying to print out the probabilities and set a threshold to make an ROC
 curve. I don't know why it shows NULL for the probabilities.

 y <- train$out consists of binary 0/1 values.

 Can you help me with this?

 Thanks so much!



 --
 Sincerely,
 Changbin
 --



Re: [R] ROC curve in R

2010-07-01 Thread Saeed Abu Nimeh
Try the ROCR package. http://rocr.bioinf.mpi-sb.mpg.de/ROCR.pdf
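If you want to see what an ROC curve is made of before turning to ROCR, here is a base-R sketch. It assumes one numeric score column and one 0/1 truth column (the question's Reference/Variation columns may need converting to that form first); roc_points() and the data are hypothetical.

```r
# Hypothetical example: one numeric score per case and a 0/1 truth label
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
label <- c(1, 1, 0, 1, 0, 1, 0, 0)

roc_points <- function(score, label) {
  # Start at Inf (nothing called positive), then lower the cutoff score by score
  cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
  data.frame(
    cutoff = cutoffs,
    fpr = sapply(cutoffs, function(k) mean(score[label == 0] >= k)),
    tpr = sapply(cutoffs, function(k) mean(score[label == 1] >= k))
  )
}

roc <- roc_points(score, label)
if (interactive()) {
  plot(roc$fpr, roc$tpr, type = "l",
       xlab = "False positive rate", ylab = "True positive rate")
  abline(0, 1, lty = 2)   # diagonal = random guessing
}
roc
```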
Saeed

On Thu, Jul 1, 2010 at 9:50 AM, ashu6886 ashu.infy.m...@gmail.com wrote:

 Hi,

 I have a fairly large amount of genomic data. I have created a data frame
 which has Reference as one column and Variation as another. I want to
 plot an ROC curve based on these 2 columns. I have searched the R manual but
 could not understand it. Can anybody help me with the R code for plotting an ROC
 curve?

 Thnx
 ashu6886


[R] Function similar to combine.levels in Hmisc package

2010-07-09 Thread Saeed Abu Nimeh
Is there a function similar to combine.levels (in the Hmisc package)
that combines the levels of factors, but not based on their frequency?
Alternatively, I am looking into using the significance of the dummy
variables of factors, based on their Pr(>|t|) values from a linear
model, and then deleting the non-significant levels. Any other
suggestions?
Thanks,
Saeed
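A sketch of the dummy-variable approach described above, using simulated data (the factor x and the response y are made up). Two caveats: each p-value compares a level with the baseline level, not with "no effect", and with thousands of levels many small p-values will appear by chance. For a binary response fitted with glm(..., family = binomial), the column is called "Pr(>|z|)" instead.

```r
set.seed(1)
# Simulated data: a 3-level factor where only level "b" shifts the response
x <- factor(rep(c("a", "b", "c"), each = 30))
y <- 5 * (x == "b") + rnorm(90)
fit <- lm(y ~ x)

# p-values of the dummy variables; level "a" is the baseline
pvals <- coef(summary(fit))[-1, "Pr(>|t|)"]
pvals
```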



Re: [R] how to make R plot under Linux

2010-02-22 Thread Saeed Abu Nimeh
Try installing Xming on your Windows box
http://www.straightrunning.com/XmingNotes/. Make sure Xming is running
before plotting.
Saeed
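If X11 forwarding still fails, one workaround (not part of the reply above) is to skip the display entirely and have R write the plot to a file with a non-screen device such as pdf(), then copy the file to the Windows machine. A minimal sketch; the file name is arbitrary:

```r
out_file <- file.path(tempdir(), "myplot.pdf")   # arbitrary output path
pdf(out_file, width = 6, height = 4)             # file device: no X11 display needed
plot(cars, main = "Stopping distance vs speed")
invisible(dev.off())                             # close the device and write the file
```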

On Mon, Feb 22, 2010 at 12:46 PM, xin wei xin...@stat.psu.edu wrote:

 Hi guys,
 thank you so much for all the suggestions. Now I seem to be able to set up
 X11 forwarding in PuTTY. However, I still cannot get a plot, and I get the
 following error message:

  Error in function (display = , width, height, pointsize, gamma, bg,  :
  X11 I/O error while opening X11 connection to 'localhost:20.0'

 Does this error message indicate a lack of an appropriate plotting package on the
 server, or that the server is not properly set up for X11 forwarding?

 thanks


Re: [R] R Graphics into Latex

2010-02-24 Thread Saeed Abu Nimeh
Use \usepackage{epsfig} after your \documentclass. Then make sure to
run LaTeX, not pdfLaTeX.
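On the R side, the help page ?postscript recommends horizontal = FALSE, onefile = FALSE and paper = "special" for EPS files meant to be included in other documents. A minimal sketch (the output path and the plotted data are arbitrary):

```r
eps_file <- file.path(tempdir(), "Density.eps")   # arbitrary output path

# EPS-friendly settings recommended in ?postscript
postscript(eps_file, horizontal = FALSE, onefile = FALSE,
           paper = "special", width = 6, height = 4)
plot(density(faithful$eruptions), main = "Density")
invisible(dev.off())
```

pdfLaTeX cannot include EPS files directly. If you prefer pdfLaTeX, an alternative is to write the figure with pdf() instead and use \includegraphics{Density} without the extension, so that graphicx picks up the PDF file.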

On Wed, Feb 24, 2010 at 3:29 PM, Lars Bishop lars...@gmail.com wrote:
 Hi,

 I'm new to LaTeX and I'm trying to include an R chart in a LaTeX document.

 This is what I'm doing:

 1) In R: save the chart as a PostScript file in a folder C:/xxx/Density.eps

 2) In LaTeX (using TeXworks on Windows XP):

 In the preamble:

 \documentclass[11pt]{article}
 \usepackage{graphicx}
 \begin{document}

 blah..blah…blah

 \begin{figure}
 \centering
 \includegraphics{C:/xxx/Density.eps}
 \label{fig:Density}
 \end{figure}

 --This is the Error Message I'm getting:

 LaTeX Warning: File `R:/MarsTH/Studies/Misc/LIA QA/R/Density.eps' not found
 on input line 26.

 ! LaTeX Error: Unknown graphics extension: .eps.

 See the LaTeX manual or LaTeX Companion for explanation.

 Type H return for immediate help.

 I'll appreciate your help.
 Thanks in advance,

 Lars.



[R] snow package on multi core unix box

2007-12-05 Thread Saeed Abu Nimeh
Is the Rmpi package (or rpvm) needed to exploit multiple cores on a
single Unix box using the snow package? The documentation of the package
does not provide information about setting up a single machine with multiple
cores. Also, how effective is it to run a Bayesian simulation on
parallel (or distributed) processors using the snow package?
Thanks,
Saeed



[R] R on a multi core unix box

2007-12-06 Thread Saeed Abu Nimeh
Hi,
I installed the snow package on a Unix box that has multiple cores. To be
able to exploit the multiple cores (on one PC), do I still need to install
the Rmpi package (or rpvm)? Another question: if I run a Bayesian simulation
on the multiple cores after setting them up correctly (using snow), do you
think there will be a noticeable speedup?
Thanks,
Saeed
---

Linux CentOS
4 dual-core processors
32 GB RAM
R 2.6.0
snow 0.29
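On a single multi-core machine, snow's socket cluster (makeCluster(..., type = "SOCK")) does not need Rmpi or rpvm. The sketch below uses the parallel package instead, which ships with R since 2.14.0 and includes snow-style clusters; run_replicate() is a hypothetical stand-in for one independent chain. It only illustrates where a speed-up can come from: independent replicates or chains can run side by side, whereas a single MCMC chain is sequential and gains nothing from extra cores.

```r
library(parallel)  # ships with R >= 2.14.0; its makeCluster() grew out of snow

# One independent replicate: a Monte Carlo estimate of pi. A stand-in for one
# chain of a Bayesian simulation; it needs nothing from the other replicates.
run_replicate <- function(i, n = 1e5) {
  u <- runif(n)
  v <- runif(n)
  4 * mean(u^2 + v^2 <= 1)
}

cl <- makeCluster(2)                   # 2 local worker processes, no MPI or PVM
clusterSetRNGStream(cl, iseed = 123)   # independent random-number streams per worker
results <- unlist(parLapply(cl, 1:4, run_replicate))
stopCluster(cl)
results
```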



Re: [R] Dual Core vs Quad Core

2007-12-17 Thread Saeed Abu Nimeh
I ran a Bayesian simulation some time ago and it took 1 week to finish
on a Debian box (Dell PE 2850  Dual Intel [EMAIL PROTECTED]  6GB). I think it
depends on the setting of the experiment and whether the code can be
parallelized.

Simon Blomberg wrote:
 I've been running R on a quad-core using Debian Gnu/Linux since March
 this year, and I am very pleased with the performance.
 
 Simon.
 
 
 On Mon, 2007-12-17 at 20:13 -0500, Andrew Perrin wrote:
 On Mon, 17 Dec 2007, Kitty Lee wrote:

 Dear R-users,

 I use R to run spatial stuff and it takes up a lot of RAM. Runs can take 
 hours or days. I am thinking of getting a new desktop. Can R take advantage 
 of the dual-core system?

 I have a dual-core computer at work. But it seems that right now R is using 
 only one processor.

 The new computers feature quad core with 3GB of RAM. Can R take advantage 
 of the 4 chips? Or am I better off getting a dual core with faster 
 processing speed per chip?

 Thanks! Any advice would be really appreciated!

 K.
 If I have my information right, R will use dual- or quad-cores if it's 
 doing two (or four) things at once. The second core will help a little bit 
 insofar as whatever else your machine is doing won't interfere with the 
 one core on which it's running, but generally things that take a single 
 thread will remain on a single core.

 As for RAM, if you're doing memory-bound work you should certainly be 
 using a 64-bit machine and OS so you can utilize the larger memory space.


 --
 Andrew J Perrin - andrew_perrin (at) unc.edu - http://perrin.socsci.unc.edu
 Associate Professor of Sociology; Book Review Editor, _Social Forces_
 University of North Carolina - CB#3210, Chapel Hill, NC 27599-3210 USA



Re: [R] Installing R on BSD

2008-01-08 Thread Saeed Abu Nimeh

pkg_add -r R

Kitty Lee wrote:
 Dear users,
 
 I am trying to follow the instructions on this page to install R on a 4.4BSD network. 
 
 http://cran.r-project.org/doc/manuals/R-admin.html#Using-make
 
 
 I can unpack the file, but the system can't recognize the commands:
 
 ./configure
 make
 
 Any ideas what the right command should be?
 
 Thanks!!
 
 
 K.
 
 



Re: [R] Invoking R on BSD

2008-01-08 Thread Saeed Abu Nimeh
When you do pkg_add -r R, it should install R and you will not need to 
run make. To invoke R, you just need to type R at your prompt. Here is 
what I have on my FreeBSD:

FreeBSD 7.0-PRERELEASE (GENERIC2) #0: Sat Jan  5 21:27:47 CST 2008

Welcome to 

%R

R version 2.6.0 (2007-10-03)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

 

Kitty Lee wrote:
 Thanks to Saeed Abu Nimeh. I used pkg_add to install the R package on 4.4BSD. 
 
 My directory now has the following:
 
 BUILDDIRMakefrag.cc_lo  config.log  m4  tests
 MakeconfMakefrag.cxxconfig.status   po  tools
 MakefileR-2.6.1 doc roots
 Makefile.bakR-2.6.1.tar.gz  etc share
 Makefrag.cc SVN-REVISIONlibtool src
 
 
 But the make check shows errors:
 
  [2:32pm][~]  make check 
 `Makedeps' is up to date.
 make: don't know how to make ../../bin/exec/R. Stop
 *** Error code 2
 
 Stop in /usr/home/xxx/tests/Examples.
 *** Error code 1
 
 Stop in /usr/home/xxx/tests.
 *** Error code 1
 
 Stop in /usr/home/xxx/tests.
 *** Error code 1
 
 
 How do I fix this error? And then what are the steps to invoke R?
 
 I know eventually I need to use commands like R CMD. But what are the steps 
 before this?
 
 (Sorry, I have not done anything before on unix and am trying to figure 
 things out from bits and pieces off the internet. I would truly appreciate 
 any help or hint!)
 
 K.
 



[R] ROCR package finding maximum accuracy and optimal cutoff point

2009-03-26 Thread Saeed Abu Nimeh
If we use the ROCR package to find the accuracy of a classifier:
pred <- prediction(svm.pred, testset[,2])
perf.acc <- performance(pred, "acc")

Do we find the maximum accuracy as follows (is there a simpler way?):
> max(perf.acc@y.values[[1]])

Then, to find the cutoff point that maximizes the accuracy, do we do the
following (is there a simpler way?):
> cutoff.list <- unlist(perf.acc@x.values[[1]])
> cutoff.list[which.max(perf.acc@y.values[[1]])]

If the above is correct, how is it possible to find the average false
positive and negative rates from the following?
perf.fpr <- performance(pred, "fpr")
perf.fnr <- performance(pred, "fnr")

The dataset consists of two columns, a score and a binary response,
similar to this:
2.5, 0
-1, 0
2, 1
6.3, 1
4.1, 0
3.3, 1


Thanks,
Saeed
 ---
R 2.8.1 Win XP Pro SP2
ROCR package v1.0-2
e1071 v1.5-19
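To see what ROCR computes at each cutoff, here is a base-R sketch on the six-row example above (this is not ROCR's API). It treats each distinct score as a cutoff, calls "score >= cutoff" positive, and picks the cutoff with the highest accuracy together with its false positive and false negative rates.

```r
# Toy data from the question: a score and a binary response
score <- c(2.5, -1, 2, 6.3, 4.1, 3.3)
label <- c(0, 0, 1, 1, 0, 1)

# Treat every distinct score as a cutoff: predict 1 when score >= cutoff
cutoffs <- sort(unique(score), decreasing = TRUE)
rates <- t(sapply(cutoffs, function(k) {
  pred <- as.integer(score >= k)
  tp <- sum(pred == 1 & label == 1); fp <- sum(pred == 1 & label == 0)
  tn <- sum(pred == 0 & label == 0); fn <- sum(pred == 0 & label == 1)
  c(cutoff = k, acc = (tp + tn) / length(label),
    fpr = fp / (fp + tn), fnr = fn / (fn + tp))
}))

# Row with the highest accuracy (the first one if several tie)
best <- rates[which.max(rates[, "acc"]), ]
best
```

On this data three cutoffs tie at an accuracy of 4/6, and which.max() returns the first of them. ROCR additionally includes an Inf cutoff at which nothing is called positive.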



Re: [R] ROCR package finding maximum accuracy and optimal cutoff point

2009-03-28 Thread Saeed Abu Nimeh
Found the solution to my own question. To find the false positive rate
and the false negative rate that correspond to a certain cutoff point
using the ROCR package, one can do the following (there are surely
simpler ways, but this works):

library(ElemStatLearn)  # provides the spam data
library(rpart)
data(spam)

##################################
# create training and test sets  #
##################################
index <- 1:nrow(spam)
testindex <- sample(index, trunc(length(index)/3))
testset <- spam[testindex, ]
trainset <- spam[-testindex, ]
rpart.model <- rpart(spam ~ ., data = trainset) # training model

##################################
# use ROCR to calculate accuracy #
# and fp, fn rates               #
##################################
library(ROCR)
rpart.pred2 <- predict(rpart.model, testset)[, 2]  # testing model: P(spam)
pred <- prediction(rpart.pred2, testset[, 58])     # column 58 is the response
perf.acc <- performance(pred, "acc")  # accuracy at each cutoff
perf.fpr <- performance(pred, "fpr")  # fp rate at each cutoff
perf.fnr <- performance(pred, "fnr")  # fn rate at each cutoff

acc.rocr <- max(perf.acc@y.values[[1]])  # maximum accuracy

# find cutoff list for accuracies
cutoff.list.acc <- unlist(perf.acc@x.values[[1]])

# find optimal cutoff point for accuracy
optimal.cutoff.acc <- cutoff.list.acc[which.max(perf.acc@y.values[[1]])]

# position of that cutoff in the fpr object (all three share the same cutoffs)
optimal.cutoff.fpr <- which(perf.fpr@x.values[[1]] == as.numeric(optimal.cutoff.acc))

# find list of fpr values
cutoff.list.fpr <- unlist(perf.fpr@y.values[[1]])
# find fpr using rocr
fpr.rocr <- cutoff.list.fpr[as.numeric(optimal.cutoff.fpr)]

# position of that cutoff in the fnr object
optimal.cutoff.fnr <- which(perf.fnr@x.values[[1]] == as.numeric(optimal.cutoff.acc))
# find list of fnr values
cutoff.list.fnr <- unlist(perf.fnr@y.values[[1]])
# find fnr using rocr
fnr.rocr <- cutoff.list.fnr[as.numeric(optimal.cutoff.fnr)]

Now acc.rocr, fpr.rocr and fnr.rocr give the accuracy, false positive
rate and false negative rate (as proportions) at the cutoff that maximizes accuracy.

Saeed Abu Nimeh wrote:
 If we use the ROCR package to find the accuracy of a classifier:
 pred <- prediction(svm.pred, testset[,2])
 perf.acc <- performance(pred, "acc")
 
 Do we find the maximum accuracy as follows (is there a simpler way?):
 max(perf.acc@y.values[[1]])
 
 Then to find the cutoff point that maximizes the accuracy, do we do the
 following (is there a simpler way?):
 cutoff.list <- unlist(perf.acc@x.values[[1]])
 cutoff.list[which.max(perf.acc@y.values[[1]])]
 
 If the above is correct, how is it possible to find the average false
 positive and negative rates from the following?
 perf.fpr <- performance(pred, "fpr")
 perf.fnr <- performance(pred, "fnr")
 
 The dataset consists of two columns, a score and a binary response,
 similar to this:
 2.5, 0
 -1, 0
 2, 1
 6.3, 1
 4.1, 0
 3.3, 1
 
 
 Thanks,
 Saeed
  ---
 R 2.8.1 Win XP Pro SP2
 ROCR package v1.0-2
 e1071 v1.5-19


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ROCR package partial false positive and accuracy

2009-04-06 Thread Saeed Abu Nimeh
Hi,
In the ROCR package, is there a way to find the accuracy that
corresponds to a given false positive rate? In version 1.0-2, the
authors of the package added an option to find the partial area under
the ROC curve up to a given false positive rate by passing an optional
parameter fpr.stop:
perf.auc <- performance(pred, "auc", fpr.stop = 0.15)
Is there a way to find the accuracy up to a given false positive rate?
We use a classification tree (rpart) and a binary response.
Thanks, Saeed
---
R 2.8.1 Win XP pro
rpart 3.1-43
rocr 1.0-2
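One way to read "accuracy up to a given false positive rate" is: the best accuracy over all cutoffs whose FPR does not exceed the limit. A base R sketch of that reading follows. It is not a ROCR option; acc.at.fpr() is a hypothetical helper of my own, the scores and labels are made up, and it assumes 0/1 labels and that a case is called positive when its score is >= the cutoff. Within ROCR, performance(pred, "acc", "fpr") should give the same accuracy/FPR pairs to filter in the same way.

```r
## Best accuracy among cutoffs whose false positive rate is <= max.fpr.
## Assumes 0/1 labels (1 = positive); a case is called positive when score >= cutoff.
acc.at.fpr <- function(scores, labels, max.fpr) {
  pos <- labels == 1
  cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))
  res <- sapply(cutoffs, function(k) {
    called <- scores >= k
    c(fpr = sum(called & !pos) / sum(!pos),
      acc = mean(called == pos))
  })
  ok <- which(res["fpr", ] <= max.fpr)  # the Inf cutoff always qualifies (fpr = 0)
  i <- ok[which.max(res["acc", ok])]
  c(cutoff = cutoffs[i], fpr = res["fpr", i], acc = res["acc", i])
}

## Hypothetical scores and labels for illustration.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
labels <- c(1,   0,   1,   1,   0,   1,   0,   0)
acc.at.fpr(scores, labels, max.fpr = 0.15)
acc.at.fpr(scores, labels, max.fpr = 0.3)
```

Tightening the FPR limit can only shrink the set of admissible cutoffs, so the constrained accuracy never increases as max.fpr goes down.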



Re: [R] how to compute a roc curve

2008-10-31 Thread Saeed Abu Nimeh
Try library(ROCR)
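With ROCR the usual route is prediction(scores, labels), then performance(pred, "tpr", "fpr") for the curve and performance(pred, "auc") for the area. As a dependency-free cross-check, here is a base R sketch of the same quantities. It swaps in the rank-sum (Mann-Whitney) identity for the AUC; the helpers auc.rank() and sens.spec() and the data are hypothetical, and it assumes 0/1 labels with a case called positive when score >= cutoff.

```r
## AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks.
auc.rank <- function(scores, labels) {
  pos <- labels == 1
  n1 <- sum(pos)
  n0 <- sum(!pos)
  (sum(rank(scores)[pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## Sensitivity and specificity at each candidate cutoff (positive if score >= cutoff).
sens.spec <- function(scores, labels) {
  pos <- labels == 1
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  data.frame(cutoff = cutoffs,
             sens = sapply(cutoffs, function(k) mean(scores[pos] >= k)),
             spec = sapply(cutoffs, function(k) mean(scores[!pos] < k)))
}

## Hypothetical scores for positive (1) and negative (0) samples.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
labels <- c(1,   0,   1,   1,   0,   1,   0,   0)
auc.rank(scores, labels)
ss <- sens.spec(scores, labels)
ss
## ROC curve: plot(1 - ss$spec, ss$sens, type = "s", xlim = c(0, 1), ylim = c(0, 1))
```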

Pau Marc Munoz Torres wrote:
 Hi,
 
 I'm trying to set up prediction software. I'm now testing the performance
 of my method, so I need to calculate a ROC curve, especially the AUC, cut-off,
 sensitivity and specificity. I was looking at the ROCH package, but it's a mess
 for me; I'm not a math guy and I'm getting lost.
 
 Could any of you recommend an easy-to-use package for this task? I just
 have a list of positive/negative samples and their scores from my program. Can I
 compute a ROC curve with this?
 
 thanks
 
 pau
 
 
 
 



Re: [R] Security Data extraction

2009-01-15 Thread Saeed Abu Nimeh
Subba Rao wrote:
 Hi,
 
 Today I came across the R application and I will admit I am not a
 Statistician.  However, I think this application will be useful for me
 at work.  I am a Network/System Security Engineer trying to make sense
 of the huge security data I collect.  I am trying to visualize the
 traffic on our network.   The data in the packet header (captured by
 tcpdump) has all the information about the systems on the network.
 
 There are lots of visual tools that can present the data in a meaningful
 way.   Each tool seems to have a different data format, while most tools
 seem to understand CSV format.  How do I select a subset of the
 network data or syslog data and create a CSV file?

Sniff is a good tool: http://www.thedumbterminal.co.uk/software/sniff.shtml

 
 How else can the R application help me present the security data in a
 meaningful way to the management?

It depends on what you want to present.
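Once packet-header fields are in R as a data frame, selecting a subset and exporting it as CSV takes a couple of base R calls, and a simple per-host summary is one aggregate() away. A minimal sketch; the data frame pkts and its column names (src, dst, dport, bytes) are hypothetical, and getting tcpdump output into that shape is left to a tool such as the one mentioned above.

```r
## Hypothetical packet-header records; the column names are illustrative only.
pkts <- data.frame(
  src   = c("10.0.0.5", "10.0.0.7", "10.0.0.5", "10.0.0.9"),
  dst   = c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"),
  dport = c(22, 80, 22, 443),
  bytes = c(120, 1500, 300, 900),
  stringsAsFactors = FALSE
)

## Select the subset of interest (here: SSH traffic) and write it out as CSV.
ssh <- subset(pkts, dport == 22)
out <- tempfile(fileext = ".csv")
write.csv(ssh, out, row.names = FALSE)

## A simple summary for a report: total bytes per source address.
by.src <- aggregate(bytes ~ src, data = pkts, FUN = sum)
by.src
## barplot(by.src$bytes, names.arg = by.src$src)  # quick chart for management
```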

 
 Please excuse my ignorance.
 
 Thank you.
 
 Subba Rao
 




Re: [R] SVM

2009-09-17 Thread Saeed Abu Nimeh
Read "Support Vector Machines in R": http://www.jstatsoft.org/v15/i09/paper
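svmtrain() is, as far as I know, the MATLAB name; in the e1071 package training is done with svm() and prediction with predict(). A minimal sketch with simulated data (the matrix, labels and 700/300 split are assumptions for illustration). Note that svm() treats rows as cases, so svm(t(X)) on a 1000 x 12 matrix trains on 12 cases of 1000 variables, and that calling svm() without a response fits a one-class (novelty detection) model rather than a classifier.

```r
library(e1071)

set.seed(1)
## Hypothetical data: 1000 cases (rows) of 12 variables (columns).
X <- matrix(rnorm(1000 * 12), nrow = 1000, ncol = 12)

## With no response, svm() fits a one-class (novelty detection) model;
## predict() then returns TRUE/FALSE for "looks like the training data".
oc <- svm(X, type = "one-classification")
inside <- predict(oc, X)

## For ordinary classification, pass a factor response.
y <- factor(ifelse(X[, 1] + X[, 2] > 0, "a", "b"))  # hypothetical labels
train <- sample(nrow(X), 700)
fit <- svm(X[train, ], y[train], kernel = "radial")
pred <- predict(fit, X[-train, ])
mean(pred == y[-train])                             # hold-out accuracy
```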

On Thu, Sep 17, 2009 at 4:39 AM, Samuel Okoye samu...@yahoo.com wrote:
 Hello,

 I have 12 samples, each with 1000 observations, i.e. I have a matrix X
 with 1000 rows and 12 columns.

 m <- svm(t(X))
 p <- predict(m)

 Can anyone tell me how to use svmtrain() in R?

 Many thanks,
 Samuel









Re: [R] two questions for R beginners

2010-02-25 Thread Saeed Abu Nimeh
On Thu, Feb 25, 2010 at 9:31 AM, Patrick Burns pbu...@pburns.seanet.com wrote:
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

1- Compared to other programming languages, R is hard to learn by
example, because it is hard to find code on the web that does the
exact thing you are looking for (though sometimes you get lucky).
Perl, by contrast, is an easy language to learn by example.

2- The R mailing list. Beginners get frustrated after they struggle
for a long time to solve a problem, and the easiest thing then is to
send an email to the R mailing list. I did this in the past. The best
thing that happened was that my request was ignored and I had to
spend more time on the problem and eventually find a solution by
myself. Do not get me wrong, I am not saying that the mailing list
is bad, but it should be more organized, maybe broken down into a couple
of other mailing lists. This might bring up a good discussion thread.


 * What documents helped you the most in this
 initial phase?

An Introduction to R by Venables
simpleR – Using R for Introductory Statistics by Verzani



Re: [R] two questions for R beginners

2010-02-26 Thread Saeed Abu Nimeh
Pat,
Off the bat: beginners and advanced. In addition, splitting by domain
would be very helpful -- something along the lines of
http://cran.r-project.org/web/views/. But we should be careful; we do
not want to create 20 other mailing lists :) We have to group things.
This will help split the volume of the list and help in
targeting lists by expertise.
Thanks,
Saeed

On Fri, Feb 26, 2010 at 2:08 AM, Patrick Burns pbu...@pburns.seanet.com wrote:
 Saeed,

 If the R-help list were split, what do you
 see as the pieces?

 Pat


 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')




Re: [R] two questions for R beginners

2010-02-26 Thread Saeed Abu Nimeh

Hi Ivan,

On 2/26/10 6:30 AM, Ivan Calandra wrote:

You are definitely right...
What to do with bad beginner's questions is not a simple issue.

If a beginner's mailing list is created, who will answer to such
questions?


If I subscribe to the beginners mailing list, then I have to expect 
novice questions and I should be willing to help. Otherwise, I should 
not be there.


And moreover, the beginners won't take advantage of the other
questions (I've personally learned a lot trying to understand the
questions and answers to others' problems).


They can still subscribe to the advanced, but they will know that they 
are here to observe and learn, not to ask novice questions. You want to 
ask basic stuff, go to the beginners list :)


Not sure if you guys have been on some of the Linux mailing lists out
there, but man, let me tell you, some of these lists have an RTFM attitude
and they will fry you if you ask novice questions. Frankly, that is
understandable, as most of the members are geeks and they have higher
expectations. This mailing list is different: I have seen posts from
different disciplines (biology, biostats, stats, computer science,
oceanography, etc.). So, IMO, there should be a beginners list to cope
with such a broad committee.


Thanks,
Saeed

And also, as you said, the
problems might persist.
The beginner's mailing list might be good in one aspect though: the
experts who subscribe to it would be willing to help the beginners to
get started with R, knowing that the questions might not be clearly stated.

As you pointed out, the mailing list is not the best for basic stuff
(the question is of course what is basic?). Not everybody knows some
colleagues who work with R (I'm personally the 1st one to use R in my lab).
I think, somehow and I have no idea how, documentation and guidance to
search for help should be more accessible as soon as you start with R.
Maybe a _*clear*_ section on the R homepage or in the introduction to
R manual like where to find help, including all of the most common
and useful resources available (from ? and RSiteSearch() to R Wiki and
Crantastic).

I hope that this whole discussion might help to make the R world better.
Thank you Patrick for initiating it!
Regards,
Ivan

On 2/26/2010 15:09, Paul Hiemstra wrote:

Ivan Calandra wrote:

Since you want input from beginners, here are some thoughts

I had and still have two big problems with R:
- this vectorization thing. I've read many manuals (including R
inferno), but I'm still not completely clear about it. In simple
examples, it's fine. But when it gets a bit more complex, then...
Related to it, the *apply functions are still a bit difficult to
understand. When I have to use them, I just try one and see what
happens. I don't understand them well enough to know which one I need.
- the second problem is where to find the functions/packages I need.
There are many options, and that's actually the problem. R Wiki,
Rseek, RSiteSearch, Crantastic, etc... When you start with R, you
discover that the capabilities of R are almost unlimited and you
don't really know where to start, where to find what you need.
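For readers stuck on the same point, a small base R comparison may help: the same element-wise computation written as an explicit loop, as a vectorized call, and with sapply()/lapply(), plus apply() over the rows of a matrix. The numbers are arbitrary; this only illustrates which tool returns what.

```r
x <- c(4, 9, 16, 25)

## Explicit loop: one element at a time, result vector pre-allocated.
out.loop <- numeric(length(x))
for (i in seq_along(x)) out.loop[i] <- sqrt(x[i])

## Vectorized: sqrt() already works on the whole vector at once.
out.vec <- sqrt(x)

## sapply(): apply a function to each element, simplifying the result to a vector.
out.sapply <- sapply(x, function(v) sqrt(v))

## lapply(): same idea, but always returns a list.
out.lapply <- lapply(x, sqrt)

## apply(): apply a function over rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix.
m <- matrix(1:6, nrow = 2)
row.sums <- apply(m, 1, sum)  # same values as rowSums(m)
```

When a vectorized function exists (sqrt, rowSums, and so on), it is usually the simplest choice; the *apply family is for functions that only handle one element, row or column at a time.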

As noted in earlier posts, the mailing list is really great, but some
people are really hard with beginners. It was noted in a discussion a
few days ago, but it looks like some don't realize how difficult it
is at the beginning to formulate a good question, clear, with
self-contained example and so on. Moreover, not everybody speaks
English natively. I don't mean that you must help, even when the
question is really vague and not clear and whatever. I'm just saying
that if you don't want to help (whatever the reason), you don't have
to say it badly. But in any cases, the mailing list is still really
helpful. As someone noted (sorry I erased the email so I don't
remember who), it might be a good idea to split it.

Hi everyone,

My 2ct about the mailing list :). I understand that beginners have a
hard time formulating a good question. But the problem is that we
can't answer the question when it is unclear. So either I:

- Don't bother answering
- Try to discuss with the author of the question, taking lots of time
to find out what exactly the question is.
- Send a read the posting guide answer

I mostly do the first, as I have to get things done during my PhD :).
So this leaves us with kind of a problem, the person mailing the list
doesn't have the knowledge to ask the right question, the list can't
answer properly and consequently, the person mailing the list still
doesn't get the information he/she needs. We could start an R-beginner
mailing list, but this would also suffer from this problem. What do
you guys think?

Maybe the mailing list is not the right medium for really basic stuff.
For that I would recommend a good R-book or (better) a course in R or
(even better) some colleagues who work with R that you can ask
questions to.

cheers,
Paul


Hope that's what you wanted
Ivan


On 2/26/2010 08:39, Dieter Menne wrote:

Re: [R] two questions for R beginners

2010-02-26 Thread Saeed Abu Nimeh

Sorry, I meant "community", not "committee".


Re: [R] svm of e1071 package

2010-04-06 Thread Saeed Abu Nimeh
I think the problem is that you have R configured as 32-bit. If that
is the case, then you will only have access to 4 GB of RAM (see
http://www.brianmadden.com/blogs/brianmadden/archive/2004/02/19/the-4gb-windows-memory-limit-what-does-it-really-mean.aspx).
Try booting up an Ubuntu instance in the cloud and then install R
using the 64-bit configuration. I am interested to know if this solves
the problem. Let me know.
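Whether a running R is a 32-bit or 64-bit build can be checked from inside R, which is a quick way to test the hypothesis above before rebuilding anything. A minimal sketch; gib.for.matrix() is a hypothetical helper for back-of-the-envelope sizing and only counts one dense copy of a numeric matrix, not the extra copies or internal structures a model fit may create.

```r
## Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build.
bits <- 8 * .Machine$sizeof.pointer
bits

## Rough memory needed for one dense numeric matrix: 8 bytes per double.
gib.for.matrix <- function(nrow, ncol) nrow * ncol * 8 / 2^30
gib.for.matrix(1e6, 100)  # about 0.75 GiB before any copies

## object.size() reports what an existing object actually occupies.
m <- matrix(0, nrow = 1000, ncol = 100)
print(object.size(m), units = "Kb")
```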
Thanks,
Saeed

On Tue, Apr 6, 2010 at 5:07 AM, Shyamasree Saha [shs] s...@aber.ac.uk wrote:
 Hello List,

 I am having great trouble using the svm function in the e1071 package. I have 4 GB
 of data that I want to use to train an SVM. I am using the Amazon cloud; my Amazon
 Machine Image (AMI) has 34.2 GB of memory. My R process was killed several
 times when I tried to use 4 GB of data for svm. Now I am using a subset of
 that data and it is only 1.4 GB. I remove all unnecessary objects before
 calling svm(). I have monitored the memory consumption and found that before
 I call svm() my AMI has 25 GB of free memory. After calling svm(), this free
 memory starts going down, and at the end I have only 1.7 GB of memory and R
 gives me an error that it cannot create a vector of size 3.4 GB. It's true that if
 I do not have enough memory then R cannot create the vector. But my
 question is: how is the svm function eating up that 25 GB of memory? Is there
 anything I can do to solve this problem, or is it a problem in the e1071 package? By
 a problem in the e1071 package, I mean: does svm() in e1071 normally consume that
 much memory? If svm() really consumes this much memory, then I have to think of
 some other way to train the SVM. If 34 GB of RAM is not enough for 1.4 GB of data,
 then I am in trouble. Amazon has a maximum of 68.4 GB of RAM.

 Please help. Thanks in advance.

 Regards
 Shyama




Re: [R] In svm(), how to connect quantitative prediction result to categorical result?

2011-04-12 Thread Saeed Abu Nimeh
I trained a linear SVM and did classification. Looking at the model I
have, with a binary response 0/1, the decision values look like this:
head(svm.model$decision.values)
2.5
3.1
-1.0

looking at the fitted values
head(svm.model$fitted)
1
1
0
So it looks like anything less than or equal to 0 is mapped to the
negative class (i.e. 0); otherwise it is mapped to the positive class
(i.e. 1).
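The continuous predictions in the question most likely come from svm() fitting a regression, which it does when the response is numeric; converting the 0/1 response to a factor makes it fit a classifier. Which class ends up on the positive side of the decision value is, as I understand e1071/libsvm, given by the "A/B" column name of the decision values (positive favours A), so it is safer to read that name than to assume 0 is always negative. A minimal sketch using iris as stand-in data (an assumption for illustration):

```r
library(e1071)

## Two-class toy problem from iris (illustrative assumption): 0 = setosa, 1 = versicolor.
d <- subset(iris, Species != "virginica")
d$y <- factor(ifelse(d$Species == "versicolor", 1, 0))  # 0/1 stored as a factor
d$Species <- NULL

## A factor response makes svm() fit a classifier (C-classification);
## a numeric 0/1 response would make it fit a regression instead.
m <- svm(y ~ ., data = d, kernel = "linear")

pred <- predict(m, d, decision.values = TRUE)
dv <- attr(pred, "decision.values")
colnames(dv)                          # of the form "A/B"
lab <- strsplit(colnames(dv), "/")[[1]]

## Positive decision values should correspond to the class named first (A).
implied <- ifelse(dv[, 1] > 0, lab[1], lab[2])
table(predicted = pred, from.sign = implied)
```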



On Fri, Apr 8, 2011 at 8:35 PM, Li, Yunfei yunfei...@wsu.edu wrote:
 Hi,

 I am studying how to use the SVM functions of the e1071 package to do prediction, and I
 found that when the training responses are of factor type, predict() can
 predict the categories directly; but if the response variables are
 numerical, the predicted values from svm will be continuous quantitative
 numbers. How can I connect these quantitative numbers to categories?
 (For example, in an example data set the response variables are numerical and
 have two categories, 0 and 1, and the predicted values are continuous
 quantitative numbers from 0 to 1.3; how can I know which of them represent
 category 0 and which represent 1?)

 Best,

 Yunfei Li
 --
 Research Assistant
 Department of Statistics 
 School of Molecular Biosciences
 Biotechnology Life Sciences Building 427
 Washington State University
 Pullman, WA 99164-7520
 Phone: 509-339-5096
 http://www.wsu.edu/~ye_lab/people.html







Re: [R] prediction error in ROCR package when sampled y consists of only one class

2011-04-15 Thread Saeed Abu Nimeh
Try stratified sampling when doing cross-validation, so that each fold
contains both classes. See also cran.r-project.org/web/packages/ipred
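Stratified fold assignment can also be done directly in base R: assign folds separately within each class so every fold receives a share of both. This is a sketch of the idea, not ipred's API; stratified.folds() and the imbalanced labels are hypothetical, and every fold is only guaranteed both classes when each class has at least k members.

```r
## Assign each observation to one of k folds, separately within each class,
## so every fold gets a share of both classes (given >= k members per class).
stratified.folds <- function(y, k = 5) {
  folds <- integer(length(y))
  for (cl in unique(y)) {
    idx <- which(y == cl)
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

set.seed(42)
y <- c(rep(0, 90), rep(1, 10))  # imbalanced hypothetical labels
f <- stratified.folds(y, k = 5)
table(f, y)                     # each fold: 18 zeros and 2 ones
## e.g. test on fold 1: y[f == 1] now always contains both classes
```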

On Fri, Apr 15, 2011 at 11:00 AM, Soyeon Kim yunni0...@gmail.com wrote:
 Dear R users,

 Hi. I am using the prediction function in the ROCR package.
 y consists of two classes, 0 and 1.
 However, since I am using cross-validation, a small sampled subset of
 y may consist of only one class:
y
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 In this case, prediction function gives an error:
 Error in prediction(predic, y) : Number of classes is not equal to 2.
 ROCR currently supports only evaluation of binary classification tasks.

 How can I solve this problem?

 Thank you,
 Soyeon Kim





Re: [R] How to reference a package in academical paper

2011-03-07 Thread Saeed Abu Nimeh
http://www.iiap.res.in/astrostat/School07/R/html/utils/html/citation.html
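The page above documents utils::citation(), which returns the authors' preferred citation for R or for an installed package; toBibtex() turns it into a BibTeX entry. A minimal sketch using R itself, since a contributed package such as RODBC has to be installed before citation("RODBC") works:

```r
## Citation for R itself; toBibtex() gives a BibTeX entry for a .bib file.
cit <- citation()
print(cit)
toBibtex(cit)

## For an installed contributed package, pass its name, e.g.:
## citation("RODBC")
```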

On Mon, Mar 7, 2011 at 4:12 PM, Jan Hornych jh.horn...@gmail.com wrote:
 Dear,

 I am now writing a more formal academic paper, and would like to reference
 an R package. Do you have any recommendation on how to do it?

 Taking the RODBC package as an example, what would the reference
 look like?
 http://cran.r-project.org/web/packages/RODBC/index.html

 Thank you
 Jan



