Re: [R] nested if/else very slow, more efficient ways?

2006-10-23 Thread Mike Nielsen
One way that might do what you want is to change the character column
to a factor, and then apply as.numeric.

resultsfuzzy$x <- as.numeric(factor(resultsfuzzy$x, levels=c("5a","5b","5c","5d","5e")))

This assumes, of course, that you know that the levels are going to be
in the set {5a,5b,5c,5d,5e}.

However, it may be better to just leave it as a factor, depending upon
what you intend to do with it later.
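
For illustration, here is a minimal sketch of the factor approach on a small
made-up data frame. The id and x_num columns and the sample values are
assumptions, not taken from the original data. Note that, unlike the final
else in the loop, a value outside the five levels would become NA rather
than 5.

```r
# Toy stand-in for resultsfuzzy; the real data has 39150 rows.
resultsfuzzy <- data.frame(
  id = 1:6,
  x  = c("5a", "5c", "5e", "5b", "5d", "5a"),
  stringsAsFactors = FALSE
)

codes <- c("5a", "5b", "5c", "5d", "5e")

# Vectorised: the position of each value among the levels becomes its number.
resultsfuzzy$x_num <- as.numeric(factor(resultsfuzzy$x, levels = codes))

# match() gives the same codes without building a factor.
stopifnot(all(resultsfuzzy$x_num == match(resultsfuzzy$x, codes)))

print(resultsfuzzy)
```

Both forms work on the whole column at once, which is where the speedup
over the row-by-row loop would come from.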

Hope this helps.

Regards,

Mike

On 10/23/06, Kim Milferstedt [EMAIL PROTECTED] wrote:
 Hello,

 in the data.frame resultsfuzzy I would like to replace the
 characters in the second column (5a, 5b, ... 5e) with numbers
 from 1 to 5. The data.frame has 39150 entries. It seems to work on
 samples that are < nrow(resultsfuzzy) but it takes suspiciously long.

 Do you have any suggestions how to make the character replacing more 
 efficient?

 Code:

 for (i in 1:nrow(resultsfuzzy))
 {
 if (resultsfuzzy[i,2] == "5a"){resultsfuzzy[i,2] <- 1} else
  if (resultsfuzzy[i,2] == "5b"){resultsfuzzy[i,2] <- 2} else
  if (resultsfuzzy[i,2] == "5c"){resultsfuzzy[i,2] <- 3} else
  if (resultsfuzzy[i,2] == "5d"){resultsfuzzy[i,2] <- 4} else
  resultsfuzzy[i,2] <- 5
 }

 Thanks,

 Kim

 > version

 platform i386-pc-mingw32
 arch     i386
 os       mingw32
 system   i386, mingw32
 status
 major    2
 minor    2.1
 year     2005
 month    12
 day      20
 svn rev  36812
 language R

 __

 Kim Milferstedt
 University of Illinois at Urbana-Champaign
 Department of Civil and Environmental Engineering
 4125 Newmark Civil Engineering Laboratory
 205 North Mathews Avenue MC-250
 Urbana, IL 61801
 USA
 phone: (001) 217 333-9663
 fax: (001) 217 333-6968
 email: [EMAIL PROTECTED]
 http://cee.uiuc.edu/research/morgenroth

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Regards,

Mike Nielsen



Re: [R] discarding 'levels'

2006-10-04 Thread Mike Nielsen
From TFM of read.table:

 as.is: the default behavior of 'read.table' is to convert character
  variables (which are not converted to logical, numeric or
  complex) to factors.  The variable 'as.is' controls the
  conversion of columns not otherwise specified by
  'colClasses'. Its value is either a vector of logicals
  (values are recycled if necessary), or a vector of numeric or
  character indices which specify which columns should not be
  converted to factors.

You may have some blanks in the third column.

Factor levels whose character representation happens to be a numeral
don't necessarily compare equal to the integer with the same character
representation (if you get my drift...).

You can use as.numeric, but better would be to use colClasses in read.table.
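
To make the point concrete, here is a small sketch. The factor and the
three-line input are made up for illustration (the column names id, x and
flag are assumptions), and strip.white is set explicitly so that the stray
blanks do not get in the way.

```r
# A factor whose labels carry a leading blank, as in the quoted output.
flag <- factor(c(" 0", " 1", " NA"))

flag[1] == 0                      # FALSE: the label " 0" is compared with "0"
as.numeric(flag)                  # 1 2 3: internal level codes, not the values
suppressWarnings(as.numeric(as.character(flag)))  # 0 1 NA: converts the labels

# Declaring the column types up front avoids the factor entirely.
raw <- c("id,x,flag", "1,2.5, 0", "2,3.1, 1", "3,4.7, NA")
dat <- read.table(text = raw, header = TRUE, sep = ",", strip.white = TRUE,
                  colClasses = c("integer", "numeric", "integer"))
dat$flag[1] == 0                  # TRUE: the column is now integer
```

If the conversion has to happen after the fact, going through
as.character() first is the safe route; as.numeric() on the factor itself
returns the level codes.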

Regards,

Mike

On 10/4/06, hoopz [EMAIL PROTECTED] wrote:

 Ok, so I am using read.table to read a .txt file and put it into a matrix.
 There are some values that are 'NA'.  If I use read.table with as.is =FALSE,
 then some of the entries in the matrix return this:

  data[1,3]
 [1]  0
 Levels:  0  1  NA

 and if I do

  data[1,3]==0

 it returns FALSE.  It's a zero, it's not false!


 If I set as.is=TRUE, I don't get the levels problem, but in those entries
 where I did get the levels problem, this happens:

  data[1,3]
 [1]  0

 This time, it keeps it as a string.  I can use as.numeric to fix it now, but
 I'm just curious as to why this happens.


 Thanks




-- 
Regards,

Mike Nielsen



Re: [R] creation of new variables

2006-09-26 Thread Mike Nielsen
You may not have told us quite enough to be able to help you.  It may
be worth your while investing some time in describing the problem you
are trying to solve a little bit more comprehensively.

The posting guide http://www.R-project.org/posting-guide.html can be
useful in helping you frame a question that stands a better chance of
receiving help.

Regards,

Mike

On 9/26/06, nalluri pratap [EMAIL PROTECTED] wrote:
 Hello All,

   I have 8 variables named

a b c d e f g h

   I need to create four variables from these 8 variables in R.

   the new variables are ab,cd,ef,gh.

   Can anyone please help me

   thanks,
   Pratap







-- 
Regards,

Mike Nielsen



Re: [R] Passing R connection as argument to a shell command on Windows

2006-09-25 Thread Mike Nielsen
No, the cut command won't understand that z is an R connection and
not a file in the current working directory: there is no overlap
between the R object name space and the Windows object name space.

Unfortunately, you may be forced to unzip to a temporary file, and
then read from that.
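
A rough sketch of the temporary-file route follows. It swaps the external
cut command for substr() inside R, which is a different technique from the
shell call in the question. The function names cut_columns and
cut_zip_member are made up for this example, and the zip step is untested
against the poster's archive.

```r
# cut -c2-3,5-8 keeps characters 2-3 and 5-8 of every line; substr() does the
# same inside R, so no external command is needed.
cut_columns <- function(lines) {
  paste0(substr(lines, 2, 3), substr(lines, 5, 8))
}

# Hypothetical helper: extract one member to a temporary directory, cut it,
# write the result, then remove the extracted copy.
cut_zip_member <- function(zip_path, member, out_path) {
  extracted <- unzip(zip_path, files = member, exdir = tempdir())
  on.exit(unlink(extracted))
  writeLines(cut_columns(readLines(extracted)), out_path)
  invisible(out_path)
}

cut_columns(c("abcdefghij", "0123456789"))
```

With the file names from the question, the call would be
cut_zip_member("zipArchive.zip", "fileASCII.ASC", "test2.dat").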

One thing that you might want to try, if you're using cygwin, is to
create a named pipe, and use shell() with wait=FALSE to unzip and
pipe into cut and then output to the named pipe.  Open an R
connection for reading from the named pipe.  This leaves open the
question of how to deal with failures, and whether you can invoke a
command pipeline from R under Windows...

I haven't tried this, so if you manage to make it work, it may be
something that's of interest to the list in general.

Regards,

Mike

On 9/25/06, Anupam Tyagi [EMAIL PROTECTED] wrote:
 Hello, is there a way to pass a connection to a file in a zipped archive as
 argument (instead of a file name of unzipped file) to shell command cut. In
 general, is it possible to pipe output of a R function to a shell command? 
 How?

 I want to do something like:

 z = unz("zipArchive.zip", "fileASCII.ASC")
 # open connection
 open(z)
 # cut lines of the ASCII file in zipped archive at specific positions and send
 # results to another file.
 shell("cut -c2-3,5-8 z > test2.dat")

 Anupam.




-- 
Regards,

Mike Nielsen



Re: [R] Beginner Loop Question with dynamic variable names

2006-09-25 Thread Mike Nielsen
Is this what you had in mind?

> j <- data.frame(q1=rnorm(10),q2=rnorm(10))
> j
           q1         q2
1  -0.9189618 -0.2832102
2   0.9394316  1.1345975
3  -0.6388848  0.6850255
4   0.4938245 -0.5825715
5  -1.2885257 -0.2654023
6  -0.5278295  0.2382791
7   0.6517268  0.8923375
8   0.4124178  1.1231630
9  -0.1604982  0.2285672
10 -0.2369713  0.6130197
> for(i in 1:3){j[,paste(sep="","res",i)] <- with(j,q1+q2)}
> j
           q1         q2        res1        res2        res3
1  -0.9189618 -0.2832102 -1.20217207 -1.20217207 -1.20217207
2   0.9394316  1.1345975  2.07402913  2.07402913  2.07402913
3  -0.6388848  0.6850255  0.04614073  0.04614073  0.04614073
4   0.4938245 -0.5825715 -0.08874699 -0.08874699 -0.08874699
5  -1.2885257 -0.2654023 -1.55392802 -1.55392802 -1.55392802
6  -0.5278295  0.2382791 -0.28955044 -0.28955044 -0.28955044
7   0.6517268  0.8923375  1.54406433  1.54406433  1.54406433
8   0.4124178  1.1231630  1.53558084  1.53558084  1.53558084
9  -0.1604982  0.2285672  0.06806901  0.06806901  0.06806901
10 -0.2369713  0.6130197  0.37604847  0.37604847  0.37604847
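
Applied to the weighted sums in the question, the same paste() idea might
look like the sketch below. The toy data (three respondents, two questions)
is invented for illustration; only the q<i>_<k> naming pattern comes from
the original post.

```r
# Toy data: 3 respondents, 2 questions, 5 indicator columns per question.
results <- data.frame(
  q1_1 = c(1, 0, 0), q1_2 = c(0, 1, 0), q1_3 = c(0, 0, 0),
  q1_4 = c(0, 0, 0), q1_5 = c(0, 0, 1),
  q2_1 = c(0, 0, 0), q2_2 = c(0, 0, 1), q2_3 = c(1, 0, 0),
  q2_4 = c(0, 1, 0), q2_5 = c(0, 0, 0)
)

weights <- 1:5

for (i in 1:2) {                       # 1:20 in the original question
  # Build the five input column names for question i, e.g. "q1_1" ... "q1_5".
  item_cols <- paste("q", i, "_", 1:5, sep = "")
  # Weighted sum across those columns, stored under the name "q<i>".
  results[[paste("q", i, sep = "")]] <-
    as.vector(as.matrix(results[item_cols]) %*% weights)
}

results[c("q1", "q2")]
```

Indexing with [[ ]] and a constructed name is what stands in for
results$qi, since $ does not evaluate its right-hand side.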

Regards,

Mike
On 9/25/06, Peter Wolkerstorfer - CURE [EMAIL PROTECTED] wrote:
 Dear all,

 I have another small scripting-beginner problem which you hopefully can
 help:

 I compute new variables with:

 # Question 1
 results$q1 <- with(results, q1_1*1+ q1_2*2+ q1_3*3+ q1_4*4+ q1_5*5)
 # Question 2
 results$q2 <- with(results, q2_1*1+ q2_2*2+ q2_3*3+ q2_4*4+ q2_5*5)
 # Question 3
 results$q3 <- with(results, q3_1*1+ q3_2*2+ q3_3*3+ q3_4*4+ q3_5*5)
 # Question 4
 results$q4 <- with(results, q4_1*1+ q4_2*2+ q4_3*3+ q4_4*4+ q4_5*5)

 This is very inefficient so I would like to do this in a loop like:

 for (i in 1:20) {results$q1 <- with(results, q1_1*1+ q1_2*2+ q1_3*3+
 q1_4*4+ q1_5*5)}

 My question now:
 How to replace the 1-s (results$q1, q1_1...) in the variables with the
 looping variable?

 Here like I like it (just for illustration - of course I still miss the
 function to tell R that it should append the value of i to the variable
 name):

 # i is the number of questions - just an illustration, I know it does
 not work this way
 for (i in 1:20) {results$qi <- with(results, qi_1*1+ qi_2*2+ qi_3*3+
 qi_4*4+ qi_5*5)}

 Help would be greatly appreciated. Thanks in advance.

 Peter


 ___CURE - Center for Usability Research  Engineering___

 Peter Wolkerstorfer
 Usability Engineer
 Hauffgasse 3-5, 1110 Wien, Austria

 [Tel]  +43.1.743 54 51.46
 [Fax]  +43.1.743 54 51.30

 [Mail] [EMAIL PROTECTED]
 [Web]  http://www.cure.at




-- 
Regards,

Mike Nielsen



Re: [R] help with function

2006-09-20 Thread Mike Nielsen
Take the case of i==1.

  Ct[i] <- 1/Bq*Bt[i]*Cerr                     # Assign Ct[1] using Bt[1]
  Rt[i] <- Bt[i]/(a+b*Bt[i])                   # Assign Rt[1] using Bt[1]
  Bt[i+2] <- (1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i+1]  # Assign Bt[3] using Bt[2] and Rt[1] and **Ct[2]**


You're reading Ct[i+1] before you ever assign it, hence NA.
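
A stripped-down sketch of the same pattern shows how the NA spreads. The
constants 0.1, 0.2 and 0.7 and the starting value 100 are arbitrary
stand-ins, not the model parameters from the post.

```r
Bt <- c(100, 100)   # first two values are known up front
Rt <- numeric(0)
Ct <- numeric(0)

for (i in 1:3) {
  Ct[i] <- 0.1 * Bt[i]
  Rt[i] <- 0.2 * Bt[i]
  # Ct[i + 1] has not been assigned yet, so reading it yields NA,
  # and NA propagates through the arithmetic into Bt[i + 2].
  Bt[i + 2] <- 0.7 * Bt[i + 1] + Rt[i] - Ct[i + 1]
}

Bt   # the two starting values survive; everything computed afterwards is NA
```

Once Bt[3] is NA, every later Ct, Rt and Bt that depends on it is NA as
well, which matches the symptom of NAs for everything after the first two
years.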

OSISTM

Hope this helps,

Regards,

Mike



On 9/20/06, Guenther, Cameron [EMAIL PROTECTED] wrote:
 Hello everyone,

 I have a function here that I wrote but doesn't seem to work quite
 right.  Attached is the code.  In the calib function under the for loop
 Bt[i+2]<-(1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i+1] returns NA's for everything
 after years 1983 and 1984.  However the code works when it reads
 Bt[i+2]<-(1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i].  I don't quite understand why
 since it should be calculating all of the necessary inputs prior to
 calculating Bt[i+2].  Any help would be greatly appreciated.

 Thanks

 #Model parameters
 B0 <- 7500
 m <- 0.3
 R0 <- B0*m
 z <- 0.8
 a <- B0/R0*(1-(z-0.2)/(0.8*z))
 b <- (z-0.2)/(0.8*z*R0)
 dat <- data.frame(years=seq(1983,2004),cobs=c(19032324,19032324,17531618,
 20533029,20298099,20793744,23519369,23131780,19922247,17274513,17034419,
 12448318,4551585,4226451,7183688,7407924,7538366,7336039,8869193,7902341,
 6369089,6211886))
 stdr <- runif(100,0,0.5)
 stdc <- runif(100,0,0.5)
 BC <- runif(1000,0,100)


 #model calibration

 calib <- function(x){
  v <- sample(stdr,1)
  cr <- sample(stdc,1)
  N <- rnorm(1)
  Bq <- sample(BC,1)
  Rerr <- exp(N*v-(v^2/2))
  Cerr <- exp(N*cr-(cr^2/2))
  Bt <- vector();Bt[1]=B0;Bt[2]=B0
  Rt <- vector()
  Ct <- vector()
  for (i in 1:length(x$years)){
   Ct[i] <- 1/Bq*Bt[i]*Cerr
   Rt[i] <- Bt[i]/(a+b*Bt[i])
   Bt[i+2] <- (1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i+1]
  }
   out <- new.env()
   out$yr <- x$years[1:length(x$years)]
   out$Bt <- Bt[1:length(x$years)]
   out$Rt <- Rt[1:length(x$years)]
   out$Ct <- Ct[1:length(x$years)]
   out$stdr <- v
   out$stdc <- cr
   out$Bq <- Bq
   out$Rerr <- Rerr
   out$Cerr <- Cerr
   return(as.list(out))
  }
  test <- calib(dat)


 Cameron Guenther, Ph.D.
 Associate Research Scientist
 FWC/FWRI, Marine Fisheries Research
 100 8th Avenue S.E.
 St. Petersburg, FL 33701
 (727)896-8626 Ext. 4305
 [EMAIL PROTECTED]





-- 
Regards,

Mike Nielsen



Re: [R] Statitics Textbook - any recommendation?

2006-09-20 Thread Mike Nielsen
Excellent characterization.

MASS is a very good book, but I'm not sure I would describe it as a
statistics textbook, much less one of the basic variety.  While I
certainly wouldn't presume to speak for Prof. Ripley and Dr. Venables,
it seems unlikely their intent in writing MASS was to teach
statistics, but rather, as the name of the book might suggest, to
explain how S+ (and R) can be applied to modern statistical
techniques.  My experience with this book is that it assumes
considerable background knowledge.

By all means, buy MASS, but if you need guidance on the how and why of
statistical techniques, you may wish to shop Amazon to find a
supplement.

Regards,

Mike

On 9/20/06, Berton Gunter [EMAIL PROTECTED] wrote:
 Notwithstanding Prof. Heiberger's admirable enthusiasm, I think the
 canonical answer is probably MASS (Modern Applied Statistics with S) by
 Venables and Ripley. It is very comprehensive, but depending on your
 background, you may find it too telegraphic.

 -- Bert Gunter
 Genentech Non-Clinical Statistics
 South San Francisco, CA

 The business of the statistician is to catalyze the scientific learning
 process.  - George E. P. Box



  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Iuri Gavronski
  Sent: Wednesday, September 20, 2006 1:22 PM
  To: r-help@stat.math.ethz.ch
  Subject: [R] Statitics Textbook - any recommendation?
 
  I would like to buy a basic statistics book (experimental design,
  sampling, ANOVA, regression, etc.) with examples in R. Or
  download it
  in PDF or html format.
  I went to the CRAN contributed documentation, but there were only R
  textbooks, that is, textbooks where R is the focus, not the
  statistics. And I would like to find the opposite.
  Other text I am trying to find is multivariate data analysis (EFA,
  cluster, mult regression, MANOVA, etc.) with examples with R.
  Any recommendation?
 
  Thank you in advance,
 
  Iuri.
 
 





-- 
Regards,

Mike Nielsen



Re: [R] (no subject)

2006-09-18 Thread Mike Nielsen
On 18 Sep 2006 19:53:59 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote:
 Sarosh Jamal [EMAIL PROTECTED] writes:

  Hi there,
 
  I was updating the R-cmdr add-on (v.1.1-6 to the latest v.1.2) for R
  (v.2.2.0) in a SunOS9 environment and came across some warnings during my
  installation - it seems to download the dependencies but runs into the
  following during install:
 
  * Installing *source* package 'acepack' ...
  ** libs
  /opt/sfw/R/R-2.2.0/bin/SHLIB: make: not found
  ERROR: compilation failed for package 'acepack'
  /opt/sfw/R/R-2.2.0/bin/INSTALL: test: argument expected
  ERROR: failed to lock directory '/opt/sfw/R/R-2.2.0/library' for modifying
  Try removing '/opt/sfw/R/R-2.2.0/library/00LOCK'
 
  I don't see why I would have to remove the 00LOCK file since it seems to
  have been created by the very session of R I use to run install.packages().
 
  I'm attaching the complete log.
 
  Any insight or feedback will be much appreciated.

 Notice the _first_ issue reported:

 make: not found

 without a functioning make command, you're not likely to get
 anything to work. Presumably, since you have a functioning R, make
 is there somewhere, but you need to adjust your PATH. The rest could
 well just be consequences.

And please do use a meaningful subject line.  If you're the type who
likes to look through the list archives to try to solve problems,
you'll find that good subject lines are most helpful.


  Thank you,
 
 
  Sarosh Jamal
  Geo Computing  IT Specialist, Department of Geography
  University of Toronto at Mississauga
  e: [EMAIL PROTECTED]
 
 

 --
O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
 ~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907





-- 
Regards,

Mike Nielsen



Re: [R] opening files in directory

2006-09-04 Thread Mike Nielsen
R won't do variable interpolation inside quotation marks as perl does.

You could try amending your code with, for example,

file.name <- paste(sep="/", "data_files", files[[i]])
x <- read.table(file.name)
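
Put together, the whole loop might look like the following sketch. It uses
file.path(), which does the same job as the paste() call above. The
directory names, the two toy input files and the separate output directory
are all assumptions made so that the example runs on its own.

```r
# Hypothetical locations under a temporary directory.
data_dir <- file.path(tempdir(), "data_files")
out_dir  <- file.path(tempdir(), "dist_out")
dir.create(data_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Two small made-up data matrices standing in for the real files.
write.table(matrix(c(0, 3, 0, 4), nrow = 2), file.path(data_dir, "a.txt"))
write.table(matrix(1:6, nrow = 3), file.path(data_dir, "b.txt"))

files <- list.files(data_dir)

for (f in files) {
  # Build the path from the directory name and the *value* of f.
  x <- read.table(file.path(data_dir, f))
  d <- as.matrix(dist(x, method = "euclidean", diag = TRUE))
  write.table(d, file = file.path(out_dir, f))
}

list.files(out_dir)
```

Writing the results to a different directory keeps the inputs from being
overwritten; that is a design choice of this sketch, not something the
original code did.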

Regards,

Mike

On 9/4/06, Ffenics [EMAIL PROTECTED] wrote:
 Hi there
 I want to be able to take all the files in a given directory, read them in 
 one at a time, calculate a distance matrix for them (the files are data 
 matrices) and then print them out to separate files. This is the code I 
 thought I would be able to use
 (all files are in directory data_files)
 > for(i in 1:length(files))
 + {
 + x<-read.table("data_files/files[[i]]")
 + dist<-dist(x, method="euclidean", diag=TRUE)
 + mat<-as.matrix(dist)
 + write.table(mat, file=files[[i]])
 + }
 But I get this error when I try to open the first file using read.table
 Error in file(file, "r") : unable to open connection
 In addition: Warning message:
 cannot open file 'data_files/files[[i]]'
 if I try the read.table command without the quotation marks like so
 x<-read.table(data_matrix_files/files[[i]])
 I get the error
 Error in read.table(data_matrix_files/files[[i]]) :
 Object data_matrix_files not found
 But if I go to the directory where the files are kept before starting up R, 
 the read.table command without the quotation marks works.
 I don't want to start up R in the same directory as the where the files I 
 will be using reside though so how do I rectify this?
 Any help much appreciated





-- 
Regards,

Mike Nielsen



Re: [R] Can R compute the expected value of a random variable?

2006-08-26 Thread Mike Nielsen
Yes.

On 8/26/06, Paul Smith [EMAIL PROTECTED] wrote:
 Dear All

 Can R compute the expected value of a random variable?

 Thanks in advance,

 Paul




-- 
Regards,

Mike Nielsen



Re: [R] string-to-number

2006-08-21 Thread Mike Nielsen
Marc,

Thanks very much for this.  I hadn't really looked at Rprof in the
past; now I have a new toy to play with!

I have formulated an hypothesis that the reason parse/eval is quicker
lies in the pattern-matching code:  strsplit is using regular
expressions, whereas perhaps parse is using some more clever (but
possibly less general) matching algorithm.  It will be interesting to
inspect the source code to get to the bottom of it.
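
One way to probe that hypothesis without reading the source might be to
switch off regular-expression matching with the fixed argument of strsplit
and compare timings. The sketch below is only that, a sketch: the string
length is arbitrary, the variable names are made up, and the timings will
depend on the R version and machine, so none are claimed here. scan() on
the text is included as a further alternative.

```r
csv_string <- paste(1:1000, collapse = ",")   # length chosen arbitrarily

via_regex <- as.numeric(strsplit(csv_string, ",")[[1]])
via_fixed <- as.numeric(strsplit(csv_string, ",", fixed = TRUE)[[1]])
via_parse <- eval(parse(text = paste("c(", csv_string, ")")))
via_scan  <- scan(text = csv_string, sep = ",", quiet = TRUE)

# All four routes should agree on the result.
stopifnot(all(via_regex == via_fixed),
          all(via_regex == via_parse),
          all(via_regex == via_scan))

# Compare the regex and fixed-string splits; output is machine dependent.
system.time(strsplit(csv_string, ","))
system.time(strsplit(csv_string, ",", fixed = TRUE))
```

If the fixed-string split is much faster on long inputs, that would point
at the pattern matching; if not, the cost lies elsewhere in strsplit.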

Thanks again for your interest and efforts in this, and for pointing out Rprof!

Regards,

Mike Nielsen

On 8/20/06, Marc Schwartz [EMAIL PROTECTED] wrote:
 On Sat, 2006-08-19 at 10:25 -0600, Mike Nielsen wrote:
  Wow.  New respect for parse/eval.
 
  Do you think this is a special case of a more general principle?  I
  suppose the cost is memory, but from time to time a speedup like this
  would be very beneficial.
 
  Any hints about how R programmers could recognize such cases would, I
  am sure, be of value to the list in general.
 
  Many thanks for your efforts, Marc!

 Mike,

 I think that one needs to consider where the time is being spent and
 then adjust accordingly. Once you understand that, you can develop some
 insight into what may be a more efficient approach. R provides good
 profiling tools that facilitate this process.

 In this case, almost all of the time in the first two examples using
 strsplit(), is in that function:

  > repeated.measures.columns <- paste(1:10, collapse = ",")

  > library(utils)
  > Rprof(tmp <- tempfile())
  > res1 <- as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
  > Rprof()

  > summaryRprof(tmp)
 $by.self
                   self.time self.pct total.time total.pct
 strsplit              23.68     99.7      23.68      99.7
 as.double.default      0.06      0.3       0.06       0.3
 as.numeric             0.00      0.0      23.74     100.0
 unlist                 0.00      0.0      23.68      99.7

 $by.total
                   total.time total.pct self.time self.pct
 as.numeric             23.74     100.0      0.00      0.0
 strsplit               23.68      99.7     23.68     99.7
 unlist                 23.68      99.7      0.00      0.0
 as.double.default       0.06       0.3      0.06      0.3

 $sampling.time
 [1] 23.74


 Contrast that with Prof. Ripley's approach:

  > Rprof(tmp <- tempfile())
  > res3 <- eval(parse(text=paste("c(", repeated.measures.columns, ")")))
  > Rprof()

  > summaryRprof(tmp)
 $by.self
       self.time self.pct total.time total.pct
 parse      0.42     87.5       0.42      87.5
 eval       0.06     12.5       0.48     100.0

 $by.total
       total.time total.pct self.time self.pct
 eval        0.48     100.0      0.06     12.5
 parse       0.42      87.5      0.42     87.5

 $sampling.time
 [1] 0.48


 To some extent, one could argue that my initial timing examples are
 contrived, in that they specifically demonstrate a worst case scenario
 using strsplit().  Real world examples may or may not show such gains.

 For example with Charles' initial query, the initial vector was rather
 short:

  > repeated.measures.columns
   [1] "3,6,10"

 So if this was a one-time conversion, we would not see such significant
 gains.

 However, what if we had a long list of shorter entries:

  > repeated.measures.columns <- paste(1:10, collapse = ",")
  > repeated.measures.columns
 [1] "1,2,3,4,5,6,7,8,9,10"

  > big.list <- replicate(1, list(repeated.measures.columns))

  > head(big.list)
 [[1]]
 [1] "1,2,3,4,5,6,7,8,9,10"

 [[2]]
 [1] "1,2,3,4,5,6,7,8,9,10"

 [[3]]
 [1] "1,2,3,4,5,6,7,8,9,10"

 [[4]]
 [1] "1,2,3,4,5,6,7,8,9,10"

 [[5]]
 [1] "1,2,3,4,5,6,7,8,9,10"

 [[6]]
 [1] "1,2,3,4,5,6,7,8,9,10"


  > system.time(res1 <- t(sapply(big.list, function(x)
      as.numeric(unlist(strsplit(x, ","))))))
 [1] 1.972 0.044 2.411 0.000 0.000

  str(res1)
  num [1:1, 1:10] 1 1 1 1 1 1 1 1 1 1 ...

  head(res1)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    2    3    4    5    6    7    8    9    10
 [2,]    1    2    3    4    5    6    7    8    9    10
 [3,]    1    2    3    4    5    6    7    8    9    10
 [4,]    1    2    3    4    5    6    7    8    9    10
 [5,]    1    2    3    4    5    6    7    8    9    10
 [6,]    1    2    3    4    5    6    7    8    9    10



 Now use Prof. Ripley's approach:

  > system.time(res3 <- t(sapply(big.list, function(x)
      eval(parse(text=paste("c(", x, ")"))))))
 [1] 1.676 0.012 1.877 0.000 0.000

  str(res3)
  num [1:1, 1:10] 1 1 1 1 1 1 1 1 1 1 ...

  head(res3)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    2    3    4    5    6    7    8    9    10
 [2,]    1    2    3    4    5    6    7    8    9    10
 [3,]    1    2    3    4    5    6    7    8    9    10
 [4,]    1    2    3    4    5    6    7    8    9    10
 [5,]    1    2    3    4    5    6    7    8    9    10
 [6,]    1    2    3    4    5    6    7    8    9    10



  all(res1 == res3)
 [1] TRUE


 We do see a notable reduction in time with strsplit(), while a notable
 increase in time using eval

Re: [R] string-to-number

2006-08-19 Thread Mike Nielsen
Wow.  New respect for parse/eval.

Do you think this is a special case of a more general principle?  I
suppose the cost is memory, but from time to time a speedup like this
would be very beneficial.

Any hints about how R programmers could recognize such cases would, I
am sure, be of value to the list in general.

Many thanks for your efforts, Marc!

Regards,

Mike

On 8/19/06, Marc Schwartz [EMAIL PROTECTED] wrote:
 On Sat, 2006-08-19 at 13:30 +0100, Prof Brian Ripley wrote:
  On Sat, 19 Aug 2006, Marc Schwartz wrote:
 
   On Sat, 2006-08-19 at 07:58 -0400, Charles Annis, P.E. wrote:
Greetings, Amigos:
   
I have been trying without success to convert a character string,
> repeated.measures.columns
[1] "3,6,10"
   
into c(3,6,10) for subsequent use.
   
as.numeric(repeated.measures.columns) doesn't work (likely because of the
commas)
[1] NA
Warning message:
NAs introduced by coercion
   
I've tried many things including
strsplit(repeated.measures.columns, split = ",")
   
which produces a list with only one element, viz:
[[1]]
[1] "3"  "6"  "10"
   
as.numeric() doesn't like that either.
   
Clearly: 1) I cannot be the first person to attempt this, and 2) I've made
this WAY harder than it is.
   
Would some kind soul please instruct me (and perhaps subsequent searchers)
how to convert the elements of a string into numbers?
   
Thank you.
  
   One more step:
  
> as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
   [1]  3  6 10
  
   Use unlist() to take the output of strsplit() and convert it to a
   vector, before coercing to numeric.
 
  Or, more simply, use [[1]] as in
 
  as.numeric(strsplit(repeated.measures.columns, ",")[[1]])
 
  Also,
 
  eval(parse(text=paste("c(", repeated.measures.columns, ")")))
 
  looks competitive, and is quite a bit more general (e.g. allows spaces,
  works with complex numbers), or you can use scan() from an anonymous file
  or a textConnection.

 I would say more than competitive:

  > repeated.measures.columns <- paste(1:10, collapse = ",")

  > str(repeated.measures.columns)
  chr "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,4"| __truncated__


  > system.time(res1 <- as.numeric(unlist(strsplit(repeated.measures.columns, ","))))
 [1] 24.238  0.192 26.200  0.000  0.000

  > system.time(res2 <- as.numeric(strsplit(repeated.measures.columns, ",")[[1]]))
 [1] 24.313  0.196 26.471  0.000  0.000

  > system.time(res3 <- eval(parse(text=paste("c(", repeated.measures.columns, ")"))))
 [1] 0.328 0.004 0.395 0.000 0.000


  str(res1)
  num [1:10] 1 2 3 4 5 6 7 8 9 10 ...

  str(res2)
  num [1:10] 1 2 3 4 5 6 7 8 9 10 ...

  str(res3)
  num [1:10] 1 2 3 4 5 6 7 8 9 10 ...


  all(res1 == res2)
 [1] TRUE

  all(res1 == res3)
 [1] TRUE


 Best regards,

 Marc





-- 
Regards,

Mike Nielsen



Re: [R] Autocompletion

2006-08-16 Thread Mike Nielsen
I mostly use R under Linux and Xemacs with the truly wonderful ESS (Emacs
Speaks Statistics).  It has numerous features, one of which is a pretty
comprehensive auto-completion facility.

The few times I have used R under Windows, any auto-completion feature that
may be there did not fall readily to hand (translate:  I poked a few keys
and didn't notice anything auto-completing), and I didn't spend any time
looking for one.

As there are several platforms on which R runs, and a number of interactive
interfaces, you may need to be a bit more specific in the framing of your
question to get a more informative response.

Regards,

Mike

On 8/16/06, Óttar Ísberg [EMAIL PROTECTED] wrote:

 Hi there!

 I may be guilty of not doing my homework, but still, I've searched. I'm a
 relative newcomer to R (my forte is at present MATLAB, but for various
 reasons I'm trying to get literate in R). My question is: Is there an
 autocompletion feature buried somewhere in R?

 All the best

 Óttar






-- 
Regards,

Mike Nielsen



Re: [R] random section of samples based on group membership

2006-07-24 Thread Mike Nielsen
Well, how you do it might be a matter of taste with respect to how you
want the results.

You could try using by with sample

by(x,x[,3],function(y){y[sample(nrow(y),1),]})

This will return a list with one list element for each sample group.
You can then combine the list back into a matrix.
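
As a sketch of both steps on made-up data (the column names v1, v2 and
group, the group sizes and the seed are assumptions; the real data has the
grouping in a column of a 474-row matrix, which by() would first convert to
a data frame):

```r
set.seed(42)   # only so the toy example is repeatable

# Nine samples in three groups of unequal size.
x <- data.frame(
  v1    = rnorm(9),
  v2    = rnorm(9),
  group = rep(1:3, times = c(4, 3, 2))
)

# One randomly chosen row per group; the result is a list-like "by" object.
picked <- by(x, x[, 3], function(y) y[sample(nrow(y), 1), ])

# Stack the per-group rows back into a single data frame.
one_per_group <- do.call(rbind, picked)
one_per_group
```

as.matrix(one_per_group) would give a matrix again if the clustering
function needs one.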

That's my naive solution; no doubt there will be half a dozen better
ways to go about it.

Also, some of the clustering functions I have seen will sample for you.


On 7/24/06, Wade Wall [EMAIL PROTECTED] wrote:
 Hi all,

 I have a matrix of 474 rows (samples) with 565 columns (variables).
 each of the 474 samples belong to one of 120 groups, with the
 groupings as a column in the above matrix. For example, the group
 column would be:

 1
 1
 1
 2
 2
 2
 .
 .
 .
 120
 120

 I  want to randomly select one from each group.  Not all the groups
 have the same number of samples, some have 4, some 3 etc.  Is there a
 function to do this, or would I need to write a looping statement to
 look at each successive group?

 I basically want to combine the randomly selected samples from the 120
 groups into a new matrix in order to perform a cluster analysis.

 Thanks,
 Wade





-- 
Regards,

Mike Nielsen



Re: [R] Error Calculating Mean

2006-07-09 Thread Mike Nielsen
I'd hazard a guess that data.linear$Weight contains an NA ("Not
Available", i.e. missing) data point.

> f <- c(NA,rnorm(10))
> mode(f)
[1] "numeric"
> mean(f)
[1] NA

If you'd like to compute the mean anyway, you can use
> mean(f,na.rm=TRUE)
[1] 0.3433036
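
To confirm the guess before dropping anything, the missing values can be counted and located first. A small sketch using the same toy vector f (the seed is only there to make the example repeatable); sd() is included because a normal density overlay would need it too, and it takes the same na.rm argument:

```r
set.seed(1)
f <- c(NA, rnorm(10))       # one missing value, as in the example above

n_missing <- sum(is.na(f))  # how many values are missing
where     <- which(is.na(f))  # positions of the missing values

m <- mean(f, na.rm = TRUE)  # mean of the non-missing values
s <- sd(f, na.rm = TRUE)    # standard deviation of the non-missing values
```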


On 7/9/06, justin rapp [EMAIL PROTECTED] wrote:
 I have a vector containing players' weights.  When I enter


 mode(data.linear$Weight)
 numeric is returned.


 When I type mean(data.linear$Weight)

 NA is returned.

 Any ideas as to why this may be the case?  I am trying to calculate
 this ultimately so I can superimpose a normal density line over a
 histogram containing the weights?




-- 
Regards,

Mike Nielsen



[R] Combining a list of similar dataframes into a single dataframe

2006-07-08 Thread Mike Nielsen
I would be very grateful to anyone who could point to the error of my
ways in the following.

I have a dataframe called net1, as such:

> str(net1)
`data.frame':	114192 obs. of  9 variables:
 $ server         : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts             :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" ...
 $ instance       : Factor w/ 22 levels "1","2","Compaq Ethernet_Fast Ethernet Adapter_Module",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ instanceno     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ perftime       : num  3.16e+13 3.16e+13 3.16e+13 3.16e+13 3.16e+13 ...
 $ perffreq       : num  6.99e+08 6.99e+08 6.99e+08 6.99e+08 6.99e+08 ...
 $ perftime100nsec: num  1.28e+17 1.28e+17 1.28e+17 1.28e+17 1.28e+17 ...
 $ countername    : Factor w/ 4 levels "Bytes Received/sec",..: 1 3 2 4 1 3 2 4 1 3 ...
 $ countervalue   : num  6.08e+07 6.64e+07 5.58e+06 1.00e+08 6.09e+07 ...


What I am trying to do is subset this thing down by server, instance,
instanceno, countername and then apply a function to each subsetted
dataframe.  The function performs a calculation on countervalue,
essentially collapsing instanceno and instance down to a single
value.

Here is a snippet of my code:
t1 <- by(net1,
         list(
           net1$server,
           # get rid of unused levels of countername for this server
           factor(as.character(net1$countername))),
         function(x){
           g <- by(x,
                   list(
                     # get rid of unused levels of instance for this server
                     factor(as.character(x$instance)),
                     # same with instanceno
                     factor(as.character(x$instanceno))),
                   function(y){c(NA,mean(y$perffreq)*diff(y$countervalue)/diff(y$perftime))})
           data.frame(server=x$server,
                      ts=x$ts,
                      countername = x$countername,
                      countervalue = apply(sapply(g[!sapply(g,is.null)],I),1,sum))
         })

So t1 is then a list of dataframes, each with an identical set of columns.

> str(t1[[1]])
`data.frame':	149 obs. of  4 variables:
 $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
 $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countervalue: num  NA  938  816 4213  906 ...
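
The leading NA in countervalue comes from the inner calculation: diff() returns one value fewer than it is given, so the first sample has no rate. A toy illustration of that step, with made-up numbers and a hypothetical second interface (rate2); rowSums() stands in here for the apply(..., 1, sum) used in the snippet and assumes every interface has the same number of samples:

```r
# Hypothetical samples from one interface counter (values are made up).
y <- data.frame(perftime     = c(0, 10, 20, 30),
                perffreq     = c(2, 2, 2, 2),
                countervalue = c(100, 150, 250, 250))

# Change in the counter per tick of perftime, scaled by the mean perffreq.
# diff() yields n - 1 values, so NA pads the first sample.
rate <- c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))

# A second, equally long series for another interface (also made up).
rate2 <- c(NA, 5, 5, 5)

# Collapse the interfaces to one value per sample time.
total <- rowSums(cbind(rate, rate2))
```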

What I'd dearly love to do, without looping or lapply-ing through t1
and rbinding (too much data for this to finish quickly enough -- this
is about 10% of what I'm eventually going to have to manage), is
convert t1 to one big dataframe.

On the other hand, I admit that I may be going about this wrongly from
the start; perhaps there's a better approach?

Any pointers would be most gratefully received.

Many thanks!


-- 
Regards,

Mike Nielsen



Re: [R] Combining a list of similar dataframes into a single dataframe

2006-07-08 Thread Mike Nielsen
Well, this worked, and rather more quickly than I had expected.

Many thanks to the dogs, who told me the answer in return for walking
them and feeding them!

> jj <- eval(parse(text=paste(sep="","rbind(",paste(sep="",
+ "t1[[",1:length(t1),"]]",collapse=","),")")))
> str(jj)
`data.frame':	85644 obs. of  4 variables:
 $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
 $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countervalue: num  NA  938  816 4213  906 ...
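
The same single rbind() call can probably be had without building the command as text, by handing the list to do.call(). A sketch on a toy list (t1_toy and its contents are invented for illustration; the NULL element mimics an empty cell of a by() result):

```r
# Toy stand-in for t1: a list of data frames with identical columns.
t1_toy <- list(
  data.frame(server = "A", countervalue = c(NA, 1, 2)),
  NULL,
  data.frame(server = "B", countervalue = c(NA, 3))
)

# One rbind() call over all list elements; NULL elements are skipped.
jj_toy <- do.call(rbind, t1_toy)

# The eval(parse()) construction from above, applied to the toy list.
cmd <- paste(sep = "", "rbind(",
             paste(sep = "", "t1_toy[[", seq_along(t1_toy), "]]", collapse = ","),
             ")")
jj_eval <- eval(parse(text = cmd))
```

Both forms make one rbind() call with every element as an argument, so they should give the same data frame; do.call() just avoids pasting and parsing the command string.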



-- 
Regards,

Mike Nielsen
