### Re: [R] plot legend: combining filled boxes and lines

Check out: http://tolstoy.newcastle.edu.au/R/e2/help/07/05/16777.html

On 9/10/07, Lauri Nikkinen [EMAIL PROTECTED] wrote:

> Hello,
>
> I have difficulties combining boxes and lines in plot legend. I searched previous R-posts and found this (with no solution): http://tolstoy.newcastle.edu.au/R/help/06/07/30248.html. Is there a way to avoid boxes behind the line legends?
>
>     x1 <- rnorm(100)
>     x2 <- rnorm(100, 2)
>     hist(x1, main = "", col = "orange", ylab = "density", xlab = "x", freq = F, density = 55, xlim = c(-2, 5), ylim = c(0, 0.5))
>     par(new = T)
>     hist(x2, main = "", col = "green", ylab = "", xlab = "", axes = F, xlim = c(-2, 5), ylim = c(0, 0.5), density = 45, freq = F)
>     abline(v = mean(x1), col = "orange", lty = 2, lwd = 2.5)
>     abline(v = mean(x2), col = "green", lty = 2, lwd = 2.5)
>     legend(3, 0.45, legend = c("x1", "x2", "mean(x1)", "mean(x2)"), col = c("orange", "green"), fill = c("orange", "green", 0, 0), lty = c(0, 0, 2, 2), merge = T)
>
> Thanks
> Lauri

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
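One possible way to get a legend without boxes behind the line entries, sketched here under the assumption that solid squares are an acceptable stand-in for the shaded boxes: leave out `fill` and use `pch = 15` for the histogram entries and `lty` for the mean lines, with `NA` wherever no symbol or line should be drawn. This is not taken from the linked post; it only uses standard `legend()` arguments, and `set.seed()` and `add = TRUE` are choices made for the sketch.

```r
set.seed(1)  # only to make the sketch reproducible
x1 <- rnorm(100)
x2 <- rnorm(100, 2)

hist(x1, main = "", col = "orange", ylab = "density", xlab = "x",
     freq = FALSE, density = 55, xlim = c(-2, 5), ylim = c(0, 0.5))
# add = TRUE overlays the second histogram instead of using par(new = TRUE)
hist(x2, col = "green", density = 45, freq = FALSE, add = TRUE)
abline(v = mean(x1), col = "orange", lty = 2, lwd = 2.5)
abline(v = mean(x2), col = "green", lty = 2, lwd = 2.5)

# Squares (pch = 15) for the histograms, dashed lines for the means;
# NA suppresses the symbol or the line for an entry.
lg <- legend(3, 0.45,
             legend = c("x1", "x2", "mean(x1)", "mean(x2)"),
             col = c("orange", "green", "orange", "green"),
             pch = c(15, 15, NA, NA), pt.cex = 2,
             lty = c(NA, NA, 2, 2), lwd = 2.5)
```

Because no `fill` is given, no box is drawn for any entry, so nothing sits behind the dashed lines.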

### Re: [R] off-topic: better OS for statistical computing

You want whatever all the people you are working with are using, to make it as easy as possible to work together with them.

On 9/10/07, Wensui Liu [EMAIL PROTECTED] wrote:

> Good morning, everyone,
>
> I am sorry for this off-topic post but think I can get great answer from this list. My question is what is the best OS on PC (laptop) for statistical computing and why. I really appreciate your insight.
>
> Have a nice day.

### Re: [R] off-topic: better OS for statistical computing

My sense is that R users are evenly split between UNIX and Windows users, so either will do in terms of the larger community.

Some R packages may not be available on every platform, or will be available on one platform before another, or there will be certain platform-specific issues. So in the end it's easiest to have the same thing as everyone else you work with. Also, if you run into problems you can ask others, whereas if you are the lone person with something different you have no one to turn to.

Associated software also differs: for example, Microsoft Office in a Microsoft environment and LaTeX in a UNIX environment. Networking will be simplified in a consistent environment too. Certainly there is Open Office, Samba and putty, but the easiest is not to have to worry about getting everything to work together, by having the same thing in the first place.

Neither Linux nor Windows is superior to the other. People making such representations generally know one much better than the other, and it's more a reflection of their own experience than anything else. I personally have used both UNIX and Windows since their inception and find that I tend to have a slight preference for whatever I used last. Technical merits of one vs. the other are basically irrelevant for most purposes.

On 9/10/07, Patrick Connolly [EMAIL PROTECTED] wrote:

> On Mon, 10-Sep-2007 at 12:26PM -0400, Gabor Grothendieck wrote:
>
>> You want whatever all the people you are working with are using
>> to make it as easy as possible to work together with them.
>
> Assuming you're using R, there is negligible difficulty using a different OS from what your colleagues use (apart from the inconsistencies you get between different versions of Windows, but even that has little effect on R). The standard .RData binary files work with Windows and Linux (and probably OS X).
>
> The only issue I come across is that Linux can't create WMF files as readily as Windows can, and that is more than made up for by the greater flexibility that Linux offers. It's easier in Linux to produce Excel files from dataframes and matrices using a perl script posted to this list by Marc Schwartz. Thanks again Marc.
>
> Best
> Patrick
>
>> On 9/10/07, Wensui Liu [EMAIL PROTECTED] wrote:
>>
>>> Good morning, everyone,
>>> I am sorry for this off-topic post but think I can get great answer from this list.
>>> My question is what is the best OS on PC (laptop) for statistical computing and why.
>>> I really appreciate your insight.
>>> Have a nice day.

### Re: [R] finding the minimum positive value of some data

Here are some solutions, each of which

1. has only one line,
2. uses x only once, so you can just plug in a complex expression,
3. leaves no temporary variables behind.

    min(sapply(x, function(z) if (z > 0) z else Inf))
    (function(z) min(ifelse(z > 0, z, Inf)))(x)
    with(list(z = x), min(z[z > 0]))
    local({ z <- x; min(z[z > 0]) })

On 9/10/07, dxc13 [EMAIL PROTECTED] wrote:

> useRs,
>
> I am looking to find the minimum positive value of some data I have. Currently, I am able to find the minimum of data after I apply some other functions to it:
>
>     > x
>      [1]  1  0  1  2  3  3  4  5  5  5  6  7  8  8  9  9 10 10
>     > sort(x)
>      [1]  0  1  1  2  3  3  4  5  5  5  6  7  8  8  9  9 10 10
>     > diff(sort(x))
>      [1] 1 0 1 1 0 1 1 0 0 1 1 1 0 1 0 1 0
>     > min(diff(sort(x)))
>     [1] 0
>
> The minimum is given as zero, which is clearly true, but I am interested in only the positive minimum, which is 1. Can I find this by using only 1 line of code, like I have above?
>
> Thanks!
> dxc13
>
> View this message in context: http://www.nabble.com/finding-the-minimum-positive-value-of-some-data-tf4417250.html#a12599319
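As a worked check, here is one of the one-liners applied to the data from the question, with `diff(sort(x))` plugged in as the expression. The variable name `min_pos` is only for illustration.

```r
x <- c(1, 0, 1, 2, 3, 3, 4, 5, 5, 5, 6, 7, 8, 8, 9, 9, 10, 10)

# Plug the whole expression in once; z exists only inside with()
min_pos <- with(list(z = diff(sort(x))), min(z[z > 0]))
min_pos
# [1] 1
```

Note that the two subsetting variants call `min()` on an empty vector when there is no positive value, which returns `Inf` with a warning; the `sapply`/`ifelse` variants return `Inf` silently.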

### Re: [R] what am I missing

It's a FAQ: http://hermes.sdu.dk/Rdoc/faq.html#Why%20does%20outer()%20behave%20strangely%20with%20my%20function%3f

On 9/10/07, Jan de Leeuw [EMAIL PROTECTED] wrote:

>     x <- seq(-1,1,length=10)
>     y <- seq(-1,1,length=10)
>     a <- matrix(c(1,2,2,1),2,2)
>     b <- matrix(c(2,1,1,2),2,2)
>     fv <- function(x,y) {
>       m <- x*a+y*b
>       t <- m[1,1]+m[2,2]; d <- m[1,1]*m[2,2]-m[1,2]^2
>       return((t-sqrt(t^2-4*d))/2)
>     }
>     gv <- function(x,y) {
>       t <- x*(a[1,1]+a[2,2])+y*(b[1,1]+b[2,2])
>       d <- (x*a[1,1]+y*b[1,1])*(x*a[2,2]+y*b[2,2])-(x*a[1,2]+y*b[1,2])^2
>       return((t-sqrt(t^2-4*d))/2)
>     }
>
> now outer(x,y,gv) works as expected, outer(x,y,fv) bombs. But
>
>     z <- matrix(0,10,10); for (i in 1:10) for (j in 1:10) z[i,j] <- fv(x[i],y[j])
>
> works fine. Must be something in outer().
>
> Jan de Leeuw
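The point of the FAQ entry is that `outer()` calls the function once with two long vectors rather than once per pair, so the function has to be vectorized. `fv` is not, because `x*a` only makes sense for a scalar `x`. A minimal sketch of one common fix, wrapping the function in `Vectorize()` (the names `z_outer` and `z_loop` are only for illustration):

```r
x <- seq(-1, 1, length.out = 10)
y <- seq(-1, 1, length.out = 10)
a <- matrix(c(1, 2, 2, 1), 2, 2)
b <- matrix(c(2, 1, 1, 2), 2, 2)

# Smaller eigenvalue of the symmetric 2x2 matrix x*a + y*b (scalar x, y only)
fv <- function(x, y) {
  m <- x * a + y * b
  t <- m[1, 1] + m[2, 2]
  d <- m[1, 1] * m[2, 2] - m[1, 2]^2
  (t - sqrt(t^2 - 4 * d)) / 2
}

# Vectorize() wraps fv so that it is applied to each (x, y) pair in turn
z_outer <- outer(x, y, Vectorize(fv))

# The explicit double loop from the question, for comparison
z_loop <- matrix(0, 10, 10)
for (i in 1:10) for (j in 1:10) z_loop[i, j] <- fv(x[i], y[j])

all.equal(z_outer, z_loop)
```

`gv` needs no wrapper because it only uses elementwise arithmetic on `x` and `y`.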

### Re: [R] SQL like function?

Great. Regarding the web, note that there are actually quite a few R web projects as well:

http://www.lmbe.seu.edu.cn/CRAN/doc/FAQ/R-FAQ.html#R-Web-Interfaces

I have used rpad (www.rpad.org), which has an integrated web server right in the R package, making setup a non-issue.

On 9/8/07, Takatsugu Kobayashi [EMAIL PROTECTED] wrote:

> Hi Gabor,
>
> Wow, this is awesome. Although I eventually should learn MySQL for integrating it on web-based DB management using PHP or Perl, this is a very helpful tool for me to start with!
>
> Thank you very much

### Re: [R] Lisp-like primitives in R

On 9/8/07, Peter Dalgaard [EMAIL PROTECTED] wrote:

> François Pinard wrote:
>
>> [Roland Rau]
>>
>>>> [François Pinard]
>>>> I wonder what happened, for R to hide the underlying Scheme so fully, at least at the level of the surface language (despite there are hints).
>>>
>>> To further foster portability, we chose to write R in ANSI C
>>
>> Yes, of course. Scheme is also (often) implemented in C. I meant that R might have implemented a Scheme engine (or part of a Scheme engine, extended with appropriate data types) with a surface language (nearly the S language) which is purposely not Scheme, but could have been.
>>
>> If the gap is not extreme, one could dare dreaming that the Scheme engine in R be completed, and Scheme offered as an alternate extension language. If you allow me to continue dreaming awake -- they told me they will let me free as long as I do not get dangerous! :-) -- part of the interest lies in the fact there are excellent Scheme compilers. If we could only find or devise some kind of marriage between a mature Scheme and R, so to speed up the non-vectorisable parts of R scripts...
>
> Well, depending on what you want, this is either trivial or impossible... The internal storage of R is still pretty much equivalent to scheme. E.g. try this:
>
>     > r2scheme <- function(e) if (!is.recursive(e)) deparse(e) else
>     +     c("(", unlist(lapply(as.list(e), r2scheme)), ")")
>     > paste(r2scheme(quote(for(i in 1:4) print(i))), collapse = " ")
>     [1] "( for i ( : 1 4 ) ( print i ) )"

Also see showTree in codetools:

    > library(codetools)
    > showTree(quote(for(i in 1:4) print(i)))
    (for i (: 1 4) (print i))

### Re: [R] SQL like function?

Others have already pointed out %in%, but regarding your comment about SQL, you can use SQL to manipulate R data frames using the sqldf package, which provides an interface to lower level RSQLite (and RMySQL in the future) routines. The following examples use SQLite underneath:

    DF <- data.frame(observation = c(1,2,3,4,5))
    ID <- data.frame(ID = c(1, 3, 4))

    library(sqldf)
    sqldf("select observation, observation in (select * from ID) `ID?` from DF")

    # or

    sqldf("select observation, observation in (1, 3, 4) `ID?` from DF")

See home page at: http://sqldf.googlecode.com

On 9/7/07, Takatsugu Kobayashi [EMAIL PROTECTED] wrote:

> Hi RUsers,
>
> I am wondering if I can search observations whose IDs match any of the values in another vector, such as in MySQL. While I am learning MySQL for future database management, I would appreciate it if anyone could give me a hint.
>
> Suppose I have one 5*1 vector containing observation IDs and frequencies, and one 3*1 vector containing observation IDs.
>
>     observation <- c(1,2,3,4,5)
>     ID <- c(1,3,4)
>
> Then, I would like to program a code that returns a result showing matched observations like
>
>     result: TRUE FALSE TRUE TRUE FALSE
>
> I am reading S programming, but I cannot find a way to do this.
>
> Thank you very much.
>
> Taka
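For completeness, the `%in%` approach mentioned at the top of the reply needs no package at all. A minimal sketch with the vectors from the question:

```r
observation <- c(1, 2, 3, 4, 5)
ID <- c(1, 3, 4)

# TRUE where an observation ID appears anywhere in ID
result <- observation %in% ID
result
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# The logical vector can be used directly to pick out the matches
observation[result]
# [1] 1 3 4
```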

### Re: [R] help on replacing values

Your columns are factors, not character strings. Use as.is = TRUE as an argument to read.table. Also, it's a bit dangerous to use T, although not wrong. It's safer to use TRUE.

On 9/7/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> Dear List,
>
> I have a newbie question. I have read in a data.frame as follows:
>
>     > data = read.table("table.txt", header = T)
>     > data
>       X1 X2 X3 X4
>     A AB AC AB AC
>     B AB AC AA AB
>     C AA AB AA AB
>     D AA AB AB AC
>     E AB AA AA AB
>     F AB AA AB AC
>     B AB AC AB AA
>
> I would like to replace AA values by BB in column X2. I have tried using replace() with no success, although I am not sure this is the right function. This is the code I have used:
>
>     data$X2 <- replace(data$X2, data$X2 == "AA", "BB")
>     Warning message:
>     invalid factor level, NAs generated in: `[<-.factor`(`*tmp*`, list, value = "BB")
>
> What is wrong with the code? How can I get this done? How about changing AA values by BB in all 4 columns simultaneously? Actually this is a small example dataframe; the real one would have about 1000 columns.
>
> Extending this, I found a similar thread dated July 2006 that used replace() on the iris dataset, but I have tried reproducing it, obtaining the same warning message:
>
>     iris$Species <- replace(iris$Species, iris$Species == "setosa", "NewName")
>     Warning message:
>     invalid factor level, NAs generated in: `[<-.factor`(`*tmp*`, list, value = "NewName")
>
> Thanks in advance for your help,
>
> David
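A minimal sketch of the suggestion above, assuming the data can be read as character columns. The inline `text =` argument stands in for the poster's table.txt, only three of the rows are used, and the names `dat` and `f` are illustrative. The last lines show one way to rename a level when a column has to stay a factor.

```r
txt <- "X1 X2 X3 X4
AB AC AB AC
AA AB AA AB
AB AA AA AB"

# as.is = TRUE keeps the columns as character strings instead of factors
dat <- read.table(text = txt, header = TRUE, as.is = TRUE)

# Replace in a single column
dat$X2[dat$X2 == "AA"] <- "BB"

# Replace in every column at once; dat == "AA" is a logical matrix
dat[dat == "AA"] <- "BB"

# If a column must remain a factor, rename the level instead
f <- factor(c("AB", "AA", "AA"))
levels(f)[levels(f) == "AA"] <- "BB"
```

The matrix-style assignment scales to many columns, since it does not loop over them explicitly.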

### Re: [R] variable format

A matrix is for situations where every element is of the same class, but your columns have different classes, so use a data frame:

    DF <- data.frame(a = 11:15, b = letters[1:5], stringsAsFactors = FALSE)
    subset(DF, a %in% 11:13)
    subset(DF, a %in% c(0, 11:13)) # same

Suggest you review the Introduction to R manual and look at ?data.frame, ?subset and ?"%in%"

On 9/4/07, Cory Nissen [EMAIL PROTECTED] wrote:

> Okay, I want to do something similar to SAS proc format.
>
> I usually do this...
>
>     a <- NULL
>     a$divisionOld <- c(1,2,3,4,5)
>     divisionTable <- matrix(c(1, "New England",
>                               2, "Middle Atlantic",
>                               3, "East North Central",
>                               4, "West North Central",
>                               5, "South Atlantic"),
>                             ncol=2, byrow=T)
>     a$divisionNew[match(a$divisionOld, divisionTable[,1])] <- divisionTable[,2]
>
> But how do I handle the case where...
>
>     a$divisionOld <- c(0,1,2,3,4,5) # no format available for 0, this throws an error.
>
> OR
>
>     divisionTable <- matrix(c(1, "New England",
>                               2, "Middle Atlantic",
>                               3, "East North Central",
>                               4, "West North Central",
>                               5, "South Atlantic",
>                               6, "East South Central",
>                               7, "West South Central",
>                               8, "Mountain",
>                               9, "Pacific"),
>                             ncol=2, byrow=T)
>
> There are extra formats available... this throws a warning.
>
> Thanks
>
> Cory
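To tie this back to the proc format question, here is a sketch of a lookup that tolerates both cases raised: codes with no label and labels with no matching code. It assumes the lookup table is held as a data frame; the names `division_table`, `division_old` and `division_new` are made up for the example.

```r
division_table <- data.frame(
  code = 1:9,
  name = c("New England", "Middle Atlantic", "East North Central",
           "West North Central", "South Atlantic", "East South Central",
           "West South Central", "Mountain", "Pacific"),
  stringsAsFactors = FALSE
)

division_old <- c(0, 1, 2, 3, 4, 5)

# match() gives the row of each code in the table (NA when there is none);
# indexing the labels by that position does the formatting.
division_new <- division_table$name[match(division_old, division_table$code)]
division_new
```

The key difference from the code in the question is the direction of the indexing: the labels are indexed by the match positions, rather than the result vector being indexed by them, so unmatched codes simply become `NA` and unused labels are ignored.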

### Re: [R] ploting missing data

Try this:

    library(zoo)
    plot(na.approx(zoo(as.matrix(data[-1]), data[,1])), plot.type = "single")

See ?na.approx, ?plot.zoo, ?xyplot.zoo and vignette("zoo")

On 9/7/07, Markus Schmidberger [EMAIL PROTECTED] wrote:

> Hello,
>
> I have this kind of dataframe and have to plot it.
>
>     data <- data.frame(sw = c(1,2,3,4,5,6,7,8,9,10,11,12,15),
>       zehn = c(33.44,20.67,18.20,18.19,17.89,19.65,20.05,19.87,20.55,22.53,NA,NA,NA),
>       zwanzig = c(61.42,NA,26.60,23.28,NA,24.90,24.47,24.53,26.41,28.26,NA,29.80,35.49),
>       fuenfzig = c(162.51,66.08,49.55,43.40,NA,37.77,35.53,36.46,37.25,37.66,NA,42.29,47.80)
>     )
>
> The plot should have lines:
>
>     lines(fuenfzig~sw, data=data)
>     lines(zwanzig~sw, data=data)
>
> But now I have holes in my lines for the missing values (NA). How to plot the lines without the holes? The missing values should be interpolated or the left and right point directly connected. The function approx interpolates the whole dataset. That's not my goal! Is there no plotting function to do this directly?
>
> Best
> Markus
>
> Dipl.-Tech. Math. Markus Schmidberger
> Ludwig-Maximilians-Universität München
> IBE - Institut für medizinische Informationsverarbeitung, Biometrie und Epidemiologie
> Marchioninistr. 15, D-81377 Muenchen
> URL: http://ibe.web.med.uni-muenchen.de
> Mail: Markus.Schmidberger [at] ibe.med.uni-muenchen.de
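If zoo is not available, a base R sketch of the same idea is to drop the NA points before drawing, which connects the left and right neighbours directly, or to fill only the gaps with `approx()`. This is an alternative to the zoo solution, not what `na.approx` does internally; only the `zwanzig` column is shown, and `dat`, `ok` and `filled` are illustrative names.

```r
dat <- data.frame(
  sw = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15),
  zwanzig = c(61.42, NA, 26.60, 23.28, NA, 24.90, 24.47, 24.53,
              26.41, 28.26, NA, 29.80, 35.49)
)

ok <- !is.na(dat$zwanzig)

# Drawing only the observed points joins the neighbours across each gap
plot(dat$sw[ok], dat$zwanzig[ok], type = "l", xlab = "sw", ylab = "zwanzig")

# Linear interpolation evaluated at every sw; observed values are unchanged
filled <- approx(dat$sw[ok], dat$zwanzig[ok], xout = dat$sw)$y
points(dat$sw[!ok], filled[!ok], pch = 1)  # mark the interpolated points
```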

### Re: [R] Delete query in sqldf?

Yes, but delete does not return anything, so it's not useful. In the devel version of sqldf you can pass multiple commands, so try this using the builtin data frame BOD, noting that the record with demand = 8.3 was removed:

    > library(sqldf)
    Loading required package: RSQLite
    Loading required package: DBI
    Loading required package: gsubfn
    Loading required package: proto
    > # overwrite with devel version of the sqldf.R file
    > source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
    > sqldf(c("delete from BOD where demand = 8.3", "select * from BOD"))
      Time__1 demand
    1       2   10.3
    2       3   19.0
    3       4   16.0
    4       5   15.6
    5       7   19.8

On 9/7/07, Paul Smith [EMAIL PROTECTED] wrote:

> Dear All,
>
> Is sqldf equipped with delete queries? I have tried delete queries but with no success.
>
> Thanks in advance,
>
> Paul
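For comparison, the same row removal can be sketched in base R without any SQL, using the builtin BOD data frame. `BOD2` is just an illustrative name; the original data frame is left untouched.

```r
# Keep every row except the one with demand equal to 8.3
BOD2 <- BOD[BOD$demand != 8.3, ]
BOD2

# subset() expresses the same filter a little more like a where clause
identical(BOD2, subset(BOD, demand != 8.3))
```

Like the SQL version, this returns the remaining rows as a new data frame rather than modifying `BOD` in place.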

### Re: [R] Automatic detachment of dependent packages

If it's good enough just to get rid of all packages attached since startup, you could just do repeated detaches like this, making use of the fact that search() has 9 components on startup:

    replicate(length(search()) - 9, detach())

On 9/7/07, Paul Smith [EMAIL PROTECTED] wrote:

> Dear All,
>
> When one loads certain packages, some other dependent packages are loaded as well. Is there some way of detaching them automatically when one detaches the first package loaded? For instance,
>
>     > library(sqldf)
>     Loading required package: RSQLite
>     Loading required package: DBI
>     Loading required package: gsubfn
>     Loading required package: proto
>
> but
>
>     > detach(package:sqldf)
>     > search()
>      [1] ".GlobalEnv"        "package:gsubfn"    "package:proto"
>      [4] "package:RSQLite"   "package:DBI"       "package:stats"
>      [7] "package:graphics"  "package:grDevices" "package:utils"
>     [10] "package:datasets"  "package:methods"   "Autoloads"
>     [13] "package:base"
>
> The packages RSQLite, DBI, gsubfn and proto were not detached.
>
> Thanks in advance,
>
> Paul
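The count of 9 depends on the R setup, so a slightly more defensive sketch is to record the search path before attaching anything and later detach whatever is new. This is a variation on the idea above, not part of the original reply; `stats4` is used only because it ships with R, and `baseline` and `extra` are illustrative names.

```r
baseline <- search()          # search path before attaching anything extra

library(stats4)               # stand-in for sqldf and its dependencies

# Everything attached since the baseline was recorded
extra <- setdiff(search(), baseline)

# detach() accepts the "package:name" strings when character.only = TRUE
for (pkg in extra) detach(pkg, character.only = TRUE)

identical(search(), baseline)
```

As with the `replicate()` one-liner, this only removes packages from the search path; their namespaces may stay loaded.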

### Re: [R] Delete query in sqldf?

All sqldf does is pass the command to sqlite and retrieve whatever it sends back, translating the two directions to and from R. sqldf does not change the meaning of any sql statements. Perhaps the meaning you expect is desirable, but it's not how sqlite works. If sqlite were changed to adopt that meaning then sqldf would automatically get it too.

Here is an example which does not involve R at all, which illustrates that delete returns nothing.

    C:\> sqlite3
    SQLite version 3.4.0
    Enter ".help" for instructions
    sqlite> create table t1(a,b);
    sqlite> insert into T1 values(1,2);
    sqlite> insert into T1 values(1,3);
    sqlite> insert into T1 values(2,4);
    sqlite> delete from t1 where b = 2;
    sqlite> select * from t1;
    1|3
    2|4

On 9/7/07, Paul Smith [EMAIL PROTECTED] wrote:

> On 9/7/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
>
>> Yes but delete does not return anything so its not useful. In the devel version of sqldf you can pass multiple command so try this using the builtin data frame BOD noting that the record with demand = 8.3 was removed:
>>
>>     sqldf(c("delete from BOD where demand = 8.3", "select * from BOD"))
>
> I see, Gabor, but I would expect as more natural to have
>
>     sqldf("delete from BOD where demand = 8.3")
>
> working, with no second command.
>
> Paul

### Re: [R] R first.id last.id function error

A slightly easier way to construct first and last, if the vector x is sorted (as is assumed in SAS), is:

    first <- !duplicated(x)
    last <- !duplicated(x, fromLast = TRUE)

where the fromLast= argument is added in R 2.6.0.

On 9/7/07, Gerard Smits [EMAIL PROTECTED] wrote:

> Hi R users,
>
> I have a test dataframe (file1, shown below) for which I am trying to create a flag for the first and last ID record (equivalent to SAS first.id and last.id variables).
>
> Dump of file1:
>
>     > file1
>        id rx week dv1
>     1   1  1    1   1
>     2   1  1    2   1
>     3   1  1    3   2
>     4   2  1    1   3
>     5   2  1    2   4
>     6   2  1    3   1
>     7   3  1    1   2
>     8   3  1    2   3
>     9   3  1    3   4
>     10  4  1    1   2
>     11  4  1    2   6
>     12  4  1    3   5
>     13  5  2    1   7
>     14  5  2    2   8
>     15  5  2    3   5
>     16  6  2    1   2
>     17  6  2    2   4
>     18  6  2    3   6
>     19  7  2    1   7
>     20  7  2    2   8
>     21  8  2    1   9
>     22  9  2    1   4
>     23  9  2    2   5
>
> I have written code that correctly assigns the first.id and last.id variables:
>
>     require(Hmisc)  # for Lags
>     # ascending order to define first dot
>     file1 <- file1[order(file1$id, file1$week),]
>     file1$first.id <- (Lag(file1$id) != file1$id)
>     file1$first.id[1] <- TRUE  # force NA to TRUE
>
>     # descending order to define last dot
>     file1 <- file1[order(-file1$id, -file1$week),]
>     file1$last.id <- (Lag(file1$id) != file1$id)
>     file1$last.id[1] <- TRUE  # force NA to TRUE
>
>     # resort to original order
>     file1 <- file1[order(file1$id, file1$week),]
>
> I am now trying to get the above code to work as a function, and am clearly doing something wrong:
>
>     > first.last <- function (df, idvar, sortvars1, sortvars2)
>     + {
>     +   # sort in ascending order to define first dot
>     +   df <- df[order(sortvars1),]
>     +   df$first.idvar <- (Lag(df$idvar) != df$idvar)
>     +   # force first record NA to TRUE
>     +   df$first.idvar[1] <- TRUE
>     +
>     +   # sort in descending order to define last dot
>     +   df <- df[order(-sortvars2),]
>     +   df$last.idvar <- (Lag(df$idvar) != df$idvar)
>     +   # force last record NA to TRUE
>     +   df$last.idvar[1] <- TRUE
>     +
>     +   # resort to original order
>     +   df <- df[order(sortvars1),]
>     + }
>
> Function call:
>
>     > first.last(df=file1, idvar=file1$id, sortvars1=c(file1$id,file1$week), sortvars2=c(-file1$id,-file1$week))
>
> R Error:
>
>     Error in as.vector(x, mode) : invalid argument 'mode'
>
> I am not sure about the passing of the sort strings. Perhaps this is where things are off. Any help greatly appreciated.
>
> Thanks,
>
> Gerard
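Building on the `duplicated()` suggestion, one way the function from the question could be written is sketched below. It passes column names as strings rather than the columns themselves, which sidesteps the `df$idvar` problem (`$` looks for a column literally named `idvar`). The function name `flag_first_last` and the small test frame are made up for the example.

```r
flag_first_last <- function(df, idvar, sortvar) {
  # Sort by id, then by the within-id ordering variable
  df <- df[order(df[[idvar]], df[[sortvar]]), ]
  # First record of each id: not seen before when scanning from the top
  df$first.id <- !duplicated(df[[idvar]])
  # Last record of each id: not seen before when scanning from the bottom
  df$last.id <- !duplicated(df[[idvar]], fromLast = TRUE)
  df
}

file1 <- data.frame(id = c(1, 1, 1, 2, 2, 3), week = c(1, 2, 3, 1, 2, 1))
out <- flag_first_last(file1, "id", "week")
out
```

An id with a single record, such as id 3 here, gets both flags set to TRUE, which matches the SAS behaviour of first.id and last.id.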

### Re: [R] creat list

Try this:

    do.call(cbind, lista)

On 9/6/07, livia [EMAIL PROTECTED] wrote:

> Hi, I have a list named lista, which has 50 vectors, and each vector has a length of about 1200. I would like to create a matrix out of lista. What I try now is cbind(lista[[1]],lista[[2]],...,lista[[50]]). I guess there would be an easy way of doing this.
>
> Could anyone give me some advice?
>
> View this message in context: http://www.nabble.com/creat-list-tf4391162.html#a12519637
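A small illustration of why this works: `do.call()` calls `cbind` with the list elements as its arguments, so it is equivalent to writing them all out by hand. The three short vectors here are a made-up stand-in for the 50 vectors in the question.

```r
lista <- list(a = 1:4, b = 5:8, c = 9:12)

# Same as cbind(a = lista[[1]], b = lista[[2]], c = lista[[3]])
m <- do.call(cbind, lista)
m
dim(m)
# [1] 4 3
```

If the vectors differ in length, `cbind` recycles the shorter ones (with a warning when the lengths are not multiples), so it is worth checking the lengths first.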

### Re: [R] Excel

On my version of Excel (Excel 2007 under Vista), using File | Open on a file, a.txt, such as:

    a b
    sep7 10
    sep10 11

causes it to enter a wizard where it asks you for the delimiters and column types, so you can change it from what it offers as the default. In particular, if you leave it at General it will guess Date, but you can specify Text or you can specify Date to cause it to select a particular type.

On 9/6/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> Quoting Robert A LaBudde [EMAIL PROTECTED]:
>
>> If you format the column as Text, you won't have this problem. By leaving the cells as General, you leave it up to Excel to guess at the correct interpretation.
>>
>> You will note that the conversion to a date occurs immediately in Excel when you enter the value. There are many formats to enter dates.
>>
>> Either pre-format the column as Text, or prefix the individual entry with an ' to indicate text.
>
> But the conversion is done as soon as the file is opened, _before_ you have the chance to format the column as text!!! Once the conversion is done... it's done. I had gene names such as SEP7 converted by Excel into a 5 digit number representing a date. From that number I didn't find a way to reconstruct SEP7. Sept-7 is not the same.
>
>> It seems like a problem with an easy solution. But it isn't. There are too many variations.
>>
>> A similar problem occurs in R's read.table() function when a factor has levels that can be interpreted as numbers.
>
> At least with read.table you can specify the classes of each column _before_ you read the file. R developers are better behaved than MS Excel ones ;-)
>
> Jose
>
>> At 10:11 PM 8/27/2007, David wrote:
>>
>>> A common process when data is obtained in an Excel spreadsheet is to save the spreadsheet as a .csv file then read it into R. Experienced users might have learned to be wary of dates (as I have) but possibly have not experienced what just happened to me. I thought I might just share it with r-help as a cautionary tale.
>>>
>>> I received an Excel file giving patient details. Each patient had an ID code in the form of three letters followed by four digits. (Actually a New Zealand National Health Identification.) I saved the .xls file as .csv. Then I opened up the .csv (with Excel) to look at it. In the column of ID codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699.
>>>
>>> In a column of character data, Excel had interpreted AUG2699 as a date.
>>>
>>> The .csv did not actually have a date in that cell, but if I had saved the .csv file it would have.
>>>
>>> David Scott
>>
>> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
>> Least Cost Formulations, Ltd.
>
> Dr. Jose I. de las Heras
> The Wellcome Trust Centre for Cell Biology
> University of Edinburgh
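On the R side of this discussion, the point about declaring column classes before reading can be sketched as follows. The inline CSV text and the column names `gene` and `code` are made up for the illustration.

```r
csv_text <- "gene,code\nSEP7,007\nAUG2699,010"

# Default: R guesses the types, so the code column becomes integer
guessed <- read.csv(text = csv_text)
guessed$code
# [1]  7 10

# Declaring the classes up front keeps the identifiers exactly as written
typed <- read.csv(text = csv_text, colClasses = "character")
typed$code
# [1] "007" "010"
```

`colClasses` can also be a vector with one entry per column when only some columns need protecting.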

### Re: [R] Excel

That is not what happens in Excel 2007 when I tried it just now. I saved the same file I displayed in my prior message as an .xls file and as an .xlsx file, and in both cases the first column came back as text, as I had specified to the Wizard on the initial import. I guess they fixed the behavior in Excel 2007.

On 9/6/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Yes, and then you save it, you open it again... same behaviour. The only way I found around it was to insert a character at the beginning of every element in such columns. An apostrophe works, but it looks ugly. Yes, when loading the data in R you could easily clean it up automatically... doable. You can add a space. Then it will not show, but you have to remember that if you ever use the data for labels etc. You shouldn't need to do that in the first place...

Jose

Quoting Erich Neuwirth [EMAIL PROTECTED]:

There is a hack to get around the problem. It is definitely not a good solution, just a hack. Open the .csv file in a text editor and select everything. Paste it into an empty Excel sheet. Then use Data -> Text to Columns. The third dialog box (at least it is the third one in Excel 2003) allows you to format each column of the data. This is the place where you can switch off the date interpretation of your ID column. AUG1838 probably is not interpreted as a date because Excel dates only start at 1/1/1900.

Duncan Murdoch wrote: On 8/28/2007 3:16 AM, J Dougherty wrote: On Monday 27 August 2007 22:21, David Scott wrote: On Tue, 28 Aug 2007, Robert A LaBudde wrote:

If you format the column as Text, you won't have this problem. By leaving the cells as General, you leave it up to Excel to guess at the correct interpretation.

Not true actually. I had converted the column to Text because I saw the interpretation as a date in the .xls file. I saved the .csv file *after* the column had been converted to Text. Looking at the .csv file in a text editor, the entry is correct. I have just rechecked this. On reopening the .csv using Excel, the entry AUG2699 had been interpreted as a date, and was showing as Aug-99. Most bizarre is that the NHI value of AUG1838 has *not* been interpreted as a date.

-- Erich Neuwirth, University of Vienna, Faculty of Computer Science, Computer Supported Didactics Working Group. Visit our SunSITE at http://sunsite.univie.ac.at Phone: +43-1-4277-39464 Fax: +43-1-4277-39459

-- Dr. Jose I. de las Heras, Email: [EMAIL PROTECTED], The Wellcome Trust Centre for Cell Biology, Phone: +44 (0)131 6513374, Institute for Cell Molecular Biology, Fax: +44 (0)131 6507360, Swann Building, Mayfield Road, University of Edinburgh, Edinburgh EH9 3JR, UK
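The R-side cleanup Jose mentions could look roughly like the sketch below. It assumes the workaround character is a leading apostrophe or space, and it uses a small made-up file with hypothetical column names `NHI` and `value`; reading the ID column as character keeps R from reinterpreting it.

```r
# Made-up CSV in which the ID column was protected from Excel with a leading apostrophe
csv <- tempfile(fileext = ".csv")
writeLines(c("NHI,value", "'AUG2699,1", "'AUG1838,2"), csv)

# Read the ID column as plain text (read.csv only treats double quotes as quote characters)
dat <- read.csv(csv, colClasses = c(NHI = "character", value = "numeric"))

# Strip a single leading apostrophe or space that was added as a workaround
dat$NHI <- sub("^[' ]", "", dat$NHI)
dat
```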

### Re: [R] Lisp-like primitives in R

Reduce, Filter and Map are part of R 2.6.0. Try ?Reduce

On 9/6/07, Chris Elsaesser [EMAIL PROTECTED] wrote:

I mainly program in Common Lisp and use R for statistical analysis. While in R I miss the power and ease of use of Lisp, especially its many primitives such as find, member, cond, and (perhaps a bridge too far) loop. Has anyone created a package that includes R analogs to a subset of Lisp functions?

Chris Elsaesser, PhD, Principal Scientist, Machine Learning, SPADAC Inc., 7921 Jones Branch Dr. Suite 600, McLean, VA 22102, 703.371.7301 (m) 703.637.9421 (o)
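A short sketch of the functional helpers documented under ?Reduce, using a made-up vector. Pairing them with Lisp names (reduce, remove-if-not, mapcar, find-if, position-if, member) is only a loose analogy on my part, not something the help page claims.

```r
xs <- c(3, 8, 1, 12, 5)

total     <- Reduce(`+`, xs)                      # fold the vector with +
running   <- Reduce(`+`, xs, accumulate = TRUE)   # keep the intermediate sums
big       <- Filter(function(v) v > 4, xs)        # keep elements passing the predicate
squares   <- Map(function(v) v^2, xs)             # apply element-wise, returns a list
first_big <- Find(function(v) v > 4, xs)          # first element passing the predicate
where_big <- Position(function(v) v > 4, xs)      # index of that element
is_member <- 12 %in% xs                           # membership test
```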

### Re: [R] problems in read.table

See ?count.fields to get a vector of how many fields are on each line. Also, fill = TRUE on read.table() can be used to fill out short lines if that is appropriate.

On 9/6/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Dear R-users, I have encountered the following problem every now and then. But I was dealing with a very small dataset before, so it wasn't a problem (I just edited the dataset in an OpenOffice spreadsheet). This time I have to deal with many large datasets containing commuting flow data. I would appreciate it if anyone could give me a hint or clue to get out of this problem.

I have a .dat file called 1081.dat: 1001 means Birmingham, AL. I imported this .dat file using read.table like

tmp <- read.table('CTPP3_ANSI/MPO3441_ctpp3_sumlv944.dat', header=T)

Then I got this error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 9499 did not have 209 elements

Since I got an error message saying other rows did not have 209 elements, I added skip=c(205,9499,9294) in the hope that R would take care of this problem. But I got a similar error message:

tmp <- read.table('CTPP3_ANSI/MPO3441_ctpp3_sumlv944.dat', header=T, skip=c(205,9499,9294))

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 9294 did not have 209 elements
In addition: Warning message: the condition has length > 1 and only the first element will be used in: if (skip > 0) readLines(file, skip)

Is there any way to let R code automatically skip problematic rows? Thank you very much!

Taka
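A minimal sketch of both suggestions, assuming a small whitespace-delimited file written on the fly (its contents are made up for illustration). count.fields() locates the short lines; from there you can either pad them with fill = TRUE or drop them before parsing.

```r
# Made-up file: header plus three data lines, one of which is short
path <- tempfile(fileext = ".dat")
writeLines(c("a b c", "1 2 3", "4 5", "6 7 8"), path)

# Number of fields on each line, header included
n <- count.fields(path)
bad <- which(n != n[1])   # line numbers whose field count differs from the header

# Option 1: pad short lines with NA
filled <- read.table(path, header = TRUE, fill = TRUE)

# Option 2: drop the problematic lines, then parse what is left
lines <- readLines(path)
clean <- read.table(text = lines[n == n[1]], header = TRUE)
```

Note that skip= only accepts a single number of leading lines to skip, which is why passing a vector of line numbers did not work.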

### Re: [R] 'singular gradient matrix' when using nls() and how to make the program skip nls() and run on

In case 1, graph your function and then use optimize rather than nls. In case 2, a and b may have the same effect as c on f, whereas they don't vary in case 1, so it does not matter. For example, consider minimizing f <- function(a, b) (a + b)^2. If a is fixed at zero then the minimum occurs at b = 0, but if a is not fixed then increasing a and decreasing b by the same amount causes no change in the result, so the gradient in such a direction is zero.

On 9/5/07, Yuchen Luo [EMAIL PROTECTED] wrote:

Dear friends. I use nls() and encounter the following puzzling problem: I have a function f(a,b,c,x), a data vector x and a vector y of realized values of f.

Case 1: I tried to estimate c with (a=0.3, b=0.5) fixed: nls(y~f(a,b,c,x), control=list(maxiter = 10, minFactor=0.5^2048), start=list(c=0.5)). The error message is: number of iterations exceeded maximum of 10

Case 2: I then thought maybe the values of a and b are not reasonable. So I let nls() estimate (a,b,c) altogether: nls(y~f(a,b,c,x), control=list(maxiter = 10, minFactor=0.5^2048), start=list(a=0.3,b=0.5,c=0.5)). The error message is: singular gradient matrix at initial parameter estimates.

This is what puzzles me: if the initial parameters (a=0.3,b=0.5,c=0.5) can create a 'singular gradient matrix', then why doesn't this 'singular gradient matrix' appear in Case 1? I have tried to change the initial values of (a,b,c) around but the problem persists. I am wondering if there is a way out.

My other question is: I need to run 220 nls() fits in my program with different y and x. When one of the nls() calls encounters a problem, the whole program stops. In my case, the 3rd nls() runs into a problem. I would still need the program to run the remaining 217 nls() calls! Is there a way to make the program skip the problematic nls() and complete the remaining ones? Your help will be highly appreciated!

Yuchen Luo
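One common way to keep a batch of fits running is to wrap each nls() call in tryCatch(). The sketch below is an illustration under my own assumptions, not the poster's model: the data are simulated, and the failure is forced with a formula in which a and b enter only through a + b, which reproduces the singular-gradient situation described in the reply.

```r
set.seed(1)
x <- 1:20
y <- 2 * x + rnorm(20, sd = 0.1)

# Hypothetical batch: one identifiable model and one over-parameterized model
models <- list(good = y ~ a * x,
               bad  = y ~ (a + b) * x)   # a and b are not separately identifiable
starts <- list(good = list(a = 1),
               bad  = list(a = 1, b = 1))

fits <- lapply(names(models), function(nm) {
  tryCatch(
    nls(models[[nm]], start = starts[[nm]]),
    error = function(e) {
      # Report the failure and return NULL so the loop carries on
      message("fit '", nm, "' skipped: ", conditionMessage(e))
      NULL
    }
  )
})
names(fits) <- names(models)

# Indicator of which fits succeeded
ok <- !vapply(fits, is.null, logical(1))
```

In a real run the list would hold the 220 (x, y) pairs instead of two formulas; the failed entries stay NULL and can be inspected afterwards.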

### Re: [R] Table and ftable

Try this, which gives an object of the required shape and of class c("xtabs", "table"): xx <- xtabs(area ~ sic + level, DF). You can optionally do it like this to make it class matrix: xx <- xtabs(area ~ sic + level, DF)[] and, if you don't want the call attribute: attr(xx, "call") <- NULL

On 9/4/07, Giulia Bennati [EMAIL PROTECTED] wrote:

Dear listmembers, I have a little question: I have my data organized as follows

| sic | level | area |
|-----|-------|------|
| a | 211 | 2.4 |
| b | 311 | 2.3 |
| b | 322 | 0.2 |
| b | 322 | 0.5 |
| c | 100 | 3.0 |
| c | 100 | 1.5 |
| c | 242 | 1.5 |
| d | 222 | 0.2 |

where levels and sics are factors. I'm trying to obtain a matrix like this (rows are sic, columns are level):

| sic | 211 | 311 | 322 | 100 | 242 | 222 |
|-----|-----|-----|-----|-----|-----|-----|
| a | 2.4 | 0 | 0 | 0 | 0 | 0 |
| b | 0 | 2.3 | 0.7 | 0 | 0 | 0 |
| c | 0 | 0 | 0 | 4.5 | 1.5 | 0 |
| d | 0 | 0 | 0 | 0 | 0 | 0.2 |

I tried the table function as table(sic, level) but I obtained only a contingency table. Have you any suggestions? Thank you very much,

Giulia
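For reference, a runnable sketch of the xtabs() suggestion on the data from the question. Building the level factor with levels in order of first appearance is my own assumption, made only so that the columns come out in the order shown in the desired matrix.

```r
level_raw <- c(211, 311, 322, 322, 100, 100, 242, 222)
DF <- data.frame(
  sic   = factor(c("a", "b", "b", "b", "c", "c", "c", "d")),
  level = factor(level_raw, levels = unique(level_raw)),  # keep order of appearance
  area  = c(2.4, 2.3, 0.2, 0.5, 3.0, 1.5, 1.5, 0.2)
)

# Sum area within each sic x level cell
xx <- xtabs(area ~ sic + level, DF)

# Plain numeric matrix without the xtabs class or the call attribute
m <- unclass(xx)
attr(m, "call") <- NULL
m
```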

### Re: [R] Variable scope in a function

environment(test_func) <- baseenv() will allow it to access the base environment, so it can still find exists but will not find kat. If you issue the command search() then each attached package has the next as its parent and base is the last one. Regarding your second question, try rm(): f <- function() { x <- 1; rm(x); exists("x", environment()) }; f() # FALSE

On 9/4/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Hello, I apologise in advance for this question; I'm sure it is answered in the documentation or this mailing list many times, however the answer has eluded me. I'm trying to write a function where I don't want external variables to be scoped in from the parent environment. Given this function:

test_func = function() { if (exists("kat") == FALSE) { print("kat is undefined") } else { print(kat) } }

If I did this: kat = 12; test_func() I'd like the result to be the error, but now it's 12 (which is of course correct according to the documentation). So there are two questions:

1) How can I disregard all variables from the parent environment within a function? (Although from what I've read on the mailing lists this isn't really what I want.) Apparently environment(test_func) = NULL is defunct, and what I thought was its replacement, environment(test_func) = emptyenv(), doesn't seem to be.

2) How can I undefine a variable, perhaps just within the context of my function? I'm hoping to find some line that I can put at the start of my function above so that running kat = 12 and then test_func() would print [1] "kat is undefined", while kat afterwards still prints [1] 12.

Thanks in advance for any help! Cheers, Demitri
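A small sketch of both answers. The function returns its result instead of printing it, which is a change I made so the behaviour before and after switching the environment is easy to compare; inherits = FALSE in the second part is likewise my addition, so the check looks only at the function's own frame.

```r
kat <- 12

test_func <- function() {
  if (!exists("kat")) "kat is undefined" else kat
}

before <- test_func()   # kat is found through the enclosing environment

# Re-parent the function onto the base environment: base functions such as
# exists() are still visible, variables in the workspace are not
environment(test_func) <- baseenv()
after <- test_func()

# Removing a local variable with rm() and checking only the current frame
f <- function() {
  x <- 1
  rm(x)
  exists("x", inherits = FALSE)
}
gone <- f()
```

With emptyenv() as the enclosure even `{`, `if` and exists() would be unreachable, which is presumably why that attempt failed.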

### Re: [R] using temporary arrays in R

You can do it in a local, in a function, or explicitly remove it. Also, if you never assign it to a variable then it will be garbage collected as well.

1. In a local: local({ print(gc()); x <- matrix(NA, 1000, 1000); print(gc()) }); gc()
2. In a function: f <- function() { print(gc()); x <- matrix(NA, 1000, 1000); print(gc()) }; f(); gc()
3. Explicit removal: gc(); x <- matrix(NA, 1000, 1000); gc(); rm(x); gc()
4. Never assigned: gc(); sum(matrix(1, 1000, 1000)); gc()

On 9/3/07, dxc13 [EMAIL PROTECTED] wrote:

useR's, Is there a way to create a temporary array (or matrix) in R to hold values, then drop or delete that temporary array from memory once I do not need it anymore? I am working with multidimensional arrays/matrices and I frequently perform multiple operations on the same matrix and rename it to be another object. I want to be able to delete the older versions of the array/matrix to free up space. Thank you.

-- View this message in context: http://www.nabble.com/using-temporary-arrays-in-R-tf4372367.html#a12462219 Sent from the R help mailing list archive at Nabble.com.

### Re: [R] Derivative of a Function Expression

The Ryacas package can do that (but the function must be one line and it can't have brace brackets). The first yacas call below registers f with yacas, then we set up a function to act as a template to hold the derivative, and then we set its body, calling yacas again to take the derivative.

library(Ryacas); f <- function(x) 2*cos(x)^2 + 3*sin(x) + 0.5; yacas(f) # register f with yacas

Df <- f; body(Df) <- yacas(expression(deriv(f(x))))[[1]]; Df

Here is the output: yacas(f) prints [1] "Starting Yacas!" followed by expression(TRUE), and Df prints as function (x) 2 * (-2 * sin(x) * cos(x)) + 3 * cos(x)

Also see demo("Ryacas-Function") and the other demos, vignette and home page: http://ryacas.googlecode.com

On 9/3/07, Rory Winston [EMAIL PROTECTED] wrote:

Hi, I am currently (for pedagogical purposes) writing a simple numerical analysis library in R. I have come unstuck when writing a simple Newton-Raphson implementation that looks like this:

f <- function(x) { 2*cos(x)^2 + 3*sin(x) + 0.5 }; root <- newton(f, tol=0.0001, N=20, a=1)

My issue is calculating the symbolic derivative of f() inside the newton() function. I can't seem to get R to do this... I can of course calculate the derivative by calling D() with an expression object containing the inner function definition, but I would like to just define the function once and then compute the derivative of the existing function. I have tried using deriv() and as.call(), but I am evidently misusing them, as they don't do what I want. Does anyone know how I can define a function, say foo, which manipulates one or more arguments, and then refer to that function later in my code in order to calculate a (partial) derivative?

Thanks, Rory

### Re: [R] Derivative of a Function Expression

Actually, in thinking about this, it's pretty easy to do it without Ryacas too: Df <- f; body(Df) <- deriv(body(f), "x"); Df

On 9/3/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:

The Ryacas package can do that (but the function must be one line and it can't have brace brackets). The first yacas call below registers f with yacas, then we set up a function to act as a template to hold the derivative, and then we set its body, calling yacas again to take the derivative.

library(Ryacas); f <- function(x) 2*cos(x)^2 + 3*sin(x) + 0.5; yacas(f) # register f with yacas

Df <- f; body(Df) <- yacas(expression(deriv(f(x))))[[1]]; Df

Here is the output: yacas(f) prints [1] "Starting Yacas!" followed by expression(TRUE), and Df prints as function (x) 2 * (-2 * sin(x) * cos(x)) + 3 * cos(x)

Also see demo("Ryacas-Function") and the other demos, vignette and home page: http://ryacas.googlecode.com

On 9/3/07, Rory Winston [EMAIL PROTECTED] wrote:

Hi, I am currently (for pedagogical purposes) writing a simple numerical analysis library in R. I have come unstuck when writing a simple Newton-Raphson implementation that looks like this:

f <- function(x) { 2*cos(x)^2 + 3*sin(x) + 0.5 }; root <- newton(f, tol=0.0001, N=20, a=1)

My issue is calculating the symbolic derivative of f() inside the newton() function. I can't seem to get R to do this... I can of course calculate the derivative by calling D() with an expression object containing the inner function definition, but I would like to just define the function once and then compute the derivative of the existing function. I have tried using deriv() and as.call(), but I am evidently misusing them, as they don't do what I want. Does anyone know how I can define a function, say foo, which manipulates one or more arguments, and then refer to that function later in my code in order to calculate a (partial) derivative?

Thanks, Rory

### Re: [R] Derivative of a Function Expression

The problem is that brace brackets are not in the derivatives table. Make sure you don't have any.

On 9/3/07, Alberto Vieira Ferreira Monteiro [EMAIL PROTECTED] wrote:

Gabor Grothendieck wrote: Actually, in thinking about this, it's pretty easy to do it without Ryacas too: Df <- f; body(Df) <- deriv(body(f), "x"); Df

This is weird.

f <- function(x) { x^2 + 2*x+1 }; Df <- f; body(Df) <- deriv(body(f), "x") # error

Also:

f <- function(x) x^2 + 2 * x + 1; Df <- f; body(Df) <- deriv(body(f), "x") # ok

D2f <- f; body(D2f) <- deriv(body(Df), "x") # error

Alberto Monteiro

### Re: [R] Derivative of a Function Expression

One improvement. This returns a function directly without having to create a template and fill in its body: deriv(body(f), "x", func = TRUE)

On 9/3/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:

The problem is that brace brackets are not in the derivatives table. Make sure you don't have any.

On 9/3/07, Alberto Vieira Ferreira Monteiro [EMAIL PROTECTED] wrote:

Gabor Grothendieck wrote: Actually, in thinking about this, it's pretty easy to do it without Ryacas too: Df <- f; body(Df) <- deriv(body(f), "x"); Df

This is weird.

f <- function(x) { x^2 + 2*x+1 }; Df <- f; body(Df) <- deriv(body(f), "x") # error

Also:

f <- function(x) x^2 + 2 * x + 1; Df <- f; body(Df) <- deriv(body(f), "x") # ok

D2f <- f; body(D2f) <- deriv(body(Df), "x") # error

Alberto Monteiro

### Re: [R] Derivative of a Function Expression

And if f has brace brackets surrounding the body then do this: f <- function(x) { x*x }; deriv(body(f)[[2]], "x", func = TRUE)

If you are writing a general function you can do this: e <- if (identical(body(f)[[1]], as.name("{"))) body(f)[[2]] else body(f); deriv(e, "x", func = TRUE)

On 9/3/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:

One improvement. This returns a function directly without having to create a template and fill in its body: deriv(body(f), "x", func = TRUE)

On 9/3/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:

The problem is that brace brackets are not in the derivatives table. Make sure you don't have any.

On 9/3/07, Alberto Vieira Ferreira Monteiro [EMAIL PROTECTED] wrote:

Gabor Grothendieck wrote: Actually, in thinking about this, it's pretty easy to do it without Ryacas too: Df <- f; body(Df) <- deriv(body(f), "x"); Df

This is weird.

f <- function(x) { x^2 + 2*x+1 }; Df <- f; body(Df) <- deriv(body(f), "x") # error

Also:

f <- function(x) x^2 + 2 * x + 1; Df <- f; body(Df) <- deriv(body(f), "x") # ok

D2f <- f; body(D2f) <- deriv(body(Df), "x") # error

Alberto Monteiro
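Putting the pieces of this thread together, here is a rough sketch of a Newton-Raphson routine that derives f symbolically. It swaps in D() for deriv() so the derivative is an ordinary function returning a plain number; derivative_of() is a hypothetical helper, newton() only mirrors the argument names from the original question, and the starting value a = -0.5 is my own choice, picked near a root so the iteration behaves.

```r
# Hypothetical helper: derivative of a one-argument function whose body is a
# single expression made of operations known to D()
derivative_of <- function(f, var = "x") {
  e <- body(f)
  # Unwrap a body of the form { expr }; multi-statement bodies are not handled
  if (is.call(e) && identical(e[[1]], as.name("{"))) {
    stopifnot(length(e) == 2)
    e <- e[[2]]
  }
  Df <- f
  body(Df) <- D(e, var)
  Df
}

# Plain Newton-Raphson using the symbolic derivative
newton <- function(f, a, tol = 1e-4, N = 20) {
  Df <- derivative_of(f)
  x <- a
  for (i in seq_len(N)) {
    step <- f(x) / Df(x)
    x <- x - step
    if (abs(step) < tol) return(x)
  }
  warning("no convergence within N iterations")
  x
}

f <- function(x) { 2*cos(x)^2 + 3*sin(x) + 0.5 }
root <- newton(f, a = -0.5)
```

Because derivative_of() returns a function with an expression body and no braces, it can be applied again to get a second derivative, which sidesteps the D2f error Alberto ran into.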

### Re: [R] Comparing transform to with

Try this version of transform. In the first test we show it works on your example, but we have used the head of the built-in anscombe data set. The second and third show that it is necessarily incompatible with transform, because transform always looks up variables in DF first whereas my.transform looks up the computed ones first.

my.transform <- function(DF, ...) { f <- function(){}; formals(f) <- eval(substitute(as.pairlist(c(alist(...), DF)))); body(f) <- substitute(modifyList(DF, data.frame(...))); f() }

Test, with a <- head(anscombe):

1. my.transform(a, sum1 = x1+x2+x3+x4, sum2 = y1+y2+y3+y4, total = sum1+sum2)
2. my.transform(a, y2 = y1, y3 = y2)
3. transform(a, y2 = y1, y3 = y2) # different

On 9/1/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

Hi All, I've been successfully using the with function for analyses and the transform function for multiple transformations. Then I thought, why not use with for both? I ran into problems and couldn't figure them out from help files or books. So I created a simplified version of what I'm doing:

rm( list=ls() ); x1<-c(1,3,3); x2<-c(3,2,1); x3<-c(2,5,2); x4<-c(5,6,9); myDF<-data.frame(x1,x2,x3,x4); rm(x1,x2,x3,x4); ls(); myDF

This creates two new variables just fine:

transform(myDF, sum1=x1+x2, sum2=x3+x4)

This next code does not see sum1, so it appears that transform cannot see the variables that it creates. Would I need to transform new variables in a second pass?

transform(myDF, sum1=x1+x2, sum2=x3+x4, total=sum1+sum2)

Next I'm trying the same thing using with. It does not work but also does not generate error messages, giving me the impression that I'm doing something truly idiotic:

with(myDF, { sum1<-x1+x2; sum2<-x3+x4; total <- sum1+sum2 } ); myDF; ls()

Then I thought, perhaps one of the advantages of transform is that it works on the left side of the equation without using a longer name like myDF$sum1. with probably doesn't do that, so I use the longer form below. It also does not work and generates no error messages.

Try it again, writing vars to myDF explicitly. It generates no errors, and no results:

with(myDF, { myDF$sum1<-x1+x2; myDF$sum2<-x3+x4; myDF$total <- myDF$sum1+myDF$sum2 } ); myDF; ls()

I would appreciate some advice about the relative roles of these two functions and why my attempts with with have failed. Thanks! Bob

Bob Muenchen (pronounced Min'-chen), Manager, Statistical Consulting Center, U of TN Office of Information Technology, 200 Stokely Management Center, Knoxville, TN 37996-0520. Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
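Two alternatives that stay within base R, sketched on the data from the question. with() evaluates its expression in a temporary environment built from the data frame and returns only the value of the expression, so assignments made inside it never reach myDF. within() (assuming R 2.6.0 or later, where I believe it was introduced) returns a modified copy instead, and a second transform() pass also works.

```r
myDF <- data.frame(x1 = c(1, 3, 3), x2 = c(3, 2, 1),
                   x3 = c(2, 5, 2), x4 = c(5, 6, 9))

# Option A: two passes, so the second transform() can see sum1 and sum2
step1 <- transform(myDF, sum1 = x1 + x2, sum2 = x3 + x4)
res_a <- transform(step1, total = sum1 + sum2)

# Option B: within() evaluates the block and returns the modified data frame
res_b <- within(myDF, {
  sum1  <- x1 + x2
  sum2  <- x3 + x4
  total <- sum1 + sum2
})

# Neither call changes myDF itself; assign the result to keep it
names(myDF)
```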

### Re: [R] Function modification: how to calculate values for every combination?

Just to add to this: be sure you do have names if you want them, and read about vectorization in ?outer in case fun was just an example and your actual fun is more complex:

x <- c(1,2,3); names(x) <- x; y <- c(4,5,6); names(y) <- y

outer(x, y, fun) # as in previous answer

or: outer(-log(15) * x, log(10) * y, "+")

On 9/2/07, Erich Neuwirth [EMAIL PROTECTED] wrote:

outer(x,y,fun)

Lauri Nikkinen wrote:

Hello, I have a function like this:

fun <- function (x, y) { a <- log(10)*y; b <- log(15)*x; extr <- a-b; extr }

fun(2,3) gives [1] 1.491655. With x <- c(1,2,3) and y <- c(4,5,6), fun(x, y) gives [1] 6.502290 6.096825 5.691360

How do I have to modify my function so that I can calculate results using every combination of x and y? I would like to produce a matrix which includes the calculated values in every cell and names(x) and names(y) as row and column headers respectively. Is the outer function a way to a solution? Best regards,
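A runnable version of the outer() suggestions, with one extra case for a function that is not vectorized. The slow() function is made up for illustration; Vectorize() is one way to make such a function usable with outer().

```r
fun <- function(x, y) log(10) * y - log(15) * x

x <- c(1, 2, 3); names(x) <- x
y <- c(4, 5, 6); names(y) <- y

# Every combination of x (rows) and y (columns), labelled by the names
m  <- outer(x, y, fun)
m2 <- outer(-log(15) * x, log(10) * y, "+")   # same values, built from two vectors

# outer() calls FUN once on long vectors, so a scalar-only function needs wrapping
slow <- function(x, y) if (x < y) y - x else 0
m3 <- outer(x, y, Vectorize(slow))
```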

### Re: [R] Synchronzing workspaces

You could try saving prior to quitting in the future if you want to try those arguments.

On 9/3/07, Paul August [EMAIL PROTECTED] wrote:

Thanks for sharing your experience. In my case, the involved machines are Windows Vista, XP and 2000. Not sure whether it contributes to my problem or not. I will look into this further. I just noticed the two arguments ascii and compress for save. However, my .RData file was created by q() with "yes". The manual says that q() is equivalent to save(list = ls(all=TRUE), file = ".RData"). There seems to be no way to set ascii or compression of save through the q function, unless the q function is replaced explicitly with save(list = ls(all=TRUE), file = ".RData", ascii = T).

Paul.

- Original Message From: Gabor Grothendieck [EMAIL PROTECTED] To: Paul August [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Sent: Thursday, August 30, 2007 11:24:31 PM Subject: Re: [R] Synchronzing workspaces

I haven't had similar experience but note that save has ascii= and compress= arguments. You could check if varying those parameter values makes a difference.

On 8/30/07, Paul August [EMAIL PROTECTED] wrote:

I used to work on several computers and to use a flash drive to synchronize the workspace on each machine before starting to work on it. I found that .RData always caused some trouble: often it is corrupted even though there is no error in the copying process. Does anybody have a similar experience?

Paul.

- Original Message From: Barry Rowlingson [EMAIL PROTECTED] To: Eric Turkheimer [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Sent: Wednesday, August 22, 2007 9:43:57 AM Subject: Re: [R] Synchronzing workspaces

Eric Turkheimer wrote: How do people go about synchronizing multiple workspaces on different workstations? I tend to wind up with projects spread around the various machines I work on. I find that placing the directories on a server and reading them remotely tends to slow things down.

If R were to store all its workspace data objects in individual files instead of one big .RData file, then you could use a revision control system like SVN. Check out the data, work on it, check it in, then on another machine just update to get the changes. However, SVN doesn't work too well for binary files (conflicts being hard to resolve without someone backing down), so maybe it's not such a good idea anyway... On unix boxes and derivatives, you can keep things in sync efficiently with the 'rsync' command. I think there are GUI addons for it, and Windows ports.

Barry
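A sketch of saving explicitly before quitting, with the ascii= and compress= arguments set by hand, plus a checksum comparison to detect a damaged copy. The checksum step and the file names are my own additions; a separate environment stands in for the workspace so the example does not touch your real objects.

```r
# Stand-in for the workspace
ws <- new.env()
assign("a", 1:10, envir = ws)
assign("b", letters, envir = ws)

# Save explicitly instead of relying on q("yes"), choosing the format
src <- file.path(tempdir(), "workspace.RData")
save(list = ls(ws, all.names = TRUE), file = src, envir = ws,
     ascii = TRUE, compress = FALSE)

# Simulate the transfer to another machine, then compare checksums
dst <- file.path(tempdir(), "workspace_copy.RData")
file.copy(src, dst, overwrite = TRUE)
intact <- unname(tools::md5sum(src)) == unname(tools::md5sum(dst))

# Load the copy into a fresh environment to confirm it is readable
restored <- new.env()
load(dst, envir = restored)
```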

### Re: [R] by group problem

See the examples labelled head in the examples section near the bottom of: http://sqldf.googlecode.com/svn/trunk/man/sqldf.Rd These show how to do it using order as well as using SQL via sqldf.

On 8/31/07, Cory Nissen [EMAIL PROTECTED] wrote:

I am working with census data. My columns of interest are:

PercentOld - the percentage of people in each county that are over 65
County - the county in each state
State - the state in the US

There are about 3100 rows, with each row corresponding to a county within a state. I want to return the top five PercentOld by state. But I want the County and the Value. I tried this:

topN <- function(column, n=5) { column <- sort(column, decreasing=T); return(column[1:n]) }

top5PerState <- tapply(data$percentOld, data$STATE, topN)

But this only returns the value for percentOld per state; I also want the corresponding County. I think I'm close, but I just can't get it...

Thanks, cn
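A base R sketch of the order-based approach, assuming a small simulated data frame in place of the census data (the state codes, county names and values are made up). Sorting whole rows and then taking head() within each state keeps the County alongside the value.

```r
set.seed(42)
census <- data.frame(
  State      = rep(c("AK", "AL", "AZ"), each = 8),
  County     = paste0("County", 1:24),
  PercentOld = round(runif(24, 5, 25), 1)
)

# Sort rows by state, then by PercentOld descending within state
ord <- census[order(census$State, -census$PercentOld), ]

# Keep the first five rows of each state and stack the pieces back together
top5 <- do.call(rbind, lapply(split(ord, ord$State), head, n = 5))
rownames(top5) <- NULL
top5
```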

### Re: [R] data frame row manipulation

Try this: evaluation$maxVol <- ave(evaluation$vol, evaluation$name, FUN = max)

or using SQL via sqldf like this: library(sqldf); sqldf("select * from evaluation join (select name, max(vol) from evaluation group by name) using (name)")

On 8/31/07, Calle [EMAIL PROTECTED] wrote:

Hello, struggling with the very basic needs... :( any help appreciated.

Using the package doBy: who drinks how much beer per day (and therefore cannot calculate row-wise max values).

evaluation=data.frame(date=c(1,2,3,4,5,6,7,8,9), name=c("Michael","Steve","Bob","Michael","Steve","Bob","Michael","Steve","Bob"), vol=c(3,5,4,2,4,5,7,6,7)); evaluation

maxval=summaryBy(vol ~ name, data=evaluation, FUN = function(x) { c(ma=max(x)) } ); maxval # over all days per person

getMaxVal=function(x) { maxval$vol.ma[maxval$name==x] }; getMaxVal("Steve") # testing the function for one name is ok

We want to add a column that shows the daily drinking volume in relation to the person's max vol:

evaluation[,"relDrink"]= evaluation$vol/getMaxVal(evaluation$name)

This brings the error: Warning message: corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits, na.encode = FALSE)

errortest= evaluation$vol/getMaxVal(evaluation$name); errortest # this brings: numeric(0)

The target was the following: show in each line the daily consumed beer per person and, in the next column, the all-time max consumed beer for this person (or divided by daily vol):

| date | name | vol | relDrink |
|------|------|-----|----------|
| 1 | Michael | 3 | 7 |
| 2 | Steve | 5 | 6 |
| 3 | Bob | 4 | 7 |
| 4 | Michael | 2 | 7 |
| 5 | Steve | 4 | 7 |
| 6 | Bob | 5 | 7 |
| 7 | Michael | 7 | 7 |
| 8 | Steve | 6 | 6 |
| 9 | Bob | 7 | 7 |

Who can help???
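The ave() suggestion on the data from the question, extended with the ratio the poster was after. ave() returns a vector as long as its input with the group statistic repeated per row, which is why it slots directly into a new column; the relDrink definition (daily volume divided by the personal maximum) is my reading of the request.

```r
evaluation <- data.frame(
  date = 1:9,
  name = rep(c("Michael", "Steve", "Bob"), times = 3),
  vol  = c(3, 5, 4, 2, 4, 5, 7, 6, 7)
)

# Per-person maximum, repeated on every row belonging to that person
evaluation$maxVol <- ave(evaluation$vol, evaluation$name, FUN = max)

# Daily volume relative to the personal maximum
evaluation$relDrink <- evaluation$vol / evaluation$maxVol
evaluation
```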

### Re: [R] size limitations in R

SAS was developed many years ago when computers were far less powerful so its heritage is that it is very efficient and its unlikely that R or other modern software will match SAS in that respect. The development version of the sqldf R package provides an interface which simplifies the use of the R package RSQLite which in turn is an interface to the sqlite database. The development version of sqldf supports RSQLite's ability to read a file directly to sqlite without going through R and then reading it from there or reading a subset of it from there into R. See example 6 on the sqldf home page: http://code.google.com/p/sqldf/ On 8/31/07, Fabiano Vergari [EMAIL PROTECTED] wrote: I am a SAS user currently evaluating R as a possible addition or even replacement for SAS. The difficulty I have come across straight away is R's apparent difficulty in handling relatively large data files. Whilst I would not expect it to handle datasets with millions of records, I still really need to be able to work with dataset with 100,000+ records and 100+ variables. Yet, when reading a .csv file with 180,000 records and about 200 variables, the software virtually ground to a halt (I stopped it after 1 hour). Are there guidelines or maybe a limitations document anywhere that helps me assess the size of file that R, generally, or specific routines will handle? Also, mindful of the fact that I am am an R novice, are there guidelines to make efficient use of R in terms of data handling? Many thanks in advance for your help. Regards, Fabiano Vergari [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 

### Re: [R] R and Web Applications

The R packages and projects for the web and R are listed here: http://www.lmbe.seu.edu.cn/CRAN/doc/FAQ/R-FAQ.html#R-Web-Interfaces

On 8/30/07, Chris Parkin wrote:

Hello, I'm curious to know how people are calling R from web applications (I've been looking at Perl, but I'm open to other languages). After doing a search, I came across the R package RSPerl, but I'm having difficulties getting it installed (on Mac OS X). I believe the problem probably has to do with changes in R since the package release. Below you will see where the installation process comes to an end. Does anyone have any suggestions, or perhaps a direction to point me in? Thanks in advance for your insight! Chris

    * Installing to library '/Library/Frameworks/R.framework/Resources/library'
    * Installing *source* package 'RSPerl' ...
    checking for perl... /usr/bin/perl
    No support for any of the Perl modules from calling Perl from R.
    * Set PERL5LIB to /Library/Frameworks/R.framework/Versions/2.5/Resources/library/RSPerl/perl
    * Testing: -F/Library/Frameworks/R.framework/.. -framework R
    Using '/usr/bin/perl' as the perl executable
    Perl modules (no):
    Adding R package to list of Perl modules to enable callbacks to R from Perl
    Creating the C code for dynamically loading modules with native code for Perl:
    R modules: R; linking:
    checking for gcc... gcc
    checking for C compiler default output file name... a.out
    checking whether the C compiler works... yes
    checking whether we are cross compiling... no
    checking for suffix of executables...
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    Support R in Perl: yes
    configure: creating ./config.status
    config.status: creating src/Makevars
    config.status: creating inst/scripts/RSPerl.csh
    config.status: creating inst/scripts/RSPerl.bsh
    config.status: creating src/RinPerlMakefile
    config.status: creating src/Makefile.PL
    config.status: creating cleanup
    config.status: creating src/R.pm
    config.status: creating R/perl5lib.R
    making target all in RinPerlMakefile
    RinPerlMakefile:5: /Library/Frameworks/R.framework/Resources/etc/Makeconf: No such file or directory
    make: *** No rule to make target `/Library/Frameworks/R.framework/Resources/etc/Makeconf'. Stop.

### Re: [R] Month end calculations

The zoo package includes the yearmon class to facilitate such manipulations. Here are a few solutions, assuming you store your series in a zoo variable:

    # test data
    library(zoo)
    z <- zoo(1001:1100, as.Date(101:200))[-(45:55)]

    # Solution 1. tapply produces indexes of last of month
    tt <- time(z)
    z[c(tapply(seq_along(tt), as.yearmon(tt), tail, 1))]

    # If we want to create a last variable which corresponds
    # to last. in SAS then do it this slightly longer way:

    # Solution 2
    tt <- time(z)
    last <- seq_along(tt) %in% tapply(seq_along(tt), as.yearmon(tt), tail, 1)
    z[last]

    # Solution 3. another solution with a last variable. f(x) is a
    # vector the same length as x with all 0's except the last element is 1.
    tt <- time(z)
    f <- function(x) replace(0*x, length(x), 1)
    last <- ave(seq_along(tt), as.yearmon(tt), FUN = f)
    z[last]

In all these solutions the last point in the series is always included. We have not assumed that every day is necessarily included in your series, but if every day is included then even simpler solutions are possible.

On 8/29/07, Shubha Vishwanath Karanth wrote:

Hi R users, Is there a function in R which does some calculation only for the month end in daily data? In other words, is there a command in R equivalent to the last. function in SAS? BR, Shubha
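For readers without zoo installed, the idea behind Solution 1 (group the positions by month and take the last position in each group) can be sketched in base R alone. This is not the zoo code from the message; the dates, the gap, and the variable names are made up for illustration.

```r
# Daily series with a gap, so that 31 January itself is missing.
dates  <- as.Date("2007-01-01") + c(0:26, 34:80)
values <- seq_along(dates)

# Group row positions by calendar month and keep the last one per group.
month    <- format(dates, "%Y-%m")
last_idx <- tapply(seq_along(dates), month, tail, 1)

month_end <- data.frame(date = dates[last_idx], value = values[last_idx])
month_end
```

Because the grouping uses whatever dates are present, the last available observation of each month is picked even when the calendar month end is absent from the data.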

### Re: [R] Month end calculations

The last line of Solution 3 is wrong (see below for the correction):

On 8/30/07, Gabor Grothendieck wrote:

    # Solution 3. another solution with a last variable. f(x) is a
    # vector the same length as x with all 0's except the last element is 1.
    tt <- time(z)
    f <- function(x) replace(0*x, length(x), 1)
    last <- ave(seq_along(tt), as.yearmon(tt), FUN = f)
    z[last]

This last line should be:

    z[last == 1]
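The reason for the correction is that a vector of 0/1 numbers is treated as positions, not as a logical mask. A small sketch on a plain numeric vector (not a zoo object; the values are invented) shows the difference:

```r
x    <- c(10, 20, 30, 40)
last <- c(0, 0, 1, 1)   # numeric 0/1 flags, like the ones ave() returns above

by_position <- x[last]        # zeros are dropped, position 1 is taken twice
by_mask     <- x[last == 1]   # logical mask keeps the flagged elements

by_position   # 10 10
by_mask       # 30 40
```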

### Re: [R] Month end calculations

And one more yearmon solution. Here z is a zoo series as before:

    tt <- time(z)
    aggregate(z, ave(tt, as.yearmon(tt), FUN = max), tail, 1)

### Re: [R] Synchronzing workspaces

I haven't had a similar experience, but note that save has ascii= and compress= arguments. You could check whether varying those parameter values makes a difference.

On 8/30/07, Paul August wrote:

I used to work on several computers and used a flash drive to synchronize the workspace on each machine before starting to work on it. I found that .RData always caused some trouble: it was often corrupted even though there was no error in the copying process. Does anybody have a similar experience? Paul.

Original Message. From: Barry Rowlingson. To: Eric Turkheimer. Cc: r-help@stat.math.ethz.ch. Sent: Wednesday, August 22, 2007 9:43:57 AM. Subject: Re: [R] Synchronzing workspaces

Eric Turkheimer wrote:

How do people go about synchronizing multiple workspaces on different workstations? I tend to wind up with projects spread around the various machines I work on. I find that placing the directories on a server and reading them remotely tends to slow things down.

If R were to store all its workspace data objects in individual files instead of one big .RData file, then you could use a revision control system like SVN. Check out the data, work on it, check it in, then on another machine just update to get the changes. However, SVN doesn't work too well for binary files (conflicts being hard to resolve without someone backing down), so maybe it's not such a good idea anyway...

On Unix boxes and derivatives, you can keep things in sync efficiently with the 'rsync' command. I think there are GUI add-ons for it, and Windows ports.

Barry
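A minimal sketch of the suggestion above: save the same object with non-default `ascii` and `compress` settings, copy the file, and check the copy before loading it. The checksum step with `tools::md5sum` is an addition of this sketch, not something proposed in the thread, and the object and temporary files are invented.

```r
x <- data.frame(a = 1:3, b = c("u", "v", "w"))

# Save as uncompressed ASCII instead of the default compressed binary format.
original <- tempfile(fileext = ".RData")
save(x, file = original, ascii = TRUE, compress = FALSE)

# Simulate moving the file to another machine, then compare checksums.
copied <- tempfile(fileext = ".RData")
file.copy(original, copied)
same_bytes <- identical(unname(tools::md5sum(original)),
                        unname(tools::md5sum(copied)))

# Load into a separate environment so nothing in the workspace is overwritten.
env <- new.env()
load(copied, envir = env)
round_trip_ok <- identical(env$x, x)
```

If the checksums differ after a real copy, the problem lies in the transfer rather than in how R wrote the file.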

### Re: [R] sql query over local tables

I assume that by local tables you mean data frames in R. You can use the merge function in base R, as others have already mentioned, or, if you want to use SQL syntax, you can use the sqldf package. See example 4 on the sqldf home page: http://sqldf.googlecode.com

On 8/28/07, Jorge Cornejo Donoso wrote:

Hi, I have two tables with IDs in each one. I want to make a join (as in SQL) by the ID. Is there any way to use the RODBC package (or another) on local tables (not Access, MySQL, SQL, etc.) and make the join? Thanks in advance
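As a small illustration of the merge route mentioned above, the sketch below joins two made-up data frames on a shared `ID` column; the data frame and column names are hypothetical.

```r
people  <- data.frame(ID = c(1, 2, 3), height = c(170, 165, 180))
weights <- data.frame(ID = c(2, 3, 4), weight = c(60, 82, 75))

# Inner join: only IDs present in both data frames.
inner <- merge(people, weights, by = "ID")

# Left join: keep every row of `people`, filling missing weights with NA.
left <- merge(people, weights, by = "ID", all.x = TRUE)

# Not run here: with the sqldf package installed, the inner join could
# instead be written in SQL, e.g.
#   sqldf("select * from people join weights using (ID)")
inner
left
```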

### Re: [R] Strage result with an append/strptime combination

Try chron:

    library(chron)
    namefile <- "070707050642.dat" # day-month-year-hour-minute-second.dat
    x <- chron(substr(namefile, 1, 6), substr(namefile, 7, 12),
               format = c("dmy", "hms"), out.format = c("m/d/y", "h:m:s"))
    c(x, x)
    [1] (07/07/07 05:06:42) (07/07/07 05:06:42)

See the R News 4/1 Help Desk article for more.

On 8/29/07, Ptit_Bleu wrote:

Hi, I keep on trying to write some small scripts in order to learn R, but even with basic scripts I have problems ... I start with the name of a file, which is in fact the time the file was generated (I cannot change the format). Then I convert namefile with strptime. The problem occurs when I add another time from another file with append. It displays some information I don't want. I found a post about this problem (http://www.nabble.com/Error-with-strptime-tf3607942.html#a10081942) but I don't understand the solution. I tested as.POSIXct and as.POSIXlt but it had no effect. Do you have some ideas to solve this problem? Thank you for your help. Ptit Bleu.

    namefile <- "070707050642.dat" # day-month-year-hour-minute-second.dat
    jourheure <- strptime(namefile, "%d%m%y%H%M%S")
    jourheure
    [1] 2007-07-07 05:06:42
    jourheure <- append(jourheure, jourheure)
    jourheure
    [1] 2007-07-07 05:06:42 Paris, Madrid (heure d'été) 2007-07-07 05:06:42 Paris, Madrid (heure d'été)
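For comparison with the chron answer, here is a hedged base-R sketch. `strptime` returns a POSIXlt object, which is a list of date-time components internally; converting it to POSIXct (a single number of seconds) before combining tends to behave more predictably. The fixed `tz = "UTC"` is an assumption made so the output does not depend on the local time zone, and behaviour in the 2007 version of R may have differed from current versions.

```r
namefile <- "070707050642.dat"   # day-month-year-hour-minute-second.dat

# Parse the first 12 characters and convert to POSIXct right away.
stamp <- as.POSIXct(strptime(substr(namefile, 1, 12), "%d%m%y%H%M%S", tz = "UTC"))

# c() on POSIXct values gives a plain vector of two time stamps.
both <- c(stamp, stamp)
shown <- format(both, "%Y-%m-%d %H:%M:%S", tz = "UTC")
shown
```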

### Re: [R] Strage result with an append/strptime combination

Try

    fmt <- function(x) with(month.day.year(x),
      sprintf("%02d/%02d/%02d %02d:%02d:%02d",
              month, day, year, hours(x), minutes(x), seconds(x)))
    fmt(x)

On 8/29/07, Ptit_Bleu wrote:

Thanks Gabor! It works. Just one more thing: is there a possibility to remove ( and ) before I copy the data to a MySQL database? Again, thank you for the tip. Ptit Bleu.
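If the time stamp is held as a base-R POSIXct value rather than a chron object (an assumption of this sketch, not what the thread uses), `format()` produces strings without the surrounding parentheses. The second pattern is the `YYYY-MM-DD hh:mm:ss` layout that MySQL accepts for DATETIME literals.

```r
x <- as.POSIXct("2007-07-07 05:06:42", tz = "UTC")

# Same layout as the chron output, minus the parentheses.
us_style <- format(x, "%m/%d/%y %H:%M:%S", tz = "UTC")

# ISO-like layout suitable for a MySQL DATETIME column.
mysql_style <- format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")

us_style      # "07/07/07 05:06:42"
mysql_style   # "2007-07-07 05:06:42"
```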

### Re: [R] Excel

You would still need the interactive GUI to get to the point where it's at all comparable to Excel. Using rpad you could construct such an interface, although it's a bit of work. Here is an example using rpad and reshape: http://www.rpad.org/Rpad/DataExplorer.Rpad

On 8/29/07, Bert Gunter wrote:

Erich: This is not a comment either for or against the use of Excel. I only wish to point out that, AFAICS, Hadley Wickham's reshape package offers all the pivot table functionality and more. If I am wrong about this, please let me and everyone else know. Bert Gunter, Genentech Nonclinical Statistics

Original Message. From: Erich Neuwirth. Sent: Wednesday, August 29, 2007 11:43 AM. To: r-help. Subject: Re: [R] Excel

Excel bashing can be fun, but it can also be dangerous, because you are making your life harder than necessary. Statisticians by now know that the numerics of statistical computation in Excel can be quite bad, and that therefore one should not use them. But using our (we = Thomas Baier + Erich Neuwirth) RExcel add-in, either with the R(D)COM server or with rcom (package on CRAN), allows you to use all the nice features of Excel (yes, there are quite a few) and use R as the computational engine within Excel. The formula =RApply("var",A1:A1000) in an Excel cell, for example, will use R to compute the variance of the data in column A in Excel. If you change any of the values in the range A1:A1000, Excel will automatically recompute the variance.

There is one feature in Excel which is extremely convenient: pivot tables. Anybody doing any work as a statistical consultant really ought to know about pivot tables, and I am still surprised how many statisticians do not know about them. Neither Gnumeric nor OpenOffice Calc offers comparably convenient ways of working with multidimensional tables.

I think the answer to the question "Excel or R" is, of course, "Excel and R".
-- Erich Neuwirth, University of Vienna Faculty of Computer Science Computer Supported Didactics Working Group Visit our SunSITE at http://sunsite.univie.ac.at Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
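To make the pivot-table discussion concrete, here is a minimal base-R sketch of a two-way summary. It uses `xtabs` and `tapply` rather than the reshape package mentioned in the thread, and the `sales` data frame is invented for illustration.

```r
sales <- data.frame(
  region  = c("north", "north", "south", "south", "south"),
  product = c("a", "b", "a", "a", "b"),
  amount  = c(10, 20, 5, 15, 30)
)

# Sum of amount by region (rows) and product (columns), like a pivot table.
pivot_sum <- xtabs(amount ~ region + product, data = sales)

# Any other summary function works through tapply, here the mean.
pivot_mean <- with(sales, tapply(amount, list(region, product), mean))

pivot_sum
pivot_mean
```

What this does not provide is the interactive drag-and-drop rearrangement that the thread identifies as Excel's main convenience.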

### Re: [R] Efficient way to parse string and construct data.frame

Try this:

    s <- c("1 ,2 ,3", "4 ,5 ,6")
    read.csv(textConnection(s), header = FALSE)
      V1 V2 V3
    1  1  2  3
    2  4  5  6

On 8/28/07, yoo wrote:

Hi all, I have this list of strings

    [1] "1 ,2 ,3" "4 ,5 ,6"

Is there an efficient way to convert it to a data.frame:

      V1 V2 V3
    1  1  2  3
    2  4  5  6

I can use strsplit to get a list of split strings, and then use, say,

    a = strsplit(mylist, ",")
    data.frame(V1 = lapply(a, function(x){x[1]}), V2 = lapply(a, function(x){x[2]}), ...)

but I'm looping through that list so many times that I'm hesitant to use that. Thanks a lot for your great help before and this time as well!! - boy

View this message in context: http://www.nabble.com/Efficient-way-to-parse-string-and-construct-data.frame-tf4342441.html#a12370234
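In later versions of R the same parse can be written with the `text` argument of `read.csv`, which opens the text connection internally. This is a small variant of the answer above, not a different method; `strip.white = TRUE` is added here as a precaution for the stray spaces.

```r
s <- c("1 ,2 ,3", "4 ,5 ,6")

# One call parses every string; no per-element strsplit loop is needed.
df <- read.csv(text = s, header = FALSE, strip.white = TRUE)
df
str(df)
```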

### Re: [R] Factor levels

You can create your own class and pass that to read.table. In the example below, Fld2 is read in with factor levels C, A, B, in that order.

    library(methods)
    setClass("my.levels")
    setAs("character", "my.levels",
          function(from) factor(from, levels = c("C", "A", "B")))

    ### test ###
    Input <- "Fld1 Fld2
    10 A
    20 B
    30 C
    40 A
    "
    DF <- read.table(textConnection(Input), header = TRUE,
                     colClasses = c("numeric", "my.levels"))
    str(DF)
    # or
    DF <- read.table(textConnection(Input), header = TRUE,
                     colClasses = list(Fld2 = "my.levels"))
    str(DF)

On 8/28/07, Sébastien wrote:

Dear R-users, I have found this not-so-recent post in the archives (http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html) while I was looking for a particular way to reorder factor levels. The question raised by the author was whether the read.table function could be modified to order the levels of newly created factors according to the order in which they appear in the data file. Exactly what I am looking for. As there was no reply to this post, I wonder if any move has been made towards the implementation of this suggestion. A quick look at ?read.table tells me that if this option was implemented, it was not in the read.table function...

Sebastien

PS: I am sorry to post so many messages on the list, but I am learning R (basically by trial and error ;-) ) and no one around me has even a slight notion about it...

### Re: [R] Factor levels

It's not clear from your description what you want. Could you be a bit more specific, including an example?

On 8/28/07, Sébastien wrote:

Thanks Gabor, I have two questions:

1- Is there any difference between your code and the following one, with regard to Fld2?

    ### test ###
    Input <- "Fld1 Fld2
    10 A
    20 B
    30 C
    40 A
    "
    DF <- read.table(textConnection(Input), header = TRUE)
    DF$Fld2 <- factor(DF$Fld2, levels = c("C", "A", "B"))

2- Do you see any way to bring flexibility to your method? It looks to me as if, at this stage, I have to i) know the order of my levels before I read the table and ii) create one class per factor. My problem is that I am not really working on a specific dataset. My goal is to develop R scripts capable of handling datasets which have various contents but similar structures. So I really need to minimize the quantity of user-specific code.

Sebastien
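On the first question in this message, a small sketch can compare the two routes side by side: converting while reading through a `setAs` method, and converting after reading with `factor()`. It reuses the thread's toy input; the `text =` argument and the variable names `DF1` and `DF2` are choices of this sketch.

```r
library(methods)

# Route 1: conversion happens inside read.table via a coercion method.
setClass("my.levels")
setAs("character", "my.levels",
      function(from) factor(from, levels = c("C", "A", "B")))

Input <- "Fld1 Fld2\n10 A\n20 B\n30 C\n40 A\n"
DF1 <- read.table(text = Input, header = TRUE,
                  colClasses = c("numeric", "my.levels"))

# Route 2: read as plain text first, then set the level order afterwards.
DF2 <- read.table(text = Input, header = TRUE, stringsAsFactors = FALSE)
DF2$Fld2 <- factor(DF2$Fld2, levels = c("C", "A", "B"))

levels(DF1$Fld2)
levels(DF2$Fld2)
```

Both routes end with the same level order; they differ only in when the conversion happens.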

### Re: [R] Factor levels

It's the same principle. Just change the function to be suitable. This one arranges the levels according to the input:

    library(methods)
    setClass("my.factor")
    setAs("character", "my.factor",
          function(from) factor(from, levels = unique(from)))

    Input <- "a b c
    1 1 176 w
    2 2 141 k
    3 3 172 r
    4 4 182 s
    5 5 123 k
    6 6 153 p
    7 7 176 l
    8 8 170 u
    9 9 140 z
    10 10 194 s
    11 11 164 j
    12 12 100 j
    13 13 127 x
    14 14 137 r
    15 15 198 d
    16 16 173 j
    17 17 113 x
    18 18 144 w
    19 19 198 q
    20 20 122 f
    "
    DF <- read.table(textConnection(Input), header = TRUE,
                     colClasses = list(c = "my.factor"))
    str(DF)

On 8/28/07, Sébastien wrote:

Ok, I cannot send you one of my datasets since they are confidential. But I can produce a dummy mini dataset to illustrate my question. Let's say I have a csv file with 3 columns and 20 rows, whose content is reproduced by the following line.

    mydata <- data.frame(a = 1:20, b = sample(100:200, 20, replace = TRUE),
                         c = sample(letters[1:26], 20, replace = TRUE))
    mydata
        a   b c
    1   1 176 w
    2   2 141 k
    3   3 172 r
    4   4 182 s
    5   5 123 k
    6   6 153 p
    7   7 176 l
    8   8 170 u
    9   9 140 z
    10 10 194 s
    11 11 164 j
    12 12 100 j
    13 13 127 x
    14 14 137 r
    15 15 198 d
    16 16 173 j
    17 17 113 x
    18 18 144 w
    19 19 198 q
    20 20 122 f

If I had to read the csv file, I would use something like:

    mydata <- data.frame(read.table(file = "c:/test.csv", header = TRUE))

Now, if you look at mydata$c, the levels are alphabetically ordered.

    mydata$c
    [1] w k r s k p l u z s j j x r d j x w q f
    Levels: d f j k l p q r s u w x z

What I am trying to do is to reorder the levels so as to have them in the order in which they appear in the table, i.e.

    Levels: w k r s p l u z j x d q f

Again, keep in mind that my script should be usable on datasets whose content is unknown to me. In my example, I have used letters for mydata$c, but my code may have to handle factors of numeric or character values (I need to transform specific columns of my dataset into factors for plotting purposes). My goal is to let the code scan the content of each factor of my data.frame during or after the read.table step and reorder the levels automatically, without having to ask the user to hard-code the level order. In a way, my problem is more related to the way the factor levels are ordered than to the read.table function, although I guess there is a link...
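The "after the read.table step" variant raised in this message can be sketched as a small helper that rebuilds every text or factor column with levels in order of first appearance. The helper name `as_appearance_factors` and the six-row data frame are hypothetical; this is one possible approach, not the method given in the thread.

```r
# Rebuild each character/factor column so its levels follow first appearance.
as_appearance_factors <- function(df) {
  is_text <- vapply(df, function(col) is.character(col) || is.factor(col),
                    logical(1))
  df[is_text] <- lapply(df[is_text], function(col) {
    col <- as.character(col)
    factor(col, levels = unique(col))
  })
  df
}

mydata <- data.frame(a = 1:6, c = c("w", "k", "r", "s", "k", "p"),
                     stringsAsFactors = FALSE)
mydata <- as_appearance_factors(mydata)
levels(mydata$c)   # "w" "k" "r" "s" "p"
```

Because the helper inspects column types itself, it needs no dataset-specific level lists, which matches the stated goal of minimizing user-specific code.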

### Re: [R] Nodes edges with similarity matrix

Try this:

    # test data
    mat <- structure(c(1, 0.325141612, 0.002109751, 0.250153137,
                       0.0223676, 1, 0.342654, 0.1987485,
                       0.9723831, 0.9644216, 1, 0.7391222,
                       0.394331, 0.5460461, 0.7080224, 1),
                     .Dim = c(4L, 4L),
                     .Dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))

    library(sna)
    # draw edges according to value
    gplot(mat, edge.lwd = mat, label = rownames(mat))
    # thresholding at 0.5
    gplot(mat > .5, label = rownames(mat))

On 8/28/07, H. Paul Benton wrote:

Hello, I apologise if someone has already answered this, but I searched and googled and didn't find anything. I have a matrix which gives me the similarity of each item to each other. I would like to turn this matrix into something like what they have in the graph package, with nodes and edges: http://cran.r-project.org/doc/packages/graph.pdf. However, I cannot find a method to convert my matrix to an object that graph can use. My similarity matrix looks like:

    sim[1:4,]
        a           b         c         d
    [a] 1.0         0.0223676 0.9723831 0.3943310
    [b] 0.325141612 1.000     0.9644216 0.5460461
    [c] 0.002109751 0.3426540 1.000     0.7080224
    [d] 0.250153137 0.1987485 0.7391222 1.000

Please don't get caught up with the numbers; I simply made this up to show. I have not yet produced the code to make my similarity matrix. Does anyone know a method to do this, or do I have to write something? :( If I do, any starter code? :D jj. If I've read something wrong or misunderstood, my apologies. Cheers, Paul

-- Research Technician, Mass Spectrometry, The Scripps Research Institute
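If an explicit list of nodes and edges is wanted rather than a plot, the thresholding step can be sketched in base R without any graph package. The 0.5 cut-off mirrors the answer above; the `edges` data frame layout (from, to, weight) is an assumption of this sketch, not the input format of the graph or sna packages.

```r
sim <- matrix(c(1, 0.325141612, 0.002109751, 0.250153137,
                0.0223676, 1, 0.342654, 0.1987485,
                0.9723831, 0.9644216, 1, 0.7391222,
                0.394331, 0.5460461, 0.7080224, 1),
              nrow = 4,
              dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))

# Positions above the threshold, ignoring the diagonal (self-similarity).
keep <- which(sim > 0.5 & row(sim) != col(sim), arr.ind = TRUE)

# One row per directed edge, with the similarity as the edge weight.
edges <- data.frame(from   = rownames(sim)[keep[, "row"]],
                    to     = colnames(sim)[keep[, "col"]],
                    weight = sim[keep])
edges
```

Since this matrix is not symmetric, row-to-column pairs are kept as directed edges; a symmetric matrix would need only one triangle.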

### Re: [R] How to provide argument when opening RGui from an external application

There are also some batch files that can be used with Rscript on XP, and info in the README, here: http://batchfiles.googlecode.com

On 8/26/07, Sébastien wrote:

> Thanks for your reply. When you say look into Rscript.exe, do you have a specific document in mind? I tried to google it but could not find much... I forgot to mention in my first email that I am working under the Windows XP environment.
>
> Prof Brian Ripley wrote:
>
>> Look into Rscript.exe (on Windows), which is a flexible way to run scripts. Neither using a GUI nor using source() are recommended.
>>
>> On Fri, 24 Aug 2007, Sébastien wrote:
>>
>>> Dear R-users,
>>>
>>> I have written a small application (in Visual Basic) that automatically generates some R scripts. I would like to execute these scripts when my application is being closed. My problem is that I don't know how to pass the 'source("c:/.../myscript.r")' instruction when I programmatically start RGui. Tinn-R is capable of doing such things, so I guess there must be a way to pass arguments to RGui. Any advice or link to relevant references would be greatly appreciated.
>>>
>>> Sebastien
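To make the Rscript suggestion concrete, here is a minimal sketch of a script that picks up arguments passed on the command line. The file name `myscript.R` and the argument `input.csv` are placeholders for illustration, not anything from the thread.

```r
# myscript.R (hypothetical file name), started by the external application as
#   Rscript myscript.R input.csv
# commandArgs(trailingOnly = TRUE) returns only the arguments that follow
# the script name, as a character vector.
args <- commandArgs(trailingOnly = TRUE)

if (length(args) >= 1) {
  message("Processing ", args[1])
} else {
  message("No arguments supplied")
}
```

The calling application then only needs to launch a command line, with no GUI and no `source()` call involved.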

### Re: [R] How to make an array of data.frames?

Is this what you want:

    DF1 <- DF2 <- DF3 <- df1 <- df2 <- df3 <- head(iris)
    list(a = list(DF1, DF2, DF3), b = list(df1, df2, df3))

or

    x <- list()
    x$a <- list(DF1, DF2, DF3)
    x$b <- list(df1, df2, df3)

On 8/26/07, Werner Wernersen wrote:

> Hi,
>
> I am still struggling with the data structures in R. I know how it works in C++, but how can I get such a structure in R? Here is what I want:
>
>     x[a]$dataframe1
>     x[a]$dataframe2
>     x[a]$dataframe3
>     x[b]$dataframe1
>     x[b]$dataframe2
>     x[b]$dataframe3
>     x[c]$dataframe1
>     x[c]$dataframe2
>     x[c]$dataframe3
>
> And it would be nice if I could fill in objects a, b, c one at a time, successively. What is the easiest way to get such a data structure? It would be great if someone could give me some help with this.
>
> Many thanks and kind regards,
> Werner
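For the "fill in a, b, c one at a time" part of the question, a nested named list can be grown step by step. The sketch below assumes the inner names `dataframe1`, `dataframe2`, ... from the question and uses `head(iris)` as stand-in data; note that the R spelling of the access is `x[["a"]]$dataframe1` rather than `x[a]$dataframe1`.

```r
DF <- head(iris)                     # stand-in data frame

x <- list()                          # start empty, add one group at a time
x[["a"]] <- list(dataframe1 = DF, dataframe2 = DF, dataframe3 = DF)
x[["b"]] <- list(dataframe1 = DF, dataframe2 = DF, dataframe3 = DF)

# A group can also be built up piece by piece.
x[["c"]] <- list(dataframe1 = DF)
x[["c"]]$dataframe2 <- DF

names(x)                             # the groups added so far
str(x[["a"]]$dataframe2)             # one data frame inside group a
```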

### Re: [R] How to make an array of data.frames?

That gives you a list of data frames. An array is a vector with a dim attribute, so to make the list into an array, add the appropriate dim attribute. If x is the list we created before then

    dim(x) <- 2

gives us an array of length 2, each element of which is a list of 3 elements, or

    dim(x) <- 1:2

gives a 1x2 array, or

    y <- list(DF1, DF2, DF3, df1, df2, df3)
    dim(y) <- 3:2

gives a 3x2 array, so you can write y[[1,2]], for example.
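As a small worked version of the dim-attribute idea, the sketch below builds the 3x2 array and adds dimnames so that cells can be addressed by name. The dimnames are my own addition for illustration; the reply itself only sets `dim`.

```r
DF <- head(iris)                          # stand-in data frame

y <- list(DF, DF, DF, DF, DF, DF)         # six data frames in a plain list
dim(y) <- c(3, 2)                         # now a 3 x 2 array of data frames

# Optional labels (an assumption, not part of the original reply).
dimnames(y) <- list(c("dataframe1", "dataframe2", "dataframe3"), c("a", "b"))

cell <- y[["dataframe1", "b"]]            # same cell as y[[1, 2]]
```

Because the list is filled column by column, the first three elements end up in column "a" and the last three in column "b".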

### Re: [R] Program of matrix of seasonal dummy variable(Econometrics)

Try this:

    kronecker(rep(1, 3), diag(4))
          [,1] [,2] [,3] [,4]
     [1,]    1    0    0    0
     [2,]    0    1    0    0
     [3,]    0    0    1    0
     [4,]    0    0    0    1
     [5,]    1    0    0    0
     [6,]    0    1    0    0
     [7,]    0    0    1    0
     [8,]    0    0    0    1
     [9,]    1    0    0    0
    [10,]    0    1    0    0
    [11,]    0    0    1    0
    [12,]    0    0    0    1

On 8/26/07, an R-help poster wrote:

> Dear R users,
>
> I would like to construct a matrix of seasonal dummy variables. Such a matrix can be written as follows (i.e. format (T,4)):
>
>     1 0 0 0
>     0 1 0 0
>     0 0 1 0
>     0 0 0 1
>     1 0 0 0
>     0 1 0 0
>     0 0 1 0
>     0 0 0 1
>     1 0 0 0
>     0 1 0 0
>     0 0 1 0
>     0 0 0 1
>     ... etc.
>
> I have written the following small program:
>
>     T=100
>     br<-matrix(0,T,4)
>     for (i in 1:T) {
>     for (j in 1:4) {
>     if i=j {
>     br[i,j]=1
>     }
>     if else (abs(i-j)%%4==0) {
>     br[i,j]=1
>     }
>     else {
>     br[i,j]=0
>     }
>     }
>     }
>
> I have obtained the following messages from the R console:
>
>     T=100
>     br<-matrix(0,T,4)
>     for (i in 1:T)
>     + {
>     + for (j in 1:4)
>     + {
>     + if i=j
>     Error: syntax error, unexpected SYMBOL, expecting '(' in:
>     {
>     + br[i,j]=1
>     + }
>     Error: syntax error, unexpected '}' in {
>     + br[i,j]=1
>     + if else (abs(i-j)%%4==0)
>     Error: syntax error, unexpected ELSE, expecting '(' in + if else
>     {
>     + br[i,j]=1
>     + }
>     Error: syntax error, unexpected '}' in {
>     + br[i,j]=1
>     + else
>     Error: syntax error, unexpected ELSE in + else
>     {
>     + br[i,j]=0
>     + }
>     Error: syntax error, unexpected '}' in {
>     + br[i,j]=0
>     + }
>     Error: syntax error, unexpected '}' in + }
>     + }
>     Error: syntax error, unexpected '}' in + }
>
> I would be grateful if you could correct my program so that it produces this matrix of seasonal dummies. Many thanks in advance.
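For the T = 100 case in the question, here is a sketch of three equivalent ways to get the dummy matrix: indexing rows of an identity matrix, the Kronecker product from the reply, and a corrected version of the loop. The corrected loop is my reading of what the poster intended (in R the condition of `if` must be in parentheses and equality is tested with `==`); the variable name `n_obs` replaces `T`, which in R is also shorthand for `TRUE`.

```r
n_obs <- 100                                   # T in the original post

# Season of each observation: 1, 2, 3, 4, 1, 2, ...
season <- (seq_len(n_obs) - 1) %% 4 + 1
br <- diag(4)[season, ]                        # row i has its 1 in column season[i]

# The Kronecker form from the reply; needs n_obs to be a multiple of 4.
br_kron <- kronecker(rep(1, n_obs / 4), diag(4))

# Corrected loop: column j is 1 whenever i and j differ by a multiple of 4.
br_loop <- matrix(0, n_obs, 4)
for (i in seq_len(n_obs)) {
  for (j in 1:4) {
    if ((i - j) %% 4 == 0) br_loop[i, j] <- 1
  }
}

head(br, 8)
```

The indexing form also works when the number of observations is not a multiple of 4, which the Kronecker form does not handle directly.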

### Re: [R] subset using noncontiguous variables by name (not index)

Using the builtin data frame anscombe, try this. First we set up a data frame anscombe.seq which has one row containing 1, 2, 3, ... . Then we select from that data frame and unlist the result to get the desired index vector.

    anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
    idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
    anscombe[idx]
       x1 x3 x4   y2
    1  10 10  8 9.14
    2   8  8  8 8.14
    3  13 13  8 8.74
    4   9  9  8 8.77
    5  11 11  8 9.26
    6  14 14  8 8.10
    7   6  6  8 6.13
    8   4  4 19 3.10
    9  12 12  8 9.13
    10  7  7  8 7.26
    11  5  5  8 4.74

On 8/26/07, Muenchen, Robert A (Bob) wrote:

> Hi All,
>
> I'm using the subset function to select a list of variables, some of which are contiguous in the data frame, and others of which are not. It works fine when I use the form:
>
>     subset(mydata, select=c(x1,x3:x5,x7) )
>
> In reality, my list is far more complex. So I would like to store it in a variable to substitute in for c(x1,x3:x5,x7), but cannot get it to work. That use of the c function seems to violate R rules, so I'm not sure how it works at all. A small simulation of the problem is below.
>
> If the variable name orders were really this simple, I could use indices like
>
>     summary( mydata[ ,c(1,3:5,7) ] )
>
> but alas, they are not.
>
> How does the c function work this way in the first place, and how can I make this substitution?
>
> Thanks,
> Bob
>
>     mydata <- data.frame(
>       x1=c(1,2,3,4,5),
>       x2=c(1,2,3,4,5),
>       x3=c(1,2,3,4,5),
>       x4=c(1,2,3,4,5),
>       x5=c(1,2,3,4,5),
>       x6=c(1,2,3,4,5),
>       x7=c(1,2,3,4,5)
>     )
>     mydata
>
>     # This does what I want.
>     summary( subset(mydata, select=c(x1,x3:x5,x7) ) )
>
>     # Can I substitute myVars?
>     attach(mydata)
>     myVars1 <- c(x1,x3:x5,x7)
>
>     # Not looking good!
>     myVars1
>
>     # This doesn't do the right thing.
>     summary( subset(mydata, select=myVars1 ) )
>
>     # Total desperation on this attempt:
>     myVars2 <- "x1,x3:x5,x7"
>     myVars2
>
>     # This doesn't work either.
>     summary( subset(mydata, select=myVars2 ) )
>
> Bob Muenchen (pronounced Min'-chen), Manager, Statistical Consulting Center, U of TN Office of Information Technology, 200 Stokely Management Center, Knoxville, TN 37996-0520. Voice: (865) 974-5230, FAX: (865) 974-4810. Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
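On the question of how `c(x1, x3:x5, x7)` can work at all inside `subset()`: as far as I understand it, the expression is evaluated in a context where each column name stands for its position, so `x3:x5` becomes `3:5`. The sketch below reproduces that idea by hand on the question's `mydata`; it is a simplified illustration, not the actual source of `subset()`, and `positions` is just a name chosen here.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Store the selection unevaluated, so x1, x3:x5 and x7 are not looked up yet.
myVars <- quote(c(x1, x3:x5, x7))

# Map every column name to its position ...
positions <- as.list(seq_along(mydata))
names(positions) <- names(mydata)

# ... and evaluate the stored expression against that mapping.
idx <- eval(myVars, positions)

selected <- mydata[idx]
names(selected)
```

This also shows why `attach(mydata); myVars1 <- c(x1, x3:x5, x7)` goes wrong: there the names refer to the column contents rather than to column positions.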

### Re: [R] subset using noncontiguous variables by name (not index)

Try this:

    "%:%" <- function(x, y) {
       prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
       prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
       stopifnot(prex == prey)
       paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
    }
    "x2" %:% "x4"
    [1] "x2" "x3" "x4"

On 8/26/07, Muenchen, Robert A (Bob) wrote:

> Thanks Bert & Gabor for two very interesting solutions!
>
> It would be very handy in R if string1:stringN generated string1, string2, ..., stringN; it would make selections like this much more obvious. I know it's easy to do with the colon operator and the paste function, but that's quite a step up in complexity compared to SAS' x1 x3-x4 y2 or SPSS' x1, x3 to x4, y2. And it's complexity that beginners face early in learning R.
>
> While on the subject of the colon operator, why doesn't anscombe[[1:4]] select the x variables in list form as anscombe[,1:4] or anscombe[1:4] do in data frame form?
>
> Thanks,
> Bob
>
> -----Original Message-----
> From: Bert Gunter
> Sent: Sunday, August 26, 2007 6:50 PM
> To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
> Cc: r-help@stat.math.ethz.ch
> Subject: RE: [R] subset using noncontiguous variables by name (not index)
>
>> The problem is that x3:x5 does not mean what you think it means. The only reason it does the right thing in subset() is because a clever trick is used there (read the code -- it's not hard to understand) to ensure that it does. Gabor has essentially mimicked that trick in his solution.
>>
>> However, it is not necessary to do this. You can construct the call directly, as you tried to do. Using the anscombe example, here's how:
>>
>>     chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
>>     do.call(subset, list(x = anscombe, select = parse(text = chooz)))
>>
>> -- Bert Gunter
>> Genentech Non-Clinical Statistics, South San Francisco, CA
>>
>> "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

### Re: [R] Extracting a range of elements from a vector

See ?embed

On 8/25/07, Otis Laws wrote:

> Dear R users,
>
> I am an R newbie creating a function that implements the poker test to test pseudo-random bit generators. I am reading the bits from a text file (1 bit per line), which causes each bit to be stored in an element of a numeric vector.
>
> What I am trying to do is to extract a block of bits of arbitrary size from the original vector into a smaller numeric vector and then count this binary number (and keep repeating this until the end of the vector, so that I get a vector containing the number of times each binary number has occurred).
>
> E.g. original vector: 0, 1, 1, 0, 0, 1, 0, 1, 1. Using a block size of 3 bits, the first smaller vector becomes: 0, 1, 1.
>
> At the moment I do this by iterating through the original vector and setting the ith element of the smaller vector. I have looked at using the subset() function, but it seems to operate on a vector's content rather than its index.
>
> This raises the following two main questions:
>
> 1. Is there a way to specify a range of vector elements?
> 2. Is this the most efficient method, since this could be extremely time consuming when used to test millions of bits?
>
> Thanks very much in advance,
> Otis Laws
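To make the two questions concrete, here is a sketch using the example vector from the post. It assumes the vector length is a multiple of the block size and that the first bit of each block is the most significant one; both are my assumptions, not stated in the thread. `embed()` gives all overlapping windows (with columns in reverse order), while `matrix()` gives the non-overlapping blocks the poker test seems to need.

```r
bits <- c(0, 1, 1, 0, 0, 1, 0, 1, 1)      # example vector from the question
k <- 3                                     # block size in bits

# 1. A range of elements is selected with an index vector.
first_block <- bits[1:k]

# 2. Non-overlapping blocks without a loop: one block per row.
blocks <- matrix(bits, ncol = k, byrow = TRUE)

# Integer value of each block (first bit taken as most significant).
values <- as.vector(blocks %*% 2^((k - 1):0))

# How often each of the 2^k possible values 0 .. 2^k - 1 occurs.
counts <- tabulate(values + 1, nbins = 2^k)

# embed() returns every overlapping window; reversing the columns
# restores the original bit order within each window.
windows <- embed(bits, k)[, k:1]
```

Since everything is vectorised, this should scale to long bit vectors much better than element-by-element copying.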

### Re: [R] Character position command

See ?regexpr to get the position; however, using sub we could remove the dot and everything after it in one go. See ?regexp and ?sub. Also there are some links to info on regular expressions in the Links box on this page: http://gsubfn.googlecode.com

    n <- regexpr(".", "apples.pears", fixed = TRUE)
    substr("apples.pears", 1, n-1)
    [1] "apples"

    sub("[.].*", "", "apples.pears")
    [1] "apples"

On 8/25/07, Mitchell Hoffman wrote:

> This is a very simple question, so I apologize that I couldn't find it online.
>
> I want to shorten the string 'apples.pears' to 'apples'.
>
>     string='apples.pears'
>     string1=substr(string,0,x)
>
> For x above, I would like to have a command like charAt(string, "."), i.e. the position of the period in the word, but I can't seem to find a charAt command in R.
>
> Thank you.
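One difference between the two approaches in the reply shows up when a string has no dot at all. The sketch below runs both over a small made-up vector (the strings other than "apples.pears" are my own examples).

```r
fruit <- c("apples.pears", "plums", "a.b.c")

# Position of the first literal dot; -1 means there is none.
pos <- as.integer(regexpr(".", fruit, fixed = TRUE))

# With substr() the no-dot case has to be handled explicitly ...
via_substr <- ifelse(pos > 0, substr(fruit, 1, pos - 1), fruit)

# ... whereas sub() simply leaves strings without a dot unchanged.
via_sub <- sub("[.].*", "", fruit)
```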

### Re: [R] How to shade vertical bands in a graph?

There is an example using classic graphics here:

http://www.mayin.org/ajayshah/KB/R/html/g5.html

and one using lattice graphics here:

    library(zoo)
    ?xyplot.zoo

On 8/23/07, del pes wrote:

> Hello,
>
> I would like to draw vertical yellow bands in my graph, but could not find how to do that in the documentation. I set up a page to show what I would like to achieve: http://rstudent.blogg.de/eintrag.php?id=1 (the first picture was manually colored with the Gimp). Any help would be welcome...
>
> All the best,
> Delfina
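For the classic-graphics route, here is a minimal sketch of one way to shade vertical bands with `rect()`. It is not taken from the linked page; the helper name `shade_bands`, the simulated series and the band positions are all made up for illustration.

```r
set.seed(1)
x <- 1:100
y <- cumsum(rnorm(100))                   # simulated series, for illustration only

# Hypothetical helper: shade the x-ranges [from[i], to[i]] over the full
# height of the current plot region.
shade_bands <- function(from, to, col = "yellow") {
  usr <- par("usr")                       # c(xmin, xmax, ymin, ymax) of the plot region
  rect(from, usr[3], to, usr[4], col = col, border = NA)
  invisible(length(from))
}

plot(x, y, type = "n")                    # set up the axes without drawing the data
shade_bands(from = c(20, 60), to = c(30, 75))
lines(x, y)                               # draw the series on top of the bands
box()                                     # redraw the frame over the band edges
```

Drawing the bands before the data keeps the lines visible; plotting in the other order would paint over them.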

### Re: [R] It is possible to use a Shell command inside a R script?

What OS was that on?

On 8/24/07, Alberto Monteiro wrote:

> Ronaldo Reis Junior wrote:
>
>> Is it possible to use a shell command inside an R script? I'm writing an R script and I would like to put some shell commands inside R. Something like: convert fig01.png fig01.xpm or sed ..., etc. Is it possible? How?
>
> ?system
>
> BTW, I found that using things directly in R is _much_ slower than creating a batch file and then running it. For example, I had a directory with misnamed mp3 files, and I wanted to use R to rename and copy them to another directory. I tried to use file.copy, but it took too much time. Writing a batch file and then running it was much faster.
>
> Alberto Monteiro
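For reference, a minimal sketch of the two routes mentioned here: calling out with `system()` and staying inside R with `file.copy()`. The `echo` call assumes a Unix-alike shell (on Windows a shell builtin like this would need `shell()` instead), and the file names are temporary placeholders.

```r
# Run a shell command and capture its output, one element per output line.
# Assumes a Unix-alike system where echo is available to system().
out <- system("echo hello from the shell", intern = TRUE)

# The copy/rename task can also be done without leaving R.
src <- tempfile(fileext = ".txt")          # placeholder source file
writeLines("some content", src)
dest <- file.path(tempdir(), "renamed.txt")
copied <- file.copy(src, dest, overwrite = TRUE)
```

Whether the in-R route or a batch file is faster will depend on the platform and the number of files, which is presumably why the OS matters.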

### Re: [R] It is possible to use a Shell command inside a R script?

On 8/24/07, Dirk Eddelbuettel wrote:

> On Fri, Aug 24, 2007 at 08:32:00AM -0400, Duncan Murdoch wrote:
>
>> On 8/24/2007 6:58 AM, Ronaldo Reis Junior wrote:
>>
>>> Hi,
>>>
>>> Is it possible to use a shell command inside an R script? I'm writing an R script and I would like to put some shell commands inside R. Something like: convert fig01.png fig01.xpm or sed ..., etc.
>>
>> The details and available functions depend on the platform, but you want to look at ?system, ?shell, and/or ?shell.exec. (These all exist in Windows; on Unix-alikes, you probably won't have the latter two.)
>
> Don't forget pipes. R's ability to consistently work on connections that may be local files, remote files, program output, ... is a true treasure (and thanks and credits to, I believe, Brian Ripley for making it so).
>
> E.g. you can do this
>
>     OD <- read.table(pipe("links -dump http://cran.r-project.org/src/contrib/ | awk '/tar.gz/ {print $3, $4}'"), header=FALSE, col.names=c("file", "date"))
>
> to get files and dates of files on CRAN.
>
> As I recall, this also works on that other operating system, provided you do all the legwork of installing other tools, setting PATHs etc. to provide what works out of the box on the supposedly unfriendlier OS.

Or commonly we can just do it entirely within R. In the example discussed we read in the lines, grep out the tar.gz lines, split each line into fields and select the desired columns, delete the junk and reformat it all into a data frame:

    Lines <- readLines("http://cran.r-project.org/src/contrib/")
    tar.gz.Lines <- grep("tar.gz", Lines, value = TRUE)
    raw.fields <- do.call(rbind, strsplit(tar.gz.Lines, "</td>"))[, 2:3]
    mat <- apply(raw.fields, 2, gsub, pattern = "</a>|.*>| *$", replacement = "")
    DF <- data.frame(file = mat[,1],
                     date = strptime(mat[,2], "%d-%b-%Y %H:%M"),
                     stringsAsFactors = FALSE)
    head(DF)
                                 file                date
    1             ADaCGH_1.3-1.tar.gz 2007-05-14 12:04:00
    2                  AIS_1.0.tar.gz 2007-07-31 16:38:00
    3             AMORE_0.2-10.tar.gz 2007-04-11 10:17:00
    4               ARES_1.2-2.tar.gz 2007-03-19 20:53:00
    5 AcceptanceSampling_0.1-1.tar.gz 2007-07-07 20:46:00
    6           AdaptFit_0.2-1.tar.gz 2007-08-04 09:51:00
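The read, grep, split and reshape pattern above can be tried offline. The sketch below is a simplified stand-in: instead of fetching and parsing the CRAN HTML page, it starts from a few text lines in the shape of the output shown in this thread, plus one made-up non-package line to show the filtering. Parsing the month abbreviation with `%b` assumes an English locale.

```r
# Sample lines: two taken from the output shown in the thread, and one
# made-up line that should be filtered out.
Lines <- c("ADaCGH_1.3-1.tar.gz 14-May-2007 12:04",
           "Parent Directory",
           "AIS_1.0.tar.gz 31-Jul-2007 16:38")

# Keep only the package tarballs, then split each line into fields.
tar_lines <- grep("tar.gz", Lines, fixed = TRUE, value = TRUE)
DF <- read.table(text = tar_lines,
                 col.names = c("file", "date", "time"),
                 stringsAsFactors = FALSE)

# Combine date and time into one timestamp (month names are locale dependent).
DF$when <- strptime(paste(DF$date, DF$time), "%d-%b-%Y %H:%M", tz = "UTC")
print(DF)
```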

### Re: [R] It is possible to use a Shell command inside a R script?

On 8/24/07, Dirk Eddelbuettel wrote:

> On Fri, Aug 24, 2007 at 10:57:46AM -0400, Duncan Murdoch wrote:
>
>> On 8/24/2007 10:33 AM, Dirk Eddelbuettel wrote:
>>
>>> E.g. you can do this
>>>
>>>     OD <- read.table(pipe("links -dump http://cran.r-project.org/src/contrib/ | awk '/tar.gz/ {print $3, $4}'"), header=FALSE, col.names=c("file", "date"))
>>>
>>> to get files and dates of files on CRAN.
>>>
>>> As I recall, this also works on that other operating system, provided you do all the legwork of installing other tools, setting PATHs etc. to provide what works out of the box on the supposedly unfriendlier OS.
>>
>> The pipe command you list doesn't work in Windows. I'd guess this is because the pipe syntax | within the command is unsupported: it tries to execute links, with the rest of the line passed as arguments. But I haven't traced through to check on this.
>
> Hm, wishful thinking must have gotten the better of me then. Sorry for spreading misinformation about the capabilities of that other OS.

This works for me on Windows:

    tab <- read.table(pipe("lynx --nolist --dump http://cran.r-project.org/src/contrib/ | findstr tar.gz"), as.is = TRUE)
    head(tab[3:5])
                                   V3          V4    V5
    1             ADaCGH_1.3-1.tar.gz 14-May-2007 12:04
    2                  AIS_1.0.tar.gz 31-Jul-2007 16:38
    3             AMORE_0.2-10.tar.gz 11-Apr-2007 10:17
    4               ARES_1.2-2.tar.gz 19-Mar-2007 20:53
    5 AcceptanceSampling_0.1-1.tar.gz 07-Jul-2007 20:46
    6           AdaptFit_0.2-1.tar.gz 04-Aug-2007 09:51
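The `pipe()` idiom itself can be tried without a text browser or network access. The sketch below assumes a Unix-alike shell with `printf` available; the command and the column names are made up purely to show that `read.table()` can read a program's output through a connection.

```r
# pipe() starts the command and exposes its standard output as a connection.
# The printf call stands in for any program that writes a table to stdout.
con <- pipe("printf 'x 1\\ny 2\\n'")

# read.table() opens the connection, reads everything and closes it again.
tab <- read.table(con, col.names = c("name", "value"))
print(tab)
```

As the thread shows, whether a given command string works depends on the platform and on how R passes it to the shell, so treat this as a Unix-only sketch.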

### Re: [R] It is possible to use a Shell command inside a R script?

On 8/24/07, Duncan Murdoch wrote:

> On 8/24/2007 1:05 PM, Gabor Grothendieck wrote:
>
>> This works for me on Windows:
>>
>>     tab <- read.table(pipe("lynx --nolist --dump http://cran.r-project.org/src/contrib/ | findstr tar.gz"), as.is = TRUE)
>
> Which R version is that? It doesn't work for me in Rgui, though it does in Rterm, both R-devel versions.

I am using Rgui

    R.version.string
    [1] "R version 2.5.1 (2007-06-27)"

on Windows XP. lynx --version gives:

    Lynx Version 2.8.5rel.1 (04 Feb 2004)
    libwww-FM 2.14FM, SSL-MM 1.4.1, OpenSSL 0.9.7d-dev
    Compiled by Borland C++ (Feb 5 2004 17:35:58).

    Copyrights held by the University of Kansas, CERN, and other contributors.
    Distributed under the GNU General Public License.
    See http://lynx.isc.org/ and the online help for more information.

    See http://www.moxienet.com/lynx/ for information about SSL for Lynx.
    See http://www.openssl.org/ for information about OpenSSL.

### Re: [R] Turning a logical vector into its indices without losing its length

On 8/24/07, Gabor Grothendieck wrote:

> Here are two solutions:
>
>     logvec <- c(TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
>
>     ifelse(logvec, seq_along(logvec), 0)
>     [1] 1 0 0 4 0 0 7 0
>
>     replace(logvec * 0, logvec, which(logvec))
>     [1] 1 0 0 4 0 0 7 0

Actually the * 0 is not needed. The last one could simply be:

    replace(logvec, logvec, which(logvec))

> On 8/24/07, Leeds, Mark (IED) wrote:
>
>> I have the code below, which gives me what I want for temp based on logvec, but I was wondering if there was a shorter way (i.e. a one-liner) without having to initialize temp to zeros. This is purely for learning purposes. Thanks.
>>
>>     logvec <- c(TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
>>     temp <- numeric(length(logvec))
>>     temp[logvec] <- which(logvec)
>>     temp
>>     [1] 1 0 0 4 0 0 7 0
>>
>> Obviously, the code below doesn't work.
>>
>>     temp <- which(logvec)
>>     temp
>>     [1] 1 4 7

### Re: [R] Turning a logical vector into its indices without losing its length

Here are two solutions:

    logvec <- c(TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)

    ifelse(logvec, seq_along(logvec), 0)
    [1] 1 0 0 4 0 0 7 0

    replace(logvec * 0, logvec, which(logvec))
    [1] 1 0 0 4 0 0 7 0

On 8/24/07, Leeds, Mark (IED) wrote:

> I have the code below, which gives me what I want for temp based on logvec, but I was wondering if there was a shorter way (i.e. a one-liner) without having to initialize temp to zeros. This is purely for learning purposes. Thanks.
>
>     logvec <- c(TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
>     temp <- numeric(length(logvec))
>     temp[logvec] <- which(logvec)
>     temp
>     [1] 1 0 0 4 0 0 7 0
>
> Obviously, the code below doesn't work.
>
>     temp <- which(logvec)
>     temp
>     [1] 1 4 7

### Re: [R] Turning a logical vector into its indices without losing its length

On 8/24/07, Gabor Grothendieck wrote:

> On 8/24/07, Gabor Grothendieck wrote:
>
>> Here are two solutions:
>>
>>     logvec <- c(TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
>>
>>     ifelse(logvec, seq_along(logvec), 0)
>>     [1] 1 0 0 4 0 0 7 0
>>
>>     replace(logvec * 0, logvec, which(logvec))
>>     [1] 1 0 0 4 0 0 7 0
>
> Actually the * 0 is not needed. The last one could simply be:
>
>     replace(logvec, logvec, which(logvec))

If logvec can have NAs then this solution would not work, but it could be modified like this:

    replace(logvec, which(logvec), which(logvec))
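A short sketch pulling the variants from this thread together, including the NA-safe form. The vector `with_na` is a made-up example; the rest uses `logvec` from the original question.

```r
logvec <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

# Variant 1: pick the index where TRUE, 0 elsewhere.
a <- ifelse(logvec, seq_along(logvec), 0)

# Variant 2: overwrite the TRUE positions with their indices; the logical
# vector is coerced to integer, so FALSE becomes 0.
b <- replace(logvec, which(logvec), which(logvec))

# Indexing by which() skips NA entries, so they stay NA in the result.
with_na <- c(TRUE, NA, FALSE, TRUE)
b_na <- replace(with_na, which(with_na), which(with_na))
```

Both variants keep the original length, which is the point of the question; they differ only in the storage type of the result (double versus integer).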

### Re: [R] Splitting strings

This applies the indicated perl-style regular expression, where the first backreference (\\D+) is the non-digits and the second backreference (\\d+) is the digits. The two backreferences, but not the entire matched pattern itself, are passed as arguments x and y to the function whose body is the right hand side of the formula in the third argument. That is then simplified using rbind to give the result.

    library(gsubfn)
    strapply(surgery, "(\\D+)(\\d+)", ~ list(lets = x, nums = as.numeric(y)),
       backref = -2, perl = TRUE, simplify = rbind)

More on gsubfn at http://gsubfn.googlecode.com

On 8/23/07, Gary Collins wrote:

> I'm having a Thursday morning mental block; any suggestions on the following would be most appreciated...
>
> I have (as an example)
>
>     surgery = c("d48", "d67", "dnc37", "a75", "d10", "a78", "d31", "d55", "d1")
>
> Before each number part the possibilities are c("a", "d", "dnc"). I'm trying to split each element in surgery so that I have
>
>     status time
>     d      48
>     d      67
>     dnc    37
>     a      75
>     d      10
>     a      78
>     d      31
>     d      55
>     d      1
>
> I've tried various strsplit approaches but nothing has done what I need.
>
> Thanks in advance,
> Gary
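If installing gsubfn is not an option, the same split can be sketched in base R with two `sub()` calls. This is an alternative to, not a restatement of, the `strapply()` call above; it assumes every element consists of letters followed by digits.

```r
surgery <- c("d48", "d67", "dnc37", "a75", "d10", "a78", "d31", "d55", "d1")

# Drop the trailing digits to get the status code ...
status <- sub("[0-9]+$", "", surgery)

# ... and drop the leading non-digits to get the time.
time <- as.numeric(sub("^[^0-9]+", "", surgery))

result <- data.frame(status, time, stringsAsFactors = FALSE)
print(result)
```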

### Re: [R] FAQ 7.x when 7 does not exist. Useability question

Note that googling "R FAQ 7.10" will get it on the first hit.

On 8/23/07, John Kane [EMAIL PROTECTED] wrote:

The FAQ Section 7 is a very useful place for new users to find out any number of R idiosyncrasies. However, there is no numbering on the FAQ Table of Contents or on the Sections' Tables of Contents. An R-help list reply of "Read FAQ 7.10" in response to a question about converting a factor to numeric is a bit cryptic. The only time 7.10 appears is after the searcher has found the entry. Would it be a good idea to actually number the entries for the FAQ Table of Contents and the Table of Contents for the Sections?

### Re: [R] extracting duplicated elements

Try:

    lapply(as.data.frame(t(DF)), function(x) unique(x[duplicated(x) & x > 0]))

On 8/23/07, dxc13 [EMAIL PROTECTED] wrote:

Can anyone help me solve this problem...thanks! Consider a data frame, namely v, as such:

    v
       X1 X2 X3 X4 X5 X1 X2 X3 X4 X5
    x1  1  2 -1 -1 -1  1  2 -1 -1 -1
    y1  1  2 -1 -1 -1  1  2  3 -1 -1

What I would like to do is to create an array or data frame with only the elements that appear in the data frame more than once and are >= 0. I try this...

    v[v>=0]
    [1] 1 1 2 2 1 1 2 2 3

which returns all >= 0 elements, but they are not in their respective rows from the original data frame. I have tried using the duplicated() function and can't seem to get it to work correctly. Essentially, the outcome I am trying to get is a df or array looking like:

step 1...achieve this out of original df

    [1] 1 2 1 2
    [2] 1 2 1 2 3

(the blank element in row 1, position 5 can just be NA)

step 2...take the above and get this...only the duplicated elements

    [1] 1 2
    [2] 1 2
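The one-liner in the reply can be tried on the data from the question. The sketch below rebuilds the two rows with default column names `V1` to `V10` (an assumption made here because the original column names repeat) and keeps the positive values that occur more than once within each row.

```r
# Rows x1 and y1 from the question
DF <- as.data.frame(matrix(
  c(1, 2, -1, -1, -1, 1, 2, -1, -1, -1,
    1, 2, -1, -1, -1, 1, 2,  3, -1, -1),
  nrow = 2, byrow = TRUE, dimnames = list(c("x1", "y1"), NULL)))

# Transpose so each original row becomes a column, then keep positive
# values that have already been seen earlier in the same row
res <- lapply(as.data.frame(t(DF)),
              function(x) unique(x[duplicated(x) & x > 0]))
res
```

The result is a list with one component per original row, which avoids having to pad rows of different lengths with NA.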

### Re: [R] read big text file into R

Another option is to read it into a database and from there into R. RSQLite can read certain text files directly into an SQLite database without going through R, and from there one can read the data into R. Alternately, this post describes how the devel version of the sqldf package can do it: http://www.nabble.com/Re%3A-Memory-Experimentation%3A-Rule-of-Thumb-%3D-10-15-Times-the-Memory-p12078165.html

On 8/23/07, Yupu Liang [EMAIL PROTECTED] wrote:

Dear Rs: Hi, I am trying to read a big text file (nrows=243440, ncols=144). It seems the computational time of all the read methods (scan, read.table, read.delim) is not linear in the number of rows I want to read in: things became really slow once I tried to read in 10 lines compared to 1 lines. If I am reading the profiling result right, I guess scan wouldn't help either. My questions are:

1) Is this a memory issue?
2) How to get around this? I can't just sit around for 15 mins. Would writing a C function help?

Thanks!
Here is the profiling I did:

    Rprof()
    dd = read.delim(file,skip=9,sep="\t",as.is=T,nrows=1)
    Rprof(NULL)
    summaryRprof()
    $by.self
                 self.time self.pct total.time total.pct
    scan              3.56     85.2       3.56      85.2
    type.convert      0.48     11.5       0.48      11.5
    read.table        0.08      1.9       4.18     100.0
    make.names        0.02      0.5       0.02       0.5
    options           0.02      0.5       0.02       0.5
    readLines         0.02      0.5       0.02       0.5
    read.delim        0.00      0.0       4.18     100.0
    file              0.00      0.0       0.02       0.5
    getOption         0.00      0.0       0.02       0.5

    $by.total
                 total.time total.pct self.time self.pct
    read.table         4.18     100.0      0.08      1.9
    read.delim         4.18     100.0      0.00      0.0
    scan               3.56      85.2      3.56     85.2
    type.convert       0.48      11.5      0.48     11.5
    make.names         0.02       0.5      0.02      0.5
    options            0.02       0.5      0.02      0.5
    readLines          0.02       0.5      0.02      0.5
    file               0.02       0.5      0.00      0.0
    getOption          0.02       0.5      0.00      0.0

    $sampling.time
    [1] 4.18

    ?Rprof()
    Rprof()
    dd = read.delim(file,skip=9,sep="\t",as.is=T,nrows=10)
    Rprof(NULL)
    summaryRprof()
    $by.self
                   self.time self.pct total.time total.pct
    scan              143.12     92.7     143.12      92.7
    type.convert        9.52      6.2       9.52       6.2
    read.table          1.60      1.0     154.28      99.9
    paste               0.02      0.0       0.08       0.1
    textConnection      0.02      0.0       0.04       0.0
    .deparseOpts        0.02      0.0       0.02       0.0
    file                0.02      0.0       0.02       0.0
    make.names          0.02      0.0       0.02       0.0
    print.default       0.02      0.0       0.02       0.0
    read.delim          0.00      0.0     154.28      99.9
    doTryCatch          0.00      0.0       0.08       0.1
    gsub                0.00      0.0       0.08       0.1
    try                 0.00      0.0       0.08       0.1
    tryCatch            0.00      0.0       0.08       0.1
    tryCatchList        0.00      0.0       0.08       0.1
    tryCatchOne         0.00      0.0       0.08       0.1
    capture.output      0.00      0.0       0.06       0.0
    deparse             0.00      0.0       0.02       0.0
    eval.with.vis       0.00      0.0       0.02       0.0
    evalVis             0.00      0.0       0.02       0.0
    print               0.00      0.0       0.02       0.0

    $by.total
                   total.time total.pct self.time self.pct
    read.table         154.28      99.9      1.60      1.0
    read.delim         154.28      99.9      0.00      0.0
    scan               143.12      92.7    143.12     92.7
    type.convert         9.52       6.2      9.52      6.2
    paste                0.08       0.1      0.02      0.0
    doTryCatch           0.08       0.1      0.00      0.0
    gsub                 0.08       0.1      0.00      0.0
    try                  0.08       0.1      0.00      0.0
    tryCatch             0.08       0.1      0.00      0.0
    tryCatchList         0.08       0.1      0.00      0.0
    tryCatchOne          0.08       0.1      0.00      0.0
    capture.output       0.06       0.0      0.00      0.0
    textConnection       0.04       0.0      0.02      0.0
    .deparseOpts         0.02       0.0      0.02      0.0
    file                 0.02       0.0      0.02      0.0
    make.names           0.02       0.0      0.02      0.0
    print.default        0.02       0.0      0.02      0.0
    deparse              0.02       0.0      0.00      0.0
    eval.with.vis        0.02       0.0      0.00      0.0
    evalVis              0.02       0.0      0.00      0.0
    print                0.02       0.0      0.00      0.0

    $sampling.time
    [1] 154.36

I am using R 2.5.1 for mac on a Dual 2
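The database route suggested in the reply might look roughly like the sketch below. It assumes the DBI and RSQLite packages are installed and uses the file-import form of `dbWriteTable`, where the value is a path to a delimited text file. The file contents, the table name `big` and the query are all made up for illustration; a real run would point at the actual tab-delimited file instead of the temporary one.

```r
library(DBI)

# Hypothetical stand-in for the large tab-delimited file
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:5, x = c(0.1, 0.2, 0.3, 0.4, 0.5)),
            tmp, sep = "\t", row.names = FALSE, quote = FALSE)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Passing a file name as the value imports the file into the table
dbWriteTable(con, "big", tmp, sep = "\t", header = TRUE)

# Pull only the rows that are needed into R
dd <- dbGetQuery(con, "SELECT * FROM big WHERE id <= 3")
dbDisconnect(con)
```

The point of the design is that filtering happens in SQLite, so only the selected rows are ever materialised as an R data frame.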

### Re: [R] uneven list to matrix

Here are two solutions. The first repeatedly uses merge, and the second creates a zoo object from each alph component, whose time index consists of the row labels, and uses zoo's multiway merge to merge them.

    # test data
    m <- matrix(1:5, 5, dimnames = list(LETTERS[1:5], NULL))
    alph <- list(m[1:4,,drop=F], m[c(1,3,4),,drop=F], m[c(1,4,5),,drop=F])
    alph

    # solution 1
    out <- alph[[1]]
    for(i in 2:length(alph)) {
      out <- merge(out, alph[[i]], by = 0, all = TRUE)
      row.names(out) <- out[[1]]
      out <- out[-1]
    }
    matrix(as.matrix(out), nrow(out), dimnames=list(rownames(out),NULL))

    # solution 2
    library(zoo)
    z <- do.call(merge, lapply(alph, function(x) zoo(c(x), rownames(x))))
    matrix(coredata(z), nrow(z), dimnames=list(time(z),NULL))

On 8/23/07, Christopher Marcum [EMAIL PROTECTED] wrote:

Hello, I am sure I am not the only person with this problem. I have a list with n elements, each consisting of a single column matrix with different row lengths. Each row has a name ranging from A to E. Here is an example:

    alph[[1]]
    A 1
    B 2
    C 3
    D 4
    alph[[2]]
    A 1
    C 3
    D 4
    alph[[3]]
    A 1
    D 4
    E 5

I would like to create a matrix from the elements in the list with n columns such that the row names are preserved and NAs are inserted into the cells where the uneven lists do not match up based on their row names. Here is an example of the desired output:

    newmatrix
      [,1] [,2] [,3]
    A    1    1    1
    B    2   NA   NA
    C    3    3   NA
    D    4    4    4
    E   NA   NA    5

Any suggestions? I have tried do.call(cbind,list). I also thought I was on the right track when I tried converting each element into a vector and then running this loop (which ultimately failed):

    newmat<-matrix(NA,ncol=3,nrow=5)
    colnames(newmatrix)<-c(A:E)
    for(j in 1:3){
      for(i in 1:5){
        for(k in 1:length(list[[i]])){
          if(is.na(match(colnames(newmatrix),names(alph[[i]])))[j]==TRUE){
            newmatrix[i,j]<-NA}
          else newmatrix[i,j]<-alph[[i]][k]}}}

Thanks,
Chris
UCI Sociology
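A third route, not taken from the thread, is to look each element up against the union of all row names with `match()`. The sketch below uses base R only and reuses the test data from the reply; `all_rows` and `newmatrix` are names chosen here for illustration.

```r
m <- matrix(1:5, 5, dimnames = list(LETTERS[1:5], NULL))
alph <- list(m[1:4, , drop = FALSE],
             m[c(1, 3, 4), , drop = FALSE],
             m[c(1, 4, 5), , drop = FALSE])

# Every row label that occurs anywhere in the list, in sorted order
all_rows <- sort(unique(unlist(lapply(alph, rownames))))

# For each element, pick the value for each label; a label that is
# absent yields an NA index and therefore an NA value
newmatrix <- sapply(alph, function(x) unname(x[match(all_rows, rownames(x)), 1]))
rownames(newmatrix) <- all_rows
newmatrix
```

Because each column is built independently, the row order is fixed by `all_rows` and does not depend on how any merge step sorts its output.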

### Re: [R] uneven list to matrix

On 8/24/07, Christopher Marcum [EMAIL PROTECTED] wrote:

Hi Gabor, Thank you. The native solution works just fine, though there is an interesting side effect, namely, that with very large lists the rows of the output become scrambled though the corresponding columns are correctly sorted. The zoo package solution does not work on large lists: there is an error:

    Error in order(na.last, decreasing, ...) : argument 1 is not a vector

They both work on the example data.

Please provide reproducible examples to illustrate your comments if you would like a response.

### Re: [R] uneven list to matrix

OK. One other thought. The R merge command has a sort= argument that you can try out. See ?merge

On 8/24/07, Christopher Marcum [EMAIL PROTECTED] wrote:

Hi Gabor, My apologies. Both solutions work just fine on large lists (n=1000, n[[i]]=500). A memory problem on my machine caused the error and fail-to-sort. Thank you! PS - The zoo method is slightly faster.

Best,
Chris
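The sort= argument of merge mentioned in this reply can be seen on a small case. The two data frames below are invented for illustration; they are merged on their row names with `by = 0`, as in the earlier solution.

```r
a <- data.frame(x = 1:3, row.names = c("C", "A", "B"))
b <- data.frame(y = 4:6, row.names = c("B", "C", "D"))

# sort = TRUE is the default: the result is ordered on the key column
sorted <- merge(a, b, by = 0, all = TRUE)

# sort = FALSE: the same rows, but their order is not guaranteed
unsorted <- merge(a, b, by = 0, all = TRUE, sort = FALSE)

as.character(sorted$Row.names)
```

When row order matters downstream, it is safer to rely on the default sort or to reorder explicitly than to depend on the order returned with `sort = FALSE`.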

### Re: [R] Optimization problem

Try this.

1. Following Ben, remove the Randalstown point and reset the levels of the Location factor.
2. Then replace solve with ginv so it uses the generalized inverse to calculate the hessian:

    alan2 <- subset(alan, subset = Location != "Randalstown")
    alan2$Location <- factor(as.character(alan2$Location))
    library(MASS)
    solve <- ginv
    zinb.zc <- zicounts(resp=Scars~., x =~Location + Lar + Mass + Lar:Mass + Location:Mass, z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data = alan2)
    rm(solve)

On 8/21/07, Ben Bolker [EMAIL PROTECTED] wrote:

(Hope this gets threaded properly. Sorry if it doesn't.)

Gabor: Lac and Lacfac being the same is irrelevant, wouldn't produce NAs (but would produce something like a singular Hessian and maybe other problems) -- but they're not even specified in this model.

The bottom line is that you have a location with a single observation, so the GLM that zicounts runs to get the initial parameter values has an unestimable location:mass interaction for one location, so it gives an NA, so optim complains.

In gruesome detail:

    ## set up data
    scardat = read.table("scars.dat",header=TRUE)
    library(zicounts)

    ## try to run model
    zinb.zc <- zicounts(resp=Scars~., x =~Location + Lar + Mass + Lar:Mass + Location:Mass, z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=scardat)

    ## tried to debug this by dumping zicounts.R to a file, modifying
    ## it to put a trace argument in that would print out the parameters
    ## and log-likelihood for every call to the log-likelihood function.
    dump("zicounts",file="zicounts.R")
    source("zicounts.R")
    zinb.zc <- zicounts(resp=Scars~., x =~Location + Lar + Mass + Lar:Mass + Location:Mass, z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=scardat,trace=TRUE)

    ## this actually didn't do any good because the negative log-likelihood
    ## function never gets called -- as it turns out optim() barfs when it
    ## gets its initial values, before it ever gets to evaluating the log-likelihood

    ## check the glm -- this is the equivalent of what zicounts does to
    ## get the initial values of the x parameters
    p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass, data=scardat,family=poisson)
    which(is.na(coef(p1)))

    ## find out what the deal is
    table(scardat$Location)

    scar2 = subset(scardat,Location!="Randalstown")
    ## first step to removing the bad point from the data set -- but ...
    table(scar2$Location)
    ## it leaves the Location factor with the same levels, so
    ## now we have ZERO counts for one location:
    ## redefine the factor to drop unused levels
    scar2$Location <- factor(scar2$Location)
    ## OK, looks fine now
    table(scar2$Location)

    zinb.zc <- zicounts(resp=Scars~., x =~Location + Lar + Mass + Lar:Mass + Location:Mass, z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=scar2)
    ## now we get another error (system is computationally singular when
    ## trying to compute Hessian -- overparameterized?) Not in any
    ## trivial way that I can see. It would be nice to get into the guts
    ## of zicounts and stop it from trying to invert the Hessian, which is
    ## I think where this happens.

In the meanwhile, I have some other ideas about this analysis (sorry, but you started it ...) Looking at the data in a few different ways:

    library(lattice)
    xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE, auto.key=list(columns=3))
    xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)
    xyplot(Scars~Lar,groups=Location,data=scar2, auto.key=list(columns=3))
    xyplot(Scars~Mass|Lar,data=scar2)
    xyplot(Scars~Lar|Location,data=scar2)

Some thoughts:

(1) I'm not at all sure that zero-inflation is necessary (see Warton 2005, Environmetrics). This is a fairly small, noisy data set without huge numbers of zeros -- a plain old negative binomial might be fine. I don't actually see a lot of signal here, period (although there may be some) ... there's not a huge range in Lar (whatever it is -- the rest of the covariates I think I can interpret). It would be tempting to try to fit location as a random effect, because fitting all those extra degrees of freedom is going to kill you. On the other hand, GLMMs are a bit hairy.

cheers
Ben
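The diagnosis in this thread (a site with a single observation makes its site-by-mass interaction unestimable, so the GLM returns an NA coefficient) can be reproduced on a toy data set. Everything below is invented for illustration, with site `C` playing the role of Randalstown; only base R is used.

```r
# Hypothetical data: sites A and B have four animals each, site C has one
toy <- data.frame(
  Location = factor(c(rep("A", 4), rep("B", 4), "C")),
  Mass     = c(1, 2, 3, 4, 1, 2, 3, 4, 2.5),
  Scars    = c(1, 2, 2, 4, 3, 1, 4, 2, 2)
)

# With one observation at C, the LocationC:Mass column is a multiple of
# the LocationC column, so its coefficient cannot be estimated
p1 <- glm(Scars ~ Location + Mass + Location:Mass, data = toy, family = poisson)
bad <- names(which(is.na(coef(p1))))

# Drop the single-observation site, then rebuild the factor so the
# unused level disappears as well
toy2 <- subset(toy, Location != "C")
levels_before <- levels(toy2$Location)
toy2$Location <- factor(toy2$Location)
levels_after <- levels(toy2$Location)

p2 <- glm(Scars ~ Location + Mass + Location:Mass, data = toy2, family = poisson)
```

Current versions of R also provide `droplevels()`, which removes unused levels from a factor or from every factor in a data frame.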

### Re: [R] Evaluating f(x(2,3)) on a function f<- function(a,b){a+b}

Try this:

    do.call(f, as.list(x))

On 8/22/07, Søren Højsgaard [EMAIL PROTECTED] wrote:

Dear list, I have a function and a vector, say

    f <- function(a,b){a+b}
    x <- c(2,3)

I want to evaluate f on x in the sense of computing f(x[1],x[2]). I would like it to be so that I can write f(x). (I know I can write a wrapper function g <- function(x){f(x[1],x[2])}, but this is not really what I am looking for.) Is there a general way of doing this (programmatically)? (E.g. by unpacking the elements of x and putting them in the right places when calling f...) I've looked under formals, alist etc. but so far without luck.

Regards
Søren
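A short sketch of how `do.call` spreads a vector over a function's arguments. The function `f` and vector `x` come from the question; `g` and `named_x` are added here to show that names on the vector are matched to argument names.

```r
f <- function(a, b) { a + b }
x <- c(2, 3)

# Elements of x are passed positionally: f(2, 3)
total <- do.call(f, as.list(x))

# With a named vector, arguments are matched by name, not by position
g <- function(a, b) { a - b }
named_x <- c(b = 3, a = 2)
diff_ab <- do.call(g, as.list(named_x))   # calls g(b = 3, a = 2)
```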

### Re: [R] Subsetting zoo object with a vector of time values.

See ?window.zoo, e.g.

    library(zoo)
    # create test data
    tt <- c(-50, -49.996, -49.995, -49.96, -49.956, -49.955, -49.92, -49.916, -49.915, -49.88)
    z <- zoo(seq_along(tt), tt)
    window(z, c(-50, -49.96, -49.92, -49.88))

On 8/21/07, Todd Remund [EMAIL PROTECTED] wrote:

I have a zoo object which I would like to subset using a vector of time values. For example, I have the following time values represented in my zoo object.

    -50.000 -49.996 -49.995 -49.960 -49.956 -49.955 -49.920 -49.916 -49.915 -49.880

and would like to get observations corresponding to times -50 -49.96 -49.92 -49.88. What can I do without using the lapply or which functions?

Thank you.
Todd Remund

### Re: [R] extracting month from date in numeric form

On 8/21/07, Gonçalo Ferraz [EMAIL PROTECTED] wrote:

Hi, Anyone knows what would be a short way of extracting a month from a date in numeric or integer format? months("1979-12-20") returns "December" in character format. How could I get 12 in numeric or integer format?

Here are a few solutions:

    format(as.Date("1979-12-20"), "%m")
    as.POSIXlt(as.Date("1979-12-20"))$mo + 1
    as.numeric(substring("1979-12-20", 6, 7))
    as.numeric(factor(months(as.Date("1979-12-20"), abbrev = TRUE), levels = month.abb))

See R News 4/1 Help Desk article for more on dates.
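The first three solutions can be run as below. Note that `format()` returns the character string "12", so it is wrapped in `as.numeric()` here to get a number, and the full component name `$mon` is used in place of the abbreviated `$mo`. The fourth solution is left out of this sketch because it depends on month names in the current locale matching `month.abb`.

```r
d <- as.Date("1979-12-20")

# format() yields a character string, so convert it
m1 <- as.numeric(format(d, "%m"))

# POSIXlt stores the month as 0-11, hence the + 1
m2 <- as.POSIXlt(d)$mon + 1

# Pure string handling on an ISO-formatted date
m3 <- as.numeric(substring("1979-12-20", 6, 7))
```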

### Re: [R] Optimization problem

Lac and Lacfac are the same.

On 8/21/07, Alan Harrison [EMAIL PROTECTED] wrote:

Hello Folks, Very new to R so bear with me, running 5.2 on XP. Trying to do a zero-inflated negative binomial regression on placental scar data as dependent. Lactation, location, number of tick larvae present and mass of mouse are independents. Dataframe and attributes below:

           Location Lac Scars Lar  Mass Lacfac
    1   Tullychurry   0     0  15 13.87      0
    2      Somerset   0     0   0 15.60      0
    3     Tollymore   0     0   3 16.43      0
    4     Tollymore   0     0   0 16.55      0
    5       Caledon   0     0   0 17.47      0
    6  Hillsborough   1     5   0 18.18      1
    7       Caledon   0     0   1 19.06      0
    8   Portglenone   0     4   0 19.10      0
    9   Portglenone   0     5   0 19.13      0
    10    Tollymore   0     5   3 19.50      0
    11 Hillsborough   1     5   0 19.58      1
    12  Portglenone   0     4   0 19.76      0
    13      Caledon   0     8   0 19.97      0
    14 Hillsborough   1     4   0 20.02      1
    15  Tullychurry   0     3   3 20.13      0
    16 Hillsborough   1     5   0 20.18      1
    17   LoughNavar   1     5   0 20.20      1
    18    Tollymore   0     0   1 20.24      0
    19 Hillsborough   1     5   0 20.48      1
    20      Caledon   0     4   1 20.56      0
    21      Caledon   0     3   2 20.58      0
    22    Tollymore   0     4   3 20.58      0
    23    Tollymore   0     0   2 20.88      0
    24 Hillsborough   1     0   0 21.01      1
    25  Portglenone   0     5   0 21.08      0
    26  Tullychurry   0     2   5 21.28      0
    27 Ballysallagh   1     4   0 21.59      1
    28      Caledon   0     0   1 21.68      0
    29 Hillsborough   1     5   0 22.09      1
    30  Tullychurry   0     5   5 22.28      0
    31  Tullychurry   1     6  75 22.43      1
    32 Ballysallagh   1     5   0 22.57      1
    33 Ballysallagh   1     4   0 22.67      1
    34   LoughNavar   1     5   3 22.71      1
    35 Hillsborough   1     4   0 23.01      1
    36      Caledon   0     0   3 23.08      0
    37   LoughNavar   1     5   0 23.53      1
    38 Ballysallagh   1     4   0 23.55      1
    39  Portglenone   1     6   0 23.61      1
    40   Mt.Stewart   0     3   0 23.70      0
    41     Somerset   0     5   0 23.83      0
    42 Ballysallagh   1     5   0 23.93      1
    43 Ballysallagh   1     5   0 24.01      1
    44      Caledon   0     0   3 24.14      0
    45   LoughNavar   0     6   0 24.30      0
    46   LoughNavar   1     5   0 24.34      1
    47 Hillsborough   1     4   0 24.45      1
    48      Caledon   0     3   2 24.55      0
    49  Tullychurry   0     5  44 24.83      0
    50 Hillsborough   1     5   0 24.86      1
    51 Ballysallagh   1     5   0 25.02      1
    52  Tullychurry   0     0   9 25.27      0
    53   Mt.Stewart   0     5   0 25.31      0
    54   LoughNavar   1     4   8 25.43      1
    55     Somerset   1     0   0 25.58      1
    56 Hillsborough   1     5   0 25.82      1
    57  Portglenone   1     2   0 26.02      1
    58 Ballysallagh   1     5   0 26.19      1
    59   Mt.Stewart   1     0   0 26.66      1
    60  Randalstown   1     0   1 26.70      1
    61     Somerset   0     4   0 27.01      0
    62   Mt.Stewart   0     4   0 27.05      0
    63     Somerset   0     3   0 27.10      0
    64     Somerset   0     6   0 27.34      0
    65     Somerset   0     0   0 27.87      0
    66   LoughNavar   1     5   1 28.01      1
    67  Tullychurry   1     6  42 28.55      1
    68 Hillsborough   1     5   0 28.84      1
    69  Portglenone   1     4   0 29.00      1
    70     Somerset   1     4   0 31.87      1
    71 Ballysallagh   1     5   0 33.06      1
    72   LoughNavar   1     4   0 33.24      1
    73     Somerset   1     4   0 33.36      1

    alan : 'data.frame': 73 obs. of 6 variables:
     $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6 9 ...
     $ Lac     : int 0 0 0 0 0 1 0 0 0 0 ...
     $ Scars   : int 0 0 0 0 0 5 0 4 5 5 ...
     $ Lar     : int 15 0 3 0 0 0 1 0 0 3 ...
     $ Mass    : num 13.9 15.6 16.4 16.6 17.5 ...
     $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...

The syntax I used to create the model is:

    zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass + Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)

The error given is:

    Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE, : non-finite value supplied by optim
    In addition: Warning message: fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 1), family = binomial())

I understand this is a problem with the model I specified, could anyone help out??

Many thanks
Alan Harrison
Quercus Queen's University Belfast MBC, 97 Lisburn Road Belfast BT9 7BL T: 02890 972219 M: 07798615682

### Re: [R] tackle memory insufficiency for large dataset using save() load()?

See ?save . The ... arguments are the ***names*** of the objects, not the objects, so you want save("d", ...whatever...) not save(d, ...whatever...) . Also don't use attach and detach, and read this about factors, which applies if your factor has many levels but can be ignored if not: http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg92970.html

On 8/21/07, Jessica Z [EMAIL PROTECTED] wrote:

Hello List, i have been agonizing over this for days, any reply would be greatly appreciated!

Situation: My original dataset is a .csv dataset (w/ 2M records) with 4 variables: job_id (primary key, won't be used for analysis, just used for joining tables), sector_id (categorical variable, for 19 industry sectors), sqft (continuous variable for square footage), building_type (categorical, for 2 building types). Some values of sqft were input wrong, so i'd like to set sqft<1 to NA and then use aregImpute() to impute those NAs.

Problem: the original dataset (.csv format) is too large. Though i could read that dataset into R, i could not get aregImpute() to run even when i set the memory limit to 3G! (yes, i did the switch in windows to reach 3G rather than 2G)

Goal: try to find a way to slim down my dataset so as to get aregImpute() running.

What i did: i searched in the archive, and found someone said that, as R tends to inflate memory, it is a good idea to first read the original dataset into R, then save it as a more compact binary file using save(), and then reload the compact binary file back into R using load(). This way would reduce the memory allocation. HOWEVER, after i saved my original dataset into a compact binary file using save(), and used load("filename.Rdata") to reload the new compact data format into R, I could not figure out how to retrieve all my variables!!! R shows the new dataset is not a list, nor a matrix, nor a dataframe, but just a character with length 1!!! and there is no way i could do attach().

i generated a 1K-row subset out of my original dataset to illustrate my problem (does anyone know how to get my four variables back from this compact binary new dataset? what did i do wrong?):

    data <- read.table (file.choose(),header=T,sep=",")
    summary(data)
         job_id         sector_id           sqft         building_type
     Min.   :   1.0   Min.   : 6.000   Min.   :  0.00   Min.   :1.000
     1st Qu.: 250.8   1st Qu.: 6.000   1st Qu.:  3.00   1st Qu.:2.000
     Median : 500.5   Median :11.000   Median :  4.00   Median :2.000
     Mean   : 500.5   Mean   : 9.455   Mean   : 12.49   Mean   :1.996
     3rd Qu.: 750.3   3rd Qu.:11.000   3rd Qu.:  4.00   3rd Qu.:2.000
     Max.   :1000.0   Max.   :12.000   Max.   :192.00   Max.   :2.000
    attach(data)
    sqft[sqft<1] <- NA
    sector.f <- as.factor(sector_id)
    building_type.f <- as.factor (building_type)
    d <- data.frame(job_id,sector.f,sqft, building_type.f)
    summary (d)
         job_id       sector.f      sqft         building_type.f
     Min.   :   1.0   6 :340   Min.   :  3.00   1:  4
     1st Qu.: 250.8   11:505   1st Qu.:  4.00   2:996
     Median : 500.5   12:155   Median :  4.00
     Mean   : 500.5            Mean   : 14.16
     3rd Qu.: 750.3            3rd Qu.: 17.00
     Max.   :1000.0            Max.   :192.00
                               NA's   :118.00
    save (d, file="compact_d.Rdata", ascii=FALSE)
    newdata <- load ("compact_d.Rdata")
    summary(newdata)
       Length     Class      Mode
            1 character character
    attach(newdata)
    Error in attach(newdata) : file 'd' not found
    is.data.frame (newdata)
    [1] FALSE
    is.list (newdata)
    [1] FALSE
    is.matrix (newdata)
    [1] FALSE

btw, i also tried to just save (into compact binary) and reload (the new compact binary data format), as i could do the NA stuff in sql anyhow. However, i still got stuck at the same spot:

    data <- read.table (file.choose(),header=T,sep=",")
    summary(data)
         job_id         sector_id           sqft         building_type
     Min.   :   1.0   Min.   : 6.000   Min.   :  0.00   Min.   :1.000
     1st Qu.: 250.8   1st Qu.: 6.000   1st Qu.:  3.00   1st Qu.:2.000
     Median : 500.5   Median :11.000   Median :  4.00   Median :2.000
     Mean   : 500.5   Mean   : 9.455   Mean   : 12.49   Mean   :1.996
     3rd Qu.: 750.3   3rd Qu.:11.000   3rd Qu.:  4.00   3rd Qu.:2.000
     Max.   :1000.0   Max.   :12.000   Max.   :192.00   Max.   :2.000
    save (data, file="compact_data.Rdata", ascii=FALSE)
    newdata <- load ("compact_data.Rdata")
    summary(newdata)
       Length     Class      Mode
            1 character character
    attach(newdata)
    Error: restore file may be empty -- no data loaded
    In addition: Warning message: file 'data' has magic number '' Use of save versions prior to 2 is deprecated
    is.data.frame (newdata)
    [1] FALSE
    is.list (newdata)
    [1] FALSE
    is.matrix (newdata)
    [1] FALSE
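The session quoted in this post is consistent with how `load()` behaves: it restores the saved objects under their original names and returns only a character vector of those names, which is why `newdata` is a length-1 character. A minimal sketch with a made-up three-row data frame and a temporary file:

```r
# Hypothetical small data frame standing in for the real one
d <- data.frame(job_id = 1:3, sqft = c(NA, 4, 17))

path <- tempfile(fileext = ".Rdata")
save(d, file = path)
rm(d)

# load() recreates the object d and returns the names it restored
loaded_names <- load(path)

# The data frame is available again under its original name,
# or via get() when the name is only known at run time
restored <- get(loaded_names[1])
```

So after `load("compact_d.Rdata")` the data frame is simply `d`; assigning the result of `load()` to `newdata` captures the name, not the data.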

### Re: [R] tackle memory insufficiency for large dataset using save() load()?

?save says it's the names (not the objects), although I just tried it and both save(iris, file = "/iris.Rdata") and save("iris", file = "/iris.Rdata") seemed to work, so you are right that it seems to work with the objects, not just the names, although it's not documented to do so.

    Usage

    save(..., list = character(0), file = stop("'file' must be specified"), ascii = FALSE, version = NULL, envir = parent.frame(), compress = !ascii, eval.promises = TRUE)
    save.image(file = ".RData", version = NULL, ascii = FALSE, compress = !ascii, safe = TRUE)

    Arguments

    ...   the names of the objects to be saved.
    list  A character vector containing the names of objects to be saved.

On 8/21/07, Rolf Turner [EMAIL PROTECTED] wrote:

On 22/08/2007, at 1:48 PM, Gabor Grothendieck wrote: See ?save . The ... arguments are the ***names*** of the objects, not the objects, so you want save("d", ...whatever...) not save(d, ...whatever...) .

I think this is wrong. You want the objects, not their names. If you want to make use of object names, use the list argument. I.e. save(melvin,clyde,file="irving") and save(list=c("melvin","clyde"),file="irving") accomplish the same thing.

cheers,
Rolf Turner

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
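The three calling styles discussed in this exchange can be compared directly. The sketch below borrows the object names `melvin` and `clyde` from Rolf's example, writes to temporary files rather than to "irving", and loads each file into its own environment so the contents can be compared.

```r
melvin <- 1:3
clyde <- letters[1:2]

f1 <- tempfile(); f2 <- tempfile(); f3 <- tempfile()
save(melvin, clyde, file = f1)                 # unquoted object names
save("melvin", "clyde", file = f2)             # quoted names
save(list = c("melvin", "clyde"), file = f3)   # names via the list argument

# Load each file into a fresh environment and collect what it contains
contents <- lapply(c(f1, f2, f3), function(f) {
  e <- new.env()
  load(f, envir = e)
  mget(c("melvin", "clyde"), envir = e)
})
```

All three files hold the same two objects, which matches what both posters observed.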

### Re: [R] Q: combine 2 data frames with missing values

Try this:

    Lines <- "case var1 var2 var3 var4
    1 9 9 13 11
    2 15 9 15 13
    3 na na 12 9
    4 8 6 na na
    5 14 10 na na
    6 20 15 17 15"

    # replace with DF <- read.table("myfile.dat", header = TRUE, na.strings = "na")
    DF <- read.table(textConnection(Lines), header = TRUE, na.strings = "na")
    DF1 <- DF[-1]

    kor <- cor(DF1, use = "pairwise")
    kor

    lm(var1 ~ var2, DF) # a sample regression

    # mycoef calculates kth coefficient in regression of
    # ith variable on jth variable
    mycoef <- function(i, j, k) coef(lm(DF1[c(i, j)]))[k]

    idx <- 1:ncol(DF1)
    names(idx) <- names(DF1)

    intercepts <- outer(idx, idx, Vectorize(mycoef), 1)
    names(dimnames(intercepts)) <- c("y", "x")
    intercepts

    slopes <- outer(idx, idx, Vectorize(mycoef), 2)
    names(dimnames(slopes)) <- c("y", "x")
    slopes

    # another approach to the above
    # mycoef1 is like mycoef but has only one argument
    # and outputs all coefs, not just a specified one
    # (the y and x indices are prepended so that each row has 4 entries)
    mycoef1 <- function(idx) c(idx, coef(lm(DF1[idx])))
    out <- t(apply(expand.grid(y = idx, x = idx), 1, mycoef1))
    colnames(out) <- c("y", "x", "intercept", "slope")
    out

    # To perform SQL operations on data frames
    # see sqldf home page at http://sqldf.googlecode.com
    # and also ?sqldf for many examples
    library(sqldf)
    sqldf("select avg(var1), avg(var2), avg(var3), avg(var4) from DF1")
    colMeans(DF1, na.rm = TRUE) # same

On 8/20/07, Tom Willems [EMAIL PROTECTED] wrote:

> Hello R users, I have the same problem with my data. For all the different variables I have the same number of cases, but they are often outside the detection limits, so they produce na's. So the data looks like this:
>
>     case var1 var2 var3 var4 ...
>     1 9 9 13 11
>     2 15 9 15 13
>     3 na na 12 9
>     4 8 6 na na
>     5 14 10 na na
>     6 20 15 17 15
>     .. ..
>
> What I would like to do for data exploration is to compare each possible pair of variables and get their correlation coefficient and the intercept and slope of the regression line. Yet for every variable the measurements are linked through their case: it is the same sample, just a different test.
> Now I select subsets of variables out of the original dataset, and use:
>
>     value_x1 = subset(dataset_1, select = lg_value)
>     value_y1 = subset(dataset_2, select = lg_value)
>
> Then I try to fit an lm model in order to get estimates for the slope and intercept:
>
>     model_1 <- lm(value_y1[,1] ~ value_x1[,1])
>
> This is what R tells me:
>
>     Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : variable lengths differ (found for 'value_x1[, 1]')
>
> Is there perhaps a way of binding the selected subsets together, still linked to their case, so that the na's can be discarded by R automatically? I have been trying to use SQLiteDF and the other SQL functions of R, but I don't really understand them. If someone out there knows how to use SQL in R, I'd be delighted if he or she could explain it to me more understandably than the manuals I find on the web.
>
> Here is what I would want SQL to do. My data is in columns: one column holds all the case numbers, one the measured values, one all the test types, one the time period, and one the labs that performed the test. It is stored in a txt file, so it is a long 5-column data table. Now, is it possible to make a cross table holding the case numbers and time period in 2 columns, and then have a different column for every test? So if there are 4 tests and 4 labs, it would give 16 columns. I've tried it in Access, but it gave me endless loops of repeated values, and creating new data files is dangerous: little mistakes get made while copying, or manipulations are made to one file and not to the other.
>
> kind regards, Tom
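The question about keeping subsets "linked to their case" and about building a cross table can be sketched in base R without SQL. The data frame `long` and its column names below are hypothetical stand-ins for the long table Tom describes (time period and lab are left out for brevity); this is one possible approach, not the one proposed in the reply.

```r
# Hypothetical long-format table: one row per (case, test) measurement.
# Measurements outside the detection limits are simply absent.
long <- data.frame(
  case  = c(1, 2, 4, 5, 6, 1, 2, 3, 6),
  test  = rep(c("var1", "var3"), times = c(5, 4)),
  value = c(9, 15, 8, 14, 20, 13, 15, 12, 17)
)

# Option 1: take the two subsets and join them on case.
# merge() keeps only cases present in both, so the lengths always agree.
x <- subset(long, test == "var1", select = c(case, value))
y <- subset(long, test == "var3", select = c(case, value))
both <- merge(x, y, by = "case", suffixes = c(".x", ".y"))
fit1 <- lm(value.y ~ value.x, data = both)

# Option 2: build the cross table (one column per test) with reshape().
# Missing combinations become NA, and lm() drops those rows by default.
wide <- reshape(long, idvar = "case", timevar = "test", direction = "wide")
fit2 <- lm(value.var3 ~ value.var1, data = wide)

coef(fit1)
coef(fit2)
```

Both fits use only the cases measured by both tests, which avoids the "variable lengths differ" error that arises when two independently subsetted columns are passed to lm().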

### Re: [R] Any parser generator / code assistance for R?

On 8/17/07, Ali - [EMAIL PROTECTED] wrote:

> Hi, Is there any parser generator like www.antlr.org? Moreover, how does simple

Given the response, it looks like no one has come up with an antlr parser for R, but there are some facilities within R itself. showTree() in the R codetools package can generate a Lisp-style expression for any R expression:

    library(codetools)
    showTree(quote(for(i in myvec[1:3]) print(i+88*2+3*4)))
    (for i ([ myvec (: 1 3)) (print (+ (+ i (* 88 2)) (* 3 4))))

Looking at the source of showTree would show you how to walk an R parse tree. The Ryacas R package has a recursive descent R parser that is used to process R code, translating it to yacas, and it can also translate OpenMath XML code generated by yacas to R. See: http://ryacas.googlecode.com
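To illustrate what "walking an R parse tree" can look like, here is a small base-R sketch. The function name to_lisp is hypothetical and this is not the codetools implementation; it only relies on the fact that a call behaves like a list whose first element is the function being called.

```r
# Recursively turn a quoted R expression into a Lisp-style string.
# Assumption: the expression has no empty arguments (such as x[, 1]),
# which this simple sketch does not handle.
to_lisp <- function(e) {
  if (is.call(e)) {
    # A call is the function followed by its arguments.
    parts <- vapply(as.list(e), to_lisp, character(1))
    paste0("(", paste(parts, collapse = " "), ")")
  } else {
    # Symbols and constants are leaves of the tree.
    paste(deparse(e), collapse = "")
  }
}

to_lisp(quote(i + 88 * 2 + 3 * 4))
to_lisp(quote(for (i in myvec[1:3]) print(i + 88 * 2 + 3 * 4)))
```

The same recursion pattern (test is.call(), then descend into as.list(e)) can be adapted to collect function names, rewrite calls, and so on.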

### Re: [R] recommended combo of apps for new user?

Regarding RODBC vs. DBI-based packages (RSQLite, RMySQL, etc.) its my perception, possibly mistaken, that apart from any consideration of the R packages themselves, ODBC (which originated in the Windows world) is more widely used on Windows than UNIX. Also ODBC has the problem that one must configure it which puts an extra step into the process. Clear documentation on how to do such ODBC configuration may be difficult to find. On the other hand the RODBC package itself seems to be maintained very well and is typically available for new versions of R before the DBI-based packages. On 8/19/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: Some additional comments on the DBMS front. (a) SPSS is not a DBMS, so it is not clear that you need this. But if you do and are storing valuable data in a DBMS a lot of further questions come into play, like how you are going to do backups. I'd say PostgreSQL was really only for professional-level administrators. My sysadmins recommend MySQL for most people. We do also run PostgreSQL and they find it a lot trickier to maintain. 'dozens of columns and thousands of rows' is not big. A data frame with 50 columns and 5000 rows would only take 2Mb to store, and R will easily handle 100x with 4GB of RAM (and if you have less, get 4GB). So storing data in .rda (R's save() format) is most likely viable. R's indexing etc operations make it good at data manipulation, and using a DBMS will involve learning SQL, a non-trivial cost. (b) You have a choice of interfaces to a DBMS, RODBC and the DBI+ family, e.g. DBI+RMySQL and DBI+RSQLite. I'm biased, but I find RODBC more intuitive, and many people have reported it to be faster. If all you want is non-permanent storage for manipulation of large data sets, consider also SQLiteDF. On Sat, 18 Aug 2007, Duncan Murdoch wrote: Martin Brown wrote: [i sent this message earlier but apparently should have sent it plain text, as follows..] 
Hi there, I would like some advice, not so much about how to use R, but about software that I need to complement R. I've rooted around in the FAQ's and done a few searches on this mailing list but haven't quite found the perspective I need. I am an experienced data analyst in my field (forest ecology and ecological monitoring) but new to R. I am a long time user of SPSS and have gotten pretty handy with it. However, I am frustrated with SPSS for several reasons: There's the cost (I'm a freelancer; I pay for my software myself); the Windows dependence (I use Kubuntu as my usual OS now, and switching back and forth is a pain); the horrible inefficiency when I do certain types of file manipulations; and the inability to do the kind of publication-quality graphs I want... I've usually ended up using a commercial graphing program (another source of expense and limitation). I'd like to switch to using R on Kubuntu, for all those reasons. In addition I think the mathematical formality that R encourages might be good for me. However, reviewing the FAQ's on the R project web site makes me realize that I've been using SPSS as three kinds of software really: a DBMS; a statistical analysis package; and a graphing package. It looks like moving to R might involve learning three kinds of software, not just one. I wonder: 1) What open-source DBMS works most seamlessly with R? I have seen MySQL recommended but wonder if there are alternatives. I sometimes need to handle big data files. In fact a lot of my work involves exploratory and descriptive analyses of rather large and messy databases from ecological monitoring, rather than statistical tests per se. In SPSS the data files I have been generating have dozens of columns and thousands of rows, often with value and variable labels helpful for documenting my work. See above. I think you won't find much difference in the R interface between MySQL, PostgreSQL, or SQLite. 
The choice should be made based on the qualities of the database (and I don't know enough about the differences to give a recommendaton.) 2) For the purpose of creating publication-quality graphs, do R users typically need to go outside of the R system? If so, what open-source programs would you all recommend? R is great for this, but you might need to go outside for some specialized stuff (e.g. medical imaging). 3) Any other software I need to learn that would make my work in R more productive? (for example, a code editor). A lot of people are happy with ESS mode in Emacs. Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied

### Re: [R] Creating a data set within a function

Check out ?embed

On 8/19/07, Anup Nandialath [EMAIL PROTECTED] wrote:

> Dear Friends, I'm trying to find if there is a way to automate creation of the design matrix. Suppose we are interested in, say, running an autoregressive model. The user inputs the following data:
>
>     myfunAR <- function(y, order) { ... }
>
> Here y is the data series and order represents the order of the process. In other words, if order=2 then we have an AR(2) process. Now it is easy to create the y vector within the function, but I'm not clear on how to create the design matrix. For instance, if order=2 then
>
>     y <- as.matrix(rnorm(100))
>     ynew <- as.matrix(y[3:nrow(y),1])
>     x <- as.matrix(cbind(rep(1, nrow(y)-2), y[2:(nrow(y)-1),1], y[1:(nrow(y)-2),1]))
>
> ynew and x give me the response vector and design matrix respectively. However, I'm trying to write a general function which will accommodate any order. Hence, given that the user inputs y and the order, is there a way to program the creation of the x matrix automatically? The long way would be
>
>     if (order == 1) { ... } if (order == 2) { ... }
>
> but this will force me to stop at some point. Is there an alternative way to program this?
>
> Thanks in advance. Regards, Anup
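A minimal sketch of how embed() could be used for this. The helper name ar_design is hypothetical; the only library call is stats::embed(), whose first column holds the current value and whose remaining columns hold lags 1, 2, and so on.

```r
# Build the response vector and design matrix for an AR(order) regression.
ar_design <- function(y, order) {
  y <- as.numeric(y)
  # Row i of embed(y, order + 1) is y[i + order], y[i + order - 1], ..., y[i].
  lagged <- embed(y, order + 1)
  list(
    ynew = lagged[, 1],                           # response
    x    = cbind(1, lagged[, -1, drop = FALSE])   # intercept + lagged values
  )
}

set.seed(42)
y <- rnorm(100)
d <- ar_design(y, order = 2)

# The hand-written order-2 construction from the question, for comparison.
n <- length(y)
x_manual <- cbind(1, y[2:(n - 1)], y[1:(n - 2)])
all.equal(d$x, x_manual, check.attributes = FALSE)

# Any order works without further branching.
dim(ar_design(y, order = 5)$x)
```

Because the number of lags is just an argument to embed(), no chain of `if (order == k)` cases is needed.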

### Re: [R] How to parse a string into the symbol for a data frame object

You might want to store the data frames in a list to eliminate this problem and make it more convenient to iterate over them:

    L <- list(df1 = df1, df2 = df2)
    rm(df1, df2)
    # reduce each data frame to its first few rows
    for(nm in names(L)) L[[nm]] <- head(L[[nm]])

or if you don't need to modify them or know their names:

    # print first few lines of each
    for(df in L) print(head(df))

On 8/19/07, Darren Weber [EMAIL PROTECTED] wrote:

> I have several data frames, eg:
>
>     df1 <- data.frame(x=seq(0,10), y=seq(10,20))
>     df2 <- data.frame(a=seq(0,10), b=seq(10,20))
>
> It is common to create loops in R like this:
>
>     for(df in list(df1, df2)){
>       #etc.
>     }
>
> This works fine when you know the name of the objects to put into the list. I assume that the order of the objects in the list is respected through the loop. Inside the loop, the objects of the list are 'dereferenced' using 'df' but, to my knowledge, there is no way to tell whether 'df' is a current representation of 'df1' or 'df2' without some additional book keeping. In addition, I really want to use 'paste' within the loop to create a new string value that will have the symbol name of a data frame to be dereferenced, e.g.:
>
>     for(n in c(1, 2)){
>       dfString <- paste('df', n, sep='')
>       print(eval(dfString))
>     }
>     [1] "df1"
>     [1] "df2"
>
> This is not what I want. I have read through the documentation on eval and similar commands like substitute and quote. I program regularly, but I do not understand these constructs in R. I do not understand the R framework for parsing and evaluation and I don't have a lot of time right now to get lost in this detail. I could really use some help to get the string values in my loop to be parsed into symbols that refer to the data frame objects df1 and df2. How is this done?
>
> Best, Darren
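For completeness, the literal question (turning a string into the object it names) can be answered with get(), and mget() builds the named list suggested in the reply directly from the names. This is a sketch added for illustration; the variable names first_cols and df_name are hypothetical.

```r
df1 <- data.frame(x = seq(0, 10), y = seq(10, 20))
df2 <- data.frame(a = seq(0, 10), b = seq(10, 20))

# get() looks up an object by its name given as a string;
# eval() on a plain string just returns the string.
first_cols <- character(0)
for (n in c(1, 2)) {
  df_name <- paste("df", n, sep = "")
  df <- get(df_name)
  first_cols[df_name] <- names(df)[1]
}
first_cols

# mget() fetches several objects at once and returns a named list,
# so the names travel with the data frames through any later loop.
L <- mget(c("df1", "df2"))
names(L)
```

The list approach usually stays easier to maintain, since the bookkeeping lives in names(L) rather than in pasted strings.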

### Re: [R] how to collapse a list of 1 column matrix to a matrix?

Try this:

    L <- list(`1` = matrix(1:4, 4), `2` = matrix(5:8, 4))
    sapply(L, c)

Note that the list component names are kept as column names in the result.

On 8/19/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> Hi, I encounter a situation where I have a list whose elements are column matrices. Say,
>
>     $'1'
>     [,1]
>     1
>     2
>     3
>
>     $'2'
>     [,1]
>     4
>     5
>     6
>
> Is there a fast way to collapse the list into a matrix, like a cbind operation in this case? Meaning, the result should be a matrix that looks like:
>
>          [,1] [,2]
>     [1,]    1    4
>     [2,]    2    5
>     [3,]    3    6
>
> I can loop through all elements and do cbind manually. But I think there must be a simpler way that I don't know. Thank you.
>
> - adschai
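As a small illustration, the sapply() answer can be compared with the cbind idea from the question, using do.call() to pass the whole list to cbind() at once. This is a sketch added for comparison, not part of the original reply.

```r
L <- list(`1` = matrix(1:4, 4), `2` = matrix(5:8, 4))

# sapply() flattens each one-column matrix with c() and binds the results;
# the list names become the column names.
m1 <- sapply(L, c)

# do.call() calls cbind(L[[1]], L[[2]], ...) without an explicit loop.
m2 <- do.call(cbind, L)

m1
all(m1 == m2)
```

The two results hold the same values; they may differ in column names, since cbind() takes names from the matrices themselves rather than from the list.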

### Re: [R] recommended combo of apps for new user?

On 8/18/07, Martin Brown [EMAIL PROTECTED] wrote: Hi there, I would like some advice, not so much about how to use R, but about software that I need to complement R. I've rooted around in the FAQ's and done a few searches on this mailing list but haven't quite found the perspective I need. I am an experienced data analyst in my field (forest ecology and ecological monitoring) but new to R. I am a long time user of SPSS and have gotten pretty handy with it. However, I am frustrated with SPSS for several reasons: There's the cost (I'm a freelancer; I pay for my software myself); the Windows dependence (I use Kubuntu as my usual OS now, and switching back and forth is a pain); the horrible inefficiency when I do certain types of file manipulations; and the inability to do the kind of publication-quality graphs I want... I've usually ended up using a commercial graphing program (another source of expense and limitation). I'd like to switch to using R on Kubuntu, for all those reasons. In addition I think the mathematical formality that R encourages might be good for me. From a strictly language perspective, mathematical formality is pretty far from R. Its actually quite loose. Underneath there are some Lisp/Scheme ideas but you are not very close to that as a user. However, reviewing the FAQ's on the R project web site makes me realize that I've been using SPSS as three kinds of software really: a DBMS; a statistical analysis package; and a graphing package. It looks like moving to R might involve learning three kinds of software, not just one. I wonder: 1) What open-source DBMS works most seamlessly with R? I have seen MySQL recommended but wonder if there are alternatives. I sometimes need to handle big data files. In fact a lot of my work involves exploratory and descriptive analyses of rather large and messy databases from ecological monitoring, rather than statistical tests per se. 
In SPSS the data files I have been generating have dozens of columns and thousands of rows, often with value and variable labels helpful for documenting my work. Databases. SQLite is the easiest to install since its embedded rather than client/server so I would use that unless your application requires client/server or other features of MySQL. MySQL is probably the most popular of the free data bases so that would be the next one to go with. If you intend to create a commercial application you might want to consider Postgres instead of MySQL as the latter charges for commercial implementations but Postgres does not. Some heavy Postgres users might feel that it should be considered after SQLite rather than MySQL and there is a certain amount of arbitrariness here. See the R packages RSQLite, RMySQL and DBI. The R packages sqldf and SQLiteDF are beginning to blur the boundary between R and the database. 2) For the purpose of creating publication-quality graphs, do R users typically need to go outside of the R system? If so, what open-source programs would you all recommend? Graphics. R should be ok. Check out: http://cran.r-project.org/src/contrib/Views/Graphics.html and also google for R Graphics Gallery 3) Any other software I need to learn that would make my work in R more productive? (for example, a code editor). Other. You need to know a text editor. I use vim but there are many good choices here with ESS being one that is often mentioned. http://www.sciviews.org/_rgui/projects/Editors.html http://ess.r-project.org/ If you intend to write C routines to run with R then, of course, you need to know C. For certain R packages that interface with outside software (tcltk, Rgraphviz, Ryacas, XML, etc.) you will need to know something about the interfaced-to software if you intend to use those packages. For package development you will need to know latex and possibly subversion, i.e. svn, the UNIX screen program, tar and various other UNIX commands. 
Certain auxilliary programs that come with and are used with R are written in perl although its unlikely you will need to know it. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

### Re: [R] names not inherited in functions

Within a function, deparse(substitute(x)) will give the name of x as a character string. Search the archives for deparse substitute to find many examples.

On 8/17/07, david dav [EMAIL PROTECTED] wrote:

> Dear R list, After a huge delay, I come back to this question. Using names of variables inside a function is a problem I run into quite often. Maybe this little example will help to get my point across: Suppose I want to make a function llabel to get the labels of the variables from a data frame. If no label is defined, llabel should return the name of the variable.
>
>     library(Hmisc)
>     v1 <- c(1,2)
>     v2 <- c(1,2)
>     v3 <- c(1,3)
>     tablo <- data.frame(v1,v2,v3)
>     rm(v1,v2,v3)
>     label(tablo$v1) <- "var1"
>     attach(tablo)
>     # This does the trick on one variable.
>     if (label(v1) != "") label(v1) else names(data.frame(v1))
>     if (label(v2) != "") label(v2) else names(data.frame(v2))
>
> But if I call this statement in a llabel function,
>
>     llabel <- function(var) {
>       if (label(var) != "") res <- label(var)
>       else res <- names(data.frame(var))
>       return (res)
>     }
>
> I just get "var" instead of the names when no label is defined:
>
>     llabel(v1) # works
>     llabel(v2) # gives "var" instead of "v2"
>
> Thanks for your help. David
>
> 2007/6/7, Uwe Ligges [EMAIL PROTECTED]:
>
> > Not sure what you are going to get. Can you shorten your functions and specify some example data? Then please tell us what your expected result is. Best, Uwe Ligges
> >
> > david dav wrote:
> >
> > > Dear all, I'd like to keep the names of variables when calling them in a function. An example might help to understand my problem: The following function puts in a new data frame counts and percent of a data.frame called as tablo. The step nom.chiffr[1] <- names(vari) is useless, as names from the original data.frame aren't kept in the function environment.
> > > Hoping I use appropriate R-vocabulary, I thank you for your help. David
> > >
> > >     descriptif <- function (tablo) {
> > >       descriptifvar <- function (vari) {
> > >         table(vari)
> > >         length(vari[!is.na(vari)])
> > >         chiffr <- cbind(table(vari), 100*table(vari)/(length(vari[!is.na(vari)])))
> > >         nom.chiffr <- rep(NA, dim(table(vari)))
> > >         if (is.null(names(vari))) nom.chiffr[1] <- paste(i, "") else nom.chiffr[1] <- names(vari)
> > >         chiffr <- data.frame(names(table(vari)), chiffr)
> > >         rownames(chiffr) <- NULL
> > >         chiffr <- data.frame(nom.chiffr, chiffr)
> > >         return(chiffr)
> > >       }
> > >       res <- rep(NA, 4)
> > >       for (i in 1 : ncol(tablo)) res <- rbind(res, descriptifvar(tablo[,i]))
> > >       colnames(res) <- c("variable", "niveau", "effectif", "pourcentage")
> > >       return(res[-1,])
> > >     }
> > >     # NB I used this function on a data.frame with only factors in
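A minimal base-R sketch of the deparse(substitute()) idiom mentioned in the reply. The function label_or_name and its label argument are hypothetical stand-ins for the Hmisc-based llabel in the question, so that the example runs without extra packages.

```r
# Return the label if one is supplied, otherwise the name of the variable
# as the caller wrote it.
label_or_name <- function(var, label = "") {
  if (nzchar(label)) label else deparse(substitute(var))
}

v2 <- c(1, 2)
tablo <- data.frame(v1 = c(1, 2), v2 = c(1, 2))

label_or_name(v2)                 # "v2"
label_or_name(tablo$v1)           # "tablo$v1"
label_or_name(v2, label = "var2") # "var2"
```

substitute() has to be called in the function that receives the argument directly from the user; if the value is first passed through another function, the name seen is the one used in that intermediate call.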

### Re: [R] an easy way to construct this special matirx

Here are two solutions. In the first, lo has TRUE on the lower triangle and diagonal. Then we compute the exponents, multiplying by lo to zero out the upper triangle. In the second, rn is a matrix of row numbers and rn >= t(rn) is the same as lo in the first solution.

    r <- 2; n <- 5 # test data
    lo <- lower.tri(diag(n), diag = TRUE)
    lo * r ^ (row(lo) - col(lo) + 1)

Here is another one:

    rn <- row(diag(n))
    (rn >= t(rn)) * r ^ (rn - t(rn) + 1)

On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> Hi, Sorry if this is a repost. I searched but found no results. I am wondering if there is an easy way to construct the following matrix:
>
>     r    1    0    0    0
>     r^2  r    1    0    0
>     r^3  r^2  r    1    0
>     r^4  r^3  r^2  r    1
>
> where r could be any number. Thanks. Wen

### Re: [R] time series with quality codes

In addition, we could create a function to.df which converts a zoo object to a data frame, assuming that any column that only contains 1:nlevels is a factor with the indicated level names. Use to.df just before plotting:

    library(zoo)
    set.seed(1)
    f <- zoo(factor(sample(3, 10, replace = TRUE)))
    x <- zoo(rnorm(10))
    y <- zoo(rnorm(10))
    z <- merge(x, y, f)

    to.df <- function(z, levels = letters[1:3], time = FALSE) {
      zz <- as.data.frame(z)
      # check every column and convert within the data frame zz
      for(i in seq_len(ncol(zz)))
        if (all(zz[,i] %in% seq_along(levels))) zz[,i] <- factor(levels[zz[,i]])
      if (time) cbind(index = index(z), zz) else zz
    }

    library(lattice)
    xyplot(y ~ x | f, data = to.df(z))

On 8/16/07, Achim Zeileis [EMAIL PROTECTED] wrote:

> On Thu, 16 Aug 2007, Felix Andrews wrote:
>
> > list(...), I am working with environmental time series (eg rainfall, stream flow) that have attached quality codes for each data point. The quality codes have just a few factor levels, like good, suspect, poor, imputed. I use the quality codes in plots and summaries. They are carried through when a time series is aggregated to a longer time-step, according to rules like worst, median or mode. I need to support time steps of anything from hours to years. I can assume the data are regular time series -- they might be irregular initially but could be 'regularized'. But I would want to plot irregular time series along with regular ones.
> >
> > So far I have been using a data frame with a POSIXct column, a numeric column and a factor column. However I would like to use zoo instead, because of its many utility functions and easy conversion to ts. Is there any prospect of zoo handling such numeric + factor data? Other suggestions on elegant ways to do it are also welcome.
>
> There is some limited support for this in zoo. You can do
>
>     z <- zoo(myfactor, myindex)
>
> and work with it like a zoo series, and then coredata(z) will recover a factor. However, you cannot bind this to other series without losing the factor structure. At least not in a plain zoo series.
> But you can do
>
>     df <- merge(z, Z, retclass = "data.frame")
>
> where every column of the resulting data.frame is a univariate zoo series. The final option would be to just have a data.frame as usual and put your data/index into one column. But then it's more difficult to leverage zoo's functionality. I would like to have more support for things like this, but currently this is what we have.
>
> Best, Z
>
> > Felix
> >
> > -- Felix Andrews, PhD candidate, Integrated Catchment Assessment and Management Centre, The Fenner School of Environment and Society, The Australian National University (Building 48A), ACT 0200. Beijing Bag, Locked Bag 40, Kingston ACT 2604. http://www.neurofractal.org/felix/ xmpp:[EMAIL PROTECTED] 3358 543D AAC6 22C2 D336 80D9 360B 72DD 3E4C F5D8

### Re: [R] an easy way to construct this special matirx

It was pointed out that the required matrix may not be square and the superdiagonal was missing in my prior post. Here is a revision:

    r <- 2; nr <- 4; nc <- 5 # test data
    x <- matrix(nr = nr, nc = nc)
    x <- row(x) - col(x) + 1
    (x >= 0) * r ^ x

On 8/16/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:

> Here are two solutions. In the first, lo has TRUE on the lower triangle and diagonal. Then we compute the exponents, multiplying by lo to zero out the upper triangle. In the second, rn is a matrix of row numbers and rn >= t(rn) is the same as lo in the first solution.
>
>     r <- 2; n <- 5 # test data
>     lo <- lower.tri(diag(n), diag = TRUE)
>     lo * r ^ (row(lo) - col(lo) + 1)
>
> Here is another one:
>
>     rn <- row(diag(n))
>     (rn >= t(rn)) * r ^ (rn - t(rn) + 1)
>
> On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
>
> > Hi, Sorry if this is a repost. I searched but found no results. I am wondering if there is an easy way to construct the following matrix:
> >
> >     r    1    0    0    0
> >     r^2  r    1    0    0
> >     r^3  r^2  r    1    0
> >     r^4  r^3  r^2  r    1
> >
> > where r could be any number. Thanks. Wen
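The revised construction can be wrapped in a small function and checked against the matrix requested in the question. The name power_band is hypothetical, and ifelse() is swapped in for the multiplication by a logical mask so that r = 0 does not produce NaN from 0 * Inf; that substitution is my own assumption about what a caller might want.

```r
# Entry [i, j] is r^(i - j + 1) when that exponent is >= 0, and 0 otherwise.
power_band <- function(r, nr, nc) {
  e <- outer(seq_len(nr), seq_len(nc), "-") + 1  # exponent i - j + 1
  ifelse(e >= 0, r^e, 0)
}

m <- power_band(2, nr = 4, nc = 5)
m

# The 4 x 5 target from the question, written out for r = 2.
expected <- matrix(c( 2, 1, 0, 0, 0,
                      4, 2, 1, 0, 0,
                      8, 4, 2, 1, 0,
                     16, 8, 4, 2, 1), nrow = 4, byrow = TRUE)
all.equal(m, expected)
```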

### Re: [R] Linear models over large datasets

It's actually only a few lines of code to do this from first principles. The coefficients depend only on the cross products X'X and X'y, and you can build them up easily by extending this example to read files or a database holding x and y instead of getting them from the args. Here we process incr rows of the builtin matrix state.x77 at a time, building up the two cross products, xtx and xty, regressing Income (variable 2) on the other variables:

    mylm <- function(x, y, incr = 25) {
      start <- xtx <- xty <- 0
      while(start < nrow(x)) {
        idx <- seq(start + 1, min(start + incr, nrow(x)))
        x1 <- cbind(1, x[idx,])
        xtx <- xtx + crossprod(x1)
        xty <- xty + crossprod(x1, y[idx])
        start <- start + incr
      }
      solve(xtx, xty)
    }
    mylm(state.x77[,-2], state.x77[,2])

On 8/16/07, Alp ATICI [EMAIL PROTECTED] wrote:

> I'd like to fit linear models on very large datasets. My data frames are about 200 rows x 200 columns of doubles and I am using a 64 bit build of R. I've googled about this extensively and went over the R Data Import/Export guide. My primary issue is that although my data represented in ascii form is 4Gb in size (therefore much smaller considered in binary), R consumes about 12Gb of virtual memory. What exactly are my options to improve this?
>
> I looked into the biglm package, but the problem with it is that it uses the update() function and is therefore not transparent (I am using a sophisticated script which is hard to modify). I really liked the concept behind the LM package here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html But it is no longer available.
>
> How could one fit linear models to very large datasets without loading the entire set into memory but from a file/database (possibly through a connection) using a relatively simple modification of standard lm()? Alternatively, how could one improve the memory usage of R given a large dataset (by changing some default parameters of R or even using on-the-fly compression)? I don't mind much higher levels of CPU time required.
>
> Thank you in advance for your help.
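As a sanity check on the cross-product idea, the sketch below accumulates X'X and X'y chunk by chunk and compares the result with lm() on the full data. The function name chunked_coef and the simulated data are assumptions for illustration; in a real setting each chunk would be read from a file or database instead of being indexed from an in-memory matrix.

```r
# Accumulate the normal equations over row chunks, then solve once.
chunked_coef <- function(x, y, incr = 25) {
  xtx <- 0
  xty <- 0
  for (start in seq(1, nrow(x), by = incr)) {
    idx <- start:min(start + incr - 1, nrow(x))
    x1 <- cbind(1, x[idx, , drop = FALSE])   # add the intercept column
    xtx <- xtx + crossprod(x1)               # running X'X
    xty <- xty + crossprod(x1, y[idx])       # running X'y
  }
  drop(solve(xtx, xty))
}

# Simulated, well-conditioned data (hypothetical, for the check only).
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
y <- drop(1 + x %*% c(2, -1, 0.5) + rnorm(100, sd = 0.1))

b_chunked <- chunked_coef(x, y, incr = 30)
b_lm <- coef(lm(y ~ x))
all.equal(unname(b_chunked), unname(b_lm))
```

Solving the normal equations directly is less numerically stable than the QR decomposition used by lm(), so for badly scaled or nearly collinear predictors the two can differ more than they do here.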

### Re: [R] Combine matrix

Try this. We convert to data frame placing the row names in column 1, do the merge, remove column 1 and convert back to matrix:

    # test input
    a <- matrix(1:25, nrow = 5, dimnames = list(letters[1:5], rep("A", 5)))
    b <- matrix(1:40, nrow = 8, dimnames = list(rep(letters[1:2], each = 4), rep("B", 5)))

    # 1. process
    to.DF <- function(x) data.frame(rn = row.names(x), x, row.names = 1:nrow(x))
    out <- as.matrix(merge(to.DF(a), to.DF(b), by = 1)[,-1])
    colnames(out) <- c(colnames(a), colnames(b))
    out

    # 2. same but merge is done using sqldf
    # assume same a, b and to.DF as before
    library(sqldf)
    DFa <- to.DF(a)
    DFb <- to.DF(b)
    out <- as.matrix(sqldf("select * from DFa join DFb using(rn)")[-1])
    colnames(out) <- c(colnames(a), colnames(b))
    out

    # 3. same but uses sqldf and proto (which sqldf automatically loads)
    # assume same a, b and to.DF as before
    library(sqldf)
    out <- as.matrix(sqldf("select * from a join b using(rn)",
      envir = proto(a = to.DF(a), b = to.DF(b)))[-1])
    colnames(out) <- c(colnames(a), colnames(b))
    out

On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote:

> let say something like this
>
>     a=matrix(1:25, nrow=5)
>     rownames(a)=letters[1:5]
>     colnames(a)=rep("A", 5)
>     a
>       A  A  A  A  A
>     a 1  6 11 16 21
>     b 2  7 12 17 22
>     c 3  8 13 18 23
>     d 4  9 14 19 24
>     e 5 10 15 20 25
>
>     b=matrix(1:40, nrow=8)
>     rownames(b)=c(rep("a",4),rep("b",4))
>     colnames(b)=rep("B", 5)
>     b
>       B  B  B  B  B
>     a 1  9 17 25 33
>     a 2 10 18 26 34
>     a 3 11 19 27 35
>     a 4 12 20 28 36
>     b 5 13 21 29 37
>     b 6 14 22 30 38
>     b 7 15 23 31 39
>     b 8 16 24 32 40
>
> as a result I would like something like
>
>       A A  A  A  A B  B  B  B  B
>     a 1 6 11 16 21 1  9 17 25 33
>     a 1 6 11 16 21 2 10 18 26 34
>     a 1 6 11 16 21 3 11 19 27 35
>     a 1 6 11 16 21 4 12 20 28 36
>     b 2 7 12 17 22 5 13 21 29 37
>     b 2 7 12 17 22 6 14 22 30 38
>     b 2 7 12 17 22 7 15 23 31 39
>     b 2 7 12 17 22 8 16 24 32 40
>
> Is it clear? Is there a function that automates this operation? Thank you very much!
>
> On 8/16/07, jim holtman [EMAIL PROTECTED] wrote:
>
> > Can you provide an example of what you mean; e.g., the two input matrices and the desired output.
On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote: Hi R user, I am new to R, and I have a very simple question for you. I have two matrix A and B, with internally redundant rownames (but variables are different). Some, but not all the rownames are shared among the two matrix. I want to create a greater matrix that combines the previuos two, and has all the possible combinations of matching rownames lines among matrix A and B. looking for the solution I bumped in merge but actually works on data.frame, and in dataframe there could be no redundancy in names. can you help me?? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
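As a possible variation on approach 1 above, the sketch below merges only the row indices and then subsets the original matrices, so the duplicated row names survive in the result. The helper names `ia`, `ib` and `idx` are made up for this illustration and are not part of the original reply; it assumes both matrices have row names.

```r
# Test input, as in the reply above
a <- matrix(1:25, nrow = 5, dimnames = list(letters[1:5], rep("A", 5)))
b <- matrix(1:40, nrow = 8,
            dimnames = list(rep(letters[1:2], each = 4), rep("B", 5)))

# Hypothetical helpers: one lookup table per matrix (row name -> row index)
ia <- data.frame(rn = rownames(a), i = seq_len(nrow(a)))
ib <- data.frame(rn = rownames(b), j = seq_len(nrow(b)))

# All combinations of rows whose names match
idx <- merge(ia, ib, by = "rn")

# Subset the matrices directly; cbind() keeps the row names of the first block
out <- cbind(a[idx$i, , drop = FALSE], b[idx$j, , drop = FALSE])
out
```

Because only indices go through `merge`, the matrix columns are never renamed by `data.frame`, so no `colnames<-` repair step is needed afterwards.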

### Re: [R] function to find coordinates in an array

Get the indices using expand.grid and then reorder them:

    set.seed(1); X <- array(rnorm(24), 2:4) # input
    X # look at X
    do.call(expand.grid, sapply(dim(X), seq))[order(X),]

On 8/16/07, Ana Conesa wrote:

Dear list, I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following:

- Let X be a 3D array
- I find the ordering of the elements of X by ord <- order(X) (this returns a vector)
- I now want to find the x,y,z coordinates of each element of ord

Can anyone help me? Thanks! Ana
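The same coordinates can probably also be obtained with base R's `arrayInd`, which converts linear indices into one row of subscripts per element. The sketch below is an alternative to the reply's approach, not the method the reply proposes; the column names `x`, `y`, `z` are chosen here only for illustration. It also uses `lapply` in the `expand.grid` variant, on the assumption that a list is wanted even when all dimensions happen to be equal.

```r
set.seed(1)
X <- array(rnorm(24), 2:4)   # same input as in the reply above
ord <- order(X)              # linear indices, smallest element first

# One row of (x, y, z) subscripts per element of ord
coords <- arrayInd(ord, dim(X))
colnames(coords) <- c("x", "y", "z")
head(coords)

# expand.grid variant; lapply always returns a list, whatever the dimensions
grid <- do.call(expand.grid, lapply(dim(X), seq_len))[ord, ]
```

Indexing `X` with the matrix `coords` returns the elements in sorted order, which is a quick way to check the result.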

### Re: [R] Formula in lm inside lapply

It can't find x because the environment of formula1 and of formula2 is the Global Environment and x is not there; it is local to the function. Try this:

    # generating data
    set.seed(1)
    # factor() so that levels() works in R >= 4.0, where data.frame()
    # no longer converts strings to factors by default
    DF <- data.frame(y = rnorm(100, 1), x1 = rnorm(100, 1), x2 = rnorm(100, 1),
      group = factor(rep(c("A", "B"), c(40, 60))))

    formula1 <- as.formula(y ~ x1)
    lapply(levels(DF$group), function(x) {
      environment(formula1) <- environment()
      lm(formula1, DF, subset = group == x)
    })

    formula2 <- as.formula(y ~ x1 + x2)
    lapply(levels(DF$group), function(x) {
      environment(formula2) <- environment()
      lm(formula2, DF, subset = group == x)
    })

On 8/15/07, Li, Yan (IED) wrote:

I am trying to run separate regressions for different groups of observations using the lapply function. It works fine when I write the formula inside the lm() function. But I would like to pass formulae into lm(), so I can do multiple models more easily. I got an error message when I tried to do that. Here is my sample code:

    # generating data
    x1 <- rnorm(100,1)
    x2 <- rnorm(100,1)
    y <- rnorm(100,1)
    group <- rep(c("A","B"),c(40,60))
    group <- factor(group)
    df <- data.frame(y,x1,x2,group)

    # write formula inside lm--works fine
    res1 <- lapply(levels(df$group), function(x) lm(y~x1,df, subset = group ==x))
    res1
    res2 <- lapply(levels(df$group),function(x) lm(y~x1+x2,df, subset = group ==x))
    res2

    # try to pass formula into lm()--does not work
    formula1 <- as.formula(y~x1)
    formula2 <- as.formula(y~x1+x2)
    resf1 <- lapply(levels(df$group),function(x) lm(formula1,df, subset = group ==x))
    resf1
    resf2 <- lapply(levels(df$group),function(x) lm(formula2,df, subset = group ==x))
    resf2

The error message is 'Error in eval(expr, envir, enclos): object x not found'. Any help is greatly appreciated!

Yan
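One way to sidestep the environment issue altogether, offered here as a sketch and not as something proposed in the thread, is to split the data by group first and pass each piece as `data`, so that no `subset =` expression has to find `x`. The name `fits` is made up for this example.

```r
set.seed(1)
DF <- data.frame(y = rnorm(100, 1), x1 = rnorm(100, 1), x2 = rnorm(100, 1),
                 group = factor(rep(c("A", "B"), c(40, 60))))

formula1 <- y ~ x1

# split() returns one data frame per level of group; each model only
# needs the variables named in the formula, all of which are in `d`
fits <- lapply(split(DF, DF$group), function(d) lm(formula1, data = d))

# Coefficients side by side, one column per group
sapply(fits, coef)
```

The trade-off is that the recorded call reads `lm(formula = formula1, data = d)` for every group, so the group has to be identified from the list names.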

### Re: [R] Formula in lm inside lapply

Here is another solution that gets around the non-standard way that subset= is handled in lm. Unlike the previous solution, where formula1 and group == x appear literally in the output, in this one the formula appears written out and group == "A" and group == "B" appear:

    > lapply(levels(DF$group), function(x) do.call("lm",
    +   list(formula1, quote(DF), subset = bquote(group == .(x)))))
    [[1]]

    Call:
    lm(formula = y ~ x1, data = DF, subset = group == "A")

    Coefficients:
    (Intercept)           x1
        1.04855      0.04585

    [[2]]

    Call:
    lm(formula = y ~ x1, data = DF, subset = group == "B")

    Coefficients:
    (Intercept)           x1
        1.13593     -0.01627
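To see what the `bquote` step contributes, the following sketch builds the subset expression on its own and then inspects the call stored in one fitted model. It assumes the same `DF` and `formula1` as in the thread; `cond` and `fits` are names introduced only for this illustration.

```r
set.seed(1)
DF <- data.frame(y = rnorm(100, 1), x1 = rnorm(100, 1), x2 = rnorm(100, 1),
                 group = factor(rep(c("A", "B"), c(40, 60))))
formula1 <- y ~ x1

# bquote() substitutes the *value* of x into the unevaluated expression
x <- "A"
cond <- bquote(group == .(x))
cond

# do.call() builds the lm() call from already-evaluated pieces, so the
# stored call shows the formula and the subset condition spelled out
fits <- lapply(levels(DF$group), function(x)
  do.call("lm", list(formula1, quote(DF), subset = bquote(group == .(x)))))
fits[[1]]$call
```

Passing `quote(DF)` instead of `DF` keeps the data frame's name, not its contents, in the recorded call.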

### Re: [R] shell and shell.exec on Windows

The system() function has an invisible= argument. The ryacas package uses system() to run yacas. See the runYacas() and yacasInvokeString() functions in yacas.R for examples: http://ryacas.googlecode.com/svn/trunk/R/yacas.R

On 8/11/07, Erich Neuwirth wrote:

I have an Excel workbook MyWorkbook.xls containing an Auto_Open macro which I want to run from R. shell.exec("MyWorkbook.xls") does that. shell("start MyWorkbook.xls") also runs it. In both cases, the Excel window is visible on screen when Excel is started. Is there a way of opening the sheet with a hidden Excel window? start has some parameters (e.g. /MIN) which should allow this, but shell("start /MIN MyWorkbook.xls") also starts Excel visibly.

-- Erich Neuwirth, University of Vienna, Faculty of Computer Science, Computer Supported Didactics Working Group

### Re: [R] help with counting how many times each value occur in each column

Try this, where we have constructed the example to illustrate that it handles the case where not all values are in each column:

    mat <- matrix(rep(1:6, each = 4), 6)
    table(col(mat), mat)

On 8/10/07, Tom Cohen wrote:

Dear list, I have the following dataset and want to know how many times each value occurs in each column.

    data
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,] -100 -100 -100    0    0    0    0    0    0  -100
     [2,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [3,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [4,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [5,] -100 -100 -100 -100 -100 -100 -100 -100 -100   -50
     [6,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [7,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [8,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [9,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [10,] -100 -100 -100  -50 -100 -100 -100 -100 -100  -100
    [11,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [12,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [13,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [14,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [15,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [16,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [17,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [18,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [19,] -100 -100 -100    0    0    0    0    0    0  -100
    [20,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100

The result matrix should look like

          -100    0  -50
     [1]    20
     [2]    20
     [3]    20
     [4]    17
     [5]    18
     [6]    18
     [7]    18
     and so on
     [8]
     [9]
    [10]

How can I do this in R? Thanks a lot for your help, Tom
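Applied to data shaped like the question's, the `table(col(mat), mat)` idiom can be sketched as follows. The matrix `m` is rebuilt here from the printed data and is therefore an assumption about the poster's actual object; `tab` is a name introduced for this example.

```r
# Rebuild a 20 x 10 matrix resembling the one in the question (assumed values)
m <- matrix(-100, nrow = 20, ncol = 10)
m[c(1, 19), 4:9] <- 0   # rows 1 and 19 hold zeros in columns 4 to 9
m[10, 4] <- -50
m[5, 10] <- -50

# col(m) gives the column number of every cell, so cross-tabulating it
# against the cell values counts each value within each column
tab <- table(col(m), m)
tab
```

Values that never occur in a given column simply show a count of 0 in that row of the table, which is the case the reply's example was built to illustrate.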

### Re: [R] ordering a data.frame by average rank of multiple columns

Try this:

    positions <- order(ranks)

On 8/10/07, Tom.O wrote:

Hi, I have run into a problem and I wonder if anyone has a smart way of doing this. For example, I have this data frame for 5 different test groups:

    Res1 <- c(1,5,4,-0.5,3)
    Res2 <- c(-1,8,2,0,3)
    Mean <- c(0.5,1,1.5,-.5,2)
    MyFrame <- data.frame(Res1,Res2,Mean,row.names=c("G1","G2","G3","G4","G5"))

where the first two columns are the results of two different tests and the third column is the mean of the group. I want to order this data.frame by the combined rank of Res1 and Res2, with weights assigned according to the importance of each column. Let's assume that Res1 is twice as important and that lower values rank better.

    MyRanks <- data.frame(Rank1=rank(MyFrame[,"Res1"]),
                          Rank2=rank(MyFrame[,"Res2"]),
                          CombR=2*rank(MyFrame[,"Res1"])+rank(MyFrame[,"Res2"]),
                          row.names=c("G1","G2","G3","G4","G5"))

       Rank1 Rank2 CombR
    G1     2     1     5
    G2     5     5    15
    G3     4     3    11
    G4     1     2     4
    G5     3     4    10

The rank of the combined column is 2,5,4,1,3, but to be able to sort MyFrame in that order I need to enter the vector of positions c(4,1,5,3,2). Does anyone have a smart way of converting ranks to positions?

Tom
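Worked through on the question's own data, the one-line answer looks roughly like this. The variable names `comb` and `ranks` are chosen for this sketch; the reply itself only names `ranks` and `positions`.

```r
Res1 <- c(1, 5, 4, -0.5, 3)
Res2 <- c(-1, 8, 2, 0, 3)
Mean <- c(0.5, 1, 1.5, -0.5, 2)
MyFrame <- data.frame(Res1, Res2, Mean,
                      row.names = c("G1", "G2", "G3", "G4", "G5"))

# Weighted combined rank: Res1 counts twice as much as Res2
comb  <- 2 * rank(MyFrame$Res1) + rank(MyFrame$Res2)
ranks <- rank(comb)

# order() turns ranks into row positions; order(comb) gives the same
# result here because there are no ties
positions <- order(ranks)
MyFrame[positions, ]
```

With ties in `comb`, `rank` returns averaged ranks by default, so how tied groups are ordered relative to each other is a choice the sketch leaves open.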

### Re: [R] [Fwd: Re: How to apply functions over rows of multiple matrices]

1. Matrices are stored columnwise, so R is better at column-wise operations than row-wise ones.

2. Here is one way to do it (although I am not sure it's better than the index approach):

    row.apply <- function(f, a, b)
      t(mapply(f, as.data.frame(t(a)), as.data.frame(t(b))))

3. The code for the example in this post could be simplified to:

    first.1 <- apply(cbind(goldstandard, 1), 1, which.max)
    ifelse(col(newtest) > first.1, NA, newtest)

4. Given that neither example inherently needed row-by-row operations, I wonder if that is the wrong generalization in the first place?

On 8/10/07, Johannes Hüsing wrote:

[Apologies to Gabor, to whom I erroneously sent a personal copy of the reply instead of posting to the list directly]

[...] Perhaps what you really intend is to take the average over those elements in each row of the first matrix which correspond to 1's in the corresponding row of the second. In that case it's just:

    rowSums(newtest * goldstandard) / rowSums(goldstandard)

Thank you for clearing my thoughts about the particular example. My question was a bit more general though, as I have different functions which are applied row-wise to multiple matrices. Here is an example that sets all values of a row of matrix A to NA after the first occurrence of TRUE in matrix B.

    fillfrom <- function(applvec, testvec=NULL) {
      if (is.null(testvec)) testvec <- applvec
      if (length(testvec) != length(applvec)) {
        stop("applvec and testvec have to be of same length!")
      } else if(any(testvec, na.rm=TRUE)) {
        applvec[min(which(testvec)) : length(applvec)] <- NA
      }
      applvec
    }

    fillafter <- function(applvec, testvec=NULL) {
      if (is.null(testvec)) testvec <- applvec
      fillfrom(applvec, c(FALSE, testvec[-length(testvec)]))
    }

    numtest <- 6
    numsubj <- 20
    newtest <- array(rbinom(numtest*numsubj, 1, .5), dim=c(numsubj, numtest))
    goldstandard <- array(rbinom(numtest*numsubj, 1, .5), dim=c(numsubj, numtest))
    newtest.NA <- t(sapply(1:nrow(newtest), function(i) {
      fillafter(newtest[i,], goldstandard[i,]==1)}))

My general question is whether R provides some syntactic sugar for the awkward sapply(1:nrow(A)) expression. Maybe in this case there is also a way to bypass the apply mechanism, and my way of thinking about the problem has to be adapted. But as *apply calls are everywhere in R, I feel this is a standard way of dealing with vectors and matrices.
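To check that the vectorised form in point 3 does the same job as a row-by-row loop, the sketch below compares the two on simulated data. The loop here is a compact stand-in written for this comparison and is not the `fillafter` function from the quoted post; `by_row` and `vectorised` are made-up names, and the seed is arbitrary.

```r
set.seed(42)
numtest <- 6
numsubj <- 20
newtest      <- matrix(rbinom(numtest * numsubj, 1, 0.5), numsubj, numtest)
goldstandard <- matrix(rbinom(numtest * numsubj, 1, 0.5), numsubj, numtest)

# Row-by-row reference: blank everything after the first 1 in the
# corresponding gold-standard row
by_row <- t(sapply(seq_len(nrow(newtest)), function(i) {
  row <- newtest[i, ]
  hit <- which(goldstandard[i, ] == 1)
  if (length(hit) > 0 && hit[1] < length(row)) {
    row[(hit[1] + 1):length(row)] <- NA
  }
  row
}))

# Vectorised version from point 3: the appended column of 1s makes
# which.max() return ncol + 1 for rows without any 1
first.1 <- apply(cbind(goldstandard, 1), 1, which.max)
vectorised <- ifelse(col(newtest) > first.1, NA, newtest)

all.equal(by_row, vectorised)
```

The comparison `col(newtest) > first.1` relies on column-major recycling: `first.1` has one entry per row, so it lines up with every column of the matrix in turn.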

### Re: [R] Countvariable for id by date

Try this:

    Lines <- "id;dg1;dg2;date;
    1;F28;;1997-11-04;
    1;F20;F702;1998-11-09;
    1;F20;;1997-12-03;
    1;F208;;2001-03-18;
    2;F32;;1999-03-07;
    2;F29;F32;2000-01-06;
    2;F32;;2003-07-05;
    2;F323;F2800;2000-02-05;"

    # replace textConnection(Lines) with actual file name
    # the trailing ";" creates an empty fifth column, dropped via "NULL"
    DF <- read.csv2(textConnection(Lines), as.is = TRUE,
      colClasses = c("numeric", "character", "character", "Date", "NULL"))

    rk <- function(x, pat) {
      z <- regexpr(pat, x$dg1) > 0 | regexpr(pat, x$dg2) > 0
      rank(ifelse(z, x$date, NA), na.last = "keep")
    }

    DF$countF20 <- unlist(by(DF, DF$id, rk, pat = "^F20"))
    DF$countF2129 <- unlist(by(DF, DF$id, rk, pat = "^F2[1-9]"))
    DF

On 8/9/07, David Gyllenberg wrote:

Best R-users, here's a newbie question. I have tried to find an answer to this via help and the ave(x,factor(),FUN=function(y) rank (z,tie='first')-function, but without success. I have a dataframe (~8000 observations, register data) with four columns of interest: id, dg1, dg2 and date (YYYY-MM-DD):

    id;dg1;dg2;date;
    1;F28;;1997-11-04;
    1;F20;F702;1998-11-09;
    1;F20;;1997-12-03;
    1;F208;;2001-03-18;
    2;F32;;1999-03-07;
    2;F29;F32;2000-01-06;
    2;F32;;2003-07-05;
    2;F323;F2800;2000-02-05;
    ...

I would like to have two additional columns:

1. countF20: a count variable that shows the position in date order within the id of each row that fulfils the logical expression dg1 = F20* OR dg2 = F20*, where * means F201,F202... F2001,F2002...F20001,F20002...
2. countF2129: another count variable that shows the position in date order within the id of each row that fulfils the logical expression dg1 = F21*-F29* OR dg2 = F21*-F29*, where F21*-F29* means F21*, F22*...F29* and where * means F211,F212... F2101,F2102...F21001,F21002...

so the dataframe would look like this, where 1 is the first observation for the id with the right condition, 2 is the second, etc.:

    id;dg1;dg2;date;countF20;countF2129;
    1;F28;;1997-11-04;;1;
    1;F20;F702;1998-11-09;2;;
    1;F20;;1997-12-03;1;;
    1;F208;;2001-03-18;3;;
    2;F32;;1999-03-07;;;
    2;F29;F32;2000-01-06;;1;
    2;F32;;2003-07-05;;;
    2;F323;F2800;2000-02-05;;2;
    ...

Do you know a convenient way to create this kind of count variable? Thank you in advance!

/ David
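Since the question mentions `ave`, here is a sketch of how the same counts might be computed with it. The function `count_by_date` is a hypothetical helper written for this illustration, and the data frame is typed in directly, not read from a file. Unlike the `unlist(by(...))` form, it does not depend on the rows being sorted by id.

```r
DF <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2),
  dg1  = c("F28", "F20", "F20", "F208", "F32", "F29", "F32", "F323"),
  dg2  = c("", "F702", "", "", "", "F32", "", "F2800"),
  date = as.Date(c("1997-11-04", "1998-11-09", "1997-12-03", "2001-03-18",
                   "1999-03-07", "2000-01-06", "2003-07-05", "2000-02-05")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank matching rows by date within each id
count_by_date <- function(DF, pat) {
  hit <- grepl(pat, DF$dg1) | grepl(pat, DF$dg2)
  out <- rep(NA_real_, nrow(DF))
  if (any(hit)) {
    # ave() applies rank() separately within each id, matching rows only
    out[hit] <- ave(as.numeric(DF$date[hit]), DF$id[hit], FUN = rank)
  }
  out
}

DF$countF20   <- count_by_date(DF, "^F20")
DF$countF2129 <- count_by_date(DF, "^F2[1-9]")
DF
```

Rows that do not match the pattern are left as `NA`, which corresponds to the empty fields in the desired output shown in the question.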