from:"Tony Plate"

Re: [R] Seeking help with a loop

2005-08-03 Thread Tony Plate

> x <- data.frame(q33a=3:4,q33b=5:6,q35a=1:2,q35b=2:1)
> y <- list()
> for (i in grep("q33", colnames(x), value=TRUE))
+    y[[sub("q33","",i)]] <- ifelse(x[[sub("q33","q35",i)]]==1, x[[i]], NA)
> as.data.frame(y)
   a  b
1  3 NA
2 NA  6
> # if you really want to create new variables rather
> # than have them in a data frame:
> # (use paste() or sub() to modify the names if you
> #  want something like newfielda)
> for (i in names(y)) assign(i, y[[i]])
> a
[1]  3 NA
> b
[1] NA  6
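
A loop-free variant of the same idea, as a rough sketch rather than anything from the original message. It assumes every q33* column has a q35* partner with the same suffix; the names q33, q35 and new are made up here, and the "newfield" prefix follows the naming in the question.

```r
x <- data.frame(q33a = 3:4, q33b = 5:6, q35a = 1:2, q35b = 2:1)

q33 <- grep("^q33", colnames(x), value = TRUE)  # value columns
q35 <- sub("^q33", "q35", q33)                  # matching condition columns

# ifelse() works elementwise on whole matrices, so no loop is needed.
new <- as.data.frame(ifelse(as.matrix(x[q35]) == 1, as.matrix(x[q33]), NA))
names(new) <- sub("^q33", "newfield", q33)
new
```

Keeping the results together in a data frame like this avoids scattering newfielda, newfieldb, ... across the workspace.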
 

hope this helps,

Tony Plate

Greg Blevins wrote:
 Hello R Helpers,
 
 After spending considerable time attempting to write a loop (and searching 
 the help archives) I have decided to post my problem.  
 
 In a dataframe I have columns labeled:
 
 q33a q33b q33c...q33r q35a q35b q35c...q35r
 
 What I want to do is create new variables based on the following logic:
 newfielda <- ifelse(q35a==1, q33a, NA)
 newfieldb <- ifelse(q35b==1, q33b, NA)
 ...
 newfieldr
 
 What I did was create two new dataframes, one containing q33a-r the other 
 q35a-r and tried to loop over both, but I could not get any of the loop 
 syntax I tried to give me the result I was seeking.
 
 Any help would be much appreciated.
 
 Greg Blevins
 Partner
 The Market Solutions Group, Inc.
 Minneapolis, MN
 
 Windows XP, R 2.1.1
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



Re: [R] Why only a string for heading for row.names with write.csv with a matrix?

2005-08-10 Thread Tony Plate

Here's a relatively easy way to get what I think you want.  Note that 
converting x to a data frame before cbind'ing allows the type of the 
elements of x to be preserved:

> x <- matrix(1:6, 2,3)
> rownames(x) <- c("ID1", "ID2")
> colnames(x) <- c("Attr1", "Attr2", "Attr3")
> x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6
> write.table(cbind(id=row.names(x), as.data.frame(x)),
+     row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6
 

As to why you can't get this via an argument to write.table (or 
write.csv), I suspect that part of the answer is a wish to avoid 
creeping featuritis.  Transferring data between programs is 
notoriously infuriating.  There are more data formats than there are 
programs, but few programs use the same format as their default & 
preferred format.  So to accommodate everyone's preferred format would 
require an extremely large number of features in the data import/export 
functions.  Maintaining software that contains a large number of 
features is difficult -- it's easy for errors to creep in because there 
are so many combinations of how different features can be used on 
different functions.

The alternative to having lots of features on each function is to have a 
relatively small set of powerful functions that can be used to construct 
the behavior you want.  This type of software is thought by many to be 
easier to maintain and extend.  I think this is pretty much the preferred 
approach in R.  The above one-liner for writing the data in the form you 
want is really not much more complex than using an additional argument 
to write.table().  (And if you need to do this kind of thing frequently, 
then it's easy in R to create your own wrapper function for 'write.table'.)
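
For what it's worth, such a wrapper might look like the sketch below. It is not from the original message: the name write_with_id and its id.name argument are invented for illustration, and it assumes x actually has row names.

```r
# Hypothetical helper (not part of base R): write a matrix or data frame
# as CSV, with the row names in a named first column.
write_with_id <- function(x, file = "", id.name = "id", ...) {
    # data.frame() keeps numeric columns numeric, unlike cbind() on a matrix
    out <- data.frame(rownames(x), x, check.names = FALSE,
                      stringsAsFactors = FALSE)
    names(out)[1] <- id.name
    write.table(out, file = file, row.names = FALSE, sep = ",", ...)
}

x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))
write_with_id(x)                  # prints the CSV to the console
write_with_id(x, id.name = "ID")  # same, with a different key name
```

Extra arguments are passed straight through to write.table(), so options such as quote=FALSE still work.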

One might object to this line of explanation by noting that many 
functions already have many arguments and lots of features.  I think the 
situation is that the original author of any particular function gets to 
decide what features the function will have, and after that there is 
considerable reluctance (justifiably) to add new features, especially in 
cases where the desired functionality can be easily achieved in other 
ways with existing functions.

-- Tony Plate

Earl F. Glynn wrote:
 Consider:
 
 > x <- matrix(1:6, 2,3)
 > rownames(x) <- c("ID1", "ID2")
 > colnames(x) <- c("Attr1", "Attr2", "Attr3")
 
 > x
 
     Attr1 Attr2 Attr3
 ID1     1     3     5
 ID2     2     4     6
 
 > write.csv(x,file="x.csv")
 
 "","Attr1","Attr2","Attr3"
 "ID1",1,3,5
 "ID2",2,4,6
 
 Have I missed an easy way to get the "" string to be something meaningful?
 
 There is no information in the "" string.  This column heading for the row
 names often could be used as a database key, but the "" entry would need to be
 manually edited first.  Why not provide a way to specify the string instead
 of putting "" as the heading for the rownames?
 
From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
 
   Header line
   R prefers the header line to have no entry for the row names,
   . . .
   Some other systems require a (possibly empty) entry for the row names,
 which is what write.table will provide if argument col.names = NA  is
 specified. Excel is one such system.
 
 Why is an empty entry the only option here?
 
 A quick solution that comes to mind seems a bit kludgy:
 
 
 > y <- cbind(rownames(x), x)
 > colnames(y)[1] <- "ID"
 > y
 
     ID    Attr1 Attr2 Attr3
 ID1 "ID1" "1"   "3"   "5"
 ID2 "ID2" "2"   "4"   "6"
 
 > write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
 
 "ID","Attr1","Attr2","Attr3"
 "ID1","1","3","5"
 "ID2","2","4","6"
 
 Now the rownames have an "ID" header, which could be used as a key in a
 database if desired without editing (but all the numbers are now
 character strings, too).
 
 It's also not clear why I had to use write.table above, instead of
 write.csv:
 
 > write.csv(y, row.names=F, col.names=T, file="y.csv")
 
 Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
         col.names = NA makes no sense when row.names = FALSE
 
 Thanks for any insight about this.
 
 efg
 --
 Earl F. Glynn
 Bioinformatics
 Stowers Institute
 

Re: [R] queer data set

2005-08-15 Thread Tony Plate

Here's one way of working with the data you gave:

> x <- read.table(file("clipboard"), fill=T, header=T)
> x
  HEADER1 HEADER2 HEADER3           HEADER3.1
1      A1      B1      C1         X11;X12;X13
2      A2      B2      C2 X21;X22;X23;X24;X25
3      A3      B3      C3
4      A4      B4      C4         X41;X42;X43
5      A5      B5      C5                 X51
> apply(x, 1, function(x) strsplit(x[4], ";")[[1]])
$1
[1] "X11" "X12" "X13"

$2
[1] "X21" "X22" "X23" "X24" "X25"

$3
character(0)

$4
[1] "X41" "X42" "X43"

$5
[1] "X51"

> do.call(rbind, apply(x, 1, function(x) {
+    y <- strsplit(x[4], ";")[[1]]
+    x3 <- matrix(x[1:3], ncol=3, nrow=max(1,length(y)), byrow=T)
+    return(cbind(x3, if (length(y)) y else NA))
+ }))
      [,1] [,2] [,3] [,4]
 [1,] "A1" "B1" "C1" "X11"
 [2,] "A1" "B1" "C1" "X12"
 [3,] "A1" "B1" "C1" "X13"
 [4,] "A2" "B2" "C2" "X21"
 [5,] "A2" "B2" "C2" "X22"
 [6,] "A2" "B2" "C2" "X23"
 [7,] "A2" "B2" "C2" "X24"
 [8,] "A2" "B2" "C2" "X25"
 [9,] "A3" "B3" "C3" NA
[10,] "A4" "B4" "C4" "X41"
[11,] "A4" "B4" "C4" "X42"
[12,] "A4" "B4" "C4" "X43"
[13,] "A5" "B5" "C5" "X51"
 

This of course is a matrix; you can convert it back to a dataframe using 
as.data.frame() if you desire.  Use either "NA" (with quotes) or NA 
(without quotes) to control whether you get just the string "NA" or an 
actual character NA value in column 4.  If you're processing a huge 
amount of data, you can probably do better by rewriting the above code 
to avoid implicit coercions of data types.
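
One possible shape for such a rewrite, offered as a sketch only and not taken from the original message: it stays with data frame columns instead of going through apply() and a character matrix. The column names H1..H4 and the toy data are made up, and lengths() needs R 3.2.0 or later.

```r
# Toy data in the same shape as the example; column names are made up.
x <- data.frame(H1 = c("A1", "A2", "A3"),
                H2 = c("B1", "B2", "B3"),
                H3 = c("C1", "C2", "C3"),
                H4 = c("X11;X12;X13", "X21;X22", ""),
                stringsAsFactors = FALSE)

# Split the semicolon-delimited column once, for all rows.
parts <- strsplit(x$H4, ";", fixed = TRUE)
# Rows with nothing to split still get one output row, holding NA.
parts[lengths(parts) == 0] <- list(NA_character_)

# Repeat each input row once per piece and attach the pieces.
long <- x[rep(seq_len(nrow(x)), lengths(parts)), 1:3]
long$H4 <- unlist(parts)
rownames(long) <- NULL
long
```

The first three columns keep whatever type they had on input, and the empty cell becomes a real NA rather than the string "NA".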

hope this helps,

Tony Plate

S.O. Nyangoma wrote:
 I have a dataset that is basically structureless. Its dimension varies 
 from row to row and the separators are a mixture of tab and semicolon (;). 
 An example is
 
 HEADER1 HEADER2 HEADER3   HEADER3
 A1   B1  C1   X11;X12;X13
 A2   B2  C2   X21;X22;X23;X24;X25
 A3   B3  C3   
 A4   B4  C4   X41;X42;X43
 A5   B5  C5   X51
 
 etc., say. Note that a blank under HEADER3 corresponds to non-occurrence 
 and all semicolon (;) delimited variables are under 
 HEADER3. These values run into tens of thousands. I want to give some 
 order to this queer matrix, turning it into something like:
 
 HEADER1 HEADER2 HEADER3   HEADER3
 A1   B1  C1   X11
 A1   B1  C1   X12
 A1   B1  C1   X13
 A1   B1  C1   X14
 A2   B2  C2   X21
 A2   B2  C2   X22
 A2   B2  C2   X23
 A2   B2  C2   X24
 A2   B2  C2   X25
 A2   B2  C2   X26
 A3   B3  C3   NA
 A4   B4  C4   X41
 A4   B4  C4   X42
 A4   B4  C4   X43
 
 Is there a brilliant R-way of doing such task?
 
 Goodday. Stephen.
 
 
 
 
 
 
 
 
 - Original Message -
 From: Prof Brian Ripley [EMAIL PROTECTED]
 Date: Monday, August 15, 2005 11:13 pm
 Subject: Re: [R] How to get a list work in RData file
 
 
On Mon, 15 Aug 2005, Xiyan Lon wrote:

Dear R-Helper,

(There are quite a few of us.)

I want to know how I get a list work which I saved in RData file. For
example,

I don't understand that at all, but it looks as if you want to save an
unevaluated call, in which case see ?quote and use something like

xyadd <- quote(test.xy(x=2, y=3))

load and saving has nothing to do with this: it doesn't change the
meaning of objects in the workspace.

> test.xy <- function(x,y) {
+    xy <- x+y
+    xy
+ }
> xyadd <- test.xy(x=2, y=3)
> xyadd
[1] 5
> x1 <- c(2,43,60,8)
> y1 <- c(91,7,5,30)
> xyadd1 <- test.xy(x=x1, y=y1)
> xyadd1
[1] 93 50 65 38
> save(list = ls(all=TRUE), file = "testxy.RData")
> rm(list=ls(all=TRUE))
> load("C:/R/useR/testxy.RData")
> ls()
[1] "test.xy" "x1"      "xyadd"   "xyadd1"  "y1"
> ls.str(pat="xyadd")
xyadd :  num 5
xyadd1 :  num [1:4] 93 50 65 38

When I run, I know the result like above

> xyadd
[1] 5
> xyadd1
[1] 93 50 65 38

what I want to know, is there any function to make the result like:

> xyadd
test.xy(x=2, y=3)

and

> xyadd1
test.xy(x=x1, y=y1)

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] Regular expressions sub

2005-08-18 Thread Tony Plate

> x <- scan("clipboard", what="")
Read 7 items
> x
[1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
> gsub("[0-9]*\\.", "", x)
[1] "11" "11" "11" "31" "2"  "3"  "8"
 


Bernd Weiss wrote:
 Dear all,
 
 I am struggling with the use of regular expression. I got
 
 
 > as.character(test$sample.id)
 
  [1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
 
 and need
 
 [1] "11"   "11"  "11"  "31" "2"  "3"  "8"
 
 I.e. remove everything before the "." .
 
 TIA,
 
 Bernd
 

Re: [R] books about MCMC to use MCMC R packages?

2005-09-23 Thread Tony Plate

I've found "Bayesian Data Analysis" by Gelman, Carlin, Stern & Rubin 
(2nd ed) to be quite useful for understanding how MCMC can be used for 
Bayesian models.  It has a little bit of R code in it too.

-- Tony Plate

Molins, Jordi wrote:
 Dear list users,
 
 I need to learn about MCMC methods, and since there are several packages in
 R that deal with this subject, I want to use them. 
 
 I want to buy a book (or more than one, if necessary) that satisfies the
 following requirements:
 
 - it teaches well MCMC methods;
 
 - it is easy to implement numerically the ideas of the book, and notation
 and concepts are similar to the corresponding R packages that deal with MCMC
 methods.
 
 I have done a search and 2 books seem to satisfy my requirements:
 
 - Markov Chain Monte Carlo In Practice, by W.R. Gilks and others.
 
 - Monte Carlo Statistical methods, Robert and Casella.
 
 What do people think about these books? Is there a suggestion of some other
 book that could satisfy better my requirements?
 
 Thank you very much in advance.
 
 
 
 
 

Re: [R] Assign references

2005-10-07 Thread Tony Plate

Looking at what objects exist after the call to myFunk() should give you 
a clue as to what happened:

> remove(list=objects())
> myFunk<-function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
> x<-0; y<-0;
> myFunk(4,5,x,y)
> x
[1] 0
> y
[1] 0
> objects()
[1] "bar"    "foo"    "myFunk" "x"      "y"
> bar
[1] 20
> foo
[1] 9
 

I suspect that you might have slightly misinterpreted Thomas Lumley's 
explanations of how the <<- operator works in different situations (the 
LHS must exist if you are assigning using a replacement operator, e.g., 
as in foo[1] <<- ..., but not when you are assigning the whole object 
as in foo <<- ...).

But I really would suggest careful consideration of what might be the 
best way to approach your problem -- modifying global data from within a 
function is not the standard way of using R.  Unless you are very 
careful about how you do it, it is likely to cause headaches for 
yourself and/or others down the road (because R is just not intended to 
be used that way).

The standard way of doing this sort of thing in R is to modify a local 
copy of the dataframe and return that, or if you have to return several 
dataframes, then return a list of dataframes.
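
A minimal sketch of that pattern, using the toy function from the question. The name my_funk and the list component names are chosen here for illustration only.

```r
# Return both results in a list instead of assigning into the global
# environment from inside the function.
my_funk <- function(a, b) {
    list(foo = a + b, bar = a * b)
}

res <- my_funk(4, 5)
x <- res$foo   # the caller decides where each result goes
y <- res$bar
```

The same shape works when foo and bar are data frames: modify local copies inside the function and return them together in the list.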

-- Tony Plate

[EMAIL PROTECTED] wrote:
 Folks,
 
 I've run into trouble while writing functions that I hope will create
 and modify a dataframe or two.  To that end I've written a toy function
 that simply sets a couple of variables (well, tries but fails).
 Searching the archives, Thomas Lumley recently explained the <<-
 operator, showing that it was necessary for x and y to exist prior to
 the function call, but I haven't the faintest why this isn't working:
 
 
 > myFunk<-function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
 > x<-0; y<-0;
 > myFunk(4,5,x,y)
 > x<-0; y<-0;
 > myFunk(4,5,x,y)
 > x
 
 [1] 0
 
 > y
 
 [1] 0
 
 What (no doubt simple) reason is there for x and y not changing?
 
 Thank you,
 cur
 --
 Curt Seeliger, Data Ranger
 CSC, EPA/WED contractor
 541/754-4638
 [EMAIL PROTECTED]
 

Re: [R] R on a supercomputer

2005-10-10 Thread Tony Plate

In general, R is not written in such a way that data remain in cache. 
However, R can use optimized BLAS libraries, and these are written that 
way.  So if your version of R is compiled to use an optimized BLAS 
library appropriate to the machine (e.g., ATLAS, or Prof. Goto's BLAS), 
AND a considerable amount of the computation done in your R program 
involves basic linear algebra (matrix multiplication, etc.), then you 
might see a good speedup.

-- Tony Plate

Kimpel, Mark William wrote:
 I am using R with Bioconductor to perform analyses on large datasets
 using bootstrap methods. In an attempt to speed up my work, I have
 inquired about using our local supercomputer and asked the administrator
 if he thought R would run faster on our parallel network. I received the
 following reply:
 
  
 
  
 
 The second benefit is that the processors have large caches. 
 
 Briefly, everything is loaded into cache before going into the
 processor.  With large caches, there is less movement of data between
 memory and cache, and this can save quite a bit of time.  Indeed, when
 programmers optimize code they usually think about how to do things to
 keep data in cache as long as possible. 
 
   Whether you would receive any benefit from larger cache depends on how
 R is written. If it's written such that  data remain in cache, the
 speed-up could be considerable, but I have no way to predict it.
 
  
 
 My question is, is R written such that data remain in cache? 
 
  
 
 Thanks,
 
  
 
  
 
 Mark W. Kimpel MD 
 
  
 
 Indiana University School of Medicine
 
  
 
 

Re: [R] transformation matrice of vector into array

2006-07-27 Thread Tony Plate

Here's a way to convert a matrix of vectors like you have into an array:

> x <- array(lapply(seq(0,len=6,by=4), "+", c(a=1,b=2,c=3,d=4)),
+     dim=c(2,3), dimnames=list(c("X","Y"),c("e","f","g")))
> x
  e         f         g
X Numeric,4 Numeric,4 Numeric,4
Y Numeric,4 Numeric,4 Numeric,4
> x[["Y","e"]]
a b c d
5 6 7 8
> xa <- array(unlist(x, use.names=F), dim=c(length(x[[1,1]]),dim(x)),
+     dimnames=c(list(names(x[[1,1]])),dimnames(x)))
> x["Y","e"]
[[1]]
a b c d
5 6 7 8

> xa[,"Y","e"]
a b c d
5 6 7 8
 

Then you can do whatever sums you want over the array.

I have not extensively checked the above code, and if I were going to 
use it, I would do numerous spot checks of elements to make sure all the 
elements are going to the right places -- it's not too difficult to make 
mistakes when pulling apart and reassembling arrays like this.  (For 
simpler cases involving lists of vectors or matrices, the abind() 
function can help.)
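
As an illustration of the "sums" step, here is a sketch that rebuilds the same toy array and sums each vector position over the whole matrix. The name totals is made up, and the last line is the kind of spot check suggested above.

```r
# A 2 x 3 list-matrix whose cells are named numeric vectors of length 4
# (same construction as in the message above).
x <- array(lapply(seq(0, length.out = 6, by = 4), "+",
                  c(a = 1, b = 2, c = 3, d = 4)),
           dim = c(2, 3), dimnames = list(c("X", "Y"), c("e", "f", "g")))

# Flatten into a 4 x 2 x 3 numeric array: vector position first.
xa <- array(unlist(x, use.names = FALSE),
            dim = c(length(x[[1, 1]]), dim(x)),
            dimnames = c(list(names(x[[1, 1]])), dimnames(x)))

# Sum over the matrix dimensions, keeping one total per vector position.
totals <- apply(xa, 1, sum)
totals

# Spot check one position against the original list-matrix.
stopifnot(all.equal(totals[["a"]], sum(sapply(x, "[[", "a"))))
```

For large arrays, rowSums(xa) should give the same totals faster than apply().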

-- Tony Plate

Jessica Gervais wrote:
 Hi,
 
 I need some help
 
 I have a matrix M(m,n) in which each element is a vector V of length 6
      1      2      3      4      5      6      7
 1    List,6 List,6 List,6 List,6 List,6 List,6 List,6
 2    List,6 List,6 List,6 List,6 List,6 List,6 List,6
 3    List,6 List,6 List,6 List,6 List,6 List,6 List,6
 4    List,6 List,6 List,6 List,6 List,6 List,6 List,6
 
 
 i would like to make the sum on the matrix of each element of the
 matrix, that is to say 
 sum(on the matrix)(M[j,][[j]][[1]])
 sum(on the matrix)(M[j,][[j]][[2]])
 ...
 sum(on the matrix)(M[j,][[j]][[6]])  
 
 I don't really know how to do.
 I thought it was possible to transform the matrix M into an array A of
 dimension (m,n,6), and then use the command sum(colSums(A[,,1])), which
 seems to be possible and quite fast.
 ...but I don't know how to convert a matrix of vectors into an array.
 
 Has anyone any idea about that?
 
 Thanks in advance
 
 Jessica
 

Re: [R] Functions ,Optim, Dataframe

2006-07-31 Thread Tony Plate

Supply your additional arguments to optim() and they will get passed to 
your function:

> mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
>
> fr<-function(x, d) {
+ # d is a vector of d1, d2, p1 & p2
+ u <- x[1]
+ v <- x[2]
+ d1 <- d[1]
+ d2 <- d[2]
+ p1 <- d[3]
+ p2 <- d[4]
+ sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2)))
+ }
> x0 <- c(1,1)# starting values for two unknown parameters
> y1 <- optim(x0,fr,d=unlist(mydat[1,]))
> y2 <- optim(x0,fr,d=unlist(mydat[2,]))
> y1$par
[1] 0.462500 0.828125
> y2$par
[1] -1.0937500  0.2828125
> yall <- apply(mydat, 1, function(d) optim(x0,fr,d=d))
> yall[[1]]$par
[1] 0.462500 0.828125
> yall[[2]]$par
[1] -1.0937500  0.2828125
>

One thing you must be careful of is that none of the arguments to your 
function match or partially match the named arguments of optim(), which are:
> names(formals(optim))
[1] "par"     "fn"      "gr"      "method"  "lower"   "upper"   "control"
[8] "hessian" "..."
 

For example, if your function has an argument 'he=', you will not be 
able to pass it, because if you say optim(x0, fr, he=3), the 'he' will 
match the 'hessian=' argument of optim(), and it will not be interpreted 
as being a '...' argument.
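
One way to sidestep such a clash, sketched here as an illustration rather than something from the thread, is to bind the extra data in a closure so that nothing has to travel through optim()'s '...' at all. The names make_objective and target are made up, and the quadratic objective is only a stand-in.

```r
# Build the objective with its data already bound, so the argument name
# 'he' never reaches optim() and cannot be matched to 'hessian='.
make_objective <- function(he) {
    function(x) sum((x - he)^2)   # stand-in objective with minimum at 'he'
}

target <- c(3, -2)
fit <- optim(c(1, 1), make_objective(he = target))
fit$par   # should be close to 'target'
```

The same trick works per row of a data frame: call make_objective() once for each row and hand the resulting function to optim().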

-- Tony Plate

Michael Papenfus wrote:
 I think I need to clarify a little further on my original question.
 
 I have the following two rows of data:
 mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
 > mydat
   d1 d2   p1   p2
 1  3  6 0.55 0.85
 2  5 10 0.05 0.35
 
 I need to optimize the following function using optim for each row in mydat
 fr<-function(x) {
 u<-x[1]
 v<-x[2]
 sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2)))
 }
 x0<-c(1,1)# starting values for two unknown parameters
 y<-optim(x0,fr)
 
 In my defined function fr, (d1 d2 p1 p2) are known values which I need 
 to read in from my dataframe and u & v are the TWO unknown parameters.  
 I want to solve this equation for each row of my dataframe.
 
 I can get this to work when I manually plug in the known values (d1 d2 
 p1 p2).  However, I would like to apply this to each row in my dataframe 
 where the known values are automatically passed to my function which 
 then is sent to optim which solves for the two unknown parameters for 
 each row in the dataframe.
 
 thanks again,
 mike



Re: [R] Functions ,Optim, Dataframe

2006-07-31 Thread Tony Plate

I added an example of passing additional arguments through optim() to 
the objective and gradient functions to the Discussion section of the 
Wiki-fied R documentation.  See it at 
http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim

-- Tony Plate

PS.  I had to add "&purge=true" to the end of the URL, i.e., 
http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim&purge=true 
in order to see the original documentation the first time -- it's 
something to do with bad cache entries for the page.


Re: [R] deleting a directory

2006-08-01 Thread Tony Plate

?unlink says that unlink() can remove directories (and has a 'recursive' 
argument).  'unlink' is in the SEE ALSO section in ?file.remove.
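
A small sketch of what that looks like in practice. The subdirectory name is made up, and dir.exists() needs R 3.2.0 or later (file.exists() does a similar job in older versions). Note that it works in a subdirectory of tempdir() rather than on tempdir() itself, since R keeps using the session temp directory after the function returns.

```r
# Work in a subdirectory of the session temp dir; the directory name is
# made up for this example.
mydir <- file.path(tempdir(), "scratch-example")
dir.create(mydir, showWarnings = FALSE, recursive = TRUE)
file.create(file.path(mydir, "a.txt"))

# Remove the directory and everything in it; no external 'rm' needed.
unlink(mydir, recursive = TRUE)
dir.exists(mydir)
```

Inside a function, the unlink() call can go into on.exit() in place of the system("rm -rf ...") call.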

-- Tony Plate

Sundar Dorai-Raj wrote:
 Hi, all,
 
 I'm looking for a utility for removing a directory from within R. Currently, 
 I'm using:
 
 foo <- function(...) {
    mydir <- tempdir()
    dir.create(mydir, showWarnings = FALSE, recursive = TRUE)
    on.exit(system(sprintf("rm -rf %s", mydir)))
    ## do some stuff in mydir
    invisible()
 }
 
 However, this assumes rm is available. I know of ?dir.create, but 
 there is no opposite. And ?file.remove appears to work only on files and 
 not directories.
 
 Any advice? Or is my current approach the only solution?
 
 > R.version
                _
 platform       i386-pc-mingw32
 arch           i386
 os             mingw32
 system         i386, mingw32
 status
 major          2
 minor          3.1
 year           2006
 month          06
 day            01
 svn rev        38247
 language       R
 version.string Version 2.3.1 (2006-06-01)
 
 
 Thanks,
 
 --sundar
 

Re: [R] meta characters in file path

2006-08-03 Thread Tony Plate

What is the problem you are having?  Seems to work fine for me running 
under Windows2000:

> write.table(data.frame(a=1:3,b=4:6), file="@# x.csv", sep=",")
> read.csv(file="@# x.csv")
  a b
1 1 4
2 2 5
3 3 6
> sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
     XML
"0.99-8"
 

Li,Qinghong,ST.LOUIS,Molecular Biology wrote:
 Hi,
 
 I need to read in some files. The file names contain some meta characters 
 such as @, #, and white spaces etc. In read.csv, file= option, is there any 
 way that one can make the function recognize a file path with those 
 characters?
 
 Thanks
 Johnny
 

Re: [R] regex scares me

2006-08-28 Thread Tony Plate

I think this does the trick.  Note that it is case sensitive.

> x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab",
+     "lad.TAB")
> grep("lad.*\\.tab$", x, value=T)
[1] "lad.tab"     "xxladyy.tab"
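
If upper-case names should match as well, grep() has an ignore.case argument. A short sketch with the same test vector; the name hits is made up here.

```r
x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab",
       "lad.TAB")

# "lad" anywhere before a literal ".tab" at the end of the string,
# ignoring case in both parts of the pattern.
hits <- grep("lad.*\\.tab$", x, value = TRUE, ignore.case = TRUE)
hits
```

The "$" anchor is what keeps "lad.tabx" out, and the doubled backslash makes the dot literal.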
 

Jon Minton wrote:
 Hi, apologies if this is too simple but I've been stuck on the following for
 a while:
 
  
 
 I have a vector of strings: filenames with a name before the extension and a
 variety of possible extensions
 
  
 
 I want to select only those files with:
 
  1) a .tab extension
 
 AND 
 
 2) the character sequence "lad" anywhere in the name of the file before the
 extension.
 
  
 
 Surely this won't take long to do, I thought. (But I was wrong.)
 
  
 
 What's the regexp pattern to specify here?
 
  
 
 Thanks,
 
  
 
 Jon Minton
 
  
 
  
 
 

Re: [R] Cannot get simple data.frame binding.

2006-08-28 Thread Tony Plate

Maybe I'm missing something, but your "Real life" code looks like it 
should work.  What happens when you do:

> ire1 <- data.frame(md1[, 1:11], other)
Error in data.frame(md1[, 1:11], other) : arguments
imply differing number of rows: 11, 75
> str(md1[, 1:11])
> str(other)

?

Maybe the labelled data frame is causing the problem?  Did you try 
as.data.frame(md1[,1:11])? (I'm guessing that will strip off extra 
attributes).

-- Tony Plate

John Kane wrote:
 I am stuck on a simple problem where an example works
 fine but the real one does not.
 
 I have a data.frame where I wish to sum up some values
 across the rows and create a new data.frame with some
 of old data.frame variables and the new summed
 variable.
 
 It works fine in my simple example but I am doing
 something wrong in the real world.  In the real world
 I am loading a labeled data.frame. The original data
 comes from a spss file imported using spss.get but the
 current data.frame is a subset of the original spss
 file.
 
 EXAMPLE
 cata <- c( 1,1,6,1,1,NA)
 catb <- c( 1,2,3,4,5,6)
 doga <- c(3,5,3,6,4, 0)
 dogb <- c(2,4,6,8,10, 12)
 rata <- c (NA, 9, 9, 8, 9, 8)
 ratb <- c( 1,2,3,4,5,6)
 bata <- c( 12, 42,NA, 45, 32, 54)
 batb <- c( 13, 15, 17,19,21,23)
 id <- c('a', 'b', 'b', 'c', 'a', 'b')
 site <- c(1,1,4,4,1,4)
 mat1 <-  cbind(cata, catb, doga, dogb, rata, ratb,
 bata, batb)
 
 data1 <- data.frame(site, id, mat1)
 attach(data1)
 data1
 aa <- which(names(data1)=="rata")
 bb <- length(names(data1))
 
 mat1 <- as.matrix(data1[,aa:bb])
 food <- apply( mat1, 1, sum , na.rm=T)
 food
 
 abba <- data.frame(data1[, 1:6], food)
 abba
 
 --
 Real life problem
 
 
 > load("C:/start/R.objects/partly.corrected.materials.Rdata")
 > md1<-partly.corrected.materials
 > aa <- which(names(md1)=="oaks")
 > bb <- length(names(md1))
 >
 > # sum the values of the other variables
 > mat1 <- as.matrix( md1[, aa:bb] )
 > other <- apply(mat1,1, sum, na.rm=T)
 > ire1 <- data.frame(md1[, 1:11], other)
 
 Error in data.frame(md1[, 1:11], other) : arguments
 imply differing number of rows: 11, 75
 
 -
 
 I have simply worked around the problem by using 
 ire1 - data.frame(md1$site, md1$colour, md1$ss1 ... ,
 other) 
 but I would like to know what stupid thing I am doing.
 
 Thanks
 

Re: [R] problem with putting objects in list

2006-09-06 Thread Tony Plate

I suspect you are not thinking about the list and the 
subsetting/extraction operators in the right way.

A list contains a number of components.

To get a subset of the list, use the '[' operator.  The subset can 
contain zero or more components of the list, and it is a list itself. 
So, if x is a list, then x[2] is a list containing a single component.

To extract a component from the list, use the '[[' operator.  You can 
only extract one component at a time.  If you supply a vector index with 
more than one element, it will index recursively.

> x <- list(1,2:3,letters[1:3])
> x
[[1]]
[1] 1

[[2]]
[1] 2 3

[[3]]
[1] "a" "b" "c"

> # a subset of the list
> x[2:3]
[[1]]
[1] 2 3

[[2]]
[1] "a" "b" "c"

> # a list with one component:
> x[2]
[[1]]
[1] 2 3

> # the second component itself
> x[[2]]
[1] 2 3
> # recursive indexing
> x[[c(2,1)]]
[1] 2
> x[[c(3,2)]]
[1] "b"
 

Rainer M Krug wrote:
 Hi
 
 I use the following code and it stores the results of density() in the
 list dr:
 
 dens <- function(run) { density( positions$X[positions$run==run], bw=3,
 cut=-2 ) }
 dr <- lapply(1:5, dens)
 
 but the results are stored in dr[[i]] and not dr[i], i.e. plot(dr[[1]])
 works, but plot(dr[1]) doesn't.
 
 Is there any way that I can store them in dr[i]?
 
 Thanks a lot,
 
 Rainer
 
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rename cols

2006-09-11 Thread Tony Plate

The following works for data frames and matrices (you didn't say which 
you were working with).

> x <- data.frame(V1=1:3,V2=4:6)
> x
  V1 V2
1  1  4
2  2  5
3  3  6
> colnames(x) <- c("Apple", "Orange")
> x
  Apple Orange
1     1      4
2     2      5
3     3      6
 

For a data frame, 'names(x) <- c("Apple", "Orange")' also works, because 
a data frame is stored internally as a list of columns.
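
A small sketch of both cases, plus renaming a single column by matching its current name. The object names are only for illustration.

```r
x <- data.frame(V1 = 1:3, V2 = 4:6)
m <- as.matrix(x)

# colnames<- works on matrices as well as data frames
colnames(m) <- c("Apple", "Orange")

# rename one data-frame column by matching its current name
names(x)[names(x) == "V2"] <- "Orange"
```

Matching on the old name avoids having to retype every column name when only one changes.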

-- Tony Plate

Ethan Johnsons wrote:
 A quick question please!
 
 How do you rename column names?  i.e. V1 --> Apple; V2 --> Orange, etc.
 
 thx much
 
 ej
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] Access Rows in a Data Frame by Row Name

2006-09-13 Thread Tony Plate

Matrix-style indexing works for both columns and rows of data frames.

E.g.:
> x <- data.frame(a=1:5, b=6:10, d=11:15)
> x
  a  b  d
1 1  6 11
2 2  7 12
3 3  8 13
4 4  9 14
5 5 10 15
> x[2:4,c(1,3)]
  a  d
2 2 12
3 3 13
4 4 14
 

Time spent reading the help document "An Introduction to R" will 
probably be well worth it.  The relevant sections are 5 "Arrays and 
matrices", and 6.3 "Data frames".
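
The same matrix-style indexing also accepts character vectors of row and column names. A minimal sketch, where the row labels "r1" to "r5" are made up for illustration:

```r
x <- data.frame(a = 1:5, b = 6:10, d = 11:15,
                row.names = c("r1", "r2", "r3", "r4", "r5"))

one_row <- x["r2", ]                        # a single row by name
subset  <- x[c("r2", "r4"), c("a", "d")]    # several rows and columns by name
```

Leaving the column index empty, as in x["r2", ], keeps all columns.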

-- Tony Plate

Michael Gormley wrote:
 I have created a data frame using the read.table command.  I want to be able 
 to access the rows by the row name, or a vector of row names. I know that you 
 can access columns by using the data.frame.name$col.name.  Is there a way to 
 access row names in a similar manner?
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] symbolic matrix elements...

2006-09-18 Thread Tony Plate

If I construct the matrix by list()ing together the expressions rather 
than c()ing, then it works OK:

> x <- matrix(list( expression(x3-5*x+4), expression(log(x2-4*x
> x[1,1]
[[1]]
expression(x3 - 5 * x + 4)

> x[[1,1]]
expression(x3 - 5 * x + 4)
> D(x[[1,1]], "x")
-5
 

The reason c() doesn't work properly here might have something to do 
with it creating a language object of an unconventional type:

> c( expression(x3-5*x+4), expression(log(x2-4*x)))
expression(x3 - 5 * x + 4, log(x2 - 4 * x))
> expression(x3-5*x+4)
expression(x3 - 5 * x + 4)
 

Using list() with language objects is much safer if you just want to 
make lists of them.
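
A minimal sketch of the list() approach, assuming the intended expressions are the ones written with the ^ operator in the messages quoted below. The 1 x 2 layout and the object names are assumptions made for illustration.

```r
exprs <- matrix(list(expression(x^3 - 5*x + 4),
                     expression(log(x^2 - 4*x))),
                nrow = 1)

cell <- exprs[1, 1]          # '[' returns a one-element list
e11  <- exprs[[1, 1]]        # '[[' returns the expression itself
d11  <- D(e11, "x")          # symbolic derivative with respect to x
val  <- eval(d11, list(x = 2))
```

Because each matrix cell holds a complete expression object, '[[' hands D() exactly what it expects.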

-- Tony Plate

Evan Cooch wrote:
 
 Eik Vettorazzi wrote:
 
test=matrix(c( expression(x^3-5*x+4), expression(log(x^2-4*x
works.
 
 Well, not really (or I'm misunderstanding). Your code enters fine (no 
 errors), but I can't access individual elements - e.g., test[1,1] gives 
 me an error:
 
   test=matrix(c( expression(x^3-5*x+4), expression(log(x^2-4*x
   test[1,1]
 Error: matrix subscripting not handled for this type
 
 Meaning...what?
 
 
btw. you received an error because D expects an expression and you 
offered a list
 
 
 OK - so why then is each of the elements identified as an expression 
 when I print out the vector? Each element is reported to be an 
 expression. OK, if so, then I remain puzzled as to how this is a 'list'.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] List-manipulation

2006-09-29 Thread Tony Plate

Does this do what you want?

> x <- list(1,2,3:7,8,9:10)
> sapply(x, function(xx) xx[1])
[1] 1 2 3 8 9
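
The same idea on a named list shaped like the one in the question quoted below, sketched with vapply() so that the result type is checked. The list name 'probes' is made up; the values are copied from the question.

```r
probes <- list("121_at"    = -113691170,
               "1255_g_at" = 42231151,
               "1316_at"   = c(35472685, 35472588),
               "1320_at"   = -88003869)

# take the first entry of every component; names are carried over
firsts <- vapply(probes, function(v) v[1], numeric(1))
```

vapply() is used here only because it fails loudly if a component does not yield a single number; sapply() as above gives the same values.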
 

-- Tony Plate

Benjamin Otto wrote:
 Hi,
 
 Sorry for the question, I know it should be basic knowledge but I'm
 struggling for two hours now.
 
 How do I select only the first entry of each list member and ignore the
 rest?
 
 So for
 
 $121_at
 -113691170
 
 $1255_g_at
 42231151
 
 $1316_at
 35472685 35472588
 
 $1320_at
 -88003869
 
 I only want to select
 
 -113691170, 42231151, 35472685 and -88003869?
 
 Regards
 
 Benjamin
 
 --
 Benjamin Otto
 Universitaetsklinikum Eppendorf Hamburg
 Institut fuer Klinische Chemie
 Martinistrasse 52
 20246 Hamburg
 
  
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] how ot replace the diagonal of a matrix

2006-10-03 Thread Tony Plate

You are indexing with numeric 0's and 1's, which will refer to only the 
matrix element 1,1 (multiple times), cf:

> matrix(1:9,3)[diag(3)]
[1] 1 1 1
 

Try one of these:

> idx <- diag(3) > 0
> idx <- which(diag(3)>0)
> idx <- cbind(seq(len=n), seq(len=n))

(For very large matrices, the third will be more efficient, I believe.)
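
A minimal sketch of the two-column index approach on the 3 x 3 case from the question, with the replacement function diag<- shown alongside as another standard option. The names 'mps2' and 'n' are chosen for illustration.

```r
n <- 3
mps <- matrix(0.4, nrow = n, ncol = n)

# each row of idx is a (row, column) pair, so this addresses the diagonal
idx <- cbind(seq_len(n), seq_len(n))
mps[idx] <- 0.6

# the same result using the diag<- replacement function
mps2 <- matrix(0.4, nrow = n, ncol = n)
diag(mps2) <- 0.6
```

Both forms leave the off-diagonal entries at .4.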

-- Tony Plate

roger bos wrote:
 Dear useRs,
 
 Trying to replace the diagonal of a matrix is not working for me.  I
 want a matrix with .6 on the diag and .4 elsewhere.  The following
 code looks like it should work--when I look at mps and idx they look
 how I want them to--but it only replaces the first element, not each
 element on the diagonal.
 
 mps <- matrix(rep(.4, 3*3), nrow=n, byrow=TRUE)
 idx <- diag(3)
 mps
 idx
 mps[idx] <- rep(.6,3)
 
 I also tried something along the lines of diag(mps=.6, ...) but it
 didn't know what mps was.
 
 Thanks,
 
 Roger
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] shifting a huge matrix left or right efficiently ?

2006-10-09 Thread Tony Plate

If you're able to work with the transpose of your matrix, you might 
consider the function 'filter()', e.g.:

> filter(diag(1:5), c(2,3), sides=1)
Time Series:
Start = 1
End = 5
Frequency = 1
  [,1] [,2] [,3] [,4] [,5]
1   NA   NA   NA   NA   NA
2    3    4    0    0    0
3    0    6    6    0    0
4    0    0    9    8    0
5    0    0    0   12   10
 

I don't know if the conversion to and from a time-series class will 
impact the timing, but if this might serve your purposes, it's easy to 
do some experiments to find out.
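
To make clear what the filter() call computes, here is a sketch comparing it with an explicit shift of the rows. It assumes the stats package version of filter() (written as stats::filter in case another package masks it); note that it pads with NA rather than with 0 as in the function quoted below.

```r
x <- diag(1:5)

# y[i, ] = 2 * x[i, ] + 3 * x[i - 1, ], computed column by column
f <- stats::filter(x, c(2, 3), sides = 1)

# the same weighted sum written as an explicit row shift
lagged <- rbind(NA, x[-nrow(x), ])    # rows moved down by one, NA on top
manual <- 2 * x + 3 * lagged

same <- all.equal(matrix(as.numeric(f), nrow(x)), manual)
```

Since the shift happens along rows, a column shift of the original matrix corresponds to filtering its transpose.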

- Tony Plate

Huang-Wen Chen wrote:
 I'm wondering what's the best way to shift a huge matrix left or right.
 My current implementation is the following:
 
 shiftMatrixL <- function(X, shift, padding=0) {
   cbind(X[, -1:-shift], matrix(padding, dim(X)[1], shift))
 }
 
 X <- shiftMatrixL(X, 1)*3 + shiftMatrixL(X,2)*5...
 
 However, it's still slow due to heavy use of this function.
 The resulting matrix will only be read once and then discarded,
 so I believe the best implementation of this function is in C,
 manipulating the internal data structure of this matrix.
 Anyone know similar package for doing this job ?
 
 Huang-Wen
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] Fwd: rarefy a matrix of counts

2006-10-11 Thread Tony Plate

Here's a way using apply(), and the prob= argument of sample():

> df <- data.frame(sample1=c(red=400,green=100,black=300), 
+ sample2=c(300,0,1000), sample3=c(2500,200,500))
> df
      sample1 sample2 sample3
red       400     300    2500
green     100       0     200
black     300    1000     500
> set.seed(1)
> apply(df, 2, function(counts) sample(seq(along=counts), rep=T, 
+ size=7, prob=counts))
     sample1 sample2 sample3
[1,]       1       3       1
[2,]       1       3       1
[3,]       3       3       1
[4,]       2       3       2
[5,]       1       3       1
[6,]       2       3       1
[7,]       2       3       3
 

Note that this does sampling WITH replacement.
AFAIK, sampling without replacement requires enumerating the entire 
population to be sampled from.  I.e., you cannot do
  sample(1:3, prob=1:3, rep=F, size=4)
instead of
  sample(c(1,2,2,3,3,3), rep=F, size=4)
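
A minimal sketch of the enumerating approach for one sample, using the counts from the example above: build the population as integer codes with rep(), draw without replacement, and count the draws per type. The target size of 100 comes from the original question; the variable names are made up.

```r
counts <- c(red = 400, green = 100, black = 300)

set.seed(1)
pool <- rep(seq_along(counts), counts)          # one integer code per object
draw <- sample(pool, size = 100, replace = FALSE)

# tabulate() keeps a slot for every type, even one that was never drawn
rarefied <- tabulate(draw, nbins = length(counts))
names(rarefied) <- names(counts)
```

The pool holds one element per counted object, so its length is the total count for the sample.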

-- Tony Plate

 From reading ?sample, I was a little unclear on whether sampling 
without replacement could work

Petr Pikal wrote:
 Hi
 
 a little bit different story. But
 
 x1 <- sample(c(rep("red",400),rep("green", 100), 
 rep("black",300)),100)
 
 is maybe close. With data frame (if it is not big)
 
 
DF
 
   color sample1 sample2 sample3
 1   red     400     300    2500
 2 green     100       0     200
 3 black     300    1000     500
 
 x <- data.frame(matrix(NA,100,3))
 for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
 if you want result in data frame
 or
 x<-vector("list", 3)
 for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
 
 if you want it in list. Maybe somebody is clever enough to discard 
 for loop but you said you have 80 columns which shall be no problem.
 
 HTH
 Petr
 
 
 
 
 
 
 
 On 11 Oct 2006 at 10:11, Brian Frappier wrote:
 
 Date sent:Wed, 11 Oct 2006 10:11:33 -0400
 From: Brian Frappier [EMAIL PROTECTED]
 To:   Petr Pikal [EMAIL PROTECTED]
 Subject:  Fwd: [R] rarefy a matrix of counts
 
 
-- Forwarded message --
From: Brian Frappier [EMAIL PROTECTED]
Date: Oct 11, 2006 10:10 AM
Subject: Re: [R] rarefy a matrix of counts
To: r-help@stat.math.ethz.ch

Hi Petr,

Thanks for your response.  I have data that looks like the following:

             sample 1  sample 2  sample 3
red candy         400       300      2500
green candy       100         0       200
black candy       300      1000       500

I don't want to randomly select either the samples (columns) or the
candy types (rows), which sample as you state would allow me. 
Instead, I want to randomly sample 100 candies from each sample and
retain info on their associated type.  I could make a list of all the
candies in each sample:

sample 1
red
red
red
red
green
green
black
red
black
...

and then randomly sample those rows.  Repeat for each sample.  But, I
am not sure how to do that without alot of loops, and am wondering if
there is an easier way in R.  Thanks!  I should have laid this out in
the first email...sorry.


On 10/11/06, Petr Pikal [EMAIL PROTECTED] wrote:

Hi

I am not experienced in Matlab and from your explanation I do not
understand what exactly you want. It seems that you want to randomly
choose a sample of 100 rows from your matrix, which can be achieved by
sample.

DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
DF[sample(1:100, 10),]

If you want to do this several times, you need to save your result
and then it depends on what you want to do next. One suitable form
is a list of matrices, the other is an array, and you can use a for loop
for completing it.

HTH
Petr


On 10 Oct 2006 at 17:40, Brian Frappier wrote:

Date sent:  Tue, 10 Oct 2006 17:40:47 -0400
From:   Brian Frappier [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch Subject:   
[R] rarefy a matrix of counts


Hi all,

I have a matrix of counts for objects (rows) by samples (columns).
 I aimed for about 500 counts in each sample (I have about 80
samples) and would now like to rarefy these down to 100 counts in
each sample using simple random sampling without replacement.  I
plan on rarefying several times for each sample.  I could do the
tedious looping task of making a list of all objects (with its
associated identifier) in each sample and then use the wonderful
sampling package to select a sub-sample of 100 for each sample
and thereby get a logical vector of inclusions.  I would then
regroup the resulting logical vector into a vector of counts by
object, rinse and repeat several times for each sample.

Alternately, using the same list, I could create a random index of
integers between 1 and the number of objects for a sample (without
repeats) and then select those objects from the list.  Again,
rinse and repeat several time for each sample.

Is there a way to directly rarefy a matrix of counts without
having to create a list of objects first?  I am

Re: [R] Fwd: rarefy a matrix of counts

2006-10-11 Thread Tony Plate

Two things to note:

(1) rep() can be vectorized:
> rep(1:3, 2:4)
[1] 1 1 2 2 2 3 3 3 3
 

(2) you will likely get much better performance if you work with 
integers and convert to strings after sampling (or use factors), e.g.:

> c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
[1] "red"  "blue" "red"  "red"  "red"
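
Putting the two points together, a sketch that rarefies every column of a count matrix to 100 objects without replacement and returns a matrix of counts by type. The helper name rarefy_col and the matrix layout are assumptions made for illustration; the counts are the ones used earlier in the thread.

```r
counts <- cbind(sample1 = c(red = 400, green = 100, black = 300),
                sample2 = c(300, 0, 1000),
                sample3 = c(2500, 200, 500))

# hypothetical helper: enumerate integer codes, sample, count per type
rarefy_col <- function(n, size) {
  pool <- rep(seq_along(n), n)
  tabulate(sample(pool, size), nbins = length(n))
}

set.seed(1)
rarefied <- apply(counts, 2, rarefy_col, size = 100)
rownames(rarefied) <- rownames(counts)
```

Working with integer codes means the type names are only attached once, at the end, and a type with a zero count simply never enters the pool.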
 

-- Tony Plate

Brian Frappier wrote:
 I tried all of the approaches below. 
 
 the problem with:
 
   x <- data.frame(matrix(NA,100,3))
   for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
   if you want result in data frame
   or
   x<-vector("list", 3)
   for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
 
 is that this code still samples the rows, not the elements, i.e. returns 
 100 or 300 in the matrix cells instead of "red" or a matrix of counts by 
 color (object type) like:
        x1   x2   x3
 red    32    5   60
 gr     68   95   40
 sum   100  100  100
 
  It looks like Tony is right: sampling without replacement requires 
 listing of all elements to be sampled.  But, the code Petr provided
 
 x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)
 
 did give me a clue of how to quickly make such a list using the 'rep' 
 command.  I will for-loop a rep statement using my original matrix to 
 create a list of elements for each sample:
 
 Thanks Petr and Tony for your help!
 

Re: [R] enter browser on error

2004-08-31 Thread Tony Plate

use options(error=recover), e.g.:
> remove(x)
NULL
Warning message:
remove: variable "x" was not found
> (function() {x})()
Error in (function() { : Object "x" not found
> options(error=recover)
> (function(y=1) {x})(2)
Error in (function(y = 1) { : Object "x" not found

Enter a frame number, or 0 to exit
1:(function(y = 1) {
Selection: 1
Called from: eval(expr, envir, enclos)
Browse[1]> y
[1] 2
Browse[1]>
Enter a frame number, or 0 to exit
1:(function(y = 1) {
Selection: 0
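
Since options(error=recover) stays in effect for the rest of the session, it can help to save and restore the previous setting. A minimal sketch; the variable names are made up, and the failing call itself has to be run in an interactive session for the frame menu to appear.

```r
before <- getOption("error")             # whatever handler was set, often NULL

old <- options(error = utils::recover)   # options() returns the old values
during <- getOption("error")

# ... run the failing call here in an interactive session ...

options(old)                             # put the previous handler back
```

After options(old), errors are handled as they were before.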

At Tuesday 11:26 AM 8/31/2004, Bickel, David wrote:
Is there a way I can get R to automatically enter the browser inside a 
user-defined function on the generation of an error? Specifically, I'm 
trying to debug this:

Error in as.double.default(sapply(lis, FUN)) :
(list) object cannot be coerced to double
In addition: There were 38 warnings (use warnings() to see them)
 traceback()
8: as.double.default(sapply(lis, FUN))
7: as.numeric(sapply(lis, FUN))
6: numeric.sapply(function(x) {
   [EMAIL PROTECTED]
   })
On detection of the error, I would like browser() to be called at the 
level of numeric.sapply(), so that I can examine x. I'm wondering if this 
can be done by modifying the default error handling. Using try() with 
browser() didn't work.

Thanks,
David
_
David Bickel  http://davidbickel.com
Research Scientist
Pioneer Hi-Bred International
Bioinformatics  Exploratory Research
7250 NW 62nd Ave., PO Box 552
Johnston, Iowa 50131-0552
515-334-4739 Tel
515-334-6634 Fax
[EMAIL PROTECTED], [EMAIL PROTECTED]

This communication is for use by the intended recipient and ...{{dropped}}
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Signs of loadings from princomp on Windows

2004-09-14 Thread Tony Plate

FWIW, I see the same behavior as Francisco on my Windows machine (also an 
installation of the windows binary without trying to install any special 
BLAS libraries):

> library(MASS)
> data(painters)
> pca.painters <- princomp(painters[ ,1:4])
> loadings(pca.painters)

Loadings:
            Comp.1 Comp.2 Comp.3 Comp.4
Composition  0.484 -0.376  0.784 -0.101
Drawing      0.424  0.187 -0.280 -0.841
Colour      -0.381 -0.845 -0.211 -0.310
Expression   0.664 -0.330 -0.513  0.432

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
> pca.painters <- princomp(painters[ ,1:4])
> loadings(pca.painters)

Loadings:
            Comp.1 Comp.2 Comp.3 Comp.4
Composition -0.484 -0.376  0.784 -0.101
Drawing     -0.424  0.187 -0.280 -0.841
Colour       0.381 -0.845 -0.211 -0.310
Expression  -0.664 -0.330 -0.513  0.432

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
> R.version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    9.1
year     2004
month    06
day      21
language R

My machine is a dual-processor hp xw8000.
I also get the same results with R 2.0.0 dev as in
> R.version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status   Under development (unstable)
major    2
minor    0.0
year     2004
month    09
day      13
language R

-- Tony Plate
At Tuesday 10:25 AM 9/14/2004, Prof Brian Ripley wrote:
On Tue, 14 Sep 2004, Francisco Chamu wrote:
 I have run this on both Windows 2000 and XP.  All I did was install
 the binaries from CRAN so I think I am using the standard Rblas.dll.

 To reproduce what I see you must run the code at the beginning of the
 R session.
We did, as you said `start a clean session'.
I think to reproduce what you see we have to be using your account on your
computer.
 After the second run, all subsequent runs give the same
 result as the second set.

 Thanks,
 Francisco


 On Tue, 14 Sep 2004 08:29:25 +0200, Uwe Ligges
 [EMAIL PROTECTED] wrote:
  Prof Brian Ripley wrote:
   I get the second set each time, on Windows, using the build from CRAN.
   Which BLAS are you using?
 
 
  Works also well for me with a self compiled R-1.9.1 (both with standard
  Rblas as well as with the Rblas.dll for Athlon CPU from CRAN).
  Is this a NT-based version of Windows (NT, 2k, XP)?
 
  Uwe
 
 
 
 
   On Tue, 14 Sep 2004, Francisco Chamu wrote:
  
  
  I start a clean session of R 1.9.1 on Windows and I run the 
following code:
  
  
  library(MASS)
  data(painters)
  pca.painters <- princomp(painters[ ,1:4])
  loadings(pca.painters)
  
  Loadings:
  Comp.1 Comp.2 Comp.3 Comp.4
  Composition  0.484 -0.376  0.784 -0.101
  Drawing  0.424  0.187 -0.280 -0.841
  Colour  -0.381 -0.845 -0.211 -0.310
  Expression   0.664 -0.330 -0.513  0.432
  
 Comp.1 Comp.2 Comp.3 Comp.4
  SS loadings  1.00   1.00   1.00   1.00
  Proportion Var   0.25   0.25   0.25   0.25
  Cumulative Var   0.25   0.50   0.75   1.00
  
  However, if I rerun the same analysis, the loadings of the first
  component have the opposite sign (see below), why is that?  I have
  read the note
  in the princomp help that says
  
  The signs of the columns of the loadings and scores are arbitrary,
   and so may differ between different programs for PCA, and even
   between different builds of R.
  
  However, I still would expect the same signs for two runs in the 
same session.
  
  
  pca.painters <- princomp(painters[ ,1:4])
  loadings(pca.painters)
  
  Loadings:
  Comp.1 Comp.2 Comp.3 Comp.4
  Composition -0.484 -0.376  0.784 -0.101
  Drawing -0.424  0.187 -0.280 -0.841
  Colour   0.381 -0.845 -0.211 -0.310
  Expression  -0.664 -0.330 -0.513  0.432
  
 Comp.1 Comp.2 Comp.3 Comp.4
  SS loadings  1.00   1.00   1.00   1.00
  Proportion Var   0.25   0.25   0.25   0.25
  Cumulative Var   0.25   0.50   0.75   1.00
  
  R.version
  
   _
  platform i386-pc-mingw32
  arch i386
  os   mingw32
  system   i386, mingw32
  status
  major1
  minor9.1
  year 2004
  month06
  day  21
  language R
  
  BTW, I have tried the same in R 1.9.1 on Debian and I can't reproduce
  what I see
  on Windows.  In fact all the runs give the same as the second run 
on Windows.
  
  -Francisco
  
  __
  [EMAIL PROTECTED] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
  
  
  
  
 
 



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road

Re: [R] efficient submatrix extraction

2004-09-15 Thread Tony Plate

I think you should be able to do something with reassigning the dim 
attribute, and then using apply(), something along the lines of the 
following (which doesn't do your computation on the data in the subarrays, 
but merely illustrates how to create and access them):

> x <- matrix(1:64,ncol=8)
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    9   17   25   33   41   49   57
[2,]    2   10   18   26   34   42   50   58
[3,]    3   11   19   27   35   43   51   59
[4,]    4   12   20   28   36   44   52   60
[5,]    5   13   21   29   37   45   53   61
[6,]    6   14   22   30   38   46   54   62
[7,]    7   15   23   31   39   47   55   63
[8,]    8   16   24   32   40   48   56   64
> x2 <- x
> dim(x2) <- c(2,4,2,4)
> x2[,1,,1]
     [,1] [,2]
[1,]    1    9
[2,]    2   10
> x2[,2,,1]
     [,1] [,2]
[1,]    3   11
[2,]    4   12
> x2[,1,,2]
     [,1] [,2]
[1,]   17   25
[2,]   18   26
> x4 <- x
> dim(x4) <- c(4,2,4,2)
> x4[,1,,1]
     [,1] [,2] [,3] [,4]
[1,]    1    9   17   25
[2,]    2   10   18   26
[3,]    3   11   19   27
[4,]    4   12   20   28
> invisible(apply(x4, c(2,4), print))
     [,1] [,2] [,3] [,4]
[1,]    1    9   17   25
[2,]    2   10   18   26
[3,]    3   11   19   27
[4,]    4   12   20   28
     [,1] [,2] [,3] [,4]
[1,]    5   13   21   29
[2,]    6   14   22   30
[3,]    7   15   23   31
[4,]    8   16   24   32
     [,1] [,2] [,3] [,4]
[1,]   33   41   49   57
[2,]   34   42   50   58
[3,]   35   43   51   59
[4,]   36   44   52   60
     [,1] [,2] [,3] [,4]
[1,]   37   45   53   61
[2,]   38   46   54   62
[3,]   39   47   55   63
[4,]   40   48   56   64
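
Building on the dim() trick, a sketch of how the box count from the question quoted below might be vectorized. It assumes a square matrix whose size is a multiple of the box size, and that the test is "some value in the box exceeds 0.7"; the function name count_boxes and its arguments are made up.

```r
count_boxes <- function(m, b, threshold = 0.7) {
  n <- nrow(m)
  stopifnot(ncol(m) == n, n %% b == 0)
  hit <- m > threshold
  # dimensions become: row within box, box row, column within box, box column
  dim(hit) <- c(b, n / b, b, n / b)
  # one TRUE/FALSE per box, then count the boxes with at least one hit
  sum(apply(hit, c(2, 4), any))
}

set.seed(1)
m <- matrix(runif(64 * 64), nrow = 64)
bcounts <- sapply(2^(1:5), function(b) count_boxes(m, b))
```

This replaces only the two inner loops; the loop over box sizes remains, here written with sapply().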

hope this helps,
Tony Plate
At Wednesday 03:10 PM 9/15/2004, Rajarshi Guha wrote:
Hi,
  I have a matrix of say 1024x1024 and I want to look at it in chunks.
That is I'd like to divide into a series of submatrices of order 2x2.
| 1 2 3 4 5 6 7 8 ... |
| 1 2 3 4 5 6 7 8 ... |
| 1 2 3 4 5 6 7 8 ... |
| 1 2 3 4 5 6 7 8 ... |
...
So the first submatrix would be
| 1 2 |
| 1 2 |
the second one would be
| 3 4 |
| 3 4 |
and so on. That is I want the matrix to be evenly divided into 2x2
submatrices. Now I'm also doing this subdivision into 4x4, 8x8 ...
256x256 submatrices.
Currently I'm using loops and I'm sure there is a more efficient way to
do it:
m <- matrix(runif(1024*1024), nrow=1024)
boxsize <- 2^(1:8)
for (b in boxsize) {
    bcount <- 0
    bstart <- seq(1,1024, by=b)
    for (x in bstart) {
        for (y in bstart) {
            xend <- x + b - 1
            yend <- y + b - 1
            if (length(which( m[ x:xend, y:yend ] > 0.7)) > 0) {
                bcount <- bcount + 1
            }
        }
    }
}
Is there any way to vectorize the two inner loops?
Thanks,
---
Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
The way to love anything is to realize that it might be lost.
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Signs of loadings from princomp on Windows

2004-09-15 Thread Tony Plate

You could investigate this yourself by looking at the code of princomp (try 
getAnywhere(princomp.default)).  I'd suggest making a file that in-lines 
the body of princomp.default into the commands you had below.  See if you 
still get the difference.  (I'd be surprised if you didn't).  Then try 
commenting out lines until the second pass through the commands produces 
the same results as the first.  The very last thing you commented out might 
help to answer your question "What would be causing the difference?"  (The 
fact that various people chimed in to say they could reproduce the behavior 
that bothered you, but didn't bother to dig deeper, suggests it didn't 
bother them that much, which further suggests that you are the person most 
motivated by this and thus the best candidate for investigating it 
further...)
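
Independently of finding the cause, the arbitrary sign can be fixed by convention before comparing runs. The following is only a sketch of one such convention (flip each component so that its largest-magnitude loading is positive); it is not part of princomp, and it uses the built-in USArrests data rather than the painters data so that it runs without extra packages.

```r
pca <- princomp(USArrests)
L <- unclass(loadings(pca))               # plain numeric matrix of loadings

# sign of the largest-magnitude loading in each column
top_row <- apply(abs(L), 2, which.max)
flip <- sign(L[cbind(top_row, seq_len(ncol(L)))])

# multiply each column by its sign so that loading becomes positive
L_fixed <- sweep(L, 2, flip, "*")
```

If the scores are used as well, the same column signs would have to be applied to them.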

-- Tony Plate
At Wednesday 05:07 PM 9/15/2004, Francisco Chamu wrote:
I am sorry to insist, but we have three other people that were able to
reproduce the behavior I mentioned.  I have also installed R 1.9.1
from the CRAN binaries on a different Windows machine and again I see
the different signs as mentioned before.  What would be causing the
difference?
-Francisco

Re: [R] There were 50 or more warnings (use warnings() to see the first 50)

2004-09-16 Thread Tony Plate

Try putting options(warn=1) at the start of your R code.
This should cause the warnings to be printed as they occur, instead of the 
default of being saved up until the top-level command terminates.

See ?warning and ?options.
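A minimal sketch of the idea, assuming a loop that raises a warning on every pass (the loop body is made up for illustration and is not from the original question):

```r
# Hypothetical example: each iteration triggers a coercion warning.
old <- options(warn = 1)            # print warnings as they occur
for (i in 1:3) {
  x <- as.numeric("not a number")   # warning: NAs introduced by coercion
}
options(old)                        # restore the previous warn setting
```

With warn=1 each warning should appear right next to the iteration that caused it, rather than being collected until the top-level call returns.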
-- Tony Plate
At Thursday 08:52 AM 9/16/2004, Mag. Ferri Leberl wrote:
I employ R in the Slave-Mode.
The slave returns me the following feedback:
There were 50 or more warnings (use warnings() to see the first 50)
I have found no way so far to get the warnings viewed. Which command would be
appropriate? warnings() (without an argument) returns NULL.
Thank you in advance.
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] read 4-jan-02 as date

2004-10-11 Thread Tony Plate

Works fine when you give as.Date() a character vector.  I suspect the Date 
column in your data frame is a factor.

> d <- c("12-Jan-01", "11-Jan-01", "10-Jan-01", "9-Jan-01", "8-Jan-01",
+ "5-Jan-01")
> d
[1] "12-Jan-01" "11-Jan-01" "10-Jan-01" "9-Jan-01"  "8-Jan-01"  "5-Jan-01"
> as.Date(d, format="%d-%b-%y")
[1] "2001-01-12" "2001-01-11" "2001-01-10" "2001-01-09" "2001-01-08"
[6] "2001-01-05"
> as.Date(factor(d), format="%d-%b-%y")
Error in fromchar(x) : character string is not in a standard unambiguous format
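
If the column is indeed a factor, converting it to character first should be enough. A small sketch with made-up data (the data frame df and its value column are assumptions for illustration; note also that %b depends on the locale, so this assumes English month abbreviations):

```r
# Hypothetical data frame with the dates stored as a factor
df <- data.frame(Date = factor(c("12-Jan-01", "9-Jan-01", "5-Jan-01")),
                 value = 1:3)

# Convert factor -> character -> Date, then sort the rows by date
df$Date <- as.Date(as.character(df$Date), format = "%d-%b-%y")
sorted <- df[order(df$Date), ]
sorted
```

Once the column has class Date, order() sorts it chronologically, so no Julian-date conversion should be needed just for sorting.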


Hope this helps,
Tony Plate
At Monday 09:04 AM 10/11/2004, bogdan romocea wrote:
Dear R users,
I have a column with dates (character) in a data frame:
12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01  8-Jan-01  5-Jan-01
and I need to convert them to (Julian) dates so that I can
sort the whole data frame by date. I thought it would be
very simple, but after checking the documentation and the
list I still don't have something that works.
1. as.Date returns the error below. What am I doing wrong?
As far as I can see the character strings are in standard
format.
d$Date <- as.Date(d$Date, format="%d-%b-%y")
Error in fromchar(x) : character string is not in a
standard unambiguous format
2. as.date {Survival} produces this error,
d$Date <- as.date(d$Date, order = "dmy")
Error in as.date(d$Date, order = dmy) : Cannot coerce to
date format
3. Assuming all else fails, is there a text function
similar to SCAN in SAS? Given a string like 9-Jan-01 and
- as separator, I'd like a function that can read the
first, second and third values (9, Jan, 01), so that I can
get Julian dates with mdy.date {survival}.
Thanks in advance,
b.

Re: [R] How might one write this better?

2004-10-05 Thread Tony Plate

The trick to vectorizing
 asset <- numeric(T+1)
 for (t in 1:T) asset[t+1] <- cont[t] + ret[t]*asset[t]
is to expand it algebraically into a sum of terms like:
asset[4] = cont[3] + ret[3] * cont[2] + ret[3] * ret[2] * cont[1]
(where the general case should be reasonably obvious, but is more work to 
write down)

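Then recognize that this is a sum of the elementwise product of a pair of 
vectors, one of which can be constructed with careful use of rev() and 
cumprod().  (In the session below ret holds net returns, so the growth 
factor used is 1+ret rather than ret.)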

> set.seed(1)
> ret <- (rnorm(5)+1)/10
> cont <- seq(along=ret)+100
> asset <- numeric(length(ret)+1)
> # loop way of computing assets -- final asset value is in the last element of asset[]
> for (i in seq(along=ret)) asset[i+1] <- cont[i] + (1+ret[i]) * asset[i]
> asset
[1]   0.0000 101.0000 214.9548 321.4880 508.9232 681.5849
> # vectorized way of computing final asset value
> sum(cumprod(rev(c(1+ret[-1],1))) * rev(cont))
[1] 681.585
> # compare the two
> sum(cumprod(rev(c(1+ret[-1],1))) * rev(cont)) - asset[length(ret)+1]
[1] 0
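
If the whole trajectory is wanted rather than just the final value, the same expansion can be rearranged so that cumsum() does the work. This is only a sketch under the assumption that no growth factor is zero and that the cumulative product G does not overflow or underflow (with 10,000 periods that assumption would need checking); the names G and asset.vec are mine:

```r
set.seed(1)
ret  <- (rnorm(5) + 1) / 10       # net returns, as in the session above
cont <- seq(along = ret) + 100    # contributions

# Reference: the explicit loop
asset <- numeric(length(ret) + 1)
for (i in seq(along = ret)) asset[i + 1] <- cont[i] + (1 + ret[i]) * asset[i]

# Vectorized: asset[k+1] = G[k] * sum over j <= k of cont[j] / G[j],
# where G is the cumulative product of the growth factors
G <- cumprod(1 + ret)
asset.vec <- c(0, G * cumsum(cont / G))

all.equal(asset, asset.vec)
```

Dividing by G trades the loop for a possible loss of precision, so comparing against the loop on a short series first seems prudent.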


At Sunday 05:35 AM 10/3/2004, you wrote:
I am trying to simulate the trajectory of the pension assets of one
person. In C-like syntax, it looks like this:
daily.wage.growth = 1.001 # deterministic
contribution.rate = 0.08  # deterministic 8%
Wage = 10 # initial
Asset = 0 # initial
for (10,000 days) {
    Asset += contribution.rate * Wage         # accreting contributions
    Wage *= daily.wage.growth                 # wage growth
    Asset *= draw from a normal distribution  # Asset returns
}
cat("Terminal asset = ", Asset, "\n")
How can one do this well in R? What I tried so far is to notice that
the wage trajectory is deterministic, it does not change from one run
to the next, and it can be done in one line. The asset returns
trajectory can be obtained using a single call to rnorm(). Both these
can be done nicely using R functions (if you're curious, I can give
you my code). Using these, I efficiently get a vector of contributions
c[] and a vector of returns r[]. But that still leaves the loop:
  Asset <- 0
  for (t in 1:T) {
    Asset <- c[t] + r[t]*Asset
  }
How might one do this better?
I find that using this code, it takes roughly 0.3 seconds per
computation of Asset (on my dinky 500 MHz Celeron). I need to do
50,000 of these every now and then, and it's a pain to have to wait 3
hours. It'll be great if there is some neat R way to rewrite the
little loop above.
--
Ajay Shah   Consultant
[EMAIL PROTECTED]  Department of Economic Affairs
http://www.mayin.org/ajayshah   Ministry of Finance, New Delhi

Re: [R] Equivalents of Matlab's 'find' and 'end'

2004-10-07 Thread Tony Plate

At Thursday 08:10 AM 10/7/2004, Bryan L. Brown wrote:
Sorry if these questions have been asked recently--I'm new to this list.
I'm primarily a Matlab user who is attempting to learn R and I'm searching 
for possible equivalents of commands that I found very handy in 
Matlab.  So that I don't seem ungrateful to those who may answer, I HAVE 
determined ways to carry out these processes in 'brute force' sorts of 
ways in R code, but they lack the elegance and simplicity of the Matlab 
commands.  Also, if you know that no such commands exist, that bit of 
knowledge would be helpful to know so that I don't continue fruitless 
searches.

The first is Matlab's 'find' command.
This is one of the most useful commands in Matlab.  Basically, if X is the 
vector

X=[3, 2, 1, 1, 2, 3]
the command
'find(X==1)'
would return the vector [3, 4] which would indicate that the vector X had 
the value of 1 at the 3 and 4 positions.  This was an extremely useful 
command for subsetting in Matlab.  The closest thing I've found in R has 
been 'match' but match only returns the first value as opposed to the 
position of all matching values.
For this specific case, you can use which().  Also note that sometimes it 
can be useful to use match() with the arguments swapped, which can return 
you the positions of all matching values.  Also, the operator %in% can be 
useful:

> X <- c(3, 2, 1, 1, 2, 3)
> which(X==1)
[1] 3 4
> match(1, X)
[1] 3
> match(X, 1)
[1] NA NA  1  1 NA NA
> which(!is.na(match(X, 1)))
[1] 3 4
> which(X %in% 1)
[1] 3 4
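
Since find() is often used on matrices in Matlab, here is a small sketch of which() on a matrix. The matrix M is made up for illustration; arr.ind=TRUE asks for row/column pairs instead of linear positions:

```r
M <- matrix(c(3, 2, 1, 1, 2, 3), nrow = 2)   # filled column by column

idx <- which(M == 1)                   # linear (column-major) positions
rc  <- which(M == 1, arr.ind = TRUE)   # one row per match: row and col index
idx
rc
```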


The second Matlab command that I'd like to find an R equivalent for is 
'end'.  'end' is just a simple little command that indicates the end of a 
row/column.  It is incredibly handy when used to subset matrices like

Y = X(2:end)
and produces Y=[2, 1, 1, 2, 3] if the X is the same as in the previous 
example.  This cutsie little command was extremely useful for composing 
programs that were flexible and could use input matrices of any size 
without modifying the code.  I realize that you can accomplish the same by 
Y - X[2:length(X)] in R, but this method is ungainly, particularly when 
subsetting matrices rather than vectors.
Yep, that is a handy feature, and I often wish for something like it, but 
in my 10 years of using R/S-PLUS I've not come across anything better than 
using length(X) (or nrow(X)/ncol(X)) for the general case.  (But I do 
sometimes still discover useful things that I didn't know about.)

For your specific case of
 Y = X(2:end)
in R/S-PLUS you can do:
 Y = X[-1]
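
A few more idioms that cover common uses of 'end', as a sketch (the matrix M is made up for illustration). Negative indices drop elements, and head()/tail() accept negative counts:

```r
X <- c(3, 2, 1, 1, 2, 3)

Y1 <- X[-1]          # like X(2:end): drop the first element
Y2 <- tail(X, -1)    # same result via tail()
Y3 <- head(X, -1)    # like X(1:end-1): drop the last element

M  <- matrix(1:6, nrow = 3)
M2 <- M[-1, , drop = FALSE]   # like M(2:end, :), still a matrix
```

One reason to prefer these over X[2:length(X)] is that 2:length(X) counts backwards when X has length 1, which gives a surprising result.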
If anyone has advice, I'd be grateful,
Bryan L. Brown
Integrative Biology
University of Texas at Austin
Austin, TX 78712
512-965-0678
[EMAIL PROTECTED]

Re: [R] R-(wiki)-pedia?

2004-10-07 Thread Tony Plate

At Thursday 11:29 AM 10/7/2004, Dan Bolser wrote:
[snip]
I just added some pages... I think it would be great if people could get
motivated to contribute to something like this. Its one of those cases of
just getting the ball rolling...
Do you think you can dump the existing R-docs into this wiki as a
framework to get things going?
If the existing R-docs are dumped into a wiki, won't the copy in the Wiki 
quickly get out of date?  How does one get around this problem?

-- Tony Plate

Re: [R] gsub() on Matrix

2004-10-28 Thread Tony Plate

Many more recent regular expression implementations have ways of indicating 
a match on a word boundary.  It's usually \b.

Here's what you did:
> gsub("x1", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + i10 + xi1"
The following worked for me to just change "x1" to "i1", while leaving 
alone any larger "word" that contains "x1":

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + x10 + xx1"

Note that the backslash must be escaped itself to get past the R lexical 
analyser, which is independent of the regexp processor.  What the regexp 
processor sees is just a single backslash.

For more on this, look for perl documentation of regular expressions.  Be 
aware that to use full perl regexps, you must supply the perl=T argument to 
gsub().  Also note that \b seems to be part of the most basic regular 
expression language in R; it even works with extended=F:

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=T)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F, ext=F)
[1] "i1 + x2 + x10 + xx1"
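
To apply a whole set of replacements, one option is to loop over the pairs, since gsub() takes a single pattern at a time. A sketch with a shortened orig/repl and made-up matrix contents (it assumes no replacement string is itself one of the later patterns, otherwise replacements would chain):

```r
orig <- c("x1", "x2", "x3")             # subset of the names in the question
repl <- c("i7", "i14", "i13")
mat  <- matrix(c("x1 + x3 + x10 + x1:x3",
                 "x1 + x2 + xx1"), ncol = 1)   # made-up contents

out <- mat
for (k in seq(along = orig)) {
  pattern <- paste("\\b", orig[k], "\\b", sep = "")
  out[] <- gsub(pattern, repl[k], out, perl = TRUE)  # out[] keeps the dims
}
out
```

Assigning into out[] rather than out is there because gsub() returns a plain character vector and would otherwise drop the matrix dimensions.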

(I assumed the fact that you have a matrix of strings is not relevant.)
Hope this helps,
Tony Plate
At Wednesday 09:07 PM 10/27/2004, Kevin Wang wrote:
Hi,
Suppose I've got a matrix, and the first few elements look like
  x1 + x3 + x4 + x5 + x1:x3 + x1:x4
  x1 + x2 + x3 + x5 + x1:x2 + x1:x5
  x1 + x3 + x4 + x5 + x1:x3 + x1:x5
and so on (have got terms from x1 ~ x14).
If I want to replace all the x1 with i7, all x2 with i14, all x3 with i13,
for example.  Is there an easy way?
I tried to put what I want to replace in a vector, like:
 repl = c("i7", "i14", "i13", "d2", "i8", "i5",
          "i6", "i3", "A", "i9", "i2",
          "i4", "i15", "i21")
and have another vector, say:
   orig
 [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
[11] "x11" "x12" "x13" "x14"
Then I tried something like
  gsub(orig, repl, mat)
## mat is the name of my matrix
but it didn't work *_*.it would replace terms like x10 with i70.
(I know it may be an easy question...but I haven't done much regular
expression)
Cheers,
Kevin

Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200
Australia
Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301

Re: [R] why should you set the mode in a vector?

2004-10-29 Thread Tony Plate

It's useful when you need to be certain of the mode of a vector.  One such 
situation is when you are about to call a C-language function using the 
.C() interface.  As you point out, some assignments (even just to vector 
elements) can change the mode of the entire vector.  This is why it's 
important to check the mode of vectors passed to external language 
functions immediately before the call.

As to what assigning the mode does, it specifies (or changes, if necessary) 
the underlying type of storage of the vector.  In R, all the elements in a 
vector have the same storage mode.  In the example below, the storage is 
initially double-precision floats, but after the assignment of character 
data to element 2, the vector is stored as character data (with suitably 
coerced values of the other elements).  After assignment of list data to 
element 1, the entire vector becomes a list (i.e., a vector of pointers to 
general objects).  [The terminology I'm using here is a little loose, but 
someone please correct me if it is outright wrong.]  Finally, the assigning 
of mode numeric to the list fails because not all elements can be 
coerced.  (And I'm not sure why the last assignment succeeds and produces 
the results it does.)

> v <- vector(mode="numeric",length=4)
> v[3:4] <- 3:4
> storage.mode(v)
[1] "double"
> v[2] <- "foo"
> v
[1] "0"   "foo" "3"   "4"
> storage.mode(v)
[1] "character"
>
> v[1] <- list(1:3)
> v
[[1]]
[1] 1 2 3

[[2]]
[1] "foo"

[[3]]
[1] "3"

[[4]]
[1] "4"

> mode(v) <- "numeric"
Error in as.double.default(list(as.integer(c(1, 2, 3)), "foo", "3", "4")) :
        (list) object cannot be coerced to double
> x <- v[2:4]
> mode(x) <- "numeric"
> x
[1] NA NA NA
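
For the .C() case mentioned above, a sketch of the kind of check or coercion I mean. The routine name some_routine is hypothetical and the call is shown only in a comment, since there is no compiled code to go with it:

```r
x <- 1:4
storage.mode(x)            # "integer"
x[2] <- 2.5                # silently promotes the whole vector
storage.mode(x)            # "double"

# Coerce explicitly right before handing data to compiled code, e.g.
#   .C("some_routine", as.double(x), as.integer(length(x)))
# where some_routine is a hypothetical C function taking double* and int*.
y <- as.double(x)
n <- as.integer(length(x))
```

Coercing at the call site means that earlier assignments which changed the storage mode cannot cause the C code to misread the data.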

-- Tony Plate
At Friday 03:41 PM 10/29/2004, Joel Bremson wrote:
Hi all,
If I write
v = vector(mode="numeric",length=10)
I'm still allowed to assign non-numerics to v.
Furthermore, R figures out what kind of vector I've got anyway
when I use the mode() function.
So what is it that assigning a mode does?
Joel

Re: [R] make apply() return a list

2004-11-01 Thread Tony Plate

for()-loops aren't so bad.  Look inside the code of apply() and see what it 
uses!  The important thing is that you use vectorized functions to 
manipulate vectors.  It's often fine to use for-loops to manipulate the 
rows or columns of a matrix, but once you've extracted a row or a column, 
then use a vectorized function to manipulate that data.

In any case, one way to get apply() to return a list is to wrap the result 
from the subfunction inside a list, e.g.:

> x <- apply(matrix(1:6,2), 1, function(x) list(c(mean=mean(x), sd=sd(x))))
> x
[[1]]
[[1]][[1]]
mean   sd 
   3    2 


[[2]]
[[2]][[1]]
mean   sd 
   4    2 


> # to remove the extra level of listing here, do:
> lapply(x, "[[", 1)
[[1]]
mean   sd 
   3    2 

[[2]]
mean   sd 
   4    2 
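
Another route that avoids the wrapping altogether is to lapply() over the row indices, which always returns a list. A sketch using the myData from the question, with a trivial stand-in for the real per-row calculation:

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# One matrix per row; matrix(values, 2, 2) stands in for the real calculation
myList <- lapply(seq_len(nrow(myData)), function(i) {
  values <- unlist(myData[i, ], use.names = FALSE)
  matrix(values, 2, 2)
})
myList[[1]]
```

Whether this is faster than apply() for a large data frame I have not checked; extracting rows of a data frame one at a time can be slow, so converting to a matrix with as.matrix() first may help.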

At Monday 11:37 AM 11/1/2004, Arne Henningsen wrote:
Hi,
I have a dataframe (say myData) and want to get a list (say myList) that
contains a matrix for each row of the dataframe myData. These matrices are
calculated based on the corresponding row of myData. Using a for()-loop to do
this is very slow. Thus, I tried to use apply(). However, afaik apply() does
only return a list if the matrices have different dimensions, while my
matrices have all the same dimension. To get a list I could change the
dimension of one matrix artificially and restore it after apply():
This a (very much) simplified example of what I did:
> myData <- data.frame( a = c( 1,2,3 ), b = c( 4,5,6 ) )
> myFunction <- function( values ) {
+    myMatrix <- matrix( values, 2, 2 )
+    if( all( values == myData[ 1, ] ) ) {
+       myMatrix <- cbind( myMatrix, rep( 0, 2 ) )
+    }
+    return( myMatrix )
+ }
> myList <- apply( myData, 1, myFunction )
> myList[[ 1 ]] <- myList[[ 1 ]][ 1:2, 1:2 ]
> myList
$1
     [,1] [,2]
[1,]    1    1
[2,]    4    4

$2
     [,1] [,2]
[1,]    2    2
[2,]    5    5

$3
     [,1] [,2]
[1,]    3    3
[2,]    6    6

This exactly does what I want and really speeds up the calculation, but I
wonder if there is an easier way to make apply() return a list.
Thanks for your help,
Arne
--
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
[EMAIL PROTECTED]
http://www.uni-kiel.de/agrarpol/ahenningsen/

Re: [R] Reading word by word in a dataset

2004-11-01 Thread Tony Plate

Trying to make it work when not all rows have the same numbers of fields 
seems like a good place to use the flush argument to scan() (to skip 
everything after the first field on the line):

With the following copied to the clipboard:
i1-apple        10$   New_York
i2-banana
i3-strawberry   7$Japan
do:
> scan("clipboard", "", flush=T)
Read 3 items
[1] "i1-apple"      "i2-banana"     "i3-strawberry"
> sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
Read 3 items
[1] "apple"      "banana"     "strawberry"
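
The same thing can be tried without the clipboard by passing the lines through the text argument of scan(). A sketch, assuming the fields are separated by tabs or spaces (the exact whitespace in the original file is a guess on my part):

```r
lines <- c("i1-apple\t10$\tNew_York",
           "i2-banana",
           "i3-strawberry\t7$\tJapan")

# flush = TRUE discards the rest of each line after the first field
first <- scan(text = lines, what = "", flush = TRUE, quiet = TRUE)
words <- sub("^[A-Za-z0-9]*-", "", first)   # strip the "i1-" style prefix
words
```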

-- Tony Plate
At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
 Uwe and Andy's solutions are great for many applications but won't 
work if not all rows have the same numbers of fields.  Consider for 
example the following modification of Lee's example:
i1-apple        10$   New_York
i2-banana
i3-strawberry   7$Japan

 If I copy this to clipboard and run Andy's code, I get the following:
 > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
dec,  :
   line 2 did not have 3 elements

 We can get around this using scan, then splitting things apart 
similar to the way Uwe described:
 > dat <-
+ scan("clipboard", character(0), sep="\n")
Read 3 items
 > dash <- regexpr("-", dat)
 > dat2 <- substring(dat, pmax(0, dash)+1)
 >
 > blank <- regexpr(" ", dat2)
 > if(any(blank<0))
+   blank[blank<0] <- nchar(dat2[blank<0])
 > substring(dat2, 1, blank)
[1] "apple "      "banana"      "strawberry "

 hope this helps.  spencer graves
Uwe Ligges wrote:
Liaw, Andy wrote:
Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:

read.table("clipboard", colClasses=c("character", "NULL", "NULL"))

             V1
1      i1-apple
2     i2-banana
3 i3-strawberry

... and if only the words after "-" are of interest, the statement can be 
followed by

 sapply(strsplit(, "-"), "[", 2)
Uwe Ligges

HTH,
Andy

From: j lee
Hello All,
I'd like to read first words in lines into a new file.
If I have a data file the following, how can I get the
first words: apple, banana, strawberry?
i1-apple        10$   New_York
i2-banana   5$London
i3-strawberry   7$Japan
Is there any similar question already posted to the
list? I am a bit new to R, having a few months of
experience now.
Cheers,
John


--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567

Re: [R] Resources for optimizing code

2004-11-05 Thread Tony Plate

Have you tried reading the manual "An Introduction to R", with special 
attention to "Array Indexing" (indexing for data frames is pretty similar 
to indexing for matrices)?

Unless I'm misunderstanding, what you want to do is very simple.  It is 
possible to use numeric vectors with 0 and 1 to indicate whether you want 
to keep the row, but it's a little easier with logical vectors.  Here's an 
example:

> x <- data.frame(a=1:5,b=letters[1:5])
> keep.num <- ifelse(x$a %% 2 == 1, 1, 0)
> keep.num
[1] 1 0 1 0 1
> keep.logical <- (x$a %% 2) == 1
> keep.logical
[1]  TRUE FALSE  TRUE FALSE  TRUE
> x[keep.num==1,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
> x[keep.logical,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
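
Applied to your function, the whole loop should reduce to one indexing expression. A sketch with stand-in data (the small DATAFRAME and asst here are made up; I am assuming that an NA in asst should mean "drop", which is what your sum(asst, na.rm=T) suggests):

```r
DATAFRAME <- data.frame(a = 1:5, b = letters[1:5])   # stand-in data
asst <- c(1, 0, NA, 1, 0)                            # 1 = keep, 0 = drop

keep <- !is.na(asst) & asst == 1     # logical index, NA treated as "drop"
newdata <- DATAFRAME[keep, , drop = FALSE]
newdata
```

Unlike copying rows into a matrix, this keeps the column types of the data frame intact.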


At Friday 10:34 AM 11/5/2004, Janet Elise Rosenbaum wrote:
I want to eliminate certain observations in a large dataframe (21000x100).
I have written code which does this using a binary vector (0=delete obs,
1=keep), but it uses for loops, and so it's slow and in the extreme it
causes R to hang for indefinite time periods.
I'm looking for one of two things:
1.  A document which discusses how to avoid for loops and situations in
which it's impossible to avoid for loops.
or
2.  A function which can do the above better than mine.
My code is pasted below.
Thanks so much,
Janet
# asst is a binary vector of length= nrow(DATAFRAME).
# 1= observations you want to keep.  0= observation to get rid of.
remove.xtra.f <- function(asst, DATAFRAME) {
    n <- sum(asst, na.rm=T)
    newdata <- matrix(nrow=n, ncol=ncol(DATAFRAME))
    j <- 1
    for(i in 1:length(data)) {
        if (asst[i]==1) {
            newdata[j,] <- DATAFRAME[i,]
            j <- j+1
        }
    }
    newdata.f <- as.data.frame(newdata)
    names(newdata.f) <- names(DATAFRAME)
    return(newdata.f)
}
--
Janet Rosenbaum [EMAIL PROTECTED]
PhD Candidate in Health Policy, Harvard GSAS
Harvard Injury Control Research Center, Harvard School of Public Health

Re: [R] hashing using named lists

2004-11-18 Thread Tony Plate

Use match() for exact matching,
i.e.,
 test[[match("name", names(test))]]
Yes, it is more cumbersome.  This partial matching is considered by some to 
be a design fault, but changing it would break too many programs that 
depend upon it.

I don't understand your question about all.equal.list() -- it does seem to 
require exact matches on names, e.g.:

> all.equal(list(a=1:3), list(aa=1:3))
[1] "Names: 1 string mismatches"
> all.equal(list(aa=1:3), list(a=1:3))
[1] "Names: 1 string mismatches"

(the above run in R 2.0.0)
-- Tony Plate
(BTW, in R this operation is generally called "indexing" or "subscripting" 
or "extraction", but not "hashing".  "Hashing" is a specific technique for 
managing and looking up indices, which is why some other programming 
languages refer to list-like objects that are indexed by character strings 
as "hashes".  I don't think hashing is used for list names in R, but 
someone please correct me if I'm wrong! )

At Thursday 09:29 AM 11/18/2004, ulas karaoz wrote:
hi all,
I am trying to use named list to hash a bunch of vector by name, for instance:
test = list()
test$name = c(1,2,3)
the problem is that when i try to get the values back by using the name, 
the matching isn't done in an exact way, so
test$na is not NULL.

is there a way around this?
Why by default all.equal.list doesnt require an exact match?
How can I do hashing in R?
thanks.
ulas.

Re: [R] Re: Protocol for answering basic questions

2004-12-04 Thread Tony Plate

Perhaps something like the following paragraph should be added to the start 
of the Posting Guide (as a new paragraph right after the existing first 
paragraph):

Note that R-help is *not* intended for questions that are easily answered 
by consulting one of the FAQs or other introductory material (see Do your 
homework before posting below).  Such questions are actively discouraged 
and are likely to evoke a brusque response.  Questions about seemingly 
simple matters that are mentioned in the FAQs or other introductory 
material *are welcomed* on R-help *when the questioner obviously has done 
their homework and the question is accompanied by an explanation* like FAQ 
7.2.1 seems to be relevant to this but I couldn't understand/apply the 
answer because 

Something like this would make it very clear up front what type of 
questions are not appropriate.  (I'm not trying at all to dictate the 
policy, but as far as I can tell, the above summarizes the attitude of the 
majority of very knowledgeable helpers that respond to questions on R-help.)

Also, I think that John Maindonald's suggestion of an "I am new to R, where 
do I start?" page, with a link from the posting guide, is an excellent idea.

I'm aware that some feel that the posting guide is already too long, but my 
feeling is that if users don't read a very easily accessible posting guide 
AND post inappropriate questions AND become offended by brusque responses, 
then they are beyond where they can easily be helped.  The most important 
thing is to make it very clear what types of questions are and are not 
considered appropriate, so that beginning users know what they are getting 
into.

And the following might merit inclusion in the FAQ:
Why is R-help not for hand-holding beginner questions?
R-help is a high traffic list and the general sentiment is that too many 
very simple questions will overwhelm everyone and most importantly result 
in the knowledgeable helpers ceasing to participate.  The reason that there 
is no R-help-me-quickly-I-dont-want-to-read-the-documentation list is 
that no-one has felt that it would work well -- it is unlikely that many 
knowledgeable users of R would be willing to participate.  Without such 
users participating, it is likely that sometimes bad advice would be 
offered and stand uncorrected, because R is a complex language with many 
ways of doing things, some markedly inferior to others.  For these reasons, 
some feel it would be a very bad idea to create such a list.  (However, 
anyone who believes otherwise and wishes to start and maintain such a list 
or other similar service is free to do so.)  One reason for this overall 
state of affairs is that R is free software and consequently there is no 
revenue stream to support a hand-holding support service with paid 
employees.  So although the actual software is free, some investment in 
terms of time spent reading documentation is required in order to use 
it.  Furthermore, many of the frequent helpers on R-help have written 
introductory documents intended to help beginners with many aspects of 
learning and using R (e.g., An Introduction to R, and the various 
FAQs).  Consequently they sometimes get fed up getting asked again and 
again the same question they have already written a document to 
explain.  Nonetheless, the general sentiment on R-help is very helpful -- a 
quote summarizes it well: It's OK if you need some spoonfeeding (I need 
that quite often myself), but at least show how you have tried to use the 
spoon yourself, instead of just showing us your open mouth.  [Attribution 
to Andy Liaw, or remain anonymous?]

As some feel that sufficient time and bandwidth has already been spent on 
this issue, if anyone has any comments on this particular matter of an 
addition to the posting guide (or FAQ), feel free to choose to respond to 
me privately, and I will summarize as appropriate.

-- Tony Plate

RE: [R] Percentages in contingency tables warning trivial question

2004-12-13 Thread Tony Plate

The 'abind' function in the 'abind' package is a generalized binding 
function for arrays.  (I've never tried it with tables.)

At Monday 04:36 AM 12/13/2004, BXC (Bendix Carstensen) wrote:
[...snip...]
The last step is necessary in the absence of a generalized cbind/rbind
for tables/arrays.
Please correct me if such a thing exists. If it does, it should be
referenced under see also in the help page for cbind.

Re: [R] reading the seed from a simulation

2004-12-17 Thread Tony Plate

With most modern random number generators you can't capture the current 
state in a single 32-bit integer.  (I suspect the .Random.seed you are 
seeing is the state contained in 625 integers).

The easiest way to run reproducible simulations is to explicitly set the 
seed, using an integer, before each run.  Then it's easy to put the random 
number generator into the same state again, e.g.:

for (sim.num in 1:100) {
  set.seed(sim.num)
  ... run simulation ...
}
If you can't do this, you can record the value of .Random.seed prior to the 
simulation, and then when you want to reproduce that simulation again, set 
.Random.seed to that value, e.g.:

> set.seed(1)
> sample(1:100, 5)
[1] 27 37 57 89 20
> sample(1:100, 5)
[1] 90 94 65 62  6
> set.seed(1)
> sample(1:100, 5)
[1] 27 37 57 89 20
> saved.seed <- .Random.seed
> sample(1:100, 5)
[1] 90 94 65 62  6
> .Random.seed <- saved.seed
> sample(1:100, 5)
[1] 90 94 65 62  6

This is not guaranteed to work with all random-number generators; see the 
NOTE section in ?set.seed
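
For tracking down a failing run, the first approach can be combined with tryCatch() so that the loop records which seeds failed. A sketch only: run.one is a made-up stand-in for the real simulation, rigged to fail on some draws:

```r
# Hypothetical simulation step that fails for some random draws
run.one <- function() {
  x <- rnorm(10)
  if (max(x) > 2) stop("simulated failure")
  mean(x)
}

failed <- integer(0)
for (sim.num in 1:20) {
  set.seed(sim.num)                       # one known seed per run
  res <- tryCatch(run.one(), error = function(e) NULL)
  if (is.null(res)) failed <- c(failed, sim.num)
}
failed   # seeds whose runs failed; set.seed(failed[1]) reproduces the first
```

Calling set.seed() with one of the recorded values and rerunning run.one() should then hit the same failure, which makes it possible to step through it with the debugger.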

-- Tony Plate
At Friday 09:50 AM 12/17/2004, Suzette Blanchard wrote:
Greetings,
I have a simulation of a nonlinear model that
is failing.  But it does not fail til way into the simulation.
I would like to look at the run that is failing
and maybe I could if I could capture the seed for the
failing run.  The help file on set.seed says you can do it
but when I tried
rs<-.Random.seed
print(paste("rs",rs,sep=" "))
I got 626 of them so I don't know how to identify the right
one.  Please can you help?
Thank you,
Suzette
=
Suzette Blanchard, Ph.D.
UCSD-PPRU

Re: [R] two dimensional array of object elements

2005-02-11 Thread Tony Plate

Create your original matrix as a list datatype.  When assigning elements, 
be careful with the list structure, as the example indicates.

> m <- 2; n <- 3
> a <- array(list(),c(m,n))
> a[1,2] <- list(b=1,c=2)
Error in "[<-"(`*tmp*`, 1, 2, value = list(b = 1, c = 2)) :
        number of items to replace is not a multiple of replacement length
> a[1,2] <- list(list(b=1,c=2))
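
An alternative that avoids the extra list() wrapper is to use double brackets, which address a single cell. A sketch, building the list matrix with vector("list", ...) (my choice here, not the only way), with a plain list standing in for the object:

```r
m <- 2; n <- 3
a <- matrix(vector("list", m * n), nrow = m, ncol = n)  # m x n matrix of NULLs

a[[1, 2]] <- list(b = 1, c = 2)   # [[ ]] stores the object itself in one cell
a[[1, 2]]$c                       # components are reachable as usual
dim(a)
```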


At Friday 11:36 AM 2/11/2005, Weijie Cai wrote:
Hi list,
I want to create a two (possibly three) dimensional array of objects. 
These objects are classes in object oriented style. I failed by using
a<-array(NA,c(m,n))
for (i in 1:m){
 for (j in 1:n){
   a[i,j]<-My.Obj
 }
}

The elements are still NA. Any suggestions?
Thanks

Re: [R] problem using uniroot with integrate

2005-03-09 Thread Tony Plate

At Wednesday 09:27 AM 3/9/2005, Ken Knoblauch wrote:
Hi,
I'm trying to calculate the value of the variable, dp, below, in the
argument to the integral of dnorm(x-dp) * pnorm(x)^(m-1).  This
corresponds to the estimate of the sensitivity of an observer in an
m-alternative forced choice experiment, given the probability of
a correct response, Pc, a Gaussian assumption for the noise and
no bias.  The function that I wrote below gives me an error:
Error in f(x, ...) : recursive default argument reference
The problem seems to be at the statement using uniroot,
because the function est.dp works fine outside of the main function.
I've been using R for awhile but there are still many nuances
about the scoping and the use of environments that I'm weak on
and would like to understand better.  I would appreciate any
suggestions or solutions that anyone might offer for fixing
my error.  Thank you.
dprime.mAFC <- function(Pc, m) {
    est.dp <- function(dp, Pc = Pc, m = m) {
        pr <- function(x, dpt = dp, m0 = m) {
            dnorm(x - dpt) * pnorm(x)^(m0 - 1)
        }
        Pc - integrate(pr, lower = -Inf, upper = Inf,
                       dpt = dp, m0 = m)$value
    }
    dp.res <- uniroot(est.dp, interval = c(0,5), Pc = Pc, m = m)
    dp.res$root
}

You've got several problems here:
* recursive argument defaults: these are unnecessary but result in the 
particular error message you are seeing (e.g., in the def of est.dp, the 
default value for the argument 'm' is the value of the argument 'm' itself 
-- default values for arguments are interpreted in the frame of the 
function itself)
* the argument m=m you supply to uniroot() is being interpreted as 
specifying the 'maxiter' argument to uniroot()

I think you can fix it by changing the 'm' argument of function est.dp to 
be named 'm0', and specifying 'm0' in the call to uniroot.  (but I can't 
tell for sure because you didn't supply a working example -- when I just 
guess at values to pass in I get numerical errors.)
Also, it would be best to remove the incorrect recursive default arguments 
for the functions est.dp and pr.
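
A sketch of the function with both changes applied (no self-referential defaults, and the inner argument renamed to m0) might look as follows. The test values Pc = 0.76 and m = 2 are assumptions chosen so that the root lies inside the interval c(0, 5); for m = 2 the result can be compared with the closed form sqrt(2) * qnorm(Pc).

```r
dprime.mAFC <- function(Pc, m) {
    # Pc minus the proportion correct predicted for sensitivity dp;
    # the root of this function in dp is the estimate
    est.dp <- function(dp, Pc, m0) {
        pr <- function(x) dnorm(x - dp) * pnorm(x)^(m0 - 1)
        Pc - integrate(pr, lower = -Inf, upper = Inf)$value
    }
    # Pc and m0 are passed through uniroot() to est.dp by name;
    # 'm0' cannot be mistaken for 'maxiter'
    uniroot(est.dp, interval = c(0, 5), Pc = Pc, m0 = m)$root
}

dp <- dprime.mAFC(0.76, 2)
abs(dp - sqrt(2) * qnorm(0.76))    # should be small
```

uniroot() needs the function to change sign over the interval, so values of Pc at or below 1/m will still give an error with this interval.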

-- Tony Plate

Re: [R] how to multiply a constant to a matrix?

2006-05-26 Thread Tony Plate

I still can't see why this is a problem.  If a 1x1 matrix should be 
treated as a scalar, then it can just be wrapped in drop(), and the 
arithmetic will be computed correctly by R.

Are there any cases where this cannot be done?  More specifically, are 
there any matrix algebra expressions where, depending on the particular 
dimensions of the variables used, drop() must be used in some cases, and 
not in other cases?
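
To make the drop() suggestion concrete, here is a small sketch with made-up 2x2 values (A, Fmat and a are assumptions for illustration; Fmat stands in for the F of the original question so that the shorthand F for FALSE is not masked):

```r
A <- matrix(c(2, 1, 1, 3), 2, 2)
Fmat <- diag(2)
a <- c(1, 2)

B <- t(a) %*% Fmat %*% a    # quadratic form: a 1x1 matrix, not a scalar
D <- t(a) %*% A %*% a
dim(B)

# elementwise product of a 1x1 and a 2x2 array is refused
res <- try((B / D) * solve(A), silent = TRUE)
inherits(res, "try-error")

# drop() removes the dimensions, so ordinary scalar arithmetic applies
g <- drop(B / D) * solve(A)
g
```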

A related but different behavior is the default dropping of dimensions with 
extent equal to one by indexing operations.  This can be problematic 
because if one is not careful, incorrect results can be obtained for 
particular values used in the expression.

For example, consider the following, in which we are trying to compute 
the cross product of some columns of x with some rows of y.  If x has n 
rows and y has n columns, then the result should always be an nxn 
matrix.  However, if we are not careful with using drop=F in the 
indexing expressions, we can inadvertently end up with a 1x1 inner 
product matrix result for the case where we just use one column of x and 
one row of y.  The solution to this is to always use drop=F in indexing 
in situations where this can occur.

  x <- matrix(1:9, ncol=3)
  y <- matrix(-(1:9), ncol=3)
  i <- 1:2
  x[,i] %*% y[i,]
     [,1] [,2] [,3]
[1,]   -9  -24  -39
[2,]  -12  -33  -54
[3,]  -15  -42  -69
  i <- 1:3
  x[,i] %*% y[i,]
     [,1] [,2] [,3]
[1,]  -30  -66 -102
[2,]  -36  -81 -126
[3,]  -42  -96 -150
  # i has just one element -- the expression without drop=F
  # no longer computes an outer product
  i <- 2
  x[,i] %*% y[i,]
     [,1]
[1,]  -81
  x[,i,drop=F] %*% y[i,,drop=F]
     [,1] [,2] [,3]
[1,]   -8  -20  -32
[2,]  -10  -25  -40
[3,]  -12  -30  -48
 

Cannot all cases in the situations you mention be handled in an 
analogous manner, by always wrapping appropriate quadratic expressions 
in drop(), or are there some cases where the result of the quadratic 
expression must be treated as a matrix, and other cases where the result 
of the quadratic expression must be treated as a scalar?

-- Tony Plate

Michael wrote:
 imagine when you have complicated matrix algebra computation using R,
 
 you cannot prevent some middle-terms become quadratic and absorbs into one
 scalar, right?
 
 if R cannot intelligently determine this, and you  have to manually add
 drop everywhere,
 
 do you think it is reasonable?
 
 On 5/23/06, Patrick Burns [EMAIL PROTECTED] wrote:
 
I think

drop(B/D) * solve(A)

would be a more transparent approach.

It isn't that R can not do what you want, it is that
it is saving you from shooting yourself in the foot
in your attempt.  What you are doing is not really
a matrix computation.


Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Michael wrote:


This is very strange:

I want to compute the following in R:

g = B/D * solve(A)

where B and D are quadratics so they are just a scalar number, e.g.
B = t(a) %*% F %*% a;

I want to multiply B/D to A^(-1),

but R just does not allow me to do that and it keeps complaining that
nonconformable array, etc.

I tried the following two tricks and they worked:

as.numeric(B/D) * solve(A)

diag(as.numeric(B/D), 5, 5) %*% solve(A)

But if R cannot intelligently do scalar and matrix multiplication, it is
really problematic.

It basically cannot be used to do computations, since in complicated matrix
algebras, you have to distinguish where is scalar, and scalars obtained from
quadratics cannot be directly used to multiply another matrix, etc. It is
going to be a huge mess...

Any thoughts?


Re: [R] max / pmax

2006-05-30 Thread Tony Plate

Here's an example of how I think you can do what you want.  Play with 
the definition of the function highest.use() to get random selection of 
multiple maxima.

  drug.names <- c("marijuana", "crack", "cocaine", "heroin")
  drugs <- factor(drug.names, levels=drug.names)
  drugs
[1] marijuana crack     cocaine   heroin
Levels: marijuana crack cocaine heroin
  as.numeric(drugs)
[1] 1 2 3 4
  N <- 20
  set.seed(1)
  primary.drug <- sample(drugs, N, rep=T)
  primary.drug[sample(1:20, 10)] <- NA
  primary.drug
 [1] <NA>      crack     <NA>      <NA>      <NA>      <NA>      heroin
 [8] cocaine   cocaine   marijuana <NA>      <NA>      cocaine   crack
[15] heroin    <NA>      cocaine   heroin    <NA>      <NA>
Levels: marijuana crack cocaine heroin
  # usage frequencies
  marijuana <- sample(1:3, N, rep=T)
  crack <- sample(1:3, N, rep=T)
  cocaine <- sample(1:3, N, rep=T)
  heroin <- sample(1:3, N, rep=T)
  cbind(marijuana, crack, cocaine, heroin)
      marijuana crack cocaine heroin
 [1,]         2     2       2      1
 [2,]         2     3       3      1
 [3,]         2     2       2      2
 [4,]         1     1       2      3
 [5,]         3     1       2      3
 [6,]         3     1       3      3
 [7,]         3     1       3      2
 [8,]         1     2       2      2
 [9,]         3     2       3      3
[10,]         2     2       3      2
[11,]         3     3       2      2
[12,]         2     1       3      2
[13,]         3     2       2      1
[14,]         2     1       1      3
[15,]         2     2       3      2
[16,]         3     1       1      1
[17,]         1     2       3      1
[18,]         2     3       1      2
[19,]         3     1       1      3
[20,]         3     3       1      2
  highest.use <- function(x) {
      y <- which(x==max(x, na.rm=T))
      if (length(y)==1) return(y) else return(NA)
  }
  apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use)
 [1] NA NA NA  4 NA NA NA NA NA  3 NA  3  1  4  3  1  3  2 NA NA
  impute.primary.drug <- drugs[ifelse(is.na(primary.drug),
      apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use),
      as.numeric(primary.drug))]
  data.frame(primary.drug, impute.primary.drug)
   primary.drug impute.primary.drug
1          <NA>                <NA>
2         crack               crack
3          <NA>                <NA>
4          <NA>              heroin
5          <NA>                <NA>
6          <NA>                <NA>
7        heroin              heroin
8       cocaine             cocaine
9       cocaine             cocaine
10    marijuana           marijuana
11         <NA>                <NA>
12         <NA>             cocaine
13      cocaine             cocaine
14        crack               crack
15       heroin              heroin
16         <NA>           marijuana
17      cocaine             cocaine
18       heroin              heroin
19         <NA>                <NA>
20         <NA>                <NA>
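
For the random selection among tied maxima, one possible variant of highest.use() is sketched below. The name highest.use.random and the small usage matrix are made up for illustration. sample.int() is used on the number of ties rather than sample(y, 1), because sample() applied to a single number k draws from 1:k.

```r
highest.use.random <- function(x) {
    if (all(is.na(x))) return(NA)            # nothing to choose from
    y <- which(x == max(x, na.rm = TRUE))    # positions of the maximum
    if (length(y) == 1) return(y)
    y[sample.int(length(y), 1)]              # pick one tied position at random
}

# three subjects: a three-way tie, a unique maximum, another three-way tie
usage <- cbind(marijuana = c(2, 1, 3), crack = c(2, 1, 1),
               cocaine = c(2, 2, 3), heroin = c(1, 3, 3))
set.seed(1)
picks <- apply(usage, 1, highest.use.random)
picks
```

The result can be used in place of the apply(..., highest.use) call in the ifelse() expression above.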
 


Brian Perron wrote:
 Hello R users,
 
 I am relatively new to R and cannot seem to crack a coding problem.  I 
 am working with substance abuse data, and I have a variable called 
 primary.drug which is considered the drug of choice for each 
 subject.   I have just a few missing values on that variable.  Instead 
 of using a multiple imputation method like chained equations, I would 
 prefer to derive these values from other survey responses.  
 Specifically, I have a frequency of use (in days) for each of the major 
 drugs, so I would like the missing values to be replaced by that drug 
 with the highest level of use.  I am starting with the ifelse and 
 max statements, but I know it is wrong:
 
 impute.primary.drug <- ifelse(is.na(primary.drug), max(marijuana, 
 crack, cocaine, heroin), primary.drug)
 
 Here are the problems.  First, the max statement (should it be pmax?), 
 returns the highest numeric quantity rather than the variable itself.  
 In other words, I want to test which drug has the highest value, but 
 return the variable name rather than the observed value.   Second, if 
 ties are observed, how can I specify the value to be NA?  Or, how can I 
 specify one of the values to be randomly selected?   
 
  Thanks in advance for your assistance.
 
 Regards,
 Brian
 

Re: [R] References verifying accuracy of R for basic statistical calculations and tests

2006-07-13 Thread Tony Plate

This might be a place to start:

http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

Among the references listed there are:

"Assessing the Reliability of Statistical Software: Part I" by B. D. 
McCullough (1998)
http://www.amstat.org/publications/tas/mccull-1.pdf

"Assessing the Reliability of Statistical Software: Part II" by B. D. 
McCullough (1999)
http://www.amstat.org/publications/tas/mccull.pdf

Those might have some relevance

Then, doing within an R session:

  RSiteSearch("Assessing Reliability Statistical Software")

turns up 14 hits, many of them looking relevant

[leaving the "and" and "of" in the query results in the search engine timing 
out - odd?]

-- Tony Plate


Corey Powell wrote:
 Do you know of any references that verify the accuracy of R for basic 
 statistical calculations and tests.  The results of these studies should 
 indicate that R results are the same as the results of other statistical 
 packages to a certain number of decimal places on some benchmark calculations.
 
 Thanks,
 
 Corey Powell
 Clinical Data Analyst
 Broncus Technologies
 [EMAIL PROTECTED]
 

Re: [R] Subset dataframe based on condition

2006-04-17 Thread Tony Plate

Works OK for me:

  x <- data.frame(a=10^(-2:7), b=10^(10:1))
  subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
  subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10
 

Do you get all numeric for the following?

  sapply(x, class)
        a         b
"numeric" "numeric"
 

If not, then your data frame is probably encoding the information in 
some way that you don't want (though if it was as factors, I would have 
expected a warning from the comparison operator).
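
As an illustration of how a non-numeric encoding can mislead subset(), here is a sketch with a made-up data frame in which column a was read in as character. The values and the threshold 2 are assumptions, not the poster's data.

```r
mydf <- data.frame(a = c("0.01", "5", "10"), b = c(1, 2, 3),
                   stringsAsFactors = FALSE)
sapply(mydf, class)             # a is "character", not "numeric"

# the comparison is done on strings, so "10" sorts before "2" and is dropped
wrong <- subset(mydf, a > 2)
wrong

# convert first (for a factor, use as.numeric(as.character(a)) instead)
mydf$a <- as.numeric(mydf$a)
right <- subset(mydf, a > 2)
right
```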

You might get more help by distilling your problem to a simple example 
that can be tried out by others.

-- Tony Plate

Sachin J wrote:
 Hi,

   I am trying to extract subset of data from my original data frame 
 based on some condition. For example : (mydf - original data frame, submydf 
 - subset data frame)

   submydf = subset(mydf, a > 1 & b <= a), 

   here column a contains values ranging from 0.01 to 10. I want to 
 extract only those matching condition 1 i.e. a > 1. But when I execute 
 this command it is not giving me appropriate result. The subset df - 
 submydf  contains rows with 0.01 also. Please help me to resolve this 
 problem.

   Thanks in advance.

   Sachin
 
   

Re: [R] test for end of file on connection

2004-05-11 Thread Tony Plate

With the text of your message copied to the clipboard:

 con <- file("clipboard", "r")
 readLines(con, 1)
[1] "I am looking for a function to test for end-of-file on a connection."
 readLines(con, 1)
[1] "Apparently this question was already asked a couple of years ago and"
 readLines(con, 1)
[1] "then P. Dalgaard suggested to look at help(connections),"
 readLines(con, 1)
[1] "help(readLines). Unfortunately, I couldn't find such a function on those"
 readLines(con, 1)
[1] "pages, maybe I am missing something."
 readLines(con, 1)
character(0)

i.e., readLines() returns a zero length result upon reaching end of 
file.  AFAIK the other file reading functions have similar behavior.  It's 
still worth reading in detail the help for readLines().
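
In a loop, the zero-length result can serve as the end-of-file test. The sketch below uses a textConnection with three made-up lines in place of the clipboard so that it runs anywhere; the counter n is only there to show that every line was seen.

```r
con <- textConnection(c("first line", "second line", "third line"))
n <- 0
repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break    # zero-length result: end of file reached
    n <- n + 1                      # process 'line' here
}
close(con)
n
```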

hope this helps,

Tony Plate

At Tuesday 12:08 AM 5/11/2004, Vadim Ogranovich wrote:
Hi,

I am looking for a function to test for end-of-file on a connection.
Apparently this question was already asked a couple of years ago and
then P. Dalgaard suggested to look at help(connections),
help(readLines). Unfortunately, I couldn't find such a function on those
pages, maybe I am missing something.
Did anyone figure this out?

Thanks,
Vadim



Re: [R] privileged slots,

2004-06-03 Thread Tony Plate

At Tuesday 03:44 AM 6/1/2004, Jari Oksanen wrote:
 [snip]
There are several other things that were fully documented and still were
removed. One of the latest cases was print.coefmat which was abruptly
made Defunct without warning or grace period: code written for 1.8*
didn't work in 1.9.0 and if corrected for 1.9.0 it wouldn't work in
pre-1.9.0. Anything can change in R without warning, and your code may
be broken anytime. Just be prepared.
This is true of many software packages.  In our production environment we 
often (usually) run older versions of software, including statistical 
software, because of bugs or changed behaviors (or fears thereof) in new 
versions.  We usually run the latest versions in our test and 
non-production systems and only upgrade our production systems when two 
conditions are satisfied: (1) we need the features in the upgrade and (2) 
we are comfortable that the upgraded package will run reliably.  From what 
I can see, R is only distinguished from other software packages in these 
regards by the extreme speed with which bug fixes for the latest version 
are made available (in contrast, we're still waiting more than a year for 
fixes for bugs in some commercial software that were described as 
critical bugs by the vendor's support team) and the high level of respect 
accorded to users by the core developers (changes are debated and effects 
on existing software seem to be taken seriously).

One very helpful tool to deal with software updates is automated 
testing.  I highly recommend it.  R comes with a testing framework.
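
As a minimal sketch of what such a test can look like, the following uses only stopifnot() and all.equal() from base R. The function center() is a hypothetical stand-in for your own code. A file of checks like this can be rerun after every upgrade, and files placed in a package's tests directory are run by R CMD check.

```r
# hypothetical function under test
center <- function(x) x - mean(x)

# each condition must be TRUE, otherwise stopifnot() raises an error
stopifnot(
    isTRUE(all.equal(mean(center(c(1, 2, 6))), 0)),
    identical(length(center(1:10)), 10L),
    isTRUE(all.equal(center(c(1, 3)), c(-1, 1)))
)
```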

-- Tony Plate
cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]

Re: [R] Importing binary data

2004-06-03 Thread Tony Plate

Probably the simplest way to improve the speed of your code would be to 
write the data so that all the data in a column is contiguous.  Then you'll 
be able to read each column with a single call to readBin().
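
A sketch of the column-contiguous layout, written and read back from R so that it is self-contained. The file layout (a 4-byte row count followed by each column in turn), the column names, and the use of a temporary file are assumptions for illustration; the Matlab side would have to write the same layout.

```r
n <- 1000
spk <- sample(0:100, n, replace = TRUE)   # small integers, one byte each
score <- runif(n)

f <- tempfile()
con <- file(f, "wb")
writeBin(as.integer(n), con, size = 4)    # header: number of rows
writeBin(as.integer(spk), con, size = 1)  # column 1, stored contiguously
writeBin(score, con, size = 4)            # column 2, stored as 4-byte floats
close(con)

con <- file(f, "rb")
n.in <- readBin(con, integer(), n = 1, size = 4)
# one readBin() call per column instead of one per value
spk.in <- readBin(con, integer(), n = n.in, size = 1, signed = FALSE)
score.in <- readBin(con, numeric(), n = n.in, size = 4)
close(con)
unlink(f)

det.f <- data.frame(spk = factor(spk.in), score = score.in)
```

Growing vectors one element at a time, as in the loop below, is itself slow for millions of entries, so reading whole columns helps on that count as well.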

hope this helps,
Tony Plate
At Tuesday 04:02 AM 6/1/2004, Uli Tuerk wrote:
Hi everybody!
I've a large dataset, about 2 Mio entries of the format which I would like
to import into a frame:
<integer><integer><float><string><float><string><string>
Because of the huge data amount I've chosen a binary format instead
of a text format when exporting from Matlab.
My import function is attached below. It works fine for only some entries
but is deadly slow when trying to read the complete set.
Does anybody have some pointers for me for improving the import or handling
such large data sets?
Thanks in advance!
Uli

read.DET.data <- function ( f ) {
    counter <- 1
    spk.v <- c()
    imp.v <- c()
    score.v <- c()
    th.v <- c()
    ses.v <- c()
    rec.v <- c()
    type.v <- c()
    fid <- file( f ,"rb")
    tempi <- readBin(fid , integer(), size=1, signed=FALSE)
    while ( length(tempi) != 0) {
        spk.v[ counter ] <- tempi
        imp.v[ counter ] <- readBin(fid, integer(), size=1, signed=FALSE)
        score.v[ counter  ] <- readBin(fid, numeric(), size=4)
        type.v[ counter ] <- readBin(fid, character())
        th.v[ counter ] <- readBin(fid, numeric(), size=4)
        ses.v[ counter ] <- readBin(fid, character())
        rec.v[ counter ] <- readBin(fid, character())
        counter <- counter + 1
        tempi <- readBin(fid, integer(), size=1, signed=FALSE)
    }
    close( fid )
    spkf <- factor ( spk.v )
    impf <- factor ( imp.v )

    det.f <- data.frame( spk=spkf, imp=impf, score=score.v, th=th.v,
                         ses=ses.v, rec=rec.v, type=type.v)

    det.f
}

Re: [R] How to Describe R to Finance People

2004-06-08 Thread Tony Plate

At Monday 07:58 PM 6/7/2004, Richard A. O'Keefe wrote:
[snip]
There are three perspectives on programming languages like the S/R family:
(1) The programming language perspective.
I am sorry to tell you that the only excuse for R is S.
R is *weird*.  It combines error-prone C-like syntax with data structures
that are APL-like but not *sufficiently* APL-like to have behaviour that
is easy to reason about.  The scope rules (certainly the scope rules for
S) were obviously designed by someone who had a fanatical hatred of
compilers and wanted to ensure that the language could never be usefully
compiled.
What in particular about the scope rules for S makes it tough for 
compilers?  The scope for ordinary variables seems pretty straightforward 
-- either local or in one of several global locations.  (Or are you 
referring to the feature of the get() function that it can access variables 
in any frame?)


  Thanks to 'with' the R scope rules are little better.  The
fact that (object)$name returns NULL instead of reporting an error when
the object doesn't _have_ a $name property means that errors can be
delayed to the point where debugging is harder than it needs to be.
Yup, that's why I proposed (and provided an implementation) of an 
alternative $$ operator that did report an error when object$$name didn't 
have a name component (and also didn't allow abbreviation), but there was 
no interest shown in incorporating this into R.
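
The behaviour under discussion, and the kind of strict access meant here, can be sketched with an ordinary function. This is not the $$ implementation referred to above; strict.get() is a hypothetical helper written only for illustration.

```r
obj <- list(value = 1)

obj$nme     # misspelled name: silently NULL, no error
obj$val     # abbreviation: partially matches 'value'

# hypothetical strict accessor: exact names only, error if missing
strict.get <- function(object, name) {
    if (!(name %in% names(object)))
        stop("no component named '", name, "'")
    object[[name]]
}

strict.get(obj, "value")
res <- try(strict.get(obj, "val"), silent = TRUE)   # now an error
inherits(res, "try-error")
```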

-- Tony Plate

Re: [R] direct data frame entry

2004-06-09 Thread Tony Plate

easy to do it by column:
 d <- data.frame(name=c("obs1name","obs2name","obs3name"),
                 val1=c(0.2,0.4,0.6), val2=c(0.3,1.0,2.0),
                 row.names=c("r1","r2","r3"))
 d
       name val1 val2
r1 obs1name  0.2  0.3
r2 obs2name  0.4  1.0
r3 obs3name  0.6  2.0


(when you do it by row, you get the numbers as factors because 
c("obs1name", 0.2, 0.3) etc. are character vectors)
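
If row-wise entry is preferred, one option in more recent versions of R is the text= argument of read.table(), which parses an inline string as if it were a file. This is a sketch, and the argument may not exist in the R version discussed in this thread.

```r
d <- read.table(text = "
name     val1 val2
obs1name 0.2  0.3
obs2name 0.4  1.0
obs3name 0.6  2.0
", header = TRUE, stringsAsFactors = FALSE)

sapply(d, class)    # name is character, val1 and val2 are numeric
```

Each column gets its own type because read.table() converts column by column, unlike c("obs1name", 0.2, 0.3), which coerces a whole row to character.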

At Wednesday 01:29 PM 6/9/2004, ivo welch wrote:
hi:  I searched the last 2 hours for a way to enter a data frame directly 
in my program.  (I know how to read from a file.)  that is, I would like 
to say something like

   d <- this.is.a.data.frame(   c("obs1name", 0.2, 0.3),
                                c("obs2name", 0.4, 1.0),
                                c("obs3name", 0.6, 2.0),
                                varnames=c("name", "val1", "val2")  );

everything I have tried sofar (usually, building with rbind and then 
names(d)) has come out with factors for the numbers, which is obviously 
not what I want.   this must be a pretty elementary request, so it should 
probably be an example under data.frame (or read.table).  of course, it is 
probably somewhere---just I have do not remember it and could not find it 
after 2 hours of searching.  I also tried the r-help archives---at the 
very least, I hope we will get the answer there for future lookups.

regards, /iaw

Re: [R] a scope problem

2004-06-10 Thread Tony Plate

This looks like it probably is a scope problem with non-standard evaluation 
rules for the argument subset= of nnet.

Instead of subset=sub[-i], try data=dftc[-i,]  (I've not tested this since 
I don't have the data objects you used.)
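
The pattern of subsetting the data before the call, rather than through subset=, can be sketched with lm() on made-up data. lm() is used here only so the example runs without the poster's objects; it does not reproduce the nnet error itself.

```r
df <- data.frame(x = 1:10,
                 y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2))
pt <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
    # leave row i out by indexing the data frame directly,
    # so the fitting function never has to look up 'i'
    fit <- lm(y ~ x, data = df[-i, ])
    pt[i] <- predict(fit, newdata = df[i, ])
}
round(pt - df$y, 2)    # leave-one-out prediction errors
```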

hope this helps,
Tony Plate
At Thursday 04:38 PM 6/10/2004, you wrote:
Hi,
 I have some code that looks like:
dftc <- df[sets$tcset,]
pt <- numeric(nrow(dftc))
sub <- 1:nrow(dftc)
for (i in 1:nrow(dftc)) {
    n <- nnet( fmla, data=dftc, weights=wts, subset=sub[-i], size=4,
               decay=0.01)
    pt[i] <- predict( n, dftc[ i, ], type='class' )
}
However running this gives me the error:
Error in eval(expr, envir, enclos) : Object "i" not found
I have noted this problem in some other instances. For example if I
define a function
f <- function( dat, sets ) {
  # use sets
}
I sometimes get an error similar to that above.
Does anybody know why this would happen?
(R 1.9.0 on Fedora Core 2)
---
Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
All laws are simulations of reality.
-- John C. Lilly

Re: [R] Elementary sapply question

2004-06-21 Thread Tony Plate

At Monday 12:57 PM 6/21/2004, Ajay Shah wrote:
[...snip...]
I am aware of the ... in sapply(). I am unable to understand how
sapply will know where to utilise the x[i] values: as the 1st arg or
the 2nd arg for f(x, y)?
That is, when I say:
 sapply(x, f, 3)
how does sapply know that I mean:
for (i in 3:5) {
f(i, 3)
}
and not
for (i in 3:5) {
f(3, i)
}
How would we force sapply to use one or the other interpretation?
All the functions in the apply() family construct the call by just 
appending the additional arguments after the first.  If you supply argument 
names for the additional arguments, those will be supplied to the function 
called.  This can be used to force different interpretations of 
arguments.  E.g:

 sapply(3:5, function(x, y) {return(y)}, 1)
[1] 1 1 1
 sapply(3:5, function(x, y) {return(y)}, y=1)
[1] 1 1 1
 sapply(3:5, function(x, y) {return(y)}, x=1)
[1] 3 4 5
 sapply(3:5, function(x, y) {return(y)}, z=1)
Error in FUN(X[[as.integer(1)]], ...) : unused argument(s) (z ...)

In the third example, the actual set of arguments in the call to the 
anonymous function is something like (3, x=1), so the standard argument 
interpretation rules result in the arguments having the values y=3, x=1.
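
The same point with a function whose result depends on argument order (f is a made-up example), including the explicit wrapper as a third way to force an interpretation:

```r
f <- function(x, y) x - y

r1 <- sapply(3:5, f, 3)                   # f(3,3), f(4,3), f(5,3)
r2 <- sapply(3:5, f, x = 3)               # f(x=3, y=3), f(x=3, y=4), f(x=3, y=5)
r3 <- sapply(3:5, function(i) f(3, i))    # same as r2, written out explicitly

r1    # 0 1 2
r2    # 0 -1 -2
```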

hope this helps,
Tony Plate


Thanks,
-ans.
--
Ajay Shah   Consultant
[EMAIL PROTECTED]  Department of Economic Affairs
http://www.mayin.org/ajayshah   Ministry of Finance, New Delhi

Re: [R] Re: summaries (was: SUMMARY: elementary sapply question)

2004-06-24 Thread Tony Plate

Posting summaries is customary (or used to be) on S-news, where it was 
customary to reply to the poster, and not always the whole list.  (Whereas 
on R it is requested that replies be posted to the entire list, which makes 
summaries less necessary.)

However, a good summary can be a very useful thing (and Ajay's summary was 
very nicely done).  What about making it the custom on R-list that the 
recipient of helpful responses post a summary on a Wiki?  This could be a 
good way for recipients of help to give something back to the community, 
and it might provide a sufficient input of energy to take a wiki past 
critical mass, such as the one mentioned by Gabor Grothendieck last year:

 From: Gabor Grothendieck [EMAIL PROTECTED]
 Date: Wed, 17 Dec 2003 11:53:59 -0500 (EST)
 [snip]  Actually someone did set up an R wiki some time ago at:

  http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?RwikiHome

 yet no one really used it.  Some critical mass of use is needed
 to get such a project off the ground.
Comments?
-- Tony Plate

Re: [R] naive question

2004-06-30 Thread Tony Plate

As far as I know, read.table() in S-plus performs similarly to read.table() 
in R with respect to speed.  So, I wouldn't put high hopes in finding much 
satisfaction there.

I do frequently read large tables in S-plus, and with a considerable amount 
of work was able to speed things up significantly, mainly by using scan() 
with appropriate arguments.  It's possible that some of the add-on modules 
for S-plus (e.g., the data-mining module) have faster I/O, but I haven't 
investigated those.  I get the best read performance out of S-plus by using 
a homegrown binary file format with each column stored in a contiguous 
block of memory and meta data (i.e., column types and dimensions) stored at 
the start of the file.  The S-plus read function reads the columns one at a 
time using readRaw(). One would be able to do something similar in R.  If 
you have to read from a text file, then, as others have suggested, writing 
a C program wouldn't be that hard, as long as you make the format inflexible.
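
As a rough illustration of "scan() with appropriate arguments", the sketch below shows the same idea in R: declaring the column types up front through what= so that no type guessing is needed. The three-column file and the names id, code and value are made up; this is not the S-plus code referred to above.

```r
f <- tempfile()
writeLines(c("1 a 0.5", "2 b 1.5", "3 c 2.5"), f)

# what= gives one template per column; scan() fills them in a single pass
cols <- scan(f, what = list(id = integer(), code = character(),
                            value = numeric()), quiet = TRUE)
d <- as.data.frame(cols, stringsAsFactors = FALSE)
unlink(f)
d
```

With read.table(), supplying colClasses (and nrows, if known) serves a similar purpose.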

-- Tony Plate
At Tuesday 06:19 PM 6/29/2004, Igor Rivin wrote:
I was not particularly annoyed, just disappointed, since R seems like
a much better thing than SAS in general, and doing everything with a
combination of hand-rolled tools is too much work. However, I do need to
work with very large data sets, and if it takes 20 minutes to read them in,
I have to explore other options (one of which might be S-PLUS, which claims
scalability as a major, er, PLUS over R).


Re: [R] naive question

2004-06-30 Thread Tony Plate

To be careful, there's lots more to I/O than the functions read.table() & 
scan() -- I was only commenting on those, and no inference should be made 
about other aspects of S-plus I/O based on those comments!

I suspect that what has happened is that memory, CPU speed, and I/O speed 
have evolved at different rates, so what used to be acceptable code in 
read.table() (in both R and S-plus) is now showing its limitations and has 
reached the point where it can take half an hour to read in, on a 
readily-available computer, the largest data table that can be comfortably 
handled.  I'm speculating, but 10 years ago,  on a readily available 
computer, did it take half an hour to read in the largest data table that 
could be comfortably handled in S-plus or R?  People who encounter this now 
are surprised and disappointed, and IMHO, somewhat justifiably so.  The 
fact that R is an open source volunteer project suggests that the time is 
ripe for one of those disappointed people to fix the matter and contribute 
the function read.table.fast()!

-- Tony Plate
At Wednesday 10:08 AM 6/30/2004, Igor Rivin wrote:
Thank you! It's interesting about S-Plus, since they apparently try to support
work with much larger data sets by writing everything out to disk (thus
getting around the, eg, address space limitations, I guess), so it is a little
surprising that they did not tweak the I/O more...

Thanks again,
Igor

Re: [R] Can R read data from stdin?

2004-07-09 Thread Tony Plate

The easiest way would probably be to do the hack of creating a temporary 
file to hold stdin, then call R to process that file.  That would be easy 
to do in a shell script.
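
On the R side, the processing step can be written against a connection, so the same function works on the temporary file. The function name freq.report is made up, and the textConnection stands in for the real input. In later versions of R that provide Rscript, file("stdin") can be passed instead to read standard input directly; that is an assumption about newer versions, not about the R of this thread.

```r
# frequency report of the lines read from any connection or file name
freq.report <- function(con) {
    x <- readLines(con)
    table(x)
}

# stand-in input; with the temporary-file approach, pass the file name instead
con <- textConnection(c("a", "b", "a"))
tab <- freq.report(con)
close(con)
print(tab)
```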

If this really won't suffice, this older message might lead to something 
useful:

[Rd] R scripting patches for R-1.8.0
Neil McKay mckay at repsac.gmr.com
Thu Oct 16 20:30:20 MEST 2003

I've updated my scripting patches to R-1.8.0. These patches
allow you to write shell scripts in R (at least on *nix systems)
by putting
#!/path/to/R.bin --script
on the first line of the script file. If you're interested
in the patches, e-mail me at
mckay at gmr.com
--
Neil D. McKay, Mail Code 480-106-359    Phone: (586)986-1470 (GM:8-226-1470)
Manufacturing Systems Research Lab      FAX:   (586)986-0574 (GM:8-226-0574)
GM Research & Development Center        Internet e-mail: mckay at gmr.com
30500 Mound Road
Warren, Mich. 48090

At Friday 02:17 PM 7/9/2004, Hayashi Soichi - shayas wrote:
Is there any way I can write a script which feeds an input data source from stdin,
lets R process it (maybe a frequency report), and then outputs the report to
stdout?

I can't seem to find much info on documentation or FAQ on this topic.

Thanks!
Soichi Hayashi


Re: [R] Regular Expressions

2004-07-12 Thread Tony Plate

I'd suggest doing it with multiple regular expressions -- you could 
construct a single regular expression for this, but I expect it would get 
quite complicated and possibly very slow.

The expression for y in the example below tabulates how many words 
matched for each line (i.e., line 2 matched 1 word, line 3 matched 3 words, 
and line 4 matched 2 words).

> x <- readLines("clipboard", -1)
> x
[1] "Is there a way to use regular expressions to capture two or more words in a"
[2] "sentence?  For example, I wish to to find all the lines that have the words \"thomas\","
[3] "\"perl\", and \"program\", such as \"thomas uses a program called perl\", or \"perl is a"
[4] "program that thomas uses\", etc."
> sapply(c("perl","program","thomas"), function(re) grep(re, x))
$perl
[1] 3

$program
[1] 3 4

$thomas
[1] 2 3 4

> unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F)
[1] 3 3 4 2 3 4
> y <- table(unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F))
> y

2 3 4
1 3 2
> which(y>=2)
3 4
2 3
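
The same idea can be written as a self-contained sketch that does not depend on the clipboard. It assumes grepl() is available (it was added to R after this message was written), and the sample lines and the variable names are made up for illustration.

```r
# Made-up input lines; only lines 2 and 3 contain all three words.
lines <- c("thomas wrote a note",
           "thomas uses a program called perl",
           "perl is a program that thomas uses",
           "no match here")
words <- c("perl", "program", "thomas")

# One logical column per word, one row per line.
hits <- sapply(words, function(re) grepl(re, lines))

# Number of distinct words matched on each line.
n_matched <- rowSums(hits)

# Lines on which every word matched.
all_three <- which(n_matched == length(words))

print(n_matched)
print(all_three)
```

Keeping one simple pattern per word, as suggested above, means the order of the words within a line does not matter.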

hope this helps,
Tony Plate
At Monday 05:59 PM 7/12/2004, Sangick Jeon wrote:

Hi,
Is there a way to use regular expressions to capture two or more words in a
sentence?  For example, I wish to to find all the lines that have the words
"thomas", "perl", and "program", such as "thomas uses a program called perl",
or "perl is a program that thomas uses", etc.

I'm sure this is a very easy task, I would greatly appreciate any 
help.  Thanks!

Sangick

Re: [R] Stumped with subsetting

2004-07-29 Thread Tony Plate

Seems to work fine for me if I understand correctly what you're trying to 
do (there are some typos in your message, which may mean I'm not 
understanding):

> data <- data.frame(x=1:3,y=4:6,z=7:9)
> data[c("x","y")]
  x y
1 1 4
2 2 5
3 3 6
> mylist <- c("x","y")
> data[mylist]
  x y
1 1 4
2 2 5
3 3 6
> data[,mylist]
  x y
1 1 4
2 2 5
3 3 6

I'd generally use the second form of subsetting above (i.e., data[,mylist]), 
because that will work with matrices as well.
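
A small sketch of that point, with made-up object names (df, m, cols): the comma form selects the same columns from a data frame and from a matrix. The drop = FALSE argument is mentioned as an extra detail not raised above; it keeps a single selected column as a data frame rather than a plain vector.

```r
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
m <- as.matrix(df)
cols <- c("x", "y")

# The comma form works for both kinds of object.
df_sub <- df[, cols]
m_sub <- m[, cols]

# With a single column, the comma form returns a vector unless drop = FALSE.
one_col_vec <- df[, "x"]
one_col_df <- df[, "x", drop = FALSE]

print(df_sub)
print(m_sub)
```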

hope this helps,
Tony Plate
At Thursday 01:22 PM 7/29/2004, Peter Wilkinson wrote:
This seems like such a trivial thing to do:
given a data.frame DF and variables w,v, x,y,z I can do
DF[x] or DF[c(x,y)]
if I create a vector, mylist = c(x,y)
then I do DF[mylist]
I am not getting x and y, I get something else.
what is the correct way to subset a data.frame by columns using a vector, 
as if I were doing DF[x,y]?

Peter

Re: [R] lapply drops colnames

2004-08-02 Thread Tony Plate

If you were preferring to use lapply() rather than for() for reasons of 
efficiency, you might want to test whether there actually is any 
difference.  In a little test case, involving a data frame with 10,000 
columns, I see no big difference.  The advantage of a for loop in your 
situation is that it makes it easy to get at the column names.

> x <- data.frame(sapply(1:10000, FUN=rnorm, n=100))
> system.time(x1 <- unlist(lapply(x, sum)))
[1] 0.31 0.01 0.33   NA   NA
> system.time({x2 <- numeric(ncol(x)); for (i in seq(len=ncol(x))) x2[i] <- sum(x[[i]])})
[1] 0.27 0.00 0.27   NA   NA
> all.equal(x1, x2)
[1] TRUE
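
To illustrate the point about column names, here is a minimal sketch with a made-up data frame. It shows a for loop over names(df), and, as one possible alternative, an sapply() over the names rather than over the columns, which also keeps each name available inside the function.

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# for loop: the column name is directly available as nm.
sums <- numeric(ncol(df))
names(sums) <- names(df)
for (nm in names(df)) sums[nm] <- sum(df[[nm]])

# Iterating over the names instead of the columns.
labels <- sapply(names(df), function(nm) paste(nm, "sums to", sum(df[[nm]])))

print(sums)
print(labels)
```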


hope this helps,
Tony Plate
At Monday 04:35 PM 8/2/2004, Jack Tanner wrote:
Wolski wrote:
What you can do is to extend the column (list) by an additional 
attribute  attr(mydataframe[i],"info")<-names(mydataframe)[i] and store 
their names in it.
OK, that's brilliant. Any ideas on how to do this automatically for every 
column in my dataframe? lapply(dataframe... fails for the obvious reason. 
Should I do something like this, or is for() to be avoided even in this case?

 for(i in 1:length(a)) {print(names(a)[i])}

Re: [R] How to import specific column(s) using read.table?

2004-08-10 Thread Tony Plate

At Tuesday 01:55 PM 8/10/2004, F Duan wrote:
Thanks a lot.
Your way works perfect. And one more tiny question related to your codes:
My data file has many columns to be omitted (suppose the first 20 ones), but I
found scan(myfile, what=list(rep(NULL, 20), rep(0, 5))) doesn't work. I had to
type NULL 20 times and 0 five times in the list(...).

That's because rep(NULL, 20) returns a single NULL -- it's not obvious what 
else it could sensibly return.  What you need to do is replicate 20 times a 
list containing NULL (and a list containing NULL is quite a different 
object to NULL).  E.g.:

> rep(NULL, 20)
NULL
> c(rep(list(NULL), 3), rep(list(0), 2))
[[1]]:
NULL

[[2]]:
NULL

[[3]]:
NULL

[[4]]:
[1] 0

[[5]]:
[1] 0
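
Putting this together with scan(), a hedged sketch: the inline text below stands in for the real file, the text= argument of scan() is assumed to be available (it exists in current R), and the variable names are made up. The first two fields of each record are skipped and the last two are read as numbers.

```r
# Skip 2 fields, then read 2 numeric fields per record.
what <- c(rep(list(NULL), 2), rep(list(0), 2))

# Inline stand-in for the data file.
txt <- c("1 2 3 4", "5 6 7 8", "9 10 11 12")
cols <- scan(text = txt, what = what, quiet = TRUE)

# Keep only the components that were actually read.
kept <- cols[!sapply(cols, is.null)]
print(kept)
```

For the case in the question, the counts 2 and 2 would become 20 and 5.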

Tony Plate

But anyway, it works and saves a lot of memory for me. Thank you again.
Frank
Quoting Gabor Grothendieck [EMAIL PROTECTED]:
 Gabor Grothendieck ggrothendieck at myway.com writes:

 :
 : F Duan f.duan at yale.edu writes:
 :
 :  I have a very big tab-delim txt file with header and I only want to import
 :  several columns into R. I checked the options for read.table and only

 :
 : Try using scan with the what=list(...) and flush=TRUE arguments.
 : For example, if your data looks like this:
 :
 : 1 2 3 4
 : 5 6 7 8
 : 9 10 11 12
 : 13 14 15 16
 :
 : then you could read columns 2 and 4 into a list with:
 :

 oops. That should be 1 and 3.

 :scan(myfile, what = list(0, NULL, 0), flush = TRUE)
 :
 : or read in and convert to a data frame via:
 :
 :do.call(cbind, scan(myfile, what = list(0, NULL, 0), flush = TRUE))


RE: [R] numerical accuracy, dumb question

2004-08-14 Thread Tony Plate

At Friday 08:41 PM 8/13/2004, Marc Schwartz wrote:
Part of that decision may depend upon how big the dataset is and what is
intended to be done with the ID's:
> object.size(1011001001001)
[1] 36
> object.size("1011001001001")
[1] 52
> object.size(factor("1011001001001"))
[1] 244
They will by default, as Andy indicates, be read and stored as doubles.
They are too large for integers, at least on my system:
 .Machine$integer.max
[1] 2147483647
Converting to a character might make sense, with only a minimal memory
penalty. However, using a factor results in a notable memory penalty, if
the attributes of a factor are not needed.
That depends on how long the vectors are.  The memory overhead for factors 
is per vector, with only 4 bytes used for each additional element (if the 
level already appears).  The memory overhead for character data is per 
element -- there is no amortization for repeated values.

> object.size(factor("1011001001001"))
[1] 244
> object.size(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),1)))
[1] 308
> # bytes per element in factor, for length 4:
> object.size(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),1)))/4
[1] 77
> # bytes per element in factor, for length 1000:
> object.size(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),250)))/1000
[1] 4.292
> # bytes per element in character data, for length 1000:
> object.size(as.character(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),250))))/1000
[1] 20.028


So, for long vectors with relatively few different values, storage as 
factors is far more memory efficient (this is because the character data is 
stored only once per level, and each element is stored as a 4-byte 
integer).  (The above was done on Windows 2000).
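
The comparison can be rerun with a short script like the sketch below. The numbers it prints are not shown here because they depend on the R version and platform, and newer versions of R handle repeated strings differently from the version used above, so the character figure in particular may differ. The variable names are made up.

```r
ids <- rep(c("1011001001001", "111001001001",
             "001001001001", "011001001001"), 250)
f <- factor(ids)

# Approximate bytes per element for each representation.
bytes_factor <- as.numeric(object.size(f)) / length(f)
bytes_char <- as.numeric(object.size(ids)) / length(ids)

cat("factor:", bytes_factor, " character:", bytes_char, "\n")
```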

-- Tony Plate
If any mathematical operations are to be performed with the ID's then
leaving them as doubles makes most sense.
Dan, more information on the numerical characteristics of your system
can be found by using:
.Machine
See ?.Machine and ?object.size for more information.
HTH,
Marc Schwartz
On Fri, 2004-08-13 at 21:02, Liaw, Andy wrote:
 If I'm not mistaken, numerics are read in as doubles, so that shouldn't be a
 problem.  However, I'd try using factor or character.

 Andy

  From: Dan Bolser
 
  I store an id as a big number, could this be a problem?
 
  Should I convert to at string when I use read.table(...
 
  example id's
 
  1001001001001
  1001001001002
  ...
  1002001002005
 
 
  Bigest is probably
 
  1011001001001
 
  Ta,
  Dan.
 


Re: [R] Suggestion for posting guide

2004-08-20 Thread Tony Plate

When I originally compiled the posting guide many people felt that it 
should be kept as concise as possible, so that its length would not 
discourage people from reading it.  (It probably ended up too long 
anyway.)  So I wouldn't really recommend adding a section of this length 
to it.

That said, a question posted with a good example that can be cut and pasted 
directly into R is far easier to respond to, so it does seem like a good 
idea to help people create such things.  If someone (Gabor?) wanted to 
create a page on how to provide good examples in posts, the people who 
control what gets put on the R-project site might be willing to put it up 
there, and a link to it from the posting guide would seem like a good idea.

Tony Plate
At Thursday 07:17 AM 8/19/2004, Gabor Grothendieck wrote:
I have a suggestion for the posting guide.  One problem with some posts is that
they do not provide an example that can be reproduced.   I think that many
people just do not know how to easily specify some data and some technical
assistance should be provided in the posting guide.  If the problem
depends on specific data they should be made aware, in the posting guide, of:

   dput(x)
since that outputs object x as R code which can then be easily copied from the
post and pasted into a session.  If it's not dependent on particular data they
can generate patterned or random data IF THEY KNOW HOW but many might find it
easier to just use one of the included datasets so some guidance should be
provided on the contents of a few of them, e.g.
R comes with built in data sets.  data() will list them, data(iris) will
attach data set iris and ?iris, str(iris), summary(iris), head(iris)
and dput(iris) will give more information on iris (after attaching it).
The following are a few of the datasets that come with R:
   iris - data frame with 4 numeric columns and one 3 level factor
   nhtemp - a ts class time series
   faithful - data frame with two numeric columns
   warpbreaks - data with a numeric column, a 2-level factor & a 3-level factor

Also letters, LETTERS, month.abb and month.name are built in character vectors
that do not require a data statement to access.
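
A minimal sketch of the dput() round trip described above, using a made-up data frame. Here deparse() produces the same text that dput(x) would print, and evaluating that text stands in for a reader pasting it into their own session.

```r
x <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)

# The R code that dput(x) would print, collected as a single string.
code <- paste(deparse(x), collapse = "\n")
cat(code, "\n")

# What a reader gets after pasting that code into a session.
y <- eval(parse(text = code))
print(all.equal(x, y))
```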

Re: [R] Loss of rownames and colnames

2004-08-20 Thread Tony Plate

At Friday 11:46 AM 8/20/2004, Min-Han Tan wrote:
Hi,
I am working on some microarray data, and have some problems with
writing iterations.
In essence, the problem is that objects with three dimensions don't
have rownames and colnames. These colnames and rownames would
otherwise still be there in 2 dimensional objects.
I need to generate multiple iterations of a 2 means-clustering
algorithm, and these objects thus probably need 3 dimensions.
What objects are you using that are three dimensional but don't have 
dimension names?  Ordinary arrays have dimension names, and rownames() and 
colnames() extract the names on the first two dimensions:

> x <- array(1:12,dim=c(2,3,2),dimnames=list(letters[1:2],LETTERS[24:26],letters[20:21]))
> x
, , t

  X Y Z
a 1 3 5
b 2 4 6

, , u

  X  Y  Z
a 7  9 11
b 8 10 12

> dimnames(x)[[1]]
[1] "a" "b"
> dimnames(x)[[2]]
[1] "X" "Y" "Z"
> x[,"Y",]
  t  u
a 3  9
b 4 10
> rownames(x)
[1] "a" "b"
> colnames(x)
[1] "X" "Y" "Z"

If you need a convenient way to construct three dimensional objects, you 
can use the abind() package, e.g.:

> library(abind) # you will have to install the package from CRAN first
> x1 <- matrix(1:6,nrow=2,dimnames=list(letters[1:2],LETTERS[24:26]))
> x2 <- matrix(7:12,nrow=2,dimnames=list(letters[1:2],LETTERS[24:26]))
> abind(list(t=x1, u=x2), along=3)
, , t

  X Y Z
a 1 3 5
b 2 4 6

, , u

  X  Y  Z
a 7  9 11
b 8 10 12


(The objects to be bound don't have to be given to abind() in a list, but 
this manner of invocation is convenient when one happens to have a list of 
objects to be bound together, as one might get in the result from lapply().)
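
If installing a package is not convenient, roughly the same result can be obtained with base R alone. This is only a sketch, assuming all the matrices in the list share the same dimensions and dimnames; the names mats and arr are made up.

```r
mats <- list(
  t = matrix(1:6,  nrow = 2, dimnames = list(letters[1:2], LETTERS[24:26])),
  u = matrix(7:12, nrow = 2, dimnames = list(letters[1:2], LETTERS[24:26]))
)

# Stack the matrices along a third dimension, keeping all dimension names.
arr <- array(unlist(mats),
             dim = c(dim(mats[[1]]), length(mats)),
             dimnames = c(dimnames(mats[[1]]), list(names(mats))))

print(rownames(arr))
print(colnames(arr))
print(arr["a", "Y", "u"])
```

Each iteration of a clustering run could then be one slice along the third dimension, still addressable by gene and sample names.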


My scripts are all written with heavy references to matching of
colnames and rownames, so I am running into some problems here.
(colnames = sample ids, and rownames = gene ids)
Last time I looked, subscripting matrices and arrays with strings was very 
slow (for large objects), so if you are using character subscripts and 
you're having problems with slowness, consider doing the indexing yourself 
using match(), e.g.:

> x <- matrix(rnorm(26^4), ncol=26, dimnames=list(paste(rep(letters,each=26^2),rep(letters,each=26),letters,sep=""), LETTERS))
> dim(x)
[1] 17576    26
> xr <- sample(rownames(x), 1)
> length(xr)
[1] 1
> system.time(y <- x[xr, ])
[1] 2.22 0.00 2.30   NA   NA
> system.time(y <- x[match(xr, rownames(x)), ])
[1] 0.09 0.00 0.09   NA   NA
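
A small sketch showing that the two forms of indexing give the same result, with made-up gene and sample ids. One difference worth knowing: match() returns NA for a name that is not present, so it can be worth checking the positions before using them.

```r
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
wanted <- c("g3", "g1")

# Character subscripts.
by_name <- m[wanted, ]

# Positions looked up once with match(), then integer subscripts.
idx <- match(wanted, rownames(m))
if (any(is.na(idx))) stop("some requested row names were not found")
by_pos <- m[idx, ]

print(identical(by_name, by_pos))
```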


HTH
-- Tony Plate
My bad workaround solution so far has been to generate objects tagged
with .2, and have multiple blocks of code.
e.g.
test.1 <- ...
test.2 <- ...
test.x <- ..
The obvious problem with this solution is that there does not seem to
be an easy way of manipulating all these objects together without
typing out their names individually.
Thanks for any advice.
Regards,
Min-Han

Re: [R] apply ( , , table)

2004-08-24 Thread Tony Plate

apply() tries to be a bit smart about what it does (sometimes maybe too 
smart), but it actually is pretty useful a lot of the time.  It's extremely 
widely used, so changing the behavior is not an option -- changing the 
behavior would break a lot of existing code.  (Personally, I'd prefer it if 
apply() put its dimensions back together in a slightly more intelligent 
way, i.e., if apply(x, 1, c) and apply(x, 2, c) returned the same thing, 
but apply is how it is.)

In situations where you don't want apply() to try to construct a matrix 
from your results, you can wrap the results in a list, to force apply() to 
return just a list of results, e.g. (the outer lapply() strips off an 
unnecessary level of list depth):

> b2 <- lapply(apply (a, 1, function(x) list(table(x))), "[[", 1)
> length(b2)
[1] 4
> b2[[1]]
x
1 2 6 7
2 1 1 1
> attributes(b2[[1]])
$dim
[1] 4

$dimnames
$dimnames$x
[1] "1" "2" "6" "7"


$class
[1] "table"

Your particular case might benefit from more information given to table, 
which allows it to provide results in a more uniform format, e.g.:

> b1 <- apply (a, 1, function(x) table(factor(x, levels=0:9)))
> b1
  [,1] [,2] [,3] [,4]
0    0    1    0    0
1    2    1    1    2
2    1    0    0    1
3    0    1    0    0
4    0    2    2    0
5    0    0    1    1
6    1    0    0    1
7    1    0    0    0
8    0    0    1    0
9    0    0    0    0
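
Another option, sketched here as an alternative to the two approaches above rather than a replacement, is to avoid apply() altogether and loop over the row indices with lapply(), which always returns a list. The matrix is the one from the question; the names counts and per_row are made up.

```r
a <- matrix(c(7, 1, 1, 2, 6,
              3, 4, 0, 1, 4,
              5, 1, 8, 4, 4,
              6, 1, 1, 2, 5), nrow = 4, byrow = TRUE)

# Uniform format: one column per row of a, one row per possible value 0..9.
counts <- apply(a, 1, function(x) table(factor(x, levels = 0:9)))

# List format: one table per row of a, no simplification attempted.
per_row <- lapply(seq_len(nrow(a)), function(i) table(a[i, ]))

print(counts["1", ])
print(per_row[[1]])
```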

hope this helps,
Tony Plate
At Tuesday 10:42 AM 8/24/2004, [EMAIL PROTECTED] wrote:


a <- matrix (c(
7, 1, 1, 2, 6,
3, 4, 0, 1, 4,
5, 1, 8, 4, 4,
6, 1, 1, 2, 5), nrow=4, byrow=TRUE)
b <- apply (a, 1, table)
apply documentation says clearly that if the rows of the result of FUN
are the same length, then an array will be returned.  And column-major
would be the appropriate order in R.  But b above is pretty opaque
compared to what one would expect, and what one would get from apply (
, , table) if the rows were not of equal length.  One needs to do
something like
n <- matrix (apply (a, 1, function (x) unique (sort (x))), nrow=nrow(a))
to get the corresponding names of b to figure out the counts.
Denis White

Re: [R] (no subject)

2004-08-24 Thread Tony Plate

Looks like there might have been some truncation of Jonathan Baron's message.
Here's one way of computing the sample mode of a vector.
> set.seed(1)
> x <- sample(1:5,20,rep=T)
> x
 [1] 2 2 3 5 2 5 5 4 4 1 2 1 4 2 4 3 4 5 2 4
> table(x)
x
1 2 3 4 5
2 6 2 6 4
> names(which.max(table(x)))
[1] "2"

Note that this method returns the first max value in the case of ties.
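
If all tied values are wanted, a small variation returns every value whose count equals the maximum. This is a sketch; sample_modes() is a hypothetical helper name, and the vector is the one printed above, typed in directly so the result does not depend on the random number generator.

```r
# Hypothetical helper: return every most-frequent value (as character).
sample_modes <- function(x) {
  counts <- table(x)
  names(counts)[as.vector(counts == max(counts))]
}

x <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2, 4, 3, 4, 5, 2, 4)

first_mode <- names(which.max(table(x)))   # first maximum only
all_modes <- sample_modes(x)               # all tied maxima

print(first_mode)
print(all_modes)
```

Here 2 and 4 each occur six times, so the first form reports "2" and the second reports both.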
hope this helps,
Tony Plate
At Tuesday 11:01 AM 8/24/2004, Jonathan Baron wrote:
On 08/24/04 13:50, Paolo Tommasini wrote:
Hi my name is Paolo Tommasini does anyone know how to compute a mode
( most frequent element ) for a distribution ?
which.max
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/

Re: [R] S - R

2004-08-25 Thread Tony Plate

Have you tried following the advice in the R Data Import/Export manual?  It 
suggests the following:

Function data.restore reads S-PLUS data dumps (created by data.dump) with
the same restrictions (except that dumps from the Alpha platform can also
be read).
It should be possible to read data dumps from S-PLUS 5.x and 6.x written with
data.dump(oldStyle=T).

-- Tony Plate

At Wednesday 10:29 AM 8/25/2004, Zachary Skrivanek wrote:
Hello!  I would like to be able to read in list data objects in R/S
created in R/S.  (Ie R-S or S-R.)  I have tried 'dput' and 'dump' in S,
but neither of the created files could be read into R (with 'dget' nor
'source').  Is there any way that I can save a list object in S that can
be read into R?

Sincerely,
Zachary Skrivanek, PhD
Research Scientist
Program Phase Statistics-Endocrine

Re: [R] S - R

2004-08-25 Thread Tony Plate

I think the issue is that dput() and dget() don't work for some more 
complex structures (as you point out, they do appear to work for simple 
structures).  The R Data Import/Export manual doesn't mention using dput 
and dget to transfer objects between R and S-PLUS, perhaps because these 
functions have limited coverage?

E.g.:
S-PLUS6.1> junk <- list(f=as.name("g"))
S-PLUS6.1> dput(junk,"junk1.dat")
S-PLUS6.1> data.dump("junk", file="junk2.dat", oldStyle=F)
S-PLUS6.1> data.dump("junk", file="junk3.dat", oldStyle=T)
R> dget("junk1.dat")
Error in eval(expr, envir, enclos) : Object "g" not found
R> # with package "foreign" loaded
R> data.restore("junk2.dat")
Error in ReadSdump(TRUE,  ) : S mode "junk" (near byte offset 45) not 
supported
In addition: Warning message:
NAs introduced by coercion
R> data.restore("junk3.dat")
[1] "junk3.dat"
R> junk
$f
g
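
The dget() failure can be reproduced within R alone, which may help in seeing why it happens: the symbol is written out bare, so reading the file back tries to evaluate it. The sketch below says nothing about S-PLUS; it only illustrates the mechanism in current R, using a made-up symbol name, temporary files, and the control = "all" option of R's dput(), which writes the symbol in quoted form.

```r
# A list holding a symbol that does not exist as an object.
junk <- list(f = as.name("no_such_symbol_xyz"))  # hypothetical symbol name

# Default dput(): the symbol is written bare, so dget() tries to evaluate it.
plain_file <- tempfile()
dput(junk, plain_file)
plain <- tryCatch(dget(plain_file), error = function(e) e)

# control = "all": the symbol is written in a form that survives the round trip.
quoted_file <- tempfile()
dput(junk, quoted_file, control = "all")
restored <- dget(quoted_file)

print(inherits(plain, "error"))
print(identical(restored, junk))
```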


-- Tony Plate
At Wednesday 11:45 AM 8/25/2004, Rolf Turner wrote:
I'm puzzled by the discourse in this thread.  Briefly, dput() and
dget() seem to work just fine for me.
I tried
 > junk <- list(x=rnorm(20),y=sample(1:100,12,TRUE))
 > dput(junk,"junk.dat")
in Splus (Version 6.1.2 Release 2 for Sun SPARC, SunOS 5.6 : 2002)
and then in R
 > junk <- dget("junk.dat")
R version:
platform sparc-sun-solaris2.9
arch     sparc
os       solaris2.9
system   sparc, solaris2.9
status
major    1
minor    9.1
year     2004
month    06
day      21
language R
There were no complaints, and typing ``junk'' in the R window
and in the Splus window appeared to produce identical results.
So what's the problem?
cheers,
Rolf Turner
[EMAIL PROTECTED]

Re: [R] terminate R program when trying to access out-of-bounds array element?

2005-04-13 Thread Tony Plate

One way could be to make a special class with an indexing method that 
checks for out-of-bounds numeric indices.  Here's an example for vectors:

> setOldClass(c("oobcvec"))
> x <- 1:3
> class(x) <- "oobcvec"
> x
[1] 1 2 3
attr(,"class")
[1] "oobcvec"
> "[.oobcvec" <- function(x, ..., drop=T) {
+    if (!missing(..1) && is.numeric(..1) && any(is.na(..1) | ..1 < 1 | ..1 > length(x)))
+        stop("numeric vector out of range")
+    NextMethod("[")
+ }
> x[2:3]
[1] 2 3
> x[2:4]
Error in "[.oobcvec"(x, 2:4) : numeric vector out of range


Then, for vectors for which you want out-of-bounds checks done when they 
are indexed, set the class to "oobcvec".  This should work for simple 
vectors (I checked, and it works if the vectors have names).

If you want to write a method like this for indexing matrices, you can 
use ..1 and ..2 to refer to the i and j indices.  If you want to also be 
able to check for missing character indices, you'll just need to add 
more code.  Note that the above example disallows 0 and negative 
indices, which may or may not be what you want.

If you're extensively using other classes that you've defined, and you 
want out-of-bounds checking for them, then you need to integrate the 
checks into the subsetting methods for those classes -- you can't just 
use the above approach.
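
For reference, a compact variant of the vector method that can be pasted into a session as a script. It is a sketch rather than a drop-in replacement: it names the index argument i explicitly instead of using ..1, and it uses the message "numeric index out of range". Like the version above, it rejects NA, zero, and negative numeric indices.

```r
# S3 method for "[" on objects of (old-style) class "oobcvec".
"[.oobcvec" <- function(x, i, ...) {
  if (!missing(i) && is.numeric(i) &&
      any(is.na(i) | i < 1 | i > length(x)))
    stop("numeric index out of range")
  NextMethod("[")
}

x <- structure(1:3, class = "oobcvec")

# In-range indexing behaves as usual.
ok <- as.vector(unclass(x[2:3]))

# Out-of-range indexing stops with an error instead of returning NA.
msg <- tryCatch(x[2:4], error = function(e) conditionMessage(e))

print(ok)
print(msg)
```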

hope this helps,
Tony Plate
Vivek Rao wrote:
I want R to stop running a script (after printing an
error message) when an array subscript larger than the
length of the array is used, for example
x = c(1)
print(x[2])
rather than printing NA, since trying to access such
an element may indicate an error in my program. Is
there a way to get this behavior in R? Explicit
testing with the is.na() function everywhere does not
seem like a good solution. Thanks.

Re: [R] terminate R program when trying to access out-of-bounds array element?

2005-04-13 Thread Tony Plate

Oops.
The message in the 'stop' should be something more like "numeric index 
out of range".

-- Tony Plate

[R] pointer to comments re Paul Murrell's new book, R, SAS on Andrew Gelman's blog

2005-04-21 Thread Tony Plate

There are some interesting comments re Paul Murrell's new book, R, & SAS 
on Andrew Gelman's blog:

http://www.stat.columbia.edu/~cook/movabletype/archives/2005/04/a_new_book_on_r.html
-- Tony Plate

Re: [R] Proba( Ut+2=1 / ((Ut+1==1) (Ut==1))) ?

2005-04-25 Thread Tony Plate

table() can return all the n-gram statistics, e.g.:
> v <- sample(c(-1,1), 1000, rep=TRUE)
> table("v_{t-2}"=v[-seq(to=length(v), len=2)], "v_{t-1}"=v[-c(1,length(v))], "v_t"=v[-(1:2)])
, , v_t = -1

       v_{t-1}
v_{t-2}  -1   1
     -1 136 134
     1  131 112

, , v_t = 1

       v_{t-1}
v_{t-2}  -1   1
     -1 131 113
     1  115 126

This says that there were 136 cases in which a -1 followed two -1's (and 
126 cases in which a 1 followed two 1's).

If you're really only interested in particular contexts, you can do 
something like:

> table(v[-seq(to=length(v), len=2)]==1 & v[-c(1,length(v))]==1 & v[-(1:2)]==1)

FALSE  TRUE
  872   126
> table(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & v[-(1:2)]==-1)

FALSE  TRUE
  862   136
or
> sum(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & v[-(1:2)]==-1)
[1] 136
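
To turn such counts into the conditional frequencies asked about, one can divide by the number of times the two-element context occurred. Here is a sketch using the short vector from the question; the variable names are made up.

```r
v <- c(1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1)
n <- length(v)

# Aligned views of the series: two steps back, one step back, current.
prev2 <- v[1:(n - 2)]
prev1 <- v[2:(n - 1)]
cur   <- v[3:n]

# Values that follow two consecutive +1's, and the share of them that are +1.
after_pp <- cur[prev2 == 1 & prev1 == 1]
p_plus <- mean(after_pp == 1)

# Values that follow two consecutive -1's, and the share of them that are -1.
after_mm <- cur[prev2 == -1 & prev1 == -1]
p_minus <- mean(after_mm == -1)

print(c(p_plus = p_plus, p_minus = p_minus))
```

For this vector, two +1's are followed by +1 in 2 of 4 cases, and two -1's are followed by -1 in 1 of 2 cases, so both estimates are 0.5.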

vincent wrote:
Dear all,
First I apologize if my question is quite simple,
but i'm very newbie with R.
I have vectors of the form v = c(1,1,-1,-1,-1,1,1,1,1,-1,1)
(longer than this one of course).
The elements are only +1 or -1.
I would like to calculate :
- the frequencies of -1 occurences after 2 consecutives -1
- the frequencies of +1 occurences after 2 consecutives +1
It looks probably something like :
Proba( Ut+2=1 / ((Ut+1==1) & (Ut==1)))
could someone please give me a little hint about how
i should/could begin to proceed ?
Thanks
(Thanks also to the R creators/contributors, this soft
seems really great !)

Re: [R] Index matrix to pick elements from 3-dimensional matrix

2005-04-26 Thread Tony Plate

I'm assuming what you want to do is randomly sample from slices of A 
selected on the 3-rd dimension, as specified by J.  Here's a way that 
uses indexing by a matrix.  The cbind() builds a three column matrix of 
indices, the first two of which are randomly selected.  The use of 
replace() is to make the result have the same attributes, e.g., dim and 
dimnames, as J.

> A <- array(letters[1:12],c(2,2,3))
> J <- matrix(c(1,2,3,3),2,2)
> replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T),
+     sample(dim(A)[2], length(J), rep=T), as.vector(J))])
     [,1] [,2]
[1,] "b"  "l"
[2,] "f"  "k"
> replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T),
+     sample(dim(A)[2], length(J), rep=T), as.vector(J))])
     [,1] [,2]
[1,] "b"  "l"
[2,] "h"  "i"
> replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T),
+     sample(dim(A)[2], length(J), rep=T), as.vector(J))])
     [,1] [,2]
[1,] "c"  "l"
[2,] "h"  "k"
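
If the goal is instead the non-random pick X[i,j] = A[i,j,J[i,j]], as the original question can also be read, the same matrix-indexing idea seems to apply with row() and col() in place of sample(). A small sketch under that assumption (idx and X are illustrative names):

```r
A <- array(letters[1:12], c(2, 2, 3))
J <- matrix(c(1, 2, 3, 3), 2, 2)

# One row of indices (i, j, J[i, j]) per cell of J, in column-major order
idx <- cbind(as.vector(row(J)), as.vector(col(J)), as.vector(J))

# replace() keeps the dim (and any dimnames) of J
X <- replace(J, TRUE, A[idx])
X
```

Note that array() fills column-wise, so A[,,1] here is a, c over b, d rather than the row-wise layout shown in the question; the result therefore holds "a", "f", "k", "l" in column-major order.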


-- Tony Plate
Robin Hankin wrote:
Hello Juhana
try this (but there must be a better way!)

stratified.select <- function(A,J){
  out <- sapply(J,function(i){sample(A[,,i],1)})
  attributes(out) <- attributes(J)
  return(out)
}
A <- array(letters[1:12],c(2,2,3))
J <- matrix(c(1,2,3,3),2,2)
R> stratified.select(A,J)
     [,1] [,2]
[1,] "b"  "i"
[2,] "g"  "k"
R> stratified.select(A,J)
     [,1] [,2]
[1,] "d"  "j"
[2,] "f"  "l"
R>
best wishes
Robin

On Apr 26, 2005, at 05:16 am, juhana vartiainen wrote:
Hi all
Suppose I have a dim=c(2,2,3) matrix A, say:
A[,,1]=
a b
c d
A[,,2]=
e f
g h
A[,,3]=
i j
k l
Suppose that I want to create a 2x2 matrix X, which picks elements 
from the above-mentioned submatrices according to an index matrix J 
referring to the depth dimension:
J=
1 3
2 3

In other words, I want X to be
X=
a j
g l
since the matrix J says that the (1,1)-element should be picked from 
A[,,1], the (1,2)-element should be picked from A[,,3], etc.

I have A and I have J. Is there an expression in A and J that creates X?
Thanks
Juhana
[EMAIL PROTECTED]
--
Juhana Vartiainen
docent in economics
Director, FIEF (Trade Union Foundation for Economic Research, 
Stockholm), http://www.fief.se
gsm +46 70 360 9915
office +46 8 696 9915
email [EMAIL PROTECTED]
homepage http://www.fief.se/staff/Juhana/index.html



--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
 tel  023-8059-7743

Re: [R] Summarizing factor data in table?

2005-04-26 Thread Tony Plate

Do you want to count the number of non-NA divisions and organizations in 
the data for each year (where duplicates are counted as many times as 
they appear)?

> tapply(!is.na(foo$div), foo$yr, sum)
1998 1999 2000
   0    4    2
> tapply(!is.na(foo$org), foo$yr, sum)
1998 1999 2000
   4    4    2

Or perhaps the number of unique non-NA divisions and organizations in
the data for each year?

> tapply(foo$div, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
   0    4    2
> tapply(foo$org, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
   4    4    2

(I don't understand where the 3 in your desired output comes from 
though, which maybe indicates I completely misunderstand your request.)
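
Since the question mentions xtabs, here is a hedged sketch of a table-based route to the same unique counts. It rebuilds foo from the question; ndiv and norgs are the names asked for there, and the approach is only one possible reading of the request.

```r
foo <- data.frame(
  yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
  div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
  org = factor(c(1:4, 1:4, 1, 2))
)

# table() drops NA by default; a cell > 0 means that level occurs in that year
ndiv  <- rowSums(table(foo$yr, foo$div) > 0)
norgs <- rowSums(table(foo$yr, foo$org) > 0)

ndiv   # 1998: 0, 1999: 4, 2000: 2
norgs  # 1998: 4, 1999: 4, 2000: 2
```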

Andy Bunn wrote:
I have a very simple query with regard to summarizing the number of factors
present in a certain snippet of a data frame.
Given the following data frame:
foo <- data.frame(yr = c(rep(1998,4), rep(1999,4), rep(2000,2)), div =
factor(c(rep(NA,4),"A","B","C","D","A","C")),
org = factor(c(1:4,1:4,1,2)))
I want to get two new variables. Object ndiv would give the number of
divisions by year:
 1998 0
 1999 3
 2000 2
Object norgs would give the number of organizations
 1998 4
 1999 4
 2000 2
I figure xtabs should be able to do it, but I'm stuck without a for loop.
Any suggestions? -Andy

Re: [R] Defining binary indexing operators

2005-04-27 Thread Tony Plate

It's not necessary to be that complicated, is it?  AFAIK, the '$' 
operator is treated specially by the parser so that its RHS is treated 
as a string, not a variable name.  Hence, a method for $ can just take 
the indexing argument directly as given -- no need for any fancy 
language tricks (eval(), etc.)

> x <- structure(3, class = "myclass")
> y <- 5
> foo <- function(x,y) paste(x, " indexed by '", y, "'", sep="")
> foo(x, y)
[1] "3 indexed by '5'"
> "$.myclass" <- foo
> x$y
[1] "3 indexed by 'y'"

The point of the above example is that foo(x,y) behaves differently from
x$y even when both call the same function: foo(x,y) uses the value of
the variable 'y', whereas x$y uses the string "y".  This is as desired
for an indexing operator $.
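
For contrast, a small sketch of how `$` and `[[` methods receive their second argument. The class name follows the example above; the two methods and the variable names are made up for illustration.

```r
x <- structure(list(), class = "myclass")
y <- "five"

# `$` hands the method the literal name typed after it, as a string
"$.myclass"  <- function(x, name) paste("name:", name)
# `[[` evaluates its index like an ordinary function argument
"[[.myclass" <- function(x, i) paste("index:", i)

by_dollar  <- x$y      # "name: y"
by_bracket <- x[[y]]   # "index: five"
```

So when the index should be taken from a variable, a `[[` method may be the more natural place for it than `$`.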

-- Tony Plate

Gabor Grothendieck wrote:
On 4/27/05, Ali - [EMAIL PROTECTED] wrote: 

Assume we have a function like:
foo <- function(x, y)
how is it possible to define a binary indexing operator, denoted by $, so
that
x$y
functions the same as
foo(x, y)

  Here is an example. Note that $ does not evaluate y so you have
to do it yourself:
x <- structure(3, class = "myclass")
y <- 5
foo <- function(x,y) x+y
"$.myclass" <- function(x, i) { i <- eval.parent(parse(text=i)); foo(x, i) }
x$y # structure(8, class = "myclass")

Re: [R] Defining binary indexing operators

2005-04-27 Thread Tony Plate

Excuse me!  I misunderstood the question, and indeed, it is necessary to be
that complicated when you try to make x$y behave the same as foo(x,y),
rather than foo(x,"y") (doing the former would be inadvisable, as I
think someone else pointed out too.)

Tony Plate wrote:
It's not necessary to be that complicated, is it?  AFAIK, the '$' 
operator is treated specially by the parser so that its RHS is treated 
as a string, not a variable name.  Hence, a method for $ can just take 
the indexing argument directly as given -- no need for any fancy 
language tricks (eval(), etc.)

> x <- structure(3, class = "myclass")
> y <- 5
> foo <- function(x,y) paste(x, " indexed by '", y, "'", sep="")
> foo(x, y)
[1] "3 indexed by '5'"
> "$.myclass" <- foo
> x$y
[1] "3 indexed by 'y'"

The point of the above example is that foo(x,y) behaves differently from
x$y even when both call the same function: foo(x,y) uses the value of
the variable 'y', whereas x$y uses the string "y".  This is as desired
for an indexing operator $.

-- Tony Plate

Gabor Grothendieck wrote:
On 4/27/05, Ali - [EMAIL PROTECTED] wrote:
Assume we have a function like:
foo <- function(x, y)
how is it possible to define a binary indexing operator, denoted by 
$, so
that

x$y
functions the same as
foo(x, y)

  Here is an example. Note that $ does not evaluate y so you have
to do it yourself:
x <- structure(3, class = "myclass")
y <- 5
foo <- function(x,y) x+y
"$.myclass" <- function(x, i) { i <- eval.parent(parse(text=i));
foo(x, i) }
x$y # structure(8, class = "myclass")


Re: [R] Getting the name of an object as character

2005-04-28 Thread Tony Plate

If you're trying to find the textual form of an actual argument, here's 
one way:

> foo <- function(x) {
+ xn <- substitute(x)
+ if (is.name(xn) && !exists(as.character(xn)))
+ as.character(xn)
+ else
+ x
+ }
> foo(x)
[1] 3
> foo(xx)
[1] "xx"
> foo(list(xx))
Error in foo(list(xx)) : Object "xx" not found

If you want the textual form of arguments that are expressions, use
deparse() and a different test (beware that deparse() can return a
vector of character data).
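
A minimal sketch of the deparse() variant, assuming the aim is simply the source text of whatever was typed as the argument. The helper name arg_text is hypothetical.

```r
# Hypothetical helper: returns the source text of its argument without evaluating it
arg_text <- function(x) {
  # deparse() may return several strings for long expressions; join them
  paste(deparse(substitute(x)), collapse = " ")
}

arg_text(blabla)           # "blabla", even though blabla is not defined
arg_text(list(xx, 1 + 2))  # "list(xx, 1 + 2)"
```

Because x is never evaluated, no "object not found" error is raised for undefined names.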

Although you can do this in R, it is not always advisable practice. 
Many people who have written functions with non-standard evaluation 
rules like this have come to regret it (one reason is that it makes 
these functions difficult to use in programs, another is that the 
behavior of the function can depend upon what global variables exist,
another is that when the function works as intended, that's great, but 
when it doesn't, users can get quite confused trying to figure out what 
it's doing.)  The R function help() is an example of a commonly used 
function with a non-standard evaluation rule.

-- Tony Plate

Ali - wrote:
This could be really trivial, but I cannot find the right function to 
get the name of an object as a character.

Assume we have a function like:
getName <- function(obj)
Now if we call the function like:
getName(blabla)
and 'blabla' is not a defined object, I want getName to return "blabla".
In other word, if

paste("blabla")
returns
"blabla"
I want to define a paste function which returns the same character by:
paste(blabla)

Re: [R] Reconstruction of a valid expression within a function

2005-04-28 Thread Tony Plate

You are passing just a string to subset().  At the very least you need 
to parse it (but still this does not work easily with subset() -- see 
below).  But are you sure you need to do this?  subset() for dataframes 
already accepts subset expressions involving the columns of the 
dataframe, e.g.:

> df <- data.frame(x=1:10,y=rep(1:5,2))
> subset(df, y==2)
  x y
2 2 2
7 7 2

However, it's tricky to get subset() to work with an expression for its 
subset argument.  This is because of the way it evaluates its subset 
expression (look at the code for subset.data.frame()).

> subset(df, parse(text="df$y==2"))
Error in subset.data.frame(df, parse(text = "df$y==2")) :
        'subset' must evaluate to logical
> subset(df, parse(text="y==2"))
Error in subset.data.frame(df, parse(text = "y==2")) :
        'subset' must evaluate to logical

It's a little tricky in general passing R language expressions around, 
because many functions that work with expressions work with the 
unevaluated form of the actual argument, rather than with an R language 
expression as the value of a variable.  E.g.:

> with(df, y==2)
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
> cond <- parse(text="y==2")
> cond
expression(y == 2)
> with(df, cond)
expression(y == 2)

One way to make these types of functions work with R language
expressions as the value of a variable is to use do.call():

> do.call("with", list(df, cond))
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

So, returning to subset(), you can give it an expression that is stored 
in the value of a variable like this:

> do.call("subset", list(df, cond))
  x y
2 2 2
7 7 2
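
Another route, sketched here as an assumption rather than as what subset() does internally, is to evaluate the parsed condition against the data frame yourself and index with the result. The names keep and res are illustrative.

```r
df <- data.frame(x = 1:10, y = rep(1:5, 2))

# parse() returns an expression vector; take its first element (a call)
cond <- parse(text = "y == 2")[[1]]

# Evaluate the call with the columns of df visible as variables
keep <- eval(cond, df)

# Treat NA as "do not keep"
res <- df[keep & !is.na(keep), ]
res
```

This keeps the string-to-expression step in one visible place, which may be easier to debug than passing the expression through another function's non-standard evaluation.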

However, if you're a beginner at R, I suspect that you'll get much 
further if you avoid such meta-language constructs and just find a way 
to make subset() work for you without trying to paste together R 
language expressions.

Hope this helps,
-- Tony Plate
Pascal Boisson wrote:
Hello all,
I have some trouble in reconstructing a valid expression within a
function,
here is my question.
I am building a function :
SUB<-function(DF,subset=TRUE) {
#where DF is a data frame, with Var1, Var2, Fact1, Fact2, Fact3
#and subset would be an expression, eg. Fact3 == 1

#in a first time I want to build a subset from DF
#I managed to, with an expression like eg. DF$Fact3,
# but I would like to skip the DF$ for convenience
# so I tried something like this :
tabsub<-deparse(substitute(subset))
dDF<-deparse(substitute(DF))
if (tabsub[1]!="TRUE") {
subset<-paste(dDF,"$",tabsub,sep="")}
#At this point, I have a string that seems to be the expression that I
want
sDF<-subset(DF, subset)
}
#But I have an error message :
Error in r & !is.na(r) : operations are possible only for numeric or
logical types
I can not understand why is that, even after I've tried to convert
properly the string into an expression.
I've been all the day trying to sort that problem ...
Maybe this attempt is ackward and I have not understood what is really
behind an expression. 
But if anyone could give me a tip concerning this problem or point me to
relevant references, I would really appreciate.

Thanks
Pascal Boisson

Re: [R] Subarrays

2005-04-29 Thread Tony Plate

Here's one way:
> subarray <- function(x, marginals, intervals) {
+ if (length(marginals) != length(intervals))
+ stop("marginals and intervals must be the same length (intervals can be a list)")
+ if (any(marginals<1 | marginals>length(dim(x))))
+ stop("marginals must contain values in 1:length(dim(x))")
+ ic <- Quote(x[, drop=T])
+ # ic has 4 elts with one empty index arg
+ ic2 <- ic[c(1, 2, rep(3, length(dim(x))), 4)]
+ # ic2 has an empty arg for each dim of x
+ ic2[marginals+2] <- intervals
+ eval(ic2)
+ }
>
> subarray(v, c(1,4), c(3,2))
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,4), list(3,2))
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,3,4), list(c(1,3,4),1,2))
     [,1] [,2] [,3] [,4]
[1,]   65   69   73   77
[2,]   67   71   75   79
[3,]   68   72   76   80

Question for language experts: is this the best way to create and 
manipulate R language expressions that contain empty arguments, or are 
there other preferred ways?
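
One possible alternative that avoids building a call with empty arguments, offered as a sketch rather than as a preferred idiom: use TRUE as a stand-in for an empty index and call "[" through do.call(). The name subarray2 is hypothetical.

```r
# Hypothetical alternative to subarray(); TRUE plays the role of an empty index
subarray2 <- function(x, marginals, intervals) {
  stopifnot(length(marginals) == length(intervals))
  idx <- rep(list(TRUE), length(dim(x)))   # select everything by default
  idx[marginals] <- intervals              # fix the requested dimensions
  do.call("[", c(list(x), idx, list(drop = TRUE)))
}

v <- array(1:256, rep(4, 4))
same1 <- identical(subarray2(v, c(1, 4), list(3, 2)), v[3, , , 2])
same2 <- identical(subarray2(v, c(1, 3, 4), list(c(1, 3, 4), 1, 2)),
                   v[c(1, 3, 4), , 1, 2])
c(same1, same2)
```

A logical TRUE index is recycled along the dimension, so it selects every element just as an empty index would.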

-- Tony Plate
Gunnar Hellmund wrote:
Define an array

v<-1:256
dim(v)<-rep(4,4)

Subarrays can be obtained as follows:

v[3,2,,2]
[1]  71  87 103 119
v[3,,,2]
 [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
In the general case this procedure is very tedious. 

Given an array 
A, dim(A)=(dim_1,dim_2,...,dim_d) 
and two vectors
v1=(n_i1,...n_ik), v2=(int_1,...,int_k) ('marginals' and relevant
'interval numbers')
is there a smart way to obtain 
A[,...,int_1,,int_2,,,int_k,]
?

Best wishes
Gunnar Hellmund

Re: [R] na.action

2005-04-29 Thread Tony Plate

Maybe this does what you want:
> x <- as.matrix(read.table("clipboard"))
> x
  V1 V2 V3 V4
1 NA  0  0  0
2  0 NA  0 NA
3  0  0 NA  2
4  0  0  2 NA
> rowSums(x==2, na.rm=T)
1 2 3 4
0 0 1 1

There are probably at least 5 or 6 other quite sensible ways of doing
this, but this is probably the fastest (and the least versatile).

A more general building block is the sum() function, as in:
> sum(x[3,]==2, na.rm=T)
[1] 1

The key is the use of the 'na.rm=T' argument value.
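
For comparison, a sketch that types in the small matrix from the question and shows both the vectorised count and the original double loop with an explicit NA guard. The names fast and counter are illustrative.

```r
mat3 <- matrix(c(NA, 0, 0, 0,
                 0, NA, 0, NA,
                 0, 0, NA, 2,
                 0, 0, 2, NA), nrow = 4, byrow = TRUE)

# Vectorised count of 2s per row, ignoring NA
fast <- rowSums(mat3 == 2, na.rm = TRUE)

# The double loop, with an NA check so that if() never sees a missing value
counter <- numeric(nrow(mat3))
for (i in seq_len(nrow(mat3))) {
  for (j in seq_len(ncol(mat3))) {
    if (!is.na(mat3[i, j]) && mat3[i, j] == 2) counter[i] <- counter[i] + 1
  }
}

fast     # 0 0 1 1
counter  # 0 0 1 1
```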
hope this helps,
Tony Plate
Tim Smith wrote:
Hi,
 
I had the following code:

  testp <- rcorr(t(datcm1),type = "pearson")
  mat1 <- testp[[1]][,] > 0.6
  mat2 <- testp[[3]][,] < 0.05
  mat3 <- mat1 + mat2
 
The resulting mat3 (smaller version) matrix looks like:
 
 NA   0   0   0
  0  NA   0  NA
  0   0  NA   2
  0   0   2  NA
 
To get to the number of times a '2' appears in the rows, I was trying to run the following code:
 
numrow = nrow(mat3)
  counter <- matrix(nrow = numrow,ncol =1)
  for(i in 1:numrow){
   count = 0;
   for(j in 1:numrow){
if(mat3[i,j] == 2){
 count = count + 1
}
   }
  counter[i,1] = count
  }
 
However, I get the following error:
 
'Error in if (mat3[i, j] == 2) { : missing value where TRUE/FALSE needed'
 
I also tried to use the na.action, but couldn't get anything. I'm sure there must be a relatively easy fix to this. Is there a workaround this problem?
 
thanks,
 
Tim
 


Re: [R] summary(as.factor(x) - force to not sort the result according factor levels

2005-05-02 Thread Tony Plate

Christoph Lehmann wrote:
Hi
The result of a summary(as.factor(x)) (see example below) call is sorted 
according to the factor level. How can I get the result not sorted but 
in the original order of the levels in x?

By creating the factor with the levels in the order you want:

> test <- c(120402, 120402, 120402, 1323, 1323, 200393, 200393, 200393,
+ 200393, 200393)
> summary(factor(test, levels=unique(test)))
120402   1323 200393
     3      2      5


Re: [R] function for cumulative occurrence of elements

2005-06-28 Thread Tony Plate

I'm not entirely sure what you want, but is it 9 5 3 for this data? (9
new species occur at the first point, 5 new at the second, and 3
new at the third).  If this is right, then to get an accumulation curve
when random Points are considered, you can probably just index the rows
of dt appropriately.

> dd <- read.table("clipboard", header=T)
> dd[,1:3]
   Point        species frequency
1      7   American_elm         7
2      7          apple         2
3      7   black_cherry         8
4      7      black_oak         1
5      7    chokecherry         1
6      7         oak_sp         1
7      7 pignut_hickory         1
8      7      red_maple         1
9      7      white_oak         5
10     9   black_spruce         2
11     9    blue_spruce         2
12     9        missing        12
13     9  Norway_spruce         8
14     9   white_spruce         3
15    12          apple         2
16    12   black_cherry         1
17    12   black_locust         1
18    12   black_walnut         1
19    12          lilac         3
20    12        missing         2
> # dt: table of which species occur at which Points
> dt <- table(dd$Point, dd$species)
> # doc: for each species, the index of the Point where
> # it first occurs
> doc <- apply(dt, 2, function(x) which(x==1)[1])
> doc
  American_elm          apple   black_cherry   black_locust      black_oak
             1              1              1              3              1
  black_spruce   black_walnut    blue_spruce    chokecherry          lilac
             2              3              2              1              3
       missing  Norway_spruce         oak_sp pignut_hickory      red_maple
             2              2              1              1              1
     white_oak   white_spruce
             1              2
> table(doc)
doc
1 2 3
9 5 3
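
A sketch of how the counts of new species could be accumulated into a curve, both in the given Point order and for one random ordering. The data frame here is a small made-up stand-in in the same shape as the posted data, and accum is a hypothetical helper name.

```r
# Small made-up data set in the same shape as the posted one
dd <- data.frame(
  Point   = c(7, 7, 7, 9, 9, 12, 12),
  species = c("elm", "apple", "oak", "spruce", "apple", "apple", "lilac")
)
# Points x species matrix, TRUE where the species is present
dt <- unclass(table(dd$Point, dd$species)) > 0

# Cumulative number of distinct species seen after each Point, in row order
accum <- function(present) {
  # first row in which each species occurs
  first <- apply(present, 2, function(p) which(p)[1])
  cumsum(tabulate(first, nbins = nrow(present)))
}

acc_ordered <- accum(dt)                                    # sorted Points
set.seed(1)
acc_random  <- accum(dt[sample(nrow(dt)), , drop = FALSE])  # shuffled Points

acc_ordered  # 3 4 5
```

Repeating the shuffled version many times and averaging would give a smoothed curve; whatever the order, the last value equals the total number of species.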
 

hope this helps,

Tony Plate

Steven K Friedman wrote:
 Hello, 
 
 I have a data set with 9700 records, and 7 parameters. 
 
 The data were collected for a survey of forest communities.  Sample plots 
 (1009) and species (139) are included in this data set. I need to determine 
 how species are accumulated as new plots are considered. Basically, I want 
 to develop a species area curve. 
 
 I've included the first 20 records from the data set.  Point represents the 
 plot id. The other parameters are parts of the information statistic H'. 
 
 Using Table, I can construct a data set that lists the occurrence of a 
 species at any Point (it produces a binary 0/1 data table). From there it 
 get confusing, regarding the most efficient approach to determining the 
 addition of new and or repeated species occurrences. 
 
 ptcount -  table(sppoint.freq$species, sppoint.freq$Point) 
 
  From here I've played around with colSums to calculate the number of species 
 at each Point.  The difficulty is determining if a species is new or 
 repeated.  Also since there are 1009 points a function is needed to screen 
 every Point. 
 
 Two goals are of interest: 1) the species accumulation curve, and 2) an 
 accumulation curve when random Points are considered. 
 
 Any help would be greatly appreciated. 
 
 Thank you
 Steve Friedman 
 
 
    Point        species frequency point.list point.prop   log.prop point.hprime
 1      7   American elm         7         27 0.25925926 -1.3499267    0.3499810
 2      7          apple         2         27 0.07407407 -2.6026897    0.1927918
 3      7   black cherry         8         27 0.29629630 -1.2163953    0.3604134
 4      7      black oak         1         27 0.03703704 -3.2958369    0.1220680
 5      7    chokecherry         1         27 0.03703704 -3.2958369    0.1220680
 6      7         oak sp         1         27 0.03703704 -3.2958369    0.1220680
 7      7 pignut hickory         1         27 0.03703704 -3.2958369    0.1220680
 8      7      red maple         1         27 0.03703704 -3.2958369    0.1220680
 9      7      white oak         5         27 0.18518519 -1.6863990    0.3122961
 10     9   black spruce         2         27 0.07407407 -2.6026897    0.1927918
 11     9    blue spruce         2         27 0.07407407 -2.6026897    0.1927918
 12     9        missing        12         27 0.44444444 -0.8109302    0.3604134
 13     9  Norway spruce         8         27 0.29629630 -1.2163953    0.3604134
 14     9   white spruce         3         27 0.11111111 -2.1972246    0.2441361
 15    12          apple         2         27 0.07407407 -2.6026897    0.1927918
 16    12   black cherry         1         27 0.03703704 -3.2958369    0.1220680
 17    12   black locust         1         27 0.03703704 -3.2958369    0.1220680
 18    12   black walnut         1         27 0.03703704 -3.2958369    0.1220680
 19    12          lilac         3         27 0.11111111 -2.1972246    0.2441361
 20    12        missing         2         27 0.07407407 -2.6026897    0.1927918
 

Re: [R] Generating correlated data from uniform distribution

2005-07-01 Thread Tony Plate

Isn't this a little trickier with non-normal variables?  It sounds like 
Menghui Chen wants variables that have uniform marginal distribution, 
and a specified correlation.

When I look at histograms (or just the quantiles) of the rows of dat2 in 
your example, I see something for dat2[2,] that does not look much like 
it comes from a uniform distribution.

> dat<-matrix(runif(2000),2,1000)
> rho<-.77
> R<-matrix(c(1,rho,rho,1),2,2)
> ch<-chol(R)
> dat2<-t(ch)%*%dat
> cor(dat2[1,],dat2[2,])
[1] 0.7513892
> hist(dat2[1,])
> hist(dat2[2,])
>
> quantile(dat2[1,])
         0%         25%         50%         75%        100%
0.000655829 0.246216035 0.507075912 0.745158441 0.16418
> quantile(dat2[2,])
       0%       25%       50%       75%      100%
0.0393046 0.4980066 0.7150426 0.9208855 1.3864704
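
The second row of dat2 is rho*U1 + sqrt(1 - rho^2)*U2, a weighted sum of two uniforms, which is why it is not uniform and can exceed 1. One way to keep the marginals uniform, sketched here as a different technique from the Cholesky approach in this thread, is to correlate normal variables and map them through pnorm(). For that construction the correlation of the uniforms is (6/pi)*asin(rho_z/2), so the normal correlation is set to 2*sin(pi*r/6). All variable names are illustrative.

```r
set.seed(123)
r <- 0.77                       # target correlation between the uniforms
rho_z <- 2 * sin(pi * r / 6)    # correlation needed between the normals
n <- 10000

z1 <- rnorm(n)
z2 <- rho_z * z1 + sqrt(1 - rho_z^2) * rnorm(n)   # normals with correlation rho_z

# pnorm() maps each normal to Uniform(0, 1); shift to (-0.5, 0.5)
x1 <- pnorm(z1) - 0.5
x2 <- pnorm(z2) - 0.5

cor(x1, x2)   # close to r, up to sampling error
range(x1)     # inside (-0.5, 0.5)
range(x2)     # inside (-0.5, 0.5)
```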
 

-- Tony Plate

Jim Brennan wrote:
 dat<-matrix(runif(2000),2,1000)
 rho<-.77
 R<-matrix(c(1,rho,rho,1),2,2)
 ch<-chol(R)
 dat2<-t(ch)%*%dat
 cor(dat2[1,],dat2[2,])
[1] 0.7513892
 
dat<-matrix(runif(2),2,1)
rho<-.28
R<-matrix(c(1,rho,rho,1),2,2)
ch<-chol(R)
dat2<-t(ch)%*%dat
cor(dat2[1,],dat2[2,])
 
 [1] 0.2681669
 
dat<-matrix(runif(20),2,10)
rho<-.28
R<-matrix(c(1,rho,rho,1),2,2)
ch<-chol(R)
dat2<-t(ch)%*%dat
cor(dat2[1,],dat2[2,])
 
 [1] 0.2814035
 
 See  ?choleski
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Menghui Chen
 Sent: July 1, 2005 4:49 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Generating correlated data from uniform distribution
 
 Dear R users,
 
 I want to generate two random variables (X1, X2) from uniform
 distribution (-0.5, 0.5) with a specified correlation coefficient r.
 Does anyone know how to do it in R?
 
 Many thanks!
 
 Menghui
 

Re: [R] how as.numeric() !- factor

2003-08-03 Thread Tony Plate

The problem is that the 2nd column in your data frame has been converted 
into a factor.  This happened because you used cbind() with mixed character 
and numeric vectors.  cbind() with these types of arguments will construct 
a character matrix.  Then when you passed that character matrix to 
as.data.frame() it converted both columns to factors.

Here's a simpler example of what happened:

> cbind(letters[1:2], c(1,3))
     [,1] [,2]
[1,] "a"  "1"
[2,] "b"  "3"
> x <- as.data.frame(cbind(letters[1:2], c(1,3)))
> x
  V1 V2
1  a  1
2  b  3
> as.numeric(x[,2])
[1] 1 2
> as.numeric(as.character(x[,2]))
[1] 1 3

With the data frame as you constructed it, you need an expression like 
round(as.numeric(as.character(Np.occup97.98[,2])), 2) to accomplish what 
you want.  It would probably be better to construct a more felicitous data 
frame in the first place:

> df <- data.frame(site = levels(sums$site), Np.occup97.98 =
+ sums$Ant.Nptrad97.98/Ant.trad$Ant.trad97.98)

(unless of course you had some unstated reason for constructing the data 
frame the way you did)
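
A self-contained sketch of the difference between a factor's codes and its labels, using a made-up factor with the rounded values from the question. The names codes, values and same are illustrative.

```r
# A factor whose labels look like numbers (levels are sorted as strings)
f <- factor(c("0.34", "0.33", "0.48", "0.29"))

codes  <- as.numeric(f)                # internal codes: 3 2 4 1
values <- as.numeric(as.character(f))  # the numbers themselves
same   <- as.numeric(levels(f))[f]     # equivalent, converts each level only once

round(values, 1)
```

The codes 3 2 4 1 are the same ones that round(as.numeric(...), 2) produced in the question, which is why that attempt looked wrong.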

-- Tony Plate

At Thursday 10:03 AM 7/31/2003 +0200, Tord Snall wrote:
Dear all,

I have divided two vectors:

Np.occup97.98<- as.data.frame(cbind(site = levels(sums$site),
 Np.occup97.98 = sums$Ant.Nptrad97.98/Ant.trad$Ant.trad97.98))
 Np.occup97.98
  site Np.occup97.98
1  erken97 0.342592592592593
2  erken98 0.333
3 rormyran  0.48471615720524
4  valkror 0.286026200873362
However, at a later stage of the analysis I want
 round(Np.occup97.98[,2], 2)
Error in Math.factor(x, digits) : round not meaningful for factors
neither did this work:

 round(Np.occup97.98[,2], 2)
Error in Math.factor(x, digits) : round not meaningful for factors
or this:

 round(as.numeric(Np.occup97.98[,2]), 2)
[1] 3 2 4 1

because, as clearly written in the help file:
as.numeric for factors yields the codes underlying the factor levels, not
the numeric representation of the labels.
I've discovered this solution:

 Np.occup97.98<- as.data.frame(cbind(site = levels(sums$site),
+  Np.occup97.98 =
round(sums$Ant.Nptrad97.98/Ant.trad$Ant.trad97.98,2)))

 Np.occup97.98
  site Np.occup97.98
1  erken97  0.34
2  erken98  0.33
3 rormyran  0.48
4  valkror  0.29
However, I would like to do this rounding later.

Could someone give a tip. I think that I would have been helped by a
sentence in help(as.numeric).
Thanks in advance.

Sincerely,
Tord


---
Tord Snäll
Avd. f växtekologi, Evolutionsbiologiskt centrum, Uppsala universitet
Dept. of Plant Ecology, Evolutionary Biology Centre, Uppsala University
Villavägen 14
SE-752 36 Uppsala, Sweden
Tel: 018-471 28 82 (int +46 18 471 28 82) (work)
Tel: 018-25 71 33 (int +46 18 25 71 33) (home)
Fax: 018-55 34 19 (int +46 18 55 34 19) (work)
E-mail: [EMAIL PROTECTED]
Check this: http://www.vaxtbio.uu.se/resfold/snall.htm!
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] vectorization question

2003-08-14 Thread Tony Plate

From ?data.frame:
Details:

 A data frame is a list of variables of the same length with unique
 row names, given class `data.frame'.
Your example constructs an object that does not conform to the definition 
of a data frame (the new column is not the same length as the old 
columns).  Some data frame functions may work OK with such an object, but 
others will not.  For example, the print function for data.frame silently 
handles such an illegal data frame (which could be described as 
unfortunate.)  It would probably be far easier to construct a correct data 
frame in the first place than to try to find and fix functions that don't 
handle illegal data frames.  For adding a new column to a data frame, the
expressions "x[,new.column.name] <- value" and "x[[new.column.name]] <-
value" will replicate the value so that the new column is the same length
as the existing ones, while the "$" operator in an assignment will not
replicate the value.  (One could argue that this is a deficiency, but I
think it has been that way for a long time, and the behavior is the same in
the current version of S-plus.)

> x1 <- data.frame(a=1:3)
> x2 <- x1
> x3 <- x1
> x1$b <- 0
> x2[,"b"] <- 0
> x3[["b"]] <- 0
> sapply(x1, length)
a b
3 1
> sapply(x2, length)
a b
3 3
> sapply(x3, length)
a b
3 3
> as.matrix(x2)
  a b
1 1 0
2 2 0
3 3 0
> as.matrix(x1)
Error in as.matrix.data.frame(x1) : dim<- length of dims do not match the
length of object
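
Whether "$<-" replicates a short value may depend on the R version in use (the transcript above is from the version current at the time), so a defensive sketch is to replicate the value explicitly and check the column lengths before converting. The names lens and m are illustrative.

```r
x <- data.frame(a = 1:3)

# Replicate the scalar explicitly instead of relying on `$<-` to do it
x$b <- rep(0, nrow(x))

lens <- sapply(x, length)   # every column should have nrow(x) elements
m <- as.matrix(x)
dim(m)                      # 3 2
```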


At Thursday 04:50 PM 8/14/2003 +, Alberto Murta wrote:
Dear all

I recently noticed the following error when coercing a data.frame into a
matrix:
> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- 0
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
Error in as.matrix.data.frame(example) : dim<- length of dims do not match
the length of object

However, if the column to be added has the right number of lines, there's no
error:
> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- rep(0,4)
 example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
 example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
Shouldn't it work well both ways? I checked the attributes and dims of the
data frame and they are the same in both cases. Where's the difference that
originates the error message?
Thanks in advance
Alberto

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    7.1
year     2003
month    06
day      16
language R
--
 Alberto G. Murta
Institute for Agriculture and Fisheries Research (INIAP-IPIMAR)
Av. Brasilia, 1449-006 Lisboa, Portugal | Phone: +351 213027062
Fax:+351 213015948 | http://www.ipimar-iniap.ipimar.pt/pelagicos/

Re: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

2003-08-19 Thread Tony Plate

Perhaps you were trying for "as sample size increases, variance *of the
mean* decreases" (at least when variance is finite).  If you swap "mean" and
"var" in your code, I think you will get what you are looking for.
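
A short simulation sketch of that statement, assuming normal samples with standard deviation 3 as in the code below. The variable names are illustrative.

```r
set.seed(42)
sigma <- 3

# 1000 sample means for each sample size
means_small <- replicate(1000, mean(rnorm(100,  0, sigma)))
means_large <- replicate(1000, mean(rnorm(1000, 0, sigma)))

var(means_small)   # roughly sigma^2 / 100  = 0.09
var(means_large)   # roughly sigma^2 / 1000 = 0.009
```

The average of the sample variances, by contrast, stays near sigma^2 = 9 for both sample sizes, which matches the 8.944 and 9.098 reported in the question.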

-- Tony Plate

At Tuesday 05:42 PM 8/19/2003 +, Padmanabhan, Sudharsha wrote:

Hello,

I am running a few simulations for clinical trial analysis. I want some help
regarding the following.
We know that as the sample size increases, the variance should decrease, but
I am getting some unexpected results. SO I ran a code (shown below) to check
the validity of this.
large<-array(1,c(1000,1000))
small<-array(1,c(100,1000))
for(i in 1:1000){large[i,]<-rnorm(1000,0,3)}
for(i in 1:1000){small[i,]<-rnorm(100,0,3)}}
yy<-array(1,100)
for(i in 1:100){yy[i]<-var(small[i,])}
y1y<-array(1,1000)
for(i in 1:1000){y1y[i]<-var(large[i,])}
mean(yy);mean(y1y);
[1] 8.944
[1] 9.098
This shows that on an average,for 1000 such samples of 1000 Normal numbers,
the variance is higher than that of a 100 samples of 1000 random numbers.
Why is this so?

Can someone please help me out

Thanks.

Regards

~S.


Re: [R] Using files as connections

2003-08-28 Thread Tony Plate

You need to save the connection object returned by file() and then use that 
object in other functions.

You need to change the appropriate lines to the following (at least):

con <- file("c:/data/perry/data.csv",open="r")
  cline <- readLines(con,n=1)
close(con)

(I don't know if more changes are needed to get it working.)

Note that using the connection object in other functions can have side 
effects on the connection object (which is how a connection remembers its 
point in the file.) (Perhaps more accurately, the side effect is on the 
internal system data referred to by the R connection object.)

> con <- textConnection(letters)
> con
     description            class             mode             text
       "letters" "textConnection"              "r"           "text"
          opened         can read        can write
        "opened"            "yes"             "no"
> readLines(con, 1)
[1] "a"
> readLines(con, 1)
[1] "b"
> con.saved <- con
> readLines(con, 1)
[1] "c"
> readLines(con.saved, 1)
[1] "d"
> readLines(con, 1)
[1] "e"
> identical(con, con.saved)
[1] TRUE
> showConnections()
  description class            mode text   isopen   can read can write
3 "letters"   "textConnection" "r"  "text" "opened" "yes"    "no"
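
Putting the pieces together, the sampling loop could be sketched as follows. This is only an illustration under stated assumptions: the temporary file, its 20 lines and the seed are made up so the example is self-contained, and the loop is my reading of what the original code was trying to do, not a tested fix of it. The point is that every readLines() call goes through the same open connection object, so each call continues where the previous one stopped.

```r
# made-up stand-in for the real data file: 20 lines of "number,letter"
tmp <- tempfile(fileext = ".csv")
writeLines(paste(1:20, letters[1:20], sep = ","), tmp)

flines <- 20                      # number of lines in the file
slines <- 5                       # number of lines to sample
set.seed(1)                       # arbitrary seed
selected <- sort(sample(flines, slines))
strvec <- character(slines)

con <- file(tmp, open = "r")      # keep the connection object
isel <- 1
for (iline in seq_len(max(selected))) {
  cline <- readLines(con, n = 1)  # reads the next line each time
  if (iline == selected[isel]) {
    strvec[isel] <- cline
    isel <- isel + 1
  }
}
close(con)                        # close the object, not the file name

tc <- textConnection(strvec)
sel.flows <- read.table(tc, header = FALSE, sep = ",")
close(tc)
unlink(tmp)
```

The loop stops at the largest selected line number, so no lines are read beyond the last one that is needed.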


hope this helps,

Tony Plate

At Thursday 11:19 AM 8/28/2003 +1200, you wrote:
I have been trying to read a random sample of lines from a file into a
data frame using readLines(). The help indicates that readLines() will
start from the current line if the connection is open, but presented with
a closed connection it will open it, start from the beginning, and close
it when finished.
In the code that follows I tried to open the file before reading but
apparently without success, because the result was repeated copies of the
first line:

flines <- 107165
slines <- 100
selected <- sort(sample(flines,slines))
strvec <- rep("",slines)
file("c:/data/perry/data.csv",open="r")
isel <- 0
for (iline in 1:slines) {
  isel <- isel + 1
  cline <- readLines("c:/data/perry/data.csv",n=1)
  if (iline == selected[isel]) strvec[isel] <- cline else
    isel <- isel - 1
}
close("c:/data/perry/data.csv")
sel.flows <- read.table(textConnection(strvec), header=FALSE, sep=",")

There was also an error "no applicable method" for close.

Comments gratefully received.

Murray Jorgensen


Re: Just don't do it, surely? (was RE: [R] Retrieve ... argument values)

2003-09-17 Thread Tony Plate

At Wednesday 11:19 AM 9/17/2003 +0100, Simon Fear wrote:
There have been various elegant solutions to test for the presence
of a particular named parameter within a ... argument, such as
if (!is.null(list(...)$ylim))
if ("ylim" %in% names(list(...)))
I think I'd have to comment these lines pretty clearly if I wanted
to easily follow the code in 6 months time.
But I'm still not convinced it is ever a good idea to use this
technique in preference to using explicit named arguments. If
there is something special about ylim, why insist that it be
passed within  ... in the first place? Surely it's better
to define the function as function(x,ylim=default,...) within which
you do your special ylim stuff, then call plot(x, ylim=ylim,...))??
Can anyone come up with a good reason not to follow
that principle? I think my earlier post may have been
misconstrued: I'm not saying never write functions that use ...,
I'm just saying never write functions that depend on a particular
argument being passed via ...

Several reasons for not following that principle involve proliferation of 
defaults -- if the lower level functions have defaults, then those defaults 
must be repeated at the higher levels.  This is a good reason for not 
following that principle, because it makes software maintenance more 
difficult.  Another reason for not following that principle is that if you 
have several lower level functions with different default values for an 
argument of the same name, it becomes impossible to get the lower-level 
default behavior.

-- Tony Plate


RE: Just don't do it, surely? (was RE: [R] Retrieve ... argument values)

2003-09-17 Thread Tony Plate

Simon, I agree, for some (maybe most) arguments it is good to know what 
defaults are being used.  But there are some for which I really don't want 
to know.  An example of the latter is arguments that control interaction 
with a database.  Suppose I have a low-level interaction function that 
takes an argument 'db.mode', where this specifies a way of interacting with 
the database.  Now, if I also have a higher level function that gets data 
from the database I might write:

db.get.high.level.data <- function(what, ...) {
     processed.what <- what  # do something to 'what' here
     db.get.low.level.data(processed.what, ...)
}
db.get.low.level.data <- function(what, db.mode=2) {
     # fetch the data
}

By using ... arguments I can specify a db.mode argument to the higher level 
function, or just get the default provided in the lower level function.  If 
I then change the lower level function to provide a better mode of 
interaction I can make that mode the default in the lower level function, 
and be confident it will be used everywhere.  But if I specify the defaults 
in both places, then changing defaults becomes a big task.

As for the second point regarding different functions having different 
defaults for an argument of the same name, it can certainly be handled as 
you describe by making different argument names in the higher level function.
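
A minimal sketch of that second approach follows. All of the function and argument names here (draw.box, draw.line, draw.both, cex.box, cex.line) are hypothetical and chosen only for illustration, and the lower-level functions return strings instead of plotting so the effect is easy to see. The wrapper forwards an argument only when the caller supplied it, so the lower-level defaults are still not repeated at the higher level.

```r
# two hypothetical lower-level functions sharing an argument name,
# each with its own default
draw.box  <- function(x, cex = 0.8) sprintf("box  cex=%.1f", cex)
draw.line <- function(x, cex = 1.2) sprintf("line cex=%.1f", cex)

# higher-level wrapper with one distinctly named argument per
# lower-level function; nothing is forwarded unless it was supplied
draw.both <- function(x, cex.box, cex.line) {
  box.args  <- list(x)
  line.args <- list(x)
  if (!missing(cex.box))  box.args$cex  <- cex.box
  if (!missing(cex.line)) line.args$cex <- cex.line
  c(do.call(draw.box, box.args), do.call(draw.line, line.args))
}

res.default  <- draw.both(1)               # both lower-level defaults apply
res.override <- draw.both(1, cex.box = 2)  # only the box setting changes
```

The cost of this design is that the wrapper has to name every argument it wants to expose, which is the trade-off discussed above.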

-- Tony Plate

At Wednesday 05:25 PM 9/17/2003 +0100, Simon Fear wrote:
Tony, I don't understand what you mean. Could you give
an example?
 -Original Message-
 From: Tony Plate [mailto:[EMAIL PROTECTED]

  ... I'm not saying never write functions that use ...,
  I'm just saying never write functions that depend on a particular
  argument being passed via ...

 Several reasons for not following that principle involve proliferation
 of defaults -- if the lower level functions have defaults, then those
 defaults must be repeated at the higher levels.  This is a good reason
 for not following that principle, because it makes software maintenance
 more difficult.

I don't think I agree with that (though maybe I just didn't
get it). I prefer to know what arguments a function is going
to use.

 Another reason for not following that principle is that if you
 have several lower level functions with different default values for an
 argument of the same name, it becomes impossible to get the lower-level
 default behavior.

I'm lost there. When I choose which function to call it has
its own default??

I often call a function of mine called timepoints.summary for which I
want to pass graphical parameters to boxplots, matplots and confidence
interval plots. So I name the arguments cex.boxplot, col.boxplot etc
and then within the function I call boxplot(x, cex=cex.boxplot) and so
on. I wouldn't expect a single argument cex to magically work out
whether it was being used in a boxplot or matplot and change
to a different default??

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 69
Fax: +44 (0) 1379 65
email: [EMAIL PROTECTED]
web: http://www.synequanon.com

Re: AW: [R] Rank and extract data from a series

2003-09-23 Thread Tony Plate

Using Thomas Unternährer's handy example, one could also do:

> X <- c(1, 4.5, 2.3, 1, 7.3)
> mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5

I think this will give the same results as Thomas Unternährer's suggested 
code in almost all cases, but it is perhaps more concise and direct 
(provided that you don't actually need the values of the top items).

(of course you have to change the 1:2 to 1:10 for your needs).

Note that this question gets tricky if there are ties such that there is no 
unique set of row numbers that identify N top items.

For example, consider the following data:

> X <- c(1,3,2,3,4)

Taking top two, should the answer be 3.5 (avg of row numbers 2 and 5), 
4.5 (avg of row numbers 4 and 5), or 3.67 (avg of row numbers 2,4 and 5)?

> mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5
> order(X, decreasing=TRUE)[1:2]
[1] 5 2
> # Andy Liaw's suggestion:
> mean(which(X %in% sort(X, decreasing=TRUE)[1:2]))
[1] 3.67
> which(X %in% sort(X, decreasing=TRUE)[1:2])
[1] 2 4 5
> # Thomas Unternährer's suggestion:
> mean(match(sort(X, decreasing=TRUE)[1:2], X))
[1] 3.5
> match(sort(X, decreasing=TRUE)[1:2], X)
[1] 5 2

hope this helps,

Tony Plate

At Tuesday 02:23 PM 9/23/2003 +0200, Unternährer Thomas, uth wrote:

Hi,

I would like to rank a time-series of data, extract the top ten data 
items from this series, determine the
corresponding row numbers for each value in the sample, and take a mean 
of these *row numbers* (not the data).

I would like to do this in R, rather than pre-process the data on the 
UNIX command line if possible, as I need to calculate other statistics 
for the series.

I understand that I can use 'sort' to order the data, but I am not aware 
of a function in R that would allow me
to extract a given number of these data and then determine their 
positions within the original time series.

e.g.

Time series:

1.0 (row 1)
4.5 (row 2)
2.3 (row 3)
1.0 (row 4)
7.3 (row 5)
Sort would give me:

1.0
1.0
2.3
4.5
7.3
I would then like to extract the top two data items:

4.5
7.3
and determine their positions within the original (unsorted) time series:

4.5 = row 2
7.3 = row 5
then take a mean:

2 and 5 = 3.5

Thanks in advance.

James Brown

X <- c(1, 4.5, 2.3, 1, 7.3)
X1 <- sort(X, decreasing=TRUE)[1:2]
X2 <- match(X1, X)
mean(X2)


Hope this helps

Thomas

___

James Brown

Cambridge Coastal Research Unit (CCRU)
Department of Geography
University of Cambridge
Downing Place
Cambridge
CB2 3EN, UK
Telephone: +44 (0)1223 339776
Mobile: 07929 817546
Fax: +44 (0)1223 355674
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]
http://www.geog.cam.ac.uk/ccru/CCRU.html
___




On Wed, 10 Sep 2003, Jerome Asselin wrote:

 On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
 
  Your method looks like a naive reimplementation of integration, and
  won't work so well for distributions that have the great majority of
  the probability mass concentrated in a small fraction of the sample
  space.  I was hoping for something that would retain the
  adaptability of integrate().

 Yesterday, I've suggested to use approxfun(). Did you consider my
 suggestion? Below is an example.

 N <- 500
 x <- rexp(N)
 y <- rank(x)/(N+1)
 empCDF <- approxfun(x,y)
 xvals <- seq(0,4,.01)
 plot(xvals,empCDF(xvals),type="l",
      xlab="Quantile",ylab="Cumulative Distribution Function")
 lines(xvals,pexp(xvals),lty=2)
 legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)


 It's possible to tune in some parameters in approxfun() to better
 match your personal preferences. Have a look at help(approxfun) for
 details.

 HTH,
 Jerome Asselin

Tony Plate   [EMAIL PROTECTED]


Re: [R] confusion about what to expect?

2003-09-23 Thread Tony Plate

Have you investigated the drop= argument to "["? (as in the expression 
testdata[,2,drop=F], which will return a dataframe).

"[.data.frame" has somewhat different behavior from "[" on matrices with 
respect to the drop argument: If the result would be a dataframe with a 
single column, the default behavior of "[.data.frame" is to return a vector 
(return a dataframe always if drop=F), but if the result would be a 
dataframe with a single row, the default behavior is to return a dataframe 
(return a list if drop=T).

E.g.:
> class(data.frame(a=1:3,b=4:6)[,1])
[1] "integer"
> class(data.frame(a=1:3,b=4:6)[,1,drop=F])
[1] "data.frame"
> class(data.frame(a=1:3,b=4:6)[1,])
[1] "data.frame"
> class(data.frame(a=1:3,b=4:6)[1,,drop=T])
[1] "list"

The default behavior is often what you want, but when it isn't it can be 
confusing, especially as it's not that easy to find documentation for this (at 
least not in a quick look through the FAQ, ?"[", and "An Introduction to R" 
-- please excuse me if I overlooked something.)

The thing you have going on with names(testdata[...]) is merely a 
consequence of whether or not the result of the subsetting operation is a 
dataframe or a vector.

hope this helps,

Tony Plate

At Tuesday 04:08 PM 9/23/2003 -0700, you wrote:

In playing around with data.frames (and wanting a simple, cheap way to
use the variable and case names in plots; but I've solved that with
some hacks, yech), I noticed the following behavior with subsetting.

testdata <- data.frame(matrix(1:20,nrow=4,ncol=5))
names(testdata) ## expect labels, get them
names(testdata[2,]) ## expect labels, get them
names(testdata[,2]) ## expect labels, but NOT --  STRIPPED OFF??
testdata[,2]  ## would have expected a name (X2) in the front? NOT EXPECTED
testdata[2,]  ## get what I expect
testdata[2,2]  ## just a number, not a sub-data.frame? unexpected
testdata[2,2:3] ## this is a data.frame
testdata[2:3,2:3] ## and this is, too.

> version
         _
platform i386-pc-linux-gnu
arch     i386
os       linux-gnu
system   i386, linux-gnu
status   alpha
major    1
minor    8.0
year     2003
month    09
day      20
language R

I don't have 1.7.1 handy at this location to test, but I would've
expected a data.frame-like object upon subsetting; should I have
expected otherwise?  (granted, a data.frame with just a single
variable could be thought of as silly, but it does have some extra
information that might be worthwhile, on occasion?)

I'm not sure that it is a bug, but I was caught by surprise.  If it
isn't a bug, and someone has a concise way to think through this, for
my future reference, I'd appreciate hearing about it.
best,
-tony
--
[EMAIL PROTECTED]http://www.analytics.washington.edu/
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN  Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email

Re: [R] Why does a[which(b == c[d])] not work?

2003-10-08 Thread Tony Plate

At Wednesday 03:06 PM 10/8/2003 +0200, Martin Maechler wrote:
Your question has been answered by Achim and Peter Dalgaard (at least).

Just a note:

Using
   a[which(logic)]
looks like a clumsy and inefficient way of writing
   a[ logic ]
and I think you shouldn't propagate its use ...

What then is the recommended way of treating an NA in the logical subset as 
a FALSE? (Or were you just talking about the given example, which didn't 
have this issue?  However, your admonition seemed more general.)

As in:
> x <- 1:4
> y <- c(1,2,NA,4)
> x[y %% 2 == 0]
[1]  2 NA  4
> x[which(y %% 2 == 0)]
[1] 2 4

Sometimes one might want the first result, but more usually, I want the 
second, and using which() seems a convenient way to get it.
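
For comparison, here is a small sketch of the same data subset in several ways. The last two forms are my own additions and are not proposed anywhere in the thread. They avoid which() by making the NA handling explicit in the logical vector itself.

```r
x <- 1:4
y <- c(1, 2, NA, 4)
cond <- y %% 2 == 0                  # FALSE TRUE NA TRUE

with.na  <- x[cond]                  # an NA in the index gives an NA element
by.which <- x[which(cond)]           # which() keeps only the TRUE positions
by.isna  <- x[!is.na(cond) & cond]   # explicit NA test, stays a logical index
by.true  <- x[cond %in% TRUE]        # %in% never returns NA
```

All three NA-dropping forms select the same elements here; which one reads best is largely a matter of taste.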

-- Tony Plate

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: AW: [R] Getting rows from a dataframe

2003-10-09 Thread Tony Plate

If you're having so much trouble, perhaps it's because you want to get a 
vector result?  This requires a little more, and if so, perhaps one of the 
following provides what you are looking for:

> x <- data.frame(a=1:3,b=4:6)
> # row as a data frame
> x[2,]
  a b
2 2 5
> # row as a list
> x[2,,drop=T]
$a
[1] 2

$b
[1] 5

> # row as a vector
> unlist(x[2,,drop=T])
a b
2 5
> # row as a vector again
> unlist(x[2,])
a b
2 5
> # row as a matrix (if x contains any non-numeric columns, this will be a
> # character matrix)
> as.matrix(x[2,])
  a b
2 2 5
> # row as a vector (if x contains any non-numeric columns, this will be a
> # character vector)
> as.matrix(x)[2,]
a b
2 5
> # row as a numeric matrix (non-numeric columns in x will be converted to
> # numeric data, see ?data.matrix for how)
> data.matrix(x[2,])
  a b
2 2 5


Tony Plate

At Thursday 05:40 PM 10/9/2003 +0100, Mark Lee wrote:
I have this right on the desk in front of me. I have gone through most
of this actually and have been looking for the answer for several
weeks now before resorting to this. The only reference I've found to
this is on page 20 under array indexing but didn't see the relation to
dataframes. Thanks,
Mark

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] Subseting in a 3D array

2003-10-15 Thread Tony Plate

One way would be:

 apply(ib5km.lincol.random[1:3,], 1, function(i) ib5km15.dbc[i[1],i[2],])

(untested)
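
Since the line above is marked untested, here is a small self-contained check on made-up data. The array arr and the position matrix pos are stand-ins I invented for ib5km15.dbc and ib5km.lincol.random. The sketch also shows an alternative that uses a three-column index matrix, which is my own addition and not part of the reply.

```r
# made-up stand-ins: a 4 x 5 x 3 array and three (row, col) positions
arr <- array(seq_len(4 * 5 * 3), dim = c(4, 5, 3))
pos <- rbind(c(2, 3), c(4, 1), c(1, 5))

# the apply() approach: one column of z values per position
z.apply <- apply(pos, 1, function(i) arr[i[1], i[2], ])

# alternative: build a (row, col, layer) index matrix and subset once
nz  <- dim(arr)[3]
idx <- cbind(rep(pos[, 1], each = nz),
             rep(pos[, 2], each = nz),
             rep(seq_len(nz), times = nrow(pos)))
z.index <- matrix(arr[idx], nrow = nz)
```

Both results are matrices with one row per z level and one column per position, so z.apply[, 1] holds the z values at the first position.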

-- Tony Plate

At Wednesday 06:47 PM 10/15/2003 +0200, Agustin Lobo wrote:

Hi!

I have a 3d array:
> dim(ib5km15.dbc)
[1] 190 241  19

and a set of positions to extract:

> ib5km.lincol.random[1:3,]
     [,1] [,2]
[1,]   78   70
[2,]   29  213
[3,]  180   22

Getting the values of a 2D array
for that set of positions would
be:

> ima <- ib5km15.dbc[,,1]
> ima[ib5km.lincol.random[1:10,]]

but I can't find the way for the case
of the 3D array:

> ib5km15.dbc[ib5km.lincol.random[1:10,],]
Error in ib5km15.dbc[ib5km.lincol.random[1:10, ], ] :
        incorrect number of dimensions

Could anyone suggest the way of subsetting
the 3D array to get a vector of z values
for each position recorded in ib5km.lincol.random?
(avoiding the use of for loops).
Thanks

Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
[EMAIL PROTECTED]

Re: [R] datetime data and plotting

2003-10-17 Thread Tony Plate

At Friday 02:20 PM 10/17/2003 -0400, Gabor Grothendieck wrote:
[material deleted]
Time zones are not part of the problem yet POSIXt forces this
extraneous complication on you.  chron has no time zones in the
first place and therefore allows you to work in the natural frame
of the problem, avoiding subtle problems like this.
This sort of thing has been discussed a number of times and I
had previously suggested that chron be moved to the base or else that
a timezone-less version of POSIXt be added to the base.  See:
https://stat.ethz.ch/pipermail/r-devel/2003-August/027269.html

I also see the usefulness of a time-zone-free time/date class, but why
does chron need to be moved to the base to be useful here?

-- Tony Plate


Re: [R] do.call() and aperm()

2003-10-21 Thread Tony Plate

I've also been thinking about how to specify that 'along' should be 
length(dim)+1.  At the moment one can specify any number from 0 up to 
length(dim)+1, but as you point out you have to spell out length(dim)+1 as 
the value for the along argument.  It would be possible to make abind() 
automatically calculate along=length(dim)+1 when given along=NA, or 
along=-1, or along=+1.  Any preferences?

-- Tony Plate

At Tuesday 04:48 PM 10/21/2003 +0100, Robin Hankin wrote:
Hi everyone

I've been playing with do.call() but I'm having problems understanding it.

I have a list of n elements, each one of which is d dimensional
[actually an n-by-n-by ... by-n array].  Neither n nor d is known in
advance.  I want to bind the elements together in a higher-dimensional
array.
Toy example follows with d=n=3.

f <- function(n){array(n,c(3,3,3))}
x <- sapply(1:3,f,simplify=FALSE)

Then what I want is

ans <- abind(x[[1]] , x[[2]] , x[[3]]  , along=4)

[abind() is defined in library(abind)].

Note that dim(ans) is c(3,3,3,3), as required.

PROBLEM: how do I tell do.call() that I want to give abind() the
extra argument along=4 (in general, I want
along=length(dim(x[[1]]))+1)?

Oblig Attempt:

jj <- function(...){abind(... , along=4)}
do.call("jj" , x)

This works, because I know that d=3 (and therefore use along=4), but
it doesn't generalize easily to arbitrary d.  I'm clearly missing
something basic.  Anyone?

Re: [R] do.call() and aperm()

2003-10-21 Thread Tony Plate

 do.call("abind", c(list.of.arrays, list(along=4)))

This reminds me that I had been meaning to submit an enhancement of abind() 
that allows the first argument to be a list of arrays so that you could 
simply do abind(list.of.arrays, along=4), as I find this is a very common 
pattern.
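
The idiom itself does not depend on abind(). The sketch below uses base R's cbind() as a stand-in so that it runs without extra packages. The list contents and the choice of deparse.level as the named argument are illustrative assumptions. The commented-out line shows how the same pattern would look with abind() and a computed along value, and is not run here.

```r
# three 2 x 2 matrices held in a list, as a stand-in for list.of.arrays
list.of.arrays <- lapply(1:3, function(n) matrix(n, 2, 2))

# number of dimensions, taken from the first element, not hard-coded
d <- length(dim(list.of.arrays[[1]]))

# c() appends the named argument to the positional ones, so do.call()
# ends up calling cbind(m1, m2, m3, deparse.level = 0)
wide <- do.call("cbind", c(list.of.arrays, list(deparse.level = 0)))

# same pattern with abind(); needs the abind package, so left as a comment:
# ans <- do.call("abind", c(list.of.arrays, list(along = d + 1)))
```

Because the extra argument is named inside the list, it is matched by name no matter how many arrays precede it.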

-- Tony Plate

At Tuesday 04:48 PM 10/21/2003 +0100, Robin Hankin wrote:
Hi everyone

I've been playing with do.call() but I'm having problems understanding it.

I have a list of n elements, each one of which is d dimensional
[actually an n-by-n-by ... by-n array].  Neither n nor d is known in
advance.  I want to bind the elements together in a higher-dimensional
array.
Toy example follows with d=n=3.

f <- function(n){array(n,c(3,3,3))}
x <- sapply(1:3,f,simplify=FALSE)

Then what I want is

ans <- abind(x[[1]] , x[[2]] , x[[3]]  , along=4)

[abind() is defined in library(abind)].

Note that dim(ans) is c(3,3,3,3), as required.

PROBLEM: how do I tell do.call() that I want to give abind() the
extra argument along=4 (in general, I want
along=length(dim(x[[1]]))+1)?

Oblig Attempt:

jj <- function(...){abind(... , along=4)}
do.call("jj" , x)

This works, because I know that d=3 (and therefore use along=4), but
it doesn't generalize easily to arbitrary d.  I'm clearly missing
something basic.  Anyone?

Re: [R] do.call() and aperm()

2003-10-21 Thread Tony Plate

Thanks, I appreciate knowing that.

abind() can currently take a fractional value for along, and behaves as per 
your description of 'catenation' in APL.

Does APL supply any hints as to what sort of value to give 'along' to tell 
abind() to perform 'lamination'?

-- Tony Plate

At Tuesday 01:22 PM 10/21/2003 -0400, Gabor Grothendieck wrote:


I suggest following APL as that is a well thought out system.
In APL terms there are two operations here called:
- catenation. In abind, this occurs when along = 1,2,...,length(dim)
- lamination.  In abind, this occurs when along = length(dim) + 1
however, the latter is really only one case of lamination in
which the added dimension comes at the end.  To do it in full
generality would require that one can add the new dimension
at any spot including before the first, between the first and
the second, ..., after the last.
In APL notation, if along has a fractional part then the new
dimension is placed between floor(along) and ceiling(along).
Thus along=1.1 would put the new dimension between the first
and second.  The actual value of the fractional part is not material.
---
From: Tony Plate [EMAIL PROTECTED]
I've also been thinking about how to specify that 'along' should be
length(dim)+1. At the moment one can specify any number from 0 up to
length(dim)+1, but as you point out you have to spell out length(dim)+1 as
the value for the along argument. It would be possible to make abind()
automatically calculate along=length(dim)+1 when given along=NA, or
along=-1, or along=+1. Any preferences?
-- Tony Plate




[R] what's going on here with substitute() ?

2003-10-23 Thread Tony Plate

I was trying to create a function with a value computed at creation time, 
using substitute(), but I got results I don't understand:

> this.is.R
Error: Object "this.is.R" not found
> substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]])))
this.is.R <- function() TRUE
> # the above expression as printed is what I want for the function definition
> eval(substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]]))))
> this.is.R
function() X
> this.is.R()
[1] TRUE
> X
Error: Object "X" not found
> rm(this.is.R)
> # Try again a slightly different way
> substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]])))
this.is.R <- function() TRUE
> .Last.value
this.is.R <- function() TRUE
> eval(.Last.value)
> this.is.R
function() X
> this.is.R()
[1] TRUE
> rm(this.is.R)

Why is the body of the function X when I substituted a different 
expression for X? Also, given that the body of the function is X, how does 
the function evaluate to TRUE since X is not defined anywhere (except in a 
list that should have been discarded.)
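
One way to look at what the function actually contains, independently of how it prints, is to inspect its body directly. The sketch below is my own and uses a fixed expression in place of the options() test so that the result does not depend on the session. A plausible explanation, which is an assumption on my part and is not confirmed anywhere in this message, is that R keeps the source text of a function definition and shows that text when the function is printed, while the evaluated body holds the substituted value.

```r
# build a function whose body is a value computed at creation time
made <- eval(substitute(function() X, list(X = (1 + 1 == 2))))

made.body   <- body(made)      # the substituted value, not the symbol X
made.result <- made()          # calling it returns that value
body.text   <- deparse(made.body)

# to print from the parsed code instead of any retained source text:
# print(made, useSource = FALSE)
```

If body() shows the substituted value, the function is behaving as intended and only its printed form is misleading.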

This happens with both R 1.7.1 and R 1.8.0 (under Windows 2000).

(yes, I did discover the function is.R(), but I still want to discover 
what's going on here.)

-- Tony Plate

PS.  In S-plus 6.1, things worked as I had expected:

> substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]])))
this.is.R <- function()
F
> eval(substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]]))))
function()
F
> this.is.R
function()
F
> this.is.R()
[1] F


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

1 2 >

1 - 100 of 186 matches

Mail list logo