from:"jim holtman"

[R] POSIXct dates on x-axis using xyplot

2007-09-10 Thread jim holtman

I am using 'xyplot' in lattice to plot some data where the x-axis is a
POSIXct date.  I have data which spans a 6 month period, but when I
plot it, only the last month is printed on the right hand side of the
axis.  I would have expected that at least I would have a beginning
and an ending point so that I have a point of reference as to the time
that the data spans.  Here is some test data.


> # create test data
> dates <- seq(as.POSIXct('2006-01-03'), as.POSIXct('2006-06-26'), by='1 week')
> my.data <- seq(1, length=length(dates))
> require(lattice)
[1] TRUE
> # plot only shows a single month (Jul on the right).  Would have
> # expected at least the beginning and the ending month since this spans
> # a 6 month period
> pdf('/test.pdf')
> xyplot(my.data ~ dates)
> dev.off()
windows
      2
> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base

other attached packages:
 lattice
"0.16-5"
> Sys.info()
                      sysname                       release
                    "Windows"                      "NT 5.1"
                      version                      nodename
"(build 2600) Service Pack 2"                  "JIM-LAPTOP"
                      machine                         login
                        "x86"                 "jim holtman"
                         user
                "jim holtman"
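
One possible workaround, offered as a sketch rather than an explanation of the default behaviour: pass explicit tick positions and labels through the 'scales' argument of xyplot.  The monthly ticks, the label format and the UTC time zone are assumptions made for this example, not part of the original post.

```r
library(lattice)

# Weekly timestamps spanning roughly six months (same shape as the test data)
dates <- seq(as.POSIXct("2006-01-03", tz = "UTC"),
             as.POSIXct("2006-06-26", tz = "UTC"), by = "1 week")
my.data <- seq_along(dates)

# Assumed choice for illustration: one tick at the start of each month
ticks <- seq(as.POSIXct("2006-01-01", tz = "UTC"),
             as.POSIXct("2006-07-01", tz = "UTC"), by = "1 month")

# Tick positions are given as plain numbers, labels as formatted strings
p <- xyplot(my.data ~ dates,
            scales = list(x = list(at = as.numeric(ticks),
                                   labels = format(ticks, "%b %Y"))))
# print(p) draws the plot on the active graphics device
```

Ticks that fall outside the plotted range are simply not drawn, so it does no harm to generate a few more than needed.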



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fitdistr()

2007-09-09 Thread jim holtman

I assume that you want to do the fitdistr on one of the columns of the
dataframe that you have read in. What does 'str(ONES3)' show?  If the
data is in the first column, try:

fitdistr(ONES3[[1]], "chi-squared")
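
A caveat when trying this: as far as I know, fitdistr does not work out a starting value on its own for the chi-squared family, so a 'start' list has to be supplied.  A minimal sketch on simulated data follows; the data, the seed and the starting value are all made up for illustration.

```r
library(MASS)

set.seed(42)
x <- rchisq(200, df = 4)   # stand-in for the numeric column ONES3[[1]]

# 'start' must be a named list; 'lower' keeps df positive during the optimization
fit <- fitdistr(x, "chi-squared", start = list(df = 2), lower = 0.01)
print(fit)
```

The key point for the original error is that the first argument is a numeric vector (one column), not the whole data frame.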


On 9/9/07, Terence Broderick [EMAIL PROTECTED] wrote:
 I am trying to fit the chi-squared distribution to a set of data using the
 fitdistr function found in the MASS library. The data set is called ONES3; I
 have loaded it using the command

  ONES3 <- read.table("ONES3.pdf", header=TRUE, na="NA")

  I print out the dataset ONES3 to the screen to make sure it has loaded.

  Then I try to fit this data using the command fitdistr

   fitdistr(ONES3, "chi-squared")

  and it returns the comment

  Error in fitdistr(ONES3, "chi-squared") : 'x' must be a non-empty numeric
 vector

  Can anybody help with this? I imagine it is a common mistake for beginners
 like myself.



 audaces fortuna iuvat


Re: [R] R first.id last.id function error

2007-09-07 Thread jim holtman

This function should do it for you:


> file1 <- read.table(textConnection("   id rx week dv1
+ 1   1  1    1   1
+ 2   1  1    2   1
+ 3   1  1    3   2
+ 4   2  1    1   3
+ 5   2  1    2   4
+ 6   2  1    3   1
+ 7   3  1    1   2
+ 8   3  1    2   3
+ 9   3  1    3   4
+ 10  4  1    1   2
+ 11  4  1    2   6
+ 12  4  1    3   5
+ 13  5  2    1   7
+ 14  5  2    2   8
+ 15  5  2    3   5
+ 16  6  2    1   2
+ 17  6  2    2   4
+ 18  6  2    3   6
+ 19  7  2    1   7
+ 20  7  2    2   8
+ 21  8  2    1   9
+ 22  9  2    1   4
+ 23  9  2    2   5"), header=TRUE)
>
> mark.function <-
+ function(df){
+     df <- df[order(df$id, df$week),]
+     # create 'diff' of 'id' to determine where the breaks are
+     breaks <- diff(df$id)
+     # the first entry will be TRUE, and then every occurrence of non-zero in breaks
+     df$first.id <- c(TRUE, breaks != 0)
+     # the last entry is TRUE and every non-zero breaks
+     df$last.id <- c(breaks != 0, TRUE)
+     df
+ }
>
> mark.function(file1)
   id rx week dv1 first.id last.id
1   1  1    1   1     TRUE   FALSE
2   1  1    2   1    FALSE   FALSE
3   1  1    3   2    FALSE    TRUE
4   2  1    1   3     TRUE   FALSE
5   2  1    2   4    FALSE   FALSE
6   2  1    3   1    FALSE    TRUE
7   3  1    1   2     TRUE   FALSE
8   3  1    2   3    FALSE   FALSE
9   3  1    3   4    FALSE    TRUE
10  4  1    1   2     TRUE   FALSE
11  4  1    2   6    FALSE   FALSE
12  4  1    3   5    FALSE    TRUE
13  5  2    1   7     TRUE   FALSE
14  5  2    2   8    FALSE   FALSE
15  5  2    3   5    FALSE    TRUE
16  6  2    1   2     TRUE   FALSE
17  6  2    2   4    FALSE   FALSE
18  6  2    3   6    FALSE    TRUE
19  7  2    1   7     TRUE   FALSE
20  7  2    2   8    FALSE    TRUE
21  8  2    1   9     TRUE    TRUE
22  9  2    1   4     TRUE   FALSE
23  9  2    2   5    FALSE    TRUE
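
The same flags can also be produced with 'duplicated', which avoids the 'diff' step.  This is an alternative sketch on a small made-up data frame, not the method used above.

```r
# Hypothetical data: three ids with 2, 1 and 3 records
df <- data.frame(id = c(1, 1, 2, 3, 3, 3), week = c(1, 2, 1, 1, 2, 3))
df <- df[order(df$id, df$week), ]

df$first.id <- !duplicated(df$id)                   # first row seen for each id
df$last.id  <- !duplicated(df$id, fromLast = TRUE)  # last row seen for each id
print(df)
```

An id with a single record (id 2 here) comes out TRUE in both columns, as in row 21 of the output above.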




On 9/7/07, Gerard Smits [EMAIL PROTECTED] wrote:
 Hi R users,

 I have a test dataframe (file1, shown below) for which I am trying
 to create a flag for the first and last ID record (equivalent to SAS
 first.id and last.id variables).

 Dump of file1:

   > file1
    id rx week dv1
 1   1  1    1   1
 2   1  1    2   1
 3   1  1    3   2
 4   2  1    1   3
 5   2  1    2   4
 6   2  1    3   1
 7   3  1    1   2
 8   3  1    2   3
 9   3  1    3   4
 10  4  1    1   2
 11  4  1    2   6
 12  4  1    3   5
 13  5  2    1   7
 14  5  2    2   8
 15  5  2    3   5
 16  6  2    1   2
 17  6  2    2   4
 18  6  2    3   6
 19  7  2    1   7
 20  7  2    2   8
 21  8  2    1   9
 22  9  2    1   4
 23  9  2    2   5

 I have written code that correctly assigns the first.id and last.id variables:

 require(Hmisc)  #for Lags
 #ascending order to define first dot
 file1 <- file1[order(file1$id, file1$week),]
 file1$first.id <- (Lag(file1$id) != file1$id)
 file1$first.id[1] <- TRUE  #force NA to TRUE

 #descending order to define last dot
 file1 <- file1[order(-file1$id,-file1$week),]
 file1$last.id  <- (Lag(file1$id) != file1$id)
 file1$last.id[1] <- TRUE   #force NA to TRUE

 #resort to original order
 file1 <- file1[order(file1$id,file1$week),]



 I am now trying to get the above code to work as a function, and am
 clearly doing something wrong:

   > first.last <- function (df, idvar, sortvars1, sortvars2)
 +   {
 +   #sort in ascending order to define first dot
 +   df <- df[order(sortvars1),]
 +   df$first.idvar <- (Lag(df$idvar) != df$idvar)
 +   #force first record NA to TRUE
 +   df$first.idvar[1] <- TRUE
 +
 +   #sort in descending order to define last dot
 +   df <- df[order(-sortvars2),]
 +   df$last.idvar  <- (Lag(df$idvar) != df$idvar)
 +   #force last record NA to TRUE
 +   df$last.idvar[1] <- TRUE
 +
 +   #resort to original order
 +   df <- df[order(sortvars1),]
 +   }
   >

 Function call:

   > first.last(df=file1, idvar=file1$id,
 sortvars1=c(file1$id,file1$week), sortvars2=c(-file1$id,-file1$week))

 R Error:

 Error in as.vector(x, mode) : invalid argument 'mode'
   >

 I am not sure about the passing of the sort strings.  Perhaps this is
 where things are off.  Any help greatly appreciated.

 Thanks,

 Gerard

Re: [R] Plotting lines to sets of points

2007-09-07 Thread jim holtman

?segments
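
A short sketch of how 'segments' could be used for this.  The hit locations and the colour mapping are invented, and home plate is assumed to be the lowest corner of the diamond drawn in the question, at (125, -210).

```r
# Hypothetical hit locations; in the question these come from
# framename$hit_x and framename$hit_y
hits <- data.frame(hit_x    = c(60, 125, 190, 140),
                   hit_y    = c(120, 60, 110, 150),
                   hit_traj = c("line_drive", "fly_ball", "line_drive", "ground_ball"))

# One colour per trajectory type (an arbitrary mapping chosen for this sketch)
traj.col <- c(line_drive = "red", fly_ball = "blue", ground_ball = "darkgreen")
cols <- unname(traj.col[as.character(hits$hit_traj)])

home <- c(x = 125, y = -210)   # assumed position of home plate

pdf(NULL)   # null device so the sketch runs without opening a window
plot(0:250, -250:0, type = "n")
lines(c(125, 150, 125, 100, 125), c(-210, -180, -150, -180, -210), col = "black")
points(hits$hit_x, -hits$hit_y, pch = 20, col = cols)
# segments() is vectorised: one line from home plate to every hit,
# coloured the same way as the corresponding point
segments(home["x"], home["y"], hits$hit_x, -hits$hit_y, col = cols)
dev.off()
```

Because the starting point is recycled, a single call draws all the lines; no loop over the hits is needed.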

On 9/7/07, lawnboy34 [EMAIL PROTECTED] wrote:

 I am using R to plot baseball spray charts from play-by-play data. I have
 used the following command to plot the diamond:

 plot(0:250, -250:0, type="n", bg="white")
 lines(c(125,150,125,100,125), c(-210,-180,-150,-180,-210),
 col=c("black"))

 I have also plotted different hit locations using commands such as the
 following:

 points(subset(framename$hit_x, framename$hit_traj=="line_drive"),
 subset(-framename$hit_y, framename$hit_traj=="line_drive"), pch=20,
 col=c("red"))

 My question: Is there any easy way to plot a line from the origin (home
 plate) to each point on the graph? Preferably the line would share the same
 color as the dot that denotes where the ball landed. I have tried searching
 Google and these forums, and most graphing questions have to do with
 scatterplots or other varieties of graphs I am not using. Thanks very much
 in advance.

 -Jason
 --
 View this message in context: 
 http://www.nabble.com/Plotting-lines-to-sets-of-points-tf4404235.html#a12564704
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] remove particular elements in a vector

2007-09-07 Thread jim holtman

x <- answer(100)
x <- x[!is.na(x)]  # remove NAs

On 9/7/07, kevinchang [EMAIL PROTECTED] wrote:

 Hi,

 Is there any built-in function allowing us to remove a particular group of
 elements in a vector?

 For example, I want to remove all the NAs in the output of the answer
 function. Please help. Thanks

  answer(100)
  [1]  1  2 NA  4 NA NA  7  8 NA NA 11 NA 13 14 NA 16 17 NA 19 NA NA 22 23
 NA NA
  [26] 26 NA 28 29 NA 31 32 NA 34 NA NA 37 38 NA NA 41 NA 43 44 NA 46 47 NA
 49 NA
  [51] NA 52 53 NA NA 56 NA 58 59 NA 61 62 NA 64 NA NA 67 68 NA NA 71 NA 73
 74 NA
  [76] 76 77 NA 79 NA NA 82 83 NA NA 86 NA 88 89 NA 91 92 NA 94 NA NA 97 98
 NA NA

 --
 View this message in context: 
 http://www.nabble.com/remove-particular-elements-in-a-vector-tf4404489.html#a12565480
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] Problem with the aggregate command

2007-09-07 Thread jim holtman

Your 'lst' is not the same length as the number of rows in either set1
or set2; each 'by' vector must have one entry per row.  If one of your
columns in the dataframe is the year, then you should have:

aggregate(set1, list(year = set1$year), median)
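
To make the length requirement concrete, here is a small sketch with an invented three-year panel; the column names v1 and v2 are placeholders.

```r
# Hypothetical panel: 3 years, 4 rows per year, two numeric columns
set1 <- data.frame(year = rep(1991:1993, each = 4),
                   v1   = c(1, 2, 3, 4, 10, 20, 30, 40, 5, 5, 5, 5),
                   v2   = 1:12)

# 'by' is a list whose vector has one entry per ROW of set1,
# not one entry per distinct year
y1 <- aggregate(set1[c("v1", "v2")], by = list(year = set1$year), FUN = median)
print(y1)
```

Using list(unique(year)) instead would give a grouping vector of length 3 against 12 rows, which is what triggers the "arguments must have same length" error.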

On 9/7/07, Anup Nandialath [EMAIL PROTECTED] wrote:
 Dear friends,

 I have a data set with 23 columns and 38000 rows. It is a panel running from 
 the years 1991 through 2005. I want to aggregate the data and get the medians 
 of each of the 23 columns for each of the years. In other words my output 
 should be like this

 Year Median

 1991  123
 1992  145
 1993  132

 etc.

 The sample lines of code to do this operation are

 set1 <- subset(as.data.frame(dataset),rep1==1)
 set2 <- subset(as.data.frame(dataset),rep1==0)
 lst <- list(unique(yeara))

 y1 <- aggregate(set1,lst,median)
 y2 <- aggregate(set2,lst,median)

 However I'm getting an error as follows
 Error in FUN(X[[1]], ...) : arguments must have same length

 Can somebody please help me with what I'm doing wrong here?

 Thanks in advance
 Regards

 Anup





Re: [R] list element to matrix

2007-09-05 Thread jim holtman

If they are already a matrix in the list, then you don't have to use
'as.matrix'; you can just say:

M1 <- D[[1]]

Now the question is, what do you mean by how do you index M1?  Do you
want to go through the list applying a function to each matrix?  If
so, then just 'lapply'.  For example, to get the column means, you
would do:

mean.list <- lapply(D, colMeans)
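
A small sketch of both ways of walking the list, using a made-up list of two matrices:

```r
# Hypothetical list holding matrices of different sizes
D <- list(matrix(1:4, nrow = 2), matrix(1:9, nrow = 3))

# Visit every element by position, however many there are
for (i in seq_along(D)) {
    M <- D[[i]]
    cat("matrix", i, "is", nrow(M), "x", ncol(M), "\n")
}

# Or apply one function to each matrix and collect the results in a list
mean.list <- lapply(D, colMeans)
```

Keeping the matrices inside the list and indexing with D[[i]] is usually easier than creating separate objects M1, M2, ... for an unknown number of entries.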

Can you explain in a little more detail the problem you are trying to solve?

On 9/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I have created a list of matrices using sapply or lapply and wish to 
 extract each of the matrices as a matrix.  Some of them are 2x2, 3x3, etc.

 I can do this one at a time as:

 M1 <- as.matrix(D[[1]])

 How can I repeat this process for an unknown number of entries in the list?  In
 other words, how shall I index M1?

 Diana


Re: [R] writing elements in list as a data frame

2007-09-05 Thread jim holtman

Try this:

> sls <- list(a=matrix(sample(10), ncol=2, dimnames=list(NULL, c('x', 'y'))),
+ b=matrix(sample(16), ncol=2, dimnames=list(NULL, c('x', 'y'))))
> sls
$a
     x  y
[1,] 8  2
[2,] 9 10
[3,] 4  1
[4,] 5  7
[5,] 3  6

$b
      x  y
[1,]  4 14
[2,]  3 15
[3,] 16  5
[4,]  1  9
[5,]  8  7
[6,] 10  2
[7,] 12 13
[8,] 11  6

> # create output matrix
> do.call('rbind', lapply(names(sls), function(.name){
+ data.frame(sls[[.name]], Name=.name)
+ }))
    x  y Name
1   8  2    a
2   9 10    a
3   4  1    a
4   5  7    a
5   3  6    a
6   4 14    b
7   3 15    b
8  16  5    b
9   1  9    b
10  8  7    b
11 10  2    b
12 12 13    b
13 11  6    b




On 9/5/07, Srinivas Iyyer [EMAIL PROTECTED] wrote:
 Dear R-helpers,
 Lists in R are a stumbling block for me.

 I kindly ask you to help me write a
 data-frame.

 I have a list of lists.

  sls[1:2]
 $Andromeda_maya1
   x   y
 [1,] 369 103
 [2,] 382 265
 [3,] 317 471
 [4,] 169 465
 [5,] 577 333

 $Andromeda_maya2
x   y
  [1,] 173 507
  [2,] 540 395
  [3,] 268 143
  [4,] 346 175
  [5,] 489  91

 I want to be able to write a data.frame like the
 following:
 X   Y Name
 369 103  Andromeda_maya1
 382 265  Andromeda_maya1
 317 471  Andromeda_maya1
 169 465  Andromeda_maya1
 577 333  Andromeda_maya1
 173 507  Andromeda_maya2
 540 395  Andromeda_maya2
 268 143  Andromeda_maya2
 346 175  Andromeda_maya2
 489  91  Andromeda_maya2

 Is there a way to convert this list-of-list into a
 data.frame.

 Thanks
 srini


Re: [R] change all . to 0 in a data.frame

2007-09-05 Thread jim holtman

Here is one way.  You might want to read in the data with 'as.is=TRUE'
to prevent conversion to factors.

> x <- data.frame(a=c(1,2,3,'.',5,'.'))
> str(x)
'data.frame':   6 obs. of  1 variable:
 $ a: Factor w/ 5 levels ".","1","2","3",..: 2 3 4 1 5 1
> # replace '.' with zero; either read in with 'as.is=TRUE' or convert to character
> x$a <- as.character(x$a)
> x$a[x$a == '.'] <- '0'
> x$a <- as.numeric(x$a)
> str(x)
'data.frame':   6 obs. of  1 variable:
 $ a: num  1 2 3 0 5 0
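
Another option, sketched here on invented input, is to let the reader treat "." as missing so that the columns come in as numbers, and then fill in the zeros afterwards.  Note the assumption that the file has no genuine missing values, since those would be turned into 0 as well.

```r
# Hypothetical tab-delimited input in which "." stands for zero
txt <- "id\tcount\n1\t5\n2\t.\n3\t7\n"

# Treat "." as missing while reading, so the column is parsed as numeric ...
mydata <- read.delim(text = txt, na.strings = ".")
# ... then turn the missing values into zeros
mydata[is.na(mydata)] <- 0
str(mydata)
```

This handles every column in one step instead of converting each factor by hand.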




On 9/5/07, Dieter Best [EMAIL PROTECTED] wrote:
 Hello,
  I read in a tab delimited text file via mydata = read.delim(myfile). The
 text file was originally an Excel file where "." was used in place of 0. Now
 all the columns which should be integers are factors. Any ideas how to change
 all the "." to 0 and factors back to integer?
  Thanks a lot in advance for any suggestions,
  -- D



Re: [R] Confusion using functions to access the function call stack example section

2007-09-04 Thread jim holtman

It is because you have a recursive function call and the value of 'y'
when you print it is 0.  I have added another statement that might
help clarify what you are seeing.  At the point at which the most
current value of the function 'ggg' is evaluated (last call), the
value of 'y' is zero and you are 5 levels down from the 'main frame':

> gg <- function(y) {
+    cat("gg y=", y, "current frame =", sys.nframe(), "\n")
+    ggg <- function() {
+        cat("y = ", y, "\n")
+        cat("current frame is ", sys.nframe(), "\n")
+        cat("parents are ", sys.parents(), "\n")
+        print(sys.function(0)) # ggg
+        print(sys.function(2)) # gg
+    }
+
+    if (y > 0) gg(y-1) else ggg()
+ }
>
> gg(3)
gg y= 3 current frame = 1
gg y= 2 current frame = 2
gg y= 1 current frame = 3
gg y= 0 current frame = 4
y =  0
current frame is  5
parents are  0 1 2 3 4
function() {
       cat("y = ", y, "\n")
       cat("current frame is ", sys.nframe(), "\n")
       cat("parents are ", sys.parents(), "\n")
       print(sys.function(0)) # ggg
       print(sys.function(2)) # gg
   }
<environment: 0x01cf5f6c>
function(y) {
   cat("gg y=", y, "current frame =", sys.nframe(), "\n")
   ggg <- function() {
       cat("y = ", y, "\n")
       cat("current frame is ", sys.nframe(), "\n")
       cat("parents are ", sys.parents(), "\n")
       print(sys.function(0)) # ggg
       print(sys.function(2)) # gg
   }

   if (y > 0) gg(y-1) else ggg()
}


On 9/4/07, Leeds, Mark (IED) [EMAIL PROTECTED] wrote:
 I was going through the example below which is taken from the example
 section in the R documentation for accessing the function call stack.
 I am confused and I have 3 questions that I was hoping someone could
 answer.

 1) why is y equal to zero even though the call was done with gg(3)

 2) what does parents are 0,1,2,0,4,5,6,7 mean ? I understand what a
 parent frame is but how do the #'s relate to this
 particular example ? Why is the current frame # 8 ?

 3) it says that sys.function(2) should be gg but I would think that
 sys.function(1) would be gg since it's one up from where
 the call is being made.

 Thanks a lot. If the answers are too complicated and someone knows of a
 good reference that goes into more details about
 the sys functions, that's appreciated also.




 gg <- function(y) {
     ggg <- function() {
         cat("y = ", y, "\n")
         cat("current frame is ", sys.nframe(), "\n")
         cat("parents are ", sys.parents(), "\n")
         print(sys.function(0)) # ggg
         print(sys.function(2)) # gg
     }

     if (y > 0) gg(y-1) else ggg()
 }

 gg(3)



 # OUTPUT


 y =  0
 current frame is  8
 parents are  0 1 2 0 4 5 6 7
 function() {
         cat("y = ", y, "\n")
         cat("current frame is ", sys.nframe(), "\n")
         cat("parents are ", sys.parents(), "\n")
         print(sys.function(0)) # ggg
         print(sys.function(2)) # gg
     }
 <environment: 0x8a9cc68>
 function (expr, envir = parent.frame(), enclos = if (is.list(envir) ||
     is.pairlist(envir)) parent.frame() else baseenv())
 .Internal(eval.with.vis(expr, envir, enclos))
 <environment: 0x8974ea0>
 


Re: [R] Howto sort dataframe columns by colMeans

2007-09-04 Thread jim holtman

Here is one way of doing it by 'skipping' the first column which is a
factor and your 'time':

> x <- read.table(textConnection(" time    met-a    met-b    met-c
+ 00:00    42       18       99
+ 00:05    88       16       67
+ 00:10    80       27       84"), header=TRUE)
> x.mean <- colMeans(x[-1])
> x.new <- x[,c('time', names(sort(x.mean, decreasing=TRUE)))]
>
> x.new
   time met.c met.a met.b
1 00:00    99    42    18
2 00:05    67    88    16
3 00:10    84    80    27



On 9/4/07, Lynn Osburn [EMAIL PROTECTED] wrote:

 I read from external data source containing several columns.  Each column
 represents value of a metric.  The columns are time series data.

 I want to sort the resulting dataframe such that the column with the largest
 mean is the leftmost column, descending in colMean values to the right.

 I see many solutions for sorting rows based on some column characteristic,
 but haven't found any discussion of sorting columns based on column
 characteristics.

 viz.  input data looks like this
  time    met-a    met-b    met-c
 00:00    42       18       99
 00:05    88       16       67
 00:10    80       27       84

 desired output:
  time    met-c    met-a    met-b
 00:00    99       42       18
 00:05    67       88       16
 00:10    84       80       27

 Thanks,
 -Lynn

 --
 View this message in context: 
 http://www.nabble.com/Howto-sort-dataframe-columns-by-colMeans-tf4380044.html#a12485729
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] data.frame loses name when constructed with one column

2007-09-04 Thread jim holtman

Try drop=FALSE:

> x
  out pred1 predd2
1   1   2.0    3.0
2   2   3.5    5.5
3   3   5.5   11.0
> x[,1]
[1] 1 2 3
> data.frame(x[,1])
  x...1.
1      1
2      2
3      3
> data.frame(x[,1, drop=FALSE])
  out
1   1
2   2
3   3
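
The reason is that selecting a single column with x[,1] returns a bare vector, which no longer carries the column name.  A small sketch of the alternatives, rebuilding the example data by hand:

```r
x <- data.frame(out = 1:3, pred1 = c(2, 3.5, 5.5), predd2 = c(3, 5.5, 11))

v  <- x[, 1]                 # one column selected: result is dropped to a plain vector
d1 <- x[, 1, drop = FALSE]   # drop = FALSE keeps a one-column data frame
d2 <- x["out"]               # list-style indexing also keeps the data frame and its name

names(d1)
names(d2)
```

With two or more columns selected nothing is dropped, which is why data[,1:2] in the question kept its names.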



On 9/4/07, Stan Hopkins [EMAIL PROTECTED] wrote:
 Not sure why the data.frame function does not capture the name of the column
 field when it's being built with only one column.

 Can anyone help?



  > data
   out pred1 predd2
 1   1   2.0    3.0
 2   2   3.5    5.5
 3   3   5.5   11.0
  > data1=data.frame(data[,1])
  > data1
   data...1.
 1         1
 2         2
 3         3
  > data1=data.frame(data[,1:2])
  > data1
   out pred1
 1   1   2.0
 2   2   3.5
 3   3   5.5
  sessionInfo()
 R version 2.5.1 (2007-06-27)
 i386-pc-mingw32

 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
 States.1252;LC_MONETARY=English_United 
 States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods
 [7] base
 


Re: [R] data formatting: from rows to columns

2007-08-28 Thread jim holtman

Here is a way using sprintf:

x <- read.table(textConnection("     V2  V3
27  2032567  19
28  2035482  19
126 2472826  19
132 2473320  19
136 2035480 135
145 2062458 135
148 2074927 135
151 2102395 142
156 2027252 142
158 2473082 142"))

# output the data
cat(sprintf("%d\n%d\n\n", x$V2, x$V3), sep='', file='tempxx.txt')


On 8/28/07, Federico Calboli [EMAIL PROTECTED] wrote:
 Hi All,

 I have some data I need to write as a file from R to use in a different
 program.
 My data comes as a numeric matrix of n rows and 2 columns. I need to transform
 each row into two rows of one column, and separate the output of each row with
 a blank line.

 For instance I need to go from this:

  V2  V3
 27  2032567  19
 28  2035482  19
 126 2472826  19
 132 2473320  19
 136 2035480 135
 145 2062458 135
 148 2074927 135
 151 2102395 142
 156 2027252 142
 158 2473082 142

 to

 2032567
 19

 2035482
 19

 2472826
 19

 2473320
 19

 2035480
 135

 ...

 Any hint? I seem a bit stuck. cat(unlist(data), file ='data.txt', sep = '\n')
 (obviously) does not work...

 Cheers,

 Fede






 --
 Federico C. F. Calboli
 Department of Epidemiology and Public Health
 Imperial College, St Mary's Campus
 Norfolk Place, London W2 1PG

 Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

 f.calboli [.a.t] imperial.ac.uk
 f.calboli [.a.t] gmail.com


Re: [R] alternate methods to perform a calculation

2007-08-28 Thread jim holtman

I think you can use 'outer':

outer(xk$xk1, x$x1, function(y,z) abs(z-y))
outer(xk$xk2, x$x2, function(y,z) abs(z-y))
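
As a quick check that 'outer' reproduces the loop, here is a sketch that rebuilds the two data frames from the question (using expand.grid for xk, which is an assumption about how the grid was generated) and compares the results:

```r
x  <- data.frame(x1 = 1:3, x2 = 1:3)
xk <- expand.grid(xk1 = seq(0.5, 2, by = 0.5), xk2 = seq(0.5, 2, by = 0.5))

# Loop version from the question
w1.loop <- array(NA, dim = c(nrow(xk), length(x$x1)))
for (j in 1:nrow(xk)) w1.loop[j, ] <- abs(x$x1 - xk$xk1[j])

# outer() builds the same nrow(xk) x length(x1) matrix in one call
w1 <- outer(xk$xk1, x$x1, function(y, z) abs(z - y))
w2 <- outer(xk$xk2, x$x2, function(y, z) abs(z - y))

all.equal(w1, w1.loop)
```

Row i, column j of the result holds abs(x1[j] - xk1[i]), the same layout as w1[j,] in the loop.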

On 8/28/07, dxc13 [EMAIL PROTECTED] wrote:

 Consider a data frame (x) with 2 variables, x1 and x2, having equal values.
 It looks like:

 x1   x2
 1    1
 2    2
 3    3

 Now, consider a second data frame (xk):
 xk1   xk2
 0.5   0.5
 1.0   0.5
 1.5   0.5
 2.0   0.5
 0.5   1
 1.0   1
 1.5   1
 2.0   1
 0.5   1.5
 1.0   1.5
 1.5   1.5
 2.0   1.5
 0.5   2
 1.0   2
 1.5   2
 2.0   2

 I have written code to calculate some differences between these two data
 sets; the main idea is to subtract off each element of xk1 from each value
 of x1, and similarly for xk2 and x2.  This is what I have:

 w1 <- array(NA,dim=c(nrow(xk),length(x$x1)))
 w2 <- array(NA,dim=c(nrow(xk),length(x$x2)))
 for (j in 1:nrow(xk)) {
     w1[j,] <- abs(x$x1-xk$xk1[j])
     w2[j,] <- abs(x$x2-xk$xk2[j])
 }

 Is there  a way to do the above calculation without use of a FOR loop?
 Thank you

 Derek


 --
 View this message in context: 
 http://www.nabble.com/alternate-methods-to-perform-a-calculation-tf4344469.html#a12376906
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] how to calculate mean into a list

2007-08-28 Thread jim holtman

try (this averages the column means over the data frames):

colMeans(do.call('rbind', lapply(a0, colMeans)))
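
If what is wanted is the element-by-element average, i.e. a data frame of the same shape as each a0[[i]], a sketch with 'Reduce' on made-up data would be the following.  It assumes all the data frames have the same dimensions and only numeric columns.

```r
# Hypothetical list of 10 data frames with identical shape and numeric columns
a0 <- lapply(1:10, function(i) data.frame(u = i * c(1, 2), v = i * c(10, 20)))

# Element-wise sum of all data frames, divided by how many there are
avg <- Reduce(`+`, a0) / length(a0)
print(avg)
```

This is the direct translation of (a0[[1]] + a0[[2]] + ... + a0[[10]])/10 for a list of any length.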


On 8/28/07, Weiwei Shi [EMAIL PROTECTED] wrote:
 Dear Listers:

 I have this task: suppose a0 is a list of 10 data.frames, and I want to
 calculate something like this
  (a0[[1]]+a0[[2]]+..+a0[[10]])/10

 Thanks.

 --
 Weiwei Shi, Ph.D
 Research Scientist
 GeneGO, Inc.

 Did you always know?
 No, I did not. But I believed...
 ---Matrix III


Re: [R] subset question

2007-08-27 Thread jim holtman

Here is one way of checking to see if a row contains a particular
value and setting the contents of a new column:

n <- 20
# create test data
x <- data.frame(sample(letters,n), sample(letters,n), sample(letters,n), sample(letters,n))
# add a column indicating if the row contains 'a', 'b' or 'c'
x$a <- apply(x[, 1:4], 1, function(.row) any(.row %in% c('a','b','c'))) + 0


On 8/27/07, Kirsten Beyer [EMAIL PROTECTED] wrote:
 I would like to code records in a dataset with a 1 if any of the
 columns 9-67 contain a particular code, and zero if they don't.  I've
 been working with subset and it seems that something like
 subset(data, data[9:67]--12345) would work, but I have been
 unsuccessful so far.  It seems like a simple problem - any help is
 appreciated!


Re: [R] fill circles

2007-08-25 Thread jim holtman

Here is a function that will generate a color sequence for an input
vector.  You can specify the colors to use, the range and the number
of color steps:

# specify the colors and the number of increments you want for a specified
# range.  It will return the colors for the input vector
f.color <-
function(input, # input vector
         colors=c('green','yellow','red'),  # desired colors
         input.range=c(0,0.01),  # range of input to create colors
         input.steps=10)  # number of increments
{
    myColors <- colorRampPalette(colors)(input.steps)  # generate colors
    myColors[cut(input, seq(input.range[1], input.range[2],
                            length=input.steps+1),
                 labels=FALSE, include.lowest=TRUE)]
}
# generate a legend to show colors
plot.new()  # create blank plot
x <- round(runif(15), 3)
legend('topleft', legend=x, fill=f.color(x, input.range=c(0,1)))
legend('topright', legend=x, fill=f.color(x, input.range=c(0,1),
    colors=c('purple','red','blue','orange')))
legend('top', legend=x, fill=f.color(x, input.range=c(0,1),
    colors=c('red','yellow','green')))

So you should be able to use something like this.
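
Applied to the 'trees' data from the question, the idea might look like the
sketch below.  f.color is repeated so the block runs on its own.  Coloring by
Height, using range(Height) as input.range, and the five legend break points
are my assumptions for illustration, not something from the original thread.

```r
# color-ramp helper, repeated here so this block is self-contained
f.color <- function(input, colors = c('green', 'yellow', 'red'),
                    input.range = c(0, 0.01), input.steps = 10)
{
    myColors <- colorRampPalette(colors)(input.steps)
    myColors[cut(input, seq(input.range[1], input.range[2],
                            length = input.steps + 1),
                 labels = FALSE, include.lowest = TRUE)]
}

# one fill color per tree, based on Height (assumed coloring variable)
h.range <- range(trees$Height)
fill <- f.color(trees$Height, input.range = h.range)
with(trees, symbols(Height, Volume, circles = Girth / 12, inches = FALSE,
                    bg = fill, fg = 'grey'))

# color legend for a few evenly spaced heights (break points are illustrative)
h.legend <- round(seq(h.range[1], h.range[2], length = 5))
legend('topleft', legend = h.legend,
       fill = f.color(h.legend, input.range = h.range), title = 'Height')
```

Values outside input.range come back as NA from cut(), so the range should
cover the data being colored.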

On 8/25/07, Cristian cristian [EMAIL PROTECTED] wrote:
 Hi all,
 I'm an R newbie,
 I did this script to create a scatterplot using the tree matrix from
 datasets package:

 library('datasets')
 with(trees,
 {
 plot(Height, Volume, pch=3, xlab="Height", ylab="Volume")
 symbols(Height, Volume, circles=Girth/12, fg="grey", inches=FALSE,
 add=FALSE)
 }
 )

 I'd like to use the column Named Height to fill the circles with colors
 (ex.: the small numbers in green then yellow and the high numbers in red).
 I'd like to have a legend for the size and the colors too.
 I did it manually using a script like that:
 color[(x>=0.001)&(x<0.002)]<-"#41FF41"
 color[(x>=0.002)&(x<0.003)]<-"#2BFF2B"
 color[(x>=0.003)&(x<0.004)]<-"#09FF09"
 color[(x>=0.004)&(x<0.005)]<-"#00FE00"
 color[(x>=0.005)&(x<0.006)]<-"#00F700"
 color[(x>=0.006)&(x<0.007)]<-"#00E400"
 color[(x>=0.007)&(x<0.008)]<-"#00D600"
 color[(x>=0.008)&(x<0.009)]<-"#00C300" and so on but I don't like to do it
 manually... do you know a solution?
 Thank you very much
 chris


Re: [R] as.numeric : what goes wrong?

2007-08-24 Thread jim holtman

Do an 'str' on the vector.  Are you sure it is not a 'factor'?

Try:

as.numeric(as.character(j1[1]))
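
A small illustration of why the extra as.character() matters, using a made-up
factor (the values are only for demonstration):

```r
# a factor whose labels look like numbers
f <- factor(c("896", "1990", "12"))

codes  <- as.numeric(f)                 # internal level codes, not the labels
values <- as.numeric(as.character(f))   # the numbers the labels spell out
values2 <- as.numeric(levels(f))[f]     # same result, converting each level once
```

as.numeric() on a factor returns the position of each element's level, which
is why the result can look unrelated to the printed value.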



On 8/24/07, Wolfgang Polasek [EMAIL PROTECTED] wrote:
 I have a character vector j1 created from dimnames and want to convert it
 to numeric.
 Like the first element:

  j1[1]
  f896
 1  896

  as.numeric(j1[1])
 [1] 1990

 why is it not 896 as it should be?
 This is true for the whole vector.

 Thanks
 W.P.


Re: [R] Need a variant of rbind for datasets with different numbers of columns

2007-08-22 Thread jim holtman

Where is the data coming from since it has a variable number of
columns in each row?  Is it coming from a text file?  If so, you can
use the fill=TRUE option when reading to fill out empty columns.
You need to provide at least a subset of the data so we can see what
you are working with.
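
If the data does come from a text file, a minimal sketch of the fill=TRUE
approach could look like this.  The column names follow the example in the
question; the codes themselves are made up because the layout of the posted
example is ambiguous.

```r
# rows with differing numbers of codes (sample values are illustrative)
txt <- "patient ICD91 ICD92 ICD93
A 12345 67891 543
B 3469 9090
C 1234"

# fill=TRUE pads short rows with NA instead of raising an error
d <- read.table(text = txt, header = TRUE, fill = TRUE)
print(d)
```

read.table decides the number of columns from the first few lines of input,
so with very ragged data it may be safer to pass col.names explicitly.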

On 8/22/07, Kirsten Beyer [EMAIL PROTECTED] wrote:
 Hello.  I am looking for a function that will allow me to paste rows
 together without regard for the numbers of columns in the datasets to
 be joined.  The only columns where it matters if they are aligned
 correctly are at the beginning - the rest of the columns represent
 differing numbers of ICD9 (disease) codes reported by each
 person(record) at a health visit.  They are in no particular order.

 For example, a result would look like this:

 patient  ICD91  ICD92  ICD93
 patient A   12345  67891543
 patient B3469   9090
 patient C   1234

 I am trying to accomplish this inside a loop which first identifies
 the codes associated with the person and then joins them to the
 person.  I have the code working so that it can create a row for each
 person, but I can't figure out how to join these rows together!  FYI,
 my dataset has 200,000+ people.

 Thanks


Re: [R] rectify a program of seasonal dummies matrix

2007-08-21 Thread jim holtman

Your syntax is wrong; e.g.,
if i==j

should be

if (i == j)

The same applies to your 'if else', which should be written as
'else if (condition)'.  Your example is also hard to follow without
consistent indentation.
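
For reference, a corrected version of the loop might look like the sketch
below.  I renamed T to n.obs (T is also shorthand for TRUE) and folded the two
conditions into one, since i == j is already covered by (i - j) %% 4 == 0;
both changes are mine, not part of the original program.

```r
n.obs <- 100
br <- matrix(0, n.obs, 4)
for (i in 1:n.obs) {
    for (j in 1:4) {
        # column j gets a 1 in rows j, j+4, j+8, ...
        if ((i - j) %% 4 == 0) {
            br[i, j] <- 1
        } else {
            br[i, j] <- 0
        }
    }
}

# loop-free equivalent using outer()
br2 <- outer(1:n.obs, 1:4, function(i, j) as.numeric((i - j) %% 4 == 0))
```

Each row ends up with exactly one 1, cycling through the four columns.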

On 8/21/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi friends,
 I would like to construct a matrix of seasonal dummies with number of rows 
 (observations)=100. such matrix is written as follows:[1 0 0 0;0 1 0 0;0 0 1 
 0;0 0 0 1;1 0 0 0;0 1 0 0;0 0 1 0;0 0 0 1;etc...] . I wrote the following 
 program:
 T=100
 br=matrix(0,T,4)
 {
 for (i in 1:T)
 for (j in 1:4)
 if i==j
 br[i,j]=1
 if else (abs(i-j)%%4==0
 br[i,j]=1
 else
 br[i,j]=0
 }
 z<-br
 z

 but unfortunately I obtained from the console the following message:
  {
 + for (i in 1:T)
 +  for (j in 1:4)
 + (if i==j)
 Error: syntax error, unexpected SYMBOL, expecting '(' in:
 
 
  br[i,j]=1
 Error in br[i, j] = 1 : object i not found
 
  (if else (abs(i-j)%%4==0)
 Error: syntax error, unexpected ELSE, expecting '(' in (if else
  br[i,j]=1
 Error in br[i, j] = 1 : object i not found
  else
 Error: syntax error, unexpected ELSE in else
  br[i,j]=0
 Error in br[i, j] = 0 : object i not found
}
 Error: syntax error, unexpected '}' in   }
 
 Can you please rectify my small program? I tried to rectify it but I can't.
 Many thanks in advance.

Re: [R] How to parse a string into the symbol for a data frame object

2007-08-19 Thread jim holtman

One way to do it is to pass in the character name of the dataframe you
want to reference and then use 'get' to access the value: e.g.,

df1 <- data.frame(x=seq(0,10), y=seq(10,20))
df2 <- data.frame(a=seq(0,10), b=seq(10,20))
# use the character names for referencing
for (df in c('df1', 'df2')){
    # get the data to operate on (read-only)
    .val <- get(df)
    # now you can reference the object
    print(names(.val))
    # or construct new objects to store the value in
    # or you can use 'assign' to store back in the original object
    assign(paste('temp.', df, sep=''), .val)
}
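
An alternative that avoids get/assign altogether is to keep the data frames
in a named list and loop over the names, so the name is always at hand.  This
is only a sketch of that idea; the names dfs and col.names are made up for
the example.

```r
df1 <- data.frame(x = seq(0, 10), y = seq(10, 20))
df2 <- data.frame(a = seq(0, 10), b = seq(10, 20))

# keep the objects together, addressed by name
dfs <- list(df1 = df1, df2 = df2)

col.names <- list()
for (nm in names(dfs)) {
    .val <- dfs[[nm]]                  # the data frame itself
    # store a result under a name built with paste, as in the question
    col.names[[paste('temp.', nm, sep = '')]] <- names(.val)
}
```

Results stay in one list instead of being scattered across the workspace.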


On 8/19/07, Darren Weber [EMAIL PROTECTED] wrote:
  I have several data frames, eg:

  df1 <- data.frame(x=seq(0,10), y=seq(10,20))
  df2 <- data.frame(a=seq(0,10), b=seq(10,20))

 It is common to create loops in R like this:

  for(df in list(df1, df2)){ #etc. }

 This works fine when you know the name of the objects to put into the
 list.  I assume that the order of the objects in the list is respected
 through the loop.  Inside the loop, the objects of the list are
 'dereferenced' using 'df' but, to my knowledge, there is no way to
 tell whether 'df' is a current representation of 'df1' or 'df2'
 without some additional book keeping.

 In addition, I really want to use 'paste' within the loop to create a
 new string value that will have the symbol name of a data frame to be
 dereferenced, e.g.:

  for(n in c(1, 2)){ dfString <- paste('df', n, sep='');
  print(eval(dfString)) }

 [1] "df1"
 [1] "df2"

 This is not what I want.  I have read through the documentation on
 eval and similar commands like substitute and quote.  I program
 regularly, but I do not understand these constructs in R.  I do not
 understand the R framework for parsing and evaluation and I don't have
 a lot of time right now to get lost in this detail.  I could really
 use some help to get the string values in my loop to be parsed into
 symbols that refer to the data frame objects df1 and df2.  How is this
 done?

 Best, Darren


Re: [R] matching elements from two vectors

2007-08-17 Thread jim holtman

> x <- c(1,2,1,1,3,5,3,3,1)
> y <- c(2,3)
> intersect(x,y)
[1] 2 3

On 8/17/07, Gonçalo Ferraz [EMAIL PROTECTED] wrote:
 Hi,

 Imagine a vector x with elements (1,2,1,1,3,5,3,3,1) and a vector y with
 elements (2,3). I need to find out what elements of x match any of the
 elements of y.

 Is there a simple command that will return a vector with elements
 (F,T,F,F,T,F,T,T,F)? Ideally, I would like a solution that works with
 dataframe columns as well.

 I have tried x==y and it doesn't work.
 x==any(y) doesn't work either. I realize I could write a for loop and go
 through each element of y asking if it matches any element of x, but isn't
 there a shorter way?

 Thanks,
 Gonçalo


Re: [R] matching elements from two vectors

2007-08-17 Thread jim holtman

Also if you want all the matches

> x[x %in% y]
[1] 2 3 3 3
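
The logical vector asked for in the question is the %in% expression itself,
and the same expression can index rows of a data frame.  A short sketch (the
data frame d is made up for the example):

```r
x <- c(1, 2, 1, 1, 3, 5, 3, 3, 1)
y <- c(2, 3)

# TRUE where an element of x matches any element of y
hit <- x %in% y

# the same test applied to a data frame column
d <- data.frame(x = x)
d.hit <- d[d$x %in% y, , drop = FALSE]
```

Unlike x == y, which recycles y along x and compares position by position,
%in% tests membership.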



Re: [R] comparison of arrays of strings

2007-08-16 Thread jim holtman

Read them into 2 different vectors and then use 'intersect'.
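
A minimal sketch of that approach, assuming each file has a single column
named 'gene' (the column name and the sample gene names are my assumptions).
Reading the names as character rather than factor should also avoid the
'level sets of factors are different' error.

```r
# two small stand-ins for the .csv files
genes.a <- read.csv(text = "gene\nBRCA1\nTP53\nEGFR",
                    stringsAsFactors = FALSE)
genes.b <- read.csv(text = "gene\nMYC\nTP53\nKRAS\nBRCA1",
                    stringsAsFactors = FALSE)

# names present in both lists
common <- intersect(genes.a$gene, genes.b$gene)
```

With real files, replace the text= argument with the file path.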

On 8/17/07, ramakanth reddy [EMAIL PROTECTED] wrote:
 Hi

 I have two arrays of gene names, one with 18 gene names and the other with
 24000 gene names. I have to compare both of them to find the common names.

 I have both arrays in .csv format. I loaded the files and tried to compare
 them using for and if loops

 but I got the error Error in Ops.factor(cgh[i, 1], cgh[j, 2]) :
level sets of factors are different

 Please suggest me how to solve this problem or any other alternative procedure

 Thanks
 ramakanth





Re: [R] for plots

2007-08-16 Thread jim holtman

Turn 'Recording' on for the plots.

windows(record=TRUE)

or select from the GUI.
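
Since windows() is only available on Windows, another option that should work
on any platform is to send the plots to a multi-page PDF.  The sketch below
uses an lm fit on the built-in 'cars' data as a stand-in for the gam object,
and the file name is made up.

```r
fit <- lm(dist ~ speed, data = cars)   # stand-in for the gam fit

out.file <- file.path(tempdir(), "fit-plots.pdf")
pdf(out.file)    # each new plot starts a new page
plot(fit)        # plot.lm draws several diagnostic plots in turn
dev.off()
```

Every plot is then kept on its own page of the file.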

On 8/17/07, Brad Zhang [EMAIL PROTECTED] wrote:
 Hi, All,

 I am a beginner for R. Now I have installed R 2.5.1 in Window
 environment. After I run a program such as gam I would like to display
 a plot for the object. The following is an example. When I did this,
 only the last plot was presented on my screen. How can I get a plot
 before the last plot? I mean if the object has several plots how can I
 get those?



 gam.object - gam(y ~ s(x,6) + z,data=gam.data)

 plot(gam.object,se=TRUE)





 Thank you.



 Brad.


 Dr. Guicheng (Brad) Zhang
 Senior Research Officer

 School of Paediatrics and Child Health
 Telethon Institute for Child Health Research
 100 Roberts Road, Subiaco
 Western Australia, 6008 AUSTRALIA

 Email: [EMAIL PROTECTED]
 Phone: 93407896
 Fax: 93882097




Re: [R] How to write to a table column by column?

2007-08-13 Thread jim holtman

Assuming that the daily.incomes are the same lengths, then your loop could be:

Lst <- list()
for (i in 1:count) Lst[[i]] <- list(..)
Lst.col <- do.call('cbind', Lst)
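
A slightly fuller sketch of the same idea, binding only the daily.incomes
vectors side by side and writing them out as columns.  The loop body, the
names and the output file are invented for the example.

```r
count <- 3
Lst <- list()
for (i in 1:count) {
    # stand-in for the real per-iteration result (made-up values)
    Lst[[i]] <- list(name = paste("person", i), daily.incomes = (1:850) * i)
}

# one column per list, 850 rows each
incomes <- do.call('cbind', lapply(Lst, function(l) l$daily.incomes))
colnames(incomes) <- sapply(Lst, function(l) l$name)

out.file <- file.path(tempdir(), "incomes.csv")
write.csv(incomes, out.file, row.names = FALSE)
```

This only works as written when all the vectors have the same length.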

On 8/12/07, Yuchen Luo [EMAIL PROTECTED] wrote:
 Dear friends.
 Every loop of my program will result in a list that is very long, with a
 structure similar to the one below:

 Lst <- list(name="Fred", wife="Mary", daily.incomes=c(1:850))

 Please notice the large size of daily.incomes.

 I need to store all such lists in a csv file so that I can easily view them
 in Excel. Excel cannot display a row of more than 300 elements, therefore, I
 have to store the lists as columns. It is not hard to store one list as a
 column in the csv file. The problem is how to store the second list as a
 second column, so that the two columns will lie side by side to each other
 and I can easily compare their elements. (If I use 'append=TRUE', the
 second time series will be stored in the same column.)

 Thank you for your time and your help will be highly appreciated!!

 Best

 Yuchen Luo


Re: [R] Extract part of vector

2007-08-13 Thread jim holtman

This should do it:

> txt
[1] "\nhttp://www.mysite.com/system/empty.asp?P=2&VID=default&SID=421384237289476&S=1&C=18631"
[2] "\nhttp://www.mysite.com/system/empty.asp?P=123&VID=default&SID=421384237289476&S=1&C=18643"
[3] "\nhttp://www.mysite.com/system/empty.asp?P=342&VID=default&SID=421384237289476&S=1&C=18634\n"
[4] "\nhttp://www.mysite.com/system/empty.asp?P=232&VID=default&SID=421384237289476&S=1&C=18645"
[5] "\nhttp://www.mysite.com/system/empty.asp?P=2345&VID=default&SID=421384237289476&S=1&C=18254"
[6] "\nhttp://www.mysite.com/system/empty.asp?P=257654&VID=default&SID=421384237289476&S=1&C=18732"
[7] "\nhttp://www.mysite.com/system/empty.asp?P=22&VID=default&SID=421384237289476&S=1&C=18637"
[8] "\nhttp://www.mysite.com/system/empty.asp?P=2463&VID=default&SID=421384237289476&S=1&C=18575\n"

> gsub("^.*asp.P=([[:digit:]]+).*$", '\\1', txt)
[1] "2"      "123"    "342"    "232"    "2345"   "257654" "22"     "2463"
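
A cut-down version of the same idea, in case it helps to experiment.  The two
URLs are shortened from the question (the '&' separators are my assumption),
and the result is converted to numeric at the end.

```r
urls <- c("http://www.mysite.com/system/empty.asp?P=2&VID=default",
          "http://www.mysite.com/system/empty.asp?P=257654&VID=default")

# keep only the digits captured after "asp?P="
p <- sub("^.*asp[?]P=([[:digit:]]+).*$", "\\1", urls)
p.num <- as.numeric(p)
```

A string that does not match the pattern is returned unchanged by sub(), so
it is worth checking the result before converting.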



On 8/13/07, Lauri Nikkinen [EMAIL PROTECTED] wrote:
 Dear R-users,

 How do I extract numbers between asp?P= and VID from my txt vector? I have
 tried grep function with no luck.

 txt <- c("
 http://www.mysite.com/system/empty.asp?P=2&VID=default&SID=421384237289476&S=1&C=18631",
 "
 http://www.mysite.com/system/empty.asp?P=123&VID=default&SID=421384237289476&S=1&C=18643",
 "
 http://www.mysite.com/system/empty.asp?P=342&VID=default&SID=421384237289476&S=1&C=18634
 ","
 http://www.mysite.com/system/empty.asp?P=232&VID=default&SID=421384237289476&S=1&C=18645",
 "
 http://www.mysite.com/system/empty.asp?P=2345&VID=default&SID=421384237289476&S=1&C=18254",
 "
 http://www.mysite.com/system/empty.asp?P=257654&VID=default&SID=421384237289476&S=1&C=18732",
 "
 http://www.mysite.com/system/empty.asp?P=22&VID=default&SID=421384237289476&S=1&C=18637",
 "
 http://www.mysite.com/system/empty.asp?P=2463&VID=default&SID=421384237289476&S=1&C=18575
 ")

 The result should be like
 2
 123
 342
 232
 2345
 257654
 22
 2463

 Thanks,
 Lauri


Re: [R] invert 160000x160000 matrix

2007-08-13 Thread jim holtman

You would need 200GB to store a single copy of the matrix, so if you have
about 1TB of physical memory on your computer, it might be possible.
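
The 200GB figure can be checked with a quick calculation, assuming a dense
matrix of 8-byte doubles:

```r
n <- 160000
bytes <- n^2 * 8      # 8 bytes per double-precision value
gb <- bytes / 1e9     # decimal gigabytes
print(gb)
```

An inversion also needs working copies beyond the matrix itself, which is why
several times that amount of memory would be needed in practice.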

On 8/13/07, Jiao Yang [EMAIL PROTECTED] wrote:
 Can R invert a 160000x160000 matrix with all positive numbers?  Thanks a lot!


Re: [R] Write values on y axe

2007-08-12 Thread jim holtman

Does this do what you want:

x <- runif(10)
plot(x)
# put min/max in red
axis(2, at=round(range(x), 4), col.axis='red', las=2)



On 8/12/07, akki [EMAIL PROTECTED] wrote:
 Hi,
 I have values on the y axis from 0.0001 to 3.086. When I plot them, the
 labels written are 0.001, 0.050, 1.000 ..., but how can I write the minimum
 value and the maximum value on the graph, with all decimals (I don't want
 to use the format 1e-0x)? I am using a log scale.

 For example, if I have the values:
 0.0001
 0.0015
 0.0256
 0.0236
 
 0.0201
 2.9668
 3.0086

 I need have each 'x' value put on y axe, and add the value minimum and
 maximum on   my graph.
 How can I do it?

 I do:
 plot(o$a, log="y", type="l", col=colors[1], xlab="a_x", ylab="a_y",
 cex.lab=0.8)
 lines(o$b, type="l", pch=1, lty=1, col=colors[2])
 lines(o$c, type="l", pch=2, lty=2, col=colors[3])

 to draw my graph.

 Thanks in advance.


Re: [R] Legend on graph

2007-08-12 Thread jim holtman

If you are asking to have the values plotted on top of the legend,
then you can do the following:

plot(x, y, type='n', ...) # create plot, but don't plot
legend('topright', ...)
lines(x,y)  # now plot the data

If you want it outside the plot, check the archives for several examples.
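
One way to put the legend outside the plot region is to widen the margin and
allow drawing in it.  This is only a sketch with made-up data; the margin
size and the inset value would need adjusting for a real plot.

```r
x <- 1:10
y <- cbind(a = x, b = x^1.5)      # invented data, two series

# widen the right margin and allow drawing outside the plot region
old.par <- par(mar = c(5, 4, 4, 8), xpd = TRUE)
matplot(x, y, type = 'l', lty = 1:2, col = 1:2)

# a negative inset pushes the legend past the right edge of the plot
lg <- legend('topright', inset = c(-0.3, 0), legend = colnames(y),
             lty = 1:2, col = 1:2, bty = 'n')
par(old.par)
```

The inset is given as a fraction of the plot region, so -0.3 moves the legend
outward by about 30% of the plot width.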

On 8/12/07, akki [EMAIL PROTECTED] wrote:
 Hi,
 I have a problem when I want to put a legend on the graph.
 I do:

 legend("topright", names(o), cex=0.9, col=plot_colors, lty=1:5, bty="n")

 but the legend is drawn inside the graph (at the top, but inside the plot
 region), where I have data values. How can I put the legend at the top of
 the graph without it covering the plotted values?

 Thanks.


Re: [R] How to control the number format on plot axes ?

2007-08-12 Thread jim holtman

Here is a way that you can put the formatting that you want; you were
not clear on exactly what you were after.  You can setup the 'labels'
argument for whatever you want.

a<-1:10
myTicks<-c(0.1,1,2,5,10)
# set ylim to range of myTicks that you want
plot(x=a,y=a,log="y",type="p",yaxt="n", ylim=range(myTicks))
# change the sprintf to whatever formatting you want
axis(side=2,at=myTicks,
    labels=ifelse(myTicks >= 1, sprintf("%.0f", myTicks),
        sprintf("%0.1f", myTicks)))
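
If the goal is simply to show each tick with no more decimals than it needs,
converting the ticks to character one by one may be enough.  A sketch under
that assumption (tick.labels is a name I made up):

```r
a <- 1:10
myTicks <- c(0.1, 1, 2, 5, 10)

# each number is converted on its own, so 1 stays "1" and 0.1 stays "0.1"
tick.labels <- as.character(myTicks)

plot(x = a, y = a, log = "y", type = "p", yaxt = "n", ylim = range(myTicks))
axis(side = 2, at = myTicks, labels = tick.labels)
```

axis() formats all labels together when none are supplied, which is where
the common number of decimals comes from.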





On 8/12/07, Sébastien [EMAIL PROTECTED] wrote:
 Dear R-users,

 Basically, everything is in the title of my e-mail. I know that some
 threads from the archives have already addressed this question but they
 did not really give a clear solution.
 Here is a series of short codes that will illustrate the problem:

 # First
 a<-1:10
 plot(x=a,y=a,log="y",type="p")

 # Second
 a<-1:10
 myTicks<-c(1,2,5,10)
 plot(x=a,y=a,log="y",type="p",yaxt="n")
 axis(side=2,at=myTicks)

 # Third
 a<-1:10
 myTicks<-c(0.1,1,2,5,10)
 plot(x=a,y=a,log="y",type="p",yaxt="n")
 axis(side=2,at=myTicks)

 # Fourth
 a<-0.1:10
 plot(x=a,y=a,log="y",type="p")

 In the first and second examples, the plots are identical and the tick
 labels are 1, 2, 5 and 10. In the third, the labels are numbers in the
 x.0 format (1.0, 2.0, 5.0 and 10.0), even if there is no point below 1.
 The only reason I see is because the first element of myTicks is 0.1.
 And, the fourth example is self-explanatory.
 Interestingly, the 'scales' argument of xyplot in the lattice package do
 not add these (unnecessary) decimals on labels greater than 1.

 Do you know how I could transpose the behavior of the lattice 'scales'
 argument to the 'axis' function ?

 Thank you

 PS: No offense, but please don't suggest I use lattice. I have to go for
 base R graphics in my full-scale project (it is a speed issue).


Re: [R] Replace NAs in dataframe: what am I doing wrong

2007-08-11 Thread jim holtman

The problem is that the first column is probably a factor and you are
trying to assign a value that is not already a 'level' in the factor.
One way is to read the data with as.is=TRUE to keep it as character,
replace the NAs and then convert back to factors if you want to:

> x <- read.csv(textConnection("A,B
+ a,3
+ b,4
+ .,.
+ c,5"), na.strings='.', as.is=TRUE)  # keep as character
> # replace NAs
> x[is.na(x[,1]), 1] <- "Missing Value"
> # convert back to factors if you want to
> x[[1]] <- factor(x[[1]])
> str(x)
'data.frame':   4 obs. of  2 variables:
 $ A: Factor w/ 4 levels "a","b","c","Missing Value": 1 2 4 3
 $ B: int  3 4 NA 5
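
If the column has to stay a factor throughout, another option is to add the
new level before assigning it.  A small sketch with a made-up factor:

```r
f <- factor(c("a", "b", NA, "c"))

# register the new level first, then the assignment is valid
levels(f) <- c(levels(f), "Missing Value")
f[is.na(f)] <- "Missing Value"
```

Without the first step the assignment produces the 'invalid factor level'
warning and leaves the NAs in place.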




On 8/11/07, Sébastien [EMAIL PROTECTED] wrote:
 Dear R-users,

 My script imports a dataset from a csv file, in which missing values are
 represented by ".". This importation is done into a dataframe using the
 read.table function with na.strings = ".".  Then I want to replace the
 NAs in the first column of the dataframe by "Missing data". I am using
 the following code to do so :

 mydata<-data.frame(read.table(myFile,sep=",",header=TRUE,na.strings="."))
   # myFile is the full path of the source file

 mydata[,1][is.na(mydata[,1])]<-"Missing value"

 This code works perfectly fine if this first column contains only
 missing values, i.e. ".". As soon as it contains multiple levels and
 missing values, things start to go wrong. I get the following warning
 message and the replacement is not done.

 Warning message:
 invalid factor level, NAs generated in: `[<-.factor`(`*tmp*`,
 is.na(mydata[, 1]), value = "Missing value")

 Is there an error in my code or is that a bug (I doubt about it) ?

 Thanks in advance.


Re: [R] shell and shell.exec on Windows

2007-08-11 Thread jim holtman

If you are using Windows, then try:

system('cmd /c yourfile.xls')

This will invoke the windows command processor and it should pick the
correct association.

On 8/11/07, Erich Neuwirth [EMAIL PROTECTED] wrote:
 Thanks Gabor,
 system() indeed would be the answer, but it does not solve my problem
 because of some inconsistencies in WindowsXP.
 I will explain the story, because perhaps it can help somebody else to
 avoid wasting time.
 On my machine, when I doubleclick an .xlsm file, it is opened in Excel
 2007. .xls files are opened in Excel 2003.
 shell.exec("file.xls") and shell.exec("file.xlsm")
 also open the files in Excel 2003 and Excel 2007 respectively.

 system() does not invoke a shell, so I need to find the application
 associated with Excel to create a string with the name
 of the application and the name of the file to open.
 Then, something like
 system("\"c:\\mypath\\CorrectVersionOfExcel.exe\"
 \"c:\\mydir\\myexcelfile.xlsm\"")
 should work (and run the program invisibly)

 There are two helpful shell commands in WinXP
 ASSOC and FTYPE

 ASSOC .xls
   .xls=Excel.Sheet.8

 ASSOC .xlsm
   .xlsm=Excel.SheetMacroEnabled.12

 ftype Excel.Sheet.8
  Excel.Sheet.8=C:\Program Files\Microsoft Office\OFFICE11\EXCEL.EXE /e

 ftype Excel.SheetMacroEnabled.12
  Excel.SheetMacroEnabled.12=C:\PROGRA~2\MICROS~2\OFFICE11\EXCEL.EXE /e

 So despite the fact that doubleclicking .xlsm files or using
 shell.exec opens Excel 2007
 the application reported by assoc and ftype for .xlsm files is Excel 2003.


 Gabor Grothendieck wrote:
  The system() function has an invisible= argument.  The ryacas package
  uses system() to run yacas.  See the runYacas() and
  yacasInvokeString() functions in yacas.R for examples:
 http://ryacas.googlecode.com/svn/trunk/R/yacas.R
 
  On 8/11/07, Erich Neuwirth [EMAIL PROTECTED] wrote:
  I have an Excel workbook MyWorkbook.xls containing an Auto_Open macro
  which I want to be run from R.
 
  shell.exec("MyWorkbook.xls")
  does that.
 
  shell("start MyWorkbook.xls")
  also runs it.
 
  In both cases, the Excel window is visible on screen when Excel is started.
  Is there a way of opening the sheet with a hidden Excel window?
  start has some parameters (e.g. /MIN), which should allow this, but
  shell("start /MIN MyWorkbook.xls")
  also starts Excel visibly.
 
 
 
  --
  Erich Neuwirth, University of Vienna
  Faculty of Computer Science
  Computer Supported Didactics Working Group
  Visit our SunSITE at http://sunsite.univie.ac.at
  Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
 

Re: [R] Help wit matrices

2007-08-10 Thread jim holtman

Is this what you want:

> x <- matrix(runif(100), 10)
> round(x, 3)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] 0.268 0.961 0.262 0.347 0.306 0.762 0.524 0.062 0.028 0.226
 [2,] 0.219 0.100 0.165 0.131 0.578 0.933 0.317 0.109 0.527 0.131
 [3,] 0.517 0.763 0.322 0.374 0.910 0.471 0.278 0.382 0.880 0.982
 [4,] 0.269 0.948 0.510 0.631 0.143 0.604 0.788 0.169 0.373 0.327
 [5,] 0.181 0.819 0.924 0.390 0.415 0.485 0.702 0.299 0.048 0.507
 [6,] 0.519 0.308 0.511 0.690 0.211 0.109 0.165 0.192 0.139 0.681
 [7,] 0.563 0.650 0.258 0.689 0.429 0.248 0.064 0.257 0.321 0.099
 [8,] 0.129 0.953 0.046 0.555 0.133 0.499 0.755 0.181 0.155 0.119
 [9,] 0.256 0.954 0.418 0.430 0.460 0.373 0.620 0.477 0.132 0.050
[10,] 0.718 0.340 0.854 0.453 0.943 0.935 0.170 0.771 0.221 0.929
> ifelse(x > .5, 1, 0)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    0    0    0    1    1    0    0     0
 [2,]    0    0    0    0    1    1    0    0    1     0
 [3,]    1    1    0    0    1    0    0    0    1     1
 [4,]    0    1    1    1    0    1    1    0    0     0
 [5,]    0    1    1    0    0    0    1    0    0     1
 [6,]    1    0    1    1    0    0    0    0    0     1
 [7,]    1    1    0    1    0    0    0    0    0     0
 [8,]    0    1    0    1    0    0    1    0    0     0
 [9,]    0    1    0    0    0    0    1    0    0     0
[10,]    1    0    1    0    1    1    0    1    0     1
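
For the 1000x1000 case in the question, the comparison is already vectorized and keeps the matrix dimensions, so no loop is needed. A minimal sketch, assuming a random matrix and the 0.25 threshold from the question (the names 'mat', 'threshold' and 'ind' are only illustrative):

```r
set.seed(1)
mat <- matrix(runif(1000 * 1000), 1000)   # stand-in for the real data
threshold <- 0.25

# the comparison returns a logical matrix with the same dimensions;
# multiplying by 1L turns TRUE/FALSE into 1/0 without dropping them
ind <- (mat > threshold) * 1L

dim(ind)
```

as.numeric() drops the dim attribute, which is why wrapping the comparison in as.matrix(as.numeric(...)) gives a single-column result rather than a 1000x1000 matrix.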


On 8/10/07, Lanre Okusanya [EMAIL PROTECTED] wrote:
 Hello all,

 I am working with a 1000x1000 matrix, and I would like to return a
 1000x1000 matrix that tells me which value in the matrix is greater
 than a threshold value (1 or 0 indicator).
 I have tried
  mat2 <- as.matrix(as.numeric(mat10.25))
 but that returns a 1:10 matrix.
 I have also tried for loops, but they are grossly inefficient.

 Thanks for all your help in advance.

 Lanre





Re: [R] Countvariable for id by date

2007-08-09 Thread jim holtman

This should do what you want:

> x <- read.table(textConnection("id;dg1;dg2;date;
+  1;F28;;1997-11-04;
+  1;F20;F702;1998-11-09;
+  1;F20;;1997-12-03;
+  1;F208;;2001-03-18;
+  2;F32;;1999-03-07;
+  2;F29;F32;2000-01-06;
+  2;F32;;2003-07-05;
+  2;F323;F2800;2000-02-05;"), header=TRUE, sep=";", as.is=TRUE)
> # convert dates
> x$dateP <- unclass(as.POSIXct(x$date))
> # matches for F20
> F20 <- grep("F20", paste(x$dg1, x$dg2))
> # matches for F21 - F29
> F21 <- grep("F2[1-9]", paste(x$dg1, x$dg2))
> # grouping
> x$F20 <- x$F21 <- NA
> x$F20[F20] <- rank(x$dateP[F20])
> x$F21[F21] <- rank(x$dateP[F21])
> x
  id  dg1   dg2       date  X      dateP F21 F20
1  1  F28       1997-11-04 NA  878601600   1  NA
2  1  F20  F702 1998-11-09 NA  910569600  NA   2
3  1  F20       1997-12-03 NA  881107200  NA   1
4  1 F208       2001-03-18 NA  984873600  NA   3
5  2  F32       1999-03-07 NA  920764800  NA  NA
6  2  F29   F32 2000-01-06 NA  947116800   2  NA
7  2  F32       2003-07-05 NA 1057363200  NA  NA
8  2 F323 F2800 2000-02-05 NA  949708800   3  NA
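
Note that rank() here orders the matching rows across all ids. If the counter should restart within each id, as in the desired output in the original question, one possible variant uses ave(). This is only a sketch: the helper 'count.by.id' is made up for illustration, the column names countF20 and countF2129 come from the question, and read.table(text=) needs a newer R than the 2.5.1 used elsewhere in this thread.

```r
x <- read.table(text = "id;dg1;dg2;date
1;F28;;1997-11-04
1;F20;F702;1998-11-09
1;F20;;1997-12-03
1;F208;;2001-03-18
2;F32;;1999-03-07
2;F29;F32;2000-01-06
2;F32;;2003-07-05
2;F323;F2800;2000-02-05", header = TRUE, sep = ";", as.is = TRUE)

d     <- as.numeric(as.Date(x$date))   # dates as numbers so they can be ranked
codes <- paste(x$dg1, x$dg2)           # both diagnosis columns in one string

# hypothetical helper: rank the dates of the matching rows within each id
count.by.id <- function(hit) {
    out <- rep(NA_real_, nrow(x))
    out[hit] <- ave(d[hit], x$id[hit], FUN = rank)
    out
}

x$countF20   <- count.by.id(grepl("F20", codes))
x$countF2129 <- count.by.id(grepl("F2[1-9]", codes))
x
```

As in the code above, the patterns match anywhere in the code string, so "F2800" counts as an F28 diagnosis.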


On 8/9/07, David Gyllenberg [EMAIL PROTECTED] wrote:
Best R-users,

  Here's a newbie question. I have tried to find an answer to this via
 help and the ave(x,factor(),FUN=function(y) rank (z,tie='first')-function,
 but without success.

  I have a dataframe (~8000 observations, register data) with four
 columns of interest: id, dg1, dg2 and date (YYYY-MM-DD):

  id;dg1;dg2;date;
  1;F28;;1997-11-04;
  1;F20;F702;1998-11-09;
  1;F20;;1997-12-03;
  1;F208;;2001-03-18;
  2;F32;;1999-03-07;
  2;F29;F32;2000-01-06;
  2;F32;;2003-07-05;
  2;F323;F2800;2000-02-05;
  ...

  I would like to have two additional columns:
  1. countF20:  a countvariable that shows which in order (by date) 
 the id has if it fulfils  the following logical expression: dg1 = F20* OR dg2 
 = F20*,
  where *  means F201,F202... F2001,F2002...F20001,F20002...
  2. countF2129:  another countvariable that shows which in order (by 
 date) the id has if it fulfils  the following logical expression: dg1 = 
 F21*-F29* OR dg2 = F21*-F29*,
  where F21*-F29*  means F21*, F22*...F29* and
  where *  means F211,F212... F2101,F2102...F21001,F21002...

  ... so the  dataframe would look like this, where 1 is the first 
 observation for the id with  the right condition, 2 is the second etc.:

  id;dg1;dg2;date;countF20;countF2129;
  1;F28;;1997-11-04;;1;
  1;F20;F702;1998-11-09;2;;
  1;F20;;1997-12-03;1;;
  1;F208;;2001-03-18;3;;
  2;F32;;1999-03-07;;;
  2;F29;F32;2000-01-06;;1;
  2;F32;;2003-07-05;;;
  2;F323;F2800;2000-02-05;;2;
  ...

  Do you know  a convenient way to create these kind of countvariables? 
 Thank you in  advance!

  / David (david.gyllenberg at yahoo.com)







Re: [R] plot table with sapply - labeling problems

2007-08-09 Thread jim holtman

Here is a modified script that should work.  In many cases where you
want the names of the element of the list you are processing, you
should work with the names:

test<-as.data.frame(cbind(round(runif(50,0,5)),round(runif(50,0,3)),round(runif(50,0,4))))
sapply(test, table)->vardist
sapply(test, function(x) round(table(x)/sum(table(x))*100,1) )->vardist1
  par(mfrow=c(1,3))
# you need to use the 'names' and then index into the variable
# your original 'x' did not have any names associated with it
sapply(names(vardist1), function(x) barplot(vardist1[[x]],
ylim=c(0,100),main="Varset1",xlab=x))
  par(mfrow=c(1,1))
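
For the second part of the question (a single title instead of repeating it on every panel), one common approach is to reserve an outer margin and write the title there with mtext(). A rough sketch, assuming the same kind of test data; the column names V1..V3 and the pdf(NULL) device are only there so the example runs without a screen:

```r
pdf(NULL)                       # null device; drop this for on-screen plots
set.seed(1)
test <- data.frame(V1 = round(runif(50, 0, 5)),
                   V2 = round(runif(50, 0, 3)),
                   V3 = round(runif(50, 0, 4)))
# percentage distribution of each column, kept as a named list
vardist1 <- lapply(test, function(x) round(prop.table(table(x)) * 100, 1))

# oma = outer margins (bottom, left, top, right); leave 2 lines on top
op <- par(mfrow = c(1, 3), oma = c(0, 0, 2, 0))
for (nm in names(vardist1)) {
    barplot(vardist1[[nm]], ylim = c(0, 100), xlab = nm)
}
mtext("Varset1", outer = TRUE, cex = 1.2)   # one title across all panels
par(op)
invisible(dev.off())
```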



On 8/9/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi List,

 I am trying to label a barplot group with variable names when using
 sapply, unsuccessfully.
 I can't seem to extract the names for the individual plots:

 test<-as.data.frame(cbind(round(runif(50,0,5)),round(runif(50,0,3)),round(runif(50,0,4))))
 sapply(test, table)->vardist
 sapply(test, function(x) round(table(x)/sum(table(x))*100,1) )->vardist1
   par(mfrow=c(1,3))
 sapply(vardist1, function(x) barplot(x,
 ylim=c(0,100),main="Varset1",xlab=names(x)))
   par(mfrow=c(1,1))

 Names don't show up although names(vardist) works.

 Also I would like to put a single Title on this plot instead of
 repeating Varset three times.

 Any hints appreciated.

 Thanx
 Herry





Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman

Does this do what you want?  It creates a new dataframe with those
'mg' that have at least a certain number of observations.

> set.seed(2)
> # create some test data
> x <- data.frame(mg=sample(LETTERS[1:4], 20, TRUE), data=1:20)
> # split the data into subsets based on 'mg'
> x.split <- split(x, x$mg)
> str(x.split)
List of 4
 $ A:'data.frame':  7 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1
  ..$ data: int [1:7] 1 4 7 12 14 18 20
 $ B:'data.frame':  3 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 2 2 2
  ..$ data: int [1:3] 9 15 19
 $ C:'data.frame':  4 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 3 3 3 3
  ..$ data: int [1:4] 2 3 10 11
 $ D:'data.frame':  6 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 4 4 4 4 4 4
  ..$ data: int [1:6] 5 6 8 13 16 17
> # only choose subsets with at least 5 observations
> x.5 <- lapply(x.split, function(a) {
+ if (nrow(a) >= 5) return(a)
+ else return(NULL)
+ })
> # create new dataframe with these observations
> x.new <- do.call('rbind', x.5)
> x.new
     mg data
A.1   A    1
A.4   A    4
A.7   A    7
A.12  A   12
A.14  A   14
A.18  A   18
A.20  A   20
D.5   D    5
D.6   D    6
D.8   D    8
D.13  D   13
D.16  D   16
D.17  D   17




On 8/9/07, Ron Crump [EMAIL PROTECTED] wrote:
 Hi,

 I generally do my data preparation externally to R, so
 this is a bit unfamiliar to me, but a colleague has asked
 me how to do certain data manipulations within R.

 Anyway, basically I can get his large file into a dataframe.
 One of the columns is a management group code (mg). There may be
 varying numbers of observations per management group, and
 he would like to subset the dataframe such that there are
 always at least n per management group.

 I presume I can get to this using table or tapply, then
 (and I'm not sure how on this bit) creating a column nmg
 containing the number of observations that corresponds to
 mg for that row, then simply subsetting.

 So, am I on the right track? If so how do I actually do it, and
 is there an easier method than I am considering.

 Thanks for your help,
 Ron





Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman

Here is an even faster way:

> # faster way
> x.mg.size <- table(x$mg)  # count occurrences
> x.mg.5 <- names(x.mg.size)[x.mg.size > 5]  # select greater than 5
> x.new1 <- subset(x, x$mg %in% x.mg.5)  # use in the subset
> x.new1
   mg data
1   A    1
4   A    4
5   D    5
6   D    6
7   A    7
8   D    8
12  A   12
13  D   13
14  A   14
16  D   16
17  D   17
18  A   18
20  A   20
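
The "column holding the group size" idea from the original question can also be done directly with ave(), without split() or merge(). A small sketch with made-up data (group sizes 7, 3, 4 and 6 to mirror the example above; 'n.min' and 'grp.size' are illustrative names):

```r
x <- data.frame(mg   = rep(c("A", "B", "C", "D"), times = c(7, 3, 4, 6)),
                data = 1:20)
n.min <- 5

# for every row, the number of rows sharing that row's 'mg'
grp.size <- ave(seq_along(x$mg), x$mg, FUN = length)

# keep only rows whose group has at least n.min observations
x.new2 <- x[grp.size >= n.min, ]
table(as.character(x.new2$mg))
```

This keeps the original row order and row names, like the subset() version.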


On 8/9/07, Ron Crump [EMAIL PROTECTED] wrote:
 Jim,

  Does this do what you want?  It creates a new dataframe with those
  'mg' that have at least a certain number of observation.

 Looks good. I also have an alternative solution which appears to work,
 so I'll see which is quicker on the big data set in question.

 My solution:

 mgsize <- as.data.frame(table(in$mg))
 in2 <- merge(in,mgsize,by.x="mg",by.y="Var1")
 out <- subset(in2, Freq > 1, select= -Freq)

 Thanks for your help.

 Ron.





Re: [R] input data file

2007-08-07 Thread jim holtman

If you are going to read it back into R, then use 'save'; if it is
input to another application, consider 'write.csv'.  I assume that
when you say "save all my data files" you really mean "save all my R
objects".

On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Hello,

 I am new to R. I used scan() to read data from tab-delimited files. I want
 to save all my data files (multiple scan()) in another file, and use it
 like infile statement in SAS or \input{tex.file} in latex.

 Thanks!





Re: [R] how to convert decimal date to its equivalent date format(YYYY.mm.dd.hr.min.sec)

2007-08-07 Thread jim holtman

Is this what you want?

> x <- scan(textConnection("1979.00
+
+ 1979.020833
+
+ 1979.041667
+
+ 1979.062500"), what=0)
Read 4 items
> # get the year and then determine the number of seconds in the year so you can
> # use the decimal part of the year
> x.year <- floor(x)
> # fraction of the year
> x.frac <- x - x.year
> # number of seconds in each year
> x.sec.yr <- unclass(ISOdate(x.year+1,1,1,0,0,0)) -
+     unclass(ISOdate(x.year,1,1,0,0,0))
> # now get the actual time
> x.actual <- ISOdate(x.year,1,1,0,0,0) + x.frac * x.sec.yr
>
> x.actual
[1] "1979-01-01 00:00:00 GMT" "1979-01-08 14:29:49 GMT" "1979-01-16 05:00:10 GMT"
[4] "1979-01-23 19:30:00 GMT"
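
To get from there to the six separate columns asked for in the question, the POSIXct values can be broken apart with as.POSIXlt(). A sketch under the same assumptions as above (GMT, fraction of the actual year length); the data frame 'parts' and its column names are made up for illustration:

```r
x <- c(1979.00, 1979.020833, 1979.041667, 1979.062500)

yr     <- floor(x)
start  <- ISOdatetime(yr,     1, 1, 0, 0, 0, tz = "GMT")
end    <- ISOdatetime(yr + 1, 1, 1, 0, 0, 0, tz = "GMT")
# seconds in each year, so leap years are handled
secs   <- as.numeric(difftime(end, start, units = "secs"))
actual <- start + (x - yr) * secs

# POSIXlt stores the broken-down time as list components
lt <- as.POSIXlt(actual, tz = "GMT")
parts <- data.frame(year  = lt$year + 1900,   # years are counted from 1900
                    month = lt$mon + 1,       # months are 0-based
                    day   = lt$mday,
                    hour  = lt$hour,
                    min   = lt$min,
                    sec   = floor(lt$sec))
parts
```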



On 8/7/07, Yogesh Tiwari [EMAIL PROTECTED] wrote:
 Hello R Users,

 How to convert decimal date to date as YYYY.mm.dd.hr.min.sec

 For example, I have decimal date in one column, and want to convert it and
 write the equivalent date (YYYY.mm.dd.hr.min.sec) in the next six
 columns.

 1979.00

 1979.020833

 1979.041667

 1979.062500



 Is it possible in R ?

 Kindly help,

 Regards,

 Yogesh





 --
 Dr. Yogesh K. Tiwari,
 Scientist,
 Indian Institute of Tropical Meteorology,
 Homi Bhabha Road,
 Pashan,
 Pune-411008
 INDIA

 Phone: 0091-99 2273 9513 (Cell)
 : 0091-20-258 93 600 (O) (Ext.250)
 Fax: 0091-20-258 93 825





Re: [R] input data file

2007-08-07 Thread jim holtman

I would hope that you don't have 100 'scan' statements; you should
just have a loop that is using a set of file names in a vector to read
the data.  Are you reading the data into separate objects?  If so,
have you considered reading the 100 files into a 'list' so that you
have a single object with all of your data?  This is then easy to save
with the 'save' function and then you can quickly retrieve it with the
'load' statement.

file.names <- c('file1', ..., 'file100')
input.list <- list()
for (i in file.names){
    input.list[[i]] <- scan(i, what="")
}

You can then 'save(input.list, file='save.Rdata')'.  You can access
the data from the individual files with:

input.list[['file33']]
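
Here is a small self-contained version of the same idea that can be run as-is. It is only a sketch: the two file names and their contents are invented, and everything is written to a temporary directory.

```r
# create two small tab-delimited files to read back (made-up data)
tmp.dir <- tempfile("scan-demo")
dir.create(tmp.dir)
file.names <- file.path(tmp.dir, c("control_rep1.txt", "heat_rep1.txt"))
writeLines("1\t2\t3", file.names[1])
writeLines("4\t5\t6", file.names[2])

# one loop instead of one scan() call per file
input.list <- list()
for (f in file.names) {
    input.list[[basename(f)]] <- scan(f, what = 0, sep = "\t", quiet = TRUE)
}

# save the whole list once, then restore it in a later session
rdata <- file.path(tmp.dir, "save.Rdata")
save(input.list, file = rdata)
rm(input.list)
load(rdata)
input.list[["heat_rep1.txt"]]
```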





On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
 In the first part of myfile.R, I used scan() 100 times to read data from
 100 different tab-delimited files. I want to save this part to another
 data file, so I won't accidentally make mistakes, and I want to re-use/input
 it like infile statement in SAS or \input{file.tex} in latex. Don't want
 to copy/paste 100 scan() every time I need to read the same data.

 Thanks!





Re: [R] input data file

2007-08-07 Thread jim holtman

You don't have to name them after numbers.  What I sent was just an
example of a character vector with file names.  If you have all the
files in a directory, then you can set the loop to read in all the
files (or selected ones based on a pattern match).  If you are
copy/pasting the 'scan' command, then you must somehow be changing the
file name that is being read and the R object that you are storing the
values in.

You can use list.files(pattern=..) to select a list of file names.
This is much easier than copy/paste.
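
As a rough illustration of selecting files by pattern while keeping descriptive names (the file names below are invented; only list.files() and its 'pattern' and 'full.names' arguments are the real interface):

```r
# a temporary directory with a few empty, descriptively named files
tmp.dir <- tempfile("pattern-demo")
dir.create(tmp.dir)
file.create(file.path(tmp.dir, c("heat_t0_rep1.txt", "heat_t0_rep2.txt",
                                 "cold_t0_rep1.txt", "notes.doc")))

# 'pattern' is a regular expression matched against the file names
heat.files <- list.files(tmp.dir, pattern = "^heat_.*\\.txt$")
all.txt    <- list.files(tmp.dir, pattern = "\\.txt$", full.names = TRUE)

heat.files
length(all.txt)
```

The resulting character vector can be used directly as the loop variable, and the file names become the names of the list elements, so the condition or time point stays visible.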

On 8/8/07, Tiandao Li [EMAIL PROTECTED] wrote:
 I thought of a loop at first. My data were generated from 32 microarray
 experiments, each had 3 replicates, 96 files in total. I named the files
 based on different conditions or time series, and I really don't want to
 name them after numbers. It would confuse me later when I need to
 refer to/compare them.






Re: [R] Secondary axis

2007-08-06 Thread jim holtman

plot()
par(new=TRUE)
plot(...)
axis(4,...)
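
Filled in with made-up data, the outline above might look like this. It is a sketch, not the poster's data: two series on different time grids (roughly twice-monthly and hourly), ranges 0-25 and 0-40 as in the question, and pdf(NULL) only so it runs without a screen.

```r
pdf(NULL)                      # null device; drop this for on-screen plots

# coarse series (about twice a month) and fine series (hourly)
t.coarse <- seq(as.POSIXct("2007-01-01", tz = "GMT"), by = "15 days",
                length.out = 12)
t.fine   <- seq(min(t.coarse), max(t.coarse), by = "hour")
y.coarse <- seq(0, 25, length.out = length(t.coarse))
y.fine   <- 20 + 20 * sin(seq(0, 6 * pi, length.out = length(t.fine)))

# both plots must share the same x range or the curves will not line up
xlim <- range(c(t.coarse, t.fine))

par(mar = c(5, 4, 4, 4) + 0.1)          # extra room on the right
plot(t.coarse, y.coarse, type = "b", xlim = xlim, ylim = c(0, 25),
     xlab = "date", ylab = "series 1")
par(new = TRUE)                          # draw on top of the first plot
plot(t.fine, y.fine, type = "l", col = "red", xlim = xlim, ylim = c(0, 40),
     axes = FALSE, xlab = "", ylab = "")
axis(4)                                  # right-hand axis uses the 0-40 scale
mtext("series 2", side = 4, line = 3)
invisible(dev.off())
```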

On 8/5/07, Patrick Martin [EMAIL PROTECTED] wrote:
 Dear R help list members,

 I am trying to plot two sets of data (both of which are zoo objects)
 in the same graph using two separate y-axes with different scales,
 with the x-axis consisting of dates. I have simply used a plot()
 command to plot first one set of data, and then added the second set
 with lines(). I have also tried to add a further y-axis (at side=4),
 but this simply comes up with the same scale as the first y-axis. I
 somehow need to 'associate' one of the data sets with the second y-
 axis, such that it will scale sensibly (my first data set ranges from
 0-25, the second one from 0 to 40). The problem is compounded by the
 fact that the two data sets have very different frequencies: one
 consists of twice-monthly measurements, the other of hourly
 measurements. I would be very grateful for advice on how to do this.

 Thanks in advance,
 Patrick Martin





Re: [R] sink behavior

2007-08-06 Thread jim holtman

'sink' will capture 'printed' output from your program.  Try:

 # Using a matrix as a simple example.
 dumpMatrix = function(mat) {
    sink(file = "mat.txt")
    print(mat)
    sink(NULL)
 }

In this case, there is an explicit 'print' statement.  At the command
line, there is an implicit 'print' when you give an object name.
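
A slightly more defensive variant, as a sketch: it writes to a temporary file and uses on.exit() so the sink is removed even if print() fails. The function name 'dump.matrix' and its 'path' argument are illustrative, not part of any package.

```r
dump.matrix <- function(mat, path) {
    sink(file = path)
    on.exit(sink())      # always restore normal output on leaving
    print(mat)           # explicit print: there is no auto-printing here
    invisible(path)
}

out <- tempfile(fileext = ".txt")
dump.matrix(matrix(100, 2, 2), out)
readLines(out)           # header line plus one line per matrix row
```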


On 8/6/07, Daniel Gatti [EMAIL PROTECTED] wrote:
 There is a package called 'safe' that produces an object which I can
 only write to a file using the sink() function.  It works fine if the
 sink() command is not inside of a function, but it does not write
 anything to the file if the command is within a function.

 Sample code:
 # Using a matrix as a simple example.
 dumpMatrix = function(mat) {
    sink(file = "mat.txt")
    mat
    sink(NULL)
 }

 # This will write the file correctly.
 x = matrix(100, 10, 10)
 sink(file = "x.txt")
 x
 sink(NULL)

 # This will create an empty file.
 dumpMatrix(x)

 R 2.5.1
 Windows XP, SP2

 The sink() docs are full of warnings, but I'm not clear which one I've
 violated with this example.

 Dan





Re: [R] Access an entry after reading a table

2007-08-05 Thread jim holtman

read.table will convert your character columns to factors.  You are
seeing a single value returned (A), but it is also reporting the
levels for the factor. One way is to read the data in without
conversion to factors:

 Model=read.table("ModelMat.txt", header=TRUE, as.is=TRUE)

or you can convert the factor to character for output:

as.character(Model[1,1])
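
The difference is easy to see on a small made-up data frame (a sketch; 'Model' here is built by hand rather than read from ModelMat.txt):

```r
# a one-column data frame whose column is a factor
Model <- data.frame(V1 = factor(c("A", "B", "C")))

is.factor(Model[1, 1])        # TRUE: printing it also lists the levels
as.character(Model[1, 1])     # just the string "A"
```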


On 8/2/07, Gang Chen [EMAIL PROTECTED] wrote:
 Sorry about this basic question. After reading a table,

 Model=read.table("ModelMat.txt", header=T)

 I want to get access to each entry in the table Model. However, if I do

   Model[1,1]

 I get the following,

 [1] A
 Levels: A B C

 My question is, how can I just get the entry A without the 2nd line
 (Levels: A B C)?

 Thanks,
 Gang





Re: [R] Reading Matrices

2007-08-01 Thread jim holtman

 1.568896 2.014273 1.40849 1.730805 1.645146
 1.701634 0 1.12 1.072174 1.238244 1.113678 1.58344 1.682667 1.55684 1.59504 1.423253 1.961898 2.407275 1.801492 2.123807 2.038148
 1.732966 1.12 0 0.548174 0.977588 0.853022 1.614772 1.713999 1.588172 1.626372 1.454585 1.99323 2.438607 1.832824 2.155139 2.06948
 1.638478 1.072174 0.548174 0 0.8831 0.758534 1.520284 1.619511 1.493684 1.531884 1.360097 1.898742 2.344119 1.738336 2.060651 1.974992
 1.804548 1.238244 0.977588 0.8831 0 0.728972 1.686354 1.785581 1.659754 1.697954 1.526167 2.064812 2.510189 1.904406 2.226721 2.141062
 1.679982 1.113678 0.853022 0.758534 0.728972 0 1.561788 1.661015 1.535188 1.573388 1.401601 1.940246 2.385623 1.77984 2.102155 2.016496
 1.808682 1.58344 1.614772 1.520284 1.686354 1.561788 0 0.477421 1.199876 1.238076 1.530301 2.068946 2.514323 1.90854 2.230855 2.145196
 1.907909 1.682667 1.713999 1.619511 1.785581 1.661015 0.477421 0 1.299103 1.337303 1.629528 2.168173 2.61355 2.007767 2.330082 2.244423
 1.782082 1.55684 1.588172 1.493684 1.659754 1.535188 1.199876 1.299103 0 0.823034 1.503701 2.042346 2.487723 1.88194 2.204255 2.118596
 1.820282 1.59504 1.626372 1.531884 1.697954 1.573388 1.238076 1.337303 0.823034 0 1.541901 2.080546 2.525923 1.92014 2.242455 2.156796
 1.393257 1.423253 1.454585 1.360097 1.526167 1.401601 1.530301 1.629528 1.503701 1.541901 0 1.653521 2.098898 1.493115 1.81543 1.729771
 1.568896 1.961898 1.99323 1.898742 2.064812 1.940246 2.068946 2.168173 2.042346 2.080546 1.653521 0 0.988747 1.51847 1.840785 1.755126
 2.014273 2.407275 2.438607 2.344119 2.510189 2.385623 2.514323 2.61355 2.487723 2.525923 2.098898 0.988747 0 1.963847 2.286162 2.200503
 1.40849 1.801492 1.832824 1.738336 1.904406 1.77984 1.90854 2.007767 1.88194 1.92014 1.493115 1.51847 1.963847 0 1.054405 0.968746
 1.730805 2.123807 2.155139 2.060651 2.226721 2.102155 2.230855 2.330082 2.204255 2.242455 1.81543 1.840785 2.286162 1.054405 0 0.722953
 1.645146 2.038148 2.06948 1.974992 2.141062 2.016496 2.145196 2.244423 2.118596 2.156796 1.729771 1.755126 2.200503 0.968746 0.722953 0






Re: [R] Simple table with frequency variable

2007-08-01 Thread jim holtman

I am not exactly sure what you are asking for.  I am assuming that you
want a vector that represents the combinations that are present:

> N
 [1] 11 22 31 42 51 12 21 32 41 52
> table(i,j)
   j
i   1 2
  1 1 1
  2 1 1
  3 1 1
  4 1 1
  5 1 1
> z <- table(i,j)
> which(z==1)
 [1]  1  2  3  4  5  6  7  8  9 10
> which(z==1,arr.ind=T)
  row col
1   1   1
2   2   1
3   3   1
4   4   1
5   5   1
1   1   2
2   2   2
3   3   2
4   4   2
5   5   2
> x <- which(z==1,arr.ind=T)
> paste(rownames(z)[x[,'row']], colnames(z)[x[,'col']], sep='')
 [1] "11" "21" "31" "41" "51" "12" "22" "32" "42" "52"
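
If instead the goal is a table whose cells hold the N's themselves (which is how I now read the question), xtabs() with the frequency variable on the left-hand side of the formula does that. A short sketch using the i, j and N from the question:

```r
i <- rep(1:5, 2)
j <- rep(1:2, 5)
N <- 10 * i + j

# N on the left of '~' is summed within each (i, j) cell
tab <- xtabs(N ~ i + j)
tab
```

Each (i, j) combination occurs once here, so every cell is simply the matching N; with repeated combinations the N's would be added up.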



On 8/1/07, G. Draisma [EMAIL PROTECTED] wrote:
 Hallo,

 I'm trying to find out how to tabulate frequencies
 of factors when the data have a frequency variable.

 e.g.:
 i<-rep(1:5,2)
 j<-rep(1:2,5)
 N<-10*i+j

 table(i,j) gives a table of ones
 as each combination occurs only once.
 How does one get a table with the corresponding N's?

 Thanks!
 Gerrit.


 --
 Gerrit Draisma
 Department of Public Health
 Erasmus MC, University Medical Center Rotterdam
 Room AE-103
 P.O. Box 2040 3000 CA  Rotterdam The Netherlands
 Phone: +31 10 4087124 Fax: +31 10 4638474
 http://mgzlx4.erasmusmc.nl/pwp/?gdraisma





Re: [R] Problem to remove loops in a routine

2007-08-01 Thread jim holtman

=),
 paste(, Group
 ,l,sep=,sep=))


  trellis.par.set(par.xlab.text=list(cex=trellis.par.get("axis.text")[2]))
trellis.par.set(par.ylab.text=list(cex=trellis.par.get("axis.text")[2]))


 print(myplot,panel.width=list(x=(0.75/nTrellisCol),units="npc"),panel.height=list(x=(0.50/nTrellisRow),units="npc"))


 dev.off()





Re: [R] how to combine data of several csv-files

2007-07-31 Thread jim holtman

Here is the modified script for computing the 'sd':

v1 <- NA
v2 <- rnorm(6)
v3 <- rnorm(6)
v4 <- rnorm(6)
v5 <- rnorm(6)
v6 <- rnorm(6)
v7 <- rnorm(6)
v8 <- rnorm(6)
v8 <- NA

list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
categ <- c(NA,"cat1","cat1","cat1","cat2","cat2","cat2",NA)

# create partitioned list
list.cat <- split(list, categ)
# combine each partition into a matrix
list.mat <- lapply(list.cat, function(x) do.call('rbind', x))
# now take the means of each column
lapply(list.mat, colMeans)
# compute the 'sd' by using 'apply' on the columns
lapply(list.mat, apply, 2, sd)
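
To see what comes back, here is the same pipeline on seeded data. This is only a sketch: the variable names 'vals', 'by.cat', 'mats' and 'sds' are mine (avoiding 'list' as an object name), and the numbers are random.

```r
set.seed(1)
vals  <- list(NA, rnorm(6), rnorm(6), rnorm(6),
              rnorm(6), rnorm(6), rnorm(6), NA)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

by.cat <- split(vals, categ)                        # NA categories are dropped
mats   <- lapply(by.cat, function(v) do.call(rbind, v))  # 3 x 6 per category
sds    <- lapply(mats, function(m) apply(m, 2, sd))      # column-wise sd

sds$cat1   # six values: sd of the 1st, 2nd, ... items across v2, v3, v4
```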



On 7/31/07, 8rino-Luca Pantani [EMAIL PROTECTED] wrote:
  Hi Jim,
  that's exactly what I'm looking for. Thank you so much. I think, I
  should look for some further documentation on list handling.
 I think I will do the same...
 Thanks to Jim I learned textConnection and rowMeans.

 Jim, could you please go a step further and tell me how to use lapply to
 calculate
 the sd instead of the mean of the same items?
 I mean
 sd(-0.6442149 0.02354036 -1.40362589)
 sd(-1.1829260 1.17099178 -0.046778203)
 sd(-0.2047012 -1.36186952 0.13045724)
 etc

 x <- read.table(textConnection("  v1 v2 v3  v4 v5  v6 v7 v8
 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA"), header=TRUE)








Re: [R] reading and storing files in the workspace

2007-07-31 Thread jim holtman

try:

for (i in test){
    # strip the trailing ".txt" to get the object name
    assign(gsub("\\.txt$", "", i), read.table(i, header=TRUE))
}
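
An alternative sketch, not what was asked for but often easier to work
with: keep the tables in one named list instead of creating many
objects with assign().  The two demo files and the temporary directory
below are made up for illustration.

```r
# Hypothetical demo files written to a temporary directory
tmp <- tempfile("demo")
dir.create(tmp)
write.table(data.frame(a = 1:2), file.path(tmp, "aOki.txt"), row.names = FALSE)
write.table(data.frame(a = 3:4), file.path(tmp, "dyp100.txt"), row.names = FALSE)

# read every .txt file and name each table after its file, minus ".txt"
files  <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(files, read.table, header = TRUE)
names(tables) <- sub("\\.txt$", "", basename(files))

print(names(tables))
print(tables$dyp100)
```

With a list, the tables can later be processed together with lapply()
rather than by looking up each name.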

On 7/31/07, Luis Ridao Cruz [EMAIL PROTECTED] wrote:
 R-help,

 I have a vector containing (test) some file names.
 The files contents are matrixes.

  test

  [1] aaOki.txtaOki.txt bOki.txt c1Oki.txt
 c2Oki.txtc3Oki.txtcOki.txt dOki.txt dyp100.txt
  dyp200.txt
 [11] dyp300.txt   dyp400.txt   dyp500.txt   dyp600.txt
 dyp700.txt   dyp800.txt   eOki.txt FBdyp100.txt
 FBdyp150.txt FBdyp200.txt.

 What I want to do is to import to R using the same file name
 and remove the .txt extension out of the object name.
 Something like this:

 for(i in test)
 gsub("\\.", "", paste(i, sep = "")) <- read.table(file = paste(i, sep = ""),
 header = TRUE)

 But I get the following message:

 Error in gsub("\\.", "", paste(i, sep = "")) <- read.table(file = paste(i,  :
target of assignment expands to non-language object


 Thanks in advance.


  version
   _
 platform   i386-pc-mingw32
 arch   i386
 os mingw32
 system i386, mingw32
 status
 major  2
 minor  5.1
 year   2007
 month  06
 day27
 svn rev42083
 language   R
 version.string R version 2.5.1 (2007-06-27)




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] regular expressions : extracting numbers

2007-07-30 Thread jim holtman

Is this what you want:

> x
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"         "rb 3%"         "rb 2%,mineuse"
 [7] "rb"            "rb"            "rb 12"         "rb"            "rj 30%"        "rb"
[13] "rb"            "rb 25%"        "rb"            "rb"            "rb"            "rj, rb"
> gsub("[^0-9]*([0-9]*)[^0-9]*", "\\1", x)
 [1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""   "25" ""   ""   ""   ""
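
Since NA was said to be preferable to an empty string, here is a sketch
of one way to get a numeric vector with NA where no number was found.
The short vector 'x' below is a hand-picked subset of the strings in
this thread; the conversion step is an addition, not part of the
original answer.

```r
x <- c("lema, rb 2%", "rb 2%", "rb 2%,mineuse", "rb", "rb 12", "rj 30%", "rj, rb")

# same pattern as above: keep only the first run of digits
digits <- gsub("[^0-9]*([0-9]*)[^0-9]*", "\\1", x)

# mark "no digits found" as NA, then convert to numbers
digits[digits == ""] <- NA
nums <- as.numeric(digits)
print(nums)
```

Note that the pattern assumes at most one number per string; a string
holding two separate numbers would have their digits run together.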



On 7/30/07, GOUACHE David [EMAIL PROTECTED] wrote:
 Hello all,

 I have a vector of character strings, in which I have letters, numbers, and 
 symbols. What I wish to do is obtain a vector of the same length with just 
 the numbers.
 A quick example -

 extract of the original vector :
 lema, rb 2% rb 2% rb 3% rb 4% rb 3% rb 2%,mineuse rb rb rb 
 12 rb rj 30% rb rb rb 25% rb rb rb rj, rb

 and the type of thing I wish to end up with :
 2 2 3 4 3 2   12  30   25

 or, instead of , NA would be acceptable (actually it would almost be better 
 for me)

 Anyways, I've been battling with gsub() and things of the sort, but I'm 
 drowning in the regular expressions, despite a few hours of looking at Perl 
 tutorials...
 So if anyone can help me out, it would be greatly appreciated!!

 In advance, thanks very much.

 David Gouache
 Arvalis - Institut du Végétal
 Station de La Minière
 78280 Guyancourt
 Tel: 01.30.12.96.22 / Port: 06.86.08.94.32




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] how to combine data of several csv-files

2007-07-30 Thread jim holtman

This should do it:

> v1 <- NA
> v2 <- rnorm(6)
> v3 <- rnorm(6)
> v4 <- rnorm(6)
> v5 <- rnorm(6)
> v6 <- rnorm(6)
> v7 <- rnorm(6)
> v8 <- rnorm(6)
> v8 <- NA
>
> list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
> categ <- c(NA,"cat1","cat1","cat1","cat2","cat2","cat2",NA)
>
> # create partitioned list
> list.cat <- split(list, categ)
> # combine each partition into a matrix
> list.mat <- lapply(list.cat, function(x) do.call('rbind', x))
> # now take the means of each column
> lapply(list.mat, colMeans)
$cat1
[1] -0.5699080  0.3855693  1.1051809  0.2379324  0.6684713  0.3240003

$cat2
[1]  0.38160462 -0.10559496 -0.40963090 -0.09507354  0.95021406 -0.31491450




On 7/30/07, Antje [EMAIL PROTECTED] wrote:
 okay, I played a bit around and now I have some kind of testcase for you:

 v1 <- NA
 v2 <- rnorm(6)
 v3 <- rnorm(6)
 v4 <- rnorm(6)
 v5 <- rnorm(6)
 v6 <- rnorm(6)
 v7 <- rnorm(6)
 v8 <- rnorm(6)
 v8 <- NA

 list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
 categ <- c(NA,"cat1","cat1","cat1","cat2","cat2","cat2",NA)

   list
 [[1]]
 [1] NA

 [[2]]
 [1] -0.6442149 -0.2047012 -1.1986041 -0.2097442 -0.7343465 -1.3888750

 [[3]]
 [1]  0.02354036 -1.36186952 -0.42197792  1.50445971 -1.76763996  0.53722404

 [[4]]
 [1] -1.40362589  0.13045724 -0.84651458  1.57005071  0.06961015  0.25269771

 [[5]]
 [1] -1.1829260  2.1411553 -0.1327081 -0.1053442 -0.8179396 -1.2342698

 [[6]]
 [1]  1.17099178  0.49248118 -0.18690065  1.50050976 -0.65552410 -0.01243247

 [[7]]
 [1] -0.046778203 -0.233788840  0.443908897 -1.649740180  0.003991354 
 -0.228020092

 [[8]]
 [1] NA

 now, I need the means (and sd) of element 1 of list[2],list[3],list[4] 
 (because they belong to cat1) and

 = mean(-0.6442149, 0.02354036, -1.40362589)

 the same for element 2 up to element 6 (--> I would then get a vector
 containing the means for cat1)
 the same for the vectors belonging to cat2.

 does anybody now understand what I mean?

 Antje




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Looping through all possible combinations of cases

2007-07-30 Thread jim holtman

Here is how to do it for 2; you can extend it:

> # test data
> n <- 100
> x <- data.frame(id=sample(letters[1:4], n, TRUE), values=runif(n))
> # get combinations of 2 at a time
> comb.2 <- combn(unique(as.character(x$id)), 2)
> for (i in 1:ncol(comb.2)){
+ cat(sprintf("%s:%s %f\n",comb.2[1,i], comb.2[2,i],
+ sum(x$values[x$id %in% comb.2[,i]])))
+ }
c:d 25.259988
c:b 21.268737
c:a 21.250933
d:b 26.013253
d:a 25.995450
b:a 22.004198
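
One way the pairs example might be extended to trios, quartets and so
on, as a sketch.  The helper name 'combo.sums' and the four-person data
frame are made up for illustration; one row per person is assumed, as
in the question.

```r
# Hypothetical data: one row per person
x <- data.frame(id = c("a", "b", "c", "d"), values = c(1, 2, 4, 8),
                stringsAsFactors = FALSE)

# sums for every combination of k ids, named like "a:b:c"
combo.sums <- function(ids, values, k) {
  combos <- combn(ids, k)                      # one combination per column
  sums <- apply(combos, 2, function(grp) sum(values[ids %in% grp]))
  names(sums) <- apply(combos, 2, paste, collapse = ":")
  sums
}

# pairs and trios collected in one named vector
all.sums <- unlist(lapply(2:3, function(k) combo.sums(x$id, x$values, k)))
print(all.sums)
```

The number of combinations grows quickly with the number of people, so
for larger data it may be worth restricting the range of k.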


On 7/27/07, Dimitri Liakhovitski [EMAIL PROTECTED] wrote:
 Hello!

 I have a regular data frame (DATA) with 10 people and 1 column
 ('variable'). Its cases are people with names ('a', 'b', 'c', 'd',
 'e', 'f', etc.). I would like to write a function that would sum up
 the values on 'variable' of all possible combinations of people, i.e.

 1. I would like to write a loop - in such a way that it loops through
 each possible pair of cases (i.e., ab, ac, ad, etc.) and sums up their
 respective values on 'variable'

 2. I would like to write a loop - in such a way that it loops through
 each possible trio of cases (i.e., abc, abd, abe, etc.) and sums up
 their respective values on 'variable'.

 3.  I would like to write a loop - in such a way that it loops through
 each possible quartet of cases (i.e., abcd, abce, abcf, etc.) and sums
 up their respective values on 'variable'.

 etc.

 Then, at the end I want to capture all possible combinations that were
 considered (i.e., what elements were combined in it) and get the value
 of the sum for each combination.

 How should I do it?
 Thanks a lot!
 Dimitri




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Order by the columns

2007-07-30 Thread jim holtman

?order

You could do something like:

mat[order(mat[,1], mat[,2], mat[,3]),]
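
The call above names three columns explicitly.  For a matrix with n
columns, one possible generalization (a sketch; the 4 x 3 matrix is
made up) passes every column to order() through do.call():

```r
# Hypothetical two-level design matrix with entries -1 and +1
mat <- matrix(c( 1, -1,  1,
                -1,  1, -1,
                -1, -1,  1,
                 1, -1, -1), ncol = 3, byrow = TRUE)

# build the list of sort keys: column 1, then column 2, ... column n
keys <- lapply(seq_len(ncol(mat)), function(j) mat[, j])
ord  <- do.call(order, keys)

sorted <- mat[ord, ]
print(sorted)
```

To make a different column vary slowest, reorder 'keys' before the
do.call(), for example with rev(keys).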

On 7/29/07, Am Stat [EMAIL PROTECTED] wrote:
 Dear useR,

 I have a data matrix, it has n columns, each column is a two-level variable
 with entires -1 and +1. They are randomly generated, now I want to order
 them like (for example, 5 columns case)
 ---   -   -
 --   -   --
 .
 (first several rows are the samples with all variables in low level)

 +   -   --   -
 +   -   ---
 .


 -   +   --   -


 +  +   --   -



 + + + + +

 Is there any function in R that could let me do this order by Var1 then
 order by Var2 then...order by Var n


 Thanks very much in advance!


 Best,

 Leon




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] write.csv

2007-07-30 Thread jim holtman

Then you can just write a 'for' loop to write out each submatrix:

for (i in 1:dim(x)[3]){
    write.csv(x[,,i], paste("x", i, ".csv", sep=""))
}
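
A self-contained sketch of the same loop that can be run as is.  The
small 2 x 3 x 4 array and the temporary output directory are made up
for illustration; 'row.names = FALSE' is an optional choice, not part
of the answer above.

```r
# Hypothetical small array standing in for (years, regions, variables)
arr <- array(1:24, dim = c(2, 3, 4))

out.dir <- tempfile("slices")
dir.create(out.dir)

# one csv file per slice along the third dimension: x1.csv, x2.csv, ...
for (i in seq_len(dim(arr)[3])) {
  write.csv(arr[, , i],
            file.path(out.dir, paste("x", i, ".csv", sep = "")),
            row.names = FALSE)
}

# read one slice back to confirm it round-trips
back <- as.matrix(read.csv(file.path(out.dir, "x2.csv")))
print(back)
```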


On 7/30/07, Dong GUO 郭东 [EMAIL PROTECTED] wrote:
 the dim of my results is (26,31,8) -(years, regions and variables). so, if i
 save each (years, regions) in 8 csv files, later, I could connect the
 (26,31) to dbf file in ArcGIS to show in a map. This is what I intend to do.

 I dont know a better way to do it directly in R...


 On 7/31/07, jim holtman [EMAIL PROTECTED] wrote:
  It really depends on how you want it output.  You can use 'write.csv'
  to write an array out and it will be a 2-dimensional image from which
  you could then reconstruct it if you know what the dimensions were.
  What do you want to do with the data?  If you are just going to read
  it back into R, then use save/load.
 
  On 7/29/07, Dong GUO 郭东  [EMAIL PROTECTED] wrote:
   Hi,
  
   I want to save an array(say, array[6,7,8]) write a cvs file. How can I
 do
   that??? can I write in one file?
  
   if I could not write in one file, i want to use a loop to save in
 different
   files (in the array[6,7,8], should be 8 csv files), such as the filename
   structure should be: file =filename +str(i) +. +csv
  
   Many thanks.
  
  
 
 
  --
  Jim Holtman
  Cincinnati, OH
  +1 513 646 9390
 
  What is the problem you are trying to solve?
 




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] how to combine data of several csv-files

2007-07-30 Thread jim holtman

Is this what you want:

> x <- read.table(textConnection("  v1 v2  v3  v4  v5  v6   v7 v8
+ 1 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
+ 2 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
+ 3 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
+ 4 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
+ 5 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
+ 6 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA"), header=TRUE)
>
> categ <- scan(textConnection("NA cat1 cat1 cat1 cat2 cat2 cat2 NA"), what='')
Read 8 items
> cat.col <- split(1:ncol(x), categ)
> lapply(cat.col, function(.cat){
+ rowMeans(x[, .cat])
+ })
$cat1
          1           2           3           4           5           6
-0.67476681 -0.47870449 -0.82236554  0.95492207 -0.81079210 -0.19965108

$cat2
          1           2           3           4           5           6
-0.01957081  0.79994921  0.04143338 -0.08485821 -0.48982412 -0.49157412
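
A sketch of the same column-grouping idea on made-up numbers, extended
to the sd as well.  It also shows why calls of the form mean(a, b, c)
do not average the three values: only the first argument is treated as
data.  The data frame 'df' is hypothetical.

```r
# Hypothetical data frame with the same column layout as above
df <- data.frame(v1 = NA, v2 = c(1, 2), v3 = c(3, 4), v4 = c(5, 9),
                 v5 = c(10, 20), v6 = c(30, 40), v7 = c(50, 60), v8 = NA)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# column numbers per category (NA categories are dropped by split)
cat.col <- split(seq_len(ncol(df)), categ)

row.means <- sapply(cat.col, function(cols) rowMeans(df[, cols]))
row.sds   <- sapply(cat.col, function(cols) apply(df[, cols], 1, sd))
print(row.means)
print(row.sds)

# pitfall: the 2nd and 3rd arguments of mean() are 'trim' and 'na.rm'
wrong <- mean(1, 3, 5)       # averages only the value 1
right <- mean(c(1, 3, 5))    # averages all three values
```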



On 7/30/07, Antje [EMAIL PROTECTED] wrote:
 Hello,

 thank you for your help. But I guess, it's still not what I want... printing 
 df.my gives me

 df.my
   v1 v2  v3  v4 v5  v6   v7 v8
 1 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
 2 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
 3 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
 4 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
 5 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
 6 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA

 now, I have to combine like this:

   v1 v2  v3  v4 v5  v6   v7   
   v8
   NAcat1cat1cat1   cat2cat2 cat2  
  NA

 --

 mean(df.my$v2[1],df.my$v3[1],df.my$v4[1])
 mean(df.my$v2[2],df.my$v3[2],df.my$v4[2])
 mean(df.my$v2[3],df.my$v3[3],df.my$v4[3])
 mean(df.my$v2[4],df.my$v3[4],df.my$v4[4])
 mean(df.my$v2[5],df.my$v3[5],df.my$v4[5])
 mean(df.my$v2[6],df.my$v3[6],df.my$v4[6])

 the same for v5, v6 and v7

 further, I'm not sure how to avoid the list, because this is the result of 
 the processing I did before...

 Ciao,
 Antje


 8rino-Luca Pantani schrieb:
  I hope I see.
 
  Why not try the following, and avoid lists, which I'm not still able to
  manage properly ;-)
  v1 <- NA
  v2 <- rnorm(6)
  v3 <- rnorm(6)
  v4 <- rnorm(6)
  v5 <- rnorm(6)
  v6 <- rnorm(6)
  v7 <- rnorm(6)
  v8 <- rnorm(6)
  v8 <- NA
  (df.my <- cbind.data.frame(v1, v2, v3, v4, v5, v6, v7, v8))
  (df.my2 <- reshape(df.my,
   varying=list(c("v1","v2","v3","v4","v5","v6","v7","v8")),
   idvar="sequential",
   timevar="cat",
   direction="long"
  ))
  aggregate(df.my2$v1, by=list(category=df.my2$cat), mean)
  aggregate(df.my2$v1, by=list(category=df.my2$cat), function(x){sd(x,
  na.rm = TRUE)})
 
 
  Antje ha scritto:
  okay, I played a bit around and now I have some kind of testcase for you:
 
  v1 <- NA
  v2 <- rnorm(6)
  v3 <- rnorm(6)
  v4 <- rnorm(6)
  v5 <- rnorm(6)
  v6 <- rnorm(6)
  v7 <- rnorm(6)
  v8 <- rnorm(6)
  v8 <- NA
 
  list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
  categ <- c(NA,"cat1","cat1","cat1","cat2","cat2","cat2",NA)
 
   list
  [[1]]
  [1] NA
 
  [[2]]
  [1] -0.6442149 -0.2047012 -1.1986041 -0.2097442 -0.7343465 -1.3888750
 
  [[3]]
  [1]  0.02354036 -1.36186952 -0.42197792  1.50445971 -1.76763996
  0.53722404
 
  [[4]]
  [1] -1.40362589  0.13045724 -0.84651458  1.57005071  0.06961015
  0.25269771
 
  [[5]]
  [1] -1.1829260  2.1411553 -0.1327081 -0.1053442 -0.8179396 -1.2342698
 
  [[6]]
  [1]  1.17099178  0.49248118 -0.18690065  1.50050976 -0.65552410
  -0.01243247
 
  [[7]]
  [1] -0.046778203 -0.233788840  0.443908897 -1.649740180  0.003991354
  -0.228020092
 
  [[8]]
  [1] NA
 
  now, I need the means (and sd) of element 1 of list[2],list[3],list[4]
  (because they belong to cat1) and
 
  = mean(-0.6442149, 0.02354036, -1.40362589)
 
  the same for element 2 up to element 6 (--> I would then get a vector
  containing the means for cat1)
  the same for the vectors belonging to cat2.
 
  does anybody now understand what I mean?
 
  Antje
 
 
 
 




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What

Re: [R] problems saving and loading (PLMset) objects

2007-07-30 Thread jim holtman

You just need to say:

load("expr.RData")

Do not assign the result to 'expr': load() restores the saved objects
under their original names and returns only a character vector of
those names.
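
A small sketch of what load() returns, using a made-up object and a
temporary file rather than a PLMset.  The saveRDS()/readRDS() pair
shown at the end is an alternative available in later versions of R
than the one discussed in this thread; it returns the object itself,
so assigning its result is the intended use.

```r
obj <- list(coefs = c(1.5, 2.5))     # hypothetical object to save
f <- tempfile(fileext = ".RData")

save(obj, file = f)
rm(obj)

# load() recreates 'obj' and returns only the names it restored
loaded.names <- load(f)
print(loaded.names)
print(obj$coefs)

# alternative: readRDS() returns the object, so assignment is natural
f2 <- tempfile(fileext = ".rds")
saveRDS(obj, f2)
again <- readRDS(f2)
```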

On 7/30/07, Quin Wills [EMAIL PROTECTED] wrote:
 Hi



 I'm running the latest R on a presumably up to date Linux server.



 'Doing something silly I'm sure, but can't see why my saved PLMset objects
 come out all wrong. To use an example:



 Setting up an example PLMset (I have the same problem no matter what example
 I use)

  library(affyPLM)

  data(Dilution) # affybatch object

  Dilution = updateObject(Dilution)

  options(width=36)

  expr <- fitPLM(Dilution)





 This works, and I'm able to get the probeset coefficients with coefs(expr).
 until I save and try reloading:

  save(expr, file="expr.RData")

  rm(expr) # just to be sure

  expr <- load("expr.RData")





 Now, running coefs(expr) says:

  Error in function (classes, fdef, mtable) : unable to find an inherited
 method for function "coefs", for signature "character"





 Trying str(expr) just gives the following:

  chr "expr"



 expr.Rdata appears to save properly (in that there is an actual file with
 notable size in my working directory).



 Thanks in advance,

 Quin









-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] deriv; loop

2007-07-30 Thread jim holtman

For question 1, is this what you want?  (BTW, the example keeps
extending 'result', which is OK for small sizes; for larger sizes,
preallocate 'result' to the size you need.)

> result <- numeric(0)
> for (i in 1:6) result[i] <- i
> result
[1] 1 2 3 4 5 6
> prod(result)
[1] 720
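
The preallocated form mentioned above could look like the following
sketch; 'n' and the second variant using vapply() are additions for
illustration.

```r
n <- 6

# preallocate the full vector once, then fill it in the loop
result <- numeric(n)
for (i in seq_len(n)) result[i] <- i + 0
total <- prod(result)

# the same values without an explicit loop
result2 <- vapply(seq_len(n), function(i) i + 0, numeric(1))

print(total)
```

Either vector can then be passed to prod(), sum() and so on, with no
temporary file involved.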


On 7/29/07, francogrex [EMAIL PROTECTED] wrote:

 Hi, 2 questions:

 Question 1: example of what I currently do:

 for(i in 1:6){sink("temp.txt",append=TRUE)
 dput(i+0)
 sink()}
 x=scan(file="temp.txt")
 print(prod(x))
 file.remove("C:/R-2.5.0/temp.txt")

 But how to convert the output of the loop to a vector that I can manipulate
 (by prod or sum etc), without having to write and append to a file?

 Question 2:

  deriv(~gamma(x),"x")

 expression({
.expr1 <- gamma(x)
.value <- .expr1
.grad <- array(0, c(length(.value), 1), list(NULL, c("x")))
.grad[, "x"] <- .expr1 * psigamma(x)
attr(.value, "gradient") <- .grad
.value
 })

 BUT

  deriv3(~gamma(x),"x")
 Error in deriv3.formula(~gamma(x), "x") : Function 'psigamma' is not in the
 derivatives table

 What I want is the expression for the second derivative (which I believe is
 trigamma(x), or psigamma(x,1)), how can I obtain that?

 Thanks



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] the large dataset problem

2007-07-30 Thread jim holtman

FYI.  I ran your script on a 1.5GHz Windows machine, using the CYGWIN
software that provides the UNIX utilities.  The file has 1000 lines
with 10,000 fields on each line.  Here is what it reported:

gawk 'BEGIN{FS=","}{print $(1) "," $(1000) "," $(1275) "," $(5678)}' < tempxx.txt > newdata.csv


real    0m0.806s
user    0m0.640s
sys     0m0.124s

So it took less than a second to process the file, so it still should
be pretty fast on windows.  BTW, the first run took 30 seconds of real
time due to the slow disk that I have.  The run above had the data
already cached in memory.
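
Two sketches related to the suggestion quoted below of driving this
from R.  The first only builds the awk command as a string (the file
names are the ones used in this thread; the system() call is left
commented out because it needs awk on the path).  The second is an
R-only alternative that skips unwanted columns through 'colClasses';
the tiny csv file is made up for illustration.

```r
# 1. build the awk command for a chosen set of field numbers
fields   <- c(1, 1000, 1275, 5678)
awk.prog <- paste0("BEGIN{FS=\",\"}{print ",
                   paste0("$(", fields, ")", collapse = " \",\" "),
                   "}")
awk.cmd  <- sprintf("awk '%s' < %s > %s", awk.prog,
                    "sippfile.csv", "newdata.csv")
print(awk.cmd)
# system(awk.cmd)   # not run here

# 2. R-only: columns with class "NULL" are skipped while reading
csv <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d,e", "1,2,3,4,5", "6,7,8,9,10"), csv)
keep     <- c(1, 4)
n.fields <- length(scan(csv, what = "", sep = ",", nlines = 1, quiet = TRUE))
classes  <- rep("NULL", n.fields)
classes[keep] <- NA            # NA means: let read.csv guess the type
subset.df <- read.csv(csv, colClasses = classes)
print(subset.df)
```

The second form still parses every line in R, so for very wide files
the awk route may remain the faster of the two.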


On 7/30/07, Ted Harding [EMAIL PROTECTED] wrote:
 On 30-Jul-07 11:40:47, Eric Doviak wrote:
  [...]

 Sympathies for the constraints you are operating in!

  The Introduction to R manual suggests modifying input files with
  Perl. Any tips on how to get started? Would Perl Data Language (PDL) be
  a good choice?  http://pdl.perl.org/index_en.html

 I've not used SIPP files, but it seems that they are available in
 delimited format, including CSV.

 For extracting a subset of fields (especially when large datasets may
 stretch RAM resources) I would use awk rather than perl, since it
 is a much lighter program, transparent to code for, efficient, and
 it will do that job.

 On a Linux/Unix system (see below), say I wanted to extract fields
 1, 1000, 1275, ... , 5678 from a CSV file. Then the 'awk' line
 that would do it would look like

 awk '
  BEGIN{FS=","}{print $(1) "," $(1000) "," $(1275) "," ... $(5678)
 }' < sippfile.csv > newdata.csv

 Awk reads one line at a time, and does with it what you tell it to do.
 It will not be overcome by a file with an enormous number of lines.
 Perl would be similar. So long as one line fits comfortably into RAM,
 you would not be limited by file size (unless you're running out
 of disk space), and operation will be quick, even for very long
 lines (as an experiment, I just set up a file with 10,000 fields
 and 35 lines; awk output 6 selected fields from all 35 lines in
 about 1 second, on the 366MHz 128MB RAM machine I'm on at the
 moment. After transferring it to a 733MHz 512MB RAM machine, it was
 too quick to estimate; so I duplicated the lines to get a 363-line
 file, and now got those same fields out in a bit less than 1 second.
 So that's over 300 lines/second, 200,000 lines a minute, a million
 lines in 5 minutes; and all on rather puny hardware.).

 In practice, you might want to write a separate script which would
 automatically create the necessary awk script (say if you supply
 the field names, having already coded the field positions corresponding
 to field names). You could exploit R's system() command to run the
 scripts from within R, and then load in the filtered data.

  I wrote a script which loads large datasets a few lines at a time,
  writes the dozen or so variables of interest to a CSV file, removes
  the loaded data and then (via a for loop) loads the next few lines
   I managed to get it to work with one of the SIPP core files,
  but it's SLOW.

 See above ...

  Worse, if I discover later that I omitted a relevant variable,
  then I'll have to run the whole script all over again.

 If the script worked quickly (as with awk), presumably you
 wouldn't mind so much?

 Regarding Linux/Unix versus Windows. It is general experience
 that Linux/Unix works faster, more cleanly and efficiently, and
 often more reliably, for similar tasks; and can do so on low grade
 hardware. Also, these systems come with dozens of file-processing
 utilities (including perl and awk; also many others), each of which
 has been written to be efficient at precisely the repertoire of
 tasks it was designed for. A lot of Windows software carries a huge
 overhead of either cosmetic dross, or a pantechnicon of functionality
 of which you are only going to need 0.01% at any one time.

 The Unix utilities have been ported to Windows, long since, but
 I have no experience of using them in that environment. Others,
 who have, can advise! But I'd seriously suggest getting hold of them.

 Hoping this helps,
 Ted.

 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 30-Jul-07   Time: 18:24:41
 -- XFMail --




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Matrix Multiplication, Floating-Point, etc.

2007-07-30 Thread jim holtman

One thing to realize is that although the two operations look the
same, the code that is executed differs between the two cases.
Because the sequence of floating-point instructions is different,
the round-off errors that are introduced can differ as well.
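
A short sketch of the usual way to compare such results, using the two
vectors from this thread.  Whether the two values come out exactly
equal can depend on the platform and the linear algebra library, so
the example tests with a tolerance rather than with '=='.

```r
ev1 <- c(0.6, 0.8)
ev2 <- c(0.8, -0.6)

a <- as.numeric(ev1 %*% ev2)   # via matrix multiplication
b <- sum(ev1 * ev2)            # via elementwise product and sum

# compare with a tolerance instead of exact equality
same <- isTRUE(all.equal(a, b))
print(same)

# the classic illustration of the same effect
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))
```

all.equal() returns a character description rather than FALSE when the
values differ, which is why it is wrapped in isTRUE() here.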

On 7/30/07, Talbot Katz [EMAIL PROTECTED] wrote:
 Thank you for responding!

 I realize that floating point operations are often inexact, and indeed, the
 difference between the two answers is within the all.equal tolerance, as
 mentioned in FAQ 7.31 (cited by Charles):

 (as.numeric(ev1%*%ev2))==(sum(ev1*ev2))
 [1] FALSE
 all.equal((as.numeric(ev1%*%ev2)),(sum(ev1*ev2)))
 [1] TRUE
 

 I suppose that's good enough for numerical computation.  But I was still
 surprised to see that matrix multiplication (ev1%*%ev2) doesn't give the
 exact right answer, whereas sum(ev1*ev2) does give the exact answer.  I
 would've expected them to perform the same two multiplications and one
 addition.  But I guess that's not the case.

 However, I did find that if I multiplied the two vectors by 10, making the
 entries integers (although the class was still numeric rather than
 integer), both computations gave equal answers of 0:

 xf1 <- 10*ev1
 xf2 <- 10*ev2
 (as.numeric(xf1%*%xf2))==(sum(xf1*xf2))
 [1] TRUE
 

 Perhaps the moral of the story is that one should exercise caution and keep
 track of significant digits.

 --  TMK  --
 212-460-5430home
 917-656-5351cell



 From: Charles C. Berry [EMAIL PROTECTED]
 To: Talbot Katz [EMAIL PROTECTED]
 CC: r-help@stat.math.ethz.ch
 Subject: Re: [R] Matrix Multiplication, Floating-Point, etc.
 Date: Mon, 30 Jul 2007 09:27:42 -0700
 
 
 
 7.31 Why doesn't R think these numbers are equal?
 
 On Fri, 27 Jul 2007, Talbot Katz wrote:
 
 Hi.
 
 I recently tried the following in R 2.5.1 on Windows XP:
 
 ev2 <- c(0.8,-0.6)
 ev1 <- c(0.6,0.8)
 ev1%*%ev2
   [,1]
 [1,] -2.664427e-17
 sum(ev1*ev2)
 [1] 0
 
 
 (I got the same result with R 2.4.1 on a different Windows XP machine.)
 
 I expect this issue is very familiar and probably has been discussed in
 this
 forum before.  Can someone please point me to some documentation or
 discussion about this?  Is there some standard way to get the correct
 answer from %*%?
 
 Thanks!
 
 --  TMK  --
 212-460-5430  home
 917-656-5351  cell
 
 
 
 Charles C. Berry(858) 534-2098
  Dept of Family/Preventive
 Medicine
 E mailto:[EMAIL PROTECTED]  UC San Diego
 http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
 
 




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.

2007-07-27 Thread jim holtman

results <- list()
myVariableNames <- names(x.val)

for (i in myVariableNames){
    # index with x.val[[i]]; x.val$i looks for an element literally named "i"
    results[[i]] <- names(x.val[[i]])
}
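
A runnable sketch of the idea on made-up data.  The two-column data
frame is hypothetical; lapply(df, table) is used here instead of
apply(x, 2, table) so that the result is always a list, which is an
assumption of this sketch rather than something stated in the thread.

```r
df <- data.frame(PR10 = c("V", "V", "V"), PR11 = c("T", "S", "T"),
                 stringsAsFactors = FALSE)

# one frequency table per variable
x.val <- lapply(df, table)

# distinct values per variable, indexed by the variable name held in 'nm'
results <- list()
for (nm in names(x.val)) results[[nm]] <- names(x.val[[nm]])

# frequency of one value in one variable
count.T <- as.vector(x.val[["PR11"]]["T"])

# compact "value:count" summary per variable
summary.str <- sapply(x.val, function(tb)
  paste(names(tb), tb, sep = ":", collapse = ","))
print(summary.str)
```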



On 7/27/07, Allan Kamau [EMAIL PROTECTED] wrote:
 Hi All,
 I am having difficulties finding a way to find a substitute to the command 
 names(v.val$PR14) so that I could generate the command on the fly for all 
 PR14 to PR200 (please see the previous discussion below to understand what 
 the object x.val contains) . I have tried the following

 results=()#character()
 myVariableNames=names(x.val)
 results[length(myVariableNames)]-NA

 for as.vector(unlist(strsplit(str,,)),mode=list)
 +results[i]-names(x.val$i)# this does not work it returns a NULL 
 (how can i convert this to x.val$somevalue ? )
 }

 Allan.


 - Original Message 
 From: Allan Kamau [EMAIL PROTECTED]
 To: r-help@stat.math.ethz.ch
 Sent: Thursday, July 26, 2007 10:03:17 AM
 Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a 
 variable in a multivariate dataset.

 Thanks so much Jim, Andaikalavan, Gabor and others for the help and 
 suggestions.
 The solution will result in a matrix containing nested matrices to enable 
 each variable name, each variables distinct value and the count of the 
 distinct value to be accessible individually.
 The main matrix will contain the variable names, the first level nested 
 matrices will consist of the variables unique values, and each such variable 
 entry will contain a one element vector to contain the count or occurrence 
 frequency.
 This matrix can now be used in comparing other similar datasets for variable 
 values and their frequencies.

 Building on the input received so far, a probable solution in building the 
 matrix will include the following.


 1) I read the csv file (containing column headers)
 my_data=read.table("path/to/my/data.csv",header=TRUE,sep=",",dec=".",fill=TRUE)

 2) I group the values in each variable, producing an occurrence count (frequency)
 x.val<-apply(my_data,2,table)

 3) I obtain a vector of the names of the variables in the table
 names(x.val)

 4) Now I make use of the names (obtained in step 3) to obtain a vector of
 distinct values in a given variable (in the example below the variable name
 is $PR14)
 names(x.val$PR14)

 5) I obtain a vector (with one element) of the frequency of a value obtained
 from the step above (in our example the value is V)
 as.vector(x.val$PR14["V"])

 Todo:
 Now I will need to place the steps above in a script (consisting of loops) to 
 build the matrix, step 4 and 5 seem tricky to do programatically.

 Allan.


 - Original Message 
 From: jim holtman [EMAIL PROTECTED]
 To: Allan Kamau [EMAIL PROTECTED]
 Cc: Adaikalavan Ramasamy [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
 Sent: Wednesday, July 25, 2007 1:50:55 PM
 Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a 
 variable in a multivariate dataset.

 Also if you want to access the individual values, you can just leave
 it as a list:

  x.val <- apply(x, 2, table)
  # access each value
  x.val$PR14["V"]
 V
 8



 On 7/25/07, Allan Kamau [EMAIL PROTECTED] wrote:
  A subset of the data looks as follows
 
   df[1:10,14:20]
      PR10 PR11 PR12 PR13 PR14 PR15 PR16
   1     V    T    I    K    V    G    D
   2     V    S    I    K    V    G    G
   3     V    T    I    R    V    G    G
   4     V    S    I    K    I    G    G
   5     V    S    I    K    V    G    G
   6     V    S    I    R    V    G    G
   7     V    T    I    K    I    G    G
   8     V    S    I    K    V    E    G
   9     V    S    I    K    V    G    G
   10    V    S    I    K    V    G    G
 
  The result I would like is as follows
 
  PR10    PR11       PR12    ...
  [V:10]  [S:7,T:3]  [I:10]
 
  The result can be in a matrix or a vector and each variablename, value and 
  frequency should be accessible so as to be used for comparisons with 
  another dataset later.
  The frequency can be a count or a percentage.
 
 
  Allan.
 
 
  - Original Message 
  From: Adaikalavan Ramasamy [EMAIL PROTECTED]
  To: Allan Kamau [EMAIL PROTECTED]
  Cc: r-help@stat.math.ethz.ch
  Sent: Tuesday, July 24, 2007 10:21:51 PM
  Subject: Re: [R] Obtaining summary of frequencies of value occurrences for 
  a variable in a multivariate dataset.
 
  The name of the table should give you the value. And if you have a
  matrix, you just need to convert it into a vector first.
 
m - matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
m
   [,1] [,2] [,3]
  [1,] A  C  B
  [2,] B  D  C
  [3,] C  E  D
tb - table( as.vector(m) )
tb
 
  A B C D E
  1 2 3 2 1
paste( names(tb), :, tb, sep= )
  [1] A:1 B:2 C:3 D:2 E:1
 
  If this is not what you want, then please give a simple example.
 
  Regards, Adai
 
 
 
  Allan Kamau wrote:
   Hi all,
   If the question below as been answered before I
   apologize for the posting.
   I

Re: [R] get() with complex objects?

2007-07-27 Thread jim holtman

'get' tries to retrieve the object given by the character string.  The
error message says that object can not be found.  You actually have to
'evaluate' the character string.  See the example below:

> x <- data.frame(a=1:10, b=11:20)
> x$a
 [1]  1  2  3  4  5  6  7  8  9 10
> z <- 'x$a'
> get(z)
Error in get(x, envir, mode, inherits) : variable "x$a" was not found
> # parse and evaluate the character string 'x$a'
> eval(parse(text=z))
 [1]  1  2  3  4  5  6  7  8  9 10

Does this make sense?
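
For the loop in the question, here is a minimal sketch (not from the thread) of two ways to reach a nested element by name: call get() on the object name alone and apply $ to the result, or keep the fits in a list. The 'fits' list is a hypothetical stand-in for the pam() results, so the example runs without the cluster package.

```r
# Hypothetical stand-ins for pam() fits: each has a silinfo$avg.width element
fits <- lapply(1:5, function(k) list(silinfo = list(avg.width = k / 10)))
names(fits) <- paste("x.", 1:5, sep = "")

# Option 1: objects created with assign(); get() the object, then use $
for (nm in names(fits)) assign(nm, fits[[nm]])
sil.collector <- numeric(5)
for (i in 1:5) {
  sil.collector[i] <- get(paste("x.", i, sep = ""))$silinfo$avg.width
}

# Option 2: keep the fits in a list and skip assign()/get() entirely
sil.from.list <- sapply(fits, function(f) f$silinfo$avg.width)
```

Option 2 is usually easier to maintain, since the loop index is the only thing that varies.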


On 7/27/07, Mark Orr [EMAIL PROTECTED] wrote:
 Hello R-listers,
 I'm having trouble accessing sub objects (attributes?), e.g.,
 x$silinfo$avg.width using the get() command; I'm using get() in a
 loop as illustrated in the following code:

 #FIRST MAKE CLUSTERS of VARYING k
 for (i in 1:300){
  assign(paste("x.",i,sep=""),pam(x,i))  #WORKS FINE
 }

 #NEXT, TAKE LOOK AT AVE. SILHOUETTE VALUE FOR EACH k

 #PART 1, MAKE LIST OF OBJECTS NEEDED
 gen.list <- rep("t",300)
 for (i in 1:300){
  assign(gen.list[i],paste("x.",i,"$silinfo$avg.width",sep=""))
 }
 #WORKS FINE

 #PART 2, USE LIST IN LOOP TO ACCESS OBJECT.
 sil.collector <- rep(99,300)
 for(i in 1:300){
  sil.collector <- get(gen.list[i])
 }
 #HERE IS THE ERROR
 Error in get(x, envir, mode, inherits) : variable
 "x.1$silinfo$avg.width" was not found

 So, I get the gist of this error; x.1 is an object findable from get(),
 but the attribute levels are not accessible.  Any suggestions on how
 to get get() to access these levels?  From reading the get() help
 page, I don't think it will access the attributes. (My apologies for
 loosely using the term attributes, but I hope it is clear).

 Thanks,

 Mark Orr

 --
 ***
 Mark G. Orr, PhD
 Heilbrunn Dept. of Population and Family Health
 Columbia University
 60 Haven Ave., B-2
 New York, NY 10032

 Tele: 212-304-7823
 Fax:  212-305-7024

 www.columbia.edu/~mo2259



Re: [R] Finding matches in 2 files

2007-07-26 Thread jim holtman

Is this what you want?

> g1 <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene9", "gene10",
+ "geneA")
> g2 <- c("gene6", "gene9", "gene1", "gene2", "gene7", "gene8", "gene9",
+ "gene1", "gene10")
> intersect(g1, g2)
[1] "gene1"  "gene2"  "gene9"  "gene10"
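
Since the question mentions two files, here is a small sketch of the same idea starting from disk. The file layout (one gene name per line) and the temporary files are assumptions made for illustration; the original files were not shown.

```r
# Two hypothetical input files, one gene name per line (temp files for the demo)
f1 <- tempfile()
f2 <- tempfile()
writeLines(c("gene1", "gene2", "gene3", "gene9", "gene10"), f1)
writeLines(c("gene6", "gene9", "gene1", "gene9", "gene10"), f2)

g1 <- readLines(f1)
g2 <- readLines(f2)

common    <- intersect(g1, g2)   # unique names present in both analyses
only.in.1 <- setdiff(g1, g2)     # names reported by the first analysis only
in.both   <- g1 %in% g2          # logical flag for each element of g1
```

If the files are tables rather than plain lists, read.table() followed by intersect() on the gene column should work the same way.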


On 7/25/07, jenny tan [EMAIL PROTECTED] wrote:


 I have 2 files containing data analysed by 2 different methods. I would like 
 to find out which genes appear in both analyses. Can someone show me how to 
 do this?

Re: [R] Create Strings of Column Id's

2007-07-26 Thread jim holtman

Is this what you want:

> paste("-", paste(colnames(MyMatrix)[COL], collapse='-'), sep='')
[1] "-E-T"
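
As a sketch of why the loop in the question does not build the string: the pasted value is never stored inside the braces, and assigning the result of for() to Filename does not capture it. Assigning inside the loop makes it accumulate. The variable Filename2 is only introduced here for comparison.

```r
MyMatrix <- matrix(NA, ncol = 4, nrow = 1,
                   dimnames = list(NULL, c("E", "R", "T", "Y")))
COL <- c(1, 3)   # columns to extract

# Loop version: assign inside the loop so the string accumulates
Filename <- ""
for (i in colnames(MyMatrix)[COL]) {
  Filename <- paste(Filename, "-", i, sep = "")
}

# Vectorised equivalent using collapse
Filename2 <- paste("-", paste(colnames(MyMatrix)[COL], collapse = "-"), sep = "")
```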


On 7/26/07, Tom.O [EMAIL PROTECTED] wrote:

 Does anyone know how this is done?

 I have a large matrix where I extract specific columns into txt files for
 further use. To be able to keep track of which txt files contain which
 columns I want to name the filenames with the column Id's.

 The most basic example would be to use a for() loop together with paste(),
 but the result is blank. Not even NULL.

 This is the concept of the code I use:

 for example

 MyMatrix <- matrix(NA,ncol=4,nrow=1,dimnames=list(NULL,c("E","R","T","Y")))
 COL <- c(1,3) # a vector of columns I want to extract

 Filename <- NULL # the starting variable, so I can use paste
 Filename <- for(i in colnames(MyMatrix)[COL]) {paste(Filename,"-",i,sep="")}

 The result is "-T", but I want it to be "-E-T"

 Anyone have a clue?

 Thanks Tom



Re: [R] Convert string to list?

2007-07-26 Thread jim holtman

Is this what you want:

> str <- "P = 0.0, T = 0.0, Q = 0.0"
> x <- eval(parse(text=paste('list(', str, ')')))
> str(x)
List of 3
 $ P: num 0
 $ T: num 0
 $ Q: num 0
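
eval(parse()) will run whatever code the string contains, so if the string comes from an untrusted source a parsing approach may be preferable. The following is a minimal sketch under the assumption that the string is always a comma-separated list of name = number pairs; the names s, pairs, keys and vals are illustrative.

```r
s <- "P = 0.0, T = 0.0, Q = 0.0"

# Split into "name = value" pieces, then split each piece at "="
pairs <- strsplit(strsplit(s, ",")[[1]], "=")

# Strip blanks from the names; as.numeric() tolerates surrounding whitespace
keys <- sapply(pairs, function(p) gsub(" ", "", p[1]))
vals <- lapply(pairs, function(p) as.numeric(p[2]))
names(vals) <- keys
```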



On 7/26/07, Manuel Morales [EMAIL PROTECTED] wrote:
 Let's say I have the following string:

 str <- "P = 0.0, T = 0.0, Q = 0.0"

 I'd like to find a function that generates the following object from
 'str'.

 list(P = 0.0, T = 0.0, Q = 0.0)

 Thanks!

 --
 http://mutualism.williams.edu


Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.

2007-07-25 Thread jim holtman

Also if you want to access the individual values, you can just leave
it as a list:

> x.val <- apply(x, 2, table)
> # access each value
> x.val$PR14["V"]
V
8
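
A sketch of how the list could be turned into one long table of variable, value and count for later comparison with another dataset. It uses lapply() on a data frame rather than apply(), because apply() can simplify its result to a matrix when every column happens to have the same number of distinct values. The three-column data frame is a cut-down copy of the posted data, and the name freq is illustrative.

```r
x <- data.frame(
  PR10 = rep("V", 10),
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR14 = c("V", "V", "V", "I", "V", "V", "I", "V", "V", "V"),
  stringsAsFactors = FALSE
)

# One frequency table per column; always a list
x.val <- lapply(x, table)

# Stack the tables into a long data frame: one row per (variable, value)
freq <- do.call(rbind, lapply(names(x.val), function(v) {
  data.frame(variable = v,
             value    = names(x.val[[v]]),
             count    = as.vector(x.val[[v]]),
             stringsAsFactors = FALSE)
}))
```

Dividing count by nrow(x) would give proportions instead of counts.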



On 7/25/07, Allan Kamau [EMAIL PROTECTED] wrote:
 A subset of the data looks as follows

  > df[1:10,14:20]
     PR10 PR11 PR12 PR13 PR14 PR15 PR16
  1     V    T    I    K    V    G    D
  2     V    S    I    K    V    G    G
  3     V    T    I    R    V    G    G
  4     V    S    I    K    I    G    G
  5     V    S    I    K    V    G    G
  6     V    S    I    R    V    G    G
  7     V    T    I    K    I    G    G
  8     V    S    I    K    V    E    G
  9     V    S    I    K    V    G    G
  10    V    S    I    K    V    G    G

 The result I would like is as follows

 PR10    PR11       PR12    ...
 [V:10]  [S:7,T:3]  [I:10]

 The result can be in a matrix or a vector and each variablename, value and 
 frequency should be accessible so as to be used for comparisons with another 
 dataset later.
 The frequency can be a count or a percentage.


 Allan.


 - Original Message 
 From: Adaikalavan Ramasamy [EMAIL PROTECTED]
 To: Allan Kamau [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Sent: Tuesday, July 24, 2007 10:21:51 PM
 Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a 
 variable in a multivariate dataset.

 The name of the table should give you the value. And if you have a
 matrix, you just need to convert it into a vector first.

   > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
   > m
      [,1] [,2] [,3]
 [1,] "A"  "C"  "B"
 [2,] "B"  "D"  "C"
 [3,] "C"  "E"  "D"
   > tb <- table( as.vector(m) )
   > tb

 A B C D E
 1 2 3 2 1
   > paste( names(tb), ":", tb, sep="" )
 [1] "A:1" "B:2" "C:3" "D:2" "E:1"

 If this is not what you want, then please give a simple example.

 Regards, Adai



 Allan Kamau wrote:
  Hi all,
  If the question below has been answered before I
  apologize for the posting.
  I would like to get the frequencies of occurrence of
  all values in a given variable in a multivariate
  dataset. In short, for each variable (or field) a
  summary of the values it contains as value:frequency
  pair, there can be many such pairs for a given
  variable. I would like to do the same for several such
  variables.
  I have used table() but am unable to extract the
  individual value and frequency values.
  Please advise.
 
  Allan.
 

Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.

2007-07-25 Thread jim holtman

Is this what you want:

> x <- read.table(textConnection("  PR10 PR11 PR12 PR13 PR14 PR15 PR16
+ 1     V    T    I    K    V    G    D
+ 2     V    S    I    K    V    G    G
+ 3     V    T    I    R    V    G    G
+ 4     V    S    I    K    I    G    G
+ 5     V    S    I    K    V    G    G
+ 6     V    S    I    R    V    G    G
+ 7     V    T    I    K    I    G    G
+ 8     V    S    I    K    V    E    G
+ 9     V    S    I    K    V    G    G
+ 10    V    S    I    K    V    G    G"), header=TRUE)
> x.t <- apply(x, 2, function(.col){
+ .tab <- table(.col)
+ paste('[', paste(names(.tab), .tab, sep=":", collapse=','), ']', sep='')
+ })
>
>
> x.t
       PR10        PR11        PR12        PR13        PR14        PR15
   "[V:10]" "[S:7,T:3]"    "[I:10]" "[K:8,R:2]" "[I:2,V:8]" "[E:1,G:9]"
       PR16
"[D:1,G:9]"




Re: [R] if - else

2007-07-25 Thread jim holtman

try:

# the comparison operator was stripped by the archive; "<" is assumed
Start <- ifelse(DateFirstEvent < DateSecondEvent,
    (DateFirstEvent+DateSecondEvent)/2, DateFound)
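
If the columns are Date objects, note that ifelse() returns its result without the Date class, and two Dates cannot be added directly. The sketch below is one way around both points. It assumes that the midpoint is wanted whenever both event dates are present and DateFound otherwise; the short variable names and the test data (taken from the table in the question) are illustrative.

```r
found  <- as.Date(c("2000-01-10", "2000-01-05", "2000-01-07", "2000-01-09"))
first  <- as.Date(c(NA, "2000-01-07", "2000-01-07", NA))
second <- as.Date(c(NA, "2000-01-10", "2000-01-08", NA))

# Rows where both event dates are known
has.both <- !is.na(first) & !is.na(second)

# Start from the date found, then overwrite with the midpoint where possible;
# first + (second - first) / 2 stays a Date, unlike (first + second) / 2
start <- found
start[has.both] <- first[has.both] + (second[has.both] - first[has.both]) / 2
```

Indexed assignment keeps the Date class, which is why it is used here instead of ifelse().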



On 7/25/07, James J. Roper [EMAIL PROTECTED] wrote:
 Greetings,

 I have some confusion with the use of if - else.  Let's say I have
 four variables as follows:

 Condition   DateFound   DateFirstEvent   DateSecondEvent
 NA          10Jan2000   NA               NA
 0           05Jan2000   07Jan2000        10Jan2000
 1           07Jan2000   07Jan2000        08Jan2000
 2           09Jan2000   NA               NA

 Now, what I need to do is make a new variable that is either the
 midpoint of the first and second event dates, or the date found (I
 will call Start).

 I tried an if - else condition as follows:

 # the comparison operator was stripped by the archive; "<" is assumed
 Start <- if (DateFirstEvent < DateSecondEvent)
 (DateFirstEvent+DateSecondEvent)/2 else DateFound

 I also tried

 Start <- if (any(DateFirstEvent < DateSecondEvent))
 (DateFirstEvent+DateSecondEvent)/2 else DateFound

 Only the first half of the expression was ever evaluated.

 I hope I have not been too brief, and will certainly appreciate any help.

 Thanks,

 Jim

 --
 James J. Roper
 Population Dynamics and Conservation of
 Terrestrial Vertebrates
 Caixa Postal 19034
 81531-990 Curitiba, Paraná, Brasil
 ===
 E-mail:   [EMAIL PROTECTED]
 Phone/Fone/Teléfono: 55 41 33611764
 celular: 55 41 99870543
 Casa:   55 41 33857249
 ===
 Ecologia e Conservação na UFPR
 http://www.bio.ufpr.br/ecologia/
 ---
 http://jjroper.googlepages.com/
 http://arsartium.googlepages.com/


Re: [R] Passing equations as arguments

2007-07-24 Thread jim holtman

Here is one possible solution:

ifun <-
function(a, b, FUN){
    evala <- FUN(a)
    evalb <- FUN(b)
    # the comparison operator was stripped by the archive; ">" is assumed
    if (evala > evalb) return(evala) else return(evalb)
}
ifun(1, 2, function(x) (x*x) - 2)
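
If the user should supply the equation itself rather than a function, one possibility is to pass an unevaluated expression and evaluate it with x bound to each argument. This is only a sketch: the argument name eqn follows the question, the helper f is illustrative, and returning the larger value is an assumption.

```r
# 'eqn' is an unevaluated expression in x, e.g. quote(x * x - 2)
myfunc <- function(a, b, eqn) {
  # evaluate the expression with x bound to the supplied value
  f <- function(x) eval(eqn, list(x = x))
  evala <- f(a)
  evalb <- f(b)
  # which value to return is an assumption; the larger one is used here
  if (evala > evalb) evala else evalb
}

res <- myfunc(1, 2, quote(x * x - 2))
```

Passing a function, as in ifun above, is usually simpler; the expression form mainly helps when the equation arrives as text or is built programmatically.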



On 7/24/07, Anup Nandialath [EMAIL PROTECTED] wrote:
 Friends,

 I'm trying to pass an equation as an argument to a function. The idea is as 
 follows. Let us say i write an independent function

 Ideal Situation:

 ifunc <- function(x)
 {
 return((x*x)-2)
 }

 mainfunc <- function(a,b)
 {
 evala <- ifunc(a)
 evalb <- ifunc(b)
 # the comparison operator was stripped by the archive; ">" is assumed
 if (evala > evalb){return(evala)}
 else
 return(evalb)
 }

 Now I want to try and write this entire program in a single function with the 
 user specifying the equation as an argument to the function.

 myfunc <- function(a, b, eqn)
 {
    func1 <- function (x) ??
    {
    return(eqn in terms of x)  ??
    }

 Further arguments to check

 The ?? imply that this does not seem to be correct. The idea is how to 
 assign the equation expression from the main function into the inner 
 function. Is there any way to do that within this set up?


 Thanks in advance
 Regards

 Anup



Re: [R] code optimization tips

2007-07-23 Thread jim holtman

First question is why are you defining the functions within the main
function each time?  Why don't you define them once outside?

On 7/23/07, baptiste Auguié [EMAIL PROTECTED] wrote:
 Hi,

 Being new to R I'm asking for some advice on how to optimize the
 performance of the following piece of code:


  alpha_c <- function(lambda=600e-9,alpha_s=1e-14,N=400,spacing=1e-7){

  k<-2*pi/lambda
  ri<-c(0,0) # particle at the origin
  x<-c(-N:N)
  positions <- function(N) {
     reps <- 2*N+1
     matrix(c(rep(-N:N, each = reps), rep(-N:N, times = reps)),
            nrow = 2, byrow = TRUE)
  }
  rj<-positions(N)*spacing # all positions in the 2N x 2N array
  rj<-rj[1:2,-((dim(rj)[2]-1)/2+1)] # remove rj=(0,0)

  mod<-function(x){sqrt(x[1]^2+x[2]^2)} # modulus

  sij <-function(rj){
  rij=mod(rj-ri)
  cos_ij=rj[1]/rij
  sin_ij=rj[2]/rij

  A<-(1-1i*k*rij)*(3*cos_ij^2-1)*exp(1i*k*rij)/(rij^3)
  B<-k^2*sin_ij^2*exp(1i*k*rij)/rij

  sij<-A+B
  }

  s_ij<-apply(rj,2,sij)
  S<-sum(s_ij)
  alpha_s/(1-alpha_s*S)
  }
  alpha_c()


 This function is to be called for a few tens of values of lambda in a
 'for' loop, and possibly a couple of different N and spacing (their
 magnitude is typically around the default one).

 This can be a bit slow ––– not that I would expect otherwise --- and
 I wonder if there is something I could do to optimize it (vectorize
 with respect to the lambda parameter?, change the units of the
 problem to deal with numbers closer to unity?,...)

 Best regards,

 baptiste


Re: [R] code optimization tips

2007-07-23 Thread jim holtman

The quote "What is the problem you are trying to solve?" is just part
of my signature.  I used to review projects for performance and
architecture and that was the first question I always asked them.

To pass the arguments, notice the definition of apply:

apply(X, MARGIN, FUN, ...)

The '...' are optional arguments that are passed on to FUN, so for your function:

sij <- function(rj,ri,k){
rij=mod(rj-ri)
cos_ij=rj[1]/rij
sin_ij=rj[2]/rij
A<-(1-1i*k*rij)*(3*cos_ij^2-1)*exp(1i*k*rij)/(rij^3)
B<-k^2*sin_ij^2*exp(1i*k*rij)/rij
sij<-A+B
}

you would call apply with the following:

s_ij <- apply(rj,2,sij, ri=ri, k=k)
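
On the original speed question, a possible next step is to drop apply() and work on whole rows of rj at once. The sketch below puts the apply() version with extra arguments next to a vectorised one so they can be compared on a small N. The helper lattice_points() and the name alpha_c_vec() are not from the thread, and the vectorised version assumes ri = c(0, 0) as in the posted code.

```r
positions <- function(N) {
  reps <- 2 * N + 1
  matrix(c(rep(-N:N, each = reps), rep(-N:N, times = reps)),
         nrow = 2, byrow = TRUE)
}

mod <- function(x) sqrt(x[1]^2 + x[2]^2)   # modulus of a 2-vector

sij <- function(rj, ri, k) {
  rij <- mod(rj - ri)
  cos_ij <- rj[1] / rij
  sin_ij <- rj[2] / rij
  A <- (1 - 1i * k * rij) * (3 * cos_ij^2 - 1) * exp(1i * k * rij) / rij^3
  B <- k^2 * sin_ij^2 * exp(1i * k * rij) / rij
  A + B
}

# All lattice positions except the particle at the origin (hypothetical helper)
lattice_points <- function(N, spacing) {
  rj <- positions(N) * spacing
  rj[, -((ncol(rj) - 1) / 2 + 1)]
}

# apply() version: ri and k reach sij() through the '...' of apply()
alpha_c <- function(lambda = 600e-9, alpha_s = 1e-14, N = 400, spacing = 1e-7) {
  k <- 2 * pi / lambda
  S <- sum(apply(lattice_points(N, spacing), 2, sij, ri = c(0, 0), k = k))
  alpha_s / (1 - alpha_s * S)
}

# Vectorised version: operates on all columns at once, assumes ri = c(0, 0)
alpha_c_vec <- function(lambda = 600e-9, alpha_s = 1e-14, N = 400, spacing = 1e-7) {
  k <- 2 * pi / lambda
  rj <- lattice_points(N, spacing)
  rij <- sqrt(rj[1, ]^2 + rj[2, ]^2)
  cos_ij <- rj[1, ] / rij
  sin_ij <- rj[2, ] / rij
  phase <- exp(1i * k * rij)
  S <- sum((1 - 1i * k * rij) * (3 * cos_ij^2 - 1) * phase / rij^3 +
           k^2 * sin_ij^2 * phase / rij)
  alpha_s / (1 - alpha_s * S)
}
```

Comparing all.equal(alpha_c(N = 5), alpha_c_vec(N = 5)) on a small grid is a cheap check before timing the two with system.time() at the full N.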


On 7/23/07, baptiste Auguié [EMAIL PROTECTED] wrote:
 Thanks for your reply,

 On 23 Jul 2007, at 15:19, jim holtman wrote:

  First question is why are you defining the functions within the main
  function each time?  Why don't you define them once outside?
 


 Fair enough!

 As said, I'm new to R and don't know whether it is best to define
 functions outside and pass to them all necessary arguments, or nest
 them and get variables in the scope from parents. In any case, I'd
 agree my positions(), mod() and sij() functions would be better
 outside. Here is a corrected version (untested as something else is
 running),

 
  positions <- function(N) {
     reps <- 2*N+1
     matrix(c(rep(-N:N, each = reps), rep(-N:N, times = reps)),
            nrow = 2, byrow = TRUE)
  }

  mod<-function(x){sqrt(x[1]^2+x[2]^2)} # modulus

  sij <-function(rj,ri,k){
  rij=mod(rj-ri)
  cos_ij=rj[1]/rij
  sin_ij=rj[2]/rij

  A<-(1-1i*k*rij)*(3*cos_ij^2-1)*exp(1i*k*rij)/(rij^3)
  B<-k^2*sin_ij^2*exp(1i*k*rij)/rij

  sij<-A+B
  }

  alpha_c <- function(lambda=600e-9,alpha_s=1e-14,N=400,spacing=1e-7){

  k<-2*pi/lambda
  ri<-c(0,0) # particle at the origin

  rj<-positions(N)*spacing # all positions in the 2N x 2N array
  rj<-rj[1:2,-((dim(rj)[2]-1)/2+1)] # remove rj=(0,0)

  s_ij<-apply(rj,2,sij)

 *** Now, how do I pass k and ri to this function ? ***

  S<-sum(s_ij)
  alpha_s/(1-alpha_s*S)
  }
  alpha_c()
 


 
  --
  Jim Holtman
  Cincinnati, OH
  +1 513 646 9390
 
  What is the problem you are trying to solve?


 Wondering whether that's part of the signature?

 the problem is related to scattering by arrays of particles, more
 specifically to evaluate the array influence on the effective
 polarizability (alpha) of a particle via dipolar radiative coupling.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Dataframe of factors transform speed?

2007-07-21 Thread jim holtman

, it otherwise is ncol(genoT) instead of 10
  
   +gt-genoT[[j]]  #-- this is to avoid 2D indices
   +for(l in 1:length([EMAIL PROTECTED])){
   +  levels(gt)[l] -
   switch([EMAIL PROTECTED],AA=0,AB=1,BB=2)
   #-- convert levels to 0,1, or 2
   +  genoT[[j]]-factor(gt,levels=0:2)   #-- make a 3-level
   factor
   and put it back
   +}
   + }
   + )
   [1] 785.085   4.358 789.454   0.000   0.000
  
   789s for 10 columns only!
  
   To me it seems like replacing 10 x 3 levels and then making
   a factor
   of
   1002 element vector x 10 is a negligible amount of operations
   needed.
  
   So, what's wrong with me? Any idea how to accelerate
   significantly the
   transformation or (to go to the very beginning) to make
   read.table use
   a fixed set of levels (AA,AB, and BB) and not to drop any
   (missing)
   level?
  
   R-devel_2006-08-26, Sun Solaris 10 OS - x86 64-bit
  
   The machine is with 32G RAM and AMD Opteron 285 (2.? GHz)
   so it's not
   it.
  
   Thank you very much for the help,
  
   Latchezar Dimitrov,
   Analyst/Programmer IV,
   Wake Forest University School of Medicine, Winston-Salem, North
   Carolina, USA
  

Re: [R] Dataframe of factors transform speed?

2007-07-21 Thread jim holtman

The problem is in the way that 'as.data.frame' works.  Use Rprof on a
small list and you will see where it is spending its time.

Now if you are really sure that all your data is consistent with being
a data frame, you can create the data frame structure yourself.  Not
that I would advocate it, but if you look at the output of 'dput' on a
data frame, you can construct your own.

Here it took 20 seconds to create the test data with a list of 50,000
and only 2 seconds to create the data frame from that.

> set.seed(123)
> n <- 50000
> system.time({
+ genoT <- lapply(1:n, function(i) factor(sample(c("AA",
+ "AB", "BB"), 1000, prob=c(1000, 1, 1), rep=T)))
+ })
   user  system elapsed
  20.85    0.12   22.83
> names(genoT) = paste("snp", 1:n, sep="")
>
> # create your own data frame structure -- if you are real sure of your data
>
> system.time(genoTz <- structure(genoT, .Names=names(genoT),
+ row.names=c(NA, -length(genoT[[1]])), class='data.frame'))
   user  system elapsed
   2.00    0.08    2.11
> str(genoTz)
'data.frame':   1000 obs. of  50000 variables:
 $ snp1    : Factor w/ 2 levels "AA","AB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp2    : Factor w/ 3 levels "AA","AB","BB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp3    : Factor w/ 2 levels "AA","AB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp4    : Factor w/ 2 levels "AA","AB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp5    : Factor w/ 3 levels "AA","AB","BB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp6    : Factor w/ 2 levels "AA","AB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp7    : Factor w/ 1 level "AA": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp8    : Factor w/ 2 levels "AA","BB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp9    : Factor w/ 3 levels "AA","AB","BB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp10   : Factor w/ 3 levels "AA","AB","BB": 1 1 1 1 1 1 1 1 1 1 ...
 $ snp11   : Factor w/ 1 level "AA": 1 1 1 1 1 1 1 1 1 1 ...
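
A smaller, self-contained variant of the same idea that also addresses the fixed-levels part of the original question. It is a sketch, not a recommendation: structure() performs none of the checks that as.data.frame() does, so every list element must already have the same length. Passing levels= to factor() keeps all three genotype levels even when one is absent from a column; the size n = 50 is chosen only to keep the example quick.

```r
set.seed(123)
n <- 50   # small illustrative size; the thread used far more columns
lev <- c("AA", "AB", "BB")

# Fixing the levels up front avoids dropped levels in sparse columns
genoT <- lapply(1:n, function(i) {
  factor(sample(lev, 1000, prob = c(1000, 1, 1), replace = TRUE), levels = lev)
})
names(genoT) <- paste("snp", 1:n, sep = "")

# Build the data frame by hand: names, compact row names, and the class
genoTz <- structure(genoT,
                    row.names = c(NA, -length(genoT[[1]])),
                    class = "data.frame")
```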



On 7/21/07, Latchezar Dimitrov [EMAIL PROTECTED] wrote:
 Jim,

 No, this is _not_ the problem. If you go to my 1st mail I have a monster
 (at least it was when I purchased it) with 32GB (sic :-) of RAM and 4 dual
 core AMD64 285 (the fastest at that time and still pretty fast now :-)

 The machine starts paging when I run 2 copies of R working on two things
 like that :-). If you look at my last e-mail I found a solution but
 still have no clue why the heck x<-as.data.frame(y), where y is a list
 of the same columns, takes really forever, and this is the thing that killed me
 before.

 Thanks,
 Latchezar

  -Original Message-
  From: jim holtman [mailto:[EMAIL PROTECTED]
  Sent: Saturday, July 21, 2007 5:33 PM
  To: Latchezar Dimitrov
  Cc: Benilton Carvalho; r-help@stat.math.ethz.ch
  Subject: Re: [R] Dataframe of factors transform speed?
 
  One of the problems is that you are probably paging on your
  system with an object that size (240000 x 1000).  This is
  about 1GB for a single object:
 
   > set.seed(123)
   > n <- 240000
   > system.time({
  + genoT <- lapply(1:n, function(i) factor(sample(c("AA", "AB", "BB"),
  + 1000, prob=c(1000, 1, 1), rep=T)))
  + })
     user  system elapsed
    95.00    0.61  104.71
   > names(genoT) = paste("snp", 1:n, sep="")
   >
   > object.size(genoT)
  [1] 1045258752
  
 
  I can create it on my 2GB machine as a list, but have
  problems converting it to a dataframe because I don't have
  enough memory.
 
  So unless you have at least 4GB on your system, it might take
  a long time.  Look at your performance measurements on your
  system and see if you have run out of physical memory and are paging.
 
  On 7/21/07, Latchezar Dimitrov [EMAIL PROTECTED] wrote:
   Hi,
  
   Thanks for the help. My 1st question still unanswered though :-)
   Please see bellow
  
-Original Message-
From: Benilton Carvalho [mailto:[EMAIL PROTECTED]
Sent: Friday, July 20, 2007 3:30 AM
To: Latchezar Dimitrov
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Dataframe of factors transform speed?
   
    set.seed(123)
    genoT = lapply(1:240000, function(i) factor(sample(c("AA", "AB",
    "BB"), 1000, prob=sample(c(1, 1000, 1000), 3), rep=T)))
    names(genoT) = paste("snp", 1:240000, sep="")
    genoT = as.data.frame(genoT)
  
   Now this _is the problem. Everything before converting to
  data.frame
   worked almost instantaneously however as.data.frame runs forever.
   Obviously there is some scalability memory management issue. When I
   tried my own method but creating a new result (instead of modifying
   the
   old) dataframe it worked like a charm for the 1st 100 cols ~ .3s. I
   figured 300,000 cols should be ~1000s. Nope! It ran for about
   50,000(!)s to finish about 42,000 cols only.
  
   BTW, what ver. of R is yours?
  
   Now here's what I discovered further.
  
   #-- create a 1-col frame:
  geno   -
  
  data.frame(c(geno.GASP[[1]],geno.JAG[[1]]),row.names=c(rownames(geno.G
   AS
   P),rownames(geno.JAG)))
  
   #-- main code I repeated it w/ j in 1:1000, 2001:3000, and
  3001:4000,
   i.e., adding a 1000 of cols to geno each time
  
   system.time(
   #   for(j in 1:(ncol(geno.GASP  ))){
  for(j in 3001:(4000  )){
gt.GASP-geno.GASP

Re: [R] binned column in a data.frame

2007-07-20 Thread jim holtman

You can also use 'cut' to break the bins:

> x <- c(1,2,6,8,13,0,5,10, runif(10) * 100)
> x.bins <- seq(0, max(x)+5, 5)
> x.cut <- cut(x, breaks=x.bins, include.lowest=TRUE)
> x.names <- paste(head(x.bins, -1), tail(x.bins, -1), sep='-')
> data.frame(x, bins=x.names[x.cut])
          x  bins
1   1.00000   0-5
2   2.00000   0-5
3   6.00000  5-10
4   8.00000  5-10
5  13.00000 10-15
6   0.00000   0-5
7   5.00000   0-5
8  10.00000  5-10
9  75.85256 75-80
10 38.20424 35-40
11 77.30647 75-80
12 62.02278 60-65
13 73.42095 70-75
14 78.69244 75-80
15 66.52972 65-70
16 61.64897 60-65
17 23.99252 20-25
18 42.08632 40-45
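
A slightly shorter variant, sketched on the fixed values from the question only: the bin names can be handed to cut() through its labels argument, so the separate indexing step is not needed. The column names Start and Binned_Start follow the question; binned is an illustrative name.

```r
x <- c(1, 2, 6, 8, 13, 0, 5, 10)
x.bins  <- seq(0, max(x) + 5, 5)                                   # 0, 5, 10, 15
x.names <- paste(head(x.bins, -1), tail(x.bins, -1), sep = "-")    # "0-5", ...

# Intervals are (lower, upper]; include.lowest = TRUE puts 0 into the first bin
binned <- data.frame(
  Start = x,
  Binned_Start = cut(x, breaks = x.bins, labels = x.names, include.lowest = TRUE)
)
```

With these defaults a value that sits exactly on a break, such as 5 or 10, falls into the lower bin; right = FALSE would move it to the upper one.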


On 7/20/07, João Fadista [EMAIL PROTECTED] wrote:
 Dear all,

 I would like to know how can I create a binned column in a data.frame. The 
 output that I would like is something like this:

 Start  Binned_Start
 1      0-5
 2      0-5
 6      5-10
 8      5-10
 13     10-15
 ...




 Best regards

 João Fadista
 Ph.d. student



 UNIVERSITY OF AARHUS
 Faculty of Agricultural Sciences
 Dept. of Genetics and Biotechnology
 Blichers Allé 20, P.O. BOX 50
 DK-8830 Tjele

 Phone:   +45 8999 1900
 Direct:  +45 8999 1900
 E-mail:  [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
 Web: www.agrsci.org http://www.agrsci.org/
 


Re: [R] SOS

2007-07-20 Thread jim holtman

You can use sprintf:

> x <- runif(5)
> x
[1] 0.89838968 0.94467527 0.66079779 0.62911404 0.06178627
> cat(sprintf("%.2f%% ", x * 100))
89.84%  94.47%  66.08%  62.91%  6.18% 
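
Applied to the value from the question, as a small sketch: sprintf() is vectorised and returns character strings, so the formatted percentages can be stored as well as printed. Note that it rounds, so 0.0152699 comes out as 1.53% rather than 1.52%. The names returns and pct are illustrative.

```r
returns <- c(0.0152699, 0.89838968, 0.06178627)

# "%.2f" keeps two digits after the decimal point; "%%" prints a literal percent sign
pct <- sprintf("%.2f%%", returns * 100)
```

The result is a character vector, so keep the numeric returns for any further calculation.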


On 7/20/07, Fabrice McShort [EMAIL PROTECTED] wrote:
 Hi Julian,

 Thank you very much. Please let me know how to get 2 digits after the decimal point.

 Best regards,

 Fabrice



  Date: Fri, 20 Jul 2007 08:15:42 -0700
  From: [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  CC: r-help@stat.math.ethz.ch
  Subject: Re: [R] SOS

  Multiply by 100? Add

  R=R*100

  Fabrice McShort wrote:
   Dear all,
   I am a new user of R. I would like to know how to get fund's returns in
   percentage (%). For example, I use:
   R <- ts(read.xls(FundData), frequency = 12, start = c(1996, 1))
   With this program, the returns are like 0.0152699. But, I would like to
   have 1.52%. Please advise me about the function.
   Thanks!
   Fabrice

  --
  Julian M. Burgos
  Fisheries Acoustics Research Lab
  School of Aquatic and Fishery Science
  University of Washington
  1122 NE Boat Street
  Seattle, WA 98105
  Phone: 206-221-6864






-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] plot3d labels

2007-07-19 Thread jim holtman

The documentation has:

text3d(x, y = NULL, z = NULL, texts, adj = 0.5, justify, ...)

Does this do it for you?

On 7/19/07, Birgit Lemcke [EMAIL PROTECTED] wrote:
 Hello R users,

 I am a newbie using R 2.5.0 on an Apple PowerBook G4 with Mac OS X
 10.4.10.

 Sorry that I ask such stupid questions again, but I haven't found how
 to label the points created with plot3d (rgl).
 Hope somebody can help me.

 Thanks in advance.

 Birgit


 Birgit Lemcke
 Institut für Systematische Botanik
 Zollikerstrasse 107
 CH-8008 Zürich
 Switzerland
 Ph: +41 (0)44 634 8351
 [EMAIL PROTECTED]










-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] linear interpolation of multiple random time series

2007-07-19 Thread jim holtman

This should do it for you:

> x <- read.table(textConnection("trial   time    x
+ 1   1   1
+ 1   5   4
+ 1   7   9
+ 1   12  20
+ 2   1   0
+ 2   3   5
+ 2   9   10
+ 2   13  14
+ 2   19  22
+ 2   24  32"), header=TRUE)
> # compute for each trial
> trial.list <- lapply(split(x, x$trial), function(set){
+     .xval <- seq(min(set$time), max(set$time))
+     .yval <- approx(set$time, set$x, xout=.xval)$y
+     cbind(trial=set$trial[1], time=.xval, x=.yval)
+ })
> do.call('rbind', trial.list)
      trial time     x
 [1,]     1    1  1.00
 [2,]     1    2  1.75
 [3,]     1    3  2.50
 [4,]     1    4  3.25
 [5,]     1    5  4.00
 [6,]     1    6  6.50
 [7,]     1    7  9.00
 [8,]     1    8 11.20
 [9,]     1    9 13.40
[10,]     1   10 15.60
[11,]     1   11 17.80
[12,]     1   12 20.00
[13,]     2    1  0.00
[14,]     2    2  2.50
[15,]     2    3  5.00
[16,]     2    4  5.83
[17,]     2    5  6.67
[18,]     2    6  7.50
[19,]     2    7  8.33
[20,]     2    8  9.17
[21,]     2    9 10.00
[22,]     2   10 11.00
[23,]     2   11 12.00
[24,]     2   12 13.00
[25,]     2   13 14.00
[26,]     2   14 15.33
[27,]     2   15 16.67
[28,]     2   16 18.00
[29,]     2   17 19.33
[30,]     2   18 20.67
[31,]     2   19 22.00
[32,]     2   20 24.00
[33,]     2   21 26.00
[34,]     2   22 28.00
[35,]     2   23 30.00
[36,]     2   24 32.00
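
If a data frame is preferred over a matrix, the same split/approx idea
can be sketched as below. The helper name 'interp_trial' is made up for
this illustration, and the code assumes integer-spaced time points as in
the example data.

```r
# Example data from the question, stored as 'a'
a <- data.frame(trial = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                time  = c(1, 5, 7, 12, 1, 3, 9, 13, 19, 24),
                x     = c(1, 4, 9, 20, 0, 5, 10, 14, 22, 32))

# Hypothetical helper: linearly interpolate one trial at every integer time
interp_trial <- function(set) {
  t_out <- seq(min(set$time), max(set$time))
  data.frame(trial = set$trial[1],
             time  = t_out,
             x     = approx(set$time, set$x, xout = t_out)$y)
}

# One interpolation per trial, bound together once at the end
b <- do.call(rbind, lapply(split(a, a$trial), interp_trial))
rownames(b) <- NULL
head(b)
```

Because nothing is written to disk and rbind is called only once, this
should avoid the row-by-row cost of the original loop.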



On 7/19/07, Mike Lawrence [EMAIL PROTECTED] wrote:
 Hi all,

 Looking for tips on how I might more optimally solve this. I have
 time series data (samples from a force sensor) that are not
 guaranteed to be sampled at the same time values across trials. ex.

 trial   time    x
 1   1   1
 1   5   4
 1   7   9
 1   12  20
 2   1   0
 2   3   5
 2   9   10
 2   13  14
 2   19  22
 2   24  32

 Within each trial I'd like to use linear interpolation between each
 successive time sample to fill in intermediary timepoints and x-
 values, ex.

 trial   time    x
 1   1   1
 1   2   1.75
 1   3   2.5
 1   4   3.25
 1   5   4
 1   6   6.5
 1   7   9
 1   8   11.2
 1   9   13.4
 1   10  15.6
 1   11  17.8
 1   12  20
 2   1   0
 2   2   2.5
 2   3   5
 2   4   5.83
 2   5   6.67
 2   6   7.5
 2   7   8.33
 2   8   9.17
 2   9   10
 2   10  11
 2   11  12
 2   12  13
 2   13  14
 2   14  15.3
 2   15  16.7
 2   16  18
 2   17  19.3
 2   18  20.7
 2   19  22
 2   20  24
 2   21  26
 2   22  28
 2   23  30
 2   24  32


 The solution I've coded (below) involves going through the original
 data frame line by line and is thus very slow (indeed, I had to
 resort to writing to file as with a large data set I started running
 into memory issues if I tried to create the new data frame in
 memory). Any suggestions on a faster way to achieve what I'm trying
 to do?

 #assumes the first data frame above is stored as 'a'
 arows = (length(a$x)-1)
 write('', 'temp.txt')
 for(i in 1:arows){
     if(a$time[i+1] > a$time[i]){
         write.table(a[i,], 'temp.txt', row.names = F, col.names = F, append = T)
         x1 = a$time[i]
         x2 = a$time[i+1]
         dx = x2-x1
         if(dx != 1){
             y1 = a$x[i]
             y2 = a$x[i+1]
             dy = y2-y1
             slope = dy/dx
             int = -slope*x1+y1
             temp=a[i,]
             for(j in (x1+1):(x2-1)){
                 temp$time = j
                 temp$x = slope*j+int
                 write.table(temp, 'temp.txt', row.names = F, col.names = F, append = T)
             }
         }
     }else{
         write.table(a[i,], 'temp.txt', row.names = F, col.names = F, append = T)
     }
 }
 i=i+1
 write.table(a[i,], 'temp.txt', row.names = F, col.names = F, append = T)

 b=read.table('temp.txt',skip=1)
 names(b)=names(a)




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Help with Dates

2007-07-19 Thread jim holtman

Try some of the following:

head(subset(df, Yr %in% c("00","01","02","03")))

subset(df, (Yr >= '00') & (Yr <= '03'))  # same as above

subset(df, (Yr == '00') | (Yr == '01') | (Yr == '02') |(Yr == '03'))  # same
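
Another option is to keep a real Date column and compare on the numeric
year, which avoids comparing two-digit strings. The sketch below uses a
few made-up rows in the same 'dd-Mon-yy' layout as the question; the
prices and dates are illustrative only. It assumes English month
abbreviations, so the time locale is set to "C" first.

```r
# Month abbreviations like "Jan" are locale dependent; force the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Made-up rows in the same layout as the question's Date column
df <- data.frame(Date  = c("15-Jan-86", "14-Feb-00", "31-Mar-03", "28-Feb-06"),
                 Price = c(673.25, 680.25, 709.75, 691.75),
                 stringsAsFactors = FALSE)

# Convert to Date; with %y, 69-99 map to 19xx and 00-68 to 20xx
df$Date <- as.Date(df$Date, format = "%d-%b-%y")

# Numeric four-digit year, then an ordinary range comparison
yr  <- as.integer(format(df$Date, "%Y"))
sub <- df[yr >= 2000 & yr <= 2005, ]
print(sub)
```

With a Date column in place, ranges can also be written directly against
dates, for example df$Date >= as.Date("2000-01-01").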


On 7/19/07, Alex Park [EMAIL PROTECTED] wrote:
 R

 I am taking an excel dataset and reading it into R using read.table.
 (actually I am dumping the data into a .txt file first and then reading data
 in to R).

 Here is snippet:

  > head(data);
        Date  Price Open.Int. Comm.Long Comm.Short net.comm
 1 15-Jan-86 673.25    175645     65910      28425    37485
 2 31-Jan-86 677.00    167350     54060      27120    26940
 3 14-Feb-86 680.25    157985     37955      25425    12530
 4 28-Feb-86 691.75    162775     49760      16030    33730
 5 14-Mar-86 706.50    163495     54120      27995    26125
 6 31-Mar-86 709.75    164120     54715      30390    24325

 The dataset runs from 1986 to 2007.

 I want to be able to take subsets of my data based on date e.g. data between
 2000 - 2005.

 As it stands, I can't work with the dates as they are not in correct format.

 I tried successfully converting the dates to just the year using:

 transform(data, Yr = format(as.Date(as.character(Date),format = '%d-%b-%y'),
 "%y"))

 This gives the following format:

        Date  Price Open.Int. Comm.Long Comm.Short net.comm Yr
 1 15-Jan-86 673.25    175645     65910      28425    37485 86
 2 31-Jan-86 677.00    167350     54060      27120    26940 86
 3 14-Feb-86 680.25    157985     37955      25425    12530 86
 4 28-Feb-86 691.75    162775     49760      16030    33730 86
 5 14-Mar-86 706.50    163495     54120      27995    26125 86
 6 31-Mar-86 709.75    164120     54715      30390    24325 86

 I can subset for a single year e.g:

 head(subset(df, Yr == "00"))

 But how can I subset for multiple periods e.g 00- 05? The following won't
 work:

 head(subset(df, Yr == "00" & Yr == "01"))

 or

 head(subset(df, Yr = c("00","01","02","03")))

 I can't help but feeling that I am missing something and there is a simpler
 route.

 I leafed through R newsletter 4.1 which deals with dates and times but it
 seemed that strptime and POSIXct / POSIXlt are not what I need either.

 Can anybody help me?

 Regards


 Alex




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] can I paste 'newline'?

2007-07-19 Thread jim holtman

Notice the difference:

> cat ('I need to move on to a new line', '\n', 'at here') # change line!
I need to move on to a new line
 at here
> paste ('I need to move on to a new line', '\n', 'at here') # '\n' is just a character
[1] "I need to move on to a new line \n at here"
> cat(paste ('I need to move on to a new line', '\n', 'at here'))
I need to move on to a new line
 at here

> paste("a long string
+ with carriage
+ returns")
[1] "a long string\nwith carriage\nreturns"


> cat(paste("a long string
+ with carriage
+ returns"))
a long string
with carriage
returns


paste is showing you the characters in the string; cat is actually
outputting to a print device where '\n' is a line feed.
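
In other words, the newline is already in the pasted string; only the
way it is displayed differs. A small sketch of this, using sep = "\n" so
that no extra spaces are added around the line break:

```r
# Join two pieces with a newline between them
s <- paste("I need to move on to a new line", "at here", sep = "\n")

print(s)       # shows the escaped form, with \n visible inside quotes
cat(s, "\n")   # interprets the newline and prints two lines
writeLines(s)  # same idea, and always ends with a newline

# The string really contains one newline character between the two parts
parts <- strsplit(s, "\n", fixed = TRUE)[[1]]
length(parts)
```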

On 7/19/07, runner [EMAIL PROTECTED] wrote:

 It is ok to bury a reg expression '\n' when using 'cat', but not 'paste'.
 e.g.

 cat ('I need to move on to a new line', '\n', 'at here') # change line!
 paste ('I need to move on to a new line', '\n', 'at here') # '\n' is just a
 character as it is.

 Is there a way around pasting '\n' ? Thanks a lot.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Dataframe of factors transform speed?

2007-07-19 Thread jim holtman

 with different number of levels (from 1 to 3 -
 that's what I got from read.table, i.e., it dropped missing levels). I
 want to convert it to uniform factors with 3 levels. The 1st 10 rows
 above show already converted columns and the rest are not yet converted.
 Here's my attempt, which is a complete failure in terms of speed:

  system.time(
 + for(j in 1:(10 )){ #-- this is to try 1st 10 cols and measure the time, it otherwise is ncol(genoT) instead of 10
 +    gt<-genoT[[j]]  #-- this is to avoid 2D indices
 +    for(l in 1:length([EMAIL PROTECTED])){
 +      levels(gt)[l] <- switch([EMAIL PROTECTED],AA=0,AB=1,BB=2)  #-- convert levels to 0,1, or 2
 +      genoT[[j]]<-factor(gt,levels=0:2)   #-- make a 3-level factor and put it back
 +    }
 + }
 + )
 [1] 785.085   4.358 789.454   0.000   0.000

 789s for 10 columns only!

 To me it seems like replacing 10 x 3 levels and then making a factor of
 1002 element vector x 10 is a negligible amount of operations needed.

 So, what's wrong with me? Any idea how to accelerate significantly the
 transformation or (to go to the very beginning) to make read.table use a
 fixed set of levels (AA,AB, and BB) and not to drop any (missing)
 level?

 R-devel_2006-08-26, Sun Solaris 10 OS - x86 64-bit

 The machine is with 32G RAM and AMD Opteron 285 (2.? GHz) so it's not
 it.

 Thank you very much for the help,

 Latchezar Dimitrov,
 Analyst/Programmer IV,
 Wake Forest University School of Medicine,
 Winston-Salem, North Carolina, USA




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] remove columns having a partial match name

2007-07-18 Thread jim holtman

DATA_OK <- DATA[, -grep("^Start", names(DATA))]
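
One caveat with the negative-index form: if no column name matches,
grep() returns an empty vector and the result has no columns at all. A
logical index avoids that. A minimal sketch with a made-up data frame
(the column names are invented for illustration):

```r
# Toy data frame; two columns start with "Start"
DATA <- data.frame(id = 1:3, Start1 = 4:6, StartX = 7:9, End1 = 10:12)

# Keep every column whose name does NOT begin with "Start";
# drop = FALSE keeps a data frame even if only one column remains
DATA_OK <- DATA[, !grepl("^Start", names(DATA)), drop = FALSE]
names(DATA_OK)

# The logical form is also safe when nothing matches
same <- DATA[, !grepl("^NoSuchPrefix", names(DATA)), drop = FALSE]
dim(same)
```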

On 7/18/07, João Fadista [EMAIL PROTECTED] wrote:
 Dear all,

 I would like to know how I can retrieve a data.frame without the columns that 
 have a partial match name. Let's say that I have a data.frame with 200 
 columns and 100 of them have the name StartX, with X being the unique part 
 for each column name. I want to delete all columns that have the name 
 starting with Start. I've tried to do this but it doesn't work:

  > DATA_OK <- DATA[,-match("(Start*)",names(DATA))]
  > dim(DATA_OK)
 NULL


 Thanks in advance.
 Best regards

 João Fadista
 Ph.d. student



 UNIVERSITY OF AARHUS
 Faculty of Agricultural Sciences
 Dept. of Genetics and Biotechnology
 Blichers Allé 20, P.O. BOX 50
 DK-8830 Tjele

 Phone:   +45 8999 1900
 Direct:  +45 8999 1900
 E-mail:  [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
 Web: www.agrsci.org http://www.agrsci.org/
 

 News and news media http://www.agrsci.org/navigation/nyheder_og_presse .





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Classification

2007-07-18 Thread jim holtman

You can use 'cut':

> x
     MD
1  0.20
2  0.10
3  0.80
4  0.30
5  0.70
6  0.60
7  0.01
8  0.20
9  0.50
10 1.00
11 1.00
> cut(x$MD, breaks=seq(0,1,.2), include.lowest=TRUE, labels=LETTERS[1:5])
 [1] A A D B D C A A C E E
Levels: A B C D E
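
To store the classes in the data frame itself, the result of cut() can
be assigned to a new column. In the sketch below the column name 'class'
is an arbitrary choice for illustration.

```r
# Data from the question
x <- data.frame(MD = c(0.2, 0.1, 0.8, 0.3, 0.7, 0.6, 0.01, 0.2, 0.5, 1, 1))

# Intervals are closed on the right: [0,0.2], (0.2,0.4], ..., (0.8,1]
x$class <- cut(x$MD, breaks = seq(0, 1, 0.2),
               include.lowest = TRUE, labels = LETTERS[1:5])

# Count how many values fall in each class
counts <- table(x$class)
print(counts)
```

Values that sit exactly on a break, such as 0.6, depend on floating-point
representation, so it is worth checking a few boundary cases on real data.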



On 7/18/07, Ing. Michal Kneifl, Ph.D. [EMAIL PROTECTED] wrote:
 Hi,
 I am also a quite new user of R and would like to ask you for help:
 I have a data frame where all columns are numeric variables. My aim is
 to convert one column into a factor.
 Example:
 MD
 0.2
 0.1
 0.8
 0.3
 0.7
 0.6
 0.01
 0.2
 0.5
 1
 1


 I want to make classes:
 0-0.2 A
 0.21-0.4 B
 0.41-0.6 C
 . and so on

 So after classification I wil get:
 MD
 A
 A
 D
 B
 .
 .
 .
 and so on

 Please could you give an advice to a newbie?
 Thanks a lot in advance..

 Michael




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] nested for loop

2007-07-18 Thread jim holtman

This should create your files for you:

x <- 1:1080  # test data
# create a grouping vector: 30 consecutive values share the same group number
breaks <- rep(1:ceiling(length(x) / 30), each=30)[1:length(x)]
# now partition the data into groups of 30 values and write them
fileNo <- 1  # initialize the file number
invisible(lapply(split(x, breaks), function(.values){
    write(.values, file=sprintf("NWRxx.%03d.txt", fileNo))
    fileNo <<- fileNo + 1   # update the file number in the enclosing environment
}))
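
A variant without the global counter is sketched below. It writes to a
temporary directory so it can be tried safely; the 'NWRxx' file name
pattern is carried over from the code above, and 'out_dir' is an
assumption that should be replaced by the real target directory.

```r
x <- 1:1080                                # test data

# Group number for each element: 1 for the first 30, 2 for the next 30, ...
chunk_id <- ceiling(seq_along(x) / 30)
chunks   <- split(x, chunk_id)

# Assumed output location; replace with the directory you actually want
out_dir <- tempdir()
files   <- file.path(out_dir, sprintf("NWRxx.%03d.txt", seq_along(chunks)))

# One file per chunk, one value per line
for (i in seq_along(chunks)) {
  write(chunks[[i]], file = files[i], ncolumns = 1)
}
length(files)
```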


On 7/18/07, Sherri Heck [EMAIL PROTECTED] wrote:
 Hi,

 I am new to programming and R.  I am reading the manual and R books by 
 Dalgaard and Veranzo to help answer my questions but I am unable to figure 
 out the following:

 I have a data file that contains 1080 data points. Here's a snippet of the 
 file:

 [241]  0.3603704000  0.1640741000  0.2912963000   NA  0.0159259300  
 0.0474074100

 I would like to break the file up into 30 consecutive data point segments and 
 then write each segment into a separate data file.  This is one version of 
 code that I've tried.

 mons = c(1:12)

 data = scan(paste("C:/R/NWR.txt"))
 for (mon in mons)  {
  for (i in c(1:30)) {
  for (j in data){

 write((data),paste(mon,'NWR dc_dt_zi ppm meters per sec.txt',sep=''),ncol=1)

 }
  }
  }

 I think I'm really close, but no cigar.  Thanks in advance for any help-

 S.Heck
 Graduate Research Assistant
 University of Colorado, Boulder




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] set up automatic running of R

2007-07-18 Thread jim holtman

Create a .bat file with the command to run your script through
R CMD BATCH, and then create a Windows scheduled task that calls the
batch file at the desired time.

On 7/18/07, Am Stat [EMAIL PROTECTED] wrote:
 Hi useR,

 I am trying to find how to schedule an automatic run of R periodically, I
 have written some scripts to extract data which are updated monthly on
 another server, my os is xp. The goal is that my script will run at a
 scheduled time every month and record the results to some directories.

 Now the scripts are done, only thing I need is to know how to let R run my
 scripts at a certain time, say the first Sunday of each months.

 Could anyone give me some clues?

 Thanks a million in advance!


 Best,

 Leon




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] memory error with 64-bit R in linux

2007-07-18 Thread jim holtman


Are you paging?  That might explain the long run times. How much space
are your other objects taking up?  The matrix by itself should only
require about 13MB if it is numeric.  I would guess it is some of the
other objects that you have in your working space.  Put some gc() in
your loop to see how much space is being used.  Run it with a subset
of the data and see how long it takes.  This might give you an
estimate of the time, and space, that might be needed for the entire
dataset.

Do a 'ps' to see how much memory your process is using.  Do one every
couple of minutes to see if it is growing.  You can always use Rprof()
to get an idea of where time is being spent (use it on a small
subset).
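
To put rough numbers on the sizes involved, a sketch like the following
may help. It is only an estimate and not a diagnosis of the 896 MB
error: it measures a numeric matrix of the stated dimensions and works
out the size of a row-distance object, which heatmap() computes by
default through hclust(dist(x)).

```r
# A numeric (double) matrix with the dimensions from the question
m <- matrix(0, nrow = 16000, ncol = 100)
size_mb <- as.numeric(object.size(m)) / 2^20
cat(sprintf("matrix: %.1f MB\n", size_mb))

# A dist object for n rows stores n * (n - 1) / 2 doubles of 8 bytes each
n <- nrow(m)
dist_mb <- n * (n - 1) / 2 * 8 / 2^20
cat(sprintf("row distances: %.0f MB\n", dist_mb))

# gc() reports current and maximum memory use; call it inside a loop
# to watch how usage develops
invisible(gc())
```

If the row dendrogram is not needed, passing Rowv = NA to heatmap()
skips the row clustering step.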

On 7/18/07, zhihua li [EMAIL PROTECTED] wrote:

Hi netters,

I'm using the 64-bit R-2.5.0 on a x86-64 cpu, with an RAM of 2 GB.  The
operating system is SUSE 10.
The system information is:
-uname -a
Linux someone 2.6.13-15.15-smp #1 SMP Mon Feb 26 14:11:33 UTC 2007 x86_64
x86_64 x86_64 GNU/Linux

I used heatmap to process a matrix of the dim [16000,100].  After 3 hours
of desperating waiting, R told me:
cannot allocate vector of size 896 MB.

I know the matrix is very big, but since I have 2 GB of RAM and in a 64-bit
system, there should be no problem to deal with a vector smaller than 1 GB?
(I was not running any other applications in my system)

Does anyone know what's going on?  Is there a hardware limit where I have
to add more RAM, or is there some way to resolve it softwarely? Also is it
possible to speed up the computing (I don't wanna wait another 3 hours to
know I get another error message)

Thank you in advance!






--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Saving a dataset permanently in R

2007-07-18 Thread jim holtman

Where are you trying to copy data from?  I would assume that with that
script you are typing all the data in by hand.  Why don't you put it
in a text file and use read.table?  By default, R will save your
workspace on exit and then reload it on startup.  Is this enough to
save your data?  You can also use the 'save' function to store
explicit objects.

On 7/18/07, Felipe Carrillo [EMAIL PROTECTED] wrote:
 HI:
 I'm still struggling with datasets, the more I read
 about it the more confused I get. This is the
 scenario... In R console|Edit|Data Editor, I can find
 all the datasets available with the different
 packages, So to create a new dataset in the R console
 I use the following commands to create an empty data
 frame.
 My_Dataset <- data.frame()
 My_Dataset <- edit(My_Dataset)

 The problem is that I can't copy my data into the
 data frame. Are there any suggestions as to how I can
 transfer the data and how it can be saved so that every time
 I open R the dataset is available?
 Thanks

  Felipe D. Carrillo
  Fishery Biologist
  US Fish & Wildlife Service
  Red Bluff, California 96080




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] memory error with 64-bit R in linux

2007-07-18 Thread jim holtman


The output from gc() indicates a maximum usage of about
476MB + 119MB = ~600MB.  The ps output shows a process size of
523088 KB (roughly 500MB), so you are using about 25% of the 2GB that
you have available.

mem.limits() just shows the current values of the limits, and as the
help file says:

Value
mem.limits() returns an integer vector giving the current settings of
the maxima, possibly NA.



On 7/18/07, zhihua li [EMAIL PROTECTED] wrote:

Thanks for replying!
i don't think i'm paging. i tried to use a smaller version of my matrix and
do all the checkings as suggested by jim. The smaller matrix caused another
problem, for which I've opened another thread. But i've found something
about memory that I don't understand.
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  269577 14.4    5570995 297.6  8919855 476.4
Vcells 3353395 25.6    9493567  72.5 15666095 119.6

Does this mean the maximum memory I can use for variables is only 120 M?
However, when I tried to check the memory limits:
> mem.limits()
nsize vsize
   NA    NA

Here it seems the maximum memory is not limited?

When there is no R function is being executed, I checked the system process
by:
ps u

 PID %CPU %MEM    VSZ    RSS TTY   STAT START   TIME COMMAND
7821  0.0  0.1  10048   2336 pts/0 Ss   Jul18   0:00 -bash
8076  2.9 24.5 523088 504004 pts/0 S+   Jul18   2:46 /usr/lib64/R/bi
8918  1.5  0.1   9912   2328 pts/1 Ss   00:44   0:00 -bash
8962  0.0  0.0   3808    868 pts/1 R+   00:45   0:00 ps u

Does this mean R is using 25% of my memory? But my RAM is 2 GB and the
objects in R only occupy 40 MB from gc().

Did I interpret it wrong?

Thanks a lot!



From: jim holtman [EMAIL PROTECTED]
To: zhihua li [EMAIL PROTECTED]
CC: r-help@stat.math.ethz.ch
Subject: Re: [R] memory error with 64-bit R in linux
Date: Wed, 18 Jul 2007 17:50:31 -0500

Are you paging?  That might explain the long run times. How much
space
are your other objects taking up?  The matrix by itself should only
require about 13MB if it is numeric.  I would guess it is some of
the
other objects that you have in your working space.  Put some gc() in
your loop to see how much space is being used.  Run it with a subset
of the data and see how long it takes.  This might give you an
estimate of the time, and space, that might be needed for the entire
dataset.

Do a 'ps' to see how much memory your process is using.  Do one
every
couple of minutes to see if it is growing.  You can always use
Rprof()
to get an idea of where time is being spent (use it on a small
subset).

On 7/18/07, zhihua li [EMAIL PROTECTED] wrote:
Hi netters,

I'm using the 64-bit R-2.5.0 on a x86-64 cpu, with an RAM of 2 GB.
The
operating system is SUSE 10.
The system information is:
-uname -a
Linux someone 2.6.13-15.15-smp #1 SMP Mon Feb 26 14:11:33 UTC 2007
x86_64
x86_64 x86_64 GNU/Linux

I used heatmap to process a matrix of the dim [16000,100].  After 3
hours
of desperating waiting, R told me:
cannot allocate vector of size 896 MB.

I know the matrix is very big, but since I have 2 GB of RAM and in
a 64-bit
system, there should be no problem to deal with a vector smaller
than 1 GB?
(I was not running any other applications in my system)

Does anyone know what's going on?  Is there a hardware limit where
I have
to add more RAM, or is there some way to resolve it softwarely?
Also is it
possible to speed up the computing (I don't wanna wait another 3
hours to
know I get another error message)

Thank you in advance!





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?






--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] poor rbind performance

2007-07-17 Thread jim holtman

Read the data into a list and then:

do.call('rbind', myList)

at the end so you do it only once.  You are having to reallocate
memory each iteration, so no wonder it is slow.
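
A minimal sketch of that pattern follows. It first writes three small
comma-separated files to a temporary directory so the example is
self-contained; the file names and contents are invented for
illustration and stand in for the real /tmp/myData_k.txt files.

```r
# Create three small stand-in files
tmp_dir <- tempdir()
files <- file.path(tmp_dir, sprintf("myData_%d.txt", 1:3))
for (k in seq_along(files)) {
  write.table(data.frame(id = k, value = c(1.5, 2.5)),
              files[k], sep = ",", row.names = FALSE)
}

# Read every file into a list, then bind all pieces in a single call
myList <- lapply(files, read.table, header = TRUE, sep = ",")
myData <- do.call(rbind, myList)
dim(myData)
```

For files with many rows, passing colClasses to read.table may also
speed up the reading step.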

On 7/17/07, Aydemir, Zava (FID) [EMAIL PROTECTED] wrote:
 Hi

 I rbind data frames in a loop in a cumulative way and the performance
 deteriorates very quickly.

 My code looks like this:

 for( k in 1:N)
 {
     filename <- paste("/tmp/myData_",as.character(k),".txt",sep="")
     myDataTmp <- read.table(filename,header=TRUE,sep=",")
     if( k == 1) {
         myData <- myDataTmp
     }
     else{
         myData <- rbind(myData,myDataTmp)
     }
 }

 Some more details:
 - the size of the stored text files is about 100,000 rows and 50 columns
 each
 - for k=1: rbind takes 0.0004 seconds
 - for k=2: rbind takes 13 seconds
 - for k=3: rbind takes 30 seconds
 - for k=4: rbind takes 36 seconds
 etc

 Any suggestions to improve speed?

 Thanks

 Zava
 




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] problem with length()

2007-07-16 Thread jim holtman

POSIXlt is a list structure of 9 elements (see ?POSIXlt).  You can see
that in the data below:

> x <- as.POSIXlt(c('2007-01-01','2007-02-01','2007-03-31'))
> length(x)
[1] 9
> unclass(x)
$sec
[1] 0 0 0

$min
[1] 0 0 0

$hour
[1] 0 0 0

$mday
[1]  1  1 31

$mon
[1] 0 1 2

$year
[1] 107 107 107

$wday
[1] 1 4 6

$yday
[1]  0 31 89

$isdst
[1] 0 0 0

attr(,"tzone")
[1] "GMT"
> length(as.POSIXct(x))
[1] 3


What you probably want to do is to use the POSIXct class.
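
For the integer dates in the question, the conversion could be sketched
as follows. The three values are a short stand-in for the real vector
'fff', and the GMT time zone is an assumption. The length of 9 shown
above is what the R version in this thread reports for POSIXlt; I
believe later releases return the number of date-times instead, so the
sketch only relies on the POSIXct behaviour.

```r
# Short stand-in for the integer vector 'fff' from the question
fff <- c(20010102L, 20010103L, 20010105L)

# Parse as text, then store as POSIXct: one number per date-time
eee <- as.POSIXct(strptime(as.character(fff), "%Y%m%d", tz = "GMT"))
length(eee)

# A POSIXct vector can be added to a data frame like any other column
df <- data.frame(id = seq_along(fff))
df$when <- eee
str(df)
```

If only calendar dates are needed, as.Date(as.character(fff), "%Y%m%d")
is a lighter alternative.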

On 7/16/07, Jacob Etches [EMAIL PROTECTED] wrote:
 In the following, can anyone tell me why length(eee) returns 9?  I
 was expecting 15398, and when I try to add this vector to a data
 frame with that many rows, it fails complaining that the vector is of
 length 9.  In what I thought was an identical situation with a
 related dataset, the same code worked as expected.

   length(fff)
 [1] 15398
   str(fff)
 int [1:15398] 20010102 20010102 20010102 20010103 20010103 20010102
 20010102 20010104 20010103 20010102 ...
   fff[1:12]
 [1] 20010102 20010102 20010102 20010103 20010103 20010102 20010102
 20010104 20010103 20010102 20010105 20010103
   eee <- as.POSIXlt(strptime(fff,"%Y%m%d"))
   length(eee)
 [1] 9
   eee[1:12]
 [1] 2001-01-02 2001-01-02 2001-01-02 2001-01-03 2001-01-03
 2001-01-02 2001-01-02 2001-01-04 2001-01-03 2001-01-02
 2001-01-05 2001-01-03
   str(eee)
 'POSIXlt', format: chr [1:15398] 2001-01-02 2001-01-02
 2001-01-02 2001-01-03 2001-01-03 2001-01-02 2001-01-02
 2001-01-04 2001-01-03 ...



 Many thanks in advance,
 Jacob Etches


 Doctoral candidate, Epidemiology Program
 Department of Public Health Sciences, University of Toronto Faculty
 of Medicine

 Research Associate
 Institute for Work & Health
 800-481 University Avenue, Toronto, Ontario, Canada   M5G 2E9
 T: 416.927.2027 ext. 2290
 F: 416.927.4167
 [EMAIL PROTECTED]
 www.iwh.on.ca








-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Algorythmic Question on Array Filtration

2007-07-14 Thread jim holtman




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] filling a list faster

2007-07-13 Thread jim holtman

It all depends on what you want to do.  In your example, it is faster
to first fill in a matrix and then convert the matrix to a list.  The
problem with filling in the list is that you are dynamically
allocating space for each iteration which is probably taking at least
an order of magnitude more time than the calculations you are doing.
So I just translated your problem into two steps and it takes about 2
seconds on my system.


 # fill in a matrix
 l <- matrix(ncol=3, nrow=10^5)
 system.time(for(i in (1:10^5)) l[i,] <- c(i,i+1,i))
   user  system elapsed
   1.06    0.00    1.10
 # convert to a list
 system.time(l.list <- lapply(1:10^5, function(i) l[i,]))
   user  system elapsed
   0.45    0.00    0.46
 l.list[1:10]
[[1]]
[1] 1 2 1

[[2]]
[1] 2 3 2

[[3]]
[1] 3 4 3

[[4]]
[1] 4 5 4

[[5]]
[1] 5 6 5
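
If the elements really do have to be produced one at a time inside a loop, the same preallocation idea applies to a list as well. A minimal sketch, using vector("list", n) to reserve the slots up front (n is reduced here only to keep the example quick):

```r
n <- 10^4

# Reserve all n slots once instead of growing the list on every pass
l <- vector("list", n)
for (i in seq_len(n)) {
  l[[i]] <- c(i, i + 1, i)
}

length(l)
l[[3]]
```

Because the list never has to be copied to a larger size, the loop cost stays roughly proportional to n.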





On 7/13/07, Balazs Torma [EMAIL PROTECTED] wrote:
 hello,

 first I create a list:

 l <- list(1-c(1,2,3))

 then I run the following loop; it takes over a minute(!) to
 complete on a very fast machine:

 for(i in (1:10^5)) l[[length(l)+1]] <- c(i,i+1,i)

 How can I fill a list faster? (This is just a demo test, the elements
 of the list are calculated iteratively in an algorithm)

 Are there any packages and documents on how to use more advanced and
 fast data structures like linked-lists, hash-tables or trees for
 example?

 Thank you,
 Balazs Torma




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] filling a list faster

2007-07-13 Thread jim holtman

Actually if you are really interested in the list, then just do the
lapply and compute your data; it seems to be even faster than the
matrix:

 system.time(l.1 <- lapply(1:10^5, function(i) c(i, i+1, i)))
   user  system elapsed
   0.50    0.00    0.61
 l.1[1:4]
[[1]]
[1] 1 2 1

[[2]]
[1] 2 3 2

[[3]]
[1] 3 4 3

[[4]]
[1] 4 5 4
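
On the other part of the original question (hash tables): R environments are hashed, so one can serve as a string-keyed lookup table. A small sketch with made-up key names:

```r
# An environment used as a string-keyed hash table
h <- new.env(hash = TRUE)

for (i in 1:5) {
  # keys must be character strings
  assign(paste("key", i, sep = ""), c(i, i + 1, i), envir = h)
}

exists("key3", envir = h, inherits = FALSE)   # membership test
get("key3", envir = h)                        # lookup by key
sort(ls(h))                                   # all keys
```

inherits = FALSE keeps the membership test from also searching the enclosing environments.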



On 7/13/07, Philippe Grosjean [EMAIL PROTECTED] wrote:
 If all the data coming from your iterations are numeric (as in your toy
 example), why not use a matrix with one row per iteration? Also, do
 preallocate the matrix and do not add row or column names before the end
 of the calculation. Something like:

   m <- matrix(rep(NA, 3*10^5), ncol = 3)
   system.time(for(i in (1:10^5)) m[i, ] <- c(i,i+1,i))
    user  system elapsed
   1.362   0.033   1.424

 That is, about 1.5 sec on my Intel Duo Core 2.33 GHz MacBook Pro, compared to:

   l <- list(1-c(1,2,3))
   system.time(for(i in (1:10^5)) l[[length(l)+1]] <- c(i,i+1,i))
    user  system elapsed
 191.629  49.110 248.454

 ... more than 4 minutes for your code.

 By the way, what is your very fast machine, that is actually four
 times faster than mine (gr!)?

 Best,

 Philippe Grosjean

 ..∞}))
  ) ) ) ) )
 ( ( ( ( (Prof. Philippe Grosjean
  ) ) ) ) )
 ( ( ( ( (Numerical Ecology of Aquatic Systems
  ) ) ) ) )   Mons-Hainaut University, Belgium
 ( ( ( ( (
 ..




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Compute rank within factor groups

2007-07-12 Thread jim holtman

Is this what you are looking for:

 x
        report score
9         ADEA  0.96
8         ADEA  0.90
11 Asylum_FED9  0.86
3         ADEA  0.75
14 Asylum_FED9  0.60
5         ADEA  0.56
13 Asylum_FED9  0.51
16 Asylum_FED9  0.51
2         ADEA  0.42
7         ADEA  0.31
17 Asylum_FED9  0.27
1         ADEA  0.17
4         ADEA  0.17
6         ADEA  0.12
10        ADEA  0.11
12 Asylum_FED9  0.10
15 Asylum_FED9  0.09
18 Asylum_FED9  0.07
 x$rank <- ave(x$score, x$report, FUN=rank)
 x
        report score rank
9         ADEA  0.96 10.0
8         ADEA  0.90  9.0
11 Asylum_FED9  0.86  8.0
3         ADEA  0.75  8.0
14 Asylum_FED9  0.60  7.0
5         ADEA  0.56  7.0
13 Asylum_FED9  0.51  5.5
16 Asylum_FED9  0.51  5.5
2         ADEA  0.42  6.0
7         ADEA  0.31  5.0
17 Asylum_FED9  0.27  4.0
1         ADEA  0.17  3.5
4         ADEA  0.17  3.5
6         ADEA  0.12  2.0
10        ADEA  0.11  1.0
12 Asylum_FED9  0.10  3.0
15 Asylum_FED9  0.09  2.0
18 Asylum_FED9  0.07  1.0
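
Note that rank() numbers from the smallest score upward and averages ties (hence the 5.5 and 3.5 above). If the goal is instead 1 for the highest score in each report, with ties numbered in order of appearance as the original loop does, a sketch along these lines should work (the small data frame is made up for illustration):

```r
x <- data.frame(
  report = c("ADEA", "ADEA", "Asylum_FED9", "ADEA", "Asylum_FED9", "Asylum_FED9"),
  score  = c(0.96, 0.90, 0.86, 0.17, 0.51, 0.51)
)

# Negate the score so the largest value gets rank 1;
# ties.method = "first" gives tied scores consecutive integer ranks
x$rank <- ave(-x$score, x$report,
              FUN = function(s) rank(s, ties.method = "first"))
x
```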



On 7/12/07, Ken Williams [EMAIL PROTECTED] wrote:
 Hi,

 I have a data.frame which is ordered by score, and has a factor column:

  Browse[1]> wc[c("report","score")]
          report score
  9         ADEA  0.96
  8         ADEA  0.90
  11 Asylum_FED9  0.86
  3         ADEA  0.75
  14 Asylum_FED9  0.60
  5         ADEA  0.56
  13 Asylum_FED9  0.51
  16 Asylum_FED9  0.51
  2         ADEA  0.42
  7         ADEA  0.31
  17 Asylum_FED9  0.27
  1         ADEA  0.17
  4         ADEA  0.17
  6         ADEA  0.12
  10        ADEA  0.11
  12 Asylum_FED9  0.10
  15 Asylum_FED9  0.09
  18 Asylum_FED9  0.07
  Browse[1]>

 I need to add a column indicating rank within each factor group, which I
 currently accomplish like so:

  wc$rank <- 0
  for(report in as.character(unique(wc$report))) {
    wc[wc$report==report,]$rank <- 1:sum(wc$report==report)
  }

 I have to wonder whether there's a better way, something that gets rid of
 the for() loop using tapply() or by() or similar.  But I haven't come up
 with anything.

 I've tried these:

  by(wc, wc$report, FUN=function(pr){pr$rank <- 1:nrow(pr)})

  by(wc, wc$report, FUN=function(pr){wc[wc$report %in% pr$report,]$rank <-
 1:nrow(pr)})

 But in both cases the effect of the assignment is lost, there's no $rank
 column generated for wc.

 Any suggestions?

  -Ken




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] (no subject)

2007-07-12 Thread jim holtman

Is this what you want to do:

 auto.length <- c(12,15,6)
 for(i in 1:3) {
+ nam <- paste("auto.data",i, sep=".")
+ assign(nam, as.data.frame(matrix(1:auto.length[i], ncol=3)))
+ }
 auto.data.1
  V1 V2 V3
1  1  5  9
2  2  6 10
3  3  7 11
4  4  8 12
 auto.data.2
  V1 V2 V3
1  1  6 11
2  2  7 12
3  3  8 13
4  4  9 14
5  5 10 15
 # output the data
 for(i in 1:3){
+ cat(x <- paste('auto.data.', i, sep=''), '\n')
+ print(get(x))
+ }
auto.data.1
  V1 V2 V3
1  1  5  9
2  2  6 10
3  3  7 11
4  4  8 12
auto.data.2
  V1 V2 V3
1  1  6 11
2  2  7 12
3  3  8 13
4  4  9 14
5  5 10 15
auto.data.3
  V1 V2 V3
1  1  3  5
2  2  4  6
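
An alternative that avoids assign() and get() altogether is to keep the data frames in a named list; the loop index then selects an element directly. A sketch, reusing auto.length from above (the list name auto.data and the placeholder values are only for illustration):

```r
auto.length <- c(12, 15, 6)

# One data frame per entry, 3 columns each, stored in a named list
auto.data <- lapply(auto.length, function(n) {
  as.data.frame(matrix(0, nrow = n / 3, ncol = 3))
})
names(auto.data) <- paste("auto.data", seq_along(auto.data), sep = ".")

# Fill every cell in a loop; [[i]] picks the i-th data frame
for (i in seq_along(auto.data)) {
  for (j in seq_len(nrow(auto.data[[i]]))) {
    for (k in seq_len(ncol(auto.data[[i]]))) {
      auto.data[[i]][j, k] <- i * 100 + j * 10 + k   # placeholder value
    }
  }
}

sapply(auto.data, nrow)
auto.data[["auto.data.3"]]
```

Functions such as lapply() and sapply() then work over all the data frames at once, which is awkward when each one lives in its own variable.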



On 7/12/07, Drescher, Michael (MNR) [EMAIL PROTECTED] wrote:
 Hi All,

 I want to automatically generate a number of data frames, each with an
 automatically generated name and an automatically generated number of
 rows. The number of rows has been calculated before and is different for
 all data frames (e.g. c(4,5,2)). The number of columns is known a priori
 and the same for all data frames (e.g. c(3,3,3)). The resulting data
 frames could look something like this:

  auto.data.1
  X1 X2 X3
 1  0  0  0
 2  0  0  0
 3  0  0  0
 4  0  0  0

  auto.data.2
  X1 X2 X3
 1  0  0  0
 2  0  0  0
 3  0  0  0
 4  0  0  0
 5  0  0  0

  auto.data.3
  X1 X2 X3
 1  0  0  0
 2  0  0  0

 Later, I want to fill the elements of the data frames with values read
 from somewhere else, automatically looping through the previously
 generated data frames.

 I know that I can automatically generate variables with the right number
 of elements with something like this:

  auto.length <- c(12,15,6)
  for(i in 1:3) {
 + nam <- paste("auto.data",i, sep=".")
 + assign(nam, 1:auto.length[i])
 + }
  auto.data.1
  [1]  1  2  3  4  5  6  7  8  9 10 11 12
  auto.data.2
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  auto.data.3
 [1]  1 2 3 4 5 6

 But how do I turn these variables into data frames or give them any
 dimensions? Any commands such as 'as.matrix', 'data.frame', or 'dim' do
 not seem to work. I also seem not to be able to access the variables
 with something like auto.data.i since:

  auto.data.i
 Error: object "auto.data.i" not found

 Thus, how would I be able to automatically write to the elements of the
 data frames later in a loop such as ...

  for(i in 1:3) {
 + for(j in 1:nrow(auto.data.i)) {   ### this obviously does not work since
 +                                   ### 'Error in nrow(auto.data.i) : object "auto.data.i" not found'
 + for(k in 1:ncol(auto.data.i)) {
 + auto.data.i[j,k] <- 'some value'
 + }}}

 Thanks a bunch for all your help.

 Best, Michael


 Michael Drescher
 Ontario Forest Research Institute
 Ontario Ministry of Natural Resources
 1235 Queen St East
 Sault Ste Marie, ON, P6A 2E3
 Tel: (705) 946-7406
 Fax: (705) 946-2030




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] is.null doesn't work

2007-07-12 Thread jim holtman

'v' appears to be a list:

  v=c(`-`,`+`,1,`^`,`^`,NA,NA,"X",9,"X",2)
  i2=16
  v[i2]
[[1]]
NULL

 str(v)
List of 11
 $ :function (e1, e2)
 $ :function (e1, e2)
 $ : num 1
 $ :function (e1, e2)
 $ :function (e1, e2)
 $ : logi NA
 $ : logi NA
 $ : chr "X"
 $ : num 9
 $ : chr "X"
 $ : num 2

because you used backquotes(`) on the '-'; notice the difference:

 str(c(`-`,1))
List of 2
 $ :function (e1, e2)
 $ : num 1
 str(c('-',1))
 chr [1:2] "-" "1"
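
As for why is.null() returns FALSE: single-bracket indexing on a list always returns a list, so v[16] is a one-element list holding NULL rather than NULL itself. A short sketch of the distinction, and of a bounds check that sidesteps it (the list below is a shortened stand-in for v):

```r
v <- c(`-`, `+`, 1, NA, "X")    # mixing functions and values gives a list
i2 <- 16

is.list(v)                # TRUE
is.null(v[i2])            # FALSE: v[i2] is list(NULL), a list of length 1
is.null(v[i2][[1]])       # TRUE: the element inside that list is NULL
i2 > length(v)            # TRUE: a direct test for "no such element"
```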





On 7/12/07, Atte Tenkanen [EMAIL PROTECTED] wrote:
 Hi,

 What's wrong here?:

  v=c(`-`,`+`,1,`^`,`^`,NA,NA,"X",9,"X",2)
  i2=16
  v[i2]
 [[1]]
 NULL

  is.null(v[i2])
 [1] FALSE

 Is it a bug or have I misunderstood something?

 Atte Tenkanen
 University of Turku, Finland




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] exces return by mktcap decile for each year

2007-07-11 Thread jim holtman

 to do this.

 dat <- read.table("test.data", header=TRUE)

 if( "new.data" %in% ls()) {
  rm( new.data)
 }
 yrs <- as.character(unique( dat$yr))
 for (y in yrs) {
  bool <- as.character(dat$yr) == y
  tmp.dat <- dat[ bool,]
  breaks <- quantile(tmp.dat$mc,
                     probs=seq(0,1,0.1), na.rm=TRUE)
  # lower the first break, else the smallest value is not in any (a,b] interval
  breaks[1] <- breaks[1]*.9
  cuts <- cut(tmp.dat$mc, breaks)
  means.by.dec <- by( tmp.dat$ret, cuts, mean)
  for ( i in seq(1, dim( tmp.dat)[1])) {
    tmp.dat[i,"dec.mean"] <- means.by.dec[ cuts[i]]
  }
  if(! "new.data" %in% ls()) {
    new.data <- tmp.dat
  }  else {
    new.data <- rbind( new.data, tmp.dat)
  }
 }

 Here is some test input data in the file test.data
 - test.data -
         mc         yr    ret
  32902.233 01/01/1995  0.426
  15793.691 01/01/1995  0.024
   2375.868 01/01/1995  0.660
  54586.558 01/01/1996  0.497
  10674.900 01/01/1996  0.405
859.656 01/01/1996 -0.033
770.963 01/01/1995 -1.248
423.480 01/01/1995  0.654
   2135.504 01/01/1995  0.394
696.599 01/01/1995 -0.482
   5115.476 01/01/1995  0.352
821.347 01/01/1995  0.869
  43329.695 01/01/1995  0.495
   7975.151 01/01/1995  0.112
396.450 01/01/1995  0.956
843.870 01/01/1995  0.172
   2727.037 01/01/1995 -0.358
114.584 01/01/1995 -1.015
   1347.327 01/01/1995 -0.083
   4592.049 01/01/1995 -0.251
674.305 01/01/1995 -0.327
  39424.887 01/01/1996  0.198
   4447.383 01/01/1996 -0.045
   1608.540 01/01/1996 -0.109
217.151 01/01/1996  0.539
   1813.320 01/01/1996  0.754
145.170 01/01/1996  0.249
   3176.298 01/01/1996 -0.202
  14379.686 01/01/1996  0.013
   3009.059 01/01/1996 -0.328
   1781.406 01/01/1996 -0.158
   2576.215 01/01/1996  0.514
   1236.317 01/01/1996  0.346
   3003.735 01/01/1996  0.151
   1544.003 01/01/1996  0.482
   7588.657 01/01/1996  0.306
   1516.625 01/01/1996  0.183
   1596.098 01/01/1996  0.674
   2792.192 01/01/1996  0.528
   1276.702 01/01/1996  0.010
875.716 01/01/1996  0.189
   4858.450 01/01/1995  0.250
   2033.623 01/01/1995 -0.582
   2164.125 01/01/1995  0.631

 Here is the output which looks ok

  new.data
           mc         yr    ret   dec.mean
 1  32902.233 01/01/1995  0.426  0.4605000
 2   4858.450 01/01/1995  0.250  0.301
 3   2033.623 01/01/1995 -0.582 -0.094
 4   2164.125 01/01/1995  0.631  0.6455000
 5  15793.691 01/01/1995  0.024  0.068
 6   2375.868 01/01/1995  0.660  0.6455000
 7    770.963 01/01/1995 -1.248 -0.1895000
 8    423.480 01/01/1995  0.654  0.198
 9   2135.504 01/01/1995  0.394 -0.094
 10   696.599 01/01/1995 -0.482 -0.4045000
 11  5115.476 01/01/1995  0.352  0.301
 12   821.347 01/01/1995  0.869 -0.1895000
 13 43329.695 01/01/1995  0.495  0.4605000
 14  7975.151 01/01/1995  0.112  0.068
 15   396.450 01/01/1995  0.956  0.198
 16   843.870 01/01/1995  0.172  0.0445000
 17  2727.037 01/01/1995 -0.358 -0.3045000
 18   114.584 01/01/1995 -1.015  0.198
 19  1347.327 01/01/1995 -0.083  0.0445000
 20  4592.049 01/01/1995 -0.251 -0.3045000
 21   674.305 01/01/1995 -0.327 -0.4045000
 22 39424.887 01/01/1996  0.198  0.236
 23  4447.383 01/01/1996 -0.045 -0.1235000
 24  1608.540 01/01/1996 -0.109  0.162
 25   217.151 01/01/1996  0.539  0.2516667
 26  1813.320 01/01/1996  0.754  0.162
 27   145.170 01/01/1996  0.249  0.2516667
 28  3176.298 01/01/1996 -0.202 -0.1235000
 29 14379.686 01/01/1996  0.013  0.236
 30  3009.059 01/01/1996 -0.328 -0.0885000
 31  1781.406 01/01/1996 -0.158  0.162
 32  2576.215 01/01/1996  0.514  0.521
 33  1236.317 01/01/1996  0.346  0.2675000
 34  3003.735 01/01/1996  0.151 -0.0885000
 35  1544.003 01/01/1996  0.482  0.578
 36  7588.657 01/01/1996  0.306  0.3555000
 37  1516.625 01/01/1996  0.183  0.0965000
 38 54586.558 01/01/1996  0.497  0.236
 39 10674.900 01/01/1996  0.405  0.3555000
 40   859.656 01/01/1996 -0.033  0.2516667
 41  1596.098 01/01/1996  0.674  0.578
 42  2792.192 01/01/1996  0.528  0.521
 43  1276.702 01/01/1996  0.010  0.0965000
 44   875.716 01/01/1996  0.189  0.2675000
 

 notice that records 1 and 13 fall into the same mc
 decile for the year 1995, and their ret mean is .4605
 and so forth for the other mc deciles in both years.

 I'd be interested to know if there is a cleaner way to
 do this. Thanks.

 Frank
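
One possibly cleaner route, sketched under the assumption that the decile breaks are unique within each year, is to let ave() do the per-year grouping twice: once to assign deciles and once to average ret within each year and decile. The data below are randomly generated stand-ins, not the test.data file, and the column name dec is an addition for illustration:

```r
set.seed(1)
dat <- data.frame(
  mc  = round(runif(40, 100, 50000), 3),
  yr  = rep(c("01/01/1995", "01/01/1996"), each = 20),
  ret = round(rnorm(40, 0.2, 0.5), 3)
)

# Decile (1..10) of mc within each year; include.lowest = TRUE keeps the
# smallest value in the first interval without shrinking the first break
dat$dec <- ave(dat$mc, dat$yr, FUN = function(mc) {
  breaks <- quantile(mc, probs = seq(0, 1, 0.1), na.rm = TRUE)
  as.integer(cut(mc, breaks, include.lowest = TRUE))
})

# Mean of ret within each year/decile combination (ave() defaults to mean)
dat$dec.mean <- ave(dat$ret, dat$yr, dat$dec)
head(dat)
```

This keeps the rows in their original order, so no rbind() step is needed to reassemble the years.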




 



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Split graphs

2007-07-09 Thread jim holtman

How many columns do you have?  Is it 2 or 1000?  I cannot tell from your
email.  A histogram of 2 values does not seem meaningful.

Do you want 1000 separate histograms, one per page, or several per
page?  Yes, you can do it; the question is how you want them laid
out.
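
For what it is worth, here is a sketch of the several-per-page variant: write to a multi-page PDF and let par(mfrow = ...) tile each page. The matrix dimensions, the 5 x 4 layout, and the file name are all assumptions, since the real shape of the data is unclear:

```r
set.seed(42)
m <- matrix(rnorm(40 * 25), nrow = 40)   # hypothetical: 40 rows, 25 values each

out <- file.path(tempdir(), "row_histograms.pdf")
pdf(out)
par(mfrow = c(5, 4), mar = c(2, 2, 2, 1))  # 20 histograms per page
for (i in seq_len(nrow(m))) {
  hist(m[i, ], main = paste("row", i), xlab = "", ylab = "")
}
dev.off()
```

Once a page is full, the next hist() call starts a new page in the same file, so 1000 rows would simply produce 50 pages.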

On 7/9/07, tian shen [EMAIL PROTECTED] wrote:
 Hello All,
  I have a question, which somehow I think it is easy, however, I just 
 couldn't get it.
  I want to histogram each row of a 1000*2 matrix( means it has 1000 rows), 
 and I want to see those 1000 pictures together. How can I do this? Am I able 
 to split a graph into 1000 parts and in each parts it contains a histogram 
 for one row?

  Thank you very much

  Jessie





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] parsing strings

2007-07-09 Thread jim holtman

Is this what you want:

 x <- "A10B10A10  B5AB 10 CD 12A10CD2EF3"
 x <- gsub(" ", "", x)  # remove blanks
 y <- gregexpr("[A-Z]+\\s*[0-9]+", x )[[1]]

 substring(x, y, y + attr(y, 'match.length') - 1)
[1] "A10"  "B10"  "A10"  "B5"   "AB10" "CD12" "A10"  "CD2"  "EF3"
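
In more recent versions of R, regmatches() extracts the matched pieces directly and handles a whole vector of strings at once. A sketch using the four example strings from the question:

```r
x <- c("A10B10", "A10  B5", "AB 10 CD 12", "A10CD2EF3")
x <- gsub(" ", "", x)                      # remove blanks

# One character vector of letter+digit groups per input string
parts <- regmatches(x, gregexpr("[A-Z]+[0-9]+", x))
parts
```

The result is a list with one element per input string, so the groups stay associated with the string they came from.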



On 7/9/07, Drescher, Michael (MNR) [EMAIL PROTECTED] wrote:
 Hi All,



 I have strings made up of an unknown number of letters, digits, and
 spaces. Strings always start with one or two letters, and always end
 with one or two digits. A set of letters (one or two letters) is always
 followed by a set of digits (one or two digits), possibly with one or
 more spaces between the sets of letters and digits. A set of letters
 always belongs to the following set of digits and I want to parse the
 strings into these groups. As an example, the strings and the desired
 parsing results could look like this:



 "A10B10", desired parsing result: "A10" and "B10"

 "A10  B5", desired parsing result: "A10" and "B5"

 "AB 10 CD 12", desired parsing result: "AB10" and "CD12"

 "A10CD2EF3", desired parsing result: "A10", "CD2", and "EF3"



 I assume that it is possible to search a string for letters and digits
 and then break the string where letters are followed by digits, however
 I am a bit clueless about how I could use, e.g., the 'charmatch' or
 'parse' commands to achieve this.



 Thanks a lot in advance for your help.



 Best, Michael







 Michael Drescher

 Ontario Forest Research Institute

 Ontario Ministry of Natural Resources

 1235 Queen St East

 Sault Ste Marie, ON, P6A 2E3

 Tel: (705) 946-7406

 Fax: (705) 946-2030







-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] making groups

2007-07-09 Thread jim holtman

It would be nice if you could supply an example of what your input
looks like and what you would like your output to look like.  You
would probably use 'tapply', but I would have to see what your data
looks like.
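
In the meantime, here is a guess at what is wanted, with made-up points and a made-up key: cut() assigns each score to its interval and table() counts how many fall under each mark.

```r
points <- c(12, 35, 47, 50, 61, 78, 83, 90, 55, 29)   # hypothetical scores
lower  <- c(0, 50, 60, 70, 80)                        # hypothetical lower bounds
marks  <- c("5", "4", "3", "2", "1")                  # mark for each interval

# right = FALSE makes the intervals [lower, next lower)
grade <- cut(points, breaks = c(lower, Inf), labels = marks, right = FALSE)
table(grade)
```

The counts come out in the order of the marks vector, so the key controls the layout of the frequency table.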

On 7/9/07, Mag. Ferri Leberl [EMAIL PROTECTED] wrote:
 Dear everybody!
 If I have an array of numbers e.g. the points my students got at an
 examination, and a  key to group the numbers, e.g. the key which
 interval corresponds with which mark (two arrays of the same length or
 one 2x(number of marks)), how can I get the array of absolute
 frequencies of marks?
 I hope I have expressed my problem clearly.
 Thank you in advance.
 Mag. Ferri Leberl

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] one question about the loop

2007-07-08 Thread jim holtman

It is part of the standard 'utils' package that comes with R:

?combn
help.search('combination')
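
A quick illustration of what combn() returns (each column is one combination, so t() puts one pair per row), plus its FUN argument for computing on each combination:

```r
pairs <- t(combn(5, 2))     # all 10 ways to choose 2 of 1:5, one per row
pairs

# FUN is applied to each combination; here, the sum of each pair
pair.sums <- combn(5, 2, FUN = sum)
pair.sums
```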


On 7/8/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 jim holtman [EMAIL PROTECTED] wrote:

  Is this what you want?
 
  t(combn(5,2))
 

 Well, it seems nice, but which library does it come from?
 I tried help.search("combn"), but that did not give me any valuable
 information...

 Christophe



 



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

