Re: [R] Performing operations only on selected data

2012-11-25 Thread Marcel Curlin
Thank you, this works very well. My only remaining question is about how
ifelse is working. I understand the basic syntax: df$condition2 gets
assigned the value *runif(nrow(df1[df1$condition1==1,]),0,1)* or the
value *df$condition1*, depending on whether or not df$condition1 meets the
criterion ==1.

As I understand it, runif(nrow(df1[df1$condition1==1,]),0,1) is a vector
of random values with length equal to the number of rows meeting
df$condition1==1, and df$condition1 is just my column of condition1 values.
So the command seems to be going down row by row and assigning condition2
values from one of two vectors in an interleaved way.

So my question is, how does R keep track of which item in each of the
vectors to assign to condition2? For example, if the first 4 entries of
condition1 are 1, 3, 4, 1, how does R know to use the *first* entry of
vector runif(nrow(df1[df1$condition1==1,]),0,1), then the *second* and
*third* values of vector df$condition1, then the *second* value of vector
runif(nrow(df1[df1$condition1==1,]),0,1)?
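
A small sketch may help with the bookkeeping question. It uses made-up
condition1 values and fixed numbers in place of runif() so the selection is
visible; the names test, yes, via_ifelse and via_index are only illustrative.
ifelse() does not walk through the shorter vector one item at a time: it
recycles both candidate vectors to the length of the test and then picks by
position. With random numbers the difference is invisible, which is
presumably why the suggested command works.

```r
# Hypothetical values standing in for df$condition1
condition1 <- c(1, 3, 1, 4, 1)
test <- condition1 == 1

# Fixed numbers instead of runif(sum(test)) so the selection is visible
yes <- c(0.1, 0.2, 0.3)

# ifelse() recycles 'yes' and 'no' to length(test), then takes element i
# of 'yes' where test[i] is TRUE and element i of 'no' elsewhere
via_ifelse <- ifelse(test, yes, condition1)

# Indexed assignment instead consumes the replacement values in order
via_index <- condition1
via_index[test] <- yes

print(rbind(via_ifelse, via_index))
```

If the order of the replacement values matters, the indexed form is probably
the safer of the two.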



--
View this message in context: 
http://r.789695.n4.nabble.com/Performing-operations-only-on-selected-data-tp4650646p4650803.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Performing operations only on selected data

2012-11-24 Thread Marcel Curlin
I spent some time on this simple question and also searched the forum. I
eventually hacked my way to an ugly solution for my particular problem, but I
would like to improve my coding.

I have data of the form:
df <- expand.grid(group=c('copper', 'zinc', 'aluminum', 'nickel'),
condition1=c(1:4))

I would like to add a new data column condition2, with values equal to the
value of condition1 plus a random number from 0-1 (uniform distribution) if
the value of condition1 is 1, or just condition1 otherwise. More generally,
my interest is in manipulating the values of condition1 if they meet one or
more criteria, or keeping the values the same otherwise. Thanks for any
thoughts!
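
One possible way to do this, sketched under the assumption that the criterion
is condition1 == 1, is to copy the column and then overwrite only the rows
that match. The logical vector hit and the set.seed() call are additions for
illustration only.

```r
set.seed(1)  # only so that the sketch is reproducible

df <- expand.grid(group = c("copper", "zinc", "aluminum", "nickel"),
                  condition1 = 1:4)

# Assumed criterion: rows where condition1 equals 1
hit <- df$condition1 == 1

# Start from a copy, then modify only the selected rows
df$condition2 <- df$condition1
df$condition2[hit] <- df$condition1[hit] + runif(sum(hit), 0, 1)

print(head(df, 8))
```

Several criteria can be combined in hit with & and |, for example
df$condition1 == 1 & df$group == "zinc".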

 



--
View this message in context: 
http://r.789695.n4.nabble.com/Performing-operations-only-on-selected-data-tp4650646.html
Sent from the R help mailing list archive at Nabble.com.



[R] Calculating number of elapsed days from starting date

2012-09-27 Thread Marcel Curlin
Hi,
I have data for events in rows, with columns for person and date. Each
person may have more than one event:

tC <- textConnection("
Person  date
bob     1/1/00
bob     1/2/00
bob     1/3/00
dave    1/7/00
dave    1/8/00
dave    1/10/00
kevin   1/2/00
kevin   1/3/00
kevin   1/4/00
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

I would like to add a new column to my dataframe containing the calculated
number of elapsed days from the starting date for each person. So the new
dataframe would read

Person  date     Days
bob     1/1/00   0
bob     1/2/00   1
bob     1/3/00   2
dave    1/7/00   0
dave    1/8/00   1
dave    1/10/00  3
kevin   1/2/00   0
kevin   1/3/00   1
kevin   1/4/00   2

Not sure how to do this, tried looking through the forum but didn't find
anything that seemed to apply. Suggestions appreciated.
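
A minimal sketch of one approach, assuming the dates are month/day/two-digit
year (which matches the desired output above): convert to Date, then subtract
each person's earliest date using ave(). The data frame name events is made up
for the example.

```r
events <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Person date
bob 1/1/00
bob 1/2/00
bob 1/3/00
dave 1/7/00
dave 1/8/00
dave 1/10/00
kevin 1/2/00
kevin 1/3/00
kevin 1/4/00
")

# Assumption: dates are month/day/year with a two-digit year
events$date <- as.Date(events$date, format = "%m/%d/%y")

# Work with day counts; ave() returns each person's minimum, one per row
day_number <- as.numeric(events$date)
events$Days <- day_number - ave(day_number, events$Person, FUN = min)

print(events)
```

ave() keeps the original row order, so the rows do not need to be sorted by
person or date first.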




--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-number-of-elapsed-days-from-starting-date-tp4644333.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Loop for multiple plots in figure

2012-06-27 Thread Marcel Curlin
Well at this point I have what I need (rough plot for data exploration) but
the simplicity of the first approach is quite elegant and it has become a
learning project. I have succeeded in formatting the overall plot OK but
have not been able to solve the problem of titles or any kind of
label/legend for the subplots. It seems that the title is called for each
datapoint, and then printed one below the other in the plot. Is there any
way at all to get a specific legend/title/text on each subplot?

Marcel



--
View this message in context: 
http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390p4634649.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Loop for multiple plots in figure

2012-06-26 Thread Marcel Curlin
This solution works really nicely and I learned much by working through it.
However, I am having trouble with subplot formatting; setting
main=d$Subject results in the correct title over each plot but repeated
multiple times. Also, I can't seem to format the axis labels and numbers to
reduce the space between them and the plot. Any more thoughts appreciated.

revised code:

tC <- textConnection("
Subject Xvar    Yvar    param1  param2
bob     9       100     1       100
bob     0       110     1       200
steve   2       250     1       50
bob     -5      175     0       35
dave    22      260     0       343
bob     3       180     0       74
steve   1       290     1       365
kevin   5       380     1       546
bob     8       185     0       76
dave    2       233     0       343
steve   -10     230     0       556
dave    -10     233     1       400
steve   -7      250     1       388
dave    3       568     0       555
kevin   10      380     0       57
kevin   4       390     0       50
bob     6       115     1       600
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

plot_one <- function(d){
 with(d, plot(Xvar, Yvar, type="n", tck=0.02, main=d$Subject, xlim=c(-14,14),
 ylim=c(0,600))) # set limits
 with(d[d$param1 == 0,], points(Xvar, Yvar, col = 1)) # first line
 with(d[d$param1 == 1,], points(Xvar, Yvar, col = 2)) # second line

}

par(mfrow=c(2,2))
plyr::d_ply(data, "Subject", plot_one)
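
A possible fix for both formatting problems, offered as a sketch rather than
the definitive answer: main receives one title per row because d$Subject is a
whole column, so passing only its first element should give a single title.
The spacing between axis text and the plot is controlled by the mgp and mar
settings of par(); the particular numbers below are guesses to be tuned. The
sketch uses base split() in place of plyr so that it has no package
dependency, and dat is a made-up name for the data frame.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Xvar Yvar param1 param2
bob 9 100 1 100
bob 0 110 1 200
steve 2 250 1 50
bob -5 175 0 35
dave 22 260 0 343
bob 3 180 0 74
steve 1 290 1 365
kevin 5 380 1 546
bob 8 185 0 76
dave 2 233 0 343
steve -10 230 0 556
dave -10 233 1 400
steve -7 250 1 388
dave 3 568 0 555
kevin 10 380 0 57
kevin 4 390 0 50
bob 6 115 1 600
")

plot_one <- function(d) {
  # A single string, not the whole Subject column, so the title prints once
  title_text <- as.character(d$Subject[1])
  plot(d$Xvar, d$Yvar, type = "n", tck = 0.02, main = title_text,
       xlab = "Xvar", ylab = "Yvar", xlim = c(-14, 14), ylim = c(0, 600))
  for (p in 0:1) {
    s <- d[d$param1 == p, ]
    points(s$Xvar, s$Yvar, col = p + 1)  # col 1 for param1 = 0, col 2 for 1
  }
  invisible(title_text)
}

# mgp = c(title, labels, line) moves axis text closer to the plot;
# mar shrinks the margins around each panel (values are guesses)
old_par <- par(mfrow = c(2, 2), mar = c(3, 3, 2, 1), mgp = c(1.6, 0.4, 0))
titles <- vapply(split(dat, dat$Subject), plot_one, character(1))
par(old_par)
```

Extra text inside a panel can be added within plot_one using legend(),
mtext() or text(), since each of those draws on the current subplot.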

--
View this message in context: 
http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390p4634482.html
Sent from the R help mailing list archive at Nabble.com.



[R] Loop for multiple plots in figure

2012-06-25 Thread Marcel Curlin
Hello, I have longitudinal data of the form below from N subjects. I am
trying to create a figure with N small subplots on a single page, in which
each plot is from only one subject, and in each plot there is a separate
curve for each value of param1.

So in this case, there would be four plots on the page (one each for Bob,
Steve, Kevin and Dave), and each plot would have two separate curves (one
for param1 = 1 and one for param1 = 0). The main title of the plot should be
the subject name. I also need to sort the order of the plots on the page by
param2.

I can do this with a small number of subjects using manual commands. For a
larger number I know that a 'for loop' is called for, but I can't figure out
how to get each of the subjects to plot separately, and could not figure it
out from the existing posts. For now I want to do this in the basic
environment, though I know that lattice could also work (might try that
later). Any help appreciated.

tC <- textConnection("
Subject Xvar    Yvar    param1  param2
bob     9       100     1       100
bob     0       250     1       200
steve   2       454     1       50
bob     -5      271     0       35
bob     3       10      0       74
steve   1       500     1       365
kevin   5       490     1       546
bob     8       855     0       76
dave    2       233     0       343
steve   -10     388     0       556
steve   -7      284     1       388
dave    3       568     1       555
kevin   4       247     0       57
bob     6       300     1       600
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

par(mfrow=c(2,2))
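
A rough sketch of the for loop in base graphics. Since param2 varies from row
to row, the sketch assumes that "sorted by param2" means ordering the panels
by each subject's smallest param2; that interpretation is a guess and is easy
to swap out. The names dat, smallest_p2 and subjects are illustrative only.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Xvar Yvar param1 param2
bob 9 100 1 100
bob 0 250 1 200
steve 2 454 1 50
bob -5 271 0 35
bob 3 10 0 74
steve 1 500 1 365
kevin 5 490 1 546
bob 8 855 0 76
dave 2 233 0 343
steve -10 388 0 556
steve -7 284 1 388
dave 3 568 1 555
kevin 4 247 0 57
bob 6 300 1 600
")

# Assumption: panel order follows each subject's smallest param2
smallest_p2 <- tapply(dat$param2, dat$Subject, min)
subjects <- names(sort(smallest_p2))

old_par <- par(mfrow = c(2, 2))
for (s in subjects) {
  d <- dat[dat$Subject == s, ]
  # Empty frame first, so both curves share the same axes
  plot(d$Xvar, d$Yvar, type = "n", main = s, xlab = "Xvar", ylab = "Yvar")
  for (p in sort(unique(d$param1))) {
    cur <- d[d$param1 == p, ]
    cur <- cur[order(cur$Xvar), ]   # draw each curve left to right
    lines(cur$Xvar, cur$Yvar, type = "b", col = p + 1)
  }
}
par(old_par)
```

For more than four subjects, the mfrow dimensions would need to grow with
length(subjects).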

--
View this message in context: 
http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390.html
Sent from the R help mailing list archive at Nabble.com.



[R] linear regression in a ragged array

2011-03-21 Thread Marcel Curlin
Hello,
I have a large dataset of the form

subj   var1   var2
001    100    200
001    120    226
001    130    238
001    140    245
001    150    300
002    110    205
002    125    209
003    101    233
003    115    254

I would like to perform linear regression of var2 on var1 for each subject
separately. It seems like I should be able to use the tapply function as you
do for simple operations (like finding a mean of var1 for each subject), but
I am not sure of the correct syntax for this. Is there a way to do this?
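
One way this might be done, shown as a sketch: tapply() works on a single
vector, so for a model that needs two columns it is usually easier to split()
the data frame by subject and fit lm() on each piece. The names dat, fits and
coefs are made up for the example.

```r
# colClasses keeps the leading zeros in subj
dat <- read.table(header = TRUE,
                  colClasses = c("character", "numeric", "numeric"),
                  text = "
subj var1 var2
001 100 200
001 120 226
001 130 238
001 140 245
001 150 300
002 110 205
002 125 209
003 101 233
003 115 254
")

# One lm fit per subject, stored in a named list
fits <- lapply(split(dat, dat$subj), function(d) lm(var2 ~ var1, data = d))

# Collect intercept and slope into one row per subject
coefs <- t(sapply(fits, coef))
print(coefs)
```

Each element of fits is a full lm object, so summary(fits[["001"]]) gives the
usual regression output for that subject.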

Many thanks, Marcel

--
View this message in context: 
http://r.789695.n4.nabble.com/linear-regression-in-a-ragged-array-tp3393033p3393033.html
Sent from the R help mailing list archive at Nabble.com.



[R] trouble with histograms

2010-10-26 Thread Marcel Curlin

Hi,
I have tab-delimited data with an unequal number of entries per column, of
the sort:

A   B   C
1   2   2
3   4   1
5   2   2   
6   2
5   2
3
6
2

I would like to make a histogram of the frequencies of each represented
number in a stacked histogram, where you can see the contribution of each
group (A, B or C) to the total height of the bar, and each bar labeled with
the represented number. So, there would be a bar labeled 1 of height 2,
half one color for group A, and half another color for group B.

So far,
I can get my data into a dataframe
data <- read.table(myfile, header=T, sep="\t")

I think I first have to use hist to get the frequencies of each, and I
have figured out how to use breaks to make bins:
bins=seq(0.5,6.5,by=1)
hist(data$A, breaks=bins)

Lots of trouble from then on, though, and I just can't get this into a
usable plot. Any help appreciated.
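
A sketch of one route that avoids hist() altogether: put the values in long
form, cross-tabulate group against value with table(), and hand the result to
barplot(), which stacks the rows of a matrix by default. The vectors below are
one reading of the flattened table above (which column each value belongs to
is an assumption), and long and counts are made-up names.

```r
# Assumed column contents, read off the table in the post
A <- c(1, 3, 5, 6, 5, 3, 6, 2)
B <- c(2, 4, 2, 2, 2)
C <- c(2, 1, 2)

# Long form: one row per observation, with its group label in 'ind'
long <- stack(list(A = A, B = B, C = C))

# Rows are groups, columns are the represented numbers 1 to 6
counts <- table(long$ind, factor(long$values, levels = 1:6))

# A matrix passed to barplot() is drawn as stacked bars, one per column
barplot(counts, col = c("grey30", "grey60", "grey90"), legend.text = TRUE,
        xlab = "Value", ylab = "Frequency")
print(counts)
```

Passing beside = TRUE to barplot() would draw the groups side by side instead
of stacked.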

Marcel
-- 
View this message in context: 
http://r.789695.n4.nabble.com/trouble-with-histograms-tp3014838p3014838.html
Sent from the R help mailing list archive at Nabble.com.



[R] R code output issues

2010-09-03 Thread Marcel Curlin

Hi all,
I have a short R code file that I am using to perform calculations on a
dataset. I am having a few issues with output: 

1. Although my input data file is 2149 lines long, when I type results.df
at the command line, I get the appropriate calculation results for only
the first 46 rows. I get the same result if I sink the output to a file and
type results.df at the command line: this creates a file with the first 46
entries. I do get the entire input data file back if I type data, and I
can't see anything in my input file around line 46 that would account for
this.

2. If I run the code from a file using the command
source("TransmissionCalc2") with the results.df command embedded in the
file, there is no output to the terminal at all (or to the output file, if I
use sink). Sink just creates an empty file.

So, I am not sure why my results dataframe seems to include only a small
fraction of the data, or why the write commands are ignored when embedded in
the code and called by source(), etc.

CODE

rm(list = ls(all = TRUE))
alldata <- read.table("/Users/marcel/Desktop/V1V2TransmAnalysis/3_transmissiondata",
                      header=T)
#sink("/Users/marcel/Desktop/V1V2TransmAnalysis/4_output")
data <- data.frame(alldata)
V1V2means <- with(data, tapply(V1V2, list(Pair, DR), mean))
V1V4means <- with(data, tapply(V1V4, list(Pair, DR), mean))
results.df <- data.frame(V1V2means, V1V4means,
                         V1V2dif = V1V2means[, "R"] - V1V2means[, "D"],
                         V1V4dif = V1V4means[, "R"] - V1V4means[, "D"])
data

SAMPLE OF INPUT DATA FILE

Pair    DR      V1V2    V1V4
1       D       63      277
1       D       63      277
1       D       63      277
.

Thoughts greatly appreciated.
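
A small sketch with invented numbers that may explain both symptoms.
tapply() with list(Pair, DR) returns one row per distinct Pair, so the
results data frame is expected to be much shorter than the input; and a bare
object name is auto-printed only at the top-level prompt, so inside a file
run with source() an explicit print() is needed. The toy data frame dat and
its values are hypothetical.

```r
# Hypothetical miniature of the input: 3 pairs, 4 rows each
dat <- data.frame(
  Pair = rep(1:3, each = 4),
  DR   = rep(c("D", "D", "R", "R"), times = 3),
  V1V2 = c(60, 62, 70, 72,  64, 66, 65, 67,  68, 70, 60, 62)
)

# One row per Pair, one column per DR level
V1V2means <- with(dat, tapply(V1V2, list(Pair, DR), mean))
results.df <- data.frame(V1V2means,
                         V1V2dif = V1V2means[, "R"] - V1V2means[, "D"])

nrow(dat)          # 12 input rows
nrow(results.df)   # 3 result rows, one per pair

# Inside a sourced file a bare 'results.df' prints nothing; print() does
print(results.df)
```

Calling source() with echo = TRUE or print.eval = TRUE is another way to see
the values of bare expressions in a sourced file.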

Marcel
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-code-output-issues-tp2526415p2526415.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R code output issues

2010-09-03 Thread Marcel Curlin

Thanks for the input

Adding print took care of the first problem. The output looks like what I
would expect, so I think the code is doing what I would like it to for the
first 44 observations. 

> print(results.df)
          D         R       D.1       R.1     V1V2dif     V1V4dif
1  68.92500      75.0  284.5250      296.       6.075      11.475
2  68.81081      67.0  287.7568      283.  -1.8108108  -4.7567568
3  65.43902      62.0  282.5366      279.  -3.4390244  -3.5365854
4      66.6  67.25000  286.7000  288.2500       0.650       1.550
5  68.94872      71.0  297.8462      305.   2.0512821   7.1538462
Etc..

When I use str(results.df) it does seem to indicate a short file of 44
observations.  

'data.frame':   44 obs. of  6 variables:
 $ D  : num  68.9 68.8 65.4 66.6 68.9 ...
 $ R  : num  75 67 62 67.2 71 ...
 $ D.1: num  285 288 283 287 298 ...
 $ R.1: num  296 283 279 288 305 ...
 $ V1V2dif: num  6.08 -1.81 -3.44 0.65 2.05 ...
 $ V1V4dif: num  11.48 -4.76 -3.54 1.55 7.15 ...

So I am still left with that question..
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-code-output-issues-tp2526415p2526469.html
Sent from the R help mailing list archive at Nabble.com.



[R] while loop until end of file

2010-08-29 Thread Marcel Curlin

Hi Guys,
stumped by a simple problem. I would like to take a file of the form

Pair group param1
1   D   10
1   D   10
1   R   10
1   D   10
2   D   10
2   D   10
2   D   10
2   R   10
2   R   10
etc..

and for each pair, calculate the average of param1 for group D entries,
subtract it from the average of param1 for the group R entries, and then
write the results (i.e., AveParam1D, AveParam1R, dif) to a tab-delimited
file. Below is the start of my code. The difficulty I am having is in
creating a while loop that stops once there are no more lines to read from
the input file. I am also not sure of the best way to write in the results,
though I think I should use rbind.

data <- data.frame(alldata)

i <- 1
# need appropriate while loop
{
ss <- subset(data, Pair==i)
ssD <- subset(ss, DR=="D")
ssR <- subset(ss, DR=="R")
p1 <- mean(ssD$Length)
p2 <- mean(ssR$Length)
dif <- p1-p2
out <- rbind(data.frame(p1, p2, dif))
i <- i + 1
}

write.table(out, file="out", quote=F, row.names=F, col.names=T, sep="\t")

I have spent an absurd amount of time trying to sort this out with the
manual and forum searches. Any suggestions appreciated. 
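
A sketch of how the loop might be written, assuming the whole file is already
in a data frame: once read.table() has read the data there are no more lines
to wait for, so the loop can simply run over the distinct Pair values. The
param1 numbers below are invented (the sample above shows only 10s), the
column names follow the sample table, and the output goes to a temporary file
chosen for the example.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Pair group param1
1 D 10
1 D 12
1 R 15
1 D 14
2 D 10
2 D 20
2 D 30
2 R 26
2 R 28
")

# One small data frame per pair, bound together at the end
per_pair <- lapply(unique(dat$Pair), function(i) {
  ss <- dat[dat$Pair == i, ]
  aveD <- mean(ss$param1[ss$group == "D"])
  aveR <- mean(ss$param1[ss$group == "R"])
  # dif is the R average minus the D average, as described in the text
  data.frame(Pair = i, AveParam1D = aveD, AveParam1R = aveR,
             dif = aveR - aveD)
})
out <- do.call(rbind, per_pair)

out_file <- tempfile(fileext = ".txt")   # hypothetical output location
write.table(out, file = out_file, quote = FALSE, row.names = FALSE,
            col.names = TRUE, sep = "\t")
print(out)
```

Collecting the pieces in a list and calling rbind once at the end avoids
growing a data frame inside the loop.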

Marcel

-- 
View this message in context: 
http://r.789695.n4.nabble.com/while-loop-until-end-of-file-tp2399544p2399544.html
Sent from the R help mailing list archive at Nabble.com.



[R] Formatting numerical output

2010-04-24 Thread MARCEL CURLIN
Hello,
I am new to R and am having difficulty formatting numerical output from a 
regression analysis. My code iteratively performs linear regression on a 
dataset while excluding certain data ranges. 

My code:
rm(list = ls(all = TRUE))
sink("outfile")
dat <- read.table("testdat", sep="\t", header=TRUE)
int = 0.2

for (x in c(0:20)) {
  subdat <- subset(dat, time <= int * x | time > (int*x) + int)
  #excludes range of time data between int * x and (int*x) + int
  lm.subdat <- lm(length~time, subdat)
  #regression
  rs.subdat <- summary(lm.subdat)$r.squared
  #getting R-squared information
  txt1 <- ("Excluded range: Time")
  #creating components of output message
  txt2 <- ("R^2 =")
  #creating components of output message
  lowend <- (int*x)
  highend <- (int*x + int)
  output <- c(txt1, lowend, highend, txt2, rs.subdat)
  print.noquote(output, sep="\t")
}
sink()

Currently my output looks like:
[1] Excluded range: Time 0                    0.2
[4] R^2 =                0.111526872884505
[1] Excluded range: Time 0.2                  0.4
[4] R^2 =                0.0706332920267015
[1] Excluded range: Time 0.4                  0.6
[4] R^2 =                0.0691466100802879

I would like the output format to look like:
Excluded range: Time 1.0 - 1.2<tab>R^2 = 0.45
Excluded range: Time 1.2 - 1.4<tab>R^2 = 0.5
etc.

I would like to 
1. get time and R^2 data on the same line
2. control (reduce) the number of digits reported for R^2
3. reduce the large number of empty spaces between 'R^2' and value.

I searched a lot but could not find much on this. Any help on these specifics 
or general comments on formatting numerical output greatly appreciated. 
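
A sketch of how the three points might be handled with sprintf() and cat():
sprintf() builds the whole line from a format string (so everything stays on
one line and the number of digits is fixed), and cat() writes it without
index prefixes, quotes or padding. The helper name format_line is made up;
the R^2 value is the first one from the output above.

```r
# Hypothetical helper: one output line for the x-th excluded interval
format_line <- function(x, int, r2) {
  sprintf("Excluded range: Time %.1f - %.1f\tR^2 = %.2f",
          int * x, int * x + int, r2)
}

int <- 0.2
first_line <- format_line(0, int, 0.111526872884505)

# cat() prints the string as is; "\n" ends the line
cat(first_line, "\n", sep = "")
```

Inside the loop, the call would take rs.subdat as its third argument in place
of the fixed number.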

thanks,

Marcel
