Re: [R] OK - I got the data - now what? :-)

2009-07-08 Thread Michael A. Miller
 Mark wrote:

 Currently my data is one experiment per row, but that's
 wasting space as most experiments only take 20% of the row
 and 80% of the row is filled with 0's. I might want to make
 the array more narrow and have a flag somewhere in the 1st
 10 columns that says that this row is a continuation row
 from the previous row. That way I could pack the array
 better, use less memory and when I do finally test for 0 I
 have a short line to traverse?

This may be a bit off track from the data manipulation you are
working on, but I thought I'd point out that another way to
handle this sort of data is to make a table with one measurement
per row, rather than one experiment per row.

experiment measurement value
 A   1  0.27
 A   2  0.66
 A   3  0.24
 A   4  0.55
 B   1  0.13
 B   2  0.65
 B   3  0.83
 B   4  0.41
 B   5  0.92
 B   6  0.67
 C   1  0.75
 C   2  0.97
 C   3  0.49
 C   4  0.58
 D   1  1.00
 D   2  0.71
 E   1  0.11
 E   2  0.50
 E   3  0.98
 E   4  0.07
 E   5  0.94
 E   6  0.57
 E   7  0.34
 E   8  0.21


If you wrote the output of your calculations in this way, one
value per line, it can easily be read into R as a data.frame and
handled with less need for munging.  No need to remove the
zero-padding because the zeros aren't needed in the first place.

You can subset the data with subset, as in

  test <- read.table('test.dat', header=TRUE)
  expA <- subset(test, experiment=='A')
  expB <- subset(test, experiment=='B')

so there is no need to deal with ragged/zero-padded arrays. Your
plots can be grouped automatically with lattice: 

require(lattice)
xyplot(value ~ measurement, data=test, group=experiment, type='b')
xyplot(value ~ measurement | experiment, data=test, type='b')


It is simple to do calculations by experiment using tapply.  For
example


> with(test, tapply(value, experiment, mean))
        A         B         C         D         E 
0.4300000 0.6016667 0.6975000 0.8550000 0.4650000 

> with(test, tapply(measurement, experiment, max))
A B C D E 
4 6 4 2 8 
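
For anyone who wants to try this without first creating test.dat, here is a minimal self-contained sketch. It assumes part of the table above (experiments A, B and D only) is pasted inline through the text= argument of read.table; the inline copy and the names txt and n_meas are illustrative assumptions, not part of the original example.

```r
# Inline copy of part of the table above (experiments A, B and D only).
# The original example reads the full table from 'test.dat' instead.
txt <- "experiment measurement value
A 1 0.27
A 2 0.66
A 3 0.24
A 4 0.55
B 1 0.13
B 2 0.65
B 3 0.83
B 4 0.41
B 5 0.92
B 6 0.67
D 1 1.00
D 2 0.71"

test <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# One experiment at a time; there is no zero padding to strip
expA <- subset(test, experiment == "A")

# Per-experiment summaries: mean value and number of measurements
means  <- with(test, tapply(value, experiment, mean))
n_meas <- with(test, tapply(measurement, experiment, max))
print(means)
print(n_meas)
```

Because each experiment only contributes as many rows as it has measurements, the differing lengths (4, 6 and 2 here) need no special handling.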



Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] OK - I got the data - now what? :-)

2009-07-08 Thread Mark Knecht
Mike,
   It's not really that far off track as I didn't have any background
when I started this in R. This is the first time I've used it. I
simply chose to use a format that I thought would work for me in both
Excel and R. I do like your examples.

   My impression of reshape coupled with cast is that it's pretty
capable of giving me more or less the same format you suggest although
it is a bit of work. Currently in my files I save only the start and
finish times of the experiments and planned on calculating all the
times in the middle if necessary. With this format I'd just write them
out on each line and save that work in R.

   I suppose the files using this alternative format would be a lot
larger on disk. I currently have 10 values + 500 observations per
experiment with an average experiment tracking file containing maybe
500-1000 experiments. With this format in the worst case I suppose I'd have
(10+1) * 1000 per experiment on disk, but on average it would be less
than that because as you say I wouldn't write out any zeros. Once in R
in memory they'd be equivalent. Disk space doesn't matter but reading
and writing the files might be slower. I suppose I don't really have
to write the zeros out anyway, but at this point it's just one
additional subset after going through reshape.

   It might be an advantage to get to the subset commands immediately
but still I've got 10 independent variables and I suspect I'm going to
be using reshape/cast more than once to get to my answers so I haven't
been against learning how to work with it.

   Overall they are good inputs and I appreciate them. Thanks!

Cheers,
Mark



Re: [R] OK - I got the data - now what? :-)

2009-07-07 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 06.07.2009 01:58:38:

 On Sun, Jul 5, 2009 at 1:44 PM, hadley wickham <h.wick...@gmail.com> wrote:
   I think the root cause of a number of my coding problems in R right
   now is my lack of skills in reading and grabbing portions of the data
   out of arrays. I'm new at this. (And not a programmer) I need to find
   some good examples to read and test on that subject. If I could locate
   which column was called C1, then read row 3 from C1 up to the last
   value before a 0, I'd have proper data to plot for one line. Repeat as
   necessary through the array and I get all the lines. Doing the lines
   one at a time should allow me the opportunity to apply color or not
   plot based on values in the first few columns.
 
  Thanks,
  Mark
 
  test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
  C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
  test <- round(test,2)
 
  #Make array ragged
  test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
  test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
  test$C6[7] <- 0
  test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0
 
  #Print array
  test
 
  Are the zeros always going to be arranged like this? i.e. for
  experiment there is a point at which all later values are zero?  If
  so, the following is a much simpler way of getting to the core of your
  data, without fussing with overly complicated matrix indexing:
 
  library(reshape)
  testm <- melt(test, id = c("A", "B"))
  subset(testm, value > 0)
 
  I suspect you will also find this form easier to plot and analyse.
 
  Hadley
 
  --
  http://had.co.nz/
 
 
 Hi Hadley,
I wanted to look at reshape.
 
Yes, there exists a point in each row (unless I get to the end with
 all numbers) where I get to a zero and everything to the right is
 zero.
 
I'm looking at ReShape. It's interesting but I clearly don't
 understand it yet so I'm reading your "Reshaping data with the reshape
 package" from 11/07. Interesting.
 
I know so little about R that I'm sort of drowning at this point
 that it's hard for me to understand why this would make plotting
 easier. Analysis possibly. Just the way it goes when you get started
 with something new.

E.g. to give different colour according to C1-C6 and/or different shape 
for each A value.

test. <- subset(testm, value > 0)
plot(test.$value, col=as.numeric(test.$variable), pch=test.$A)

And even fancier plots with ggplot2 package.
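
Here is a minimal sketch of the same idea that joins each experiment's points into one line. It assumes a small made-up table in the same layout as `test` (the name `wide` and its values are hypothetical) and builds the long format with base R only, standing in for melt() from the reshape package.

```r
# Hypothetical small wide table in the same layout as 'test':
# A = experiment id, B = a constant, C1..C4 = readings, 0 = "no reading".
wide <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.5, 0.2, 0.9),
                   C2 = c(0.7, 0.4, 0.8),
                   C3 = c(0.6, 0.0, 0.3),
                   C4 = c(0.1, 0.0, 0.0))

# Long format built with base R only (a stand-in for reshape::melt)
reading_cols <- grep("^C[0-9]+$", names(wide), value = TRUE)
long <- data.frame(
  A        = rep(wide$A, times = length(reading_cols)),
  variable = factor(rep(reading_cols, each = nrow(wide)),
                    levels = reading_cols),
  value    = unlist(wide[reading_cols], use.names = FALSE)
)
long <- subset(long, value > 0)   # drop the zero padding

# Empty plot first, then one connected line per experiment
plot(as.numeric(long$variable), long$value, type = "n",
     xlab = "reading", ylab = "value", xaxt = "n")
axis(1, at = seq_along(reading_cols), labels = reading_cols)
for (id in unique(long$A)) {
  one <- long[long$A == id, ]
  lines(as.numeric(one$variable), one$value, type = "b", col = id, pch = id)
}
```

Colour and plotting symbol are both taken from A here; any other identifier column could be mapped the same way.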

Regards
Petr


 
In ReShape lingo I think I have ID's. They cover things like time,
 date, success/failure and a few other things of interest. Once the
 data starts on a row it is all data from there on to the end of the
 row.
 
My initial goal is to make a line plot of the data on a single row.
 All the data points should connect together. There is no real
 interaction planned with data on other rows, at least at this time.
 
Thanks for the pointers and the code stub. I'll be looking at this.
 
 Cheers,
 Mark
 


Re: [R] OK - I got the data - now what? :-)

2009-07-06 Thread Mark Wardle
Hi. As I said in my first email, converting your data into a long
format makes a lot of sense. I'm sorry that you find it hard ... to
understand why this would make plotting easier.

Wide format:
Subject ID, Experiment ID, humidity, light, whatever, T1, T2,T3,T4.

is much better rotated to be
Subject ID, Experiment ID, humidity, light, whatever, time, result

So you end up with multiple rows per patient/individual/experiment. It
is much easier to analyse and plot data like this, particularly if the
original data is ragged. ie. you have a different number of
measurements per patient/individual/experiment.

Many plotting functions will support connecting related data (e.g. by
virtue of a particular identifier) and support much of what you are
likely to want (different plotting symbols, panelled plots depending
on experimental conditions etc) without you having to manually work
through data as you are suggesting.
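
As a rough illustration of the rotation described above, here is a sketch using base R's reshape(). The data frame and its column names (subject, light, T1 to T3) are made up for the example, and the missing reading is written as NA rather than 0.

```r
# Hypothetical wide table: one row per subject, readings in T1..T3.
# Subject s2 has only two readings, so its T3 is missing.
wide <- data.frame(subject = c("s1", "s2"),
                   light   = c("low", "high"),
                   T1 = c(1.2, 0.8),
                   T2 = c(1.5, 0.9),
                   T3 = c(1.1, NA))

# Rotate to long format: one row per (subject, time) pair
long <- reshape(wide,
                direction = "long",
                varying   = c("T1", "T2", "T3"),
                v.names   = "result",
                timevar   = "time",
                times     = 1:3,
                idvar     = "subject")

# Drop the missing readings and tidy up the row order
long <- long[!is.na(long$result), ]
long <- long[order(long$subject, long$time), ]
rownames(long) <- NULL
print(long)
```

The result has the columns subject, light, time and result, with three rows for s1 and two for s2, so the ragged lengths simply become different numbers of rows.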

Best wishes,

Mark


-- 
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK



Re: [R] OK - I got the data - now what? :-)

2009-07-06 Thread Mark Knecht
Hi Mark,
   Don't be the least bit sorry that I'm finding any of this hard to
understand. That's my problem. I ordered Phil Spector's Data
Manipulation with R (Use R!) book last night as I realize I need to go
through some sort of training. Hopefully that will help clear up some
of my questions about the language in general without burdening this
list so much.

   This morning, taking your input to heart, I started working more
with Hadley's code example. ReShape is pretty slick. I added a

MyExperiments <- cast(MyResults, A ~ variable)

and got a new data.frame that looks like it's more or less ready to
print. Note that I'm not attached to data.frames. It's just that I get
one with read.csv and then don't know when to change it to something
else.

 I then tried cast to put the molten data back into a data.frame.
(Maybe this is the point to switch to a list or some other type?) That
done then

MyExperiments[1,]

gives me back the data for experiment #1 with the experiment number in
column 1. If I can figure out how to get rid of that then I think I
can get the experiment plotted. Put that in a loop and I should get
1000 experiments plotted which is my goal.
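
One way to drop the experiment number and loop over the rows is sketched below. The data frame is a hand-made stand-in for what cast() returns (the values are made up), and get_readings is a hypothetical helper name.

```r
# Hypothetical stand-in for the result of cast(): one row per experiment,
# experiment number in column A, NA where an experiment has no reading.
MyExperiments <- data.frame(A  = 1:3,
                            C1 = c(0.5, 0.2, 0.9),
                            C2 = c(0.7, 0.4, 0.8),
                            C3 = c(0.6, NA, 0.3),
                            C4 = c(0.1, NA, NA))

# Drop column A by name, turn the one-row data frame into a plain
# numeric vector, and discard the missing readings.
get_readings <- function(df, i) {
  row <- unlist(df[i, names(df) != "A"], use.names = FALSE)
  row[!is.na(row)]
}

get_readings(MyExperiments, 2)

# One plot per experiment
for (i in seq_len(nrow(MyExperiments))) {
  y <- get_readings(MyExperiments, i)
  plot(seq_along(y), y, type = "b",
       main = paste("Experiment", MyExperiments$A[i]),
       xlab = "reading", ylab = "value")
}
```

Selecting columns by name rather than by position keeps the code working if more identifier columns are added in front of C1 later.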

   This is all very cool as it turns out to be very few lines of code
to dig through the array. I'll have a couple of other problems (for
me) in working with the real array as the name space is much bigger
and I need to learn how to build things like (C1,C2,C3,
...,C1200) automatically, but I'm sure there's a way to do that.
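
A small sketch of two ways to get that list of names: generate it with paste0(), or pick the matching names out of an existing data frame. The tiny `test` data frame here is made up for the example.

```r
# Generate "C1", "C2", ..., "C1200" instead of typing them
n_cols <- 1200
col_names <- paste0("C", seq_len(n_cols))
head(col_names)

# Or pull the reading columns out of an existing data frame by pattern,
# so the code does not depend on how many there are or where they start.
test <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.1, 0.2), C2 = c(0.3, 0), C3 = c(0.5, 0))
reading_cols <- grep("^C[0-9]+$", names(test), value = TRUE)
id_cols      <- setdiff(names(test), reading_cols)
```

id_cols can then be passed as the id argument of melt(), which avoids hard-coding either set of names.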

   Note that the code below can 'fail' in the sense of having NA's in
the middle because the runif doesn't guarantee 0's to the right. My
real data won't have that problem

Thanks,
Mark



library(reshape)

test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
test <- round(test,2)

#Make array ragged
test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
test$C6[7] <- 0
test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0

#Print array
test

#Display column names
names(test)

ReShapeX <- melt(test, id = c("A", "B"))

MyResults <- subset(ReShapeX, value > 0)

names(MyResults)

MyResults

MyExperiments <- cast(MyResults, A ~ variable)

class(MyExperiments)

MyExperiments[1,]
MyExperiments[2,]
MyExperiments[3,]



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Wardle
Hi. Essentially your data is currently in wide format, with repeated
measures in different columns. For most analysis and in particular for
graphing, it is frequently helpful to reshape your data into a long
format, with one row per data value and additional variables to list
experiment or subject identifier, experimental conditions etc.

see ?reshape and Dr. Wickham's reshape package (http://had.co.nz/reshape/)

Good luck,

Mark


2009/7/5 Mark Knecht <markkne...@gmail.com>:
 OK, I guess I'm getting better at the data part of R. I wrote a
 program outside of R this morning to dump a bunch of experimental
 data. It's a sort of ragged array - about 700 rows and 400 columns,
 but the amount of data in each column varies based on the length of
 the experiment. The real data ends with a 0 following some non-zero
 value. It might be as short as 5 to 10 columns or as many as 390. The
 first 9 columns contain some data about when the experiment was run
 and a few other things I thought I might be interested in later. All
 the data starts in column 10 and has headers saying C1, C2, C3, C4,
 etc., up to C390. The first value for every experiment is some value I
 will normalize and then the values following are above and below the
 original tracing out the path that the experiment took, ending
 somewhere to the right but not a fixed number of readings.

 R reads it in fine and it looks good so far.

 Now, what I thought I might do with R is plot all 700 rows as
 individual lines, giving them some color based on info in columns 1-9,
 but suddenly I'm lost again in plots which I think should be fairly
 easy. How would I go about creating a plot for even one line, much
 less all of them? I don't have a row with 1,2,3,4 to use as the X axis
 values. I could go back and put one in the data but then I don't think
 that should really be required, or I could go back and make the
 headers for the whole array 1:400 and then plot from 10:400 but I
 thought I read that headers cannot start with numbers.

 Maybe the X axis values for a plot can actually be non-numeric C1, C2,
 C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or maybe
 I should strip the C from C1 and be left with 1? Maybe the best thing
 is to copy the data for one line to another data.frame or array and
 then plot that?

 Just sort of lost looking at help files. Thanks for any ideas you can
 send along. Ask questions if I didn't explain my problem well enough.
 Not looking for anyone to do my work, just trying to get the concepts
 right

 Cheers,
 Mark






-- 
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht
On Sat, Jul 4, 2009 at 5:22 PM, jim holtman <jholt...@gmail.com> wrote:
 See if this example helps; it shows how to plot either the rows or the
 columns of a data frame:

 test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
 test
           C1        C2        C3
 1  0.91287592 0.3390729 0.4346595
 2  0.29360337 0.8394404 0.7125147
 3  0.45906573 0.3466835 0.344
 4  0.33239467 0.3337749 0.3253522
 5  0.65087047 0.4763512 0.7570871
 6  0.25801678 0.8921983 0.2026923
 7  0.47854525 0.8643395 0.7111212
 8  0.76631067 0.3899895 0.1216919
 9  0.08424691 0.7773207 0.2454885
 10 0.87532133 0.9606180 0.1433044
 # this will plot each column (C1, C2, C3)
 matplot(test, type='o')
 # plot each row
 matplot(t(test), type='o')





 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

Hey Jim,
   Thanks for the pointers on matplot. I suspect that will be useful
one of these days.

   I'm attaching a little code to make a test case closer to what I
have to deal with at the bottom. My problem with your data was that
you plot everything. In my data I need to plot only a portion of it,
and in the array not every cell is valid - I don't want to plot cells
that have 0.00 as a value. In the array 'test' I need to plot the
general area defined by C1:C6, each row as a line, but stop plotting
each row when I run into a 0. Keep in mind that I don't know what
column C1 starts in. It is likely to change over time.

   I think the root cause of a number of my coding problems in R right
now is my lack of skills in reading and grabbing portions of the data
out of arrays. I'm new at this. (And not a programmer) I need to find
some good examples to read and test on that subject. If I could locate
which column was called C1, then read row 3 from C1 up to the last
value before a 0, I'd have proper data to plot for one line. Repeat as
necessary through the array and I get all the lines. Doing the lines
one at a time should allow me the opportunity to apply color or not
plot based on values in the first few columns.

Thanks,
Mark

test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
test <- round(test,2)

#Make array ragged
test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
test$C6[7] <- 0
test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0

#Print array
test
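
The steps described above (find where C1 starts, then stop each row at its first 0) could be sketched roughly as follows. This version uses fixed made-up numbers instead of runif() so the result is repeatable, and the names first, readings and first_zero are illustrative only.

```r
# Fixed stand-in for the ragged 'test' array: readings start at C1
# and a 0 marks the end of each experiment.
test <- data.frame(A = 1:4, B = 100,
                   C1 = c(0.52, 0.31, 0.77, 0.12),
                   C2 = c(0.48, 0.35, 0.70, 0.18),
                   C3 = c(0.55, 0.00, 0.66, 0.25),
                   C4 = c(0.60, 0.00, 0.00, 0.21),
                   C5 = c(0.58, 0.00, 0.00, 0.00))

# Locate C1 by name, so it does not matter which column it lands in
first    <- which(names(test) == "C1")
readings <- as.matrix(test[, first:ncol(test)])

# Position of the first 0 in each row (NA if the row has no 0)
first_zero <- apply(readings, 1, function(r) match(0, r))

# Blank out the first 0 and everything to its right
for (i in seq_len(nrow(readings))) {
  if (!is.na(first_zero[i])) {
    readings[i, first_zero[i]:ncol(readings)] <- NA
  }
}

# matplot() draws one line per column, so transpose to get one line
# per experiment; NA cells are simply not drawn.
matplot(t(readings), type = "o", lty = 1, pch = 1, col = test$A,
        xlab = "reading", ylab = "value")
```

The col argument takes one colour per experiment, so it could equally be computed from any of the identifier columns.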


Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht
On Sun, Jul 5, 2009 at 12:00 AM, Mark Wardle <m...@wardle.org> wrote:
 Hi. Essentially your data is currently in wide format, with repeated
 measures in different columns. For most analysis and in particular for
 graphing, it is frequently helpful to reshape your data into a long
 format, with one row per data value and additional variables to list
 experiment or subject identifier, experimental conditions etc.

 see ?reshape and Dr. Wickham's reshape package (http://had.co.nz/reshape/)

 Good luck,

 Mark


This looks interesting. Thanks!

cheers,
Mark



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread David Winsemius


On Jul 5, 2009, at 9:53 AM, Mark Knecht wrote:


On Sat, Jul 4, 2009 at 5:22 PM, jim holtmanjholt...@gmail.com wrote:

See if this example helps; show how to either plot the row or columns
of a data frame:


test - data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
test

  C1C2C3
1  0.91287592 0.3390729 0.4346595
2  0.29360337 0.8394404 0.7125147
3  0.45906573 0.3466835 0.344
4  0.33239467 0.3337749 0.3253522
5  0.65087047 0.4763512 0.7570871
6  0.25801678 0.8921983 0.2026923
7  0.47854525 0.8643395 0.7111212
8  0.76631067 0.3899895 0.1216919
9  0.08424691 0.7773207 0.2454885
10 0.87532133 0.9606180 0.1433044

# this will plot each column (C1, C2, C3)
matplot(test, type='o')
# plot each row
matplot(t(test), type='o')



On Sat, Jul 4, 2009 at 8:02 PM, Mark Knechtmarkkne...@gmail.com  
wrote:

OK, I guess I'm getting better at the data part of R. I wrote a
program outside of R this morning to dump a bunch of experimental
data. It's a sort of ragged array - about 700 rows and 400 columns,
but the amount of data in each column varies based on the length of
the experiment. The real data ends with a 0 following some non-zero
value. It might be as short as 5 to 10 columns or as many as 390.  
The

first 9 columns contain some data about when the experiment was run
and a few other things I thought I might be interested in later. All
the data starts in column 10 and has headers saying C1, C2, C3, C4,
etc., up to C390 The first value for every experiment is some  
value I

will normalize and then the values following are above and below the
original tracing out the path that the experiment took, ending
somewhere to the right but not a fixed number of readings.

R reads it in fine and it looks good so far.

Now, what I thought I might do with R is plot all 700 rows as
individual lines, giving them some color based on info in columns  
1-9,

but suddenly I'm lost again in plots which I think should be fairly
easy. How would I go about creating a plot for even one line, much
less all of them? I don't have a row with 1,2,3,4 to us as the X  
axis
values. I could go back and put one in the data but then I don't  
think

that should really be required, or I could go back and make the
headers for the whole array 1:400 and then plot from 10:400 but I
thought I read that headers cannot start with numbers.

Maybe the X axis values for a plot can actually be non-numeric C1,  
C2,
C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or  
maybe
I should strip the C from C1 and be left with 1? Maybe the best  
thing

is to copy the data for one line to another data.frame or array and
then plot that?

Just sort of lost looking at help files. Thanks for any ideas you  
can
send along. Ask questions if I didn't explain my problem well  
enough.
Not looking for anyone to do my work, just trying to get the  
concepts

right

Cheers,
Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390


Hey Jim,
  Thanks for the pointers on matplot. I suspect that will be useful
one of these days.

  I'm attaching a little code to make a test case closer to what I
have to deal with at the bottom. My problem with your data was that
you plot everything. In my data I need to plot only a portion of it,
and in the array not every cell is valid - I don't want to plot cells
that have 0.00 as a value. In the array 'test' I need to plot the
general area defined by C1:C6, each row as a line, but stop plotting
each row when I run into a 0. Keep in mind that I don't know what
column C1 starts in. It is likely to change over time.

  I think the root cause of a number of my coding problems in R right
now is my lack of skills in reading and grabbing portions of the data
out of arrays. I'm new at this. (And not a programmer) I need to find
some good examples to read and test on that subject. If I could locate
which column was called C1, then read row 3 from C1 up to the last
value before a 0, I'd have proper data to plot for one line. Repeat as
necessary through the array and I get all the lines. Doing the lines
one at a time should allow me the opportunity to apply color or not
plot based on values in the first few columns.

Thanks,
Mark

test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
test <- round(test,2)

#Make array ragged
test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
test$C6[7] <- 0
test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0

#Print array
test


?"[" for the help page on Extract, which is a gold mine of useful methods

A single row can be extracted with:
test[3, ]

Two rows:
test[3:4, ]

And individual elements of a vector can be further specified:
 

Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Uwe Ligges



David Winsemius wrote:


On Jul 5, 2009, at 9:53 AM, Mark Knecht wrote:


On Sat, Jul 4, 2009 at 5:22 PM, jim holtman <jholt...@gmail.com> wrote:

See if this example helps; it shows how to plot either the rows or the
columns of a data frame:


test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
test

           C1        C2        C3
1  0.91287592 0.3390729 0.4346595
2  0.29360337 0.8394404 0.7125147
3  0.45906573 0.3466835 0.344
4  0.33239467 0.3337749 0.3253522
5  0.65087047 0.4763512 0.7570871
6  0.25801678 0.8921983 0.2026923
7  0.47854525 0.8643395 0.7111212
8  0.76631067 0.3899895 0.1216919
9  0.08424691 0.7773207 0.2454885
10 0.87532133 0.9606180 0.1433044

# this will plot each column (C1, C2, C3)
matplot(test, type='o')
# plot each row
matplot(t(test), type='o')
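
A minimal sketch of how the matplot() idea might be adapted to zero-padded rows. This is not code from the thread: the helper name mask_after_zero and the small matrix m are made up for illustration. The assumption is that a 0 marks the end of an experiment, so the first 0 in a row and everything after it is replaced with NA, which matplot() simply does not draw.

```r
# Hypothetical example data: one experiment per row, zero-padded on the right.
m <- rbind(c(0.5, 0.7, 0.6, 0.0, 0.0),
           c(0.4, 0.3, 0.0, 0.0, 0.0),
           c(0.9, 0.8, 0.7, 0.6, 0.5))
colnames(m) <- paste0("C", 1:5)

# Replace the first 0 in a row, and everything after it, with NA.
mask_after_zero <- function(x) {
  z <- which(x == 0)
  if (length(z) > 0) x[z[1]:length(x)] <- NA
  x
}

# apply() over rows returns the results as columns, so transpose back.
masked <- t(apply(m, 1, mask_after_zero))

# matplot() draws one line per column, so transpose to get one line per row.
matplot(t(masked), type = "o", xlab = "reading", ylab = "value")
```

Because the x axis is just the reading index, no extra 1,2,3,4 row is needed in the data.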



On Sat, Jul 4, 2009 at 8:02 PM, Mark Knecht <markkne...@gmail.com> wrote:

OK, I guess I'm getting better at the data part of R. I wrote a
program outside of R this morning to dump a bunch of experimental
data. It's a sort of ragged array - about 700 rows and 400 columns,
but the amount of data in each column varies based on the length of
the experiment. The real data ends with a 0 following some non-zero
value. It might be as short as 5 to 10 columns or as many as 390. The
first 9 columns contain some data about when the experiment was run
and a few other things I thought I might be interested in later. All
the data starts in column 10 and has headers saying C1, C2, C3, C4,
etc., up to C390. The first value for every experiment is some value I
will normalize and then the values following are above and below the
original tracing out the path that the experiment took, ending
somewhere to the right but not a fixed number of readings.

R reads it in fine and it looks good so far.

Now, what I thought I might do with R is plot all 700 rows as
individual lines, giving them some color based on info in columns 1-9,
but suddenly I'm lost again in plots which I think should be fairly
easy. How would I go about creating a plot for even one line, much
less all of them? I don't have a row with 1,2,3,4 to use as the X axis
values. I could go back and put one in the data but then I don't think
that should really be required, or I could go back and make the
headers for the whole array 1:400 and then plot from 10:400 but I
thought I read that headers cannot start with numbers.

Maybe the X axis values for a plot can actually be non-numeric C1, C2,
C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or maybe
I should strip the C from C1 and be left with 1? Maybe the best thing
is to copy the data for one line to another data.frame or array and
then plot that?

Just sort of lost looking at help files. Thanks for any ideas you can
send along. Ask questions if I didn't explain my problem well enough.
Not looking for anyone to do my work, just trying to get the concepts
right

Cheers,
Mark






--
Jim Holtman
Cincinnati, OH
+1 513 646 9390



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht
On Sun, Jul 5, 2009 at 7:35 AM, David Winsemius <dwinsem...@comcast.net> wrote:

 ?"[" for the help page on Extract, which is a gold mine of useful methods


Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread David Winsemius


On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:




David Winsemius wrote:


So if your values are calculated from other values then consider
using all.equal()
And repeated applications of the testing criteria process are
effective:

test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
    C1   C2   C3
3 0.52 0.66 0.51
(and a warning that does not seem accurate to me.)
In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
  numerical expression has 3 elements: only the first used



David,

# which(test[3,] == 0.0)
[1] 6 7 8

and in a:b a and b must be length 1 vectors (scalars) otherwise just
the first element (in this case 6) is used.


That leads us to the conclusion that writing the line above is not
really the cleanest way or you intended something different 


Thanks, Uwe. I see my confusion. I did want 6 to be used and it looks
as though I would not be getting in trouble this way, but a cleaner
method would be to access only the first element of
which(test[3, ] == 0):


test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]



David


Seems to me that all of the elements were used. I cannot explain
that warning but am pretty sure it can be ignored.




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht

OK - making lots more headway. Thanks for your help.

QUESTION: How do I handle the case where I'm testing for 0 and don't
find it? In this case I need all of the row from C1:C6.

test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
test <- round(test,2)

#Make array ragged
test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
test$C6[7] <- 0
test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0

test

#C1 always the same so calculate it only once
StartCol <- which(names(test)=="C1")

#Print row 3 explicitly
test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]

#Row 6 fails because 0 is not found
test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]

EndCol <- which(test[6,] == 0.0)[1]-1
EndCol
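
A small sketch of why the row 6 case fails, using a made-up vector x rather than the data frame above: when no element matches, which() returns a zero-length result, and taking [1] of that gives NA, so the a:b range ends up with an NA endpoint. The guard shown is one possible way around it and is an illustration only.

```r
x <- c(0.33, 0.84, 0.51)         # hypothetical row with no zero in it
first_zero <- which(x == 0)[1]   # which() finds nothing, so [1] yields NA
first_zero
is.na(first_zero)

# An explicit guard avoids building a range that contains NA:
end_col <- if (is.na(first_zero)) length(x) else first_zero - 1
x[1:end_col]
```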

Thanks,
Mark



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread David Winsemius



It's getting a bit Baroque, but here is a solution that handles an NA:

test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
                          ncol(test),   which(test[6,] == 0.0)[1]-1 )
        ]
#-
    C1   C2   C3   C4   C5   C6
6 0.33 0.84 0.51 0.86 0.84 0.15


Maybe an R-meister can offer something more compact?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Uwe Ligges



David Winsemius wrote:



Maybe an R-meister can offer something more compact?



So let's wait for some R-meister, I'd write even more 

Reason: testing for exactly zero after possible calculations is a bit 
dangerous and ifelse() is designed for vectorized operations but is not 
efficient for scalar operations, particularly since both expressions are 
evaluated, so if() else would be preferable, but we could use min() 
instead. Finally, a:b could end up in 5:3 without a warning and I'd use 
seq() instead.
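
A short illustration of the two hazards mentioned here, with made-up numbers that are not from the thread: an exact comparison with zero can fail after floating-point arithmetic, and a:b silently counts downward when a is larger than b, whereas seq() with an explicit by = 1 refuses.

```r
x <- 0.1 + 0.2 - 0.3             # mathematically zero, but not in floating point
x == 0                           # FALSE
isTRUE(all.equal(x, 0))          # TRUE: equal within a small tolerance

5:3                              # silently counts down: 5 4 3

# seq() with a positive step and from > to signals an error instead.
r <- try(seq(from = 5, to = 3, by = 1), silent = TRUE)
inherits(r, "try-error")         # TRUE
```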


Hence I'd prefer:

temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]
test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]
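
One way the same idea could be wrapped up and applied to every row is sketched below. The helper name row_readings and the small fixed data frame are hypothetical and only stand in for the real data; the sketch also assumes that only zeros at or after the C1 column should end a row.

```r
# Hypothetical stand-in data: two leading info columns, then readings C1..C4.
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.52, 0.33, 0.10), C2 = c(0.66, 0.84, 0.00),
                   C3 = c(0.51, 0.51, 0.00), C4 = c(0.00, 0.86, 0.00))
StartCol <- which(names(test) == "C1")

# Readings of row i from column `start` up to (not including) the first 0.
row_readings <- function(df, i, start) {
  vals <- unlist(df[i, ])
  is_zero <- sapply(vals, function(x) isTRUE(all.equal(x, 0)))
  first_zero <- which(is_zero & seq_along(vals) >= start)[1]
  end <- min(c(first_zero - 1, ncol(df)), na.rm = TRUE)
  # seq_len() yields an empty index if the very first reading is 0.
  vals[seq_len(end - start + 1) + start - 1]
}

readings <- lapply(seq_len(nrow(test)), row_readings,
                   df = test, start = StartCol)
readings
```

The result is a list with one numeric vector per row, so rows of different lengths are no problem.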



Best,
Uwe Ligges




David Winsemius, MD
Heritage Laboratories
West Hartford, CT





Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht

I appreciate both of the answers. I don't completely understand them,
but I do appreciate them. Thanks!

I was wondering whether it's easy to simply test the last column for
==0, and if true run the previous command, if false just return
everything up to the end of the row?

Currently my data is one experiment per row, but that's wasting space
as most experiments only take 20% of the row and 80% of the row is
filled with 0's. I might want to make the array more narrow and have a
flag somewhere in the 1st 10 columns that says that this row is a
continuation row from the previous row. That way I could pack the
array better, use less memory and when I do finally test for 0 I have
a short line to traverse?
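
As an alternative to packing continuation rows, the zero-padded table could be reshaped to one measurement per row, which drops the padding altogether. The sketch below is only an illustration under assumed names: the data frame wide, its id column, and the output column names are all made up.

```r
# Hypothetical wide, zero-padded data: one experiment per row.
wide <- data.frame(id = c("A", "B"),
                   C1 = c(0.27, 0.13), C2 = c(0.66, 0.65),
                   C3 = c(0.24, 0.00), C4 = c(0.00, 0.00))
data_cols <- grep("^C[0-9]+$", names(wide))

rows <- lapply(seq_len(nrow(wide)), function(i) {
  v <- unlist(wide[i, data_cols])
  z <- which(v == 0)
  n <- if (length(z) > 0) z[1] - 1 else length(v)  # readings before first 0
  data.frame(experiment  = rep(wide$id[i], n),
             measurement = seq_len(n),
             value       = unname(v[seq_len(n)]))
})
long <- do.call(rbind, rows)
long
```

Each experiment then takes only as many rows as it has readings.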

Just an idea.

Anyway, I suspect either of these will suit my short term needs. On to
the next step.

Cheers,
Mark



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread David Winsemius


On Jul 5, 2009, at 1:19 PM, Uwe Ligges wrote:



Hence I'd prefer:

temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]


This appears to be a learning moment for me. Do I have it correctly that
the first argument to sapply, the vector test[6,], gets passed
element-wise to the first parameter of the function, x, and the second
argument, 0, is getting passed via recycling to the second parameter,
y, through the "..." mechanism of the sapply function?


test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]


I had tried a min() solution and got Inf in return when there was an  
NA in the vector, but did not realize that it had an na.rm mode.


Thanks for the meisterhaft corrections.




Best,
Uwe Ligges


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Uwe Ligges



David Winsemius wrote:


On Jul 5, 2009, at 1:19 PM, Uwe Ligges wrote:



Hence I'd prefer:

temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]


This appears to be a learning moment for me. Do I have it correctly that
the first argument to sapply, the vector test[6,], gets passed
element-wise to the first parameter of the function, x,



Yes.


and the second
argument, 0, is getting passed via recycling to the second parameter, y,
through the "..." mechanism of the sapply function?



No, each time the whole thing (which is just 0 here) is passed to 
sapply, not via recycling.
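
A tiny illustration of this point, using made-up vectors: extra arguments given to sapply() are handed over whole on every call, so the function sees the complete second argument each time. For element-wise pairing of two vectors, mapply() is the usual tool.

```r
# Each call receives one element of 1:3 as x and the whole c(10, 20) as y.
res <- sapply(1:3, function(x, y) length(y), c(10, 20))
res

# mapply() walks both vectors in step, pairing x[i] with y[i].
pairs <- mapply(function(x, y) x + y, 1:3, c(10, 20, 30))
pairs
```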




test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]


I had tried a min() solution and got Inf in return when there was an NA 
in the vector, but did not realize that it had an na.rm mode.


Thanks for the meisterhaft corrections.



:-)


Uwe





Best,
Uwe Ligges


David Winsemius, MD
Heritage Laboratories
West Hartford, CT





Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Henrique Dallazuanna
Try this:

subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]

subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]
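
Keeping every positive cell is not quite the same as stopping at the first zero; the two only agree when zeros appear solely as trailing padding. A made-up row with an embedded zero shows the difference (the second expression assumes the row contains at least one zero):

```r
row <- c(C1 = 0.52, C2 = 0.00, C3 = 0.51, C4 = 0.00)  # hypothetical row

kept_all <- row[row > 0]                    # every positive cell: C1 and C3
kept_all

first_zero <- which(row == 0)[1]
kept_prefix <- row[seq_len(first_zero - 1)] # stops at the first zero: C1 only
kept_prefix
```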



On Sun, Jul 5, 2009 at 1:19 PM, Mark Knecht markkne...@gmail.com wrote:

 On Sun, Jul 5, 2009 at 8:18 AM, David Winsemiusdwinsem...@comcast.net
 wrote:
 
  On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
 
 
 
  David Winsemius wrote:
 
  So if your values are calculated from other values then consider using
  all.equal()
  And repeated applications of the testing criteria process are
 effective:
  test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
      C1   C2   C3
  3 0.52 0.66 0.51
  (and a warning that does not seem accurate to me.)
  In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
   numerical expression has 3 elements: only the first used
 
 
  David,
 
  # which(test[3,] == 0.0)
  [1] 6 7 8
 
  and in a:b a and b must be length 1 vectors (scalars) otherwise just the
  first element (in this case 6) is used.
 
  That leads us to the conclusion that writing the line above is not
 really
  the cleanest way or you intended something different 
 
  Thanks, Uwe. I see my confusion. I did want 6 to be used and it looks as
  though I would not be getting in trouble this way, but a cleaner method
  would be to access only the first element of which(test[3, ] == 0):
 
  test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
 
 
  David
 
  Seems to me that all of the elements were used. I cannot explain that
  warning but am pretty sure it can be ignored.
 
 
  David Winsemius, MD
  Heritage Laboratories
  West Hartford, CT
 
 

 OK - making lots more headway. Thanks for your help.

 QUESTION: How do I handle the case where I'm testing for 0 and don't
 find it? In this case I need all of the row from C1:C6.

 test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
 C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
 test<-round(test,2)

 #Make array ragged
 test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
 test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
 test$C6[7]<-0
 test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0

 test

 #C1 always the same so calculate it only once
 StartCol <- which(names(test)=="C1")

 #Print row 3 explicitly
 test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]

 #Row 6 fails because 0 is not found
 test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]

 EndCol <- which(test[6,] == 0.0)[1]-1
 EndCol

 Thanks,
 Mark





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht
On Sun, Jul 5, 2009 at 12:30 PM, Henrique Dallazuanna www...@gmail.com wrote:
 Try this:

 subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]

 subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]



I must admit I like this one. Pleasing to look at. It seems
approachable. Thanks!

If I understand this the second subset gets evaluated first producing
either TRUE or FALSE, and then the first subset gets evaluated but
only for the entries that are TRUE? Is that the process?

Thanks,
Mark




Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Henrique Dallazuanna
Yes,

First, select only columns C1 to C6, then look for values greater than 0,
then use the result to select the columns in the original subset.
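
Spelled out one step at a time, the one-liner might look like the sketch below. The single-row data frame is a made-up stand-in for test[3,], and flattening the comparison with unlist() is my own choice so that the index is an ordinary logical vector; the original one-liner indexes with the comparison result directly.

```r
# Made-up single-row stand-in for test[3, ] from the thread
row3 <- data.frame(A = 3, B = 100, C1 = 0.52, C2 = 0.66, C3 = 0.51,
                   C4 = 0, C5 = 0, C6 = 0)

# Step 1: keep only the data columns C1 to C6
vals <- subset(row3, select = C1:C6)

# Step 2: compare every entry with 0, giving one TRUE/FALSE per column
keep <- unlist(vals) > 0

# Step 3: use that logical vector to pick columns from the same subset
result <- vals[, keep]
print(result)
```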


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht

Thanks for the further explanation.

One small difference in this approach is that in the general case I
have to supply the name of the last column whereas the other just
starts at the beginning and goes until it's done. No big deal and
possibly an advantage as I could search a subset of the data on the
row, i.e. supply both the start and stop columns, for instance
C61:C120. This could be valuable as each column generally represents 1
minute further into the experiment, so that range would look at the
second hour only.
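
As a rough sketch of that idea, assuming a made-up row with only eight data columns (the real data would use something like C61 and C120), and a hypothetical helper name `window_values`:

```r
# Made-up row with eight data columns; in the real data these would run to C390
row <- data.frame(A = 1, C1 = 0.4, C2 = 0.7, C3 = 0.9, C4 = 0.2,
                  C5 = 0.6, C6 = 0.3, C7 = 0, C8 = 0)

# Hypothetical helper: non-zero values of one row between two named columns
window_values <- function(df, first, last) {
  # Translate the two column names into positions, then take the range
  cols <- seq(from = match(first, names(df)), to = match(last, names(df)), by = 1)
  vals <- unlist(df[1, cols])
  vals[vals > 0]
}

second_half <- window_values(row, "C5", "C8")
print(second_half)
```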

Cheers,
Mark



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread hadley wickham
   I think the root cause of a number of my coding problems in R right
 now is my lack of skills in reading and grabbing portions of the data
 out of arrays. I'm new at this. (And not a programmer) I need to find
 some good examples to read and test on that subject. If I could locate
 which column was called C1, then read row 3 from C1 up to the last
 value before a 0, I'd have proper data to plot for one line. Repeat as
 necessary through the array and I get all the lines. Doing the lines
 one at a time should allow me the opportunity to apply color or not
 plot based on values in the first few columns.

 Thanks,
 Mark

 test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
 C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
 test<-round(test,2)

 #Make array ragged
 test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
 test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
 test$C6[7]<-0
 test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0

 #Print array
 test

Are the zeros always going to be arranged like this? i.e. for each
experiment there is a point at which all later values are zero?  If
so, the following is a much simpler way of getting to the core of your
data, without fussing with overly complicated matrix indexing:

library(reshape)
testm <- melt(test, id = c("A", "B"))
subset(testm, value > 0)

I suspect you will also find this form easier to plot and analyse.
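
For readers without the reshape package installed, the same long layout can be approximated in base R. This is only a sketch with a small made-up data frame; it is not what melt() does internally, and the column names `variable` and `value` are chosen here to mirror the code above.

```r
# Small made-up version of the thread's `test` data frame
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.5, 0.3, 0.8),
                   C2 = c(0.6, 0,   0.1),
                   C3 = c(0,   0,   0.4))

value_cols <- c("C1", "C2", "C3")

# One row per (experiment, column) pair, built with base R only
long <- data.frame(
  A        = rep(test$A, times = length(value_cols)),
  B        = rep(test$B, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(test)),
  value    = unlist(test[value_cols], use.names = FALSE)
)

# Dropping the zero padding is now a single filter
long_nz <- subset(long, value > 0)
print(long_nz)
```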

Hadley

-- 
http://had.co.nz/



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Mark Knecht
Hi Hadley,
   I wanted to look at reshape.

   Yes, there exists a point in each row (unless I get to the end with
all numbers) where I get to a zero and everything to the right is
zero.

   I'm looking at ReShape. It's interesting but I clearly don't
understand it yet so I'm reading your "Reshaping data with the reshape
package" paper from 11/07. Interesting.

   I know so little about R that I'm sort of drowning at this point,
so it's hard for me to understand why this would make plotting
easier. Analysis possibly. Just the way it goes when you get started
with something new.

   In ReShape lingo I think I have ID's. They cover things like time,
date, success/failure and a few other things of interest. Once the
data starts on a row it is all data from there on to the end of the
row.

   My initial goal is to make a line plot of the data on a single row.
All the data points should connect together. There is no real
interaction planned with data on other rows, at least at this time.
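
A minimal sketch of that single-row line plot, assuming a made-up two-row data frame and assuming zeros occur only as trailing padding. The x values are simply the reading index from seq_along(), so no extra 1,2,3,4 row is needed in the data.

```r
# Made-up stand-in for the real data: two experiments, four data columns
test <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.5, 0.3), C2 = c(0.6, 0.4),
                   C3 = c(0.2, 0),   C4 = c(0, 0))

# Positions of the data columns C1, C2, ...
data_cols <- grep("^C[0-9]+$", names(test))

# Values of row 1 as a plain named vector, with the zero padding dropped
y <- unlist(test[1, data_cols])
y <- y[y > 0]

# Null device so the sketch also runs non-interactively;
# drop pdf(NULL) and dev.off() to draw on screen
pdf(NULL)
plot(seq_along(y), y, type = "l", xlab = "reading", ylab = "value")
invisible(dev.off())
```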

   Thanks for the pointers and the code stub. I'll be looking at this.

Cheers,
Mark



Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread Don MacQueen

At 10:42 AM -0700 7/5/09, Mark Knecht wrote:

2009/7/5 Uwe Ligges lig...@statistik.tu-dortmund.de:






- a lot of other conversation omitted, to focus on the following


Currently my data is one experiment per row, but that's wasting space
as most experiments only take 20% of the row and 80% of the row is
filled with 0's. I might want to make the array more narrow and have a
flag somewhere in the 1st 10 columns that says this row is a
continuation row from the previous row. That way I could pack the
array better, use less memory and when I do finally test for 0 I have
a short line to traverse?

Just an idea.

Anyway, I suspect either of these will suit my short term needs. On to
the next step.

Cheers,
Mark



This suggests the use of a list rather than a data frame. With a 
list object, each element in the list would represent one experiment, 
and each would have the appropriate number of elements (values) for 
that experiment.
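
A rough sketch of what that could look like, starting from a small made-up zero-padded data frame (the names `value_cols` and `experiments` are just for illustration):

```r
# Small made-up zero-padded data frame, one experiment per row
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.5, 0.3, 0.8),
                   C2 = c(0.6, 0,   0.1),
                   C3 = c(0,   0,   0.4))
value_cols <- c("C1", "C2", "C3")

# One list element per experiment, holding only its real (non-zero) readings
experiments <- lapply(seq_len(nrow(test)), function(i) {
  v <- unlist(test[i, value_cols], use.names = FALSE)
  v[v > 0]
})
names(experiments) <- test$A

# Each element has its own length, so no padding is stored
n_readings <- sapply(experiments, length)
print(n_readings)
```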


Indeed, the original description,

At 5:02 PM -0700 7/4/09, Mark Knecht wrote:

OK, I guess I'm getting better at the data part of R. I wrote a
program outside of R this morning to dump a bunch of experimental
data. It's a sort of ragged array - about 700 rows and 400 columns,
but the amount of data in each column varies based on the length of
the experiment. The real data ends with a 0 following some non-zero
value. It might be as short as 5 to 10 columns or as many as 390. The
first 9 columns contain some data about when the experiment was run
and a few other things I thought I might be interested in later. All
the data starts in column 10 and has headers saying C1, C2, C3, C4,
etc., up to C390. The first value for every experiment is some value I
will normalize and then the values following are above and below the
original tracing out the path that the experiment took, ending
somewhere to the right but not a fixed number of readings.


is also suggestive of using a list(). For example, the metadata, 
i.e., the "... data about when the experiment was run and a few other 
things ..." could be held separately, instead of embedded in the same 
array, from which it always has to be excluded in order to do an 
analysis.
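
One way this separation might look, as a sketch only: the `status` column and its values are invented for illustration, since the thread does not say what the first nine columns actually hold.

```r
# Made-up data: two metadata columns followed by zero-padded readings
test <- data.frame(A = 1:3, status = c("ok", "fail", "ok"),
                   C1 = c(0.5, 0.3, 0.8),
                   C2 = c(0.6, 0,   0.1),
                   C3 = c(0,   0,   0.4),
                   stringsAsFactors = FALSE)
value_cols <- c("C1", "C2", "C3")

# Metadata kept in its own small data frame, one row per experiment
meta <- test[, c("A", "status")]

# Readings kept in a list, in the same order as the rows of `meta`
values <- lapply(seq_len(nrow(test)), function(i) {
  v <- unlist(test[i, value_cols], use.names = FALSE)
  v[v > 0]
})

# The metadata can now select experiments without ever touching the readings
ok_values <- values[meta$status == "ok"]
print(ok_values)
```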


But I haven't followed the thread all that closely, so confess that 
my thoughts might be off the mark.


-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
m...@llnl.gov



Re: [R] OK - I got the data - now what? :-)

2009-07-04 Thread jim holtman
See if this example helps; it shows how to plot either the rows or the
columns of a data frame:

> test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
> test
           C1        C2        C3
1  0.91287592 0.3390729 0.4346595
2  0.29360337 0.8394404 0.7125147
3  0.45906573 0.3466835 0.344
4  0.33239467 0.3337749 0.3253522
5  0.65087047 0.4763512 0.7570871
6  0.25801678 0.8921983 0.2026923
7  0.47854525 0.8643395 0.7111212
8  0.76631067 0.3899895 0.1216919
9  0.08424691 0.7773207 0.2454885
10 0.87532133 0.9606180 0.1433044
> # this will plot each column (C1, C2, C3)
> matplot(test, type='o')
> # plot each row
> matplot(t(test), type='o')
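
With zero-padded rows, matplot() would draw the padding as a run of points at 0. One possible workaround, sketched here with made-up data and on the assumption that a genuine reading is never exactly 0, is to turn the padding into NA first, since matplot() leaves missing values out.

```r
# Made-up zero-padded data, one experiment per row
test <- data.frame(C1 = c(0.5, 0.3, 0.8),
                   C2 = c(0.6, 0,   0.1),
                   C3 = c(0,   0,   0.4))

# Work on a matrix copy and mark the padding as missing
# (assumes a genuine reading is never exactly 0)
m <- as.matrix(test)
m[m == 0] <- NA

# Null device so the sketch also runs non-interactively;
# drop pdf(NULL) and dev.off() to draw on screen
pdf(NULL)
matplot(t(m), type = "o", xlab = "reading", ylab = "value")
invisible(dev.off())
```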


On Sat, Jul 4, 2009 at 8:02 PM, Mark Knechtmarkkne...@gmail.com wrote:
 OK, I guess I'm getting better at the data part of R. I wrote a
 program outside of R this morning to dump a bunch of experimental
 data. It's a sort of ragged array - about 700 rows and 400 columns,
 but the amount of data in each column varies based on the length of
 the experiment. The real data ends with a 0 following some non-zero
 value. It might be as short as 5 to 10 columns or as many as 390. The
 first 9 columns contain some data about when the experiment was run
 and a few other things I thought I might be interested in later. All
 the data starts in column 10 and has headers saying C1, C2, C3, C4,
 etc., up to C390. The first value for every experiment is some value I
 will normalize and then the values following are above and below the
 original tracing out the path that the experiment took, ending
 somewhere to the right but not a fixed number of readings.

 R reads it in fine and it looks good so far.

 Now, what I thought I might do with R is plot all 700 rows as
 individual lines, giving them some color based on info in columns 1-9,
 but suddenly I'm lost again in plots which I think should be fairly
 easy. How would I go about creating a plot for even one line, much
 less all of them? I don't have a row with 1,2,3,4 to use as the X axis
 values. I could go back and put one in the data but then I don't think
 that should really be required, or I could go back and make the
 headers for the whole array 1:400 and then plot from 10:400 but I
 thought I read that headers cannot start with numbers.

 Maybe the X axis values for a plot can actually be non-numeric C1, C2,
 C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or maybe
 I should strip the C from C1 and be left with 1? Maybe the best thing
 is to copy the data for one line to another data.frame or array and
 then plot that?

 Just sort of lost looking at help files. Thanks for any ideas you can
 send along. Ask questions if I didn't explain my problem well enough.
 Not looking for anyone to do my work, just trying to get the concepts
 right

 Cheers,
 Mark





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
