Re: [R] OK - I got the data - now what? :-)
Mark wrote:

> Currently my data is one experiment per row, but that's wasting space as most experiments only take 20% of the row and 80% of the row is filled with 0's. I might want to make the array narrower and have a flag somewhere in the first 10 columns that says this row is a continuation of the previous row. That way I could pack the array better, use less memory, and when I do finally test for 0 I have a short line to traverse?

This may be a bit off track from the data manipulation you are working on, but I thought I'd point out that another way to handle this sort of data is to make a table with one measurement per row, rather than one experiment per row:

    experiment measurement value
    A          1           0.27
    A          2           0.66
    A          3           0.24
    A          4           0.55
    B          1           0.13
    B          2           0.65
    B          3           0.83
    B          4           0.41
    B          5           0.92
    B          6           0.67
    C          1           0.75
    C          2           0.97
    C          3           0.49
    C          4           0.58
    D          1           1.00
    D          2           0.71
    E          1           0.11
    E          2           0.50
    E          3           0.98
    E          4           0.07
    E          5           0.94
    E          6           0.57
    E          7           0.34
    E          8           0.21

If you write the output of your calculations this way, one value per line, it can easily be read into R as a data.frame and handled with less need for munging. There is no need to remove the zero-padding because the zeros aren't needed in the first place. You can subset the data with subset, as in

    test <- read.table('test.dat', header=TRUE)
    expA <- subset(test, experiment=='A')
    expB <- subset(test, experiment=='B')

so there is no need to deal with ragged/zero-padded arrays. Your plots can be grouped automatically with lattice:

    require(lattice)
    xyplot(value ~ measurement, data=test, groups=experiment, type='b')
    xyplot(value ~ measurement | experiment, data=test, type='b')

It is simple to do calculations by experiment using tapply. For example:

    with(test, tapply(value, experiment, mean))
            A         B         C         D         E
        0.430 0.6016667 0.6975000     0.855     0.465

    with(test, tapply(measurement, experiment, max))
    A B C D E
    4 6 4 2 8

Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
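As a small illustration of the one-measurement-per-row layout Mike describes, the sketch below reads a few rows of his table from inline text (standing in for 'test.dat', which is not available here) and summarises each experiment. The object names long_txt and summary_by_exp are made up for this example, and aggregate is used as one possible alternative to tapply, not as the method from the message.

```r
# A few rows of the long-format table, read from inline text so the
# sketch is self-contained (this stands in for reading 'test.dat').
long_txt <- "experiment measurement value
A 1 0.27
A 2 0.66
A 3 0.24
A 4 0.55
D 1 1.00
D 2 0.71"
test <- read.table(text = long_txt, header = TRUE)

# Mean value and number of measurements per experiment.
means  <- aggregate(value ~ experiment, data = test, FUN = mean)
counts <- aggregate(measurement ~ experiment, data = test, FUN = length)

# Combine both summaries into one data frame keyed by experiment.
summary_by_exp <- merge(means, counts, by = "experiment")
names(summary_by_exp) <- c("experiment", "mean_value", "n")
print(summary_by_exp)
```

Because no row is padded, the count per experiment falls out of the data directly, without any test for trailing zeros.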
Re: [R] OK - I got the data - now what? :-)
On Wed, Jul 8, 2009 at 10:51 AM, Michael A. Miller <mmill...@iupui.edu> wrote:

> [...] another way to handle this sort of data is to make a table with one measurement per row, rather than one experiment per row. [...]

Mike,

It's not really that far off track, as I didn't have any background when I started this in R. This is the first time I've used it. I simply chose a format that I thought would work for me in both Excel and R.

I do like your examples. My impression of reshape coupled with cast is that it's pretty capable of giving me more or less the same format you suggest, although it is a bit of work.

Currently in my files I save only the start and finish times of the experiments and planned on calculating all the times in the middle if necessary. With this format I'd just write them out on each line and save that work in R.

I suppose the files using this alternative format would be a lot larger on disk. I currently have 10 values + 500 observations per experiment, with an average experiment tracking file containing maybe 500-1000 experiments. With this format in the worst case I suppose I'd have (10+1) * 1000 per experiment on disk, but on average it would be less than that because, as you say, I wouldn't write out any zeros. Once in R in memory they'd be equivalent. Disk space doesn't matter, but reading and writing the files might be slower.

I suppose I don't really have to write the zeros out anyway, but at this point it's just one additional subset after going through reshape. It might be an advantage to get to the subset commands immediately, but I've still got 10 independent variables and I suspect I'm going to be using reshape/cast more than once to get to my answers, so I haven't been against learning how to work with it.

Overall they are good inputs and I appreciate them. Thanks!

Cheers,
Mark
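Mark mentions storing only the start and finish times and computing the times in between. A minimal sketch of that step follows, assuming the readings are evenly spaced, which the thread does not state. The helper name fill_times, the timestamps, and the values are all hypothetical.

```r
# Hypothetical helper: evenly spaced times between start and finish.
# Even spacing is an assumption; the thread does not say how the
# readings are timed.
fill_times <- function(start, finish, n_obs) {
  seq(from = start, to = finish, length.out = n_obs)
}

# Made-up start and finish times for one experiment.
start  <- as.POSIXct("2009-07-08 10:00:00", tz = "UTC")
finish <- as.POSIXct("2009-07-08 10:04:00", tz = "UTC")
times  <- fill_times(start, finish, n_obs = 5)

# One row per reading, with the computed time written on each line.
long <- data.frame(experiment = "A",
                   time       = times,
                   value      = c(0.27, 0.66, 0.24, 0.55, 0.41))
print(long)
```

Doing this at write time, as Mark suggests, trades a larger file for not having to reconstruct the times inside R.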
Re: [R] OK - I got the data - now what? :-)
Hi

r-help-boun...@r-project.org wrote on 06.07.2009 01:58:38:

> On Sun, Jul 5, 2009 at 1:44 PM, hadley wickham <h.wick...@gmail.com> wrote:
> > > I think the root cause of a number of my coding problems in R right now is my lack of skills in reading and grabbing portions of the data out of arrays. I'm new at this. (And not a programmer) I need to find some good examples to read and test on that subject.
> > >
> > > If I could locate which column was called C1, then read row 3 from C1 up to the last value before a 0, I'd have proper data to plot for one line. Repeat as necessary through the array and I get all the lines. Doing the lines one at a time should allow me the opportunity to apply color or not plot based on values in the first few columns.
> > >
> > > Thanks,
> > > Mark
> > >
> > >     test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10), C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
> > >     test <- round(test, 2)
> > >     #Make array ragged
> > >     test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
> > >     test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
> > >     test$C6[7] <- 0
> > >     test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0
> > >     #Print array
> > >     test
> >
> > Are the zeros always going to be arranged like this? i.e. for each experiment there is a point at which all later values are zero? If so, the following is a much simpler way of getting to the core of your data, without fussing with overly complicated matrix indexing:
> >
> >     library(reshape)
> >     testm <- melt(test, id = c("A", "B"))
> >     subset(testm, value > 0)
> >
> > I suspect you will also find this form easier to plot and analyse.
> >
> > Hadley
> >
> > --
> > http://had.co.nz/
>
> Hi Hadley,
> I wanted to look at reshape. Yes, there exists a point in each row (unless I get to the end with all numbers) where I get to a zero and everything to the right is zero.
>
> I'm looking at ReShape. It's interesting but I clearly don't understand it yet, so I'm reading your "Reshaping data with the reshape package" from 11/07. Interesting.
>
> I know so little about R that I'm sort of drowning at this point, so it's hard for me to understand why this would make plotting easier. Analysis possibly. Just the way it goes when you get started with something new.

E.g. to give a different colour according to C1-C6 and/or a different shape for each A value:

    test. <- subset(testm, value > 0)
    plot(test.$value, col=as.numeric(test.$variable), pch=test.$A)

And even fancier plots are possible with the ggplot2 package.

Regards
Petr

> In ReShape lingo I think I have ID's. They cover things like time, date, success/failure and a few other things of interest. Once the data starts on a row it is all data from there on to the end of the row.
>
> My initial goal is to make a line plot of the data on a single row. All the data points should connect together. There is no real interaction planned with data on other rows, at least at this time.
>
> Thanks for the pointers and the code stub. I'll be looking at this.
>
> Cheers,
> Mark
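To make the melt step less of a black box, here is a rough base-R stand-in for what melt(test, id = c("A", "B")) followed by the subset produces. It is only a sketch on a small made-up table, not the reshape package's implementation; the column values are invented, while testm and test. follow the names used in the messages.

```r
# Small made-up wide table: two id columns, then zero-padded readings.
test <- data.frame(A  = 1:4, B = 100,
                   C1 = c(0.5, 0.2, 0.9, 0.4),
                   C2 = c(0.3, 0.7, 0.1, 0.8),
                   C3 = c(0.6, 0,   0.2, 0))

# Names of the reading columns (C1, C2, ...).
value_cols <- grep("^C[0-9]+$", names(test), value = TRUE)

# Stack the reading columns on top of each other, repeating the ids.
# unlist() walks the columns in order, so ids repeat once per column.
testm <- data.frame(
  A        = rep(test$A, times = length(value_cols)),
  B        = rep(test$B, times = length(value_cols)),
  variable = factor(rep(value_cols, each = nrow(test)), levels = value_cols),
  value    = unlist(test[value_cols], use.names = FALSE)
)

# Drop the zero padding, as in subset(testm, value > 0).
test. <- subset(testm, value > 0)
print(test.)

# Petr's colour/symbol plot, only drawn in an interactive session.
if (interactive()) {
  plot(test.$value, col = as.numeric(test.$variable), pch = test.$A)
}
```

Each surviving row carries its own ids and its column label, which is what lets plotting functions colour or group the points without any manual indexing.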
Re: [R] OK - I got the data - now what? :-)
Hi.

As I said in my first email, converting your data into a long format makes a lot of sense. I'm sorry that you find it "hard ... to understand why this would make plotting easier".

Wide format:

    Subject ID, Experiment ID, humidity, light, whatever, T1, T2, T3, T4

is much better rotated to be:

    Subject ID, Experiment ID, humidity, light, whatever, time, result

So you end up with multiple rows per patient/individual/experiment. It is much easier to analyse and plot data like this, particularly if the original data is ragged, i.e. you have a different number of measurements per patient/individual/experiment.

Many plotting functions will support connecting related data (e.g. by virtue of a particular identifier) and support much of what you are likely to want (different plotting symbols, panelled plots depending on experimental conditions etc.) without you having to work through the data manually as you are suggesting.

Best wishes,
Mark

2009/7/6 Mark Knecht <markkne...@gmail.com>:

> [...] I know so little about R that I'm sort of drowning at this point, so it's hard for me to understand why this would make plotting easier. Analysis possibly. [...]

--
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK
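The point about plotting functions connecting related rows can be illustrated with lattice, which ships with standard R distributions. This is a sketch on made-up data: the column names experiment, light, time and result echo the long layout described above, and the values are invented.

```r
library(lattice)  # recommended package, included with standard R installs

# Made-up long-format data: three experiments of different lengths,
# each run under a hypothetical 'light' condition.
long <- data.frame(
  experiment = factor(rep(c("E1", "E2", "E3"), times = c(4, 2, 5))),
  light      = factor(rep(c("low", "high", "low"), times = c(4, 2, 5))),
  time       = c(1:4, 1:2, 1:5),
  result     = c(0.27, 0.66, 0.24, 0.55,
                 1.00, 0.71,
                 0.11, 0.50, 0.98, 0.07, 0.94)
)

# One connected line per experiment, one panel per light condition.
p <- xyplot(result ~ time | light, data = long, groups = experiment,
            type = "b", auto.key = list(columns = 3))

# Building the plot object needs no graphics device; only draw it
# when someone is there to look at it.
if (interactive()) print(p)
```

The ragged lengths need no special handling: each experiment simply contributes as many rows as it has measurements.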
Re: [R] OK - I got the data - now what? :-)
Hi Mark,

Don't be the least bit sorry that I'm finding any of this hard to understand. That's my problem. I ordered Phil Spector's Data Manipulation with R (Use R!) book last night, as I realize I need to go through some sort of training. Hopefully that will help clear up some of my questions about the language in general without burdening this list so much.

This morning, taking your input to heart, I started working more with Hadley's code example. ReShape is pretty slick. I added

    MyExperiments <- cast(MyResults, A ~ variable)

which uses cast to put the molten data back into a data.frame, and got a new data.frame that looks like it's more or less ready to print. Note that I'm not attached to data.frames. It's just that I get one with read.csv and then don't know when to change it to something else. (Maybe this is the point to switch to a list or some other type?)

That done, MyExperiments[1,] gives me back the data for experiment #1 with the experiment number in column 1. If I can figure out how to get rid of that then I think I can get the experiment plotted. Put that in a loop and I should get 1000 experiments plotted, which is my goal.

This is all very cool, as it turns out to be very few lines of code to dig through the array. I'll have a couple of other problems (for me) in working with the real array, as the name space is much bigger and I need to learn how to build things like (C1,C2,C3, ...,C1200) automatically, but I'm sure there's a way to do that.

Note that the code below can 'fail' in the sense of having NA's in the middle, because runif doesn't guarantee 0's to the right. My real data won't have that problem.

Thanks,
Mark

    library(reshape)

    test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10), C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
    test <- round(test, 2)

    #Make array ragged
    test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
    test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
    test$C6[7] <- 0
    test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0

    #Print array
    test

    #Display column names
    names(test)

    ReShapeX <- melt(test, id = c("A", "B"))
    MyResults <- subset(ReShapeX, value > 0)
    names(MyResults)
    MyResults

    MyExperiments <- cast(MyResults, A ~ variable)
    class(MyExperiments)
    MyExperiments[1,]
    MyExperiments[2,]
    MyExperiments[3,]

On Mon, Jul 6, 2009 at 3:13 AM, Mark Wardle <m...@wardle.org> wrote:

> Hi. As I said in my first email, converting your data into a long format makes a lot of sense. I'm sorry that you find it "hard ... to understand why this would make plotting easier". [...]
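Two of the open questions in Mark's message, building the names C1 to C1200 automatically and dropping the experiment number before plotting, can be sketched in base R as below. The small MyExperiments table here is made up to resemble the shape of a cast result (with NA where an experiment had already ended); it is not produced by cast itself, and the names present and rows are hypothetical.

```r
# Build the reading-column names instead of typing them out.
n_cols     <- 1200
value_cols <- paste0("C", seq_len(n_cols))
head(value_cols, 3)

# Made-up wide table shaped like the cast() result: experiment id in
# column A, NA where an experiment had no further readings.
MyExperiments <- data.frame(A  = 1:3,
                            C1 = c(0.5, 0.2, 0.9),
                            C2 = c(0.3, 0.7, 0.1),
                            C3 = c(0.6, NA,  0.2))

# Keep only the reading columns that exist here; this drops column A.
present <- intersect(value_cols, names(MyExperiments))

# One numeric vector per experiment, with trailing NAs removed.
rows <- lapply(seq_len(nrow(MyExperiments)), function(i) {
  y <- unlist(MyExperiments[i, present])
  y[!is.na(y)]
})

# Draw one plot per experiment, only in an interactive session.
if (interactive()) {
  for (i in seq_along(rows)) {
    plot(rows[[i]], type = "b", main = paste("Experiment", MyExperiments$A[i]))
  }
}
```

Selecting columns by name rather than by position also keeps the code working if the number of id columns in front of C1 changes.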
Re: [R] OK - I got the data - now what? :-)
Hi.

Essentially your data is currently in wide format, with repeated measures in different columns. For most analysis, and in particular for graphing, it is frequently helpful to reshape your data into a long format, with one row per data value and additional variables to list experiment or subject identifier, experimental conditions etc.

See ?reshape and Dr. Wickham's reshape package (http://had.co.nz/reshape/).

Good luck,
Mark

2009/7/5 Mark Knecht <markkne...@gmail.com>:

> OK, I guess I'm getting better at the data part of R. I wrote a program outside of R this morning to dump a bunch of experimental data. It's a sort of ragged array - about 700 rows and 400 columns, but the amount of data in each column varies based on the length of the experiment. The real data ends with a 0 following some non-zero value. It might be as short as 5 to 10 columns or as many as 390. The first 9 columns contain some data about when the experiment was run and a few other things I thought I might be interested in later. All the data starts in column 10 and has headers saying C1, C2, C3, C4, etc., up to C390. The first value for every experiment is some value I will normalize, and then the values following are above and below the original, tracing out the path that the experiment took, ending somewhere to the right but not at a fixed number of readings.
>
> R reads it in fine and it looks good so far.
>
> Now, what I thought I might do with R is plot all 700 rows as individual lines, giving them some color based on info in columns 1-9, but suddenly I'm lost again in plots which I think should be fairly easy.
>
> How would I go about creating a plot for even one line, much less all of them? I don't have a row with 1,2,3,4 to use as the X axis values. I could go back and put one in the data, but then I don't think that should really be required, or I could go back and make the headers for the whole array 1:400 and then plot from 10:400, but I thought I read that headers cannot start with numbers.
>
> Maybe the X axis values for a plot can actually be non-numeric C1, C2, C3, C4, etc. and I could use line (C1,0) to (C2,5) and so on? Or maybe I should strip the C from C1 and be left with 1? Maybe the best thing is to copy the data for one line to another data.frame or array and then plot that?
>
> Just sort of lost looking at help files. Thanks for any ideas you can send along. Ask questions if I didn't explain my problem well enough. Not looking for anyone to do my work, just trying to get the concepts right.
>
> Cheers,
> Mark

--
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK
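On the question of X axis values in the original post: a numeric axis can be derived from the header names by stripping the leading C, as Mark guesses. The sketch below uses made-up readings for a single row, and it assumes that "normalize" means dividing by the first value, which the post does not spell out.

```r
# Made-up readings for one experiment, named like the column headers.
row_vals <- c(C1 = 1.00, C2 = 1.05, C3 = 0.97, C4 = 1.02)

# Strip the leading "C" from each header to get 1, 2, 3, ...
x <- as.integer(sub("^C", "", names(row_vals)))

# Assumed normalisation: express every reading relative to the first.
y <- row_vals / row_vals[1]

# Line plot of one experiment, only drawn in an interactive session.
if (interactive()) {
  plot(x, y, type = "l", xlab = "reading", ylab = "normalised value")
}
```

When the readings are consecutive, plot(y, type = "l") gives the same picture, since R uses the index 1, 2, 3, ... as the X axis when only one vector is supplied.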
Re: [R] OK - I got the data - now what? :-)
On Sat, Jul 4, 2009 at 5:22 PM, jim holtman <jholt...@gmail.com> wrote:

> See if this example helps; it shows how to plot either the rows or the columns of a data frame:
>
>     > test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
>     > test
>                C1        C2        C3
>     1  0.91287592 0.3390729 0.4346595
>     2  0.29360337 0.8394404 0.7125147
>     3  0.45906573 0.3466835 0.344
>     4  0.33239467 0.3337749 0.3253522
>     5  0.65087047 0.4763512 0.7570871
>     6  0.25801678 0.8921983 0.2026923
>     7  0.47854525 0.8643395 0.7111212
>     8  0.76631067 0.3899895 0.1216919
>     9  0.08424691 0.7773207 0.2454885
>     10 0.87532133 0.9606180 0.1433044
>     > # this will plot each column (C1, C2, C3)
>     > matplot(test, type='o')
>     > # plot each row
>     > matplot(t(test), type='o')
>
> On Sat, Jul 4, 2009 at 8:02 PM, Mark Knecht <markkne...@gmail.com> wrote:
> > OK, I guess I'm getting better at the data part of R. [...] How would I go about creating a plot for even one line, much less all of them? [...]
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390

Hey Jim,

Thanks for the pointers on matplot. I suspect that will be useful one of these days.

I'm attaching a little code at the bottom to make a test case closer to what I have to deal with. My problem with your data was that you plot everything. In my data I need to plot only a portion of it, and in the array not every cell is valid - I don't want to plot cells that have 0.00 as a value. In the array 'test' I need to plot the general area defined by C1:C6, each row as a line, but stop plotting each row when I run into a 0.

Keep in mind that I don't know what column C1 starts in. It is likely to change over time.

I think the root cause of a number of my coding problems in R right now is my lack of skills in reading and grabbing portions of the data out of arrays. I'm new at this. (And not a programmer) I need to find some good examples to read and test on that subject.

If I could locate which column was called C1, then read row 3 from C1 up to the last value before a 0, I'd have proper data to plot for one line. Repeat as necessary through the array and I get all the lines. Doing the lines one at a time should allow me the opportunity to apply color or not plot based on values in the first few columns.

Thanks,
Mark

    test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10), C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
    test <- round(test, 2)

    #Make array ragged
    test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
    test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
    test$C6[7] <- 0
    test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0

    #Print array
    test
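The steps Mark lists (find the column called C1, read one row from there up to the last value before a 0, repeat for every row) can be sketched directly with base-R indexing. This is one possible approach on a small made-up table, not code from the thread. The helper trim_at_zero is hypothetical, and the sketch assumes every column from C1 to the last column holds readings.

```r
# Small made-up table: id columns first, then zero-padded readings.
test <- data.frame(A  = 1:3, B = 100,
                   C1 = c(0.41, 0.77, 0.12),
                   C2 = c(0.93, 0.58, 0.36),
                   C3 = c(0.25, 0,    0.84),
                   C4 = c(0.66, 0,    0))

# Position of the column named C1, wherever it happens to sit.
first_col  <- match("C1", names(test))
# Assumption: everything from C1 to the last column is a reading.
value_cols <- first_col:ncol(test)

# Hypothetical helper: the readings of row i up to, but not
# including, the first zero.
trim_at_zero <- function(df, i, cols) {
  y <- unlist(df[i, cols])
  zero_at <- which(y == 0)
  if (length(zero_at) > 0) y <- y[seq_len(zero_at[1] - 1)]
  y
}

trim_at_zero(test, 2, value_cols)

# One line per row, coloured by row number, in an interactive session.
if (interactive()) {
  plot(NULL, xlim = c(1, length(value_cols)), ylim = c(0, 1),
       xlab = "reading", ylab = "value")
  for (i in seq_len(nrow(test))) {
    lines(trim_at_zero(test, i, value_cols), col = i)
  }
}
```

The colour argument in the loop could equally be computed from the id columns A and B, which is the per-row control Mark is after.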
Re: [R] OK - I got the data - now what? :-)
On Sun, Jul 5, 2009 at 12:00 AM, Mark Wardle <m...@wardle.org> wrote:

> [...] see ?reshape and Dr. Wickham's reshape package (http://had.co.nz/reshape/)

This looks interesting. Thanks!

cheers,
Mark
Re: [R] OK - I got the data - now what? :-)
On Jul 5, 2009, at 9:53 AM, Mark Knecht wrote:

> [...] If I could locate which column was called C1, then read row 3 from C1 up to the last value before a 0, I'd have proper data to plot for one line. [...]

See ?"[" for the help page on Extract, which is a gold mine of useful methods.

A single row can be extracted with:

    test[3, ]

Two rows:

    test[3:4, ]

And individual elements of a vector can be further specified:
Re: [R] OK - I got the data - now what? :-)
On Sun, Jul 5, 2009 at 7:35 AM, David Winsemius <dwinsem...@comcast.net> wrote:
> On Jul 5, 2009, at 9:53 AM, Mark Knecht wrote:
>> On Sat, Jul 4, 2009 at 5:22 PM, jim holtman <jholt...@gmail.com> wrote:
>>> See if this example helps; it shows how to plot either the rows or the columns of a data frame:
>>>
>>> test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
>>> test
>>>            C1        C2        C3
>>> 1  0.91287592 0.3390729 0.4346595
>>> 2  0.29360337 0.8394404 0.7125147
>>> 3  0.45906573 0.3466835 0.344
>>> 4  0.33239467 0.3337749 0.3253522
>>> 5  0.65087047 0.4763512 0.7570871
>>> 6  0.25801678 0.8921983 0.2026923
>>> 7  0.47854525 0.8643395 0.7111212
>>> 8  0.76631067 0.3899895 0.1216919
>>> 9  0.08424691 0.7773207 0.2454885
>>> 10 0.87532133 0.9606180 0.1433044
>>>
>>> # this will plot each column (C1, C2, C3)
>>> matplot(test, type='o')
>>> # plot each row
>>> matplot(t(test), type='o')
>>>
>>> On Sat, Jul 4, 2009 at 8:02 PM, Mark Knecht <markkne...@gmail.com> wrote:
>>>> OK, I guess I'm getting better at the data part of R. I wrote a program outside of R this morning to dump a bunch of experimental data. It's a sort of ragged array - about 700 rows and 400 columns, but the amount of data in each row varies based on the length of the experiment. The real data ends with a 0 following some non-zero value. It might be as short as 5 to 10 columns or as many as 390. The first 9 columns contain some data about when the experiment was run and a few other things I thought I might be interested in later. All the data starts in column 10 and has headers saying C1, C2, C3, C4, etc., up to C390.
>>>>
>>>> The first value for every experiment is some value I will normalize, and then the values following are above and below the original, tracing out the path that the experiment took, ending somewhere to the right but not after a fixed number of readings.
>>>>
>>>> R reads it in fine and it looks good so far.
>>>>
>>>> Now, what I thought I might do with R is plot all 700 rows as individual lines, giving them some color based on info in columns 1-9, but suddenly I'm lost again in plots which I think should be fairly easy.
>>>>
>>>> How would I go about creating a plot for even one line, much less all of them? I don't have a row with 1,2,3,4 to use as the X axis values. I could go back and put one in the data, but I don't think that should really be required, or I could go back and make the headers for the whole array 1:400 and then plot from 10:400, but I thought I read that headers cannot start with numbers. Maybe the X axis values for a plot can actually be non-numeric C1, C2, C3, C4, etc. and I could use line (C1,0) to (C2,5) and so on? Or maybe I should strip the C from C1 and be left with 1? Maybe the best thing is to copy the data for one line to another data.frame or array and then plot that?
>>>>
>>>> Just sort of lost looking at help files. Thanks for any ideas you can send along. Ask questions if I didn't explain my problem well enough. Not looking for anyone to do my work, just trying to get the concepts right.
>>>>
>>>> Cheers,
>>>> Mark
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>
>> Hey Jim,
>> Thanks for the pointers on matplot. I suspect that will be useful one of these days.
>>
>> I'm attaching a little code at the bottom to make a test case closer to what I have to deal with. My problem with your data was that you plot everything. In my data I need to plot only a portion of it, and in the array not every cell is valid - I don't want to plot cells that have 0.00 as a value.
>>
>> In the array 'test' I need to plot the general area defined by C1:C6, each row as a line, but stop plotting each row when I run into a 0. Keep in mind that I don't know what column C1 starts in. It is likely to change over time.
>>
>> I think the root cause of a number of my coding problems in R right now is my lack of skills in reading and grabbing portions of the data out of arrays. I'm new at this. (And not a programmer.) I need to find some good examples to read and test on that subject. If I could locate which column was called C1, then read row 3 from C1 up to the last value before a 0, I'd have proper data to plot for one line. Repeat as necessary through the array and I get all the lines. Doing the lines one at a time should allow me the opportunity to apply color or not plot based on values in the first few columns.
>>
>> Thanks,
>> Mark
>>
>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10), C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>> test <- round(test,2)
>>
>> #Make array ragged
>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>> test$C6[7]<-0
>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>
>> #Print array
>> test
>
> ?"[" for the help page on Extract, which is a gold mine of useful methods
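The requirement described above (one line per row, nothing drawn for the zero padding, C columns located by name) can be sketched with the matplot() call Jim suggested. This is only a sketch under stated assumptions: the small data frame holds made-up values, the pattern "^C[0-9]+$" is assumed to match exactly the measurement columns, and a 0 is assumed never to be a genuine reading, so every 0 can be turned into NA.

```r
# Small fixed stand-in for the 'test' data frame (values are made up).
test <- data.frame(A  = 1:3, B = 100,
                   C1 = c(0.52, 0.33, 0.71),
                   C2 = c(0.66, 0.84, 0.25),
                   C3 = c(0.51, 0.51, 0.00),
                   C4 = c(0.00, 0.86, 0.00))

# Positions of C1, C2, ... wherever they happen to start.
c_cols <- grep("^C[0-9]+$", names(test))

vals <- as.matrix(test[, c_cols])
vals[vals == 0] <- NA   # assumption: 0 only ever marks padding

# One line per original row; the x axis is simply the measurement index,
# so no extra row of 1, 2, 3, ... is needed. Colour is taken from column A.
matplot(t(vals), type = "o", pch = 1, lty = 1, col = test$A,
        xlab = "measurement", ylab = "value")
```

Points that are NA are skipped by the plotting functions, so each line simply stops where the padding begins.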
Re: [R] OK - I got the data - now what? :-)
On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
> David Winsemius wrote:
>> So if your values are calculated from other values then consider using all.equal(). And repeated applications of the testing criteria process are effective:
>>
>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>     C1   C2   C3
>> 3 0.52 0.66 0.51
>>
>> (and a warning that does not seem accurate to me.)
>>
>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>   numerical expression has 3 elements: only the first used
>
> David,
>
> # which(test[3,] == 0.0)
> [1] 6 7 8
>
> and in a:b, a and b must be length 1 vectors (scalars); otherwise just the first element (in this case 6) is used. That leads us to the conclusion that writing the line above is not really the cleanest way, or you intended something different.

Thanks, Uwe. I see my confusion. I did want 6 to be used, and it looks as though I would not be getting into trouble this way, but a cleaner method would be to access only the first element of which(test[3, ] == 0):

test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]

David

>> Seems to me that all of the elements were used. I cannot explain that warning but am pretty sure it can be ignored.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
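As a small illustration of the point about a:b, here is a sketch that uses a made-up vector in place of which(test[3, ] == 0). The variable names are illustrative only.

```r
zeros <- c(6, 7, 8)      # stand-in for which(test[3, ] == 0)

# Writing 3:(zeros - 1) would hand ':' a length-3 right operand, which is
# what triggers the "only the first used" warning quoted in the thread.
# Picking the first element explicitly avoids the ambiguity:
end_col <- zeros[1] - 1
idx <- 3:end_col
idx
```

Here idx holds the column positions 3, 4 and 5, i.e. C1 to C3 in the 'test' layout used in this thread.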
Re: [R] OK - I got the data - now what? :-)
On Sun, Jul 5, 2009 at 8:18 AM, David Winsemius <dwinsem...@comcast.net> wrote:
> Thanks, Uwe. I see my confusion. I did want 6 to be used, and it looks as though I would not be getting into trouble this way, but a cleaner method would be to access only the first element of which(test[3, ] == 0):
>
> test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]

OK - making lots more headway. Thanks for your help.

QUESTION: How do I handle the case where I'm testing for 0 and don't find it? In this case I need all of the row from C1:C6.

test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10), C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
test <- round(test,2)

#Make array ragged
test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
test$C6[7]<-0
test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0

test

#C1 always the same so calculate it only once
StartCol <- which(names(test)=="C1")

#Print row 3 explicitly
test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]

#Row 6 fails because 0 is not found
test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]

EndCol <- which(test[6,] == 0.0)[1]-1
EndCol

Thanks,
Mark
Re: [R] OK - I got the data - now what? :-)
On Jul 5, 2009, at 12:19 PM, Mark Knecht wrote:
> OK - making lots more headway. Thanks for your help.
>
> QUESTION: How do I handle the case where I'm testing for 0 and don't find it? In this case I need all of the row from C1:C6.
>
> #Row 6 fails because 0 is not found
> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>
> EndCol <- which(test[6,] == 0.0)[1]-1
> EndCol

It's getting a bit Baroque, but here is a solution that handles an NA:

test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) , ncol(test), which(test[6,] == 0.0)[1]-1 ) ]
#-
    C1   C2   C3   C4   C5   C6
6 0.33 0.84 0.51 0.86 0.84 0.15

Maybe an R-meister can offer something more compact?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
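The NA case can also be handled with a plain if/else wrapped in a function. The following is only a sketch: row_span() is a hypothetical helper, not something posted in the thread, and the data frame holds made-up values with the same layout as 'test'. Like the expressions above, it searches the whole row for a 0, so it assumes the leading descriptive columns never contain one.

```r
# Made-up stand-in for 'test': row 1 is zero-padded, row 2 is full.
test <- data.frame(A  = 1:2, B = 100,
                   C1 = c(0.52, 0.33), C2 = c(0.66, 0.84),
                   C3 = c(0.51, 0.51), C4 = c(0.00, 0.86))
start_col <- which(names(test) == "C1")

# Hypothetical helper: columns from 'start' up to the one before the
# first 0 in row i, or to the end of the row when no 0 is found.
row_span <- function(df, i, start) {
  first_zero <- which(df[i, ] == 0)[1]     # NA when the row has no 0
  end  <- if (is.na(first_zero)) ncol(df) else first_zero - 1
  cols <- seq_len(end)
  cols <- cols[cols >= start]              # empty if the 0 sits in 'start'
  df[i, cols, drop = FALSE]
}

row_span(test, 1, start_col)   # C1 to C3
row_span(test, 2, start_col)   # C1 to C4, no 0 found
```

Building the column index with seq_len() and a filter, rather than start:end, means a 0 in the very first measurement column gives an empty result instead of a reversed range.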
Re: [R] OK - I got the data - now what? :-)
David Winsemius wrote:
> It's getting a bit Baroque, but here is a solution that handles an NA:
>
> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) , ncol(test), which(test[6,] == 0.0)[1]-1 ) ]
> #-
>     C1   C2   C3   C4   C5   C6
> 6 0.33 0.84 0.51 0.86 0.84 0.15
>
> Maybe an R-meister can offer something more compact?

So let's wait for some R-meister; I'd write even more.

Reason: testing for exactly zero after possible calculations is a bit dangerous, and ifelse() is designed for vectorized operations but is not efficient for scalar operations, particularly since both expressions are evaluated, so if() else would be preferable, but we could use min() instead. Finally, a:b could end up in 5:3 without a warning, and I'd use seq() instead.

Hence I'd prefer:

temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]
test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]

Best,
Uwe Ligges
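Two of the reasons given above can be checked directly at the console. This is a minimal sketch with made-up numbers; the names x and res are illustrative only.

```r
# 1. Exact comparison with zero after a calculation.
x <- 0.1 + 0.2 - 0.3
x == 0                     # FALSE: rounding leaves a tiny non-zero value
isTRUE(all.equal(x, 0))    # TRUE: equal within all.equal()'s tolerance

# 2. a:b counts down silently when a > b, while seq(..., by = 1) stops
#    with an error, which is easier to notice.
3:2
res <- tryCatch(seq(from = 3, to = 2, by = 1),
                error = function(e) "error")
res
```

For values that have only been rounded and then set to 0 explicitly, as in the 'test' example, == 0 works; the all.equal() form matters once the zeros come out of arithmetic.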
Re: [R] OK - I got the data - now what? :-)
2009/7/5 Uwe Ligges <lig...@statistik.tu-dortmund.de>:
> Hence I'd prefer:
>
> temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]
> test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]

I appreciate both of the answers. I don't completely understand them, but I do appreciate them. Thanks!

I was wondering whether it's easy to simply test the last column for ==0, and if true run the previous command, if false just return everything up to the end of the row?

Currently my data is one experiment per row, but that's wasting space, as most experiments only take 20% of the row and 80% of the row is filled with 0's. I might want to make the array narrower and have a flag somewhere in the first 10 columns that says that this row is a continuation row from the previous row. That way I could pack the array better, use less memory, and when I do finally test for 0 I have a short line to traverse? Just an idea.

Anyway, I suspect either of these will suit my short term needs. On to the next step.

Cheers,
Mark
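The "test the last column first" idea can be sketched for all rows at once. The sketch assumes that zeros only ever appear as trailing padding, and it uses a made-up data frame with the same layout as 'test'; n_valid is an illustrative name.

```r
test <- data.frame(A  = 1:3, B = 100,
                   C1 = c(0.52, 0.33, 0.71),
                   C2 = c(0.66, 0.84, 0.25),
                   C3 = c(0.51, 0.51, 0.00),
                   C4 = c(0.00, 0.86, 0.00))

vals <- as.matrix(test[, grep("^C[0-9]+$", names(test))])
last <- ncol(vals)

# Number of valid readings in each row.
n_valid <- sapply(seq_len(nrow(vals)), function(i) {
  if (vals[i, last] != 0) {
    last                                   # no padding: keep the whole row
  } else {
    unname(which(vals[i, ] == 0)[1]) - 1   # padded: stop before the first 0
  }
})
n_valid
```

With the counts in hand, the readings of row i are vals[i, seq_len(n_valid[i])], which stays correct even when a count is 0.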
Re: [R] OK - I got the data - now what? :-)
On Jul 5, 2009, at 1:19 PM, Uwe Ligges wrote:
> <snipped preamble>
>
> Reason: testing for exactly zero after possible calculations is a bit dangerous, and ifelse() is designed for vectorized operations but is not efficient for scalar operations, particularly since both expressions are evaluated, so if() else would be preferable, but we could use min() instead. Finally, a:b could end up in 5:3 without a warning, and I'd use seq() instead.
>
> Hence I'd prefer:
>
> temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]

This appears to be a learning moment for me. Do I have it correctly that the first argument to sapply, the vector test[6,], gets passed element-wise to the first parameter of the function, x, and the second argument, 0, is getting passed via recycling to the second parameter, y, through the "..." mechanism of the sapply function?

> test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]

I had tried a min() solution and got Inf in return when there was an NA in the vector, but did not realize that it had an na.rm mode.

Thanks for the meisterhaft corrections.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] OK - I got the data - now what? :-)
David Winsemius wrote:
> On Jul 5, 2009, at 1:19 PM, Uwe Ligges wrote:
>> temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]
>
> This appears to be a learning moment for me. Do I have it correctly that the first argument to sapply, the vector test[6,], gets passed element-wise to the first parameter of the function, x,

Yes.

> and the second argument, 0, is getting passed via recycling to the second parameter, y, through the "..." mechanism of the sapply function?

No, each time the whole thing (which is just 0 here) is passed on by sapply to the function, not via recycling.

>> test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]
>
> I had tried a min() solution and got Inf in return when there was an NA in the vector, but did not realize that it had an na.rm mode.
>
> Thanks for the meisterhaft corrections.

:-)

Uwe
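The way sapply() hands over the extra argument can be seen in a small sketch. The function name same and the list row6 are made up for illustration; row6 stands in for test[6, ].

```r
# Named two-argument function, equivalent to the anonymous one above.
same <- function(x, y) isTRUE(all.equal(x, y))

row6 <- list(C1 = 0.33, C2 = 0.84, C3 = 0)   # made-up stand-in for a row

# sapply() calls same(element, 0) once per element: each element becomes x,
# and the extra argument 0 is passed along unchanged as y on every call.
hits <- sapply(row6, same, 0)
hits
which(hits)[1]     # position of the first (near-)zero element

# na.rm = TRUE lets min() skip the NA that arises when no zero is found.
min(c(NA, 8), na.rm = TRUE)
```

A data frame row works the same way as the list here, because a data frame is a list of columns.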
Re: [R] OK - I got the data - now what? :-)
Try this:

subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]

subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]

On Sun, Jul 5, 2009 at 1:19 PM, Mark Knecht <markkne...@gmail.com> wrote:
> OK - making lots more headway. Thanks for your help.
>
> QUESTION: How do I handle the case where I'm testing for 0 and don't find it? In this case I need all of the row from C1:C6.
>
> #Row 6 fails because 0 is not found
> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>
> Thanks,
> Mark

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
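The one-liner above can be unpacked into three steps. This is a sketch on a made-up data frame (columns C1 to C4 instead of C1 to C6); cols and keep are illustrative names.

```r
test <- data.frame(A  = 1:2, B = 100,
                   C1 = c(0.52, 0.33), C2 = c(0.66, 0.84),
                   C3 = c(0.51, 0.51), C4 = c(0.00, 0.86))

# Step 1: keep only the measurement columns of one row.
cols <- subset(test[1, ], select = C1:C4)

# Step 2: one TRUE/FALSE per column, TRUE where the value is positive.
keep <- unlist(cols) > 0

# Step 3: use that logical vector to pick columns.
cols[, keep, drop = FALSE]
```

Note that this keeps every positive value rather than stopping at the first 0, so it matches the other solutions only under the assumption that zeros occur solely as trailing padding and that real readings are never 0 or negative.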
Re: [R] OK - I got the data - now what? :-)
On Sun, Jul 5, 2009 at 12:30 PM, Henrique Dallazuanna <www...@gmail.com> wrote:
> Try this:
>
> subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]
>
> subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]

I must admit I like this one. Pleasing to look at. It seems approachable. Thanks!

If I understand this, the second subset gets evaluated first, producing either TRUE or FALSE for each column, and then the first subset gets evaluated but only for the entries that are TRUE? Is that the process?

Thanks,
Mark
Re: [R] OK - I got the data - now what? :-)
Yes. First, select only columns C1 to C6, then look for values greater than 0, and afterwards use this to select the columns in the original subset.

On Sun, Jul 5, 2009 at 4:48 PM, Mark Knecht markkne...@gmail.com wrote:
> On Sun, Jul 5, 2009 at 12:30 PM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
>
>     subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]
>     subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]
>
> I must admit I like this one. Pleasing to look at. It seems approachable. Thanks!
>
> If I understand this, the second subset gets evaluated first, producing either TRUE or FALSE, and then the first subset gets evaluated but only for the entries that are TRUE? Is that the process?
>
> Thanks,
> Mark

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O
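The order of evaluation described in this exchange can be made visible by splitting the one-liner into steps. The following is a sketch on a made-up two-row data frame; it uses unlist() to turn the one-row selection into a plain logical vector, which is a small deviation from the original expression.

```r
# Made-up example data in the same shape as 'test'
test <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.52, 0.31), C2 = c(0.66, 0.45),
                   C3 = c(0.51, 0.90), C4 = c(0.00, 0.28))

vals <- subset(test[1, ], select = C1:C4)  # step 1: keep only the data columns
keep <- unlist(vals) > 0                   # step 2: one TRUE/FALSE per column
keep
kept <- vals[, keep, drop = FALSE]         # step 3: keep columns flagged TRUE
kept
```

So the comparison yields one logical value per column, not a single TRUE or FALSE, and that vector then picks the columns.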
Re: [R] OK - I got the data - now what? :-)
On Sun, Jul 5, 2009 at 1:00 PM, Henrique Dallazuanna www...@gmail.com wrote:
> Yes. First, select only columns C1 to C6, then look for values greater than 0, and afterwards use this to select the columns in the original subset.
>
> On Sun, Jul 5, 2009 at 4:48 PM, Mark Knecht markkne...@gmail.com wrote:
> On Sun, Jul 5, 2009 at 12:30 PM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
>
>     subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]
>     subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]

Thanks for the further explanation.

One small difference in this approach is that in the general case I have to supply the name of the last column, whereas the other just starts at the beginning and goes until it's done. No big deal, and possibly an advantage, as I could search a subset of the data on the row, i.e. supply both the start and stop columns, for instance C61:C120. This could be valuable as each column generally represents 1 minute further into the experiment, so that range would look at the second hour only.

Cheers,
Mark
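The idea of supplying both a start and a stop column could look roughly like this. The helper name row_window and the data are assumptions made up for the sketch; a range such as C61:C120 would be passed the same way on the real data.

```r
# Made-up example data: six readings per experiment, zero-padded
test <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.52, 0.31), C2 = c(0.66, 0.45),
                   C3 = c(0.51, 0.00), C4 = c(0.12, 0.00),
                   C5 = c(0.00, 0.00), C6 = c(0.00, 0.00))

# Hypothetical helper: non-zero values of row i between two named columns
row_window <- function(df, i, from, to) {
  cols <- match(from, names(df)):match(to, names(df))
  vals <- df[i, cols, drop = FALSE]
  vals[, unlist(vals) > 0, drop = FALSE]
}

row_window(test, 1, "C3", "C6")   # experiment still running: C3 and C4
row_window(test, 2, "C3", "C6")   # experiment already over: no columns left
```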
Re: [R] OK - I got the data - now what? :-)
> I think the root cause of a number of my coding problems in R right now is my lack of skills in reading and grabbing portions of the data out of arrays. I'm new at this. (And not a programmer.) I need to find some good examples to read and test on that subject.
>
> If I could locate which column was called C1, then read row 3 from C1 up to the last value before a 0, I'd have proper data to plot for one line. Repeat as necessary through the array and I get all the lines. Doing the lines one at a time should allow me the opportunity to apply color or not plot based on values in the first few columns.
>
> Thanks,
> Mark
>
>     test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10), C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>     test <- round(test,2)
>     #Make array ragged
>     test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>     test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>     test$C6[7]<-0
>     test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>     #Print array
>     test

Are the zeros always going to be arranged like this? I.e., for each experiment there is a point at which all later values are zero? If so, the following is a much simpler way of getting to the core of your data, without fussing with overly complicated matrix indexing:

    library(reshape)
    testm <- melt(test, id = c("A", "B"))
    subset(testm, value > 0)

I suspect you will also find this form easier to plot and analyse.

Hadley

--
http://had.co.nz/
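For readers without the reshape package installed, the shape of the result can be approximated in base R. This is a sketch of the long format only, not the package's implementation; the data values, the data_cols pattern, and the extra minute column are assumptions.

```r
# Made-up example data: A and B are ID columns, C1..C3 are readings
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.52, 0.31, 0.77),
                   C2 = c(0.66, 0.45, 0.00),
                   C3 = c(0.51, 0.00, 0.00))

data_cols <- grep("^C[0-9]+$", names(test), value = TRUE)

# One row per (experiment, reading): ID columns repeated, readings stacked
long <- data.frame(
  A        = rep(test$A, times = length(data_cols)),
  B        = rep(test$B, times = length(data_cols)),
  variable = rep(data_cols, each = nrow(test)),
  value    = unlist(test[data_cols], use.names = FALSE)
)

long <- subset(long, value > 0)                          # drop the zero padding
long$minute <- as.integer(sub("^C", "", long$variable))  # C3 -> 3
long <- long[order(long$A, long$minute), ]
long
```

With a numeric minute column in place, each experiment is simply the set of rows sharing one value of A, so no per-row column arithmetic is needed.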
Re: [R] OK - I got the data - now what? :-)
On Sun, Jul 5, 2009 at 1:44 PM, hadley wickham h.wick...@gmail.com wrote:
> Are the zeros always going to be arranged like this? I.e., for each experiment there is a point at which all later values are zero? If so, the following is a much simpler way of getting to the core of your data, without fussing with overly complicated matrix indexing:
>
>     library(reshape)
>     testm <- melt(test, id = c("A", "B"))
>     subset(testm, value > 0)
>
> I suspect you will also find this form easier to plot and analyse.
>
> Hadley
>
> --
> http://had.co.nz/

Hi Hadley,

I wanted to look at reshape. Yes, there exists a point in each row (unless I get to the end with all numbers) where I get to a zero and everything to the right is zero.

I'm looking at reshape. It's interesting, but I clearly don't understand it yet, so I'm reading your "Reshaping data with the reshape package" from 11/07. Interesting.

I know so little about R that I'm sort of drowning at this point, so it's hard for me to understand why this would make plotting easier. Analysis, possibly. Just the way it goes when you get started with something new.

In reshape lingo I think I have IDs. They cover things like time, date, success/failure and a few other things of interest. Once the data starts on a row it is all data from there on to the end of the row.

My initial goal is to make a line plot of the data on a single row. All the data points should connect together. There is no real interaction planned with data on other rows, at least at this time.

Thanks for the pointers and the code stub. I'll be looking at this.

Cheers,
Mark
Re: [R] OK - I got the data - now what? :-)
At 10:42 AM -0700 7/5/09, Mark Knecht wrote:
> 2009/7/5 Uwe Ligges lig...@statistik.tu-dortmund.de:

- a lot of other conversation omitted, to focus on the following

> Currently my data is one experiment per row, but that's wasting space as most experiments only take 20% of the row and 80% of the row is filled with 0's. I might want to make the array more narrow and have a flag somewhere in the 1st 10 columns that says that this row is a continuation row from the previous row. That way I could pack the array better, use less memory, and when I do finally test for 0 I have a short line to traverse?
>
> Just an idea.
>
> Anyway, I suspect either of these will suit my short term needs. On to the next step.
>
> Cheers,
> Mark

This suggests the use of a list rather than a data frame. With a list object, each element in the list would represent one experiment, and each would have the appropriate number of elements (values) for that experiment.

Indeed, the original description,

At 5:02 PM -0700 7/4/09, Mark Knecht wrote:
> OK, I guess I'm getting better at the data part of R. I wrote a program outside of R this morning to dump a bunch of experimental data. It's a sort of ragged array - about 700 rows and 400 columns, but the amount of data in each column varies based on the length of the experiment. The real data ends with a 0 following some non-zero value. It might be as short as 5 to 10 columns or as many as 390. The first 9 columns contain some data about when the experiment was run and a few other things I thought I might be interested in later. All the data starts in column 10 and has headers saying C1, C2, C3, C4, etc., up to C390. The first value for every experiment is some value I will normalize and then the values following are above and below the original, tracing out the path that the experiment took, ending somewhere to the right but not a fixed number of readings.

is also suggestive of using a list(). For example, the metadata, i.e., the

> ... data about when the experiment was run and a few other things ...

could be held separately, instead of embedded in the same array, from which it always has to be excluded in order to do an analysis.

But I haven't followed the thread all that closely, so I confess that my thoughts might be off the mark.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
m...@llnl.gov
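The list layout suggested here might be sketched as follows, with the metadata kept in its own data frame and one numeric vector per experiment. The object names meta and runs and the example values are assumptions for illustration, and the sketch assumes zeros occur only as trailing padding.

```r
# Made-up example data: A and B stand in for the metadata columns
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.52, 0.31, 0.77),
                   C2 = c(0.66, 0.45, 0.00),
                   C3 = c(0.51, 0.00, 0.00))

meta   <- test[, c("A", "B")]                                 # metadata, held separately
values <- as.matrix(test[, grep("^C[0-9]+$", names(test))])   # readings only

# One list element per experiment, holding only its real readings
runs <- lapply(seq_len(nrow(values)), function(i) {
  v <- values[i, ]
  v[v > 0]
})
names(runs) <- meta$A

sapply(runs, length)   # number of readings per experiment
sapply(runs, mean)     # per-experiment summary without any zero handling
```

Each element can then be plotted or summarised directly, for example with plot(runs[["1"]], type = "b"), and meta can be consulted by row when choosing colours.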
Re: [R] OK - I got the data - now what? :-)
See if this example helps; it shows how to plot either the rows or the columns of a data frame:

    > test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
    > test
               C1        C2        C3
    1  0.91287592 0.3390729 0.4346595
    2  0.29360337 0.8394404 0.7125147
    3  0.45906573 0.3466835 0.344
    4  0.33239467 0.3337749 0.3253522
    5  0.65087047 0.4763512 0.7570871
    6  0.25801678 0.8921983 0.2026923
    7  0.47854525 0.8643395 0.7111212
    8  0.76631067 0.3899895 0.1216919
    9  0.08424691 0.7773207 0.2454885
    10 0.87532133 0.9606180 0.1433044
    > # this will plot each column (C1, C2, C3)
    > matplot(test, type='o')
    > # plot each row
    > matplot(t(test), type='o')

On Sat, Jul 4, 2009 at 8:02 PM, Mark Knecht markkne...@gmail.com wrote:
> OK, I guess I'm getting better at the data part of R. I wrote a program outside of R this morning to dump a bunch of experimental data. It's a sort of ragged array - about 700 rows and 400 columns, but the amount of data in each column varies based on the length of the experiment. The real data ends with a 0 following some non-zero value. It might be as short as 5 to 10 columns or as many as 390. The first 9 columns contain some data about when the experiment was run and a few other things I thought I might be interested in later. All the data starts in column 10 and has headers saying C1, C2, C3, C4, etc., up to C390. The first value for every experiment is some value I will normalize and then the values following are above and below the original, tracing out the path that the experiment took, ending somewhere to the right but not a fixed number of readings.
>
> R reads it in fine and it looks good so far.
>
> Now, what I thought I might do with R is plot all 700 rows as individual lines, giving them some color based on info in columns 1-9, but suddenly I'm lost again in plots which I think should be fairly easy. How would I go about creating a plot for even one line, much less all of them?
>
> I don't have a row with 1,2,3,4 to use as the X axis values. I could go back and put one in the data but then I don't think that should really be required, or I could go back and make the headers for the whole array 1:400 and then plot from 10:400, but I thought I read that headers cannot start with numbers. Maybe the X axis values for a plot can actually be non-numeric C1, C2, C3, C4, etc. and I could use line (C1,0) to (C2,5) and so on? Or maybe I should strip the C from C1 and be left with 1? Maybe the best thing is to copy the data for one line to another data.frame or array and then plot that?
>
> Just sort of lost looking at help files. Thanks for any ideas you can send along. Ask questions if I didn't explain my problem well enough. Not looking for anyone to do my work, just trying to get the concepts right.
>
> Cheers,
> Mark

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
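Applied to the zero-padded layout from the original question, the matplot() approach needs the padding turned into NA so that each line stops where its experiment ends. The sketch below assumes a genuine reading is never exactly 0; the ok column is a hypothetical stand-in for the metadata in columns 1-9, used only to pick a colour per experiment.

```r
# Made-up example data; 'ok' is a hypothetical success/failure flag
test <- data.frame(A  = 1:3,
                   ok = c(TRUE, FALSE, TRUE),
                   C1 = c(0.52, 0.31, 0.77),
                   C2 = c(0.66, 0.45, 0.12),
                   C3 = c(0.51, 0.90, 0.00),
                   C4 = c(0.00, 0.28, 0.00))

vals <- as.matrix(test[, grep("^C[0-9]+$", names(test))])
vals[vals == 0] <- NA                         # padding becomes missing data

line_col <- ifelse(test$ok, "blue", "red")    # one colour per experiment

# After transposing, each column is one experiment, so each gets one line;
# the x axis is simply the reading index 1, 2, 3, ...
matplot(t(vals), type = "o", pch = 1, lty = 1, col = line_col,
        xlab = "Reading", ylab = "Value")
```

No extra row of x values is required, because matplot() uses the row index of the transposed matrix as the x coordinate.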