[R] Help with Hmisc, cut2, split and quantile
Hello,

I have a set of data with two columns: Target and Actual. A sample (Sample_table.txt, http://n4.nabble.com/file/n1584647/Sample_table.txt) is attached, but the data looks like this:

    Actual    Target
    -0.125    0.016124906
     0.135    0.120799865
     ...      ...

I want to be able to break the data into tables based on quantiles in the Target column. I can see (using cut2, and also quantile) how to get the barrier points between the different quantiles, and I can see how I would achieve this if I were just looking to split up a vector. However, I am trying to break up the whole table based on those quantiles, not just the vector.

The following code shows me the ranges for the deciles of the Target data:

    library(Hmisc)
    read_data <- read.table("C:/Sample table.txt", header = TRUE)
    table(cut2(read_data$Target, g = 10))

However, I would like to be able to break the table into ten separate tables, each with both Actual and Target data, based on the Target data deciles:

    top_decile    = ...(top decile of read_data, based on Target data)
    next_decile   = ...and so on...
    bottom_decile = ...

That way I could manipulate the deciles, graph them separately (and together) and so on, just as easily as I can the whole table.

I'm sure this must be simple, but I can't see the way forward. I have also looked at split() and quantile() but have not been able to get them to achieve what I am after. Can anybody see a simple way forward on this?

Thanks,
Guy

--
View this message in context: http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1584647.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with Hmisc, cut2, split and quantile
On 2010-03-08 8:47, Guy Green wrote:
> [...] I would like to be able to break the table into ten separate
> tables, each with both Actual and Target data, based on the Target
> data deciles [...]

I would just add a factor variable indicating to which decile a particular observation belongs:

    dat$DEC <- with(dat, cut(Target, breaks=10, labels=1:10))

If you really want to have separate data frames, you can then split on the decile:

    L <- split(dat, dat$DEC)

-Peter Ehlers

--
Peter Ehlers
University of Calgary
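
As a rough illustration of the suggestion above (an editorial sketch, not part of the original reply), the following runs the same two steps on simulated data. The data frame `dat` and its values are made up here, since the attached sample file is not reproduced in the thread.

```r
# Simulated stand-in for the poster's table (hypothetical values, 63 rows).
set.seed(1)
dat <- data.frame(Actual = rnorm(63), Target = runif(63))

# Bin Target into 10 intervals and keep the bin as a factor column.
dat$DEC <- with(dat, cut(Target, breaks = 10, labels = 1:10))

# Split the whole data frame (all columns) by that factor.
L <- split(dat, dat$DEC)

names(L)         # one list element per factor level
sapply(L, nrow)  # number of rows in each piece
head(L[["1"]])   # each piece is itself a data frame with Actual, Target, DEC
```

Each element of `L` keeps every column of the original table, so it can be plotted or summarised like the full data frame.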
Re: [R] Help with Hmisc, cut2, split and quantile
Guy Green wrote:
> [...] while the data is broken into even data frames, the labels for
> the separate data frames are unuseable [...] Could anyone suggest a
> way of rearranging this to make the labels useable again?

Try

    as.numeric(read_data$DEC)

This should turn it into a numeric variable that you can work with.

hth
David Freedman
CDC, Atlanta
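
A small sketch of what `as.numeric()` does to a factor may help here. It returns the integer position of each value's level, which for an interval factor is the bin index. The vectors below are invented for illustration only.

```r
# A factor of interval labels, as produced by cut() (hypothetical values).
x <- c(0.05, 0.15, 0.25, 0.12)
f <- cut(x, breaks = c(0, 0.1, 0.2, 0.3))

levels(f)      # the interval labels, in increasing order
as.numeric(f)  # 1 2 3 2: the index of the interval each value falls in

# Caveat: the result is the level index, not the label text.
g <- factor(c("10", "20", "10"))
as.numeric(g)                # 1 2 1
as.numeric(as.character(g))  # 10 20 10
```

For a factor whose levels are ordered intervals, the level index is exactly the bin number, which is why the conversion is useful in this case.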
Re: [R] Help with Hmisc, cut2, split and quantile
Hi Peter and others,

Thanks (Peter) - that gets me really close to what I was hoping for. The one problem I have is that the cut approach breaks the data into intervals based on the absolute value of the Target data, rather than their frequency. In other words, if the data ranged from 0 to 50, the data would be separated into 0-5, 5-10 and so on, regardless of the frequency within those categories. However, I want to get the data into deciles.

The code that produces the equal-width intervals (incorporating Peter's) is:

    read_data <- read.table("C:/Sample table.txt", header = TRUE)
    read_data$DEC <- with(read_data, cut(Target, breaks=10, labels=1:10))
    L <- split(read_data, read_data$DEC)

This means that I can get separate data frames, such as L$'10', which comes out tidy, but only containing 2 data items (the sample has 63 rows, so each decile should have 6+ data items):

       Actual    Target DEC
    9   0.572 0.3778386  10
    31  0.299 0.3546606  10

If I try to adjust this to get deciles using cut2(), I can break the data into deciles as follows:

    read_data <- read.table("C:/Sample table.txt", header = TRUE)
    read_data$DEC <- with(read_data, cut2(read_data$Target, g=10), labels=1:10)
    L <- split(read_data, read_data$DEC)

However, this time, while the data is broken into even data frames, the labels for the separate data frames are unusable, e.g.:

    $`[ 0.26477, 0.37784]`
       Actual    Target                  DEC
    6   0.243 0.2650960 [ 0.26477, 0.37784]
    9   0.572 0.3778386 [ 0.26477, 0.37784]
    10 -0.049 0.3212681 [ 0.26477, 0.37784]
    15  0.780 0.2778518 [ 0.26477, 0.37784]
    31  0.299 0.3546606 [ 0.26477, 0.37784]
    33  0.105 0.2647676 [ 0.26477, 0.37784]

Could anyone suggest a way of rearranging this to make the labels usable again? Sample data is reattached (Sample_table.txt, http://n4.nabble.com/file/n1585427/Sample_table.txt).

Thanks,
Guy
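
The difference described above, equal-width intervals versus equal-frequency deciles, can be seen with base R alone. The sketch below is an illustration under assumptions: `target` is simulated skewed data rather than the poster's file, and `quantile()` is used for the break points in place of Hmisc's cut2().

```r
# Simulated, skewed stand-in for the Target column (hypothetical values).
set.seed(42)
target <- rexp(63)

# Equal-width bins: the range is divided into 10 intervals of the same
# length, so the counts per bin depend on where the data happen to fall.
equal_width <- cut(target, breaks = 10, labels = 1:10)

# Equal-frequency bins: breaks are placed at the 0%, 10%, ..., 100% quantiles.
decile_breaks <- quantile(target, probs = seq(0, 1, by = 0.1))
deciles <- cut(target, breaks = decile_breaks, labels = 1:10,
               include.lowest = TRUE)  # keep the minimum in the first bin

table(equal_width)  # uneven counts, some bins may be empty
table(deciles)      # 6 or 7 observations in every bin
```

With 63 rows and no tied values, the quantile-based bins each hold 6 or 7 observations, which matches the "6+ data items" expected above.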
Re: [R] Help with Hmisc, cut2, split and quantile
On 2010-03-08 18:00, Guy Green wrote:
> [...] while the data is broken into even data frames, the labels for
> the separate data frames are unuseable [...] Could anyone suggest a
> way of rearranging this to make the labels useable again?

I think that the easiest way would be to relabel the levels of DEC:

    read_data$DEC <- factor(read_data$DEC, labels = 1:10)

or, since I would prefer letters as factor levels:

    read_data$DEC <- factor(read_data$DEC, labels = LETTERS[1:10])

Another way would be to use cut2() with onlycuts=TRUE to get the breaks and then use these with cut() as in my original post (include.lowest and right are set so that the intervals match cut2's left-closed groups and the minimum is not dropped):

    brks <- cut2(read_data$Target, g=10, onlycuts=TRUE)
    read_data$DEC <- with(read_data, cut(Target, breaks=brks, labels=1:10,
                                         include.lowest=TRUE, right=FALSE))

But I still don't see why you want a list of separate data frames. For most analyses, it's more convenient to just use the factor variable to subset the data as needed.

-Peter Ehlers

--
Peter Ehlers
University of Calgary
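
To make the last two points concrete, here is a minimal sketch of relabelling the decile factor and then working from that single factor column instead of a list of data frames. It is an illustration under assumptions: the data are simulated, and base R's `quantile()` supplies the breaks so that the example runs without Hmisc.

```r
# Simulated stand-in for read_data (hypothetical values, 63 rows).
set.seed(2)
read_data <- data.frame(Actual = rnorm(63), Target = runif(63))

# Decile factor with interval labels, similar in spirit to cut2(..., g = 10).
brks <- quantile(read_data$Target, probs = seq(0, 1, by = 0.1))
read_data$DEC <- cut(read_data$Target, breaks = brks, include.lowest = TRUE)
levels(read_data$DEC)  # long interval labels

# Relabel the ten levels with letters; the order of the levels is kept.
read_data$DEC <- factor(read_data$DEC, labels = LETTERS[1:10])

# Use the factor directly rather than splitting into separate data frames.
top_decile   <- subset(read_data, DEC == "J")                   # highest Target values
decile_means <- tapply(read_data$Actual, read_data$DEC, mean)   # one mean per decile
boxplot(Actual ~ DEC, data = read_data)                         # all deciles in one plot
```

If separate data frames are still wanted, `split(read_data, read_data$DEC)` on the relabelled factor gives a list with the short names "A" to "J".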