[R] Help with Hmisc, cut2, split and quantile

2010-03-08 Thread Guy Green

Hello,
I have a set of data with two columns: Target and Actual.  A sample table
(http://n4.nabble.com/file/n1584647/Sample_table.txt) is attached, but the
data looks like this:

Actual  Target
-0.125  0.016124906
0.135   0.120799865
... ...
... ...

I want to be able to break the data into tables based on quantiles in the
Target column.  I can see (using cut2, and also quantile) how to get the
barrier points between the different quantiles, and I can see how I would
achieve this if I was just looking to split up a vector.  However I am
trying to break up the whole table based on those quantiles, not just the
vector.

The following code shows me the ranges for the deciles of the Target data:
library(Hmisc)
read_data <- read.table("C:/Sample table.txt", header = TRUE)
table(cut2(read_data$Target, g = 10))

However I would like to be able to break the table into ten separate tables,
each with both Actual and Target data, based on the Target data
deciles:

top_decile = ...(top decile of read_data, based on Target data)
next_decile = ...and so on...
bottom_decile = ...

That way I could manipulate the deciles, graph them separately (and
together) and so on, just as easily as I can the whole table.  I'm sure this
must be simple, but I can't see the way forward.  I have also looked at
split() and quantile() but have not been able to get them to achieve what I
am after.  Can anybody see a simple way forward on this?

Thanks,
Guy
-- 
View this message in context: 
http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1584647.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with Hmisc, cut2, split and quantile

2010-03-08 Thread Peter Ehlers

I would just add a factor variable indicating to which decile
a particular observation belongs:

 dat$DEC <- with(dat, cut(Target, breaks=10, labels=1:10))

If you really want to have separate data frames you can then
split on the decile:

 L <- split(dat, dat$DEC)
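
As a minimal sketch of the two steps above, assuming simulated data in place of the attached sample file (the values in `dat` are made up, and the seed is arbitrary):

```r
# Made-up stand-in for the poster's table (not the real sample file)
set.seed(1)
dat <- data.frame(Actual = rnorm(60), Target = rexp(60))

# cut() with breaks = 10 makes ten equal-width intervals over range(Target)
dat$DEC <- with(dat, cut(Target, breaks = 10, labels = 1:10))

# split() returns a named list with one data frame per factor level
L <- split(dat, dat$DEC)

names(L)          # "1" ... "10"
sapply(L, nrow)   # rows per group; these need not be equal
L[["1"]]          # both columns for the lowest interval
```

Each element of `L` is an ordinary data frame, so it can be plotted or summarised like the full table.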


   -Peter Ehlers



--
Peter Ehlers
University of Calgary



Re: [R] Help with Hmisc, cut2, split and quantile

2010-03-08 Thread David Freedman

Try

as.numeric(read_data$DEC)

This should turn it into a numeric variable that you can work with.
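
A small sketch of what as.numeric() does to a factor, using made-up values rather than the thread's data. It returns the position of each level, which is what makes interval-labelled groups usable as numbers 1, 2, 3 and so on:

```r
# Made-up values cut into three intervals with interval-style labels
x <- c(0.02, 0.12, 0.26, 0.38, 0.05, 0.31)
grp <- cut(x, breaks = c(0, 0.1, 0.3, 0.4))
levels(grp)        # "(0,0.1]" "(0.1,0.3]" "(0.3,0.4]"
as.numeric(grp)    # level positions: 1 2 2 3 1 3

# Caveat: the result is the level position, not the label text
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1
as.numeric(as.character(f))   # 10 20 10
```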

hth
David Freedman
CDC, Atlanta


-- 
View this message in context: 
http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585503.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Help with Hmisc, cut2, split and quantile

2010-03-08 Thread Guy Green

Hi Peter & others,

Thanks (Peter) - that gets me really close to what I was hoping for.

The one problem I have is that the cut approach breaks the data into
intervals based on the absolute value of the Target data, rather than
their frequency.  In other words, if the data ranged from 0 to 50, the data
would be separated into 0-5, 5-10 and so on, regardless of the frequency
within those categories.  However I want to get the data into deciles.

The code that does this (incorporating Peter's) is:

read_data <- read.table("C:/Sample table.txt", header = TRUE)
read_data$DEC <- with(read_data, cut(Target, breaks=10, labels=1:10))
L <- split(read_data, read_data$DEC)

This means that I can get separate data frames, such as L$`10`, which comes
out tidy, but only containing 2 data items (the sample has 63 rows, so each
decile should have 6+ data items):
   Actual    Target DEC
9   0.572 0.3778386  10
31  0.299 0.3546606  10
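
The equal-width behaviour described above can be reproduced on simulated data. This is only an illustrative sketch: the 63 values are made up (right-skewed on purpose) and are not the attached sample.

```r
set.seed(42)
x <- rexp(63)   # 63 made-up, right-skewed values

# cut(breaks = 10) spaces its break points evenly across the range of x,
# so sparse regions of the data end up in thinly populated bins
eq <- cut(x, breaks = 10, labels = 1:10)
table(eq)

# Decile boundaries, by contrast, follow the data rather than the range
quantile(x, probs = seq(0, 1, by = 0.1))
```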

If I try to adjust this to get deciles using cut2(), I can break the data
into deciles as follows:

read_data <- read.table("C:/Sample table.txt", header = TRUE)
read_data$DEC <- with(read_data, cut2(read_data$Target, g=10), labels=1:10)
L <- split(read_data, read_data$DEC)

However this time, while the data is broken into evenly sized data frames, the
labels for the separate data frames are unusable, e.g.:
$`[ 0.26477, 0.37784]`
   Actual    Target                 DEC
6   0.243 0.2650960 [ 0.26477, 0.37784]
9   0.572 0.3778386 [ 0.26477, 0.37784]
10 -0.049 0.3212681 [ 0.26477, 0.37784]
15  0.780 0.2778518 [ 0.26477, 0.37784]
31  0.299 0.3546606 [ 0.26477, 0.37784]
33  0.105 0.2647676 [ 0.26477, 0.37784]

Could anyone suggest a way of rearranging this to make the labels usable
again?  Sample data is reattached
(http://n4.nabble.com/file/n1585427/Sample_table.txt).

Thanks,
Guy




-- 
View this message in context: 
http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585427.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Help with Hmisc, cut2, split and quantile

2010-03-08 Thread Peter Ehlers

I think that the easiest way would be to relabel the levels of DEC:

 read_data$DEC <- factor(read_data$DEC, labels = 1:10)

or, since I would prefer letters as factor levels:

 read_data$DEC <- factor(read_data$DEC, labels = LETTERS[1:10])
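
A small sketch of the relabelling, assuming made-up values and quartiles rather than deciles to keep the output short (`f` stands in for a DEC column with interval-style labels):

```r
# Made-up values; f has interval labels like the cut2() result
set.seed(7)
x <- runif(30)
f <- cut(x, breaks = quantile(x, probs = 0:4 / 4), include.lowest = TRUE)
levels(f)                               # four interval labels

# factor(..., labels = ) renames the levels in their existing order
g <- factor(f, labels = LETTERS[1:4])
levels(g)                               # "A" "B" "C" "D"

# Assigning to levels() does the same and also keeps any empty levels
h <- f
levels(h) <- LETTERS[1:4]
```

Group membership is unchanged by either form; only the level names differ.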

Another way would be to use cut2() with onlycuts=TRUE to get the
breaks and then use these with cut() as in my original post:

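 brks <- cut2(read_data$Target, g=10, onlycuts=TRUE)
 # right=FALSE and include.lowest=TRUE mirror cut2's [a,b) intervals
 # and keep the minimum Target value from becoming NA
 read_data$DEC <- with(read_data,
   cut(Target, breaks=brks, labels=1:10,
       include.lowest=TRUE, right=FALSE))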
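
For readers without Hmisc, a rough base-R sketch of the same idea is below. It swaps in quantile() for cut2(..., onlycuts=TRUE), so the bin edges may differ slightly from cut2's, and the data frame is simulated rather than the attached sample:

```r
set.seed(42)
# Made-up data with the same column names and row count as the sample
read_data <- data.frame(Actual = rnorm(63), Target = rexp(63))

# Decile break points, including the minimum and maximum
brks <- quantile(read_data$Target, probs = seq(0, 1, by = 0.1))

# include.lowest = TRUE keeps the smallest Target value in bin 1
read_data$DEC <- cut(read_data$Target, breaks = brks, labels = 1:10,
                     include.lowest = TRUE)

L <- split(read_data, read_data$DEC)
sapply(L, nrow)   # six or seven rows per decile for 63 distinct values
L[["10"]]         # top decile, with both columns
```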

But I still don't see why you want a list of separate data
frames. For most analyses, it's more convenient to just use the
factor variable to subset the data as needed.
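
A short sketch of working from the factor directly, again on simulated data with quantile() standing in for cut2 (the letter labels and the object names `top` and `means` are just for illustration):

```r
set.seed(42)
read_data <- data.frame(Actual = rnorm(63), Target = rexp(63))  # made-up data
brks <- quantile(read_data$Target, probs = seq(0, 1, by = 0.1))
read_data$DEC <- cut(read_data$Target, breaks = brks,
                     labels = LETTERS[1:10], include.lowest = TRUE)

# Pull out one decile only when it is needed
top <- subset(read_data, DEC == "J")

# Summaries per decile without building a list of data frames
means <- tapply(read_data$Actual, read_data$DEC, mean)
aggregate(Actual ~ DEC, data = read_data, FUN = mean)
```

Modelling and plotting functions that take a formula, such as boxplot(Actual ~ DEC, data = read_data), can use the factor in the same way.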

 -Peter Ehlers




--
Peter Ehlers
University of Calgary
