I don't know what your data are like, since you haven't given a reproducible example. I was imagining something like:

## generate fake data
age <- sample(20:90, 100, replace = TRUE)
year <- sample(1950:2000, 100, replace = TRUE)

## look at big table
table(age, year)

## categorize data
## see include.lowest and right arguments to cut
age.factor <- cut(age, breaks = seq(20, 90, by = 10),
                  include.lowest = TRUE)

year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
                   include.lowest = TRUE)

table(age.factor, year.factor)
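
The include.lowest and right arguments mentioned in the comment above decide which group a value lying exactly on a break falls into. A minimal sketch, using a few made-up ages chosen to sit on or near the breaks (age.edge and brks are names invented for this example):

```r
## made-up ages sitting on or near the break points
age.edge <- c(20, 29.9, 30, 90)
brks <- seq(20, 90, by = 10)

## default right = TRUE: intervals are (a,b], so 20 is outside every
## interval and becomes NA unless include.lowest = TRUE
default.cut <- cut(age.edge, breaks = brks)
closed.cut  <- cut(age.edge, breaks = brks, include.lowest = TRUE)

## right = FALSE: intervals are [a,b), so 30 opens the next group;
## include.lowest = TRUE now closes the last interval so 90 is kept
left.cut <- cut(age.edge, breaks = brks, right = FALSE,
                include.lowest = TRUE)

data.frame(age.edge, default.cut, closed.cut, left.cut)
```

With ages recorded to one decimal, right = FALSE may be the more natural choice, since it puts an age of exactly 30.0 in the group that starts at 30.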

moleps wrote:
I have already tried the regression modeling approach. However, the epidemiologists (referees) turn out to be quite fond of comparing the incidence rates to different standard populations, hence the need for this laborious approach. Trying the "cutting" approach, I ended up with:

table (age5)
age5
   (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]
      35       34       33       47       51      109
 (30,35]  (35,40]  (40,45]  (45,50]  (50,55]  (55,60]
     157      231      362      511      745      926
 (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
    1002      866      547      247       82       18
table (yr5)
yr5
(1950,1955] (1955,1960] (1960,1965] (1965,1970]
          3           5           5           5
(1970,1975] (1975,1980] (1980,1985] (1985,1990]
          5           5           5           5
(1990,1995] (1995,2000] (2000,2005] (2005,2009]
          5           5           5           3
table (yr5,age5)
Error in table(yr5, age5) : all arguments must have the same length
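
The counts above add up to 6003 for age5 but only 56 for yr5, so the two factors were not built from the same rows; yr5 was possibly cut from a vector of distinct years rather than from the per-case year column. table() needs one value of each factor per case. A minimal sketch of one way around this, assuming the cases sit in a data frame with one row per case (the name d, the column names age and year, and the simulated values are all hypothetical):

```r
## simulated stand-in for the real data: one row per case
set.seed(1)
d <- data.frame(age  = round(runif(200, 0.1, 99.9), 1),
                year = sample(1951:2009, 200, replace = TRUE))

## cut both columns of the same data frame so the lengths must match;
## the breaks reproduce the interval labels shown in the output above
d$age5 <- cut(d$age,  breaks = c(seq(0, 85, by = 5), 100))
d$yr5  <- cut(d$year, breaks = c(seq(1950, 2005, by = 5), 2009))

## both factors now have one entry per case
stopifnot(length(d$age5) == length(d$yr5))

## rows are age groups, columns are periods, cells are case counts
tab <- table(d$age5, d$yr5)
dim(tab)
```

Keeping the derived factors as columns of the same data frame makes it hard for them to drift apart in length.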

Sincerely,
M





On 5. apr. 2010, at 20.59, Bert Gunter wrote:

You have tempted me, and being weak, I yield to temptation:

"Any good ideas?"

Yes. Don't do this.

(what you probably really want to do is fit a model with age as a factor,
which can be done statistically e.g. by logistic regression; or graphically
using conditioning plots, e.g. via trellis graphics (the lattice package).
This avoids the arbitrariness and discontinuities of binning by age range.)
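
As a rough illustration of the modelling route, here is a sketch of a logistic regression with age entered as a continuous predictor, which is one reading of the suggestion above. The binary outcome (called case here) and all simulated values are assumptions made for the example; the data described in the original post contain only diagnosed cases, so a real analysis would need a comparison group or population denominators.

```r
## simulated data: 'case' is a hypothetical binary outcome whose
## probability rises smoothly with age
set.seed(1)
n <- 500
age <- round(runif(n, 20, 90), 1)
case <- rbinom(n, 1, plogis(-4 + 0.05 * age))

## logistic regression with age as a continuous predictor, no binning
fit <- glm(case ~ age, family = binomial)
summary(fit)$coefficients

## fitted probabilities at a few ages
predict(fit, newdata = data.frame(age = c(30, 50, 70)),
        type = "response")
```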

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of moleps
Sent: Monday, April 05, 2010 11:46 AM
To: r-help@r-project.org
Subject: [R] Data manipulation problem

Dear R'ers,

I've got a dataset with age and year of diagnosis. In order to
age-standardize the incidence I need to transform the data into a matrix
with age-groups (divided in 5 or 10 years) along one axis and year divided
into 5 years along the other axis. Each cell should contain the number of
cases for that age group and for that period.
I.e., my data format is currently:
ID, age (to one decimal), year (yearly data).

What I'd like is

age    1960-1965  1966-1970  etc...
0-5            3          8      10  15
6-10           2          5       8  13
etc..


Any good ideas?

Regards,
M

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

