[R] persuade tabulate function to count NAs in a data frame

2011-03-19 Thread Bodnar Laszlo EB_HU

Hi,

I'd like to ask another question, this time about data frames, NAs and the 
tabulate function.

Here is my data frame, which I already used in one of my previous questions. 
It is intentionally this simple: my real 'df' data frame is much bigger, but 
again, I don't want to bother anyone with huge datasets. So, my data:

id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,4)
df <- data.frame(id,a,b,c,d,e)
df

I have managed to calculate the distributions of the values occurring in 
columns 'a' to 'e', grouped by the id numbers in column 'id'. It works fine, 
check it:

matrix(unlist(lapply(df[,-1], function(x)
  tapply(x, df$id, tabulate, nbins=nlevels(factor(df[,2]))))[[1]]), ncol=3, nrow=3, byrow=TRUE)
matrix(unlist(lapply(df[,-1], function(x)
  tapply(x, df$id, tabulate, nbins=nlevels(factor(df[,3]))))[[2]]), ncol=3, nrow=3, byrow=TRUE)
matrix(unlist(lapply(df[,-1], function(x)
  tapply(x, df$id, tabulate, nbins=nlevels(factor(df[,4]))))[[3]]), ncol=3, nrow=3, byrow=TRUE)
matrix(unlist(lapply(df[,-1], function(x)
  tapply(x, df$id, tabulate, nbins=nlevels(factor(df[,5]))))[[4]]), ncol=3, nrow=3, byrow=TRUE)
matrix(unlist(lapply(df[,-1], function(x)
  tapply(x, df$id, tabulate, nbins=nlevels(factor(df[,6]))))[[5]]), ncol=4, nrow=3, byrow=TRUE)
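As a sketch (not the poster's code), the five copy-pasted calls above can be collapsed into a single lapply() that returns one id-by-value matrix per column. Like the original, it assumes each column holds the integers 1..k, so that nbins = nlevels(factor(x)) gives the right number of bins; split() and sapply() are used instead of tapply() only for readability.

```r
# Original data without NAs, copied from the question
df <- data.frame(
  id = rep(c(1, 2, 3), each = 10),   # same values as the id vector above
  a  = c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3),
  b  = c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2),
  c  = c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2),
  d  = c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2),
  e  = c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,4)
)

# For every column except 'id': split it by id, tabulate each group,
# and transpose so that rows are ids and columns are the values 1..k
counts <- lapply(df[, -1], function(x)
  t(sapply(split(x, df$id), tabulate, nbins = nlevels(factor(x)))))

counts$a   # 3 x 3 matrix: rows are ids 1-3, columns are counts of 1, 2, 3
counts$e   # 3 x 4 matrix, because column 'e' also contains the value 4
```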

Now my problem is: what if my data frame contains NA values here and there, 
and I want tabulate to count these NAs as well, i.e. to report how many NAs 
occur in each id group?

Here's my modified data frame with the NAs:
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,NA,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,NA,1,4)
df <- data.frame(id,a,b,c,d,e)
df

At first I tried something like this (the only change was adding 
exclude=NULL):
unlist(lapply(df[,-1], function(x)
  tapply(x, df$id, tabulate, nbins=nlevels(factor(df[,2], exclude=NULL))))[[1]])

At least my code now recognizes that column 'a' has 4 different levels 
(1, 2, 3, NA) and not only three (1, 2, 3). Check it here:
nlevels(factor(df[,2],exclude=NULL))

But as you can see in the result, it still does not count the NAs. It returns
3  0  6  0(!)  4  3  3  0  4  1  5  0

Instead of the correct:
3  0  6  1(!)  4  3  3  0  4  1  5  0
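A minimal sketch of why this happens, using id 1 of column 'a' (the vector x below is copied from the modified data): tabulate() only counts the positive integers 1..nbins and silently ignores NA values, so raising nbins to 4 merely adds an empty fourth bin. Converting to a factor with exclude = NULL turns NA into an ordinary level with its own integer code, and tabulate() counts those codes when it is given a factor.

```r
# Column 'a' for id 1 in the modified data: one NA among the values
x <- c(NA, 1, 3, 3, 1, 3, 3, 3, 3, 1)

# The NA is not a valid bin index, so it is dropped; bin 4 stays empty
tabulate(x, nbins = 4)

# With exclude = NULL the NA becomes a real factor level (sorted last) ...
f <- factor(x, exclude = NULL)
levels(f)

# ... and tabulate() counts the factor's integer codes, NA level included
tabulate(f, nbins = nlevels(f))
```

Note that the levels come from the data themselves: this subset contains no 2, so that bin disappears. Building the factor from the whole column before grouping keeps the bins aligned across ids.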

Or, for column 'c':
unlist(lapply(df[,-1], function(x)
  tapply(x, df$id, tabulate, nbins=nlevels(factor(df[,4], exclude=NULL))))[[3]])

It says
2  4  4  0  2  3  4  0(!)  1  5  4  0

Instead of the correct
2  4  4  0  2  3  4  1(!)  1  5  4  0
etc.
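One possible approach that reproduces the 'correct' vectors above: create the factor once per column (so every id shares the same levels, with NA last), split it by id, and tabulate each piece. tabulate_by_id is a hypothetical helper name introduced for this sketch, not an existing R function.

```r
# Modified data with NAs, copied from the question
df <- data.frame(
  id = rep(c(1, 2, 3), each = 10),
  a  = c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3),
  b  = c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2),
  c  = c(1,3,2,3,2,1,2,3,3,2,2,3,NA,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2),
  d  = c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2),
  e  = c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,NA,1,4)
)

# Hypothetical helper: per-id counts of every value of x, NAs included.
# The factor is built from the whole column, so all ids share the same
# levels and NA (if present) is always the last column.
tabulate_by_id <- function(x, id) {
  f <- factor(x, exclude = NULL)
  m <- t(sapply(split(f, id), tabulate, nbins = nlevels(f)))
  colnames(m) <- levels(f)   # label columns with the actual values
  m
}

counts <- lapply(df[, -1], tabulate_by_id, id = df$id)
counts$a                  # columns 1, 2, 3, <NA>; rows are ids 1-3
as.vector(t(counts$a))    # flattened id by id, as in the question
```

Because tabulate() here counts factor codes rather than raw values, the columns also line up correctly when a column's values are not exactly 1..k.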

Does anyone have an idea how to persuade tabulate to count NAs? Is it 
possible at all?
Thanks very much and have a pleasant weekend,
Laszlo


This e-mail and all attachments belonging to it may contain confidential 
and/or legally, professionally or otherwise protected information. If you are 
not the addressee of this message, disclosing, reproducing, copying, or 
otherwise distributing or using its contents is strictly prohibited. If you 
have received this message in error, please notify the sender immediately. 
Erste Bank Hungary Zrt. (EBH) accepts no responsibility for the complete and 
accurate delivery of the information to the addressee(s), nor for any delay, 
error resulting from an interrupted connection, or damage arising from the use 
of the information or its unreliability.

Senders or recipients of messages outside EBH acknowledge and agree that other 
bank employees may also access the messages in order to ensure the continuity 
of EBH's operations.


This e-mail and any attached files are confidential and/...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] persuade tabulate function to count NAs in a data frame

2011-03-19 Thread Gavin Simpson

On Sat, 2011-03-19 at 15:58 +0100, Bodnar Laszlo EB_HU wrote:
 Hi,

I'll top-post as the original Q is very lengthy:

tabs <- lapply(df[,2:6], 
  function(x, id){ t(table(addNA(x), id, useNA = "ifany")) }, df$id)

is one way of doing what you want. More details are here:

http://stackoverflow.com/questions/5362702/persuading-tabulate-function-to-count-nas-in-a-data-frame-in-r

where you also posted your Q.
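To check this suggestion against the numbers in the question, here is a small sketch for column 'a' alone (the vectors are copied from the modified data; the name tab_a is just for illustration). Transposing the table once more and flattening it gives the counts in the order used in the question, with the NA count in the fourth position for id 1.

```r
# 'id' and the modified column 'a' from the question
id <- rep(c(1, 2, 3), each = 10)
a  <- c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)

# The expression from the reply, for one column:
# rows are ids, columns are the values 1, 2, 3 and <NA>
tab_a <- t(table(addNA(a), id, useNA = "ifany"))
tab_a

# Flatten id by id, matching the layout of the question
as.vector(t(tab_a))
```

Because addNA() adds an NA level by default even when nothing is missing, columns such as 'b' may come out with an all-zero <NA> column; addNA(x, ifany = TRUE) only adds the level when NAs are present.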

HTH

G


Re: [R] persuade tabulate function to count NAs in a data frame

2011-03-19 Thread Jim Lemon


On 03/20/2011 01:58 AM, Bodnar Laszlo EB_HU wrote:

Hi,

I'd like to ask you a question again. It is basically about data frames, NAs 
and tabulate function.


Hi Bodnar,
The freq function in the prettyR package might do what you want.

Jim
