
Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Gavin Simpson

On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
> Since you can index a matrix or dataframe with
> a matrix of logicals, you can use is.na()
> to index all the NA locations and replace them
> all with 0 in one command.
 

A quicker solution that, IIRC, was posted to the list by Peter
Dalgaard several years ago is:

sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

Some timings on a larger problem with 100 columns:

 mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
                                          size = 1000*100, replace = TRUE),
                                   nrow = 1000))

 system.time(retval <- sapply(mydata.df,
                              function(x) {x[is.na(x)] <- 0; x}))
[1] 0.108 0.008 0.120 0.000 0.000

 system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 2.460 0.028 2.498 0.000 0.000

And a larger problem still, with 1000 columns:

 mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
                                          size = 1000*1000, replace = TRUE),
                                   nrow = 1000))

 system.time(retval <- sapply(mydata.df,
                              function(x) {x[is.na(x)] <- 0; x}))
[1] 0.908 0.068 2.657 0.000 0.000
 system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 43.127  0.332 46.440  0.000  0.000

Profiling mydata.df[is.na(mydata.df)] <- 0 shows that it spends most of
its time subsetting the individual cells of the data frame in turn
and setting the NA ones to 0.
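
For anyone wanting to rerun the comparison, here is a minimal sketch. The fixed seed and the object names by_col, by_mat, t_col and t_mat are assumptions added for illustration. It uses the lapply() form with empty brackets so that the column-wise result stays a data frame and can be compared with the whole-frame result. Timings depend on the machine and the R version, so no expected numbers are shown.

```r
# Rebuild the 1000 x 100 test data; the seed is an assumption, used only
# so that repeated runs see the same NA pattern.
set.seed(1)
mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
                                         size = 1000 * 100, replace = TRUE),
                                  nrow = 1000))

# Column-wise replacement; assigning into by_col[] keeps the data frame.
by_col <- mydata.df
t_col <- system.time(
  by_col[] <- lapply(by_col, function(x) { x[is.na(x)] <- 0; x })
)

# Whole-frame replacement through a logical matrix index.
by_mat <- mydata.df
t_mat <- system.time(by_mat[is.na(by_mat)] <- 0)

print(t_col)
print(t_mat)

# Both approaches should produce the same data frame.
stopifnot(isTRUE(all.equal(by_col, by_mat)))
```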

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC  [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK[w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT  [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Peter Dalgaard

Gavin Simpson wrote:
> On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
>> Since you can index a matrix or dataframe with
>> a matrix of logicals, you can use is.na()
>> to index all the NA locations and replace them
>> all with 0 in one command.
>
> A quicker solution that, IIRC, was posted to the list by Peter
> Dalgaard several years ago is:
>
> sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
   
I hope your memory fails you, because it doesn't actually work.

 sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
     x1 x2 x3
[1,]  0  1  1
[2,]  2  2  0
[3,]  3  3  0
[4,]  0  4  4

is a matrix, not a data frame.

Instead:

 test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
 test.df
  x1 x2 x3
1  0  1  1
2  2  2  0
3  3  3  0
4  0  4  4

Speedwise, sapply() is doing lapply() internally, and the assignment
overhead should be small, so I'd expect similar timings.
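
The difference between the two calls can be checked directly. The following is a small sketch; test.df is taken from Jason Barnhart's example elsewhere in this thread, and the names zero_na, as_matrix and fixed.df are hypothetical, introduced only for illustration.

```r
test.df <- data.frame(x1 = c(NA, 2, 3, NA), x2 = c(1, 2, 3, 4),
                      x3 = c(1, NA, NA, 4))

# Replace the NAs of a single vector with 0 (hypothetical helper name).
zero_na <- function(x) { x[is.na(x)] <- 0; x }

# sapply() simplifies the equal-length columns into a numeric matrix.
as_matrix <- sapply(test.df, zero_na)

# lapply() returns a plain list; assigning it into fixed.df[] replaces the
# columns while keeping the data.frame class, names and row names.
fixed.df <- test.df
fixed.df[] <- lapply(test.df, zero_na)

print(class(as_matrix))
print(class(fixed.df))
```

Without the empty brackets, fixed.df <- lapply(...) would overwrite the data frame with a bare list, which is why the brackets matter here.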

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907


Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Joerg van den Hoff

On Thu, Mar 15, 2007 at 10:21:22AM +0100, Peter Dalgaard wrote:
> Gavin Simpson wrote:
>> On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
>>> Since you can index a matrix or dataframe with
>>> a matrix of logicals, you can use is.na()
>>> to index all the NA locations and replace them
>>> all with 0 in one command.
>>
>> A quicker solution that, IIRC, was posted to the list by Peter
>> Dalgaard several years ago is:
>>
>> sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
>
> I hope your memory fails you, because it doesn't actually work.
>
>  sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
>      x1 x2 x3
> [1,]  0  1  1
> [2,]  2  2  0
> [3,]  3  3  0
> [4,]  0  4  4
>
> is a matrix, not a data frame.
>
> Instead:
>
>  test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
>  test.df
>   x1 x2 x3
> 1  0  1  1
> 2  2  2  0
> 3  3  3  0
> 4  0  4  4
>
> Speedwise, sapply() is doing lapply() internally, and the assignment
> overhead should be small, so I'd expect similar timings.

just an idea:
given the order of magnitude difference (factor 17 or so) in runtime
between the obvious solution and the fast one: wouldn't it be
possible/sensible to modify the corresponding subsetting method
([.data.frame) such that it recognizes the case when it is called with an
arbitrary index matrix (the problem is not restricted to indexing with a
logical matrix, I presume?) and switches internally to the fast solution
given above?

in my (admittedly limited) experience, one of the not so nice properties
of R is that one encounters exactly this situation quite often: unexpected
massive differences in run time between different solutions (I'm not
talking about the explicit loop penalty). what concerns me most are the very
basic scenarios (not complex algorithms): data frames vs. matrices, naming
vector components or not, subsetting, read.table vs. scan, etc. if there were
a concise HOW TO list for the cases when speed matters, that would be helpful,
too.

I understand that part of the uneven performance is unavoidable and that one
must expect the user to go to the trouble of understanding the reasons, e.g.
for differences between handling purely numerical data in either matrices or
data frames. but a factor of 17 between the obvious approach and the wise one
seems a trap into which 99% of people will step (probably never thinking
that there might be a faster approach).
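
Short of changing the subsetting method itself, the fast column-wise approach can be wrapped once and reused. The sketch below is only an illustration: fill_na_numeric is a hypothetical name, not a function in base R, and restricting it to numeric columns is an assumption made so that factor and character columns are left untouched.

```r
# Hypothetical helper: replace NAs column by column, numeric columns only.
fill_na_numeric <- function(df, value = 0) {
  stopifnot(is.data.frame(df))
  # Logical vector marking which columns are numeric (integer or double).
  num <- vapply(df, is.numeric, logical(1))
  if (any(num)) {
    df[num] <- lapply(df[num], function(x) { x[is.na(x)] <- value; x })
  }
  df
}

# Small example with a factor column that should keep its NA.
d <- data.frame(a = c(1, NA, 3),
                b = factor(c("u", NA, "v")),
                n = c(NA, 2L, NA))
filled <- fill_na_numeric(d)
print(filled)
```

Note that filling an integer column with the double 0 converts that column to double; passing value = 0L would keep it integer.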

joerg


Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Gavin Simpson

On Thu, 2007-03-15 at 10:21 +0100, Peter Dalgaard wrote:
> Gavin Simpson wrote:
>> On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
>>> Since you can index a matrix or dataframe with
>>> a matrix of logicals, you can use is.na()
>>> to index all the NA locations and replace them
>>> all with 0 in one command.
>>
>> A quicker solution that, IIRC, was posted to the list by Peter
>> Dalgaard several years ago is:
>>
>> sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
>
> I hope your memory fails you, because it doesn't actually work.

Ah, yes, apologies Peter. I have the sapply version embedded in a
package function that I happened to be working on (where I wanted the
result to be a matrix) and pasted directly from there and not my crib
sheet of useful R-help snippets where I do have it as lapply(...). I'd
forgotten I'd changed Peter's suggestion slightly in my function.

That'll teach me to reply before my morning cup of Earl Grey.

All the best,

G

 
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread David L. Van Brunt, Ph.D.

Thanks, one and all... I knew it had to be simple.

-- 
---
David L. Van Brunt, Ph.D.
mailto:[EMAIL PROTECTED]

If Tyranny and Oppression come to this land, it will be in the guise of
fighting a foreign enemy.
--James Madison


Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-14 Thread Jason Barnhart

This should work.

 test.df <- data.frame(x1=c(NA,2,3,NA), x2=c(1,2,3,4),
                       x3=c(1,NA,NA,4))
 test.df
  x1 x2 x3
1 NA  1  1
2  2  2 NA
3  3  3 NA
4 NA  4  4

 test.df[is.na(test.df)] <- 1000

 test.df
    x1 x2   x3
1 1000  1    1
2    2  2 1000
3    3  3 1000
4 1000  4    4



The following search string, "cran r replace data.frame NA", in Google
(as a US user) yielded some good results (5th and 7th entries), but there
was another example that explicitly yielded this technique.  I can't
seem to recall my exact search string.



Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-14 Thread Steven McKinney

Since you can index a matrix or dataframe with
a matrix of logicals, you can use is.na()
to index all the NA locations and replace them
all with 0 in one command.

 mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30,
                                          replace = TRUE), nrow = 6))
 mydata.df
  V1 V2 V3 V4 V5
1  1 NA  1  1  1
2  1 NA NA NA  1
3 NA NA  1 NA NA
4 NA NA NA NA  1
5 NA  1 NA NA  1
6  1 NA NA  1  1
 is.na(mydata.df)
     V1    V2    V3    V4    V5
1 FALSE  TRUE FALSE FALSE FALSE
2 FALSE  TRUE  TRUE  TRUE FALSE
3  TRUE  TRUE FALSE  TRUE  TRUE
4  TRUE  TRUE  TRUE  TRUE FALSE
5  TRUE FALSE  TRUE  TRUE FALSE
6 FALSE  TRUE  TRUE FALSE FALSE
 mydata.df[is.na(mydata.df)] <- 0
 mydata.df
  V1 V2 V3 V4 V5
1  1  0  1  1  1
2  1  0  0  0  1
3  0  0  1  0  0
4  0  0  0  0  1
5  0  1  0  0  1
6  1  0  0  1  1
 

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of David L. Van Brunt, Ph.D.
Sent: Wed 3/14/2007 5:22 PM
To: R-Help List
Subject: [R] replacing all NA's in a dataframe with zeros...
 
I've seen how to replace the NA's in a single column within a data frame:

mydata$ncigs[is.na(mydata$ncigs)] <- 0

But this is just one column... I have thousands of columns (!) that I need
to do this for, and I can't figure out a way, outside of the dreaded loop, to
replace all NA's in an entire data frame (all vars) without naming each var
separately. Yikes.

I'm racking my brain on this, seems like I must be staring at the obvious,
but it eludes me. Searches have come up CLOSE, but not quite what I need..

Any pointers?

-- 
---
David L. Van Brunt, Ph.D.
mailto:[EMAIL PROTECTED]

If Tyranny and Oppression come to this land, it will be in the guise of
fighting a foreign enemy.
--James Madison

[[alternative HTML version deleted]]

