Thanks. Windows XP Media Center Edition (!) Version 2002, Service Pack 2; R version 2.3.1 (I've heard that 2.4.0 handles memory better; is that right?)
I think I'll use NaN for 'not applicable' and NA for 'missing'. Does anyone know how Amelia handles and distinguishes between these two (only the latter, of course, needs imputing), and therefore whether this can be done without further reformatting?

Jon

-----Original Message-----
From: Petr Pikal [mailto:[EMAIL PROTECTED]
Sent: 31 October 2006 12:15
To: Jon Minton; [email protected]
Subject: Re: [R] R crashing during batch file formatting

Hi

you should probably provide more information (OS, R version). I cannot help you much with the crash, but here is some opinion. I would try doing the conversion interactively before wrapping it in a function.

However, if you want different types of NA and your data are numeric, you could probably distinguish them by using -Inf, Inf, NaN and NA, but then you need to be careful when doing analysis, as these values can be treated differently.

HTH
Petr

On 31 Oct 2006 at 11:43, Jon Minton wrote:

From: "Jon Minton" <[EMAIL PROTECTED]>
To: <[email protected]>
Date sent: Tue, 31 Oct 2006 11:43:22 -0000
Subject: [R] R crashing during batch file formatting

> Hi R users:
>
> I have the British Household Panel Survey (BHPS) in .tab format. I
> want to feed it through the Amelia package (which will be an
> 'interesting' job in itself).
>
> But first I need to convert the various types of missing value (from
> about -9 to -1) to a more generic 'NA' code.
>
> I've written the following function to do this:
>
> BHPS.converter <- function(from="D:/Data/BHPS/UKDA-5151-tab/tab/",
>                            to="D:/BHPS/NA/", ext="tab") {
>     from.files <- dir(from, pattern=paste(".", ext, "$", sep=""))
>     existing.to.files <- dir(to, pattern=paste(".", ext, "$", sep=""))
>     still.to.do.index <- 1:length(from.files)
>     still.to.do.index <- still.to.do.index[-match(existing.to.files, from.files)]
>     obs.to.do <- length(still.to.do.index)
>     for (i in 1:obs.to.do) {
>         temp.table <- read.delim(paste(from, from.files[still.to.do.index[i]], sep=""))
>         print(paste("read:", from.files[still.to.do.index[i]]))
>         temp.table[temp.table < 0] <- NA
>         write.table(temp.table, file=paste(to, from.files[still.to.do.index[i]], sep=""))
>         print(paste("written:", from.files[still.to.do.index[i]]))
>     }
>     rm(i, from.files, existing.to.files, still.to.do.index, obs.to.do, temp.table)
> }
>
> It checks for existing files in the 'to' directory (where files whose
> negative codes have already been converted to NA are saved) because
> when I tried to do this conversion previously it got about halfway
> through and then crashed.
>
> The problem is that it crashes *this time* too, without printing a
> message to say it has read even a single file.
>
> The file it gets stuck on is about 75 MB in size.
>
> I am using a dual-core 3.2 GHz Pentium D processor with 2 GB of RAM
> (and 2 GB of virtual memory), and (unfortunately) Windows XP.
>
> Questions:
>
> 1) Any general tips on how to increase the amount of memory available
>    to process the file?
> 2) Can you see a more efficient way of doing what I'm doing?
> 3) What's the best way of coding for multiple forms of NA? The BHPS
>    code '-8' (meaning 'inapplicable', not routed for this respondent)
>    should really be distinguished from other forms of nonresponse...
>
> Thanks,
>
> Jon
>
> p.s. Apologies if this is slightly too vague/long-winded...
>
> Jon Minton

Petr Pikal
[EMAIL PROTECTED]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
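On question 2 in the quoted message: below is a minimal sketch of a tidier converter in current R. It is not the poster's or Petr's code, and the function name convert_missing_codes is hypothetical. It assumes tab-delimited files with a header row and treats every negative value in a numeric column as a missing-value code. Two details of BHPS.converter motivate it. First, if the 'to' directory holds no matching files, match() returns integer(0), indexing with -integer(0) selects nothing, and 1:obs.to.do becomes 1:0, so the first pass through the loop looks for a file named NA. Second, write.table() by default writes space-separated, quoted output with row names rather than a tab file. The sketch does not by itself address the memory-related crash.

```r
# Hypothetical batch converter (a sketch, not the original BHPS.converter).
# Assumes tab-delimited files with a header row, and that every negative
# value in a numeric column is a missing-value code.
convert_missing_codes <- function(from, to, ext = "tab") {
  # Escape the dot so only names ending in ".tab" match ("." alone matches any character)
  pattern <- paste("\\.", ext, "$", sep = "")
  from.files <- dir(from, pattern = pattern)
  done <- dir(to, pattern = pattern)
  # setdiff() is safe when 'done' is empty, unlike x[-match(done, from.files)]
  todo <- setdiff(from.files, done)
  for (f in todo) {  # the body is skipped entirely when 'todo' is empty
    tab <- read.delim(file.path(from, f))
    message("read: ", f)
    # Recode only numeric columns; character/factor columns are left alone
    num <- vapply(tab, is.numeric, logical(1))
    tab[num] <- lapply(tab[num], function(x) replace(x, which(x < 0), NA))
    # Write a tab-delimited file without quotes or row names
    write.table(tab, file.path(to, f), sep = "\t", quote = FALSE,
                row.names = FALSE, na = "NA")
    message("written: ", f)
    # Drop this table and ask R to release memory before the next file
    rm(tab)
    gc()
  }
  invisible(todo)  # names of the files converted in this run
}
```

Rerunning it after a crash only processes files not yet present in 'to', which is what the original index bookkeeping was aiming for.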
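On the NaN/NA plan and question 3: a minimal sketch of why Petr's warning matters. In R, is.na() is TRUE for both NaN and NA, while is.nan() is TRUE only for NaN, so any routine that tests with is.na() will treat 'not applicable' values as missing too. recode_missing() is a hypothetical helper; its default of -8 for 'inapplicable' follows the BHPS coding described in the thread. Nothing here shows how Amelia itself treats NaN; that would need checking against its documentation.

```r
# Hypothetical helper: recode BHPS-style negative codes in a numeric vector,
# keeping -8 ('inapplicable') distinguishable from other nonresponse.
recode_missing <- function(x, inapplicable = -8) {
  out <- as.numeric(x)
  out[which(x == inapplicable)] <- NaN          # not applicable
  out[which(x < 0 & x != inapplicable)] <- NA   # genuinely missing
  out
}

y <- recode_missing(c(3, -8, -9, 7, -1))
is.na(y)               # FALSE TRUE TRUE FALSE TRUE  (NaN counts as NA)
is.nan(y)              # FALSE TRUE FALSE FALSE FALSE (only the recoded -8)
mean(y, na.rm = TRUE)  # 5: na.rm drops NaN and NA alike
```

Because the distinction is easy to lose in any function that only calls is.na(), a more robust option may be to recode all negative codes to NA and carry 'inapplicable' in a separate logical column, e.g. data.frame(value = y, inapplicable = is.nan(y)).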
