Re: [R] Cleaning data
Hi Bayan,

Your question seems to imply that the "age" column contains floating point numbers, e.g.

> df
  height weight  age
1    170     72 21.5
...

If this is so, you will only find an integer in diff(age) if two adjacent numbers happen to have the same decimal fraction _and_ the subtraction does not produce a very small decimal remainder due to one or both of the numbers being unable to be represented exactly in binary notation, as Eric pointed out. This seems an unusual criterion for discarding values. Perhaps if you explain why an integer result is undesirable it would help. It can be done:

badrows <- which(diff(df$age) %% 1 == 0)
df <- df[-badrows, ]

OR

df <- df[-(badrows + 1), ]

if you want to delete the second rather than the first age.

Jim

On Tue, Sep 26, 2017 at 7:50 PM, bayan sardini wrote:
> Hi
>
> I want to clean my data frame, based on the age column, where I want to
> delete the rows for which the difference between consecutive elements,
> (i+1) - i, is an integer. I used
>
> a <- diff(df$age)
> for(i in a){if(is.integer(a) == true){df <- df[-a,]
> }}
>
> but, it doesn't work, any ideas
>
> Thanks in advance
> Bayan
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cleaning data
Hi Bayan,

In your code, 'a' is a vector and is.integer(a) is a logical of length 1 - and it will be FALSE whenever the vector is stored as doubles, since R coerces all the elements of a vector to the same type. You also need to decide whether something "close enough" to an integer is to be considered an integer - e.g. within a tolerance of 1e-6:

a <- diff(df$age)
df <- df[ c(TRUE, abs(a - round(a)) > 1e-6), ]

I added the 'TRUE' at the beginning to always keep the first row of df. If you prefer to always keep the last row then move the TRUE to the end.

HTH,
Eric

On Tue, Sep 26, 2017 at 12:50 PM, bayan sardini wrote:
> Hi
>
> I want to clean my data frame, based on the age column, where I want to
> delete the rows for which the difference between consecutive elements,
> (i+1) - i, is an integer.
SNIP
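Eric's tolerance idea can be made concrete. The sketch below is a hypothetical illustration (the data frame and the 1e-6 tolerance are invented, not Bayan's actual data): it drops each row whose age differs from the previous row's age by something within the tolerance of a whole number, which sidesteps the binary-representation problem both replies mention.

```r
# Hypothetical stand-in for Bayan's data frame (assumed single 'age' column).
df <- data.frame(age = c(21.5, 22.5, 22.9, 23.9, 24.1))

# Differences between consecutive ages.
d <- diff(df$age)

# A difference counts as an "integer" if it lies within tol of a whole
# number; comparing against a tolerance guards against floating-point
# error (23.9 - 22.9 is not exactly 1 in binary).
tol <- 1e-6
is_whole <- abs(d - round(d)) < tol

# Drop the second row of each offending pair; the leading FALSE protects
# the first row, which has no preceding row to compare against.
cleaned <- df[!c(FALSE, is_whole), , drop = FALSE]
```

Here the rows with ages 22.5 and 23.9 are removed (each differs from its predecessor by 1, up to rounding error), leaving 21.5, 22.9 and 24.1.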
[R] Cleaning data
Hi

I want to clean my data frame, based on the age column, where I want to delete the rows for which the difference between consecutive elements, (i+1) - i, is an integer. I used

a <- diff(df$age)
for(i in a){if(is.integer(a) == true){df <- df[-a,]
}}

but, it doesn't work, any ideas

Thanks in advance
Bayan
Re: [R] Cleaning
Sarah,

Thank you very much. For the other variables I was trying to do the same job in a different way because it is easier to list the values. Example:

test < which(dat$var1 !="BAA" | dat$var1 !="FAG" )
{
   dat <- dat[-test,]}

and I did not get the right result. What am I missing here?

On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee wrote:
> On Wed, Nov 11, 2015 at 8:44 PM, Ashta wrote:
> > Hi Sarah,
> >
> > I used the following to clean my data; the program crashed several times.
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > What is the difference between these two
> >
> > test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]
>
> Besides that you're using %in% wrong? I told you how to proceed.
>
> myvalues <- c("YYZ", "MSN")
>
> test <- subset(dat, Var1 %in% myvalues)
>
> > subset(dat, Var1 %in% myvalues)
>   X Var1 Freq
> 3 3  MSN 1040
> 4 4  YYZ  300
>
SNIP
Re: [R] Cleaning
Please keep replies on the list so others may participate in the conversation.

If you have a character vector containing the potential values, you might look at %in% for one approach to subsetting your data.

Var1 %in% myvalues

Sarah

On Wed, Nov 11, 2015 at 7:10 PM, Ashta wrote:
> Thank you Sarah for your prompt response!
>
> I have the list of valid values of the variable Var1; it is around 20.
> How can I modify this one to include all 20 valid values?
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> Is there an efficient way of doing it?
>
> Thank you again
>
SNIP
Re: [R] Cleaning
If what you posted here is what you typed, your syntax is wrong. I strongly advise you to consult the two links here:

http://adv-r.had.co.nz/Reproducibility.html
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

... and please read the posting guide and don't post in HTML.

B.

On Nov 11, 2015, at 10:03 PM, Ashta wrote:
> Sarah,
>
> Thank you very much. For the other variables I was trying to do the
> same job in a different way because it is easier to list the values.
> Example:
>
> test < which(dat$var1 !="BAA" | dat$var1 !="FAG" )
> {
>    dat <- dat[-test,]}
>
> and I did not get the right result. What am I missing here?
>
SNIP
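Beyond the missing assignment arrow, the reason Ashta's `!=` version flags every row is a De Morgan slip: any value is unequal to at least one of "BAA" and "FAG", so the `|` condition is always TRUE and `dat[-test, ]` deletes everything. A minimal sketch with invented values (the real 'dat' for this variable is not shown in the thread):

```r
# Toy stand-in for Ashta's data frame; the values are invented.
dat <- data.frame(var1 = c("BAA", "FAG", "XXX", "BAA", "???"),
                  stringsAsFactors = FALSE)

# Wrong: every value differs from at least one of the two strings,
# so this condition is TRUE for every single row.
bad <- dat$var1 != "BAA" | dat$var1 != "FAG"

# Right: "not BAA AND not FAG" means "not a member of c('BAA', 'FAG')",
# so test set membership with %in% and keep the matching rows.
cleaned <- dat[dat$var1 %in% c("BAA", "FAG"), , drop = FALSE]
```

Because `bad` is TRUE everywhere, `dat[-which(bad), ]` would return zero rows; the `%in%` form keeps exactly the three valid rows in this toy example.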
Re: [R] Cleaning
On Wed, Nov 11, 2015 at 8:44 PM, Ashta wrote:
> Hi Sarah,
>
> I used the following to clean my data; the program crashed several times.
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> What is the difference between these two
>
> test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]

Besides that you're using %in% wrong? I told you how to proceed.

myvalues <- c("YYZ", "MSN")

test <- subset(dat, Var1 %in% myvalues)

> subset(dat, Var1 %in% myvalues)
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300

SNIP
[R] Cleaning
Hi all,

I have a data frame with a huge number of rows and columns. When I looked at the data, it has several garbage values that need to be cleaned. As a sample I am showing you the frequency distribution of one variable:

  Var1 Freq
1    :    3
2    ]    6
3  MSN 1040
4  YYZ  300
5   \\    4
6    +    3
7   ?>   15

and it continues.

I want to keep only those rows that contain a valid value of the variable, in this case MSN and YYZ. I tried the following

test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]

but I am not getting the desired result.

Any help or ideas?
Re: [R] Cleaning
Hi,

On Wed, Nov 11, 2015 at 6:51 PM, Ashta wrote:
> Hi all,
>
> I have a data frame with a huge number of rows and columns. When I
> looked at the data, it has several garbage values that need to be
> cleaned. As a sample I am showing you the frequency distribution of
> one variable:
>
>   Var1 Freq
> 1    :    3
> 2    ]    6
> 3  MSN 1040
> 4  YYZ  300
> 5   \\    4
> 6    +    3
> 7   ?>   15

Please use dput() to provide your data. I made a guess at what you had in R, but could be wrong.

> and it continues.
>
> I want to keep only those rows that contain a valid value of the
> variable, in this case MSN and YYZ. I tried the following
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> but I am not getting the desired result.

What are you getting? How does it differ from the desired result?

> Any help or ideas?

I get:

> dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\",
+ "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
+ "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> test
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300

Which seems reasonable to me.

Please don't post in HTML either: it introduces all sorts of errors to your message.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
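Sarah's dput() output can be turned into a reproducible check. In the sketch below the single backslash in row 5 is an assumption (the archive mangled that value), but the rest mirrors her posted structure:

```r
# Reconstructed from Sarah's dput(); row 5's single backslash is assumed.
dat <- structure(list(X = 1:7,
                      Var1 = c(":", "]", "MSN", "YYZ", "\\", "+", "?>"),
                      Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)),
                 class = "data.frame", row.names = c(NA, -7L))

# The == filter from the thread keeps exactly the two valid rows.
test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
```

On this reconstruction `test` contains only the MSN and YYZ rows, matching the output Sarah reported, which suggests the problem Ashta saw came from the real data (or HTML mangling) rather than the subsetting expression itself.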
Re: [R] Cleaning
Hi Sarah,

I used the following to clean my data; the program crashed several times.

test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]

What is the difference between these two?

test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]

On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee wrote:
> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
>
SNIP
[R] Cleaning up workspace
In order to have a clean workspace at the start of each chapter of a book I'm knitting I've written a little script as follows:

# chapclean.R
# This cleans up the R workspace
ilist <- c(".GlobalEnv", "package:stats", "package:graphics",
           "package:grDevices", "package:utils", "package:datasets",
           "package:methods", "Autoloads", "package:base")
print(ilist)
xlist <- search()[which(!(search() %in% ilist))]
print(xlist)
for (ff in xlist){
   cat("Detach ", ff, " which is pos ", as.integer(which(ff == search())), "\n")
   detach(pos = as.integer(which(ff == search())), unload = TRUE) # ?? do we need unload
}
rm(list = ls())

This appears to work fine on my system -- session info is below -- but I get 30 warnings of the type

30: In FUN(X[[2L]], ...) :
  Created a package name, '2013-10-16 10:56:47', when none found

Does anyone have ideas why the warnings are being generated? I'd like to avoid suppressing them.

Here's the session info.

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1

John Nash
Re: [R] Cleaning up workspace
This has been reported before on the bug list (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15481). The message is coming from the methods package, but I don't know if it's a bug or ignorable.

Duncan Murdoch

On 16/10/2013 11:03 AM, Prof J C Nash (U30A) wrote:
> In order to have a clean workspace at the start of each chapter of a
> book I'm knitting I've written a little script as follows:
SNIP
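John's detach loop can also be phrased with setdiff(). The sketch below is the same idea, not a drop-in replacement for his chapclean.R (the keep list simply mirrors his ilist, and whether unload = TRUE is needed remains his open question):

```r
# Search-path entries to keep attached (mirrors John's ilist).
keep <- c(".GlobalEnv", "package:stats", "package:graphics",
          "package:grDevices", "package:utils", "package:datasets",
          "package:methods", "Autoloads", "package:base")

# Everything currently on the search path that is not in the keep list.
extras <- setdiff(search(), keep)

# Detach each extra entry by name; character.only = TRUE lets detach()
# take the name as a character string rather than an unquoted symbol.
for (nm in extras) {
  detach(nm, character.only = TRUE, unload = TRUE)
}

# Finally clear user objects from the global environment.
rm(list = ls(envir = .GlobalEnv), envir = .GlobalEnv)
```

Detaching by name avoids recomputing the position with `which(ff == search())` on every iteration, since the search path shifts as entries are removed.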
Re: [R] Cleaning up messy Excel data
When I was still teaching undergraduate intro biz-stat (among that community it is always abbreviated), we needed to control the spreadsheet behaviour of TAs who entered marks into a spreadsheet. We came up with TellTable (the Sourceforge site is still around with refs at http://telltable-s.sourceforge.net/), which put OpenOffice Calc on a server and made sure change recording was on and the menu to switch off change recording was removed. It is used over a web browser with a VNC client. Neil Smith wrote a Java application to view all the changes by who, what, when, etc., and we discovered the infrastructure was quite nice for running any single-user app in a shared mode with version control.

However, with Google Docs, we realized we could try to make money or enjoy life, and so the project is now moribund. However, the ideas are there, and if anyone gets interested, I'll be happy to try to dig up materials, though I suspect that it would be easier to work with the ideas and more modern tools.

The key idea is that there is just ONE master file, and that there is some discipline over keeping that file OK. My opinion is that this concept could be exploited much more for lots of different situations, but it seems that cloud technology is being used to create lots of versions of files rather than consolidate and control such files.

JN

On 03/03/2012 06:00 AM, r-help-requ...@r-project.org wrote:
> Date: Fri, 2 Mar 2012 20:04:05 -0500
> From: jim holtman jholt...@gmail.com
> To: Greg Snow 538...@gmail.com
> Cc: r-help r-help@r-project.org
> Subject: Re: [R] Cleaning up messy Excel data
>
> Unfortunately they only know how to use Excel and Word. They are not
> folks who use a computer every day. Many of them run factories or
> warehouses, and asking them to use something like Access would not
> happen in my lifetime (I have retired twice already).
>
> I don't have any problems with them messing up the data that I send
> them; they are pretty good about making changes within the context of
> the spreadsheet. The other issue is that I am working with people in
> twenty different locations spread across the US, so I might be able to
> get one of them to use Access (there is one I know that uses it), but
> that leaves 19 other people I would not be able to communicate with.
Re: [R] Cleaning up messy Excel data
Sometimes we adapt to our environment, sometimes we adapt our environment to us. I like fortune(108).

I actually was suggesting that you add a tool to your toolbox, not limit it. In my experience (and I don't expect everyone else's to match) data manipulation that seems easier in Excel than R is only easier until the client comes back and wants me to redo the whole analysis with one typo fixed. Then rerunning the script in R (or Perl or another tool) is a lot easier than trying to remember where all I clicked, dragged, selected, etc.

I do use Excel for some things (though I would be happy to find other tools for that if it were possible to expunge Excel from the earth) and Word (I actually like using R2wd to send tables and graphs to Word that I can then give to clients who just want to be able to copy and paste them to something else); I just think that many of the tasks that many people use Excel for would be better served with a better tool. If someone reading this decides to put some more thought into a project up front and actually design a database rather than letting it evolve into some monstrosity in Excel, and that decision saves them some later grief, then the world will be a little bit better place.

On Fri, Mar 2, 2012 at 6:04 PM, jim holtman jholt...@gmail.com wrote:
> Unfortunately they only know how to use Excel and Word. They are not
> folks who use a computer every day. Many of them run factories or
> warehouses, and asking them to use something like Access would not
> happen in my lifetime (I have retired twice already).
>
> I don't have any problems with them messing up the data that I send
> them; they are pretty good about making changes within the context of
> the spreadsheet. The other issue is that I am working with people in
> twenty different locations spread across the US, so I might be able to
> get one of them to use Access (there is one I know that uses it), but
> that leaves 19 other people I would not be able to communicate with.
>
> The other thing is that I use Excel myself to slice/dice data, since
> there are things that are easier in Excel than R (believe it or not).
> There are a number of tools I keep in my toolkit, and R is probably
> the most important, but I have not thrown the rest of them away since
> they still serve a purpose.
>
> So if you can come up with a way to get 20 diverse groups, who are not
> computer literate, to change over in a couple of days from Excel to
> Access, let me know. BTW, I tried to use Access once and gave it up
> because it was not as intuitive as some other tools and did not give
> me any more capability than the ones I was using. So I know I would
> have a problem convincing others to make the change just so they could
> communicate with me, while they still had to use Excel for most of
> their other interfaces. This is the real world where you have to learn
> how to adapt to your environment and make the best of it. So you just
> have to learn that Excel can be your friend (or at least not your
> enemy) and can serve a very useful purpose in getting your ideas
> across to other people.
>
> On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow 538...@gmail.com wrote:
> > Try sending your clients a data set (data frame, table, etc.) as an
> > MS Access data table instead. They can still view the data as a
> > table, but will have to go to much more effort to mess up the data;
> > more likely they will do proper edits without messing anything up
> > (mixing characters in with numbers, having more sexes than your
> > biology teacher told you about, adding extra lines at top or bottom
> > that make reading back into R more difficult, etc.)
> >
> > I have had a few clients that I talked into using MS Access from the
> > start to enter their data. There was often a bit of resistance at
> > first, but once they tried it and went through the process of
> > designing the database up front they ended up thanking me and
> > believed that the entire data entry process was easier and quicker
> > than had they used Excel as they originally planned.
> >
> > Access is still part of MS Office, so they don't need to learn R or
> > in any way break their chains from being prisoners of Bill, but they
> > will be more productive in more ways than just interfacing with you.
> > Access (databases in general) forces you to plan things out and do
> > the correct thing from the start. It is possible to do the right
> > thing in Excel, but Excel does not encourage (let alone force) you
> > to do the right thing, but makes it easy to do the wrong thing.
> >
> > On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:
> > > But there are some important reasons to use Excel. In my work
> > > there are a lot of people that I have to send the equivalent of a
> > > data.frame to who want to look at the data and possibly slice/dice
> > > the data differently and then send back to me updates. These folks
> > > do not know how to use R, but do have Microsoft Office installed
> > > on their computers and know how to use the different products. I
> > > have been very successful in conveying what
Re: [R] Cleaning up messy Excel data
Seconded

John Kane
Kingston ON Canada

-Original Message-
From: rolf.tur...@xtra.co.nz
Sent: Sat, 03 Mar 2012 13:46:42 +1300
To: 538...@gmail.com
Subject: Re: [R] Cleaning up messy Excel data

SNIP

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cleaning up messy Excel data
Try sending your clients a data set (data frame, table, etc.) as an MS Access data table instead. They can still view the data as a table, but will have to go to much more effort to mess up the data; more likely they will do proper edits without messing anything up (mixing characters in with numbers, having more sexes than your biology teacher told you about, adding extra lines at top or bottom that make reading back into R more difficult, etc.). I have had a few clients that I talked into using MS Access from the start to enter their data. There was often a bit of resistance at first, but once they tried it and went through the process of designing the database up front, they ended up thanking me and believed that the entire data entry process was easier and quicker than had they used Excel as they originally planned.

Access is still part of MS Office, so they don't need to learn R or in any way break their chains from being prisoners of Bill, but they will be more productive in more ways than just interfacing with you. Access (and databases in general) forces you to plan things out and do the correct thing from the start. It is possible to do the right thing in Excel, but Excel does not encourage (let alone force) you to do the right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote: SNIP

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
Re: [R] Cleaning up messy Excel data
Unfortunately, a lot of people who use MS Office don't have or don't know how to use MS Access. Where I work now (as in the past) I have to tie someone to their chair, give them a few pokes with the cattle prod and then show them that a CSV file will load straight into Excel before I can convince them that they can use such a heretical data format. You don't want to know what I have to do to convince them that they can view my listings in HTML.

Jim

PS - Always give them a _copy_ of the CSV file.

On 03/03/2012 10:41 AM, Greg Snow wrote: SNIP
Re: [R] Cleaning up messy Excel data
On 03/03/12 12:41, Greg Snow wrote: SNIP It is possible to do the right thing in Excel, but Excel does not encourage (let alone force) you to do the right thing, but makes it easy to do the wrong thing. SNIP

Fortune!

cheers, Rolf Turner
Re: [R] Cleaning up messy Excel data
Unfortunately they only know how to use Excel and Word. They are not folks who use a computer every day. Many of them run factories or warehouses, and asking them to use something like Access would not happen in my lifetime (I have retired twice already). I don't have any problems with them messing up the data that I send them; they are pretty good about making changes within the context of the spreadsheet. The other issue is that I am working with people in twenty different locations spread across the US, so I might be able to get one of them to use Access (there is one I know that uses it), but that leaves 19 other people I would not be able to communicate with. The other thing is that I use Excel myself to slice/dice data, since there are things that are easier in Excel than R (believe it or not). There are a number of tools I keep in my toolkit, and R is probably the most important, but I have not thrown the rest of them away since they still serve a purpose. So if you can come up with a way to get 20 diverse groups, who are not computer literate, to change over in a couple of days from Excel to Access, let me know. BTW, I tried to use Access once and gave it up because it was not as intuitive as some other tools and did not give me any more capability than the ones I was using. So I know I would have a problem in convincing others to make the change just so they could communicate with me, while they still had to use Excel for most of their other interfaces. This is the real world where you have to learn how to adapt to your environment and make the best of it. So you just have to learn that Excel can be your friend (or at least not your enemy) and can serve a very useful purpose in getting your ideas across to other people.

On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow 538...@gmail.com wrote: SNIP
Re: [R] Cleaning up messy Excel data
But there are some important reasons to use Excel. In my work there are a lot of people that I have to send the equivalent of a data.frame to, who want to look at the data, possibly slice/dice the data differently, and then send back to me updates. These folks do not know how to use R, but do have Microsoft Office installed on their computers and know how to use the different products. I have been very successful in conveying what I am doing for them by communicating via Excel spreadsheets. It is also an important medium in dealing with some international companies who provide data via Excel and expect responses back via Excel. When dealing with data in a tabular form, Excel does provide a way for a majority of the people I work with to understand the data. Yes, there are problems with some of the ways that people use Excel, and yes I have had to invest time in scrubbing some of the data that I get from them, but if I did not, then I would probably not have a job working for them. I use R exclusively for the analysis that I do, but find it convenient to use Excel to provide a communication mechanism to the majority of the non-R users that I have to deal with. It is a convenient work-around because I would never get them to invest the time to learn R. So in the real world there is a need for Excel and we are not going to cause it to go away; we have to learn how to live with it, and from my standpoint, it has definitely benefited me in being able to communicate with my users and continuing to provide them with results that they are happy with. They refer to letting me work my magic on the data; all they know is they see the result via Excel, and in the background R is doing the heavy lifting that they do not have to know about.

On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote: SNIP

-- Jim Holtman Data Munger Guru

What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Re: [R] Cleaning up messy Excel data
(mydata <- as.factor(c(1, 2, 3, 2, 5, 2)))
str(mydata)
newdata <- as.character(mydata)
newdata[newdata == 2] <- 0
newdata <- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's hands.

John Kane
Kingston ON Canada

-Original Message-
From: noahsilver...@ucla.edu
Sent: Tue, 28 Feb 2012 13:27:13 -0800
To: r-help@r-project.org
Subject: [R] Cleaning up messy Excel data

SNIP
Re: [R] Cleaning up messy Excel data
On 01/03/12 04:43, John Kane wrote:

(mydata <- as.factor(c(1, 2, 3, 2, 5, 2)))
str(mydata)
newdata <- as.character(mydata)
newdata[newdata == 2] <- 0
newdata <- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's hands.

Amen, bro'!!!

cheers, Rolf Turner
[R] Cleaning up messy Excel data
Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import it into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in with the values, i.e.

118
5.7
<2.0
3.7

Since this column in Excel has a <2.0 value, R reads the column as a factor with levels. Ideally, I want to convert it to a normal vector of scalars and code the <2.0 as 0. Can anyone suggest an easy way to do this?

Thanks!

-- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095
Re: [R] Cleaning up messy Excel data
First of all, when reading in the CSV file, use 'as.is = TRUE' to prevent the conversion to factors. Now that things are character in that column, you can use pattern expressions (gsub, regex, ...) to search for and change your data. E.g.,

sub("<.*", "0", yourCol)

should do it for you.

On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman noahsilver...@ucla.edu wrote: SNIP

-- Jim Holtman Data Munger Guru

What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
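A minimal sketch of this read-then-recode approach. The sample values and the exact substitution pattern are assumptions (the archive appears to have stripped the angle bracket from the "<2.0" entry):

```r
# Hypothetical messy column as read.csv(..., as.is = TRUE) would deliver it:
# a below-detection-limit entry like "<2.0" forces the column to character.
raw <- c("118", "5.7", "<2.0", "3.7")

# Recode anything starting with "<" as 0, then convert to numeric.
clean <- as.numeric(sub("^<.*", "0", raw))
clean
# [1] 118.0   5.7   0.0   3.7
```

Anchoring the pattern with `^` keeps ordinary numbers untouched while zeroing the censored entries.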
Re: [R] Cleaning up messy Excel data
-Original Message-
From: Noah Silverman
Sent: Tuesday, February 28, 2012 3:27 PM
To: r-help
Subject: [R] Cleaning up messy Excel data

SNIP

--

?as.character will show you how to change the factor column into a character column. Then, you can replace text using any of a number of procedures; see for example ?gsub. Finally, you can use as.numeric if you want numbers. Coding is best done in the context of factors, so you might want to consider whether replacing <2.0 with NA is more appropriate than replacing it with 0. In the end, the choice might be context sensitive.

Rob

-- Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences 800 W. Jefferson St. Kirksville, MO 63501 660-626-2322 FAX 660-626-2965
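A short sketch of the factor-to-character-to-numeric route described above, using NA rather than 0 for the censored entry; the example data (including the "<2.0" value) are assumptions, not from the original post:

```r
# Hypothetical factor column, as read.csv creates by default
raw <- factor(c("118", "5.7", "<2.0", "3.7"))

v <- as.character(raw)   # factor levels back to plain text
v[v == "<2.0"] <- NA     # NA may be more honest than 0 here
nums <- as.numeric(v)
nums
# [1] 118.0   5.7    NA   3.7
```

Whether NA or 0 is the right recode depends on how the downstream analysis should treat a below-detection-limit measurement.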
Re: [R] Cleaning up messy Excel data
That's exactly what I need. Thank You!!

-- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095

On Feb 28, 2012, at 1:42 PM, jim holtman wrote: SNIP
Re: [R] Cleaning up messy Excel data
Just replace that value with zero. If you provide some reproducible code I could probably give you a solution. ?dput

good luck, Stephen

On 02/28/2012 03:27 PM, Noah Silverman wrote: SNIP

-- Stephen Sefick ** Auburn University Biological Sciences 331 Funchess Hall Auburn, Alabama 36849 ** sas0...@auburn.edu http://www.auburn.edu/~sas0025 ** Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis A big computer, a complex algorithm and a long time does not equal science. -Robert Gentleman
Re: [R] Cleaning date columns
Dear Bill,

Thanks very much for the reply and for the code. I have amended my personal details for future posts. I was wondering if there were any good books or tutorials for writing code similar to what you have provided above?

Best wishes, Natalie Van Zuydam

- Natalie Van Zuydam PhD Student University of Dundee nvanzuy...@dundee.ac.uk
[R] Cleaning date columns
Hi Everyone,

I have the following problem:

data <- structure(list(prochi = c("IND1", "IND1", "IND1", "IND2", "IND2",
"IND2", "IND2", "IND3", "IND4", "IND5"), date_admission = structure(c(6468,
6470, 7063, 9981, 9983, 14186, 14372, 5129, 9767, 11168), class = "Date")),
.Names = c("prochi", "date_admission"), row.names = c("27", "28", "21",
"86", "77", "80", "1", "114", "192", "322"), class = "data.frame")

I have records for individuals that were taken on specific dates. Some of the dates are within 3 days of each other. I want to be able to clean my date column and select the earliest of the dates that occur within 3 days of each other per individual as a single observation that represents the N observations. So for example:

input:
IND1 1987-09-17
IND1 1987-09-19
IND1 1989-05-04

output:
IND1 1987-09-17
IND1 1989-05-04

I'm not sure where to start with this?

Thanks, Nat
Re: [R] Cleaning date columns
Here is one possible way (I think - untested code; note the column in your data frame is date_admission, not date):

cData <- do.call(rbind, lapply(split(data, data$prochi), function(dat) {
  dat <- dat[order(dat$date_admission), ]
  while (any(d <- (diff(dat$date_admission) <= 3)))
    dat <- dat[-(min(which(d)) + 1), ]
  dat
}))

(It would be courteous of you to give us your real name, by the way.)

Bill Venables.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Newbie19_02
Sent: Wednesday, 9 March 2011 9:20 PM
To: r-help@r-project.org
Subject: [R] Cleaning date columns

SNIP
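The split/lapply idea can be checked against the poster's sample data. A self-contained sketch (data frame rebuilt with data.frame() for readability; the column name is date_admission, so the quoted code's dat$date is adjusted accordingly):

```r
# The poster's sample data: admission dates stored as days since 1970-01-01
data <- data.frame(
  prochi = c("IND1", "IND1", "IND1", "IND2", "IND2", "IND2", "IND2",
             "IND3", "IND4", "IND5"),
  date_admission = as.Date(c(6468, 6470, 7063, 9981, 9983, 14186, 14372,
                             5129, 9767, 11168), origin = "1970-01-01")
)

# Per individual: sort by date, then repeatedly drop any date within
# 3 days of the previous kept date, retaining the earliest of each cluster.
cData <- do.call(rbind, lapply(split(data, data$prochi), function(dat) {
  dat <- dat[order(dat$date_admission), ]
  while (any(d <- (diff(dat$date_admission) <= 3)))
    dat <- dat[-(min(which(d)) + 1), ]
  dat
}))

nrow(cData)
# [1] 8  (one near-duplicate date dropped for IND1, one for IND2)
```

diff() on a Date column returns a difftime in days, so the `<= 3` comparison works directly; the while loop re-checks after each removal so chains of close dates collapse onto the earliest one.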
[R] cleaning up a vector
I calculated a large vector. Unfortunately, I have some measurement error in my data and some of the values in the vector are erroneous. I ended up with some Infs and NaNs in the vector. I would like to filter out the Inf and NaN values and only keep the values in my vector that range from 1 to 20. Is there a way to filter out Infs and NaNs in R and end up with a clean vector?

Mike
Re: [R] cleaning up a vector
Try this:

x[is.finite(x)]

On Fri, Oct 1, 2010 at 2:51 PM, mlar...@rsmas.miami.edu wrote:
> I calculated a large vector. Unfortunately, I have some measurement error
> in my data and some of the values in the vector are erroneous. I ended up
> with some Infs and NaNs in the vector. I would like to filter out the Inf
> and NaN values and only keep the values in my vector that range from 1 to
> 20. Is there a way to filter out Infs and NaNs in R and end up with a
> clean vector?
>
> Mike

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] cleaning up a vector
Mike,

Small, reproducible examples are always useful for the rest of us.

x <- c(0, NA, NaN, 1, 10, 20, 21, Inf)
x[!is.na(x) & x >= 1 & x <= 20]

Is that what you're looking for?

mlar...@rsmas.miami.edu wrote:
> I calculated a large vector. Unfortunately, I have some measurement error
> in my data and some of the values in the vector are erroneous. I ended up
> with some Infs and NaNs in the vector. I would like to filter out the Inf
> and NaN values and only keep the values in my vector that range from 1 to
> 20. Is there a way to filter out Infs and NaNs in R and end up with a
> clean vector?
>
> Mike
Re: [R] cleaning up a vector
On Fri, Oct 1, 2010 at 10:51 AM, mlar...@rsmas.miami.edu wrote:
> I calculated a large vector. Unfortunately, I have some measurement error
> in my data and some of the values in the vector are erroneous. I ended up
> with some Infs and NaNs in the vector. I would like to filter out the Inf
> and NaN values and only keep the values in my vector that range from 1 to
> 20. Is there a way to filter out Infs and NaNs in R and end up with a
> clean vector?

Two steps, starting from vector x:

x1 = x[is.finite(x)];
x2 = x1[(x1 <= 20) & (x1 >= 1)];

From what you say, x2 is the result you want. Just be aware that dropping values will change the indexing.

Peter
Re: [R] cleaning up a vector
On Oct 1, 2010, at 12:51 PM, mlar...@rsmas.miami.edu wrote:
> I calculated a large vector. Unfortunately, I have some measurement error
> in my data and some of the values in the vector are erroneous. I ended up
> with some Infs and NaNs in the vector. I would like to filter out the Inf
> and NaN values and only keep the values in my vector that range from 1 to
> 20. Is there a way to filter out Infs and NaNs in R and end up with a
> clean vector?
>
> Mike

> set.seed(1)
> x <- sample(c(0:25, NaN, Inf, -Inf), 50, replace = TRUE)
> x
 [1]    7   10   16  NaN    5  NaN  Inf   19   18    1    5    5   19
[14]   11   22   14   20 -Inf   11   22  Inf    6   18    3    7   11
[27]    0   11   25    9   13   17   14    5   23   19   23    3   20
[40]   11   23   18   22   16   15   22    0   13   21   20

> x[is.finite(x) & x >= 1 & x <= 20]
 [1]  7 10 16  5 19 18  1  5  5 19 11 14 20 11  6 18  3  7 11 11  9 13
[23] 17 14  5 19  3 20 11 18 16 15 13 20

See ?is.finite

HTH,

Marc Schwartz
Re: [R] cleaning up a vector
Complementing:

findInterval(x[is.finite(x)], 1:20)

On Fri, Oct 1, 2010 at 2:55 PM, Henrique Dallazuanna wrote:
> Try this:
>
> x[is.finite(x)]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
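Putting the thread's suggestions together: a single logical index handles both the non-finite values and the 1-to-20 range in one step (a sketch with a made-up vector; note that is.finite() is FALSE for NA, NaN, Inf and -Inf alike):

```r
x <- c(0.5, NA, NaN, 1, 10, 20, 21, Inf, -Inf, 15)

# is.finite() screens out NA/NaN/Inf/-Inf, and because FALSE & NA
# evaluates to FALSE, the range conditions never let an NA leak through
clean <- x[is.finite(x) & x >= 1 & x <= 20]
clean
# 1 10 20 15
```

Doing it in one expression avoids Peter's index-shift concern between the two steps, though the final vector is of course still shorter than the original.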
[R] Cleaning a time series
Dear R Users,

Was wondering if anyone can give me pointers to functionality in R that can help clean a time series? For example, some kind of package/functionality which identifies potential errors and takes some action, such as replacement by some suitable value (carry-forward, average of nearest, what have you) and reporting of the errors identified. I did search Google for "R cran time series clean outlier" and various permutations, but did not come across anything.

Thanks,

Tolga

Generally, this communication is for informational purposes only and it is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. In the event you are receiving the offering materials attached below related to your interest in hedge funds or private equity, this communication may be intended as an offer or solicitation for the purchase or sale of such fund(s). All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to UK legal entities.
Re: [R] Cleaning a time series
The zoo package has six na.* routines for carrying values forward, etc.

library(zoo)
?zoo

describes them. Also see the vignettes.

On Fri, May 23, 2008 at 6:55 AM, [EMAIL PROTECTED] wrote:
> Dear R Users,
>
> Was wondering if anyone can give me pointers to functionality in R that
> can help clean a time series? For example, some kind of
> package/functionality which identifies potential errors and takes some
> action, such as replacement by some suitable value (carry-forward,
> average of nearest, what have you) and reporting of the errors
> identified. I did search Google for "R cran time series clean outlier"
> and various permutations, but did not come across anything.
>
> Thanks,
> Tolga
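To make the zoo pointer concrete, here are two of those na.* routines on a toy series (a small sketch, assuming the zoo package is installed; the series values are made up):

```r
library(zoo)

z <- zoo(c(1, NA, NA, 4, NA, 6))

# last observation carried forward
na.locf(z)    # 1 1 1 4 4 6

# linear interpolation between neighbouring observations
na.approx(z)  # 1 2 3 4 5 6
```

The other routines in the family (e.g. na.omit, na.aggregate, na.spline) follow the same pattern: each takes the zoo series and returns a cleaned copy, so they can be compared side by side on the same data.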
[R] Cleaning up memory in R
I'm trying to work on a large dataset and after each segment of run, I need a command to flush the memory. I tried gc() and rm(list=ls()) but they don't seem to help. gc() does not do anything besides showing the memory usage.

I'm using the package BSgenome from BioC.

Thanks a bunch
--
Regards,
Anh Tran
Re: [R] Cleaning up memory in R
On 5/14/2008 3:59 PM, Anh Tran wrote:
> I'm trying to work on a large dataset and after each segment of run, I
> need a command to flush the memory. I tried gc() and rm(list=ls()) but
> they don't seem to help. gc() does not do anything besides showing the
> memory usage.

How do you know it does nothing? R won't normally release memory to the OS, but it is still freed to be reused internally in R. On the other hand, if you still have references to the variables, then gc() really will do nothing.

Duncan Murdoch

> I'm using the package BSgenome from BioC.
>
> Thanks a bunch
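A small illustration of Duncan's point: gc() can only reclaim an object once every reference to it is gone, so rm() the object first (the object name and size here are made up; reported memory figures vary by platform):

```r
big <- numeric(1e7)            # ~76 Mb of doubles
print(object.size(big), units = "Mb")

rm(big)                        # drop the only reference first...
invisible(gc())                # ...then gc() can actually reclaim it

exists("big")
# FALSE
```

Note that rm(list=ls()) inside a function only clears that function's environment, and objects captured by other environments (attached packages, closures) still count as references, which is one common reason gc() appears to "do nothing".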
[R] Cleaning database: grep()? apply()?
Dear R users,

I have a huge database and I need to adjust it somewhat. Here is a very little cut out from the database:

CODE  NAME                          DATE  DATA1
4813  ADVANCED TELECOM              1987   0.013
3845  ADVANCED THERAPEUTIC SYS LTD  1987  10.1
3845  ADVANCED THERAPEUTIC SYS LTD  1989   2.463
3845  ADVANCED THERAPEUTIC SYS LTD  1988   1.563
2836  ADVANCED TISSUE SCI -CL A     1987   0.847
2836  ADVANCED TISSUE SCI -CL A     1989   0.872
2836  ADVANCED TISSUE SCI -CL A     1988   0.529

What I need is:
1) Delete all cases containing "-CL A" (and also "-OLD", "-ADS", etc.) at the end
2) Delete all cases that have less than 3 years of data
3) For each remaining case compute the ratio DATA1(1989) / DATA1(1987) [and then ratios involving other data variables] and output this into a new database consisting of CODE, NAME, and the RATIOs.

Maybe someone can suggest an effective way to do these things? I imagine the first one would involve grep(), and 2 and 3 would involve the apply family of functions, but I cannot get my mind around the actual code to perform these adjustments. I am new to R; I do write code, but usually it consists of for loops and plotting.

I would much appreciate your help. Thank you in advance!

--
Jonas Malmros
Stockholm University
Stockholm, Sweden
Re: [R] Cleaning database: grep()? apply()?
Here is how to whittle it down for the first two parts of your question. I am not exactly sure what you are after in the third part. Is it that you want specific DATEs, or do you want the ratio of DATE[max]/DATE[min]?

> x <- read.table(textConnection("CODE NAME DATE DATA1
+ 4813 'ADVANCED TELECOM' 1987 0.013
+ 3845 'ADVANCED THERAPEUTIC SYS LTD' 1987 10.1
+ 3845 'ADVANCED THERAPEUTIC SYS LTD' 1989 2.463
+ 3845 'ADVANCED THERAPEUTIC SYS LTD' 1988 1.563
+ 2836 'ADVANCED TISSUE SCI -CL A' 1987 0.847
+ 2836 'ADVANCED TISSUE SCI -CL A' 1989 0.872
+ 2836 'ADVANCED TISSUE SCI -CL A' 1988 0.529"), header=TRUE)

> # matches on things to delete
> delete_indx <- grep("-CL A$|-OLD$|-ADS$", x$NAME)
> # delete them
> x <- x[-delete_indx,]
> x
  CODE                         NAME DATE  DATA1
1 4813             ADVANCED TELECOM 1987  0.013
2 3845 ADVANCED THERAPEUTIC SYS LTD 1987 10.100
3 3845 ADVANCED THERAPEUTIC SYS LTD 1989  2.463
4 3845 ADVANCED THERAPEUTIC SYS LTD 1988  1.563

> # I assume you want to use NAME to check for ranges of data
> date_range <- tapply(x$DATE, x$NAME, function(dates) diff(range(dates)))
> date_range
            ADVANCED TELECOM ADVANCED THERAPEUTIC SYS LTD
                           0                            2
   ADVANCED TISSUE SCI -CL A
                          NA

> # delete ones with less than 3 years
> names_to_delete <- names(date_range[date_range < 2])
> # delete those entries
> x <- x[!(x$NAME %in% names_to_delete),]
> x
  CODE                         NAME DATE  DATA1
2 3845 ADVANCED THERAPEUTIC SYS LTD 1987 10.100
3 3845 ADVANCED THERAPEUTIC SYS LTD 1989  2.463
4 3845 ADVANCED THERAPEUTIC SYS LTD 1988  1.563

On Nov 13, 2007 2:34 PM, Jonas Malmros [EMAIL PROTECTED] wrote:
> Dear R users,
>
> I have a huge database and I need to adjust it somewhat.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
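For the third part of the question, which the reply above leaves open, the 1989/1987 ratio per company can be computed from the cleaned rows along these lines (a sketch on the surviving toy rows; the name ratio_1989_1987 is illustrative, not from the thread):

```r
# The rows that survive the two filtering steps in Jim's reply
x <- data.frame(
  CODE  = c(3845, 3845, 3845),
  NAME  = "ADVANCED THERAPEUTIC SYS LTD",
  DATE  = c(1987, 1989, 1988),
  DATA1 = c(10.1, 2.463, 1.563)
)

# DATA1 in 1989 divided by DATA1 in 1987, computed per company
ratio_1989_1987 <- sapply(split(x, x$NAME), function(d)
  d$DATA1[d$DATE == 1989] / d$DATA1[d$DATE == 1987])

round(ratio_1989_1987, 3)
# ADVANCED THERAPEUTIC SYS LTD: 0.244
```

The same split()/sapply() pattern extends to the other data variables: compute each ratio inside the per-company function and return a named vector, then rbind the results back into the CODE/NAME/RATIO table the original post asked for.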