Re: [R] Cleaning data

2017-09-26 Thread Jim Lemon
Hi Bayan,
Your question seems to imply that the "age" column contains floating
point numbers, e.g.

df
height  weight  age
170  72 21.5
...

If so, you will only find an integer in diff(age) when two adjacent
numbers happen to have the same decimal fraction _and_ the subtraction
does not leave a tiny remainder because one or both numbers cannot be
represented exactly in binary, as Eric pointed out. This seems an
unusual criterion for discarding values; perhaps if you explain why an
integer result is undesirable it would help. It can be done:

d <- diff(df$age)
badrows <- which(abs(d - round(d)) < 1e-6)  # value-based test; is.integer() would only test storage type
if (length(badrows)) df <- df[-badrows, ]

OR

df <- df[-(badrows + 1), ]

if you want to delete the second age of each pair rather than the first.
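A worked sketch of this filtering, using a tolerance test for "integer
difference" (sample data invented for illustration; abs(d - round(d)) < 1e-6
treats near-integers as integers):

```r
# Hypothetical data frame in the shape the question implies
df <- data.frame(height = c(170, 165, 180, 175),
                 weight = c(72, 60, 85, 78),
                 age    = c(21.5, 22.5, 23.7, 24.7))

d <- diff(df$age)                            # approximately 1.0, 1.2, 1.0
badrows <- which(abs(d - round(d)) < 1e-6)   # pairs whose difference is a whole number
if (length(badrows)) df <- df[-badrows, ]    # drops rows 1 and 3 here
df                                           # ages 22.5 and 24.7 remain
```

Note the length() guard: if no rows qualify, df[-integer(0), ] would
otherwise select zero rows rather than all of them.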

Jim

On Tue, Sep 26, 2017 at 7:50 PM, bayan sardini  wrote:
> Hi
>
> I want to clean my data frame, based on the age column, whereas i want to 
> delete the rows that the difference between its elements (i+1)-i= integer. i 
> used
>
> a <- diff(df$age)
> for(i in a){if(is.integer(a) == true){df <- df[-a,]
> }}
>
> but, it doesn’t work, any ideas
>
> Thanks in advance
> Bayan
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Cleaning data

2017-09-26 Thread Eric Berger
Hi Bayan,
In your code, 'a' is a vector and is.integer(a) is a single logical that
reports the storage type of 'a', not its values: it will be FALSE whenever
'a' is stored as double, even if every element happens to be a whole number.
(R coerces all the elements of a vector to one common type.)
You need to decide whether something "close enough" to an integer is to be
considered an integer - e.g. within a tolerance of 1e-6.
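For instance (ages invented for illustration):

```r
a <- diff(c(21.5, 22.5, 23.7))   # differences: 1 and (approximately) 1.2
is.integer(a)                    # FALSE: 'a' is stored as double, whatever its values
abs(a - round(a)) < 1e-6         # TRUE FALSE: tolerance-based whole-number test
```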

a <- diff(df$age)
df <- df[ c(TRUE, abs(a - round(a)) > 1e-6), ]

I added the 'TRUE' at the beginning to always keep the first row of df. If
you prefer to always keep the last row then move the TRUE to the end.

HTH,

Eric




On Tue, Sep 26, 2017 at 12:50 PM, bayan sardini 
wrote:

> Hi
>
> I want to clean my data frame, based on the age column, whereas i want to
> delete the rows that the difference between its elements (i+1)-i= integer.
> i used
>
> a <- diff(df$age)
> for(i in a){if(is.integer(a) == true){df <- df[-a,]
> }}
>
> but, it doesn’t work, any ideas
>
> Thanks in advance
> Bayan
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Cleaning data

2017-09-26 Thread bayan sardini
Hi 

I want to clean my data frame based on the age column: I want to delete the
rows where the difference between successive elements, (i+1) - i, is an
integer. I used

a <- diff(df$age)
for(i in a){if(is.integer(a) == true){df <- df[-a,]
}}

but it doesn’t work. Any ideas?

Thanks in advance
Bayan
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Cleaning

2015-11-11 Thread Ashta
Sarah,

Thank you very much. For the other variables I was trying to do the same
job in a different way, because it is easier to list them.

Example

test < which(dat$var1  !="BAA" | dat$var1 !="FAG" )
 {
dat <- dat[-test,]}   and I did not get the  right result. What am I
missing here?
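To see why that filter selects the wrong rows, here is a sketch with
made-up values: any value differs from at least one of the two strings, so
the OR of two != tests is TRUE for every element.

```r
var1 <- c("BAA", "FAG", "XXX")   # hypothetical values
var1 != "BAA" | var1 != "FAG"    # TRUE TRUE TRUE: matches every row
!(var1 %in% c("BAA", "FAG"))     # FALSE FALSE TRUE: what was intended
# Also beware: if which() returns integer(0), dat[-integer(0), ] selects
# zero rows, so guard negative indexing with if (length(test)) ...
```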





On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee 
wrote:

> On Wed, Nov 11, 2015 at 8:44 PM, Ashta  wrote:
> > Hi Sarah,
> >
> > I used the following to clean my data, the program crushed several times.
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > What is the difference between these two
> >
> > test <- dat[dat$Var1  %in% "YYZ" | dat$Var1 %in% "MSN" ,]
>
> Besides that you're using %in% wrong? I told you how to proceed.
>
> myvalues <- c("YYZ", "MSN")
>
> test <- subset(dat, Var1 %in% myvalues)
>
>
> > subset(dat, Var1 %in% myvalues)
>   X Var1 Freq
> 3 3  MSN 1040
> 4 4  YYZ  300
>
> >
> >
> >
> >
> > On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee 
> > wrote:
> >>
> >> Please keep replies on the list so others may participate in the
> >> conversation.
> >>
> >> If you have a character vector containing the potential values, you
> >> might look at %in% for one approach to subsetting your data.
> >>
> >> Var1 %in% myvalues
> >>
> >> Sarah
> >>
> >> On Wed, Nov 11, 2015 at 7:10 PM, Ashta  wrote:
> >> > Thank you Sarah for your prompt response!
> >> >
> >> > I have the list of values of the variable Var1 it is around 20.
> >> > How can I modify this one to include all the 20 valid values?
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >
> >> > Is there a way (efficient )  of doing it?
> >> >
> >> > Thank you again
> >> >
> >> >
> >> >
> >> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee  >
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta  wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > I have a data frame with  huge rows and columns.
> >> >> >
> >> >> > When I looked at the data,  it has several garbage values need to
> be
> >> >> >
> >> >> > cleaned. For a sample I am showing you the frequency distribution
> >> >> > of one variables
> >> >> >
> >> >> > Var1 Freq
> >> >> > 1:3
> >> >> > 2]6
> >> >> > 3MSN 1040
> >> >> > 4YYZ  300
> >> >> > 5\\4
> >> >> > 6+ 3
> >> >> > 7.   ?>   15
> >> >>
> >> >> Please use dput() to provide your data. I made a guess at what you
> had
> >> >> in R, but could be wrong.
> >> >>
> >> >>
> >> >> > and continues.
> >> >> >
> >> >> > I want to keep those rows that contain only a valid variable value
> >> >> >
> >> >> > In this  case MSN and YYZ. I tried the following
> >> >> >
> >> >> > *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
> >> >> >
> >> >> > but I am not getting the desired result.
> >> >>
> >> >> What are you getting? How does it differ from the desired result?
> >> >>
> >> >> >  I have
> >> >> >
> >> >> > Any help or idea?
> >> >>
> >> >> I get:
> >> >>
> >> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ",
> >> >> > "",
> >> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names =
> >> >> c("X",
> >> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >> >
> >> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >> > test
> >> >>   X Var1 Freq
> >> >> 3 3  MSN 1040
> >> >> 4 4  YYZ  300
> >> >>
> >> >> Which seems reasonable to me.
> >> >>
> >> >>
> >> >> >
> >> >> > [[alternative HTML version deleted]]
> >> >>
> >> >> Please don't post in HTML either: it introduces all sorts of errors
> to
> >> >> your message.
> >> >>
> >> >> Sarah
> >> >>
> >
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning

2015-11-11 Thread Sarah Goslee
Please keep replies on the list so others may participate in the conversation.

If you have a character vector containing the potential values, you
might look at %in% for one approach to subsetting your data.

Var1 %in% myvalues
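Spelled out with data in the shape shown later in the thread (myvalues here
is a stand-in for the full list of ~20 valid codes):

```r
dat <- data.frame(X = 1:7,
                  Var1 = c(":", "]", "MSN", "YYZ", "\\", "+", "?>"),
                  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L))
myvalues <- c("MSN", "YYZ")               # stand-in for the full list of valid codes
dat_clean <- dat[dat$Var1 %in% myvalues, ]
dat_clean                                 # rows 3 and 4: MSN 1040, YYZ 300
# subset(dat, Var1 %in% myvalues) gives the same result
```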

Sarah

On Wed, Nov 11, 2015 at 7:10 PM, Ashta  wrote:
> Thank you Sarah for your prompt response!
>
> I have the list of values of the variable Var1 it is around 20.
> How can I modify this one to include all the 20 valid values?
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> Is there a way (efficient )  of doing it?
>
> Thank you again
>
>
>
> On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee 
> wrote:
>>
>> Hi,
>>
>> On Wed, Nov 11, 2015 at 6:51 PM, Ashta  wrote:
>> > Hi all,
>> >
>> > I have a data frame with  huge rows and columns.
>> >
>> > When I looked at the data,  it has several garbage values need to be
>> >
>> > cleaned. For a sample I am showing you the frequency distribution
>> > of one variables
>> >
>> > Var1 Freq
>> > 1:3
>> > 2]6
>> > 3MSN 1040
>> > 4YYZ  300
>> > 5\\4
>> > 6+ 3
>> > 7.   ?>   15
>>
>> Please use dput() to provide your data. I made a guess at what you had
>> in R, but could be wrong.
>>
>>
>> > and continues.
>> >
>> > I want to keep those rows that contain only a valid variable value
>> >
>> > In this  case MSN and YYZ. I tried the following
>> >
>> > *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
>> >
>> > but I am not getting the desired result.
>>
>> What are you getting? How does it differ from the desired result?
>>
>> >  I have
>> >
>> > Any help or idea?
>>
>> I get:
>>
>> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "",
>> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
>> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>> >
>> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>> > test
>>   X Var1 Freq
>> 3 3  MSN 1040
>> 4 4  YYZ  300
>>
>> Which seems reasonable to me.
>>
>>
>> >
>> > [[alternative HTML version deleted]]
>>
>> Please don't post in HTML either: it introduces all sorts of errors to
>> your message.
>>
>> Sarah
>>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning

2015-11-11 Thread Boris Steipe
If what you posted here is what you typed, your syntax is wrong.
I strongly advise you to consult the two links here:

http://adv-r.had.co.nz/Reproducibility.html
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
... and please read the posting guide and don't post in HTML.


B.


On Nov 11, 2015, at 10:03 PM, Ashta  wrote:

> Sarah,
> 
> Thank you very much.   For the other variables
> I was trying to do the same job in different way because it is easier to
> list it
> 
> Example
> 
> test < which(dat$var1  !="BAA" | dat$var1 !="FAG" )
> {
>dat <- dat[-test,]}   and I did not get the  right result. What am I
> missing here?
> 
> 
> 
> 
> 
> On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee 
> wrote:
> 
>> On Wed, Nov 11, 2015 at 8:44 PM, Ashta  wrote:
>>> Hi Sarah,
>>> 
>>> I used the following to clean my data, the program crushed several times.
>>> 
>>> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>>> 
>>> What is the difference between these two
>>> 
>>> test <- dat[dat$Var1  %in% "YYZ" | dat$Var1 %in% "MSN" ,]
>> 
>> Besides that you're using %in% wrong? I told you how to proceed.
>> 
>> myvalues <- c("YYZ", "MSN")
>> 
>> test <- subset(dat, Var1 %in% myvalues)
>> 
>> 
>>> subset(dat, Var1 %in% myvalues)
>>  X Var1 Freq
>> 3 3  MSN 1040
>> 4 4  YYZ  300
>> 
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee 
>>> wrote:
 
 Please keep replies on the list so others may participate in the
 conversation.
 
 If you have a character vector containing the potential values, you
 might look at %in% for one approach to subsetting your data.
 
 Var1 %in% myvalues
 
 Sarah
 
 On Wed, Nov 11, 2015 at 7:10 PM, Ashta  wrote:
> Thank you Sarah for your prompt response!
> 
> I have the list of values of the variable Var1 it is around 20.
> How can I modify this one to include all the 20 valid values?
> 
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> 
> Is there a way (efficient )  of doing it?
> 
> Thank you again
> 
> 
> 
> On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee >> 
> wrote:
>> 
>> Hi,
>> 
>> On Wed, Nov 11, 2015 at 6:51 PM, Ashta  wrote:
>>> Hi all,
>>> 
>>> I have a data frame with  huge rows and columns.
>>> 
>>> When I looked at the data,  it has several garbage values need to
>> be
>>> 
>>> cleaned. For a sample I am showing you the frequency distribution
>>> of one variables
>>> 
>>>Var1 Freq
>>> 1:3
>>> 2]6
>>> 3MSN 1040
>>> 4YYZ  300
>>> 5\\4
>>> 6+ 3
>>> 7.   ?>   15
>> 
>> Please use dput() to provide your data. I made a guess at what you
>> had
>> in R, but could be wrong.
>> 
>> 
>>> and continues.
>>> 
>>> I want to keep those rows that contain only a valid variable value
>>> 
>>> In this  case MSN and YYZ. I tried the following
>>> 
>>> *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
>>> 
>>> but I am not getting the desired result.
>> 
>> What are you getting? How does it differ from the desired result?
>> 
>>> I have
>>> 
>>> Any help or idea?
>> 
>> I get:
>> 
>>> dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ",
>>> "",
>> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names =
>> c("X",
>> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>>> 
>>> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>>> test
>>  X Var1 Freq
>> 3 3  MSN 1040
>> 4 4  YYZ  300
>> 
>> Which seems reasonable to me.
>> 
>> 
>>> 
>>>[[alternative HTML version deleted]]
>> 
>> Please don't post in HTML either: it introduces all sorts of errors
>> to
>> your message.
>> 
>> Sarah
>> 
>>> 
>>> 
>> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning

2015-11-11 Thread Sarah Goslee
On Wed, Nov 11, 2015 at 8:44 PM, Ashta  wrote:
> Hi Sarah,
>
> I used the following to clean my data, the program crushed several times.
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> What is the difference between these two
>
> test <- dat[dat$Var1  %in% "YYZ" | dat$Var1 %in% "MSN" ,]

Besides that you're using %in% wrong? I told you how to proceed.

myvalues <- c("YYZ", "MSN")

test <- subset(dat, Var1 %in% myvalues)


> subset(dat, Var1 %in% myvalues)
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300
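One practical difference between the two forms, with an invented vector
containing a missing value: == propagates NA, while %in% returns FALSE for
NA, so %in% never lets junk NA rows slip through dat[ , ].

```r
x <- c("YYZ", NA, "MSN", "]")    # hypothetical column with a missing value
x == "YYZ" | x == "MSN"          # TRUE NA TRUE FALSE: the NA becomes an
                                 # all-NA junk row when used to index dat
x %in% c("YYZ", "MSN")           # TRUE FALSE TRUE FALSE: NA handled cleanly
```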

>
>
>
>
> On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee 
> wrote:
>>
>> Please keep replies on the list so others may participate in the
>> conversation.
>>
>> If you have a character vector containing the potential values, you
>> might look at %in% for one approach to subsetting your data.
>>
>> Var1 %in% myvalues
>>
>> Sarah
>>
>> On Wed, Nov 11, 2015 at 7:10 PM, Ashta  wrote:
>> > Thank you Sarah for your prompt response!
>> >
>> > I have the list of values of the variable Var1 it is around 20.
>> > How can I modify this one to include all the 20 valid values?
>> >
>> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>> >
>> > Is there a way (efficient )  of doing it?
>> >
>> > Thank you again
>> >
>> >
>> >
>> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta  wrote:
>> >> > Hi all,
>> >> >
>> >> > I have a data frame with  huge rows and columns.
>> >> >
>> >> > When I looked at the data,  it has several garbage values need to be
>> >> >
>> >> > cleaned. For a sample I am showing you the frequency distribution
>> >> > of one variables
>> >> >
>> >> > Var1 Freq
>> >> > 1:3
>> >> > 2]6
>> >> > 3MSN 1040
>> >> > 4YYZ  300
>> >> > 5\\4
>> >> > 6+ 3
>> >> > 7.   ?>   15
>> >>
>> >> Please use dput() to provide your data. I made a guess at what you had
>> >> in R, but could be wrong.
>> >>
>> >>
>> >> > and continues.
>> >> >
>> >> > I want to keep those rows that contain only a valid variable value
>> >> >
>> >> > In this  case MSN and YYZ. I tried the following
>> >> >
>> >> > *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
>> >> >
>> >> > but I am not getting the desired result.
>> >>
>> >> What are you getting? How does it differ from the desired result?
>> >>
>> >> >  I have
>> >> >
>> >> > Any help or idea?
>> >>
>> >> I get:
>> >>
>> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ",
>> >> > "",
>> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names =
>> >> c("X",
>> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>> >> >
>> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>> >> > test
>> >>   X Var1 Freq
>> >> 3 3  MSN 1040
>> >> 4 4  YYZ  300
>> >>
>> >> Which seems reasonable to me.
>> >>
>> >>
>> >> >
>> >> > [[alternative HTML version deleted]]
>> >>
>> >> Please don't post in HTML either: it introduces all sorts of errors to
>> >> your message.
>> >>
>> >> Sarah
>> >>
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cleaning

2015-11-11 Thread Ashta
Hi all,

I have a data frame with a huge number of rows and columns.

When I looked at the data, it has several garbage values that need to be
cleaned. As a sample, I am showing the frequency distribution of one
variable:

  Var1 Freq
1    :    3
2    ]    6
3  MSN 1040
4  YYZ  300
5   \\    4
6    +    3
7   ?>   15

and continues.

I want to keep only those rows that contain a valid value for the variable -
in this case MSN and YYZ. I tried the following

test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]

but I am not getting the desired result.

 I have

Any help or idea?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning

2015-11-11 Thread Sarah Goslee
Hi,

On Wed, Nov 11, 2015 at 6:51 PM, Ashta  wrote:
> Hi all,
>
> I have a data frame with  huge rows and columns.
>
> When I looked at the data,  it has several garbage values need to be
>
> cleaned. For a sample I am showing you the frequency distribution
> of one variables
>
>   Var1 Freq
> 1    :    3
> 2    ]    6
> 3  MSN 1040
> 4  YYZ  300
> 5   \\    4
> 6    +    3
> 7   ?>   15

Please use dput() to provide your data. I made a guess at what you had
in R, but could be wrong.


> and continues.
>
> I want to keep those rows that contain only a valid variable value
>
> In this  case MSN and YYZ. I tried the following
>
> *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
>
> but I am not getting the desired result.

What are you getting? How does it differ from the desired result?

>  I have
>
> Any help or idea?

I get:

> dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\",
+ "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
+ "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> test
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300

Which seems reasonable to me.


>
> [[alternative HTML version deleted]]

Please don't post in HTML either: it introduces all sorts of errors to
your message.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning

2015-11-11 Thread Ashta
Hi Sarah,

I used the following to clean my data; the program crashed several times.


test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]

What is the difference between these two?

test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]




On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee 
wrote:

> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
>
> On Wed, Nov 11, 2015 at 7:10 PM, Ashta  wrote:
> > Thank you Sarah for your prompt response!
> >
> > I have the list of values of the variable Var1 it is around 20.
> > How can I modify this one to include all the 20 valid values?
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > Is there a way (efficient )  of doing it?
> >
> > Thank you again
> >
> >
> >
> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee 
> > wrote:
> >>
> >> Hi,
> >>
> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta  wrote:
> >> > Hi all,
> >> >
> >> > I have a data frame with  huge rows and columns.
> >> >
> >> > When I looked at the data,  it has several garbage values need to be
> >> >
> >> > cleaned. For a sample I am showing you the frequency distribution
> >> > of one variables
> >> >
> >> > Var1 Freq
> >> > 1:3
> >> > 2]6
> >> > 3MSN 1040
> >> > 4YYZ  300
> >> > 5\\4
> >> > 6+ 3
> >> > 7.   ?>   15
> >>
> >> Please use dput() to provide your data. I made a guess at what you had
> >> in R, but could be wrong.
> >>
> >>
> >> > and continues.
> >> >
> >> > I want to keep those rows that contain only a valid variable value
> >> >
> >> > In this  case MSN and YYZ. I tried the following
> >> >
> >> > *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
> >> >
> >> > but I am not getting the desired result.
> >>
> >> What are you getting? How does it differ from the desired result?
> >>
> >> >  I have
> >> >
> >> > Any help or idea?
> >>
> >> I get:
> >>
> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ",
> "",
> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names =
> c("X",
> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> > test
> >>   X Var1 Freq
> >> 3 3  MSN 1040
> >> 4 4  YYZ  300
> >>
> >> Which seems reasonable to me.
> >>
> >>
> >> >
> >> > [[alternative HTML version deleted]]
> >>
> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> your message.
> >>
> >> Sarah
> >>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cleaning up workspace

2013-10-16 Thread Prof J C Nash (U30A)
In order to have a clean workspace at the start of each chapter of a
book I'm knitting, I've written a little script as follows:


# chapclean.R
# This cleans up the R workspace
ilist <- c(".GlobalEnv", "package:stats", "package:graphics",
           "package:grDevices", "package:utils", "package:datasets",
           "package:methods", "Autoloads", "package:base")
print(ilist)
xlist <- search()[which(!(search() %in% ilist))]
print(xlist)
for (ff in xlist){
   cat("Detach ", ff, " which is pos ", as.integer(which(ff == search())), "\n")
   detach(pos = as.integer(which(ff == search())), unload = TRUE) # ?? do we need unload
}
rm(list = ls())

This appears to work fine in my system -- session info is below, but I 
get 30 warnings of the type


30: In FUN(X[[2L]], ...) :
  Created a package name, ‘2013-10-16 10:56:47’, when none found

Does anyone have ideas why the warnings are being generated?  I'd like 
to avoid suppressing them. Here's the session info.


R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1


John Nash
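A tidier sketch of the same cleanup (an editorial aside, assuming only the
default search path should survive): detach() accepts an entry name directly
when character.only = TRUE, so the positions need not be recomputed.

```r
# Entries R attaches by default; everything else gets detached.
keep <- c(".GlobalEnv", "package:stats", "package:graphics", "package:grDevices",
          "package:utils", "package:datasets", "package:methods", "Autoloads",
          "package:base")
for (ff in setdiff(search(), keep)) {
  cat("Detaching", ff, "\n")
  detach(ff, character.only = TRUE, unload = TRUE)
}
rm(list = ls())   # clear the global environment as well
```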

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning up workspace

2013-10-16 Thread Duncan Murdoch
This has been reported before on the bug list 
(https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15481).  The 
message is coming from the methods package, but I don't know if it's a 
bug or ignorable.


Duncan Murdoch

On 16/10/2013 11:03 AM, Prof J C Nash (U30A) wrote:

In order to have a clean workspace at the start of each chapter of a
book I'm knitting I've written a little script as follows:

# chapclean.R
# This cleans up the R workspace
ilist <- c(".GlobalEnv", "package:stats", "package:graphics",
           "package:grDevices", "package:utils", "package:datasets",
           "package:methods", "Autoloads", "package:base")
print(ilist)
xlist <- search()[which(!(search() %in% ilist))]
print(xlist)
for (ff in xlist){
  cat("Detach ", ff, " which is pos ", as.integer(which(ff == search())), "\n")
  detach(pos = as.integer(which(ff == search())), unload = TRUE) # ?? do we need unload
}
rm(list = ls())


This appears to work fine in my system -- session info is below, but I
get 30 warnings of the type

30: In FUN(X[[2L]], ...) :
Created a package name, ‘2013-10-16 10:56:47’, when none found

Does anyone have ideas why the warnings are being generated?  I'd like
to avoid suppressing them. Here's the session info.

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
   [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_CA.UTF-8LC_COLLATE=en_CA.UTF-8
   [5] LC_MONETARY=en_CA.UTF-8LC_MESSAGES=en_CA.UTF-8
   [7] LC_PAPER=C LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1
  

John Nash

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning up messy Excel data

2012-03-03 Thread John C Nash
When I was still teaching undergraduate intro biz-stat (among that community it 
is always
abbreviated), we needed to control the spreadsheet behaviour of TAs who entered 
marks into
a spreadsheet. We came up with TellTable (the Sourceforge site is still around 
with refs
at http://telltable-s.sourceforge.net/), which put openoffice calc on a server 
and made
sure change recording was on and the menu to switch off change recording was 
removed. It
is used over a web browser with a VNC client. Neil Smith wrote a Java 
application to view
all the changes by who, what, when etc., and we discovered the infrastructure 
was quite
nice for running any single user app in a shared mode with version control. 
However, with
Google Docs, we realized we could try to make money or enjoy life, and so the 
project is
now moribund. However, the ideas are there, and if anyone gets interested, I'll 
be happy
to try to dig up materials, though I suspect that it would be easier to work 
with the
ideas and more modern tools.

The key idea is that there is just ONE master file, and that there is some 
discipline over
keeping that file OK. My opinion is that this concept could be exploited much 
more for
lots of different situations, but it seems that cloud technology is being used 
to create
lots of versions of files rather than consolidate and control such files.

JN


On 03/03/2012 06:00 AM, r-help-requ...@r-project.org wrote:
 Message: 76
 Date: Fri, 2 Mar 2012 20:04:05 -0500
 From: jim holtman jholt...@gmail.com
 To: Greg Snow 538...@gmail.com
 Cc: r-help r-help@r-project.org
 Subject: Re: [R] Cleaning up messy Excel data
 
 Unfortunately they only know how to use Excel and Word.  They are not
 folks who use a computer every day.  Many of them run factories or
 warehouses and asking them to use something like Access would not
 happen in my lifetime (I have retired twice already).
 
 I don't have any problems with them messing up the data that I send
 them; they are pretty good about making changes within the context of
 the spreadsheet.  The other issue is that I working with people in
 twenty different locations spread across the US, so I might be able to
 one of them to use Access (there is one I know that uses it), but that
 leaves 19 other people I would not be able to communicate with.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning up messy Excel data

2012-03-03 Thread Greg Snow
Sometimes we adapt to our environment, sometimes we adapt our
environment to us. I like fortune(108).

I actually was suggesting that you add a tool to your toolbox, not limit it.

In my experience (and I don't expect everyone else's to match) data
manipulation that seems easier in Excel than R is only easier until
the client comes back and wants me to redo the whole analysis with one
typo fixed.  Then rerunning the script in R (or Perl or other tool) is
a lot easier than trying to remember where all I clicked, dragged,
selected, etc.

I do use Excel for somethings (though I would be happy to find other
tools for that if it were possible to expunge Excel from the earth)
and Word (I actually like using R2wd to send tables and graphs to word
that I can then give to clients who just want to be able to copy and
paste them to something else), I just think that many of the tasks
that many people use excel for would be better served with a better
tool.

If someone reading this decides to put some more thought into a
project up front and actually design a database up front rather than
letting it evolve into some monstrosity in Excel, and that decision
saves them some later grief, then the world will be a little bit
better place.

On Fri, Mar 2, 2012 at 6:04 PM, jim holtman jholt...@gmail.com wrote:
 Unfortunately they only know how to use Excel and Word.  They are not
 folks who use a computer every day.  Many of them run factories or
 warehouses and asking them to use something like Access would not
 happen in my lifetime (I have retired twice already).

 I don't have any problems with them messing up the data that I send
 them; they are pretty good about making changes within the context of
 the spreadsheet.  The other issue is that I am working with people in
 twenty different locations spread across the US, so I might be able to
 get one of them to use Access (there is one I know who uses it), but that
 leaves 19 other people I would not be able to communicate with.

 The other thing is, is that I use Excel myself to slice/dice data
 since there are things that are easier in Excel than R (believe it or
 not).  There are a number of tools I keep in my toolkit, and R is
 probably the most important, but I have not thrown the rest of them
 away since they still serve a purpose.

 So if you can come up with a way to get 20 diverse groups, who are not
 computer literate, to change over in a couple of days from Excel to
 Access, let me know.  BTW, I tried to use Access once and gave it up
 because it was not as intuitive as some other tools and did not give
 me any more capability than the ones I was using.  So I know I would
 have a problem in convincing others to make the change just so they
 could communicate with me, while they still had to use Excel for most
 of their other interfaces.

 This is the real world where you have to learn how to adapt to your
 environment and make the best of it.  So you just have to learn that
 Excel can be your friend (or at least not your enemy) and can serve a
 very useful purpose in getting your ideas across to other people.

 On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow 538...@gmail.com wrote:
 Try sending your clients a data set (data frame, table, etc) as an MS
 Access data table instead.  They can still view the data as a table,
 but will have to go to much more effort to mess up the data, more
 likely they will do proper edits without messing anything up (mixing
 characters in with numbers, have more sexes than your biology teacher
 told you about, add extra lines at top or bottom that makes reading
 back into R more difficult, etc.)

 I have had a few clients that I talked into using MS Access from the
 start to enter their data, there was often a bit of resistance at
 first, but once they tried it and went through the process of
 designing the database up front they ended up thanking me and believed
 that the entire data entry process was easier and quicker than if they
 had used Excel as they originally planned.

 Access is still part of MS office, so they don't need to learn R or in
 any way break their chains from being prisoners of bill, but they will
 be more productive in more ways than just interfacing with you.

 Access (databases in general) force you to plan things out and do the
 correct thing from the start.  It is possible to do the right thing in
 Excel, but Excel does not encourage (let alone force) you to do the
 right thing, but makes it easy to do the wrong thing.

 On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:
 But there are some important reasons to use Excel.  In my work there
 are a lot of people that I have to send the equivalent of a data.frame
 to, who want to look at the data and possibly slice/dice the data
 differently and then send back to me updates.  These folks do not know
 how to use R, but do have Microsoft Office installed on their
 computers and know how to use the different products.

 I have been very successful in conveying what 

Re: [R] Cleaning up messy Excel data

2012-03-03 Thread John Kane
Seconded 

John Kane
Kingston ON Canada


 -Original Message-
 From: rolf.tur...@xtra.co.nz
 Sent: Sat, 03 Mar 2012 13:46:42 +1300
 To: 538...@gmail.com
 Subject: Re: [R] Cleaning up messy Excel data
 
 On 03/03/12 12:41, Greg Snow wrote:
 
 SNIP
 It is possible to do the right thing in
 Excel, but Excel does not encourage (let alone force) you to do the
 right thing, but makes it easy to do the wrong thing.
 SNIP
 
 Fortune!
 
  cheers,
 
  Rolf Turner
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Greg Snow
Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data, more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, have more sexes than your biology teacher
told you about, add extra lines at top or bottom that makes reading
back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the
start to enter their data, there was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than if they
had used Excel as they originally planned.

Access is still part of MS office, so they don't need to learn R or in
any way break their chains from being prisoners of bill, but they will
be more productive in more ways than just interfacing with you.

Access (databases in general) force you to plan things out and do the
correct thing from the start.  It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:
 But there are some important reasons to use Excel.  In my work there
 are a lot of people that I have to send the equivalent of a data.frame
 to, who want to look at the data and possibly slice/dice the data
 differently and then send back to me updates.  These folks do not know
 how to use R, but do have Microsoft Office installed on their
 computers and know how to use the different products.

 I have been very successful in conveying what I am doing for them by
 communicating via Excel spreadsheets.  It is also an important medium
 in dealing with some international companies who provide data via
 Excel and expect responses back via Excel.

 When dealing with data in a tabular form, Excel does provide a way for
 a majority of the people I work with to understand the data.  Yes,
 there are problems with some of the ways that people use Excel, and
 yes I have had to invest time in scrubbing some of the data that I get
 from them, but if I did not, then I would probably not have a job
 working for them.  I use R exclusively for the analysis that I do, but
 find it convenient to use Excel to provide a communication mechanism
 to the majority of the non-R users that I have to deal with.  It is a
 convenient work-around because I would never get them to invest the
 time to learn R.

 So in the real world there is a need for Excel and we are not going to
 cause it to go away; we have to learn how to live with it, and from my
 standpoint, it has definitely benefited me in being able to
 communicate with my users and continuing to provide them with results
 that they are happy with.  They refer to letting me work my magic on
 the data; all they know is they see the result via Excel and in the
 background R is doing the heavy lifting that they do not have to know
 about.

 On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:
 On 01/03/12 04:43, John Kane wrote:

 (mydata <- as.factor(c(1, 2, 3, 2, 5, 2)))
 str(mydata)

 newdata <- as.character(mydata)

 newdata[newdata == "2"] <- "0"
 newdata <- as.numeric(newdata)
 str(newdata)

 We really need to keep Excel (and other spreadsheets) out of people's
 hands.


 Amen, bro'!!!

    cheers,

        Rolf Turner




 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Jim Lemon
Unfortunately, a lot of people who use MS Office don't have or know how 
to use MS Access. Where I work now (as in the past) I have to tie 
someone to their chair, give them a few pokes with the cattle prod and 
then show them that a CSV file will load straight into Excel before I 
can convince them that they can use such a heretical data format. You 
don't want to know what I have to do to convince them that they can view 
my listings in HTML.


Jim

PS - Always give them a _copy_ of the CSV file.

On 03/03/2012 10:41 AM, Greg Snow wrote:

Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data, more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, have more sexes than your biology teacher
told you about, add extra lines at top or bottom that makes reading
back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the
start to enter their data, there was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than if they
had used Excel as they originally planned.

Access is still part of MS office, so they don't need to learn R or in
any way break their chains from being prisoners of bill, but they will
be more productive in more ways than just interfacing with you.

Access (databases in general) force you to plan things out and do the
correct thing from the start.  It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtmanjholt...@gmail.com  wrote:

But there are some important reasons to use Excel.  In my work there
are a lot of people that I have to send the equivalent of a data.frame
to, who want to look at the data and possibly slice/dice the data
differently and then send back to me updates.  These folks do not know
how to use R, but do have Microsoft Office installed on their
computers and know how to use the different products.

I have been very successful in conveying what I am doing for them by
communicating via Excel spreadsheets.  It is also an important medium
in dealing with some international companies who provide data via
Excel and expect responses back via Excel.

When dealing with data in a tabular form, Excel does provide a way for
a majority of the people I work with to understand the data.  Yes,
there are problems with some of the ways that people use Excel, and
yes I have had to invest time in scrubbing some of the data that I get
from them, but if I did not, then I would probably not have a job
working for them.  I use R exclusively for the analysis that I do, but
find it convenient to use Excel to provide a communication mechanism
to the majority of the non-R users that I have to deal with.  It is a
convenient work-around because I would never get them to invest the
time to learn R.

So in the real world there is a need for Excel and we are not going to
cause it to go away; we have to learn how to live with it, and from my
standpoint, it has definitely benefited me in being able to
communicate with my users and continuing to provide them with results
that they are happy with.  They refer to letting me work my magic on
the data; all they know is they see the result via Excel and in the
background R is doing the heavy lifting that they do not have to know
about.

On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turnerrolf.tur...@xtra.co.nz  wrote:

On 01/03/12 04:43, John Kane wrote:


(mydata <- as.factor(c(1, 2, 3, 2, 5, 2)))
str(mydata)

newdata <- as.character(mydata)

newdata[newdata == "2"] <- "0"
newdata <- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's
hands.



Amen, bro'!!!

cheers,

Rolf Turner





Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Rolf Turner

On 03/03/12 12:41, Greg Snow wrote:

SNIP

It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

SNIP

Fortune!

cheers,

Rolf Turner



Re: [R] Cleaning up messy Excel data

2012-03-02 Thread jim holtman
Unfortunately they only know how to use Excel and Word.  They are not
folks who use a computer every day.  Many of them run factories or
warehouses and asking them to use something like Access would not
happen in my lifetime (I have retired twice already).

I don't have any problems with them messing up the data that I send
them; they are pretty good about making changes within the context of
the spreadsheet.  The other issue is that I am working with people in
twenty different locations spread across the US, so I might be able to
get one of them to use Access (there is one I know that uses it), but that
leaves 19 other people I would not be able to communicate with.

The other thing is that I use Excel myself to slice/dice data
since there are things that are easier in Excel than R (believe it or
not).  There are a number of tools I keep in my toolkit, and R is
probably the most important, but I have not thrown the rest of them
away since they still serve a purpose.

So if you can come up with a way to get 20 diverse groups, who are not
computer literate, to change over in a couple of days from Excel to
Access, let me know.  BTW, I tried to use Access once and gave it up
because it was not as intuitive as some other tools and did not give
me any more capability than the ones I was using.  So I know I would
have a problem in convincing others to make the change just so they
could communicate with me, while they still had to use Excel for most
of their other interfaces.

This is the real world where you have to learn how to adapt to your
environment and make the best of it.  So you just have to learn that
Excel can be your friend (or at least not your enemy) and can serve a
very useful purpose in getting your ideas across to other people.

On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow 538...@gmail.com wrote:
 Try sending your clients a data set (data frame, table, etc) as an MS
 Access data table instead.  They can still view the data as a table,
 but will have to go to much more effort to mess up the data, more
 likely they will do proper edits without messing anything up (mixing
 characters in with numbers, have more sexes than your biology teacher
 told you about, add extra lines at top or bottom that makes reading
 back into R more difficult, etc.)

 I have had a few clients that I talked into using MS Access from the
 start to enter their data, there was often a bit of resistance at
 first, but once they tried it and went through the process of
 designing the database up front they ended up thanking me and believed
 that the entire data entry process was easier and quicker than if they
 had used Excel as they originally planned.

 Access is still part of MS office, so they don't need to learn R or in
 any way break their chains from being prisoners of bill, but they will
 be more productive in more ways than just interfacing with you.

 Access (databases in general) force you to plan things out and do the
 correct thing from the start.  It is possible to do the right thing in
 Excel, but Excel does not encourage (let alone force) you to do the
 right thing, but makes it easy to do the wrong thing.

 On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:
 But there are some important reasons to use Excel.  In my work there
 are a lot of people that I have to send the equivalent of a data.frame
 to, who want to look at the data and possibly slice/dice the data
 differently and then send back to me updates.  These folks do not know
 how to use R, but do have Microsoft Office installed on their
 computers and know how to use the different products.

 I have been very successful in conveying what I am doing for them by
 communicating via Excel spreadsheets.  It is also an important medium
 in dealing with some international companies who provide data via
 Excel and expect responses back via Excel.

 When dealing with data in a tabular form, Excel does provide a way for
 a majority of the people I work with to understand the data.  Yes,
 there are problems with some of the ways that people use Excel, and
 yes I have had to invest time in scrubbing some of the data that I get
 from them, but if I did not, then I would probably not have a job
 working for them.  I use R exclusively for the analysis that I do, but
 find it convenient to use Excel to provide a communication mechanism
 to the majority of the non-R users that I have to deal with.  It is a
 convenient work-around because I would never get them to invest the
 time to learn R.

 So in the real world there is a need for Excel and we are not going to
 cause it to go away; we have to learn how to live with it, and from my
 standpoint, it has definitely benefited me in being able to
 communicate with my users and continuing to provide them with results
 that they are happy with.  They refer to letting me work my magic on
 the data; all they know is they see the result via Excel and in the
 background R is doing the heavy lifting that they do not have to know

Re: [R] Cleaning up messy Excel data

2012-03-01 Thread jim holtman
But there are some important reasons to use Excel.  In my work there
are a lot of people that I have to send the equivalent of a data.frame
to, who want to look at the data and possibly slice/dice the data
differently and then send back to me updates.  These folks do not know
how to use R, but do have Microsoft Office installed on their
computers and know how to use the different products.

I have been very successful in conveying what I am doing for them by
communicating via Excel spreadsheets.  It is also an important medium
in dealing with some international companies who provide data via
Excel and expect responses back via Excel.

When dealing with data in a tabular form, Excel does provide a way for
a majority of the people I work with to understand the data.  Yes,
there are problems with some of the ways that people use Excel, and
yes I have had to invest time in scrubbing some of the data that I get
from them, but if I did not, then I would probably not have a job
working for them.  I use R exclusively for the analysis that I do, but
find it convenient to use Excel to provide a communication mechanism
to the majority of the non-R users that I have to deal with.  It is a
convenient work-around because I would never get them to invest the
time to learn R.

So in the real world there is a need for Excel and we are not going to
cause it to go away; we have to learn how to live with it, and from my
standpoint, it has definitely benefited me in being able to
communicate with my users and continuing to provide them with results
that they are happy with.  They refer to letting me work my magic on
the data; all they know is they see the result via Excel and in the
background R is doing the heavy lifting that they do not have to know
about.

On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:
 On 01/03/12 04:43, John Kane wrote:

 (mydata <- as.factor(c(1, 2, 3, 2, 5, 2)))
 str(mydata)

 newdata <- as.character(mydata)

 newdata[newdata == "2"] <- "0"
 newdata <- as.numeric(newdata)
 str(newdata)

 We really need to keep Excel (and other spreadsheets) out of people's
 hands.


 Amen, bro'!!!

    cheers,

        Rolf Turner




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Cleaning up messy Excel data

2012-02-29 Thread John Kane

(mydata <- as.factor(c(1, 2, 3, 2, 5, 2)))
str(mydata)

newdata <- as.character(mydata)

newdata[newdata == "2"] <- "0"
newdata <- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's hands.

John Kane
Kingston ON Canada


 -Original Message-
 From: noahsilver...@ucla.edu
 Sent: Tue, 28 Feb 2012 13:27:13 -0800
 To: r-help@r-project.org
 Subject: [R] Cleaning up messy Excel data
 
 Unfortunately, some data I need to work with was delivered in a rather
 messy Excel file.  I want to import into R and clean up some things so
 that I can do my analysis.  Pulling in a CSV from Excel is the easy part.
 
 My current challenge is dealing with some text mixed in the values.
 i.e.   118   5.7   <2.0  3.7

 Since this column in Excel has a "<2.0" value, R reads the column as
 a factor with levels.  Ideally, I want to convert it to a normal vector of
 scalars and code the "<2.0" as 0.
 
 Can anyone suggest an easy way to do this?
 
 Thanks!
 
 
 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building
 Los Angeles, CA 90095
 
 
 





Re: [R] Cleaning up messy Excel data

2012-02-29 Thread Rolf Turner

On 01/03/12 04:43, John Kane wrote:

(mydata <- as.factor(c(1, 2, 3, 2, 5, 2)))
str(mydata)

newdata <- as.character(mydata)

newdata[newdata == "2"] <- "0"
newdata <- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's hands.


Amen, bro'!!!

cheers,

Rolf Turner



[R] Cleaning up messy Excel data

2012-02-28 Thread Noah Silverman
Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can do 
my analysis.  Pulling in a CSV from Excel is the easy part.

My current challenge is dealing with some text mixed in the values.
i.e.   118   5.7   <2.0  3.7

Since this column in Excel has a "<2.0" value, R reads the column as a
factor with levels.  Ideally, I want to convert it to a normal vector of scalars
and code the "<2.0" as 0.

Can anyone suggest an easy way to do this?

Thanks!


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095





Re: [R] Cleaning up messy Excel data

2012-02-28 Thread jim holtman
First of all, when reading in the CSV file, use 'as.is = TRUE' to
prevent character columns from being converted to factors.

Now that things are character in that column, you can use some pattern
expressions (gsub, regex, ...) to search for and change your data.
E.g.,

sub("<.*", "0", yourCol)

should do it for you.
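Putting the two suggestions together, a minimal sketch might look like this (the file name and column name are made up, and the censored "<2.0"-style value is assumed from the original post):

```r
# Read the CSV without converting character columns to factors
df <- read.csv("messy.csv", as.is = TRUE)        # "messy.csv" is a hypothetical file name

# Recode censored entries such as "<2.0" to "0", then make the column numeric
df$conc <- as.numeric(sub("<.*", "0", df$conc))  # "conc" is a placeholder column name
```

The same effect can also be had at read time by passing colClasses = "character" for the affected column.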

On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman noahsilver...@ucla.edu wrote:
 Unfortunately, some data I need to work with was delivered in a rather messy 
 Excel file.  I want to import into R and clean up some things so that I can 
 do my analysis.  Pulling in a CSV from Excel is the easy part.

 My current challenge is dealing with some text mixed in the values.
 i.e.   118   5.7   <2.0  3.7

 Since this column in Excel has a "<2.0" value, R reads the column as a
 factor with levels.  Ideally, I want to convert it to a normal vector of scalars
 and code the "<2.0" as 0.

 Can anyone suggest an easy way to do this?

 Thanks!


 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building
 Los Angeles, CA 90095






-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Robert Baer
-Original Message- 
From: Noah Silverman

Sent: Tuesday, February 28, 2012 3:27 PM
To: r-help
Subject: [R] Cleaning up messy Excel data

Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can 
do my analysis.  Pulling in a CSV from Excel is the easy part.


My current challenge is dealing with some text mixed in the values.
i.e.   118   5.7   <2.0  3.7

Since this column in Excel has a "<2.0" value, R reads the column as a
factor with levels.  Ideally, I want to convert it to a normal vector of
scalars and code the "<2.0" as 0.


Can anyone suggest an easy way to do this?
--
?as.character
will show you how to change the factor column into a character column. 
Then, you can replace text using any of a number of procedures.

see for example
?gsub

Finally, you can use as.numeric if you want numbers.  Coding is best done
in the context of factors, so you might want to consider whether replacing <2.0
with NA is more appropriate than replacing it with 0.  In the end, the choice
might be context sensitive.
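As a sketch of the two options described above (the example values are invented for illustration):

```r
x <- factor(c("118", "5.7", "<2.0", "3.7"))  # column as R might read it in

# Option 1: recode the censored value as 0
x0 <- as.character(x)
x0[x0 == "<2.0"] <- "0"
x0 <- as.numeric(x0)    # 118.0  5.7  0.0  3.7

# Option 2: treat the censored value as missing instead
xNA <- as.character(x)
xNA[xNA == "<2.0"] <- NA
xNA <- as.numeric(xNA)  # 118.0  5.7  NA  3.7
```

Note that as.numeric should be applied to the character version, not to the factor itself: as.numeric on a factor returns the internal level codes rather than the printed values.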


Rob

--
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965



Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Noah Silverman
That's exactly what I need.

Thank You!!


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

On Feb 28, 2012, at 1:42 PM, jim holtman wrote:

 First of all when reading in the CSV file, use 'as.is = TRUE' to
 prevent the changing to factors.
 
 Now that things are character in that column, you can use some pattern
 expressions (gsub, regex, ...) to search for and change your data.
 E.g.,
 
 sub("<.*", "0", yourCol)
 
 should do it for you.
 
 On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman noahsilver...@ucla.edu 
 wrote:
 Unfortunately, some data I need to work with was delivered in a rather messy 
 Excel file.  I want to import into R and clean up some things so that I can 
 do my analysis.  Pulling in a CSV from Excel is the easy part.
 
 My current challenge is dealing with some text mixed in the values.
 i.e.   118   5.7   <2.0  3.7

 Since this column in Excel has a "<2.0" value, R reads the column as a
 factor with levels.  Ideally, I want to convert it to a normal vector of
 scalars and code the "<2.0" as 0.
 
 Can anyone suggest an easy way to do this?
 
 Thanks!
 
 
 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building
 Los Angeles, CA 90095
 
 
 
 
 
 
 -- 
 Jim Holtman
 Data Munger Guru
 
 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.





Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Stephen Sefick
Just replace that value with zero.  If you provide some reproducible 
code I could probably give you a solution.

?dput
good luck,

Stephen

On 02/28/2012 03:27 PM, Noah Silverman wrote:

Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can do 
my analysis.  Pulling in a CSV from Excel is the easy part.

My current challenge is dealing with some text mixed in the values.
i.e.   118   5.7   <2.0  3.7

Since this column in Excel has a "<2.0" value, R reads the column as a factor with
levels.  Ideally, I want to convert it to a normal vector of scalars and code the
"<2.0" as 0.

Can anyone suggest an easy way to do this?

Thanks!


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095





--
Stephen Sefick
**
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**
sas0...@auburn.edu
http://www.auburn.edu/~sas0025
**

Let's not spend our time and resources thinking about things that are so little 
or so large that all they really do for us is puff us up and make us feel like 
gods.  We are mammals, and have not exhausted the annoying little problems of 
being mammals.

-K. Mullis

A big computer, a complex algorithm and a long time does not equal science.

  -Robert Gentleman



Re: [R] Cleaning date columns

2011-03-10 Thread natalie.vanzuydam
Dear Bill,

Thanks very much for the reply and for the code.  I have amended my personal
details for future posts.  I was wondering if there were any good books or
tutorials for writing code similar to what you have provided above?

Best wishes,
Natalie Van Zuydam

-
Natalie Van Zuydam

PhD Student
University of Dundee
nvanzuy...@dundee.ac.uk
--
View this message in context: 
http://r.789695.n4.nabble.com/Cleaning-date-columns-tp3343359p3345482.html
Sent from the R help mailing list archive at Nabble.com.



[R] Cleaning date columns

2011-03-09 Thread Newbie19_02
Hi Everyone,

I have the following problem:

data <- structure(list(prochi = c("IND1", "IND1", "IND1",
"IND2", "IND2", "IND2", "IND2", "IND3",
"IND4", "IND5"), date_admission = structure(c(6468,
6470, 7063, 9981, 9983, 14186, 14372, 5129, 9767, 11168), class = "Date")),
.Names = c("prochi",
"date_admission"), row.names = c("27", "28", "21", "86", "77",
"80", "1", "114", "192", "322"), class = "data.frame")


I have records for individuals that were taken on specific dates.  Some of
the dates are within 3 days of each other.  I want to be able to clean my
date column and select the earliest of the dates that occur within 3 days of
each other per individual as a single observation that represents the N
observations.  So for example:

input:
IND1    1987-09-17
IND1    1987-09-19
IND1    1989-05-04

output:
IND1    1987-09-17
IND1    1989-05-04

I'm not sure where to start with this.

Thanks,
Nat
 

--
View this message in context: 
http://r.789695.n4.nabble.com/Cleaning-date-columns-tp3343359p3343359.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Cleaning date columns

2011-03-09 Thread Bill.Venables
Here is one possible way (I think - untested code)


cData <- do.call(rbind, lapply(split(data, data$prochi),
    function(dat) {
        dat <- dat[order(dat$date), ]   # $date partially matches date_admission
        while (any(d <- (diff(dat$date) <= 3)))
            dat <- dat[-(min(which(d)) + 1), ]
        dat
    }))
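
A self-contained run of the same idea on Nat's sample data (quotes restored, since the archive stripped them; the explicit column name avoids relying on partial matching):

```r
# Nat's sample data, with string quoting restored
data <- structure(list(
  prochi = c("IND1","IND1","IND1","IND2","IND2","IND2","IND2","IND3","IND4","IND5"),
  date_admission = structure(c(6468, 6470, 7063, 9981, 9983, 14186, 14372,
                               5129, 9767, 11168), class = "Date")),
  class = "data.frame",
  row.names = c("27","28","21","86","77","80","1","114","192","322"))

cData <- do.call(rbind, lapply(split(data, data$prochi), function(dat) {
  dat <- dat[order(dat$date_admission), ]
  while (any(d <- (diff(dat$date_admission) <= 3)))  # any adjacent pair within 3 days?
    dat <- dat[-(min(which(d)) + 1), ]               # drop the later of that pair
  dat
}))
nrow(cData)  # 8 rows: one near-duplicate date removed for IND1 and one for IND2
```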

(It would be courteous of you to give us your real name, by the way) 

Bill Venables.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Newbie19_02
Sent: Wednesday, 9 March 2011 9:20 PM
To: r-help@r-project.org
Subject: [R] Cleaning date columns

Hi Everyone,

I have the following problem:

data <- structure(list(prochi = c("IND1", "IND1", "IND1",
"IND2", "IND2", "IND2", "IND2", "IND3",
"IND4", "IND5"), date_admission = structure(c(6468,
6470, 7063, 9981, 9983, 14186, 14372, 5129, 9767, 11168), class = "Date")),
.Names = c("prochi",
"date_admission"), row.names = c("27", "28", "21", "86", "77",
"80", "1", "114", "192", "322"), class = "data.frame")


I have records for individuals that were taken on specific dates.  Some of
the dates are within 3 days of each other.  I want to be able to clean my
date column and select the earliest of the dates that occur within 3 days of
each other per individual as a single observation that represents the N
observations.  So for example:

input:
IND1    1987-09-17
IND1    1987-09-19
IND1    1989-05-04

output:
IND1    1987-09-17
IND1    1989-05-04

I'm  not sure where to start with this?

Thanks,
Nat
 

--
View this message in context: 
http://r.789695.n4.nabble.com/Cleaning-date-columns-tp3343359p3343359.html
Sent from the R help mailing list archive at Nabble.com.



[R] cleaning up a vector

2010-10-01 Thread mlarkin
I calculated a large vector.  Unfortunately, I have some measurement error
in my data and some of the values in the vector are erroneous.  I ended up
wih some Infs and NaNs in the vector.  I would like to filter out the Inf
and NaN values and only keep the values in my vector that range from 1 to
20.  Is there a way to filter out Infs and NaNs in R and end up with a
clean vector?

Mike



Re: [R] cleaning up a vector

2010-10-01 Thread Henrique Dallazuanna
Try this:

x[is.finite(x)]


On Fri, Oct 1, 2010 at 2:51 PM, mlar...@rsmas.miami.edu wrote:

 I calculated a large vector.  Unfortunately, I have some measurement error
 in my data and some of the values in the vector are erroneous.  I ended up
 wih some Infs and NaNs in the vector.  I would like to filter out the Inf
 and NaN values and only keep the values in my vector that range from 1 to
 20.  Is there a way to filter out Infs and NaNs in R and end up with a
 clean vector?

 Mike





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] cleaning up a vector

2010-10-01 Thread Erik Iverson

Mike,

Small, reproducible examples are always useful for the rest of us.

x <- c(0, NA, NaN, 1, 10, 20, 21, Inf)
x[!is.na(x) & x >= 1 & x <= 20]

Is that what you're looking for?

mlar...@rsmas.miami.edu wrote:

I calculated a large vector.  Unfortunately, I have some measurement error
in my data and some of the values in the vector are erroneous.  I ended up
wih some Infs and NaNs in the vector.  I would like to filter out the Inf
and NaN values and only keep the values in my vector that range from 1 to
20.  





Is there a way to filter out Infs and NaNs in R and end up with a

clean vector?

Mike



Re: [R] cleaning up a vector

2010-10-01 Thread Peter Langfelder
On Fri, Oct 1, 2010 at 10:51 AM,  mlar...@rsmas.miami.edu wrote:
 I calculated a large vector.  Unfortunately, I have some measurement error
 in my data and some of the values in the vector are erroneous.  I ended up
 wih some Infs and NaNs in the vector.  I would like to filter out the Inf
 and NaN values and only keep the values in my vector that range from 1 to
 20.  Is there a way to filter out Infs and NaNs in R and end up with a
 clean vector?


Two steps, starting from vector x

x1 = x[is.finite(x)];
x2 = x1[(x1 <= 20) & (x1 >= 1)];

From what you say, x2 is the result you want. Just be aware  that
dropping values will change the indexing.

Peter



Re: [R] cleaning up a vector

2010-10-01 Thread Marc Schwartz
On Oct 1, 2010, at 12:51 PM, mlar...@rsmas.miami.edu wrote:

 I calculated a large vector.  Unfortunately, I have some measurement error
 in my data and some of the values in the vector are erroneous.  I ended up
 wih some Infs and NaNs in the vector.  I would like to filter out the Inf
 and NaN values and only keep the values in my vector that range from 1 to
 20.  Is there a way to filter out Infs and NaNs in R and end up with a
 clean vector?
 
 Mike


set.seed(1)
x <- sample(c(0:25, NaN, Inf, -Inf), 50, replace = TRUE)

 x
 [1]7   10   16  NaN5  NaN  Inf   19   18155   19
[14]   11   22   14   20 -Inf   11   22  Inf6   1837   11
[27]0   11   259   13   17   145   23   19   233   20
[40]   11   23   18   22   16   15   220   13   21   20


x[is.finite(x) & x >= 1 & x <= 20]
 [1]  7 10 16  5 19 18  1  5  5 19 11 14 20 11  6 18  3  7 11 11  9 13
[23] 17 14  5 19  3 20 11 18 16 15 13 20



See ?is.finite

HTH,

Marc Schwartz



Re: [R] cleaning up a vector

2010-10-01 Thread Henrique Dallazuanna
Complementing:

findInterval(x[is.finite(x)], 1:20)
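
Note that `findInterval` bins the finite values into the intervals defined by `1:20` rather than filtering them; a small illustration (the vector `x` here is made up):

```r
# findInterval returns, for each value, the index of the interval it falls in
x <- c(0.5, NA, NaN, 1, 10, 20, 21, Inf)
findInterval(x[is.finite(x)], 1:20)
# values below 1 map to 0; anything at or above 20 maps to 20
```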


On Fri, Oct 1, 2010 at 2:55 PM, Henrique Dallazuanna www...@gmail.comwrote:

 Try this:

 x[is.finite(x)]



 On Fri, Oct 1, 2010 at 2:51 PM, mlar...@rsmas.miami.edu wrote:

 I calculated a large vector.  Unfortunately, I have some measurement error
 in my data and some of the values in the vector are erroneous.  I ended up
 wih some Infs and NaNs in the vector.  I would like to filter out the Inf
 and NaN values and only keep the values in my vector that range from 1 to
 20.  Is there a way to filter out Infs and NaNs in R and end up with a
 clean vector?

 Mike





 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



[R] Cleaning a time series

2008-05-23 Thread tolga . i . uzuner
Dear R Users,

Was wondering if anyone can give me pointers to functionality in R that 
can help clean a time series ? For example, some kind of 
package/functionality which identifies potential errors and takes some 
action, such as replacement by some suitable value (carry-forward, average 
of nearest, what have you) and reporting of errors identified.

I did search Google for "R cran time series clean outlier" and various
permutations but did not come across anything.

Thanks,
Tolga

Generally, this communication is for informational purposes only
and it is not intended as an offer or solicitation for the purchase
or sale of any financial instrument or as an official confirmation
of any transaction. In the event you are receiving the offering
materials attached below related to your interest in hedge funds or
private equity, this communication may be intended as an offer or
solicitation for the purchase or sale of such fund(s).  All market
prices, data and other information are not warranted as to
completeness or accuracy and are subject to change without notice.
Any comments or statements made herein do not necessarily reflect
those of JPMorgan Chase & Co., its subsidiaries and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.
Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to UK legal entities.


Re: [R] Cleaning a time series

2008-05-23 Thread Gabor Grothendieck
The zoo package has six na.* routines for carrying values
forward, etc.

library(zoo)
?zoo

describes them.  Also see the vignettes.
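
Base R can mimic the simplest of those (last-observation-carried-forward) without zoo; a minimal sketch, not taken from the zoo sources:

```r
x <- c(1, 2, NA, NA, 5, NA, 7)
# carry the last non-NA value forward (a minimal na.locf analogue):
# cumsum(!is.na(x)) gives, at each position, how many non-NA values precede it
idx    <- cumsum(!is.na(x))
filled <- c(NA, x[!is.na(x)])[idx + 1]  # leading NA covers positions before any data
filled  # 1 2 2 2 5 5 7
```

For anything beyond this (interpolation, outlier flagging, reporting), zoo's `na.locf`, `na.approx`, and friends are the better tools.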

On Fri, May 23, 2008 at 6:55 AM,  [EMAIL PROTECTED] wrote:
 Dear R Users,

 Was wondering if anyone can give me pointers to functionality in R that
 can help clean a time series ? For example, some kind of
 package/functionality which identifies potential errors and takes some
 action, such as replacement by some suitable value (carry-forward, average
 of nearest, what have you) and reporting of errors identified.

 I did search Google for R cran time series clean outlier and various
 permutations but did not come across anything.

 Thanks,
 Tolga






[R] Cleaning up memory in R

2008-05-14 Thread Anh Tran
I'm trying to work on a large dataset, and after each segment of the run I need
a command to flush the memory. I tried gc() and rm(list=ls()) but they don't
seem to help; gc() does not do anything besides showing the memory usage.

I'm using the package BSgenome from BioC.

Thanks a bunch

-- 
Regards,
Anh Tran



Re: [R] Cleaning up memory in R

2008-05-14 Thread Duncan Murdoch

On 5/14/2008 3:59 PM, Anh Tran wrote:

I'm trying to work on a large dataset and after each segment of run, I need
a command to flush the memory. I tried gc() and rm(list=ls()) but they don't
seem to help. gc() does not do anything beside showing the memory usage.


How do you know it does nothing?  R won't normally release memory to the 
OS, but it is still freed to be reused internally in R.


On the other hand, if you still have references to the variables, then 
gc() really will do nothing.


Duncan Murdoch
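
A minimal illustration of that point (the object name is made up):

```r
# While 'big' is referenced, gc() cannot reclaim its memory
big <- numeric(1e7)   # roughly 80 MB, still reachable
rm(big)               # drop the last reference...
invisible(gc())       # ...now the collector can reuse that memory inside R
exists("big")         # FALSE - the object is gone, even if the OS still shows
                      # the process holding the memory for reuse
```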


I'm using the package BSgenome from BioC.

Thanks a bunch





[R] Cleaning database: grep()? apply()?

2007-11-13 Thread Jonas Malmros
Dear R users,

I have a huge database and I need to adjust it somewhat.

Here is a very little cut out from database:

CODE    NAME                            DATE    DATA1
4813    ADVANCED TELECOM                1987    0.013
3845    ADVANCED THERAPEUTIC SYS LTD    1987    10.1
3845    ADVANCED THERAPEUTIC SYS LTD    1989    2.463
3845    ADVANCED THERAPEUTIC SYS LTD    1988    1.563
2836    ADVANCED TISSUE SCI  -CL A      1987    0.847
2836    ADVANCED TISSUE SCI  -CL A      1989    0.872
2836    ADVANCED TISSUE SCI  -CL A      1988    0.529

What I need is:
1) Delete all cases containing "-CL A" (and also "-OLD", "-ADS", etc.) at the end
2) Delete all cases that have less than 3 years of data
3) For each remaining case compute ratio DATA1(1989) / DATA1(1987)
[and then ratios involving other data variables] and output this into
new database consisting of CODE, NAME, RATIOs.

Maybe someone can suggest an effective way to do these things? I
imagine the first one would involve grep(), and 2 and 3 would involve
the apply family of functions, but I cannot get my mind around the actual
code to perform these adjustments. I am new to R; I do write code, but
usually it consists of for-loops and plotting. I would much
appreciate your help.
Thank you in advance!
-- 
Jonas Malmros
Stockholm University
Stockholm, Sweden



Re: [R] Cleaning database: grep()? apply()?

2007-11-13 Thread jim holtman
Here is how to whittle it down for the first two parts of your
question.  I am not exactly sure what you are after in the third part.  Is
it that you want specific DATEs, or do you want the ratio of the
DATE[max]/DATE[min]?

> x <- read.table(textConnection("CODE NAME DATE DATA1
+ 4813 'ADVANCED TELECOM' 1987 0.013
+ 3845 'ADVANCED THERAPEUTIC SYS LTD' 1987 10.1
+ 3845 'ADVANCED THERAPEUTIC SYS LTD' 1989 2.463
+ 3845 'ADVANCED THERAPEUTIC SYS LTD' 1988 1.563
+ 2836 'ADVANCED TISSUE SCI  -CL A' 1987 0.847
+ 2836 'ADVANCED TISSUE SCI  -CL A' 1989 0.872
+ 2836 'ADVANCED TISSUE SCI  -CL A' 1988 0.529"), header=TRUE)
> # matches on things to delete
> delete_indx <- grep("-CL A$|-OLD$|-ADS$", x$NAME)
> # delete them
> x <- x[-delete_indx,]
> x
  CODE                         NAME DATE  DATA1
1 4813             ADVANCED TELECOM 1987  0.013
2 3845 ADVANCED THERAPEUTIC SYS LTD 1987 10.100
3 3845 ADVANCED THERAPEUTIC SYS LTD 1989  2.463
4 3845 ADVANCED THERAPEUTIC SYS LTD 1988  1.563
> # I assume you want to use NAME to check for ranges of data
> date_range <- tapply(x$DATE, x$NAME, function(dates) diff(range(dates)))
> date_range
            ADVANCED TELECOM ADVANCED THERAPEUTIC SYS LTD
                           0                            2
  ADVANCED TISSUE SCI  -CL A
                          NA
> # delete ones with less than 3 years
> names_to_delete <- names(date_range[date_range < 2])
> # delete those entries
> x <- x[!(x$NAME %in% names_to_delete),]
> x
  CODE                         NAME DATE  DATA1
2 3845 ADVANCED THERAPEUTIC SYS LTD 1987 10.100
3 3845 ADVANCED THERAPEUTIC SYS LTD 1989  2.463
4 3845 ADVANCED THERAPEUTIC SYS LTD 1988  1.563
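
For the open third part, one hypothetical sketch (not from the thread): compute the DATA1(1989)/DATA1(1987) ratio per company, assuming both years are present for each remaining name:

```r
# Remaining rows after the filtering above, rebuilt here so this runs standalone
x <- data.frame(CODE  = c(3845, 3845, 3845),
                NAME  = "ADVANCED THERAPEUTIC SYS LTD",
                DATE  = c(1987, 1989, 1988),
                DATA1 = c(10.1, 2.463, 1.563))
# One output row per NAME: CODE, NAME, and the 1989/1987 ratio of DATA1
ratios <- do.call(rbind, lapply(split(x, x$NAME), function(d) {
  data.frame(CODE = d$CODE[1], NAME = d$NAME[1],
             RATIO = d$DATA1[d$DATE == 1989] / d$DATA1[d$DATE == 1987])
}))
ratios$RATIO  # 2.463 / 10.1
```

The same `lapply(split(...))` pattern extends to ratios of other data columns by adding more columns to the inner `data.frame()` call.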




On Nov 13, 2007 2:34 PM, Jonas Malmros [EMAIL PROTECTED] wrote:
 Dear R users,

 I have a huge database and I need to adjust it somewhat.

 Here is a very little cut out from database:

 CODE    NAME                            DATE    DATA1
 4813    ADVANCED TELECOM                1987    0.013
 3845    ADVANCED THERAPEUTIC SYS LTD    1987    10.1
 3845    ADVANCED THERAPEUTIC SYS LTD    1989    2.463
 3845    ADVANCED THERAPEUTIC SYS LTD    1988    1.563
 2836    ADVANCED TISSUE SCI  -CL A      1987    0.847
 2836    ADVANCED TISSUE SCI  -CL A      1989    0.872
 2836    ADVANCED TISSUE SCI  -CL A      1988    0.529

 What I need is:
 1) Delete all cases containing -CL A (and also -OLD, -ADS, etc) at the end
 2) Delete all cases that have less than 3 years of data
 3) For each remaining case compute ratio DATA1(1989) / DATA1(1987)
 [and then ratios involving other data variables] and output this into
 new database consisting of CODE, NAME, RATIOs.

 Maybe someone can suggest an effective way to do these things? I
 imagine the first one would involve grep(), and 2 and 3 would involve
 apply family of functions, but I cannot get my mind around the actual
 code to perform this adjustments. I am new to R, I do write code but
 usually it consists of for-functions and plotting. I would much
 appreciate your help.
 Thank you in advance!
 --
 Jonas Malmros
 Stockholm University
 Stockholm, Sweden





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
