Re: [R] Is this a bug or am I making a mistake?
On Mon, 06-Jan-2014 at 07:38PM, William Dunlap wrote:

| You could compare the outputs of
|    z1 <- with(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')

Wouldn't with(dd, EVYEAR==2012 & EVMONTH=='02') be sufficient when using with()?

| (which is like subset()) and that of
|    z2 <- dd$EVYEAR==2012 & dd$EVMONTH=='02'
| (evaluated from within the same context) with
|    table(z1, z2, exclude=NULL)
| That may show something useful.
|
| Bill Dunlap
| Spotfire, TIBCO Software
| wdunlap tibco.com

--
Patrick Connolly
Great minds discuss ideas
Average minds discuss events
Small minds discuss people
   Eleanor Roosevelt

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is this a bug or am I making a mistake?
> Wouldn't with(dd, EVYEAR==2012 & EVMONTH=='02') be sufficient when using with()?

It probably would be sufficient to get the right answer, but I thought the OP was wondering why there was a difference. Comparing the results of his original code with those of the new code would help uncover the reason.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: Patrick Connolly [mailto:p_conno...@slingshot.co.nz]
Sent: Sunday, January 12, 2014 12:56 AM
To: William Dunlap
Cc: Walter Anderson; Sarah Goslee; R Help
Subject: Re: [R] Is this a bug or am I making a mistake?
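The point about with() can be checked on a small made-up data frame. This is only a sketch: the values below are hypothetical, and only the names dd, EVYEAR and EVMONTH come from the thread. Inside with(dd, ...), names are looked up among the columns of dd first, so the dd$ prefix is redundant, and normally harmless as long as dd has no column that is itself called dd.

```r
# Hypothetical toy data; only the column names come from the thread.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, 2011, 2012),
  EVMONTH = c("02", "03", "02", "02"),
  stringsAsFactors = FALSE
)

# Short form: EVYEAR and EVMONTH are found among the columns of dd.
short_form <- with(dd, EVYEAR == 2012 & EVMONTH == "02")

# Long form, as in the original code: dd is not a column, so it is
# found in the calling environment and dd$EVYEAR is the same vector.
long_form <- with(dd, dd$EVYEAR == 2012 & dd$EVMONTH == "02")

# Both forms give the same logical vector on this data.
print(identical(short_form, long_form))
```

On data like this the two forms agree, which is why comparing them on the real data, as suggested earlier in the thread, is a diagnostic rather than a fix.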
Re: [R] Is this a bug or am I making a mistake?
On Jan 6, 2014, at 11:16 AM, Walter Anderson wrote:

> Thanks everyone for the response. I didn't provide a reproducible test,
> since the data I experienced this issue with was quite large ( 40MB)
> and I have not been able to reproduce the problem with any other data
> set. I have also performed the subset using Microsoft Access on the
> original dbf file I use for the data frame and confirmed that the second
> query format (dd[QUERY,]) is producing the correct results. It doesn't
> appear that any of the impacted (or any in the data frame) contain NA
> records.

What does it mean to say "it doesn't appear that any of the impacted (or any in the data frame) contain NA records"? Where are the code and output to support that appearance? What does this show?

   table( is.na(dd$EVYEAR==2012), is.na(dd$EVMONTH=='02') )

The other difference between [ and subset is that drop=FALSE in `subset`, although how that would affect results is not clear.

> I am not really looking for any particular solution, but was surprised
> by the different results from what I presumed to be the same query. If
> it is believed to be a possible bug, I would be glad to package up the
> data that is generating the issue, but not sure where to place such a
> large data set.

I don't think you have yet demonstrated a bug.

--
David Winsemius
Alameda, CA, USA
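As a rough illustration of the two checks mentioned above, here is a sketch on made-up data (the values are hypothetical; only the column names come from the thread). It cross-tabulates where each comparison is NA, counts missing values per column, and shows the drop difference, which changes the shape of a single-column result rather than the number of matching rows.

```r
# Hypothetical toy data with one NA in each column.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, NA, 2011),
  EVMONTH = c("02", "03", "02", NA),
  stringsAsFactors = FALSE
)

# Cross-tabulate which rows give NA for each comparison.
na_tab <- table(is.na(dd$EVYEAR == 2012), is.na(dd$EVMONTH == "02"))
print(na_tab)

# A per-column count of missing values is another quick check.
print(colSums(is.na(dd)))

# drop: `[` simplifies a single selected column to a vector by default,
# while subset() uses drop = FALSE and keeps a data frame.
one_col_bracket <- dd[dd$EVMONTH %in% "02", "EVYEAR"]
one_col_subset  <- subset(dd, EVMONTH %in% "02", select = EVYEAR)
print(class(one_col_bracket))
print(class(one_col_subset))
```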
[R] Is this a bug or am I making a mistake?
I have a data frame that I am extracting some records from, and I noticed the following issue.

I originally used

   tmp <- subset(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')

and noticed that I wasn't ending up with all of the records I should have; however, when I used

   tmp <- dd[dd$EVYEAR==2012 & dd$EVMONTH=='02',]

I did get all of the records I should have.

I thought the two forms were equivalent; am I mistaken?
Re: [R] Is this a bug or am I making a mistake?
Hi Walter,

I can't reproduce your results. Please provide some data that demonstrates the problem.

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

subset() and [ differ in their handling of NA values, and you don't need the dd$ in the arguments to subset().

But those don't explain your result given the information provided. Please provide more information.

Sarah

On Mon, Jan 6, 2014 at 12:06 PM, Walter Anderson wandrso...@gmail.com wrote:
> I have a data frame that I am extracting some records from and noticed the
> following issue
>
> I originally used tmp <- subset(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')
>
> and noticed that I wasn't ending up with all of the records I should have;
> however, when I used
>
> tmp <- dd[dd$EVYEAR==2012 & dd$EVMONTH=='02',]
>
> I did get all of the records I should have.
>
> I thought the two forms were equivalent, am I mistaken?

--
Sarah Goslee
http://www.functionaldiversity.org
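The NA difference mentioned above can be seen on a small made-up data frame. This is only a sketch under assumed data (the values are hypothetical; only the column names come from the thread): subset() treats an NA condition as FALSE, while logical indexing with [ returns an all-NA row for each NA in the condition.

```r
# Hypothetical toy data; the third row has a missing EVYEAR.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, NA, 2011),
  EVMONTH = c("02", "03", "02", "02"),
  stringsAsFactors = FALSE
)

# The condition is NA wherever EVYEAR is NA and EVMONTH matches.
keep <- dd$EVYEAR == 2012 & dd$EVMONTH == "02"
print(keep)

by_subset  <- subset(dd, EVYEAR == 2012 & EVMONTH == "02")  # NA counts as FALSE
by_bracket <- dd[keep, ]                                    # NA gives an all-NA row

print(nrow(by_subset))
print(nrow(by_bracket))

# Wrapping the condition in which() drops the NA positions,
# so `[` then selects the same rows as subset().
by_which <- dd[which(keep), ]
print(nrow(by_which))
```

Note that on data with NAs this makes [ return more rows than subset(), and the extra rows are entirely NA, so it is easy to spot when it happens.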
Re: [R] Is this a bug or am I making a mistake?
On 01/06/2014 11:14 AM, Sarah Goslee wrote:
> Hi Walter,
>
> I can't reproduce your results. Please provide some data that
> demonstrates the problem.
>
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> subset() and [ differ in their handling of NA values, and you don't
> need the dd$ in the arguments to subset().
>
> But those don't explain your result given the information provided.
> Please provide more information.
>
> Sarah

Thanks everyone for the response. I didn't provide a reproducible test, since the data I experienced this issue with was quite large ( 40MB) and I have not been able to reproduce the problem with any other data set. I have also performed the subset using Microsoft Access on the original dbf file I use for the data frame and confirmed that the second query format (dd[QUERY,]) is producing the correct results. It doesn't appear that any of the impacted (or any in the data frame) contain NA records.

I am not really looking for any particular solution, but was surprised by the different results from what I presumed to be the same query. If it is believed to be a possible bug, I would be glad to package up the data that is generating the issue, but not sure where to place such a large data set.
Re: [R] Is this a bug or am I making a mistake?
You could compare the outputs of
   z1 <- with(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')
(which is like subset()) and that of
   z2 <- dd$EVYEAR==2012 & dd$EVMONTH=='02'
(evaluated from within the same context) with
   table(z1, z2, exclude=NULL)
That may show something useful.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Walter Anderson
Sent: Monday, January 06, 2014 11:17 AM
To: Sarah Goslee
Cc: R Help
Subject: Re: [R] Is this a bug or am I making a mistake?
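The comparison suggested in this message can be tried on a small made-up data frame. This is only a sketch: the values are hypothetical and only the names dd, EVYEAR, EVMONTH, z1 and z2 come from the thread. With exclude=NULL, table() keeps NA as its own level, so any row where the two conditions disagree, including disagreements involving NA, would show up as an off-diagonal count.

```r
# Hypothetical toy data; the third row has a missing EVYEAR.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, NA, 2011),
  EVMONTH = c("02", "03", "02", "02"),
  stringsAsFactors = FALSE
)

# Condition as evaluated inside with(), which is how subset() sees it.
z1 <- with(dd, dd$EVYEAR == 2012 & dd$EVMONTH == "02")

# The same condition evaluated directly, as used with `[`.
z2 <- dd$EVYEAR == 2012 & dd$EVMONTH == "02"

# Cross-tabulate, keeping NA as a level.
tab <- table(z1, z2, exclude = NULL)
print(tab)

# Rows where the two conditions differ (none on this toy data).
print(which(xor(is.na(z1), is.na(z2)) | (!is.na(z1) & !is.na(z2) & z1 != z2)))
```

On the toy data every count sits on the diagonal. On the real data, any off-diagonal cell would point at the rows worth inspecting.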