Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Tue, 3 Jan 2012, David Winsemius wrote:

> burns.tds[ !duplicated(burns.tds) , ]

Apparently it does not matter if the site column in the data frame is a factor or a character, read.zoo() generates the same error. Applying the above produces a long list starting with:

burns.tds[!duplicated(burns.tds), ]
     site   sampdate quant
599  BC-3 1992-03-27 0.100
600  BC-3 1992-04-30 0.100
601  BC-3 1992-05-30 0.100
603  BC-3 1992-06-19 0.100
1214 BC-3 1992-07-20 0.100
1215 BC-3 1992-08-10 0.100
1216 BC-3 1992-09-30 0.100
1217 BC-3 1992-10-29 0.100
1218 BC-3 1992-11-19 0.100
1929 BC-3 1995-03-23 8.080

I don't know how to interpret this. I don't see two rows with the same values, but ~ 500 rows each with a different value. What is duplicated? The entire row? The site ID?

?duplicated has some examples, but those do not show the output of the function nor explain what's duplicated.

I need to get past this blockage and appreciate your help in determining why read.zoo() sees duplicates when the database table has none, and how to resolve this issue.

TIA,

Rich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Jan 4, 2012, at 12:21 PM, Rich Shepard wrote:

> On Tue, 3 Jan 2012, David Winsemius wrote:
>
>> burns.tds[ !duplicated(burns.tds) , ]
>
> Apparently it does not matter if the site column in the data frame is a factor or a character, read.zoo() generates the same error. Applying the above produces a long list starting with:
>
> burns.tds[!duplicated(burns.tds), ]
>      site   sampdate quant
> 599  BC-3 1992-03-27 0.100
> 600  BC-3 1992-04-30 0.100
> 601  BC-3 1992-05-30 0.100
> 603  BC-3 1992-06-19 0.100
> 1214 BC-3 1992-07-20 0.100
> 1215 BC-3 1992-08-10 0.100
> 1216 BC-3 1992-09-30 0.100
> 1217 BC-3 1992-10-29 0.100
> 1218 BC-3 1992-11-19 0.100
> 1929 BC-3 1995-03-23 8.080
>
> I don't know how to interpret this. I don't see two rows with the same values, but ~ 500 rows each with a different value. What is duplicated? The entire row? The site ID?

You didn't ask for what was duplicated, but rather what was NOT duplicated with that code. In the case of a dataframe it is the entire row that is tested.

> ?duplicated has some examples, but those do not show the output of the function nor explain what's duplicated.
>
> I need to get past this blockage and appreciate your help in determining why read.zoo() sees duplicates when the database table has none, and how to resolve this issue.

I think you need to reduce this problem to a dataframe that you either post an access method for or use dput() to include. Then you need to say what your goals are and what code is not working on that example.

David Winsemius, MD
West Hartford, CT
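To illustrate the point about whole-row testing, here is a minimal sketch on made-up data. The data frame df and its values are hypothetical, not the poster's; only the column names follow the thread. It shows that duplicated() on a data frame compares entire rows and flags only the second and later copies, so indexing with !duplicated() returns the first occurrence of every distinct row.

```r
# Toy data (hypothetical values): rows 2 and 4 are identical in every column.
df <- data.frame(
  site     = c("BC-3", "BC-3", "BC-3", "BC-3"),
  sampdate = as.Date(c("1992-03-27", "1992-04-30", "1992-05-30", "1992-04-30")),
  quant    = c(0.1, 0.1, 0.1, 0.1)
)

# For a data frame, duplicated() compares whole rows and marks
# only the second and later copies as TRUE.
dup <- duplicated(df)

repeats <- df[dup, ]    # rows that repeat an earlier row
firsts  <- df[!dup, ]   # first occurrence of each distinct row
```

Note the trailing comma in df[dup, ]: without it, the logical vector would be used to select columns rather than rows.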
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Wed, 4 Jan 2012, David Winsemius wrote:

> You didn't ask for what was duplicated, but rather what was NOT duplicated with that code. In the case of a dataframe it is the entire row that is tested.

My original question was what was duplicated, but ... I changed the function by dropping the 'not'. There's something seriously wrong here and I need help from R gurus to tell me why. Example:

burns.tds[duplicated(burns.tds), ]
...
25760 BC-1.5 1996-09-19    NA
25761 BC-1.5 1996-09-19 0.010
...

But, when I query the database table I see this:

select * from chemistry where site = 'BC-1.5' and sampdate = '1996-09-19' and param = 'TDS';
  site  |  sampdate  | param | quant | units | qual | easting | northing |  stream  | basin
--------+------------+-------+-------+-------+------+---------+----------+----------+-------
 BC-1.5 | 1996-09-19 | TDS   |   935 | mg/L  |      |         |          | BurnsCrk |
(1 row)

There is only a single row for that site, sampdate, and parameter and the quantity is different from those in the R data frame.

> I think you need to reduce this problem to a dataframe that you either post an access method for or use dput() to include. Then you need to say what your goals are and what code is not working on that example.

I'll gladly do this. Which data frame should I make available: the original chemdata or the subset burns.tds? I'll start with the latter. Compressed dput() output attached.

My goal is to produce time series plots of TDS, by site, on several streams over the period for which that component was measured. Lattice lets me superpose multiple lines on the same axis set with different color lines and a legend.

What's not working is something in the workflow of subsetting chemdata to extract all TDS data for a named stream (e.g., burns.tds and winters.tds), then converting them to zoo objects using read.zoo(). Somewhere along this process my data are being mangled.

It's not in the source data frame, chemdata:

chemdata[duplicated(chemdata), ]
 [1] site     sampdate param    quant    units    qual     easting  northing
 [9] stream   basin
<0 rows> (or 0-length row.names)

The command I used to subset burns.tds from chemdata was:

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site, sampdate, param == 'TDS', quant), drop = T)

Thanks, David,

Rich
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
Nothing attached. I don't know what you entitled the compressed dput output, but it did not pass the filters of the mailserver and you did not copy me. If chemdata is available as a text file, then make sure its extension is .txt and then attach it.

-- 
David.

David Winsemius, MD
West Hartford, CT
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries' [FIXED]
On Wed, 4 Jan 2012, David Winsemius wrote:

> Nothing attached. I don't know what you entitled the compressed dput output, but it did not pass the filters of the mailserver and you did not copy me.

David,

It must have been stripped off as too large (14K). Regardless, I solved the problem:

The examples in the subset help page show only a single row criterion being used, but do not explicitly note that rows can be selected only one criterion at a time.

Because the issue showed up only with the subset() function to extract rows with the parameter TDS, the problem had to be in that syntax. By changing the data frame argument to subset() from the overall chemdata to that of only a single stream, creation of the zoo object threw no errors and duplicated() returned zero rows.

Wow! And I thought I understood the subset() syntax. I now do!

Thanks to you, Gabor, and everyone else who responded to this thread,

Rich
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries' [FIXED]
On Jan 4, 2012, at 3:21 PM, Rich Shepard wrote:

> It must have been stripped off as too large (14K). Regardless, I solved the problem:
>
> The examples in the subset help page show only a single row criterion being used, but do not explicitly note that rows can be selected only one criterion at a time.

Because that is simply not true. Connect your criteria with ampersands and you can have as many as you want. The only requirement is that the logical vector that results be exactly the number of rows in the dataframe as described in the help page.

> Because the issue showed up only with the subset() function to extract rows with the parameter TDS, the problem had to be in that syntax. By changing the data frame argument to subset() from the overall chemdata to that of only a single stream, creation of the zoo object threw no errors and duplicated() returned zero rows.
>
> Wow! And I thought I understood the subset() syntax. I now do!
>
> Thanks to you, Gabor, and everyone else who responded to this thread,
>
> Rich

David Winsemius, MD
West Hartford, CT
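A small sketch of the ampersand approach described above, on made-up data. The data frame chem and all of its values are hypothetical; only the column names follow the thread. It also shows one likely explanation, not stated anywhere in the thread, for the earlier symptom: a comparison placed inside select does not filter rows. Inside select, column names stand for their positions, so param == "TDS" compares a position number with a string, yields FALSE, and is then treated as column index 0, which is silently dropped.

```r
# Hypothetical stand-in for chemdata: two parameters share one site and date.
chem <- data.frame(
  site     = c("BC-1", "BC-1", "BC-1", "WC-2"),
  sampdate = as.Date(c("1996-09-19", "1996-09-19", "1996-10-01", "1996-09-19")),
  param    = c("TDS", "pH", "TDS", "TDS"),
  quant    = c(935, 7.9, 910, 400),
  stream   = c("BurnsCrk", "BurnsCrk", "BurnsCrk", "WintersCrk"),
  stringsAsFactors = FALSE
)

# Row criteria go in the second argument, joined with &.
tds <- subset(chem, stream == "BurnsCrk" & param == "TDS",
              select = c(site, sampdate, quant))

# Same call shape as in the thread: the param test sits inside select,
# so every parameter for the stream is kept and only columns are chosen.
bad <- subset(chem, stream == "BurnsCrk",
              select = c(site, sampdate, param == "TDS", quant))

# Position of a repeated (site, sampdate) pair in each result; 0 means none.
dup_in_tds <- anyDuplicated(tds[c("site", "sampdate")])
dup_in_bad <- anyDuplicated(bad[c("site", "sampdate")])
```

In this toy case tds has one row per site and date, while bad keeps the pH row as well, which is the kind of repeated index entry that read.zoo() with split = would then refuse to merge.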
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Jan 3, 2012, at 11:45 AM, Rich Shepard wrote:

> I have a situation that I cannot resolve by myself. When I try to create a zoo object (with read.zoo() ) I get this error:
>
> Error in merge.zoo(`BC-0.5` = c( 0.000,0.010,0.010, 0.060, :
>   series cannot be merged with non-unique index entries in a series
>
> This suggests that there is a duplicate entry for the factor BC-0.5 on a given date. Because the data originate in a relational database table with a multi-column primary key there are no duplicate rows in the table, or the text file copied from that table.
>
> I need to find the source of this error. How can I identify the non-unique index entries within R?

?duplicated

-- 
David Winsemius, MD
West Hartford, CT
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Tue, 3 Jan 2012, David Winsemius wrote:

>> How can I identify the non-unique index entries within R?
>
> ?duplicated

Thank you, David. I _think_ the problem comes from a duplicated factor column in the data frame. Now I need to figure out how subset() generated that additional column.

Regards,

Rich
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Tue, 3 Jan 2012, Rich Shepard wrote:

> I _think_ the problem comes from a duplicated factor column in the data frame. Now I need to figure out how subset() generated that additional column.

Nope. That's not it.

Running 'duplicated(burns.tds, incomparables = FALSE)' produces a listing of FALSE and TRUE keyed by position:

duplicated(burns.tds, incomparables = FALSE)
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
...

but none of summary(burns.tds), str(burns.tds), or burns.tds produces a comparable list that lets me find the TRUE by position and determine what's duplicated.

What do I do to use this output to find the duplicates that won't let read.zoo() complete?

Rich
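One way to turn such a logical listing into something readable is sketched below on made-up data. The data frame x and its values are hypothetical. which() converts the TRUE entries into row positions, and combining duplicated() with its fromLast = TRUE variant pulls out every copy of a repeated row, including the first one, so the copies can be inspected side by side.

```r
# Hypothetical data: rows 1 and 4 are identical.
x <- data.frame(
  site     = c("A", "A", "B", "A"),
  sampdate = c("2012-01-03", "2012-01-04", "2012-01-03", "2012-01-03"),
  stringsAsFactors = FALSE
)

# Positions of the rows flagged as repeats (second and later copies).
pos <- which(duplicated(x))

# Every copy of a repeated row: scanning from both ends catches the
# first occurrence as well as the later ones.
all_copies <- x[duplicated(x) | duplicated(x, fromLast = TRUE), ]
```

Here pos points at row 4, and all_copies holds rows 1 and 4 with their original row names.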
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Jan 3, 2012, at 12:26 PM, Rich Shepard wrote:

> On Tue, 3 Jan 2012, David Winsemius wrote:
>
>>> How can I identify the non-unique index entries within R?
>>
>> ?duplicated
>
> Thank you, David. I _think_ the problem comes from a duplicated factor column in the data frame. Now I need to figure out how subset() generated that additional column.

A subsetting vector could contain (or perhaps cause [ to create) duplicates.

data.frame(a=1:10)[ seq(1,10, by=0.5),]
 [1]  1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10

-- 
David Winsemius, MD
West Hartford, CT
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Jan 3, 2012, at 12:48 PM, Rich Shepard wrote:

> On Tue, 3 Jan 2012, Rich Shepard wrote:
>
>> I _think_ the problem comes from a duplicated factor column in the data frame. Now I need to figure out how subset() generated that additional column.
>
> Nope. That's not it.
>
> Running 'duplicated(burns.tds, incomparables = FALSE)' produces a listing of FALSE and TRUE keyed by position:
>
> duplicated(burns.tds, incomparables = FALSE)
>   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> ...
> [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
> ...

That's rather unconvincing. What does this show:

burns.tds[ !duplicated(burns.tds) ]

> but none of summary(burns.tds), str(burns.tds), or burns.tds produces a comparable list that lets me find the TRUE by position and determine what's duplicated.
>
> What do I do to use this output to find the duplicates that won't let read.zoo() complete?
>
> Rich

David Winsemius, MD
West Hartford, CT
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Tue, 3 Jan 2012, David Winsemius wrote:

> That's rather unconvincing. What does this show:
>
> burns.tds[ !duplicated(burns.tds) ]

I saw that in the help page but assumed it was the opposite of duplicated; apparently not.

burns.tds[ !duplicated(burns.tds) ]
Error in `[.data.frame`(burns.tds, !duplicated(burns.tds)) :
  undefined columns selected

burns.tds was generated by

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site, sampdate, param == 'TDS', quant), drop = T)

and has this structure

'data.frame': 2472 obs. of 3 variables:
 $ site    : Factor w/ 137 levels BC-0.5,BC-1,..: 5 5 5 5 5 5 5 5 ...
 $ sampdate: Factor w/ 1056 levels 1978-03-28,1978-04-11,..: 155 156 158 161 163 164 172 175 177 309 ...
 $ quant   : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...

Because the data frame chemdata came from a relational table with a primary key of (site, sampdate, param) I am having trouble understanding where duplicate rows could have originated.

Thanks, David,

Rich
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
Maybe we need to backtrack a bit. You originally were complaining about an error that said you had duplicated index entries as you attempted to make a zoo object. I assumed, incorrectly it now appears, that you understood that an index in a zoo object was a vector. You now seem to be admitting that you were trying to use an entire dataframe as your index. As the acronym goes, ... DDT.

On Jan 3, 2012, at 1:53 PM, Rich Shepard wrote:

> On Tue, 3 Jan 2012, David Winsemius wrote:
>
>> That's rather unconvincing. What does this show:
>>
>> burns.tds[ !duplicated(burns.tds) ]
>
> I saw that in the help page but assumed it was the opposite of duplicated; apparently not.
>
> burns.tds[ !duplicated(burns.tds) ]
> Error in `[.data.frame`(burns.tds, !duplicated(burns.tds)) :
>   undefined columns selected

Right. If you had said it was a dataframe, I would have suggested:

burns.tds[ !duplicated(burns.tds) , ]

But that would only identify entire duplicated rows; it would not cure the misguided notion of creating a zoo-index from a dataframe.

> burns.tds was generated by
>
> burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site, sampdate, param == 'TDS', quant), drop = T)
>
> and has this structure
>
> 'data.frame': 2472 obs. of 3 variables:
>  $ site    : Factor w/ 137 levels BC-0.5,BC-1,..: 5 5 5 5 5 5 5 5 ...
>  $ sampdate: Factor w/ 1056 levels 1978-03-28,1978-04-11,..: 155 156 158 161 163 164 172 175 177 309 ...
>  $ quant   : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...
>
> Because the data frame chemdata came from a relational table with a primary key of (site, sampdate, param) I am having trouble understanding where duplicate rows could have originated.

You still have not really described what you are trying to do ... or what data you are trying to do it with. You might want to think about taking that sampdate, which is now a factor, and turning it into a Date object, which would then satisfy the requirements of an index for a zoo object. The second letter in zoo stands for ordered and factors have no order.

-- 
David Winsemius, MD
West Hartford, CT
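The factor-to-Date conversion suggested above might look like the following minimal sketch. The vector sampdate here is a hypothetical stand-in for the sampdate column, with made-up values in the same yyyy-mm-dd form shown in the thread.

```r
# Hypothetical factor of date strings, as read.table() might produce
# when strings are read in as factors.
sampdate <- factor(c("1992-03-27", "1992-04-30", "1978-03-28"))

# Go through character first so the labels, not the integer codes,
# are parsed; the format string matches yyyy-mm-dd.
d <- as.Date(as.character(sampdate), format = "%Y-%m-%d")

# Unlike the factor, the Date vector has a meaningful chronological order.
chronological <- sort(d)
```

A Date vector of this kind is the sort of ordered index that zoo expects.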
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Tue, 3 Jan 2012, David Winsemius wrote:

> Maybe we need to backtrack a bit.

Yes. I've been trying to do this but still have too little experience with R to be successful on my own.

> You originally were complaining about an error that said you had duplicated index entries as you attempted to make a zoo object. I assumed, incorrectly it now appears, that you understood that an index in a zoo object was a vector. You now seem to be admitting that you were trying to use an entire dataframe as your index. As the acronym goes, ... DDT.

My understanding, apparently incorrect, was that read.zoo() converted a data frame to a zoo object with the date column as the index vector. From the read.zoo help page:

file: character string or strings giving the name of the file(s) which the data are to be read from/written to. ... ‘file’ can be a ‘connection’ or a ‘data.frame’ (e.g., resulting from a previous ‘read.table’ call) that is subsequently processed to a ‘zoo’ series.

Based on this, I created a subset for a single parameter:

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site, sampdate, param == 'TDS', quant), drop = T)

This provides three columns:

     site   sampdate quant
599  BC-3 1992-03-27   0.1
600  BC-3 1992-04-30   0.1
601  BC-3 1992-05-30   0.1
603  BC-3 1992-06-19   0.1
1214 BC-3 1992-07-20   0.1
1215 BC-3 1992-08-10   0.1

Then, to create the zoo object,

burns.tds.z <- read.zoo(burns.tds, split = 1, index = 2)

> Right. If you had said it was a dataframe, I would have suggested:
>
> burns.tds[ !duplicated(burns.tds) , ]
>
> But that would only identify entire duplicated rows; it would not cure the misguided notion of creating a zoo-index from a dataframe.

Where did I go astray in trying to create a zoo-index with this procedure? How do I extract zoo objects from a data frame?

> You still have not really described what you are trying to do ... or what data you are trying to do it with.

I have a data frame with water quality sampling data and I'm now trying to plot time series of specific chemical concentrations (y axis) as a function of their irregular collection over a period of 30 years or less. I want to plot the time series for each site along the stream as a separate line in the same panel.

> You might want to think about taking that sampdate, which is now a factor, and turning it into a Date object, which would then satisfy the requirements of an index for a zoo object.

I had run the read.table() function again and forgot to convert the date column from a factor to a date. I've now done this but the result is still the same:

str(burns.tds)
'data.frame': 2472 obs. of 3 variables:
 $ site    : Factor w/ 137 levels BC-0.5,BC-1,..: 5 5 5 5 5 5 5 5 5 5
 $ sampdate: Date, format: 1992-03-27 1992-04-30 ...
 $ quant   : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...

burns.tds.z <- read.zoo(burns.tds, split = 1, index = 2)
Error in merge.zoo(`BC-0.5` = c(0, 0.01, 0.01, 0.06, 0.18, NA, 76.56, :
  series cannot be merged with non-unique index entries in a series
In addition: Warning messages:
1: In zoo(rval4[[i]], ix[[i]]) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

I want to learn how to get data into zoo objects for time series analyses so I greatly appreciate the help provided here.

Rich
Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'
On Jan 3, 2012, at 5:53 PM, Rich Shepard wrote:

> I had run the read.table() function again and forgot to convert the date column from a factor to a date. I've now done this but the result is still the same:
>
> str(burns.tds)
> 'data.frame': 2472 obs. of 3 variables:
>  $ site    : Factor w/ 137 levels BC-0.5,BC-1,..: 5 5 5 5 5 5 5 5 5 5
>  $ sampdate: Date, format: 1992-03-27 1992-04-30 ...
>  $ quant   : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...
>
> burns.tds.z <- read.zoo(burns.tds, split = 1, index = 2)
> Error in merge.zoo(`BC-0.5` = c(0, 0.01, 0.01, 0.06, 0.18, NA, 76.56, :
>   series cannot be merged with non-unique index entries in a series
> In addition: Warning messages:
> 1: In zoo(rval4[[i]], ix[[i]]) :
>   some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

So NOW you need to look at:

burns.tds$sampdate[ duplicated(burns.tds$sampdate) ]

But since you are using data that is in the long format you will need to do it within categories determined by your site variable. I would be starting with tapply for that purpose, but I suppose you could just use

with(chemdata, table(sampdate, site) )

and see if any anomalies popped out.

> I want to learn how to get data into zoo objects for time series analyses so I greatly appreciate the help provided here.
>
> Rich

-- 
David Winsemius, MD
West Hartford, CT
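The per-site check suggested above could be sketched as follows, on made-up long-format data. The data frame long and its values are hypothetical; the column names follow the thread. The sketch uses table() to count samples per date and site, tapply() to count repeated dates within each site, and a two-column key to pull out the offending rows.

```r
# Hypothetical long-format data: site BC-1.5 has two rows for 1996-09-19.
long <- data.frame(
  site     = c("BC-1.5", "BC-1.5", "BC-1.5", "BC-3"),
  sampdate = as.Date(c("1996-09-19", "1996-09-19", "1996-10-01", "1996-09-19")),
  quant    = c(NA, 0.010, 0.020, 0.100)
)

# Cross-tabulate dates by site; any cell above 1 is a non-unique index entry.
counts <- with(long, table(sampdate, site))

# Number of repeated dates within each site.
per_site <- tapply(long$sampdate, long$site, function(d) sum(duplicated(d)))

# All rows that share a (site, sampdate) pair with another row.
key   <- long[c("site", "sampdate")]
clash <- long[duplicated(key) | duplicated(key, fromLast = TRUE), ]
```

Testing the (site, sampdate) pair rather than sampdate alone matters in long format, because the same date legitimately appears once for each site.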