Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-04 Thread Rich Shepard

On Tue, 3 Jan 2012, David Winsemius wrote:


burns.tds[ !duplicated(burns.tds) ,  ]


  Apparently it does not matter whether the site column in the data frame is a
factor or a character; read.zoo() generates the same error. Applying the
above produces a long list starting with:

burns.tds[!duplicated(burns.tds), ]
     site   sampdate quant
599  BC-3 1992-03-27 0.100
600  BC-3 1992-04-30 0.100
601  BC-3 1992-05-30 0.100
603  BC-3 1992-06-19 0.100
1214 BC-3 1992-07-20 0.100
1215 BC-3 1992-08-10 0.100
1216 BC-3 1992-09-30 0.100
1217 BC-3 1992-10-29 0.100
1218 BC-3 1992-11-19 0.100
1929 BC-3 1995-03-23 8.080

  I don't know how to interpret this. I don't see two rows with the same
values, but ~ 500 rows each with a different value. What is duplicated? The
entire row? The site ID?

  ?duplicated has some examples, but they neither show the output of the
function nor explain what's duplicated.

  I need to get past this blockage and appreciate your help in determining
why read.zoo() sees duplicates when the database table has none, and how to
resolve this issue.

TIA,

Rich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-04 Thread David Winsemius


On Jan 4, 2012, at 12:21 PM, Rich Shepard wrote:


On Tue, 3 Jan 2012, David Winsemius wrote:


burns.tds[ !duplicated(burns.tds) ,  ]


 Apparently it does not matter whether the site column in the data frame is a
factor or a character; read.zoo() generates the same error. Applying the
above produces a long list starting with:

burns.tds[!duplicated(burns.tds), ]
     site   sampdate quant
599  BC-3 1992-03-27 0.100
600  BC-3 1992-04-30 0.100
601  BC-3 1992-05-30 0.100
603  BC-3 1992-06-19 0.100
1214 BC-3 1992-07-20 0.100
1215 BC-3 1992-08-10 0.100
1216 BC-3 1992-09-30 0.100
1217 BC-3 1992-10-29 0.100
1218 BC-3 1992-11-19 0.100
1929 BC-3 1995-03-23 8.080

 I don't know how to interpret this. I don't see two rows with the same
values, but ~ 500 rows each with a different value. What is duplicated? The
entire row? The site ID?


You didn't ask for what was duplicated, but rather what was NOT  
duplicated with that code. In the case of a dataframe it is the entire  
row that is tested.
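To make that concrete, here is a small sketch on a made-up data frame df (hypothetical values, not Rich's data). duplicated() returns one logical per row, TRUE only when that entire row already appeared above it, so indexing with !duplicated(df) prints every distinct row rather than the repeats.

```r
# Hypothetical toy data, not the burns.tds data from this thread
df <- data.frame(site     = c("BC-3", "BC-3", "BC-3", "BC-1"),
                 sampdate = c("1992-03-27", "1992-03-27", "1992-03-27", "1992-03-27"),
                 quant    = c(0.1, 0.1, 8.08, 0.1))

# One logical per row: TRUE when the entire row already appeared above
dup <- duplicated(df)
print(dup)                # FALSE TRUE FALSE FALSE

# Rows that ARE repeats of an earlier row
print(df[dup, ])

# Rows that are NOT repeats (first occurrences); note the comma for rows
uniq <- df[!dup, ]
print(uniq)
```

Row 3 differs only in quant and row 4 only in site, so neither counts as a duplicate; only the exact repeat in row 2 is flagged.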




 ?duplicated has some examples, but they neither show the output of the
function nor explain what's duplicated.

 I need to get past this blockage and appreciate your help in determining
why read.zoo() sees duplicates when the database table has none, and how to
resolve this issue.


I think you need to reduce this problem to a dataframe that you either
post an access method for or use dput() to include. Then you need to
say what your goals are and what code is not working on that example.






David Winsemius, MD
West Hartford, CT



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-04 Thread Rich Shepard

On Wed, 4 Jan 2012, David Winsemius wrote:


You didn't ask for what was duplicated, but rather what was NOT duplicated
with that code. In the case of a dataframe it is the entire row that is
tested.


  My original question was what was duplicated, but ... I changed the
function by dropping the 'not'. There's something seriously wrong here and I
need help from R gurus to tell me why.

  Example:

burns.tds[duplicated(burns.tds), ]
  ...
25760 BC-1.5 1996-09-19  NA
25761 BC-1.5 1996-09-19   0.010
  ...

  But, when I query the database table I see this:

select * from chemistry where site = 'BC-1.5' and sampdate = '1996-09-19'
and param = 'TDS';
  site  |  sampdate  | param | quant | units | qual | easting | northing |  stream  | basin
--------+------------+-------+-------+-------+------+---------+----------+----------+-------
 BC-1.5 | 1996-09-19 | TDS   |   935 | mg/L  |      |         |          | BurnsCrk |
(1 row)


  There is only a single row for that site, sampdate, and parameter and the
quantity is different from those in the R data frame.


I think you need to reduce this problem to a dataframe that you either
post an access method for or use dput() to include. Then you need to say
what your goals are and what code is not working on that example.


  I'll gladly do this. Which data frame should I make available: the
original chemdata or the subset burns.tds? I'll start with the latter.
Compressed dput() output attached.

  My goal is to produce time series plots of TDS, by site, on several
streams over the period for which that component was measured. Lattice lets
me superpose multiple lines on the same axis set with different color lines
and a legend.
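For the plotting goal itself, lattice can draw long-format data directly, one line per site in a single panel with a legend. A minimal sketch, assuming a hypothetical data frame tds with a site column, a Date column sampdate and a numeric quant (all values invented):

```r
library(lattice)

# Hypothetical long-format TDS data; sampdate must be a Date, not a factor
tds <- data.frame(site     = rep(c("BC-0.5", "BC-3"), each = 3),
                  sampdate = as.Date(c("1992-03-27", "1992-04-30", "1992-05-30",
                                       "1992-03-27", "1992-04-30", "1992-06-19")),
                  quant    = c(0.10, 0.12, 0.09, 0.10, 0.15, 0.11))

# One panel, one coloured line per site (rows are already in date order
# within each site), and a legend drawn as lines
p <- xyplot(quant ~ sampdate, data = tds, groups = site,
            type = "l", auto.key = list(lines = TRUE, points = FALSE),
            xlab = "Sampling date", ylab = "TDS")
print(p)
```

With type = "l" the points are joined in row order, so the data should be sorted by date within each site first.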

  What's not working is something in the workflow of subsetting chemdata to
extract all TDS data for a named stream (e.g., burns.tds and winters.tds),
then converting them to zoo objects using read.zoo(). Somewhere along this
process my data are being mangled. It's not in the source data frame,
chemdata:

chemdata[duplicated(chemdata), ]
 [1] site     sampdate param    quant    units    qual     easting  northing
 [9] stream   basin
<0 rows> (or 0-length row.names)


  The command I used to subset burns.tds from chemdata was:

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site,
sampdate, param == 'TDS', quant), drop = T)
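What that select argument does can be checked on a toy example. In base R's subset.data.frame(), select is evaluated with each column name standing for its column number, so param == 'TDS' becomes 3 == 'TDS', which is FALSE and selects nothing; no rows are filtered on param at all. A minimal sketch on a hypothetical three-row chem data frame (invented values, column names as used in this thread):

```r
# Hypothetical miniature of 'chemdata' (column names follow the thread)
chem <- data.frame(site     = c("BC-1.5", "BC-1.5", "BC-3"),
                   sampdate = c("1996-09-19", "1996-09-19", "1992-03-27"),
                   param    = c("TDS", "pH", "TDS"),
                   quant    = c(935, 7.2, 0.1),
                   stream   = "BurnsCrk")

# Reproduce how subset() evaluates 'select': names map to column positions,
# so param == 'TDS' is 3 == 'TDS', i.e. FALSE, which acts like column 0
nl <- as.list(seq_along(chem))
names(nl) <- names(chem)
cols <- eval(quote(c(site, sampdate, param == 'TDS', quant)), nl)
print(cols)               # 1 2 0 4

# Consequence: the param column is dropped but no row is filtered on it
bad <- subset(chem, stream == 'BurnsCrk',
              select = c(site, sampdate, param == 'TDS', quant))
print(bad)                # the pH row is still there
```

The two BC-1.5 rows now share the same site and date, which is exactly the kind of non-unique index entry read.zoo() reports.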

Thanks, David,

Rich



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-04 Thread David Winsemius
Nothing attached. I don't know what you entitled the compressed dput
output but it did not pass the filters of the mail server and you did
not copy me. If chemdata is available as a text file, then make sure
its extension is .txt and then attach it.


--
David.



David Winsemius, MD
West Hartford, CT



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries' [FIXED]

2012-01-04 Thread Rich Shepard

On Wed, 4 Jan 2012, David Winsemius wrote:

Nothing attached. I don't know what you entitled the compressed dput output
but it did not pass the filters of the mail server and you did not copy me.


David,

  It must have been stripped off as too large (14K).

  Regardless, I solved the problem:

  The examples in the subset help page show only a single row criterion
being used, but do not explicitly note that rows can be selected on only one
criterion at a time.

  Because the issue showed up only with the subset() function to extract
rows with the parameter TDS, the problem had to be in that syntax. By
changing the data frame argument to subset() from the overall chemdata to
that of only a single stream, creation of the zoo object threw no errors and
duplicated() returned zero rows.

  Wow! And I thought I understood the subset() syntax. I now do!

Thanks to you, Gabor, and everyone else who responded to this thread,

Rich



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries' [FIXED]

2012-01-04 Thread David Winsemius


On Jan 4, 2012, at 3:21 PM, Rich Shepard wrote:


On Wed, 4 Jan 2012, David Winsemius wrote:

Nothing attached. I don't know what you entitled the compressed
dput output but it did not pass the filters of the mail server and
you did not copy me.


David,

 It must have been stripped off as too large (14K).

 Regardless, I solved the problem:

 The examples in the subset help page show only a single row criterion
being used, but do not explicitly note that rows can be selected on only one
criterion at a time.


Because that is simply not true. Connect your criteria with ampersands
and you can have as many as you want. The only requirement is that the
resulting logical vector have exactly one element per row of the
dataframe, as described in the help page.
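In other words, row conditions belong together in the subset argument, joined with &, while select only names columns. A sketch on a hypothetical miniature chem data frame (invented values; the stream name WintersCrk is made up for illustration):

```r
# Hypothetical miniature of 'chemdata'
chem <- data.frame(site     = c("BC-1.5", "BC-1.5", "BC-3", "WC-1"),
                   sampdate = as.Date(c("1996-09-19", "1996-09-19",
                                        "1992-03-27", "1992-03-27")),
                   param    = c("TDS", "pH", "TDS", "TDS"),
                   quant    = c(935, 7.2, 0.1, 12),
                   stream   = c("BurnsCrk", "BurnsCrk", "BurnsCrk", "WintersCrk"))

# Both row criteria go in 'subset', joined with &; 'select' only lists columns
burns.tds <- subset(chem, stream == "BurnsCrk" & param == "TDS",
                    select = c(site, sampdate, quant))
print(burns.tds)

# 0 means no (site, sampdate) pair occurs twice
print(anyDuplicated(burns.tds[c("site", "sampdate")]))   # 0
```

Each extra condition is just another & term inside the same subset expression.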






David Winsemius, MD
West Hartford, CT



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread David Winsemius


On Jan 3, 2012, at 11:45 AM, Rich Shepard wrote:

 I have a situation that I cannot resolve by myself. When I try to create a
zoo object (with read.zoo()) I get this error:

Error in merge.zoo(`BC-0.5` = c(0.000, 0.010, 0.010, 0.060,  :
 series cannot be merged with non-unique index entries in a series

 This suggests that there is a duplicate entry for the factor level BC-0.5 on a
given date. Because the data originate in a relational database table with
a multi-column primary key, there are no duplicate rows in the table or in the
text file copied from that table. I need to find the source of this error.


 How can I identify the non-unique index entries within R?


?duplicated

--
David Winsemius, MD
West Hartford, CT



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread Rich Shepard

On Tue, 3 Jan 2012, David Winsemius wrote:


How can I identify the non-unique index entries within R?


?duplicated


  Thank you, David.

  I _think_ the problem comes from a duplicated factor column in the data
frame. Now I need to figure out how subset() generated that additional
column.

Regards,

Rich



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread Rich Shepard

On Tue, 3 Jan 2012, Rich Shepard wrote:


 I _think_ the problem comes from a duplicated factor column in the data
frame. Now I need to figure out how subset() generated that additional
column.


  Nope. That's not it.

  Running 'duplicated(burns.tds, incomparables = FALSE)' produces a listing
of FALSE and TRUE keyed by position:


duplicated(burns.tds, incomparables = FALSE)

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
 ...

but I don't see how to line up summary(burns.tds), str(burns.tds), or the
printed burns.tds with this listing so I can find the TRUE by position and
determine what's duplicated. What do I do to use this output to find the
duplicates that won't let read.zoo() complete?
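One way to turn that logical vector into something readable, sketched on hypothetical rows: which() gives the positions of the TRUE entries, and OR-ing duplicated() with duplicated(..., fromLast = TRUE) shows the first copy as well as the repeats. Restricting the check to the columns meant to form the index (here assumed to be site and sampdate) is closer to what read.zoo() objects to than a whole-row check.

```r
# Hypothetical toy data standing in for burns.tds
x <- data.frame(site     = c("BC-3", "BC-1.5", "BC-1.5", "BC-1.5"),
                sampdate = c("1992-03-27", "1996-09-19", "1996-09-19", "1996-09-19"),
                quant    = c(0.1, 0.01, NA, 0.01))

dup <- duplicated(x)

# Row positions of the later copies
pos <- which(dup)
print(pos)                              # 4

# Every row involved in a whole-row duplication, including the first copy
all_copies <- dup | duplicated(x, fromLast = TRUE)
print(x[all_copies, ])                  # rows 2 and 4

# The same idea restricted to the would-be index columns
key <- x[c("site", "sampdate")]
key_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(x[key_dup, ])                     # rows 2, 3 and 4
```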

Rich



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread David Winsemius


On Jan 3, 2012, at 12:26 PM, Rich Shepard wrote:


On Tue, 3 Jan 2012, David Winsemius wrote:


How can I identify the non-unique index entries within R?


?duplicated


 Thank you, David.

 I _think_ the problem comes from a duplicated factor column in the data
frame. Now I need to figure out how subset() generated that additional
column.


A subsetting vector could contain (or perhaps cause [ to create) duplicates.


data.frame(a=1:10)[ seq(1,10, by=0.5),]
 [1]  1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10
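The reason is that fractional subscripts are truncated toward zero, so 1 and 1.5 both select row 1. A short sketch extending that example: keeping the result as a data frame with drop = FALSE shows the repeated rows, which R labels with made-unique row names, and duplicated() counts them.

```r
# Fractional indices are truncated, so seq(1, 10, by = 0.5) repeats rows
idx <- seq(1, 10, by = 0.5)
print(trunc(idx))                        # 1 1 2 2 ... 9 9 10

# Keeping a data frame (drop = FALSE) makes the repeats visible:
# repeated rows receive row names such as "1.1", "2.1", ...
d <- data.frame(a = 1:10)[idx, , drop = FALSE]
print(head(rownames(d), 4))              # "1" "1.1" "2" "2.1"

# duplicated() flags each repeated row
print(sum(duplicated(d)))                # 9
```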



--


David Winsemius, MD
West Hartford, CT



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread David Winsemius


On Jan 3, 2012, at 12:48 PM, Rich Shepard wrote:


On Tue, 3 Jan 2012, Rich Shepard wrote:

I _think_ the problem comes from a duplicated factor column in the data
frame. Now I need to figure out how subset() generated that additional
column.


 Nope. That's not it.

 Running 'duplicated(burns.tds, incomparables = FALSE)' produces a listing
of FALSE and TRUE keyed by position:


duplicated(burns.tds, incomparables = FALSE)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
...



That's rather unconvincing. What does this show:

burns.tds[ !duplicated(burns.tds) ]




David Winsemius, MD
West Hartford, CT



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread Rich Shepard

On Tue, 3 Jan 2012, David Winsemius wrote:


That's rather unconvincing. What does this show:
burns.tds[ !duplicated(burns.tds) ]


  I saw that in the help page but assumed it was the opposite of duplicated;
apparently not.

burns.tds[ !duplicated(burns.tds) ]
Error in `[.data.frame`(burns.tds, !duplicated(burns.tds)) :
  undefined columns selected

  burns.tds was generated by

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site,
sampdate, param == 'TDS', quant), drop = T)

and has this structure

'data.frame':   2472 obs. of  3 variables:
 $ site    : Factor w/ 137 levels "BC-0.5","BC-1",..: 5 5 5 5 5 5 5 5 ...
 $ sampdate: Factor w/ 1056 levels "1978-03-28","1978-04-11",..: 155 156 158 161 163 164 172 175 177 309 ...
 $ quant   : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...

  Because the data frame chemdata came from a relational table with a
primary key of (site, sampdate, param) I am having trouble understanding
where duplicate rows could have originated.
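A key that is unique on (site, sampdate, param) need not be unique on (site, sampdate) alone, and the latter is what a per-site date index requires once param is no longer a column. A minimal sketch on hypothetical rows (the Zn parameter and all values are invented):

```r
# Hypothetical rows that satisfy a (site, sampdate, param) primary key
chem <- data.frame(site     = c("BC-1.5", "BC-1.5", "BC-1.5"),
                   sampdate = as.Date(c("1996-09-19", "1996-09-19", "1996-10-15")),
                   param    = c("TDS", "Zn", "TDS"),
                   quant    = c(935, 0.010, 880))

key3 <- c("site", "sampdate", "param")
key2 <- c("site", "sampdate")

# anyDuplicated() returns 0 when there are no duplicates
print(anyDuplicated(chem[key3]))         # 0

# Without param, two rows share the same site and date (row 2 repeats row 1)
print(anyDuplicated(chem[key2]))         # 2
print(chem[duplicated(chem[key2]) | duplicated(chem[key2], fromLast = TRUE), ])
```

So if the subset keeps more than one parameter per stream, rows that were distinct in the database can collide on site and date.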

Thanks, David,

Rich



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread David Winsemius
Maybe we need to backtrack a bit. You originally were complaining  
about an error that said you had duplicated index entries as you  
attempted to make a zoo object. I assumed, incorrectly it now  
appears,  that you understood that an index in a zoo object was a  
vector. You now seem to be admitting that you were trying to use an  
entire dataframe as your index. As the acronym goes, ...  DDT.



On Jan 3, 2012, at 1:53 PM, Rich Shepard wrote:


On Tue, 3 Jan 2012, David Winsemius wrote:


That's rather unconvincing. What does this show:
burns.tds[ !duplicated(burns.tds) ]


 I saw that in the help page but assumed it was the opposite of duplicated;
apparently not.

burns.tds[ !duplicated(burns.tds) ]
Error in `[.data.frame`(burns.tds, !duplicated(burns.tds)) :
 undefined columns selected


Right. If you had said it was a dataframe, I would have suggested:

burns.tds[ !duplicated(burns.tds) ,  ]

But that would only identify entire duplicated rows; it would not
cure the misguided notion of creating a zoo-index from a dataframe.




 burns.tds was generated by

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site,
sampdate, param == 'TDS', quant), drop = T)

and has this structure

'data.frame':   2472 obs. of  3 variables:
$ site    : Factor w/ 137 levels "BC-0.5","BC-1",..: 5 5 5 5 5 5 5 5 ...
$ sampdate: Factor w/ 1056 levels "1978-03-28","1978-04-11",..: 155 156 158 161 163 164 172 175 177 309 ...
$ quant   : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...

 Because the data frame chemdata came from a relational table with a
primary key of (site, sampdate, param) I am having trouble understanding
where duplicate rows could have originated.


You still have not really described what you are trying to do ... or
what data you are trying to do it with. You might want to think
about taking that sampdate, which is now a factor, and turning it into
a Date object, which would then satisfy the requirements of an index
for a zoo object. The second letter in zoo stands for ordered, and
factors have no order.
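A minimal sketch of that conversion, assuming the dates are stored as ISO YYYY-MM-DD strings as in the listings above; df is a hypothetical stand-in for burns.tds with invented values:

```r
# Hypothetical stand-in for a burns.tds whose sampdate was read as a factor
df <- data.frame(site     = c("BC-3", "BC-3", "BC-3"),
                 sampdate = factor(c("1992-04-30", "1992-03-27", "1992-05-30")),
                 quant    = c(0.1, 0.1, 0.1))

# Convert via character; ISO 8601 strings need no 'format' argument
df$sampdate <- as.Date(as.character(df$sampdate))
str(df)

# A Date column sorts and compares chronologically
df <- df[order(df$sampdate), ]
print(df$sampdate)
```

Going through as.character() avoids ever working with the factor's internal integer codes.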




--

David Winsemius, MD
West Hartford, CT



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread Rich Shepard

On Tue, 3 Jan 2012, David Winsemius wrote:


Maybe we need to backtrack a bit.


  Yes. I've been trying to do this but still have too little experience with
R to be successful on my own.


You originally were complaining about an error that said you had
duplicated index entries as you attempted to make a zoo object. I assumed,
incorrectly it now appears, that you understood that an index in a zoo
object was a vector. You now seem to be admitting that you were trying to
use an entire dataframe as your index. As the acronym goes,
...  DDT.


  My understanding, apparently incorrect, was that read.zoo() converted a
data frame to a zoo object with the date column as the index vector. From
the read.zoo help page:

file: character string or strings giving the name of the file(s) which
  the data are to be read from/written to. ... ‘file’ can be a
  ‘connection’ or a ‘data.frame’ (e.g., resulting from a previous
  ‘read.table’ call) that is subsequently processed to a ‘zoo’
  series.

  Based on this, I created a subset for a single parameter:

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site,
sampdate, param == 'TDS', quant), drop = T)

This provides three columns:

 site   sampdate quant
599  BC-3 1992-03-27   0.1
600  BC-3 1992-04-30   0.1
601  BC-3 1992-05-30   0.1
603  BC-3 1992-06-19   0.1
1214 BC-3 1992-07-20   0.1
1215 BC-3 1992-08-10   0.1

  Then, to create the zoo object,

burns.tds.z <- read.zoo(burns.tds, split = 1, index = 2)



Right. If you had said it was a dataframe, I would have suggested:

burns.tds[ !duplicated(burns.tds) ,  ]

But that would only identify entire duplicated rows; it would not cure the
misguided notion of creating a zoo-index from a dataframe.


  Where did I go astray in trying to create a zoo-index with this procedure?
How do I extract zoo objects from a data frame?

You still have not really described what you are trying to do ... or what
data you are trying to do it with.


  I have a data frame with water quality sampling data and I'm now trying to
plot time series of specific chemical concentrations (y axis) as a function
of their irregular collection over a period of 30 years or less. I want to
plot the time series for each site along the stream as a separate line in
the same panel.


You might want to think about taking that sampdate, which is now a factor,
and turning it into a Date object, which would then satisfy the
requirements of an index for a zoo object.


  I had run the read.table() function again and forgot to convert the date
column from a factor to a date. I've now done this but the result is still
the same:

str(burns.tds)
'data.frame':   2472 obs. of  3 variables:
 $ site    : Factor w/ 137 levels "BC-0.5","BC-1",..: 5 5 5 5 5 5 5 5 5 5
 $ sampdate: Date, format: "1992-03-27" "1992-04-30" ...
 $ quant   : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...

burns.tds.z <- read.zoo(burns.tds, split = 1, index = 2)
Error in merge.zoo(`BC-0.5` = c(0, 0.01, 0.01, 0.06, 0.18, NA, 76.56,  :
  series cannot be merged with non-unique index entries in a series
In addition: Warning messages:
1: In zoo(rval4[[i]], ix[[i]]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique

  I want to learn how to get data into zoo objects for time series analyses
so I greatly appreciate the help provided here.

Rich



Re: [R] Finding Source of Error Message of 'Non-Unique Index Entries'

2012-01-03 Thread David Winsemius


On Jan 3, 2012, at 5:53 PM, Rich Shepard wrote:


 I had run the read.table() function again and forgot to convert the date
column from a factor to a date. I've now done this but the result is still
the same:

str(burns.tds)
'data.frame':   2472 obs. of  3 variables:
$ site    : Factor w/ 137 levels "BC-0.5","BC-1",..: 5 5 5 5 5 5 5 5 5 5
$ sampdate: Date, format: "1992-03-27" "1992-04-30" ...
$ quant   : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 8.08 ...

burns.tds.z <- read.zoo(burns.tds, split = 1, index = 2)
Error in merge.zoo(`BC-0.5` = c(0, 0.01, 0.01, 0.06, 0.18, NA, 76.56,  :
 series cannot be merged with non-unique index entries in a series
In addition: Warning messages:
1: In zoo(rval4[[i]], ix[[i]]) :
 some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique


So NOW you need to look at:

burns.tds$sampdate[ duplicated(burns.tds$sampdate) ]

But since you are using data that is in the long format you will need
to do it within categories determined by your site variable. I would
start with tapply for that purpose, but I suppose you could just use

with(chemdata, table(sampdate, site) )

and see if any anomalies pop out.
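Both suggestions can be sketched on a small hypothetical long-format data frame burns (invented values). tapply() counts repeated dates within each site, and the date-by-site table flags any cell holding more than one observation; a nonzero count marks a site whose series would have a non-unique index.

```r
# Hypothetical long-format data: site BC-0.5 has two rows on 1992-03-27
burns <- data.frame(site     = c("BC-0.5", "BC-0.5", "BC-0.5", "BC-3"),
                    sampdate = as.Date(c("1992-03-27", "1992-03-27",
                                         "1992-04-30", "1992-03-27")),
                    quant    = c(0, 0.01, 0.01, 0.1))

# Number of repeated dates within each site
n_dup <- tapply(burns$sampdate, burns$site, function(d) sum(duplicated(d)))
print(n_dup)                             # BC-0.5: 1, BC-3: 0

# Cross-tabulation: cells greater than 1 are the offending site/date pairs
tab <- with(burns, table(sampdate, site))
print(which(tab > 1, arr.ind = TRUE))
```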






--

David Winsemius, MD
West Hartford, CT
