Re: [R-sig-teaching] Graph Two Series over Time

2015-12-29 Thread Steven Stoline
Dear Randall:

I could not find a package (or function) called "tidyr". I installed all
the other packages, but could not find tidyr.

with many thanks
steve

On Tue, Dec 29, 2015 at 5:43 PM, Randall Pruim wrote:

> A few more suggestions and an update to my ggplot2 plot.
>
>   1) I recommend using SPACES in your code to make things more readable.
>   2) Coding things with COLOR isn’t really very useful.  This is an
> additional variable and should be coded as such.
>   3) I don’t really know what detected means, but I’ve coded it as a
> logical variable.  You could use a factor or character vector instead.
>   4) You have used inconsistent date formatting which (without my edits)
> will cause some years to be 0005 and others to be 2005.  (This will be
> immediately clear when the plot spans 2000 years — that’s how I detected
> the problem.)
>
> Here’s what my first draft would look like:
>
>
> ### Put data into a data frame -- avoid loose vectors
> library(dplyr); library(lubridate); library(tidyr)
> library(ggplot2)
>
> # recreate your data in a data frame
> MyData <- tibble(
>   Well1 = c(0.005, 0.005, 0.004, 0.006, 0.004, 0.009, 0.017, 0.045,
>             0.05, 0.07, 0.12, 0.10, NA, 0.20, 0.25),
>   Well2 = c(0.10, 0.12, 0.125, 0.107, 0.099, 0.11, 0.13, 0.109,
>             NA, 0.10, 0.115, 0.14, 0.17, NA, 0.11),
>   dateString = c("2Jan05", "7April05", "17July05", "24Oct05", "7Jan06",
>                  "30March06", "28Jun06", "2Oct06", "17Oct06", "15Jan07",
>                  "10April07", "9July07", "5Oct07", "29Oct07", "30Dec07"),
>   date = dmy(dateString)
> )
>
> # put the data into "long" format
> MyData2 <-
>   MyData %>%
>   gather(location, concentration, Well1, Well2) %>%
>   mutate(detected = TRUE)
>
> # hand-code your colored values (should be double checked for accuracy)
>
> MyData2$detected[c(1, 2, 5, 15 + 1, 15 + 5, 15 + 10)] <- FALSE
>
> # Create plot using ggplot2
>
> ggplot(data = MyData2 %>% filter(!is.na(concentration)),
>        aes(x = date, y = concentration, colour = location)) +
>   geom_line(alpha = 0.8) +
>   geom_point(aes(shape = detected, group = location), size = 3, alpha = 0.8) +
>   scale_shape_manual(values = c(1, 16)) +
>   theme_minimal()

Re: [R-sig-teaching] Graph Two Series over Time

2015-12-26 Thread Steven Stoline
Dear Randall:


Thank you very much for the details and for your support and patience.



### This is what the original data look like:
### ---
### Non-detected values (red font): Well1 values 1, 2 and 5; Well2 values 1, 5 and 10



Well1<-c(0.005,0.005,0.004,0.006,0.004,0.009,0.017,0.045,0.05,0.07,0.12,0.10,NA,0.20,0.25)



Well2<-c(0.10,0.12,0.125,0.107,0.099,0.11,0.13,0.109,NA,0.10,0.115,0.14,0.17,NA,0.11)



date<-c("2Jan2005","7April05","17July05","24Oct05","7Jan06","30March06","28Jun06","2Oct06","17Oct06","15Jan07","10April07","9July07","5Oct07","29Oct07","30Dec07")



The data values in red font are non-detected, so I need to distinguish
these non-detected values from the detected ones in the graph.



For example, solid circles for the detected values and open circles for the
non-detected ones (the ones in red font).


So, I was trying to use pch for that.



Please notice that Well1, Well2, and date now all have the same length of
15, but Well1 has one NA and Well2 has two NAs.


Happy Holiday and Happy Christmas (if you are celebrating)

with many thanks
steve

Re: [R-sig-teaching] Graph Two Series over Time

2015-12-24 Thread Randall Pruim
Steve,

This is on the edge of what R-sig-teaching is for (since it isn’t really about 
teaching).  But since I think there are elements of what you are doing that 
lead students to think that R is terrible, I’ll show you how I might approach 
things.

First a few comments about my solution.

1) I generally avoid loose vectors.  I prefer to use data frames to keep 
related vectors related.

2) I prefer to code dates as dates.  I would be very nervous about code that 
manually sets the axis labels differently from the data.  That can lead to all 
sorts of bad errors down the road if you change the data and forget to change 
the labels and often indicates you don’t have the data formatted the way you 
should.  (Note:  I added day of month values to your dates that had none.)  The 
lubridate package makes it easy to create dates from strings.

3) I rarely use base graphics, so I’ll show you solutions using lattice and 
ggplot2.  There may be nice ways to do this in base graphics as well.

4) I’m ignoring the color choices, title, etc.  All that can be easily added, 
but I’m focusing on getting the data display correct.  That’s generally the 
approach I take to plotting:  First get the data display correct, then fancy up 
titles, colors, fonts, etc.  It saves lots of time, because often once I see 
the plot, I realize it isn’t what I need, so there is no reason to gussy it up.

5) I prefer (and lattice and ggplot2 encourage) keeping the data manipulation 
in one location and the plotting after that rather than going back and forth 
between those two types of operations.  I find that it makes the code easier to 
read.

6) One of your series has fewer points than the other.  I made the assumption 
that the missing value was at the end.  That should be changed to whatever is 
correct for your data.

7) I don’t know what you were using pch to indicate, so I created a variable 
called “group” with values 0 and 15.  The variable and its values should 
ideally be renamed to reflect what they represent.  That will make your code 
easier to read and produce better labeling of the plot.
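
As a sketch of the renaming suggested in point 7, the pch codes can be turned into a labelled factor.  The labels here are an assumption: they take 0 to mean a non-detected value and 15 a detected one, which the original code does not state.  The names pch_code and status are made up for illustration.

```r
# pch codes, matching the values used for `group` in the data frame further down
pch_code <- c(0, 0, 15, 15, 0, 15, 15, 15, 15, 15, 15, 15, 15, 15)

# assumed meaning (hypothetical labels): 0 = non-detected, 15 = detected
status <- factor(pch_code,
                 levels = c(0, 15),
                 labels = c("non-detected", "detected"))

# count how many observations fall under each label
print(table(status))
```

A factor with descriptive levels like this gives readable legend entries when it is mapped to shape in ggplot2 or used for grouping in lattice.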

And one note about your code.

6*0:max_y

probably doesn’t do what you expect.  In R the colon operator binds more 
tightly than *, so this is the same as 6 * (0:max_y), and it isn’t clear why 
you would want the range of the plot to be six times that of the data.  Maybe 
you were thinking something like seq(0, max_y, length.out = 6), but that will 
give pretty ugly breakpoints.  In any case, the plots below do a fine job of 
setting the axes by default, and each system allows you to tune them if you 
disagree with the default for a particular plot.
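
To see how R actually parses an expression like this, it can help to evaluate it on a small value.  A quick check, using a made-up max_y of 3 (not a value from the original code); the variable names are only for illustration:

```r
# max_y = 3 is an illustrative value, not taken from the original code
max_y <- 3

# `:` has higher precedence than `*`, so these two lines are equivalent
scaled   <- 6 * 0:max_y       # 0 6 12 18
explicit <- 6 * (0:max_y)     # 0 6 12 18

# forcing the multiplication first collapses to 0:max_y
mult_first <- (6 * 0):max_y   # 0 1 2 3

# six equally spaced breakpoints from 0 to max_y
even_breaks <- seq(0, max_y, length.out = 6)   # 0.0 0.6 1.2 1.8 2.4 3.0

# pretty() chooses rounder breakpoints covering the same range
round_breaks <- pretty(c(0, max_y))

print(scaled)
print(mult_first)
print(even_breaks)
print(round_breaks)
```

When in doubt about precedence, explicit parentheses make the intent clear to both R and the reader.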


With that much preamble, the code is now shorter than the introduction.


### Put data into a data frame -- avoid loose vectors
library(dplyr); library(lubridate)

# if I knew what you were using pch for, I would name group and its values to match
MyData <- tibble(
  Well1 = c(0.005, 0.005, 0.004, 0.006, 0.004, 0.009, 0.017,
            0.045, 0.05, 0.07, 0.12, 0.10, 0.20, 0.25),
  Well2 = c(0.10, 0.12, 0.125, 0.107, 0.099, 0.11, 0.13,
            0.109, 0.10, 0.115, 0.14, 0.17, 0.11, NA),
  dateString = c("1Jan05", "1April05", "1Jul05", "1Oct05", "1Jan06",
                 "1March06", "1Jun06", "2Oct06", "17Oct06", "1Jan07",
                 "1April07", "1Jul07", "1Oct07", "1Dec07"),
  date = dmy(dateString),
  group = factor(c(0, 0, 15, 15, 0, 15, 15, 15, 15, 15, 15, 15, 15, 15))
)

## using lattice
## lattice makes plotting two series easy
## but doesn't make it as easy to have different symbols along the same series

library(lattice)
xyplot(Well1 + Well2 ~ date, data = MyData, type = c("p","l"), auto.key = TRUE)
## better legend
xyplot(Well1 + Well2 ~ date, data = MyData, type = c("p","l"),
   auto.key = list(points = TRUE, lines = TRUE))

## using ggplot2
## for highly customized plots, i generally find ggplot2 works better
## i would reshape the data with tidyr before plotting (could be done in lattice as well)

library(ggplot2); library(tidyr)

MyData2 <-
  MyData %>%
  gather(location, concentration, Well1, Well2)

ggplot( data = MyData2, aes(x = date, y = concentration, colour = location)) +
  geom_line() +
  geom_point( aes(shape = group), size = 2)

xyplot(concentration ~ date, data = MyData2, groups = location,
   type = c("p", "l"), auto.key = TRUE)

## without reshaping, you can plot the 4 layers manually, but the default labeling isn’t as nice

ggplot(data = MyData) +
  geom_line(aes(x = date, y = Well1, colour = "Well1")) +
  geom_line(aes(x = date, y = Well2, colour = "Well2")) +
  geom_point(aes(x = date, y = Well1, colour = "Well1", shape = group)) +
  geom_point(aes(x = date, y = Well2, colour = "Well2", shape = group))


Happy Holidays.  I hope one of these approaches will get you headed in the 
right direction.

—rjp






On Dec 24, 2015, at 7:51 AM, Steven Stoline wrote:

Dear All:

I am trying to plot two series in one graph, but I am having some difficulty
setting the y-axis limits. Also, the second series is not