Re: [R] Stacked Histogram, multiple lines for dates of news stories?

2010-06-29 Thread Jim Lemon

On 06/29/2010 01:04 AM, Simon Kiss wrote:

Dear colleagues,
I have extracted the dates of several news stories from a newspaper
database to chart coverage trends of an issue over time. They are in a data
frame that looks just like the one generated by the reproducible code below.
I can already generate a histogram of the dates with various intervals
(months, quarters, weeks, years) using hist.Date. However, there are two
other things I'd like to do.
First, I'd like to create a stacked histogram so that one could
see whether one newspaper really pushed coverage of an issue at a
certain point while others followed later on. Second, or
alternatively, I would like to draw a line graph of the same data for the
different papers to show the same trends.
I guess what I'm finding challenging is that I don't have counts of the
number of stories on each day or in each week or in each month; I just
have the dates themselves. The hist.Date command was very useful in
turning those into bins, but I'd like to push it a bit further, to a
stacked histogram or a multiple-line chart.
Can anyone suggest a way to go about doing this?

I should say, I played around in Hadley Wickham's ggplot package and
looked at his website, and there is a way to render multiple lines here:
http://had.co.nz/ggplot2/scale_date.html
but it was not clear to me how to plot just the dates or an index of the
dates as I don't have a value for the y axis, other than the number of
times a story was published in that time frame.


Hi Simon,
I had to think about this for a while, but the following may be what you 
want. It also gave me an idea for a new plot. Thanks.


Jim

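library(plotrix)
# count the articles per calendar year for one paper; fixing the levels
# guarantees six counts, whereas hist(...,breaks=6) treats 6 as a suggestion
# and may return a different number of bins
yearcount<-function(paper)
 as.vector(table(factor(format(test_df$test2[test_df$test==paper],"%Y"),
 levels=2004:2009)))
count1<-yearcount("Globe and Mail")
count2<-yearcount("Post")
count3<-yearcount("Star")
# empty plot: dates on the x axis, one row per newspaper on the y axis
plot(test_df$test2,test_df$test,ylim=c(0.4,3.6),type="n",
 main="Date of articles",xlab="Year",ylab="Journal",axes=FALSE)
# roughly mid-year positions for 2004 to 2009, in days since 1970-01-01
yearpos<-seq(12599,14425,length.out=6)
axis(1,at=yearpos,labels=2004:2009)
axis(2,at=1:3,labels=c("Globe and Mail","Post","Star"))
box()
# filled band around each row, scaled so that the busiest year of each
# paper reaches half a row above and below its line
dispersion(yearpos,rep(1,6),count1/(max(count1)*2),
 type="l",fill="green")
dispersion(yearpos,rep(2,6),count2/(max(count2)*2),
 type="l",fill="red")
dispersion(yearpos,rep(3,6),count3/(max(count3)*2),
 type="l",fill="blue")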
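
For the stacked version of the question, one possible base R sketch is below. It is not part of the plotrix approach above: it simply cross-tabulates paper by year and lets barplot() stack the result. The data frame demo_df is a hypothetical stand-in shaped like test_df, and the colours, labels and yearly bins are arbitrary choices for illustration.

```r
# Hypothetical stand-in for test_df: 50 random dates between 2004 and 2009
set.seed(1)
papers <- c("Star", "Globe and Mail", "Post")
demo_df <- data.frame(
  test = factor(sample(papers, 50, replace = TRUE), levels = papers),
  test2 = as.Date("2004-01-01") + sample(0:2190, 50, replace = TRUE)
)

# Turn each date into its calendar year; fixed levels keep empty years as zeros
year <- factor(format(demo_df$test2, "%Y"), levels = 2004:2009)

# Cross-tabulate: one row per paper, one column per year
year_counts <- table(demo_df$test, year)

# barplot() stacks the rows of a matrix by default (beside = FALSE)
barplot(year_counts, col = c("blue", "green", "red"),
        legend.text = rownames(year_counts),
        main = "Date of articles", xlab = "Year", ylab = "Number of stories")
```

Replacing demo_df with test_df should give the same kind of plot for the real data, provided the dates fall within 2004 to 2009.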

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Stacked Histogram, multiple lines for dates of news stories?

2010-06-28 Thread Hadley Wickham
Hi Simon,

Here are two ways to do that with ggplot:

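library(ggplot2)
# frequency polygon: one line per newspaper, counts in 30-day bins
qplot(test2, data = test_df, geom = "freqpoly", colour = test,
binwidth = 30, drop = FALSE)
# stacked histogram: bars filled by newspaper, same 30-day bins
# (current ggplot2 needs "histogram" here, since geom = "bar" no longer bins)
qplot(test2, data = test_df, geom = "histogram", fill = test, binwidth = 30)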

binwidth is in days.  If you want to bin by other intervals (like
months), I'd recommend doing so before plotting.
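
As a rough illustration of binning by month before plotting, the base R sketch below labels every date with its month, counts stories per month and paper, and draws one line per paper. It is only one way to do the pre-binning; demo_df is a hypothetical stand-in shaped like test_df, and the date range, colours and labels are assumptions.

```r
# Hypothetical stand-in for test_df: 50 random dates between 2004 and 2009
set.seed(1)
papers <- c("Star", "Globe and Mail", "Post")
demo_df <- data.frame(
  test = factor(sample(papers, 50, replace = TRUE), levels = papers),
  test2 = as.Date("2004-01-01") + sample(0:2190, 50, replace = TRUE)
)

# Every month in the study period, so months without stories count as zero
all_months <- seq(as.Date("2004-01-01"), as.Date("2009-12-01"), by = "month")

# Label each date with its month ("YYYY-MM") and fix the levels to all months
month_f <- factor(format(demo_df$test2, "%Y-%m"),
                  levels = format(all_months, "%Y-%m"))

# Rows = months, columns = papers
month_counts <- table(month_f, demo_df$test)

# Empty frame first, then one line per paper
cols <- c("blue", "green", "red")
plot(all_months, month_counts[, 1], type = "n",
     ylim = c(0, max(month_counts)),
     xlab = "Month", ylab = "Number of stories")
for (i in seq_along(papers)) {
  lines(all_months, month_counts[, i], col = cols[i])
}
legend("topright", legend = colnames(month_counts), col = cols, lty = 1)
```

The table of monthly counts could also be converted with as.data.frame() and passed on to a plotting package instead of the base graphics calls used here.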

Hadley

On Mon, Jun 28, 2010 at 10:04 AM, Simon Kiss wrote:
> Dear colleagues,
> I have extracted the dates of several news stories from a newspaper
> database to chart coverage trends of an issue over time. They are in a data
> frame that looks just like the one generated by the reproducible code below.
> I can already generate a histogram of the dates with various intervals
> (months, quarters, weeks, years) using hist.Date. However, there are two
> other things I'd like to do.
> First, I'd like to create a stacked histogram so that one could see
> whether one newspaper really pushed coverage of an issue at a certain point
> while others followed later on. Second, or alternatively, I
> would like to draw a line graph of the same data for the different papers to
> show the same trends.
> I guess what I'm finding challenging is that I don't have counts of the
> number of stories on each day or in each week or in each month; I just have
> the dates themselves. The hist.Date command was very useful in turning
> those into bins, but I'd like to push it a bit further, to a stacked
> histogram or a multiple-line chart.
> Can anyone suggest a way to go about doing this?
>
> I should say, I played around in Hadley Wickham's ggplot package and looked
> at his website, and there is a way to render multiple lines here:
> http://had.co.nz/ggplot2/scale_date.html
> but it was not clear to me how to plot just the dates or an index of the
> dates as I don't have a value for the y axis, other than the number of times
> a story was published in that time frame.
>
> Regardless, I hope someone can suggest something.
> Yours,
> Simon J. Kiss
>
> test=sample(1:3, 50, replace=TRUE)
> test=as.factor(test)
> levels(test)=c("Star", "Globe and Mail", "Post")
> # days are drawn from 1:28 so that no invalid date (e.g. 30 February) becomes NA
> test2=ISOdatetime(sample(2004:2009, 50, replace=TRUE), sample(1:12, size=50,
> replace=TRUE), sample(1:28, 50, replace=TRUE), 0,0,0)
> test2=as.Date(test2)
> test_df=data.frame(test, test2)
>
> *
> Simon J. Kiss, PhD
> SSHRC and DAAD Post-Doctoral Fellow
> John F. Kennedy Institute of North America Studies
> Free University of Berlin
> Lansstraße 7-9
> 14195 Berlin, Germany
> Cell: +49 (0)1525-300-2812,
> Web: http://www.jfki.fu-berlin.de/index.html
>



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[R] Stacked Histogram, multiple lines for dates of news stories?

2010-06-28 Thread Simon Kiss

Dear colleagues,
I have extracted the dates of several news stories from a newspaper
database to chart coverage trends of an issue over time. They are in
a data frame that looks just like the one generated by the reproducible
code below.
I can already generate a histogram of the dates with various intervals
(months, quarters, weeks, years) using hist.Date. However, there are
two other things I'd like to do.
First, I'd like to create a stacked histogram so that one could
see whether one newspaper really pushed coverage of an issue at a
certain point while others followed later on. Second, or
alternatively, I would like to draw a line graph of the same data for
the different papers to show the same trends.
I guess what I'm finding challenging is that I don't have counts of
the number of stories on each day or in each week or in each month; I
just have the dates themselves. The hist.Date command was very useful
in turning those into bins, but I'd like to push it a bit further, to
a stacked histogram or a multiple-line chart.

Can anyone suggest a way to go about doing this?

I should say, I played around in Hadley Wickham's ggplot package and  
looked at his website, and there is a way to render multiple lines  
here: http://had.co.nz/ggplot2/scale_date.html
but it was not clear to me how to plot just the dates or an index of  
the dates as I don't have a value for the y axis, other than the  
number of times a story was published in that time frame.


Regardless, I hope someone can suggest something.
Yours,
Simon J. Kiss

test=sample(1:3, 50, replace=TRUE)
test=as.factor(test)
levels(test)=c("Star", "Globe and Mail", "Post")
# days are drawn from 1:28 so that no invalid date (e.g. 30 February) becomes NA
test2=ISOdatetime(sample(2004:2009, 50, replace=TRUE), sample(1:12,
size=50, replace=TRUE), sample(1:28, 50, replace=TRUE), 0,0,0)
test2=as.Date(test2)
test_df=data.frame(test, test2)

*
Simon J. Kiss, PhD
SSHRC and DAAD Post-Doctoral Fellow
John F. Kennedy Institute of North America Studies
Free University of Berlin
Lansstraße 7-9
14195 Berlin, Germany
Cell: +49 (0)1525-300-2812,
Web: http://www.jfki.fu-berlin.de/index.html
