Re: [R] graphically representing frequency of words in a speech?

2009-06-10 Thread Brown, Tony Nicholas
Yihui,

This is quite impressive, thanks for helping me think about how to make tag 
clouds in R.

Tony

-----Original Message-----
From: Yihui Xie [mailto:xieyi...@gmail.com] 
Sent: Wednesday, June 10, 2009 3:15 AM
To: Brown, Tony Nicholas
Cc: r-help@r-project.org
Subject: Re: [R] graphically representing frequency of words in a speech?

Hi,

As Gregor Gorjanc mentioned, it's very inconvenient to let R decide
the font size and placement of words in a plot. There are already
very mature tag cloud applications; one that I'm relatively
familiar with is the WordPress plugin "wp-cumulus", which uses a
Flash object to generate the tag cloud and has a fantastic 3D rotation
effect. I've spent a couple of hours porting it to R;
see the source code and the result here:

http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/

HTH.

Regards,
Yihui
--
Yihui Xie 
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China



On Mon, Jun 8, 2009 at 2:41 AM, Brown, Tony Nicholas wrote:
> Dear all,
>
> I recently saw a graph on television that displayed selected
> words/phrases in a speech scaled in size according to their frequency.
> So words/phrases that were often used appeared large and words that were
> rarely used appeared small. The closest thing I can find on the web to
> approximate what I saw can be found here:
> http://stateoftheunion.onetwothree.net/ The example at that website is
> more complicated but captures the general idea.
>
> Would someone point me in the right direction in terms of replicating
> such a graph.
>
> Thanks in advance,
>
> Tony
>
> --
> Tony N. Brown, Ph.D.
> Editor-Elect, American Sociological Review
> Associate Professor of Sociology and Human and Organizational
> Development (secondary)
> Program Faculty, Effective Health Communication and African American &
> Diaspora Studies
> Faculty Head of Hank Ingram House, The Commons
> Vanderbilt University
> (615) 322-7518
> (615) 322-7505 fax
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] graphically representing frequency of words in a speech?

2009-06-10 Thread Ronggui Huang
There is a similar discussion on Statalist
(http://n2.nabble.com/st%3A-Tag-clouds-in-Stata--tt2992551.html#none).
I think they make a reasonable argument that a tag cloud is not a good
statistical graphic.





-- 
HUANG Ronggui, Wincent
PhD Candidate
Dept of Public and Social Administration
City University of Hong Kong
Home page: http://asrr.r-forge.r-project.org/rghuang.html



Re: [R] graphically representing frequency of words in a speech?

2009-06-10 Thread Yihui Xie
Hi,

As Gregor Gorjanc mentioned, it's very inconvenient to let R decide
the font size and placement of words in a plot. There are already
very mature tag cloud applications; one that I'm relatively
familiar with is the WordPress plugin "wp-cumulus", which uses a
Flash object to generate the tag cloud and has a fantastic 3D rotation
effect. I've spent a couple of hours porting it to R;
see the source code and the result here:

http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/

HTH.

Regards,
Yihui
--
Yihui Xie 
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China





Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Mike Lawrence
Below are various attempts using ggplot2
(http://had.co.nz/ggplot2/). First I try random positioning, then
random positioning with alpha, then a quasi-random positioning scheme in
polar coordinates:

#this demo has random number generation
# so best to set a seed to make it
# reproducible.
set.seed(1)

#generate some fake data
a = data.frame(
word = month.name
, freq = sample(1:10,12,replace=TRUE)
)

#add arbitrary location information
a$x = sample(1:12,12)
a$y = sample(1:12,12)

#load ggplot2
library(ggplot2)

#initialize a ggplot object
my_plot = ggplot()

#create an object for the text layer
my_text = geom_text(
data = a
, aes(
x = x
, y = y
, label = word
, size = freq
)
)

#create an object for the text size limits
my_size_scale = scale_size(
range = c(3,20)
)

#create an object to expand the x-axis limits
# (ensures that text isn't cropped)
my_x_scale = scale_x_continuous(
expand = c(.5, 0)
)

#ditto for the y axis
my_y_scale = scale_y_continuous(
expand = c(.5, 0)
)

#create a theme object that removes
# plot elements unnecessary in a tag cloud
my_opts = theme(
legend.position = 'none'
, panel.grid.minor = element_blank()
, panel.grid.major = element_blank()
, panel.background = element_blank()
, axis.line = element_blank()
, axis.text.x = element_blank()
, axis.text.y = element_blank()
, axis.ticks = element_blank()
, axis.title.x = element_blank()
, axis.title.y = element_blank()
)

#show the plot
print(
my_plot+
my_text+
my_size_scale+
my_x_scale+
my_y_scale+
my_opts
)

#to aid readability amidst overlap, set alpha in
# the call to geom_text
my_text_with_alpha = geom_text(
data = a
, aes(
x = x
, y = y
, label = word
, size = freq
)
, alpha = .5
)

#show the version with alpha
print(
my_plot+
my_text_with_alpha+
my_size_scale+
my_x_scale+
my_y_scale+
my_opts
)

#alternatively, in polar coordinates,
# which maps x to angle and y to radius,
# making a nice circle
print(
my_plot+
my_text_with_alpha+
my_size_scale+
my_opts+
coord_polar()
)
#(note omission of my_y_scale &
# my_x_scale, which seem to be ignored
# when coord_polar() is called. I'll
# report this possible bug to the ggplot2
# maintainer)

#a possible way to avoid overlap is to
# map radius (y) to frequency so that
# larger text is in the periphery
# where there is more room. This
# necessitates adding some random
# noise to the frequency so that
# the low frequency words don't
# jumble in the center too badly
a$freq2 = a$freq+rnorm(12)

#now map radius (y) to freq2
my_text_with_alpha_and_freq2 = geom_text(
data = a
, aes(
x = x
, y = freq2
, label = word
, size = freq
)
, alpha = .5
)

#show the version with alpha & radius mapped to freq2
print(
my_plot+
my_text_with_alpha_and_freq2+
my_size_scale+
my_opts+
coord_polar()
)
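
The fake data frame `a` in the demo above could be built from real text
instead. A minimal base-R sketch, assuming the speech is available as a
single character string (the `speech` text here is made up for
illustration, and the tokenising rule is a deliberately crude assumption):

```r
# Hypothetical input: a short made-up text standing in for a real speech.
speech <- "We will build. We will work together, and together we will succeed."

# Lower-case the text, split on anything that is not a letter or an
# apostrophe, and drop any empty strings produced by the split.
words <- unlist(strsplit(tolower(speech), "[^a-z']+"))
words <- words[words != ""]

# Count the occurrences of each word, most frequent first.
counts <- sort(table(words), decreasing = TRUE)

# Same layout as the fake data: one row per word with its frequency.
a <- data.frame(
  word = names(counts),
  freq = as.vector(counts),
  stringsAsFactors = FALSE
)

# Arbitrary positions, as in the demo.
set.seed(1)
a$x <- sample(nrow(a))
a$y <- sample(nrow(a))
```

With a data frame built this way, calls in the demo that hard-code 12 rows
(for example rnorm(12)) would need nrow(a) instead.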

-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~



Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Brown, Tony Nicholas
Thank you so much Marc and Gregor. The basic information, suggestions,
and R code that you provided are most helpful.

Tony


Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Gorjanc Gregor
> The only thing that I found for R is by Gregor Gorjanc, but the
> information seems to be dated:
>
>http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

Hi,

Yes, I tried to create a tag cloud plot in R, but I abandoned the project
because of other commitments. The main obstacle was that in R we need to take
care of the font sizes and placement of words ourselves, while this is very easy
with, say, browsers, which do all the rendering. I tracked down the last version
of the R file, which is pasted below. I must say that I do not remember the
status of the code, so use it as you wish. If anyone wishes to take this
project further, please do so!

gg

### tagCloud.R
###
### What: Tag cloud plot functions
### Time-stamp: <2006-09-10 02:53:29 ggorjan>
###

library(grid)  ## gpar(), viewport(), textGrob() etc. come from grid

tagCloud <- function(x, n=100, decreasing=TRUE,
                     threshold=NULL, fontsize=c(12, 36),
                     align=TRUE, expandRow=TRUE,
                     justRow="bottom", title,
                     textGpar=gpar(col="navy"),
                     rectGpar=gpar(col="white"),
                     titleGpar=gpar(), viewGpar=gpar(),
                     mar=c(1, 1, 1, 1))
{
  UseMethod("tagCloud")
}

tagCloud.default <- function(x, n=100, decreasing=TRUE,
                             threshold=NULL, fontsize=c(12, 36),
                             align=TRUE, expandRow=TRUE,
                             justRow="bottom", title,
                             textGpar=gpar(col="navy"),
                             rectGpar=gpar(col="white"),
                             titleGpar=gpar(), viewGpar=gpar(),
                             mar=c(1, 1, 1, 1))
{
  if(!is.null(dim(x))) stop("'x' must be a vector")

  tagCloud.table(table(x), n=n, decreasing=decreasing, fontsize=fontsize,
                 threshold=threshold, align=align, expandRow=expandRow,
                 justRow=justRow, title=title, textGpar=textGpar,
                 rectGpar=rectGpar, titleGpar=titleGpar, viewGpar=viewGpar,
                 mar=mar)
}

tagCloud.table <- function(x, n=100, decreasing=TRUE,
                           threshold=NULL, fontsize=c(12, 36),
                           align=TRUE, expandRow=TRUE,
                           justRow="bottom", title,
                           textGpar=gpar(col="navy"),
                           rectGpar=gpar(col="white"),
                           titleGpar=gpar(), viewGpar=gpar(),
                           mar=c(1, 1, 1, 1))
{
  ## --- Check ---

  if(length(dim(x)) != 1)
    stop("'x' must be one dimensional table")

  ## --- Threshold ---

  if(!is.null(threshold)) x <- x[x >= threshold]

  ## --- Number of units ---

  N <- length(x)                  ## length of table
  if(is.null(n)) {                ## if n=NULL, plot all units
    n <- N
  } else {
    if(n > N) n <- N              ## if n is too big, decrease it
    if(n < 1) n <- round(N * n)   ## if n is percentage of units
  }

  fontsizeLength <- length(fontsize)
  if(fontsizeLength != 2)
    stop("'fontsize' must be of length two")

  ## --- Sort and subset ---

  if(n < N) { ## only if we want to plot subset of units
    tmp <- sort(x, decreasing=decreasing)
    x <- x[names(x) %in% names(tmp[1:n])]
  }

  ## --- Get relative freq ---

  x <- prop.table(x)

  ## --- Fontsize ---

  fontsizeDiff <- diff(fontsize)
  xDiff <- max(x) - min(x)
  if(xDiff != 0) {
    off <- ifelse(fontsizeDiff > 0, min(x), max(x))
    fontsize <- (x - off) / xDiff * fontsizeDiff + min(fontsize)
  } else { ## all units have the same frequency
    fontsize <- rep(min(fontsize), times=n)
  }

  ## --- Viewport and rectangle ---

  grid.newpage()
  width <- unit(1, "npc")
  height <- unit(1, "npc")
  vp <- viewport(y=unit(mar[1], "lines"), x=unit(mar[2], "lines"),
                 width=width - unit(mar[2] + mar[4], "lines"),
                 height=height - unit(mar[1] + mar[3], "lines"),
                 just=c("left", "bottom"), gp=viewGpar, name="main")
  pushViewport(vp)

  if(!missing(title))
    grid.text(title, y=height, gp=titleGpar, name="title")

  grid.rect(gp=rectGpar, name="cloud")

  ## --- Grobs ---

  tag <- vector(mode="list", length=4)
  names(tag) <- c("fontsize", "grob", "width", "height")
  tag[[1]] <- tag[[2]] <- tag[[3]] <- tag[[4]] <- vector(mode="list", length=n)
  for(i in 1:n) {
    tag$fontsize[[i]] <- fontsize[i]
    tag$grob[[i]] <- textGrob(names(x[i]), gp=gpar(fontsize=fontsize[i]))
    tag$width[[i]] <- convertWidth(grobWidth(tag$grob[[i]]), unitTo="npc",
                                   valueOnly=TRUE)
    tag$height[[i]] <- convertHeight(grobHeight(tag$grob[[i]]), unitTo="npc",
                                     valueOnly=TRUE)
  }

  ## --- Split lines ---

  row <- colWidth <- vector(length=n)
  row[1] <- 1
  colWidth[1] <- 0
  lineWidth <- tag$width[[1]]
  j <- 1
  gapWidth <- convertWidth(stringWidth(" "), u
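
The "Fontsize" step in the code above maps relative frequencies linearly
onto the requested font size range. A standalone sketch of that mapping,
for the usual case of an increasing range; the helper name `scaleFontsize`
is made up here and is not part of tagCloud.R:

```r
# Hypothetical helper, not part of tagCloud.R: map counts linearly onto a
# font size range, so the rarest word gets range[1] and the most frequent
# word gets range[2].
scaleFontsize <- function(freq, range = c(12, 36)) {
  stopifnot(length(range) == 2)
  span <- max(freq) - min(freq)
  if (span == 0) {
    # All words equally frequent: use the smallest size for all of them.
    return(rep(min(range), length(freq)))
  }
  (freq - min(freq)) / span * diff(range) + range[1]
}

# Made-up counts for illustration.
freq <- c(the = 10, people = 4, future = 1)
sizes <- scaleFontsize(freq)
```

The resulting sizes could then be passed to gpar(fontsize=...) one word at
a time, as the loop over textGrob() does.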

Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Marc Schwartz


On Jun 7, 2009, at 1:41 PM, Brown, Tony Nicholas wrote:


> Dear all,
>
> I recently saw a graph on television that displayed selected
> words/phrases in a speech scaled in size according to their frequency.
> So words/phrases that were often used appeared large and words that were
> rarely used appeared small. The closest thing I can find on the web to
> approximate what I saw can be found here:
> http://stateoftheunion.onetwothree.net/ The example at that website is
> more complicated but captures the general idea.
>
> Would someone point me in the right direction in terms of replicating
> such a graph.
>
> Thanks in advance,
>
> Tony


Tony,

What you are referring to is called a 'tag cloud'. See this page:

  http://en.wikipedia.org/wiki/Tag_cloud

They are commonly used on wikis, Twitter and so forth. For example:

  http://tweetstats.com/trends


The only thing that I found for R is by Gregor Gorjanc, but the  
information seems to be dated:


  http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

I have cc'd him here for any updates.

Otherwise, there are some links on the Wikipedia page and some other  
applications such as Wordle:


  http://www.wordle.net/

HTH,

Marc Schwartz
