Re: [R] graphically representing frequency of words in a speech?

2009-06-10 Thread Yihui Xie
Hi,

As Gregor Gorjanc mentioned, it's very inconvenient to make R handle
the font sizes and placement of words in a plot. There are already
very mature tag cloud applications; one I'm relatively familiar with
is the WordPress plugin wp-cumulus, which uses a Flash object to
generate the tag cloud and has a fantastic 3D rotation effect. I've
spent a couple of hours porting it to R; see the source code and the
result here:

http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/

HTH.

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China



On Mon, Jun 8, 2009 at 2:41 AM, Brown, Tony
Nicholas <tony.n.br...@vanderbilt.edu> wrote:
 Dear all,

 I recently saw a graph on television that displayed selected
 words/phrases in a speech scaled in size according to their frequency,
 so words/phrases that were used often appeared large and words that were
 rarely used appeared small. The closest thing I can find on the web is
 here: http://stateoftheunion.onetwothree.net/ The example at that website
 is more complicated but captures the general idea.

 Would someone point me in the right direction for replicating such a
 graph?

 Thanks in advance,

 Tony

 --
 Tony N. Brown, Ph.D.
 Editor-Elect, American Sociological Review
 Associate Professor of Sociology and Human and Organizational
 Development (secondary)
 Program Faculty, Effective Health Communication and African American
 Diaspora Studies
 Faculty Head of Hank Ingram House, The Commons
 Vanderbilt University
 (615) 322-7518
 (615) 322-7505 fax

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] graphically representing frequency of words in a speech?

2009-06-10 Thread Ronggui Huang
There is a similar discussion on Statalist
(http://n2.nabble.com/st%3A-Tag-clouds-in-Stata--tt2992551.html#none);
I think they make a reasonable argument that a tag cloud is not a good
statistical graphic.






-- 
HUANG Ronggui, Wincent
PhD Candidate
Dept of Public and Social Administration
City University of Hong Kong
Home page: http://asrr.r-forge.r-project.org/rghuang.html



Re: [R] graphically representing frequency of words in a speech?

2009-06-10 Thread Brown, Tony Nicholas
Yihui,

This is quite impressive; thanks for helping me think about how to make tag
clouds in R.

Tony

-----Original Message-----
From: Yihui Xie [mailto:xieyi...@gmail.com] 
Sent: Wednesday, June 10, 2009 3:15 AM
To: Brown, Tony Nicholas
Cc: r-help@r-project.org
Subject: Re: [R] graphically representing frequency of words in a speech?



Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Marc Schwartz


On Jun 7, 2009, at 1:41 PM, Brown, Tony Nicholas wrote:


Dear all,

I recently saw a graph on television that displayed selected
words/phrases in a speech scaled in size according to their frequency.
So words/phrases that were often used appeared large and words that were
rarely used appeared small. The closest thing I can find on the web to
approximate what I saw can be found here:
http://stateoftheunion.onetwothree.net/ The example at that website is
more complicated but captures the general idea.

Would someone point me in the right direction in terms of replicating
such a graph.

Thanks in advance,

Tony


Tony,

What you are referring to is called a 'tag cloud'. See this page:

  http://en.wikipedia.org/wiki/Tag_cloud

They are commonly used on wikis, Twitter and so forth. For example:

  http://tweetstats.com/trends


The only thing that I found for R is by Gregor Gorjanc, but the
information seems to be dated:

  http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

I have cc'd him here for any updates.

Otherwise, there are some links on the Wikipedia page and some other
applications such as Wordle:

  http://www.wordle.net/

HTH,

Marc Schwartz
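Whatever layout is used, the first step is counting how often each word occurs in the speech. The sketch below assumes plain English text, treats a "word" as a run of ASCII letters or apostrophes, and ignores case; `word_freq()` and its `stopwords` argument are hypothetical names, not something proposed in the thread.

```r
## Hypothetical helper (not from the thread): count how often each word
## occurs in a speech. Assumes plain English text: a "word" is a run of
## ASCII letters or apostrophes, and case is ignored.
word_freq <- function(text, stopwords = character(0)) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  ## drop empty tokens (e.g. from leading punctuation) and stopwords
  words <- words[nzchar(words) & !(words %in% stopwords)]
  sort(table(words), decreasing = TRUE)
}

freq <- word_freq("The cat and the hat. The end!")
freq
```

The result is a one-dimensional table sorted by frequency. `as.data.frame(freq, stringsAsFactors = FALSE)` turns it into a data frame with columns `words` and `Freq`, which could stand in for the fake data in the ggplot2 code later in the thread.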



Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Gorjanc Gregor
 The only thing that I found for R is by Gregor Gorjanc, but the
 information seems to be dated:

http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

Hi,

Yes, I tried to create a tag cloud plot in R, but I abandoned the project
because other things came up. The main obstacle was that in R we need to
take care of the font sizes and placement of words ourselves, while this
is very easy in, say, browsers, which do all the rendering. I tracked down
the last version of the R file, which is pasted below. I must say that I do
not remember the status of the code, so use it as you wish. If anyone
wishes to take this project further, please do so!
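The font-size half of that problem amounts to a linear rescale of relative frequencies onto a size range. `scale_fontsize()` below is a hypothetical helper, not part of Gregor's file; it assumes sizes should grow linearly with frequency, in the spirit of the "Fontsize" step of the code.

```r
## Hypothetical helper (not from tagCloud.R): map word counts linearly
## onto a font size range.
scale_fontsize <- function(freq, range = c(12, 36)) {
  p <- freq / sum(freq)                  # relative frequencies, as prop.table() gives
  if (max(p) == min(p)) {
    return(rep(min(range), length(p)))   # all words equally frequent
  }
  ## rescale to [0, 1], then stretch onto the requested range;
  ## a reversed range such as c(36, 12) makes frequent words smaller
  (p - min(p)) / (max(p) - min(p)) * diff(range) + range[1]
}

scale_fontsize(c(the = 3, cat = 1, hat = 2))
```

Dividing by the total does not change a min-max rescale; it is kept only to mirror the relative frequencies used in the code.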

gg

### tagCloud.R
###
### What: Tag cloud plot functions
### Time-stamp: 2006-09-10 02:53:29 ggorjan
###

library(grid) ## all drawing below uses grid graphics

tagCloud <- function(x, n=100, decreasing=TRUE,
                     threshold=NULL, fontsize=c(12, 36),
                     align=TRUE, expandRow=TRUE,
                     justRow="bottom", title,
                     textGpar=gpar(col="navy"),
                     rectGpar=gpar(col="white"),
                     titleGpar=gpar(), viewGpar=gpar(),
                     mar=c(1, 1, 1, 1))
{
  UseMethod("tagCloud")
}

tagCloud.default <- function(x, n=100, decreasing=TRUE,
                             threshold=NULL, fontsize=c(12, 36),
                             align=TRUE, expandRow=TRUE,
                             justRow="bottom", title,
                             textGpar=gpar(col="navy"),
                             rectGpar=gpar(col="white"),
                             titleGpar=gpar(), viewGpar=gpar(),
                             mar=c(1, 1, 1, 1))
{
  if(!is.null(dim(x))) stop("'x' must be a vector")

  ## tabulate the words and hand over to the table method
  tagCloud.table(table(x), n=n, decreasing=decreasing, fontsize=fontsize,
                 threshold=threshold, align=align, expandRow=expandRow,
                 justRow=justRow, title=title, textGpar=textGpar,
                 rectGpar=rectGpar, titleGpar=titleGpar, viewGpar=viewGpar,
                 mar=mar)
}

tagCloud.table <- function(x, n=100, decreasing=TRUE,
                           threshold=NULL, fontsize=c(12, 36),
                           align=TRUE, expandRow=TRUE,
                           justRow="bottom", title,
                           textGpar=gpar(col="navy"),
                           rectGpar=gpar(col="white"),
                           titleGpar=gpar(), viewGpar=gpar(),
                           mar=c(1, 1, 1, 1))
{
  ## --- Check ---

  if(length(dim(x)) != 1)
    stop("'x' must be a one-dimensional table")

  ## --- Threshold ---

  if(!is.null(threshold)) x <- x[x >= threshold]

  ## --- Number of units ---

  N <- length(x)                ## length of table
  if(is.null(n)) {              ## if n=NULL, plot all units
    n <- N
  } else {
    if(n > N) n <- N            ## if n is too big, decrease it
    if(n < 1) n <- round(N * n) ## if n < 1, treat it as a proportion of units
  }

  fontsizeLength <- length(fontsize)
  if(fontsizeLength != 2)
    stop("'fontsize' must be of length two")

  ## --- Sort and subset ---

  if(n < N) { ## only if we want to plot a subset of units
    tmp <- sort(x, decreasing=decreasing)
    x <- x[names(x) %in% names(tmp[1:n])]
  }

  ## --- Get relative freq ---

  x <- prop.table(x)

  ## --- Fontsize ---
  ## map relative frequencies linearly onto the fontsize range

  fontsizeDiff <- diff(fontsize)
  xDiff <- max(x) - min(x)
  if(xDiff != 0) {
    off <- ifelse(fontsizeDiff > 0, min(x), max(x))
    fontsize <- (x - off) / xDiff * fontsizeDiff + min(fontsize)
  } else { ## all units have the same frequency
    fontsize <- rep(min(fontsize), times=n)
  }

  ## --- Viewport and rectangle ---

  grid.newpage()
  width <- unit(1, "npc")
  height <- unit(1, "npc")
  vp <- viewport(y=unit(mar[1], "lines"), x=unit(mar[2], "lines"),
                 width=width - unit(mar[2] + mar[4], "lines"),
                 height=height - unit(mar[1] + mar[3], "lines"),
                 just=c("left", "bottom"), gp=viewGpar, name="main")
  pushViewport(vp)

  if(!missing(title))
    grid.text(title, y=height, gp=titleGpar, name="title")

  grid.rect(gp=rectGpar, name="cloud")

  ## --- Grobs ---
  ## one text grob per word, with its fontsize, width and height (in npc)

  tag <- vector(mode="list", length=4)
  names(tag) <- c("fontsize", "grob", "width", "height")
  tag[[1]] <- tag[[2]] <- tag[[3]] <- tag[[4]] <- vector(mode="list", length=n)
  for(i in 1:n) {
    tag$fontsize[[i]] <- fontsize[i]
    tag$grob[[i]] <- textGrob(names(x[i]), gp=gpar(fontsize=fontsize[i]))
    tag$width[[i]] <- convertWidth(grobWidth(tag$grob[[i]]), unitTo="npc",
                                   valueOnly=TRUE)
    tag$height[[i]] <- convertHeight(grobHeight(tag$grob[[i]]), unitTo="npc",
                                     valueOnly=TRUE)
  }

  ## --- Split lines ---

  row <- colWidth <- vector(length=n)
  row[1] <- 1
  colWidth[1] <- 0
  lineWidth <- tag$width[[1]]
  j <- 1
  gapWidth <- convertWidth(stringWidth(" "), unitTo="npc", valueOnly=TRUE)
  maxWidth <- convertWidth(width, unitTo="npc", valueOnly=TRUE)

  for(i in
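The archived message stops at the start of the "Split lines" loop, so the rest of Gregor's layout code is not available. As an illustration only, one plausible way to fill rows is a greedy line wrap: keep adding words to the current row until the next word plus a gap would overflow the available width. `split_rows()` is a hypothetical function; its argument names echo `gapWidth` and `maxWidth` from the setup above, but the loop body is an assumption, not Gregor's missing code.

```r
## Hypothetical sketch, not the missing part of tagCloud.R: greedily assign
## words to rows, starting a new row when the next word (plus a gap) would
## overflow maxWidth. All widths share one unit, e.g. npc values as returned
## by convertWidth(..., valueOnly=TRUE).
split_rows <- function(width, gapWidth, maxWidth) {
  n <- length(width)
  if (n == 0) return(integer(0))
  row <- integer(n)
  row[1] <- 1L
  lineWidth <- width[1]                  # width used so far on the current row
  for (i in seq_len(n)[-1]) {
    if (lineWidth + gapWidth + width[i] <= maxWidth) {
      row[i] <- row[i - 1]               # word fits: stay on this row
      lineWidth <- lineWidth + gapWidth + width[i]
    } else {
      row[i] <- row[i - 1] + 1L          # overflow: open a new row
      lineWidth <- width[i]
    }
  }
  row
}

split_rows(c(0.4, 0.4, 0.3, 0.5), gapWidth = 0.1, maxWidth = 1)  # rows: 1 1 2 2
```

A word wider than `maxWidth` simply gets a row of its own and still overflows; vertical positions could then be derived by summing the tallest word height in each row.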

Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Brown, Tony Nicholas
Thank you so much, Marc and Gregor. The basic information, suggestions,
and R code that you provided are most helpful.

Tony

-----Original Message-----
From: Gorjanc Gregor [mailto:gregor.gorj...@bfro.uni-lj.si] 
Sent: Sunday, June 07, 2009 2:17 PM
To: Marc Schwartz; Brown, Tony Nicholas
Cc: rhelp help
Subject: RE: [R] graphically representing frequency of words in a speech?


Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Mike Lawrence
Below are various attempts using ggplot2
(http://had.co.nz/ggplot2/). First I try random positioning, then
random positioning with alpha, then a quasi-random position scheme in
polar coordinates:

#this demo has random number generation
# so best to set a seed to make it
# reproducible.
set.seed(1)

#generate some fake data
a = data.frame(
word = month.name
, freq = sample(1:10,12,replace=TRUE)
)

#add arbitrary location information
a$x = sample(1:12,12)
a$y = sample(1:12,12)

#load ggplot2
library(ggplot2)

#initialize a ggplot object
my_plot = ggplot()

#create an object for the text layer
my_text = geom_text(
data = a
, aes(
x = x
, y = y
, label = word
, size = freq
)
)

#create an object for the text size limits
my_size_scale = scale_size(
range = c(3,20)
)

#create an object to expand the x-axis limits
# (ensures that text isn't cropped)
my_x_scale = scale_x_continuous(
expand = c(.5, 0)
)

#ditto for the y axis
my_y_scale = scale_y_continuous(
expand = c(.5, 0)
)

#create a theme object that removes
# plot elements unnecessary in a tag cloud
my_opts = theme(
legend.position = 'none'
, panel.grid.minor = element_blank()
, panel.grid.major = element_blank()
, panel.background = element_blank()
, axis.line = element_blank()
, axis.text.x = element_blank()
, axis.text.y = element_blank()
, axis.ticks = element_blank()
, axis.title.x = element_blank()
, axis.title.y = element_blank()
)

#show the plot
print(
my_plot+
my_text+
my_size_scale+
my_x_scale+
my_y_scale+
my_opts
)

#to aid readability amidst overlap, set alpha in
# the call to geom_text
my_text_with_alpha = geom_text(
data = a
, aes(
x = x
, y = y
, label = word
, size = freq
)
, alpha = .5
)

#show the version with alpha
print(
my_plot+
my_text_with_alpha+
my_size_scale+
my_x_scale+
my_y_scale+
my_opts
)

#alternatively, in polar coordinates,
# which maps x to angle and y to radius,
# making a nice circle
print(
my_plot+
my_text_with_alpha+
my_size_scale+
my_opts+
coord_polar()
)
#(note omission of my_y_scale and
# my_x_scale, which seem to be ignored
# when coord_polar() is called. I'll
# report this possible bug to the ggplot2
# maintainer)

#a possible way to avoid overlap is to
# map radius (y) to frequency so that
# larger text is in the periphery
# where there is more room. This
# necessitates adding some random
# noise to the frequency so that
# the low frequency words don't
# jumble in the center too badly
a$freq2 = a$freq+rnorm(12)

#now map radius (y) to freq2
my_text_with_alpha_and_freq2 = geom_text(
data = a
, aes(
x = x
, y = freq2
, label = word
, size = freq
)
, alpha = .5
)

#show the version with alpha and radius mapped to freq2
print(
my_plot+
my_text_with_alpha_and_freq2+
my_size_scale+
my_opts+
coord_polar()
)
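The random `x`/`y` positions above let words overlap. A common alternative (not one proposed in the thread) is random placement with rejection: try random positions for each word, biggest first, and reject any candidate whose bounding box hits a word already placed. `place_words()` is a hypothetical helper, and its box sizes are crude assumptions (width proportional to number of characters times size, height proportional to size), so overlaps are only avoided approximately once ggplot2 renders the text.

```r
## Hypothetical helper (not from the thread): random placement with rejection.
## Box sizes are assumptions in arbitrary units:
##   width = nchar(word) * size * char_w, height = size * line_h
place_words <- function(word, size, char_w = 0.6, line_h = 1,
                        span = 50, max_tries = 1000) {
  n <- length(word)
  x <- y <- rep(NA_real_, n)
  w <- nchar(word) * size * char_w
  h <- size * line_h
  ## TRUE if word i centred at (cx, cy) would overlap any placed word
  overlaps <- function(i, cx, cy) {
    placed <- which(!is.na(x))
    any(abs(cx - x[placed]) < (w[i] + w[placed]) / 2 &
        abs(cy - y[placed]) < (h[i] + h[placed]) / 2)
  }
  for (i in order(size, decreasing = TRUE)) {  # place the biggest words first
    for (k in seq_len(max_tries)) {
      cx <- runif(1, -span, span)
      cy <- runif(1, -span, span)
      if (!overlaps(i, cx, cy)) {
        x[i] <- cx
        y[i] <- cy
        break
      }
    }                                          # after max_tries, x[i] stays NA
  }
  data.frame(word = word, size = size, x = x, y = y, stringsAsFactors = FALSE)
}

set.seed(1)
pos <- place_words(month.name, sample(1:10, 12, replace = TRUE))
```

The returned data frame could replace the random `a$x`/`a$y` columns, e.g. `geom_text(data = pos, aes(x = x, y = y, label = word, size = size))`; words that could not be placed have `NA` coordinates and should be dropped first.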

-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~
