Re: [R] graphically representing frequency of words in a speech?
Yihui,

This is quite impressive, thanks for helping me think about how to make tag clouds in R.

Tony

-----Original Message-----
From: Yihui Xie [mailto:xieyi...@gmail.com]
Sent: Wednesday, June 10, 2009 3:15 AM
To: Brown, Tony Nicholas
Cc: r-help@r-project.org
Subject: Re: [R] graphically representing frequency of words in a speech?

Hi,

As Gregor Gorjanc mentioned, it's very inconvenient to let R decide the font size and placement of words in a plot. There are already very mature tag cloud applications; one I'm relatively familiar with is the WordPress plugin "wp-cumulus", which uses a Flash object to generate the tag cloud and gives it a fantastic 3D rotation effect. I've spent a couple of hours porting it into R; see the source code and effect here:

http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] graphically representing frequency of words in a speech?
There is a similar discussion on Statalist (http://n2.nabble.com/st%3A-Tag-clouds-in-Stata--tt2992551.html#none). I think they make a reasonable argument that a tag cloud is not a good statistical graphic.

--
HUANG Ronggui, Wincent
PhD Candidate
Dept of Public and Social Administration
City University of Hong Kong
Home page: http://asrr.r-forge.r-project.org/rghuang.html
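For comparison, the same kind of data can be drawn on a common scale, where each frequency is read from a position rather than from a font size. The following is a minimal sketch in base R; the word counts are made up for illustration and are not taken from any speech or from the Statalist thread.

```r
# Made-up word counts, purely for illustration.
word_freq <- c(hope = 12, change = 7, economy = 5, war = 3, peace = 2)

# dotchart() draws its first element at the bottom, so sorting in
# ascending order puts the most frequent word at the top of the chart.
ordered_freq <- sort(word_freq)

dotchart(ordered_freq,
         xlab = "Frequency",
         main = "Word frequency (illustrative data)")
```

Whether this is preferable to a tag cloud depends on the purpose: the dot chart favours accurate comparison, while the cloud is mainly decorative.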
Re: [R] graphically representing frequency of words in a speech?
Hi,

As Gregor Gorjanc mentioned, it's very inconvenient to let R decide the font size and placement of words in a plot. There are already very mature tag cloud applications; one I'm relatively familiar with is the WordPress plugin "wp-cumulus", which uses a Flash object to generate the tag cloud and gives it a fantastic 3D rotation effect. I've spent a couple of hours porting it into R; see the source code and effect here:

http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/

HTH.

Regards,
Yihui
--
Yihui Xie
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China

On Mon, Jun 8, 2009 at 2:41 AM, Brown, Tony Nicholas wrote:
> Dear all,
>
> I recently saw a graph on television that displayed selected
> words/phrases in a speech scaled in size according to their frequency.
> So words/phrases that were often used appeared large and words that were
> rarely used appeared small. The closest thing I can find on the web to
> approximate what I saw can be found here:
> http://stateoftheunion.onetwothree.net/ The example at that website is
> more complicated but captures the general idea.
>
> Would someone point me in the right direction in terms of replicating
> such a graph?
>
> Thanks in advance,
>
> Tony
>
> --
> Tony N. Brown, Ph.D.
> Editor-Elect, American Sociological Review
> Associate Professor of Sociology and Human and Organizational
> Development (secondary)
> Program Faculty, Effective Health Communication and African American &
> Diaspora Studies
> Faculty Head of Hank Ingram House, The Commons
> Vanderbilt University
> (615) 322-7518
> (615) 322-7505 fax
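The idea of leaving font sizing and word placement to a renderer outside R can be sketched with plain HTML instead of Flash. This is not the wp-cumulus port described above; it is a minimal base R illustration in which the function name html_tag_cloud, the size range, and the example words are all assumptions. Words are inserted as-is, so they are assumed to contain no HTML special characters.

```r
# Hypothetical helper: build an HTML fragment in which each word's
# font size (in points) grows linearly with its frequency.
html_tag_cloud <- function(words, freq, size_range = c(10, 40)) {
  stopifnot(length(words) == length(freq), length(size_range) == 2)
  spread <- max(freq) - min(freq)
  # Relative position of each frequency between the minimum (0) and maximum (1).
  rel <- if (spread == 0) rep(0, length(freq)) else (freq - min(freq)) / spread
  pt <- as.integer(round(min(size_range) + rel * diff(range(size_range))))
  spans <- sprintf('<span style="font-size:%dpt">%s</span>', pt, words)
  paste0("<p>", paste(spans, collapse = " "), "</p>")
}

# Made-up words and counts; the file is written to a temporary directory.
html <- html_tag_cloud(c("hope", "change", "economy"), c(12, 7, 2))
writeLines(html, file.path(tempdir(), "tagcloud.html"))
```

Opening the resulting file in a browser shows the words flowed onto lines automatically, which is the part that is awkward to do by hand in an R plot.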
Re: [R] graphically representing frequency of words in a speech?
Below are various attempts using ggplot2 (http://had.co.nz/ggplot2/). First I try random positioning, then random positioning with alpha, then a quasi-random position scheme in polar coordinates:

#this demo has random number generation
# so best to set a seed to make it
# reproducible.
set.seed(1)

#generate some fake data
a = data.frame(
    word = month.name
    , freq = sample(1:10,12,replace=TRUE)
)

#add arbitrary location information
a$x = sample(1:12,12)
a$y = sample(1:12,12)

#load ggplot2
library(ggplot2)

#initialize a ggplot object
my_plot = ggplot()

#create an object for the text layer
my_text = geom_text(
    data = a
    , aes(
        x = x
        , y = y
        , label = word
        , size = freq
    )
)

#create an object for the text size limits
my_size_scale = scale_size(
    to = c(3,20)
)

#create an object to expand the x-axis limits
# (ensures that text isn't cropped)
my_x_scale = scale_x_continuous(
    expand = c(.5, 0)
)

#ditto for the y axis
my_y_scale = scale_y_continuous(
    expand = c(.5, 0)
)

#create an opts object that removes
# plot elements unnecessary in a tag cloud
my_opts = opts(
    legend.position = 'none'
    , panel.grid.minor = theme_blank()
    , panel.grid.major = theme_blank()
    , panel.background = theme_blank()
    , axis.line = theme_blank()
    , axis.text.x = theme_blank()
    , axis.text.y = theme_blank()
    , axis.ticks = theme_blank()
    , axis.title.x = theme_blank()
    , axis.title.y = theme_blank()
)

#show the plot
print(
    my_plot+
    my_text+
    my_size_scale+
    my_x_scale+
    my_y_scale+
    my_opts
)

#to aid readability amidst overlap, set alpha in
# the call to geom_text
my_text_with_alpha = geom_text(
    data = a
    , aes(
        x = x
        , y = y
        , label = word
        , size = freq
    )
    , alpha = .5
)

#show the version with alpha
print(
    my_plot+
    my_text_with_alpha+
    my_size_scale+
    my_x_scale+
    my_y_scale+
    my_opts
)

#alternatively, in polar coordinates,
# which maps x to angle and y to radius,
# making a nice circle
print(
    my_plot+
    my_text_with_alpha+
    my_size_scale+
    my_opts+
    coord_polar()
)
#(note omission of my_y_scale &
# my_x_scale, which seem to be ignored
# when coord_polar() is called. I'll
# report this possible bug to the ggplot2
# maintainer)

#a possible way to avoid overlap is to
# map radius (y) to frequency so that
# larger text is in the periphery
# where there is more room. This
# necessitates adding some random
# noise to the frequency so that
# the low frequency words don't
# jumble in the center too badly
a$freq2 = a$freq+rnorm(12)

#now map radius (y) to freq2
my_text_with_alpha_and_freq2 = geom_text(
    data = a
    , aes(
        x = x
        , y = freq2
        , label = word
        , size = freq
    )
    , alpha = .5
)

#show the version with alpha & radius mapped to freq2
print(
    my_plot+
    my_text_with_alpha_and_freq2+
    my_size_scale+
    my_opts+
    coord_polar()
)

--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~
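The code above targets the ggplot2 API of 2009. In current ggplot2 releases, opts() and theme_blank() have been replaced by theme() and element_blank(), and scale_size() takes range= rather than to=. The following is a sketch of the alpha version of the plot under that assumption; it uses theme_void() as a shortcut for blanking the plot elements, which is a substitution and not what the original post did.

```r
library(ggplot2)

# Same fake data as in the post: month names with random frequencies
# and arbitrary positions. Exact values depend on the R version's RNG.
set.seed(1)
a <- data.frame(
  word = month.name,
  freq = sample(1:10, 12, replace = TRUE),
  x    = sample(1:12, 12),
  y    = sample(1:12, 12)
)

tag_plot <- ggplot(a, aes(x = x, y = y, label = word, size = freq)) +
  geom_text(alpha = 0.5) +                  # semi-transparent text eases overlap
  scale_size(range = c(3, 20)) +            # replaces scale_size(to = c(3, 20))
  scale_x_continuous(expand = c(0.5, 0)) +  # pad the axes so text isn't cropped
  scale_y_continuous(expand = c(0.5, 0)) +
  theme_void() +                            # stands in for the opts()/theme_blank() block
  theme(legend.position = "none")

print(tag_plot)
```

Adding coord_polar() to tag_plot should give the polar variant described in the post.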
Re: [R] graphically representing frequency of words in a speech?
Thank you so much Marc and Gregor. The basic information, suggestions, and R code that you provided are most helpful.

Tony

-----Original Message-----
From: Gorjanc Gregor [mailto:gregor.gorj...@bfro.uni-lj.si]
Sent: Sunday, June 07, 2009 2:17 PM
To: Marc Schwartz; Brown, Tony Nicholas
Cc: rhelp help
Subject: RE: [R] graphically representing frequency of words in a speech?

> The only thing that I found for R is by Gregor Gorjanc, but the
> information seems to be dated:
>
> http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

Hi,

Yes, I have tried to create a tag cloud plot in R, but I abandoned the project due to other things. The main obstacle was that in R we need to take care of the font sizes and placement of words, while this is very easy with, say, browsers, which do all the rendering.
Re: [R] graphically representing frequency of words in a speech?
> The only thing that I found for R is by Gregor Gorjanc, but the
> information seems to be dated:
>
> http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

Hi,

Yes, I have tried to create a tag cloud plot in R, but I abandoned the project due to other things. The main obstacle was that in R we need to take care of the font sizes and placement of words, while this is very easy with, say, browsers, which do all the rendering. I tracked down the last version of the R file, which is pasted below. I must say that I do not remember the status of the code, so use it as you wish. If anyone wishes to take this project further, please do so!

gg

### tagCloud.R
###------------------------------------------------------------------------
### What: Tag cloud plot functions
### Time-stamp: <2006-09-10 02:53:29 ggorjan>
###------------------------------------------------------------------------

## gpar(), unit(), viewport(), textGrob() etc. come from the grid package
library(grid)

tagCloud <- function(x, n=100, decreasing=TRUE, threshold=NULL,
                     fontsize=c(12, 36), align=TRUE, expandRow=TRUE,
                     justRow="bottom", title, textGpar=gpar(col="navy"),
                     rectGpar=gpar(col="white"), titleGpar=gpar(),
                     viewGpar=gpar(), mar=c(1, 1, 1, 1))
{
  UseMethod("tagCloud")
}

tagCloud.default <- function(x, n=100, decreasing=TRUE, threshold=NULL,
                             fontsize=c(12, 36), align=TRUE, expandRow=TRUE,
                             justRow="bottom", title,
                             textGpar=gpar(col="navy"),
                             rectGpar=gpar(col="white"), titleGpar=gpar(),
                             viewGpar=gpar(), mar=c(1, 1, 1, 1))
{
  if(!is.null(dim(x))) stop("'x' must be a vector")
  tagCloud.table(table(x), n=n, decreasing=decreasing, fontsize=fontsize,
                 threshold=threshold, align=align, expandRow=expandRow,
                 justRow=justRow, title=title, textGpar=textGpar,
                 rectGpar=rectGpar, titleGpar=titleGpar, viewGpar=viewGpar,
                 mar=mar)
}

tagCloud.table <- function(x, n=100, decreasing=TRUE, threshold=NULL,
                           fontsize=c(12, 36), align=TRUE, expandRow=TRUE,
                           justRow="bottom", title,
                           textGpar=gpar(col="navy"),
                           rectGpar=gpar(col="white"), titleGpar=gpar(),
                           viewGpar=gpar(), mar=c(1, 1, 1, 1))
{
  ## --- Check ---
  if(length(dim(x)) != 1) stop("'x' must be one dimensional table")

  ## --- Threshold ---
  if(!is.null(threshold)) x <- x[x >= threshold]

  ## --- Number of units ---
  N <- length(x)                 ## length of table
  if(is.null(n)) {               ## if n=NULL, plot all units
    n <- N
  } else {
    if(n > N) n <- N             ## if n is too big, decrease it
    if(n < 1) n <- round(N * n)  ## if n is percentage of units
  }
  fontsizeLength <- length(fontsize)
  if(fontsizeLength != 2) stop("'fontsize' must be of length two")

  ## --- Sort and subset ---
  if(n < N) {                    ## only if we want to plot subset of units
    tmp <- sort(x, decreasing=decreasing)
    x <- x[names(x) %in% names(tmp[1:n])]
  }

  ## --- Get relative freq ---
  x <- prop.table(x)

  ## --- Fontsize ---
  fontsizeDiff <- diff(fontsize)
  xDiff <- max(x) - min(x)
  if(xDiff != 0) {
    off <- ifelse(fontsizeDiff > 0, min(x), max(x))
    fontsize <- (x - off) / xDiff * fontsizeDiff + min(fontsize)
  } else {                       ## all units have the same frequency
    fontsize <- rep(min(fontsize), times=n)
  }

  ## --- Viewport and rectangle ---
  grid.newpage()
  width <- unit(1, "npc")
  height <- unit(1, "npc")
  vp <- viewport(y=unit(mar[1], "lines"), x=unit(mar[2], "lines"),
                 width=width - unit(mar[2] + mar[4], "lines"),
                 height=height - unit(mar[1] + mar[3], "lines"),
                 just=c("left", "bottom"), gp=viewGpar, name="main")
  pushViewport(vp)
  if(!missing(title)) grid.text(title, y=height, gp=titleGpar, name="title")
  grid.rect(gp=rectGpar, name="cloud")

  ## --- Grobs ---
  tag <- vector(mode="list", length=4)
  names(tag) <- c("fontsize", "grob", "width", "height")
  tag[[1]] <- tag[[2]] <- tag[[3]] <- tag[[4]] <- vector(mode="list", length=n)
  for(i in 1:n) {
    tag$fontsize[[i]] <- fontsize[i]
    tag$grob[[i]] <- textGrob(names(x[i]), gp=gpar(fontsize=fontsize[i]))
    tag$width[[i]] <- convertWidth(grobWidth(tag$grob[[i]]), unitTo="npc",
                                   valueOnly=TRUE)
    tag$height[[i]] <- convertHeight(grobHeight(tag$grob[[i]]), unitTo="npc",
                                     valueOnly=TRUE)
  }

  ## --- Split lines ---
  row <- colWidth <- vector(length=n)
  row[1] <- 1
  colWidth[1] <- 0
  lineWidth <- tag$width[[1]]
  j <- 1
  gapWidth <- convertWidth(stringWidth(" "), u
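The "Fontsize" step in tagCloud.table() is a linear rescaling of frequencies onto a font-size range. Taken on its own it can be sketched as below. The helper name scale_fontsize is hypothetical and not part of the posted file, and the sketch is simplified: it always maps the smallest frequency to the smallest size, whereas the posted code also allows a reversed range.

```r
# Hypothetical helper (not part of the posted tagCloud.R): map
# frequencies linearly onto a font-size range.
scale_fontsize <- function(freq, fontsize = c(12, 36)) {
  stopifnot(length(fontsize) == 2)
  spread <- max(freq) - min(freq)
  if (spread == 0) {
    # All words equally frequent: use the smallest size for every word.
    return(rep(min(fontsize), length(freq)))
  }
  (freq - min(freq)) / spread * diff(range(fontsize)) + min(fontsize)
}

# Made-up counts for illustration.
freq <- c(january = 3, february = 1, march = 9, april = 5)
scale_fontsize(freq)
# january 18, february 12, march 36, april 24
```

The rarest word gets 12pt, the most frequent 36pt, and the others fall proportionally in between.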
Re: [R] graphically representing frequency of words in a speech?
On Jun 7, 2009, at 1:41 PM, Brown, Tony Nicholas wrote:

> Dear all,
>
> I recently saw a graph on television that displayed selected
> words/phrases in a speech scaled in size according to their frequency.
> So words/phrases that were often used appeared large and words that were
> rarely used appeared small. The closest thing I can find on the web to
> approximate what I saw can be found here:
> http://stateoftheunion.onetwothree.net/ The example at that website is
> more complicated but captures the general idea.
>
> Would someone point me in the right direction in terms of replicating
> such a graph?
>
> Thanks in advance,
>
> Tony

Tony,

What you are referring to is called a 'tag cloud'. See this page:

http://en.wikipedia.org/wiki/Tag_cloud

They are commonly used on wikis, Twitter and so forth. For example:

http://tweetstats.com/trends

The only thing that I found for R is by Gregor Gorjanc, but the information seems to be dated:

http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

I have cc'd him here for any updates.

Otherwise, there are some links on the Wikipedia page and some other applications such as Wordle:

http://www.wordle.net/

HTH,

Marc Schwartz
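Whatever tool draws the cloud, the input is a table of word frequencies. A minimal base R sketch of that first step is shown below; the text and the stop-word list are made up for illustration and are not from any real speech or standard list.

```r
# Hypothetical input: a short made-up text standing in for a speech.
speech <- "We choose hope over fear. We choose unity over division. Hope wins."

# Lower-case the text, split on anything that is not a letter or an
# apostrophe, and drop any empty strings left over from the split.
words <- strsplit(tolower(speech), "[^a-z']+")[[1]]
words <- words[nzchar(words)]

# Drop a few very common words (an illustrative stop list only).
stop_words <- c("we", "over", "the", "a", "of", "and")
words <- words[!words %in% stop_words]

# Tabulate and sort so the most frequent words come first.
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)
```

The resulting table can be passed to any of the plotting approaches in this thread, for example as the word and freq columns of a data frame.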