Re: [R] graphically representing frequency of words in a speech?
Hi,

As Gregor Gorjanc mentioned, it's very inconvenient to let R decide the font size and placement of words in a plot. There are already very mature tag cloud applications; one that I'm relatively familiar with is the WordPress plugin wp-cumulus, which uses a Flash object to generate the tag cloud and has a fantastic 3D rotation effect. I've spent a couple of hours porting it to R; see the source code and the effect here:

http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/

HTH.

Regards,
Yihui
--
Yihui Xie <xieyi...@gmail.com>
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China

On Mon, Jun 8, 2009 at 2:41 AM, Brown, Tony Nicholas <tony.n.br...@vanderbilt.edu> wrote:

> Dear all,
>
> I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. The closest thing I can find on the web to approximate what I saw can be found here:
>
> http://stateoftheunion.onetwothree.net/
>
> The example at that website is more complicated but captures the general idea. Would someone point me in the right direction in terms of replicating such a graph?
>
> Thanks in advance,
>
> Tony
>
> Tony N. Brown, Ph.D.
> Editor-Elect, American Sociological Review
> Associate Professor of Sociology and Human and Organizational Development (secondary)
> Program Faculty, Effective Health Communication and African American Diaspora Studies
> Faculty Head of Hank Ingram House, The Commons
> Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
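The point about leaving font size and placement to something other than R can be illustrated without Flash. The following is only a minimal sketch, not the wp-cumulus port from the linked post: it writes a plain HTML fragment and lets the browser do the line wrapping. The function name `html_tag_cloud`, the example counts, and the output file name are all made up for illustration, and words are not HTML-escaped.

```r
# Hypothetical helper (not part of wp-cumulus or the linked post): turn a
# named vector of word counts into one <span> per word, with the font size
# scaled linearly between fontsize[1] and fontsize[2].
html_tag_cloud <- function(freq, fontsize = c(12, 36)) {
  stopifnot(length(fontsize) == 2, !is.null(names(freq)))
  rng <- range(freq)
  if (diff(rng) == 0) {
    # all words equally frequent: use the smallest size for everything
    size <- rep(fontsize[1], length(freq))
  } else {
    size <- fontsize[1] + (freq - rng[1]) / diff(rng) * diff(fontsize)
  }
  spans <- sprintf('<span style="font-size:%dpt">%s</span>',
                   as.integer(round(size)), names(freq))
  paste0("<p>", paste(spans, collapse = " "), "</p>")
}

# Made-up counts for illustration
freq <- c(economy = 12, jobs = 7, health = 3)
html <- html_tag_cloud(freq)

# Write the fragment to a temporary file that can be opened in a browser
writeLines(html, file.path(tempdir(), "tagcloud.html"))
```

Opening the resulting file in a browser shows the three words on one line, with "economy" largest; the browser, not R, decides where lines break.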
Re: [R] graphically representing frequency of words in a speech?
There is a similar discussion on Statalist (http://n2.nabble.com/st%3A-Tag-clouds-in-Stata--tt2992551.html#none). I think they make a reasonable argument that a tag cloud is not a good statistical graphic.

--
HUANG Ronggui, Wincent
PhD Candidate
Dept of Public and Social Administration
City University of Hong Kong
Home page: http://asrr.r-forge.r-project.org/rghuang.html
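For readers who share that reservation, one conventional alternative is a sorted dot chart of the counts. This is only a sketch of that alternative and is not taken from the linked Statalist thread; the word counts are made up, and the plot is written to a PDF in the session's temporary directory so that the example also runs non-interactively.

```r
# Made-up counts for illustration only
freq <- c(economy = 12, jobs = 7, health = 3, energy = 5, schools = 2)

# dotchart() draws the first element at the bottom, so sorting in
# increasing order puts the most frequent word at the top
freq_sorted <- sort(freq)

pdf(file.path(tempdir(), "word_frequency.pdf"))
dotchart(freq_sorted, xlab = "Occurrences", main = "Word frequency",
         pch = 19, xlim = c(0, max(freq_sorted)))
dev.off()
```

Unlike a tag cloud, this encodes frequency as position along a common axis, so neighbouring words can be compared directly.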
Re: [R] graphically representing frequency of words in a speech?
Yihui,

This is quite impressive. Thanks for helping me think about how to make tag clouds in R.

Tony
Re: [R] graphically representing frequency of words in a speech?
Tony,

What you are referring to is called a 'tag cloud'. See this page:

http://en.wikipedia.org/wiki/Tag_cloud

They are commonly used on wikis, Twitter and so forth. For example:

http://tweetstats.com/trends

The only thing that I found for R is by Gregor Gorjanc, but the information seems to be dated:

http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

I have cc'd him here for any updates.

Otherwise, there are some links on the Wikipedia page and some other applications such as Wordle:

http://www.wordle.net/

HTH,

Marc Schwartz
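Whichever tool ends up drawing the cloud, the speech first has to be reduced to word counts. A minimal base-R sketch of that step follows; the sample sentence is made up, and the tokenisation (lower-casing, keeping only letters and apostrophes) is an assumption that real speeches may need to refine, for example by removing stop words.

```r
# Assumed input: the speech as a single character string (made up here)
speech <- "We will rebuild, we will recover, and we will emerge stronger."

words <- tolower(speech)
words <- gsub("[^a-z' ]", " ", words)   # replace punctuation and digits by spaces
words <- strsplit(words, " +")[[1]]     # split on runs of spaces
words <- words[nzchar(words)]           # drop any empty strings

# Frequency table, most frequent words first
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 5))
```

The resulting one-dimensional table is the kind of input a tag cloud function can then scale into font sizes.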
Re: [R] graphically representing frequency of words in a speech?
> The only thing that I found for R is by Gregor Gorjanc, but the information seems to be dated:
> http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

Hi,

Yes, I have tried to create a tag cloud plot in R, but I abandoned the project due to other things. The main obstacle was that in R we need to take care of the font sizes and placement of words, while this is very easy with, say, browsers, which do all the rendering. I tracked down the last version of the R file, which is pasted below. I must say that I do not remember the status of the code, so use it as you wish. If anyone wishes to take this project further, please do so!

gg

### tagCloud.R
###
### What: Tag cloud plot functions
### Time-stamp: 2006-09-10 02:53:29 ggorjan
###

library(grid)  ## gpar(), unit(), viewport(), textGrob() etc. come from grid

tagCloud <- function(x, n=100, decreasing=TRUE, threshold=NULL,
                     fontsize=c(12, 36), align=TRUE, expandRow=TRUE,
                     justRow="bottom", title, textGpar=gpar(col="navy"),
                     rectGpar=gpar(col="white"), titleGpar=gpar(),
                     viewGpar=gpar(), mar=c(1, 1, 1, 1))
{
  UseMethod("tagCloud")
}

tagCloud.default <- function(x, n=100, decreasing=TRUE, threshold=NULL,
                             fontsize=c(12, 36), align=TRUE, expandRow=TRUE,
                             justRow="bottom", title,
                             textGpar=gpar(col="navy"),
                             rectGpar=gpar(col="white"), titleGpar=gpar(),
                             viewGpar=gpar(), mar=c(1, 1, 1, 1))
{
  if(!is.null(dim(x))) stop("'x' must be a vector")
  tagCloud.table(table(x), n=n, decreasing=decreasing, fontsize=fontsize,
                 threshold=threshold, align=align, expandRow=expandRow,
                 justRow=justRow, title=title, textGpar=textGpar,
                 rectGpar=rectGpar, titleGpar=titleGpar, viewGpar=viewGpar,
                 mar=mar)
}

tagCloud.table <- function(x, n=100, decreasing=TRUE, threshold=NULL,
                           fontsize=c(12, 36), align=TRUE, expandRow=TRUE,
                           justRow="bottom", title,
                           textGpar=gpar(col="navy"),
                           rectGpar=gpar(col="white"), titleGpar=gpar(),
                           viewGpar=gpar(), mar=c(1, 1, 1, 1))
{
  ## --- Check ---
  if(length(dim(x)) != 1) stop("'x' must be one dimensional table")

  ## --- Threshold ---
  if(!is.null(threshold)) x <- x[x >= threshold]

  ## --- Number of units ---
  N <- length(x)                  ## length of table
  if(is.null(n)) {                ## if n=NULL, plot all units
    n <- N
  } else {
    if(n > N) n <- N              ## if n is too big, decrease it
    if(n < 1) n <- round(N * n)   ## if n is percentage of units
  }
  fontsizeLength <- length(fontsize)
  if(fontsizeLength != 2) stop("'fontsize' must be of length two")

  ## --- Sort and subset ---
  if(n < N) {                     ## only if we want to plot subset of units
    tmp <- sort(x, decreasing=decreasing)
    x <- x[names(x) %in% names(tmp[1:n])]
  }

  ## --- Get relative freq ---
  x <- prop.table(x)

  ## --- Fontsize ---
  fontsizeDiff <- diff(fontsize)
  xDiff <- max(x) - min(x)
  if(xDiff != 0) {
    off <- ifelse(fontsizeDiff > 0, min(x), max(x))
    fontsize <- (x - off) / xDiff * fontsizeDiff + min(fontsize)
  } else {                        ## all units have the same frequency
    fontsize <- rep(min(fontsize), times=n)
  }

  ## --- Viewport and rectangle ---
  grid.newpage()
  width <- unit(1, "npc")
  height <- unit(1, "npc")
  vp <- viewport(y=unit(mar[1], "lines"), x=unit(mar[2], "lines"),
                 width=width - unit(mar[2] + mar[4], "lines"),
                 height=height - unit(mar[1] + mar[3], "lines"),
                 just=c("left", "bottom"), gp=viewGpar, name="main")
  pushViewport(vp)
  if(!missing(title))
    grid.text(title, y=height, gp=titleGpar, name="title")
  grid.rect(gp=rectGpar, name="cloud")

  ## --- Grobs ---
  tag <- vector(mode="list", length=4)
  names(tag) <- c("fontsize", "grob", "width", "height")
  tag[[1]] <- tag[[2]] <- tag[[3]] <- tag[[4]] <- vector(mode="list", length=n)
  for(i in 1:n) {
    tag$fontsize[[i]] <- fontsize[i]
    tag$grob[[i]] <- textGrob(names(x[i]), gp=gpar(fontsize=fontsize[i]))
    tag$width[[i]] <- convertWidth(grobWidth(tag$grob[[i]]), unitTo="npc",
                                   valueOnly=TRUE)
    tag$height[[i]] <- convertHeight(grobHeight(tag$grob[[i]]), unitTo="npc",
                                     valueOnly=TRUE)
  }

  ## --- Split lines ---
  row <- colWidth <- vector(length=n)
  row[1] <- 1
  colWidth[1] <- 0
  lineWidth <- tag$width[[1]]
  j <- 1
  gapWidth <- convertWidth(stringWidth(" "), unitTo="npc", valueOnly=TRUE)
  maxWidth <- convertWidth(width, unitTo="npc", valueOnly=TRUE)
  for(i in
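The "Fontsize" step of tagCloud.table can be tried on its own. The following standalone sketch maps relative frequencies linearly onto the requested font size range. The function name `scale_fontsize` is hypothetical and not part of tagCloud.R, and because the comparison operators were lost in the archived copy, the `>` in the offset test is an assumption.

```r
# Hypothetical standalone version of the "Fontsize" step; scale_fontsize is
# not a function from tagCloud.R.
scale_fontsize <- function(freq, fontsize = c(12, 36)) {
  stopifnot(length(fontsize) == 2)
  p <- prop.table(freq)                  # relative frequencies
  pDiff <- max(p) - min(p)
  if (pDiff == 0) {
    # all units have the same frequency: use the smallest size throughout
    return(rep(min(fontsize), length(p)))
  }
  fontsizeDiff <- diff(fontsize)
  # assumed direction of the comparison; it makes c(36, 12) reverse the scale
  off <- if (fontsizeDiff > 0) min(p) else max(p)
  (p - off) / pDiff * fontsizeDiff + min(fontsize)
}

counts <- table(c("tax", "tax", "tax", "jobs", "jobs", "war"))
sizes <- scale_fontsize(counts)
print(round(sizes))
```

With the default range, the rarest word gets 12 points and the most frequent 36; passing `fontsize = c(36, 12)` reverses the mapping so that rare words come out largest.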
Re: [R] graphically representing frequency of words in a speech?
Thank you so much, Marc and Gregor. The basic information, suggestions, and R code that you provided are most helpful.

Tony
Re: [R] graphically representing frequency of words in a speech?
Below are various attempts using ggplot2 (http://had.co.nz/ggplot2/). First I try random positioning, then random positioning with alpha, then a quasi-random positioning scheme in polar coordinates:

#this demo has random number generation
# so best to set a seed to make it
# reproducible.
set.seed(1)

#generate some fake data
a = data.frame(
    word = month.name
    , freq = sample(1:10,12,replace=TRUE)
)

#add arbitrary location information
a$x = sample(1:12,12)
a$y = sample(1:12,12)

#load ggplot2
library(ggplot2)

#initialize a ggplot object
my_plot = ggplot()

#create an object for the text layer
my_text = geom_text(
    data = a
    , aes(
        x = x
        , y = y
        , label = word
        , size = freq
    )
)

#create an object for the text size limits
my_size_scale = scale_size(
    to = c(3,20)
)

#create an object to expand the x-axis limits
# (ensures that text isn't cropped)
my_x_scale = scale_x_continuous(
    expand = c(.5, 0)
)

#ditto for the y axis
my_y_scale = scale_y_continuous(
    expand = c(.5, 0)
)

#create an opts object that removes
# plot elements unnecessary in a tag cloud
my_opts = opts(
    legend.position = 'none'
    , panel.grid.minor = theme_blank()
    , panel.grid.major = theme_blank()
    , panel.background = theme_blank()
    , axis.line = theme_blank()
    , axis.text.x = theme_blank()
    , axis.text.y = theme_blank()
    , axis.ticks = theme_blank()
    , axis.title.x = theme_blank()
    , axis.title.y = theme_blank()
)

#show the plot
print(
    my_plot+
    my_text+
    my_size_scale+
    my_x_scale+
    my_y_scale+
    my_opts
)

#to aid readability amidst overlap, set alpha in
# the call to geom_text
my_text_with_alpha = geom_text(
    data = a
    , aes(
        x = x
        , y = y
        , label = word
        , size = freq
    )
    , alpha = .5
)

#show the version with alpha
print(
    my_plot+
    my_text_with_alpha+
    my_size_scale+
    my_x_scale+
    my_y_scale+
    my_opts
)

#alternatively, in polar coordinates,
# which maps x to angle and y to radius,
# making a nice circle
print(
    my_plot+
    my_text_with_alpha+
    my_size_scale+
    my_opts+
    coord_polar()
)
#(note omission of my_y_scale and
# my_x_scale, which seem to be ignored
# when coord_polar() is called. I'll
# report this possible bug to the ggplot2
# maintainer)

#a possible way to avoid overlap is to
# map radius (y) to frequency so that
# larger text is in the periphery
# where there is more room. This
# necessitates adding some random
# noise to the frequency so that
# the low frequency words don't
# jumble in the center too badly
a$freq2 = a$freq+rnorm(12)

#now map radius (y) to freq2
my_text_with_alpha_and_freq2 = geom_text(
    data = a
    , aes(
        x = x
        , y = freq2
        , label = word
        , size = freq
    )
    , alpha = .5
)

#show the version with alpha and radius mapped to freq2
print(
    my_plot+
    my_text_with_alpha_and_freq2+
    my_size_scale+
    my_opts+
    coord_polar()
)

--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
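The code above uses the ggplot2 interface as it was in 2009: `opts()`, `theme_blank()` and `scale_size(to = ...)` are not available in current ggplot2 releases. The following is a hedged sketch of how the same idea might look with the present-day equivalents `theme()`, `theme_void()` and `scale_size(range = ...)`. It is an adaptation, not Mike's original code; the object names and output file names are made up, and the plots are saved to the session's temporary directory rather than printed so the example also runs non-interactively.

```r
library(ggplot2)

set.seed(1)  # the positions below are random, so fix the seed

# Fake data as in the message above: twelve "words" with random frequencies
words <- data.frame(
  word = month.name,
  freq = sample(1:10, 12, replace = TRUE),
  x    = sample(1:12, 12),
  y    = sample(1:12, 12)
)

cloud <- ggplot(words, aes(x = x, y = y, label = word, size = freq)) +
  geom_text(alpha = 0.5) +                   # alpha aids readability on overlap
  scale_size(range = c(3, 20)) +             # formerly scale_size(to = ...)
  scale_x_continuous(expand = c(0.5, 0)) +   # pad the axes so text isn't cropped
  scale_y_continuous(expand = c(0.5, 0)) +
  theme_void() +                             # stands in for the theme_blank() list
  theme(legend.position = "none")            # formerly opts(legend.position = ...)

# Cartesian and polar versions
ggsave(file.path(tempdir(), "cloud.pdf"), cloud, width = 6, height = 6)
ggsave(file.path(tempdir(), "cloud_polar.pdf"), cloud + coord_polar(),
       width = 6, height = 6)
```

In an interactive session, typing `cloud` or `cloud + coord_polar()` at the prompt displays the plots directly.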