Re: [R] why is nrow() so slow?

2009-09-15 Thread hadley wickham
On Tue, Sep 15, 2009 at 9:48 AM, ivo welch ivo...@gmail.com wrote: dear R wizards:  here is the strange question for the day.  It seems to me that nrow() is very slow.  Let me explain what I mean: ds= data.frame( NA, x=rnorm(1) )   ##  a sample data set system.time( { for (i in 1:1)

Re: [R] why is nrow() so slow?

2009-09-15 Thread hadley wickham
PS: Any speed suggestions are appreciated.  This is experimenting time for me. You might want to check out the plyr package - it incorporates everything I know about making these sorts of operations fast. The next version will do even better. Hadley -- http://had.co.nz/

Re: [R] eval(expr) without printing to screen?

2009-09-20 Thread hadley wickham
Here is a simpler mockup which shows the issue: x = data.frame(rbind(c(1,2,3),c(1,2,3))) xnames = c("a", "b", "c") names(x) = xnames for(i in 1:length(x)) { # Create a varying string expression expr = paste("y = x$", xnames[i], "[1]", sep="") # evaluate expression eval(parse(text=print(expr)))

Re: [R] R crashes when packages 'impute' and 'GeneMeta' are used together.

2009-09-21 Thread hadley wickham
Well, the title says all for this one. Not really. What do you mean by crash? What do you mean by used together? Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the

Re: [R] Working around 256 byte variable names? + trouble opening large file

2009-09-21 Thread hadley wickham
On Mon, Sep 21, 2009 at 3:04 PM, A Singh aditi.si...@bristol.ac.uk wrote: Dear R users, I am trying to read in a file with 105 columns, and when trying to attach it, get an error as follows: vc1 <- read.table("P:\\R\\Everything-I.txt", header=T, sep=" ", dec=".", na.strings="NA", strip.white=T)

Re: [R] Working around 256 byte variable names? + trouble opening large file

2009-09-21 Thread hadley wickham
From: hadley wickham h.wick...@gmail.com Don't use attach? Obviously good advice but why? Philosophically, it's better to be explicit than implicit, and the extremely non-local effects of attach can make debugging difficult. Hadley -- http://had.co.nz
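
A minimal sketch of the explicit alternatives to attach(), assuming a small made-up data frame (the column names `height` and `weight` are hypothetical and not from the original thread):

```r
# Hypothetical data standing in for the poster's 105-column file.
vc1 <- data.frame(height = c(1.2, 3.4, 5.6), weight = c(10, 20, 30))

# Explicit reference: it is always clear where `height` comes from.
mean_explicit <- mean(vc1$height)

# with() evaluates one expression inside the data frame and then
# leaves the search path untouched, unlike attach().
ratio <- with(vc1, weight / height)
```

Both forms keep the lookup local to a single expression, which is the property the reply is arguing for.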

Re: [R] Problem with xtabs(), exclude=NULL, and counting NA's

2009-09-24 Thread hadley wickham
wtf <- factor(x, levels = c(levels(wtf), NA), exclude = NULL) xtabs(~ wtf, exclude = NULL, na.action = na.pass) Also see addNA. Hadley -- http://had.co.nz/
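
A short sketch of the addNA() suggestion, using table() on a made-up factor rather than the poster's xtabs() call (the data are an assumption for illustration):

```r
# Made-up factor with one missing value.
x <- factor(c("a", "b", NA, "a"))

# By default the NA is silently dropped from the counts.
dropped <- table(x)

# addNA() turns NA into a real level, so it is counted like any other.
counts <- table(addNA(x))
```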

Re: [R] packGrob and dynamic resizing

2009-09-25 Thread hadley wickham
On Fri, Sep 25, 2009 at 7:55 AM, baptiste auguie baptiste.aug...@googlemail.com wrote: Thank you Paul, I was convinced I tried this option but I obviously didn't! In ?packGrob, the user is warned that packing grobs can be slow. In order to quantify this, I made the following comparison of 3

Re: [R] How to assess object names within a function in lapply or l_ply?

2009-09-28 Thread hadley wickham
or with l_ply (plyr package) l_ply(data.frame(a=1:3, b=2:4), function(x) print(deparse(substitute(x)))) The best way to do this is to supply both the object you want to iterate over, and its names. Unfortunately it's slightly difficult to create a data structure of the correct form to do this
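
One way to supply both the object and its names is base R's mapply(); this is only a sketch of the idea, not necessarily the approach the reply goes on to describe:

```r
df <- data.frame(a = 1:3, b = 2:4)

# Iterate over the columns and their names in parallel, so the
# function receives the name as an ordinary argument.
labelled <- mapply(function(col, name) paste(name, sum(col)),
                   df, names(df))
```

Here `labelled` holds one string per column, built from the column name and the column sum.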

Re: [R] How to assess object names within a function in lapply or l_ply?

2009-09-28 Thread hadley wickham
many thanks for your answer and for the enormous work you put into plyr, a really powerful package. For now, I will solve my problem with a variable label attribute, I usually attach to columns in data frames. I asked the list, because I thought, I am overlooking something trivial, since

Re: [R] R and REST API's

2009-09-28 Thread hadley wickham
On Mon, Sep 28, 2009 at 9:01 AM, Gary Lewis gary.m.le...@gmail.com wrote: Hi - Many organizations now make their data available as XML via a REST web service architecture. Is there any R package or facility to access this type of data directly (eg, to make the HTTP GET request and have the

Re: [R] ggplot2 : bug in coord_equal() ?

2010-02-26 Thread hadley wickham
Hi David, That's the behaviour I'd expect - the plot is 5 x 13000. What were you expecting? Hadley On Fri, Feb 26, 2010 at 8:06 AM, David Hajage dhajag...@gmail.com wrote: Hello, I think there is a bug in coord_equal when x is a factor : ggplot(diamonds, aes(clarity, fill=cut)) +

Re: [R] two questions for R beginners

2010-03-01 Thread hadley wickham
One of the things about R which many (and that certainly includes me) have to find out the hard way is that you have to *learn* what to expect! You can't just import it from prior experience in other contexts. So, by the time you have learned that a matrix is such that all its elements must

Re: [R] two questions for R beginners

2010-03-01 Thread hadley wickham
Suppose X is a dataframe or a matrix.  What would you expect to get from X[1]?  What about as.vector(X), or as.numeric(X)? The point is that a dataframe is a list, and a matrix isn't.  If users don't understand that, then they'll be confused somewhere.  Making matrices more list-like in one
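
The difference can be seen directly with a small made-up example (the objects `m` and `d` are illustrative only):

```r
m <- matrix(1:6, nrow = 2)
d <- as.data.frame(m)

# A matrix is an atomic vector with dimensions: [1] is one element.
first_m <- m[1]

# A data frame is a list of columns: [1] is a one-column data frame.
first_d <- d[1]
```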

Re: [R] Why can't apply be used with as.factor on a data.frame ?

2010-03-07 Thread hadley wickham
The basic reason is that apply works with matrices - it first turns the input into a matrix, processes each column and then returns a matrix. See colwise in the plyr package for a function that works column wise on a data frame, returning a data frame. Hadley On Sun, Mar 7, 2010 at 11:07 AM,
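
A base R sketch of the same point, assuming a small made-up data frame; lapply() is used here as a stand-in for plyr's colwise, which is not shown:

```r
df <- data.frame(x = c("a", "b", "a"), y = c(1, 2, 1),
                 stringsAsFactors = FALSE)

# apply() coerces the data frame to a matrix first, and the factor
# results are coerced back to a plain character matrix.
via_apply <- apply(df, 2, as.factor)

# lapply() works on the list of columns; assigning into df[] keeps
# the result a data frame.
df[] <- lapply(df, as.factor)
```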

Re: [R] vectorizing ANOVA over a vectorized linear model

2010-03-07 Thread hadley wickham
Hi Mark, If efficiency is a concern you might want to read Computing Thousands of Test Statistics Simultaneously in R by Holger Schwender and Tina Müller, http://stat-computing.org/newsletter/issues/scgn-18-1.pdf. If you just want to do it, see the examples in

Re: [R] vectorizing ANOVA over a vectorized linear model

2010-03-07 Thread hadley wickham
please On Sun, Mar 7, 2010 at 2:08 PM, hadley wickham h.wick...@gmail.com wrote: Hi Mark, If efficiency is a concern you might want to read Computing Thousands of Test Statistics Simultaneously in R by Holger Schwender and Tina Müller, http://stat-computing.org/newsletter/issues/scgn-18-1

Re: [R] ggplot2 rose diagram

2010-03-10 Thread hadley wickham
For Q2 you can use opts(legend.position = c(0.9, 0.9)). For Q3, you can also use scale_y_sqrt(). Hadley On Wed, Mar 10, 2010 at 2:05 PM, Tim Howard tghow...@gw.dec.state.ny.us wrote: To answer two of my own questions to get them into the archives (I am slowly getting the hang of ggplot):

Re: [R] ggplot2 rose diagram

2010-03-10 Thread hadley wickham
By Q2 I was trying to refer to the Y-axis labels. For the polar plot, the Y-axis labels reside left of the panel. I was looking for a way to get the Y-axis labels to radiate out from the center so it would be clear which line each label refers to. I still can't find any reference to moving

Re: [R] Help with aggregate and cor

2010-03-10 Thread hadley wickham
Run that function hourly with plyr output.hourly <- dlply(df.i1,"tshour",cor.dat) Why not output.hourly <- ddply(df.i1,"tshour",cor.dat) ? Generally you want to work with data frames in R, if at all possible. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics

Re: [R] ggplot2: varwidth-equivalent for geom_boxplot?

2010-03-11 Thread hadley wickham
to the square-roots of the number of observations in the groups. I find this option often very useful. Thanks for any insight into how to achieve this with geom_boxplot. Joh On Wednesday 10 March 2010 16:12:49 hadley wickham wrote: What is varwidth? Hadley On Wed, Mar 10, 2010 at 1:55 PM

[R] [R-pkgs] ggplot2: version 0.8.7

2010-03-13 Thread Hadley Wickham
ggplot2 ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and avoid bad parts. It takes care of many of the fiddly details that make plotting a hassle

Re: [R] likelihood ratio test between glmer and glm

2010-03-14 Thread hadley wickham
Based on a discussion found on the R mailing list but dating back to 2008, I have compared the log-likelihoods of the glm model and of the glmer model as follows: lrt <- function (obj1, obj2){ L0 <- logLik(obj1) L1 <- logLik(obj2) L01 <- as.vector(- 2 * (L0 - L1)) df <- attr(L1, "df") -

Re: [R] Retrieving latitude and longitude via Google Maps API

2010-03-16 Thread hadley wickham
Does anyone have any experience retrieving latitude and longitude for an address from the Google Maps API? This thread from r-sig-geo may be of interest: https://stat.ethz.ch/pipermail/r-sig-geo/2010-March/thread.html#7788 In particular, note that what you are doing is against the google

[R] Encrypt/decrypt in R

2010-03-19 Thread Hadley Wickham
Hi all, Does anyone know of any encryption/decryption algorithms in R? I'm not looking for anything robust - I want some way of printing output to the screen that the user can't read immediately, but can decrypt a little later. The main thing I don't want the user to see is a number, so

Re: [R] Encrypt/decrypt in R

2010-03-19 Thread Hadley Wickham
On Fri, Mar 19, 2010 at 12:35 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Fri, Mar 19, 2010 at 5:10 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Fri, Mar 19, 2010 at 5:00 PM, Hadley Wickham had...@rice.edu wrote: Hi all, Does any one know of any encryption

Re: [R] flexible alternative to subsetting dataframe inside nested loops

2010-03-24 Thread hadley wickham
On Wed, Mar 24, 2010 at 8:52 AM, mgierdal mgier...@gmail.com wrote: I have a dataFrame variable:      L1  L2 L3 ... v1 v2 ...  1  2  3  4  ... I want to process subsets of it as defined by combinations of L1-L2-L3. I do it successfully using nested loops: for (i in valuesOfL1 {  for

Re: [R] GGPLOT2: Reverse order of legend to match order of x-axis

2010-03-24 Thread hadley wickham
See here: http://learnr.wordpress.com/2010/03/23/ggplot2-changing-the-default-order-of-legend-labels-and-stacking-of-data/ Hadley On Wed, Mar 24, 2010 at 2:40 PM, Ryan Garner ryan.steven.gar...@gmail.com wrote: How do I reverse the order of the legend in a bar plot to match order of the

Re: [R] Competing with SPSS and SAS: improving code that loops through rows (data manipulation)

2010-03-27 Thread hadley wickham
# Set up the ratio variables system.time({ temp <- cbind(data, do.call(cbind, lapply(names(data)[3:4], function(.x) { unlist(by(data, data$group, function(.y) .y[,.x] / max(.y[,.x]))) }))) colnames(temp)[5:6] <- paste(colnames(data)[3:4], 'ind.to.max', sep =

Re: [R] Competing with SPSS and SAS: improving code that loops throughrows (data manipulation)

2010-03-27 Thread hadley wickham
 exp1^(a[case] * l * 10) would be better written out of the loop as  b <- exp1^(a * l * 10) And even better as b <- exp(a * l * 10) Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
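
A small sketch of the two forms side by side, assuming `exp1` is exp(1) and using made-up values for `a` and `l`:

```r
a <- c(0.1, 0.2, 0.3)   # made-up per-case values
l <- 0.5                # made-up constant
exp1 <- exp(1)          # assumption: exp1 holds e

# Element-by-element version, as in the original loop.
b_loop <- numeric(length(a))
for (case in seq_along(a)) {
  b_loop[case] <- exp1^(a[case] * l * 10)
}

# Vectorised version: one call, no loop.
b_vec <- exp(a * l * 10)
```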

Re: [R] regular expression help to extract specific strings from text

2010-03-31 Thread hadley wickham
On Wed, Mar 31, 2010 at 8:20 AM, Tony B tony.bre...@googlemail.com wrote: Dear all, Let's say I have the following: x <- c("Eve: Going to try something new today...", "Adam: Hey @Eve, how are you finding R? #rstats", "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was!

Re: [R] Sharing levels across multiple factor vectors

2010-04-01 Thread hadley wickham
On Thu, Apr 1, 2010 at 3:05 AM, Peter Dalgaard pda...@gmail.com wrote: Jeff Brown wrote: Sorry for spamming.  I swear I had worked on that problem a long time before posting. But I just figured it out: I have to change the values, which are represented as integers, not strings.  So the

Re: [R] sample size 20K? Was: fitness of regression tree: how to measure???

2010-04-01 Thread hadley wickham
Incidentally, there is nothing new or radical in this; indeed, John Tukey, Leo Breiman, George Box, and others wrote eloquently about this decades ago. And Breiman's random forest modeling procedure explicitly abandoned efforts to build simply interpretable models (from which one might infer

Re: [R] SAS and R on multiple operating systems

2010-04-06 Thread hadley wickham
 Also I have seen 5,000 page listings in SAS. Is this a pro or a con? Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/

Re: [R] ggplot2, density barplot and geom_point layer

2010-04-07 Thread hadley wickham
Because of the way you've constructed the plot with qplot, you need to use: myPlot + geom_point( data=medians, aes(x=med,shape=cut, y=0), size=2.5 ) Hadley On Wed, Apr 7, 2010 at 5:11 AM, Johannes Graumann johannes_graum...@web.de wrote: Hi, Please consider the example below. How can I

[R] Using read.table to read file created with read.table and qmethod = escape

2010-04-07 Thread Hadley Wickham
df <- data.frame(a = "a\"b") write.table(df, "test.csv", sep = ",", row = F) Is there any way to load test.csv into R correctly? I've tried the following: read.table("test.csv", sep = ",") [1] V1 <0 rows> (or 0-length row.names) Warning message: In read.table("test.csv", sep = ",") : incomplete final line found

[R] Strange csv parsing problem

2010-04-07 Thread Hadley Wickham
url <- "http://dl.dropbox.com/u/41902/22240.csv" read.csv(url)[, 1] [1] oppose NAoppose support read.csv(url, header = F)[, 1] [1] url [2] http://maplight.org/us-congress/bill/109-hr-5825/387248; [3] http://maplight.org/us-congress/bill/110-hr-3546/378743; [4]

Re: [R] Using read.table to read file created with read.table and qmethod = escape

2010-04-08 Thread Hadley Wickham
df <- data.frame(a = "a\"b", v = 4, z = "this is Z") write.csv(df, "test.csv", row.names = FALSE, quote = FALSE) read.csv("test.csv", quote = "") Unfortunately my real example is more like: df <- data.frame(a = "a\"b", v = 4, z = "this is: A, B, C") so quote = F won't work. Can write.table and read.table
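
For comparison, a sketch of a round trip that does work when the writer is under your control: write.csv() doubles embedded quotes (qmethod = "double") and read.csv() understands that convention. This is a different technique from the backslash escaping discussed in the thread and does not help with files already written using qmethod = "escape"; the temporary file path is an assumption for illustration.

```r
df <- data.frame(a = "a\"b", v = 4, z = "this is: A, B, C",
                 stringsAsFactors = FALSE)

# Write to a temporary file; embedded quotes are doubled, not escaped.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Reading it back recovers the embedded quote and the commas in z.
back <- read.csv(path, stringsAsFactors = FALSE)
```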

Re: [R] Using read.table to read file created with read.table and qmethod = escape

2010-04-08 Thread Hadley Wickham
Can write.table and read.table really be so asymmetric? write() is a wrapper for cat() and read() is a wrapper for scan() so the question should really be can cat() and scan() be so asymmetric. Looking at their help pages, I would say that at least some degree of asymmetry is plausible.

Re: [R] Using read.table to read file created with read.table and qmethod = escape

2010-04-08 Thread Hadley Wickham
On Thu, Apr 8, 2010 at 8:20 AM, jim holtman jholt...@gmail.com wrote: You were using read.csv and not read.table.  The following seems to work with using a separator that will probably not appear in the text: df <- data.frame(a = "a\"b", v = 4, z = "this is: A, B, C") write.table(df, "test.csv",

Re: [R] Strange csv parsing problem

2010-04-08 Thread Hadley Wickham
Remove the comma and count.fields gives 11 for all rows. From your other post(s) on escaped quotes, I assume that this won't solve your problem with the existing files. (: Right - but assuming I'm not crazy, that should cause an error in read.csv, right? It shouldn't just parse the file

Re: [R] Combining ggplot2 objects and/or extracting layers

2010-04-09 Thread hadley wickham
Other than rebuilding the plots, is there any way either (1) to combine existing ggplot2 plots or (2) to extract a layer from an existing plot so that it can be added to another? Not really, although you can always pull apart the plot components. Can you give an example of what you are trying

Re: [R] error bars on barplot

2010-04-09 Thread hadley wickham
bar.err (agricolae) plotCI (gplots) xYplot (Hmisc) error.bars (psych) dispersion (plotrix) plotCI (plotrix) Not to mention: http://biostat.mc.vanderbilt.edu/wiki/Main/DynamitePlots Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University

Re: [R] Points but no lines in qplot.

2011-06-30 Thread Hadley Wickham
On Thu, Jun 30, 2011 at 7:31 AM, Ashim Kapoor ashimkap...@gmail.com wrote: Dear R helpers, I have molten data which is : - t3   Year       variable        value 1  2005     ICICI.Bank 27488370 2  2006     ICICI.Bank 43166850 3  2007     ICICI.Bank 59515300 4  2008    

[R] [R-pkgs] stringr 0.5

2011-07-01 Thread Hadley Wickham
# stringr Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn.

Re: [R] coefficients lm of data.frame

2011-07-07 Thread Hadley Wickham
On Thu, Jul 7, 2011 at 5:24 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: Here's another approach using the plyr package: library(plyr) df <- data.frame(gp = factor(rep(1:3, each = 4)), x = rnorm(12), y = rnorm(12)) mylst <- split(df, df$gp) mycoefs <- ldply(mylst, function(d) coef(lm(y ~

Re: [R] Referencing a vector of data labels in ggplot function

2011-07-09 Thread Hadley Wickham
Maybe something like this? withNames <- function(dframe, lineNames, plotName, colors){ one_day <- subset(dframe, data == '1941-06-16') one_day$lineNames <- lineNames ggplot(dframe, aes(date, value, group = factor, color = factor)) + geom_line(size = 1) + facet_grid(Facet~., scales =

Re: [R] grey colored lines and overwriting labels i qqplot2

2011-07-15 Thread Hadley Wickham
You should only have one scale_ call for each scale type.  Here, you have three scale_colour_ calls, the first selecting a grey scale, the second defining a single break with its label (and thus implicitly subsetting on that single break value), and a third which defines a different

Re: [R] Odd behaviour of as.POSIXct

2011-07-16 Thread Hadley Wickham
Also, if we make days a list, the class attributes are kept when looping over the list, ie. days <- list( as.Date( c("2000-01-01", "2000-01-02") ) ) Do you realise that that's a list with length one? I suspect you want days <- as.list( as.Date( c("2000-01-01", "2000-01-02") ) ) for (day in days) {
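
The distinction between list() and as.list(), and its effect on a for loop, can be checked with a short sketch (variable names are illustrative):

```r
dates <- as.Date(c("2000-01-01", "2000-01-02"))

one  <- list(dates)     # a list holding the whole vector: length 1
each <- as.list(dates)  # one list element per date: length 2

# Looping over the bare vector strips the Date class ...
seen_vec <- character()
for (day in dates) seen_vec <- c(seen_vec, class(day))

# ... while looping over the list keeps each element a Date.
seen_list <- character()
for (day in each) seen_list <- c(seen_list, class(day))
```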

Re: [R] Save generic plot to file (before rendering to device)

2011-07-16 Thread Hadley Wickham
Thank you, this is very helpful. One final question regarding this method: suppose a function prints multiple plots, i.e. multiple pages to a PDF. Is it possible to record all of these plots at once? The code below only records the final plot. I would like to record all of them, without

Re: [R] squared pie chart - is there such a thing?

2011-07-21 Thread Hadley Wickham
This is called a squarified pie chart or a waffle chart (if you want to keep the food metaphor going): http://eagereyes.org/communication/Engaging-readers-with-square-pie-waffle-charts.html Hadley On Thu, Jul 21, 2011 at 10:29 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:

Re: [R] Reading name-value data

2011-07-28 Thread Hadley Wickham
Use plyr::rbind.fill? That does match up columns by name. Hadley On Thu, Jul 28, 2011 at 5:23 PM, Stavros Macrakis macra...@alum.mit.edu wrote: I have a file of data where each line is a series of name-value pairs, but where the names are not necessarily the same from line to line, e.g.    

Re: [R] 'breackpoints' (package 'strucchange'): 2 blocking error messages when using for multiple regression model testing

2011-07-29 Thread Hadley Wickham
struc.test <- breakpoints(y~x1+x2+x3+x3+x4, data=D) *I get an error message:*  Erreur dans chol2inv(qr.R(fm$qr)) :  l'élément (5, 5) est nul, donc l'inverse ne peut être calculé [Error in chol2inv(qr.R(fm$qr)) : element (5, 5) is zero, so the inverse cannot be computed] (sorry for the french version, I don't know how to get the message english translation in R). My first

Re: [R] http://www.r-project.org/contributors.html: display problem, because no character set is defined

2011-07-29 Thread Hadley Wickham
And I think Uwe is missing from that list! Hadley On Fri, Jul 29, 2011 at 3:34 PM, Paul Menzel paulepan...@users.sourceforge.net wrote: Dear R webmasters, my browser defaults to the charset UTF-8 and since [1] seems to be encoded in ISO-8859-1 the umlauts are not displayed correctly. It

[R] [R-pkgs] plyr version 1.6

2011-07-30 Thread Hadley Wickham
# plyr plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to: * fit the same model each patient subsets of

Re: [R] Reading name-value data

2011-08-01 Thread Hadley Wickham
if give non-data-frames. -s On Thu, Jul 28, 2011 at 19:30, Hadley Wickham had...@rice.edu wrote: Use plyr::rbind.fill? That does match up columns by name. Hadley On Thu, Jul 28, 2011 at 5:23 PM, Stavros Macrakis macra...@alum.mit.edu wrote: I have a file of data where each line

Re: [R] Utilizing column names to multiply over all columns

2011-08-16 Thread Hadley Wickham
You will get the warning that the last column is not going right but otherwise this returns what you asked for: sapply(1:length(mydf), function(i) mydf[[i]]* as.numeric(names(mydf)[i])  ) This suits my purposes well with a couple slight modifications: ## I made this into a data.frame so I

Re: [R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

2011-08-17 Thread Hadley Wickham
The following example does what you want using ddply: library(plyr) edfPerGroup = ddply(df, .(Group), summarise, edf = edf(Value), Value = Value) Or slightly more succinctly: ddply(df, .(Group), mutate, edf = edf(Value)) Hadley -- Assistant Professor / Dobelman Family Junior Chair

Re: [R] Persistent storage between package invocations

2011-03-16 Thread Hadley Wickham
No.  First, please use path.expand("~") for this, and it does not necessarily mean the home directory (and in principle it might not expand at all).  In practice I think it will always be *a* home directory, but on Windows there may be more than one (and watch out for local/roaming profile

Re: [R] File Save As...

2011-03-16 Thread Hadley Wickham
No, defaults are evaluated in the evaluation frame of the function. That's why you can use local variables in them, e.g. the way rgamma uses 1/rate as a default for scale. Oops, yes, I was getting confused with promises - non-missing arguments are promises evaluated in the parent frame. But
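
A small sketch of the two evaluation rules mentioned here; the functions `g` and `h` are made up for illustration:

```r
# Default arguments are evaluated inside the function's own frame,
# so they can see local variables created in the body.
g <- function(x, n = length(y)) {
  y <- c(x, x)   # local variable, created before n is first used
  n
}

# Supplied arguments are promises evaluated in the caller's frame.
h <- function(arg) {
  y <- 100       # this local y is not seen by the caller's expression
  arg
}

y <- 1
from_default  <- g(1:3)  # uses the local y inside g
from_supplied <- h(y)    # uses the caller's y
```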

Re: [R] proportional symbol map ggplot

2011-03-16 Thread Hadley Wickham
On Mon, Mar 14, 2011 at 9:41 AM, Strategische Analyse CSD Hasselt csd...@fedpolhasselt.be wrote: Hello, we want to plot a proportional symbol map with ggplot. Symbols' area should have the same proportions as the scaled variable. Hereby an example we found on

Re: [R] Does R have a const object?

2011-03-16 Thread Hadley Wickham
It's useful for being able to set defaults for arguments that do not have defaults.  That cannot break existing programs. Until the next program decides to change those defaults and either can't or does and you end up with incompatible assumptions.  It also makes the code with the added

Re: [R] assigning to list element within target environment

2011-03-17 Thread Hadley Wickham
On Thu, Mar 17, 2011 at 7:25 AM, Richard D. Morey r.d.mo...@rug.nl wrote: I would like to assign a value to an element of a list contained in an environment. The list will contain vectors and matrices. Here's a simple example: # create toy environment testEnv = new.env(parent = emptyenv())

Re: [R] Strange R squared, possible error

2011-03-17 Thread Hadley Wickham
2) I don't want to fit data with linear model of zero intercept. 3) I don't know if I understand correctly. I'm 100% sure the model for my data should have zero intercept. The only coordinate which I'm 100% sure is correct. If I had measured quality Y of a same sample X0 number of times I would

Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-22 Thread Hadley Wickham
I don't doubt that R may be the most popular in terms of discussion group traffic, but you should be aware that the traffic for SAS comprises two separate lists that used to be mirrored, but are no longer linked Usenet --  news://comp.soft-sys.sas  (what you counted) listserve -- SAS-L

Re: [R] subset and as.POSIXct / as.POSIXlt oddness

2011-03-24 Thread Hadley Wickham
On Thu, Mar 24, 2011 at 8:29 AM, Michael Bach pha...@gmail.com wrote: Dear R users, Given this data: x <- seq(1,100,1) dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00") dfx <- data.frame(dx) Now to play around for example: subset(dfx, dx as.POSIXct("2007-06-01 16:00:00")) Ok. Now for

Re: [R] How create vector that sums correct responses for multiple subjects?

2011-03-24 Thread Hadley Wickham
On Thu, Mar 24, 2011 at 2:24 PM, Kevin Burnham kburn...@gmail.com wrote: I have a data file which indicates pretest scores for a linguistics experiment.  The data are in long form so for each of 33 subjects there are 400 rows, one for each item on the test, and there is a column called

Re: [R] merging data list in to single data frame

2011-04-04 Thread Hadley Wickham
filelist = list.files(pattern = "K*cd.txt") # the file names are K1cd.txt to K200cd.txt It's very easy: names(filelist) <- basename(filelist) data_list <- ldply(filelist, read.table, header=T, comment=";", fill=T) Hadley -- Assistant Professor / Dobelman Family Junior Chair

Re: [R] R licence

2011-04-07 Thread Hadley Wickham
If all you need is loess, I suspect it would be cheaper to re-write it in C# than to get a considered legal opinion on the matter. Hadley On Thu, Apr 7, 2011 at 2:45 AM, Stanislav Bek stanislav.pavel@gmail.com wrote: Hi, is it possible to use some statistic computing by R in proprietary

Re: [R] Windrose Percent Interval Frequencies Are Non Linear! Help!

2011-04-07 Thread Hadley Wickham
Does anyone with specific windrose experience know how to adjust the graphic such that the data and the percent intervals are evenly spaced? Hopefully I am making sense here How about giving us a reproducible example? Code is better than mere description; code + description is best.

[R] [R-pkgs] plyr: version 1.5

2011-04-11 Thread Hadley Wickham
# plyr plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to: * fit the same model each patient subsets of

Re: [R] Fwd: CRAN problem with plyr-1.4.1

2011-04-12 Thread Hadley Wickham
Then, can we have the ERROR message, please? Otherwise the only explanation I can guess is that a mirror grabs the contents of a repository exactly in the second the repository is updated and that is unlikely, particularly if more than one mirror is involved. Isn't one possible explanation

Re: [R] Is there a better way to parse strings than this?

2011-04-13 Thread Hadley Wickham
On Wed, Apr 13, 2011 at 5:18 AM, Dennis Murphy djmu...@gmail.com wrote: Hi: Here's one approach: strings <- c("A5.Brands.bought...Dulux", "A5.Brands.bought...Haymes", "A5.Brands.bought...Solver", "A5.Brands.bought...Taubmans.or.Bristol", "A5.Brands.bought...Wattyl", "A5.Brands.bought...Other")

Re: [R] R plots pdf() does not allow spotcolors?

2011-04-13 Thread Hadley Wickham
Even so, this would depend on what your publisher/printer requires in what you submit. It would be important to obtain from them a full and exact specification of what they require for colour printing in files submitted to them for printing. No one else has mentioned this, but the publisher

[R] Line plots in base graphics

2011-04-13 Thread Hadley Wickham
Am I missing something obvious on how to draw multi-line plots in base graphics? In ggplot2, I can do: data(Oxboys, package = "nlme") library(ggplot2) qplot(age, height, data = Oxboys, geom = "line", group = Subject) But in base graphics, the best I can come up with is this: with(Oxboys,

Re: [R] Line plots in base graphics

2011-04-13 Thread Hadley Wickham
On Wed, Apr 13, 2011 at 2:58 PM, Ben Bolker bbol...@gmail.com wrote: Hadley Wickham hadley at rice.edu writes: Am I missing something obvious on how to draw multi-line plots in base graphics? In ggplot2, I can do: data(Oxboys, package = "nlme") library(ggplot2) qplot(age, height, data

Re: [R] Is there a better way to parse strings than this?

2011-04-14 Thread Hadley Wickham
I was trying strsplit(string,"\.\.\.") as per the suggestion in Venables and Ripley's book to (use '\.' to match '.'), which is in the Regular expressions section. I noticed that in the suggestions sent to me people used: strsplit(test,"\\.\\.\\.") Could anyone please explain why I should have
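
A brief sketch of why the doubled backslash is needed: the R string parser consumes one level of escaping before the regular expression engine sees the pattern. The input strings are taken from the earlier message in this thread; `fixed = TRUE` is shown as an alternative that avoids regular expressions altogether.

```r
strings <- c("A5.Brands.bought...Dulux", "A5.Brands.bought...Haymes")

# "\\." in R source is the two-character regex \. (a literal dot).
parts <- strsplit(strings, "\\.\\.\\.")
brands <- sapply(parts, `[`, 2)

# With fixed = TRUE the pattern is matched literally, no escaping.
parts_fixed <- strsplit(strings, "...", fixed = TRUE)
```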

Re: [R] (no subject)

2011-04-18 Thread Hadley Wickham
Yes, it's fixed and a new version of plyr has been pushed up to cran - hopefully will be available for download soon. In the meantime, I think you can fix it by running library(stats) before library(ggplot2). Hadley On Sun, Apr 17, 2011 at 3:51 PM, Bryan Hanson han...@depauw.edu wrote: Is

Re: [R] taking rows from data.frames in list to form new data.frame?

2011-04-21 Thread Hadley Wickham
On Wed, Apr 20, 2011 at 6:36 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: Perhaps you're looking for subset()? I'm not sure I understand the problem completely, but is do.call(rbind, lapply(database, function(df) subset(df, Symbol == 'IBM'))) or library(plyr) ldply(lapply(database,

Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Hadley Wickham
If you need plyr for other tasks you ought to use a different class for your date data (or wait until plyr can deal with POSIXlt objects). How do you get POSIXlt objects into a data frame? df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01")))) str(df) 'data.frame': 1 obs. of 1 variable:

Re: [R] Empty Data Frame

2011-04-27 Thread Hadley Wickham
On Wed, Apr 27, 2011 at 4:58 AM, Dennis Murphy djmu...@gmail.com wrote: Hi: You could try something like df <- data.frame( expand.grid( Week = 1:52, Year = 2002:2011 )) expand.grid already returns a data frame... You might want KEEP.OUT.ATTRS = F though. Even if it feels like you are yelling
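
A quick check of both remarks, using the same Week/Year grid (the object name `grid` is illustrative):

```r
# expand.grid() returns a data frame directly; no data.frame() wrapper
# is needed. KEEP.OUT.ATTRS = FALSE drops the extra "out.attrs" attribute.
grid <- expand.grid(Week = 1:52, Year = 2002:2011,
                    KEEP.OUT.ATTRS = FALSE)
```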

Re: [R] setting options only inside functions

2011-04-27 Thread Hadley Wickham
This has the side effect of ignoring errors and even hiding the error messages.  If you are concerned about multiple calls to on.exit() in one function you could define a new function like  withOptions <- function(optionList, expr) {   oldOpts <- options(optionList)  
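
The quoted function is cut off in this listing. One plausible way to complete it is sketched below; the body after the first line is an assumption, not the original author's code.

```r
# Sketch: temporarily set options, evaluate expr, then restore.
withOptions <- function(optionList, expr) {
  oldOpts <- options(optionList)  # options() returns the old values
  on.exit(options(oldOpts))       # restored even if expr fails
  expr                            # lazily evaluated with new options set
}

# Usage: digits is 3 only while the expression is evaluated.
inside <- withOptions(list(digits = 3), getOption("digits"))
```

Because `expr` is a promise, it is not evaluated until after options() has been called, and errors in it propagate normally while on.exit() still restores the old settings.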

Re: [R] MASS fitdistr with plyr or data.table?

2011-04-27 Thread Hadley Wickham
On Wed, Apr 27, 2011 at 3:55 PM, Justin Haynes jto...@gmail.com wrote: I am trying to extract the shape and scale parameters of a wind speed distribution for different sites.  I can do this in a clunky way, but I was hoping to find a way using data.table or plyr.  However, when I try I am met

Re: [R] setting options only inside functions

2011-04-27 Thread Hadley Wickham
Put together a list and we can see what might make sense.  If we did take this on it would be good to think about providing a reasonable mechanism for addressing the small flaw in this function as it is defined here. In devtools, I have: #' Evaluate code in specified locale. with_locale <-

Re: [R] Simple loop

2011-05-07 Thread Hadley Wickham
Using paste(Site,Prof) when calling ave() is ugly, in that it forces you to consider implementation details that you expect ave() to take care of (how does paste convert various types to strings?).  It also courts errors  since paste("A B", "C") and paste("A", "B C") give the same result but
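
The collision can be demonstrated with a tiny made-up data frame (the values are chosen to trigger it and are not from the thread); passing the grouping variables to ave() separately avoids the problem:

```r
df <- data.frame(Site = c("A B", "A", "A B", "A"),
                 Prof = c("C", "B C", "C", "B C"),
                 v = 1:4)

# Both groups paste to the same key "A B C", so they are merged.
by_paste <- ave(df$v, paste(df$Site, df$Prof), FUN = sum)

# Separate grouping arguments keep the two groups distinct.
by_both <- ave(df$v, df$Site, df$Prof, FUN = sum)
```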

Re: [R] ddply from plyr package - any alternatives?

2011-08-25 Thread Hadley Wickham
z <- ddply(past, c("GEO_CNTRY_NAME","PROD_SEG_NAME"),  function(x) summary(lm(VAL~fy,x))$r.squared) But ave is not exactly doing what I need. Above code runs under a minute for my data set whereas ave runs over 8 mins. It's hard to know without a reproducible example, but I doubt that ddply

Re: [R] how to read a group of files into one dataset?

2011-08-25 Thread Hadley Wickham
# Method 2: Use the plyr package library('plyr') bdf - ldply(mlply(files, read.csv, header = TRUE), rbind) Or just bdf - ldply(files, read.csv, header = TRUE) Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
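
For readers without plyr, roughly the same result can be sketched in base R with lapply() and rbind(). The two small files written to a temporary directory below are made up so the example is self-contained:

```r
# Create two tiny CSV files in a temporary directory (illustrative data).
dir <- tempfile("csvdemo")
dir.create(dir)
write.csv(data.frame(id = 1:2, v = c(0.5, 1.5)),
          file.path(dir, "K1cd.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, v = c(2.5, 3.5)),
          file.path(dir, "K2cd.csv"), row.names = FALSE)

# Read every matching file, then stack the pieces into one data frame.
files <- list.files(dir, pattern = "^K[0-9]+cd\\.csv$", full.names = TRUE)
names(files) <- basename(files)
pieces <- lapply(files, read.csv, header = TRUE)
bdf <- do.call(rbind, pieces)
```

Unlike ldply, this does not add a column identifying the source file; the file names end up in the row names instead.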

Re: [R] x %% y as an alternative to which( x y)

2011-09-13 Thread Hadley Wickham
Because in coding, I often end up with big chunks looking like this: ((mydataframeName$myvariableName 2 !is.na(mydataframeName$myvariableName)) (mydataframeName$myotherVariableName == male !is.na(mydataframeName$myotherVariableName))) Which is much less

[R] Quelplot

2011-09-21 Thread Hadley Wickham
Hi all, Does anyone have an R implementation of the quelplot (K. M. Goldberg and B. Iglewicz. Bivariate extensions of the boxplot. Technometrics, 34(3):pp. 307–320, 1992)? I'm struggling with the estimation of the asymmetry parameters. Hadley -- Assistant Professor / Dobelman Family Junior

Re: [R] Quelplot

2011-09-22 Thread Hadley Wickham
. Best Wishes, Boris On Wed, Sep 21, 2011 at 3:11 PM, Hadley Wickham had...@rice.edu wrote: Hi all, Does anyone have an R implementation of the quelplot (K. M. Goldberg and B. Iglewicz. Bivariate extensions of the boxplot. Technometrics, 34(3):pp. 307–320, 1992)?  I'm struggling

Re: [R] remove NaN from element in a vector in a list

2011-09-27 Thread Hadley Wickham
apply(mt, 1, function(x) x[!is.nan(x)] ) [[1]] [1] 1 3 [[2]] [1] 4 5 6 You need to be a little careful with apply: mt2 <- matrix(c(1,4,2,5,3,6),2,3) apply(mt2, 1, function(x) x[!is.nan(x)] ) [,1] [,2] [1,] 1 4 [2,] 2 5 [3,] 3 6 Depending on the input you will get a
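The caveat above can be checked directly. In this sketch the matrix mt is an assumption, chosen to be consistent with the printed output (the original definition is not shown in the preview), and the lapply() variant is one possible way to get a list every time:

```r
mt  <- matrix(c(1, 4, NaN, 5, 3, 6), 2, 3)  # assumed input: row 1 contains a NaN
mt2 <- matrix(c(1, 4, 2, 5, 3, 6), 2, 3)    # no NaN anywhere

drop_nan <- function(x) x[!is.nan(x)]

# Rows of different length after filtering: apply() returns a list.
res_list <- apply(mt, 1, drop_nan)

# Rows of equal length: apply() simplifies to a (transposed) matrix.
res_matrix <- apply(mt2, 1, drop_nan)

# Looping over row indices with lapply() returns a list in both cases.
rows <- lapply(seq_len(nrow(mt2)), function(i) drop_nan(mt2[i, ]))
```

The return type of apply() depends on the data, so code that consumes the result has to handle both shapes unless a list is forced as in the last line.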

Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Hadley Wickham
On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov wrote:  I'm interested in creating a graphic -like- this: c <- ggplot(mtcars, aes(qsec, wt)) c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2, alpha = 0.2) but I need to show 2 sets of bands (with different

Re: [R] ggplot2: expression() in legend labels?

2011-10-04 Thread Hadley Wickham
You need to set the labels... Hadley On Sat, Sep 24, 2011 at 3:49 AM, Casper Ti. Vector caspervec...@gmail.com wrote: Is there any way to use expression() in legend labels with ggplot2? It seems that things like scale_shape_manual(value = c(   x = expression(italic(x)),   y =

Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Hadley Wickham
# Function to compute quantiles and return a data frame g <- function(d) {   qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95))))   names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '')   qq   } You could cut out the melt step by making this return a data frame: g <-

Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Hadley Wickham
On Mon, Oct 10, 2011 at 4:14 PM, Joshua Wiley jwiley.ps...@gmail.com wrote: I could be waay off base here, but my concern about presplitting the data is that you will have your data, and a second copy of our data that is something like a list where each element contains the portion of the

Re: [R] assigning NULL to a list element

2012-02-18 Thread Hadley Wickham
On Fri, Feb 17, 2012 at 7:51 PM, Benilton Carvalho beniltoncarva...@gmail.com wrote: Hi everyone, For reasons beyond the scope of this message, I'd like to append a NULL element to the end of a list. tmp0 <- list(a=1, b=NULL, c=3) append(tmp0, c(d=4)) ## works as expected append(tmp0,
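The question is truncated, so the reply is not visible here. As a sketch of standard base R behaviour (not necessarily the answer given in the thread), a NULL element can be appended by wrapping it in list(), whereas assigning NULL with $ does not add anything:

```r
tmp0 <- list(a = 1, b = NULL, c = 3)

# Concatenating a one-element list keeps the NULL as a real element.
tmp1 <- c(tmp0, list(d = NULL))

# Single-bracket assignment of list(NULL) also appends a NULL element.
tmp2 <- tmp0
tmp2["d"] <- list(NULL)

# By contrast, `$<-` with NULL means "remove", so nothing is added.
tmp3 <- tmp0
tmp3$d <- NULL
```

The names tmp1, tmp2 and tmp3 are introduced only for this illustration.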

[R] [R-pkgs] devtools 0.6

2012-03-03 Thread Hadley Wickham
# devtools The aim of `devtools` is to make your life as a package developer easier by providing R functions that simplify many common tasks. Devtools is opinionated about how to do package development, and requires that you use `roxygen2` for documentation and `testthat` for testing. Future

Re: [R] Improving help in R

2012-03-17 Thread Hadley Wickham
One difficulty in getting the help pages to look beautiful is that the original input is so inconsistent, and package authors (naturally) get upset when CRAN starts rejecting packages because of errors that used to be ignored.  The current output is definitely a compromise aimed at making most

Re: [R] Year of data collection for 'diamonds' dataset in ggplot2

2012-03-27 Thread Hadley Wickham
I believe it was 2008. Hadley On Mon, Mar 26, 2012 at 11:46 AM, Marina Doucerain marinadoucer...@gmail.com wrote: Hello, I'm wondering what was the year (or year range) of collection for the data included in the 'diamonds' dataset in ggplot2. This information would be very helpful in

Re: [R] Appropriate method for sharing data across functions

2012-04-05 Thread Hadley Wickham
Why not pass around a reference class? Hadley On Thu, Apr 5, 2012 at 3:20 PM, John C Nash nas...@uottawa.ca wrote: In trying to streamline various optimization functions, I would like to have a scratch pad of working data that is shared across a number of functions. These can be called from
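A minimal sketch of what passing around a reference class might look like for a shared scratch pad. The class name Scratch, the field evals and the helper count_eval() are all hypothetical, chosen only to illustrate the idea:

```r
library(methods)

# A reference class with one numeric field; objects have reference semantics.
Scratch <- setRefClass("Scratch", fields = list(evals = "numeric"))

# A function that updates the shared scratch pad in place.
count_eval <- function(pad) {
  pad$evals <- pad$evals + 1
  invisible(NULL)
}

pad <- Scratch$new(evals = 0)
count_eval(pad)
count_eval(pad)
pad$evals  # both calls modified the same object
```

Because the object is not copied when it is passed to a function, every function that receives pad sees and updates the same working data.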
