Re: [R] Problem of vocabulary : retrieve element of a list of a list
On 2007-August-31, at 10:17, Ptit_Bleu wrote:
>> x<-list(LETTERS[1:5], LETTERS[10:20])

I am not sure I understood exactly what you meant. If you want to search for the "D" in the list,

    lapply(x,charmatch,"D")

should get you started. If you just want to know the syntax to extract an element from a list,

    x[[1]][4]

will get you the "D".

But I am sure you would have found this out by reading the manual carefully. Maybe you should read an R introduction and practice on the examples there rather than go straight to your own data. It would take a week at most and is very rewarding in the long term.

An introduction in English:
http://cran.r-project.org/doc/manuals/R-intro.pdf
A nice one in French:
http://www.cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf

Cheers,

JiHO
---
http://jo.irisson.free.fr/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
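To make the difference between the two bracket forms concrete, here is a minimal sketch built on the list from the question. The names `first`, `sub`, `d` and `pos` are only illustrative.

```r
x <- list(LETTERS[1:5], LETTERS[10:20])

first <- x[[1]]     # double brackets return the element itself (a character vector)
sub   <- x[1]       # single brackets return a list of length 1
d     <- x[[1]][4]  # fourth value of the first element, i.e. "D"

# Position of "D" inside each element (empty result when it is absent)
pos <- lapply(x, function(v) which(v == "D"))
```

Using `which()` here is one possible alternative to `charmatch`; it returns the index directly and an empty vector when there is no match.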
Re: [R] Excel
On 2007-August-31, at 00:13, David Scott wrote:
> On Thu, 30 Aug 2007, Duncan Murdoch wrote:
>> On 8/28/2007 3:16 AM, J Dougherty wrote:
>>> On Monday 27 August 2007 22:21, David Scott wrote:
>>>> On Tue, 28 Aug 2007, Robert A LaBudde wrote:
>>>>> If you format the column as "Text", you won't have this problem. By leaving the cells as "General", you leave it up to Excel to guess at the correct interpretation.
>>>>
>>>> Not true actually. I had converted the column to Text because I saw the interpretation as a date in the .xls file. I saved the .csv file *after* the column had been converted to Text. Looking at the .csv file in a text editor, the entry is correct.
>>>>
>>>> I have just rechecked this.
>>>>
>>>> On reopening the .csv using Excel, the entry AUG2699 had been interpreted as a date, and was showing as Aug-99. Most bizarre is that the NHI value of AUG1838 has *not* been interpreted as a date.
>>>
>>> Actually, in Excel 2000, he's right. What you have to be sure of is that the "'" that denotes a text entry precedes EVERY entry that can be confused with a date. Selecting the entire column and setting the format to "text" *before* data is entered does this. It will also create an appropriate *.csv file. Excel is notable too because it will automatically convert "date-like" entries as you type. In a column of IDs or similar critical data, that behaviour is really bad. I have never tried the MS site, but I haven't been able to find any entry about how to turn that particular automatic behaviour off.
>>>
>>> However, while I have not experimented extensively, as far as I have experimented, OpenOffice spreadsheet does not behave this way.
>>
>> I don't use Excel, but in OpenOffice 2.2.1 the ' is lost when a file is saved as .csv and reloaded. So if I take care and enter
>>
>> 'November 15
>>
>> in a cell, then save it, OO will change it to 11/15/2007 when I reload. I can override this change by manually changing "Standard" format to "Text" *every time* I load the file. There's a help index entry "date formats;avoiding conversion to", but it offers no more help than "add an apostrophe at the beginning of the entry".
>>
>> This is brain-dead behaviour.
>
> This was the behaviour that really scared me in Excel: saving as .csv loses any formatting (it is just an ascii file, how can it have formatting info?). Then opening in Excel (or it seems OO), the incorrect date interpretation occurs. If I then save the .csv I have erroneous data.
>
> I often do just this sort of thing because I get given data in .xls, it has clunky column names or extraneous stuff so I alter it, save it as .csv. Then I get a data correction, some clarification of a value, so I want to go to the .csv to correct that data value. Once I do that, if I am not *extremely* careful before saving the .csv file, I have a problem.

I'll probably advise everyone to use Gnumeric then:
- entries such as 2005/06/08 are interpreted as dates and shown as 8/6/2005, but even if you change them to 8/7/05, for example, they will be written to the csv in your original format, with the change included (i.e. 2005/07/08 here)
- entries with several decimals such as 1.4563 can be formatted to be displayed as 1.46 but will still be written as 1.4563 in the csv
- there is no text import/export dialog when opening or closing csv files, which speeds things up quite a bit, but you can get the dialog if you are so inclined

Still some problems:
- "0568" in the csv, which is a label (notice the quotes and leading zero), is still interpreted as a number by default
- the date is in fact written using the default preferences (namely yyyy/mm/dd), so a date in ISO format (yyyy-mm-dd) is converted to yyyy/mm/dd when written to csv

So not perfect, but much better (and quicker and possibly more precise) than both Excel and OO Calc. Oh, and cross-platform also ;).

JiHO
---
http://jo.irisson.free.fr/
[R] Variance explained by cluster analysis
Hello,

As suggested in "De'ath, 2002. Multivariate regression trees: A new technique for modelling species-environment relationships. Ecology, 83 (4):1105-1117" (for those interested), I am trying to compare the performance of a multivariate regression tree to a cluster analysis. A simple partitioning with k clusters (as done by `pam`) seemed straightforward and appropriate to compare to an MRT with k leaves.

Now I am looking for a measure of how much variance each of these methods explains. The MRT analysis provides me with such a measure. I was wondering what I could use in a cluster analysis. When plotting the pam object with which.plots=clusplot, there is a message at the bottom of the plot: "These two components explain x% of the point variability". Can I safely assume that this is a percentage of variance explained by the k clusters? Is there anything else that I could compute?

More generally, am I totally wrong in comparing these two methods? Are there some references particularly appropriate to this? (NB: I am already hunting down the Kaufman, L. and Rousseeuw book)

Thank you in advance for your help.

JiHO
---
http://jo.irisson.free.fr/
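One quantity that could play the role of "variance explained" for a partition is the share of the total sum of squares that lies between clusters. The sketch below is only an illustration under stated assumptions: the data are simulated, `kmeans` stands in for `pam` so that the example needs base R only, and the helper `ss` is hypothetical. Note that the clusplot message refers to the two principal components used to draw the plot, not to the clusters themselves.

```r
set.seed(1)
# Simulated data: two groups of 50 points in 2 dimensions (assumption)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# Any vector of cluster labels works here; pam(X, 2)$clustering could be
# substituted for the kmeans labels
km <- kmeans(X, centers = 2)
cl <- km$cluster

# Sum of squared deviations from the column means
ss <- function(m) sum(scale(m, center = TRUE, scale = FALSE)^2)

total_ss  <- ss(X)
within_ss <- sum(sapply(split(seq_len(nrow(X)), cl),
                        function(idx) ss(X[idx, , drop = FALSE])))

# Proportion of the total sum of squares accounted for by the partition
explained <- 1 - within_ss / total_ss
```

Because it only needs the labels, the same computation can be applied to the leaves of a tree and to the clusters of a partition, which keeps the two numbers on the same scale.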
[R] superposing lattice plots
Hello everyone,

I am sorry if this has already been asked but I can't find it. I want to superpose two lattice plots, namely a levelplot and a contourplot of two different variables with the same x-y scale. I found information about panel.superpose but it does not seem to correspond to what I want (I have two different variables, not groups of the same variable).

How can I do this? Is there a way to concatenate the two trellis objects and plot that?

Simple example using simulated data:

    x=seq(-5,5,length.out=100)
    y=seq(-2,2,length.out=60)
    mat1=cos(x)%*%t(cos(y))
    mat2=cos(x)%*%t(sin(y))
    levelplot(mat1)
    contourplot(mat2)

I would like both plots to appear superposed.

PS: accessory question, for the enthusiast ;). When the data is contained in a matrix and the x-y coordinates in separate vectors, as above, is there a way to get level/contourplot to use x and y as the coordinate vectors other than by "unrolling" the matrix into a data.frame:

    x y mat
    1 1 0.125
    1 2 0.1367
    1 3 0.2345

and using mat ~ x*y ?

Thank you in advance. Sincerely,

JiHO
---
http://jo.irisson.free.fr/
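For the "unrolling" mentioned in the PS, a minimal sketch using `expand.grid` is given below. It only covers the reshaping step, not the superposition itself, and the data frame name `d` is an assumption. It relies on the fact that `expand.grid` varies its first argument fastest, which matches the column-major order in which `as.vector` reads a matrix whose rows follow `x`.

```r
x <- seq(-5, 5, length.out = 100)
y <- seq(-2, 2, length.out = 60)
mat1 <- cos(x) %*% t(cos(y))   # 100 x 60 matrix, rows follow x, columns follow y

# One row per (x, y) combination, x varying fastest
d <- expand.grid(x = x, y = y)
# as.vector() reads the matrix column by column, i.e. also with x varying fastest
d$mat <- as.vector(mat1)
```

The resulting long table can then be used with the formula interface mentioned above, `mat ~ x * y`.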
Re: [R] apply, lapply and data.frame in R 2.5
On 2007-July-30, at 12:20, Prof Brian Ripley wrote:
> On Mon, 30 Jul 2007, jiho wrote:
>> A recent (in 2.5 I suspect) change in R is giving me trouble. I want to apply a function (tolower) to all the columns of a data.frame and get a data.frame in return.
>> Currently, on a data.frame, both apply (for arrays) and lapply (for lists) work, but each returns its native class (resp. matrix and list):
>>
>> apply(mydat,2,tolower)   # gives a matrix
>> lapply(mydat,tolower)    # gives a list
>> and
>> sapply(mydat,tolower)    # gives a matrix
>
> which is exactly what R 2.0.0 did, so no recent(ish) change at all.
>
>> If I remember well, apply did not used to work on data.frames and lapply returned a data.frame when it was provided with one, with the same properties (columns classes etc). At least this is what my code written with R 2.4.* suggests.
>
> apply has coerced data frames for many years and lapply always returned a list. The solution has always been
>
> mydat[] <- lapply(mydat,tolower)

Sorry about that, my previous code was misleading and indeed your code above does exactly what I need. I should have tested this a bit further before posting. I was just afraid to install two different R versions, I guess.

Thank you again.

JiHO
---
http://jo.irisson.free.fr/
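A small illustration of why the `mydat[] <-` form keeps the data frame: assigning into `mydat[]` replaces the columns in place, so the object stays a data.frame with the same names. The example data and the restriction to character columns (through the hypothetical helper vector `is_chr`) are assumptions added here; `tolower` would otherwise turn a numeric column into character.

```r
mydat <- data.frame(a = c("X", "Y"), b = c("Foo", "BAR"), n = 1:2,
                    stringsAsFactors = FALSE)

# Apply tolower only to the character columns, keeping the data.frame structure
is_chr <- vapply(mydat, is.character, logical(1))
mydat[is_chr] <- lapply(mydat[is_chr], tolower)
```

When every column should be transformed, `mydat[] <- lapply(mydat, tolower)` as quoted above is the shorter form.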
[R] apply, lapply and data.frame in R 2.5
Hello everyone,

A recent (in 2.5 I suspect) change in R is giving me trouble. I want to apply a function (tolower) to all the columns of a data.frame and get a data.frame in return.
Currently, on a data.frame, both apply (for arrays) and lapply (for lists) work, but each returns its native class (resp. matrix and list):

    apply(mydat,2,tolower)   # gives a matrix
    lapply(mydat,tolower)    # gives a list
and
    sapply(mydat,tolower)    # gives a matrix

If I remember well, apply did not use to work on data.frames and lapply returned a data.frame when it was provided with one, with the same properties (column classes etc). At least this is what my code written with R 2.4.* suggests.

The solution would be:

    as.data.frame(apply(mydat,2,tolower))
or
    as.data.frame(lapply(mydat,tolower))

But this does not keep column attributes (all columns are reinterpreted, for example strings are converted to factors etc). For my particular use stringsAsFactors=FALSE does what I need, but I am wondering whether there is a more general solution to apply a function to all elements of a data.frame and get a similar data.frame in return. Indeed data.frames are probably the most common object in R and applying a function to each of their columns/variables appears to me as something one would want to do quite often.

Thank you in advance.

JiHO
---
http://jo.irisson.free.fr/
[R] x,y,z table to matrix with x as rows and y as columns
Hello all,

I am sure I am missing something obvious but I cannot find the function I am looking for. I have a data frame with three columns: X, Y and Z, with X and Y being grid coordinates and Z the value associated with these coordinates. I want to transform this data frame into a matrix of Z values, on the grid defined by X and Y (and, as a plus, fill the X.Y combinations which do not exist in the original data frame with NAs in the resulting matrix). I could do this manually but I guess the appropriate function should be somewhere around. I just can't find it.

Thank you in advance for your help.

JiHO
---
http://jo.irisson.free.fr/
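One possible way to do the "manual" version in a few lines of base R is matrix indexing: build an NA-filled matrix on the grid and fill in the known cells. This is only a sketch under assumptions: the data frame `df` and its values are made up, and each (X, Y) pair is assumed to occur at most once.

```r
# Hypothetical x,y,z table with two missing grid combinations
df <- data.frame(X = c(1, 1, 2, 3),
                 Y = c(10, 20, 10, 20),
                 Z = c(0.5, 1.5, 2.5, 3.5))

xs <- sort(unique(df$X))   # grid coordinates along rows
ys <- sort(unique(df$Y))   # grid coordinates along columns

# Start from a matrix full of NA, then fill the cells present in df
m <- matrix(NA_real_, nrow = length(xs), ncol = length(ys),
            dimnames = list(as.character(xs), as.character(ys)))
m[cbind(match(df$X, xs), match(df$Y, ys))] <- df$Z
```

Indexing with a two-column matrix of (row, column) positions assigns each Z to its own cell, so combinations absent from the table simply keep their NA.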
Re: [R] Problem with Weighted Variance in Hmisc
Most of these rejections occurred at sites with fewer than 100 samples, in agreement with previous results. Nevertheless, the hypothesis was often rejected at sites with more than 100 samples as well. The maximum error (relative to Mw) in the 95% confidence limits made by assuming a normal distribution of the Mw at the ten sites examined was about 27%. Most such errors were less than 10%, and errors were smaller at sampling sites with > 100 samples than at those with < 100 samples.

Cheers,

JiHO
---
http://jo.irisson.free.fr/
Re: [R] Problem with Weighted Variance in Hmisc
On 2007-June-01, at 01:03, Tom La Bone wrote:
> The function wtd.var(x,w) in Hmisc calculates the weighted variance of x where w are the weights. It appears to me that wtd.var(x,w) = var(x) if all of the weights are equal, but this does not appear to be the case. Can someone point out to me where I am going wrong here? Thanks.

The true formula of weighted variance is this one:
http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weighvar.pdf
But for computation purposes, wtd.var uses another definition which considers the weights as repeats instead of true weights. However, if the weights are normalized (which is what normwt=TRUE does), the two formulas are equal. If you consider the weights as real weights instead of repeats, I would recommend using this option.
With normwt=TRUE, your issue is solved:

    > a=1:10
    > b=a
    > b[]=2
    > b
     [1] 2 2 2 2 2 2 2 2 2 2
    > wtd.var(a,b)
    [1] 8.68421
    # all weights equal 2 <=> there are two repeats of each element of a
    > var(c(a,a))
    [1] 8.68421
    > wtd.var(a,b,normwt=TRUE)
    [1] 9.17
    > var(a)
    [1] 9.17

Cheers,

JiHO
---
http://jo.irisson.free.fr/
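The two behaviours can be reproduced without Hmisc. The function below is a hypothetical re-implementation written for illustration, not the package's code; it assumes that the "repeats" definition divides by `sum(w) - 1` and that normalizing means rescaling the weights so they sum to `length(x)`.

```r
# Hypothetical helper mimicking the two definitions discussed above
weighted_var <- function(x, w, normalize = FALSE) {
  if (normalize) w <- w * length(x) / sum(w)  # rescale weights to sum to length(x)
  xbar <- sum(w * x) / sum(w)                 # weighted mean
  sum(w * (x - xbar)^2) / (sum(w) - 1)        # weights treated as repeat counts
}

a <- 1:10
b <- rep(2, 10)

v_repeats <- weighted_var(a, b)                    # same value as var(c(a, a))
v_normal  <- weighted_var(a, b, normalize = TRUE)  # same value as var(a)
```

With equal weights the normalized version reduces to the ordinary sample variance, which is the behaviour the original question expected.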
Re: [R] Comparing multiple distributions
On 2007-May-31, at 18:56, Bert Gunter wrote:
> While Ravi's suggestion of the "compositions" package is certainly appropriate, I suspect that the complex and extensive statistical "homework" you would need to do to use it might be overwhelming (the geometry of compositions is a simplex, and this makes things hard).

Yes, I am reading the documentation now, which is well written but huge indeed...

> As a simple and perhaps useful alternative, use pairs() or splom() to plot your 5-D data, distinguishing the different treatments via color and/or symbol.
>
> In addition, it might be useful to do the same sort of plot on the first two principal components (?prcomp) of the first 4 dimensions of your 5 component vectors (since the 5th is determined by the first 4). Because of the simplicial geometry, this PCA approach is not right, but it may nevertheless be revealing.
>
> The same plotting ideas are in the compositions package done properly (in the correct geometry), so if you are motivated to do so, you can do these things there. Even if you don't dig into the details, using the compositions package version of the plots may be relatively easy to do, interpretable, and revealing -- more so than my "simple but wrong" suggestions. You can decide.
>
> I would not trust inference using ad hoc approaches in the untransformed data. That's what the package is for. But plotting the data should always be at least the first thing you do anyway. I often find it to be sufficient, too.

Thank you for your suggestions on plotting, I will look into it. I was using histograms of mean proportions + SE until now because it was what seemed the most straightforward given my specific questions.

If we come back to my original data (abandoning the statistical language for a while ;) ) I have proportions of fishes caught 1. near the surface, 2. a bit below, 5. near the bottom. The questions I want to ask are, for example: does the vertical distribution of species A and species B differ? So I can plot the mean proportion at each depth for both species and obtain a visual representation of the vertical distribution of each. At this stage, differences between fishes that accumulate near the surface or near the bottom are quite obvious. If I add error bars I can get an idea of the variability of those distributions.

The issue arises when I want to *test* for a difference between the distributions of species A and B. If I use a basic KS test I can only compare the mean proportions for species A (5 points) to the mean proportions of species B (5 points), and this has low power and does not take into account the variability around those means. In addition, I may also want to know whether there is a difference among species A, B and C, and pairwise KS tests would increase the alpha error risk.

Am I explaining things correctly? Does this seem logical to you too?
As for the PCA, I must admit I don't really understand what you mean.

Thank you very much again.

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of jiho
> Subject: Re: [R] Comparing multiple distributions
>
> Nobody answered my first request. I am sorry if I did not explain my problem clearly. English is not my native language and statistical English is even more difficult. I'll try to summarize my issue in more appropriate statistical terms:
>
> Each of my observations is not a single number but a vector of 5 proportions (which add up to 1 for each observation). I want to compare the "shape" of those vectors between two treatments (i.e. how the quantities are distributed between the 5 values in treatment A with respect to treatment B).
>
> I was pointed to Hotelling T-squared. Does it seem appropriate? Are there other possibilities (I read many discussions about hotelling vs. manova but I could not see how any of those related to my particular case)?
>
> Thank you very much in advance for your insights. See below for my earlier, more detailed, e-mail.
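The exploratory plots suggested in this thread (a scatterplot matrix of the five proportions, then a PCA on the first four components) could be sketched as follows. Everything about the data is an assumption: the proportions are simulated, the two "species" are arbitrary labels, and the plots are sent to a null device so that the example runs without opening a window. As noted in the thread, this PCA ignores the simplex geometry and is meant for a first look only.

```r
set.seed(42)
n <- 30
raw <- matrix(rexp(n * 5), ncol = 5)
props <- raw / rowSums(raw)              # each row is 5 proportions summing to 1
colnames(props) <- paste0("depth", 1:5)
species <- rep(c("A", "B"), each = n / 2)
cols <- ifelse(species == "A", "blue", "red")

pdf(NULL)                                # null device: nothing is written to disk
pairs(props, col = cols, pch = 19)       # scatterplot matrix, coloured by species
pc <- prcomp(props[, 1:4])               # 5th proportion is determined by the first 4
plot(pc$x[, 1:2], col = cols, pch = 19)  # first two principal components
invisible(dev.off())
```

Replacing `pdf(NULL)` with a real file name or removing it altogether would show the plots; with real data, `props` would hold the observed proportions per sampling site.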
Re: [R] Comparing multiple distributions
Nobody answered my first request. I am sorry if I did not explain my problem clearly. English is not my native language and statistical English is even more difficult. I'll try to summarize my issue in more appropriate statistical terms:

Each of my observations is not a single number but a vector of 5 proportions (which add up to 1 for each observation). I want to compare the "shape" of those vectors between two treatments (i.e. how the quantities are distributed between the 5 values in treatment A with respect to treatment B).

I was pointed to Hotelling T-squared. Does it seem appropriate? Are there other possibilities (I read many discussions about hotelling vs. manova but I could not see how any of those related to my particular case)?

Thank you very much in advance for your insights. See below for my earlier, more detailed, e-mail.

On 2007-May-21, at 19:26, jiho wrote:
> I am studying the vertical distribution of plankton and want to study its variations relative to several factors (time of day, species, water column structure etc.). So my data is special in that, at each sampling site (each observation), I don't have *one* number, I have *several* numbers (abundance of organisms in each depth bin, I sample 5 depth bins) which describe a vertical distribution.
>
> Then let's say I want to compare speciesA with speciesB: I would end up trying to compare a group of several distributions with another group of several distributions (where a "distribution" is a vector of 5 numbers: an abundance for each depth bin). Does anyone know how I could do this (with R obviously ;) )?
>
> Currently I kind of get around the problem and:
> - compute mean abundance per depth bin within each group and compare the two mean distributions with a ks.test, but this obviously diminishes the power of the test (I only compare 5*2 "observations")
> - restrict the information at each sampling site to the mean depth weighted by the abundance of the species of interest. This way I have one observation per station but I reduce the information to the mean depths, while the actual distribution is important also.
>
> I know this is probably not directly R related but I have already searched around for solutions and solicited my local statistics expert... to no avail. So I hope that the stats experts on this list will help me.
>
> Thank you very much in advance.

JiHO
---
http://jo.irisson.free.fr/
Re: [R] plot(......,new=T) vs. par(new=T)
On 2007-May-22, at 13:51, John Kane wrote:
> ?par
> There are several parameters that can only be set by a call to par(): "new"
>
> You just were lucky enough to find one.

Yes, sorry about that, I saw this afterwards. I read the help pages a while ago and it seems it's time to take a re-read tour.

Thank you.

JiHO
---
http://jo.irisson.free.fr/
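For reference, a minimal sketch of the working pattern: `new` is set through `par()` before the second call to `plot()`. The data, colours and the null device are assumptions made so that the example is self-contained; `flag` only records the state of the parameter for illustration.

```r
pdf(NULL)                    # null device so the example does not write a file
x <- 1:10

plot(x, x^2, type = "l")     # first plot
par(new = TRUE)              # tell the device not to clear the page for the next plot
flag <- par("new")           # TRUE at this point
plot(x, sqrt(x), type = "l", col = "red",
     axes = FALSE, xlab = "", ylab = "")  # second plot drawn over the first
axis(side = 4)               # separate axis on the right for the second curve
invisible(dev.off())
```

The second plot uses its own coordinate system, which is why its axes are suppressed and redrawn on the right-hand side.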
[R] plot(......,new=T) vs. par(new=T)
Hello everybody,

This is probably a classic but I cannot find an answer to this on the mailing list (i.e. with a google search restricted to the mailing list archive). Setting:

    par(new=T)
    plot(x,y)

works but

    plot(x,y,new=T)

doesn't, while plot's help says that ... arguments are passed to par. What am I missing?

JiHO
---
http://jo.irisson.free.fr/
Re: [R] quartz() on MAC OSX
On 2007-May-21, at 08:14, Rolf Turner wrote:
> I am (desperately) trying to get used to using a Mac here at my new location. (Why *anyone* would ever use anything other than Linux, except under duress as I am, totally escapes me, but that's another story.)

Oh that's harsh, Mac OS X is quite a good citizen and probably one of the best Unices out there. It is true that it has "its own way of doing things" and that's actually why Mac users love their Mac (there is kind of a Mac way of life ;) ). If you try to fight against it, you'll lose, but if you try to do things the Mac way, it ends up being a very efficient desktop (there are several things I know I would really miss if I had to switch back to Linux: smart folders, nice antialiased graphics, very good font management etc.)

> Fortunately much of the Mac OSX is actually Unix, so a civilized person can manage to carry on ... But there are some things. (Like this deleted> mailer ... But that's another story.)

If you want OS X to be really unix-like, use DarwinPorts (or Fink). But you need to install additional software and be able to sudo.

OK, back to R:

> When I ``open'' R using the icon on the ``dock'' several things are unsatisfactory; like I can't clear the screen using system("clear"), nor can I use vi syntax in command line editing. When I start R from the command line (as a civilized person would do) these unsatisfactory circumstances go away, but then a new one rears its ugly head: I can't plot!!! If I try a plot without explicitly opening a plotting device, a postscript device with file name ``Rplots.ps'' is silently opened. If I try opening a device with quartz() to get an on-screen plot, I get a warning message
>
> quartz() device interactivity reduced without an event loop manager in: quartz()
>
> And a little coloured wheel spins round and round and the quartz() window that opens hides underneath the terminal window and appears to be frozen to the spot.
>
> Apparently ``it'' wants .Platform$GUI to be equal to "AQUA", but it is (under the circumstances) "X11".

Yes, this is a known limitation: quartz() has to be started from RGUI (or JGR also, I think) and can't be started from the terminal without some tinkering:
https://stat.ethz.ch/pipermail/r-sig-mac/2004-September/001269.html
[NB: this question is probably more for the R-SIG-Mac mailing list by the way]

> Trying to open a device using x11() simply results in an error.
> Is there any way to get a working on-screen graphics window under these circumstances?

Is X11 installed on your system? Which OS X version do you have? Basically you need 2 things to get x11 going from Terminal.app (i.e. the mac terminal, not an xterm):
- to install X11 and launch it
- to set the DISPLAY variable (to :0.0 for example)
I have

    export DISPLAY=:0.0

in my .bashrc and I can open any x11 application directly from a Terminal.

> I am very much hand-cuffed by the officious ITS policies here as to what I can install on my Mac. (Effectively, nothing.)

You *need* to install additional software on a Mac to do anything other than email/web/amusement... as with any other platform, I guess. So you'll need to convince your ITS to give you a little more freedom and you'll probably enjoy the Mac afterwards. If you want a nice terminal replacement try iTerm (and tweak the appearance settings a bit to make it easier on the eye). If you want a very nice text editor (which can actually interact with RGUI or send text to a Terminal with a running R session) try TextMate. It costs $40 but it's the only shareware I ever bought and I don't regret a cent of it.

Cheers,

JiHO
---
http://jo.irisson.free.fr/

NB: when I find a little time, I'll add some content to this blog which details how to get Mac OS X to behave a little bit more like Linux. Everything is written, I just need to proofread it and actually post it. Let me know if you are interested.
[R] Comparing multiple distributions
Hello everybody,

I am studying the vertical distribution of plankton and want to study its variations relative to several factors (time of day, species, water column structure etc.). So my data is special in that, at each sampling site (each observation), I don't have *one* number, I have *several* numbers (abundance of organisms in each depth bin, I sample 5 depth bins) which describe a vertical distribution.

Then let's say I want to compare speciesA with speciesB: I would end up trying to compare a group of several distributions with another group of several distributions (where a "distribution" is a vector of 5 numbers: an abundance for each depth bin). Does anyone know how I could do this (with R obviously ;) )?

Currently I kind of get around the problem and:
- compute mean abundance per depth bin within each group and compare the two mean distributions with a ks.test, but this obviously diminishes the power of the test (I only compare 5*2 "observations")
- restrict the information at each sampling site to the mean depth weighted by the abundance of the species of interest. This way I have one observation per station but I reduce the information to the mean depths, while the actual distribution is important also.

I know this is probably not directly R related but I have already searched around for solutions and solicited my local statistics expert... to no avail. So I hope that the stats experts on this list will help me.

Thank you very much in advance.

JiHO
---
http://jo.irisson.free.fr/
Re: [R] displaying intensity through opacity on an image (ONE SOLUTION)
On 2007-May-19, at 15:08, Ranjan Maitra wrote:
> On Sat, 19 May 2007 22:05:36 +1000 Jim Lemon <[EMAIL PROTECTED]> wrote:
>> Ranjan Maitra wrote:
>>> ...
>>> (we are out of R).
>>>
>>> And then look at the pdf file created: by default it is Rplots.pdf.
>>>
>>> OK, now we can use gimp, simply to convert this to .eps. Alternatively on linux, the command pdftops and then psto epsi on it would also work.
>>>
>>> Yippee! Isn't R wonderful??
>>>
>> Sure is. You could probably save one step by using postscript() instead of pdf() and get an eps file directly. The reason I didn't answer the first time is I couldn't quite figure out how to do what you wanted.
>
> Thanks, Jim! Not a problem. But will postscript() work? I thought that help file said that only pdf and MacOSX quartz would work (at the time it was written).
>
> It certainly does not work for me on the screen.
>
> Btw, I made an error in writing the previous e-mail: the command to convert to .eps from .ps is ps2epsi.

I haven't followed the discussion from the beginning but, independently of R, some image formats support transparency while others don't. PDF supports transparency but EPS and PS don't. So you can't expect R's postscript() device to support it (and you will lose it when converting a pdf to an eps or a ps file). SVG supports transparency beautifully and you'll be able to edit it with Inkscape (which is cross-platform). R can produce SVG through the package RSvgDevice.

Furthermore, if you open a PDF (or any vector-based format such as EPS or PS) with Gimp it will "rasterize" it: convert the vector information to pixels. You'll be able to save it to many formats but it will still be pixel-based (zooming in on it will reveal pixels, which does not happen with vector-based formats).
http://en.wikipedia.org/wiki/Vector_Graphics
http://en.wikipedia.org/wiki/Raster_graphics

Hope that helps.

JiHO
---
http://jo.irisson.free.fr/
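As a small illustration of the point about formats, semi-transparent colours can be written to a PDF directly from R. This is a sketch under assumptions: the data, colours and the temporary file `out` are made up for the example.

```r
out <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(out)

# Two sets of overlapping points; alpha = 0.3 makes them semi-transparent,
# so the overlap appears darker than either colour alone
plot(1:10, pch = 19, cex = 4, col = rgb(1, 0, 0, alpha = 0.3))
points(1:10 + 0.3, 1:10, pch = 19, cex = 4, col = rgb(0, 0, 1, alpha = 0.3))

invisible(dev.off())
```

The resulting file stays vector-based as long as it is not opened and re-saved with a raster editor.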
Re: [R] lapply not reading arguments from the correct environment
On 2007-May-18, at 18:21, Gabor Grothendieck wrote:
> In particular, we can use "[" directly instead of subset. This is the same as your function except for the line marked ### :
>
> myfun2 <- function() {
>   foo = data.frame(1:10,10:1)
>   foos = list(foo)
>   fooCollumn=2
>   cFoo = lapply(foos, "[", fooCollumn) ###
>   return(cFoo)
> }
> myfun2() # test
>
> On 5/18/07, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
>> You need to study carefully what the semantics of 'subset' are. The function body of myfun is not in the evaluation environment. (The issue is 'subset', not 'lapply': select is an *expression* and not a value.)
>>
>> Hint: using subset() programmatically is almost always a mistake. R's subsetting function is '[': subset is a convenience wrapper.

Thank you very much. Indeed it is much better this way.
I got used to subset for data.frames because [ does not support negative indexing by name while select does. E.g.:

    x[,-c("name1","name2")]

does not work while

    subset(x,select=-c(name1,name2))

works (it eliminates the columns named name1 and name2 from x; note that the names must be unquoted here).
But I guess in most cases another syntax can achieve the same thing with [, like:

    x[,-which(names(x)%in%c("name1","name2"))]

It's just a little less clear.

Thanks again.

JiHO
---
http://jo.irisson.free.fr/
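Two other ways of dropping columns by name with `[` are sketched below on a made-up data frame. Both avoid a pitfall of the `-which(...)` form: when none of the names match, `which()` returns an empty vector and the negative index then selects no columns at all instead of keeping everything.

```r
x <- data.frame(name1 = 1:3, name2 = 4:6, keep = 7:9)
drop_names <- c("name1", "name2")

# Logical index: TRUE for the columns to keep
kept1 <- x[, !(names(x) %in% drop_names), drop = FALSE]

# Set difference on the column names
kept2 <- x[, setdiff(names(x), drop_names), drop = FALSE]

# Asking to drop a name that does not exist leaves the data frame untouched
kept3 <- x[, !(names(x) %in% "absent"), drop = FALSE]
```

`drop = FALSE` keeps the result a data frame even when a single column remains.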
Re: [R] lapply not reading arguments from the correct environment
On 2007-May-18 , at 17:09 , Thomas Lumley wrote: > On Fri, 18 May 2007, jiho wrote: >> I am facing a problem with lapply which I *think* may be a bug. >> This is the most basic function in which I can reproduce it: >> >> myfun <- function() >> { >> foo = data.frame(1:10,10:1) >> foos = list(foo) >> fooCollumn=2 >> cFoo = lapply(foos,subset,select=fooCollumn) >> return(cFoo) >> } >> > >> I get this error: >> Error in eval(expr, envir, enclos) : object "fooCollumn" not found >> while fooCollumn is defined, in the function, right before lapply. > >> This is with R 2.5.0 on both OS X and Linux (Fedora Core 6) >> What did I do wrong? Is this indeed a bug? An intended behavior? > > The problem is that subset() evaluates its "select" argument in an > unusual way. Usually the argument would be evaluated inside myfun() > and the value passed to lapply(), and everything would work as you > expect. > subset() bypasses the normal evaluation and explicitly evaluates > the "select" argument in the calling frame, i.e., inside lapply(), > where fooCollumn is not visible. > You could do > lapply(foos, function(foo) subset(foo, select=fooCollumn)) > capturing fooCollumn by lexical scope. In R this is often a better > option than passing extra arguments to lapply (or other functions > that take function arguments). Thank you very much, this works well indeed. I agree it is a bit confusing, to say the least. The point is that supplying other arguments in the ... of lapply worked for all the other functions I tried before (mean, sd, summary and even spline), so the problem is really specific to subset. Anyway, R is great even with such little flaws here and there and as long as the community is there to support it, it will rule.
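To make the difference concrete, here is a short sketch putting the two calls side by side. The function names failing() and working() and the column names a and b are invented for the example; the tryCatch() is only there so that the error message can be inspected instead of stopping the script.

```r
foos <- list(data.frame(a = 1:10, b = 10:1))

# select is passed through the ... of lapply(): subset() evaluates the
# expression fooCollumn from lapply's frame, where the local variable
# defined below is not visible.
failing <- function() {
  fooCollumn <- 2
  tryCatch(lapply(foos, subset, select = fooCollumn),
           error = function(e) conditionMessage(e))
}

# The anonymous function is created inside working(), so subset() finds
# fooCollumn through lexical scope.
working <- function() {
  fooCollumn <- 2
  lapply(foos, function(d) subset(d, select = fooCollumn))
}

msg <- failing()  # the error message, provided no global fooCollumn exists
res <- working()  # a list holding a one-column data frame (column b)
```

Note that failing() only fails as long as no object called fooCollumn exists in the global environment; if one does, its value is silently used instead, which is arguably worse.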
Cheers, JiHO --- http://jo.irisson.free.fr/
[R] lapply not reading arguments from the correct environment
Hello, I am facing a problem with lapply which I *think* may be a bug. This is the most basic function in which I can reproduce it: myfun <- function() { foo = data.frame(1:10,10:1) foos = list(foo) fooCollumn=2 cFoo = lapply(foos,subset,select=fooCollumn) return(cFoo) } I am building a list of data frames, in each of which I want to keep only column 2 (obviously I would not do it this way in real life but that's just to demonstrate the bug). If I execute the commands one by one at the prompt it works, but if I clean my environment, then define the function and then execute: > myfun() I get this error: Error in eval(expr, envir, enclos) : object "fooCollumn" not found even though fooCollumn is defined, in the function, right before lapply. In addition, if I define it outside the function and then execute the function: > fooCollumn=1 > myfun() it works but uses the value defined in the global environment and not the one defined in the function. This is with R 2.5.0 on both OS X and Linux (Fedora Core 6). What did I do wrong? Is this indeed a bug? An intended behavior? Thanks in advance. JiHO --- http://jo.irisson.free.fr/