Re: [R] Problem of vocabulary : retrieve element of a list of a list

2007-08-31 Thread jiho
On 2007-August-31  , at 10:17 , Ptit_Bleu wrote:

>> x<-list(LETTERS[1:5], LETTERS[10:20])

I'm not sure I understood exactly what you meant.
If you want to search for the "D" in the list:
lapply(x, charmatch, "D")
should get you started.

If you just want the syntax to extract an element from a list,
x[[1]][4]
will get you the "D", but I am sure you would have found that out by
reading the manual carefully.
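
For instance, a quick sketch of both:

x <- list(LETTERS[1:5], LETTERS[10:20])
lapply(x, charmatch, "D")   # per element: 1 where it matches "D", NA elsewhere
x[[1]]       # [[ ]] extracts the first component of the list (a character vector)
x[[1]][4]    # [ ] then indexes within that vector
# [1] "D"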

Maybe you should read an R introduction and practice on the examples
there rather than going straight into your own data. It would take a
week at most and is very rewarding in the long term.
An introduction in English:
http://cran.r-project.org/doc/manuals/R-intro.pdf
A nice one in French:
http://www.cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf

Cheers,

JiHO
---
http://jo.irisson.free.fr/



Re: [R] Excel

2007-08-30 Thread jiho
On 2007-August-31  , at 00:13 , David Scott wrote:
> On Thu, 30 Aug 2007, Duncan Murdoch wrote:
>> On 8/28/2007 3:16 AM, J Dougherty wrote:
>>> On Monday 27 August 2007 22:21, David Scott wrote:
>>>> On Tue, 28 Aug 2007, Robert A LaBudde wrote:
>>>>> If you format the column as "Text", you won't have this  
>>>>> problem. By
>>>>> leaving the cells as "General", you leave it up to Excel to  
>>>>> guess at
>>>>> the correct interpretation.
>>>>
>>>> Not true actually. I had converted the column to Text because I  
>>>> saw the
>>>> interpretation as a date in the .xls file. I saved the .csv file  
>>>> *after*
>>>> the column had been converted to Text. Looking at the .csv file  
>>>> in a text
>>>> editor, the entry is correct.
>>>>
>>>> I have just rechecked this.
>>>>
>>>> On reopening the .csv using Excel, the entry AUG2699 had been  
>>>> interpreted
>>>> as a date, and was showing as Aug-99. Most bizarre is that the  
>>>> NHI value
>>>> of AUG1838 has *not* been interpreted as a date.
>>>>
>>> Actually, in Excel 2000, he's right.  What you have to be sure of
>>> is that
>>> the "'" that denotes a text entry precedes EVERY entry that can  
>>> be confused
>>> with a date.  Selecting the entire column and setting the format  
>>> to "text"
>>> *before* data is entered does this.  It will also create an  
>>> appropriate *.csv
>>> file.  Excel is notable too because it will automatically convert  
>>> "date-like"
>>> entries as you type.  In a column of IDs or similar critical  
>>> data, that
>>> behaviour is really bad.  I have never tried the MS site, but I  
>>> haven't been
>>> able to find any entry about how to turn that particular  
>>> automatic behaviour
>>> off.
>>>
>>> However, while I have not experimented extensively, as far as I have
>>> experimented, OpenOffice spreadsheet does not behave this way.
>>
>> I don't use Excel, but in OpenOffice 2.2.1 the ' is lost when a  
>> file is
>> saved as .csv and reloaded.  So if I take care and enter
>>
>> 'November 15
>>
>> in a cell, then save it, OO will change it to 11/15/2007 when I  
>> reload.
>>  I can override this change by manually changing "Standard" format to
>> "Text" *every time* I load the file.  There's a help index entry  
>> "date
>> formats;avoiding conversion to", but it offers no more help than  
>> "add an
>> apostrophe at the beginning of the entry".
>>
>> This is brain-dead behaviour.
>
> This was the behaviour that really scared me in Excel: saving as .csv
> loses any formatting (it is just an ascii file, how can it have  
> formatting
> info?). Then opening in Excel (or it seems OO), the incorrect date
> interpretation occurs. If I then save the .csv I have erroneous data.
>
> I often do just this sort of thing because I get given data  
> in .xls, it
> has clunky column names or extraneous stuff so I alter it, save it as
> .csv. Then I get a data correction, some clarification of a value,  
> so I
> want to go to the .csv to correct that data value. Once I do that,
> if I am not *extremely* careful before saving the .csv file, I have
> a problem.

I'll probably advise everyone to use Gnumeric then:
- entries such as 2005/06/08 are interpreted as dates and shown as
8/6/2005, but even if you change one to 8/7/05, for example, it will
be written to the csv in your original format, with the change
included (i.e. 2005/07/08 here)
- entries with several decimals, such as 1.4563, can be formatted to
display as 1.46 but will still be written as 1.4563 in the csv
- there is no text import/export dialog when opening or saving csv
files, which speeds things up quite a bit; you can still get the
dialog if you are so inclined

Still some problems:
- "0568" in the csv, which is a label (notice the quotes and the
leading zero), is still interpreted as a number by default
- dates are in fact written using the default preferences (namely
yyyy/mm/dd), so a date in ISO format (yyyy-mm-dd) is converted to
yyyy/mm/dd when written to the csv

So, not perfect, but much better (and quicker, and possibly more
precise) than both Excel and OO Calc. Oh, and cross-platform too ;).
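
PS: on the R side, one can at least protect labels like "0568" at
import time by forcing the column types (a sketch; the column names
are made up):

dat <- read.csv("file.csv", colClasses = "character")   # keep everything as text
# or per column:
dat <- read.csv("file.csv",
                colClasses = c(id = "character", value = "numeric"))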

JiHO
---
http://jo.irisson.free.fr/



[R] Variance explained by cluster analysis

2007-08-28 Thread jiho
Hello,

As suggested in "De'ath, 2002. Multivariate regression trees: A new  
technique for modelling species-environment relationships. Ecology, 83 
(4):1105-1117" (for those interested), I am trying to compare the  
performance of a multivariate regression tree to a cluster analysis.  
A simple partitioning with k clusters (as done by `pam`) seemed  
straightforward and appropriate to compare to an MRT with k leaves.
Now I am looking for a measure of how much variance each of these
methods explains. The MRT analysis provides such a measure; I was
wondering what I could use for the cluster analysis. When plotting
the pam object (the clusplot), there is a message at the bottom of
the plot: "These two components explain x% of the point variability".
Can I safely assume that this is a percentage of variance explained
by the k clusters? Is there anything else that I could compute?
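
For instance, would the ratio of between-cluster to total sum of
squares be such a measure? A sketch of what I have in mind (assuming
"dat" is the numeric data matrix passed to pam, with k = 4 as an
arbitrary example):

library(cluster)
pc <- pam(dat, k = 4)                     # k matching the number of MRT leaves
tot <- sum(scale(dat, scale = FALSE)^2)   # total sum of squares
wss <- sum(sapply(split(as.data.frame(dat), pc$clustering),
                  function(g) sum(scale(g, scale = FALSE)^2)))
1 - wss/tot   # proportion of variance "explained" by the k clusters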
More generally, am I totally wrong in comparing these two methods?
Are there references particularly appropriate to this? (NB: I am
already hunting down the Kaufman & Rousseeuw book.)

Thank you in advance for your help.

JiHO
---
http://jo.irisson.free.fr/



[R] superposing lattice plots

2007-08-17 Thread jiho
Hello everyone,

I am sorry if this has already been asked, but I can't find it. I
want to superpose two lattice plots, namely a levelplot and a
contourplot of two different variables sharing the same x-y scale. I
found information about panel.superpose but it does not seem to
correspond to what I want (I have two different variables, not groups
of the same variable).

How can I do this? Is there a way to concatenate the two trellis  
objects and plot that?

Simple example using simulated data:

library(lattice)
x <- seq(-5, 5, length.out = 100)
y <- seq(-2, 2, length.out = 60)
mat1 <- cos(x) %*% t(cos(y))   # 100 x 60 matrices
mat2 <- cos(x) %*% t(sin(y))
levelplot(mat1)
contourplot(mat2)

I would like both plots to appear superposed.
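
To make the goal concrete, I imagine something along these lines (an
untested sketch, building on the example above, with a custom panel
function that draws both variables):

grid <- expand.grid(x = x, y = y)   # x varies fastest, matching as.vector()
grid$z1 <- as.vector(mat1)
grid$z2 <- as.vector(mat2)
levelplot(z1 ~ x * y, data = grid,
          panel = function(x, y, z, subscripts, ...) {
            panel.levelplot(x, y, z, subscripts = subscripts, ...)
            panel.contourplot(x, y, grid$z2, subscripts = subscripts, ...)
          })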

PS: an accessory question, for the enthusiast ;). When the data is
contained in a matrix and the x-y coordinates in separate vectors, as
above, is there a way to get levelplot/contourplot to use x and y as
the coordinate vectors, other than by "unrolling" the matrix into a
data.frame:

x   y   mat
1   1   0.125
1   2   0.1367
1   3   0.2345

and using mat ~ x*y ?
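
PPS: unless I misread ?levelplot, the matrix methods may accept the
coordinate vectors directly via row.values/column.values (untested):

levelplot(mat1, row.values = x, column.values = y)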

Thank you in advance. Sincerely,

JiHO
---
http://jo.irisson.free.fr/



Re: [R] apply, lapply and data.frame in R 2.5

2007-07-30 Thread jiho

On 2007-July-30  , at 12:20 , Prof Brian Ripley wrote:
> On Mon, 30 Jul 2007, jiho wrote:
>> A recent (in 2.5 I suspect) change in R is giving me trouble. I want
>> to apply a function (tolower) to all the columns of a data.frame and
>> get a data.frame in return.
>> Currently, on a data.frame, both apply (for arrays) and lapply (for
>> lists) work, but each returns its native class (resp. matrix and  
>> list):
>>
>> apply(mydat,2,tolower)   # gives a matrix
>> lapply(mydat,tolower)# gives a list
>> and
>> sapply(mydat,tolower)# gives a matrix
>
> which is exactly what R 2.0.0 did, so no recent(ish) change at all.
>
>> If I remember well, apply did not used to work on data.frames and
>> lapply returned a data.frame when it was provided with one, with the
>> same properties (columns classes etc). At least this is what my code
>> written with R 2.4.* suggests.
>
> apply has coerced data frames for many years and lapply always  
> returned a list.  The solution has always been
>
> mydat[] <- lapply(mydat,tolower)

Sorry about that: my previous code was misleading, and indeed your
code above does exactly what I need. I should have tested a bit
further before posting; I was just reluctant to install two different
R versions to check, I guess.
Thank you again.
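
For the archives, a minimal illustration of why this form works: the
"mydat[] <-" assignment replaces the columns but keeps the data.frame
structure and attributes.

mydat <- data.frame(A = LETTERS[1:3], B = LETTERS[24:26],
                    stringsAsFactors = FALSE)
mydat[] <- lapply(mydat, tolower)   # contents replaced, class and names kept
str(mydat)                          # still a data.frame, columns still character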

JiHO
---
http://jo.irisson.free.fr/



[R] apply, lapply and data.frame in R 2.5

2007-07-30 Thread jiho
Hello everyone,

A recent (in 2.5, I suspect) change in R is giving me trouble. I want
to apply a function (tolower) to all the columns of a data.frame and
get a data.frame in return.
Currently, on a data.frame, both apply (for arrays) and lapply (for
lists) work, but each returns its native class (resp. matrix and list):

apply(mydat, 2, tolower)   # gives a matrix
lapply(mydat, tolower)     # gives a list
and
sapply(mydat, tolower)     # gives a matrix

If I remember correctly, apply did not use to work on data.frames,
and lapply returned a data.frame when provided with one, with the
same properties (column classes etc.). At least this is what my code
written with R 2.4.* suggests.

The solution would be:
as.data.frame(apply(mydat, 2, tolower))
or
as.data.frame(lapply(mydat, tolower))

But this does not keep the column attributes (all columns are
re-interpreted; for example, strings are converted to factors etc.).
For my particular use stringsAsFactors=FALSE does what I need, but I
am wondering whether there is a more general solution to apply a
function to all elements of a data.frame and get a similar data.frame
in return. Indeed, data.frames are probably the most common objects
in R, and applying a function to each of their columns/variables
seems like something one would want to do quite often.
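
To illustrate the round trip, a minimal example:

d <- data.frame(s = c("A", "B"), stringsAsFactors = FALSE)
str(as.data.frame(lapply(d, tolower)))
# 's' comes back as a factor: the column was re-interpreted
str(as.data.frame(lapply(d, tolower), stringsAsFactors = FALSE))
# keeps character columns, but other attributes are still rebuilt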

Thank you in advance.

JiHO
---
http://jo.irisson.free.fr/



[R] x,y,z table to matrix with x as rows and y as columns

2007-07-24 Thread jiho
Hello all,

I am sure I am missing something obvious but I cannot find the
function I am looking for. I have a data frame with three columns: X,
Y and Z, with X and Y being grid coordinates and Z the value
associated with these coordinates. I want to transform this data
frame into a matrix of Z values on the grid defined by X and Y (and,
as a plus, fill the X/Y combinations which do not exist in the
original data frame with NAs in the resulting matrix). I could do
this manually, but I guess the appropriate function exists somewhere;
I just can't find it.
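
For illustration, this is the kind of thing I mean (a sketch; tapply
seems to come close, filling absent X/Y combinations with NA):

dat <- data.frame(X = c(1, 1, 2), Y = c(1, 2, 2), Z = c(1.1, 1.2, 2.2))
tapply(dat$Z, list(dat$X, dat$Y), mean)   # one Z value per cell here
#     1   2
# 1 1.1 1.2
# 2  NA 2.2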

Thank you in advance for your help.

JiHO
---
http://jo.irisson.free.fr/



Re: [R] Problem with Weighted Variance in Hmisc

2007-06-01 Thread jiho
[...] Most of these rejections occurred at sites with fewer than 100
samples, in agreement with previous results. Nevertheless, the
hypothesis was often rejected at sites with more than 100 samples as
well. The maximum error (relative to Mw) in the 95% confidence limits
made by assuming a normal distribution of the Mw at the ten sites
examined was about 27%. Most such errors were less than 10%, and
errors were smaller at sampling sites with > 100 samples than at
those with < 100 samples.

Cheers,

JiHO
---
http://jo.irisson.free.fr/



Re: [R] Problem with Weighted Variance in Hmisc

2007-05-31 Thread jiho
On 2007-June-01  , at 01:03 , Tom La Bone wrote:
> The function wtd.var(x,w) in Hmisc calculates the weighted variance
> of x, where w are the weights. I would expect wtd.var(x,w) to equal
> var(x) if all of the weights are equal, but this does not appear to
> be the case. Can someone point out to me where I am going wrong
> here? Thanks.

The true formula for the weighted variance is given here:
http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weighvar.pdf
For computation purposes, however, wtd.var uses another definition,
which treats the weights as repeat counts instead of true weights. If
the weights are normalized (sum to one), the two formulas are equal.
So if you consider your weights as real weights rather than repeats,
I would recommend using the normwt option.
With normwt=TRUE, your issue is solved:

> library(Hmisc)
> a <- 1:10
> b <- rep(2, 10)   # all weights equal 2
> wtd.var(a, b)     # weights treated as repeat counts (the default)
[1] 8.684211
> # two repeats of each element of a gives the same thing:
> var(c(a, a))
[1] 8.684211
> wtd.var(a, b, normwt = TRUE)   # weights treated as true weights
[1] 9.166667
> var(a)
[1] 9.166667

Cheers,

JiHO
---
http://jo.irisson.free.fr/



Re: [R] Comparing multiple distributions

2007-05-31 Thread jiho


On 2007-May-31  , at 18:56 , Bert Gunter wrote:

> While Ravi's suggestion of the "compositions" package is certainly
> appropriate, I suspect that the complex and extensive statistical
> "homework" you would need to do to use it might be overwhelming (the
> geometry of compositions is a simplex, and this makes things hard).

Yes, I am reading the documentation now, which is well written but
huge indeed...

> As a simple and perhaps useful alternative, use pairs() or splom()
> to plot your 5-D data, distinguishing the different treatments via
> color and/or symbol.
>
> In addition, it might be useful to do the same sort of plot on the
> first two principal components (?prcomp) of the first 4 dimensions
> of your 5-component vectors (since the 5th is determined by the
> first 4). Because of the simplicial geometry, this PCA approach is
> not right, but it may nevertheless be revealing. The same plotting
> ideas are done properly (in the correct geometry) in the
> compositions package, so if you are motivated to do so, you can do
> these things there. Even if you don't dig into the details, using
> the compositions package version of the plots may be relatively easy
> to do, interpretable, and revealing -- more so than my "simple but
> wrong" suggestions. You can decide.
>
> I would not trust inference using ad hoc approaches on the
> untransformed data. That's what the package is for. But plotting the
> data should always be at least the first thing you do anyway. I
> often find it to be sufficient, too.


Thank you for your suggestions on plotting, I will look into it (I
imagine something like the sketch below). I was using histograms of
mean proportions + SE until now because that seemed the most
straightforward given my specific questions. If we come back to my
original data (abandoning the statistical language for a while ;) ),
I have proportions of fishes caught 1. near the surface, 2. a bit
below, ..., 5. near the bottom. The questions I want to ask are, for
example: does the vertical distribution of species A and species B
differ? So I can plot the mean proportion at each depth for both
species and obtain a visual representation of the vertical
distribution of each.
At this stage, differences between fishes that accumulate near the
surface or near the bottom are quite obvious. If I add error bars, I
get an idea of the variability of those distributions. The issue
arises when I want to *test* for a difference between the
distributions of species A and B. If I use a basic KS test I can only
compare the mean proportions for species A (5 points) to the mean
proportions of species B (5 points), and this has low power + does
not take into account the variability around those means. In
addition, I may also want to know whether there is a difference
within species A, B and C, and pairwise KS tests would inflate the
alpha error risk. Am I explaining things correctly? Does this seem
logical to you too?
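
To make the plotting side concrete, I imagine the pairs() suggestion
looks something like this (a sketch with made-up objects: "props", an
n x 5 matrix of proportions, and "treat", the treatment factor):

props <- matrix(runif(50), ncol = 5)          # 10 hypothetical observations x 5 depth bins
props <- props / rowSums(props)               # each row sums to 1, as in my data
treat <- factor(rep(c("A", "B"), each = 5))   # hypothetical treatment labels
pairs(props, col = as.numeric(treat), pch = as.numeric(treat))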

As for the PCA I must admit I don't really understand what you mean.

Thank you very much again.



Re: [R] Comparing multiple distributions

2007-05-31 Thread jiho
Nobody answered my first request; I am sorry if I did not explain my
problem clearly. English is not my native language and statistical
English is even more difficult. I'll try to summarize my issue in
more appropriate statistical terms:


Each of my observations is not a single number but a vector of 5  
proportions (which add up to 1 for each observation). I want to  
compare the "shape" of those vectors between two treatments (i.e. how  
the quantities are distributed between the 5 values in treatment A  
with respect to treatment B).


I was pointed to Hotelling's T-squared. Does it seem appropriate? Are
there other possibilities? (I read many discussions about Hotelling
vs. MANOVA but I could not see how any of those related to my
particular case.)


Thank you very much in advance for your insights. See below for my  
earlier, more detailed, e-mail.


On 2007-May-21  , at 19:26 , jiho wrote:

> I am studying the vertical distribution of plankton and want to
> study its variations relative to several factors (time of day,
> species, water column structure, etc.). So my data is special in
> that, at each sampling site (each observation), I don't have *one*
> number, I have *several* numbers (abundances of organisms in each
> depth bin; I sample 5 depth bins) which describe a vertical
> distribution.
>
> Then, let's say I want to compare species A with species B: I would
> end up trying to compare a group of several distributions with
> another group of several distributions (where a "distribution" is a
> vector of 5 numbers: an abundance for each depth bin). Does anyone
> know how I could do this (with R obviously ;) )?
>
> Currently I kind of get around the problem and:
> - compute the mean abundance per depth bin within each group and
> compare the two mean distributions with a ks.test, but this
> obviously diminishes the power of the test (I only compare 5*2
> "observations")
> - restrict the information at each sampling site to the mean depth
> weighted by the abundance of the species of interest. This way I
> have one observation per station, but I reduce the information to
> the mean depths, while the actual shape of the distribution also
> matters.
>
> I know this is probably not directly R related but I have already
> searched around for solutions and solicited my local statistics
> expert... to no avail. So I hope that the stats experts on this list
> will help me.
>
> Thank you very much in advance.


JiHO
---
http://jo.irisson.free.fr/






Re: [R] plot(......,new=T) vs. par(new=T)

2007-05-22 Thread jiho
On 2007-May-22  , at 13:51 , John Kane wrote:
> ?par
> There are several parameters that can only be set by a call
> to par(): "new"
>
> You were just lucky enough to find one.
> You just were lucky enough to find one.

Yes, sorry about that, I saw this afterwards. I read the help pages a
while ago and it seems it's time for a re-read.
Thank you.

JiHO
---
http://jo.irisson.free.fr/



[R] plot(......,new=T) vs. par(new=T)

2007-05-21 Thread jiho
Hello everybody,

This is probably a classic, but I cannot find an answer to it on the
mailing list (i.e. with a Google search restricted to the mailing-list
archive). Setting:
par(new=TRUE)
plot(x, y)
works, but
plot(x, y, new=TRUE)
doesn't, while plot's help says that "..." arguments are passed to
par. What am I missing?
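
For the archives, the working pattern spelled out (a small sketch):

x <- 1:10
plot(x, x^2, type = "l")
par(new = TRUE)                  # the next high-level plot will not clear the device
plot(x, rev(x^2), type = "l",
     axes = FALSE, ann = FALSE)  # suppress axes/labels so they are not drawn twice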

JiHO
---
http://jo.irisson.free.fr/



Re: [R] quartz() on MAC OSX

2007-05-21 Thread jiho
On 2007-May-21  , at 08:14 , Rolf Turner wrote:

> I am (desperately) trying to get used to using a Mac here at my new
> location. (Why *anyone* would ever use anything other than Linux,  
> except
> under duress as I am, totally escapes me, but that's another story.)
>

Oh, that's harsh. Mac OS X is quite a good citizen and probably one
of the best Unices out there. It is true that it has "its own way of
doing things", and that's actually why Mac users love their Macs
(there is kind of a Mac way of life ;) ). If you try to fight against
it you'll lose, but if you do things the Mac way, it ends up being a
very efficient desktop (there are several things I know I would
really miss if I had to switch back to Linux: smart folders, nice
antialiased graphics, very good font management, etc.).


> Fortunately much of the Mac OSX is actually Unix, so a civilized  
> person can
> manage to carry on ... But there are some things.  (Like this
> <expletive deleted> mailer ... But that's another story.)
>

If you want OS X to be really Unix-like, use DarwinPorts (or Fink).
But you need to install additional software and be able to sudo.

OK back to R:


> When I ``open'' R using the icon on the ``dock'' several things are
> unsatisfactory; like I can't clear the screen using system("clear"),
> nor can
> I use vi syntax in command line editing.  When I start R from the  
> command
> line (as a civilized person would do) these unsatisfactory  
> circumstances go
> away, but then a new one rears its ugly head:  I can't plot!!!  If  
> I try a
> plot without explicitly opening a plotting device, a postscript  
> device with
> file name ``Rplots.ps'' is silently opened.  If I try opening a  
> device with
> quartz() to get an on-screen plot, I get a warning message
>
> quartz() device interactivity reduced without an event loop manager  
> in:
> quartz()
>
> And a little coloured wheel spins round and round and the quartz()  
> window
> that opens hides underneath the terminal window and appears to be  
> frozen to
> the spot.
>
> Apparently ``it'' wants .Platform$GUI to be equal to "AQUA", but it is
> (under the circumstances) "X11".
>

Yes, this is a known limitation: quartz() has to be started from RGUI
(or JGR too, I think) and can't be started from the terminal without
some tinkering:
https://stat.ethz.ch/pipermail/r-sig-mac/2004-September/001269.html
[NB: this question is probably more for the R-SIG-Mac mailing list,
by the way]


> Trying to open a device using x11() simply results in an error.
> Is there any way to get a working on-screen graphics window under  
> these
> circumstances?
>

Is X11 installed on your system? Which OS X version do you have?
Basically you need two things to get X11 going from Terminal.app
(i.e. the Mac terminal, not an xterm):
- install X11 and launch it
- set the DISPLAY variable (to :0.0 for example)
I have
export DISPLAY=:0.0
in my .bashrc and I can open any X11 application directly from a
Terminal.

> I am very much hand-cuffed by the officious ITS policies here as to  
> what
> I can install on my Mac.  (Effectively, nothing.)

You *need* to install additional software on a Mac to do anything
other than email/web/entertainment, as with any other platform I
guess. So you'll need to convince your IT people to give you a little
more freedom, and you'll probably enjoy the Mac afterwards.

If you want a nice terminal replacement, try iTerm (and tweak the
appearance settings a bit to make it easier on the eye). If you want
a very nice text editor (which can actually interact with RGUI, or
send text to a Terminal with a running R session), try TextMate. It
costs $40 but it's the only shareware I ever bought and I don't
regret a cent of it.

Cheers,

JiHO
---
http://jo.irisson.free.fr/
NB: when I find a little time, I'll add some content to this blog
detailing how to make Mac OS X behave a little more like Linux.
Everything is written; I just need to proofread it and actually post
it. Let me know if you are interested.



[R] Comparing multiple distributions

2007-05-21 Thread jiho
Hello everybody,

I am studying the vertical distribution of plankton and want to study
its variations relative to several factors (time of day, species,
water column structure, etc.). So my data is special in that, at each
sampling site (each observation), I don't have *one* number, I have
*several* numbers (abundances of organisms in each depth bin; I
sample 5 depth bins) which describe a vertical distribution.

Then, let's say I want to compare species A with species B: I would
end up trying to compare a group of several distributions with
another group of several distributions (where a "distribution" is a
vector of 5 numbers: an abundance for each depth bin). Does anyone
know how I could do this (with R obviously ;) )?

Currently I kind of get around the problem and:
- compute the mean abundance per depth bin within each group and
compare the two mean distributions with a ks.test (sketched below),
but this obviously diminishes the power of the test (I only compare
5*2 "observations")
- restrict the information at each sampling site to the mean depth
weighted by the abundance of the species of interest. This way I have
one observation per station, but I reduce the information to the mean
depths, while the actual shape of the distribution also matters.
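
To fix ideas, the first workaround looks roughly like this (a sketch;
"dat" with columns "abund", "depth" and "species" are made-up names):

# mean abundance per depth bin, per species, then a two-sample KS test
mA <- with(dat[dat$species == "A", ], tapply(abund, depth, mean))
mB <- with(dat[dat$species == "B", ], tapply(abund, depth, mean))
ks.test(mA, mB)   # only 5 vs. 5 values, hence the low power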

I know this is probably not directly R related but I have already
searched around for solutions and solicited my local statistics
expert... to no avail. So I hope that the stats experts on this list
will help me.

Thank you very much in advance.

JiHO
---
http://jo.irisson.free.fr/



Re: [R] displaying intensity through opacity on an image (ONE SOLUTION)

2007-05-19 Thread jiho
On 2007-May-19  , at 15:08 , Ranjan Maitra wrote:
> On Sat, 19 May 2007 22:05:36 +1000 Jim Lemon <[EMAIL PROTECTED]>  
> wrote:
>> Ranjan Maitra wrote:
>>> ...
>>> (we are out of R).
>>>
>>> And then look at the pdf file created: by default it is Rplots.pdf.
>>>
>>> OK, now we can use gimp, simply to convert this to .eps.  
>>> Alternatively on linux, the command pdftops and then psto epsi on  
>>> it would also work.
>>>
>>> Yippee! Isn't R wonderful??
>>>
>> Sure is. You could probably save one step by using postscript()  
>> instead
>> of pdf() and get an eps file directly. The reason I didn't answer the
>> first time is I couldn't quite figure out how to do what you wanted.
>
> Thanks, Jim! Not a problem, But will postscript() work? I thought  
> that help file said that only pdf and MacOSX quartz would work (at  
> the time it was written).
>
> It certainly does not work for me on the screen.
>
> Btw, I made an error in writing the previous e-mail: the command to  
> convert to .eps from .ps is ps2epsi.

I haven't followed the discussion from the beginning but,
independently of R, some image formats support transparency while
others don't. PDF supports transparency but EPS and PS don't, so you
can't expect R's postscript() device to support it (and you will lose
it when converting a PDF to an EPS or PS file). SVG supports
transparency beautifully, and you'll be able to edit the result with
Inkscape (which is cross-platform). R can produce SVG through the
package RSvgDevice.
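
A minimal sketch (assuming RSvgDevice is installed; devSVG() is its
device function, if I remember the call correctly):

library(RSvgDevice)
devSVG("transparency.svg")
plot(rnorm(100), rnorm(100), pch = 16,
     col = rgb(0, 0, 1, alpha = 0.3))  # semi-transparent blue points
dev.off()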
Furthermore, if you open a PDF (or any vector-based format such as
EPS or PS) with Gimp, it will "rasterize" it: convert the vector
information to pixels. You'll be able to save it to many formats, but
it will still be pixel-based (zooming in will reveal pixels, which is
not the case with vector-based formats).
http://en.wikipedia.org/wiki/Vector_Graphics
http://en.wikipedia.org/wiki/Raster_graphics

Hope that helps.

JiHO
---
http://jo.irisson.free.fr/



Re: [R] lapply not reading arguments from the correct environment

2007-05-18 Thread jiho
On 2007-May-18  , at 18:21 , Gabor Grothendieck wrote:
> In particular, we can use "[" directly instead of subset.  This is the
> same as your function except for the line marked ### :
>
> myfun2 <- function() {
>   foo = data.frame(1:10,10:1)
>   foos = list(foo)
>   fooCollumn=2
>   cFoo = lapply(foos, "[", fooCollumn) ###
>   return(cFoo)
> }
> myfun2() # test
>
> On 5/18/07, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
>> You need to study carefully what the semantics of 'subset' are.  The
>> function body of myfun is not in the evaluation environment.  (The  
>> issue
>> is 'subset', not 'lapply': select is an *expression* and not a  
>> value.)
>>
>> Hint: using subset() programmatically is almost always a mistake.   
>> R's
>> subsetting function is '[': subset is a convenience wrapper.

Thank you very much, indeed it is much better this way. I got used to
subset() for data.frames because "[" does not accept negative
indexing by name, while select does. E.g.:
x[, -c("name1", "name2")]
does not work, while
subset(x, select = -c(name1, name2))
works (it drops the columns named name1 and name2 from x). But I
guess in most cases another syntax can achieve the same thing with
"[", like:
x[, -which(names(x) %in% c("name1", "name2"))]
it's just a little less clear.
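
(For the record, another equivalent that reads quite well: keep the
columns whose names are not in the list.)

x[, setdiff(names(x), c("name1", "name2"))]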
Thanks again.

JiHO
---
http://jo.irisson.free.fr/



Re: [R] lapply not reading arguments from the correct environment

2007-05-18 Thread jiho

On 2007-May-18  , at 17:09 , Thomas Lumley wrote:
> On Fri, 18 May 2007, jiho wrote:
>> I am facing a problem with lapply which I *think* may be a bug.
>> This is the most basic function in which I can reproduce it:
>>
>> myfun <- function()
>> {
>>  foo = data.frame(1:10,10:1)
>>  foos = list(foo)
>>  fooCollumn=2
>>  cFoo = lapply(foos,subset,select=fooCollumn)
>>  return(cFoo)
>> }
>>
> 
>> I get this error:
>>  Error in eval(expr, envir, enclos) : object "fooCollumn" not found
>> while fooCollumn is defined, in the function, right before lapply.
> 
>> This is with R 2.5.0 on both OS X and Linux (Fedora Core 6)
>> What did I do wrong? Is this indeed a bug? An intended behavior?
>
> The problem is that subset() evaluates its "select" argument in an  
> unusual way. Usually the argument would be evaluated inside myfun()  
> and the value passed to lapply(), and everything would work as you  
> expect.
> subset() bypasses the normal evaluation and explicitly evaluates  
> the "select" argument in the calling frame, ie, inside lapply(),  
> where fooCollumn is not visible.
> You could do
>   lapply(foos, function(foo) subset(foo, select=fooCollum))
> capturing fooCollum by lexical scope.  In R this is often a better  
> option than passing extra arguments to lapply (or other functions  
> that take function arguments).

Thank you very much, this works well indeed. I agree it is a bit
confusing, to say the least. The point is that supplying extra
arguments through the ... of lapply worked for all the other
functions I tried before (mean, sd, summary and even spline), so it
really is subset's unusual evaluation that trips this up. Anyway, R
is great even with such little quirks here and there, and as long as
the community is there to support it, it will rule.
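
For the archives, the corrected toy function following this advice (a
sketch):

myfun <- function() {
    foo <- data.frame(1:10, 10:1)
    foos <- list(foo)
    fooCollumn <- 2
    # the anonymous function captures fooCollumn by lexical scope,
    # so subset() finds it when evaluating 'select'
    cFoo <- lapply(foos, function(f) subset(f, select = fooCollumn))
    return(cFoo)
}
myfun()   # a list of one single-column data.frame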

Cheers,

JiHO
---
http://jo.irisson.free.fr/



[R] lapply not reading arguments from the correct environment

2007-05-18 Thread jiho
Hello,

I am facing a problem with lapply which I *think* may be a bug.
This is the most basic function in which I can reproduce it:

myfun <- function()
{
    foo <- data.frame(1:10, 10:1)
    foos <- list(foo)
    fooCollumn <- 2
    cFoo <- lapply(foos, subset, select = fooCollumn)
    return(cFoo)
}

I am building a list of data frames, in each of which I want to keep
only column 2 (obviously I would not do it this way in real life;
this is just to demonstrate the problem).
If I execute the commands inline it works, but if I clean my
environment, define the function, and then execute:
> myfun()
I get this error:
Error in eval(expr, envir, enclos) : object "fooCollumn" not found
even though fooCollumn is defined in the function, right before the
lapply call. In addition, if I define it outside the function and
then call the function:
> fooCollumn=1
> myfun()
it works, but uses the value defined in the global environment, not
the one defined inside the function.
This is with R 2.5.0 on both OS X and Linux (Fedora Core 6).
What did I do wrong? Is this indeed a bug? An intended behavior?
Thanks in advance.

JiHO
---
http://jo.irisson.free.fr/
