date:20061013

Re: [R] Arrange histogram

2006-10-13 Thread Uwe Ligges



[EMAIL PROTECTED] wrote:
 The data set has a number of variables each of which is classified into two 
 groups.
 
 For each variable of each group, I need to create a histogram. All the 
 histograms are to be lined up into a file that looks like
 
group1 group2
 Variable 1 Histogram  histogram
 Variable 2 Histogram  histogram
 ...
 
 Can you give me a hint as to what package I'd look into for help?

lattice is your friend.

Uwe Ligges


 Thank you
 
 Jue Wang, Biostatistician
 Contracted Position for Preclinical  Research Biostatistics
 PrO Unlimited
 (908) 231-3022
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Need help with barplots

2006-10-13 Thread laba diena

I`ve read all the manuals and still couln`t find what is the difference
between the stacked and side-by-side barplots ? Could you explain me ?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need help with barplots

2006-10-13 Thread Peter Dalgaard

laba diena [EMAIL PROTECTED] writes:

 I`ve read all the manuals and still couln`t find what is the difference
 between the stacked and side-by-side barplots ? Could you explain me ?

Did you try

par(ask=TRUE)
example(barplot)  

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need help with barplots

2006-10-13 Thread David Barron

First, produce two barplots for comparison:

 par(mfrow=c(2,1) )
 barplot(VADeaths,beside=TRUE)
 barplot(VADeaths)

The same information is in both plots; in the top, it is displayed as
5 separate bars for each group, and in the stacked plot it is shown as
5 separate regions in each of the four bars.  The hight of each of
these regions is the same as the hight of the corresponding bar in the
side-by-side plot.  The stacked plot enables you to see overall
differences more easily (easier to see that the death rate is highest
for Urban Males), but it is harder to compare the sizes of the
categories.

On 13/10/06, laba diena [EMAIL PROTECTED] wrote:
 I`ve read all the manuals and still couln`t find what is the difference
 between the stacked and side-by-side barplots ? Could you explain me ?

 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] C code for KalmnaLike

2006-10-13 Thread Gavin Simpson

On Thu, 2006-10-12 at 10:57 -0400, Leeds, Mark (IED) wrote:
 you shouldn't need it. Kalmanlike() ( spelling ) I think is in the base
 package and there is atleast
 One constributed package and probably many others that do kalman
 filtering but I can't recall the names of them.
 
 Check out the list of packages at www.r-project.org.
 

Mark, That pre-supposes that Malini just wants to perform kalman
filtering, and not look at the inner workings of the implementation in
R. KalmanLike is in package stats distributed with base R, but it is
defined as:

 KalmanLike
function (y, mod, nit = 0, fast = TRUE)
{
x - .Call(KalmanLike, y, mod$Z, mod$a, mod$P, mod$T, mod$V,
mod$h, mod$Pn, as.integer(nit), FALSE, fast = fast, PACKAGE = stats)
names(x) - c(ssq, sumlog)
s2 - x[1]/length(y)
list(Lik = 0.5 * (log(x[1]/length(y)) + x[2]/length(y)),
s2 = s2)
}
environment: namespace:stats

So, not much use in reading the R code as this just calls compiled code.
If Malini really does want to look at the C code for KalmanLike then Uwe
Ligges recently posted a preview of an article he is writing for R News,
which explains how to access various parts of R's source code. The
preview is still available from:

http://www.statistik.uni-dortmund.de/~ligges/R_Help_Desk_preview.pdf

The information contained in the article should allow Malini to find the
C for KalmanLike.

HTH,

G

 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Malini
 Subramanian
 Sent: Thursday, October 12, 2006 9:56 AM
 To: R-help@stat.math.ethz.ch
 Subject: Re: [R] C code for KalmnaLike
 
 hi,
   i am looking for c code of kalman filtering please can you help
 me...thankyou bye...
 
   
 -
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC  ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/cv/
 London, UK. WC1E 6BT. [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rarefy a matrix of counts

2006-10-13 Thread Alex Brown

I thought at first that you could use a weighted sample (the sample  
function) but, you can't since it doesn't take proper account of  
replacement if you try that.

You can use the list approach, but through the power of R, you don't  
need a lot of loops to do it...

I can't speak for the efficiency of this approach in terms of cpu cycle.

In short:

apply(z2,2,function(x)sample(rep(names(x),x),100))

In long:

#let's load the data:

z = scan(,,sep=\n)
sample.1 sample.2 sample.3
red.candy   400 300   2500
green.candy1000  200
black.candy 3001000500

#and turn into a table

  z2 = read.table(textConnection(z), header=TRUE, row.names=1)

# let's create a functon to expand a sample column into individuals:

expand - function(x) rep(names(x), x)

# test it on a smaller set:

ex - expand( c( red = 2, blue = 3) )

ex
[1] red  red  blue blue blue

# and sample 2 things from that:

sample( ex, 2 )

# combine the two

samplex - function( x, size ) sample(expand(x), size )

samplex( c( red = 2, blue = 3), size = 2 )

# ok, now we use the apply function to apply this to each column

apply(z2, 2, samplex, size = 2 )

# you wanted 100?

apply(z2, 2, samplex, size = 100 )

# all done.

#You should note that if there are less than 100 (samplenumber)  
candies in any given sample, this function will fail.
# eg:

apply(z2, 2, samplex, size = 2000 )

Error in sample(length(x), size, replace, prob) :
cannot take a sample larger than the population
when 'replace = FALSE'

-Alex

On 11 Oct 2006, at 15:10, Brian Frappier wrote:

 Hi Petr,

 Thanks for your response.  I have data that looks like the following:

sample 1 sample 2 sample 3  
 red candy400 300   2500
 green candy1000  200
 black candy 3001000500

 I don't want to randomly select either the samples (columns) or the  
 candy
 types (rows), which sample as you state would allow me.  Instead, I  
 want to
 randomly sample 100 candies from each sample and retain info on their
 associated type.  I could make a list of all the candies in each  
 sample:

 sample 1
 red
 red
 red
 red
 green
 green
 black
 red
 black
 ...

 and then randomly sample those rows.  Repeat for each sample.  But,  
 I am not
 sure how to do that without alot of loops, and am wondering if  
 there is an
 easier way in R.  Thanks!  I should have laid this out in the first
 email...sorry.


 On 10/11/06, Petr Pikal [EMAIL PROTECTED] wrote:

 Hi

 I am not experienced in Matlab and from your explanation I do not
 understand what exactly do you want. It seems that you want randomly
 choose a sample of 100 rows from your martix, what can be achived by
 sample.

 DF-data.frame(rnorm(100), 1:100, 101:200, 201:300)
 DF[sample(1:100, 10),]

 If you want to do this several times, you need to save your result
 and than it depends on what you want to do next. One suitable form is
 list of matrices the other is array and you can use for loop for
 completing it.

 HTH
 Petr


 On 10 Oct 2006 at 17:40, Brian Frappier wrote:

 Date sent:  Tue, 10 Oct 2006 17:40:47 -0400
 From:   Brian Frappier [EMAIL PROTECTED]
 To: r-help@stat.math.ethz.ch
 Subject:[R] rarefy a matrix of counts

 Hi all,

 I have a matrix of counts for objects (rows) by samples  
 (columns).  I
 aimed for about 500 counts in each sample (I have about 80 samples)
 and would now like to rarefy these down to 100 counts in each sample
 using simple random sampling without replacement.  I plan on  
 rarefying
 several times for each sample.  I could do the tedious looping  
 task of
 making a list of all objects (with its associated identifier) in  
 each
 sample and then use the wonderful sampling package to select a
 sub-sample of 100 for each sample and thereby get a logical  
 vector of
 inclusions.  I would then regroup the resulting logical vector  
 into a
 vector of counts by object, rinse and repeat several times for each
 sample.

 Alternately, using the same list, I could create a random index of
 integers between 1 and the number of objects for a sample (without
 repeats) and then select those objects from the list.  Again, rinse
 and repeat several time for each sample.

 Is there a way to directly rarefy a matrix of counts without  
 having to
 create a list of objects first?  I am trying to switch to R from
 Matlab and am trying to pick up good programming habits from the
 start.

 Much appreciation!

  [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,

[R] Need help with tables

2006-10-13 Thread laba diena

I have a data file with 3 columns and I need to take only 2, how to do that
?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need help with tables

2006-10-13 Thread David Barron

I'm not quite sure what you mean, but if you are wanting to select
columns of a data frame, have a look at

 help([)

David

On 13/10/06, laba diena [EMAIL PROTECTED] wrote:
 I have a data file with 3 columns and I need to take only 2, how to do that
 ?

 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] bug: Editing function formals deletes the environment

2006-10-13 Thread Alex Brown

First, here's the specific bug I have.  Later I'll say why I care.

  ls(zappo)
Error in try(name) : object zappo not found
# good.
  f = function(zappo) { function(y) zappo + y }
  g = f(1)
  g(1)
[1] 2

  formals(g)
$y

  formals(g)$y
  formals(g)$y = 2
  g
function (y = 2)
zappo + y
  g(1)
Error in g(1) : object zappo not found

# looks like formals strips the environment off stuff.

anything I can do about this?

-Alex


Original question:

I'm trying to change the behaviour of a package, to simplify the  
interface.

I'd rather not change the package, although I could.

There's a hidden function whose defaults I wish to change.

I'm using R 2.3.1 for macosX.  Upgrading is not an option.

This is what I do:

library(R2HTML)

# get the function to modify
x = getFromNamespace(HTML.data.frame, R2HTML)
# change the default for an argument
formals(x)[Border]=list(NULL)
# put the function back
assignInNamespace(HTML.data.frame, x, R2HTML)

#test the function:

HTML(data.frame(1:2), file=stdout())

Error: could not find function HTMLReplaceNA

# what seems to be happening is that the formals function is  
stripping the namespace off the variable x.  I can't tell why.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] bug: Editing function formals deletes the environment

2006-10-13 Thread Alex Brown

Ah, it's fixed in 2.4.0. I'll work around it.

-Alex


On 13 Oct 2006, at 11:19, Alex Brown wrote:

 First, here's the specific bug I have.  Later I'll say why I care.

 ls(zappo)
 Error in try(name) : object zappo not found
 # good.
 f = function(zappo) { function(y) zappo + y }
 g = f(1)
 g(1)
 [1] 2

 formals(g)
 $y

 formals(g)$y
 formals(g)$y = 2
 g
 function (y = 2)
 zappo + y
 g(1)
 Error in g(1) : object zappo not found

 # looks like formals strips the environment off stuff.

 anything I can do about this?

 -Alex


 Original question:

 I'm trying to change the behaviour of a package, to simplify the
 interface.

 I'd rather not change the package, although I could.

 There's a hidden function whose defaults I wish to change.

 I'm using R 2.3.1 for macosX.  Upgrading is not an option.

 This is what I do:

 library(R2HTML)

 # get the function to modify
 x = getFromNamespace(HTML.data.frame, R2HTML)
 # change the default for an argument
 formals(x)[Border]=list(NULL)
 # put the function back
 assignInNamespace(HTML.data.frame, x, R2HTML)

 #test the function:

 HTML(data.frame(1:2), file=stdout())

 Error: could not find function HTMLReplaceNA

 # what seems to be happening is that the formals function is
 stripping the namespace off the variable x.  I can't tell why.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] NA-handling in glm.fit?

2006-10-13 Thread Fredrik Thuring



Dear Sir or Madam,

I'm wondering if there is any routine or argument in the function 'glm.fit'
that makes it handle NA's. The function 'glm' can handle NA's but I can't
make make it work (or find anything written on this in the help files) with
'glm.fit'.
Is it even possible in'glm.fit'? How?

Thanks before hand,
Fredrik Thuring, Business Researcher
__
Codan Forsikring, Gammel Kongevej 60, DK-1790 Copenhagen V
tele: +45 33 55 26 63, fax: +45 33 55 21 22
e-mail: [EMAIL PROTECTED]
http://www.codan.dk
--
This e-mail and any attachment may be confidential and may a...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Comparing of two non-normal distributions

2006-10-13 Thread Amir Safari


Dear R Users,
  Suppose comparing two non-normal distributions is our interest. Like  
distribution of financial time series, they are negative skewed with  fat tail. 
Which test can better help and in which pachage? ( For  example in 
goodness-of-fit)  Kolmogorov-Smirnov test has its own  incompatibility. Are 
there for example Anderson-Darling or Kuiper  statistics in R?
  I'm grateful to any reply.
  Amir
  
  

-
Get your email and more, right on the  new Yahoo.com 
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fw: nested linear model; with common intercept

2006-10-13 Thread Mark Difford

Dear R-help,

I posted this on 4 Oct but got no response (I wasn't even told to go away and 
do some more background reading ;) ).  I am reposting it in the, perhaps, vain 
hope that someone with knowledge of the subject will reply, if only to point me 
in a different direction to which I am now facing.

Earlier Posting:---
I am sorry if this is more of a stats question than an R-question, but I have 
found it difficult to get a clear answer by other means.

Q. Would it be wrong to specify a nested model and retain a common intercept, 
e.g.

lm(NH4 ~ Site/TideCode + 1)

I am aware (?) that my Site-coefficients are now calculated relative to my 
reference Site  (treatment.contrasts), *but* that my TideCode levels now relate 
to their reference level within Site.

Is that correct?

Thank you in advance for help.

Regards,
Mark Difford.

  
Mark Difford 
Ph.D. candidate, Botany Department, 
Nelson Mandela Metropolitan University, 
Port Elizabeth, SA. 
 






Send instant messages to your online friends http://uk.messenger.yahoo.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Barplot legend position

2006-10-13 Thread Ingmar Visser

Dear useRs,

I'm trying to create a barplot like so:

x=matrix(1:10,2,5)
barplot(x,leg=c(left,right),besid=T)

The legend is placed in default position topright, however the data are
plotted there too. I tried controlling the legend position by adding
x=topleft but this results in an error that x matches multiple formal
arguments. 

Leaving out the legend and making a separate call to legend leaves out the
colors of bars ...

Please advice, Ingmar

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] multiply two matrixes with the different dimension column by column

2006-10-13 Thread Majid Iravani

Dear all,
I would like to multiply two matrixes with the different dimension column 
by column. Let make an example:
If I have two matrixes X and Yas follow:

X- matrix(1:12, nrow=4, ncol=3, dimnames=list(c(A,B,C,D), 
c(stage1,stage2,stage3)))
Y- matrix(1:28, nrow=4, ncol=7, dimnames=list(c(A,B,C,D), 
c(site1,site2,site3,site4,site5, site6,site7)))

  I would like to multiply first column of the Ymatrix (site1) to the all 
of the columns in Xmatrix. Then, the product will be three new columns 
(for example:site1stage1, site1stage2 and site1stage3 or something like 
this) which I want to add to Ymatrix. As my site (Y) dataset has too many 
columns, it's not easy to do it in Excel and I'm looking for a command in R 
to prepare a new data frame for more analysis. So I would greatly 
appreciate if anybody can help me in this case.
Thanks in advance
Majid

  Majid Iravani
  PhD Student
  Swiss Federal Research Institute WSL
  Research Group of Vegetation Ecology
  Zürcherstrasse 111  CH-8903 Birmensdorf  Switzerland
  Phone: +41-1-739-2693
  Fax: +41-1-739-2215
  Email: [EMAIL PROTECTED]
http://www.wsl.ch/staff/majid.iravani/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bug in lowess

2006-10-13 Thread Prof Brian Ripley

Frank Harrell wrote:

[...]

 Thank you Brian.  It seems that no matter what is the right answer, the
 answer currently returned on my system is clearly wrong.  lowess()$y
 should be constrained to be within range(y).

Really?  That assertion is offered without proof and I am pretty sure is 
incorrect.  Consider

 x - c(1:10, 20)
 y - c(1:10, 5) + 0.01*rnorm(11)
 lowess(x,y)
$x
  [1]  1  2  3  4  5  6  7  8  9 10 20

$y
  [1]  0.9983192  1.9969599  2.9960805  3.9948224  4.9944158  5.9959855
  [7]  6.9964400  7.9981434  8.9990607 10.0002567 19.9946117

Remember that lowess is a local *linear* fitting method, and may give zero 
weight to some data points, so (as here) it can extrapolate.

After reading what src/appl/lowess.doc says should happen with zero 
weights, I think the answer given on Frank's system probably is the 
correct one.  Rounding error is determining which of the last two points 
is given zero robustness weight: on a i686 system both of the last two 
are, and on mine only the last is. As far as I can tell in 
infinite-precision arithmetic both would be zero, and hence the value at 
x=120 would be found by extrapolation from those (far) to the left of it.

I am inclined to think that the best course of action is to quit with a 
warning when the MAD of the residuals is effectively zero.  However, we 
need to be careful not to call things 'bugs' that we do not understand 
well enough.  This might be a design error in lowess, but it is not AFAICS 
a bug in the implementation.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Mantel test!

2006-10-13 Thread Hossein Moradi

Dear Sir/ Madam,
 
I would like to ask you about mantel test in R. In ade4 package, if I
want to use mantel.rtest, I get error massage Object of class to dist
expected. As I already have two dissimilarity matrices, shall I again
compute distance measure using this function?
If not, could you please let me know which function/command I can use to
do?
Thank you very much in advance and have a nice day!
 
Best regards,
Hossein
 
 
___
 Hossein Moradi
PhD Student
Institute of Environmental Sciences
University of Zurich
Winterthurerstrasse 190
Switzerland
Tel: +41 1 635 61 18
Fax: +41 1 635 57 11
E-mail:  mailto:[EMAIL PROTECTED] [EMAIL PROTECTED]
 

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] NA-handling in glm.fit?

2006-10-13 Thread Gavin Simpson

On Fri, 2006-10-13 at 12:51 +0200, Fredrik Thuring wrote:
 
 Dear Sir or Madam,
 
 I'm wondering if there is any routine or argument in the function 'glm.fit'
 that makes it handle NA's. The function 'glm' can handle NA's but I can't
 make make it work (or find anything written on this in the help files) with
 'glm.fit'.
 Is it even possible in'glm.fit'? How?
 
 Thanks before hand,
 Fredrik Thuring, Business Researcher

glm() deals with NAs etc via the na.action argument, which is missing
from glm.fit. glm is a wrapper around glm.fit, so look at the code of
glm and see how it handles NAs when generating the x and y arguments
that go into glm.fit.

You should also look at ?glm which points to ?na.omit and look at
options(na.action) to set what your setting is currently. You'll
probably want to run na.omit or na.exlude on a combination of the
response and predictor matrix (the arguments you are passing as x and y
to glm.fit) to remove incomplete cases.

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC  ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/cv/
 London, UK. WC1E 6BT. [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Barplot legend position

2006-10-13 Thread David Hajage

For example :

x=matrix(1:10,2,5)
barplot(x,besid=T)
legend(topleft, c(left,right), density= c(0,1000))


2006/10/13, Ingmar Visser [EMAIL PROTECTED]:

 Dear useRs,

 I'm trying to create a barplot like so:

 x=matrix(1:10,2,5)
 barplot(x,leg=c(left,right),besid=T)

 The legend is placed in default position topright, however the data are
 plotted there too. I tried controlling the legend position by adding
 x=topleft but this results in an error that x matches multiple formal
 arguments.

 Leaving out the legend and making a separate call to legend leaves out the
 colors of bars ...

 Please advice, Ingmar

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
David

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Barplot legend position

2006-10-13 Thread Ingmar Visser

Thanks, this could work!
However, the legend does not reproduce the color/shading used in the
original 
barplot, are those available somehow?
Best, Ingmar

From: David Hajage [EMAIL PROTECTED]
Date: Fri, 13 Oct 2006 14:11:21 +0200
To: Ingmar Visser [EMAIL PROTECTED]
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] Barplot legend position

For example :

x=matrix(1:10,2,5)
barplot(x,besid=T)
legend(topleft, c(left,right), density= c(0,1000))

2006/10/13, Ingmar Visser  [EMAIL PROTECTED]:
 Dear useRs,

 I'm trying to create a barplot like so:

 x=matrix(1:10,2,5)
 barplot(x,leg=c(left,right),besid=T)

 The legend is placed in default position topright, however the data are
 plotted there too. I tried controlling the legend position by adding
 x=topleft but this results in an error that x matches multiple formal
 arguments.

 Leaving out the legend and making a separate call to legend leaves out the
 colors of bars ...

 Please advice, Ingmar

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
David

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Copula in R 2.4.0

2006-10-13 Thread Dominique Katshunga

Dear R helper,
does anyone have an idea on why R.2.4.0 draws the surface for the two
command lines below and the next time it renders the error message below
for exactly the same command lines:

 norm.cop - normalCopula(0.5)
 persp(norm.cop, dcopula)

Error in ceiling(length.out) : Non-numeric argument to mathematical
function.
I will appreciate any help from anyone
thanks,
Dominique K.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Mantel test!

2006-10-13 Thread Gavin Simpson

On Fri, 2006-10-13 at 14:07 +0200, Hossein Moradi wrote:
 Dear Sir/ Madam,
  
 I would like to ask you about mantel test in R. In ade4 package, if I
 want to use mantel.rtest, I get error massage Object of class to dist
 expected. As I already have two dissimilarity matrices, shall I again
 compute distance measure using this function?
 If not, could you please let me know which function/command I can use to
 do?
 Thank you very much in advance and have a nice day!
  
 Best regards,
 Hossein

If the dissimilarity matrices are full, square, symmetric matrices, then
you can use:

mantel.rtest(as.dist(your_mat1), as.dist(your_mat2))

See ?as.dist

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC  ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/cv/
 London, UK. WC1E 6BT. [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Log-scale in histogramm

2006-10-13 Thread David Graf

Hello

My data looks ugly in a normal histogramm. How can I create a histogramm with a 
Y-axis in log-scale?

Thanks for your help!

David Graf
--

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Multiple barplots on the same axis

2006-10-13 Thread Andre Nathan

Hi

R newbie here :)

I need to plot 3 barplots in the same axis, something like

   |
   |  ___
   | | | _| | _| | _ 
   |   _ | || | _ | || | _ | || |
   |  | || || || || || || || || |
  -+-
   | v1   v2   v3


Is there any documentation describing how to achieve that, and what data
file layout would make the job easier?

Thanks in advance,
Andre

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fw: nested linear model; with common intercept

2006-10-13 Thread ken knoblauch

I f I understand you correctly, I don't think that your model is doing 
what you think it is.
Look at the model.matrix.   Consider a toy example:
  x - 1:10
  y - factor(letters[1:2])
  dd - expand.grid(x, y)
  dd$resp - rnorm(20)
  model.matrix(~Var2/Var1+1, dd)
(Intercept) Var2b Var2a:Var1 Var2b:Var1
11 0  1  0
21 0  2  0
31 0  3  0
41 0  4  0
51 0  5  0
61 0  6  0
71 0  7  0
81 0  8  0
91 0  9  0
10   1 0 10  0
11   1 1  0  1
12   1 1  0  2
13   1 1  0  3
14   1 1  0  4
15   1 1  0  5
16   1 1  0  6
17   1 1  0  7
18   1 1  0  8
19   1 1  0  9
20   1 1  0 10
attr(,assign)
[1] 0 1 2 2
attr(,contrasts)
attr(,contrasts)$Var2
[1] contr.treatment


Do you want something more like the following?

model.matrix(~Var2:Var1, dd)
(Intercept) Var2a:Var1 Var2b:Var1
11  1  0
21  2  0
31  3  0
41  4  0
51  5  0
61  6  0
71  7  0
81  8  0
91  9  0
10   1 10  0
11   1  0  1
12   1  0  2
13   1  0  3
14   1  0  4
15   1  0  5
16   1  0  6
17   1  0  7
18   1  0  8
19   1  0  9
20   1  0 10
attr(,assign)
[1] 0 1 1
attr(,contrasts)
attr(,contrasts)$Var2
[1] contr.treatment

although, an expert will surely correct me, if I'm in error here


 Q. Would it be wrong to specify a nested model and retain a common 
 intercept, e.g.

 lm(NH4 ~ Site/TideCode + 1)

 I am aware (?) that my Site-coefficients are now calculated relative 
 to my reference Site  (treatment.contrasts), *but* that my TideCode 
 levels now relate to their reference level within Site.

 Is that correct?

 Thank you in advance for help.

 Regards,
 Mark Difford.

-- 
Ken Knoblauch
Inserm U371
Institut Cellule Souche et Cerveau
Département Neurosciences Intégratives
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.lyon.inserm.fr/371/

[[alternative text/enriched version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] bug: Editing function formals deletes the environment

2006-10-13 Thread Gabor Grothendieck

If you are just modifying an S3 method in a package you may not need to reinsert
the method into the package since UseMethod first looks into the
caller environment
for methods anyways and only second does it look for methods in the
package.  Thus:

   HTML.data.frame - R2HTML:::HTML.data.frame
   HTML.data.frame$Border - 2
   HTML(BOD, file = file(clipboard, w), append = FALSE)

would be sufficient if you intend to call HTML.

There is one significant caveat.  If you intend to call a function in
R2HTML which in
turn calls HTML then the above would not be enough.  You would also
have to modify
the environment of the caller too.

Thus after running the above:

   HTML2clip(BOD)

would still get the old Border since we are calling HTML2clip which in
turn calls
HTML (as opposed to calling HTML directly).   In this case, we would
need to create
a new HTML2clip with a reset environment too:

   HTML2clip - R2HTML:::HTML2clip
   environment(HTML2clip) - environment()
   HTML2clip(BOD)

would get the Border=2 value.

On 10/13/06, Alex Brown [EMAIL PROTECTED] wrote:
 First, here's the specific bug I have.  Later I'll say why I care.

   ls(zappo)
 Error in try(name) : object zappo not found
 # good.
   f = function(zappo) { function(y) zappo + y }
   g = f(1)
   g(1)
 [1] 2

   formals(g)
 $y

   formals(g)$y
   formals(g)$y = 2
   g
 function (y = 2)
 zappo + y
   g(1)
 Error in g(1) : object zappo not found

 # looks like formals strips the environment off stuff.

 anything I can do about this?

 -Alex


 Original question:

 I'm trying to change the behaviour of a package, to simplify the
 interface.

 I'd rather not change the package, although I could.

 There's a hidden function whose defaults I wish to change.

 I'm using R 2.3.1 for macosX.  Upgrading is not an option.

 This is what I do:

 library(R2HTML)

 # get the function to modify
 x = getFromNamespace(HTML.data.frame, R2HTML)
 # change the default for an argument
 formals(x)[Border]=list(NULL)
 # put the function back
 assignInNamespace(HTML.data.frame, x, R2HTML)

 #test the function:

 HTML(data.frame(1:2), file=stdout())

 Error: could not find function HTMLReplaceNA

 # what seems to be happening is that the formals function is
 stripping the namespace off the variable x.  I can't tell why.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bug in lowess

2006-10-13 Thread Frank E Harrell Jr

Prof Brian Ripley wrote:
 Frank Harrell wrote:
 
 [...]
 
 Thank you Brian.  It seems that no matter what is the right answer, the
 answer currently returned on my system is clearly wrong.  lowess()$y
 should be constrained to be within range(y).
 
 Really?  That assertion is offered without proof and I am pretty sure is 
 incorrect.  Consider
 
 x - c(1:10, 20)
 y - c(1:10, 5) + 0.01*rnorm(11)
 lowess(x,y)
 $x
  [1]  1  2  3  4  5  6  7  8  9 10 20
 
 $y
  [1]  0.9983192  1.9969599  2.9960805  3.9948224  4.9944158  5.9959855
  [7]  6.9964400  7.9981434  8.9990607 10.0002567 19.9946117
 
 Remember that lowess is a local *linear* fitting method, and may give 
 zero weight to some data points, so (as here) it can extrapolate.

Brian - thanks - that's a good example though not typical of the kind I 
see from patients.

 
 After reading what src/appl/lowess.doc says should happen with zero 
 weights, I think the answer given on Frank's system probably is the 
 correct one.  Rounding error is determining which of the last two points 
 is given zero robustness weight: on a i686 system both of the last two 
 are, and on mine only the last is. As far as I can tell in 
 infinite-precision arithmetic both would be zero, and hence the value at 
 x=120 would be found by extrapolation from those (far) to the left of it.
 
 I am inclined to think that the best course of action is to quit with a 
 warning when the MAD of the residuals is effectively zero.  However, we 
 need to be careful not to call things 'bugs' that we do not understand 
 well enough.  This might be a design error in lowess, but it is not 
 AFAICS a bug in the implementation.

Yes it appears to be a weakness in the underlying algorithm.

Thanks
Frank

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] multiply two matrixes with the different dimension column by column

2006-10-13 Thread Gabor Grothendieck

Here are two ways:

1. Using inner from:

 http://tolstoy.newcastle.edu.au/R/help/05/04/3709.html

try:

 array(inner(t(X), Y, *), c(4, 21))

2. using model.matrix get all terms and interactions and eliminate the
non-interactions:

 model.matrix(~ X * Y - X - Y - 1)

On 10/13/06, Majid Iravani [EMAIL PROTECTED] wrote:
 Dear all,
 I would like to multiply two matrixes with the different dimension column
 by column. Let make an example:
 If I have two matrixes X and Yas follow:

 X- matrix(1:12, nrow=4, ncol=3, dimnames=list(c(A,B,C,D),
 c(stage1,stage2,stage3)))
 Y- matrix(1:28, nrow=4, ncol=7, dimnames=list(c(A,B,C,D),
 c(site1,site2,site3,site4,site5, site6,site7)))

  I would like to multiply first column of the Ymatrix (site1) to the all
 of the columns in Xmatrix. Then, the product will be three new columns
 (for example:site1stage1, site1stage2 and site1stage3 or something like
 this) which I want to add to Ymatrix. As my site (Y) dataset has too many
 columns, it's not easy to do it in Excel and I'm looking for a command in R
 to prepare a new data frame for more analysis. So I would greatly
 appreciate if anybody can help me in this case.
 Thanks in advance
 Majid
 
  Majid Iravani
  PhD Student
  Swiss Federal Research Institute WSL
  Research Group of Vegetation Ecology
  Zürcherstrasse 111  CH-8903 Birmensdorf  Switzerland
  Phone: +41-1-739-2693
  Fax: +41-1-739-2215
  Email: [EMAIL PROTECTED]
 http://www.wsl.ch/staff/majid.iravani/

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] multiply two matrixes with the different dimension column by column

2006-10-13 Thread Alberto Monteiro

Majid Iravani wrote:

 I would like to multiply two matrixes with the different dimension 
 column by column. Let make an example: If I have two matrixes X 
 and Yas follow:
 
 X- matrix(1:12, nrow=4, ncol=3, dimnames=list(c(A,B,C,D), 
 c(stage1,stage2,stage3)))
 Y- matrix(1:28, nrow=4, ncol=7, dimnames=list(c(A,B,C,D), 
 c(site1,site2,site3,site4,site5, site6,site7)))
 
t(X) %*% Y will give something; I don't know if this is what
you want.

Alberto Monteiro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] multiply two matrixes with the different dimension column by column

2006-10-13 Thread Alex Brown

apply(Y, 2, function(y)list(y*X))

On 13 Oct 2006, at 12:33, Majid Iravani wrote:

 Dear all,
 I would like to multiply two matrixes with the different dimension  
 column
 by column. Let make an example:
 If I have two matrixes X and Yas follow:

 X- matrix(1:12, nrow=4, ncol=3, dimnames=list(c(A,B,C,D),
 c(stage1,stage2,stage3)))
 Y- matrix(1:28, nrow=4, ncol=7, dimnames=list(c(A,B,C,D),
 c(site1,site2,site3,site4,site5, site6,site7)))

   I would like to multiply first column of the Ymatrix (site1) to  
 the all
 of the columns in Xmatrix. Then, the product will be three new  
 columns
 (for example:site1stage1, site1stage2 and site1stage3 or something  
 like
 this) which I want to add to Ymatrix. As my site (Y) dataset has  
 too many
 columns, it's not easy to do it in Excel and I'm looking for a  
 command in R
 to prepare a new data frame for more analysis. So I would greatly
 appreciate if anybody can help me in this case.
 Thanks in advance
 Majid
 -- 
 --
   Majid Iravani
   PhD Student
   Swiss Federal Research Institute WSL
   Research Group of Vegetation Ecology
   Zürcherstrasse 111  CH-8903 Birmensdorf  Switzerland
   Phone: +41-1-739-2693
   Fax: +41-1-739-2215
   Email: [EMAIL PROTECTED]
 http://www.wsl.ch/staff/majid.iravani/

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Cross two dataframe

2006-10-13 Thread Mihai Nica

Actually, I'm a beginner too :-) It's just this list is so helpful, and I was 
helped so many times, that I am trying to bring my small contribution :-)
How about:

#_
Y1=X*Y[,1]
Y=cbind(Y,Y1)
Y

  site1 site2 site3 site4 site5 site6 site7 stage1 stage2 stage3
A 1 5 913172125  1  5  9
B 2 61014182226  4 12 20
C 3 71115192327  9 21 33
D 4 81216202428 16 32 48
#___

And then change the names as you like... I bet there is a more elegant way, but 
it seems to work :-)

hth,
 
Mihai Nica
170 East Griffith St. G5
Jackson, MS 39201
601-914-0361

- Original Message 
From: Majid Iravani [EMAIL PROTECTED]
To: Mihai Nica [EMAIL PROTECTED]
Sent: Friday, October 13, 2006 1:58:29 AM
Subject: Re: [R] Cross two dataframe

Dear Mihai Nica
Thanks. Actually I dont want to merge two data frames.
For example if I have two matrixes X and Yas follow:

X- matrix(1:12, nrow=4, ncol=3, dimnames=list(c(A,B,C,D), 
c(stage1,stage2,stage3)))

Y- matrix(1:28, nrow=4, ncol=7, dimnames=list(c(A,B,C,D), 
c(site1,site2,site3,site4,site5, site6,site7)))

  I would like to multiply first column of the Ymatrix (site1) to the all 
of the columns in Xmatrix. Then, the product will be three new columns 
(site1stage1, site1stage2 and site1stage3) which I want to add to Ymatrix. 
As my site (Y) dataset has about 400 columns its not easy to do it in Excel 
and Im looking for a command in R to prepare a new data frame for more 
analysis. So I would greatly appreciate if you help me in this case.
Thanks a lot again
Majid





At 10:06 AM 10/12/2006 -0700, you wrote:
Mihai Nica


  Majid Iravani
  PhD Student
  Swiss Federal Research Institute WSL
  Research Group of Vegetation Ecology
  Zürcherstrasse 111  CH-8903 Birmensdorf  Switzerland
  Phone: +41-1-739-2693
  Fax: +41-1-739-2215
  Email: [EMAIL PROTECTED]
http://www.wsl.ch/staff/majid.iravani/









[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] C code for KalmnaLike

2006-10-13 Thread Leeds, Mark \(IED\)

Gavin, you are right. I thought he just wanted a routine. My bad.

-Original Message-
From: Gavin Simpson [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 13, 2006 5:00 AM
To: Leeds, Mark (IED)
Cc: Malini Subramanian; R-help@stat.math.ethz.ch
Subject: Re: [R] C code for KalmnaLike

On Thu, 2006-10-12 at 10:57 -0400, Leeds, Mark (IED) wrote:
 you shouldn't need it. Kalmanlike() ( spelling ) I think is in the 
 base package and there is atleast One constributed package and 
 probably many others that do kalman filtering but I can't recall the 
 names of them.

 Check out the list of packages at www.r-project.org.

Mark, That pre-supposes that Malini just wants to perform kalman
filtering, and not look at the inner workings of the implementation in
R. KalmanLike is in package stats distributed with base R, but it is
defined as:

 KalmanLike
function (y, mod, nit = 0, fast = TRUE)
{
x - .Call(KalmanLike, y, mod$Z, mod$a, mod$P, mod$T, mod$V,
mod$h, mod$Pn, as.integer(nit), FALSE, fast = fast, PACKAGE =
stats)
names(x) - c(ssq, sumlog)
s2 - x[1]/length(y)
list(Lik = 0.5 * (log(x[1]/length(y)) + x[2]/length(y)),
s2 = s2)
}
environment: namespace:stats

So, not much use in reading the R code as this just calls compiled code.
If Malini really does want to look at the C code for KalmanLike then Uwe
Ligges recently posted a preview of an article he is writing for R News,
which explains how to access various parts of R's source code. The
preview is still available from:

http://www.statistik.uni-dortmund.de/~ligges/R_Help_Desk_preview.pdf

The information contained in the article should allow Malini to find the
C for KalmanLike.

HTH,

G

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Malini 
 Subramanian
 Sent: Thursday, October 12, 2006 9:56 AM
 To: R-help@stat.math.ethz.ch
 Subject: Re: [R] C code for KalmnaLike

 hi,
   i am looking for c code of kalman filtering please can you help 
 me...thankyou bye...

 -

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 This is not an offer (or solicitation of an offer) to 
 buy/se...{{dropped}}

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC  ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/cv/
 London, UK. WC1E 6BT. [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Copula in R 2.4.0

2006-10-13 Thread Prof Brian Ripley

As there is no function normalCopula in R 2.4.0, your example is 
incomplete.  Please see the footer of this message (and perhaps contact 
the maintainer of the package involved).

Please don't send messages repeatedly: if you don't get an answer the 
first time study the posting guide and work out what the problem might be.

On Fri, 13 Oct 2006, Dominique Katshunga wrote:

 Dear R helper,
 does anyone have an idea on why R.2.4.0 draws the surface for the two
 command lines below and the next time it renders the error message below
 for exactly the same command lines:

 norm.cop - normalCopula(0.5)
 persp(norm.cop, dcopula)

 Error in ceiling(length.out) : Non-numeric argument to mathematical
 function.
 I will appreciate any help from anyone
 thanks,
 Dominique K.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] RODBC sqlQuery insert slow

2006-10-13 Thread Bill Szkotnicki

Hello,
I am trying to insert a lot of data into a table using windows R (2.3.1) 
and a mysql database via RODBC.
First I read a file with read.csv and then form sql insert statements 
for each row and execute the insert query one row at a time. See the 
loop below.
This turns out to be very slow.
Can anyone please suggest a way to speed it up?

Thanks, Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
for (i in 1:ntry) {
sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p)
sql=paste(sql,',d1,' )
sql=paste(sql,,,ti[i,3] )
sql=paste(sql,,,ti[i,4] )
sql=paste(sql,,,ti[i,5] )
sql=paste(sql,,,ti[i,6] )
sql=paste(sql,,,ti[i,7] )
sql=paste(sql,,,ti[i,8] )
sql=paste(sql,,,ti[i,9] )
sql=paste(sql,,,ti[i,10])
sql=paste(sql,,,ti[i,11])
sql=paste(sql,,,ti[i,12])
sql=paste(sql,) )
#print(sql)
sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
nadded=nafter-nbefore;nadded
date()

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple barplots on the same axis

2006-10-13 Thread Marc Schwartz

On Thu, 2006-10-12 at 23:08 -0300, Andre Nathan wrote:
 Hi
 
 R newbie here :)
 
 I need to plot 3 barplots in the same axis, something like
 
|
|  ___
| | | _| | _| | _ 
|   _ | || | _ | || | _ | || |
|  | || || || || || || || || |
   -+-
| v1   v2   v3
 
 
 Is there any documentation describing how to achieve that, and what data
 file layout would make the job easier?
 
 Thanks in advance,
 Andre

There are examples in ?barplot, where the VADeaths data is used. The key
is the use of 'beside = TRUE', to enable grouped bars:

barplot(VADeaths, beside = TRUE,
col = c(lightblue, mistyrose, lightcyan,
lavender, cornsilk),
legend = rownames(VADeaths), ylim = c(0, 100))

To see what the VADeaths data set looks like:

  VADeaths

See ?barplot for more information.

There is also an R Graphics Gallery with code at:

http://addictedtor.free.fr/graphiques/index.php

and From Data to Graphics at:

http://zoonek2.free.fr/UNIX/48_R/03.html

Both of which are helpful.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Problemss compiling RODBC

2006-10-13 Thread Vittorio

When updating to the very last version of RODBC under freebsd 6.1 the 
errors below pop up but RODBC compiles till the end and, it seems, to 
work properly.
What are those errors about?

Vittorio

..
checking for suffix of 
executables...
checking for suffix of object files... o
checking 
whether we are using the GNU C compiler... grep: error while loading 
shared libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid
yes
checking whether cc accepts -g... grep: error while loading shared 
libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid
yes
checking for cc option to accept ANSI C... grep: error while loading 
shared libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid
none needed
..
.
grep: error while loading shared 
libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid
config.
status: creating src/Makevars
config.status: creating src/config.h
grep: error while loading shared libraries: /usr/local/lib/libpcre.so.
0: ELF file OS ABI invalid
** libs
cc -I/usr/local/lib/R/include -
I/usr/local/lib/R/include -I. -I/usr/local/include -
I/usr/local/include  -D__NO_MATH_INLINES  -fpic  -O2 -fno-strict-
aliasing -pipe -march=pentium4 -c RODBC.c -o RODBC.o
cc -shared -
L/usr/local/lib -o RODBC.so RODBC.o -lodbc -L/usr/local/lib -
L/usr/local/lib  -L/usr/local/lib/R/lib -lR
** R
** inst
** preparing 
package for lazy loading
** help
etc.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] combinatorics

2006-10-13 Thread Robin Hankin

Hi

How do I generate all ways of ordering  sets of indistinguishable items?

suppose I have two A's, two B's and a C.

Then I want

AABBC
AABCB
AACBC
ABABC
. . .snip...
BBAAC
. . .snip...
CBBAA

[there are 5!/(2!*2!) = 30 arrangements.  Note AABBC != BBAAC]

How do I do this?





--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] loop, pipe connection, output objects

2006-10-13 Thread Marco Grazzi

Hi all,

I have the following -newbye- problem.
Inside R, I am trying to process a file and creating from it many files.
The file is organized in different columns, the second containing a code. I 
want to create as output objects, which contain only entries in a certain code 
range, and whose name contain the code itself.
Here is my attempt

indice - (201:399)
for(i in indice){
  data.i -  read.table(pipe(paste(gawk '{if ($2 =, i,  $2, i+1,) 
print $2,$3}'  base_6_mod   , sep='')))
print(paste(code ..., i,   ... done))
}

The problems are:
1- My sintax data.i does not work, and loop only produces a big data.i 
object. My goal, obviously was to have something like data.201, data.202, etc
(second order problem)
2- I wonder if the sintax for the index ($2 =, i,  $2, i+1,) works

Thanks for your help (hoping I manged to be enough clear), 
marco
-- 

Marco Grazzi

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Barplot legend position

2006-10-13 Thread Stefan Grosse

e.g.:
barplot(x,col=c(lightgrey,darkgrey),besid=T)
legend(topleft,c(left,right),fill=c(lightgrey,darkgrey))

try:
?legend
and
example(legend)
for documentation!!!

Ingmar Visser schrieb:
 Thanks, this could work!
 However, the legend does not reproduce the color/shading used in the
 original 
 barplot, are those available somehow?
 Best, Ingmar


 From: David Hajage [EMAIL PROTECTED]
 Date: Fri, 13 Oct 2006 14:11:21 +0200
 To: Ingmar Visser [EMAIL PROTECTED]
 Cc: R-help@stat.math.ethz.ch
 Subject: Re: [R] Barplot legend position

 For example :

 x=matrix(1:10,2,5)
 barplot(x,besid=T)
 legend(topleft, c(left,right), density= c(0,1000))


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need help with tables

2006-10-13 Thread Stefan Grosse

I suggest you look at one of the guides:
http://cran.r-project.org/other-docs.html
before answering questions like this to the mailing list... please read
the posting guide!

laba diena schrieb:
 I have a data file with 3 columns and I need to take only 2, how to do that
 ?

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm cannot find valid starting values

2006-10-13 Thread Ronaldo ReisJunior

Hi,

I have some similar problems. Some times ago this problem dont there existed.

Look this simple example:

 Y - c(0,0,0,1,4,8,16)
 X - c(1,2,3,4,5,6,7)

 m - glm(Y~X,family=gaussian(link=log))
Error in eval(expr, envir, enclos) : cannot find valid starting values: please 
specify some

 m - glm(Y~X,family=gaussian(link=log))
Error in eval(expr, envir, enclos) : cannot find valid starting values: please 
specify some

What is the problem? I think that the problem is with the log link algorithm 
and zeros.

Look

 Y - c(0,0.1,0.5,1,4,8,16)

 m - glm(Y~X,family=gaussian(link=log))
Error in eval(expr, envir, enclos) : cannot find valid starting values: please 
specify some
 
 Y - c(0.01,0.1,0.5,1,4,8,16)
 
 m - glm(Y~X,family=gaussian(link=log))

Without Zeros it work.

It is a real bug?

Thanks
Ronaldo
-- 
Preserve wildlife -- pickle a squirrel today!
--
 Prof. Ronaldo Reis Júnior
|  .''`. UNIMONTES/Depto. Biologia Geral/Lab. Ecologia Evolutiva
| : :'  : Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
| `. `'` CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
|   `- Fone: (38) 3229-8190 | [EMAIL PROTECTED]
| ICQ#: 5692561 | LinuxUser#: 205366

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Producing a random bistochastic matrix

2006-10-13 Thread Florent Bresson

Hi everyone,

I need some help to produce a random bistochastic matrix,
that is a squared matrix of positive real numbers e_ij, with sum(e_i)=1
and sum(e_j)=1.
Thanks

Florent Bresson







___ 

Demandez à ceux qui savent sur Yahoo! Questions/Réponses

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rarefy a matrix of counts

2006-10-13 Thread Brian Frappier

Thank you, Alex!  That's exactly what I was looking to do.  I'm going to
remove the loops and use your apply function approach.  Best regards and
much thanks,  brian

On 10/13/06, Alex Brown [EMAIL PROTECTED] wrote:

 I thought at first that you could use a weighted sample (the sample
 function) but, you can't since it doesn't take proper account of
 replacement if you try that.

 You can use the list approach, but through the power of R, you don't
 need a lot of loops to do it...

 I can't speak for the efficiency of this approach in terms of cpu cycle.

 In short:

 apply(z2,2,function(x)sample(rep(names(x),x),100))

 In long:

 #let's load the data:

 z = scan(,,sep=\n)
 sample.1 sample.2 sample.3
 red.candy   400 300   2500
 green.candy1000  200
 black.candy 3001000500

 #and turn into a table

   z2 = read.table(textConnection(z), header=TRUE, row.names=1)

 # let's create a functon to expand a sample column into individuals:

 expand - function(x) rep(names(x), x)

 # test it on a smaller set:

 ex - expand( c( red = 2, blue = 3) )

 ex
 [1] red  red  blue blue blue

 # and sample 2 things from that:

 sample( ex, 2 )

 # combine the two

 samplex - function( x, size ) sample(expand(x), size )

 samplex( c( red = 2, blue = 3), size = 2 )

 # ok, now we use the apply function to apply this to each column

 apply(z2, 2, samplex, size = 2 )

 # you wanted 100?

 apply(z2, 2, samplex, size = 100 )

 # all done.

 #You should note that if there are less than 100 (samplenumber)
 candies in any given sample, this function will fail.
 # eg:

 apply(z2, 2, samplex, size = 2000 )

 Error in sample(length(x), size, replace, prob) :
 cannot take a sample larger than the population
 when 'replace = FALSE'

 -Alex

 On 11 Oct 2006, at 15:10, Brian Frappier wrote:

  Hi Petr,
 
  Thanks for your response.  I have data that looks like the following:
 
 sample 1 sample 2 sample 3  
  red candy400 300   2500
  green candy1000  200
  black candy 3001000500
 
  I don't want to randomly select either the samples (columns) or the
  candy
  types (rows), which sample as you state would allow me.  Instead, I
  want to
  randomly sample 100 candies from each sample and retain info on their
  associated type.  I could make a list of all the candies in each
  sample:
 
  sample 1
  red
  red
  red
  red
  green
  green
  black
  red
  black
  ...
 
  and then randomly sample those rows.  Repeat for each sample.  But,
  I am not
  sure how to do that without alot of loops, and am wondering if
  there is an
  easier way in R.  Thanks!  I should have laid this out in the first
  email...sorry.
 
 
  On 10/11/06, Petr Pikal [EMAIL PROTECTED] wrote:
 
  Hi
 
  I am not experienced in Matlab and from your explanation I do not
  understand what exactly do you want. It seems that you want randomly
  choose a sample of 100 rows from your martix, what can be achived by
  sample.
 
  DF-data.frame(rnorm(100), 1:100, 101:200, 201:300)
  DF[sample(1:100, 10),]
 
  If you want to do this several times, you need to save your result
  and than it depends on what you want to do next. One suitable form is
  list of matrices the other is array and you can use for loop for
  completing it.
 
  HTH
  Petr
 
 
  On 10 Oct 2006 at 17:40, Brian Frappier wrote:
 
  Date sent:  Tue, 10 Oct 2006 17:40:47 -0400
  From:   Brian Frappier [EMAIL PROTECTED]
  To: r-help@stat.math.ethz.ch
  Subject:[R] rarefy a matrix of counts
 
  Hi all,
 
  I have a matrix of counts for objects (rows) by samples
  (columns).  I
  aimed for about 500 counts in each sample (I have about 80 samples)
  and would now like to rarefy these down to 100 counts in each sample
  using simple random sampling without replacement.  I plan on
  rarefying
  several times for each sample.  I could do the tedious looping
  task of
  making a list of all objects (with its associated identifier) in
  each
  sample and then use the wonderful sampling package to select a
  sub-sample of 100 for each sample and thereby get a logical
  vector of
  inclusions.  I would then regroup the resulting logical vector
  into a
  vector of counts by object, rinse and repeat several times for each
  sample.
 
  Alternately, using the same list, I could create a random index of
  integers between 1 and the number of objects for a sample (without
  repeats) and then select those objects from the list.  Again, rinse
  and repeat several time for each sample.
 
  Is there a way to directly rarefy a matrix of counts without
  having to
  create a list of objects first?  I am trying to switch to R from
  Matlab and am trying to pick up good programming habits

Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Jerome Asselin

On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote:
 Hello,
 I am trying to insert a lot of data into a table using windows R (2.3.1) 
 and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert statements 
 for each row and execute the insert query one row at a time. See the 
 loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?
 
 Thanks, Bill
 
 # R code
 ntry=dim(ti)[1]
 date()
 nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 for (i in 1:ntry) {
 sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
 d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p)
 sql=paste(sql,',d1,' )
 sql=paste(sql,,,ti[i,3] )
 sql=paste(sql,,,ti[i,4] )
 sql=paste(sql,,,ti[i,5] )
 sql=paste(sql,,,ti[i,6] )
 sql=paste(sql,,,ti[i,7] )
 sql=paste(sql,,,ti[i,8] )
 sql=paste(sql,,,ti[i,9] )
 sql=paste(sql,,,ti[i,10])
 sql=paste(sql,,,ti[i,11])
 sql=paste(sql,,,ti[i,12])
 sql=paste(sql,) )
 #print(sql)
 sqlQuery(channel, sql)
 }
 nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 nadded=nafter-nbefore;nadded
 date()

I sure will try to help you out here. I've been working with RODBC. I
think what slows you down here is your loop with multiple paste
commands.

Have you considered the sqlSave() function with the append=T argument? I
think you could replace your loop with:

dat - cbind(strptime(ti[,2],%d/%m/%y %H:%M:%S %p),d1,ti[,3:12])
sqlSave(channel,dat,logger,append=T)

Of course, I haven't tested this so you may need some minor adjustments,
but I think this will greatly speed up your insert job.

Regards,
Jerome
-- 
Jerome Asselin, M.Sc., Agent de recherche, RHCE
CHUM -- Centre de recherche
3875 rue St-Urbain, 3e etage // Montreal QC  H2W 1V1
Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread ONKELINX, Thierry

Large for loops are slow. Try to avoid them using apply, sapply, etc.
I've made the paste statements a lot shorter by using collapse. See
?paste for more info.

Append.SQL - function(x, channel){
  sql=INSERT INTO logger (time, v1, v2, v3, v4, v5, v6, v7, v8, v9,
v10) VALUES(d1=strptime(x[2],%d/%m/%y %H:%M:%S %p ', d1, ' ,,
paste(x[3:12], collapse = , ), ) ) 
  sqlQuery(channel, sql)
}

ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
apply(ti, 2, Append.SQL, channel = channel)
nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
nadded=nafter-nbefore;nadded
date()




ir. Thierry Onkelinx

Instituut voor natuur- en bosonderzoek / Reseach Institute for Nature
and Forest

Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance

Gaverstraat 4

9500 Geraardsbergen

Belgium

tel. + 32 54/436 185

[EMAIL PROTECTED]

www.inbo.be 

 

Do not put your faith in what statistics say until you have carefully
considered what they do not say.  ~William W. Watt

A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney


-Oorspronkelijk bericht-
Van: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Namens Bill Szkotnicki
Verzonden: vrijdag 13 oktober 2006 15:09
Aan: [EMAIL PROTECTED]
Onderwerp: [R] RODBC sqlQuery insert slow

Hello,
I am trying to insert a lot of data into a table using windows R (2.3.1)

and a mysql database via RODBC.
First I read a file with read.csv and then form sql insert statements 
for each row and execute the insert query one row at a time. See the 
loop below.
This turns out to be very slow.
Can anyone please suggest a way to speed it up?

Thanks, Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
for (i in 1:ntry) {
sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p)
sql=paste(sql,',d1,' )
sql=paste(sql,,,ti[i,3] )
sql=paste(sql,,,ti[i,4] )
sql=paste(sql,,,ti[i,5] )
sql=paste(sql,,,ti[i,6] )
sql=paste(sql,,,ti[i,7] )
sql=paste(sql,,,ti[i,8] )
sql=paste(sql,,,ti[i,9] )
sql=paste(sql,,,ti[i,10])
sql=paste(sql,,,ti[i,11])
sql=paste(sql,,,ti[i,12])
sql=paste(sql,) )
#print(sql)
sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
nadded=nafter-nbefore;nadded
date()

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Barplot legend position

2006-10-13 Thread John Kane


--- Ingmar Visser [EMAIL PROTECTED] wrote:

 Thanks, this could work!
 However, the legend does not reproduce the
 color/shading used in the
 original 
 barplot, are those available somehow?
 Best, Ingmar

?legend

Try
x=matrix(1:10,2,5)
barplot(x,besid=T, col=c(red,blue))
legend(topleft, c(left,right),
fill=c(red,blue))



 
 
 From: David Hajage [EMAIL PROTECTED]
 Date: Fri, 13 Oct 2006 14:11:21 +0200
 To: Ingmar Visser [EMAIL PROTECTED]
 Cc: R-help@stat.math.ethz.ch
 Subject: Re: [R] Barplot legend position
 
 For example :
 
 x=matrix(1:10,2,5)
 barplot(x,besid=T)
 legend(topleft, c(left,right), density=
 c(0,1000))
 
 
 2006/10/13, Ingmar Visser  [EMAIL PROTECTED]:
  Dear useRs,
  
  I'm trying to create a barplot like so:
  
  x=matrix(1:10,2,5)
  barplot(x,leg=c(left,right),besid=T)
  
  The legend is placed in default position topright,
 however the data are
  plotted there too. I tried controlling the legend
 position by adding
  x=topleft but this results in an error that x
 matches multiple formal
  arguments.
  
  Leaving out the legend and making a separate call
 to legend leaves out the
  colors of bars ...
  
  Please advice, Ingmar
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
 reproducible code.
 
 
 
 -- 
 David
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Log-scale in histogramm

2006-10-13 Thread Marc Schwartz

On Fri, 2006-10-13 at 13:33 +0200, David Graf wrote:
 Hello
 
 My data looks ugly in a normal histogramm. How can I create a
 histogramm with a Y-axis in log-scale?
 
 Thanks for your help!
 
 David Graf

I'm not sure that you want to use a log scale here, but may be better
served by log transforming your data.

For example:

# Generate 100 random values from a log normal dist:
x - rlnorm(100)


# Now do a histogram on x
hist(x, freq = FALSE)


# Now use log(x)
hist(log(x), freq = FALSE)


Does that help?

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] nontabular logistic regression

2006-10-13 Thread Jeffrey Stratford

Hi.  I'm attempting to fit a logistic/binomial model so I can determine
the influence of landscape on the probability that a box gets used by a
bird.  I've looked at a few sources (MASS text, Dalgaard, Fox and
google) and the examples are almost always based on tabular predictor
variables.  My data, however are not.  I'm not sure if that is the
source of the problems or not because the one example that includes a
continuous predictor looks to be coded exactly the same way.  Looking at
the output, I get estimates for each case when I should get a single
estimate for purbank.  Any suggestions?

Many thanks,

Jeff


THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
are landscape variables).  

box use purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist   
grasspatchk
1   1   0.003813435 0.02684564  0.06896552  0.3282487   
0.6845638   0.7586207   0   3.73
2   1   0.04429451  0.1610738   0.1724138   0.1534174   
0.3825503   0.6551724   0   1.023261
3   1   0.04458785  0.06040268  0   0.1628043   
0.5570470.7586207   0   0.9605769
4   1   0.06072162  0.2080537   0.06896552  0.01936052  
0   0   323.10990.2284615
5   0   0.6080962   0.6979866   0.6896552   0.03168084  
0.1275168   0.2413793   30  0.2627027
6   1   0.6060428   0.6107383   0.3448276   0.04077442  
0.2885906   0.4482759   30  0.2978571
7   1   0.3807568   0.4362416   0.6896552   0.06864183  
0.03355705  0   94.868330.468
8   0   0.3649164   0.3154362   0.4137931   0.06277501  
0.1275168   0   120 0.4585714

THE CODE:

box.use- read.csv(c:\\eabl\\2004\\use_logistic2.csv, header=TRUE)
attach(box.use)
box.use - na.omit(box.use)
use - factor(use, levels=0:1)
levels(use) - c(unused, used)
glm1 - glm(use ~ purbank, binomial)

THE OUTPUT:

Coefficients:
 Estimate Std. Error   z value Pr(|z|)
(Intercept)-4.544e-16  1.414e+00 -3.21e-161.000
purbank02.157e+01  2.923e+04 0.0010.999
purbank0.001173365  2.157e+01  2.067e+04 0.0010.999
purbank0.001466706  2.157e+01  2.923e+04 0.0010.999
purbank0.001760047  6.429e-16  2.000e+00  3.21e-161.000
purbank0.002346729  2.157e+01  2.923e+04 0.0010.999
purbank0.003813435  2.157e+01  2.923e+04 0.0010.999
purbank0.004106776  2.157e+01  2.067e+04 0.0010.999
purbank0.004693458  2.157e+01  2.067e+04 0.0010.999



Jeffrey A. Stratford, Ph.D.
Postdoctoral Associate
331 Funchess Hall
Department of Biological Sciences
Auburn University
Auburn, AL 36849
334-329-9198
FAX 334-844-9234
http://www.auburn.edu/~stratja

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problemss compiling RODBC

2006-10-13 Thread Jerome Asselin

On Fri, 2006-10-13 at 14:59 +0100, Vittorio wrote:
 When updating to the very last version of RODBC under freebsd 6.1 the 
 errors below pop up but RODBC compiles till the end and, it seems, to 
 work properly.
 What are those errors about?

I don't know what adverse effect this may have in RODBC.

To me, it looks like you're either missing
the /usr/local/lib/libpcre.so.0 file or that the file is damaged. Can
you confirm whether libpcre is installed on your system and for the
correct ARCH? Try to reinstall/recompile the pcre package. More info at
http://www.pcre.org (appears temporarily down at the moment I write).

Then I'd try to recompile RODBC.

HTH,
Jerome
-- 
Jerome Asselin, M.Sc., Agent de recherche, RHCE
CHUM -- Centre de recherche
3875 rue St-Urbain, 3e etage // Montreal QC  H2W 1V1
Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Armstrong, Whit

Is there a reason why the data have to be inserted 1 row at a time?

Is it possible to insert the entire table at once?

sqlSave perhaps.



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
 Sent: Friday, October 13, 2006 9:09 AM
 To: [EMAIL PROTECTED]
 Subject: [R] RODBC sqlQuery insert slow
 
 Hello,
 I am trying to insert a lot of data into a table using 
 windows R (2.3.1) and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert 
 statements for each row and execute the insert query one row 
 at a time. See the loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?
 
 Thanks, Bill
 
 # R code
 ntry=dim(ti)[1]
 date()
 nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) for 
 (i in 1:ntry) { sql=INSERT INTO logger 
 (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
 d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p) 
 sql=paste(sql,',d1,' ) sql=paste(sql,,,ti[i,3] ) 
 sql=paste(sql,,,ti[i,4] ) sql=paste(sql,,,ti[i,5] ) 
 sql=paste(sql,,,ti[i,6] ) sql=paste(sql,,,ti[i,7] ) 
 sql=paste(sql,,,ti[i,8] ) sql=paste(sql,,,ti[i,9] )
 sql=paste(sql,,,ti[i,10])
 sql=paste(sql,,,ti[i,11])
 sql=paste(sql,,,ti[i,12])
 sql=paste(sql,) )
 #print(sql)
 sqlQuery(channel, sql)
 }
 nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) 
 nadded=nafter-nbefore;nadded
 date()
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 




This e-mail message is intended only for the named recipient(s) above. It may 
contain confidential information. If you are not the intended recipient you are 
hereby notified that any dissemination, distribution or copying of this e-mail 
and any attachment(s) is strictly prohibited. If you have received this e-mail 
in error, please immediately notify the sender by replying to this e-mail and 
delete the message and any attachment(s) from your system. Thank you.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need help with tables

2006-10-13 Thread John Kane


--- laba diena [EMAIL PROTECTED] wrote:

 I have a data file with 3 columns and I need to take
 only 2, how to do that

Have a look at the manual?
An Introduction to R
2.7 Index vectors; selecting and modifying subsets of
a data set

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R in Perl-Scripts

2006-10-13 Thread Torsten Mathies

Dear all,
 
I would like to use R-comands within Perl-Scripts. How can I do this?
 
Yours
 
Torsten

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Rmpi performance

2006-10-13 Thread Michela Cameletti

Dear R users,
we are trying to do some parallel computing using library(snow).
In particular we have a cluster with 3 nodes

cl - makeCluster(3, type = MPI)
3 slaves are spawned successfully. 0 failed.


and we want to compute the function op_mat (see below) first with the 
master and then with the cluster using system.time for checking the 
computational performance.

op_mat = function(mat) {

+   inv = solve(mat)
+   det_inv = det(inversa)
+   tr_inv  = sum(diag(inversa))
+   return(list(c(det=det_inv,tr=tr_inv)))
+ }

nn = 3000
XX = matrix(rnorm(nn*nn),nn,nn)
# with the master
 system.time(op_matrici(XX))
[1] 42.283  1.883 44.168  0.000  0.000
# with the cluster
 system.time(clusterCall(cl,op_matrici,XX))
[1] 11.523 12.612 71.562  0.000  0.000

You can see that using the master it takes 44.168 seconds for computing 
the function on matrix XX while it takes 71.562 seconds (more time!!!) 
with the cluster. Can you give us some advice in order to understand why 
the cluster is slower than the master?
Thank you very much in advance,
bye
Michela  and Marco
Ps: we have a gigabit ethernet between the master and the nodes

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Helmert contrasts for repeated measures and split-plot expts

2006-10-13 Thread Spencer Graves

comments in line  

Roy Sanderson wrote:
 Dear R-help

 I have two separate experiments, one a repeated-measures design, the other
 a split-plot.  In a standard ANOVA I have usually undertaken a
 multiple-comparison test on a significant factor with e.g TukeyHSD, but as
 I understand it such a test is inappropriate for repeated measures or
 split-plot designs.

 Is it therefore sensible to use Helmert contrasts for either of these
 designs?  Whilst not providing all the pairwise comparisons of TukeyHSD,
 presumably the P-statistic for each Helmert contrast will indicate clearly
 whether that contrast is significant and should be retained in the model.
 (This seems to come with the disadvantage that the parameter values are
 harder to interpret than with Treatment contrasts.)  In the
 repeated-measures design the factor in question has three levels, whilst in
 the split-plot design it has four.
   
  You don't need to restrict yourself to Helmert vs. treatment 
contrasts:  You can use any set of contrasts that will provide 
estimates of (k-1) parameters for a factor with k levels and interpret 
the p values as you suggest.  I see two issues with doing this:  
correlation among parameter estimates and individual vs. group p values. 

CORRELATED PARAMETER ESTIMATES:  Helmert contrasts are orthogonal for a 
balanced design but will produce correlated parameter estimates with an 
unbalanced design.  This will generally increase the p values due to 
variance inflation created by the correlation.  If one or more 
correlations are too large, you may wish to try custom contrasts that 
produce parameter estimates that are essentially uncorrelated;  this 
should give you the smallest p value you could expect for that 
comparison.  If I was interested in, e.g., 2*k comparisons, I might run 
the same analysis several times with different contrasts, taking the p 
value for each comparison from an analysis in which the coefficient for 
that comparison had a low correlation with the remaining (k-2) 
coefficients for that k-level factor. 

INDIVIDUAL VS. GROUP p VALUES:  In many but not all cases, under the 
null hypothesis of no effect, a p value will follow a uniform 
distribution.  Thus, if we compute 1,000 p values using a typical 
procedure when nothing is going on, we can expect roughly 50 of them to 
be less than 0.05 by chance alone.  The Bonferroni inequality suggests 
that if we do m comparisons, we should multiply the smallest p value by 
m to convert it to a family- or group-wise p value.  This is known to be 
conservative, and with more than (k-1) comparisons among k levels of a 
factor, it is extremely conservative.  In that case, I would be inclined 
to multiple the smallest p value by (k-1), even if I considered many 
more than (k-1) comparisons among the k levels.  I don't know a 
reference for doing this, and if I were going to do it for a 
publication, I might do some simulations to check it.  Perhaps someone 
else might enlighten us both on how sensible this might be. 

  Hope this helps. 
  Spencer Graves
 Many thanks in advance
 Roy
 
 ---
 Roy Sanderson
 Institute for Research on Environment and Sustainability
 Devonshire Building
 University of Newcastle
 Newcastle upon Tyne
 NE1 7RU
 United Kingdom

 Tel: +44 191 246 4835
 Fax: +44 191 246 4999

 http://www.ncl.ac.uk/environment/
 [EMAIL PROTECTED]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Michel Lang

 I am trying to insert a lot of data into a table using windows R (2.3.1)
 and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert statements
 for each row and execute the insert query one row at a time. See the
 loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?

A few weeks ago I had to solve a similar task. Inserting each row turned out 
to be horrible slow due to paste() and the data.frame-indexing. The estimated 
runtime would have been over 3 weeks, so I used MySQLs LOAD DATE INFILE 
syntax to speed things up. You must have FILE_PRIV = 'Y' set in the 
mysql.user-table to use this small hack, and I'm not that sure that this runs 
remotely. It is also assumed that your df has valid column-names.

tmp_filename - tempfile()

write.table(df, tmp_filename, na = \\N, row.names = FALSE, col.names = 
FALSE, quote = FALSE, sep = \t)

query - paste(
LOAD DATA LOCAL INFILE ',  tmp_filename, ',
 INTO TABLE , your_table,  (, toString(names(df)), );, sep = )

sqlQuery(channel, query)
unlink(tmp_filename)

The total runtime for the LOAD DATA INFILE querys was something below 5 
minutes, inserting 3e+6 rows with  200 columns.

Michel Lang

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple barplots on the same axis

2006-10-13 Thread John Kane


--- Andre Nathan [EMAIL PROTECTED] wrote:

 Hi
 
 R newbie here :)
 
 I need to plot 3 barplots in the same axis,
 something like
 
|
|  ___
| | | _| | _| | _ 
|   _ | || | _ | || | _ | || |
|  | || || || || || || || || |
   -+-
| v1   v2   v3
 
 
 Is there any documentation describing how to achieve
 that, and what data
 file layout would make the job easier?
 
 Thanks in advance,
 Andre

You might have a look at Using R for Data Analysis
and Graphics - Introduction, Examples and Commentary 
by John Maindonald
and at
An Introduction to S and the Hmisc and Design
Libraries
 by Carlos Alzola and Frank E. Harrell

Both are available on the R site : Follow the trail
Other  Contributed documentation.

Are you aware of R Site Search
http://finzi.psych.upenn.edu/search.html?  

To do what you want using traditional R graphics you
should read up on par() Type   ?par  

Here is some really quick and dirty code that might
help as a starting point. 
aa- 1:4
bb - c(2,3,5,6)
cc - c(4,5,6,7)
par(mfcol=c(1,3))
barplot(aa)
barplot(bb)
barplot(cc)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] combinatorics

2006-10-13 Thread Christos Hatzis

Hi Robin,

This approach first generates all combinations and then eliminates the
non-feasible ones.  It should work fine for smallish vectors but might not
scale well for larger vectors.  Hopefully it gives you what you need for
this problem.

xx - c(A,A,B,B,C)
yy - 1:length(xx)
zz - expand.grid(yy,yy,yy,yy,yy)

ss - zz[ apply(zz, 1, FUN=function(x) length(unique(x))) == length(xx), ] 
ss - as.matrix(ss)

pp - apply(ss, 1, FUN=function(x,v) paste(v[as.vector(x)], collapse=),
v=xx) 
res - unique(pp)

 res
 [1] CBBAA BCBAA BBCAA CBABA BCABA CABBA ACBBA BACBA ABCBA
BBACA BABCA
[12] ABBCA CBAAB BCAAB CABAB ACBAB BACAB ABCAB CAABB ACABB
AACBB BAACB
[23] ABACB AABCB BBAAC BABAC ABBAC BAABC ABABC AABBC
 length(res)
[1] 30 

-Christos

Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park
Suite 5350
Woburn, MA 01801
Tel: 781-938-3830
www.nuverabio.com
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
Sent: Friday, October 13, 2006 10:19 AM
To: [EMAIL PROTECTED]
Subject: [R] combinatorics

Hi

How do I generate all ways of ordering  sets of indistinguishable items?

suppose I have two A's, two B's and a C.

Then I want

AABBC
AABCB
AACBC
ABABC
. . .snip...
BBAAC
. . .snip...
CBBAA

[there are 5!/(2!*2!) = 30 arrangements.  Note AABBC != BBAAC]

How do I do this?





--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton European Way, Southampton SO14
3ZH, UK
  tel  023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nontabular logistic regression

2006-10-13 Thread Marc Schwartz

On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
 Hi.  I'm attempting to fit a logistic/binomial model so I can determine
 the influence of landscape on the probability that a box gets used by a
 bird.  I've looked at a few sources (MASS text, Dalgaard, Fox and
 google) and the examples are almost always based on tabular predictor
 variables.  My data, however are not.  I'm not sure if that is the
 source of the problems or not because the one example that includes a
 continuous predictor looks to be coded exactly the same way.  Looking at
 the output, I get estimates for each case when I should get a single
 estimate for purbank.  Any suggestions?
 
 Many thanks,
 
 Jeff
 
 
 THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
 are landscape variables).  
 
 box   use purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist   
 grasspatchk
 1 1   0.003813435 0.02684564  0.06896552  0.3282487   
 0.6845638   0.7586207   0   3.73
 2 1   0.04429451  0.1610738   0.1724138   0.1534174   
 0.3825503   0.6551724   0   1.023261
 3 1   0.04458785  0.06040268  0   0.1628043   
 0.5570470.7586207   0   0.9605769
 4 1   0.06072162  0.2080537   0.06896552  0.01936052  
 0   0   323.10990.2284615
 5 0   0.6080962   0.6979866   0.6896552   0.03168084  
 0.1275168   0.2413793   30  0.2627027
 6 1   0.6060428   0.6107383   0.3448276   0.04077442  
 0.2885906   0.4482759   30  0.2978571
 7 1   0.3807568   0.4362416   0.6896552   0.06864183  
 0.03355705  0   94.868330.468
 8 0   0.3649164   0.3154362   0.4137931   0.06277501  
 0.1275168   0   120 0.4585714
 
 THE CODE:
 
 box.use- read.csv(c:\\eabl\\2004\\use_logistic2.csv, header=TRUE)
 attach(box.use)
 box.use - na.omit(box.use)
 use - factor(use, levels=0:1)
 levels(use) - c(unused, used)
 glm1 - glm(use ~ purbank, binomial)
 
 THE OUTPUT:
 
 Coefficients:
  Estimate Std. Error   z value Pr(|z|)
 (Intercept)-4.544e-16  1.414e+00 -3.21e-161.000
 purbank02.157e+01  2.923e+04 0.0010.999
 purbank0.001173365  2.157e+01  2.067e+04 0.0010.999
 purbank0.001466706  2.157e+01  2.923e+04 0.0010.999
 purbank0.001760047  6.429e-16  2.000e+00  3.21e-161.000
 purbank0.002346729  2.157e+01  2.923e+04 0.0010.999
 purbank0.003813435  2.157e+01  2.923e+04 0.0010.999
 purbank0.004106776  2.157e+01  2.067e+04 0.0010.999
 purbank0.004693458  2.157e+01  2.067e+04 0.0010.999


It appears that the 'purbank' variable is being imported as a factor,
hence the multiple levels indicated in the left hand column. 

Check:

  str(box.use)

right after the read.csv() step and see what it shows.


From the sample data above, it _should_ be along the lines of:

 str(box.use)
'data.frame':   8 obs. of  10 variables:
 $ box: int  1 2 3 4 5 6 7 8
 $ use: int  1 1 1 1 0 1 1 0
 $ purbank: num  0.00381 0.04429 0.04459 0.06072 0.60810 ...
 $ purban2: num  0.0268 0.1611 0.0604 0.2081 0.6980 ...
 $ purban1: num  0.069 0.172 0.000 0.069 0.690 ...
 $ pgrassk: num  0.3282 0.1534 0.1628 0.0194 0.0317 ...
 $ pgrass2: num  0.685 0.383 0.557 0.000 0.128 ...
 $ pgrass1: num  0.759 0.655 0.759 0.000 0.241 ...
 $ grassdist  : num0   0   0 323  30 ...
 $ grasspatchk: num  3.730 1.023 0.961 0.228 0.263 ...


Hence, you should be able to use:

 model - glm(use ~ purbank, data = box.use, family = binomial)

 summary(model)

Call:
glm(formula = use ~ purbank, family = binomial, data = box.use)

Deviance Residuals:
 Min1QMedian3Q   Max
-1.61450  -0.03098   0.31935   0.45888   1.39194

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept)3.223  2.225   1.4480.147
purbank   -6.129  4.773  -1.2840.199

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 8.9974  on 7  degrees of freedom
Residual deviance: 6.5741  on 6  degrees of freedom
AIC: 10.574

Number of Fisher Scoring iterations: 5


Note that na.omit() is the default operation for most R models, so is
redundant. Also, I would not attach the data frame, as you can use the
'data' argument in model related functions. This avoids the confusion of
having multiple copies of the source data set and wondering why changes
made can become confusing and problematic.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nontabular logistic regression

2006-10-13 Thread Sundar Dorai-Raj



Jeffrey Stratford said the following on 10/13/2006 9:28 AM:
 Hi.  I'm attempting to fit a logistic/binomial model so I can determine
 the influence of landscape on the probability that a box gets used by a
 bird.  I've looked at a few sources (MASS text, Dalgaard, Fox and
 google) and the examples are almost always based on tabular predictor
 variables.  My data, however are not.  I'm not sure if that is the
 source of the problems or not because the one example that includes a
 continuous predictor looks to be coded exactly the same way.  Looking at
 the output, I get estimates for each case when I should get a single
 estimate for purbank.  Any suggestions?
 
 Many thanks,
 
 Jeff
 
 
 THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
 are landscape variables).  
 
 box   use purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist   
 grasspatchk
 1 1   0.003813435 0.02684564  0.06896552  0.3282487   
 0.6845638   0.7586207   0   3.73
 2 1   0.04429451  0.1610738   0.1724138   0.1534174   
 0.3825503   0.6551724   0   1.023261
 3 1   0.04458785  0.06040268  0   0.1628043   
 0.5570470.7586207   0   0.9605769
 4 1   0.06072162  0.2080537   0.06896552  0.01936052  
 0   0   323.10990.2284615
 5 0   0.6080962   0.6979866   0.6896552   0.03168084  
 0.1275168   0.2413793   30  0.2627027
 6 1   0.6060428   0.6107383   0.3448276   0.04077442  
 0.2885906   0.4482759   30  0.2978571
 7 1   0.3807568   0.4362416   0.6896552   0.06864183  
 0.03355705  0   94.868330.468
 8 0   0.3649164   0.3154362   0.4137931   0.06277501  
 0.1275168   0   120 0.4585714
 
 THE CODE:
 
 box.use- read.csv(c:\\eabl\\2004\\use_logistic2.csv, header=TRUE)
 attach(box.use)
 box.use - na.omit(box.use)
 use - factor(use, levels=0:1)
 levels(use) - c(unused, used)
 glm1 - glm(use ~ purbank, binomial)
 
 THE OUTPUT:
 
 Coefficients:
  Estimate Std. Error   z value Pr(|z|)
 (Intercept)-4.544e-16  1.414e+00 -3.21e-161.000
 purbank02.157e+01  2.923e+04 0.0010.999
 purbank0.001173365  2.157e+01  2.067e+04 0.0010.999
 purbank0.001466706  2.157e+01  2.923e+04 0.0010.999
 purbank0.001760047  6.429e-16  2.000e+00  3.21e-161.000
 purbank0.002346729  2.157e+01  2.923e+04 0.0010.999
 purbank0.003813435  2.157e+01  2.923e+04 0.0010.999
 purbank0.004106776  2.157e+01  2.067e+04 0.0010.999
 purbank0.004693458  2.157e+01  2.067e+04 0.0010.999
 
 
 
 Jeffrey A. Stratford, Ph.D.
 Postdoctoral Associate
 331 Funchess Hall
 Department of Biological Sciences
 Auburn University
 Auburn, AL 36849
 334-329-9198
 FAX 334-844-9234
 http://www.auburn.edu/~stratja
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

That's not what I get:

lines - 
box,use,purbank,purban2,purban1,pgrassk,pgrass2,pgrass1,grassdist,grasspatchk
1,1,0.003813435,0.02684564,0.06896552,0.3282487,0.6845638,0.7586207,0,3.73
2,1,0.04429451,0.1610738,0.1724138,0.1534174,0.3825503,0.6551724,0,1.023261
3,1,0.04458785,0.06040268,0,0.1628043,0.557047,0.7586207,0,0.9605769
4,1,0.06072162,0.2080537,0.06896552,0.01936052,0,0,323.1099,0.2284615
5,0,0.6080962,0.6979866,0.6896552,0.03168084,0.1275168,0.2413793,30,0.2627027
6,1,0.6060428,0.6107383,0.3448276,0.04077442,0.2885906,0.4482759,30,0.2978571
7,1,0.3807568,0.4362416,0.6896552,0.06864183,0.03355705,0,94.86833,0.468
8,0,0.3649164,0.3154362,0.4137931,0.06277501,0.1275168,0,120,0.4585714
box.use - read.csv(textConnection(lines))

box.use - na.omit(box.use)
box.use$use - factor(box.use$use, levels=0:1)
levels(box.use$use) - c(unused, used)
(glm1 - glm(use ~ purbank, binomial, box.use))

You need to check why purbank is being interpreted as a factor in your code.

Also, I removed your use of attach because I find it dangerous 
(especially with no detach). Better to be explicit.


HTH,

--sundar

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R in Perl-Scripts

2006-10-13 Thread Jörg Beyer

Torsten, 

your message is a bit terse.
Do you have Omegahat's RSPerl package installed?
If not, visit www.omegahat.org, and have a look at the documentation. (Btw,
you will find a lot of other useful stuff there.)

Sorry that I can't offer more at the moment -- but to know where to start
may be better than nothing ;-)

Cheers, 
Jörg

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] combinatorics

2006-10-13 Thread Robin Hankin

Hi Christos

thanks for this.  Unfortunately, this approach wouldn't work for me
because the real problem is too big for it: I have
letters A-F and two of each, giving

12!/(2^6) ~= 7e6 combinations (borderline feasible)

But in the approach you coded up below, matrix zz would have
  6^12 ~= 2e9 rows before eliminating the non-feasible ones.
This is too big for me.

Looks like it's going to be another weekend lost to C
[but at least I now have some confidence that I've not overlooked
anything obvious!]

With very best wishes, I really appreciate your efforts



Robin




On 13 Oct 2006, at 16:21, Christos Hatzis wrote:

 Hi Robin,

 This approach first generates all combinations and then eliminates the
 non-feasible ones.  It should work fine for smallish vectors but  
 might not
 scale well for larger vectors.  Hopefully it gives you what you  
 need for
 this problem.

 xx - c(A,A,B,B,C)
 yy - 1:length(xx)
 zz - expand.grid(yy,yy,yy,yy,yy)

 ss - zz[ apply(zz, 1, FUN=function(x) length(unique(x))) == length 
 (xx), ]
 ss - as.matrix(ss)

 pp - apply(ss, 1, FUN=function(x,v) paste(v[as.vector(x)],  
 collapse=),
 v=xx)
 res - unique(pp)

 res
  [1] CBBAA BCBAA BBCAA CBABA BCABA CABBA ACBBA  
 BACBA ABCBA
 BBACA BABCA
 [12] ABBCA CBAAB BCAAB CABAB ACBAB BACAB ABCAB  
 CAABB ACABB
 AACBB BAACB
 [23] ABACB AABCB BBAAC BABAC ABBAC BAABC ABABC AABBC
 length(res)
 [1] 30

 -Christos

 Christos Hatzis, Ph.D.
 Nuvera Biosciences, Inc.
 400 West Cummings Park
 Suite 5350
 Woburn, MA 01801
 Tel: 781-938-3830
 www.nuverabio.com


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
 Sent: Friday, October 13, 2006 10:19 AM
 To: [EMAIL PROTECTED]
 Subject: [R] combinatorics

 Hi

 How do I generate all ways of ordering  sets of indistinguishable  
 items?

 suppose I have two A's, two B's and a C.

 Then I want

 AABBC
 AABCB
 AACBC
 ABABC
 . . .snip...
 BBAAC
 . . .snip...
 CBBAA

 [there are 5!/(2!*2!) = 30 arrangements.  Note AABBC != BBAAC]

 How do I do this?





 --
 Robin Hankin
 Uncertainty Analyst
 National Oceanography Centre, Southampton European Way, Southampton  
 SO14
 3ZH, UK
   tel  023-8059-7743

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.



--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] combinatorics

2006-10-13 Thread jim holtman

Use 'permutations' in 'gtools'

x - permutations(5,5)
y - c('a','a','b','b','c')[x]
dim(y) - dim(x)
unique(y)



On 10/13/06, Robin Hankin [EMAIL PROTECTED] wrote:
 Hi

 How do I generate all ways of ordering  sets of indistinguishable items?

 suppose I have two A's, two B's and a C.

 Then I want

 AABBC
 AABCB
 AACBC
 ABABC
 . . .snip...
 BBAAC
 . . .snip...
 CBBAA

 [there are 5!/(2!*2!) = 30 arrangements.  Note AABBC != BBAAC]

 How do I do this?





 --
 Robin Hankin
 Uncertainty Analyst
 National Oceanography Centre, Southampton
 European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nontabular logistic regression

2006-10-13 Thread Gavin Simpson

On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
 Hi.  I'm attempting to fit a logistic/binomial model so I can determine
 the influence of landscape on the probability that a box gets used by a
 bird.  I've looked at a few sources (MASS text, Dalgaard, Fox and
 google) and the examples are almost always based on tabular predictor
 variables.  My data, however are not.  I'm not sure if that is the
 source of the problems or not because the one example that includes a
 continuous predictor looks to be coded exactly the same way.  Looking at
 the output, I get estimates for each case when I should get a single
 estimate for purbank.  Any suggestions?
 
 Many thanks,
 
 Jeff

Hi Jeff,

using the snippet of data you provided (copy/paste into a text file and
read in with read.table) worked fine:

box.use - read.table(~/tmp/tmp.txt, header = TRUE)
box.use
str(box.use)
'data.frame':   8 obs. of  10 variables:
 $ box: int  1 2 3 4 5 6 7 8
 $ use: int  1 1 1 1 0 1 1 0
 $ purbank: num  0.00381 0.04429 0.04459 0.06072 0.60810 ...
 $ purban2: num  0.0268 0.1611 0.0604 0.2081 0.6980 ...
 $ purban1: num  0.069 0.172 0.000 0.069 0.690 ...
 $ pgrassk: num  0.3282 0.1534 0.1628 0.0194 0.0317 ...
 $ pgrass2: num  0.685 0.383 0.557 0.000 0.128 ...
 $ pgrass1: num  0.759 0.655 0.759 0.000 0.241 ...
 $ grassdist  : num0   0   0 323  30 ...
 $ grasspatchk: num  3.730 1.023 0.961 0.228 0.263 ...

Now I don't like attach, and you just don't need it so I deviate a
little now. Replace box.use$use directly and make use of the data
argument in glm. Also, your data didn't have any missing data so I'm not
sure whether the response or predictor is missing and whether your
na.omit is needed or not - I don't use it below.

box.use$use - factor(box.use$use, levels=0:1)
levels(box.use$use) - c(unused, used)
box.use
str(box.use)
glm1 - glm(use ~ purbank, data = box.use, family = binomial())

summary(glm1)

Call:
glm(formula = use ~ purbank, family = binomial(), data = box.use)

Deviance Residuals:
 Min1QMedian3Q   Max
-1.61450  -0.03098   0.31935   0.45888   1.39194

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept)3.223  2.225   1.4480.147
purbank   -6.129  4.773  -1.2840.199

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 8.9974  on 7  degrees of freedom
Residual deviance: 6.5741  on 6  degrees of freedom
AIC: 10.574

Number of Fisher Scoring iterations: 5


I suspect something got messed up in your reading of the data and R
thought purbank was a factor or character. Always check your data after
reading in, and str() is a your friend here as printed representations
are not always what they seem.

HTH

G

 
 
 THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
 are landscape variables).  
 
 box   use purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist   
 grasspatchk
 1 1   0.003813435 0.02684564  0.06896552  0.3282487   
 0.6845638   0.7586207   0   3.73
 2 1   0.04429451  0.1610738   0.1724138   0.1534174   
 0.3825503   0.6551724   0   1.023261
 3 1   0.04458785  0.06040268  0   0.1628043   
 0.5570470.7586207   0   0.9605769
 4 1   0.06072162  0.2080537   0.06896552  0.01936052  
 0   0   323.10990.2284615
 5 0   0.6080962   0.6979866   0.6896552   0.03168084  
 0.1275168   0.2413793   30  0.2627027
 6 1   0.6060428   0.6107383   0.3448276   0.04077442  
 0.2885906   0.4482759   30  0.2978571
 7 1   0.3807568   0.4362416   0.6896552   0.06864183  
 0.03355705  0   94.868330.468
 8 0   0.3649164   0.3154362   0.4137931   0.06277501  
 0.1275168   0   120 0.4585714
 
 THE CODE:
 
 box.use- read.csv(c:\\eabl\\2004\\use_logistic2.csv, header=TRUE)
 attach(box.use)
 box.use - na.omit(box.use)
 use - factor(use, levels=0:1)
 levels(use) - c(unused, used)
 glm1 - glm(use ~ purbank, binomial)
 
 THE OUTPUT:
 
 Coefficients:
  Estimate Std. Error   z value Pr(|z|)
 (Intercept)-4.544e-16  1.414e+00 -3.21e-161.000
 purbank02.157e+01  2.923e+04 0.0010.999
 purbank0.001173365  2.157e+01  2.067e+04 0.0010.999
 purbank0.001466706  2.157e+01  2.923e+04 0.0010.999
 purbank0.001760047  6.429e-16  2.000e+00  3.21e-161.000
 purbank0.002346729  2.157e+01  2.923e+04 0.0010.999
 purbank0.003813435  2.157e+01  2.923e+04 0.0010.999
 purbank0.004106776  2.157e+01  2.067e+04 0.0010.999
 purbank0.004693458  2.157e+01  2.067e+04 0.0010.999
 
 
 
 Jeffrey A. Stratford, Ph.D.
 Postdoctoral Associate
 331 Funchess Hall
 Department of

Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Bill Szkotnicki

Thanks for the help ... the  sqlSave() function was the solution.
The lesson, which has been stated many times before,  is to avoid loops 
wherever possible!
Bill

# fast RODBC inserting
dat - cbind(as.character(strptime(ti[,2],%d/%m/%y %H:%M:%S 
%p)),ti[,3:12])
# you need the as.character to make sure the time is stored correctly in 
mysql
names(dat)=c(time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10)
sqlSave(channel,dat,logger,rownames=F,append=T) # very fast.
#

Jerome Asselin wrote:
 On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote:
   
 Hello,
 I am trying to insert a lot of data into a table using windows R (2.3.1) 
 and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert statements 
 for each row and execute the insert query one row at a time. See the 
 loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?

 Thanks, Bill

 # R code
 ntry=dim(ti)[1]
 date()
 nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 for (i in 1:ntry) {
 sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
 d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p)
 sql=paste(sql,',d1,' )
 sql=paste(sql,,,ti[i,3] )
 sql=paste(sql,,,ti[i,4] )
 sql=paste(sql,,,ti[i,5] )
 sql=paste(sql,,,ti[i,6] )
 sql=paste(sql,,,ti[i,7] )
 sql=paste(sql,,,ti[i,8] )
 sql=paste(sql,,,ti[i,9] )
 sql=paste(sql,,,ti[i,10])
 sql=paste(sql,,,ti[i,11])
 sql=paste(sql,,,ti[i,12])
 sql=paste(sql,) )
 #print(sql)
 sqlQuery(channel, sql)
 }
 nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 nadded=nafter-nbefore;nadded
 date()
 

 I sure will try to help you out here. I've been working with RODBC. I
 think what slows you down here is your loop with multiple paste
 commands.

 Have you considered the sqlSave() function with the append=T argument? I
 think you could replace your loop with:

 dat - cbind(strptime(ti[,2],%d/%m/%y %H:%M:%S %p),d1,ti[,3:12])
 sqlSave(channel,dat,logger,append=T)

 Of course, I haven't tested this so you may need some minor adjustments,
 but I think this will greatly speed up your insert job.

 Regards,
 Jerome


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problemss compiling RODBC

2006-10-13 Thread vittorio

There is.

# ls /usr/local/lib/libpcre*
@libpcre.so
libpcre.so.0

Vittorio

Alle 14:39, venerdì 13 ottobre 2006, Jerome Asselin ha scritto:
 On Fri, 2006-10-13 at 14:59 +0100, Vittorio wrote:
  When updating to the very last version of RODBC under freebsd 6.1 the
  errors below pop up but RODBC compiles till the end and, it seems, to
  work properly.
  What are those errors about?

 I don't know what adverse effect this may have in RODBC.

 To me, it looks like you're either missing
 the /usr/local/lib/libpcre.so.0 file or that the file is damaged. Can
 you confirm whether libpcre is installed on your system and for the
 correct ARCH? Try to reinstall/recompile the pcre package. More info at
 http://www.pcre.org (appears temporarily down at the moment I write).

 Then I'd try to recompile RODBC.

 HTH,
 Jerome

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Rmpi performance

2006-10-13 Thread Martin Morgan

clusterCall invokes the same function on all three nodes. You have
basically discovered the communication costs of performing the
calculation in parallel.

You'll get the easiest gains from snow (and other parallel packages in
R) with 'embarrassingly parallel' problems, where the same algorithm is
applied to different data sets / slices of data. For performance gains
from a single call to op_mat, you'd have to do some serious parallel
algorithm development to distribute the data and computations
effectively.

Hope that helps,

Martin

Michela Cameletti [EMAIL PROTECTED] writes:

 Dear R users,
 we are trying to do some parallel computing using library(snow).
 In particular we have a cluster with 3 nodes

cl - makeCluster(3, type = MPI)
 3 slaves are spawned successfully. 0 failed.


 and we want to compute the function op_mat (see below) first with the 
 master and then with the cluster using system.time for checking the 
 computational performance.

 op_mat = function(mat) {

 +   inv = solve(mat)
 +   det_inv = det(inversa)
 +   tr_inv  = sum(diag(inversa))
 +   return(list(c(det=det_inv,tr=tr_inv)))
 + }

nn = 3000
XX = matrix(rnorm(nn*nn),nn,nn)
 # with the master
 system.time(op_matrici(XX))
 [1] 42.283  1.883 44.168  0.000  0.000
 # with the cluster
 system.time(clusterCall(cl,op_matrici,XX))
 [1] 11.523 12.612 71.562  0.000  0.000

 You can see that using the master it takes 44.168 seconds for computing 
 the function on matrix XX while it takes 71.562 seconds (more time!!!) 
 with the cluster. Can you give us some advice in order to understand why 
 the cluster is slower than the master?
 Thank you very much in advance,
 bye
 Michela  and Marco
 Ps: we have a gigabit ethernet between the master and the nodes

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Log-scale in histogramm

2006-10-13 Thread David Scott

On Fri, 13 Oct 2006, Marc Schwartz wrote:

 On Fri, 2006-10-13 at 13:33 +0200, David Graf wrote:
 Hello

 My data looks ugly in a normal histogramm. How can I create a
 histogramm with a Y-axis in log-scale?

 Thanks for your help!

 David Graf


There is a log-histogram (called log.hist) in my package HyperbolicDist, 
and an updated one on my web page: 
http://www.stat.auckland.ac.nz/~dscott/Rpackage/NewFunctions/logHist.R

David Scott



_
David Scott Visiting (Until January 07)
Department of Probability and Statistics
The University of Sheffield
The Hicks Building
Hounsfield Road
Sheffield S3 7RH
United Kingdom
Phone:  +44 114 222 3908
Email:  [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Adding per-panel text to panel strips in lattice xyplot

2006-10-13 Thread Frank E Harrell Jr

I would like to add auxiliary information to the bottom of two strips on 
each panel that comes from a table look-up using the values of two 
variables that define the panel.  For example I might panel on sex and 
race, showing 3 randomly chosen time series in each panel and want to 
add (n=100) in the bottom strip to indicate the 3 curves were sampled 
from 100.  Is there a not-too-hard way to do that?

I would like to do this both with and without groups= and superposition, 
but especially with.

Thanks
Frank
-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Rmpi performance

2006-10-13 Thread Luke Tierney

On Fri, 13 Oct 2006, Michela Cameletti wrote:

 Dear R users,
 we are trying to do some parallel computing using library(snow).
 In particular we have a cluster with 3 nodes

 cl - makeCluster(3, type = MPI)
3 slaves are spawned successfully. 0 failed.


 and we want to compute the function op_mat (see below) first with the
 master and then with the cluster using system.time for checking the
 computational performance.

 op_mat = function(mat) {

 +   inv = solve(mat)
 +   det_inv = det(inversa)
 +   tr_inv  = sum(diag(inversa))
 +   return(list(c(det=det_inv,tr=tr_inv)))
 + }

What is inversa?


 nn = 3000
 XX = matrix(rnorm(nn*nn),nn,nn)
 # with the master
 system.time(op_matrici(XX))
 [1] 42.283  1.883 44.168  0.000  0.000
 # with the cluster
 system.time(clusterCall(cl,op_matrici,XX))
 [1] 11.523 12.612 71.562  0.000  0.000

 You can see that using the master it takes 44.168 seconds for computing
 the function on matrix XX while it takes 71.562 seconds (more time!!!)

Of coure it takes more time to do the same computation plus
communication!

The amount of additional time seems high if your nodes are comparable
in speed to your master and you really are getting gigabit
performance.  I would look for a visualization tool an idea of what is
happening--perhaps xmpi if your MPI is LAM.

Best,

luke


 with the cluster. Can you give us some advice in order to understand why
 the cluster is slower than the master?
 Thank you very much in advance,
 bye
 Michela  and Marco
 Ps: we have a gigabit ethernet between the master and the nodes

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:  [EMAIL PROTECTED]
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] loop, pipe connection, output objects

2006-10-13 Thread jim holtman

Why don't you read the whole file in and the use subsetting to get
your ranges instead of reading the file multiple time using 'gawk'.
You can then use 'assign' to create your objects; it would be better
to use a list.

indice - (201:399)
result - list()
x - read.table('base_6_mod')
for (i in indice){
assign(paste('data.', i, sep=''), x[x[,2] = i  x[,2]  i+1, 2:3])
# or better
result[[i]] - x[x[,2] = i  x[,2]  i+1, 2:3]
}
# or without loops
result - lapply(indice, function(z) x[x[,2] = z  x[,2]  z,])

On 10/13/06, Marco Grazzi [EMAIL PROTECTED] wrote:
 Hi all,

 I have the following -newbye- problem.
 Inside R, I am trying to process a file and creating from it many files.
 The file is organized in different columns, the second containing a code. I 
 want to create as output objects, which contain only entries in a certain 
 code range, and whose name contain the code itself.
 Here is my attempt

 indice - (201:399)
 for(i in indice){
  data.i -  read.table(pipe(paste(gawk '{if ($2 =, i,  $2, i+1,) 
 print $2,$3}'  base_6_mod   , sep='')))
print(paste(code ..., i,   ... done))
 }

 The problems are:
 1- My sintax data.i does not work, and loop only produces a big data.i 
 object. My goal, obviously was to have something like data.201, data.202, etc
 (second order problem)
 2- I wonder if the sintax for the index ($2 =, i,  $2, i+1,) works

 Thanks for your help (hoping I manged to be enough clear),
 marco
 --

 Marco Grazzi

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] cygwin script for Sweave

2006-10-13 Thread Thomas Harte

below is a very simple bash script to run Sweave from a cygwin terminal, run 
pdflatex on 
the generated .tex file, and then view the resulting .pdf output.

i usually use cygwin when i am (forced to be on) Windoze, but i found a few 
issues 
with paths that this script works around.

pdfview, used in the script, is simply:


$ cat /usr/local/bin/pdfview
#!/bin/bash.exe

if [ $# -eq 1 ]
then
/c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe `cygpath -w -a 
-s $1`
else
/c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe
fi


mutatis mutandis for your own Adobe Reader. 


here is the script:

#!/bin/bash.exe
# rnw.sh [.Rnw file]
#
# $1 must be a .Rnw file
#
RNWFILE=$1
PWD=`pwd`
FILEBASE=`basename $1 .Rnw`
TEXFILE=$FILEBASE.tex
PDFFILE=$FILEBASE.pdf

echo  \
library(\utils\); \
setwd(\`cygpath -m $PWD`\);   \
Sweave(\$RNWFILE\)\
   \
  | /c/R/R-2.3.1/bin/Rterm.exe --no-save --no-restore
 
# the resulting .tex file contains an annoying c: ... 
# replace it with the pdflatex-friendly /c :
sed -e 's/c:/\/c/g' --in-place $TEXFILE

# now run text processing
pdflatex $TEXFILE
pdfview $PDFFILE 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] HP UX

2006-10-13 Thread Ricky Rankin

I have a user who is currently running R on a desktop system which takes 3
days to run. 

 

We have an Itanium 2 Cluster running HP UX. My system manager has tried to
install R and has sent the following message

INSTALL file says do

 

./configure

 

make 

 

./configure fails with ( tail end of output

 

 

checking for main in -ltermcap... no checking for main in -ltermlib... no
checking for rl_callback_read_char in -lreadline... no checking for
history_truncate_file... no configure: error: --with-readline=yes (default)
and headers/libs are not available

 

Has anyone install R on HP UX system?

 

Ricky

__

Principal Analyst

Information Services

Queen's University Belfast

 

tel: 02890 974824

email: [EMAIL PROTECTED]

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] loop, pipe connection, output objects

2006-10-13 Thread Gabor Grothendieck

read your data frame in all at once and then cut it on x[2] and split
the result, e.g.

split(iris, cut(iris$Sepal.Length, 4:8))

Please provide reproducible code.  Without input its not reproducible.
See last line of every message to r-help.


On 10/13/06, Marco Grazzi [EMAIL PROTECTED] wrote:
 Hi all,

 I have the following -newbye- problem.
 Inside R, I am trying to process a file and creating from it many files.
 The file is organized in different columns, the second containing a code. I 
 want to create as output objects, which contain only entries in a certain 
 code range, and whose name contain the code itself.
 Here is my attempt

 indice - (201:399)
 for(i in indice){
  data.i -  read.table(pipe(paste(gawk '{if ($2 =, i,  $2, i+1,) 
 print $2,$3}'  base_6_mod   , sep='')))
print(paste(code ..., i,   ... done))
 }

 The problems are:
 1- My sintax data.i does not work, and loop only produces a big data.i 
 object. My goal, obviously was to have something like data.201, data.202, etc
 (second order problem)
 2- I wonder if the sintax for the index ($2 =, i,  $2, i+1,) works

 Thanks for your help (hoping I manged to be enough clear),
 marco
 --

 Marco Grazzi

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problemss compiling RODBC

2006-10-13 Thread Jerome Asselin

On Fri, 2006-10-13 at 18:07 +, vittorio wrote:
 There is.
 
 # ls /usr/local/lib/libpcre*
 @libpcre.so
 libpcre.so.0

I don't use freebsd. So I'm not sure how to help. As I hinted before,
I'd first try to reinstall or update the pcre package and make sure that
all its dependencies are satisfied. Confirm that the pcre package
version is reasonably up-to-date.

I see your problem seems to have something to do with the grep command.
Can you actually use grep as a system command? E.g.:
ps axu | grep $USER

What do you get if you run this? 
ldd /bin/grep

Jerome

-- 
Jerome Asselin, M.Sc., Agent de recherche, RHCE
CHUM -- Centre de recherche
3875 rue St-Urbain, 3e etage // Montreal QC  H2W 1V1
Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Rmpi performance

2006-10-13 Thread Thomas Lumley

On Fri, 13 Oct 2006, Michela Cameletti wrote:

 Dear R users,
 we are trying to do some parallel computing using library(snow).
 In particular we have a cluster with 3 nodes

 cl - makeCluster(3, type = MPI)
3 slaves are spawned successfully. 0 failed.


 and we want to compute the function op_mat (see below) first with the
 master and then with the cluster using system.time for checking the
 computational performance.

 op_mat = function(mat) {

 +   inv = solve(mat)
 +   det_inv = det(inversa)
 +   tr_inv  = sum(diag(inversa))
 +   return(list(c(det=det_inv,tr=tr_inv)))
 + }

 nn = 3000
 XX = matrix(rnorm(nn*nn),nn,nn)
 # with the master
 system.time(op_matrici(XX))
 [1] 42.283  1.883 44.168  0.000  0.000
 # with the cluster
 system.time(clusterCall(cl,op_matrici,XX))
 [1] 11.523 12.612 71.562  0.000  0.000

 You can see that using the master it takes 44.168 seconds for computing
 the function on matrix XX while it takes 71.562 seconds (more time!!!)
 with the cluster. Can you give us some advice in order to understand why
 the cluster is slower than the master?

clusterCall() evaluates the same call on each computer in the cluster, so 
it will always be slower than just evaluating on the master.  It is 
useful for setup that has to be performed on each machine, or for parallel 
evaluation of random functions (eg boostrapping, simulation)

To split up a single computation you have to do it explicitly, eg with 
parLapply, parSapply, and parApply, or parMM for parallel matrix 
multiplication. It's unlikely that you could speed up inverting a dense 
matrix even with gigabit ethernet for communication -- the success of 
ATLAS and Dr Goto's tuned BLAS libraries shows that the time taken for 
dense linear algebra can be dominated by communications overhead even 
between a CPU and its own memory.

-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] combinatorics

2006-10-13 Thread Ted Harding

On 13-Oct-06 Robin Hankin wrote:
 Hi
 
 How do I generate all ways of ordering  sets of indistinguishable
 items?
 
 suppose I have two A's, two B's and a C.
 
 Then I want
 
 AABBC
 AABCB
 AACBC  I think you mean AACBB here!
 ABABC
 . . .snip...
 BBAAC
 . . .snip...
 CBBAA
 
 [there are 5!/(2!*2!) = 30 arrangements.  Note AABBC != BBAAC]
 
 How do I do this?

I've tried to think of an efficient and economical (and therefore
clever) way of doing this for larger problems; but that will have
to wait for another day!

Meanwhile, a problem of the order of the one you describe above
can be solved quite slickly:

X-c(A,A,B,B,C)
library(combinat)

##[result below stripped of  quotes for clarity]
unique(array(permn(X)))
[[1]]
 [1]   A A B B C
[[2]]
 [1]   A A B C B
[[3]]
 [1]   A A C B B
[[4]]
 [1]   A C A B B
[[5]]
 [1]   C A A B B
[[6]]
 [1]   A B A B C
[[7]]
 [1]   A B A C B
[[8]]
 [1]   A B C A B
[[9]]
 [1]   A C B A B
[[10]]
 [1]   C A B A B
[[11]]
 [1]   C B A A B
[[12]]
 [1]   B C A A B
[[13]]
 [1]   B A C A B
[[14]]
 [1]   B A A C B
[[15]]
 [1]   B A A B C
[[16]]
 [1]   B A B A C
[[17]]
 [1]   B A B C A
[[18]]
 [1]   B A C B A
[[19]]
 [1]   B C A B A
[[20]]
 [1]   C B A B A
[[21]]
 [1]   C A B B A
[[22]]
 [1]   A C B B A
[[23]]
 [1]   A B C B A
[[24]]
 [1]   A B B C A
[[25]]
 [1]   A B B A C
[[26]]
 [1]   B B A A C
[[27]]
 [1]   B B A C A
[[28]]
 [1]   B B C A A
[[29]]
 [1]   B C B A A
[[30]]
 [1]   C B B A A

However, the above simple function will quickly get short
of breath if the total number of items gets much above, say 10.

Hoping this helps!
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 13-Oct-06   Time: 17:40:20
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] combinatorics

2006-10-13 Thread Charles C. Berry

On Fri, 13 Oct 2006, Robin Hankin wrote:

 Hi

 How do I generate all ways of ordering  sets of indistinguishable items?

 suppose I have two A's, two B's and a C.

 Then I want

 AABBC
 AABCB
 AACBC
 ABABC
 . . .snip...
 BBAAC
 . . .snip...
 CBBAA

 [there are 5!/(2!*2!) = 30 arrangements.  Note AABBC != BBAAC]

 How do I do this?

I'd recursively use combn() to choose locations for A's, then B's, then 
...

 where.A - combn(5,2)[, rep( 1:choose(5,2), each = choose(3,2)*choose(1,1))]
 where.not.A - apply(where.A,2,function(x) (1:5)[-x])
 where.B - matrix(apply(unique( where.not.A, MARGIN=2), 2, combn, 2 ),nr=2)
 where.not.AB - apply(rbind(where.A,where.B),2,function(x) (1:5)[-x] )
 result - matrix(C,nr=5,nc=30)
 result[ cbind( c( where.A ), c( col( where.A ) ) ) ] - A
 result[ cbind( c( where.B ), c( col( where.B ) ) ) ] - B
 cbind( apply(result,2,paste,collapse=) )
   [,1]
  [1,] AABBC
  [2,] AABCB
  [3,] AACBB
.
.
.



 --
 Robin Hankin
 Uncertainty Analyst
 National Oceanography Centre, Southampton
 European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]   UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bug in lowess

2006-10-13 Thread Berton Gunter

Folks:

This interesting dicussion brings up an issue of what I have referred to for
some time as safe statistics, by which I mean:

Usually, but not necessarily automated)Statistical procedures that are
guranteed to give either 

(a) a reasonable answer; or
(b) Do not give an answer and when possible emit useful error messages.

All standard least squares procedures taught in basic statistics courses are
examples (from many different perspectives) of unsafe statistics.
Robustness/resistance clearly takes us some way along this path, but as is
clear from the discussion, not the whole way. The reason I think that this
is important is 

a) Based on my own profound ignorance/limitations, I think it's impossible
to expect those who need to use many different kinds of sophisticated
statistical analyses to understand enough of the technical details to be
able to actively and effectively guide their appropriate when this requires
such guidance (e.g., least aquares with extensive diagnostics; overfitting
in nonlinear regression);  

b) The explosion of large complex data in all disciplines that **require**
some sort of automated analyses to be used (e.g. microarray data?).

Having said this, it is unclear to me even **if** safe statistics is a
meaningful concept: can it ever be -- at all? But I believe one thing is
clear: A lot of people devote a lot of labor to optimal procedures that
are far too sensitive to the manifold peculiarities of real data to give
reliable, trustworthy results in practice considerable expert coaxing. We at
least need a greater variety of less optimal but safer data analysis
procedures. R -- or rather it's many contributors-- seems to me to be the
exception in recognizing and doing something about this.

And as a humble example of what I mean: I like simple running medians of
generally small span for smoothing sequential data (please don't waste
time giving me counterexamples of why this is bad or how it can go wrong --
I know there are many).

I would appreciate anyone else's thoughts on this, pro or con, perhaps
privately rather than on the list if you view this as too far off-topic.

(NOTE: TO be clear: My personal views, not those of my company or
colleagues)

My best regards to all,

Bert

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Friday, October 13, 2006 5:51 AM
To: Prof Brian Ripley
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Bug in lowess

Prof Brian Ripley wrote:
 Frank Harrell wrote:
 
 [...]
 
 Thank you Brian.  It seems that no matter what is the right answer, the
 answer currently returned on my system is clearly wrong.  lowess()$y
 should be constrained to be within range(y).
 
 Really?  That assertion is offered without proof and I am pretty sure is 
 incorrect.  Consider
 
 x - c(1:10, 20)
 y - c(1:10, 5) + 0.01*rnorm(11)
 lowess(x,y)
 $x
  [1]  1  2  3  4  5  6  7  8  9 10 20
 
 $y
  [1]  0.9983192  1.9969599  2.9960805  3.9948224  4.9944158  5.9959855
  [7]  6.9964400  7.9981434  8.9990607 10.0002567 19.9946117
 
 Remember that lowess is a local *linear* fitting method, and may give 
 zero weight to some data points, so (as here) it can extrapolate.

Brian - thanks - that's a good example though not typical of the kind I 
see from patients.

 
 After reading what src/appl/lowess.doc says should happen with zero 
 weights, I think the answer given on Frank's system probably is the 
 correct one.  Rounding error is determining which of the last two points 
 is given zero robustness weight: on a i686 system both of the last two 
 are, and on mine only the last is. As far as I can tell in 
 infinite-precision arithmetic both would be zero, and hence the value at 
 x=120 would be found by extrapolation from those (far) to the left of it.
 
 I am inclined to think that the best course of action is to quit with a 
 warning when the MAD of the residuals is effectively zero.  However, we 
 need to be careful not to call things 'bugs' that we do not understand 
 well enough.  This might be a design error in lowess, but it is not 
 AFAICS a bug in the implementation.

Yes it appears to be a weakness in the underlying algorithm.

Thanks
Frank

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] a correlation matrix subset where the subset avg. is a maximum

2006-10-13 Thread Ryan Austin

Hello R group,

Given a correlation matrix, I would like to obtain the best subset of 
pairs in the matrix of some size  n such that the mean of r for that 
subset is a maximum compared to any other possible subset of size  n.  
I've been looking at the deal and subselect packages but they don't seem 
to do what I need.  Does anyone have any suggestions?

Thanks in advance,
Ryan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nontabular logistic regression

2006-10-13 Thread Jeffrey Stratford

Gavin,

That worked!  I went through and I found a few missing cases where I had
. instead of NA - I'm still in SAS mode. 

Many thanks!



Jeffrey A. Stratford, Ph.D.
Postdoctoral Associate
331 Funchess Hall
Department of Biological Sciences
Auburn University
Auburn, AL 36849
334-329-9198
FAX 334-844-9234
http://www.auburn.edu/~stratja

 Gavin Simpson [EMAIL PROTECTED] 10/13/06 11:23 AM 
On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
 Hi.  I'm attempting to fit a logistic/binomial model so I can
determine
 the influence of landscape on the probability that a box gets used by
a
 bird.  I've looked at a few sources (MASS text, Dalgaard, Fox and
 google) and the examples are almost always based on tabular predictor
 variables.  My data, however are not.  I'm not sure if that is the
 source of the problems or not because the one example that includes a
 continuous predictor looks to be coded exactly the same way.  Looking
at
 the output, I get estimates for each case when I should get a single
 estimate for purbank.  Any suggestions?
 
 Many thanks,
 
 Jeff

Hi Jeff,

using the snippet of data you provided (copy/paste into a text file and
read in with read.table) worked fine:

box.use - read.table(~/tmp/tmp.txt, header = TRUE)
box.use
str(box.use)
'data.frame':   8 obs. of  10 variables:
 $ box: int  1 2 3 4 5 6 7 8
 $ use: int  1 1 1 1 0 1 1 0
 $ purbank: num  0.00381 0.04429 0.04459 0.06072 0.60810 ...
 $ purban2: num  0.0268 0.1611 0.0604 0.2081 0.6980 ...
 $ purban1: num  0.069 0.172 0.000 0.069 0.690 ...
 $ pgrassk: num  0.3282 0.1534 0.1628 0.0194 0.0317 ...
 $ pgrass2: num  0.685 0.383 0.557 0.000 0.128 ...
 $ pgrass1: num  0.759 0.655 0.759 0.000 0.241 ...
 $ grassdist  : num0   0   0 323  30 ...
 $ grasspatchk: num  3.730 1.023 0.961 0.228 0.263 ...

Now I don't like attach, and you just don't need it so I deviate a
little now. Replace box.use$use directly and make use of the data
argument in glm. Also, your data didn't have any missing data so I'm not
sure whether the response or predictor is missing and whether your
na.omit is needed or not - I don't use it below.

box.use$use - factor(box.use$use, levels=0:1)
levels(box.use$use) - c(unused, used)
box.use
str(box.use)
glm1 - glm(use ~ purbank, data = box.use, family = binomial())

summary(glm1)

Call:
glm(formula = use ~ purbank, family = binomial(), data = box.use)

Deviance Residuals:
 Min1QMedian3Q   Max
-1.61450  -0.03098   0.31935   0.45888   1.39194

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept)3.223  2.225   1.4480.147
purbank   -6.129  4.773  -1.2840.199

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 8.9974  on 7  degrees of freedom
Residual deviance: 6.5741  on 6  degrees of freedom
AIC: 10.574

Number of Fisher Scoring iterations: 5


I suspect something got messed up in your reading of the data and R
thought purbank was a factor or character. Always check your data after
reading in, and str() is a your friend here as printed representations
are not always what they seem.

HTH

G

 
 
 THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the
rest
 are landscape variables).  
 

box use purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist   
grasspatchk

1   1   0.003813435 0.02684564  0.06896552  0.3282487   
0.6845638   0.7586207   0   3.73

2   1   0.04429451  0.1610738   0.1724138   0.1534174   
0.3825503   0.6551724   0   1.023261

3   1   0.04458785  0.06040268  0   0.1628043   
0.5570470.7586207   0   0.9605769

4   1   0.06072162  0.2080537   0.06896552  0.01936052  
0   0   323.10990.2284615

5   0   0.6080962   0.6979866   0.6896552   0.03168084  
0.1275168   0.2413793   30  0.2627027

6   1   0.6060428   0.6107383   0.3448276   0.04077442  
0.2885906   0.4482759   30  0.2978571

7   1   0.3807568   0.4362416   0.6896552   0.06864183  
0.03355705  0   94.868330.468

8   0   0.3649164   0.3154362   0.4137931   0.06277501  
0.1275168   0   120 0.4585714
 
 THE CODE:
 
 box.use- read.csv(c:\\eabl\\2004\\use_logistic2.csv, header=TRUE)
 attach(box.use)
 box.use - na.omit(box.use)
 use - factor(use, levels=0:1)
 levels(use) - c(unused, used)
 glm1 - glm(use ~ purbank, binomial)
 
 THE OUTPUT:
 
 Coefficients:
  Estimate Std. Error   z value Pr(|z|)
 (Intercept)-4.544e-16  1.414e+00 -3.21e-161.000
 purbank02.157e+01  2.923e+04 0.0010.999
 purbank0.001173365  2.157e+01  2.067e+04 0.0010.999
 purbank0.001466706

Re: [R] combinatorics

2006-10-13 Thread hadley wickham

 I've tried to think of an efficient and economical (and therefore
 clever) way of doing this for larger problems; but that will have
 to wait for another day!

The ruby permutations library
(http://permutation.rubyforge.org/doc/index.html) references  The
Algorithm Design Manual, Steven S. Skiena, Telos/Springer, 1997, for
permutation sequences.

Hadley

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Adding per-panel text to panel strips in lattice xyplot

2006-10-13 Thread Richard M. Heiberger

Take a look at the xysplom function in package HH.
You can use it as a model for what you want.

tmp - data.frame(x=rnorm(24),
  y=rnorm(24),
  a=factor(rep(letters[1:2],12)),
  b=factor(rep(LETTERS[1:3], c(8,8,8
xysplom(y ~ x | a*b, data=tmp, corr=TRUE, layout=c(2,3))
 
The work is done with the cooperation of two functions.

xysplom.default looks for the corr argument and then creates
an additional conditioning factor and gives it a constant value.

strip.xysplom sees where it is and changes the strip label as needed.

strip.xysplom uses the R-2.3.1 technology for finding where it is.
Deepayan added some more functions in R-2.4.1, so that part of
the code can now be somewhat simplified.

lattice in R-2.4.0 also has a new strip.left argument (see ?xyplot and
serach for strip.left) that will allow you to put the additional
information in an additional left strip for each panel, rather than by making
changes to one of the standard top strips.

Rich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] a correlation matrix subset where the subset avg is a maximum

2006-10-13 Thread Ryan Austin

Hello R group,

Given a correlation matrix, I would like to obtain the best subset of 
pairs in the matrix of some size  n such that the mean of r for that 
subset is a maximum compared to any other possible subset of size  n.  
I've been looking at the deal and subselect packages but they don't seem 
to do what I need.  Does anyone have any suggestions?

Thanks in advance,
Ryan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] cygwin script for Sweave

2006-10-13 Thread Gabor Grothendieck

On Windows you could just put this into sweave.bat, say, and then
place that anywhere in your path (or in the current directory):

set infile=%~sdpn1
set infile=%infile:\=/%
cmd Rcmd Sweave %infile%.Rnw
pdflatex %infile%.tex
start %infile%.pdf


On 10/13/06, Thomas Harte [EMAIL PROTECTED] wrote:
 below is a very simple bash script to run Sweave from a cygwin terminal, run 
 pdflatex on
 the generated .tex file, and then view the resulting .pdf output.

 i usually use cygwin when i am (forced to be on) Windoze, but i found a few 
 issues
 with paths that this script works around.

 pdfview, used in the script, is simply:

 
 $ cat /usr/local/bin/pdfview
 #!/bin/bash.exe

 if [ $# -eq 1 ]
 then
/c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe `cygpath -w 
 -a -s $1`
 else
/c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe
 fi
 

 mutatis mutandis for your own Adobe Reader.


 here is the script:
 
 #!/bin/bash.exe
 # rnw.sh [.Rnw file]
 #
 # $1 must be a .Rnw file
 #
 RNWFILE=$1
 PWD=`pwd`
 FILEBASE=`basename $1 .Rnw`
 TEXFILE=$FILEBASE.tex
 PDFFILE=$FILEBASE.pdf

 echo  \
library(\utils\); \
setwd(\`cygpath -m $PWD`\);   \
Sweave(\$RNWFILE\)\
   \
  | /c/R/R-2.3.1/bin/Rterm.exe --no-save --no-restore

 # the resulting .tex file contains an annoying c: ...
 # replace it with the pdflatex-friendly /c :
 sed -e 's/c:/\/c/g' --in-place $TEXFILE

 # now run text processing
 pdflatex $TEXFILE
 pdfview $PDFFILE 
 

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] a correlation matrix subset where the subset avg. is a maximum

2006-10-13 Thread Ryan Austin

Hello R group,

Given a correlation matrix, I would like to obtain the best subset of 
pairs in the matrix of some size  n such that the mean of r for that 
subset is a maximum compared to any other possible subset of size  n.  
I've been looking at the deal and subselect packages but they don't seem 
to do what I need.  Does anyone have any suggestions?

Thanks in advance,
Ryan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] adding error bars to lattice plots

2006-10-13 Thread Daniel E. Bunker

Dear Deepayan and Sundar,

Thank you so much for your help with this.  However, I may have phrased 
my problem too specifically, assuming that *in general* I could apply 
your response to all Lattice graphics.

What I need is a barchart or vertical dotchart, with error bars, across 
three treatments, with a form that should look something like: 
barchart(median~fac1|by1, groups=group1).

Your solution works great with the problem I had posed 
xyplot(voice.part ~ median|by1, groups=group1, [snip], yet when I 
switch the axes for a vertical dotchart xyplot(median~voice.part|by1, 
groups=group1, [snip] the error bars remain horizontal and do not 
follow the switched axes.

I tried to alter all 'lx' and 'ux' to 'ly' and 'uy' in panel.ci and 
prepanel.ci, but to no avail.  I'm afraid I have not (YET) managed to 
understand just how Lattice works.

Re. moving panel.abline() from panel.groups to panel, does that mean I 
need to rewrite panel.superpose?

Again, any advice would be greatly appreciated.

Thanks, Dan



Deepayan Sarkar wrote:
 On 10/12/06, Daniel E. Bunker [EMAIL PROTECTED] wrote:
 Dear R users,

 About a year ago Deepayan offered a suggestion to incorporate error bars
 into a dotplot using the singer data as an example
 http://finzi.psych.upenn.edu/R/Rhelp02a/archive/63875.html.

 When I try to utilize this code with a grouping variable, I get an error
 stating that the subscripts argument is missing.  I have tried to insert
 them in various ways, but cannot figure out where they should go.

 Deepayan's original code follows, with additions from me for factor,
 grouping and by variables.

 (Note that I could use xYplot (Dotplot), but I need my response variable
 on the vertical axis.)

 Any suggestions would be greatly appreciated.

 Thanks, Dan

 prepanel.ci - function(x, y, lx, ux, subscripts, ...) {
 x - as.numeric(x)
 lx - as.numeric(lx[subscripts])
  ux - as.numeric(ux[subscripts])
  list(xlim = range(x, ux, lx, finite = TRUE))
  }

 panel.ci - function(x, y, lx, ux, subscripts, pch = 16, ...) {
  x - as.numeric(x)
  y - as.numeric(y)
  lx - as.numeric(lx[subscripts])
  ux - as.numeric(ux[subscripts])
  panel.abline(h = unique(y), col = grey)
  panel.arrows(lx, y, ux, y, col = 'black',
   length = 0.25, unit = native,
   angle = 90, code = 3)
  panel.xyplot(x, y, pch = pch, ...)
  }

 singer.split -
  with(singer,
   split(height, voice.part))

 singer.ucl -
  sapply(singer.split,
 function(x) {
 st - boxplot.stats(x)
 c(st$stats[3], st$conf)
 })

 singer.ucl - as.data.frame(t(singer.ucl))
 names(singer.ucl) - c(median, lower, upper)
 singer.ucl$voice.part -
  factor(rownames(singer.ucl),
 levels = rownames(singer.ucl))

 # add factor, grouping and by variables
 singer.ucl$fac1=c(level1,level1, level2, level2)
 singer.ucl$by1=c(two,one)
 singer.ucl$group1=c(rep(letters[1],4),rep(letters[2],4))

 ## show the data frame
 singer.ucl

 # Deepayan's original example
 with(singer.ucl,
   xyplot(voice.part ~ median,
  lx = lower, ux = upper,
  prepanel = prepanel.ci,
  panel = panel.ci),
  horizontal=FALSE)

 # with by variable, works fine
 with(singer.ucl,
   xyplot(voice.part ~ median|by1,
  lx = lower, ux = upper,
  prepanel = prepanel.ci,
  panel = panel.ci))

 # with groups, fails for lack of subscripts.
 with(singer.ucl,
   xyplot(voice.part ~ median, groups=group1,
  lx = lower, ux = upper,
  prepanel = prepanel.ci,
  panel = panel.ci))

 Although that does seem to be the eventual error message, this fails
 not due to the lack of subscripts, but because 'panel.ci' does not
 know how to deal with groups. One solution to this is Sundar's
 approach, which is to change the panel function to handle groups.
 Another generic solution is to use 'panel.superpose', which _does_
 know how to handle groups, and also accepts a custom panel function to
 be called for each group. Often (but not always), you can use a panel
 function designed for a non-groups aware display for this. In this
 case, the following gives results similar to Sundar's code:

 with(singer.ucl,
 xyplot(voice.part ~ median, groups=group1,
lx = lower, ux = upper,
prepanel = prepanel.ci,
panel = panel.superpose,
panel.groups = panel.ci,
pch = 16))

 Note the need for an explicit pch=16, since the default in panel.ci is
 overridden by panel.groups.

 # what I need, ultimately, is something like this, with error bars:

 with(singer.ucl,
   dotplot(median~fac1|by1, groups=group1))

 If you have more than one interval (from different groups) for any
 level of your categorical variable - which seems to be the case in
 this example - you

[R] side by side plot of Histogram and densityplot

2006-10-13 Thread Jue.Wang2

Using par seems easily put a hist and a density side by side on the same 
output window.

I would like to use some features in histogram from Lattice, but how can I put

histogram and densityplot side by side on the same graph?

Thank you


par(mfrow=c(2,1))
hist(y)
plot(density(y))


Jue Wang, Biostatistician
Contracted Position for Preclinical  Research Biostatistics
PrO Unlimited
(908) 231-3022

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] side by side plot of Histogram and densityplot

2006-10-13 Thread Deepayan Sarkar

On 10/13/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Using par seems easily put a hist and a density side by side on the same 
 output window.

 I would like to use some features in histogram from Lattice, but how can I 
 put

 histogram and densityplot side by side on the same graph?

?print.trellis

example(print.trellis)

-Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] correlation b/w a continuous variable and a categorical variable

2006-10-13 Thread Weiwei Shi

Dear Listers:

I happen to have this question in mind, is there a way to evaluate the
correlation between
a continuous variable and a categorical variable (without discretizing
the former)? My intuitive is using lda by considering the latter as
response variable but not sure.

thanks,

weiwei

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] a correlation matrix subset where the subset avg is a maximum

2006-10-13 Thread Ryan Austin

Thanks for the thought in any case Mark.  Your right about the brute force.
I'll expand a bit with an example though for the sake of clarity.

Given a correlation matrix of 4 covariates ABCD with distances of:
AB=0.2;  AC=0.6; AD=0.3 ; BC=0.9 ; BD=0.8 ; CD=0.7

Find the optimal subset (size  n, n being the number of covariates) 
where the mean of r for the subset is a maximum.
Of course all NxN distances need to be considered between any chosen 
subset covariates.

Thus for n1, the solution would be simply BC = 0.9
And for n2, the solution would be BCD as (BC + CD + BD)/3) = 0.8 is the 
maximum mean r value that could be obtained from
any of the subsets with n2.

I'd expected that this would be a common problem but 2 days of googling 
has given me little.  I'm expecting a greedy graph traversal
or the like will be my answer but I'd hoped to whip a solution of in R.
Any help would be greatly appreciated.
Ryan
 

Leeds, Mark (IED) wrote:

hi ryan : I reread and you already have the correlation matrix so brute
force should definitely work.

So, if the correlation matrix was size 20 by 20 and your n was 9.

Then, you have to have of size 10 or greater so  the number of
possoibilities would be ( 20 choose 10 ) + ( 20 choose 11 ) +  ( 200
choose 12 ) +  ( 20 choose 13 ) + .  ( 20 choose 20 )

Oh boy, it is too large a problem to do by brute force. There are too
many possibilities even for this size of problem.
Hopefully Someone else will have a better idea. Forget my brute force
idea. It's useless and I apologize. I Made a mistake.




 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ryan Austin
Sent: Friday, October 13, 2006 2:43 PM
To: r-help@stat.math.ethz.ch
Subject: [R] a correlation matrix subset where the subset avg is a
maximum

Hello R group,

Given a correlation matrix, I would like to obtain the best subset of
pairs in the matrix of some size  n such that the mean of r for that
subset is a maximum compared to any other possible subset of size  n.  
I've been looking at the deal and subselect packages but they don't seem
to do what I need.  Does anyone have any suggestions?

Thanks in advance,
Ryan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


This is not an offer (or solicitation of an offer) to buy/sell the 
securities/instruments mentioned or an official confirmation.  Morgan Stanley 
may deal as principal in or own or act as market maker for 
securities/instruments mentioned or may advise the issuers.  This is not 
research and is not from MS Research but it may refer to a research 
analyst/research report.  Unless indicated, these views are the author's and 
may differ from those of Morgan Stanley research or others in the Firm.  We do 
not represent this is accurate or complete and we may not update this.  Past 
performance is not indicative of future returns.  For additional information, 
research reports and important disclosures, contact me or see 
https://secure.ms.com/servlet/cls.  You should not use e-mail to request, 
authorize or effect the purchase or sale of any security or instrument, to 
send transfer instructions, or to effect any other transactions.  We cannot 
guarantee that any such requests received via!
  e-mail will be processed in a timely manner.  This communication is solely 
for the addressee(s) and may contain confidential information.  We do not waive 
confidentiality by mistransmission.  Contact me if you do not wish to receive 
these communications.  In the UK, this communication is directed in the UK to 
those persons who are market counterparties or intermediate customers (as 
defined in the UK Financial Services Authority's rules).
  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] a correlation matrix subset where the subset avg is a maximum

2006-10-13 Thread Ryan Austin

Thanks for the thought in any case Mark.  Your right about the brute force.
I'll expand a bit with an example though for the sake of clarity.

Given a correlation matrix of 4 covariates ABCD with distances of:
AB=0.2;  AC=0.6; AD=0.3 ; BC=0.9 ; BD=0.8 ; CD=0.7

Find the optimal subset (size  n, n being the number of covariates) 
where the mean of r for the subset is a maximum.
Of course all NxN distances need to be considered between any chosen 
subset covariates.

Thus for n1, the solution would be simply BC = 0.9
And for n2, the solution would be BCD as (BC + CD + BD)/3) = 0.8 is the 
maximum mean r value that could be obtained from
any of the subsets with n2.

I'd expected that this would be a common problem but 2 days of googling 
has given me little.  I'm expecting a greedy graph traversal
or the like will be my answer but I'd hoped to whip a solution of in R.
Any help would be greatly appreciated.
Ryan
 

Leeds, Mark (IED) wrote:

hi ryan : I reread and you already have the correlation matrix so brute
force should definitely work.

So, if the correlation matrix was size 20 by 20 and your n was 9.

Then, you have to have of size 10 or greater so  the number of
possoibilities would be ( 20 choose 10 ) + ( 20 choose 11 ) +  ( 200
choose 12 ) +  ( 20 choose 13 ) + .  ( 20 choose 20 )

Oh boy, it is too large a problem to do by brute force. There are too
many possibilities even for this size of problem.
Hopefully Someone else will have a better idea. Forget my brute force
idea. It's useless and I apologize. I Made a mistake.




 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ryan Austin
Sent: Friday, October 13, 2006 2:43 PM
To: r-help@stat.math.ethz.ch
Subject: [R] a correlation matrix subset where the subset avg is a
maximum

Hello R group,

Given a correlation matrix, I would like to obtain the best subset of
pairs in the matrix of some size  n such that the mean of r for that
subset is a maximum compared to any other possible subset of size  n.  
I've been looking at the deal and subselect packages but they don't seem
to do what I need.  Does anyone have any suggestions?

Thanks in advance,
Ryan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


This is not an offer (or solicitation of an offer) to buy/sell the 
securities/instruments mentioned or an official confirmation.  Morgan Stanley 
may deal as principal in or own or act as market maker for 
securities/instruments mentioned or may advise the issuers.  This is not 
research and is not from MS Research but it may refer to a research 
analyst/research report.  Unless indicated, these views are the author's and 
may differ from those of Morgan Stanley research or others in the Firm.  We do 
not represent this is accurate or complete and we may not update this.  Past 
performance is not indicative of future returns.  For additional information, 
research reports and important disclosures, contact me or see 
https://secure.ms.com/servlet/cls.  You should not use e-mail to request, 
authorize or effect the purchase or sale of any security or instrument, to 
send transfer instructions, or to effect any other transactions.  We cannot 
guarantee that any such requests received via!
  e-mail will be processed in a timely manner.  This communication is solely 
for the addressee(s) and may contain confidential information.  We do not waive 
confidentiality by mistransmission.  Contact me if you do not wish to receive 
these communications.  In the UK, this communication is directed in the UK to 
those persons who are market counterparties or intermediate customers (as 
defined in the UK Financial Services Authority's rules).
  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Version of R in saved workspaces

2006-10-13 Thread Soukup, Mat

useRs and developeRs-

Apologies for my naivety, but I just couldn't figure out how to open an
old workspace (created using R 2.3.0) using R 2.3.0 and not R 2.4.0
which is currently happening. For all I know this has always been the
case, but I'm having a problem with a function that doesn't work in
2.4.0 but does work in 2.3.1 (function is in the process of being
fixed). So I would ideally like to open my old workspaces with the
version of R I used in creating the workspace and at the moment not in R
2.4.0 - though in a few weeks when the function is fixed I'm sure I'll
love 2.4.0 as I have all previous versions:)

Thanks for any help,

-Mat

Steps to what I'm doing.

1. In R 2.3.1 I import data, manipulate it until my heart's content, and
run some analyses.
2. I save the workspace to my local drive - name it CoolStats.RData.
3. I see R 2.4.0 is released and immediately download it and start
playing with it only to find the above mentioned error.
4. Since error is going to take some time to fix, will work with R 2.3.1
in the interim.
5. Open CoolStat.Rdata and when I type version I get the below.

 version
   _   
platform   i386-pc-mingw32 
arch   i386
os mingw32 
system i386, mingw32   
status 
major  2   
minor  4.0 
year   2006
month  10  
day03  
svn rev39566   
language   R   
version.string R version 2.4.0 (2006-10-03)

***
Mat Soukup, Ph.D.
Food and Drug Administration
10903 New Hampshire Ave. 
BLDG 22 RM 5329
Silver Spring, MD 20993-0002
Phone: 301.796.1005
***


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] adding error bars to lattice plots

2006-10-13 Thread Deepayan Sarkar

On 10/13/06, Daniel E. Bunker [EMAIL PROTECTED] wrote:
 Dear Deepayan and Sundar,

 Thank you so much for your help with this.  However, I may have phrased
 my problem too specifically, assuming that *in general* I could apply
 your response to all Lattice graphics.

 What I need is a barchart or vertical dotchart, with error bars, across
 three treatments, with a form that should look something like:
 barchart(median~fac1|by1, groups=group1).

 Your solution works great with the problem I had posed
 xyplot(voice.part ~ median|by1, groups=group1, [snip], yet when I
 switch the axes for a vertical dotchart xyplot(median~voice.part|by1,
 groups=group1, [snip] the error bars remain horizontal and do not
 follow the switched axes.

 I tried to alter all 'lx' and 'ux' to 'ly' and 'uy' in panel.ci and
 prepanel.ci, but to no avail.  I'm afraid I have not (YET) managed to
 understand just how Lattice works.

 Re. moving panel.abline() from panel.groups to panel, does that mean I
 need to rewrite panel.superpose?

Only if you consider writing a function B that uses function A
rewriting function A.

 Again, any advice would be greatly appreciated.

Continuing with the singer example, the following works for me:


prepanel.ci - function(x, y, ly, uy, subscripts, ...)
{
y - as.numeric(y)
ly - as.numeric(ly[subscripts])
uy - as.numeric(uy[subscripts])
list(ylim = range(y, uy, ly, finite = TRUE))
}

panel.ci - function(x, y, ly, uy, subscripts, pch = 16, ...)
{
x - as.numeric(x)
y - as.numeric(y)
ly - as.numeric(ly[subscripts])
uy - as.numeric(uy[subscripts])
panel.arrows(x, ly, x, uy, col = 'black',
 length = 0.25, unit = native,
 angle = 90, code = 3)
panel.xyplot(x, y, pch = pch, ...)
}


with(singer.ucl,
 xyplot(median ~ voice.part, groups=group1,
ly = lower, uy = upper,
prepanel = prepanel.ci,
panel = function(x, y, ...) {
panel.abline(v = unique(as.numeric(x)),
 col = grey)
panel.superpose(x, y, ...)
},
panel.groups = panel.ci,
pch = 16))

Let us know if anything is not obvious.

-Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] correlation b/w a continuous variable and a categorical variable

2006-10-13 Thread Achim Zeileis

On Fri, 13 Oct 2006 17:15:45 -0400 Weiwei Shi wrote:

 Dear Listers:
 
 I happen to have this question in mind, is there a way to evaluate the
 correlation between
 a continuous variable and a categorical variable (without discretizing
 the former)? My intuitive is using lda by considering the latter as
 response variable but not sure.

It depends what exactly you mean by evaluate correlation. If you want
to test independence of two variables X and Y against some form of
association, you can generally use statistics based on
  sum h(Y) * g(X)
where h() and g() are suitable transformations of X and Y. Special
cases of this framework are tests for correlation of continuous
variables and Chi-squared type statistics for categorical variables.
This approach is implemented in the package coin, see
independence_test() and the package vignette.

hth,
Z

 thanks,
 
 weiwei
 
 -- 
 Weiwei Shi, Ph.D
 Research Scientist
 GeneGO, Inc.
 
 Did you always know?
 No, I did not. But I believed...
 ---Matrix III
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] correlation b/w a continuous variable and a categorical variable

2006-10-13 Thread Weiwei Shi

I see.

i think the question is, I did not have a clear idea of the
correlation between them (if I insist no transformation). Otherwise,
for a binary variable case, maybe a simple one-way t-test serves the
purpose if I defined such correlation or dependency as the group mean
difference.

thanks.

On 10/13/06, Achim Zeileis [EMAIL PROTECTED] wrote:
 On Fri, 13 Oct 2006 17:15:45 -0400 Weiwei Shi wrote:

  Dear Listers:
 
  I happen to have this question in mind, is there a way to evaluate the
  correlation between
  a continuous variable and a categorical variable (without discretizing
  the former)? My intuitive is using lda by considering the latter as
  response variable but not sure.

 It depends what exactly you mean by evaluate correlation. If you want
 to test independence of two variables X and Y against some form of
 association, you can generally use statistics based on
   sum h(Y) * g(X)
 where h() and g() are suitable transformations of X and Y. Special
 cases of this framework are tests for correlation of continuous
 variables and Chi-squared type statistics for categorical variables.
 This approach is implemented in the package coin, see
 independence_test() and the package vignette.

 hth,
 Z

  thanks,
 
  weiwei
 
  --
  Weiwei Shi, Ph.D
  Research Scientist
  GeneGO, Inc.
 
  Did you always know?
  No, I did not. But I believed...
  ---Matrix III
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html and provide commented,
  minimal, self-contained, reproducible code.
 



-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] a correlation matrix subset where the subset avg is a maximum

2006-10-13 Thread Ryan Austin

Thanks for the thought in any case Mark.  Your right about the brute force.
I'll expand a bit with an example though for the sake of clarity.

Given a correlation matrix of 4 covariates ABCD with distances of:
AB=0.2;  AC=0.6; AD=0.3 ; BC=0.9 ; BD=0.8 ; CD=0.7

Find the optimal subset (size  n, n being the number of covariates) 
where the mean of r for the subset is a maximum.
Of course all NxN distances need to be considered between any chosen 
subset covariates.

Thus for n1, the solution would be simply BC = 0.9
And for n2, the solution would be BCD as (BC + CD + BD)/3) = 0.8 is the 
maximum mean r value that could be obtained from
any of the subsets with n2.

I'd expected that this would be a common problem but 2 days of googling 
has given me little.  I'm expecting a greedy graph traversal
or the like will be my answer but I'd hoped to whip a solution off in R.
Any help would be greatly appreciated.
Ryan




Leeds, Mark (IED) wrote:

hi ryan : I reread and you already have the correlation matrix so brute
force should definitely work.

So, if the correlation matrix was size 20 by 20 and your n was 9.

Then, you have to have of size 10 or greater so  the number of
possoibilities would be ( 20 choose 10 ) + ( 20 choose 11 ) +  ( 200
choose 12 ) +  ( 20 choose 13 ) + .  ( 20 choose 20 )

Oh boy, it is too large a problem to do by brute force. There are too
many possibilities even for this size of problem.
Hopefully Someone else will have a better idea. Forget my brute force
idea. It's useless and I apologize. I Made a mistake.




 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ryan Austin
Sent: Friday, October 13, 2006 2:43 PM
To: r-help@stat.math.ethz.ch
Subject: [R] a correlation matrix subset where the subset avg is a
maximum

Hello R group,

Given a correlation matrix, I would like to obtain the best subset of
pairs in the matrix of some size  n such that the mean of r for that
subset is a maximum compared to any other possible subset of size  n.  
I've been looking at the deal and subselect packages but they don't seem
to do what I need.  Does anyone have any suggestions?

Thanks in advance,
Ryan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


This is not an offer (or solicitation of an offer) to buy/sell the 
securities/instruments mentioned or an official confirmation.  Morgan Stanley 
may deal as principal in or own or act as market maker for 
securities/instruments mentioned or may advise the issuers.  This is not 
research and is not from MS Research but it may refer to a research 
analyst/research report.  Unless indicated, these views are the author's and 
may differ from those of Morgan Stanley research or others in the Firm.  We do 
not represent this is accurate or complete and we may not update this.  Past 
performance is not indicative of future returns.  For additional information, 
research reports and important disclosures, contact me or see 
https://secure.ms.com/servlet/cls.  You should not use e-mail to request, 
authorize or effect the purchase or sale of any security or instrument, to 
send transfer instructions, or to effect any other transactions.  We cannot 
guarantee that any such requests received via!
  e-mail will be processed in a timely manner.  This communication is solely 
for the addressee(s) and may contain confidential information.  We do not waive 
confidentiality by mistransmission.  Contact me if you do not wish to receive 
these communications.  In the UK, this communication is directed in the UK to 
those persons who are market counterparties or intermediate customers (as 
defined in the UK Financial Services Authority's rules).
  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Version of R in saved workspaces

2006-10-13 Thread Richard M. Heiberger

Start R-2.3.1 from the icon which is probably still on your desktop.

setwd() to the directory where your CoolStats.RData sits.

load(CoolStats.RData)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] correlation b/w a continuous variable and a categorical variable

2006-10-13 Thread Achim Zeileis

On Fri, 13 Oct 2006 18:17:10 -0400 Weiwei Shi wrote:

 I see.
 
 i think the question is, I did not have a clear idea of the
 correlation between them (if I insist no transformation). Otherwise,
 for a binary variable case, maybe a simple one-way t-test serves the
 purpose if I defined such correlation or dependency as the group mean
 difference.

...another special case of the general framework I outlined below. But
the man page and package vignette I already pointed you to, give you a
much better explanation of this.
Z


 thanks.
 
 On 10/13/06, Achim Zeileis [EMAIL PROTECTED] wrote:
  On Fri, 13 Oct 2006 17:15:45 -0400 Weiwei Shi wrote:
 
   Dear Listers:
  
   I happen to have this question in mind, is there a way to
   evaluate the correlation between
   a continuous variable and a categorical variable (without
   discretizing the former)? My intuitive is using lda by
   considering the latter as response variable but not sure.
 
  It depends what exactly you mean by evaluate correlation. If you
  want to test independence of two variables X and Y against some
  form of association, you can generally use statistics based on
sum h(Y) * g(X)
  where h() and g() are suitable transformations of X and Y. Special
  cases of this framework are tests for correlation of continuous
  variables and Chi-squared type statistics for categorical variables.
  This approach is implemented in the package coin, see
  independence_test() and the package vignette.
 
  hth,
  Z
 
   thanks,
  
   weiwei
  
   --
   Weiwei Shi, Ph.D
   Research Scientist
   GeneGO, Inc.
  
   Did you always know?
   No, I did not. But I believed...
   ---Matrix III
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html and provide commented,
   minimal, self-contained, reproducible code.
  
 
 
 
 -- 
 Weiwei Shi, Ph.D
 Research Scientist
 GeneGO, Inc.
 
 Did you always know?
 No, I did not. But I believed...
 ---Matrix III
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to see if row names of a dataframe are stored compactly

2006-10-13 Thread Hsiu-Khuern Tang

Reading the list of changes for R version 2.4.0, I was happy to see that the
row names of dataframes can be stored compactly (as the integer n when
row.names(df) is 1:n).

help(row.names) contains this paragraph:

Row names of the form '1:n' for 'n  2' are stored internally in a
compact form, which might be seen by calling 'attributes' but never
via 'row.names' or 'attr(x, row.names)'.

I am unable to get attributes(x)$row.names to return just nrow(x).  Am I
misreading the documentation?  Does might be seen mean possibly in some
future version of R in this case?

 (x - as.data.frame(matrix(1:9, nrow=3)))
  V1 V2 V3
1  1  4  7
2  2  5  8
3  3  6  9
 attributes(x)$row.names
[1] 1 2 3
 row.names(x) - seq(len=nrow(x))
 attributes(x)$row.names
[1] 1 2 3

Best,
Hsiu-Khuern.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] combinatorics

2006-10-13 Thread Duncan Temple Lang

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


Hi Robin.

 When I saw this, I thought expand.grid would do.
 But since it is too big and since I sympathize
 that C isn't the ideal use of ones time, perhaps
 the Combinations package on www.omegahat.org
 might be helpful.  This provides a one-at-a-time
 approach.

  D.

Robin Hankin wrote:
 Hi Christos
 
 thanks for this.  Unfortunately, this approach wouldn't work for me
 because the real problem is too big for it: I have
 letters A-F and two of each, giving
 
 12!/(2^6) ~= 7e6 combinations (borderline feasible)
 
 But in the approach you coded up below, matrix zz would have
   6^12 ~= 2e9 rows before eliminating the non-feasible ones.
 This is too big for me.
 
 Looks like it's going to be another weekend lost to C
 [but at least I now have some confidence that I've not overlooked
 anything obvious!]
 
 With very best wishes, I really appreciate your efforts
 
 
 
 Robin
 
 
 
 
 On 13 Oct 2006, at 16:21, Christos Hatzis wrote:
 
 
Hi Robin,

This approach first generates all combinations and then eliminates the
non-feasible ones.  It should work fine for smallish vectors but  
might not
scale well for larger vectors.  Hopefully it gives you what you  
need for
this problem.

xx - c(A,A,B,B,C)
yy - 1:length(xx)
zz - expand.grid(yy,yy,yy,yy,yy)

ss - zz[ apply(zz, 1, FUN=function(x) length(unique(x))) == length 
(xx), ]
ss - as.matrix(ss)

pp - apply(ss, 1, FUN=function(x,v) paste(v[as.vector(x)],  
collapse=),
v=xx)
res - unique(pp)


res

 [1] CBBAA BCBAA BBCAA CBABA BCABA CABBA ACBBA  
BACBA ABCBA
BBACA BABCA
[12] ABBCA CBAAB BCAAB CABAB ACBAB BACAB ABCAB  
CAABB ACABB
AACBB BAACB
[23] ABACB AABCB BBAAC BABAC ABBAC BAABC ABABC AABBC

length(res)

[1] 30

-Christos

Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park
Suite 5350
Woburn, MA 01801
Tel: 781-938-3830
www.nuverabio.com


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
Sent: Friday, October 13, 2006 10:19 AM
To: [EMAIL PROTECTED]
Subject: [R] combinatorics

Hi

How do I generate all ways of ordering  sets of indistinguishable  
items?

suppose I have two A's, two B's and a C.

Then I want

AABBC
AABCB
AACBC
ABABC
. . .snip...
BBAAC
. . .snip...
CBBAA

[there are 5!/(2!*2!) = 30 arrangements.  Note AABBC != BBAAC]

How do I do this?





--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton European Way, Southampton  
SO14
3ZH, UK
  tel  023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting- 
guide.html
and provide commented, minimal, self-contained, reproducible code.


 
 
 --
 Robin Hankin
 Uncertainty Analyst
 National Oceanography Centre, Southampton
 European Way, Southampton SO14 3ZH, UK
   tel  023-8059-7743
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

- --
Duncan Temple Lang[EMAIL PROTECTED]
Department of Statistics  work:  (530) 752-4782
4210 Mathematical Sciences Building   fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis,
CA 95616,
USA
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFFMCTY9p/Jzwa2QP4RAiCiAJ9APb87RkA7Ap1Y8BigNtmI3Q8oAQCfRzfp
3+v/Ari5BVD5/5hDYDIVzWY=
=8NBK
-END PGP SIGNATURE-

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

1 2 >

1 - 100 of 106 matches

Mail list logo