date:20070406

Re: [R] Likelihood returning inf values to optim(L-BFGS-B) other options?

2007-04-06 Thread Michael Jungbluth

Thank you very much for your postings. Rewriting the likelihood with  
lgamma helps a lot and the mistake with "fnscale" was quite stupid  
(Sorry for that!).

The model works for most parameter sets, but my log-likelihood (now
rewritten with lgamma and negated) still returns Inf for some extreme
parameter values (e.g. U-shaped beta densities in the simulation that
generates x, t_x and T). These are actually the cases that are really
interesting for my project. So, is there another, similar optimization
algorithm which can deal with Inf returns?

Thanks a lot!

Best regards,
Michael
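
The advice quoted below (lgamma, sums of logs, log-transformed parameters) could be combined roughly as follows. This is only a sketch under stated assumptions: the argument names, the toy data and the reading of flag as "1 when x > 0" are illustrative and not taken from the original code. The terms A3 and flag*A4 are added with a log-sum-exp step so that neither is exponentiated on its own. According to the optim() help page, L-BFGS-B is the method that insists on finite values of fn, so the sketch uses Nelder-Mead on the unconstrained log scale instead.

```r
# Negative BG/NBD log-likelihood computed entirely on the log scale.
# theta = log(c(r, alpha, a, b)); exp() keeps all four parameters positive,
# so no lower bounds are needed.
negll_log <- function(theta, x, t_x, Tvec) {
  p <- exp(theta)
  r <- p[1]; alpha <- p[2]; a <- p[3]; b <- p[4]
  logA1 <- lgamma(r + x) - lgamma(r) + r * log(alpha)
  logA2 <- lgamma(a + b) + lgamma(b + x) - lgamma(b) - lgamma(a + b + x)
  logA3 <- -(r + x) * log(alpha + Tvec)
  # A4 only enters when x > 0 (assumed meaning of `flag`); log(0) = -Inf otherwise
  logA4 <- rep(-Inf, length(x))
  pos <- x > 0
  logA4[pos] <- log(a) - log(b + x[pos] - 1) -
    (r + x[pos]) * log(alpha + t_x[pos])
  # log(A3 + flag * A4) via log-sum-exp
  m <- pmax(logA3, logA4)
  -sum(logA1 + logA2 + m + log(exp(logA3 - m) + exp(logA4 - m)))
}

# Made-up toy data, for illustration only
x    <- c(0, 2, 5, 1, 0, 3)
t_x  <- c(0, 10.5, 30, 4, 0, 22)
Tvec <- c(38, 38, 35, 30, 30, 36)

fit <- optim(log(c(1, 1, 1, 1)), negll_log, x = x, t_x = t_x, Tvec = Tvec,
             method = "Nelder-Mead")
exp(fit$par)  # back-transformed estimates of r, alpha, a, b
```

With so few made-up observations the estimates themselves mean nothing; the point is only that the function stays finite where the gamma() version overflows.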

Quoting Prof Brian Ripley <[EMAIL PROTECTED]>:

> On Thu, 5 Apr 2007, Ravi Varadhan wrote:
>
>> In your code, the variables x (which I assume is the observed data), Tvec,
>> and flag are not passed to the function as arguments.  This could be a
>> potential problem.
>
> I think scoping will probably find them.
>
>> Another problem could be that you have to use "negative"
>> log-likelihood function as input to optim, since by default it "minimizes"
>> the function, whereas you are interested in finding the argmax of
>> log-likelihood.  So, in your function you should return (-ll) instead of ll.
>
> OR set fnscale.  This is the most serious problem.
>
>> If the above strategies don't work, I would try different initial values (it
>> would be best if you have a data-driven strategy for picking a starting
>> value) and different optimization methods (e.g. conjugate gradient with
>> "Polak-Ribiere" steplength option, Nelder-Mead, etc.).
>
> It looks to me as if the calculations are very vulnerable to
> overflow/underflow, as they use gamma and not lgamma.  They could be
> rearranged to be much stabler by computing the sum of logs for each
> sub-expression.
>
> There were over 50 warnings, which we were not shown.  They probably
> explained the problem.
>
> Beyond that, the feasible region seems to be the interior of the
> positive orthant, in which case transforming the parameters (e.g.
> working with their logs) would be a good idea.
>
> Finally, always supply analytical gradients when you can (as would be
> easy here).
>
>
>> -Original Message-
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
>> Sent: Thursday, April 05, 2007 6:12 AM
>> To: r-help@stat.math.ethz.ch
>> Subject: [R] Likelihood returning inf values to optim(L-BFGS-B) other
>> options?
>>
>> Dear R-help list,
>>
>> I am working on an optimization with R by evaluating a likelihood
>> function that contains lots of Gamma calculations (BGNBD: Hardie Fader
>> Lee 2005 Management Science). Since I am forced to implement lower
>> bounds for the four parameters included in the model, I chose the
>> optim() function with L-BFGS-B as method. But the likelihood often
>> returns inf-values which L-BFGS-B can't deal with.
>>
>> Are there any other options to implement an optimization algorithm
>> with R accounting for lower bounds and a four parameter-space?
>>
>> Here is the error message I receive (translated from German):
>> --
>> > out=optim(c(.1,.1,.1,.1),fn,method="L-BFGS-B",lower=c(.0001,.0001,.0001,.0001,.0001))
>> Error in optim(c(0.1, 0.1, 0.1, 0.1), fn, method = "L-BFGS-B", lower
>> = c(1e-04,  :
>>  L-BFGS-B needs finite values of 'fn'
>> In addition: There were 50 or more warnings (use warnings() to see
>> the first 50)
>> --
>> And this is the likelihood function:
>> --
>> fn<-function(p) {
>>   A1=(gamma(p[1]+x)*p[2]^p[1])/(gamma(p[1]))
>>   A2=(gamma(p[3]+p[4])*gamma(p[4]+x))/(gamma(p[4])*gamma(p[3]+p[4]+x))
>>   A3=(1/(p[2]+Tvec))^(p[1]+x)
>>   A4=(p[3]/(p[4]+x-1))*((1/(p[2]+t_x))^(p[1]+x))
>>   ll=sum(log(A1*A2*(A3+flag*A4)))
>>   return(ll)
>> }
>>
>> Thank you very much for your help in advance!
>>
>> Best regards,
>>
>> Michael
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> -- 
> Brian D. Ripley,  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595



-- 
Michael Jungbluth
Research Associate
Department of Marketing
Ingolstadt School of Management
CU-Eichstaett-Ingolstadt
Germany

Re: [R] Setting where the x-axis crosses the y-axis

2007-04-06 Thread hadley wickham

Hi Stephane,

Drawing a bar plot with log axes is a really bad idea.  The whole
point of a bar is that you are judging the area between the top of the
bar and the y-axis.  If you use a log scaled axis the distance to y=0
is Inf, and your plot isn't really meaningful.

You might want to consider using a dot plot instead.  See
http://www.b-eye-network.com/view/index.php?cid=2468&fc=0&frss=1&ua=
for a good discussion of the issues.

Hadley
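
A dot plot on a log-scaled axis might look roughly like the sketch below. The ratios and category names are made up for illustration, and base graphics are used only because they need no extra packages.

```r
# Made-up ratios, some below and some above 1
ratio <- c(A = 0.2, B = 0.6, C = 1.5, D = 4, E = 12)

# One dot per category on a log-scaled x-axis, with a reference line at 1
plot(ratio, seq_along(ratio), log = "x", pch = 19, yaxt = "n",
     xlab = "ratio (log scale)", ylab = "")
axis(2, at = seq_along(ratio), labels = names(ratio), las = 1)
abline(v = 1, lty = 2)
```

Because only position is read off, the plot does not depend on where a bar would have to start.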

On 4/6/07, stephane helleringer <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> my apologies for a probably very obvious question but i can't figure out if, on
> a bar plot, there is a simple way to have the x-axis cross the y-axis at 1,
> when the y-axis is on a log-scale?
> I want to draw a bar plot, and have some of the bars "drop" below 1 while
> starting from 1. Is this possible?
> I have been trying various things using barplot, barplot2 etc... without
> success.
> Thanks a lot for your help,
>
> stephane
>

Re: [R] Setting where the x-axis crosses the y-axis

2007-04-06 Thread Richard M. Heiberger

I think you are looking for this

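tmp <- abs(rnorm(100, sd=8))   # positive example data
ltmp <- log(tmp)               # log(1) = 0, so 0 corresponds to 1 on the data scale
## type="h" draws each line from 0, i.e. from 1 on the data scale
plot(ltmp, type="h", yaxt="n", main="what you want")
exp(par("usr")[3:4])           # y-range of the plot on the data scale
par("yaxp")[1:2]
logticks <- axTicks(2, axp=c(10^c(-1,3),3), log=TRUE)
axis(2, at=log(logticks), labels=logticks)   # label the axis in original units
## for comparison: the standard log-axis plot
plot(tmp, type="h", log="y", main="standard")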
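
The same idea can be carried over to barplot(), which is what the original question asked about. The following is a sketch with made-up values, not a general-purpose function: the bars are drawn on the log10 scale, where 1 maps to 0, and the axis is then labelled in the original units.

```r
y <- c(a = 0.05, b = 0.4, c = 3, d = 40)   # made-up values on both sides of 1
ly <- log10(y)                              # log10(1) = 0, so bars start at 1
mids <- barplot(ly, yaxt = "n", ylim = c(-2, 2), ylab = "value (log scale)")
ticks <- 10^(-2:2)
axis(2, at = log10(ticks), labels = ticks, las = 1)
abline(h = 0)                               # the x-axis, crossing the y-axis at 1
```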

[R] Repeated-measures: aov(), lme() and lmer()

2007-04-06 Thread Francisco Torreira

Hello,

Can anyone confirm if these 4 function calls are equivalent? I would
like to test for treatment effects in a repeated-measures design. The
design is balanced:

# 1:

aov(y~subject+treatment)

# 2:

aov(y~treatment+Error(subject))

# 3:

lme(y~treatment, random= ~1|subject)

# 4:

lmer(y~treatment+(1|subject))

##
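
One way to look into this is to fit the models to simulated balanced data and compare the F statistic for treatment. The sketch below assumes a simple random-intercept design with made-up effect sizes; all variable names are illustrative. Call 1 treats subject as a fixed blocking factor, while calls 2 and 3 treat it as random. For a balanced design the treatment F statistics would be expected to agree, the lme value up to the convergence tolerance of its REML fit and provided the estimated subject variance is positive. lmer (package lme4) fits the same random-intercept model as call 3 and is left out here only to keep the dependencies small.

```r
library(nlme)

set.seed(1)
n_subj <- 12
d <- expand.grid(subject = factor(1:n_subj),
                 treatment = factor(c("a", "b", "c")))
subj_eff <- rnorm(n_subj, sd = 2)                      # made-up subject effects
trt_eff  <- unname(c(a = 0, b = 1, c = 2)[as.character(d$treatment)])
d$y <- 10 + subj_eff[as.integer(d$subject)] + trt_eff + rnorm(nrow(d))

fit1 <- aov(y ~ subject + treatment, data = d)         # subject as fixed block
fit2 <- aov(y ~ treatment + Error(subject), data = d)  # subject as error stratum
fit3 <- lme(y ~ treatment, random = ~ 1 | subject, data = d)

# F statistic for treatment from each fit
F1 <- summary(fit1)[[1]][["F value"]][2]
F2 <- summary(fit2)[["Error: Within"]][[1]][["F value"]][1]
F3 <- anova(fit3)["treatment", "F-value"]
c(aov_fixed = F1, aov_error = F2, lme = F3)
```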

Thanks in advance,

-- 
Francisco Torreira
PhD Candidate in Romance Linguistics
University of Illinois at Urbana-Champaign

https://netfiles.uiuc.edu/ftorrei2/www/index.html
tel: (+1) 217 - 778 8510

[R] Setting where the x-axis crosses the y-axis

2007-04-06 Thread stephane helleringer

Hi all,

my apologies for a probably very obvious question but i can't figure out if, on
a bar plot, there is a simple way to have the x-axis cross the y-axis at 1,
when the y-axis is on a log-scale?
I want to draw a bar plot, and have some of the bars "drop" below 1 while
starting from 1. Is this possible?
I have been trying various things using barplot, barplot2 etc... without
success.
Thanks a lot for your help,

stephane

Re: [R] read.spss (package foreign) and SPSS 15.0 files

2007-04-06 Thread John Kane


--- Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:

> Charilaos Skiadas wrote:
> > On Apr 6, 2007, at 12:32 PM, John Kane wrote:
> >
> >> I have simply moved to exporting the SPSS file to a
> >> delimited file and loading it. Unfortunately I'm
> >> losing all the labelling which can be time-consuming
> >> to redo. Some of the data has something like 10
> >> categories for a variable.
> >
> > I save as csv format all the time, and it offers me a choice to use
> > the labels instead of the corresponding numbers. So you shouldn't
> > have to lose that labelling.
> >
> > Haris Skiadas
> > Department of Mathematics and Computer Science
> > Hanover College
>
> That's a different point.  The great advantage of read.spss (and the
> spss.get function in Hmisc that uses it) is that long variable labels
> are supported in addition to variable names.  That's why I like getting
> SPSS or Stata files instead of csv files.  I'm going to enhance csv.get
> in Hmisc to allow a row number to be specified, to contain long variable
> labels.
>
> Frank
>
Ah, I missed that point. I think it is a "little"
bit less important to me but I do notice that I
'label' just about everything.  Trying to remember
what "sexy" meant 3 months ago is not always easy even
when I'm reading the code. :)

The enhancement will, definitely, be appreciated.

Re: [R] Reasons to Use R

2007-04-06 Thread Stephen Tucker

Regarding (2),

I wonder if this information is too outdated or not relevant when scaled up
to larger problems...

http://www.sciviews.org/benchmark/index.html




--- Ramon Diaz-Uriarte <[EMAIL PROTECTED]> wrote:

> Dear Lorenzo,
> 
> I'll try not to repeat what others have answered before.
> 
> On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> > The institute I work for is organizing an internal workshop for High
> > Performance Computing (HPC).
> (...)
> 
> > (1)Institutions (not only academia) using R
> 
> You can count my institution too. Several groups. (I can provide more
> details off-list if you want).
> 
> > (2)Hardware requirements, possibly benchmarks
> > (3)R & clusters, R & multiple CPU machines, R performance on different hardware.
> 
> We do use R in commodity off-the-shelf clusters; our two clusters are
> running Debian GNU/Linux; both 32-bit machines ---Xeons--- and 64-bit
> machines ---dual-core AMD Opterons. We use parallelization quite a
> bit, with MPI (via Rmpi and papply packages mainly). One convenient
> feature is that (once the lam universe is up and running) whether we
> are using the 4 cores in a single box, or the max available 120, is
> completely transparent. Using R and MPI is, really, a piece of cake.
> That said, there are things that I miss; in particular, oftentimes I
> wish R were Erlang or Oz because of the straightforward fault-tolerant
> distributed computing and the built-in abstractions for distribution
> and concurrency. The issue of multithreading has come up several times
> in this list and is something that some people miss.
> 
> I am not sure how much R is used in the usual HPC realms. It is my
> understanding that the "traditional HPC" is still dominated by things
> such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer
> to "but R is too slow" is "but you can write Fortran or C code for the
> bottlenecks and call it from R". I guess you could use, say, UPC in
> that C that is linked to R, but I have no experience. And I think this
> code can become a pain to write and maintain (especially if you want to
> play around with what you try to parallelize, etc). My feeling (based
> on no information or documentation whatsoever) is that how far R can
> be stretched or extended into HPC is still an open question.
> 
> 
> > (4)finally, a list of the advantages for using R over commercial
> > statistical packages. The money-saving in itself is not a reason good
> > enough and some people are scared by the lack of professional support,
> > though this mailing list is simply wonderful.
> >
> 
> (In addition to all the already mentioned answers)
> Complete source code availability. Being able to look at the C source
> code for a few things has been invaluable for me.
> And, of course, an extremely active, responsive, and vibrant
> community that, among other things, has contributed packages and code
> for an incredible range of problems.
> 
> 
> Best,
> 
> R.
> 
> P.S. I'd be interested in hearing about the responses you get to your
> presentation.
> 
> 
> > Kind Regards
> >
> > Lorenzo Isella
> >
> 
> 
> -- 
> Ramon Diaz-Uriarte
> Statistical Computing Team
> Structural Biology and Biocomputing Programme
> Spanish National Cancer Centre (CNIO)
> http://ligarto.org/rdiaz
> 



 

Re: [R] regular expression

2007-04-06 Thread Laurent Rhelp

Uwe Ligges wrote:

>
>
> Laurent Rhelp wrote:
>
>> Dear R-List,
>>
>>  I have a great many files in a directory and I would like to 
>> replace in every file the character " by the character ' and in the 
>> same time, I have to change ' by '' (i.e. the character ' twice and 
>> not the unique character ") when the character ' is embodied in "."
>>   So, "." becomes '.' and ".'.." becomes '.''..'
>> Certainly, regular expression could help me but I am not able to use it.
>>
>> How can I do that with R ?
>
>
>
> In fact, you do not need to know anything about regular expressions in 
> this case, since you are simply going to replace certain characters by 
> others without any fuzzy restrictions:
>
> x <- "\".'..\""
> cat(x, "\n")
> xn <- gsub('"', "'", gsub("'", "''", x))
> cat(xn, "\n")
>
>
> Uwe Ligges
>
>
>> Thank you very much
>>
>
>
>

Yes, you are right. So I wrote the code below (that I find a little 
awkward but it works).

##-

dirdata <- getwd()
fichnames <- list.files(path=paste(dirdata,"\\initial\\",sep=""))

for( i in 1:length(fichnames)){

  filein <- paste(dirdata,"\\initial\\",fichnames[i],sep="")
  conin <- file(filein)
  open(conin)   
  nbrows <- length( readLines(conin,n=-1) )
  close(conin)

  fileout <- paste(dirdata,"\\result\\",fichnames[i],sep="")
  conout <- file(fileout,"w")

  conin <- file(filein)
  open(conin)


  for( l in 1:nbrows )
  {
    text <- gsub('"',"'",gsub("'","''",readLines(conin,n=1)))
    writeLines(con=conout,text=text)
  }

  close(conin)
  close(conout)
}

##--
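
Since readLines(), gsub() and writeLines() all work on whole character vectors, the inner loop over lines and the second pass over each file are probably not needed. Here is a minimal sketch of the same replacement; it uses throw-away directories under tempdir() and a made-up example file so that it can be run anywhere, and these names are illustrative rather than the original paths.

```r
# Throw-away input and output directories with one made-up example file
indir  <- file.path(tempdir(), "initial")
outdir <- file.path(tempdir(), "result")
dir.create(indir, showWarnings = FALSE)
dir.create(outdir, showWarnings = FALSE)
writeLines(c("\".'..\"", "\"abc\""), file.path(indir, "example.txt"))

for (f in list.files(indir)) {
  txt <- readLines(file.path(indir, f))       # all lines at once
  txt <- gsub("'", "''", txt, fixed = TRUE)   # first double every '
  txt <- gsub('"', "'", txt, fixed = TRUE)    # then turn every " into '
  writeLines(txt, file.path(outdir, f))
}
readLines(file.path(outdir, "example.txt"))
```

file.path() builds the paths with the platform's separator, so the hard-coded backslashes are avoided as well.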

Re: [R] regular expression

2007-04-06 Thread Uwe Ligges



Laurent Rhelp wrote:
> Dear R-List,
> 
>  I have a great many files in a directory and I would like to 
> replace in every file the character " by the character ' and in the same 
> time, I have to change ' by '' (i.e. the character ' twice and not the 
> unique character ") when the character ' is embodied in "."
>   So, "." becomes '.' and ".'.." becomes '.''..'
> Certainly, regular expression could help me but I am not able to use it.
> 
> How can I do that with R ?


In fact, you do not need to know anything about regular expressions in 
this case, since you are simply going to replace certain characters by 
others without any fuzzy restrictions:

x <- "\".'..\""
cat(x, "\n")
xn <- gsub('"', "'", gsub("'", "''", x))
cat(xn, "\n")


Uwe Ligges


> Thank you very much
> 

Re: [R] plotting multilevel / lme lines

2007-04-06 Thread toby909

Yesterday, I tried to do exactly this, too. Below is my approach.
Unfortunately, I did not find a textbook example against which I could
verify my code. Hints, simplifications, and verification highly
appreciated!!!

### random slope! model
v = lme(POPULAR ~ SEX + TEXP, data=dta, random = ~ SEX | SCHOOL)
### setting up the thing to plot
a = matrix(t(cbind(0, coef(v)[,1], 1, coef(v)[,2])), ncol=2, byrow=1)
plot(a)
### connecting only the pairs I want to have connected, ie dont connect all dots
for (i in 1:(length(a)/4)*2-1) lines(a[i:(i+1),])

Thanks Toby

data downloaded from http://www.ats.ucla.edu/stat/examples/ma_hox/default.htm
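
As a possible cross-check, here is a sketch of per-group lines built from coef() on a dataset that ships with nlme (Orthodont), since the school data are not reproduced here. It is not a verification of the code above. One point may be worth checking there: if the second column of coef(v) is the SEX slope, as the model formula suggests, the predicted value at SEX = 1 would be intercept + slope (with TEXP held at 0), not the slope alone.

```r
library(nlme)

# Random intercept and slope for age within Subject (example data from nlme)
fit <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

cf   <- coef(fit)                 # one row per Subject: (Intercept), age
ages <- range(Orthodont$age)      # the two x positions where each line is evaluated

# Predicted value per Subject at each end point: intercept + slope * age
pred <- sapply(ages, function(a) cf[["(Intercept)"]] + cf[["age"]] * a)

# One line per Subject
matplot(ages, t(pred), type = "l", lty = 1, col = "grey40",
        xlab = "age", ylab = "predicted distance")
```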

Rense Nieuwenhuis wrote:
> Dear expeRts,
> 
> I am trying to plot a lme-object {package nlme) in such a way, that  
> on a selected level the x-axis represents the value on a selected  
> predictor and the y-axis represents the predicted-outcome variable.  
> The graphs would than consist of several lines that each represent  
> one group. I can't find such a plotting function.
> 
> I could write such a function myself, based on ranef() and fixef(),  
> but it would be a waste of time if such a function would already exist.
> 
> Does any of you such a function?
> 
> Regards,
> 
> Rense Nieuwenhuis
> 

Re: [R] Reasons to Use R

2007-04-06 Thread Wilfred Zegwaard

Dear Lorenzo and Steven,

I'm not a programmer, but in my experience R is good at
processing large datasets, especially in combination with specialised
statistics. There are some limits to that, but R handles large datasets
and complicated computations a lot better than SPSS, for example. I cannot
speak for Fortran, but I do have experience with Pascal. I prefer R,
because in Pascal you easily get caught up in an endless programming
effort which has nothing to do with the problem. I do like Pascal, it's
the only programming language I actually learned, but it isn't an
adequate replacement for R.
In my experience the SPSS language, and the menu-driven
package, is far easier to handle than R, but when it comes to specific
computations, SPSS loses out, by far. Non-parametrics, for example, are
good in R. Dataset handling is adequate (my SPSS ports can be read), and
I noticed that R has good numerical routines like optimisation (even
mixed integer programming) and good procedures for regression (GLM, which
is not an SPSS standard). Try to compute a Kendall-W statistic in SPSS.
It's relatively easy in R.
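
As an illustration of that last point, Kendall's W can be written in a few lines of base R. The sketch below uses the textbook formula without a correction for ties; the function name and the layout (rows are items, columns are raters) are choices made here for illustration.

```r
# Kendall's coefficient of concordance W, no tie correction.
# ratings: matrix with one row per item and one column per rater.
kendall_w <- function(ratings) {
  n <- nrow(ratings)                    # number of items
  m <- ncol(ratings)                    # number of raters
  ranks <- apply(ratings, 2, rank)      # rank the items within each rater
  S <- sum((rowSums(ranks) - m * (n + 1) / 2)^2)
  12 * S / (m^2 * (n^3 - n))
}

# Three raters in perfect agreement on five items
kendall_w(cbind(1:5, 1:5, 1:5))   # → 1
```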
The only thing that I DON'T like about R is dataset computations and
its syntax. When I have a dataset with only non-parametric content
which is also "dirty" (the dataset is incomplete or has wrong values), I
almost have to call in a technician to find out how to handle it. To be
honest: I use a spreadsheet for these dataset computations, and then
export it to R. But I noted that in R there are several solutions for
that. With SciViews I could get a basic feeling for it.
Pascal is basically the only programming language that I syntactically
understood. It had a kind of logical mathematical structure to it. The
logic of Fortran (and to some extent R): I completely miss it.

Statistically, R is my choice, and luckily most procedures in R are
easily accessible. And my experience with computations in R is... good.

I have done simulations in the past, especially with time series, but I
cannot recommend R for them (arima.sim is not sufficient for these types
of simulations). I would still prefer Pascal for that. There is also an
excellent open source package for Pascal, Free Pascal, but I hardly use
it. I have had some good experiences with computations in C, but my
experience there is limited. I believe I would prefer R to C.

Cheers,

Wilfred

Re: [R] Reasons to Use R

2007-04-06 Thread Ramon Diaz-Uriarte

Dear Lorenzo,

I'll try not to repeat what others have answered before.

On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
(...)

> (1)Institutions (not only academia) using R

You can count my institution too. Several groups. (I can provide more
details off-list if you want).

> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different hardware.

We do use R in commodity off-the-shelf clusters; our two clusters are
running Debian GNU/Linux; both 32-bit machines ---Xeons--- and 64-bit
machines ---dual-core AMD Opterons. We use parallelization quite a
bit, with MPI (via Rmpi and papply packages mainly). One convenient
feature is that (once the lam universe is up and running) whether we
are using the 4 cores in a single box, or the max available 120, is
completely transparent. Using R and MPI is, really, a piece of cake.
That said, there are things that I miss; in particular, oftentimes I
wish R were Erlang or Oz because of the straightforward fault-tolerant
distributed computing and the built-in abstractions for distribution
and concurrency. The issue of multithreading has come up several times
in this list and is something that some people miss.

I am not sure how much R is used in the usual HPC realms. It is my
understanding that the "traditional HPC" is still dominated by things
such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer
to "but R is too slow" is "but you can write Fortran or C code for the
bottlenecks and call it from R". I guess you could use, say, UPC in
that C that is linked to R, but I have no experience. And I think this
code can become a pain to write and maintain (especially if you want to
play around with what you try to parallelize, etc). My feeling (based
on no information or documentation whatsoever) is that how far R can
be stretched or extended into HPC is still an open question.

> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
>

(In addition to all the already mentioned answers)
Complete source code availability. Being able to look at the C source
code for a few things has been invaluable for me.
And, of course, an extremely active, responsive, and vibrant
community that, among other things, has contributed packages and code
for an incredible range of problems.

Best,

R.

P.S. I'd be interested in hearing about the responses you get to your
presentation.

> Kind Regards
>
> Lorenzo Isella
>

-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz

Re: [R] read.spss (package foreign) and SPSS 15.0 files

2007-04-06 Thread Frank E Harrell Jr

Charilaos Skiadas wrote:
> On Apr 6, 2007, at 12:32 PM, John Kane wrote:
> 
>> I have simply moved to exporting the SPSS file to a
>> delimited file and loading it. Unfortunately I'm
>> losing all the labelling which can be time-consuming
>> to redo. Some of the data has something like 10
>> categories for a variable.
> 
> I save as csv format all the time, and it offers me a choice to use  
> the labels instead of the corresponding numbers. So you shouldn't  
> have to lose that labelling.
> 
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College

That's a different point.  The great advantage of read.spss (and the 
spss.get function in Hmisc that uses it) is that long variable labels 
are supported in addition to variable names.  That's why I like getting 
SPSS or Stata files instead of csv files.  I'm going to enhance csv.get 
in Hmisc to allow a row number to be specified, to contain long variable 
labels.

Frank
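
To make the idea of a label row concrete, here is a base-R sketch of reading a CSV file whose second row holds long variable labels. It only illustrates the concept and is not the Hmisc implementation; the file contents, the "label" attribute name and the row layout are assumptions made for this example.

```r
# Made-up CSV: row 1 = variable names, row 2 = long labels, then the data
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,sexy,age",
             "Respondent ID,Sex of respondent (1 = male),Age in years",
             "1,0,34",
             "2,1,51"), csv_path)

# Read the two header rows as text, then the data without a header
hdr <- read.csv(csv_path, header = FALSE, nrows = 2, stringsAsFactors = FALSE)
var_names  <- unname(unlist(hdr[1, ]))
var_labels <- unname(unlist(hdr[2, ]))
dat <- read.csv(csv_path, header = FALSE, skip = 2, col.names = var_names)

# Attach each long label to its column as an attribute
for (j in seq_along(dat)) attr(dat[[j]], "label") <- var_labels[j]
attr(dat$sexy, "label")
```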

> 

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

[R] regular expression

2007-04-06 Thread Laurent Rhelp

Dear R-List,

 I have a great many files in a directory and I would like to 
replace in every file the character " by the character ' and in the same 
time, I have to change ' by '' (i.e. the character ' twice and not the 
unique character ") when the character ' is embodied in "."
  So, "." becomes '.' and ".'.." becomes '.''..'
Certainly, regular expression could help me but I am not able to use it.

How can I do that with R ?

Thank you very much

Re: [R] Labelling boxplot with fivenumber summary

2007-04-06 Thread jim holtman

Here is one way of labeling the values:

# capture data returned by boxplot
x <- boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
# plot each group
for (i in seq(ncol(x$stats))){
    text(i, x$stats[,i], labels=x$stats[,i])
}
# if there are outliers, plot them
if (length(x$out) > 0){
    # split the groups so you can get max/min
    maxmin <- split(x$out, x$group)
    # go through each group getting min/max
    lapply(names(maxmin), function(.grp){
        .range <- range(maxmin[[.grp]])
        text(as.numeric(.grp), .range, labels=.range)
    })
}


On 4/6/07, Daniel Siddle <[EMAIL PROTECTED]> wrote:
>
> I am very new to R so forgive me if this seems basic but I have done 
> extensive searching and failed to come up with the answer for myself.
>
> I am trying to label a boxplot I have created with the values for the median, 
> upper and lower quartiles and max and min values.  I have been unable to do 
> this or find anything on the net to say how it might be done.  Is this 
> possible and if so how?  Regards,
>
> Daniel Siddle
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Re: [R] read.spss (package foreign) and SPSS 15.0 files

2007-04-06 Thread John Kane

Thanks, that's excellent news.  This is a relatively
new problem for me and we don't have SPSS on the local
machines so I have not been experimenting. 

Given that I used SPSS for about 3 minutes in the last
5-6 months I was too cheap to have us get a licence. 
And it's not that far to walk to the nearest lab with
SPSS. :) Next time I'll take a coffee and experiment.

--- Charilaos Skiadas <[EMAIL PROTECTED]> wrote:

> On Apr 6, 2007, at 12:32 PM, John Kane wrote:
>
> > I have simply moved to exporting the SPSS file to a
> > delimited file and loading it. Unfortunately I'm
> > losing all the labelling which can be time-consuming
> > to redo. Some of the data has something like 10
> > categories for a variable.
>
> I save as csv format all the time, and it offers me a choice to use
> the labels instead of the corresponding numbers. So you shouldn't
> have to lose that labelling.
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College

Re: [R] Labelling boxplot with fivenumber summary

2007-04-06 Thread Chuck Cleland

Daniel Siddle wrote:
> I am very new to R so forgive me if this seems basic but I have done 
> extensive searching and failed to come up with the answer for myself.
> 
> I am trying to label a boxplot I have created with the values for the median, 
> upper and lower quartiles and max and min values.  I have been unable to do 
> this or find anything on the net to say how it might be done.  Is this 
> possible and if so how?  Regards,

  Here is another idea:

fn <- boxplot(ToothGrowth$len, plot=FALSE)$stats

par(mar=c(4,6,4,2))
boxplot(ToothGrowth$len, ylab="Length", at=.80)
text(1.15, fn[1], paste("Minimum Value =", fn[1]), adj=0, cex=.7)
text(1.15, fn[2], paste("Lower Quartile =", fn[2]), adj=0, cex=.7)
text(1.15, fn[3], paste("Median =", fn[3]), adj=0, cex=.7)
text(1.15, fn[4], paste("Upper Quartile =", fn[4]), adj=0, cex=.7)
text(1.15, fn[5], paste("Maximum Value =", fn[5]), adj=0, cex=.7)
arrows(1.14, fn[1], 1.02, fn[1])
arrows(1.14, fn[2], 1.02, fn[2])
arrows(1.14, fn[3], 1.02, fn[3])
arrows(1.14, fn[4], 1.02, fn[4])
arrows(1.14, fn[5], 1.02, fn[5])
title("Annotated Boxplot of Tooth Growth")
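
Since text() and arrows() recycle their arguments, the ten calls above can be collapsed into two. This is only a sketch of the same idea; the names `stats` and `labels` are mine. Note that boxplot()$stats holds the whisker ends and hinges, so the "Minimum"/"Maximum" labels are only accurate when no points are flagged as outliers, and the hinges can differ slightly from quantile().

```r
# Five boxplot statistics: lower whisker, lower hinge, median,
# upper hinge, upper whisker (a 5 x 1 matrix, so take column 1).
stats <- boxplot(ToothGrowth$len, plot = FALSE)$stats[, 1]
labels <- paste(c("Minimum Value", "Lower Quartile", "Median",
                  "Upper Quartile", "Maximum Value"), "=", stats)

par(mar = c(4, 6, 4, 2))
boxplot(ToothGrowth$len, ylab = "Length", at = 0.80)
text(1.15, stats, labels, adj = 0, cex = 0.7)   # one call labels all five values
arrows(1.14, stats, 1.02, stats)                # one call draws all five arrows
title("Annotated Boxplot of Tooth Growth")
```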

> Daniel Siddle
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: [R] read.spss (package foreign) and SPSS 15.0 files

2007-04-06 Thread Charilaos Skiadas

On Apr 6, 2007, at 12:32 PM, John Kane wrote:

> I have simply moved to exporting the SPSS file to a
> delimited file and loading it. Unfortunately I'm
> losing all the labelling which can be time-consuming
> to redo.Some of the data has something like 10
> categories for a variable.

I save as csv format all the time, and it offers me a choice to use  
the labels instead of the corresponding numbers. So you shouldn't  
have to lose that labelling.
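
If an export has already been written with numeric codes, the labels can be re-attached in R with factor(). A minimal sketch, assuming a made-up variable `region` with codes 1 to 3; the data and the label names are invented for illustration only:

```r
# Hypothetical export in which 'region' was written as numeric codes.
csv_text <- "id,region\n1,2\n2,1\n3,3\n4,2"
dat <- read.csv(text = csv_text)

# Made-up value labels, in the order of the codes 1, 2, 3.
region_labels <- c("North", "Central", "South")
dat$region <- factor(dat$region, levels = 1:3, labels = region_labels)

table(dat$region)
```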

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College


Re: [R] Computing the rank of a matrix.

2007-04-06 Thread Ravi Varadhan

Hi,

qr(A)$rank will work, but just be wary of the tolerance parameter (default
is 1.e-07), since the rank computation could be sensitive to the tolerance
chosen.  
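
To illustrate the sensitivity, here is a small sketch with two made-up matrices: one that is exactly rank-deficient, and one whose columns are nearly collinear, so that the reported rank depends on `tol`.

```r
# Exactly rank-deficient: the third column is the sum of the first two.
A <- matrix(c(1, 2, 3,
              2, 4, 6,
              1, 0, 1), nrow = 3, byrow = TRUE)
rank_A <- qr(A)$rank

# Nearly collinear columns: the answer depends on the tolerance.
B <- cbind(c(1, 1, 1), c(1, 1, 1 + 1e-9))
rank_default <- qr(B)$rank               # default tol = 1e-07
rank_strict  <- qr(B, tol = 1e-12)$rank  # much smaller tolerance

c(rank_A, rank_default, rank_strict)
```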

Ravi.


---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: [EMAIL PROTECTED]

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Friday, April 06, 2007 11:07 AM
To: José Luis Aznarte M.
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Computing the rank of a matrix.

How about

  qr(A)$rank

or perhaps

  qr(A, LAPACK=TRUE)$rank

Cheers,

Andy

__
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-
E-mail: [EMAIL PROTECTED]
Tel:  (651) 733-6092
Fax:  (651) 736-3122


From: "José Luis Aznarte M." <[EMAIL PROTECTED].ugr.es>
Sent by: [EMAIL PROTECTED]at.math.ethz.ch
To: r-help@stat.math.ethz.ch
Date: 04/06/2007 06:39 AM
Subject: [R] Computing the rank of a matrix.




Hi! Maybe this is a silly question, but I need the column rank
(http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function
'rank()' only gives me the ordering of the elements of my matrix.
How can I compute the column rank of a matrix? Is there not an R
equivalent to Matlab's 'rank()'?
I've been browsing for a time now and I can't find anything, so any
help will be greatly appreciated. Best regards!


--  --
Jose Luis Aznarte M.   http://decsai.ugr.es/~jlaznarte
Department of Computer Science and Artificial Intelligence
Universidad de Granada   Tel. +34 - 958 - 24 04 67
GRANADA (Spain)  Fax: +34 - 958 - 24 00 79


Re: [R] read.spss (package foreign) and SPSS 15.0 files

2007-04-06 Thread Thomas Lumley

On Thu, 5 Apr 2007, John Kane wrote:
> Heck. I'd be happy to get an answer to what is
> happening here:
>> mac <- spss.get("H:/ONTH/Raw.data/Follow.sav")
> Warning message:
> H:/ONTH/Raw.data/Follow.sav: Unrecognized record type
> 7, subtype 16 encountered in system file
>

It means that your file had a record of type 7, subtype 16 in it, and 
read.spss doesn't know how to handle these.

You would have to ask SPSS what record type 7 and subtype 16 represent -- 
their software put them there, and it's their terminology.

People's experience with unrecognised record types is that they usually 
don't matter, which would make sense from a backwards-compatibility point 
of view, but in the absence of documentation or psychic powers it is hard 
to be sure.  Avoiding read.spss is a perfectly reasonable strategy, and is 
in fact what we have always recommended in the Data Import-Export manual.

AFAIK the only commercial statistical software vendor that does provide 
complete, public documentation of their file formats is Stata, and this 
is one reason why there are fewer complaints about read.dta and write.dta. 
It also probably helps that the code was written by someone who uses Stata 
-- there hasn't been much contribution of code or patches for the 
foreign package from SPSS users.

-thomas


Re: [R] read.spss (package foreign) and SPSS 15.0 files

2007-04-06 Thread John Kane

--- Thomas Lumley <[EMAIL PROTECTED]> wrote:

> On Thu, 5 Apr 2007, John Kane wrote:
> > Heck. I'd be happy to get an answer to what is
> > happening here:
> >> mac <- spss.get("H:/ONTH/Raw.data/Follow.sav")
> > Warning message:
> > H:/ONTH/Raw.data/Follow.sav: Unrecognized record
> type
> > 7, subtype 16 encountered in system file
> >
> 
> It means that your file had a record of type 7,
> subtype 16 in it, and 
> read.spss doesn't know how to handle these.
> 
> You would have to ask SPSS what record type 7 and
> subtype 16 represent -- 
> their software put them there, and it's their
> terminology.
> 
> People's experience with unrecognised record types
> is that they usually 
> don't matter, which would make sense from a
> backwards-compatibility point 
> of view, but in the absence of documentation or
> psychic powers it is hard 
> to be sure.  

Yes, that actually was what I meant.  I have had no
problems with SPSS 12 but 14 seems a bit nasty. 

Sometime I may get a chance to build a couple of test
files in SPSS that I can check. 

>Avoiding read.spss is a perfectly
> reasonable strategy, and is 
> in fact what we have always recommended in the Data
> Import-Export manual.

I have simply moved to exporting the SPSS file to a
delimited file and loading it. Unfortunately I'm
losing all the labelling which can be time-consuming
to redo. Some of the data has something like 10
categories for a variable.

> 
> AFAIK the only commercial statistical software
> vendor that does provide 
> complete, public documentation of their file formats
> is Stata, and this 
> is one reason why there are fewer complaints about
> read.dta and write.dta. 
> It also probably helps that the code was written by
> someone who uses Stata 
> -- there hasn't been much contribution of code or
> patches for the 
> foreign package from SPSS users.
> 
> 
>   -thomas
>


Re: [R] Reading a csv file row by row

2007-04-06 Thread Henrik Bengtsson

Hi.

On 4/6/07, Yuchen Luo <[EMAIL PROTECTED]> wrote:
> Hi, my friends.
> When a data file is large, loading the whole file into the memory all
> together is not feasible. A feasible way  is to read one row, process it,
> store the result, and read the next row.
>
>
> In Fortran, by default, the 'read' command reads one line of a file, which
> is convenient, and when the same 'read' command is executed the next time,
> the next row of the same file will be read.
>
> I tried to replicate such row-by-row reading in R.I use scan( ) to do so
> with the "skip= xxx " option. It takes only seconds when the number of the
> rows is within 1000. However, it takes hours to read 1 rows. I think it
> is because every time R reads, it needs to start from the first row of the
> file and count xxx rows to find the row it needs to read. Therefore, it
> takes more time for R to locate the row it needs to read.

Yes, to skip rows scan() needs to locate every single row (line
feed/carriage return).  The only gain you get is that it does not have
to parse and store the contents of those skipped lines.

One solution is to first go through the file and register the file
position of the first character in every line, and then make use of
this in subsequent reads.  In order to do this, you have to work with
an opened connection and pass that to scan instead.  Rough sketch:

con <- file(pathname, open="r")

# Scan file for first position of every line
# (scanForRowStarts() is a placeholder that still has to be written)
rowStarts <- scanForRowStarts(con)

# Skip to a certain row (here: row k) and read a set of lines:
seek(con, where=rowStarts[k], origin="start", rw="r")
data <- scan(con, ..., skip=0, nlines=rowsPerChunk)

close(con)

That's the idea.  The tricky part is to get scanForRowStarts()
correct.  After reading a line you can always query the connection for
the current file position using:

  pos <- seek(con, rw="r")

so you could always iterate between readLines(con, n=1) and pos <-
c(pos, seek(con, rw="r")), but there might be a faster way.
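
One possible way to fill in the placeholder is sketched below. It is not necessarily the fastest; it is only an illustration under stated assumptions: the file uses plain LF line endings, the connections are opened in binary mode, and the offsets are accumulated from line lengths rather than queried with seek(). This version of scanForRowStarts() takes a path rather than a connection, and the example file is made up. ?seek discourages its use on Windows, so treat this as a sketch for Unix-alikes.

```r
# Made-up example file, written in binary mode so line ends are plain "\n".
path <- tempfile(fileext = ".txt")
out <- file(path, open = "wb")
writeLines(sprintf("row %d", 1:1000), out, sep = "\n")
close(out)

# Pass 1: byte offset of the first character of every line.
scanForRowStarts <- function(path, chunk = 100) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  starts <- 0
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0) break
    # +1 for the "\n" that readLines() strips (assumes LF line endings)
    starts <- c(starts,
                starts[length(starts)] + cumsum(nchar(lines, type = "bytes") + 1))
  }
  starts[-length(starts)]   # the last entry is the end of the file, not a row
}

rowStarts <- scanForRowStarts(path)

# Pass 2: jump straight to row 501 and read two lines.
con <- file(path, open = "rb")
invisible(seek(con, where = rowStarts[501], origin = "start", rw = "read"))
rows <- readLines(con, n = 2)
close(con)
rows
```

The index costs one full pass over the file, but after that any row can be reached without re-reading everything before it.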

Cheers

/Henrik

>
> Is there a solution to this problem?
>
> Your help will be highly appreciated!
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] Computing the rank of a matrix.

2007-04-06 Thread apjaworski

How about

  qr(A)$rank

or perhaps

  qr(A, LAPACK=TRUE)$rank
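
A caveat on the second form: as far as I can tell from ?qr, the rank component is "always full rank in the LAPACK case", so the default (LINPACK) call is the one that actually estimates the rank. If something closer to Matlab's SVD-based rank() is wanted, a sketch along the following lines may do. The function name `matrix_rank` and the default tolerance are my own choices, loosely modelled on the usual "largest dimension times machine epsilon times largest singular value" rule, not taken from any package.

```r
# Hypothetical helper: count the singular values above a tolerance.
matrix_rank <- function(A, tol = NULL) {
  d <- svd(A, nu = 0, nv = 0)$d     # singular values, largest first
  if (is.null(tol)) tol <- max(dim(A)) * d[1] * .Machine$double.eps
  sum(d > tol)
}

# Third column is the sum of the first two, so the rank is 2.
A <- matrix(c(1, 2, 3,
              2, 4, 6,
              1, 0, 1), nrow = 3, byrow = TRUE)
svd_rank <- matrix_rank(A)
qr_rank  <- qr(A)$rank
c(svd_rank, qr_rank)
```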

Cheers,

Andy

__
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-
E-mail: [EMAIL PROTECTED]
Tel:  (651) 733-6092
Fax:  (651) 736-3122


From: "José Luis Aznarte M." <[EMAIL PROTECTED].ugr.es>
Sent by: [EMAIL PROTECTED]at.math.ethz.ch
To: r-help@stat.math.ethz.ch
Date: 04/06/2007 06:39 AM
Subject: [R] Computing the rank of a matrix.




Hi! Maybe this is a silly question, but I need the column rank
(http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function
'rank()' only gives me the ordering of the elements of my matrix.
How can I compute the column rank of a matrix? Is there not an R
equivalent to Matlab's 'rank()'?
I've been browsing for a time now and I can't find anything, so any
help will be greatly appreciated. Best regards!


--  --
Jose Luis Aznarte M.   http://decsai.ugr.es/~jlaznarte
Department of Computer Science and Artificial Intelligence
Universidad de Granada   Tel. +34 - 958 - 24 04 67
GRANADA (Spain)  Fax: +34 - 958 - 24 00 79


Re: [R] Labelling boxplot with fivenumber summary

2007-04-06 Thread Chuck Cleland

Daniel Siddle wrote:
> I am very new to R so forgive me if this seems basic but I have done 
> extensive searching and failed to come up with the answer for myself.
> 
> I am trying to label a boxplot I have created with the values for the median, 
> upper and lower quartiles and max and min values.  I have been unable to do 
> this or find anything on the net to say how it might be done.  Is this 
> possible and if so how?  Regards,

  This message from back in 2002 gives a function called bp.example(),
which shows how a boxplot might be annotated:

http://tolstoy.newcastle.edu.au/R/help/02a/1515.html

  You could easily modify it into a stripped down version that does what
you want.

> Daniel Siddle
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: [R] Reasons to Use R

2007-04-06 Thread Roland Rau

Hi Lorenzo,

On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
>
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R

A starting point might be to look at the R-project homepage and look at the
members and donors list. This is, of course, not a comprehensive list; but
at least it can give an overview in which diverse backgrounds people are
using R --- even if it is only the tip of the iceberg.

> (2)Hardware requirements, possibly benchmarks

Maybe you should also mention that you can run R just from a USB stick if you
want (see R for Windows FAQ 2.6).

> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.

Have a look at the 'R Installation and Administration' manual; it gives a
nice overview of how many platforms R is running on.

Best,
Roland


[R] Labelling boxplot with fivenumber summary

2007-04-06 Thread Daniel Siddle


I am very new to R so forgive me if this seems basic but I have done extensive 
searching and failed to come up with the answer for myself.

I am trying to label a boxplot I have created with the values for the median, 
upper and lower quartiles and max and min values.  I have been unable to do 
this or find anything on the net to say how it might be done.  Is this possible 
and if so how?  Regards,

Daniel Siddle


Re: [R] read.spss (package foreign) and SPSS 15.0 files

2007-04-06 Thread John Kane


--- Prof Brian Ripley <[EMAIL PROTECTED]> wrote:

> On Thu, 5 Apr 2007, Michael Conklin wrote:
> 
> > Not being the developer I cannot answer
> definitively but, as a frequent 
> > user of SPSS files I can give you my experience.
> >
> > 1) The unrecognized coding is perhaps due to the
> locale of the SPSS 
> > installation. I have had success reading in files
> from version 15 but 
> > often encounter that error when the file was
> created with data that 
> > included some foreign language. I often receive
> survey files that were 
> > administered in a non-English language and that is
> when I usually see 
> > the error.
> 
> That is what is surmised in this recent R-devel
> thread:
> 
>
https://stat.ethz.ch/pipermail/r-devel/2007-April/045238.html
> 
> although it may also happen in an English locale
> (since after all Windows 
> uses codepage 1252, not ASCII, for American
> 'English').
> 
> The next release of package foreign will give a
> warning (rather than an 
> error) with an unrecognized encoding and recognize a
> few more.
> 
> > 2) My experience with the "Warning - unrecognized
> record type" message 
> > is that it has no effect whatsoever on the data
> file.
> >
> > 3) Others on the list have noted that you are
> safer exporting POR files 
> > instead of SAV files from SPSS. Both are read by
> the read.spss function.
> 
> The R Data Import/Export manual recommends an open
> format such as .csv.
> (Look like John Kane has yet to read it )  

Well, as I mentioned, I've been using a tab delimited
approach. I suppose I could move to .csv...  
>R
> does have quite extensive 
> facilities for dealing with encodings in text files.
> 
> >
> > Hope that helps.
> >
> >
> >
> > Michael Conklin
> > Chief Methodologist - Advanced Analytics
> > MarketTools, Inc.
> >
> >
> > -Original Message-
> > From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf
> Of John Kane
> > Sent: Thursday, April 05, 2007 7:45 PM
> > To: RINNER Heinrich; r-help@stat.math.ethz.ch
> > Subject: Re: [R] read.spss (package foreign) and
> SPSS 15.0 files
> >
> >
> > --- RINNER Heinrich <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Hello,
> >>
> >> does anyone have experience with reading SPSS
> >> Version 15.0 files into R (version 2.4.1, WinXP)?
> >>
> >> I have long been successfully reading SPSS files
> with
> >> read.spss from the wonderful foreign package, but
> >> somehow after upgrading from SPSS14 to SPSS15 I
> seem
> >> to have problems.
> >>
> >> Trying a simple example, where test.sav is a SPSS
> >> 15.0 data file consisting of x1=c(1,2,3) and
> >> x2=c("a","b","c"), I get this:
> >>> read.spss(file = "C:\\temp\\test.sav")
> >> Error in read.spss(file = "C:\\temp\\test.sav") :
> >> error reading system-file header
> >> In addition: Warning message:
> >> C:\temp\test.sav: File-indicated character
> >> representation code (Unknown) is not ASCII
> >>
> >> version infos:
> >> R version 2.4.1 (under WinXP)
> >> foreign version 0.8-18
> >>
> >> Has anyone experienced the same, and can give a
> >> solution here (possibly other than "downgrade to
> >> SPSS14.0" ;-))?
> >>
> >> Regards,
> >> Heinrich.
> >
> > Heck. I'd be happy to get an answer to what is
> > happening here:
> >> mac <- spss.get("H:/ONTH/Raw.data/Follow.sav")
> > Warning message:
> > H:/ONTH/Raw.data/Follow.sav: Unrecognized record
> type
> > 7, subtype 16 encountered in system file
> >
> > I have taken to exporting the file to a delimited
> > format and reading it into R since I cannot trust
> the
> > R import.
> 
> 
> -- 
> Brian D. Ripley, 
> [EMAIL PROTECTED]
> Professor of Applied Statistics, 
> http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865
> 272861 (self)
> 1 South Parks Road, +44 1865
> 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865
272595


Re: [R] Likelihood returning inf values to optim(L-BFGS-B) other

2007-04-06 Thread tfjbl

Hello,
A couple of ideas...


I'm not clear on your whole problem, however...

Consider making use of the lgamma function, which returns the natural 
log of the gamma function. This may help.
The gamma function gets awfully big very fast.
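
A small illustration of the point, with arbitrary values a = 300 and b = 250 chosen only to force an overflow: the ratio of gamma functions breaks down, while the same quantity on the log scale is well behaved.

```r
a <- 300
b <- 250

# Direct evaluation: every gamma() term overflows to Inf, and Inf / Inf is NaN.
naive <- suppressWarnings(gamma(a) * gamma(b) / gamma(a + b))

# Same quantity on the log scale: a sum of moderate numbers.
log_beta <- lgamma(a) + lgamma(b) - lgamma(a + b)

c(naive = naive, log_scale = log_beta, builtin = lbeta(a, b))
```

Summing such log terms inside the log-likelihood, and never exponentiating until the very end (if at all), is usually enough to keep the value finite.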

Also multivariable likelihoods can be bumpy like a mountain range, with 
minor peaks and valleys. It is possible that your likelihood has such a 
shape. Maybe each iteration Xn is trying to get closer to the main 
peak, but instead goes up the ridge of a valley and gets lost, 
ultimately reaching a boundary of the region.

You could try starting at a variety of locations. Possibly many 
hundreds of starting points, randomly selected from within your region.
Then examine the ending point for each starting point. 
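
As a sketch of the multi-start idea, here is a toy objective with two valleys standing in for a negative log-likelihood. The function `negll`, the bounds, and the number of starts are all made up for illustration; the real likelihood would take their place.

```r
set.seed(1)

# Toy surface: a deeper valley near p[1] = -1 and a shallower one near p[1] = +1.
negll <- function(p) (p[1]^2 - 1)^2 + 0.3 * p[1] + p[2]^2

lower <- c(-2, -2)
upper <- c(2, 2)
n_starts <- 50
starts <- cbind(runif(n_starts, lower[1], upper[1]),
                runif(n_starts, lower[2], upper[2]))

# Run the optimizer from every starting point.
fits <- lapply(seq_len(n_starts), function(i)
  optim(starts[i, ], negll, method = "L-BFGS-B", lower = lower, upper = upper))

# Examine where each run ended up, then keep the best one.
values <- sapply(fits, function(f) f$value)
table(round(values, 2))
best <- fits[[which.min(values)]]
best$par
```

The table of end values shows how many runs were trapped in the shallower valley, which gives a rough feel for how bumpy the surface is.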

If you do have a bumpy likelihood surface you might have to start very 
close to the actual maximum to get there. Surface plots might help, 
setting some variables to a constant. I know in 4D this will be tough.

Here is a useful reference that helped me recently with a similar 
maximization problem:

"Computational Statistics"
by Geof H. Givens and Jennifer A. Hoeting

They have R-code examples here:
http://www.stat.colostate.edu/computationalstatistics/



Good luck!

Joe Liddle
University of Alaska Southeast


[R] dbinom and Catherine Loader

2007-04-06 Thread Ted Harding

Hi Folks,
There has been past correspondence regarding Catherine Loader's
Bell Labs (oops, Lucent) paper

  "Fast and Accurate Computation of Binomial Probabilities"

which gives the algorithm on which R's dbinom() is based.

The original URL given in the R documentation "?dbinom" is:

http://cm.bell-labs.com/cm/ms/departments/sia/catherine/dbinom

but this link is dead. Likewise, Marc Schwarz (in reply to
Aries Arditi on Thu Dec 11 2003) gives

  http://kiefer.stat.cwru.edu/~catherine/pubs.html

  "There is a link to the paper (as a Postscript file)
   at the bottom of that page, however the link appears
   to be dead."

I've just discovered that Catherine Loader seems to have cunningly
encoded herself as "c at herine.net". So now we can find a URL
for her dbinom:

  http://www.herine.net/stat/software/dbinom.html

which points to a PDF of the above paper at

  http://www.herine.net/stat/papers/dbinom.pdf

(which, today at least, works). More generally, see

  http://www.herine.net/stat/index.html
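
For anyone wondering why a dedicated algorithm is needed at all, a quick sketch (with arbitrary values n = 2000, x = 1000, p = 0.5) shows the textbook formula overflowing where dbinom() does not. The log-scale line is my own workaround for comparison, not the method of the paper.

```r
n <- 2000
x <- 1000
p <- 0.5

# Textbook formula: choose(n, x) overflows, so the product is not finite.
naive <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same formula evaluated on the log scale.
on_log_scale <- exp(lchoose(n, x) + x * log(p) + (n - x) * log(1 - p))

builtin <- dbinom(x, n, p)
c(naive = naive, log_scale = on_log_scale, dbinom = builtin)
```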

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Apr-07   Time: 14:56:29
-- XFMail --


Re: [R] Reasons to Use R

2007-04-06 Thread bogdan romocea

> (1)Institutions (not only academia) using R

http://www.r-project.org/useR-2006/participants.html

> (2)Hardware requirements, possibly benchmarks

Since you mention huge data sets, GNU/Linux running on 64-bit machines
with as much RAM as your budget allows.

> (3)R & clusters, R & multiple CPU machines,
> R performance on different hardware.

OpenMosix, Quantian for clusters; the archive for multiple CPUs (this
was asked quite a few times). It may be best to measure R performance
on different hardware by yourself, using your own data and code.

> (4)finally, a list of the advantages for using R over
> commercial statistical packages.

I'd say it's not R vs. commercial packages, but S vs. the rest of the
world. Check http://www.insightful.com/ , much of what they say is
applicable to R. Make the case that S is vastly superior directly, not
just through a list of reasons: take a few data sets and show how they
can be analyzed with S compared to other choices. Both R and S-Plus
are likely to significantly outperform most other software, depending
on the kind of work that needs to be done.


> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
> Sent: Thursday, April 05, 2007 11:02 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Reasons to Use R
>
> Dear All,
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid dynamics, but
> there is also quite a lot of interest devoted to data post-processing
> and management of huge data sets.
> A lot of people are interested in image processing/pattern recognition
> and statistic applied to geography/ecology, but I would like not to
> post this on too many lists.
> The final aim of the workshop is  understanding hardware requirements
> and drafting a list of the equipment we would like to buy. I think
> this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list question,
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R
> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on
> different hardware.
> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
>
> Kind Regards
>
> Lorenzo Isella
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] Reading a csv file row by row

2007-04-06 Thread ronggui

And _file()_ is helpful in such situation.
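
A minimal sketch of how file() and readLines() combine: once the connection is open, each readLines() call continues where the previous one stopped, so nothing is re-read. The example file, the chunk size of 4, and the running total are made up for illustration.

```r
# Small stand-in for a large csv file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", paste(1:10, (1:10)^2, sep = ",")), path)

con <- file(path, open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]

total <- 0
repeat {
  lines <- readLines(con, n = 4)   # next chunk; the connection keeps its position
  if (length(lines) == 0) break    # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  total <- total + sum(chunk$value)   # process the chunk, keep only the result
}
close(con)
total
```

Reading a few thousand rows per chunk rather than one row at a time is usually much faster, since the per-call overhead is paid less often.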

R/S-PLUS Fundamentals and Programming Techniques by Thomas Lumley has
something relevant on page 185 (of 208 pages).

I believe you can find it by googling.



On 4/6/07, Martin Becker <[EMAIL PROTECTED]> wrote:
> readLines (which is mentioned in the "See also" section of ?scan with
> the hint "to read a file a line at a time") should work.
>
> Regards,
>   Martin
>
> Yuchen Luo wrote:
> > Hi, my friends.
> > When a data file is large, loading the whole file into the memory all
> > together is not feasible. A feasible way  is to read one row, process it,
> > store the result, and read the next row.
> >
> >
> > In Fortran, by default, the 'read' command reads one line of a file, which
> > is convenient, and when the same 'read' command is executed the next time,
> > the next row of the same file will be read.
> >
> > I tried to replicate such row-by-row reading in R.I use scan( ) to do so
> > with the "skip= xxx " option. It takes only seconds when the number of the
> > rows is within 1000. However, it takes hours to read 1 rows. I think it
> > is because every time R reads, it needs to start from the first row of the
> > file and count xxx rows to find the row it needs to read. Therefore, it
> > takes more time for R to locate the row it needs to read.
> >
> > Is there a solution to this problem?
> >
> > Your help will be highly appreciated!
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Ronggui Huang
Department of Sociology
Fudan University, Shanghai, China


Re: [R] plotting multilevel / lme lines

2007-04-06 Thread Chuck Cleland

Rense Nieuwenhuis wrote:
> Dear expeRts,
> 
> I am trying to plot a lme-object (package nlme) in such a way, that  
> on a selected level the x-axis represents the value on a selected  
> predictor and the y-axis represents the predicted-outcome variable.  
> The graphs would than consist of several lines that each represent  
> one group. I can't find such a plotting function.
> 
> I could write such a function myself, based on ranef() and fixef(),  
> but it would be a waste of time if such a function would already exist.
> 
> Does any of you know of such a function?

  I don't know of a single function with an lme object as argument, but
for what I think you have in mind, here is how you might go about it:

library(nlme)

fm2 <- lme(distance ~ poly(age, 2) * Sex,
  data = Orthodont, random = ~ 1)

newdat <- expand.grid(age = 8:14, Sex = c("Male","Female"))

newdat$PREDDIST <- predict(fm2, newdat, level = 0)

library(lattice)

xyplot(PREDDIST ~ age, groups=Sex, ylab="Model Predicted Distance",
   data = newdat, xlab="Age",
   panel = function(x, y, ...){
   panel.grid(h=6,v=6)
   panel.superpose(x, y, type="l", ...)},
   main="Orthodont Growth Model",
   key = simpleKey(levels(newdat$Sex),
   lines=TRUE, points=FALSE)
   )
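
If one line per group (here: per Subject) is wanted rather than one per Sex, predictions at level = 1 may be closer to the request. The following is a sketch under my own choice of model (random intercept and slope per Subject); it also rebuilds the same predictions from coef(), which combines the fixed and random effects, to show what a hand-written function based on fixef() and ranef() would compute.

```r
library(nlme)
library(lattice)

# Random intercept and slope for each Subject (the grouping level of interest).
fm <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

d <- as.data.frame(Orthodont)
d$fit <- as.numeric(predict(fm, level = 1))   # level = 1: subject-specific lines

# The same numbers built by hand from fixed plus random effects.
cf <- coef(fm)                                # per-subject intercept and slope
by_hand <- cf[as.character(d$Subject), "(Intercept)"] +
           cf[as.character(d$Subject), "age"] * d$age

# One line per Subject, in separate panels for each Sex.
xyplot(fit ~ age | Sex, groups = Subject, data = d, type = "l",
       xlab = "Age", ylab = "Model Predicted Distance")
```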

> Regards,
> 
> Rense Nieuwenhuis
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code. 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: [R] Computing the rank of a matrix.

2007-04-06 Thread Charilaos Skiadas

On Apr 6, 2007, at 7:39 AM, José Luis Aznarte M. wrote:

> Hi! Maybe this is a silly question, but I need the column rank
> (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function
> 'rank()' only gives me the ordering of the elements of my matrix.
> How can I compute the column rank of a matrix? Is there not an R
> equivalent to Matlab's 'rank()'?
> I've been browsing for a time now and I can't find anything, so  
> any
> help will be greatly appreciated. Best regards!
>
Surprisingly, a Google search for "r matrix rank" actually returns an  
R link:

http://tolstoy.newcastle.edu.au/R/help/05/05/4000.html

I suppose the point is that in R you usually need a bit more than  
just the rank, so instead you want an object that contains all that  
info and more. Like we have the various lm objects, so to speak. They  
do the hard work once, and then we can ask them more particular  
questions.

?qr
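
To make the "do the hard work once" point concrete, here is a sketch with simulated data (the matrix X, the response y, and the coefficients are made up): a single qr object answers the rank question and several others.

```r
set.seed(42)
X <- cbind(1, rnorm(20), rnorm(20))                 # made-up design matrix
y <- drop(X %*% c(2, -1, 0.5)) + rnorm(20, sd = 0.1)

decomp <- qr(X)                  # the decomposition is computed once

decomp$rank                      # numerical column rank
beta <- qr.coef(decomp, y)       # least-squares coefficients
res  <- qr.resid(decomp, y)      # residuals from the same decomposition
R    <- qr.R(decomp)             # upper-triangular factor
beta
```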

> --  --
> Jose Luis Aznarte M.   http://decsai.ugr.es/~jlaznarte
> Department of Computer Science and Artificial Intelligence
> Universidad de Granada   Tel. +34 - 958 - 24 04 67
> GRANADA (Spain)  Fax: +34 - 958 - 24 00 79

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College


Re: [R] Computing the rank of a matrix.

2007-04-06 Thread Paul Smith

On 4/6/07, "José Luis Aznarte M." <[EMAIL PROTECTED]> wrote:
> Hi! Maybe this is a silly question, but I need the column rank
> (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function
> 'rank()' only gives me the ordering of the elements of my matrix.
> How can I compute the column rank of a matrix? Is there not an R
> equivalent to Matlab's 'rank()'?
> I've been browsing for a time now and I can't find anything, so any
> help will be greatly appreciated. Best regards!

This discussion may help you:

http://marc.info/?l=r-help&m=111522337531442&w=2

Paul


[R] Computing the rank of a matrix.

2007-04-06 Thread José Luis Aznarte M.

Hi! Maybe this is a silly question, but I need the column rank
(http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function
'rank()' only gives me the ordering of the elements of my matrix.
How can I compute the column rank of a matrix? Is there not an R
equivalent to Matlab's 'rank()'?
I've been browsing for a time now and I can't find anything, so any
help will be greatly appreciated. Best regards!

 
--  --
Jose Luis Aznarte M.   http://decsai.ugr.es/~jlaznarte
Department of Computer Science and Artificial Intelligence
Universidad de Granada   Tel. +34 - 958 - 24 04 67
GRANADA (Spain)  Fax: +34 - 958 - 24 00 79


Re: [R] How to set the scale of axis?

2007-04-06 Thread Gabor Grothendieck

Try this:

plot(1:100, xaxt = "n")
axis(1, c(1, 20, 40, 60, 80, 100))

# and optionally add this third line
axis(1, 1:100, FALSE, tcl = -0.3)

See ?par (for the xaxt argument to plot) and ?axis .  ?plot and
?plot.default have info on the plot command.  A good source of
sample code for graphics is:

  http://addictedtor.free.fr/graphiques/

On 4/6/07, Shao <[EMAIL PROTECTED]> wrote:
> Hello, everyone.
>
> I want to know how to control the scale of the axes.
>
> For example, the range of the x axis is (1,100), and I want to show the
> scale on the axis like this:
> 1  20  40  60  80 100.
>
> Are there any parameters in plot() or other functions to set the scale?
>
> Thanks!
>
>[[alternative HTML version deleted]]
>


[R] plotting multilevel / lme lines

2007-04-06 Thread Rense Nieuwenhuis

Dear expeRts,

I am trying to plot an lme object (package nlme) in such a way that,
at a selected level, the x-axis represents the value of a selected
predictor and the y-axis represents the predicted outcome variable.
The graph would then consist of several lines that each represent
one group. I can't find such a plotting function.

I could write such a function myself, based on ranef() and fixef(),
but it would be a waste of time if such a function already exists.

Does any of you know of such a function?
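
For reference, the by-hand route via ranef() and fixef() is short,
because coef() on an lme fit already returns the fixed effects plus each
group's random effects. A rough sketch, assuming the Orthodont data that
ships with nlme and a random intercept and slope for age; both are
chosen only for illustration and are not taken from the post.

```r
library(nlme)

# Illustrative model: growth of 'distance' over 'age', varying by Subject.
fit <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

# One row per group: fixef(fit) + ranef(fit) for intercept and slope.
cf <- coef(fit)

# Empty plot frame, then one predicted line per group.
age_range <- range(Orthodont$age)
plot(Orthodont$age, Orthodont$distance, type = "n",
     xlab = "age", ylab = "predicted distance")
for (g in rownames(cf)) {
  lines(age_range, cf[g, "(Intercept)"] + cf[g, "age"] * age_range)
}
```

plot(augPred(fit)) in nlme gives a related lattice display with one
panel per group, which may already be enough.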

Regards,

Rense Nieuwenhuis


[R] How to set the scale of axis?

2007-04-06 Thread Shao

Hello, everyone.

I want to know how to control the scale of the axes.

For example, the range of the x axis is (1,100), and I want to show the
scale on the axis like this:
1  20  40  60  80 100.

Are there any parameters in plot() or other functions to set the scale?

Thanks!

[[alternative HTML version deleted]]


Re: [R] how to exclude some packages from help.search() ?

2007-04-06 Thread Prof Brian Ripley

On Fri, 6 Apr 2007, Vladimir Eremeev wrote:

>
> I have installed RGtk2 to satisfy other package requirements.
> I am not planning to use it in my own work.
>
> Occasionally I search through the R help using the help.search() function,
> and every time it returns me lots of references to the functions in the
> RGtk2 package, which I don't need.
> I would like to avoid them.
>
> At present, I have renamed the file hsearch.rds in the RGtk2 directory.
>
> This worked, however, help.search now gives a warning, that it didn't find
> that file.
>
> Is there any other way to avoid the extraneous information returned by
> help.search, one that is less crude than mine?

Use the package= argument to say which packages you want, or even install 
little-used packages in a different library and use lib.loc= (or only have 
that library in .libPaths() when you need it).
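
A small sketch of the first suggestion. The search string and the
package names are only examples; substitute the packages you actually
want searched.

```r
# Search only the named packages; anything else installed (RGtk2 included)
# is left out of the results.
hits <- help.search("linear model", package = c("stats", "utils"))

# Which packages the matches came from.
sort(unique(hits$matches[, "Package"]))
```

The lib.loc= route works the same way: pass the library directories to
search, leaving out the one that holds the little-used packages.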

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] Reading a large csv file row by row

2007-04-06 Thread Prof Brian Ripley

The solution is to read the 'R Data Import/Export Manual' and make use of 
connections or databases.

What you want to do is very easy in RODBC, for example, but can be done 
with scan() easily provided you keep a connection open.
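
A sketch of the scan() route, reading a block of rows at a time from a
connection that stays open, so each call continues where the previous
one stopped. The demo file, its two columns (id, x) and the chunk size
are assumptions made up for the example.

```r
# Write a small demo file so the sketch is self-contained (made-up data).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2500, x = (1:2500) / 10), path, row.names = FALSE)

con <- file(path, open = "r")          # open once, keep open
col_names <- scan(con, what = "", sep = ",", nlines = 1, quiet = TRUE)

chunk_size <- 1000
rows_seen <- 0
total <- 0
repeat {
  # Reads the next chunk_size lines; no skip= needed, no re-reading.
  chunk <- scan(con, what = list(id = 0L, x = 0), sep = ",",
                nlines = chunk_size, quiet = TRUE)
  n <- length(chunk$id)
  if (n == 0) break                    # end of file reached
  total <- total + sum(chunk$x)        # "process" the chunk
  rows_seen <- rows_seen + n
}
close(con)
```

Only one chunk is held in memory at a time, and the total cost is linear
in the file size rather than quadratic as with repeated skip=.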

On Fri, 6 Apr 2007, Yuchen Luo wrote:

> Hi, my friends.
>
> When a data file is large, loading the whole file into the memory all
> together is not feasible. A feasible way  is to read one row, process it,
> store the result, and read the next row.

It makes a lot more sense to process say 1000 rows at a time.

> In Fortran, by default, the 'read' command reads one line of a file, which
> is convenient, and when the same 'read' command is executed the next time,
> the next row of the same file will be read.
>
> I tried to replicate such row-by-row reading in R. I use scan() to do so
> with the "skip= xxx " option. It takes only seconds when the number of the
> rows is within 1000. However, it takes hours to read 1 rows. I think it
> is because every time R reads, it needs to start from the first row of the
> file and count xxx rows to find the row it needs to read. Therefore, it
> takes more time for R to locate the row it needs to read.

Yes, R does tend to do what you tell it to 

> Is there a solution to this problem?
>
> Your help will be highly appreciated!
> Best Wishes
> Yuchen Luo
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

PLEASE do as we ask.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[R] how to exclude some packages from help.search() ?

2007-04-06 Thread Vladimir Eremeev


I have installed RGtk2 to satisfy other package requirements.
I am not planning to use it in my own work.

Occasionally I search through the R help using the help.search() function,
and every time it returns me lots of references to the functions in the
RGtk2 package, which I don't need.
I would like to avoid them.

At present, I have renamed the file hsearch.rds in the RGtk2 directory.

This worked, however, help.search now gives a warning, that it didn't find
that file.

Is there any other way to avoid the extraneous information returned by
help.search, one that is less crude than mine?

Re: [R] Reading a csv file row by row

2007-04-06 Thread Martin Becker

readLines (which is mentioned in the "See also" section of ?scan with 
the hint "to read a file a line at a time") should work.
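
A minimal sketch of that, assuming a small made-up file with a header
line and two comma-separated fields. The connection is opened once, so
each readLines() call picks up at the next line.

```r
# Made-up demo file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20", "3,30"), path)

con <- file(path, open = "r")
header <- readLines(con, n = 1)        # consume the header line

total <- 0
# readLines() returns character(0) at end of file, which ends the loop.
while (length(line <- readLines(con, n = 1)) > 0) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  total <- total + as.numeric(fields[2])
}
close(con)
total
```

Raising n to a few hundred or thousand lines per call usually cuts the
per-call overhead considerably.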

Regards,
  Martin

Yuchen Luo wrote:
> Hi, my friends.
> When a data file is large, loading the whole file into the memory all
> together is not feasible. A feasible way  is to read one row, process it,
> store the result, and read the next row.
>
>
> In Fortran, by default, the 'read' command reads one line of a file, which
> is convenient, and when the same 'read' command is executed the next time,
> the next row of the same file will be read.
>
> I tried to replicate such row-by-row reading in R. I use scan() to do so
> with the "skip= xxx " option. It takes only seconds when the number of the
> rows is within 1000. However, it takes hours to read 1 rows. I think it
> is because every time R reads, it needs to start from the first row of the
> file and count xxx rows to find the row it needs to read. Therefore, it
> takes more time for R to locate the row it needs to read.
>
> Is there a solution to this problem?
>
> Your help will be highly appreciated!
>
>   [[alternative HTML version deleted]]
>


Re: [R] Reasons to Use R

2007-04-06 Thread Stephen Tucker

Hi Lorenzo,

I don't think I'm qualified to provide solid information on the first
three questions, but I'd like to drop a few thoughts on (4). While
there is no shortage of language advocates out there, I'd like to
join in just this once. My background is in chemical engineering and
atmospheric science; I've done simulation on a smaller scale but spend
much of my time analyzing large sets of experimental data. I am
comfortable programming in Matlab, R, Python, C, Fortran, Igor Pro,
and I also know a little IDL but have not programmed in it
extensively.

As you are probably aware, among these I would count Matlab, R,
Python, and IDL as good candidates for processing large data sets, as
they are high-level languages and can communicate with netCDF files
(which I imagine will be used to transfer data).

Each language boasts an impressive array of libraries, but what I
think gives R the advantage for analyzing data is the level of
abstraction in the language. I am extremely impressed with the objects
available to represent data sets, and the functions support them very
well - I need to carry around fewer objects to
hold information about my data (and I don't have to "unpack" them to
feed them into functions). The language is also very "expressive" in
that it lets you write a procedure in many different ways, some
shorter, some more readable, depending on what your situation
requires. System commands and text processing are integrated into the
language, and the input/output facilities are excellent, in terms of
data and graphics. Once I have my data object I am only a few
keystrokes away from splitting, sorting, and visualizing multivariate
data; even after
several years I keep discovering new functions for basic things like
manipulation of data objects and descriptive statistics, and plotting
- truly, an analyst's needs have been well anticipated.

And this is a recent obsession of mine, which I was introduced to
through Python, but the functional programming support for R is
amazing. By using higher-order functions like lapply(), I infrequently
rely on FOR-LOOPS, which have often caused me trouble in the past
because I had forgotten to re-initialize a variable, or incremented
the wrong variable, etc. Though I'm definitely not militant about
functional programming, in general I try to write functions and then
apply them to the data (if the functions don't exist in R already),
often through higher-order functions such as lapply(). This approach
keeps most variables out of the global namespace and so I am less
likely to reassign a value to a variable that I had intended to
keep. It also makes my code more modular so that I can re-use bits of
my code as my analysis inevitably grows much larger than I had
originally intended.
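
To make that concrete, a toy sketch with made-up data and function
names: the work is written once as a function and applied to every
element with lapply(), so there is no loop counter or accumulator to
forget to reset.

```r
# Made-up data: three samples of different lengths.
samples <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# The analysis step as a function; its variables stay local.
summarise_one <- function(x) c(n = length(x), mean = mean(x))

# Apply it to each element, then stack the results into one table.
results <- lapply(samples, summarise_one)
tab <- do.call(rbind, results)
tab
```

Because summarise_one() does not touch the global workspace, it can be
re-used unchanged when the list of samples grows.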

Furthermore, my code in R ends up being much, much shorter than code I
imagine writing in other languages to accomplish the same task; I
believe this leads to fewer places for errors to occur, and the nature
of the code is immediately comprehensible (though a series of nested
functions can get pretty hard to read at times), not to mention it
takes less effort to write. This also makes it easier to interact with
the data, I think, because after making a plot I can set up for the
next plot with only a few function calls instead of setting out to
write a block of code with loops, etc.

I have actually recommended R to colleagues who needed to analyze the
information from large-scale air quality/ global climate simulations,
and they are extremely pleased. I think the capability for statistics
and graphics is well-established enough that I don't need to do a
hard-sell on that so much, but R's language is something I get very
excited about. I do appreciate all the contributors who have made this
available.

Best regards,
ST

--- Lorenzo Isella <[EMAIL PROTECTED]> wrote:

> Dear All,
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid dynamics, but
> there is also quite a lot of interest devoted to data post-processing
> and management of huge data sets.
> A lot of people are interested in image processing/pattern recognition
> and statistic applied to geography/ecology, but I would like not to
> post this on too many lists.
> The final aim of the workshop is  understanding hardware requirements
> and drafting a list of the equipment we would like to buy. I think
> this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list question,
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R
> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.
> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some peop

Re: [R] Reasons to Use R

2007-04-06 Thread Wilfred Zegwaard

To my knowledge the core of R is considered "adequate" and "good" by
statisticians. That's sufficient, isn't it?
Last year I read some documentation about R, and most routines were
considered "good", but some "very bad". That is a benchmark of sorts.

There must be some benchmarks of the kind you want. R is widely used and
there must be people around who can provide you with adequate material.
CRAN is one way to find it, or the project page.

The core is free, by the way, and you can participate in the development.
People there can provide you with the information you want. R is quite
well documented (not everybody thinks it's well doc'ed, but... you
know... opinions do vary).

There is one simple reason to use R: it's free. If you
have the money, commercial software is sufficient. That doesn't mean that
R is the poor man's software. It works quite well actually (but you...
know... opinions vary, especially about statistical software). I think
that's the usual reason to use it: it works quite well, and its
documentation is widely available. A LOT of statistical procedures are
available. R crashed about 2 times last year on my computer, and that's
better than SPSS, and there are a lot of user interfaces available which
make working with R easier.
Personally I don't like SPSS, but I do know that the R core is used in
commercial applications. So at least one person has done some benchmarks.

Wilfred


Re: [R] Reading a csv file row by row

2007-04-06 Thread Yuchen Luo

Hi, my friends.
When a data file is large, loading the whole file into the memory all
together is not feasible. A feasible way  is to read one row, process it,
store the result, and read the next row.


In Fortran, by default, the 'read' command reads one line of a file, which
is convenient, and when the same 'read' command is executed the next time,
the next row of the same file will be read.

I tried to replicate such row-by-row reading in R. I use scan() to do so
with the "skip= xxx " option. It takes only seconds when the number of the
rows is within 1000. However, it takes hours to read 1 rows. I think it
is because every time R reads, it needs to start from the first row of the
file and count xxx rows to find the row it needs to read. Therefore, it
takes more time for R to locate the row it needs to read.

Is there a solution to this problem?

Your help will be highly appreciated!

[[alternative HTML version deleted]]


[R] Reading a large csv file row by row

2007-04-06 Thread Yuchen Luo

Hi, my friends.

When a data file is large, loading the whole file into the memory all
together is not feasible. A feasible way  is to read one row, process it,
store the result, and read the next row.

In Fortran, by default, the 'read' command reads one line of a file, which
is convenient, and when the same 'read' command is executed the next time,
the next row of the same file will be read.

I tried to replicate such row-by-row reading in R. I use scan() to do so
with the "skip= xxx " option. It takes only seconds when the number of the
rows is within 1000. However, it takes hours to read 1 rows. I think it
is because every time R reads, it needs to start from the first row of the
file and count xxx rows to find the row it needs to read. Therefore, it
takes more time for R to locate the row it needs to read.

Is there a solution to this problem?

Your help will be highly appreciated!
Best Wishes
 Yuchen Luo

[[alternative HTML version deleted]]


Re: [R] Generate a serie of new vars that correlate with existing var

2007-04-06 Thread Olivier ETERRADOSSI

Hello Greg (and List),
Thanks for your reply and reflections (and sorry for my "frenglish").
Of course you're right, and I agree "a posteriori" with all your views. 
Probably my suggestion was first of all a mark of appreciation for your 
solution ;-) .
Here is the path I followed to get where I was, but I see that I was 
probably misunderstanding what makes the "core" of R :
1) The question of making such related couples of vectors is nearly a 
FAQ, as you point out in your reply.
2) It appeared to me that it is often asked by newbies or users with 
relatively small statistical knowledge.
3) To get to your solution, a good understanding is needed of what
correlation is, as well as of matrix properties and operators. My guess
was that the people listed above generally do not have it.
4) I believed from my own experience that the core of R was dedicated 
either to basics or to rather complicated algorithms to handle or 
produce results appearing as "simple" or "classical".
5) From that same experience, I could not imagine to which
non-core package such a function should "obviously" be added. I imagined
that, in the same manner, a person looking for the function could have
some problems locating it. Until now I have not looked at your
TeachingDemos package (I'll do it), but I know of other categories of
searchers, often not statisticians, who need to generate such
data and would not think of looking there to find a way.
To end with, all this mainly shows that I did not understand R 
philosophy as well as I thought !
Thanks, and regards. Olivier
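
For readers of the archive, a sketch of one common by-hand construction
for a new variable with a chosen sample correlation to an existing one.
This is an assumption about the kind of steps discussed, not necessarily
the solution posted earlier in the thread; x, rho and the seed are made
up.

```r
set.seed(123)
x <- rnorm(50)                         # the existing variable
rho <- 0.6                             # target sample correlation

# Random noise with the part explained by x removed, so the residuals
# have mean zero and are exactly uncorrelated with x in this sample.
noise <- rnorm(length(x))
e <- resid(lm(noise ~ x))

# Mix standardized x and standardized residuals.
y <- rho * as.numeric(scale(x)) + sqrt(1 - rho^2) * e / sd(e)

cor(x, y)                              # equals rho up to floating-point error
```

Repeating the last three steps with fresh noise gives a series of new
variables, each with the same correlation to x.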

Greg Snow wrote:
> Oliver,
>
> I have thought of adding something like this to a package, but here is my 
> current thinking on the issue.
>
> This question (or similar) has been asked a few times, so there is some 
> demand for a general answer, I see three approaches:
>
> 1. Have an example of the necessary steps archived in a publicly available 
> place.
> 2. Write a function and include it in a non-core package.
> 3. Add it to the core of R or a core package.
>
> Number 1 is already in process as the e-mails will be part of the archive.  
> Though someone is welcome to add it to the Wiki if they think that would be 
> useful as well.
>
> Your suggestion is number 3, but I would argue that 2 is better than 3 for 
> the simple reason that anything added to the core is implied to be top 
> quality and have pretty much any options that most people would think of.  
> Putting it in a non-core package makes it available, with less implications 
> of quality.
>
> The question then becomes, what options do we make available?  Do we have 
> them specify the entire correlation structure? Or just assume the new 
> variables will be independent of each other?  What should the function do if 
> the set of correlations result in a matrix that is not positive definite?  
> What if the user wants to have 2 fixed variables?  And other questions.
>
> My current thinking is that the process is simple enough that it is easier to 
> do this by hand than to remember all the options to the function.  There are 
> currently people who use bootstrap and permutation tests without loading in 
> the packages that do these because it is quicker to write the code by hand 
> than to remember the syntax of the functions.  I think this type of data 
> generation falls under the same situation.  But if you, or someone else 
> thinks that there is enough justification for a function to do this, and can 
> specify what options it should have, I will be happy to add it to my 
> TeachingDemos package (this seems an appropriate place, since one of the 
> places that I want to generate data with a specific correlation structure is 
> when creating an example for students).
>
>
> Hope this helps,
>
>   

-- 
Olivier ETERRADOSSI
Maître-Assistant
CMGD / Equipe "Propriétés Psycho-Sensorielles des Matériaux"
Ecole des Mines d'Alès
Hélioparc, 2 av. P. Angot, F-64053 PAU CEDEX 9
tel std: +33 (0)5.59.30.54.25
tel direct: +33 (0)5.59.30.90.35 
fax: +33 (0)5.59.30.63.68
http://www.ema.fr


Re: [R] Reasons to Use R

2007-04-06 Thread Lorenzo Isella

John Kane wrote:
> --- Lorenzo Isella <[EMAIL PROTECTED]> wrote:
>
>   
>> (4)finally, a list of the advantages for using R
>> over commercial
>> statistical packages. The money-saving in itself is
>> not a reason good
>> enough and some people are scared by the lack of
>> professional support,
>> though this mailing list is simply wonderful.
>>
>> 
> Given that I can do as much if not more with R (in
> most cases) than with commercial software, as an
> independent consultant,  'cost' is a very significant
> factor. 
>
> A very major advantage of R is the money-saving.  Have
> a look at
> http://www.spss.com/stores/1/Software_Full_Version_C2.cfm
>
>  and convince me that cost ( for an independent
> contractor) is not a good reason. 
>
>
>   
Hello,
No doubt that for an independent contractor money is a significant 
issue, but we are talking about the case of a large organization for 
which spending a few thousand euros on software is routine.
To avoid misunderstandings: I am myself an R user and I have no 
intention to pay a cent for statistical software, but in order to speak 
up for R vs any commercial software for data analysis and 
postprocessing, I need technical details (benchmarks, etc...) rather 
than the fact that it helps saving money.
Kind Regards

Lorenzo


Re: [R] Logistic/Cox regression: Parameter estimates directly from model matrix

2007-04-06 Thread Göran Broström

On 4/6/07, Kaspar Rufibach <[EMAIL PROTECTED]> wrote:
> Hi out there
>
> Is there a way to get the estimated coefficients in a logistic / Cox
> regression without having to specify a 'formula' but by only giving the
> model matrix?

See 'coxreg.fit' in package 'eha'. Or 'glm.fit' for logistic regression.
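
A sketch of the logistic case with glm.fit, which takes the model matrix
directly and needs no formula. The simulated data and the wrapper name
est_from_model_matrix are made up for illustration; note that glm.fit
does not add an intercept column, so it is bound on by hand here.

```r
set.seed(42)
n <- 200
Z <- cbind(q1 = rnorm(n), q2 = rgamma(n, 2, 2))      # predictors
y <- rbinom(n, 1, plogis(drop(-0.5 + Z %*% c(1, -0.5))))  # 0/1 response

est_from_model_matrix <- function(y, Z) {
  X <- cbind(Intercept = 1, Z)         # glm.fit expects the full design matrix
  glm.fit(x = X, y = y, family = binomial())$coefficients
}

r <- est_from_model_matrix(y, Z)
r
```

The estimates agree with those from glm() with the corresponding
formula, since glm() calls glm.fit internally.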

hth,

Göran
>
> Example for Cox regression:
>
> ## predictors
> n <- 50
> q1 <- rnorm(n)
> q2 <- rgamma(n, 2, 2)
> Z <- cbind(q1, q2)
>
> ## response
> ttf <- rexp(n)
> tf <- round(runif(n))
>
> ## compute estimates
> res <- coxph(Surv(ttf, tf) ~ q1 + q2)
> r <- res$coef
>
> My goal is to have a function
>
> estFromModelMatrix <- function(tf, ttf, Z){
>
>   # do something meaningful using built-in functions
>
>   return(r)}
>
> I have written such functions myself using LL - maximization from
> scratch, but these are slower than the built-in functions. Since I
> intend to do some simulations (where I specify the model matrix, but not
> want to give a 'formula' manually for each simulation scenario), it
> would be nice to have a function estFromModelMatrix().
>
> I searched the help extensively, but did not find a way to do this.
>
> Hope I was clear enough, any help is appreciated!
> Kaspar Rufibach
>
>
> --
> __
> Kaspar Rufibach
> Department of Statistics -- Sequoia Hall
> 390 Serra Mall
> Stanford University
> Stanford, CA 94305-4065
>
> mailto:[EMAIL PROTECTED]
> skype:kasparrufibach
> http://www.stanford.edu/~kasparr
>


-- 
Göran Broström


Re: [R] creating a data frame from a list

2007-04-06 Thread Stephen Tucker

Hi Dimitri,

You can try this one if you'd like:

lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0))
# get unique names
nms <- unique(rapply(lst,function(x) names(x)))
# create a vector of NA's and then fill it according
# to matching names for each element of list
doit <- function(x,nms) {
  y <- rep(NA,length(nms)); names(y) <- nms
  y[match(names(x),names(y))] <- x
  return(y)
}
# apply it to the data
dtf <- as.data.frame(sapply(lst,doit,nms))
  


--- Dimitri Szerman <[EMAIL PROTECTED]> wrote:

> Dear all,
> 
> A few months ago, I asked for your help on the following problem:
> 
> I have a list with three (named) numeric vectors:
> 
> > lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> > lst
> $a
> A B
> 1 8
> 
> $b
> A B C
> 2 3 0
> 
> $c
> B D
> 2 0
> 
> Now, I'd love to use this list to create the following data frame:
> 
> > dtf = data.frame(a=c(A=1,B=8,C=NA,D=NA),
> +  b=c(A=2,B=3,C=0,D=NA),
> +  c=c(A=NA,B=2,C=NA,D=0) )
> 
> > dtf
>    a  b  c
> A  1  2 NA
> B  8  3  2
> C NA  0 NA
> D NA NA  0
> 
> That is, I wish to "merge" the three vectors in the list into a data frame
> by their "(row)"names.
> 
> And I got the following answer:
> 
> library(zoo)
> z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
> rownames(z) <- time(z)
> coredata(z)
> 
> However, it does not seem to be working. Here's what I get when I try it:
> 
> > lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> > library(zoo)
> > z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
> Error in if (freq > 1 && identical(all.equal(freq, round(freq)),
> TRUE)) freq <- round(freq) :
> missing value where TRUE/FALSE needed
> In addition: Warning message:
> NAs introduced by coercion
> 
> and z was not created.
> 
> Any ideas on what is going on here?
> Thank you,
> Dimitri
> 



 

