Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-06 Thread Prof Brian Ripley
[Just one point extracted: Hadley Wickham has answered the random sample 
one]


On Thu, 5 Jan 2006, François Pinard wrote:


[Brian Ripley]

One problem with Francois Pinard's suggestion (the credit has got lost)
is that R's I/O is not line-oriented but stream-oriented.  So selecting
lines is not particularly easy in R.


I understand that you mean random access to lines, rather than random
selection of lines.  Once again, this chat comes out of reading someone
else's problem; it is not a problem I actually have.  SPSS was not
randomly accessing lines, as data files could well be held on magnetic
tape, where random access is not possible in ordinary practice.  SPSS
reads (or was reading) lines sequentially from beginning to end, and the
_random_ sample is built as the reading goes.


That was not my point.  R's standard I/O is through connections, which 
allow for pushbacks, changing line endings and re-encoding character sets. 
That does add overhead compared to C/Fortran line-buffered reading of a 
file.  Skipping lines you do not need will take longer than you might 
guess (based on some limited experience).


--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Ordering boxplot factors

2006-01-06 Thread Prof Brian Ripley
On Thu, 5 Jan 2006, Marc Schwartz wrote:

 On Thu, 2006-01-05 at 20:27 -0600, Joseph LeBouton wrote:
 Hi all,

 what a great help list!  I hope someone can help me with this puzzle...

 I'm trying to find a simple way to do:

 boxplot(obs~factor)

 so that the factors are ordered left-to-right along the x-axis by
 median, not alphabetically by factor name.

The thing to realize is that they are not alphabetic, but ordered by 
factor levels.  So the key is to set the levels.  (The help page for 
boxplot does say that, as I was relieved to find.)

 Complicated ways abound, but I'm hoping for a magical one-liner that'll
 do the trick.

 Any suggestions would be treasured.

 Thanks,

 -jlb


 Using the first example in ?boxplot, which is:

 boxplot(count ~ spray, data = InsectSprays, col = "lightgray")



 Get the medians for 'count by spray' using tapply() and then sort the
 results in increasing order, by median:

  med <- sort(with(InsectSprays, tapply(count, spray, median)))

 med
    C    E    D    A    F    B
  1.5  3.0  5.0 14.0 15.0 16.5


 Now do the boxplot, setting the factor levels in order by median:

  boxplot(count ~ factor(spray, levels = names(med)),
          data = InsectSprays, col = "lightgray")


 So...technically two lines of code.

This was answered yesterday in terms of bwplot.  See ?reorder.factor
for the same example done using reorder.factor.  That will give you the 
single line asked for, and be self-explanatory.
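For readers finding this in the archive: a sketch of what that one-liner could look like, assuming the reorder() generic available in current R (the older help page calls it reorder.factor):

```r
## Order the spray factor by the median of count, then plot -- one line.
## reorder(x, X, FUN) reorders the levels of x by FUN applied to X
## within each level (FUN defaults to mean; we want the median here).
boxplot(count ~ reorder(spray, count, median),
        data = InsectSprays, col = "lightgray")
```

This reproduces Marc's two-step solution in a single expression, since reorder() does the tapply-and-sort internally.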

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-06 Thread Martin Maechler
 FrPi == François Pinard [EMAIL PROTECTED]
 on Thu, 5 Jan 2006 22:41:21 -0500 writes:

FrPi [Brian Ripley]
 I rather thought that using a DBMS was standard practice in the 
 R community for those using large datasets: it gets discussed rather 
 often.

FrPi Indeed.  (I tried RMySQL even before speaking of R to my co-workers.)

 Another possibility is to make use of the several DBMS interfaces already
 available for R.  It is very easy to pull in a sample from one of those,
 and surely keeping such large data files as ASCII is not good practice.

FrPi Selecting a sample is easy.  Yet, I'm not aware of any
FrPi SQL device for easily selecting a _random_ sample of
FrPi the records of a given table.  On the other hand, I'm
FrPi no SQL specialist, others might know better.

FrPi We do not have a need yet for samples where I work,
FrPi but if we ever need such, they will have to be random,
FrPi or else, I will always fear biases.

 One problem with Francois Pinard's suggestion (the credit has got lost) 
 is that R's I/O is not line-oriented but stream-oriented.  So selecting 
 lines is not particularly easy in R.

FrPi I understand that you mean random access to lines,
FrPi instead of random selection of lines.  Once again,
FrPi this chat comes out of reading someone else's problem,
FrPi this is not a problem I actually have.  SPSS was not
FrPi randomly accessing lines, as data files could well be
FrPi held on magnetic tape, where random access is not
FrPi possible in ordinary practice.  SPSS reads (or was
FrPi reading) lines sequentially from beginning to end, and
FrPi the _random_ sample is built while the reading goes.

FrPi Suppose the file (or tape) holds N records (N is not
FrPi known in advance), from which we want a sample of M
FrPi records at most.  If N <= M, then we use the whole
FrPi file, no sampling is possible nor necessary.
FrPi Otherwise, we first initialise M records with the
FrPi first M records of the file.  Then, for each record in
FrPi the file after the M'th, the algorithm has to decide
FrPi if the record just read will be discarded or if it
FrPi will replace one of the M records already saved, and
FrPi in the latter case, which of those records will be
FrPi replaced.  If the algorithm is carefully designed,
FrPi when the last (N'th) record of the file will have been
FrPi processed this way, we may then have M records
FrPi randomly selected from N records, in such a way that
FrPi each of the N records had an equal probability to end
FrPi up in the selection of M records.  I can seek out the
FrPi details if needed.

FrPi This is my suggestion, or in fact, more a thought than
FrPi a suggestion.  It might represent something useful
FrPi either for flat ASCII files or even for a stream of
FrPi records coming out of a database, if those effectively
FrPi do not offer ready random sampling devices.


FrPi P.S. - In the (rather unlikely, I admit) case the gang
FrPi I'm part of should have the need described above, and
FrPi if I then dared to implement it myself, would it be welcome?

I think this would be a very interesting tool, and
I'm also intrigued by the details of the algorithm you
outline above.

If it could be made to work on all kinds of read.table()-readable
files (of course including *.csv), it might be a valuable
tool for all those -- and there are many -- for whom working
with DBMSs is initially too daunting.

Martin Maechler, ETH Zurich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] A comment about R - Link to a technical report from ATS, UCLA

2006-01-06 Thread Naji
Hi all,

UCLA ATS Statistical Consulting Group has just released a very interesting
paper comparing SPSS, SAS and Stata as statistical packages.  Perhaps the
most notable exception to this discussion is R.
http://www.ats.ucla.edu/stat/technicalreports/
It's interesting reading for this thread.

Best regards
Naji

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-06 Thread Prof Brian Ripley

On Fri, 6 Jan 2006, Martin Maechler wrote:


FrPi == François Pinard [EMAIL PROTECTED]
on Thu, 5 Jan 2006 22:41:21 -0500 writes:


   FrPi [Brian Ripley]
I rather thought that using a DBMS was standard practice in the
R community for those using large datasets: it gets discussed rather
often.

   FrPi Indeed.  (I tried RMySQL even before speaking of R to my co-workers.)

Another possibility is to make use of the several DBMS interfaces already
available for R.  It is very easy to pull in a sample from one of those,
and surely keeping such large data files as ASCII is not good practice.

   FrPi Selecting a sample is easy.  Yet, I'm not aware of any
   FrPi SQL device for easily selecting a _random_ sample of
   FrPi the records of a given table.  On the other hand, I'm
   FrPi no SQL specialist, others might know better.

   FrPi We do not have a need yet for samples where I work,
   FrPi but if we ever need such, they will have to be random,
   FrPi or else, I will always fear biases.

One problem with Francois Pinard's suggestion (the credit has got lost)
is that R's I/O is not line-oriented but stream-oriented.  So selecting
lines is not particularly easy in R.

   FrPi I understand that you mean random access to lines,
   FrPi instead of random selection of lines.  Once again,
   FrPi this chat comes out of reading someone else's problem,
   FrPi this is not a problem I actually have.  SPSS was not
   FrPi randomly accessing lines, as data files could well be
   FrPi held on magnetic tape, where random access is not
   FrPi possible in ordinary practice.  SPSS reads (or was
   FrPi reading) lines sequentially from beginning to end, and
   FrPi the _random_ sample is built while the reading goes.

   FrPi Suppose the file (or tape) holds N records (N is not
   FrPi known in advance), from which we want a sample of M
   FrPi records at most.  If N <= M, then we use the whole
   FrPi file, no sampling is possible nor necessary.
   FrPi Otherwise, we first initialise M records with the
   FrPi first M records of the file.  Then, for each record in
   FrPi the file after the M'th, the algorithm has to decide
   FrPi if the record just read will be discarded or if it
   FrPi will replace one of the M records already saved, and
   FrPi in the latter case, which of those records will be
   FrPi replaced.  If the algorithm is carefully designed,
   FrPi when the last (N'th) record of the file will have been
   FrPi processed this way, we may then have M records
   FrPi randomly selected from N records, in such a way that
   FrPi each of the N records had an equal probability to end
   FrPi up in the selection of M records.  I can seek out the
   FrPi details if needed.

   FrPi This is my suggestion, or in fact, more a thought than
   FrPi a suggestion.  It might represent something useful
   FrPi either for flat ASCII files or even for a stream of
   FrPi records coming out of a database, if those effectively
   FrPi do not offer ready random sampling devices.


   FrPi P.S. - In the (rather unlikely, I admit) case the gang
   FrPi I'm part of should have the need described above, and
   FrPi if I then dared to implement it myself, would it be welcome?

I think this would be a very interesting tool, and
I'm also intrigued by the details of the algorithm you
outline above.


It's called `reservoir sampling' and is described in my simulation book 
and Knuth and elsewhere.
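For reference, the algorithm described above can be sketched in a few lines of R.  This is a minimal illustration only; the function name and interface are made up for this sketch, not an existing R API:

```r
## Reservoir sampling (Algorithm R): draw M lines uniformly at random
## from a connection of unknown length, in a single sequential pass.
reservoir_sample <- function(con, M) {
  res <- character(0)
  n <- 0
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break        # end of stream
    n <- n + 1
    if (n <= M) {
      res[n] <- line                    # fill the reservoir first
    } else if (runif(1) < M / n) {
      res[sample(M, 1)] <- line         # keep with prob M/n, evict a random slot
    }
  }
  res                                   # shorter than M only if the file was
}                                       # shorter than M
```

After the last record is processed, each of the N input lines has probability M/N of being in the result, which is the property François describes.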



If it could be made to work on all kinds of read.table()-readable
files (of course including *.csv), it might be a valuable
tool for all those -- and there are many -- for whom working
with DBMSs is initially too daunting.


It would be better (for the reasons I gave) to do this in a separate file 
preprocessor: read.table reads from a connection not a file, of course.


--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] ylim problem in barplot

2006-01-06 Thread Martin Maechler
 Ben == Ben Bolker [EMAIL PROTECTED]
 on Thu, 5 Jan 2006 19:21:48 + (UTC) writes:

Ben Robert Baer rbaer at atsu.edu writes:
 Well, consider this example:
 barplot(c(-200,300,-250,350),ylim=c(-99,400))
 
 It seems that barplot uses ylim and pretty to decide things about the axis
 but does some slightly unexpected things with the bars themselves that are
 not just at the 'zero' end of the bar.
 
 Rob

no, there's no pretty() involved.  
Maybe it helps you to just type box()
after the plot.  Simply, the usual par(mar) margins are set.

I think ___in conclusion___  that Marc Schwartz'  solution has been
right on target all along:

   Use 'xpd = FALSE' if you set 'ylim' because otherwise, the
   result may be confusing.

The real problem of barplot.default() is the fact that 
'xpd = TRUE' is the default, and AFAIK that's not the case
for other high-level plot functions.

One could debate if the default setting for xpd should not be changed to
  
   xpd = (is.null(ylim) && !horiz) || (is.null(xlim) && horiz)

Now this has definitely become a topic for R-devel, and not
for R-help anymore.

Ben in previous cases I think there was room for debate about
Ben the appropriate behavior.  What do you think should happen
Ben in this case?  Cutting off the bars seems like the right thing
Ben to do; 

Ben is your point that the axis being confined to positive values (a side 
effect of setting ylim) is weird?

Ben Ben

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] A comment about R - Link to a technical report from ATS, UCLA

2006-01-06 Thread Peter Dalgaard
Naji [EMAIL PROTECTED] writes:

 Hi all,
 
 UCLA ATS Statistical Consulting Group has just released a very interesting
 paper comparing SPSS, SAS and Stata as statistical packages.  Perhaps the
 most notable exception to this discussion is R.
 http://www.ats.ucla.edu/stat/technicalreports/
 It's interesting reading for this thread.

In fact, if you trace the thread back to its root, this is what
started it...

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] RMySQL/DBI

2006-01-06 Thread Arne.Muller
Hello,

does anybody run RMySQL/DBI successfully on SunOS 5.8 and MySQL 3.23.53?  I
get a segmentation fault when trying to call dbConnect.  We'll soon switch to
MySQL 4; however, I was wondering whether the very ancient MySQL version
really is the problem ...

RMySQL 0.5-5
DBI 0.1-9
R 2.2.0
SunOS 5.8

kind regards and thanks a lot for your help,

Arne


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread John Marsland
I agree.

In desperation at my inbox being swamped by messages, I contacted the R-core
team to ask about other solutions.  They recommended gmane.org, which compiles
a web-viewable archive of thousands of email lists - it even provides RSS
feeds for new topics.

Going back to the wiki issue, it might be wise to think about using Trac
(http://projects.edgewall.com/trac/), an open source project that
integrates a wiki with the SVN version control system (used by R-project) and a
replacement for bugzilla's ticketing system.  We use it to document our own code.

Trac would have the advantage of pushing questions on the R list back towards
the  actual source code and allowing all users to participate in the future
development of the software.

John Marsland

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread Fernando Henrique Ferraz P. da Rosa
John Marsland writes:
 Trac would have the advantage of pushing questions on the R list back towards
 the  actual source code and allowing all users to participate in the future
 development of the software.
 

I see that this could be useful for R-devel, but considering the
volume of traffic and the kind of contents on R-help, I don't think such
tying to the actual source code would be so useful. Perhaps trac could
be used as an integrated interface for r-devel/svn and the bug track
system, and another wiki solution be used exclusively for the r-help
community (which includes many people not directly interested in coding
or development issues).

--
Though this be randomness, yet there is structure in't.
   Rosa, F.H.F.P

Instituto de Matemática e Estatística
Universidade de São Paulo
Fernando Henrique Ferraz P. da Rosa
http://www.feferraz.net

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Use Of makeARIMA

2006-01-06 Thread Spencer Graves
  I have not seen a reply to this post, so I will attempt a feeble 
response.  I've been wanting to learn more about these commands and 
suffering, like you, from the paucity of examples to follow.  To get 
started, after reading the help pages for all the commands you 
mentioned, I tried to think of the simplest example that might help me 
learn something about this.  This question led to the following:


set.seed(3)
y3 <- rep(0:2, 10) + 0.1*rnorm(30)
acf(y3)  # ACF suggests a period-3 seasonal
pacf(y3) # PACF suggests a pure AR of order at most 3

(fit3 <- arima(y3, seasonal=list(order=c(1,0,0), period=3)))

attributes(fit3)
fit3$model # Compare with the documentation for 'makeARIMA'

KalmanForecast(mod=fit3$model)

	  If I wanted to understand makeARIMA in particular, I listed the
arima function and searched it for makeARIMA:  arima clearly uses
makeARIMA.  If you run 'debug(arima)' and then the above 'arima' command,
you can step through the 'arima' function line by line and look at (and 
modify) any of the objects that function creates and uses.  In 
particular, you will be able to see exactly how the arima command uses 
the makeARIMA function.

  Hope this helps.
  spencer graves
p.s.  If you'd like more help from this group, please submit another
question.  Before you do, however, I suggest you first read the posting
guide, www.R-project.org/posting-guide.html.  Anecdotal evidence
suggests that posts consistent with that guide are more likely to
receive useful replies, and more quickly.


Sumanta Basak wrote:

 Hi R-Experts,
 
  
 
 Currently I'm using a univariate time series to which I'm going to
 apply KalmanLike(), KalmanForecast(), KalmanSmooth() and KalmanRun().  Before
 these I use makeARIMA(), but I don't understand it and don't know how to
 include the seasonal coefficients.  Can anyone help by citing a suitable
 example?  Thanks in advance.
 
  
 
  
 
 --
 
 SUMANTA BASAK.
 
 --
 
 http://www.drsb24.blogspot.com/  
 
  
 
 
 ---
 This e-mail may contain confidential and/or privileged infor...{{dropped}}
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread Jonathan Baron
On 01/06/06 13:40, John Marsland wrote:
 Going back to the wiki issue, it might be wise to think about using Trac
 http://projects.edgewall.com/trac/ which is an open source project that
 integrates a wiki with the SVN code versioning system (used by R-project) and 
 a
 replacement for bugzilla's ticketing system. We use it to document our own 
 code.
 
 Trac would have the advantage of pushing questions on the R list back towards
 the  actual source code and allowing all users to participate in the future
 development of the software.

It isn't clear to me what this would be for.  I'm not sure that I
trust users to modify code.

I was thinking myself that user input might be most useful for
the documentation of functions.  Not that this is so bad, but
rather it might be possible to have an extended system of
documentation on the web, with FAQ-type questions answered as
part of the documentation itself, so that people would not have
to rely on R-help so much (even in its archived forms).

And I was thinking of setting up a Wiki with one page per
function.  (Given that there are now hundreds or thousands of
functions, setting this up would have to be automated.)  I've
just installed (for another purpose) TWiki, which seems to have
some nice features for this sort of thing (in particular, data
stored as text files, hence easily manipulated by other
programs), but I will not have time to think through how to do
this for some time.  Just another idea to throw into the hopper.

In principle, another possibility is to do something like the PHP 
manual at http://www.php.net/manual/en/, which is not a wiki but
more like a bulletin board, with discussion of each command.  But 
I think a wiki is better.  I found it time consuming to read
through all those comments, almost as bad as reading through
R-help postings. :)

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread John Marsland
On 1/6/06, John Marsland [EMAIL PROTECTED] wrote:
 I see your point. Maybe the answer is to use the list for R-help style
 questions, but encourage people who answer questions to point to the
 answers in the wiki - which they might have enhanced if necessary.

 On 1/6/06, Fernando Henrique Ferraz P. da Rosa [EMAIL PROTECTED] wrote:
  John Marsland writes:
   Trac would have the advantage of pushing questions on the R list back 
   towards
   the  actual source code and allowing all users to participate in the 
   future
   development of the software.
  
 
  I see that this could be useful for R-devel, but considering the
  volume of traffic and the kind of contents on R-help, I don't think such
  tying to the actual source code would be so useful. Perhaps trac could
  be used as an integrated interface for r-devel/svn and the bug track
  system, and another wiki solution be used exclusively for the r-help
  community (which includes many people not directly interested in coding
  or development issues).
 
  --
  Though this be randomness, yet there is structure in't.
 Rosa, F.H.F.P
 
  Instituto de Matemática e Estatística
  Universidade de São Paulo
  Fernando Henrique Ferraz P. da Rosa
  http://www.feferraz.net
 
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread John Marsland
It isn't so much that users would modify the code - they would have to do
that in the usual way, by checking out the project from the SVN.

Rather, extended documentation, features, enhancements etc. could
easily locate and quote from the code base, and from the differencing
engine as applied to the code base between versions.

On 1/6/06, Jonathan Baron [EMAIL PROTECTED] wrote:

 It isn't clear to me what this would be for.  I'm not sure that I
 trust users to modify code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread Barry Rowlingson
Jonathan Baron wrote:

 And I was thinking of setting up a Wiki with one page per
 function.  (Given that there are now hundreds or thousands of
 functions, setting this up would have to be automated.) 

  One page per R manual page file would probably suffice. You could do 
something along the lines of the Zope book, where users can add comments 
but you can browse with comments off:

http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition/AdvDTML.stx

  then toggle the 'Com On' button. This is less of a wiki and more of an 
annotation service.

but I think you'd run into problems with losing all the annotation when 
a new R version comes out.

Barry

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] A comment about R:

2006-01-06 Thread Stefan Eichenberger
I just got into R over most of the Xmas vacation and was about to ask for
helpful pointers on how to get a hold of R when I came across this thread.
I've read through most of it and would like to comment from a novice user's
point of view.  I have a strong programming background but limited statistical
experience and no knowledge of competing packages.  I'm working as a senior
engineer in electronics.

Yes, the learning curve is steep.  Most of the documentation is extremely
terse.  Learning is mostly from examples (a wiki was proposed in another
mail...), and the documentation uses no graphical elements at all.  So, when
it comes to things like xyplot in lattice: where would I get the concepts
behind panels, superpanels, and the like?

OK, this is steep and terse, but after a while I'll get over it... That's
life.  The general concept is great, and things can be expressed very densely:
the potential is there.  I quickly had 200 lines of my own code together,
doing what it should - or so I believed.

Next I did:
matrix <- matrix(1:100, 10, 10)
image(matrix)
locator()
Great: I can interactively work with my graphs... But then:
filled.contour(matrix)
locator()
Oops - wrong coordinates returned.  Bug.  Apparently, locator() doesn't realize
that filled.contour() has a color bar to the right and scales x wrongly...

Here is what really shocked me:

 str(bar)
`data.frame':   206858 obs. of  12 variables:
 ...
 str(mean(bar[,6:12]))
 Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
 ...
 str(sd(bar[,6:12]))
 Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
 ...
 prcomp(bar[,6:12]) -> foo
 str(foo$x)
 num [1:206858, 1:7] -0.4187 -0.4015  0.0218 -0.4438 -0.3650 ...
 ...
 str(mean(foo$x))
 num -1.07e-13
 str(sd(foo$x))
 Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
 ...

So, sd returns a vector regardless of whether the argument is a matrix or a
data.frame, but mean reacts differently and returns a vector only for a
data.frame?

The problem here is not that this is difficult to learn - the problem is the
complete absence of a concept.  Is a data.frame an 'extended' matrix with
columns of different types, or something different?  Since the numeric mean
(I expected a vector) is recycled nicely when used in a vector context, this
makes debugging code close to impossible.  Since sd returns a vector, things
like mean + 4*sd vary sufficiently across the data elements that I assume
working code... I don't get any warning signal that something is wrong here.

The case in point is the behavior of locator() on a filled.contour() plot:
things apparently have been programmed and debugged from example rather than
from concept.

Now, in another posting I read that all this is a feature to discourage
inexperienced users from statistics and force you to think before you do
things.  While I support this concept of thinking: did I miss something in
statistics?  I was under the impression that mean and sd were relatively
close to each other conceptually... (here, they are even in different
packages...)

I will continue using R for the time being.  But whether I can recommend it
to my work colleagues remains to be seen: how could I ever trust results
returned?

I'm still impressed by some of the efficiency, but my trust is deeply shaken...


Stefan Eichenberger mailto:[EMAIL PROTECTED]

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread Duncan Murdoch
On 1/6/2006 9:15 AM, Jonathan Baron wrote:
 On 01/06/06 13:40, John Marsland wrote:
 Going back to the wiki issue, it might be wise to think about using Trac
 http://projects.edgewall.com/trac/ which is an open source project that
 integrates a wiki with the SVN code versioning system (used by R-project) 
 and a
 replacement for bugzilla's ticketing system. We use it to document our own 
 code.
 
 Trac would have the advantage of pushing questions on the R list back towards
 the  actual source code and allowing all users to participate in the future
 development of the software.
 
 It isn't clear to me what this would be for.  I'm not sure that I
 trust users to modify code.
 
 I was thinking myself that user input might be most useful for
 the documentation of functions.  Not that this is so bad, but
 rather it might be possible to have an extended system of
 documentation on the web, with FAQ-type questions answered as
 part of the documentation itself, so that people would not have
 to rely on R-help so much (even in its archived forms).
 
 And I was thinking of setting up a Wiki with one page per
 function.  (Given that there are now hundreds or thousands of
 functions, setting this up would have to be automated.)  I've
 just installed (for another purpose) TWiki, which seems to have
 some nice features for this sort of thing (in particular, data
 stored as text files, hence easily manipulated by other
 programs), but I will not have time to think through how to do
 this for some time.  Just another idea to throw into the hopper.

I think this sounds like a great idea.  I would like to see two way 
connections between this and the existing man pages, e.g. in the HTML or 
PDF versions, links that go directly to the Wiki, and links from the 
Wiki to an online copy of the man pages.

If your automatic setup permitted it, then showing the output of the 
examples on the man pages would be nice.

One issue that you'll need to think about is whether there is one page 
per function, or one page per .Rd file, or some other organization:  and 
you'll need to be prepared for changes in the organization of the 
documentation with new R releases (and changes in function names, and 
changes in the examples...).

Duncan Murdoch

 
 In principle, another possibility is to do something like the PHP 
 manual at http://www.php.net/manual/en/, which is not a wiki but
 more like a bulletin board, with discussion of each command.  But 
 I think a wiki is better.  I found it time consuming to read
 through all those comments, almost as bad as reading through
 R-help postings. :)
 
 Jon

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread Seth Falcon
Regarding systems for presenting documentation and allowing user
comments, I recently came across Commentary (see homepage
http://pythonpaste.org/commentary/).

Haven't used it, but my impression is that comments and the main doc
are both stored in svn (and auto-committed for comment changes).  This
might help solve the problem of updating the doc upon a new R release
because you could take advantage of svn merge.

Of course, svn merge won't know whether the comments are still
appropriate or not :-(

Nevermind.

+ seth

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] A comment about R:

2006-01-06 Thread Stefan Eichenberger
~~~
... blame me for not having sent below message initially in
plain text format. Sorry!
~~~

I just got into R over most of the Xmas vacation and was about to ask 
for a helping pointer on how to get a hold of R when I came across this 
thread. I've read through most of it and would like to comment from a 
novice user's point of view. I have a strong programming background but 
limited statistical experience and no knowledge of competing packages. 
I'm working as a senior engineer in electronics.

Yes, the learning curve is steep. Most of the docu is extremely terse. 
Learning is mostly from examples (a wiki was proposed in another 
mail...), documentation uses no graphical elements at all. So, when it 
comes to things like xyplot in lattice: where would I get the concepts 
behind panels, superpanels, and the like?

OK, this is steep and terse, but after a while I'll get over it... 
That's life. The general concept is great, things can be expressed very 
densely: the potential is there. I quickly had 200 lines of my own code 
together, doing what it should - or so I believed.

Next I did:
  matrix <- matrix(1:100, 10, 10)
  image(matrix)
  locator()
Great: I can interactively work with my graphs... But then:
  filled.contour(matrix)
  locator()
Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't 
realize that filled.contour() has a color bar to the right and scales x 
wrongly...

Here is what really shocked me:

> str(bar)
`data.frame':   206858 obs. of  12 variables:  ...
> str(mean(bar[,6:12]))
  Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
  ...
> str(sd(bar[,6:12]))
  Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
  ...
> prcomp(bar[,6:12]) -> foo
> str(foo$x)
  num [1:206858, 1:7] -0.4187 -0.4015  0.0218 -0.4438 -0.3650 ...
  ...
> str(mean(foo$x))
  num -1.07e-13
> str(sd(foo$x))
  Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
  ...

So, sd returns a vector regardless of whether the argument is a matrix 
or a data.frame, but mean reacts differently and returns a vector only 
for a data.frame?

The problem here is not that this is difficult to learn - the problem is 
the complete absence of a concept. Is a data.frame an 'extended' matrix 
with columns of different types, or something different? Since the 
numeric mean (I expected a vector) is recycled nicely when used in a 
vector context, this makes debugging code close to impossible. Since sd 
returns a vector, things like mean + 4*sd vary sufficiently across the 
data elements that I assumed I had working code... I don't get any 
warning signal that something is wrong here.
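The asymmetry described above is easy to reproduce, and just as easy to avoid: writing the column-wise operation explicitly gives the same answer for both container types. A minimal sketch on toy data (not the poster's data set):

```r
m  <- matrix(1:12, 3, 4)     # small stand-in for the real data
df <- as.data.frame(m)

colMeans(m)                  # per-column means of the matrix: 2 5 8 11
sapply(df, mean)             # per-column means of the data.frame: same values
apply(m, 2, sd)              # per-column sd of the matrix
sapply(df, sd)               # per-column sd of the data.frame: same values
```

Being explicit about the margin removes any dependence on how mean() or sd() happen to dispatch on the container type.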

The case in point is the behavior of locator() on a filled.contour() 
plot: things apparently have been programmed and debugged from example 
rather than from concept.

Now, in another posting I read that all this is a feature to discourage 
inexperienced users from statistics and force you to think before you do 
things. Whilst I support this concept of thinking: did I miss something 
in statistics? I was under the belief that mean and sd were relatively 
close to each other conceptually... (here, they are even in different 
packages...)

I will continue using R for the time being. But whether I can recommend 
it to my work colleagues remains to be seen: how could I ever trust the 
results returned?

I'm still impressed by some of the efficiency, but my trust is deeply 
shaken...
---
Stefan Eichenberger    mailto:[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-06 Thread Wensui Liu
RG,

Actually, SQLite provides a solution to read *.csv file directly into db.

Just for your consideration.

On 1/5/06, ronggui [EMAIL PROTECTED] wrote:

 2006/1/6, jim holtman [EMAIL PROTECTED]:
  If what you are reading in is numeric data, then it would require (807 *
  118519 * 8) = ~760MB just to store a single copy of the object -- more
  memory than you have on your computer.  If you were reading it in, then
  the problem is the paging that was occurring.
 In fact, if I read it in 3 pieces, each is about 170M.

 
  You have to look at storing this in a database and working on a subset
 of
  the data.  Do you really need to have all 807 variables in memory at the
  same time?

 Yip, I don't need all the variables. But I don't know how to get the
 necessary variables into R.

 At last I read the data in pieces and used the RSQLite package to write
 it to a database, and then do the analysis. If I were familiar with
 database software, using a database (and R) would be the best choice,
 but converting the file into database format is not an easy job for me.
 I asked for help on the SQLite list, but the solution was not satisfying,
 as it required knowledge of a third scripting language. After searching
 the internet, I got this solution:

 #begin
 rm(list=ls())
 f <- file("D:\\wvsevs_sb_v4.csv", "r")
 i <- 0
 done <- FALSE
 library(RSQLite)
 con <- dbConnect(SQLite(), "c:\\sqlite\\database.db3")
 tim1 <- Sys.time()

 while(!done){
   i <- i+1
   tt <- readLines(f, 2500)
   if (length(tt) < 2500) done <- TRUE
   tt <- textConnection(tt)
   if (i==1) {
     assign("dat", read.table(tt, head=TRUE, sep=",", quote=""))
   }
   else assign("dat", read.table(tt, head=FALSE, sep=",", quote=""))
   close(tt)
   ifelse(dbExistsTable(con, "wvs"),
          dbWriteTable(con, "wvs", dat, append=TRUE),
          dbWriteTable(con, "wvs", dat))
 }
 close(f)
 #end
 It's not the best solution, but it works.
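Once the table is in SQLite, only the variables actually needed have to come back into R, which addresses the original memory problem. A sketch with RSQLite, reusing the table name from the post; the column names v1..v3 are placeholders, not the real variable names, and an in-memory database stands in for the file on disk so the sketch is self-contained:

```r
library(RSQLite)

# In the post the file would be "c:\\sqlite\\database.db3"; an in-memory
# database is used here so the sketch runs on its own.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "wvs", data.frame(v1 = 1:3, v2 = letters[1:3], v3 = 0))

# Pull just the columns (and rows) required, instead of all 807 variables
dat <- dbGetQuery(con, "SELECT v1, v3 FROM wvs WHERE v1 IS NOT NULL")
dbDisconnect(con)
```

The SELECT list is where the saving happens: R only ever sees the subset.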



  If you use 'scan', you could specify that you do not want some of the
  variables read in, so it might make for more reasonably sized objects.
 
 
  On 1/5/06, François Pinard [EMAIL PROTECTED] wrote:
   [ronggui]
  
   R is weak when handling large data files.  I have a data file: 807
   vars, 118519 obs., and it's in CSV format.  Stata can read it in in
   2 minutes, but on my PC R almost cannot handle it. My PC's CPU:
   1.7G; RAM: 512M.
  
   Just (another) thought.  I used to use SPSS, many, many years ago, on
   CDC machines, where the CPU had limited memory and no kind of paging
   architecture.  Files did not need to be very large for being too
 large.
  
   SPSS had a feature that was then useful, about the capability of
   sampling a big dataset directly at file read time, quite before
   processing starts.  Maybe something similar could help in R (that is,
   instead of reading the whole data in memory, _then_ sampling it.)
  
   One can read records from a file, up to a preset amount of them.  If
 the
   file happens to contain more records than that preset number (the
 number
   of records in the whole file is not known beforehand), already read
   records may be dropped at random and replaced by other records coming
   from the file being read.  If the random selection algorithm is
 properly
   chosen, it can be made so that all records in the original file have
   equal probability of being kept in the final subset.
  
   If such a sampling facility was built right within usual R reading
   routines (triggered by an extra argument, say), it could offer
   a compromise for processing large files, and also sometimes accelerate
   computations for big problems, even when memory is not at stake.
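The scheme described here (dropping already-read records at random so that every record in the file ends up in the sample with equal probability) is reservoir sampling. A base-R sketch of the idea, reading from an open connection without knowing the number of lines in advance; the function name is my own invention, not an existing R API:

```r
## Keep a uniform random sample of k lines from a connection of unknown length.
reservoir_lines <- function(con, k) {
  res <- character(0)
  n <- 0
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break        # end of input
    n <- n + 1
    if (n <= k) {
      res[n] <- line                    # fill the reservoir first
    } else {
      j <- sample.int(n, 1)             # keep the new line with prob k/n
      if (j <= k) res[j] <- line
    }
  }
  res
}

con <- textConnection(as.character(1:1000))
smp <- reservoir_lines(con, 5)          # 5 lines chosen uniformly from 1000
close(con)
```

Each record ends up in the final sample with probability k/n, which is exactly the equal-probability property the SPSS feature relied on.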
  
   --
   François Pinard   http://pinard.progiciels-bpi.ca
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide!
  http://www.R-project.org/posting-guide.html
  
 
 
 
  --
  Jim Holtman
  Cincinnati, OH
  +1 513 247 0281
 
  What the problem you are trying to solve?


 --
 黄荣贵
 Deparment of Sociology
 Fudan University

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html




--
WenSui Liu
(http://statcompute.blogspot.com)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] lmer p-values are sometimes too small

2006-01-06 Thread Olof Leimar
This concerns whether p-values from lmer can be trusted. From 
simulations, it seems that lmer can produce very small, and probably 
spurious, p-values. I realize that lmer is not yet a finished product. 
Is it likely that the problem will be fixed in a future release of the 
lme4 package?

Using simulated data for a quite standard mixed-model anova (a balanced 
two-way design; see code for the function SimMixed pasted below), I 
compared the output of lmer, for three slightly different models, with 
the output of aov. For an example where there is no fixed treatment 
effect (null hypothesis is true), with 4 blocks, 2 treatments, and 40 
observations per treatment-block combination, I find that lmer gives 
more statistical significances than it should, whereas aov does not have 
this problem. An example of output I generated by calling
   SimMixed(1000)
is the following:

Proportion significances at the 0.05 level
aov: 0.05
lmer.1:  0.148
lmer.2:  0.148
lmer.3:  0.151

Proportion significances at the 0.01 level
aov: 0.006
lmer.1:  0.076
lmer.2:  0.076
lmer.3:  0.077

Proportion significances at the 0.001 level
aov: 0.001
lmer.1:  0.047
lmer.2:  0.047
lmer.3:  0.047

which is based on 1000 simulations (and takes about 5 min on my PowerMac 
G5). The different models fitted are:

fm.aov <- aov(y ~ Treat + Error(Block/Treat), data = dat)
fm.lmer.1 <- lmer(y ~ Treat + (Treat|Block), data = dat)
fm.lmer.2 <- lmer(y ~ Treat + (Treat-1|Block), data = dat)
fm.lmer.3 <- lmer(y ~ Treat + (1|Block) + (Treat-1|Block), data = dat)

It seems that, depending on the level of the test, lmer gives between a 
factor of 3 to a factor of around 50 times too many significances. The 
first two lmer models seem to give identical results, whereas the third 
(which I think perhaps is the one that best represents the data 
generated by the simulation) differs slightly. In running the 
simulations, warnings like this are occasionally generated:

Warning message:
optim or nlminb returned message false convergence (8)
  in: "LMEoptimize<-"(`*tmp*`, value = list(maxIter = 200, tolerance = 
1.49011611938477e-08,

They seem to derive from the third of the lmer models. Perhaps there is 
some numerical issue in the lmer function? From running SimMixed() 
several times, I have noticed that large p-values (say, larger than 0.5) 
agree very well between lmer and aov, but there seems to be a systematic 
discrepancy for smaller p-values, where lmer gives smaller values than 
aov. The F-values agree between all analyses (except for fm.lmer.3 when 
there is a warning), so there is a systematic difference between lmer 
and aov in how a p-value is obtained from the F-value, which becomes 
severe for small p-values.
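That reading can be checked directly: for a fixed F and numerator df, the p-value returned by pf() depends strongly on the denominator degrees of freedom. In this design, aov tests Treat against the Block:Treat error stratum with only 3 denominator df, whereas treating the observations as independent would give hundreds. A sketch with illustrative numbers (not a re-run of the simulation):

```r
Fval <- 10; df1 <- 1

p.few  <- pf(Fval, df1, 3,   lower.tail = FALSE)  # block-level stratum (aov)
p.many <- pf(Fval, df1, 312, lower.tail = FALSE)  # per-observation df

c(p.few, p.many)   # the same F turns into a far smaller p with many df
```

The discrepancy grows as F grows, which matches the observation that large p-values agree while small ones diverge.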



My output from sessionInfo()

R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0

attached base packages:
[1] methods   stats graphics  grDevices utils 
datasets  base

other attached packages:
  lme4   latticeMatrix
  0.98-1 0.12-11  0.99-3



Pasted code for the SimMixed function (some lines might wrap):

# This function generates n.sims random data sets for a design with 4
# blocks, 2 treatments applied to each block, and 40 replicate
# observations for each block-treatment combination. There is no true
# fixed treatment effect, so a statistical significance of a test for
# a fixed treatment effect ought to occur with a probability equal to
# the nominal level of the test. Four tests are applied to each
# simulated data set: the classical aov and three versions of lmer,
# corresponding to different model formulations. The proportion of
# tests for a fixed treatment effect that become significant at the
# 0.05 0.01 and 0.001 levels are printed, as well as the p-values for
# the last of the simulations. In my runs, lmer gives significance
# more often than indicated by the nominal level, for each of the
# three models, whereas aov is OK. The package lme4 needs to be loaded
# to run the code.

SimMixed <- function(n.sims = 1) {
   k <- 4    # number of blocks
   n <- 40   # num obs per block X treatment combination
   m1 <- 1.0 # fixed effect of level 1 of treatment
   m2 <- m1  # fixed effect of level 2 of treatment
   sd.block <- 0.5     # SD of block random effect
   sd.block.trt <- 1.0 # SD of random effect for block X treatm
   sd.res <- 0.1       # Residual SD
   Block <- factor( rep(1:k, each=2*n) )
   Treat <- factor( rep( rep(c("Tr1","Tr2"), k), each=n) )
   m <- rep( rep(c(m1, m2), k), each=n) # fixed effects
   # storage for p-values
   p.aov <- rep(0, n.sims)
   p.lmer.1 <- rep(0, n.sims)
   p.lmer.2 <- rep(0, n.sims)
   p.lmer.3 <- rep(0, n.sims)
   for (i in 1:n.sims) {
 # first get block and treatment random deviations
 b <- rep( rep(rnorm(k, 0, sd.block), each=2) +
  rnorm(2*k, 0, sd.block.trt), each=n )
 # then get response
 y <- m + b + rnorm(2*k*n, 0, sd.res)
 dat <- data.frame(Block, Treat, y)
 # perform the tests
 fm.aov <- 

[R] inverse prediction intervals for nonlinear least squares

2006-01-06 Thread Brian S Cade
I'm trying to help several of our scientists with constructing inverse 
prediction intervals for models estimated with nonlinear least squares. So 
for example, we might estimate mean of y from a 4 parameter logistic 
function of x [e.g., using SSfpl in nls()], but then want to estimate a 
prediction interval for x estimated from y (calibration problem, inverse 
prediction).  I've done some searching of R archives and found the 
nlscal() function in package quantchem but this only seems to provide 
inverse estimates not intervals (although quantchem does have a function 
for inverse prediction intervals of linear models).  Is anyone aware of 
another function or package in R that will provide for inverse prediction 
intervals for nonlinear least squares?  I will confess that I'm not 
cognizant of whether there is well developed, accessible theory for 
inverse prediction intervals in the nonlinear model.
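Failing a packaged interval method, a crude point inversion is at least straightforward: fit with SSfpl and invert the fitted mean curve numerically with uniroot(), under the assumption that the curve is monotone on the search interval. This gives only the inverse estimate; a proper interval would still need, e.g., inversion of pointwise prediction bands. A sketch on simulated data:

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 1 + 4 / (1 + exp((5 - x) / 1.2)) + rnorm(50, sd = 0.1)
fit <- nls(y ~ SSfpl(x, A, B, xmid, scal))   # 4-parameter logistic fit

# invert the fitted mean curve: find the x whose predicted response is y0
invert <- function(y0, fit, lower = 0, upper = 10)
  uniroot(function(x) predict(fit, list(x = x)) - y0,
          c(lower, upper))$root

x0 <- invert(3, fit)   # close to xmid for this particular curve
```

Repeating the inversion at the lower and upper pointwise prediction limits for y would give one rough (approximate) interval for x.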

Brian
 
Brian S. Cade

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  [EMAIL PROTECTED]
tel:  970 226-9326
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread Tony Plate
I second Frank's comment!  I wonder if questioners who receive a bunch 
of useful replies could be encouraged to enter a summary of those on a 
Wiki, in much the same way as users of S-news were expected to post a 
summary of their answers as a way of giving something back.

An existing R Wiki is located at 
http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?RwikiHome

However, there's currently not much on it.  Recently on R-help there was 
  a summary of using databases with R, which looked very useful, so I 
put that on the Wiki.  Maybe if others just start putting things there 
it can gather momentum?

-- Tony Plate

Frank E Harrell Jr wrote:
 I feel that as long as people continue to provide help on r-help wikis 
 will not be successful.  I think we need to move to a central wiki or 
 discussion board and to move away from e-mail.  People are extremely 
 helpful, but e-mail seems to always be memory-less, and messages get 
 too long without factorization of old text.  R-help is now too active 
 and too many new users are asking questions asked dozens of times for 
 e-mail to be effective.
 
 The wiki also needs to collect and organize example code, especially for 
 data manipulation.  I think that new users would profit immensely from a 
 compendium of examples.
 
 Just my .02 Euros
 
 Frank

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] help with strip.default

2006-01-06 Thread Steven Lacey
Hi, 
 
I am creating a multi-conditioned trellis plot. My data look something like
this:
 
Factor A    Factor B    IV    DV
X           1
X           2
X           3
X           4
Y           1
Y           2
Y           3
Y           5
Z           1
Z           2
Z           3
Z           4
 
In one sense these data are suitable for trellis because for every level of
factor A there are four levels of factor B. However, the names of the factor
B levels depend on the level of factor A. 
 
How would I create a 3 x 4 trellis plot where each panel is a combination of
factor A and factor B where the names of factor B are preserved and the
strip has two levels, one for factor A and another for factor B?
 
This was more difficult than I thought because trellis wants to generate 15
panels, as there are 3 levels of factor A and 5 levels of factor B. But
these 5 levels of factor B are in name only. There are only 4 different
levels of factor B for each level of factor A.
 
As a work around I am considering renaming the levels in factor A from 1 to
4 for all levels of factor B. Then, write a custom strip.default to specify
the names. However, I am not sure how to write this function. Would someone
help me get started?
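One way to start, following exactly the renaming workaround described: recode factor B to positions 1..4 within each level of A, condition on both, and supply the real level names through a custom strip function that relabels before calling strip.default(). A lattice sketch with invented data and names (the real column names will differ):

```r
library(lattice)

d <- data.frame(
  A   = factor(rep(c("X", "Y", "Z"), each = 40)),
  pos = factor(rep(rep(1:4, each = 10), 3)),   # factor B recoded to 1..4
  IV  = runif(120),
  DV  = rnorm(120))

# the real factor-B names, per level of factor A ("5" occurs only under Y)
bnames <- list(X = c("1", "2", "3", "4"),
               Y = c("1", "2", "3", "5"),
               Z = c("1", "2", "3", "4"))

p <- xyplot(DV ~ IV | pos + A, data = d, layout = c(4, 3),
  strip = function(which.given, which.panel, factor.levels, ...) {
    if (which.given == 1) {                    # inner strip: relabel pos
      a <- levels(d$A)[which.panel[2]]
      factor.levels <- bnames[[a]]
    }
    strip.default(which.given = which.given, which.panel = which.panel,
                  factor.levels = factor.levels, ...)
  })
print(p)
```

The key idea is that the strip function sees which.panel for both conditioning variables, so the label for the inner strip can be looked up from the current level of the outer one.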
 
Thanks,
Steve

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] [Rd] Multiplication

2006-01-06 Thread Barry Rowlingson
[crossed over to r-help since it's not a bug and not a devel thing any more]

Thomas Lumley wrote:

 So is -2^2.  The precedence of ^ is higher than that of unary minus. It 
 may be surprising, but it *is* documented and has been in S for a long 
 time.

And just about every other programming language:

Matlab:

>> -2^2

ans =

     -4


Maxima:

(C1) -2^2;
(D1)  - 4

Fortran:
   print *,-2**2
  -4

Perl:

$ perl -e 'print -2^2'
4294967292

  Oops. I mean:

$ perl -e 'print -2**2'
-4

  The precedence of operators is remarkably consistent across programming 
languages and over time. It seems natural to me now that ^ is done before 
unary minus, but I don't know if that's because I've been doing it for 
25 years or because it's really more natural.

  Anyone got a counter example where unary minus is higher than power?
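For completeness, the R behaviour itself, checkable at the prompt:

```r
-2^2             # ^ binds tighter than unary minus, so this is -(2^2): -4
(-2)^2           # parentheses force the other grouping: 4
-2^2 == -(2^2)   # TRUE: how the parser actually groups it
```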

Barry

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread Don MacQueen
I don't have any significant experience with wikis, but I have yet to 
use any discussion board that was anywhere near as useful to me, or 
as easy to use, as an email list.

Discussion boards have a web browser interface. Typically, they 
display at most a dozen topics at a time. Scrolling to get the next 
dozen is slow, as it requires a download from some web server. There 
is a huge amount of wasted screen space. When there is a topic that 
generates many messages scrolling through them is slow, as some 
discussion board interfaces show only 6 or 7 at a time. Search 
engines provided by the discussion board software are limited and 
slow.

In contrast, in my email client I can show about three dozen subject 
lines at a time, I can quickly scroll up and down through the list, I 
can quickly group all the messages with the same subject line with a 
single click of the mouse. I can easily and quickly store selected 
messages of particular interest to a place where I can easily find 
them again. My email software searches very quickly through a huge 
number of messages.

Then there's the question of administration and maintenance. Who is 
going to set up the wiki or discussion board categories? As far as I 
can tell (and that's actually not very far), either of them would 
require a lot more time and effort to set up and maintain than the 
present email list.

Yes, r-help has a huge volume -- right now, my R-help mailbox has 
almost 22,000 messages in it, 2004-01-02 to the present; its size is 
about 124 mb. Yes, there is a lot of duplication. None the less, I 
find it easier and quicker to scan the subject lines a few times a 
day for interesting-looking topics than it would be to go to a 
browser and have to navigate up and down through various categories, 
looking for interesting-looking topics.

As far as I can tell, the wiki concept is more along the lines of a 
reference library, whereas mailing lists and discussion boards are 
meant for people to ask each other questions, and give each other 
answers. If that perception is at all accurate, I would have to say 
that a wiki is by no means a suitable replacement for an email list. 
And when it comes to a choice between an email list and a discussion 
board, I have a strong preference for the email list.

-Don

At 7:04 PM -0600 1/5/06, Frank E Harrell Jr wrote:
I feel that as long as people continue to provide help on r-help wikis
will not be successful.  I think we need to move to a central wiki or
discussion board and to move away from e-mail.  People are extremely
helpful, but e-mail seems to always be memory-less, and messages get
too long without factorization of old text.  R-help is now too active
and too many new users are asking questions asked dozens of times for
e-mail to be effective.

The wiki also needs to collect and organize example code, especially for
data manipulation.  I think that new users would profit immensely from a
compendium of examples.

Just my .02 Euros

Frank
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


-- 
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [R-pkgs] sudoku

2006-01-06 Thread Brahm, David
Any doubts about R's big-league status should be put to rest, now that
we have a
Sudoku Puzzle Solver.  Take that, SAS!  See package sudoku on CRAN.

The package could really use a puzzle generator -- contributors are
welcome!

-- David Brahm ([EMAIL PROTECTED]) 


[[alternative HTML version deleted]]

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] How to visualise spatial raster data?

2006-01-06 Thread Jan Verbesselt
Dear R help,

We are trying to visualise spatial raster data. We have, per line, X and
Y coordinates and Z (data). How could we visualise this type of data? We
would also like to add extra data points to this plot based on new X, Y
and Z data.

We used the following function but would like to use only the graph in
the upper right corner (the spatial one), similar to the graph at
http://www.est.ufpr.br/geoR/geoRdoc/vignette/geoRintro/geoRintrose3.html#x4-60003.1

geo_iRVI <- as.geodata(pixels_blok, coords.col=2:3, data.col=4)
plot(geo_iRVI)

How can this plot be optimized? And can we add other points to it?

 Another solution could be:
filled.contour(AVG, color=terrain.colors, xlab="Longitude (°)",
ylab="Latitude (°)"), but for that the data needs to be organised
differently, not per line of X,Y coordinates but in raster form.
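If the per-line X,Y coordinates already lie on a regular grid, that reorganisation can be done directly in base R: reshape the line-per-cell layout into a matrix, draw it with image(), and overlay extra points with points(). A self-contained sketch on toy data (the column names x, y, z are assumptions):

```r
# toy per-line data: one row per (x, y) grid cell with a value z
d <- expand.grid(x = 1:10, y = 1:8)
d$z <- sin(d$x / 2) + cos(d$y / 3)

# reshape the line-per-cell layout into a raster matrix
xs <- sort(unique(d$x)); ys <- sort(unique(d$y))
zmat <- matrix(NA_real_, length(xs), length(ys))
zmat[cbind(match(d$x, xs), match(d$y, ys))] <- d$z

image(xs, ys, zmat, col = terrain.colors(20),
      xlab = "Longitude", ylab = "Latitude")
points(c(3.5, 7.2), c(2.1, 6.4), pch = 19)   # extra data points on top
```

For scattered (non-gridded) coordinates, an interpolation step (e.g. akima's interp) would be needed first to produce the grid.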

Can anyone advise functions to visualise spatial raster data optimally?

thanks,
Jan

windows R 2.2
library(geoR) 
library(akima) 



Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] distribution maps

2006-01-06 Thread Rogério Rosa da Silva
Dears,

I would like to know if there is a R package(s) on CRAN that can
generate distribution maps  of species.

I think that this issue has not been discussed before, but I did not
search extensively on CRAN or in the help archives.

Best regards

Rogério

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] A comment about R:

2006-01-06 Thread Petr Pikal
Hi

just to difference between matrix and data.frame

> str(data.frame(mat))
`data.frame':   4 obs. of  5 variables:
 $ X1: num  -0.1940 -0.7629  0.0446 -0.5408
 $ X2: num  -1.092 -0.040  1.070  0.868
 $ X3: num  0.634 0.823 0.693 1.152
 $ X4: num   0.0258 -1.6507  1.2052  0.9714
 $ X5: num   0.673  0.380 -1.531 -0.426

> str(mat)
 num [1:4, 1:5] -0.1940 -0.7629  0.0446 -0.5408 -1.0925 ...

matrix is a numeric vector with dim attributes, data frame is matrix 
like structure which can hold different types of variables (columns).

sd is function based on var

> sd
function (x, na.rm = FALSE) 
{
if (is.matrix(x)) 
apply(x, 2, sd, na.rm = na.rm)
else if (is.vector(x)) 
sqrt(var(x, na.rm = na.rm))
else if (is.data.frame(x)) 
sapply(x, sd, na.rm = na.rm)
else sqrt(var(as.vector(x), na.rm = na.rm))
}
<environment: namespace:stats>

and therefore behaves in similar manner for data.frames and matrices,
but mean accepts only data.frames, numeric vectors and dates

Arguments:

   x: An R object.  Currently there are methods for numeric data
  frames, numeric vectors and dates.  A complex vector is
  allowed for 'trim = 0', only.

So therefore matrix is treated as a numeric vector by mean but as a 
set of vectors by sd.

Don't know why.
I believe that it is because with var(matrix) you expect the output to 
be a variance matrix.

Maybe somebody can explain it better.
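The var() side is easy to confirm: on a matrix it returns the covariance matrix of the columns, whose diagonal recovers the per-column standard deviations, which is presumably why sd() has to special-case matrices rather than fall through to sqrt(var(x)). A quick check:

```r
set.seed(42)
mat <- matrix(rnorm(20), 4, 5)

dim(var(mat))          # 5 x 5: var() of a matrix is a covariance matrix
sqrt(diag(var(mat)))   # per-column sd, identical to apply(mat, 2, sd)
```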

If you wanted mean to behave for matrices the same way sd does, you can 
try

mymean <- function(x, na.rm=FALSE)
{
    if(is.matrix(x))
        colMeans(x, na.rm=na.rm)
    else mean(x, na.rm=na.rm)
}

> mymean(mat)
[1] -0.3632682  0.2013843  0.8251625  0.1379205 -0.2259909


HTH
Petr


On 6 Jan 2006 at 16:18, Stefan Eichenberger wrote:

From:   Stefan Eichenberger [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Date sent:  Fri, 6 Jan 2006 16:18:16 +0100
Subject:[R]   A comment about R:

 ~~~
 ... blame me for not having sent below message initially in
 plain text format. Sorry!
 ~~~
 
 I just got into R for most of the Xmas vacations and was about to ask
 for helping  pointer on how to get a hold of R when I came across this
 thread. I've read through  most it and would like to comment from a
 novice user point of view. I've a strong  programming background but
 limited statistical experience and no knowledge on  competing
 packages. I'm working as a senior engineer in electronics.
 
 Yes, the learning curve is steep. Most of the docu is extremely terse.
 Learning is mostly from examples (a wiki was proposed in another
 mail...), documentation uses no graphical elements at all. So, when it
 comes to things like xyplot in lattice: where would I get the concepts
 behind panels, superpanels, and the like?
 
 ok., this is steep and terse, but after a while I'll get over it...
 That's life. The general concept is great, things can be expressed
 very densly: Potential  is here I quickly had 200 lines of my own
 code together, doing what it should -  or so I believed.
 
 Next I did:
   matrix <- matrix(1:100, 10, 10)
   image(matrix)
   locator()
 Great: I can interactively work with my graphs... But then:
   filled.contour(matrix)
   locator()
 Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't
 realize that filled.contour() has a color bar to the right and scales
 x wrongly...
 
 Here is what really shocked me:
 
  str(bar) `data.frame':   206858 obs. of  12 variables:  ...
  str(mean(bar[,6:12]))
   Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
   ...
  str(sd(bar[,6:12]))
   Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
   ...
 prcomp(bar[,6:12]) -> foo
  str(foo$x)
   num [1:206858, 1:7] -0.4187 -0.4015  0.0218 -0.4438 -0.3650 ... ...
  str(mean(foo$x))
   num -1.07e-13
  str(sd(foo$x))
   Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
   ...
 
 So, sd returns a vector independent on whether the arguement is a
 matrix or data.frame, but mean reacts differently and returns a vector
 only against a data.frame?
 
 The problem here is not that this is difficult to learn - the problem
 is the complete absense of a concept. Is a data.frame an 'extended'
 matrix with columns of different types or  something different? Since
 the numeric mean (I expected a vector) is recycled nicely  when used
 in a vector context, this makes debugging code close to impossible.
 Since  sd returns a vector, things like mean + 4*sd vary sufficiently
 across the data elements that I assume working code... I don't get any
 warning signal that something is wrong here.
 
 The point in case is the behavior of locator() on a filled.contour()
 object: Things apparently  have been programmed and debugged from
 example rather than concept.
 
 Now, in another posting I read that all this is a feature to discourge
 inexperienced users from statistics and force you to think before you
 do things. Whilst I support this concept of thinking: 

[R] Can R plot multicolor lines?

2006-01-06 Thread Paul DeBruicker
I have a number of continuous data series I'd like to plot with the
first 2/3 or so of each plotted in one color with the last 1/3 plotted
in another color.

I've thought of plotting 2 lines that abut each other by determining
where the first portion ends and attach the second portion.


Is there a simpler way that i have not thought of or discovered
through the mailing list, Intro to R, or Lattice PDF?

Thanks
Paul

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] installation question/problem

2006-01-06 Thread DW
Hello,

Can anybody tell me why I am getting the error below when I run make 
check and if it has any consequences I may regret later?

I run:

#  ./configure --enable-R-shlib
# make
# make check
# make install


configure, make and make install all work without errors, and it seems 
to install ok, and I even test the R binary after install, so I guess 
it's working. But I want to make sure. I'm not going to be using R, but 
I'm the net admin who has been tasked with installing it on our servers, so 
I don't want any nasty surprises.

I wonder if it's possible that I'm missing libraries because I'm not 
running X on the servers?

This is:
FreeBSD 5.4 p8
R-2.2.1


make check output:

(snip)
.
running code in 'grDevices-Ex.R' ... OK
comparing 'grDevices-Ex.Rout' to 'grDevices-Ex.Rout.prev' ... OK
running code in 'graphics-Ex.R' ... OK
comparing 'graphics-Ex.Rout' to 'graphics-Ex.Rout.prev' ... OK
running code in 'stats-Ex.R' ...*** Error code 1

Stop in /usr/home/dwinner/tmp/R-2.2.1/tests/Examples.
*** Error code 1

Stop in /usr/home/dwinner/tmp/R-2.2.1/tests/Examples.
*** Error code 1

Stop in /usr/home/dwinner/tmp/R-2.2.1/tests.
*** Error code 1

Stop in /usr/home/dwinner/tmp/R-2.2.1/tests.
*** Error code 1

Stop in /usr/home/dwinner/tmp/R-2.2.1.



Thanks for any info,
DW

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Can R plot multicolor lines?

2006-01-06 Thread Petr Pikal
Hi

one way is to use segments:

x <- rnorm(200)
plot(1:200, x, type="n")
segments(1:199, x[1:199], 2:200, x[2:200],
         col=c(rep(1,150), rep(2,50)))
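An alternative sketch of the approach the original poster described: two lines() calls that share the boundary point, so the two pieces abut without a gap:

```r
x <- 1:200
y <- rnorm(200)
cut <- 133                                  # roughly the first 2/3

plot(x, y, type = "n")
lines(x[1:cut], y[1:cut], col = "blue")
lines(x[cut:200], y[cut:200], col = "red")  # starts at point `cut`, so no gap
```

The segments() version colors per segment; the two-lines version is simpler when there is just one change point.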

HTH
Petr


On 6 Jan 2006 at 12:28, Paul DeBruicker wrote:

Date sent:  Fri, 6 Jan 2006 12:28:36 -0500
From:   Paul DeBruicker [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Subject:[R] Can R plot multicolor lines?

 I have a number of continuous data series I'd like to plot with the
 first 2/3 or so of each plotted in one color with the last 1/3 plotted
 in another color.
 
 I've thought of plotting 2 lines that abut each other by determining
 where the first portion ends and attach the second portion.
 
 
 Is there a simpler way that i have not thought of or discovered
 through the mailing list, Intro to R, or Lattice PDF?
 
 Thanks
 Paul
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] installation question/problem

2006-01-06 Thread Peter Dalgaard
DW [EMAIL PROTECTED] writes:

 Hello,
 
 Can anybody tell me why I am getting the error below when I run make 
 check and if it has any consequences I may regret later?
 
 I run:
 
 #  ./configure --enable-R-shlib
 # make
 # make check
 # make install
 
 
 configure, make and make install all work without errors, and it seems 
 to install ok, and I even test the R binary after install, so I guess 
 it's working. But I want to make sure. I'm not going to be using R, but 
 I'm the net admin who has been tasked with installing it on our servers, so 
 I don't want any nasty surprises.
 
 I wonder if it's possible that I'm missing libraries because I'm not 
 running X on the servers?
 
 This is:
 FreeBSD 5.4 p8
 R-2.2.1
 
 
 make check output:
 
 (snip)
 .
 running code in 'grDevices-Ex.R' ... OK
 comparing 'grDevices-Ex.Rout' to 'grDevices-Ex.Rout.prev' ... OK
 running code in 'graphics-Ex.R' ... OK
 comparing 'graphics-Ex.Rout' to 'graphics-Ex.Rout.prev' ... OK
 running code in 'stats-Ex.R' ...*** Error code 1
 
Ouch.

Please look for stats-Ex.Rout.fail and tell us what is in it (you
should find it in tests/Examples in your builddir; interesting stuff
should be towards the end of the file).

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Problem with Integral of Indicator Function

2006-01-06 Thread Cal Stats
 Hi..
  
 I was trying to integrate the indicator function, but had problems when 
the limits were negative or equal to the indicator condition.
  
  my function is 
  
  fun1 <- function(x){
  as.numeric(x >= 2) 
  }
  _
  
  which should be   Ind(x >= 2)*x
  
  seems to work for the following two cases
  
   > integrate(fun1, 3, 5)
  2 with absolute error < 2.2e-14
  
   > integrate(fun1, 5, 100)
  95 with absolute error < 1.1e-12
  --
   Does not work for the following:
  
   > integrate(fun1, 0, 2)
  0 with absolute error < 0   (I was expecting 2)
  
   > integrate(fun1, -1, 5)
  3 with absolute error < 3.3e-14   (I was expecting 5)
  
   > integrate(fun1, -2, 5)
  3 with absolute error < 5.3e-15   (I was expecting 5)
  
  Any suggestions?
  
  Thanks.
  
  Harsh,
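[An editorial note, not part of the thread: integrate() samples the
integrand adaptively and can step right over a jump, so results on
ranges containing a discontinuity are unreliable in general. A common
workaround is to split the range at the known jump, assuming the
intended condition is x >= 2:]

    # Indicator integrand, vectorized as integrate() requires
    fun1 <- function(x) as.numeric(x >= 2)

    # Each sub-range is smooth, so adaptive quadrature handles it well;
    # the pieces sum to the integral over the whole range [-1, 5].
    integrate(fun1, -1, 2)$value + integrate(fun1, 2, 5)$value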
  
  
  


-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] help with strip.default

2006-01-06 Thread Deepayan Sarkar
On 1/6/06, Berton Gunter [EMAIL PROTECTED] wrote:
 Steve:

 This is a question for **super Deepayan,** and hopefully he'll respond.

 However, in the interim, let me give it a shot. Basically, I think what
 you've asked for falls outside the bounds of what lattice is designed to do.
 But I think there's a simple way to fool it. Basically what you need to do
 is to combine your two factors into one with level names and ordering as you
 want. See ?factor (?ordered may also be useful, but you don't need it). For
 example:

 comb.factor=factor(paste(A,B,sep='.'))

That's what I would have suggested. I would recommend using
interaction() instead of paste(), since it is designed for this and is
presumably more efficient (not that it matters in this small example).
For the record, the 'layout' and 'skip' arguments (of xyplot etc) are
often useful in conjunction with this sort of use.
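[A minimal sketch of the combined-factor approach, with hypothetical toy
data standing in for Steve's; the data frame and variable names are
illustrative only:]

    library(lattice)

    # Toy data: 3 levels of A; 4 B-levels nested within each A, with the
    # B-level names depending on A (Y uses 5 instead of 4).
    d <- data.frame(A  = rep(c("X", "Y", "Z"), each = 40),
                    B  = rep(c(1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 3, 4),
                             each = 10),
                    IV = rep(1:10, 12))
    d$DV <- d$IV + rnorm(120)

    # One conditioning factor built from both; drop = TRUE discards the
    # A:B combinations that never occur, leaving 12 panels.
    xyplot(DV ~ IV | interaction(B, A, drop = TRUE), data = d,
           layout = c(4, 3))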

Deepayan

 As I said, you may have to reorder the levels from the default that factor()
 gives you to get your panels to display the way you want. Also see the
 perm.cond and index.cond arguments of xyplot, which might also suffice for
 that purpose.

 Again, Deepayan will hopefully suggest a cleverer way that I missed. But I
 think this approach will get you what you want.

 Cheers,
 Bert

 -- Bert Gunter
 Genentech Non-Clinical Statistics
 South San Francisco, CA

 The business of the statistician is to catalyze the scientific learning
 process.  - George E. P. Box



  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Steven Lacey
  Sent: Friday, January 06, 2006 8:20 AM
  To: r-help@stat.math.ethz.ch
  Subject: [R] help with strip.default
 
  Hi,
 
  I am creating a multi-conditioned trellis plot. My data look
  something like
  this:
 
  Factor AFactor BIVDV
  X   1
  X   2
  X   3
  X   4
  Y   1
  Y   2
  Y   3
  Y   5
  Z   1
  Z   2
  Z   3
  Z   4
 
  In one sense these data are suitable for trellis because for
  every level of
  factor A there are four levels of factor B. However, the
  names of the factor
  B levels depend on the level of factor A.
 
  How would I create a 3 x 4 trellis plot where each panel is a
  combination of
  factor A and factor B where the names of factor B are
  preserved and the
  strip has two levels, one for factor A and another for factor B?
 
  This was more difficult than I thought because trellis wants
  to generate 15
  panels, as there are 3 levels of factor A and 5 levels of
  factor B. But
  these 5 levels of factor B are in name only. There are only 4
  different
  levels of factor B for each level of factor A.
 
  As a work around I am considering renaming the levels in
  factor B from 1 to
  4 for all levels of factor A. Then, write a custom
  strip.default to specify
  the names. However, I am not sure how to write this function.
  Would someone
  help me get started?
 
  Thanks,
  Steve

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] installation question/problem

2006-01-06 Thread DW
Peter Dalgaard wrote:

DW [EMAIL PROTECTED] writes:

  

Hello,

Can anybody tell me why I am getting the error below when I run make 
check and if it has any consequences I may regret later?

I run:

#  ./configure --enable-R-shlib
# make
# make check
# make install


configure, make and make install all work without errors, and it seems 
to install ok, and I even test the R binary after install, so I guess 
it's working. But I want to make sure. I'm not going to be using R, but 
I'm the net admin who has been tasked with installing it on our servers, so 
I don't want any nasty surprises.

I wonder if it's possible that I'm missing libraries because I'm not 
running X on the servers?

This is:
FreeBSD 5.4 p8
R-2.2.1


make check output:

(snip)
.
running code in 'grDevices-Ex.R' ... OK
comparing 'grDevices-Ex.Rout' to 'grDevices-Ex.Rout.prev' ... OK
running code in 'graphics-Ex.R' ... OK
comparing 'graphics-Ex.Rout' to 'graphics-Ex.Rout.prev' ... OK
running code in 'stats-Ex.R' ...*** Error code 1


 
Ouch.

Please look for stats-Ex.Rout.fail and tell us what is in it (you
should find it in tests/Examples in your builddir; interesting stuff
should be towards the end of the file).

  

Here is what I found:

  ## using the nl2sol algorithm
  fm4DNase1 <- nls( density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
+   data = DNase1,
+   start = list(Asym = 3, xmid = 0, scal = 1),
+   trace = TRUE, algorithm = "port")
  0  0.0:  3.0  0.0  1.0
Error in nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)), data = 
DNase1,  :
Convergence failure: See PORT documentation.  Code (27)
Execution halted


Thanks,
DW

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] distribution maps

2006-01-06 Thread Roger Bivand
On Fri, 6 Jan 2006, Rogério Rosa da Silva wrote:

 Dears,
 
 I would like to know if there is a R package(s) on CRAN that can
 generate distribution maps  of species.
 
 I think that this issue has not been discussed, but I did not search
 extensively on CRAN or the help archives.

Could I suggest the Spatial and Environmetrics Task Views reached from 
the Task View item in the navigation bar on CRAN? You may also find the 
R-sig-geo mailing list a useful place to make your question a little more 
detailed - you do not say anything about your data, and a helpful reply 
would depend on knowing that.

 
 Best regards
 
 Rogério
 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Got it--Re:A question on summation of functions

2006-01-06 Thread Liqiu Jiang
Dear Rers,
 It seems the usual sum function can work. Anyway, I
appreciate your time on this. 

Best wishes,
Liqiu 




Dear Rers,
 I am trying to do a two-dimensional integration of a
function. The function is a summation of another
function evaluated at a series of vector values. I
am having difficulty coding this function. 
 
For example: I have function f which is a bivariate
normal density function:


#define some constants
err <- 0.5
m <- 5
times <- seq(0, m-1)
rou <- sum(times)/sqrt(m*sum(times^2))
sig.w <- sqrt(m*err)
sig.wt <- sqrt(sum(times^2)*err)


#bivariate normal density 
f <- function(x, y, u.x, u.y)
exp(-((x-u.x)^2/sig.w^2+(y-u.y)^2/sig.wt^2-2*rou*(x-u.x)*(y-u.y)/(sig.w*sig.wt))/(2*(1-rou^2)))/(2*pi*sig.w*sig.wt*sqrt(1-rou^2))

###

I would like to have a function g which is defined as
##
uw <- 1:n
uwt <- (n+1):(2*n)

g <- function(x, y) f(x, y, uw[1], uwt[1]) + f(x, y, uw[2],
uwt[2]) +
... + f(x, y, uw[n], uwt[n])
###
If n is very large, I am not able to write them all
down. How can I code the function g? Thank you for
your consideration. 
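[A sketch of the sum-based definition the follow-up message alludes to,
assuming f, uw, and uwt as defined above:]

    # g(x, y) = sum over i of f(x, y, uw[i], uwt[i]),
    # without writing out n terms by hand:
    g <- function(x, y)
        sum(mapply(function(ux, uy) f(x, y, ux, uy), uw, uwt))

    # For 2-D integration routines that expect vectorized arguments,
    # wrap it with Vectorize():
    gv <- Vectorize(g)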

Best wishes,
Liqiu

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Daylight Savings Time unknown in R-2.2.1

2006-01-06 Thread Brahm, David
Under R-2.2.1, a POSIXlt date created with strptime has an unknown
Daylight Savings Time flag:

 > strptime("20051208", "%Y%m%d")$isdst
[1] -1

This is true on both Linux (details below) and Windows.  It did not
occur under R-2.1.0.  Any ideas?  TIA!


 > Sys.getenv("TZ")
TZ 
"" 

Version:
 platform = i686-pc-linux-gnu
 arch = i686
 os = linux-gnu
 system = i686, linux-gnu
 status = 
 major = 2
 minor = 2.1
 year = 2005
 month = 12
 day = 20
 svn rev = 36812
 language = R

Locale:
C

Search Path:
 .GlobalEnv, package:methods, package:stats, package:graphics,
package:grDevices, package:utils, package:datasets, Autoloads,
package:base

-- David Brahm ([EMAIL PROTECTED])

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Installing Task Views

2006-01-06 Thread Mark Andersen
Hello,

 

I am just beginning to use R, after several years of using S-Plus (with
mixed success). I saw a recommendation on another mailing list for the
Environmetrics and Spatial Task Views, as a good way for a new user to get
started actually using R. The Task Views page at CRAN says:

To automatically install these views, the ctv package needs to be
installed, e.g., via
install.packages("ctv")
and then the views can be installed via install.views
(after loading ctv), e.g.,
install.views("Econometrics")

I have installed ctv and I'm assuming that loading ctv means entering 

 

library(ctv)

 

If I then enter

 

 install.views("Environmetrics")

 

I get the error message 

 

Warning message:

CRAN task view "Environmetrics" not available in:
install.views("Environmetrics")

If I then go up to the Packages menu and Set CRAN Mirror to, for example,
USA (CA 2) and again enter

 install.views("Environmetrics")

I now get the error message

Error in install.packages(pkgs, CRAN = views[[i]]$repository, dependencies
= dependencies,  : 

unused argument(s) (CRAN ...)

 

Entering CRAN.views in the R GUI does indeed give a complete list of task
view names, topics, maintainers, and repositories. Selecting different CRAN
mirrors produces the same error messages as above, as does attempting to
install a different task view.

 

I have searched the R-help mailing list archive and found several postings
announcing the availability of task views, but not on how to install them. I
have searched the pdf manuals, and found no instances of "task view". I have
also searched the FAQs. The help for install.views basically repeats the
information on the CRAN site, providing information on using the function,
but not on actually installing task views. The example in the article on
task views in the May 2005 issue of R News uses the "lib =" argument, which
is not mentioned in the help for install.views. It appears that the
instructions at CRAN for installing task views are missing at least one
step. Can anyone point me to a reliable set of instructions for installing
(not to mention actually using) a task view? Many thanks in advance.

 

Regards,

Mark C. Andersen

 

Dr. Mark C. Andersen

Associate Professor

Department of Fishery and Wildlife Sciences

New Mexico State University

Las Cruces NM 88003-0003

phone: 505-646-8034

fax: 505-646-1281

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wikis etc.

2006-01-06 Thread paul sorenson
I am a fan of wikis and I reckon one would really help with making R 
more accessible.  On one extreme you have this email list and on the 
other extreme you have R News and the PDFs on CRAN.  A wiki might hit 
the spot between them and reduce the traffic on the email list.


Frank E Harrell Jr wrote:
 I feel that as long as people continue to provide help on r-help wikis 
 will not be successful.  I think we need to move to a central wiki or 
 discussion board and to move away from e-mail.  People are extremely 
 helpful, but e-mail seems to always be memory-less, and messages get 
 too long without factorization of old text.  R-help is now too active, 
 and too many new users are asking questions asked dozens of times, for 
 e-mail to be effective.
 
 The wiki also needs to collect and organize example code, especially for 
 data manipulation.  I think that new users would profit immensely from a 
 compendium of examples.
 
 Just my .02 Euros
 
 Frank

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-06 Thread Wensui Liu
RG,

I think the .import command in SQLite should work. Plus, SQLite Browser (
http://sqlitebrowser.sourceforge.net) might do the job as well.

On 1/6/06, ronggui [EMAIL PROTECTED] wrote:

 Can you give me some hints, or let me know how to do it?

 Thank you !

 2006/1/6, Wensui Liu [EMAIL PROTECTED]:
  RG,
 
   Actually, SQLite provides a way to read a *.csv file directly into the
 db.
 
   Just for your consideration.
 
 
  On 1/5/06, ronggui [EMAIL PROTECTED] wrote:
   2006/1/6, jim holtman [EMAIL PROTECTED]:
If what you are reading in is numeric data, then it would require
 (807 *
118519 * 8) ≈ 760MB just to store a single copy of the object -- more
  memory
than you have on your computer.  If you were reading it in, then the
  problem
is the paging that was occurring.
   In fact, if I read it in 3 pieces, each is about 170M.
  
   
You have to look at storing this in a database and working on a
 subset
  of
the data.  Do you really need to have all 807 variables in memory at
 the
same time?
  
   Yip, I don't need all the variables. But I don't know how to get the
   necessary variables into R.
  
   At last I read the data in pieces and used the RSQLite package to write it
   to a database, and then do the analysis. If I were familiar with
   database software, using a database (and R) would be the best choice, but
   converting the file into database format is not an easy job for me. I asked
   for help on the SQLite list, but the solution was not satisfying, as it
   required knowledge of a third scripting language. After searching
   the internet, I got this solution:
  
   #begin
   rm(list=ls())
   f <- file("D:\\wvsevs_sb_v4.csv", "r")
   i <- 0
   done <- FALSE
   library(RSQLite)
   con <- dbConnect(SQLite(), "c:\\sqlite\\database.db3")
   tim1 <- Sys.time()
  
   while(!done){
   i <- i+1
   tt <- readLines(f, 2500)
   if (length(tt) < 2500) done <- TRUE
   tt <- textConnection(tt)
   if (i==1) {
  assign("dat", read.table(tt, head=T, sep=",", quote=""))
}
   else assign("dat", read.table(tt, head=F, sep=",", quote=""))
   close(tt)
   ifelse(dbExistsTable(con, "wvs"),
  dbWriteTable(con, "wvs", dat, append=T),
 dbWriteTable(con, "wvs", dat) )
   }
   close(f)
   #end
   It's not the best solution, but it works.
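[An editorial aside, not from the thread: since only some of the 807
variables are needed, read.table can drop the rest at read time via
colClasses = "NULL", which shrinks memory use considerably. The file
name and column indices below are hypothetical:]

    # Keep only columns 1, 5, and 12 of a wide CSV; "NULL" skips a column.
    classes <- rep("NULL", 807)
    classes[c(1, 5, 12)] <- NA        # NA lets read.table guess the type
    dat <- read.table("bigfile.csv", header = TRUE, sep = ",",
                      colClasses = classes)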
  
  
  
If you use 'scan', you could specify that you do not want some of
 the
 variables read in, so it might make a more reasonably sized object.
   
   
On 1/5/06, François Pinard  [EMAIL PROTECTED] wrote:
 [ronggui]

 R is weak when handling large data files.  I have a data file: 807
  vars,
 118519 obs., and it is in CSV format.  Stata can read it in in 2
 minutes, but
  on
 my PC R almost cannot handle it. My PC's CPU: 1.7G; RAM: 512M.

 Just (another) thought.  I used to use SPSS, many, many years ago,
 on
 CDC machines, where the CPU had limited memory and no kind of
 paging
 architecture.  Files did not need to be very large for being too
  large.

 SPSS had a feature that was then useful, about the capability of
 sampling a big dataset directly at file read time, quite before
 processing starts.  Maybe something similar could help in R (that
 is,
 instead of reading the whole data in memory, _then_ sampling it.)

 One can read records from a file, up to a preset amount of
 them.  If
  the
 file happens to contain more records than that preset number (the
  number
 of records in the whole file is not known beforehand), already
 read
 records may be dropped at random and replaced by other records
 coming
 from the file being read.  If the random selection algorithm is
  properly
 chosen, it can be made so that all records in the original file
 have
 equal probability of being kept in the final subset.
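[The read-time sampling François describes is classic reservoir
sampling; a minimal R sketch (editorial, not from the thread — the
function name is made up):]

    # Keep a uniform random sample of k lines from a connection
    # whose total length is not known in advance.
    reservoir_sample <- function(con, k) {
        res <- character(k)
        n <- 0
        repeat {
            line <- readLines(con, n = 1)
            if (length(line) == 0) break      # end of file
            n <- n + 1
            if (n <= k) {
                res[n] <- line                # fill the reservoir first
            } else if (runif(1) < k / n) {    # keep with probability k/n
                res[sample.int(k, 1)] <- line # evict a random occupant
            }
        }
        res[seq_len(min(n, k))]
    }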

 If such a sampling facility was built right within usual R reading
 routines (triggered by an extra argument, say), it could offer
 a compromise for processing large files, and also sometimes
 accelerate
 computations for big problems, even when memory is not at stake.

 --
 François Pinard   http://pinard.progiciels-bpi.ca

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

   
   
   
--
Jim Holtman
Cincinnati, OH
+1 513 247 0281
   
What is the problem you are trying to solve?
  
  
   --
   黄荣贵
   Department of Sociology
   Fudan University
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide!
  http://www.R-project.org/posting-guide.html
 
 
 
  --
  WenSui Liu
  (http://statcompute.blogspot.com)
  Senior Decision Support Analyst
  Health Policy and Clinical Effectiveness
  Cincinnati Children Hospital Medical Center
 


 --
 黄荣贵
 Department of Sociology
 Fudan University

Re: [R] LOCFIT help

2006-01-06 Thread Takatsugu Kobayashi
Hi,

I have started to learn local regression models so as to identify
statistically significant peaks in urban areas, such as population
densities and congestion. I successfully ran locfit and got quite a bit of
information on the fit. Now I am stuck. This is a very silly question,
but shouldn't the first derivative of the fitted curve at a peak be zero or
close to zero? I got some very high numbers for it.

The commands I put are:

x <- Longitude
y <- Latitude
model.local <- locfit(log(POPDENSITY) ~ lp(x, y, nn=0.55))

The span was determined by spgwr's adaptive bandwidth.

Any help appreciated.

Thank you very much.

Taka

PhD student
Indiana University, Geography

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html