Re: [R] creating 'all' sum contrasts

2010-10-15 Thread Berwin A Turlach
G'day Michael,

On Fri, 15 Oct 2010 12:09:07 +0100
Michael Hopkins hopk...@upstreamsystems.com wrote:

 OK, my last question didn't get any replies so I am going to try and
 ask a different way.
 
 When I generate contrasts with contr.sum() for a 3 level categorical
 variable I get the 2 orthogonal contrasts:
 
  contr.sum( c(1,2,3) )
     [,1] [,2]
  1    1    0
  2    0    1
  3   -1   -1

These two contrasts are *not* orthogonal.

 This provides the contrasts 1-3 and 2-3 as expected.  But I also
 want it to create 1-2 (i.e. 1-3 - 2-3).  So in general I want
 all possible orthogonal contrasts - think of it as the contrasts for
 all pairwise comparisons between the levels.

You have to decide what you want.  The contrasts for all pairwise
comparisons between the levels or all possible orthogonal contrasts?

The latter is actually not well defined.  For a factor with p levels,
a full set has p-1 orthogonal contrasts, and such a set is only
identifiable up to rotation, hence there are infinitely many such
sets.  But there are p(p-1)/2 pairwise comparisons.  So unless p=2,
you have to decide what you want.

 Are there are any options for contrast() or other functions/libraries
 that will allow me to do this automatically?

Look at the package multcomp, in particular the functions glht() and
mcp(); these might help.
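A sketch of what that could look like (hypothetical data; glht() with
mcp(..., "Tukey") sets up all pairwise comparisons):

```r
library(multcomp)

## Hypothetical one-way layout with a 3-level factor.
set.seed(42)
d   <- data.frame(f = gl(3, 10), y = rnorm(30))
fit <- aov(y ~ f, data = d)

## All pairwise comparisons between the levels of f:
## "Tukey" contrasts build the comparisons 2-1, 3-1 and 3-2.
summary(glht(fit, linfct = mcp(f = "Tukey")))
```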

Cheers,

Berwin

== Full address 
Berwin A Turlach                     Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)           +61 (8) 6488 3383 (self)
The University of Western Australia  FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                    e-mail: ber...@maths.uwa.edu.au
Australia                          http://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-Fortran question (multiple subroutines)

2010-10-25 Thread Berwin A Turlach
G'day Remko,

On Mon, 25 Oct 2010 15:33:30 +1100
Remko Duursma remkoduur...@gmail.com wrote:

 apologies if this is somewhere in a manual, I have not been able to
 find anything relevant. 

You probably have to set some appropriate flags, and that information
should be somewhere in the manual of your compiler.  "Writing R
Extensions" will tell you how to change/specify flags for compilers.

 I run Windows Vista.

My condolences. :)

 But is it possible to have more than one subroutine in my source file,
 one depending on the other? 

Yes.

 I.e, my code looks something like this:
 
 subroutine f(x,y,z)
 
 call g(x,y,z)
 
 end
 
 subroutine g(x,y,z)
 
 z = x*y
 
 end
 
 calling this from R shows that subroutine g is not called. 

What do you mean with "g is not called"?  How did you assert this?

If the code functions correctly, then g has to be called, either
explicitly or implicitly (e.g. if it was inlined).

Do you mean that when you compile the code and load the resulting DLL
from R, you can only call subroutine f via .Fortran but not subroutine
g? 

 The code compiled as executable works fine.

What do you mean with "compiled as executable"?  The snippet that you
showed would not compile as an executable as there is no main program.
So what do you mean with "works fine"?  That you can call the
subroutine g from the main program that you use when creating an
executable?

My guess, based on this snippet, is that your compiler notices that
subroutine g is not really needed and replaces the call to g within f
by the body of g, without creating a separate entry point for g in
the compiled object; a process known as in-lining (IIRC) and typically
used by compilers if high levels of optimisation are requested.  The
compiler does not know that you intend to call g later directly from R
via .Fortran; all it sees is the code in the file and, if high
optimisation is requested, it may feel free to rearrange the code to
streamline it (e.g. by in-lining).

So possible solutions could be:

1) ask for a lower level of optimisation 
2) tell the compiler not to do in-lining (flag --no-inline??)
3) put the routines into separate files, compile the files separately
   and then link all the resulting object files together into a DLL.
   AFAIK, optimising/inlining across separate files is a tricky issue
   so few, if any, compilers would do so.
4) Check whether there is an option to specify which exportable
   symbols have to be in the DLL in the end.
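As a rough sketch of option 3 (file names are hypothetical, and the
exact flags depend on the toolchain), compiling the two routines
separately and then linking them into one DLL could look like:

```sh
## Put f and g in separate files, compile each to an object file,
## then link the object files into a single shared library / DLL.
gfortran -c f.f -o f.o
gfortran -c g.f -o g.o
gfortran -shared f.o g.o -o mycode.dll

## Or let R drive the compiler; SHLIB compiles each file separately
## before linking them together.
R CMD SHLIB -o mycode.dll f.f g.f
```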

Cheers,

Berwin




Re: [R] R-Fortran question (multiple subroutines)

2010-10-25 Thread Berwin A Turlach
G'day all,

On Mon, 25 Oct 2010 06:52:15 -0400
Duncan Murdoch murdoch.dun...@gmail.com wrote:

 Remko Duursma wrote:
[...]
  I.e, my code looks something like this:
  
  subroutine f(x,y,z)
  
  call g(x,y,z)
  
  end
  
  subroutine g(x,y,z)
  
  z = x*y
  
  end
  
  
  calling this from R shows that subroutine g is not called. The code
  compiled as executable works fine.
 
 There are no such limitations imposed by R.  I'd suggest your
 diagnosis of the problem is wrong.  If you can't spot the problem,
 please post a real example (simplified if possible, but not as much
 as the one above).

Actually, it turns out that this example is simplified enough. :)

I put this snippet into a file, compiled it via R CMD SHLIB, loaded
it into R, and then was very surprised by the result of
.Fortran("f", x=1.1, y=2.2, z=0.0).

Eventually it dawned on me: Remko needs to read the table in section
5.2 of "Writing R Extensions".  The simple fix in this case is to put
a "double precision x, y, z" declaration at the beginning of each
subroutine.  The default precision in FORTRAN is not double precision;
that is why the code seems to work when compiled stand-alone but not
when called from R.

In general, I would recommend starting each subroutine with "implicit
none" (o.k., I know that this is not standard FORTRAN 77, but most
compilers support that construct; the alternative that would conform
with the FORTRAN 77 standard is to declare everything to be implicitly
of type character and then let the fun begin) and then explicitly
declaring all variables to be of the appropriate type.
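For completeness, the repaired snippet might look like this (a sketch;
the key point is the explicit double precision declarations so the
arguments match what R's .Fortran passes):

```fortran
      subroutine f(x,y,z)
      implicit none
      double precision x, y, z
      call g(x,y,z)
      end

      subroutine g(x,y,z)
      implicit none
      double precision x, y, z
      z = x*y
      end
```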

HTH.

Cheers,

Berwin



Re: [R] cube root of a negative number

2010-10-26 Thread Berwin A Turlach
G'day Gregory,

On Tue, 26 Oct 2010 19:05:03 -0400
Gregory Ryslik rsa...@comcast.net wrote:

 Hi,
 
 This might be me missing something painfully obvious but why does the
 cube root of the following produce an NaN?
 
  (-4)^(1/3)
 [1] NaN

1/3 is not exactly representable as a binary number.  My guess is that
the representable number closest to 1/3 cannot be used as the
exponent for negative numbers, hence the NaN.

Essentially, don't expect finite precision arithmetic to behave like
infinite precision arithmetic; it just doesn't.  The resources
mentioned in FAQ 7.31 can probably shed more light on this issue.
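If the real-valued cube root is what is wanted, a common workaround
(a sketch, not from the original thread) is to take the root of the
absolute value and restore the sign:

```r
## Real cube root of a possibly negative number: R's ^ returns NaN for
## negative bases with non-integer exponents, so work on |x| and
## reattach the sign afterwards.
cuberoot <- function(x) sign(x) * abs(x)^(1/3)

cuberoot(-4)   # approximately -1.587401
```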

Cheers,

Berwin




Re: [R] cube root of a negative number

2010-10-26 Thread Berwin A Turlach
G'day Bill,

On Wed, 27 Oct 2010 10:34:27 +1100
bill.venab...@csiro.au wrote:

[...]
 It is no surprise that this does not work when working in the real
 domain, except by fluke with something like 
 
  -4^(1/3)
 [1] -1.587401
  
 
 where the precedence of the operators is not what you might expect.
 Now that could be considered a bug, since apparently
 
  -4^(1/2)
 [1] -2
 
 which comes as rather a surprise!
[...]

Mate, you must have been using Excel (or bc) too much recently if this
takes you by surprise. :)

This is exactly the behaviour that I would expect, as I was always
taught that exponentiation is the operator with the highest priority.

The discussion at
http://en.wikipedia.org/wiki/Order_of_operations
points out that there exist differing conventions concerning the unary
operator "-", but gives only Excel and bc as examples of
programs/languages that give the unary operator the higher precedence.
Moreover,
http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
points out that in Excel the precedence of unary minus differs from
that in many other languages (including VBA, which is often used with
Excel).
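A quick sketch of the two parses in R:

```r
## Unary minus binds less tightly than ^, so the first expression is
## parsed as -(4^(1/2)); only the second raises -4 to the power 1/2.
-4^(1/2)     # -2
(-4)^(1/2)   # NaN (with a warning)
```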

Cheers,

Berwin



Re: [R] what's wrong with this code?

2010-10-29 Thread Berwin A Turlach
G'day Jose,

On Fri, 29 Oct 2010 11:25:05 +0200
José Manuel Gavilán Ruiz g...@us.es wrote:

 Hello,  I  want  to  maximize  a  likelihood  function expressed as an
 integral  that  can not be symbolically evaluated. I expose my problem
 in a reduced form.
 
 g <- function(x){
 integrand <- function(y) {exp(-x^2)*y}
 g <- integrate(integrand,0,1)
 }
 h <- function(x) log((g(x)))
 
 g is an object of class "function", but g(2) is an "integrate"
 object; I can print(g(2)) and get a result, but
 if I try h(2), R says "non-numeric argument to mathematical
 function".

R print(g(2))
0.00915782 with absolute error < 1.0e-16

Indeed, print(g(2)) gives an output, but it is obviously not just a
number; it is something formatted, presumably based on the class
of the object returned by g(2) and the values that this object
contains.  (You may want to read up on R's ways of object-oriented
programming.)

So what does g() return?

R str(g(2))
List of 5
 $ value       : num 0.00916
 $ abs.error   : num 1.02e-16
 $ subdivisions: int 1
 $ message     : chr "OK"
 $ call        : language integrate(f = integrand, lower = 0, upper = 1)
 - attr(*, "class")= chr "integrate"

The object is a list with several components, of class "integrate".

 My goal is to maximize h. What's wrong?

I guess you want to pass only the "value" component to h().

R g <- function(x){
+ integrand <- function(y) {exp(-x^2)*y}
+ integrate(integrand, 0, 1)$value}
R g(2)
[1] 0.00915782
R h(g(2))
[1] -0.693231
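With g() returning the "value" component, the maximisation itself can
then be sketched with optimize() (this part is an illustration, not
from the original exchange):

```r
## g() integrates exp(-x^2)*y over y in [0,1]; h() is its log.
g <- function(x) {
  integrand <- function(y) exp(-x^2) * y
  integrate(integrand, 0, 1)$value
}
h <- function(x) log(g(x))

## Maximise h over an interval that surely contains the maximum.
optimize(h, interval = c(-5, 5), maximum = TRUE)
## $maximum is close to 0, $objective close to log(1/2)
```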

HTH,

Cheers,

Berwin



Re: [R] what's wrong with this code?

2010-10-29 Thread Berwin A Turlach
G'day Jose,

On Fri, 29 Oct 2010 11:25:05 +0200
José Manuel Gavilán Ruiz g...@us.es wrote:

 Hello,  I  want  to  maximize  a  likelihood  function expressed as an
 integral  that  can not be symbolically evaluated. I expose my problem
 in a reduced form.
 
 g <- function(x){
 integrand <- function(y) {exp(-x^2)*y}
 g <- integrate(integrand,0,1)
 }
 h <- function(x) log((g(x)))
 
 g is an object of class "function", but g(2) is an "integrate"
 object; I can print(g(2)) and get a result, but
 if I try h(2), R says "non-numeric argument to mathematical
 function". My goal is to maximize h.

Which can be done quite trivially without R.

The result of g(x) is exp(-x^2)/2,
so h(x) is -x^2 - log(2), which is maximised at x=0.

Cheers,

Berwin



Re: [R] R version 2-12.0 - running as 32 or as 64 bit?

2010-10-29 Thread Berwin A Turlach
G'day Dimitri,

On Fri, 29 Oct 2010 16:45:00 -0400
Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:

 Question: I installed R verison 2-12.0 on my Windows 7 (64 bit) PC.
 When I was installing it, it did not ask me anything about 32 vs. 64
 bit. So, if I run R now - is it running as a 32-bit or a 64-bit?

Well, when I did the same, I got two shortcuts installed on my desktop,
one named "R 2.12.0" and the other named "R x64 2.12.0".  If I had any
doubts about which version of R these shortcuts start, the fourth
line of the start-up message would put them to rest.  Missing that
message, you can always issue the command

 .Machine$sizeof.pointer

If the answer is 4, you are running 32-bit; if the answer is 8, you
are running 64-bit.

HTH.

Cheers,

Berwin




Re: [R] DBLEPR?

2010-11-16 Thread Berwin A Turlach
G'day John,

On Tue, 16 Nov 2010 14:02:57 -0500
Prof. John C Nash nas...@uottawa.ca wrote:

 Are the xxxPR routines now deprecated (particularly for 64 bit
 systems) or still OK to use?  

They are still OK to use, and I use them occasionally. 

 If OK, can anyone point to documentation and examples?

Section 6.5.1 "Printing from Fortran" in the "Writing R Extensions"
manual has documentation (but no examples).  Luckily, Section 5.7
"Fortran I/O", the section that I always look at first, has a link to
Section 6.5.1. :)
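From memory (so treat this as a sketch and verify it against the
manual), the calling convention takes a label, the label length (or -1
to use the whole string), the data, and the number of elements:

```fortran
      subroutine demo(x, n)
      implicit none
      integer n
      double precision x(n)
c     print a double precision vector and an integer from Fortran
      call dblepr('x is', -1, x, n)
      call intpr('n is', -1, n, 1)
      end
```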

Cheers,

Berwin




Re: [R] Namibia becoming NA

2010-07-18 Thread Berwin A Turlach
G'day Ted,

On Sun, 18 Jul 2010 09:25:09 +0100 (BST)
(Ted Harding) ted.hard...@manchester.ac.uk wrote:

 On 18-Jul-10 05:47:03, Suresh Singh wrote:
  I have a data file in which one of the columns is country code and
  NA is the
  code for Namibia.
  When I read the data file using read.csv, NA for Namibia is being
  treated as
  null or NA
  
  How can I prevent this from happening?
  
  I tried the following but it didn't work
  input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))
  
  thanks,
  Suresh
 
 I suppose this was bound to happen, and in my view it represents
 a bit of a mess! With a test file temp.csv:
 
   Code,Country
   DE,Germany
   IT,Italy
   NA,Namibia
   FR,France

Thanks for providing an example.

 leads to exactly the same result. And I have tried every variation
 I can think of of as.is and colClasses, still with exactly the
 same result!

Did you think of trying some variations of na.strings? ;-) 

IMO, the simplest way of coding missing values in CSV files is to have
two consecutive commas; not some code (whether NA, 99, 999, -1, ...)
between them.

 Conclusion: If an entry in a data file is intended to become the
 character value NA, there seems to be no way of reading it in
 directly. This should not be so: it should be preventable!

It is, through simple use of the na.strings argument:

R X <- read.csv("temp.csv", na.strings="")
R X
  Code Country
1   DE Germany
2   IT   Italy
3   NA Namibia
4   FR  France
R which(is.na(X))
integer(0)

HTH.

Cheers,

Berwin




Re: [R] Call to rgamma using .C causes R to hang

2010-07-20 Thread Berwin A Turlach
G'day Steve,

On Tue, 20 Jul 2010 17:20:49 +0930
Steve Pederson stephen.peder...@adelaide.edu.au wrote:

I suggest inserting the following two lines (untested, as I usually
don't work on Windows):

 #include <R.h>
 #include <Rmath.h>
 void test1 (double *x, double *result)
 {
    GetRNGstate();
    result[0] = rgamma(*x, 2.0);
    PutRNGstate();
 }
 

[...] 
 I'm running Windows XP  I followed the instructions at
 http://cran.r-project.org/doc/manuals/R-admin.html#Windows-standalone
 after building R from source.

But you didn't follow the instructions of Section 6.3 in the Writing R
Extensions manual. :)

http://cran.r-project.org/doc/manuals/R-exts.html#Random-numbers

HTH.

Cheers,

Berwin




Re: [R] minor diagonal in R

2010-09-07 Thread Berwin A Turlach
On Tue, 7 Sep 2010 06:26:09 -0700 (PDT)
Peng, C cpeng@gmail.com wrote:

 
 Is this what you want?
 
  A=matrix(1:16,ncol=4)
  A
       [,1] [,2] [,3] [,4]
 [1,]    1    5    9   13
 [2,]    2    6   10   14
 [3,]    3    7   11   15
 [4,]    4    8   12   16
  diag(A[1:4,4:1])
 [1] 13 10  7  4

Or 

 A[cbind(1:4,4:1)]
[1] 13 10  7  4

one character more to type, but could be more efficient for large A :)
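The trick generalises to any square matrix (a small sketch):

```r
## Minor (anti-)diagonal of an n x n matrix via two-column matrix
## indexing: row i is paired with column n+1-i.
antidiag <- function(A) {
  n <- nrow(A)
  A[cbind(1:n, n:1)]
}

antidiag(matrix(1:16, ncol = 4))   # 13 10 7 4
```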

Cheers,

Berwin




Re: [R] OT: What distribution is this?

2010-09-25 Thread Berwin A Turlach
G'day Rainer,

On Sat, 25 Sep 2010 16:24:17 +0200
Rainer M Krug r.m.k...@gmail.com wrote:

 This is OT, but I need it for my simulation in R.
 
 I have a special case for sampling with replacement: instead of
 sampling once and replacing it immediately, I sample n times, and
 then replace all n items.
 
 
 So:
 
 N entities
 x samples with replacement
 each sample consists of n sub-samples WITHOUT replacement, which are
 all replaced before the next sample is drawn
 
 My question is: which distribution can I use to describe how often
 each entity of the N has been sampled?

Surely, unless I am missing something, any given entity would have
(marginally) a binomial distribution:

A sub-sample of size n either contains the entity or it does not.  The
probability that a sub-sample contains the entity is a function of N
and n alone (namely n/N).

x sub-samples are drawn (with replacement), so the number of times that
an entity has been sampled is the number of sub-samples in which it
appears.  This is given by the binomial distribution with parameters x
and p, where p is the probability determined in the previous paragraph.

I guess the fun starts if you try to determine the joint distribution
of two (or more) entities.
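A quick simulation sketch (illustrative numbers, N = 20, n = 5 and
x = 1000, chosen here and not from the thread) agrees with the
binomial reasoning:

```r
## Each of x draws takes a sub-sample of size n without replacement
## from N entities; count how often entity 1 appears and compare the
## empirical rate with the marginal probability n/N.
set.seed(1)
N <- 20; n <- 5; x <- 1000
hits <- sum(replicate(x, 1 %in% sample(N, n)))
hits / x          # should be close to n/N = 0.25
```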

HTH.

Cheers,

Berwin 




Re: [R] OT: What distribution is this?

2010-09-26 Thread Berwin A Turlach
G'day Rainer,

On Sun, 26 Sep 2010 10:29:08 +0200
Rainer M Krug r.m.k...@gmail.com wrote:

 I realized that I am simply interested in the pdf p(y) that y
 *number* of entities (which ones is irrelevant) in N are *not*
 drawn after the sampling process has been completed. Even simpler (I
 guess), in a first step, I would only need the mean number of
 expected non-drawn entities in N (pMean).
 
 But what about the range in between?

You can calculate those probabilities.

Your problem sounded vaguely familiar to me yesterday, but I could
not put my finger on its abstract counterpart.  Now with this
clarification I can. :)  Your set-up is exactly that of lottery
draws.  In each draw, n of N numbers are drawn without replacement.  So
your question is: "What is the probability that I have not seen k of
the N numbers after x draws?"

This question can easily be answered by using some Markov chain
theory.  Let Y_l be the number of entities that you have not seen after
the l-th draw, taking possible values 0, 1, ..., N.  Obviously, Y_0 = N
with probability 1, and Y_1 = N-n with probability 1.  Now, the
probability that Y_l, l >= 1, is equal to k is given by:

     0                                       if k = N-n+1, N-n+2, ..., N
and  sum_{i=0,...,n} P[Y_l=k | Y_{l-1} = k+i] P[Y_{l-1} = k+i]  otherwise.
 
P[Y_l=k | Y_{l-1} = k+i] is the probability that the n entities sampled
in the l-th draw consist of i entities of the k+i entities that were
not seen by draw l-1 and n-i entities of the N-k-i entities that were
already seen by draw l-1. This probability is

choose(k+i, i)*choose(N-k-i, n-i) / choose(N, n)

Note that this probability is independent of l, i.e., Y_0, Y_1, Y_2,...
form a stationary Markov Chain.  

Thus, to answer your question, the strategy would be to calculate the
transition matrix once and then start with either the state vector of
Y_0 or of Y_1 (both of which are rather trivial as mentioned above) and
calculate the state vector of Y_x by repeatedly multiplying the chosen
initial state vector with the transition matrix for the appropriate
number of times.

The transition matrix is (N+1)x(N+1), so it can be rather large, but
the number of non-zero entries in the transition matrix is at most
(N+1)*(n+1), so depending on the relative magnitude of n and N, sparse
matrix techniques might be helpful (or using a language such as C,
FORTRAN, ... for the calculations).
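The strategy above can be sketched directly in R (illustrative values
only; for large N one would switch to sparse matrices as noted):

```r
## P[Y_x = k], k = 0..N: distribution of the number of unseen entities
## after x lottery-style draws of size n from N.
unseen.dist <- function(N, n, x) {
  ## Transition matrix: row = current number unseen, column = next.
  P <- matrix(0, N + 1, N + 1)
  for (old in 0:N) {                 # 'old' entities still unseen
    for (i in 0:min(n, old)) {       # i of them appear in the next draw
      P[old + 1, old - i + 1] <-
        choose(old, i) * choose(N - old, n - i) / choose(N, n)
    }
  }
  v <- numeric(N + 1); v[N + 1] <- 1 # Y_0 = N with probability 1
  for (l in seq_len(x)) v <- v %*% P
  drop(v)
}

unseen.dist(10, 3, 1)   # all mass on k = 7, since Y_1 = N - n
```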

HTH.

Cheers,

Berwin



Re: [R] R Spracheinstellung

2010-09-29 Thread Berwin A Turlach
G'day Tobias,

On Wed, 29 Sep 2010 14:01:10 +
Keller Tobias (kelleto0) kelle...@students.zhaw.ch wrote:

 We need the R program for our statistics course.
 I downloaded the program, chose German, and installed it.  The
 problem is that, even after reinstalling three times, the program
 always comes up in English.
 
 Can you help me?

R for Windows FAQ 3.2:

http://cran.ms.unimelb.edu.au/bin/windows/base/rw-FAQ.html#I-want-R-in-English_0021

HTH.

Cheers,

Berwin



Re: [R] how to make list() return a list of *named* elements

2010-10-04 Thread Berwin A Turlach
On Mon, 4 Oct 2010 19:45:23 +0800
Berwin A Turlach ber...@maths.uwa.edu.au wrote:

Mmh,

 R my.return <- function (vector.of.variable.names) {
   sapply(vector.of.variable.names, function(x) list(get(x)))
 }

make that:

R my.return <- function (vector.of.variable.names) {
   sapply(vector.of.variable.names, function(x) get(x))
 }

Cheers,

Berwin



Re: [R] how to make list() return a list of *named* elements

2010-10-04 Thread Berwin A Turlach
G'day Hans,

On Mon, 4 Oct 2010 11:28:15 +0200
Hans Ekbrand h...@sociologi.cjb.net wrote:

 On Thu, Sep 30, 2010 at 09:34:26AM -0300, Henrique Dallazuanna wrote:
  You should try:
  
  eapply(.GlobalEnv, I)[c('b', 'my.c')]
 
 Great!
 
 b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
 my.c <- sample.int(round(2*mean(b)), 4)
 
 my.return <- function (vector.of.variable.names) {
   eapply(.GlobalEnv, I)[vector.of.variable.names]
 }

Well, if you are willing to create a vector with the variable names,
then simpler solutions should be possible, i.e. solutions that only
operate on the objects of interest and not on all objects in the global
environment (which could be a lot depending on your style).  For
example:

R my.return <- function (vector.of.variable.names) {
  sapply(vector.of.variable.names, function(x) list(get(x)))
}
R str(my.return(c("b", "my.c")))
List of 2
 $ b   : num [1:5] 22.4 12.2 10.9 8.5 9.2
 $ my.c: int [1:4] 7 5 23 4

Cheers,

Berwin




Re: [R] exponentiate elements of a whole matrix

2010-10-13 Thread Berwin A Turlach
On Wed, 13 Oct 2010 11:51:39 +0100
Maas James Dr (MED) j.m...@uea.ac.uk wrote:

 I've tried hard to find a way to exponentiate each element of a whole
 matrix such that if I start with A
 
 A = [ 2   3
       2   4 ]
 
 I can get back B
 
 B = [ 7.38   20.08
       7.38   54.60 ]
 
 I've tried
 
 B <- exp(A) but no luck.

What have you tried exactly?  And with which version?  This should work
with all R versions that I am familiar with, e.g.:

R A <- matrix(c(2,2,3,4),2,2)
R A
     [,1] [,2]
[1,]    2    3
[2,]    2    4
R B <- exp(A)
R B
         [,1]     [,2]
[1,] 7.389056 20.08554
[2,] 7.389056 54.59815

Cheers,

Berwin




Re: [R] declaring GPL license

2010-10-14 Thread Berwin A Turlach
G'day Marc,

On Thu, 14 Oct 2010 10:46:39 -0500
Marc Schwartz marc_schwa...@me.com wrote:

 If you want (and you should), create and include a file called
 COPYING in the 'inst' folder in the package, so that it gets copied
 to the main package directory upon installation. [...]

But that would be against the wishes of R core, from the Writing R
Extension manual:

Whereas you should feel free to include a license file in your
source distribution, please do not arrange to @emph{install} yet
another copy of the @acronym{GNU} @file{COPYING} or
@file{COPYING.LIB} files but refer to the copies on
@url{http://www.r-project.org/Licenses/} and included in the
@R{} distribution (in directory @file{share/licenses}).

:)

Cheers,

Berwin



Re: [R] declaring GPL license

2010-10-14 Thread Berwin A Turlach
G'day Stacey,

On Thu, 14 Oct 2010 12:36:20 -0400
Stacey Wood sjw...@ncsu.edu wrote:

 Thanks everyone.  Now I know that I should not include another copy
 of the license, but where should I refer to the copies on
 http://www.r-project.org/Licenses/? In the DESCRIPTION file?

I used to mention that location in the License: line of the DESCRIPTION
file, after "GPL (>= 2)".  But some versions ago, that line got
standardised, and it is now no longer possible to put additional text
on that line.

I guess you could just add a line "LicenceLocation:".  As far as I
understand the documentation, you are free to add lines beyond those
described in "Writing R Extensions" to the DESCRIPTION file.  Some
entries described in the manual are optional, and for those that are
not, some restrictions may apply that have to be followed.

Personally, I decided not to put any such reference into the
DESCRIPTION file of my packages, as I guess that people who worry about
the licence will either look into the source package (where I have a
copy) or know how to find one. :)

HTH.

Cheers,

Berwin



Re: [R] Constrained Optimization

2009-11-04 Thread Berwin A Turlach
On Wed, 4 Nov 2009 14:48:08 -0800 (PST)
s t thamp...@yahoo.com wrote:

 I'm trying to do the following constrained optimization example.
 Maximize x1*(1-x1) + x2*(1-x2) + x3*(1-x3)
 s.t. x1 + x2 + x3 = 1
 x1 >= 0 and x1 <= 1
 x2 >= 0 and x2 <= 1
 x3 >= 0 and x3 <= 1
 which are the constraints.
 I'm expecting the answer x1=x2=x3 = 1/3.

This is a quadratic programming problem (minimising/maximising a
quadratic function under linear equality and inequality constraints).

There are several R packages that solve such problems, e.g.:

R library(quadprog)
R Dmat <- diag(3)
R dvec <- rep(1,3)/3
R Amat <- cbind(rep(1,3), diag(3), -diag(3))
R bvec <- c(1, rep(0,3), rep(-1,3))
R solve.QP(Dmat, dvec, Amat, bvec, meq=1)
$solution
[1] 0.333 0.333 0.333

$value
[1] -0.167

$unconstrainted.solution
[1] 0.333 0.333 0.333

$iterations
[1] 2 0

$iact
[1] 1

You may want to consult:
http://cran.r-project.org/web/views/Optimization.html
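For reference, solve.QP() minimises (1/2) b'Db - d'b, so the original
maximisation problem maps onto it exactly with D = 2I and d = (1,1,1)'
(a sketch; the transcript above used a rescaled but equivalent
objective):

```r
library(quadprog)
## Maximising sum x_i*(1-x_i) is minimising sum x_i^2 - sum x_i,
## i.e. (1/2) x' (2I) x - 1'x.
Dmat <- 2 * diag(3)
dvec <- rep(1, 3)
Amat <- cbind(rep(1, 3), diag(3), -diag(3))  # sum(x)=1, x >= 0, -x >= -1
bvec <- c(1, rep(0, 3), rep(-1, 3))
sol  <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
sol$solution    # 1/3 1/3 1/3
```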

HTH.

Cheers,

Berwin




Re: [R] Unexpected behaviour for as.date()

2009-11-10 Thread Berwin A Turlach
G'day Isabella,

On Tue, 10 Nov 2009 18:40:11 -0800
Isabella Ghement isabe...@ghement.ca wrote:

 I am trying to use the function as.date() from the dates package 

As far as I can tell, there is no package called dates, did you mean
the package date? 

 in R 2.10.0 to convert a character date to a Julian date, as follows:
 
  as.date("02-MAY-01", order="mdy") # convert May 2, 2001 to a Julian
  date
 [1] 2May1

Are you sure that you are doing what your comments imply?  Try:

R> library(date)
R> as.date("02-MAY-01", order="mdy")
[1] 2May1
R> as.date("02-MAY-2001", order="mdy")
[1] 2May2001
R> as.numeric(as.date("02-MAY-01", order="mdy"))
[1] -21428
R> as.numeric(as.date("02-MAY-2001", order="mdy"))
[1] 15097
 
 However, when trying to convert a character date from the year 2000
 to a Julian date, I get an NA instead of the desired Julian date:
 
  as.date("02-MAY-00", order="mdy") # convert May 2, 2000 to a Julian
  date
 [1] NA
 
 Not quite sure why R is unable to handle this type of date (perhaps it
 thinks it comes from the year 1900?!).

My guess is that it thinks it comes from the year 0.  Not sure why it
cannot handle such dates.  But then, as far as I know, there is
actually some discussion about whether the year 0 exists or whether we
went straight from 1BC to 1AD...

 Is there a way to force R to convert character dates from the year
 2000 into Julian dates? 

Presumably you will need something like:

R> as.date(sub("-00", "-2000", "02-MAY-00"))
[1] 2May2000
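As an aside (my addition, not part of the thread): base R's as.Date() handles two-digit years directly via an explicit format; note that %b is locale-dependent (it needs English month abbreviations here) and that %y maps the values 00-68 into the 2000s:

```r
# Assumes an English locale for the %b month-name abbreviation
as.Date("02-MAY-00", format = "%d-%b-%y")
```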

HTH.

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behaviour for as.date()

2009-11-10 Thread Berwin A Turlach
G'day Isabella,

On Tue, 10 Nov 2009 20:11:31 -0800
Isabella Ghement isabe...@ghement.ca wrote:

 I tried the solution you suggested and find that am having problems
 getting R to extract the year from an object created by as.date():

Perhaps it would be best to first clarify what you really need. :)

If you have to extract the year, why go via as.date?  Why not just:

R> strsplit("02-MAY-00", "-")[[1]][3]
[1] "00"
R> as.numeric(strsplit("02-MAY-00", "-")[[1]][3])
[1] 0

Or if you have a vector of character strings, each a date in that
format:

R> x <- c("02-MAY-00", "02-MAY-01")
R> sapply(x, function(x) as.numeric(strsplit(x, "-")[[1]][3]))
02-MAY-00 02-MAY-01 
0 1 
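A regular-expression alternative (my addition) that avoids the sapply() loop:

```r
x <- c("02-MAY-00", "02-MAY-01")
as.numeric(sub(".*-", "", x))  # strip everything up to the last "-"; gives 0 1
```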

HTH.

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and HDF5 Question

2009-11-13 Thread Berwin A Turlach
G'day Scott,

On Fri, 13 Nov 2009 09:52:43 -0700
Scott MacDonald scott.p.macdon...@gmail.com wrote:

 I am trying to load an hdf5 file into R and running into some
 problems. 

It's a while that I used hdf5 files and that package in R, but:

 This builds fine. The library seems to load without issue, but no
 data is returned when I try to load a file:
 
  library(hdf5)
  hdf5load("test.h5")
  NULL

Is NULL the return of the hdf5load command or are you typing it on the
command line?

Anyway, .hdf5 files can contain several objects, just as R's .rda
file.  load() will load an .rda file and put all objects in that file
into the workspace.  Likewise, hdf5load() loads an hdf5 file and puts
all objects in that file into the workspace. 

 Yet,
 
 osx:data scott$ h5dump test.h5
 HDF5 "test.h5" {
 GROUP "/" {
    DATASET "dset" {
       DATATYPE H5T_STD_I32LE
       DATASPACE SIMPLE { ( 31 ) / ( 31 ) }
       DATA {
       (0): 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
       (14): 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
       (22): 4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
       (28): 268435456, 536870912, 1073741824
       }
    }
 }
 }
 
 Any thoughts?

Did you try an ls() after the hdf5load() command?  If the hdf5load()
command was successful, an ls() should show you that an object with
the name dset is now in your workspace, if I read the output above
correctly.
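(The hdf5 package has since been archived; nowadays one would probably use the Bioconductor package rhdf5 instead.  A sketch, assuming rhdf5 is installed and the file is the one shown by h5dump above:)

```r
library(rhdf5)
h5ls("test.h5")                    # list the objects stored in the file
dset <- h5read("test.h5", "dset")  # read the dataset into an R object
```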

HTH.

Cheers,

Berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Relase positive with log and zero of negative with 0

2009-11-15 Thread Berwin A Turlach
G'day Kevin,

On Sun, 15 Nov 2009 7:18:18 -0800
rkevinbur...@charter.net wrote:

 This is a very simple question but I couldn't form a site search
 query that would return a reasonable result set.
 
 Say I have a vector:
 
 x <- c(0,2,3,4,5,-1,-2)
 
 I want to replace all of the values in 'x' with the log of x.
 Naturally this runs into problems since some of the values are
 negative or zero. So how can I replace all of the positive elements
 of x with the log(x) and the rest with zero?

If you do not mind a warning message:

R> x <- c(0,2,3,4,5,-1,-2)
R> x <- ifelse(x <= 0, 0, log(x))
Warning message:
In log(x) : NaNs produced
R> x
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 0.0000000 0.0000000

If you do mind, then:

R> x <- c(0,2,3,4,5,-1,-2)
R> ind <- x > 0
R> x[!ind] <- 0
R> x[ind] <- log(x[ind])
R> x
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 0.0000000 0.0000000
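A third variant (my addition) that keeps the one-liner but silences the warning explicitly:

```r
x <- c(0, 2, 3, 4, 5, -1, -2)
x <- ifelse(x > 0, suppressWarnings(log(x)), 0)
x
# [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 0.0000000 0.0000000
```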

HTH.

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Gelman 2006 half-Cauchy distribution

2010-05-28 Thread Berwin A Turlach
G'day Chris,

On Fri, 28 May 2010 08:29:30 -0500
Christopher David Desjardins cddesjard...@gmail.com wrote:

 Hi,
 I am trying to recreate the right graph on page 524 of Gelman's 2006 
 paper Prior distributions for variance parameters in hierarchical 
 models in Bayesian Analysis, 3, 515-533. I am only interested,
 however, in recreating the portion of the graph for the overlain
 prior density for the half-Cauchy with scale 25 and not the posterior
 distribution. However, when I try:
 
 curve(dcauchy, from=0, to=200, location=0, scale=25)

Which version of R do you use?  This command creates 12 warnings under
R 2.11.0 on my linux machine.  

Reading up on the help page of curve() would make you realise that you
cannot pass the location and scale parameter to dcauchy in the manner
you try.  I guess you want:

R> prior <- function(x) 2*dcauchy(x, location=0, scale=25)
R> curve(prior, from=0, to=200)

or, more compactly,

R> curve(2*dcauchy(x, location=0, scale=25), from=0, to=200)
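To verify that the factor of 2 makes this a proper density on [0, Inf) (my check, not part of the original answer):

```r
prior <- function(x) 2 * dcauchy(x, location = 0, scale = 25)
integrate(prior, 0, Inf)  # integrates to 1 (up to numerical error)
```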

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rgl installation failure

2010-06-05 Thread Berwin A Turlach
G'day,

On Sat, 5 Jun 2010 12:51:08 +0530
khush  bioinfo.kh...@gmail.com wrote:

 I am trying to install rgl package under R and getting some errors
 which is below.
 
  install.packages("rgl")
 Warning in install.packages("rgl") :
   argument 'lib' is missing: using '/usr/lib/R/library'
 trying URL 'http://cran.csie.ntu.edu.tw/src/contrib/rgl_0.91.tar.gz'
 Content type 'application/x-gzip' length 1677498 bytes (1.6 Mb)
 opened URL
 ==
 downloaded 1.6 Mb
 
 * installing *source* package _rgl_ ...
 checking for gcc... gcc -m32 -std=gnu99
 checking for C compiler default output file name... a.out
 checking whether the C compiler works... yes
 checking whether we are cross compiling... no
 checking for suffix of executables...
 checking for suffix of object files... o
 checking whether we are using the GNU C compiler... yes
 checking whether gcc -m32 -std=gnu99 accepts -g... yes
 checking for gcc -m32 -std=gnu99 option to accept ISO C89... none needed
 checking how to run the C preprocessor... gcc -m32 -std=gnu99 -E
 checking for gcc... (cached) gcc -m32 -std=gnu99
 checking whether we are using the GNU C compiler... (cached) yes
 checking whether gcc -m32 -std=gnu99 accepts -g... (cached) yes
 checking for gcc -m32 -std=gnu99 option to accept ISO C89... (cached)
 none needed
 checking for libpng-config... no
 checking libpng... checking for grep that handles long lines and -e...
 /bin/grep
 checking for egrep... /bin/grep -E
 checking for ANSI C header files... yes
 checking for sys/types.h... yes
 checking for sys/stat.h... yes
 checking for stdlib.h... yes
 checking for string.h... yes
 checking for memory.h... yes
 checking for strings.h... yes
 checking for inttypes.h... yes
 checking for stdint.h... yes
 checking for unistd.h... yes
 checking png.h usability... no
 checking png.h presence... no
 checking for png.h... no
 checking for png_read_update_info in -lpng... no

You should heed indications such as the above, i.e. that items checked
for are not found.

[...]

 In file included from pixmap.cpp:14:
 pngpixmap.h:3:17: error: png.h: No such file or directory

And this compiler error seems to be directly linked to the issues
identified by configure; and all further error come from the fact that
png.h cannot be found.

You seem to be using some flavour of Unix, presumably GNU/Linux; but
from your posting it is impossible to say which one.  On a Debian based
system (e.g. Kubuntu 9.10 which I am using), I would issue the command
(assuming that apt-file is installed and initialised):

ber...@bossiaea:~$ apt-file find include/png.h
libpng12-dev: /usr/include/png.h

and then check whether the libpng12-dev package is installed.  If not,
I would install it and retry.  Perhaps that would be all that is needed
to compile rgl, but there might be other packages missing.

If you have a system based on RedHat or Suse, you will have to figure
out how to use the package management tools on those systems to find
out which packages have to be installed additionally so that you are
able to compile rgl.

HTH.

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] forcing a zero level in contr.sum

2010-07-07 Thread Berwin A Turlach
G'day Stephen,

On Wed, 7 Jul 2010 16:41:18 -0400
Bond, Stephen stephen.b...@cibc.com wrote:

 Please, do not post if you do not know the answer. People will see
 this has answers and skip.
 
 I tried with 
 mat1=contrasts(fixw$snconv)
 mat1=mat1[,-2]
 summary(frm2sum - glm(resp.frm ~
 C(snconv,contr=mat1)+mprime+mshape,data=fixw,family=quasibinomial))
 
 the unwanted level is still there. Unbelievable.

model.matrix() instead of summary() would reveal that the estimate
corresponding to the unwanted level is probably estimating something
quite different than what you think it estimates.

?C and look at the how.many argument, presumably you want 
C(snconv, contr=mat1, how.many=NCOL(mat1))

To see what is going on, consider:

R> ( tmp <- gl(4,3) )
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
Levels: 1 2 3 4
R> options(contrasts=c("contr.sum", "contr.poly"))
R> ( tt <- contrasts(tmp) )
  [,1] [,2] [,3]
1    1    0    0
2    0    1    0
3    0    0    1
4   -1   -1   -1
R> tt <- tt[,-2]
R> contrasts(tmp, 2) <- tt
R> tmp
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
attr(,"contrasts")
  [,1] [,2]
1    1    0
2    0    0
3    0    1
4   -1   -1
Levels: 1 2 3 4
R> contrasts(tmp) <- tt
R> tmp
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
attr(,"contrasts")
  [,1] [,2]       [,3]
1    1    0  0.2886751
2    0    0 -0.8660254
3    0    1  0.2886751
4   -1   -1  0.2886751
Levels: 1 2 3 4

HTH.

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] advice/opinion on - vs = in teaching R

2010-01-15 Thread Berwin A Turlach
G'day all,

On Fri, 15 Jan 2010 09:08:44 - (GMT)
(Ted Harding) ted.hard...@manchester.ac.uk wrote:

 On 15-Jan-10 08:14:04, Barry Rowlingson wrote:
  [...]
  I would regard modifying a variable within the parameters of a
  function call as pretty tasteless. What does:
  
   foo(x-2,x)
  or
   foo(x,x-3)
  
  do that couldn't be done clearer with two lines of code?

Depending on the definition of foo() more than two lines might be
necessary to clarify what this code is actually supposed to do. :)

   Remember: 'eschew obfuscation'.

Absolutely.

 Tasteless or not, the language allows it to be done; and therefore
 discussion of distinctions between ways of doing it is relevant to
 Erin's question!

And a full discussion of these examples would include to warn about the
side effects of lazy evaluation:

R> rm(list=ls(all=TRUE))
R> foo <- function(x, y){
+   if(x > 0) x else x + y}
R> foo(2, x <- 10)
[1] 2
R> x
Error: object 'x' not found
R> foo(-2, x <- 10)
[1] 8
R> x
[1] 10

Having said this, I often use plot(fm <- lm(y ~ . , somedata)).

And, yes, students have complained that the code I gave them was not
working, that they got a strange error message and they promised that
they had typed exactly what I had written on the lab sheet.  On
checking, they had, of course, typed plot(fm = lm(y ~ . , somedata))

Cheers,

Berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is it solve.QP or is it me?

2007-09-21 Thread Berwin A Turlach
G'day Talbot,

regarding the subject line, perhaps neither, it may be your OS, chip or
maths library. :)

On my Intel Core2 Duo machine running under linux all your examples
work without error message.  What kind of machine are you using?

On Fri, 21 Sep 2007 12:38:05 -0400
Talbot Katz [EMAIL PROTECTED] wrote:

 [..] I was wondering whether anyone has any tricks to share for
 mitigating these kind of problems and still generating feasible
 solutions. 

I believe that one of the recommendations in the numerical literature
is to try to keep the norms of the columns of the matrix A on similar
scales.  Try to divide the first two columns of A (and the first two
entries in the vector b) by the square root of n.  
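A generic sketch of that rescaling (the function name is mine; solve.QP treats each column of Amat, together with the matching entry of bvec, as one constraint, so dividing both by the same positive number leaves the feasible region unchanged):

```r
# Rescale each constraint (column of Amat plus entry of bvec) to unit norm
rescale_constraints <- function(Amat, bvec) {
  nrm <- sqrt(colSums(Amat^2))
  list(Amat = sweep(Amat, 2, nrm, "/"), bvec = bvec / nrm)
}
```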

Hope that helps.

Cheers,

Berwin 

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-squared value for linear regression passing through origin using lm()

2007-10-19 Thread Berwin A Turlach
G'day Ralf,

On Fri, 19 Oct 2007 09:51:37 +0200
Ralf Goertz [EMAIL PROTECTED] wrote:

 Thanks to Thomas Lumley there is another convincing example. But still
 I've got a problem with it:
 
  x <- c(2,3,4); y <- c(2,3,3)
 
 [...]
 That's okay, but neither [...] nor [...]
 give the result of summary(lm(y~x+0)), which is 0.9796. 

Why should either of those formulae yield the output of
summary(lm(y~x+0))?  The R-squared output of that command is
documented in help(summary.lm):

r.squared: R^2, the 'fraction of variance explained by the model',

  R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),

  where y* is the mean of y[i] if there is an intercept and
  zero otherwise.

And, indeed:

> 1-sum(residuals(lm(y~x+0))^2)/sum((y-0)^2)
[1] 0.9796238

confirms this.

Note: if you do not have an intercept in your model, the residuals do
not have to add to zero; and, typically, they will not.  Hence,
var(residuals(lm(y~x+0))) does not give you the residual sum of squares.
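This is easy to check on the toy data from above (my check):

```r
x <- c(2, 3, 4); y <- c(2, 3, 3)
sum(residuals(lm(y ~ x + 0)))  # about 0.24: nonzero without an intercept
sum(residuals(lm(y ~ x)))      # essentially zero
```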

 In order to save the role of R^2 as a goodness-of-fit indicator 

R^2 is not a goodness-of-fit indicator, neither in models with
intercept nor in models without intercept.  So I do not see how you can
save its role as a goodness-of-fit indicator. :)

Since you are posting from a .de domain, I assume you will understand
the following quote from Tutz (2000), Die Analyse kategorialer Daten,
page 18 (translated):

R^2 does *not* measure the goodness of fit of the linear model; it says
nothing about whether the linear specification is true or false, but
only whether individual observations can be predicted by the linear
specification.  R^2 is largely determined by the design, i.e. by the
values that x takes (cf. Kockelkorn (1998)).

The latter reference is:

Kockelkorn, U. (1998).  Lineare Modelle. Skript, TU Berlin.

 in zero intercept models one could use the same formula like in models
 with a constant. I mean, if R^2 is the proportion of variance
 explained by the model we should use the a priori variance of y[i].
 
  1-var(residuals(lm(y~x+0)))/var(y)
 [1] 0.3567182
 
 But I assume that this has probably been discussed at length somewhere
 more appropriate than r-help.

I am sure about that, but it was also discussed here on r-help (long
ago).  The problem is that this compares two models that are not nested
in each other which is a quite controversial thing to do; some might
even go so far as saying that it makes no sense at all.  The other
problem with this approaches is illustrated by my example:

> set.seed(20070807)
> x <- runif(100)*2+10
> y <- 4+rnorm(x, sd=1)
> 1-var(residuals(lm(y~x+0)))/var(y)
[1] -0.04848273

How do you explain that a quantity that is called R-squared, implying
that it is the square of something, hence always non-negative, can
become negative?

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] indices of rows containing one or more elements 0

2008-02-12 Thread Berwin A Turlach
G'day Stanley,

On Wed, 13 Feb 2008 10:40:09 +0800
Ng Stanley [EMAIL PROTECTED] wrote:

 Hi,
 
 Given test <- matrix(c(0,2,0,1,3,5), 3,2)
 
  test[test>0]
 [1] 2 1 3 5
 
 These are values > 0
 
  which(test>0)
 [1] 2 4 5 6
 
 These are array indices of those values > 0
 
  which(apply(test>0, 1, all))
 [1] 2
 
 This gives the row whose elements are all > 0
 
 I can't seem to get indices of rows containing one or more elements > 0

which(apply(test>0, 1, any)) instead of which(apply(test>0, 1, all)) ??
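An apply()-free variant (my addition), often faster on large matrices:

```r
test <- matrix(c(0, 2, 0, 1, 3, 5), 3, 2)
which(rowSums(test > 0) > 0)            # rows with at least one element > 0
which(rowSums(test > 0) == ncol(test))  # rows with all elements > 0; gives 2
```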

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] stumped by eval

2008-02-13 Thread Berwin A Turlach
G'day Peter,

On Wed, 13 Feb 2008 08:03:07 +0100
Peter Dalgaard [EMAIL PROTECTED] wrote:

 Ross Boylan wrote:
  In the following example, the inner evaluation pulls in the global
  value of subset (a function) rather than the one I thought I was
  passing in (a vector).  Can anyone help me understand what's going
  on, and what I need to do to fix the problem?

[...]

 The point is that subset (and offset) arguments are subject to the
 same evaluation rules as the terms inside the formula: First look in
 data, then in the environment of the formula, which in this case is
 the global environment.

Perhaps I have a [senior|blonde]-day today, but this does not seem to
me to be the full explanation of what is going on.  According to this
explanation the following should not work:

> lm(Reading~0+Spec+Reader, netto, subset=c(1) )

Call:
lm(formula = Reading ~ 0 + Spec + Reader, data = netto, subset = c(1))

Coefficients:
  Spec  Reader  
     1      NA  

since the value passed to subset is not part of data and not in the
global environment. But, obviously, it works.  OTOH, if we change f0 to

> f0
function(formula, data, subset, na.action)
{
  lm(formula, data, subset=subset, na.action=na.action)
}

then we get the same behaviour as with Ross's use of f1 inside of f0:

> t3 <- f0(Reading~0+Spec+Reader, netto, c(1) )
Error in xj[i] : invalid subscript type 'closure'

Moreover, with the original definition of f0:
> f0
function(formula, data, subset, na.action)
{
  f1(formula, data, subset, na.action)
}
> (f1(Reading~0+Spec+Reader, netto, subset= Spec==1 ))
  Reading Spec Reader
1       1    1      1
> f0(Reading~0+Spec+Reader, netto, subset= Spec==1 )
Error in xj[i] : invalid subscript type 'closure'

Given your explanation, I would have expected this to work.

Reading up on `subset' in ?model.frame also does not seem to shed light
onto what is going on.

Remaining confused.

Cheers,

Berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quadratic Programming

2008-02-15 Thread Berwin A Turlach
G'day Jorge,

On Fri, 15 Feb 2008 17:51:16 -0500
Jorge Aseff [EMAIL PROTECTED] wrote:

 I am using solve.QP (from quadprog) to solve a standard quadratic
 programming problem: min_w -0.5*w'Qw st ... I would like solve.QP to
 do two things: 1) to start the optimization from a user-supplied
 initial condition; i.e., from a vector w_0 that satisfies the
 constraints, 

Not possible.  solve.QP is based on the Goldfarb-Idnani algorithm which
first calculates the unconstrained solution of the problem and then
checks for violated constraints; if such exist, then they are
iteratively enforced until all constraints are satisfied.

If you want to specify starting values (or have warm starts), then you
would need to use other kind of algorithms.

 and 2) to return the values of the lagrange multiplieres
 associated with the constraints.

See:
https://stat.ethz.ch/pipermail/r-devel/2007-December/047636.html

Hope this helps.

Best wishes,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Questions about EM algorithm

2008-02-19 Thread Berwin A Turlach
G'day Sean,

On Fri, 15 Feb 2008 09:12:22 +0800
Hung-Hsuan Chen (Sean) [EMAIL PROTECTED] wrote:

 Assume I have 3 distributions, x1, x2, and x3.
 x1 ~ normal(mu1, sd1)
 x2 ~ normal(mu2, sd2)
 x3 ~ normal(mu3, sd3)
 y1 = x1 + x2
 y2 = x1 + x3
 
 Now the data I can observe are only y1 and y2. It is
 easy to estimate (mu1+mu2), (mu1+mu3), (sd1^2+sd2^2) and
 (sd1^2+sd3^2) by the EM algorithm since

Isn't it a bit of an overkill to use an EM algorithm here?  There are
explicit formulae for the estimators (namely the sample average and
the sample variance) of those quantities.  O.k., these formulae may not
yield MLEs, but it should be very easy to correct for that.

 y1 ~ normal(mu1+mu2, sqrt(sd1^2+sd2^2)) and
 y2 ~ normal(mu1+mu3, sqrt(sd1^2+sd3^2))
 
 However, I want to estimate mu1, mu2, mu3, sd1, sd2, and sd3.
 Is it possible to do so by EM algorithm (or any other estimation
 methods like gibbs sampling or MLE) ?

EM algorithms are a way of calculating MLEs by framing the problem
(explicitly or implicitly) in a missing-data context.  So the EM
algorithm and MLE are not different methods; the former is a way of
calculating the latter, and, of course, the latter can also be
calculated by directly maximising the (log-)likelihood function.

You did not say so explicitly, but I guess you are assuming that x1, x2
and x3 are independent, aren't you?  At least under this assumption it
is easy to deduce that the distributions of y1 and y2 are as you stated.
If you do not assume independence of x1, x2, x3, what other assumptions
do you make to arrive at these distributions for y1 and y2?

Under the assumption of independence of x1, x2, x3, one would also have
that Cov(y1,y2)=sd1^2.  Together with the fact that
Var(y1)=sd1^2+sd2^2 and Var(y2)=sd1^2+sd3^2, this makes the three
standard deviations identifiable, and you can readily estimate them.
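A small simulation (mine, with arbitrarily chosen parameter values) illustrating these moment identities:

```r
set.seed(42)
n  <- 1e5
x1 <- rnorm(n, sd = 1); x2 <- rnorm(n, sd = 2); x3 <- rnorm(n, sd = 3)
y1 <- x1 + x2
y2 <- x1 + x3
sqrt(cov(y1, y2))            # estimates sd1, close to 1
sqrt(var(y1) - cov(y1, y2))  # estimates sd2, close to 2
sqrt(var(y2) - cov(y1, y2))  # estimates sd3, close to 3
```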

Actually, if x1, x2 and x3 are independent, then they would be jointly
normal, hence y1 and y2 would be jointly normal, whence it would be easy
to write down the likelihood of the parameters given y1 and y2 and find
the MLEs for sd1, sd2, sd3.

For mu1, mu2 and mu3 you have an identifiability problem: the two
triples (mu1, mu2, mu3) and (mu1+c, mu2-c, mu3-c) (where c is any fixed
number) would yield exactly the same likelihood value.  Hence, these
three parameters are not identifiable.  You would have to fix one of
them arbitrarily, say mu1=0.

Best wishes,

Berwin 

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] debugging a try() statement

2008-02-19 Thread Berwin A Turlach
G'day Juliet,

On Tue, 19 Feb 2008 21:35:11 -0500
Juliet Hannah [EMAIL PROTECTED] wrote:

 I implemented a try() statement that looks like:
 
  - function(index)
 {
 
reduced_model <- try(glm.fit(X4,n,family=poisson(link="log")))
full_model <- try(glm.fit(X5,n,family=poisson(link="log")))

if (inherits(reduced_model, "try-error") ||
 inherits(full_model, "try-error")) return(NA)

else
{
  p <- pchisq(reduced_model$deviance - full_model$deviance,
 reduced_model$df.residual - full_model$df.residual, lower.tail= FALSE)
  return (p)
}
 
 }
 
 After an error occurs NA is returned. But after this occurs, all
 values returned after this remain as an NA even though this error
 should occur only 1/500 to 1/1000 times.

[...]

 Is there anything obviously incorrect with my function above. 

Yes, namely:

1) you do not assign it to any object, so how do you call it?
2) it has a formal argument which is not used anywhere.  I guess that
   would be considered bad programming style in any programming
   language. 
3) The way your code is formatted, the return(NA) is on the same line
   as the if-clause.  Hence, at the end of that statement R's parser
   has read a complete statement which is then executed.  Whence, when
   the parser comes across the else, a syntax error should be
   produced. 

None of these would explain the problem that you mention, but point 3)
raises the question why this code actually works.  From what is shown,
there is (at least to me) no obvious explanation for the behaviour that
you see.

 If not, I will post a full example to illustrate my question.

Well, please don't. :) It is pretty obvious that the code above is
derived by cut & paste from a larger body of code and does not run on
its own.  So there is no guarantee that this piece of the code is the
one where the problem lies, and you cannot seriously expect people to
debug such code for you (and for that reason your previous posting
probably did not get an answer).  

Likewise, you should not expect people to reverse-engineer and debug a
large body of code for you.  You should do what the footer of e-mails
to r-help requests, namely:

 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.

Typically, while trying to produce such a commented, minimal,
self-contained, reproducible example of the problematic code one finds
the bug.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constrained regression

2008-03-03 Thread Berwin A Turlach
G'day Carlos,

On Mon, Mar 3, 2008 at 11:52 AM 
Carlos Alzola [EMAIL PROTECTED] wrote:

  I am trying to get information on how to fit a linear regression
 with constrained parameters. Specifically, I have 8 predictors ,
 their coeffiecients should all be non-negative and add up to 1. I
 understand it is a quadratic programming problem but I have no
 experience in the subject. I searched the archives but the results
 were inconclusive.

  Could someone provide suggestions and references to the
 literature, please?

A suggestion:

> library(MASS)   ## to access the Boston data
> designmat <- model.matrix(medv~., data=Boston)
> Dmat <- crossprod(designmat, designmat)
> dvec <- crossprod(designmat, Boston$medv)
> Amat <- cbind(1, diag(NROW(Dmat)))
> bvec <- c(1, rep(0,NROW(Dmat)))
> meq <- 1
> library(quadprog)
> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)

The solution seems to contain values that are, for all practical
purposes, actually zero:

 res$solution
 [1]  4.535581e-16  2.661931e-18  1.016929e-01 -1.850699e-17
 [5]  1.458219e-16 -3.892418e-15  8.544939e-01  0.000000e+00
 [9]  2.410742e-16  2.905722e-17 -5.700600e-20 -4.227261e-17
[13]  4.381328e-02 -3.723065e-18

So perhaps better:

 zapsmall(res$solution)
  [1] 0.0000000 0.0000000 0.1016929 0.0000000 0.0000000 0.0000000
  [7] 0.8544939 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
 [13] 0.0438133 0.0000000

So the estimates seem to follow the constraints.

And the unconstrained solution is:

 res$unconstrained.solution
 [1]  3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02
 [5]  2.686734e+00 -1.776661e+01  3.809865e+00  6.922246e-04
 [9] -1.475567e+00  3.060495e-01 -1.233459e-02 -9.527472e-01
[13]  9.311683e-03 -5.247584e-01

which seems to coincide with what lm() thinks it should be:

 coef(lm(medv~., Boston))
  (Intercept)  crimzn indus  chas 
 3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02  2.686734e+00 
  noxrm   age   dis   rad 
-1.776661e+01  3.809865e+00  6.922246e-04 -1.475567e+00  3.060495e-01 
  tax   ptratio black lstat 
-1.233459e-02 -9.527472e-01  9.311683e-03 -5.247584e-01 

So there seem to be no numeric problems.  Otherwise we could have done
something else (e.g calculate the QR factorization of the design
matrix, say X, and give the R factor to solve.QP, instead of
calculating X'X and giving that one to solve.QP).
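That alternative can be sketched as follows (a sketch added here, not part of the original reply; it assumes solve.QP()'s factorized = TRUE interface, which expects R^{-1} where Dmat = R'R):

```r
## Sketch only: pass the inverse of the R factor of X to solve.QP()
## instead of forming X'X explicitly (assumes factorized = TRUE in quadprog).
library(MASS)      ## for the Boston data
library(quadprog)
X <- model.matrix(medv ~ ., data = Boston)
p <- ncol(X)
R <- qr.R(qr(X))                 ## X'X = R'R
Rinv <- backsolve(R, diag(p))    ## R^{-1}
Amat <- cbind(1, diag(p))
bvec <- c(1, rep(0, p))
res <- solve.QP(Rinv, crossprod(X, Boston$medv), Amat, bvec,
                meq = 1, factorized = TRUE)
```

This avoids squaring the condition number of the design matrix, which is the numerical advantage alluded to above.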

If the intercept is not supposed to be included in the set of
constrained estimates, then something like the following can be done:

 Amat[1,] <- 0
 res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
 zapsmall(res$solution)
 [1] 6.073972 0.000000 0.109124 0.000000 0.000000 0.000000 0.863421
 [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.027455 0.000000

Of course, since after the first command in that last block the second
column of Amat contains only zeros
 Amat[,2]
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
we might as well have removed it (and the corresponding entry in bvec)
 Amat <- Amat[, -2]
 bvec <- bvec[-2]
before calling solve.QP().

Note, the Boston data set was only used to illustrate how to fit such 
models, I do not want to imply that these models are sensible for these
data. :-)

Hope this helps.

Cheers,

Berwin



Re: [R] Strange paste, string or package problem?

2008-03-03 Thread Berwin A Turlach
G'day Thomas,

On Tue, 4 Mar 2008 14:40:35 +1300
Thomas Allen [EMAIL PROTECTED] wrote:

 I came across this strange bug the other day, I'm not sure how to
 solve it and I wonder if anyone can even replicate it.

 Step 1) Make an R package using the package.skeleton() command with
 only these two functions:
 
 error <- function(){
   cmd <- paste(" -a ", 1, " -a ", 1, " -a ", 1,
 " -a ", 1, " -a ", 1, " -a ", 1,
 " -a ", 1, " -a ", 1, " -a ", 1,
 " -a ", 1, " -a ", 1, " -a ", 1,
 " –a ", 1, sep = "")
   cat(cmd, "\n")
 }

 Now why does that e28093 replace one of the - in the first
 command?

With my mailtool, the - before the last a looks a bit longer than the
others; it definitely seems to be a different tool.  Also, if I cut and
paste this function into an emacs buffer on my Kubuntu machine, I see:

error <- function(){
  cmd <- paste(" -a ", 1, " -a ", 1, " -a ", 1,
" -a ", 1, " -a ", 1, " -a ", 1,
" -a ", 1, " -a ", 1, " -a ", 1,
" -a ", 1, " -a ", 1, " -a ", 1,
" \u2013a ", 1, sep = "")
  cat(cmd, "\n")
}

So somehow you must have entered not a - but some other symbol before
that last a.  And I guess what you see is a result of the locale and
the character encoding that you are working in.  But others would know
more about this than I and can probably explain better what is going on.

Cheers,

Berwin



Re: [R] Extracting formula from an lm object

2008-08-24 Thread Berwin A Turlach
G'day Murray,

On Sun, 24 Aug 2008 18:34:39 +1200
Murray Jorgensen [EMAIL PROTECTED] wrote:

 I want to extract the part of the formula not including the response
 variable from an lm object. For example if the lm object ABx.lm was
 created by the call
 
 ABx.lm - lm( y ~ A + B + x, ...)
 
 Then ACx.lm is saved as part of a workspace.
 I wish to extract   ~ A + B + x.  Later in my code I will fit
 another linear model of the form z ~ A + B + x for some other
 response variable z. I would be grateful for any suggestions of a
 nice way to do this.

AFAIK, a formula is essentially a list of two or three components.  The
first component is ~.  The second is the LHS of the formula if there
are three components; otherwise the RHS of the formula.  The third
component, if it exists, is the RHS of the formula.

So storing ~ A + B + x and manipulating this part for different
responses could turn out to be painful; you would have to insert the
new LHS as the second component of the list.  I would suggest that it
is easier to store the complete formula and just manipulate the LHS;
see:

R library(MASS)
R fm <- lm(time ~ dist + climb, hills)
R formula(fm)
time ~ dist + climb
R formula(fm)[[1]]
`~`
R formula(fm)[[2]]
time
R formula(fm)[[3]]
dist + climb
R tt <- formula(fm)
R tt[[2]] <- NULL
R tt
~dist + climb

R tt <- formula(fm)
R class(tt[[2]])
[1] "name"
R typeof(tt[[2]])
[1] "symbol"
R tt[[2]] <- as.name("y")
R tt
y ~ dist + climb

R tt <- formula(fm)
R tt[[2]] <- as.symbol("z")
R tt
z ~ dist + climb

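For completeness (an addition, not in the original reply): base R's update() offers a one-line alternative to manipulating the formula components by hand, swapping the left-hand side while keeping the right-hand side:

```r
## update() keeps the RHS of the old formula and inserts the new
## response given on the left of `z ~ .`
f  <- y ~ A + B + x
f2 <- update(f, z ~ .)
f2    # z ~ A + B + x
```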
HTH.

Cheers,

Berwin



Re: [R] Using R to generate reports

2008-09-05 Thread Berwin A Turlach
G'day all,

On Fri, 5 Sep 2008 18:10:37 -0400
Jorge Ivan Velez [EMAIL PROTECTED] wrote:

 Dear Bill,
 See http://www.statistik.lmu.de/~leisch/Sweave/

The following R News article might also be of interest:

Sven Garbade and Peter Burgard. Using R/Sweave in everyday
clinical practice. R News, 6(2):26-31, May 2006.

If one is new to LaTeX, the following articles might be of interest too:

Max Kuhn. Sweave and the open document format - the odfWeave
package. R News, 6(4):2-8, October 2006.

Gregor Gorjanc. Using Sweave with LyX. R News, 8(1):2-9, May
2008.

Finally, check out:

http://bioinformatics.oxfordjournals.org/cgi/content/full/24/2/276

Cheers,

Berwin



Re: [R] problem with function rep

2008-06-12 Thread Berwin A Turlach
G'day Julien,

On Thu, 12 Jun 2008 16:48:43 +0200
Julien Hunt [EMAIL PROTECTED] wrote:

 I am currently writing a program where I need to use function rep.
 The results I get are quite confusing. Given two 
 vectors A and B, I want to replicate a[1] b[1] 
 times, a[2] b[2] times and so on.
 All the entries of vector B are positive integers.
 My problem comes from the fact that if I sum up 
 all the elements of B, [...]

Others mentioned already the need for a reproducible example.  But my
guess is that the elements in B are calculated.  Recently, I was sent
the following code by a colleague of mine:

---
Hi Berwin,

Try this in R2.7.0

pai = c(.4,.1,.1,.4)
s = .5

p = diag(1-s, 4) + s * t(matrix(pai, 4, 4))
f = diag(pai) %*% p
z = 200*f

### bug???
z
sum(z)
length(rep(1:16, z))
length(rep(1:16, round(z)))


I tested the code and my answer was:

---
Interesting variation on FAQ 7.31:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Look at z-round(z) and where the negative residuals are.



My money is on you having the same problem and that using round(B)
instead of B in the rep() command will solve your problem.
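The truncation behaviour is easy to demonstrate with base R alone (a made-up value, not the original poster's data):

```r
## rep() truncates a non-integer 'times' towards zero, so a value that
## is only very slightly below an integer loses one repetition.
z <- 100 * (1 - 0.9)      # prints as 10, but is slightly less than 10
length(rep(1, z))         # 9
length(rep(1, round(z)))  # 10
```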

HTH.

Cheers,

Berwin



Re: [R] bug in nls?

2008-06-26 Thread Berwin A Turlach
G'day Petr,

On Thu, 26 Jun 2008 13:57:39 +0200
Petr PIKAL [EMAIL PROTECTED] wrote:

 I just encountered a strange problem with nls formula. I tried to use
 nls in cycle but I was not successful. I traced the problem to some
 parse command.
 
 [...]
 
 
 I am not sure if this behaviour is a bug or feature.  [...]

It is definitely a feature.  

It is an error to believe that all modelling functions that use
modelling formulae use the same syntax for their modelling formulae.
This is, perhaps, easiest realised by observing how lm() and nls()
interpret * and / in model formulae.

In nls(), [..] can be used to index parameters, if the parameter is
allowed to change between groups in the data.  This seems to be a
little known feature, though there is an example that uses that feature
in MASS.  The contributed documentation "An Introduction to R: Software
for Statistical Modelling & Computing" by Petra Kuhnert and Bill
Venables, available from CRAN, also has such an example on pages 134
 and 230.  

The fact that nls() allows you to use [..] to index parameters in the
model formulae seems to conflict with the way you wanted to specify
the observed values in the formula.  I guess Gabor's solution is a fix
for your problem.

Cheers,

Berwin



Re: [R] recursively divide a value to get a sequence

2008-07-09 Thread Berwin A Turlach
G'day all,

On Wed, 09 Jul 2008 20:33:39 +1000
Jim Lemon [EMAIL PROTECTED] wrote:

 On Wed, 2008-07-09 at 11:40 +0200, Anne-Marie Ternes wrote:
  Hi,
  
  if given the value of, say, 15000, I would like to be able to divide
  that value recursively by, say, 5, and to get a vector of a
  determined length, say 9, the last value being (set to) zero- i.e.
  like this:
  
  15000 3000 600 120 24 4.8 0.96 0.192 0
  
  These are in fact concentration values from an experiment. For my
  script, I get only the starting value (here 15000), and the factor
  by which concentration is divided for each well, the last one
  having, by definition, no antagonist at all.
  
  I have tried to use seq, but it can only do positive or negative
  increment. I didn't either find a way with rep, sweep etc. These
  function normally start from an existing vector, which is not the
  case here, I have only got a single value to start with.
  
  I suppose I could do something loopy, but I'm sure there is a
  better way to do it.
  
 Well, if you really want to do it recursively (and maybe loopy as
 well)
 
 recursivdiv <- function(x, denom, lendiv, firstpass=TRUE) {
  if(firstpass) lendiv <- lendiv - 1
  if(lendiv > 1) {
   divvec <- c(x/denom, recursivdiv(x/denom, denom, lendiv-1, FALSE))
   cat(divvec, "\n")
  }
  else divvec <- 0
  if(firstpass) divvec <- c(x, divvec)
  return(divvec)
 }


Or, a bit more compactly:

recursivdiv <- function(x, denom, lendiv) {
    if(lendiv > 1) {
      divvec <- c(x, Recall(x/denom, denom, lendiv-1))
    }
    else divvec <- 0
    return(divvec)
  }

which will continue to work if the function is renamed (say, 
rcd - recursivdiv) due to the use of Recall()
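And since the sequence is just a geometric progression with a zero appended, a non-recursive version (an addition, not from the original thread) needs no loop at all:

```r
## Vectorised equivalent: x, x/denom, x/denom^2, ..., with a final 0.
divseq <- function(x, denom, len) c(x / denom^(0:(len - 2)), 0)
divseq(15000, 5, 9)   # 15000 3000 600 120 24 4.8 0.96 0.192 0
```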

Cheers,

Berwin



Re: [R] shapiro wilk normality test

2008-07-13 Thread Berwin A Turlach
G'day all,

On Sun, 13 Jul 2008 15:55:38 +0100 (BST)
(Ted Harding) [EMAIL PROTECTED] wrote:

 On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
  [...]
  A large P-value means nothing more than needing more data.  No 
  conclusion is possible.  

I would have thought that we need more data would qualify as a
conclusion. :)

  Please read the classic paper Absence of Evidence is not Evidence
  for Absence.
 
 Is that ironic, Frank, or is there really a classic paper with
 that title? If so, I'd be pleased to have a reference to it!

Of course, I do not know for sure which paper Frank has in mind, but
Google and Google Scholar readily come up with papers/editorials that
have a nearly identical title:

http://www.bmj.com/cgi/content/full/311/7003/485
http://bmj.bmjjournals.com/cgi/content/full/328/7438/476
(see also
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=351831)
http://www.ncbi.nlm.nih.gov/pubmed/6829975

My money is on Frank having the first of these publications in mind.

Cheers,

Berwin



Re: [R] NAMESPACE vs internal.Rd

2008-07-16 Thread Berwin A Turlach
G'day Christophe,

On Wed, 16 Jul 2008 15:10:05 +0200
[EMAIL PROTECTED] wrote:

 Hi the list,
 
 When we use package.skeleton, it creates some files in the man
 directory.
 - If we use package.skeleton with namespace=FALSE, it
 creates a file toto-internal.Rd
 - If we use package.skeleton with namespace=TRUE, it does not create
 the file toto-internal.Rd
 
 Why is that ?

My understanding from my reading of Writing R Extensions is:

1) All functions/objects of a package that a user can call/see have to
be documented in order for the package to pass R CMD check without
warnings/errors.

2) Users can see all functions/objects in packages without a namespace,
hence everything has to be documented.  For functions that you do not
want users to call directly, since they are helper functions for the
main functions of your package, you can just create entries in
internal.Rd (or toto-internal.Rd) without writing a complete help
page for these functions.  R CMD check will accept this as
documentation.

3) In packages with a namespace, you decide which function/objects the
user can see by exporting them.  Everything not exported is supposed to
be internal to the package and should not be accessed directly by users
(though they can via :::).  Functions/objects that are not exported do
not need to be documented, hence no need for the toto-internal.Rd stub.
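As an illustration (a hypothetical package, not from the original exchange), a NAMESPACE along these lines documents only what it exports:

```
## Hypothetical NAMESPACE for a package 'toto':
## only the exported functions need Rd pages.
export(mainFun)
export(otherFun)
## helperFun is not exported: no documentation stub is required,
## though users could still reach it via toto:::helperFun
```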

HTH (and HTIC).

Cheers,

Berwin



Re: [R] NAMESPACE vs internal.Rd

2008-07-16 Thread Berwin A Turlach
G'day Christophe,

On Wed, 16 Jul 2008 16:11:15 +0200
[EMAIL PROTECTED] wrote:

 So the main idea is - With NAMESPACE, you do not document the 
 not-for-user because they don't have to be documented
 - Witout NAMESPACE, you document the not-for-user with a 
 toto-internal.Rd that say not for user
 
 That's clear.
 
 Is it stupid to consider to use both technique at the same time ?

As far as I know, R will not complain if you write documentation for
objects that you do not have to document.  R CMD check will complain
if an object accessible to a user of the package is not documented.

A quick perusal of the packages installed on my machine shows that
there are several packages that have a namespace and a -internal
file, e.g. the MASS package (from the VR bundle) and the sm package.

But both these packages existed before the namespace mechanism was
introduced.  So the MASS-internal.Rd and sm-internal.Rd may be
remainders from the time before namespaces were introduced.  

You would have to ask the maintainer(s) of such packages why they use a
namespace and a -internal.Rd file.  I can see no reason but a
historical one that the -internal.Rd file stems from the time before
namespaces and was not deleted after their introduction (I know that I
wouldn't delete such a file if it weren't necessary).

 - some fonction will be accessible (regular function)
 - some function will be hidden (function starting with .)
 - some function will be forbiden (function not in namespace)

We are actually talking objects (everything in R is an object) which
could be anything, a function, a data frame or ...

But if you want to keep the discussion restricted to functions then I
would want to point out that functions that start with a . are only
hidden from the ls() function and that this has nothing to do with a
namespace.

According to my understanding, if your package has a namespace, then
everything that you do not export explicitly is not visible to users of
your package.  The Writing R Extensions manual has an example for an
exportPattern() directive that exports all variables that do not start
with a period, but that would export everything else.  I guess writing
a regular expression that says "export everything that does not start
with a dot but do not export foo and bar" would not be trivial to write
(at least not for me).

And your distinction between hidden and forbidden functions is
spurious because either function could be accessed via the :::
operator if the user knows that it is in the package (though not
exported). 

Thus, if you use a namespace, then all the objects that you export are
visible to the users of the package;  all other objects are not visible
(but can be accessed via :::).  Objects that are not visible do not
need to be documented (for R CMD check to succeed), but it is no
error to document them.  Objects that are visible to the users of the
package have to be documented.

HTH (and HTIC).

Cheers,

Berwin



Re: [R] calculate differences - strange outcome

2008-07-17 Thread Berwin A Turlach
On Thu, 17 Jul 2008 11:47:42 +0200
Kunzler, Andreas [EMAIL PROTECTED] wrote:

 I ran into some trouble by calculating differences. For me it is
 important that differences are either 0 or not.

If that is the case, restrict yourself to calculating with integers. :)

 So I don't understand the outcome of this calculation
 
 865.56-(782.86+0+63.85+18.85+0)
 [1] -1.136868e-13
 
 I run R version 2.71 on WinXP
 
 I could solve my problem by using
 round()
 but I would like to know the reason.
 Maybe someone can help me?

FAQ 7.31 and the reference cited therein should shed light on your
problem.

Best wishes,

Berwin



Re: [R] NAMESPACE vs internal.Rd

2008-07-17 Thread Berwin A Turlach
G'day Christophe,

On Wed, 16 Jul 2008 18:22:49 +0200
[EMAIL PROTECTED] wrote:

 Thanks for your answer.

My pleasure.

  I guess writing
  a regular expression that says export everything that does not
  start with a dot but do not export foo and bar would be not
  trivial to write (at least not for me).
 
 The NAMESPACE created by package.skeleton contain a single line :
 
 exportPattern("^[[:alpha:]]+")
 
 I guess that it is what you just say...

Not really. :)

The regular expression ^[[:alpha:]]+ matches, as far as I know, all
objects that have one or more alphabetic character at the beginning.

The Writing R Extensions manual suggests the directive

 exportPattern("^[^\\.]")

exports all variables that do not start with a period.

As far as I can tell, both these instructions have more or less the
same effect (assuming that there are no objects with non-standard names
in your package; and I am not enough of an R language lawyer to know
what would happen in such a case).

I was commenting on your classification as:

 - some fonction will be accessible (regular function)
 - some function will be hidden (function starting with .)
 - some function will be forbiden (function not in namespace)  

Say, you have accessible functions called fubar, rabuf, main and
something.incredible.useful, some hidden functions whose name start
with a . and forbidden functions (i.e. not exported) with names foo
and bar.  (Though, as I commented earlier, it is practically impossible
to implement forbidden functions, users can always access them if
they want using :::.)

Both of the directives above would export fubar, rabuf, main,
something.incredible.useful, foo and bar.

So my challenge was to come up with a regular expression for
exportPattern that says "export everything that does not start with a
dot but do not export foo and bar", i.e. a regular expression that
would only export the accessible functions.

HTC.

Cheers,

Berwin



Re: [R] truncated normal

2008-07-24 Thread Berwin A Turlach
G'day all,

On Wed, 23 Jul 2008 20:48:59 -0400
Duncan Murdoch [EMAIL PROTECTED] wrote:

 On 23/07/2008 8:17 PM, cindy Guo wrote:
  Yes, I know. I mean if I want to generate 100 numbers from
  N(0,1)I((0,1),(5,10)). There are two intervals (0,1) and (5,10).
  Then the function will give 50 numbers in the first interval and 50
  in the other.

I can think of two strategies:

1) Use rtnorm() from the *msm* package to sample from the normal
distribution truncated to the interval from your smallest lower
bound and your largest upper bound; and then use rejection sampling.

I.e. for your original example N(0,1)I((0,1),(2,4)) sample from a
N(0,1)I(0,4) distribution and reject all observations that are between
1 and 2, sample sufficiently many points until you have kept the
required number.  But this could be rather inefficient for your second
example N(0,1)I((0,1),(5,10)).
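A sketch of strategy 1 (code added here, not from the original reply; it assumes msm::rtnorm with its lower/upper arguments):

```r
library(msm)  ## provides rtnorm()

## Sample n values from N(0,1) truncated to (l1,u1) union (l2,u2)
## by sampling the enclosing interval (l1,u2) and rejecting the gap.
rtnorm.2int <- function(n, l1, u1, l2, u2) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- rtnorm(n, mean = 0, sd = 1, lower = l1, upper = u2)
    out <- c(out, x[x < u1 | x > l2])   # reject points in (u1, l2)
  }
  out[seq_len(n)]
}
smp <- rtnorm.2int(100, 0, 1, 2, 4)
```

The acceptance rate, and hence efficiency, depends on how much probability mass the gap carries, which is why this approach degrades for N(0,1)I((0,1),(5,10)).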

2) If I am not mistaken, truncating the normal distribution to more than
one interval essentially creates a mixture distribution where the
components of the mixture are normal distributions truncated to a
single interval.  The weights of the mixture are given by the relative
probability with which an observation from a normal distribution falls
into each of the intervals.  Thus, an alternative strategy, exemplified
using your second example, would be (code untested):

samp1 <- rtnorm(100, mean=0, sd=1, lower=0, upper=1)
samp2 <- rtnorm(100, mean=0, sd=1, lower=5, upper=10)
p1 <- pnorm(1) - pnorm(0)
p2 <- pnorm(10) - pnorm(5)  ## Take heed about Duncan's remark on
                            ## evaluating such quantities
p <- p1/(p1+p2)
ind <- runif(100) < p
finalsample <- samp1
finalsample[!ind] <- samp2[!ind]

HTH.

Cheers,

Berwin



Re: [R] problem with which

2008-04-23 Thread Berwin A Turlach
G'day Melanie,

On Wed, 23 Apr 2008 17:46:56 -1000
Melanie Abecassis [EMAIL PROTECTED] wrote:

 This doesn't seem to happen with integers.
 Am I missing something ?? 

Yes, FAQ 7.31:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

 Is there a better function for non-integers ?

 which(sapply(tt, function(x) isTRUE(all.equal(x, 1.7))))
[1] 8

seems to work.
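A self-contained illustration (the vector tt here is made up; the original poster's data were not shown):

```r
tt <- seq(0.3, 2.1, by = 0.2)   # the 8th element prints as 1.7 ...
which(tt == 1.7)                # ... but exact comparison can miss it
which(sapply(tt, function(x) isTRUE(all.equal(x, 1.7))))   # 8
```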

HTH.

Cheers,

Berwin



Re: [R] Is my understanding of rlnorm correct?

2008-05-03 Thread Berwin A Turlach
G'day Phil,

On Sun, 4 May 2008 14:05:09 +1000
phil colbourn [EMAIL PROTECTED] wrote:

 rlnorm takes two 'shaping' parameters: meanlog and sdlog.
 
 meanlog would appear from the documentation to be the log of the
 mean. eg if the desired mean is 1 then meanlog=0.

These two parameters are the mean and the sd on the log scale of the
variate, i.e. if you take the logarithm of the produced numbers then
those values will have the given mean and sd.  

If X has an N(mu, sd^2) distribution, then Y=exp(X) has a log-normal
distribution with parameters mu and sd.  

R set.seed(1)
R y <- rlnorm(1, mean=3, sd=2)
R summary(log(y))
   Min. 1st Qu.  MedianMean 3rd Qu.Max. 
 -4.343   1.653   2.968   2.987   4.355  10.620 
R mean(log(y))
[1] 2.986926
R sd(log(y))
[1] 2.024713


 I noticed on wikipedia lognormal page that the median is exp(mu) and  
 that the mean is exp(mu + sigma^2/2)
 
 http://en.wikipedia.org/wiki/Log-normal_distribution

Where mu and sigma are the mean and standard deviation of a normal
variate which is exponentiated to obtain a log normal variate.  And
this holds for the above example (upto sampling variation):

R mean(y)
[1] 143.1624
R exp(3+2^2/2)
[1] 148.4132

 So, does this mean that if i want a mean of 100 that the meanlog
 value needs to be log(100) - log(sd)^2/2?

A mean of 100 for the log-normal variate?  In this case any set of mu
and sd for which exp(mu+sd^2/2)=100 (or mu+sd^2/2=log(100)) would do
the trick:

R mu <- 2
R sd <- sqrt(2*(log(100)-mu))
R summary(rlnorm(1, mean=mu, sd=sd))
 Min.   1st Qu.Median  Mean   3rd Qu.  Max. 
4.010e-04 1.551e+00 7.075e+00 1.006e+02 3.344e+01 3.666e+04 
R mu <- 4
R sd <- sqrt(2*(log(100)-mu))
R summary(rlnorm(1, mean=mu, sd=sd))
 Min.   1st Qu.Median  Mean   3rd Qu.  Max. 
   0.9965   25.9400   56.0200  101.2000  115.5000 3030. 
R mu <- 1
R sd <- sqrt(2*(log(100)-mu))
R summary(rlnorm(1, mean=mu, sd=sd))
 Min.   1st Qu.Median  Mean   3rd Qu.  Max. 
9.408e-05 4.218e-01 2.797e+00 8.845e+01 1.591e+01 7.538e+04 

Note that given the variation we would expect in the mean in the last
example, the mean is actually close enough to the theoretical value
of 100:

R sqrt((exp(sd^2)-1)*exp(2*mu + sd^2)/1)
[1] 36.77435

HTH.

Cheers,

Berwin



Re: [R] question about se of predicted glm values

2008-05-05 Thread Berwin A Turlach
G'day Jarrett,
On Mon, 5 May 2008 20:21:26 -0700
Jarrett Byrnes [EMAIL PROTECTED] wrote:

 Hey, all.  I had a quick question about fitting new glm values and  
 then looking at the error around them.  I'm working with a glm using
 a Gamma distribution and a log link with two types of treatments.   
 However, when I then look at the predicted values for each category,
 I find for the one that is close to 0, 

And this does not surprise you, since with your data:

 tapply(y,x,mean)
a b 
 1.00 25.58 

So wouldn't you expect one predicted value to be close to 1 instead of
zero?

 the error (using se.fit=T with predicted) actually makes it overlap
 0.  This is not possible, as non-0 values have no meaning.
 
 Am I missing something in the interpretation?  

Yes. :)

For GLMs, predict returns by default the predicted values on the linear
predictor scale, not on the response scale.  Negative values for the
linear predictor are, of course, possible and may have meaning.

Look closely at the pictures that you have created.  In the first one,
for x=b, the values are around 30, in the picture with the fitted value
the prediction for x=b is around 3; clearly another scale (namely the
scale of the linear predictor).

 #get predicted values and their error
 a.fit <- predict(my.glm, data.frame(x="a"), se.fit=T)
 b.fit <- predict(my.glm, data.frame(x="b"), se.fit=T)

Try: 

a.fit <- predict(my.glm, data.frame(x="a"), se.fit=T, type="response")
b.fit <- predict(my.glm, data.frame(x="b"), se.fit=T, type="response")
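For readers of the archive, an editor's self-contained sketch of the two prediction scales. The data, the factor `x`, and the model `my.glm` here are made up for illustration; they are not the original poster's data.

```r
# Editor's illustration (made-up data): link scale vs response scale
# for a Gamma GLM with log link, mirroring the situation in the thread.
set.seed(42)
x <- factor(rep(c("a", "b"), each = 20))
y <- rgamma(40, shape = 5, rate = 5 / ifelse(x == "a", 1, 25))
my.glm <- glm(y ~ x, family = Gamma(link = "log"))

# Default: predictions on the linear-predictor (log) scale; may be negative.
lp <- predict(my.glm, data.frame(x = "b"), se.fit = TRUE)
# type = "response" back-transforms to the scale of y itself.
rp <- predict(my.glm, data.frame(x = "b"), se.fit = TRUE, type = "response")
exp(lp$fit)  # the response-scale fit is just the inverse link of the link-scale fit
```

Comparing `exp(lp$fit)` with `rp$fit` makes the relationship between the two scales concrete.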

Hope this helps.

Best wishes,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] apply function

2008-05-15 Thread Berwin A Turlach
G'day Shuba,

On Thu, 15 May 2008 12:18:58 +0530
Shubha Vishwanath Karanth [EMAIL PROTECTED] wrote:

 Getting a strange result using ?apply. Please look into the below
 codes:

 d=data.frame(a=c(1,2,3),b=c("A","B","C"),c=c(TRUE,FALSE,FALSE),d=c(T,F,F))
 
  class(d[,1])
 
 [1] numeric
 
  class(d[,2])
 
 [1] factor
 
  class(d[,3])
 
 [1] logical
 
  class(d[,4])
 
 [1] logical
 
  apply(d,2,class)
 
          a           b           c           d 
 
 "character" "character" "character" "character" 
[]
 Why is this so? 

?apply
The first argument to apply is an *array*, not a data.frame.  In an
array, all elements have to be of the same type, so when your
data.frame is coerced into an array the target type of the coercion
depends on which components you select.

 How do I get the actual classes of columns of my dataframe d?

Something like:

R lapply(d, class)
$a
[1] numeric

$b
[1] factor

$c
[1] logical

$d
[1] logical

could be used.
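A compact variant (editor's note, not part of the original reply): sapply() simplifies lapply()'s list to a named character vector. stringsAsFactors = TRUE is spelled out so the example reproduces the pre-R-4.0 behaviour shown above, where column b is a factor.

```r
# Editor's sketch: same question answered with sapply()/vapply().
d <- data.frame(a = c(1, 2, 3), b = c("A", "B", "C"),
                c = c(TRUE, FALSE, FALSE), d = c(TRUE, FALSE, FALSE),
                stringsAsFactors = TRUE)  # the default before R 4.0.0
sapply(d, class)                  # named character vector, one class per column
vapply(d, class, character(1))    # same, with an explicit result-type check
```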

HTH.

Best wishes,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] glm model syntax

2008-05-16 Thread Berwin A Turlach
G'day Harold,

On Fri, 16 May 2008 11:43:32 -0400
Doran, Harold [EMAIL PROTECTED] wrote:

 N+M gives only the main effects, N:M gives only the interaction, and
 G*M gives the main effects and the interaction. 

I guess this begs the question of what you mean by "N:M gives only the
interaction" ;-)

Consider:

R (M <- gl(2, 1, length=12))
 [1] 1 2 1 2 1 2 1 2 1 2 1 2
Levels: 1 2
R (N <- gl(2, 6))
 [1] 1 1 1 1 1 1 2 2 2 2 2 2
Levels: 1 2
R dat <- data.frame(y= rnorm(12), N=N, M=M)
R dim(model.matrix(y~N+M, dat))
[1] 12  3
R dim(model.matrix(y~N:M, dat))
[1] 12  5
R dim(model.matrix(y~N*M, dat))
[1] 12  4

Why does the model matrix of y~N:M have more columns than the model
matrix of y~N*M, if the former contains the interactions only and the
latter contains main terms and interactions?  Of course, if we leave
out the dim() commands, we will see why.  Moreover, it seems that the
model matrix constructed from y~N:M has a redundant column.

Furthermore:

R D1 <- model.matrix(y~N*M, dat)
R D2 <- model.matrix(y~N:M, dat)
R resid(lm(D1~D2-1))

Shows that the column space created by the model matrix of y~N*M is
completely contained within the column space created by the model
matrix of y~N:M, and it is easy to check that the reverse is also
true.  So it seems to me that y~N:M and y~N*M actually fit the same
model.  To see how to construct one design matrix from the other, try:

R lm(D1~D2-1)

Thus, I guess the answer is that y~N+M fits a model with main terms
only, while y~N:M and y~N*M fit the same model, namely a model with main
and interaction terms; the two formulations just create different
design matrices, which has to be taken into account if one tries to
interpret the estimates.

Of course, all the above assumes that N and M are actually factors,
something that Birgit did not specify.  If N and M (or only one of
them) is a numeric vector, then the constructed matrices might be
different, but this is left as an exercise. ;-)  (Apparently, if N and
M are both numeric, then your summary is pretty much correct.)
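For the exercise left at the end, an editor's sketch of the numeric case: with numeric N and M the three formulas really do build different design matrices, as the column counts show.

```r
# Editor's sketch: N and M as numeric vectors instead of factors.
set.seed(1)
dat <- data.frame(y = rnorm(12), N = rep(1:2, each = 6), M = rep(1:2, 6))
ncol(model.matrix(y ~ N + M, dat))  # 3: (Intercept), N, M
ncol(model.matrix(y ~ N:M, dat))    # 2: (Intercept), N:M
ncol(model.matrix(y ~ N * M, dat))  # 4: (Intercept), N, M, N:M
```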

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Integer / floating point question

2008-05-16 Thread Berwin A Turlach
G'day Erik,

On Fri, 16 May 2008 10:45:43 -0500
Erik Iverson [EMAIL PROTECTED] wrote:

[...]
 The help page for '%%' addresses this a bit, but then caveats it with 
 'up to rounding error', which is really my question.  Is there ever 
 'rounding error' with 2.0 %% 1 as opposed to 2 %% 1?

I am not in the position to give an authoritative answer, but I think
there should be no problem with rounding error in the situation that
you describe.  At least I hope there is no problem, otherwise I would
consider this a serious issue. :)

  However, my question is related to R FAQ 7.31, "Why doesn't R
  think these numbers are equal?" The first sentence of that FAQ
  reads, "The only numbers that can be represented exactly in R's
  numeric type are integers and fractions whose denominator is a
  power of 2."

Again, I did not write this FAQ answer and cannot give an authoritative
answer, but the word "integer" in that answer does not, IMHO, refer to
variables in R that are of integer type; in particular since the answer
discusses what kind of numbers can be represented exactly in R's
numeric type. (Perhaps this should actually be plural since there are
several numeric types?)

My interpretation is that 2.0 and 2 are both *text constants* that
represent the integer 2, and that number is representable in a floating
point (and in an integer).

The paper by Goldberg, referenced in FAQ 7.31, contains a discussion on
whether it is possible (it is) to convert a floating point number
from binary representation to decimal representation and then back;
ending up with the same binary representation.  These kinds of questions
are important if you use commands like write.table() or write.csv()
which write out floating points in decimal representation, readable to
normal humans.  When you read the data back in, you want to end up with
the exact same binary representation of the numbers.  Goldberg is
indeed an interesting paper to read.

And the comments I made above are based on my understanding of
Goldberg: 2 and 2.0 are both decimal representations of the integer 2,
and this number has an exact representation (in integer type variables
and in floating point type variables).  Hence, both these decimal
representations should lead to a binary representation that corresponds
to that number exactly.

Thus, I would expect 
R x <- 2.0
R x %% 1 == 0
always to work and to return TRUE.  It is things like:
R x <- sqrt(2.0)^2
R x %% 1 == 0
that FAQ 7.31 is about and what, IMHO, the comment in the help page
of %% warns about; if the variable x contains a value that was created
by some finite precision floating point calculations.  But the
conversion from a textual representation of an integer to a binary
representation should not create problems.
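An editor's sketch making the two cases above executable; the FALSE case assumes the usual IEEE-754 double arithmetic.

```r
# Text constant "2.0" parses to the exact double 2: no rounding error.
x <- 2.0
x %% 1 == 0       # TRUE

# A value produced by finite-precision arithmetic need not be exact:
y <- sqrt(2.0)^2  # on IEEE-754 platforms this is 2 plus about 4.4e-16
y %% 1 == 0       # typically FALSE, which is what FAQ 7.31 is about
```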

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Integer / floating point question

2008-05-16 Thread Berwin A Turlach
G'day Brian,

On Fri, 16 May 2008 19:28:59 +0100 (BST)
Prof Brian Ripley [EMAIL PROTECTED] wrote:

 'numeric' is a class but not a type -- so I think the FAQ is wrongly 
 worded but the concept is well defined 

Though there may be multiple definitions. :-)

Reading a bit in the R Language Definition (yes, I am aware that it is a
draft), 'numeric' seems also to be used as a mode.  I guess this comes
from S3, the mode (no pun intended) that I am still using and thinking
in mostly, and that 'numeric' being a class came in with S4.  And I
notice that mode(1L) and class(1L) give different results: the former
returns numeric while the latter returns integer.
 
Hence, when I read R's numeric type in the FAQ, I take this as
referring to those basic types that are of mode numeric, i.e. integer
and real.  I am not sure whether changing this to R's object of
mode 'numeric' or R's object class 'numeric' will make the answer
more readable/understandable. 

 But it does not say that all such numbers can be represented
 exactly, and only some can.

I am well aware that integers and reals can only hold integers and
fractions whose denominator is a power of 2 from a limited range; in
particular, the former will not hold any fractions.  However, given the
discussion in Goldberg (which FAQ 7.31 points to) on changing from
binary to decimal representation and back, I would expect that given
any such number (within the appropriate range) in decimal
representation, it would be transformed into the correct, exact binary
representation. However, I also know that if I stumble across an
example where this is not the case, I shall only complain after
ensuring that documented behaviour is violated; the FAQ does not
necessarily document expected behaviour.

I guess Duncan's answer said it all, at the extreme ends of the
appropriate ranges surprises might lurk, but it would be considered a
serious bug if R did not transform small integers into their exact
binary representation.

Cheers,

Berwin



Re: [R] sum of unknown number of matrices

2008-06-04 Thread Berwin A Turlach
G'day Shubha,

On Wed, 4 Jun 2008 20:23:35 +0530
Shubha Vishwanath Karanth [EMAIL PROTECTED] wrote:

 Something like do.call("+",l) is not working...why is this?

Well, as the error message says, "+" is either a unary or a binary
operator, i.e. it takes either one or two arguments, but not more.

 I may not be knowing the number of matrices in the list...

This is perhaps a bit complicated but it works:

R a=b=c=d=matrix(1:4,2,2)
R l=list(a,b,c,d)
R library(abind)  ## may have to install this package first
R apply(do.call(abind, list(l, along=3)), 1:2, sum)
     [,1] [,2]
[1,]    4   12
[2,]    8   16
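An alternative that stays in base R (editor's note): Reduce() folds the binary "+" over the list, so no extra package is needed.

```r
# Editor's sketch: sum an arbitrary-length list of matrices with Reduce().
a <- b <- c <- d <- matrix(1:4, 2, 2)
l <- list(a, b, c, d)
Reduce(`+`, l)  # same result as the abind/apply route above
```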

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] The .tex version of the manual in foo.Rcheck

2009-04-28 Thread Berwin A Turlach
G'day Bendix,

On Mon, 27 Apr 2009 22:52:02 +0200
BXC (Bendix Carstensen) b...@steno.dk wrote:

 In version 2.8.1, running Rcmd check on the package foo would leave
 the file foo-manual.tex in the folder foo.Rcheck.
 
 But as of 2.9.0 only foo-manual.pdf and foo-manual.log are there.
 
 Is this intentional?

I do not know whether it is intentional, but it seems that parts
of `check' were substantially re-written due to the new Rd parser.  As
far as I can understand the Perl code, the .tex file is only copied to
the .Rcheck directory if an error occurs while the file is processed.

But this can easily be changed by moving two lines in the script a bit
higher, see the attached patch.  

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba


Re: [R] The .tex version of the manual in foo.Rcheck

2009-04-29 Thread Berwin A Turlach
G'day Uwe,

On Wed, 29 Apr 2009 11:03:43 +0200
Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

 you can also take a source package an run
 
 R CMD Rd2dvi --no-clean packageName
 
 on it and you will get a temporary directory with the TeX sources in
 it.

Which is fine for manual processing and when done once or twice, but
somewhat less helpful for automatic processing in scripts since the
name of the temporary directory is hard to predict.

`R CMD check foo' already copies the foo-manual.log file
unconditionally from the temporary directory to the foo.Rcheck
directory; the foo-manual.tex file is copied only if an error occurs
during processing.
What is the problem with also copying the foo-manual.tex file
unconditionally to foo.Rcheck?

Cheers,

Berwin



Re: [R] Splitting a vector into equal groups

2009-05-04 Thread Berwin A Turlach
G'day Utkarsh,

On Mon, 04 May 2009 11:51:21 +0530
utkarshsinghal utkarsh.sing...@global-analytics.com wrote:

 I have vector of length 52, say, x=sample(30,52,replace=T). I want to 
 sort x and split into five *nearly equal groups*.

What do you mean by *nearly equal groups*?  The size of the groups
should be nearly equal? The sum of the elements of the groups should be
nearly equal?

 Note that the observations are repeated in x so in case of a tie I
 want both the observations to fall in same group.

Then it becomes even more important to define what you mean with
nearly equal groups.

As a start, you may consider:

R set.seed(1)
R x=sample(30,52,replace=T)
R xrle - rle(sort(x))
R xrle
Run Length Encoding
  lengths: int [1:25] 2 1 2 2 3 1 1 1 5 1 ...
  values : int [1:25] 1 2 4 6 7 8 9 11 12 13 ...
R cumsum(xrle$lengths)
 [1]  2  3  5  7 10 11 12 13 18 19 24 25 26 28 29 32 35 38
[19] 43 45 46 48 49 51 52

and use this to determine our cut-offs.  E.g., should the first group
have 10, 11 or 12 elements in this case?  The information in xrle
should enable you to construct your five groups once you have decided
on a grouping.
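One concrete grouping rule, sketched by the editor: snap each ideal boundary (n/5, 2n/5, ...) to the nearest run boundary from the rle() above, so tied values never straddle two groups. This "nearest boundary" choice is an assumption; it is only one of several defensible readings of "nearly equal".

```r
# Editor's sketch: five nearly-equal-sized groups that keep ties together.
set.seed(1)
x <- sample(30, 52, replace = TRUE)
xs <- sort(x)
bounds <- cumsum(rle(xs)$lengths)   # positions where each run of a value ends
targets <- length(x) * (1:4) / 5    # ideal group boundaries
# snap each ideal boundary to the nearest run boundary
cuts <- sapply(targets, function(t) bounds[which.min(abs(bounds - t))])
grp <- findInterval(seq_along(xs), cuts + 1) + 1
table(grp)                          # group sizes
```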

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Beyond double-precision?

2009-05-09 Thread Berwin A Turlach
G'day all,

On Sat, 09 May 2009 08:01:40 -0700
spencerg spencer.gra...@prodsyse.com wrote:

   The harmonic mean is exp(mean(logs)).  Therefore, log(harmonic 
 mean) = mean(logs). 
 
   Does this make sense?

I think you are talking here about the geometric mean and not the
harmonic mean. :)

The harmonic mean is a bit more complicated.  If x_i are positive
values, then the harmonic mean is

H= n / (1/x_1 + 1/x_2 + ... + 1/x_n)

so

log(H) = log(n) - log( 1/x_1 + 1/x_2 + ... + 1/x_n)

Now log(1/x_i) = -log(x_i), so if log(x_i) is available, the logarithms
of the individual terms are easily calculated.  But we need to
terms.  

At the C level R's API has the function logspace_add for such tasks, so
it would be easy to do this at the C level.  But one could also
implement the equivalent of the C routine using R commands.  The way to
calculate log(x+y) from lx=log(x) and ly=log(y) according to
logspace_add is:

  max(lx,ly) + log1p(exp(-abs(lx-ly)))

So the following function may be helpful:

logadd <- function(x){
  logspace_add <- function(lx, ly)
    max(lx, ly) + log1p(exp(-abs(lx-ly)))

  len_x <- length(x)
  if(len_x > 1){
    res <- logspace_add(x[1], x[2])
    if( len_x > 2 ){
      for(i in 3:len_x)
        res <- logspace_add(res, x[i])
    }
  }else{
    res <- x
  }
  res
}

R set.seed(1)
R x <- runif(50)
R lx <- log(x)
R log(1/mean(1/x))  ## logarithm of harmonic mean
[1] -1.600885
R log(length(x)) - logadd(-lx)
[1] -1.600885
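A shorter route to the same quantity (editor's note, not from the original reply) uses the standard log-sum-exp shift instead of pairwise folding:

```r
# Editor's sketch: log(sum(exp(lx))) computed stably in one pass.
logsumexp <- function(lx) {
  m <- max(lx)
  m + log(sum(exp(lx - m)))  # largest term scaled to exp(0); no overflow
}
set.seed(1)
x <- runif(50)
log(length(x)) - logsumexp(-log(x))  # log of the harmonic mean again
```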

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Unintended loading of package:datasets

2009-05-10 Thread Berwin A Turlach
G'day David,

On Sun, 10 May 2009 17:17:38 -0400
David A Vavra dava...@verizon.net wrote:

 The dataset package is being loaded apparently by one of the packages
 that I am using. The loading of the datasets takes a long time and I
 would like to eliminate it. I thought the datasets were effectively
 examples so don't understand why they would be required at all.
 
 1) How can I determine what is causing the datasets to be loaded?

help(options)
and then read the part on defaultPackages

 2) How can I stop them from doing so?

Create an appropriate .Rprofile.  R-DownUnder had a discussion long
ago on what people put into their .Rprofile, and Bill
Venables' .Rprofile seems to contain the following snippet:

### puts four more packages on to the default
### search path.  I use them all the time
local({
  old <- getOption("defaultPackages")
  options(defaultPackages = c(old, "splines",
                              "lattice", "mgcv", "MASS"))
})
 
 I am using the following:
 
 Rpart, grDevices, graphics, stats, utils, methods, base
 There is also an environment named 'Autoloads'

Rpart??  Or rpart??  I know of a package with the latter name but not
the former, and R is case sensitive.

So you may try:

local({
  options(defaultPackages = c("rpart", "grDevices", "graphics", "stats",
                              "utils", "methods"))
})

FWIW, note that for a command XXX, help(XXX) will give you a help page
that usually has some example code at the end; that code can be run
with example(XXX).  Typically, such example code uses data sets from
the datasets package, so if you do not load it, the example() command
might not work anymore for some functions.

Don't complain if your R installation doesn't work anymore if you mess
around with defaultPackages, in particular if you remove packages that
are usually in the default of defaultPackages. :)

And I agree with Rolf (Turner): it is hard to believe that the
datasets package would produce a noticeable delay on start-up; for me
it also loads instantaneously.  I guess that your problem is more
along the lines of David's (Winsemius) guess.

Cheers,

Berwin



Re: [R] Unintended loading of package:datasets

2009-05-10 Thread Berwin A Turlach
G'day David,

On Sun, 10 May 2009 21:35:30 -0400
David A Vavra dava...@verizon.net wrote:

 Thanks. Perhaps something else is going on. There is a large time
 period (about 20 sec.) after the message about loading the package.
 More investigation, I suppose.

What R version are you using?  I do not remember ever getting a message
that the package datasets is loaded, nor a message for any of the other
packages that are loaded by default.  (There once, long ago, may have
been such a message, but if so, I do not remember this anymore.)

Cheers,

Berwin



Re: [R] Getting lm() to work with a matrix

2009-05-20 Thread Berwin A Turlach
G'day Luc,

On Wed, 20 May 2009 09:58:41 -0400
Luc Villandre villa...@dms.umontreal.ca wrote:

 MikSmith wrote:
  [...]

 Indeed, functions like /lm()/ require the object fed to the /data/ 
 argument to be either [...]

But the data argument is optional and does not need to be specified.

 In your situation, I suggest you typecast your matrix into a data
 frame using /as.data.frame()/.  [...]

My guess is that he is already working with a data frame and does not
work with matrices, otherwise he should not have encountered problems:

R response <- matrix(rnorm(120), ncol=4)
R spectra.spec <- matrix(rnorm(900), ncol=30)
R spectra.lm <- lm(response[,3]~spectra.spec[,2:20])
R spectra.lm

Call:
lm(formula = response[, 3] ~ spectra.spec[, 2:20])

Coefficients:
   (Intercept)   spectra.spec[, 2:20]1  
  -0.48404 0.42503  
 spectra.spec[, 2:20]2   spectra.spec[, 2:20]3  
  -0.08955-0.27605  
 spectra.spec[, 2:20]4   spectra.spec[, 2:20]5  
  -0.16832-0.14107  
 spectra.spec[, 2:20]6   spectra.spec[, 2:20]7  
  -0.47009-0.23672  
 spectra.spec[, 2:20]8   spectra.spec[, 2:20]9  
   0.12920 0.23306  
spectra.spec[, 2:20]10  spectra.spec[, 2:20]11  
  -0.28586 0.03579  
spectra.spec[, 2:20]12  spectra.spec[, 2:20]13  
   0.10676-0.34407  
spectra.spec[, 2:20]14  spectra.spec[, 2:20]15  
   0.20253-0.17259  
spectra.spec[, 2:20]16  spectra.spec[, 2:20]17  
   0.19765 0.40705  
spectra.spec[, 2:20]18  spectra.spec[, 2:20]19  
  -0.12448-0.17149  

Cheers,

Berwin



Re: [R] Constrained fits: y~a+b*x-c*x^2, with a,b,c =0

2009-05-27 Thread Berwin A Turlach
G'day Alex,

On Wed, 27 May 2009 11:51:39 +0200
Alex van der Spek am...@xs4all.nl wrote:

 I wonder whether R has methods for constrained fitting of linear
 models.
 
 I am trying fm <- lm(y~x+I(x^2), data=dat) which most of the time gives
 indeed the coefficients of an inverted parabola. I know in advance
 that it has to be an inverted parabola with the maximum constrained to
 positive (or zero) values of x.
 
 The help pages for lm do not contain any info on constrained fitting.
 
 Does anyone know how to?

Look at the package nnls on CRAN.

According to your subject line, you are trying to solve what is known
as a quadratic program, and there are at least two quadratic
programming solvers (ipop in kernlab and solve.QP in quadprog)
available for R.
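For readers of the archive, a base-R editor's sketch of the constrained fit on made-up data. It uses optim() with box constraints as a quick substitute; the exact quadratic-programming route via solve.QP or nnls that the reply recommends would give the least-squares solution directly.

```r
# Editor's illustration: fit y ~ a + b*x - c*x^2 with a, b, c >= 0.
set.seed(1)
x <- runif(40, 0, 4)
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(40, sd = 0.2)  # true (a, b, c) = (1, 2, 0.5)
rss <- function(p) sum((y - (p[1] + p[2] * x - p[3] * x^2))^2)
fit <- optim(c(1, 1, 1), rss, method = "L-BFGS-B", lower = rep(0, 3))
fit$par  # fitted (a, b, c), each constrained to be nonnegative
```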

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] rbinom with computed probability

2007-11-24 Thread Berwin A Turlach
G'day Sigalit,

On Fri, 23 Nov 2007 11:25:30 +0200
sigalit mangut-leiba [EMAIL PROTECTED] wrote:

 Hello,
 I have a loop with probability computed from a logistic model like
 this: for (i in 1:300){
 
 p[i] <- exp(-0.834+0.002*x[i]+0.023*z[i])/(1+exp(-0.834+0.002*x[i]+0.023
 +z[i]))
 
 x and z generated from normal distribution.
 I get 300 different probabilities And I want to generate variables
 from bernulli distribution
 with P for every observation:
 
 T[i] <- rbinom(1,1,p[i])
 But i get missing values for T.
 What I'm doing wrong?

I guess the problem is that in the numerator you have 0.023*z[i] but
0.023+z[i] in the denominator.  Thus, some p[i] can be outside of
[0,1] which would produce NAs in T.

But why a for loop? This code is readily vectorised:

p <- exp(-0.834+0.002*x+0.023*z)/(1+exp(-0.834+0.002*x+0.023*z))
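Spelled out as a complete vectorised version (editor's sketch; the sample size and coefficients follow the original post, the seed is arbitrary):

```r
set.seed(7)
n <- 300
x <- rnorm(n)
z <- rnorm(n)
eta <- -0.834 + 0.002 * x + 0.023 * z  # same term in numerator and denominator
p <- plogis(eta)                       # same as exp(eta) / (1 + exp(eta))
T <- rbinom(n, size = 1, prob = p)     # one Bernoulli draw per observation
```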

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Quadratic programming

2007-12-05 Thread Berwin A Turlach
G'day Serge,

On Wed, 5 Dec 2007 11:25:41 +0100
de Gosson de Varennes Serge (4100)
[EMAIL PROTECTED] wrote:

 I am using the quadprog package and its solve.QP routine to solve and
 quadratic programming problem with inconsistent constraints, which
 obviously doesn't work since the constraint matrix doesn't have full
 rank. 

I guess it will help to fix some terminology first.  In my book,
inconsistent constraints are constraints that cannot be fulfilled
simultaneously, e.g. something like x_1 <= 3 and x_1 >= 5 for an
obvious example.  Thus, a problem with inconsistent constraints cannot
be solved, regardless of the rank of the constraint matrix.  (Anyway,
that matrix is typically not square, so would we be talking about full
column rank or full row rank?)

Of course, it can happen that the constraints are consistent but that 
there is some redundancy in the specified constraints; a simple case
would be x_1 >= 0, x_2 >= 0 and x_1 + x_2 >= 0: if the first two
constraints are fulfilled, then the last one is automatically
fulfilled too. In my experience, it can happen that solve.QP comes to
the conclusion that a constraint that ought to be fulfilled, given the
constraints that have already been enforced, is deemed to be violated
and to be inconsistent with the constraints already enforced. In that
case solve.QP stops, rather misleadingly, with the message that the
constraints are inconsistent.  

I guess the package should be worked over by someone with a better
understanding of the kind of fudges that do not come back to bite and of
finite precision arithmetic than the original author's appreciation of
such issues when the code was written. ;-))

 A way to solve this is to perturb the objective function and
 the constraints with a parameter that changes at each iteration (so
 you can dismiss it), but that's where it gets tricky! Solve.QP
 doesn't seem to admitt constant terms, it need Dmat (a matrix) and
 dvec (a vector) as defined in the package description. Now, some
 might object that a constant is a vector but the problem looks like
 this
 
 Min f(x) = (1/2)x^t Q x + D^t x + d

It is a bit unclear to me what you call the constant term.  Is it `d'?
In that case, it does not perturb the constraints and it is irrelevant
for the minimizer of f(x); also for the minimizer of f(x) under linear
constraints.  Regardless of d, the solution is always the same.  I do
not know of any quadratic programming solver that allows `d' as input,
probably because it is irrelevant for determining the solution of the
problem.

 Can anyone help me, PLEASEEE?

In my experience, rescaling the problem might help, i.e. use Q* = Q/2
and D*=D/2 instead of the original Q and D; but do not forget to
rescale the constraints accordingly.  

Or you might want to try another quadratic program solver in R, e.g.
ipop() in package kernlab.

Hope this helps.

Best wishes,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Building package - tab delimited example data issue

2007-12-06 Thread Berwin A Turlach
G'day Peter,

On Thu, 06 Dec 2007 11:52:46 +0100
Peter Dalgaard [EMAIL PROTECTED] wrote:
  
 If you had looked at help(data), you would have found a list of which
 file formats it supports and how they are read. Hint: TAB-delimited
 files are not among them. [...]

On the other hand, Writing R Extensions has stated for a long time
(and still does):

The @file{data} subdirectory is for additional data files the package
makes available for loading using @code{data()}.  Currently, data files
can have one of three types as indicated by their extension: plain R
code (@file{.R} or @file{.r}), tables (@file{.tab}, @file{.txt}, or
@file{.csv}), or @code{save()} images (@file{.RData} or @file{.rda}).

Now in my book, .csv files contain comma separated values, .tab files
contain values separated by TABs and .txt files are pure text files,
presumably values separated by any kind of white space. 

Thus, I think that the expectation that TAB-delimited file formats
should work is not unreasonable; I was bitten by this long ago
too. Then I realised that the phrase "one of three types" should
probably be interpreted as implying that .tab, .txt and .csv files are
all of the same type and, apparently, should contain values separated
by whitespace.  I admit that I never tested whether .csv files would
lead to the same problems as TAB delimited .tab files. Rather, I decided
in the end that the safest option, i.e. to avoid misleading file
extensions, would be to use .rda files in the future.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] lm/model.matrix confusion (? bug)

2007-12-12 Thread Berwin A Turlach
G'day Mark,

On Wed, 12 Dec 2007 02:05:54 -0800 (PST)
Mark Difford [EMAIL PROTECTED] wrote:

 In order to get the same coefficients as we get from the following
[...] 
 we need to do the following (if we use model.matrix to specify the
 model)

But why would you want to do this?

 ##
 summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data =
 whiteside) )
 
 That is, we need to take out two intercepts.  Is this correct?

Yes.
 
 Shouldn't lm check to see if an intercept has been requested as part
 of the model formula?

No, it does not.  In the Details section of lm's help page you will
find the following:

 A formula has an implied intercept term.  To remove this use
 either 'y ~ x - 1' or 'y ~ 0 + x'.  See 'formula' for more details
 of allowed formulae.

Thus, unless you explicitly ask for the constant term not to be
included, lm will add a constant term (a column of ones) in addition
to whatever you have specified on the right-hand side of the formula.
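
The implied intercept is easy to verify on made-up data (a sketch, not
from the original message): the same right-hand side yields one extra
coefficient unless the intercept is removed explicitly:

```r
d <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit_implicit <- lm(y ~ x, data = d)      # intercept added implicitly
fit_none     <- lm(y ~ x - 1, data = d)  # intercept explicitly removed
stopifnot(length(coef(fit_implicit)) == 2,
          "(Intercept)" %in% names(coef(fit_implicit)),
          length(coef(fit_none)) == 1)
```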

 If I do
 ##
 summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1,
 data=whiteside)), data=whiteside))
 
 we get a strange model.  

Well, you get a model in which not all parameters are identifiable, and
each parameter that is not identifiable is reported as NA.  I am not
sure what is strange about this.

 But the formula part of this model qualifies
 as a valid formula
 ##
 as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside))

Debatable, the above command only shows that it can be coerced into a
valid formula. :)

 just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside)
 
 But we know that the _correct_ formula is the following
 
 ##
 as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1)

Why is this formula any more correct than the other one?  Both specify
exactly the same model.  It is just that one does it in an
overparameterised way.

 (Sorry, this is getting really long) --- So, my question/confusion
 comes down to wanting to know why lm() doesn't check to see if an
 intercept has been specified when the model has been specified using
 model.matrix.

Because lm() is documented not to check this.  If you do not want to
have an intercept in the model, you have to ask for that explicitly.

Also, comparing the output of 
summary( lm(Gas ~ Insul/Temp - 1, data = whiteside) )
and
summary( lm(Gas ~ Insul/Temp, data = whiteside ) )

you can see that lm() does not check whether there is an implicit
intercept in the model.  Compare the (Adjusted) R-squared values
returned: one case uses the formula for models without an intercept,
the other the formula for models with an intercept.  A similar story
holds for the reported F-statistics.  
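
The two R-squared definitions can be checked directly (a sketch with
made-up data rather than the whiteside set): for an intercept-free fit,
summary() takes the total sum of squares about zero, not about the mean:

```r
set.seed(42)
x <- 1:10
y <- 2 * x + rnorm(10)
fit0 <- lm(y ~ x - 1)                       # model without an intercept
## no-intercept definition: 1 - RSS / sum(y^2)
r2_about_zero <- 1 - sum(resid(fit0)^2) / sum(y^2)
stopifnot(isTRUE(all.equal(summary(fit0)$r.squared, r2_about_zero)))
```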

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Looking for simpler solution to probabilistic question

2008-01-15 Thread Berwin A Turlach
G'day Rainer,

On Tue, 15 Jan 2008 14:24:08 +0200
Rainer M Krug [EMAIL PROTECTED] wrote:

 I have two processes which take with a certain probability (p1 and
 p2) x number of years to complete (age1 and age2). As soon as thge
 first process is completed, the second one begins. I want to
 calculate the time it takes for the both processes to be completed.
 
 I have the following script which gives me the answer, butI think
 there must be a more elegant way of doing the calculations between
 the #

The code between the two marker lines, together with the first line
after the second marker, could be shortened to:

 ager <- range(age1) + range(age2)
 ager <- ager[1]:ager[2]
 pp1 <- c(cumsum(p1), rev(cumsum(rev(p1))))
 pp2 <- c(cumsum(p2[-21]), rev(cumsum(rev(p2)))[-1])
 pr <- pp1 + pp2
 pr <- pr/sum(pr)

R> all.equal(p, pr)
[1] TRUE
R> all.equal(age, ager)
[1] TRUE


Whether this is more elegant is probably in the eye of the beholder,
but it should definitely use less memory. :)
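
For the special case of two independent durations, the distribution of
the sum is the plain convolution of the two probability vectors, which
convolve() computes directly (a sketch with made-up probabilities, not
Rainer's data):

```r
p1 <- c(0.2, 0.5, 0.3)   # P(first process takes 1, 2, 3 years)
p2 <- c(0.6, 0.4)        # P(second process takes 1, 2 years)
## open (polynomial-style) convolution of the two pmfs;
## the support of the sum runs from 1+1 to 3+2 years
ptot <- convolve(p1, rev(p2), type = "open")
ages <- 2:5
stopifnot(length(ptot) == length(ages),
          isTRUE(all.equal(sum(ptot), 1)))
```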

BTW, I am intrigued, in which situation does this problem arise?  The
time it takes the second process to finish seems to depend in a curious
way on the time it took the first process to complete.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] lapply without return values?

2008-01-25 Thread Berwin A Turlach
G'day Rainer,

On Fri, 25 Jan 2008 10:34:32 +0200
Rainer M Krug [EMAIL PROTECTED] wrote:

[...] 

   p <- data.frame(runif(10), runif(10), runif(10))
   lapply( p, function(ps) {x11(); plot(ps)} )
 
 which results in three graphs and a printout:
[...]
 How can I avoid this printout without using
 
 tmp <- lapply( p, function(ps) {x11(); plot(ps)} )?

?invisible

like in invisible(lapply( p, function(ps) {x11(); plot(ps)} ))

Note, your solution seems to involve fewer keystrokes but has the
disadvantage of creating an object in your workspace.  

Of course, you could always do something like:

 ilapply <- function(...) invisible(lapply(...))

## perhaps better:
## ilapply <- function(X, FUN, ...) invisible(lapply(X, FUN, ...))

 ilapply(p, function(ps) {x11(); plot(ps)})

To save keystrokes in the long run. :)

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] primitives again

2009-03-14 Thread Berwin A Turlach
G'day Edna,

On Sat, 14 Mar 2009 22:52:38 -0500
Edna Bell edna.bel...@gmail.com wrote:

 Dear R Gurus:

Well, I guess I can answer nevertheless. :)

 How do I find the functions which are primitives, please?

?is.primitive

Thus, the following code would give you all the primitive functions in
package base:

R> pos <- "package:base"
R> uu <- ls(pos=pos, all=TRUE)
R> tt <- sapply(uu, function(x) is.primitive(get(x, pos=pos)))
R> rr <- uu[tt]
R> rr

You may want to search other packages too.  Or modify the code such
that you loop over your search path and concatenate the results from
the various locations.
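
Looping over the whole search path, as suggested, might look like the
following sketch (all.names = TRUE also catches primitives whose names
start with a dot):

```r
prim_names <- unlist(lapply(search(), function(pos) {
  nms <- ls(pos = pos, all.names = TRUE)
  nms[vapply(nms, function(x) is.primitive(get(x, pos = pos)), logical(1))]
}))
## sum() and the arithmetic operators are primitives in package:base
stopifnot("sum" %in% prim_names, "+" %in% prim_names)
```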

HTH.

Cheers,

Berwin



Re: [R] builtin vs. closure

2009-03-14 Thread Berwin A Turlach
G'day Edna,

On Sat, 14 Mar 2009 23:00:12 -0500
Edna Bell edna.bel...@gmail.com wrote:

 Anyway, my question now is:  what determines if a function is a
 builtin vs. a closure, please?

The help page of typeof states that:

The possible values are listed in the structure 'TypeTable' in
'src/main/util.c'. Current values are [...]
   'closure' (function)
   'special' and 'builtin' (basic functions and operators)

Which might raise the question of what basic functions are.  But
since basic and primitive are synonyms, it is not hard to guess that
typeof returns special or builtin for primitive functions; and the
help page of ?is.primitive confirms this.  

Note, a return value of special is possible for a primitive function:
R> UseMethod
function (generic, object)  .Primitive("UseMethod")
R> typeof(UseMethod)
[1] "special"
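
A few more illustrations of the three types, easy to check at the
prompt:

```r
stopifnot(typeof(mean) == "closure",   # ordinary R function
          typeof(sum)  == "builtin",   # primitive; arguments are evaluated
          typeof(`if`) == "special")   # primitive; arguments not (all) evaluated
```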

HTH.

Cheers,

Berwin



Re: [R] primitives again

2009-03-15 Thread Berwin A Turlach
On Sun, 15 Mar 2009 14:23:40 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 Edna Bell wrote:
  How do I find the functions which are primitives, please?

 
 you can scan the whole search path for functions that are primitives:
 
 primitives = sapply(search(), function(path)
  with(as.environment(path), Filter(is.primitive, lapply(ls(),
 get))))
 
 primitives is a list of named lists of primitives, one sublist for
 each attached package (most sublists will be empty, i guess).

The code above will miss some primitives in package:base, namely those
that start with a dot:

R primitives - sapply(search(), function(path)
+   with(as.environment(path), 
+Filter(is.primitive, lapply(ls(), get
R sapply(primitives, length)
   .GlobalEnv package:stats  package:graphics package:grDevices 
0 0 0 0 
package:utils  package:datasets   package:methods Autoloads 
0 0 2 0 
 package:base 
  176 
R primitives - sapply(search(), function(path)
+   with(as.environment(path), 
+Filter(is.primitive, lapply(ls(all=TRUE), get
R sapply(primitives, length)
   .GlobalEnv package:stats  package:graphics package:grDevices 
0 0 0 0 
package:utils  package:datasets   package:methods Autoloads 
0 0 2 0 
 package:base 
  188 

Also, but that is a matter of taste, it could be preferable to use
sapply instead of lapply:

R> primitives$`package:methods`
[[1]]
function (expr)  .Primitive("quote")

[[2]]
.Primitive("[[<-")

R> head(primitives$`package:base`)
[[1]]
function (x)  .Primitive("!")

[[2]]
function (e1, e2)  .Primitive("!=")

[[3]]
.Primitive("$")

[[4]]
.Primitive("$<-")

[[5]]
function (e1, e2)  .Primitive("%%")

[[6]]
function (x, y)  .Primitive("%*%")

R> primitives <- sapply(search(), function(path)
+    with(as.environment(path), 
+         Filter(is.primitive, sapply(ls(all=TRUE), get))))
R> primitives$`package:methods`
$Quote
function (expr)  .Primitive("quote")

$`el<-`
.Primitive("[[<-")

R> head(primitives$`package:base`)
$`!`
function (x)  .Primitive("!")

$`!=`
function (e1, e2)  .Primitive("!=")

$`$`
.Primitive("$")

$`$<-`
.Primitive("$<-")

$`%%`
function (e1, e2)  .Primitive("%%")

$`%*%`
function (x, y)  .Primitive("%*%")


Cheers,

Berwin



Re: [R] .Internal

2009-03-18 Thread Berwin A Turlach
G'day Kevin,

On Wed, 18 Mar 2009 21:46:51 -0700
rkevinbur...@charter.net wrote:

 I was trying to find source for optimize and I ran across
 
 function (f, interval, ..., lower = min(interval), upper =
 max(interval), maximum = FALSE, tol = .Machine$double.eps^0.25) 
 {
 if (maximum) {
 val <- .Internal(fmin(function(arg) -f(arg, ...), lower, 
 upper, tol))
 list(maximum = val, objective = f(val, ...))
 }
 else {
 val <- .Internal(fmin(function(arg) f(arg, ...), lower, 
 upper, tol))
 list(minimum = val, objective = f(val, ...))
 }
 }
 
 Then I did a search for fmin and i came up with:
 
 /* fmin(f, xmin, xmax tol) */
 SEXP attribute_hidden do_fmin(SEXP call, SEXP op, SEXP args, SEXP rho)
 
 
 So my question is where do I find the intermediary step between 
 
 .Internal(fmin(function(arg) f(arg, ...), lower,  upper, tol))
 
 and
 
 SEXP attribute_hidden do_fmin(SEXP call, SEXP op, SEXP args, SEXP rho)

@Article{Rnews:Ligges:2006,
  author  = {Uwe Ligges},
  title   = {{R} {H}elp {D}esk: {Accessing} the Sources},
  journal = {R News},
  year    = 2006,
  volume  = 6,
  number  = 4,
  pages   = {43--45},
  month   = {October},
  url     = {http://CRAN.R-project.org/doc/Rnews/Rnews_2006-4.pdf}
}

http://CRAN.R-project.org/doc/Rnews/Rnews_2006-4.pdf

 The number of arguments doesn't match up. I am guessing that lower
 and upper somehow get merged into the args. And rho is 'tol'. Right?

Unlikely.  In Writing R Extensions (and the functions I looked up),
'rho' usually denotes an environment that is used to evaluate
expressions in.  Typically (i.e. in cases that I had need to look at),
all arguments are rolled into the SEXP arg for internal functions.

HTH.

Cheers,

Berwin



Re: [R] If statement generates two outputs

2009-03-23 Thread Berwin A Turlach
G'day Carl,

On Mon, 23 Mar 2009 20:11:19 -0400
Carl Witthoft c...@witthoft.com wrote:

  From: Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk_at_idi.ntnu.no
  Date: Sun, 22 Mar 2009 22:58:49 +0100
 
 
  just for fun, you could do this with multiassignment, e.g., using
  the (highly experimental and premature!) rvalues:
 
  source('http://miscell.googlecode.com/svn/rvalues/rvalues.r') 
  if (TRUE)
 
 c(df1, df2) := list(4:8, 9:13)
 
  dput(df1)
  # 4:8
  dput(df2)
  # 9:13
 
 
 ---
 Now THAT's what I call an overloaded operator!   ^_^
 
 But seriously:  can someone explain to me what's going on in the 
 rvalues.r code?  I tried a simple experiment, replacing := with a 
 colec in the code, and of course the line
 
 c(df1, df2) colec list(4:8, 9:13)
 
 
 just gives me a syntax error response.   Clearly I need a pointer
 to some documentation about how the colon and equals sign get
 special treatment somewhere inside R.

Not sure why := gets a special treatment, perhaps because it is not a
valid name and, hence, the parser deduces that it is an operator?

IIRC, the traditional way to define your own operator is to surround
the name with percent signs, i.e. replacing := by %colec% and then
issuing the command

c(df1, df2) %colec% list(4:8, 9:13)

will work.
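
A sketch of such a %colec% operator (hypothetical, and much cruder
than the rvalues code): capture the unevaluated left-hand side with
substitute() and assign() into the caller's frame:

```r
"%colec%" <- function(lhs, values) {
  targets <- as.list(substitute(lhs))[-1]   # drop the c() from c(df1, df2)
  for (i in seq_along(targets))
    assign(as.character(targets[[i]]), values[[i]], envir = parent.frame())
  invisible(values)
}

c(df1, df2) %colec% list(4:8, 9:13)
stopifnot(identical(df1, 4:8), identical(df2, 9:13))
```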

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] .Internal

2009-03-23 Thread Berwin A Turlach
G'day Kevin,

On Mon, 23 Mar 2009 18:48:16 -0400
rkevinbur...@charter.net wrote:

 Sorry to be so dense but the article that you suggest does not give
 any information on how the arguments are packed up. I look at the
 call:
 
 val <- .Internal(fmin(function(arg) -f(arg, ...), lower, upper, tol))
 
 and then with the help of this article I find do_fmin in optimize.c:
 
 SEXP attribute_hidden do_fmin(SEXP call, SEXP op, SEXP args, SEXP rho)

Sorry for not being clear enough.  Yes, the article tells you how to
find out that do_fmin is (eventually) called when you call optimize on
the command line.  I thought that this was one of your questions.

 Again there doesn't seem to be any coorespondance between lower,
 upper, tol and the arguments to do_fmin. So I am missing a step.

As far as I know, .Internal has the same interface as .External.  So a
study of Writing R Extensions should give insight regarding this
step.  In particular Chapter 5 (System and foreign language interfaces)
- Section 5.10 (Interface functions .Call and .External) - Section
5.10.2 (Calling .External).

Essentially, all arguments to a C function called via .Internal
or .External are passed as a single SEXP; this allows an unlimited
number of arguments to be passed to a C function, as all other
interfaces to native routines (.C, .Call, .Fortran) have some limit,
albeit a rather generous one, on the number of arguments that can be
passed to the native routine.

I believe you can think of that single SEXP as a list containing the
individual arguments.  Accessing those arguments one by one involves
macros with names like CDR, CAR, CADR, CADDR, CADDDR, CAD4R and so
forth.  As I understand it, if you are fluent in Lisp (Scheme?) and how
components of a list are referred to in that language, then you have no
problems with understanding the names of those macros.  Since I never
became comfortable with those languages, I restrict myself to .C
and .Call; YMMV.

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Converting a Matrix to a Vector

2009-03-25 Thread Berwin A Turlach
G'day Ken,

On Wed, 25 Mar 2009 00:13:48 -0700 (PDT)
Ken-JP kfmf...@gmail.com wrote:

 
 Say I have:
 
  set.seed( 1 )
  m <- matrix( runif(5^2), nrow=5, dimnames =
  list( c("A","B","C","D","E"), c("O","P","Q","R","S") ) )
  m
   O  P Q R S
 A 0.2655087 0.89838968 0.2059746 0.4976992 0.9347052
 B 0.3721239 0.94467527 0.1765568 0.7176185 0.2121425
 C 0.5728534 0.66079779 0.6870228 0.9919061 0.6516738
 D 0.9082078 0.62911404 0.3841037 0.3800352 0.121
 E 0.2016819 0.06178627 0.7698414 0.7774452 0.2672207
 
 ---
 
 I want to create a vector v from matrix m that looks like this:
 
 A.O 0.2655087
 B.O 0.3721239
 
 v <- as.vector( m ) almost gives me what I want, but then I need to
 take combinations of colnames( m ) and rownames( m ) to get my labels
 and hope they match up in order: if not, manipulate the order.  This
 approach feels kludgy...
 
 Is this the right approach or is there a better way?

R> tt <- reshape(data.frame(m), direction="long", varying=list(1:5), 
+    ids=rownames(m), times=colnames(m))
R> ind <- names(tt) %in% c("time", "id")
R> uu <- tt[,!ind]
R> names(uu) <- rownames(tt)
R> uu
   A.OB.OC.OD.OE.OA.PB.P 
0.26550866 0.37212390 0.57285336 0.90820779 0.20168193 0.89838968 0.94467527 
   C.PD.PE.PA.QB.QC.QD.Q 
0.66079779 0.62911404 0.06178627 0.20597457 0.17655675 0.68702285 0.38410372 
   E.QA.RB.RC.RD.RE.RA.S 
0.76984142 0.49769924 0.71761851 0.99190609 0.38003518 0.77744522 0.93470523 
   B.SC.SD.SE.S 
0.21214252 0.65167377 0.1210 0.26722067 
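
A shorter alternative (a sketch): as.vector() flattens the matrix
column by column, and outer() builds the matching row.col labels in
exactly that order:

```r
set.seed(1)
m <- matrix(runif(5^2), nrow = 5,
            dimnames = list(c("A","B","C","D","E"), c("O","P","Q","R","S")))
v <- as.vector(m)
names(v) <- outer(rownames(m), colnames(m), paste, sep = ".")
stopifnot(identical(names(v)[1:2], c("A.O", "B.O")),
          v[["C.Q"]] == m["C", "Q"])
```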

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Conerned about Interfacing R with Fortran

2009-03-26 Thread Berwin A Turlach
G'day Maura,

On Thu, 26 Mar 2009 18:21:01 +0100
mau...@alice.it wrote:

 I am reading the manual sections illustrating how to call a Fortran
 subroutine from R. I feel uneasy at the explicit statement about
 .Fortran interface working with Fortran 77. I would like to call a
 Fortran-90 subroutine from my R script. Is that supported at all ?

Read the complete manual. :)  It is pretty easy to use
acroread, or another PDF reader, to search for Fortran in the PDF
file; the HTML version should also be searchable from your browser.

In chapter 1 (page 7 of the PDF 2.8.1 version) of the Writing R
Extensions manual, you will find:

  [...] providing support for C, C++, FORTRAN 77, Fortran
  95@footnote{note that Ratfor is not supported. If you have Ratfor
  source code, you need to convert it to FORTRAN.  Only FORTRAN-77
  (which we write in upper case) is supported on all platforms, but most
  also support Fortran-95 (for which we use title case).  If you want to
  ship Ratfor source files, please do so in a subdirectory of @file{src}
  and not in the main subdirectory.}, Objective C [...]

and later in chapter 1, there is a complete section (namely 1.2.3) on
F95 code:

  @subsection Using F95 code

  @R{} currently does not distinguish between FORTRAN 77 and Fortran
  90/95 code, and assumes all FORTRAN comes in source files with
  extension @file{.f}.  Commercial Unix systems typically use a F95
  compiler, but only since the release of @code{gcc 4.0.0} in April 2005
  have Linux and other non-commercial OSes had much support for F95.
  Only with @R{} 2.6.0 did the Windows port adopt a Fortran 90 compiler.

  This means that portable packages need to be written in correct
  FORTRAN 77, which will also be valid Fortran 95.  See
  @uref{http://developer.r-project.org/Portability.html} for reference
  resources.  In particular, @emph{free source form} F95 code is not
  portable.

  On some systems an alternative F95 compiler is available: from the
  @code{gcc} family this might be @command{gfortran} or @command{g95}.
  Configuring @R{} will try to find a compiler which (from its name)
  appears to be a Fortran 90/95 compiler, and set it in macro @samp{FC}.
  Note that it does not check that such a compiler is fully (or even
  partially) compliant with Fortran 90/95.  Packages making use of
  Fortran 90/95 features should use file extension @file{.f90} or
  @file{.f95} for the source files: the variable @code{PKG_FCFLAGS}
  specifies any special flags to be used.  There is no guarantee that
  compiled Fortran 90/95 code can be mixed with any other type of code,
  nor that a build of @R{} will have support for such packages.

Section 5.5 (Creating shared objects) also mentions Fortran 9x:

  Shared objects for loading into @R{} can be created using @command{R
  CMD SHLIB}.  This accepts as arguments a list of files which must be
  object files (with extension @file{.o}) or sources for C, C++, FORTRAN
  77, Fortran 9x, Objective C or Objective C++ (with extensions
  @file{.c}, @file{.cc} or @file{.cpp} or @file{.C}, @file{.f},
  @file{.f90} or @file{.f95}, @file{.m}, and @file{.mm} or @file{.M},
  respectively), or commands to be passed to the linker.  See @kbd{R CMD
  SHLIB --help} (or the @R{} help for @code{SHLIB}) for usage
  information.

Thus, it seems that calling Fortran 90 code from R is possible on
some platforms and, presumably, on those where it is possible this is
done via the .Fortran interface; although the Writing R Extensions
manual does not seem to say so explicitly.

OTOH, the help file for .Fortran states:

 Use '.Fortran' with care for compiled Fortran 9x code: it may not
 work if the Fortran 9x compiler used differs from the Fortran
 compiler used when configuring R, especially if the subroutine
 name is not lower-case or includes an underscore.

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Bug? FORTTRAN help

2009-03-26 Thread Berwin A Turlach
G'day Kevin,

On Thu, 26 Mar 2009 13:42:20 -0700
rkevinbur...@charter.net wrote:

 I was feeling masochistic the other day [...]

Welcome to the club. :)

 and we have been having some wierd memory problems so I started
 digging into the source for L-BFGS-B. In the lbgfsb.c file I see the
 following code:
 
 /* Cholesky factorization of (2,2) block of wn. */
 F77_CALL(dpofa)(&wn[*col + 1 + (*col + 1) * wn_dim1], &m2, col, info);
 if (*info != 0) {
   *info = -2;
   return;
 }
 
 If I am not mistaken this says that there is an m2 * col matrix that
 starts at wn[*col + 1 + (*col + 1) * wn_dim1], where wn_dim1 is 2 * m.

I think your interpretation is not quite correct.  Note that it makes
only a sense to calculate a Cholesky factorization of a square matrix.
The interface of dpofa (in the linpack library) is available at:
 http://www.netlib.org/linpack/dpofa.f

Thus, the call above says: calculate the Cholesky factorization of a
col * col matrix whose (1,1) element is stored at
wn[*col+1+(*col+1)*wn_dim1], and that matrix is stored within a matrix
which was allocated with m2 rows.

Or, in other words, calculate the Cholesky factorization of a col * col
matrix whose (1,1) element is stored at wn[*col+1+(*col+1)*wn_dim1];
to move from the (1,1) element to the (1,2) element you have to move to
the memory location m2*sizeof(double) ahead/behind of (1,1).

Fortran uses column-major order to store arrays, i.e. element (1,1) is
followed by element (2,1), (3,1) and so forth.  To know where to find
element (1,2) of the matrix, you have to tell Fortran how many rows the
big matrix that holds your matrix was allocated with.
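
R stores matrices in the same column-major order, so the layout can be
inspected from R itself:

```r
m <- matrix(1:6, nrow = 2)   # filled column by column: (1,1), (2,1), (1,2), ...
stopifnot(m[1, 2] == m[3],   # element (1,2) is the 3rd element in storage
          m[2, 3] == m[6])   # element (2,3) is the 6th and last
```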

 I am worried that the optimizer will silently write info memory that
 it shouldn't [...]

If you are worried about such issues, you should read chapter 4 of
Writing R extensions, in particular Section 4.3 on gctorture and
valgrind.  Then run R on a platform that supports valgrind.  It is very
useful to catch problems such as accessing or writing into memory that
you should not access or write to.

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] [OT ?] rant (was : Re: Conversions From standard to metricunits)

2009-04-03 Thread Berwin A Turlach
G'day Murray,

On Fri, 3 Apr 2009 20:01:30 -0400
Murray Cooper myrm...@earthlink.net wrote:

 For science yes. For pleasure I'll still take a pint instead of 570ml!

And you might experience a disappointment then if you order it in the
US, where the pint is apparently 450ml; at least, I won several bets
against American visitors on whether a pint is less or more than half
a litre.  The stake being, of course, the next round of drinks. :)

And, if memory serves correctly, here in Singapore I once ordered a
pint of draught beer and was served half a litre...  Confused me
completely...  That's when I thought that the metric system is so much
safer, until I ordered my next beer: according to the menu a 400ml
bottled beer, it came in a 333ml bottle...  

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Eclipse and StatET Howto (also added Subversion, Rtools)

2009-04-04 Thread Berwin A Turlach
G'day Dirk,

On Sat, 4 Apr 2009 20:27:22 -0500
Dirk Eddelbuettel e...@debian.org wrote:

 On 4 April 2009 at 14:37, Ken-JP wrote:
  Yes, I have x11-common installed, and dpkg -S /etc/X11/rgb.txt
  shows not found for me.  This is on Ubuntu 8.10 amd64.
 
 Same 8.10 for both amd64 and i386 where I checked -- both have the
 file. It could be a leftover from an earlier install of mine, or an
 accidental deletion at your end.

My guess is that it is the former. :)  Some time ago I wiped Debian off
my laptop and installed Kubuntu 8.10 fresh (i386 flavour); so no
leftovers from previous versions, and on my laptop the results are:

ber...@berwin5:~$ apt-file find /etc/X11/rgb.txt
ber...@berwin5:~$ dpkg -S /etc/X11/rgb.txt
dpkg: /etc/X11/rgb.txt not found.

Cheers,

Berwin



Re: [R] Eclipse and StatET Howto (also added Subversion, Rtools)

2009-04-05 Thread Berwin A Turlach
G'day Peter,

On Sun, 05 Apr 2009 11:26:40 +0200
Peter Dalgaard p.dalga...@biostat.ku.dk wrote:

 Berwin A Turlach wrote:
  G'day Dirk,
  
  On Sat, 4 Apr 2009 20:27:22 -0500
  Dirk Eddelbuettel e...@debian.org wrote:
  
  On 4 April 2009 at 14:37, Ken-JP wrote:
  Yes, I have x11-common installed, and dpkg -S /etc/X11/rgb.txt
  shows not found for me.  This is on Ubuntu 8.10 amd64.
  Same 8.10 for both amd64 and i386 where I checked -- both have the
  file. It could be a leftover from an earlier install of mine, or an
  accidental deletion at your end.
  
  My guess that it is the former. :)  Some time ago I wiped Debian of
  my laptop and installed Kubuntu 8.10 fresh (i386 flavour); so no
  left overs from previous versions and on my laptop the results are:
  
  ber...@berwin5:~$ apt-file find /etc/X11/rgb.txt
  ber...@berwin5:~$ dpkg -S /etc/X11/rgb.txt
  dpkg: /etc/X11/rgb.txt not found.
 
 Hum.. Fedora 9 doesn't have it either.
 
 It does have /usr/share/X11/rgb.txt though, so please check whether
 it has moved. 

Apparently it has:

ber...@berwin5:~$ apt-file find rgb.txt
[...]
x11-common: /usr/share/X11/rgb.txt
[...]

However, 

ber...@berwin5:~$ ls -l /usr/share/X11/rgb.txt
lrwxrwxrwx 1 root root 16 Jan 14 03:28 /usr/share/X11/rgb.txt -> 
/etc/X11/rgb.txt
ber...@berwin5:~$ more /usr/share/X11/rgb.txt
/usr/share/X11/rgb.txt: No such file or directory

Cheers,

Berwin



Re: [R] Code of sample() in C

2009-04-05 Thread Berwin A Turlach
G'day Ranjan,

On Sun, 5 Apr 2009 17:00:56 -0500
Ranjan Maitra mai...@iastate.edu wrote:

 [...[
 I haven't tried this function out much myself, so YMMV.

There is an obvious problem in this code: when an index is sampled, n
is decreased but len is not.  The last line in the last for loop should
be
    x[j] = x[--len];
(with len declared as a non-const int), so that the range from which j
is drawn shrinks as well.  Otherwise this algorithm will produce
samples in which an index can be repeated.  And the probabilities with
which an index is sampled would be non-trivial to determine; they would
definitely not be uniform.
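
R's own sample() implements the algorithm correctly; the no-repeat
property is easy to spot-check:

```r
set.seed(1)
draws <- replicate(200, sample(10, 5))   # 200 samples of size 5 from 1:10
## no column may contain a repeated index
stopifnot(all(apply(draws, 2, anyDuplicated) == 0))
```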

HTH.

Cheers,

Berwin
 
 #include <stdlib.h>
 #ifndef USING_RLIB
 #define MATHLIB_STANDALONE 1 /*It is essential to have this before
 the call to the Rmath's header file because this decides
the definitions to be set. */
 #endif
 #include <Rmath.h>
 
 /* Note that use of this function involves a prior call to the Rmath
 library to get the seeds in place. It is assumed that this is done in
 the calling function. */
 
 /* Equal probability sampling; without-replacement case */
 /* Adapted from the R function called SampleNoReplace */
 
 /**
  * Stores k out of n indices sampled at random without replacement
  * in y.
  */
 int srswor(int n, int k, int *y)
 { 
   if (k  n) {
 return 1;
   }
   else {
 const double len = (double) n;
 int i;
 int* x = malloc(n * sizeof(int));
 if (!x) {
   return 2;
 }
   
 for (i = 0; i  n; ++i)   x[i] = i;
 
 for (i = 0; i  k; ++i) {
   const int j = (int)(len * runif(0.0, 1.0));
   y[i] = x[j];
   x[j] = x[--n];
 }
 free(x);
   }
   return 0;
 }
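
Applying the fix described above to the quoted code gives the following sketch. It uses C's rand() in place of Rmath's runif() purely so the example stands alone (an assumption, not the original setup), and rand() % len has a slight modulo bias that runif() avoids:

```c
#include <stdlib.h>

/* Sample k of n indices without replacement into y.
 * The fix: `len` (the count of still-unsampled slots) is decremented
 * and used both to draw j and to overwrite the drawn slot, so every
 * draw comes from the live part of x and repeats are impossible.
 * rand() stands in for Rmath's runif(); the modulo draw has a small
 * bias, acceptable only for a sketch. */
int srswor_fixed(int n, int k, int *y)
{
    if (k > n)
        return 1;
    int *x = malloc(n * sizeof *x);
    if (!x)
        return 2;
    for (int i = 0; i < n; ++i)
        x[i] = i;
    int len = n;                   /* live slots are x[0..len-1] */
    for (int i = 0; i < k; ++i) {
        int j = rand() % len;      /* draw only from live slots */
        y[i] = x[j];
        x[j] = x[--len];           /* move last live slot into j */
    }
    free(x);
    return 0;
}
```

With this change, srswor_fixed(n, n, y) yields a full permutation of 0..n-1 with no repeated index.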

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Unable to re-import a table that was just exported to a file

2009-04-27 Thread Berwin A Turlach
On Sun, Apr 26, 2009 at 7:38 PM, Nigel Birney na...@cam.ac.uk wrote:

 Hi all,

 I am saving a program's output to a file to be read by another
 algorithm. But somehow such a simple operation (the reading) fails.
 I get:

 Error in read.table("a_corr_data.txt", sep = ",", col.names = T,
 row.names = F) :
   more columns than column names


 Here is the write statement:

 write.table(a_corr_data, "a_corr_data.txt", sep=",", col.names=T, row.names=F)

 Here is the read statement:

 a_corr_data <-
 read.table("a_corr_data.txt", sep=",", col.names=T, row.names=F)

 Nothing happens in-between (these actions are just 10-30 secs
 apart). I tried to export/import without col.names, also tried
 different delimiters (\t) but the same error pops up again and
 again. I am already quite unhappy with this.

?read.table

You might find that you are using col.names and row.names wrongly in the
read.table command.

Best wishes,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] definition of a function

2008-10-28 Thread Berwin A Turlach
G'day George,

On Wed, 29 Oct 2008 06:21:37 +0200
KARAVASILIS GEORGE [EMAIL PROTECTED] wrote:

 Hello, R subscribers.
 I would like to define a function like this one:
 Fun <- function(x) sin(x)/x
 with value 1 at x=0. How can I do that?

Fun <- function(x) ifelse(x==0, 1, sin(x)/x)

 Is there a way to plot it symbolically, without using x as a vector?
 e.g.
 x <- seq(from=-10, to=10, length=100)
 plot(x, Fun(x))

curve(Fun, from=-10, to=10, n=100)

HTH.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] odd behaviour of identical

2008-11-01 Thread Berwin A Turlach
On Sat, 01 Nov 2008 22:57:38 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 Patrick Burns wrote:
  Wacek Kusnierczyk wrote:
  smells bad design.

 
  Nonsense.
 
 not really, i'm afraid.
 
[...]
 to the point:
 
 is.integer(1) # FALSE
 is.integer(1:1) # TRUE
 
 is not particularly appealing as a design, though it can be defended
 along the line that : uses 1 as the increase step, thus if it starts
 at an integer, the vector contains integers (just like it says in
 help(:)).  the problem here is that 1:1 is *obviously* an integer
 vector, while 1 is *obviously* a number vector, but *obviously* not an
 integer vector.  do a poll on how obvious it is to r's users, i'd bet
 you lose.

Probably you are right, but the set of useRs would be the wrong
reference base.  Most useRs are using R for the purpose it was
designed; namely, statistical analyses.  And I cannot remember any
statistical analysis that I have ever done where it mattered whether a
number was stored as an integer or not; or whether is.integer()
returned FALSE or TRUE.

I can see a use of is.integer() only when you use R for programming.
Personally, the only use I can see for is.integer() is for checking
that a user has not by accident passed a vector stored in integer mode
to a routine in which I pass this vector down to C or FORTRAN code that
expects double or DOUBLE PRECISION, respectively.  But in such
situation, rather than testing with is.integer(), I typically just use
storage.mode() to ensure the proper storage mode.

In summary, if one uses R as a programming language then, as with
any other programming language, one should become familiar with the
language and its idiosyncrasies; and perhaps also with some features of
binary computing and the constraints imposed by these features.  In my
experience, every language has idiosyncrasies that one has to get used
to (and which one may or may not consider design flaws).  As the saying
goes, a good handyman does not blame her/his tools.

  On Sun, Oct 26, 2008 at 6:39 PM, Wacek Kusnierczyk
  [EMAIL PROTECTED] wrote:
   
  given what ?identical says, i find the following odd:

Given that the example section of the help page for identical starts
with:

Examples:

 identical(1, NULL) ## FALSE -- don't try this with ==
 identical(1, 1.)   ## TRUE in R (both are stored as doubles)
 identical(1, as.integer(1)) ## FALSE, stored as different types

I find it odd that you find the following odd.  :)

  x = 1:10
  y = 1:10
 
  all.equal(x,y)
  [1] TRUE
  identical(x,y)
  [1] TRUE
 
  y[11] = 11
  y = y[1:10]

Vectors can only hold objects of one type; and from the previous
discussion and example you know that 11 has storage mode double.  So
you should not be surprised that all elements of y are promoted to
storage mode double. 

And if you think that R's promotion rules are hard to understand, try
to figure out the rules for how objects of different types are promoted
in expressions by C; in particular if some objects are unsigned integers.
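
A minimal C illustration of that unsigned-integer pitfall (the helper names are invented for this sketch): in a comparison between an int and an unsigned int, the usual arithmetic conversions turn the signed operand unsigned, so -1 silently wraps to UINT_MAX.

```c
/* With int a and unsigned b, the usual arithmetic conversions turn
 * a into unsigned before comparing, so a negative a wraps to a huge
 * value (e.g. -1 becomes UINT_MAX) and the result is the opposite
 * of what intuition suggests. */
int less_naive(int a, unsigned b)
{
    return a < b;               /* a is converted to unsigned here */
}

/* Safe version: handle the sign explicitly, then compare within
 * the unsigned domain. */
int less_safe(int a, unsigned b)
{
    return (a < 0) || ((unsigned)a < b);
}
```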

  all.equal(x,y)
  [1] TRUE
  identical(x,y)
  [1] FALSE
 
  y
  [1] 1 2 3 4 5 6 7 8 9 10
  length(y)
  [1] 10
 
 
  looks like a bug.
 
  platform   i686-pc-linux-gnu
  arch   i686
  os linux-gnu
  system i686, linux-gnu
  status
  major  2
  minor  7.0
  year   2008
  month  04
  day22
  svn rev45424
  language   R
  version.string R version 2.7.0 (2008-04-22)

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] chi square table

2008-11-06 Thread Berwin A Turlach
G'day Cruz,

On Fri, 7 Nov 2008 09:47:47 +0800
cruz [EMAIL PROTECTED] wrote:

 Hi,
 
 How do we get the value of a chi square as we usually look up on the
 table on our text book?
 
 i.e. Chi-square(0.01, df=8), the text book table gives 20.090
 
 > dchisq(0.01, df=8)
 [1] 1.036471e-08
 > pchisq(0.01, df=8)
 [1] 2.593772e-11
 > qchisq(0.01, df=8)
 [1] 1.646497
 
 
 nono of them give me 20.090

The value that your textbook denotes, presumably, with chi^2_0.01 (or
some similar notation) is in fact the 0.99 quantile of the chi-square
distribution; which R readily calculates:

R> qchisq(0.99, df=8)
[1] 20.09024

<rant on>
That's the problem with introductory textbooks whose authors think they
do the students a favour by using notation such as z_alpha, z_0.01,
z_(alpha/2) instead of z_(1-alpha), z_0.99, z_(1-alpha/2),
respectively.  In my opinion this produces in the long run only
more confusion and does not help students at all.  It just panders to
the intellectual laziness of (some) students and shows a great deal of
confusion on the side of the authors.
I would look for another textbook.
<rant off>

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:

 I think this tension is a fundamental part of the character of S and
 R. But it is also fundamental to R that there are QC tests that apply
 to code in packages:  so writing new tests that detect dangerous
 usage (e.g. to disallow partial name matching) would be another way
 to improve reliability.  [...]

Please not. :)
After years of using R, it is now second nature to me to type (yes,
I always spell out from and to) 
seq(from=xx, to=yy, length=zz)
and I never understood why the full name of that argument had to be
length.out.  I would hate to see lots of warning messages because I am
using partial matching.

Cheers,

Berwin



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 09:49:31 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 (for whatever reason a user may choose to have a column named '-b')

For whatever reason, people also jump from bridges.  Does that mean
all bridges have an inherently flawed design and should be abolished?

Wait, then we would only have level crossings, and some people, for
whatever reason, think it is a good idea to race trains to level
crossings.  Gee, we better abolish them too since they are such a bad
design.  

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 11:27:30 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 Berwin A Turlach wrote:

  Why is it worth asking this if nobody else asks it?  Most notably a
  certain software company in Redmond, Washington, which is famous for
  carrying on with bad designs and bugs all in the name of backward
  compatibility.  Apparently this company also sets industry
  standards so it must be o.k. to do that. ;-)

 
 sure.  i have had this analogy in mind for a long time, but just
 didn't want to say it aloud.  

Mate, if you contemplate comparing R to anything coming out of Redmond,
Washington, then you should first heed the old saying that it is
better to remain silent and let people believe that one is a fool than
to open one's mouth and remove any doubt. :)

 indeed, r carries on with bad design, but since there are more and
 more users, it's just fine.

Whether R carries on with bad design is debatable.  Clearly the changes
that you would like to see would lead to big changes that might break a
lot of existing code and programming idioms.  Such changes could
estrange a large part of the user base and, in a worst-case scenario,
make R unusable for many tasks it is used now.  No wonder that nobody
is eager to implement such design changes.  Apparently Python is
planning such wholesale changes when moving to version 3.x.  Let's see
what that does to the popularity of Python and the uptake of the new
version.

  Didn't see any confused complaints yet.  
 
 really.  the discussion was motivated precisely by a user's
 complaint. 

We must have different definitions of what constitutes a complaint.  I
looked at the initial posting again.  In my book there was no
complaint.  Just a user who asked how to achieve a certain aim because
the way he tried to achieve it did not work.  There were three or four
constructive answers that pointed out how it can be done and then all
of a sudden complaints about alleged design flaws of R started.

 just scan this list;  a large part of the questions stems from
 confusion, which results directly from r's design. 

That's your opinion, to which you are of course entitled to.  In my
opinion, a large part of the questions on r-help these days stem from
the fact that in this age of instant gratification it seems to be
easier to fire off an e-mail to a mailing list and try to pick the
brains of thousands of subscribers instead of spending time on trying
to read the documentation, learn about R and figure out the question on
one's own.

  because r is likely to do everything but what the user expects.
 
  This is quite a strong statement, and I wonder what the basis is for
  that a statement.  Care to provide any evidence?
 
 i could think of organizing a (don't)useR conference, where
 submissions would provide such evidence.  

Please do so.  Such a conference would probably turn out to be more
hilarious and funnier than the Melbourne International Comedy Festival;
should be real fun to attend. :)

 whatever i say here, is mostly discarded as nonsense comments (while
 it certainly isn't), you say i make the problem up (while i just
 follow up user's complaints).  seriously, i may have exaggerated in
 the immediately above, but lots of comments made here by the users
 convince me that r very often breaks expectations.

Ever heard about biased sampling?  On a list like this you, of course,
hear questions by useRs who had the wrong expectations about how R
should behave and got surprised.  You do not hear of all the instances
in which useRs had the correct expectations which promptly were met by
R.  

  R is a tool; a very powerful one and hence also very sharp.  It is
  easy to cut yourself with it, but when one knows how to use it
  gives the results that one expects.  I guess the problem in this
  age of instant gratification is that people are not willing to put
  in the time and effort to learn about the tools they are using.  
 
 but a good tool should be made with care for how users will use it.  

But the group of users changes, and sometimes one cannot foresee all
possible ways in which future users may use the software.  As a
programming adage says, you cannot make a piece of software
idiot-proof; nature will always come up with a better idiot.

 r apparently fits the ideas of its developers, 

That's the prerogative of the developers, isn't it?  But if it would
only fit their ideas, then it would only be used by them.  The fact
that it is used by many others seem to indicate that it fits also the
ideas of many others.

 while confuses naive users.  

Well, many jurisdictions have staged driver licences for motorcycles;
initially allowing only low-powered machines for new users, with
increasingly powerful machines allowed for more experienced users.  Some
people in Australia would like to introduce a similar system for
car-drivers since, apparently, too many P-platers kill themselves with
high-powered V8 cars (though, I am not sure whether this is a problem

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 09:27:41 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 but then it might be worth asking whether carrying on with misdesign
 for backward compatibility outbalances guaranteed crashes in future
 users' programs, [...]

Why is it worth asking this if nobody else asks it?  Most notably a
certain software company in Redmond, Washington, which is famous for
carrying on with bad designs and bugs all in the name of backward
compatibility.  Apparently this company also sets industry standards so
it must be o.k. to do that. ;-)

 which result in confused complaints, 

Didn't see any confused complaints yet.  Only polite requests for
enlightenment after coming across behaviour that useRs found surprising
given their knowledge of R.  The confused complaints seem to be posted
as responses to responses to such question by people who for what ever
reason seem to have an axe to grind with R. 

 the need for responses suggesting hacks to bypass the design, 

Not to bypass the design, but to achieve what the person wants.  Like any
programming language, R is Turing complete and anything can be done
with it; it is just a question of how.

 and possibly incorrect results published 

I guess such things cannot be avoided no matter what software you are
using.  I am more worried about all the analysis done in MS Excel, in
particular in the financial maths/stats world.  Also, to me it seems
that getting incorrect results is a relatively small problem compared with
the frequent misinterpretation of correct results or the use of
inappropriate statistical techniques.  

 because r is likely to do everything but what the user expects.

This is quite a strong statement, and I wonder what the basis is for
that a statement.  Care to provide any evidence?

R is a tool; a very powerful one and hence also very sharp.  It is easy
to cut yourself with it, but when one knows how to use it gives the
results that one expects.  I guess the problem in this age of instant
gratification is that people are not willing to put in the time and
effort to learn about the tools they are using.  

How about spending some time learning about R instead of continuously
griping about it?  Just imagine how much you could have learned in the
time you spend writing all those e-mails. :)

 r suffers from early made poor decisions, but then this in itself is
 not a good reason to carry on.

Radford Neal is also complaining on his blog
(http://radfordneal.wordpress.com/) about what he thinks are design
flaws in R.  Why don't you two get together and design a good
substitute without any flaws?  Or is that too hard? ;-)

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 12:53:31 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 but seriously, when one buys a complicated device one typically reads
 a quick start guide, and makes intuitive assumptions about how the
 device will work, turning back to the reference when the expectations
 fail. good design should aim at reducing the need for checking why an
 intuitive assumption fails.

And on what are these intuitive assumptions based if not on familiarity
with similar devices?  And people have different intuitions; why should
yours be the correct one and the golden standard?

I know that if I buy a complicated device and never owned something
similar I read more than the quick start guide to get familiar with the
device before breaking something due to using wrong assumptions.

When I started to use S-PLUS, I had used GAUSS before.  Still I took
the time off to work through the blue book and make myself familiar
with S-PLUS before using it for serious work.  Based on my experience
with S-PLUS, I found R very intuitive and easy to use; but I still try to keep
up with relevant documentation.

It really seems that your problem is that you have an attitude of
wanting to have instant gratification.

  If you do not care about how to use machine-gun correctly you could
  easily harm yourself or others. 

 indeed, and i'm scared to think that some of the published research
 can be harmful because the researcher denied to read the whole r
 reference before doing a stats analysis.

Sorry, but this is absolute rubbish.  There are plenty of statistical
analyses that can be done without reading the complete R reference.
However, one or two good books might help.

My concern would rather be that everybody thinks that they can do
statistics and that software projects like R make such people really
think they can do it.  I am far more concerned about inappropriate
analyses and wrong interpretations.  How often is absence of evidence
taken as evidence of absence?

 you see, i'm not complaining about my own analyses failing because i
 have not read the appropriate section in the reference.  if this were
 the problem, i'd just read more and keep silent.
 
 i'm complaining about the need to read, by anyone who starts up with
 r, in all gory details, about the intricacies of r before doing
 anything, because the behaviour is often so unexpected.  

I guess Frank Harrell had people like you in mind when he wrote:
  https://stat.ethz.ch/pipermail/r-help/2005-April/068625.html

Would you also not expect to learn about surgery in all its gory
details before attempting brain surgery because brain surgery is so
intuitive and doesn't need any study?

Believe it or not, there are lots of useful things that you can do in R
without knowing all the gory details.  There are even people who got
books on R published who obviously don't know all the gory details and
they still show useful applications of R.

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] R design (was Variable passed to function not used in function in select)

2008-11-11 Thread Berwin A Turlach
G'day Peter,

On Wed, 12 Nov 2008 00:42:36 +0100
Peter Dalgaard [EMAIL PROTECTED] wrote:

  On 12/11/2008, at 11:29 AM, Peter Dalgaard wrote:
 
  Not that one again! For at least one other value of one, the
  expectation is the opposite: Factor levels do not go away just
  because they happen not to be present in data.
 
  fct <- lapply(dd, is.factor)
  dd[fct] <- lapply(dd[fct], "[", drop=TRUE)
 
 (Actually, the last line could have had lapply(dd[fct],factor), I
 just got confused about whether in would preserve the level order.)

That was my first thought too, that lapply(dd[fct], factor) should be
enough.  But then I thought that ordered factors test TRUE for
is.factor and that using factor() on an ordered factor will not only
remove the factor levels that are missing but also remove the
information on the ordering of the remaining levels.  A short test
showed that this is not the case; and after reading the help page of
factor() I realised that this is due to the design of the function.

So perhaps this example should be documented as a case in which the
design decisions of the R developers save a naive user from accidentally
doing a wrong thing (namely turning an ordinal variable into a
nominal one). :)

Cheers,

Berwin



Re: [R] licensing of R packages

2008-11-14 Thread Berwin A Turlach
 Theory's side, they do
not distribute precompiled CDROMs anymore.  I wonder whether somebody
has had a word with them about the price they were asking?  

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] licensing of R packages

2008-11-14 Thread Berwin A Turlach
G'day Brian,

On Fri, 14 Nov 2008 11:47:46 + (GMT)
Prof Brian Ripley [EMAIL PROTECTED] wrote:

 On Fri, 14 Nov 2008, Duncan Murdoch wrote:

  I think they are talking about cases where the GPL libraries are
  compiled into the new product.  Packages generally don't include
  copies of anything from R, so our GPL doesn't apply to them.
  (Writers may have chosen to copy and modify base functions; if so,
  they are copying our code, and the GPL would apply.)
 
 That _has_ happened several times (usually without any credit and
 removing the copyright header from the R version), so package writers
 do need to be aware that base functions are not fair game.  Often it
 comes to light when the forked version fails when e.g. .Internal
 calls are changed.

Removing credits is definitely poor form but nothing that the GPL
stops, IIRC.  If you want credits to be carried on, you would need to
use a license such as the original BSD license.  IIRC, it was exactly
this clause, that credits have to be kept, that made the original BSD
license a GPL-incompatible Free Software License.

As far as I understand, removing a copyright header is indeed a pretty
big no-no.  And copyright law seems to be a funny thing.  Recently I was
asked by the maintainers of AucTeX whether I was willing to sign over
the copyright of contributions I once had provided.  When I looked at
one of the files for which they wanted the copyright signed over, not a
single line looked familiar to me (I cannot write elisp code at the
level that the code was at).  The file attributed copyright to me and I
vaguely remember that I once contributed it so that AucTeX would have
support for some .sty file. But over time, presumably also to keep up
with changes in AucTeX, the file was completely re-written by others;
with none of the original code remaining (as far as I can tell). Still,
it was suggested to me that I am the copyright holder just because I
contributed the original file, even though by now not a single line of
the code originated from me.

Cheers,

Berwin



Re: [R] licensing of R packages

2008-11-14 Thread Berwin A Turlach
G'day Duncan,

On Fri, 14 Nov 2008 11:16:35 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:

 On 11/14/2008 11:01 AM, Berwin A Turlach wrote:

  But I remember that a situation as you describe was hotly debated on
  gnu.misc.discuss in the mid-90s; thus, I am talking obviously GPL 2.
  Unfortunately I do not remember which software/company was involved
  and how the dispute was solved.  But they did something like you
  described:  distributed a binary and asked the user to download
  additionally some GPL software and then run both together.  If this
  were allowed, the GPL would have a hole in it that you could drive a
  truck through. :)
 
 If the binary being released had no GPL content in it, then there
 would be no basis to complain about anything.  I'd guess that the
 particular case required GPL'd headers in order to compile.  That
 would be enough to say that the binary includes GPL'd code.

I am not so sure whether it is a matter of including GPL'd
headers.  After all, if I just want to call some functions from a
library libXXX and I know what the arguments of those functions are, then
I could write their prototypes into my own non-GPL'd headers.

The question is really, does the software that I distribute work only
if it is dynamically linked against libXXX and are there several
implementations of the functionality that I need from libXXX? If the
only implementation of libXXX is a GPL'd one, then my software is a
derivative work of it.  When I compiled it, I must have had the GPL
version of libXXX on my system.

To be more specific, looking at my Debian system, I can see that 
/usr/lib/libXXX might link to /etc/alternatives/libXXX
and /etc/alternatives/libXXX links to the actual version, which could
be a commercial library, a GPL'd one or one under some other
open-software licence. If there are several alternatives for libXXX (for
the functions that I need), then there is no way of telling what I have
on my system and what I used when I created my binary which is linked
to /usr/lib/libXXX.  So I can give my binary to others and tell them
that they have to get a version of /usr/lib/libXXX; and that one option
would be to install the GPL'd version obtainable from XYZ.

But if the only library that implements the functionality that my
program needs is a GPL'd version, then it is pretty clear
that /etc/alternatives/libXXX on my machine (if the /etc/alternatives
set up is used at all) must point to the GPL'd libXXX.  Thus, when I
created the binary, I created a derivative work of libXXX, whether I
used its GPL'd headers or not.

But I guess I should continue to say that IANAL. :)

Cheers,

Berwin

PS:  Hope I managed again to not getting the flame-thrower started. :)



Re: [R] licensing of R packages

2008-11-14 Thread Berwin A Turlach
G'day Duncan,

On Fri, 14 Nov 2008 12:36:15 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:

 On 11/14/2008 11:57 AM, Carlos Ungil wrote:
  
  And the copyright owners have recourse to legal action if they
  think there is a license violation. Again, I don't know what a
  court would decide, but if you want to test the limits of the GPL
  license I would avoid challenging a GNU project :-)
 
 Actually, I think that's an ideal situation.  There is a lot of fear
 and doubt about using GPL'd software, because of worries like yours.
 But the FSF has the resources to follow up in cases where there are
 actual violations, and they have won several.

What do you mean by "won several"?  As far as I remember from reading
an interview with Eben Moglen, in most cases the violator backed down and
it did not come to a lawsuit.  From what I read, I think the FSF was
only involved in one case that actually went to court, perhaps two; but
I could be wrong.

However, http://en.wikipedia.org/wiki/Harald_Welte credits Harald
Welte with the creation of http://gpl-violations.org and with prosecuting
violators of the GPL, which had been untested in court until then.

And as I mentioned earlier, the FSF does not seem to be the copyright
holder of the GSL, so it is not clear whether it can become active.
But perhaps it can; it is not clear to me whether Harald Welte was the
copyright holder in the cases that he pursued.
 
 The way to lose a GPL lawsuit is to incorporate GPL'd code into your
 own project, and then not follow the GPL when you redistribute.
 There's evidence of that.
 
 But I've never heard of anyone linking to but not distributing GPL'd 
 code and being sued for it, let alone losing lawsuits over it.
 That's evidence enough for me that it is a safe thing to do.

I rather think it is evidence that nobody who has done such a thing was
willing to let it come to a lawsuit but rather decided to settle the
matter before it reached court.  Probably because the advice that they
got from their own lawyers was that they were on a sure loser and
should settle instead, with the settlement being much cheaper.  As I
understand it, the FSF never went for punitive damages and was just
happy with the offender rectifying his/her ways in the future and
adhering to his/her obligations under the GPL.

In my opinion, but IANAL, it is pretty obvious that if your binary has
to be linked against some GPL'd software to make it work and this is the
only way of making your binary work, then your binary is a derivative
work of that GPL'd software.  Hence, if you want to distribute your
binary, you are bound to the GPL.

Thus, on one hand I agree with you: I have never heard of anyone
linking to but not distributing GPL'd code and being sued for it, let
alone losing lawsuits over it.  On the other hand I do seem to remember
hearing about people who distributed binaries-only software, that needed
to be linked against software under GPL to work, being talked to by the
FSF and agreeing to stop their behaviour.  Anybody knows if there is an
archive for gnu.misc.discuss for the early to mid '90s so that I can
try to refresh my memory?

Cheers,

Berwin



Re: [R] licensing of R packages

2008-11-14 Thread Berwin A Turlach
G'day Duncan,

On Fri, 14 Nov 2008 13:37:20 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:

 What is so special about binaries?  

First, with binaries it is (presumably) easier to define when one piece
of software is part of another.

Secondly, I presume that the software that started this thread is a
binary, though I have not downloaded it.

 By your line of reasoning (which I don't agree with), if you write an
 R script, making use of features of R that are not present in other S
 dialects, then the only way to make it work would be to use R.  So if
 you want to distribute it, you would need to GPL it.  

Not at all, see:

http://www.fsf.org/licensing/licenses/gpl-faq.html#IfInterpreterIsGPL

Last time I checked, R is a programming language interpreter. :)

 So lots of the R packages on CRAN that do not have GPL compatible
 licenses would be in violation of the GPL,

Not at all.  As per the above cited FAQ item, there would be no problem
with the interpreted R code; there could be a problem with compiled code
linked into R.  But as soon as that code can also be compiled and run
without R, there would be no problem. 

 and since CRAN is distributing them, CRAN is in violation of the GPL.
 That's nonsense.

You might find that many things that you think are nonsense are taken
deadly seriously by lawyers. ;-)
 
 Moreover, if you write a C/C++ program that makes use of GNU
 extensions, you'd be in violation of the GPL if you were to
 distribute it without GPLing it.  Even the FSF doesn't believe that:
 
 http://www.fsf.org/licensing/licenses/gpl-faq.html#CanIUseGPLToolsForNF

Could you please explain how you came to this opinion?  As far as I
know, if you write a C/C++ program that makes use of GNU extensions,
that program will be still linked to the same system libraries as a
program that does not make use of such extensions.  Or is a program
that uses GNU extensions all of a sudden linked to /usr/lib/libgnuspecial?

 although they do (incorrectly, in my opinion) make the same claim as
 you about libraries:
 
 http://www.fsf.org/licensing/licenses/gpl-faq.html#IfLibraryIsGPL

Given the track record of the FSF in defending the GPL, I am afraid I
will go rather with their opinion than yours. :)  In particular since
their opinions were formed with input from lawyers as far as I can tell.

 The argument in the latter FAQ does seem to imply that R scripts must
 be GPL'd, and again:  that's nonsense.

Not at all, read the complete FAQ, especially the part I mention above
about interpreted languages.

  Thus, on one hand I agree with you, I have never heard that [...]
  On the other hand I do seem to remember hearing about people who
  distributed binaries-only software, that needed to be linked
  against software under GPL to work, being talked to by the FSF and
  agreeing to stop their behaviour. [...]
 
 And I've heard of people receiving letters from copyright holders
 asking them to stop making fair use of copyrighted materials, and
 they complied.  So what?  That just shows that intimidation works, it
 doesn't show that there is a legal basis for the intimidation.

Well, from what I read it showed that the offenders didn't like their
chances of taking the risk of going to court.  So they must have come
to the conclusion that there is some legal basis.

I guess the only way of knowing for sure would be if you commit such a
violation, refuse to stop doing so, don't settle out of court and decide
to go to court.  That would probably settle once and for all whether there
is a legal basis for that argument.  And you would be doing the
free-software community a great favour by having this issue finally
tested in a court of law, decided and laid to rest.  But please don't
be offended if my money is on you losing the case. :)

Cheers,

Berwin



Re: [R] licensing of R packages

2008-11-15 Thread Berwin A Turlach
G'day Duncan,

On Fri, 14 Nov 2008 15:24:03 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:

  Moreover, if you write a C/C++ program that makes use of GNU
  extensions, you'd be in violation of the GPL if you were to
  distribute it without GPLing it.  Even the FSF doesn't believe
  that:
  
  http://www.fsf.org/licensing/licenses/gpl-faq.html#CanIUseGPLToolsForNF
  
  Could you please explain how you came to this opinion? 
 
 I believe your argument was that if my program can't be used without
 a library that is currently only available under the GPL, then my
 program must be considered as a derivative work, so I must license it
 under the GPL as well.

It depends on what you mean by "my program".  I was making my
comments specifically within the context of the original posters
questions about Numerit's behaviour and Barry's assertion that he could
write a C program that relies on a GPL'd library, distribute the
compiled binary of that C program without the GPL'd library and just
ask the user to get the GPL'd library from somewhere else so that he or
she could then run the program.  

You seem to extend the scope to a more general definition of what a
program constitutes, including the distribution of source code for an
interpreter.  As I tried to point out, if you distribute source code
to be interpreted by an interpreter that is GPL'd, you bring up all
kinds of issues.

 Well, if my program uses GNU extensions, it can only be used when 
 compiled by gcc, which I believe is GPL'd.  So your argument would
 imply that it is a derivative work.

Not at all.  Please read what I said.  Is the program compiled from C
code that uses GNU extensions all of a sudden linked to a
GPL'd /usr/lib/libgnuextension library, with that library being the
only source of the needed functionality?

The C code is input (data) to the GNU compiler.  The license of the GNU
compiler does not specify that the output (i.e. the binary) has to be
GPL'd.  Nor does the FSF claim copyright to that output.  (IIRC, when
reading the EULA of some commercial compilers in the 80's, some
software houses seriously claimed the copyright of any program
compiled with their compilers.  I think this clause disappeared rather
soon when users realised its implication.) 

 None of us (you, me or the FSF) think the latter argument is valid.

Agreed, and in my case, and presumably also the FSF's case, it is
because such an argument was never made and never claimed.  

To be, hopefully, completely precise:  if you distribute a binary
program which can only be run if linked against a GPL'd library, i.e.
no alternative implementation of the library providing the
functionality your binary needs exists, then that binary is a
derivative work of the GPL'd library.  Hence, in order to distribute
it, you would have to adhere to the rules under which use of the GPL'd
library was granted to you.  Distributing your binary, and asking your
users to get the GPL'd library separately from somewhere else, does not
absolve you from your responsibilities under the GPL.

 I don't see a material difference from the former argument, so I
 conclude the former argument is invalid as well.

My scenario:  a binary program is distributed which can only be run if
it is linked by the user to a GPL'd library.  The source code of that
binary program is not distributed and there is no offer of providing
it.  There is no way of using that program except with the GPL'd
library, i.e. you need to have the GPL'd library libXXX, there is no
other (whether commercial or not) provider of libXXX that implements
the same features.  Thus, it is clear that when you produced the
binary, you had the GPL'd library on your system, you linked the binary
against that library and at that moment you created a derivative work
of the GPL'd library.  

Your scenario (as I understand it):  You use a GPL'd compiler to compile
your program which is then only linked to system libraries available on
any similar OS. The compiler offers extensions to the language that other
compilers don't offer, but produces binaries that are only linked to
said standard libraries.  The GPL'd compiler makes no claims on the
binary that it produces.

Sorry, but if you do not see a material difference here, then I guess
it is best that we just agree to disagree. :)

  The argument in the latter FAQ does seem to imply that R scripts
  must be GPL'd, and again:  that's nonsense.
  
  Not at all, read the complete FAQ, especially the part I mention
  above about interpreted languages.
 
 I think the complete FAQ is inconsistent.  

I guess that comes from the fact that you do not see a material
difference where others do. :)

 Some of it is accurate (e.g. the bits that say you can use GPL'd
 software without GPL'ing your own work) and some of it is not (the
 parts that imply dynamic linking makes something a derivative work.)

As far as I understand, there have been people who shared that thought,
namely that dynamic linking does not imply 
