Re: [Rd] Write unix format files on windows and vice versa

2012-04-24 Thread Ted Harding
On 24-Apr-2012 17:23:00 Andrew Redd wrote:
> I go back and forth between windows and linux, and find myself
> running into problem with line endings. Is there a way to
> control the line ending conversion when writing files, such as
> write and cat? More explicitly I want to be able to write files
> with LF line endings rather than CRLF line ending on windows;
> and CRLF line endings instead of LF on linux, and I want to be
> able to control when the conversion is made and/or choose the
> line endings that I want.
> 
> As far as I can tell the conversion is not optional and buried
> deep in compiled code. Is there a possible work around?
> 
> Thanks,
> Andrew

Rather than write the Linux version out in Windows (or the other
way round in Linux), you might find it more useful to use an
external conversion utility.

One such is unix2dos/dos2unix (the same program in either case,
whose function depends on what name you call it by). This used
to be standard on older Linux systems, but may need to be explicitly
installed on your system. It seems there may also be a version for
Windows -- see:

  http://download.cnet.com/Unix2DOS/3000-2381_4-10488164.html


Then, when you create a file in Windows, you could transfer it to
Linux and convert it to "Unix"; and also keep it as it is for use
on Windows. Conversely, when you create a file in Linux, you
could convert it to "Windows" and transfer it to Windows; and
also keep it as it is for use on Linux.

It is also possible to do the CRLF-->LF or LF-->CRLF conversion
using 'sed' in Linux.
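It is also possible to control this from within R itself (a minimal
sketch; the essential point is that a connection opened in *binary*
mode does no end-of-line translation, so you supply the line ending
yourself via 'sep'):

  con <- file("out_unix.txt", open="wb")
  writeLines(c("line 1","line 2"), con, sep="\n")    # LF, even on Windows
  close(con)

  con <- file("out_dos.txt", open="wb")
  writeLines(c("line 1","line 2"), con, sep="\r\n")  # CRLF, even on Linux
  close(con)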

For some further detail see

   http://en.wikipedia.org/wiki/Unix2dos

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 24-Apr-2012  Time: 18:56:21
This message was sent by XFMail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] "NA" vs. NA

2012-04-05 Thread Ted Harding
On 05-Apr-2012 11:03:15 Adrian Dusa wrote:
> Dear All,
> 
> I assume this is an R-devel issue, apologies if I missed something
> obvious. I have a dataframe where the row names are country codes,
> based on ISO 3166, something like this:
> 
> 
> "v1""v2"
> "UK"12
> "NA"23
> 
> 
> It happens that "NA" is the country code for "Namibia", and that
> creates problems on using this data within a package due to this:
> 
> Error in read.table(zfile, header = TRUE, as.is = FALSE) :
>   missing values in 'row.names' are not allowed
> 
> I realise that NA is reserved in R, but I assumed that when quoted it
> would be usable.
> For the moment I simply changed the country code, but I wonder if
> there's any (other) solution to circumvent this issue.
> 
> Thanks very much in advance,
> Adrian

Hi Adrian,
The default in read.table() for the "na.strings" parameter is

  na.strings = "NA"

So, provided you have no "NA" in the data portion of your file
(or e.g. any missing values are simply blank) you could use
something like:

read.table(zfile, header = TRUE, as.is = FALSE, na.strings="OOPS")

which should avoid the problem.
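A minimal sketch of the whole round trip (file contents invented
for illustration):

  tf <- tempfile()
  writeLines(c('"v1"', '"UK" 12', '"NA" 23'), tf)
  ## read.table(tf, header=TRUE) would stop with the row.names error,
  ## because the default na.strings="NA" turns "NA" into a missing value
  read.table(tf, header=TRUE, na.strings="OOPS")
  #    v1
  # UK 12
  # NA 23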

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) 
Date: 05-Apr-2012  Time: 12:21:57
This message was sent by XFMail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Julia

2012-03-01 Thread Ted Harding
http://julialang.org/blog

Then click on "Stanford Talk Video".
Then click on "available here".

Ted.

On 01-Mar-2012 Kjetil Halvorsen wrote:
> Can somebody post a link to the video? I can't find it, searching
> "Julia" on the Stanford YouTube channel gives nothing.
> 
> Kjetil
> 
> On Thu, Mar 1, 2012 at 11:37 AM, Douglas Bates  wrote:
>> On Thu, Mar 1, 2012 at 11:20 AM, Jeffrey Ryan 
>> wrote:
>>> Doug,
>>>
>>> Agreed on the interesting point - looks like it has some real promise.
>>> I think the spike in interest could be attributable to Mike
>>> Loukides's tweet on Feb 20. (editor at O'Reilly)
>>>
>>> https://twitter.com/#!/mikeloukides/status/171773229407551488
>>>
>>> That is exactly the moment I stumbled upon it.
>>
>> I think Jeff Bezanson attributes the interest to a blog posting by
>> Viral Shah, another member of the development team, that hit Reddit.
>> He said that, with Viral now in India, it all happened overnight for
>> those in North America and he awoke the next day to find a firestorm
>> of interest. I ran across Julia in the Release Notes of LLVM and
>> mentioned it to Dirk Eddelbuettel who posted about it on Google+ in
>> January. (Dirk, being much younger than I, knows about these
>> new-fangled social media things and I don't.)
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-
E-Mail: (Ted Harding) 
Date: 01-Mar-2012  Time: 20:47:42
This message was sent by XFMail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug in sum() on integer vector

2011-12-13 Thread Ted Harding
The number of particles in the Universe is believed to be
around 10^80 (estimates vary from 10^72 to 10^87). If we
could use each particle as a binary element in storage
(e.g. using positive and negative spin as 1 or 0) then
the largest integer that could be stored in the entire
Universe would be about 2^(10^80). A huge number, of course,
but it is the limit of what is possible.

Now, to do arithmetic with integers, you need to store two
or more integers, thus at least halving the power of 2.
Then you need a computer to do the computation, and you
won't have room for that in the Universe unless you cut
down on the largest integer.

So, in real life (whatever Mathematics may say), you can't
expect arbitrary integer arithmetic.

Now, computer programs for numerical computation can broadly
be divided into two types.

In one, "arbitrary precision" is available: you can tell
the program how many decimal digits you want it to work to.
An example of this is 'bc':

  http://en.wikipedia.org/wiki/Bc_programming_language

You can set as many decimal digits as you like, *provided*
they fall within the storage capacity of your computer, for
which an upper bound is the storage capacity of the Universe
(see above). For integers and results which surpass the
decimal places you have set, the result will be an approximation.
Inevitably.

In the other type, the program is written so as to embody
integers to a fixed maximum number of decimal (or binary)
digits. An example of this is R (and most other numerical
programs). This may be 32 bits or 64 bits. Any result of a
computation which involves more than this number of bits
is inevitably an approximation.

Provided the user is aware of this, there is no need for
your "It should always return the correct value or fail."
It will return the correct value if the integers are not
too large; otherwise it will return the best approximation
that it can cope with in the fixed finite storage space
for which it has been programmed.
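For example, in R's 32-bit integer arithmetic (a minimal
illustration; the exact wording of the warning may vary between
versions):

  .Machine$integer.max
  # [1] 2147483647
  sum(c(2147483647L, 1L))
  # NA, with a warning about integer overflow
  sum(c(2147483647, 1))    # the same sum in double precision
  # [1] 2147483648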

There is an implicit element of the arbitrary in this. You
can install 32-bit R on a 64-bit-capable machine, or a
64-bit version. You could re-program R so that it can
work to, say, 128 bits or 256 bits even on a 32-bit machine
(using techniques like those that underlie 'bc'), but
that would be an arbitrary choice. However, the essential
point is that some choice is unavoidable, since if you push
it too far the Universe will run out of particles -- and the
computer industry will run out of transistors long before
you hit the Universe limit!

So you just have to accept the limits. Provided you are aware
of the approximations which may set in at some point, you can
cope with the consequences, so long as you take account of
some concept of "adequacy" in the inevitable approximations.
Simply to "fail" is far too unsophisticated a result!

Hoping this is useful,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 14-Dec-11   Time: 00:52:49
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] array extraction

2011-09-27 Thread Ted Harding
Somewhat out of my depth here (I have only 2 arms, but am
swimming in waters which require 3): My interpretation is
that a and M are basically vectors, with dimension attributes,
accessed down the columns.

The array 'a' consists of 30 elements 1:30 in that order,
accessed by each of 3 rows for each of 5 columns in each
of two "layers", in that order of precedence.

The matrix M consists of 15 elements, accessed by each
of 3 rows for each of 5 columns.

Thus a[M] treats M as a selection vector, ignoring dimension,
and reads along a and at the same time along M, selecting
according to TRUE or FALSE. Then, when it gets to the end
of the first "layer" in 'a' it re-cycles M.
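A minimal illustration of that recycling, with 'a' and M as in
Robin's first message (matrix(1:15,3,5) written out for clarity):

  a <- array(1:30, c(3,5,2))
  M <- (matrix(1:15, 3, 5) %% 4) < 2   # 15 logicals against 30 elements
  a[M]   # M is recycled once, so each "layer" is selected alike
  # [1]  1  4  5  8  9 12 13 16 19 20 23 24 27 28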

When I tried a[M,] I got

  a[M,]
  # Error in a[M, ] : incorrect number of dimensions

I infer that it is treating the M as a vector, so there
are only 2 dimensions in a[M,] instead of 3. So try:

  a[M,,]
  # Error: (subscript) logical subscript too long

which is not surprising since one is applying a 15-long
selector to the first dimension of 'a', which has only
length 3, just as in

  a[rep(TRUE,15),,]
  # Error: (subscript) logical subscript too long

which, interestingly, differs from

  a[rep(1,15),,]
  # (Output, which is what you'd expect, omitted)

(Hmm, out of my depth again ... ).

Well, maybe two arms is not enough when you need three
to swim in these waters; but it seems that one long swishy
tail will do nicely. That being said, I still find the
water quite murky!
Ted.

On 27-Sep-11 22:12:26, robin hankin wrote:
> thank you Simon.
> 
> I find a[M] working to be unexpected, but consistent with (a close
> reading of) Extract.Rd
> 
> Can we reproduce a[,M]?
> 
> [I would expect this to extract a[,j,k] where M[j,k] is TRUE]
> 
> try this:
> 
> 
>> a <- array(1:30,c(3,5,2))
>> M <- matrix(1:10,5,2) %% 3==1
>> a[M]
>  [1]  1  4  7 10 11 14 17 20 21 24 27 30
> 
> This is not doing what I would want a[,M] to do.
> 
> 
> 
> 
> I'll checkout afill() right now
> 
> best wishes
> 
> 
> Robin
> 
> 
> On Wed, Sep 28, 2011 at 10:39 AM, Simon Knapp 
> wrote:
>> a[M] gives the same as your `cobbled together' code.
>>
>> On Wed, Sep 28, 2011 at 6:35 AM, robin hankin 
>> wrote:
>>>
>>> hello everyone.
>>>
>>> Look at the following R idiom:
>>>
>>> a <- array(1:30,c(3,5,2))
>>> M <- (matrix(1:15,c(3,5)) %% 4) < 2
>>> a[M,] <- 0
>>>
>>> Now, I think that "a[M,]" has an unambiguous meaning (to a human).
>>> However, the last line doesn't work as desired, but I expected it
>>> to...and it recently took me an indecent amount of time to debug an
>>> analogous case. Just to be explicit, I would expect a[M,] to extract
>>> a[i,j,] where M[i,j] is TRUE. (Extract.Rd is perfectly clear here,
>>> and R is behaving as documented).
>>>
>>> The best I could cobble together was the following:
>>>
>>> ind <- which(M, arr.ind=TRUE)
>>> n <- 3
>>> ind <- cbind(kronecker(ind, rep(1, dim(a)[n])),
>>>              rep(seq_len(dim(a)[n]), nrow(ind)))
>>> a[ind] <- 0
>>>
>>>
>>> but the intent is hardly clear, certainly compared to "a[M,]"
>>>
>>> I've been pondering how to implement such indexing, and its
>>> generalization.
>>>
>>> Suppose 'a' is a seven-dimensional array, and M1 a matrix and M2 a
>>> three-dimensional array (both Boolean). Then "a[,M1,,M2]" is a
>>> natural generalization of the above. I would want a[,M1,,M2] to
>>> extract a[i1,i2,i3,i4,i5,i6,i7] where M1[i2,i3] and M2[i5,i6,i7] are
>>> TRUE.
>>>
>>> One would need all(dim(a)[2:3] == dim(M1)) and all(dim(a)[5:7] ==
>>> dim(M2)) for consistency.
>>>
>>> Can any R-devel subscribers advise?
>>>
>>>
>>>
>>>
>>> --
>>> Robin Hankin
>>> Uncertainty Analyst
>>> hankin.ro...@gmail.com
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
> 
> 
> 
> -- 
> Robin Hankin
> Uncertainty Analyst
> hankin.ro...@gmail.com
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 28-Sep-11   Time: 00:28:37
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish there were a "strict mode" for R interpreter. What

2011-04-09 Thread Ted Harding
On 09-Apr-11 20:37:28, Duncan Murdoch wrote:
> On 11-04-09 3:51 PM, Paul Johnson wrote:
>> Years ago, I did lots of Perl programming. Perl will let you be lazy
>> and write functions that refer to undefined variables (like R does),
>> but there is also a strict mode so the interpreter will block anything
>> when a variable is mentioned that has not been defined. I wish there
>> were a strict mode for checking R functions.
>>
>> Here's why. We have a lot of students writing R functions around here
>> and they run into trouble because they use the same name for things
>> inside and outside of functions. When they call functions that have
>> mistaken or undefined references to names that they use elsewhere,
>> then variables that are in the environment are accidentally used. Know
>> what I mean?
>>
>> dat<- whatever
>>
>> someNewFunction<- function(z, w){
>> #do something with z and w and create a new "dat"
>> # but forget to name it "dat"
>>  lm (y, x, data=dat)
>> # lm just used wrong data
>> }
>>
>> I wish R had a strict mode to return an error in that case. Users
>> don't realize they are getting nonsense because R finds things to fill
>> in for their mistakes.
>>
>> Is this possible?  Does anybody agree it would be good?
>>
> 
> It would be really bad, unless done carefully.
> 
> In your function the free (undefined) variables are dat and lm.  You 
> want to be warned about dat, but you don't want to be warned about lm. 
> What rule should R use to determine that?
> 
> (One possible rule would work in a package with a namespace.  In that 
> case, all variables must be found in declared dependencies, the search 
> could stop before it got to globalenv().  But it seems unlikely that 
> your students are writing packages with namespaces.)
> 
> Duncan Murdoch

I'm with Duncan on this one! On the other hand, I can understand the
issues that Paul's students might encounter.

I think the right thing to do is to introduce the students to the
basics of scoping, early in the process of learning R.

Thus, when there is a variable (such as 'lm' in the example) which
you *expect* to already be out there (since 'lm' is in 'stats'
which is pre-loaded by default), then you can go ahead and use it.

But when your function uses a variable (e.g. 'dat') which just
*happened* to be out there when you first wrote the function,
then when you re-use the same function definition in a different
context things are likely to go wrong. So teach them that variables
which occur in functions, which might have any meaning in whatever
the context of use may be, should either be named arguments in
the argument list, or should be specifically defined within the
function, and not assumed to already exist unless that is already
guaranteed in every context in which the function would be used.

This is basic good practice which, once routinely adopted, should
ensure that the right thing is done every time!
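For example (a sketch of that advice, not code from the original
posting):

  # Either pass the data in explicitly, or define it locally:
  someNewFunction <- function(z, w){
    dat <- data.frame(y=z, x=w)   # 'dat' is now local to the function
    lm(y ~ x, data=dat)
  }

  # The recommended package 'codetools' can list a function's free
  # variables, which helps to catch such slips:
  codetools::findGlobals(someNewFunction, merge=FALSE)$variables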

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 09-Apr-11   Time: 22:08:10
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Surprising behavior of letters[c(NA, NA)]

2010-12-17 Thread Ted Harding
On 17-Dec-10 14:32:18, Gabor Grothendieck wrote:
> Consider this:
> 
>> letters[c(2, 3)]
> [1] "b" "c"
>> letters[c(2, NA)]
> [1] "b" NA
>> letters[c(NA, 3)]
> [1] NA  "c"
>> letters[c(NA, NA)]
>  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA
> [26] NA
> 
> The result is a 2-vector in each case until we get to c(NA, NA) and
> then it unexpectedly changes from returning a 2-vector to returning a
> 26-vector.  I think most people would have expected that the answer
> would be c(NA, NA).

I'm not sure that it is surprising! Consider
  letters[NA]
which returns exactly the same result. Then consider that 'letters' is
simply a 26-element character vector c("a",...). Now consider

  x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
  x[NA]
  # [1] NA NA NA NA NA NA NA NA NA NA NA NA NA

In other words, x[NA] for any vector x will test each index 1:length(x)
against NA, and will find that it's NA, since it doesn't know whether
the index matches or not. Therefore it returns NA for that index, and
will do the same for every index. So it's telling you: "For each of my
elements a,b,c,d,e,f,... I have to tell you that I don't know whether
you want it or not". You also get similar behavior for x==NA.

If anything might be surprising (though that also admits a logical
explanation), it is the result

  letters[c(2, NA)]
  # [1] "b" NA

since the result being asked for by the first element of c(2,NA) is
definite -- so far so good -- but then you would expect it to have the
same problem with what is being asked for by NA. This time, it seems
that because the 2-element vector c(2,NA) is being submitted, its
length over-rides the length of the response that would be given for
x[NA]: "You asked for a 2-element extraction from letters; I can see
what you want for the first, but not for the second".

However, that logic does not work for letters[c(NA,NA)] which still
returns the 26-element result!

After all that, I'm inclined to the view that letters[NA] should
return one element (NA), letters[c(NA,NA)] should return 2 (NA,NA),
etc.; and that the same should apply to all vectors accessed by [].
The above behaviour seems to contradict [what I can understand from]
what is said in ?"[":

NAs in indexing:
 When extracting, a numerical, logical or character 'NA' index
 picks an unknown element and so returns 'NA' in the corresponding
 element of a logical, integer, numeric, complex or character
 result, and 'NULL' for a list.  (It returns '00' for a raw
  result.)

since that seems to imply that x[c(NA,NA)] should return c(NA,NA)
and not rep(NA,length(x))!
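A footnote on the mechanism: the literal NA is of type logical,
and a logical subscript is recycled to the full length of the
vector being indexed, whereas an integer NA is not:

  typeof(NA)
  # [1] "logical"
  letters[NA_integer_]
  # [1] NA
  letters[c(NA_integer_, NA_integer_)]
  # [1] NA NA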

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 17-Dec-10   Time: 15:40:03
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Logical vectors

2010-11-04 Thread Ted Harding
On 04-Nov-10 08:56:42, Gerrit Eichner wrote:
> On Thu, 4 Nov 2010, Stephen Liu wrote:
> [snip]
>> In;
>>
>> 2.4 Logical vectors
>> http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics
>>
>> It states:-
>>
>> The logical operators are <, <=, >, >=, == for exact equality and !=
>> for inequality 
>>
>>># exact equality
>> !=   # inequality
> 
> [snip]
> 
> 
> Hello, Stephen,
> in my understanding of the sentence
> 
> "The logical operators are <, <=, >, >=, == for exact equality and !=
> for inequality "
> 
> the phrase "exact equality" refers to the operator "==", i. e. to the
> last element "==" in the enumeration (<, <=, >, >=, ==), and not to its
> first.
> 
>   Regards  --  Gerrit

This indicates that the sentence can be mis-read. It should be
cured by a small change in punctuation (hence I copy to R-devel):

  The logical operators are <, <=, >, >=; == for exact equality;
  and != for inequality 

Hoping this helps!
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 04-Nov-10   Time: 09:08:37
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] small syntax suggestion

2010-08-24 Thread Ted Harding
On 24-Aug-10 14:42:11, Davor Cubranic wrote:
> On August 23, 2010 01:27:24 pm Philippe Grosjean wrote:
>> They have to write such a code like this:
>> 
>> if (x < -3) do_something
>> 
>> That way, there is no ambiguity. Don't you think it's important to
>> write clear code, including by using spaces where it makes it easier
>> to read,... and less ambiguous, as you just realize?
> 
> I fully agree, and I'm sure nobody here would dispute your advice. But 
> we all sometimes make typos, and the point here is that the grammar's 
> ambiguity makes for hard-to-find bugs.
> 
> So, if I may focus us back on the OP's suggestion: "that R simply warn 
> about an ambiguity" in 'if' statement's comparison involving '<-[0-9]'.
> It doesn't seem like an unreasonable suggestion. For comparison, GCC 
> will do the same thing with C code when the '-Wparentheses' switch if 
> assignment is used in a context where a truth value is expected. (E.g.,
> 'if (x = 3)'.)
> 
> It's been a very long time since I looked at Yacc and Lex source, but
> it looks like function 'xxif' in gram.y is the earliest place where we
> have a whole IF statement. AFAICT, this is evaluated in 'do_if'
> function of eval.c. There, the condition is evaluated via
> 'asLogicalNoNA'. Could 'do_if' instead use a function similar to
> 'asLogicalNoNA' but which issues a warning if the condition is an
> assignment?
> 
> Davor

I'm a bit doubtful about the suggestion to "trap" this kind of
thing and issue warnings afterwards. It's not just in if() statements,
but anywhere that a logical value (as "x< -3") or an assignment
(as "x<-3") can validly be placed.

E.g.:

  if(x<-3) print("yes") else print("no")
  # [1] "yes"

because:

  as.logical(x<-3)
  # [1] TRUE
  as.logical(x<-0)
  # [1] FALSE

Or:

  x <- -5
  y <- c(5,4,3,2,1)
  y[x<-3]
  # [1] 3

  x <- -5
  y[x< -3]
  # [1] 5 4 3 2 1

It may be all very well to say that such examples look silly
(in what context might anyone mean them seriously?), but
remember that the context of this discussion is precisely
where someone has written something they didn't mean, so
"might mean them seriously" is not a criterion.

As illustrated above, "x<-3" is valid in all sorts of syntactic
contexts.

The upshot is that, if such warnings are to be implemented
in any context -- not just "if()" -- where one wanted to
protect the innocent from their own fingers, then they would
be triggered *every time* the expression "x<-3" (or the like)
occurred! Of course one would need to be able to turn this
warning off, but if (e.g. as a precaution in a class of
beginners) it were turned on, then there would be a huge
number of false positives filling up the screen in almost
any normal program. Reassuring for beginners!

So, on those grounds, I doubt its wisdom (and would prefer
giving the advice to bracket things, as in "x<(-3)"). It's
a potential syntactic trap, but it's only one of many which
can be avoided in similar ways, and I think it's better to
teach avoidance rather than warning after the event.

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 24-Aug-10   Time: 16:18:27
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] small syntax suggestion

2010-08-23 Thread Ted Harding
On 23-Aug-10 17:50:44, Barry Rowlingson wrote:
> On Mon, Aug 23, 2010 at 6:06 PM, Davor Cubranic 
> wrote:
> 
>> The students are trying to *compare* to a negative number, and
>> trip on> R's parsing of "<-". They could use '=' for assignment
>> all they want (which I thought is being discouraged as a code style
>> these days, BTW), and they'll still run into this problem.
> 
>  Oops yes, negative logic assumption from me.
> 
>  Okay, back to the question...
> 
>  Looks tricky, because if(x=2){...} fails because of syntax.
>  There's stuff in gram.y that makes x=1 a bit special, and only
>  lets it occur as a top-level expression. However x<-1 is allowed
>  anywhere an expression can be. Both expressions then call the
>  same 'primitive' function. At that point I'm not sure how the
>  code could find out if it's being called from a <- or an =
>  assignment. And then whether its a top-level expression or not.
>  And this would need checking on every <- call, which could be
>  a problem...
> 
>  Barry

Indeed! R has a number of these tricky little syntactic traps,
perhaps especially where precedence of operators is concerned.
For example:

  1:10-1
  # [1] 0 1 2 3 4 5 6 7 8 9
  1-1:10
  # [1]  0 -1 -2 -3 -4 -5 -6 -7 -8 -9
  -1:10
  # [1] -1  0  1  2  3  4  5  6  7  8  9 10
  1+ -1:10
  # [1]  0  1  2  3  4  5  6  7  8  9 10 11
  1+ -(1:10)
  # [1]  0 -1 -2 -3 -4 -5 -6 -7 -8 -9

In due course people will learn (most of) the details of
precedence, but certainly at the beginning stage I would
strongly recommend putting brackets round *anything* which
is to be considered as an entity, just to avoid getting
things wrong. So, with the item in the original query:

  if (x<-3) do_something;

if they wrote it as

  if (x<(-3)) do_something;

there would be no problem (and no doubt about what went
with what). Of course in complicated expressions this could
induce an episode of ocular lispopia, but apart from that
it's safe! It's certainly something I tend to do even now,
simply for the sake of readability (aka "visual parsing").

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-10   Time: 20:13:26
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No RTFM?

2010-08-22 Thread Ted Harding
On 22-Aug-10 23:29:39, Paul Johnson wrote:
> Dude:
> What's so objectionable about filling in the output of
>>systemInfo()
> ??
> What particular piece is too onerous to ask of people who
> are asking questions?
> -- 
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas

(with Cc: to r-devel)
I presume you mean "sessionInfo()". "systemInfo()" hasn't been
mentioned so far, I think.

[1] It may or may not be relevant to the query. If it's relevant,
then it's fine. If not, it's a waste of space. The current
Posting Guide already advises it under "Surprising behavior
and bugs:", where such advice is appropriate.
It is not appropriate for (e.g.) a query asking for advice about
packages for multiple imputation.

[2] I am not objecting to it as such. I am questioning your proposal
that
  1. Every question to r-help should begin with the following.
  A. Output from the command sessionInfo()
  B. Output from Sys.getlocale()
  [etc.]
Not *every* question, as I've said before. It depends on the case.

[3] I have tried to argue for a moderate and flexible spirit in
what is advised in the Posting Guide. I am very uncomfortable
about proposals as prescriptive and rigid as yours seem to be.
Users, especially beginners, are likely to be discouraged from
posting to R-help if faced with stringent (and possibly not
relevant) requirements on how they should post. If faced with
responses which chastise them for not "following the rules"
they may well be put off using R altogether.

As I have said before, I see the function of R-help as helping,
in as cooperative a way as possible. The function of the Posting
Guide should likewise be to help them to write good questions,
with advice on what mey be necessary, what useful, what not useful.

The current Posting Guide is already quite reasonable in these
respects, and perhaps would benefit most from being somewhat
re-formatted, without essential change.

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-10   Time: 03:22:01
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No RTFM?

2010-08-22 Thread Ted Harding
On 22-Aug-10 18:31:46, Paul Johnson wrote:
> Hey Ben:
> One of my colleagues bought your book and was reading it during a
> faculty meeting last Tuesday.  Everybody kept asking what's that?
> 
> If you know how to put it in the wiki, would you please do it and let
> us know where it is.  I was involved with an R wiki about 5 or 6 years
> ago, but completely lost track of it.
> 
> I'm going to keep pushing for sharp, clear guidelines.  If I could get
> R-help set up as a template for messages like  bugzilla, I think it
> would be awesome! To disagree with Ted. People won't mind giving us
> what we need, as long as we tell them exactly what to do. We save
> ourselves the circle of "give us sessionInfo()" and "give us your
> LANG" and 'what version is that' and so forth.

"People won't mind"? If R-help ends up telling me exactly what to do,
I shall leave the list. I mean it. For good.
Ted.

[the rest snipped]


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-10   Time: 19:49:41
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No RTFM?

2010-08-21 Thread Ted Harding
I would agree, however, that a question *purely* about Statistics
would normally be out of order. E.g. "How can I prove that 1*(X==0)
is an unbiased estimate of exp(-mu) in a Poisson distribution".

> B. If you have trouble with a package from CRAN or elsewhere,
> write to the author of that package.

It depends on the trouble. I think R-help is usually a good and
legitimate place to ask, since it is likely to come to the
attention of several people who have used the package, and may
have encountered the problem or understand the reasons for it.
They may even have better solutions than the author of the package
could provide! I have myself had the experience of discussing
a problem with a package on R-help, locating the problem (and
devising its solution) in the code, leading to my mailing the
package maintainer who was not aware of the problem!

> r-help might be a good place to ask, "I've been using package
> XYZ and the author seems to have abandoned the project.
> Does anybody know of a replacement?" Otherwise, don't.

For reasons above, I don't agree with this.

> Note that the Bioconductor repository is not part of "R" proper
> and you should address questions about Bioconductor to their
> support framework.

Presumably you mean the various special lists for Bioconductor?
In such a case, that is fair enough, since anyone using Bioconductor
is likely to be subscribed to such lists anyway. (But they wouldn't
need to be told about this in the general Posting Guide).

[***]
There are many packages, however, such as combinat, boot, cluster,
etc. which are not part of R-base (though many of them are in
"recommended") which are in frequent use, and questions about them
to R-help are perfectly in order (and the individual author of
such a package does not want to be the sole recipient of frequent
questions about it -- the R-help community can carry the load
better with its "distributed model").

> C. If you are writing code for R itself, or if you are developing a
>    package, send your question to r-devel, rather than r-help.

It depends on the question!

> D. For operating-system or R interface questions, there are dedicated
> lists. See R-sig-Mac, R-sig-Debian, R-sig-Fedora, etc.

Fair enough (in most cases).

> ==
> 
> It will be necessary to add, toward the end, the part about
> "be polite when posting."

And, since a reply is also a posting, "be polite when responding"!

> And along the lines of the "No RTFM" policy, I think we should say
> "All RTFM answers should include an exact document and section
> number." It is certainly insulting to answer a question about plot
> with the one line
> 
> ?plot
> 
> but it is not insulting to say "In ?plot, check the "Details"
> section and run the example code."

Fair enough -- and indeed this is an example of a helpful and
sufficient (for most people) response, and therefore is not in
the "RTFM" category.



E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-10   Time: 03:13:22
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] suggestion for ?factor

2010-08-09 Thread Ted Harding
Hi Folks.
May I suggest a small addition to ?factor -- this is not explicit
about how to force the ordering of levels in an ordered factor.

One can guess it, but I think it would be a good idea to spell it out.
For example, one can do:

  X <- c("Alf","Bert","Chris","Dave")
  F <- factor(X,levels=c("Dave","Alf","Chris","Bert"),ordered=TRUE)
  F
  # [1] Alf   Bert  Chris Dave 
  # Levels: Dave < Alf < Chris < Bert

So, where ?factor says:

  levels: an optional vector of the values that 'x' might have taken.
  The default is the unique set of values taken by
  'as.character(x)', sorted into increasing order of 'x'.
  Note that this set can be smaller than 'sort(unique(x))'.

it could include a bit extra so as to read:

  levels: an optional vector of the values that 'x' might have taken.
  The default is the unique set of values taken by
  'as.character(x)', sorted into increasing order of 'x'.
  Note that this set can be smaller than 'sort(unique(x))'.
  If 'levels=...' is present and 'ordered=TRUE', then the
  levels are ordered according to the order in which they
  are listed in 'levels'.

With thanks,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 09-Aug-10   Time: 16:37:50
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] results of pnorm as either NaN or Inf

2010-05-13 Thread Ted Harding
On 13-May-10 20:04:50, efree...@berkeley.edu wrote:
> I stumbled across this and I am wondering if this is unexpected
> behavior or if I am missing something.
> 
>> pnorm(-1.0e+307, log.p=TRUE)
> [1] -Inf
>> pnorm(-1.0e+308, log.p=TRUE)
> [1] NaN
> Warning message:
> In pnorm(q, mean, sd, lower.tail, log.p) : NaNs produced
>> pnorm(-1.0e+309, log.p=TRUE)
> [1] -Inf
> 
> I don't know C and am not that skilled with R, so it would be hard
> for me to look into the code for pnorm. If I'm not just missing
> something, I thought it may be of interest.
> 
> Details:
> I am using Mac OS X 10.5.8. I installed a precompiled binary version.
> Here is the output from sessionInfo(), requested in the posting guide:
> R version 2.11.0 (2010-04-22) 
> i386-apple-darwin9.8.0 
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base   
> 
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
> 
> Thank you very much,
> 
> Eric Freeman
> UC Berkeley

This is probably platform-independent. I get the same results with
R on Linux. More to the point:

You are clearly "pushing the envelope" here. First, have a look
at what R makes of your inputs to pnorm():

  -1.0e+307
  # [1] -1e+307
  -1.0e+308
  # [1] -1e+308
  -1.0e+309
  # [1] -Inf


So, somewhere between -1e+308 and -1e+309, the envelope burst!
Given -1.0e+309, R returns -Inf (i.e. R can no longer represent
this internally as a finite number).

Now look at

  pnorm(-Inf,log.p=TRUE)
  # [1] -Inf

So, R knows how to give the correct answer (an exact 0, or -Inf
on the log scale) if you feed pnorm() with -Inf. So you're OK
with -1e+N where N >= 309.

For smaller powers, e.g. -1e+(200:306), the input is still a
representable finite number, and presumably R's algorithm (which
I haven't studied either ... ) returns an exact 0 for pnorm(),
i.e. -Inf on the log scale, as it should to the
available internal accuracy.
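A more modest example of the same underflow, showing why log.p
is useful at all (values approximate):

  pnorm(-40)
  # [1] 0             # the true value, about 7.3e-350, underflows to 0
  pnorm(-40, log.p=TRUE)
  # [1] -804.6084     # representable on the log scale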

So, up to pnorm(-1.0e+307, log.p=TRUE) = -Inf. All is as it should be.
However, at -1e+308, "the envelope is about to burst", and something
may occur within the algorithm which results in a NaN.

So there is nothing anomalous about your results except at -1e+308,
which is where R is at a critical point.

That's how I see it, anyway!
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-10   Time: 00:01:27
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] y ~ X -1 , X a matrix

2010-03-17 Thread Ted Harding
On 17-Mar-10 23:32:41, Ross Boylan wrote:
> While browsing some code I discovered a call to lm that used
> a formula y ~ X - 1, where X was a matrix.
> 
> Looking through the documentation of formula, lm, model.matrix
> and maybe some others I couldn't find this useage (R 2.10.1).
> Is it anything I can count on in future versions?  Is there
> documentation I've overlooked?
> 
> For the curious: model.frame on the above equation returns a
> data.frame with 2 columns.  The second "column" is the whole X
> matrix. model.matrix on that object returns the expected matrix,
> with the transition from the odd model.frame to the regular
> matrix happening in an .Internal call.
> 
> Thanks.
> Ross
> 
> P.S. I would appreciate cc's, since mail problems are preventing
> me from seeing list mail.

Hmmm ... I'm not sure what is the problem with what you describe.
Code:

  set.seed(54321)
  X  <- matrix(rnorm(50),ncol=2)
  Y  <- 1*X[,1] + 2*X[,2] + 0.25*rnorm(25)
  LM <- lm(Y ~ X-1)

  summary(LM)
  # Call:
  # lm(formula = Y ~ X - 1)
  # Residuals:
  #      Min       1Q   Median       3Q      Max
  # -0.39942 -0.13143 -0.02249  0.11662  0.61661
  # Coefficients:
  #      Estimate Std. Error t value Pr(>|t|)
  # X1   0.97707    0.04159   23.49   <2e-16 ***
  # X2   2.09152    0.06714   31.15   <2e-16 ***
  # ---
  # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  # Residual standard error: 0.2658 on 23 degrees of freedom
  # Multiple R-squared: 0.9863, Adjusted R-squared: 0.9851 
  # F-statistic: 826.6 on 2 and 23 DF,  p-value: < 2.2e-16 

  model.frame(LM)
  #             Y          X.1          X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358

  str(model.frame(LM))
  #  $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  #  $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]

  model.frame(Y ~ X-1)
  #             Y          X.1          X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358
  ## (Identical to above)

  str(model.frame(Y ~ X-1))
  # $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  # $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]
  ## (Identical to above)

Maybe the clue (admittedly somewhat obtuse) can be found in ?lm:

  lm(formula, data, subset, weights, na.action,
 method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
 singular.ok = TRUE, contrasts = NULL, offset, ...)
  [...]

  data: an optional data frame, list or environment (or object
coercible by 'as.data.frame' to a data frame) containing the
variables in the model.  If not found in 'data', the variables
are taken from 'environment(formula)', typically the
environment from which 'lm' is called.

So, in the example the variables are taken from X, "coercible
by 'as.data.frame' ... taken from 'environment(formula)'".

Hence (I guess) X is found in the environment and is coerced
into a dataframe with 2 columns, and X.1, X.2 are taken from there.

R Gurus: Please comment! (I'm only guessing by plausibility).
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 18-Mar-10   Time: 00:57:20
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Possible bug in fisher.test() (PR#14196)

2010-01-27 Thread Ted Harding
On 27-Jan-10 17:30:10, nhor...@smith.edu wrote:
># is there a bug in the calculation of the odds ratio in fisher.test?
># Nicholas Horton, nhor...@smith.edu Fri Jan 22 08:29:07 EST 2010
> 
> x1 = c(rep(0, 244), rep(1, 209))
> x2 = c(rep(0, 177), rep(1, 67), rep(0, 169), rep(1, 40))
> 
> or1 = sum(x1==1&x2==1)*sum(x1==0&x2==0)/
>  (sum(x1==1&x2==0)*sum(x1==0&x2==1))
> 
> library(epitools)
> or2 = oddsratio.wald(x1, x2)$measure[2,1]
> 
> or3 = fisher.test(x1, x2)$estimate
> 
># or1=or2 = 0.625276, but or3=0.6259267!
> 
> I'm running R 2.10.1 under Mac OS X 10.6.2.
> Nick

Not so. Look closely at ?fisher.test:

Value:
[...]
estimate: an estimate of the odds ratio.  Note that the
  _conditional_ Maximum Likelihood Estimate (MLE)
  rather than the unconditional MLE (the sample
  odds ratio) is used. Only present in the 2 by 2 case.

Your or1 (and presumably the epitools value also) is the sample OR.

The conditional MLE is the value of rho (the OR) that maximises
the probability of the table *conditional* on the margins.

In this case it differs slightly from the sample OR (by 0.1%).
For smaller tables it will tend to differ even more, e.g.

  M1 <- matrix(c(4,7,17,18),nrow=2)
  M1
  #      [,1] [,2]
  # [1,]    4   17
  # [2,]    7   18

  (4*18)/(17*7)
  # [1] 0.605042

  fisher.test(M1)$estimate
  # odds ratio 
  # 0.6116235  ## (1.1% larger than sample OR)

  M2 <- matrix(c(1,2,4,5),nrow=2)
  M2
  #      [,1] [,2]
  # [1,]    1    4
  # [2,]    2    5

  (1*5)/(4*2)
  # [1] 0.625

  fisher.test(M2)$estimate
  # odds ratio 
  # 0.649423  ## (3.9% larger than sample OR)

The probability of a table matrix(c(a,b,c,d),nrow=2) given
the marginals (a+b), (a+c), (b+d) and hence also (c+d) is
a function of the odds ratio only. Again see ?fisher.test:

  "given all marginal totals fixed, the first element of
   the contingency table has a non-central hypergeometric
   distribution with non-centrality parameter given by
   the odds ratio (Fisher, 1935)."

The value of the odds ratio which maximises this (for given
observed 'a') is not the sample OR.
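That maximisation is easy to sketch directly (this is NOT
fisher.test's internal code; 'cmle' is a name invented here):

  cmle <- function(tab){
    x  <- tab[1,1]
    r1 <- sum(tab[1,])   # first row total
    r2 <- sum(tab[2,])   # second row total
    k  <- sum(tab[,1])   # first column total
    supp <- max(0, k - r2):min(k, r1)
    # conditional log-likelihood of the odds ratio rho
    loglik <- function(rho)
      x*log(rho) - log(sum(choose(r1,supp)*choose(r2,k-supp)*rho^supp))
    optimize(loglik, c(1e-8, 1e3), maximum=TRUE)$maximum
  }
  cmle(M1)
  # approximately 0.6116, agreeing with fisher.test(M1)$estimate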

Hoping this helps,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 27-Jan-10   Time: 18:14:57
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Benefit of treating NA and NaN differently for numerics

2009-12-31 Thread Ted Harding
On 31-Dec-09 20:43:43, Saptarshi Guha wrote:
> Hello,
> I notice in main/arithmetic.c, that NA and NaN are encoded
> differently(since every numeric NA comes  from R_NaReal which is
> defined via ValueOfNA)
> What is the benefit of treating these two differently? Why can't NA
> be a synonym for NaN?
> 
> Thank you
> Saptarshi
> (R-2.9)

Because they are used to represent different things. Others will be
able to give you a much more comprehensive account than I can of
their uses in R, but essentially:

NaN represents a result which is not valid (i.e. "Not a Number")
in the domain of quantities being evaluated. For example, R does
its arithmetic by default in the domain of "double", i.e. the
machine representation of real numbers. In this domain, sqrt(-1)
does not exist -- it is not a number in the domain of real numbers.
Hence:

  sqrt(-1)
  # [1] NaN
  # Warning message:
  # In sqrt(-1) : NaNs produced

In order to obtain a result which does exist, you need to switch
domain to complex numbers:

  sqrt(as.complex(-1))
  # [1] 0+1i

NA, on the other hand, represents a value (in whatever domain:
double, logical, character, ...) which is not known, which is why
it is typically used to represent missing data. It would be a valid
entity in the current domain if its value were known, but the value
is not known. Hence the result of any expression involving NA
quantities is NA, since the value if the expression would depend on
the unkown elements, and hence the value of the expression is unknown.

This distinction is important and useful, so it should not be done
away with by merging NaN and NA!
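A minimal illustration of the distinction:

  0/0
  # [1] NaN    # an invalid result: "not a number" in the reals
  NA + 1
  # [1] NA     # unknown in, unknown out
  is.na(NaN)   # NaN also counts as "missing" ...
  # [1] TRUE
  is.nan(NA)   # ... but NA is not NaN
  # [1] FALSE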

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 31-Dec-09   Time: 21:05:06
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] a little bug for the function 'sprintf' (PR#14161)

2009-12-21 Thread Ted Harding
On 21-Dec-09 17:36:12, Peter Dalgaard wrote:
> baoli...@gmail.com wrote:
>> Dear R-ers,
>> I am a gratuate student from South China University of Technology.
>> I fond the function 'sprintf' in R2.10.1 have a little bug(?):
>> 
>> When you type in the example codes:
>> 
>>> sprintf("%s is %f feet tall\n", "Sven", 7.1)
>> 
>> and R returns:
>> 
>> [1] "Sven is 7.10 feet tall\n"
>> 
>> this is very different from the 'sprintf' function in C/C++,
>> for in C/C++, the format string "\n" usually represents a new line,
>> but here, just the plain text "\n"!
> 
> No, this is exactly the same as in C/C++. If you compare the result
> of sprintf to "Sven is 7.10 feet tall\n" with strcmp() in C,
> they will compare equal.
> 
>  > s <- sprintf("%s is %f feet tall\n", "Sven", 7.1)
>  > s
> [1] "Sven is 7.10 feet tall\n"
>  > nchar(s)
> [1] 27
>  > substr(s,27,27)
> [1] "\n"
> 
> The thing that is confusing you is that strings are DISPLAYED
> using the same escape-character mechanisms as used for input.
> Compare
> 
>  > cat(s)
> Sven is 7.10 feet tall
>  >
> 
>> 
>> Is it a bug, or a deliberate design?
> 
> Design, not bug (and please don't file as bug when you are in doubt.)

And another confusion is that the C/C++ function sprintf() indeed
creates a string AND ASSIGNS IT TO A NAMED VARIABLE, according to
the syntax

  int sprintf(char *str, const char *format, ...);

as in

  char X[64];   /* a buffer to receive the formatted string */
  sprintf(X, "%s is %f feet tall\n", "Sven", 7.1);

as a result of which the string X will have the value
  "Sven is 7.10 feet tall\n"

R's sprintf does not provide for the parameter "Char *str", here X,
and so RETURNS the string as the value of the function.

This is NOT TO BE CONFUSED with the behaviour of the C/C++ functions
printf() and fprintf(), both of which create the string and then
send it to either stdout or to a file:

  int printf(const char *format, ...);
  int fprintf(FILE *stream, const char *format, ...);

Therefore, if you programmed

  printf("%s is %f feet tall\n", "Sven", 7.1) ;

you would see on-screen (stdout) the string 

  "Sven is 7.10 feet tall"

(followed by a line-break due to the "\n"), while

  FILE *mystream = fopen("myoutput.txt", "a");
  fprintf(mystream, "%s is %f feet tall\n", "Sven", 7.1) ;

would append

  "Sven is 7.10 feet tall"

(followed by a line-break) to myoutput.txt

Hoping this helps!
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 21-Dec-09   Time: 18:14:57
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Binning of integers with hist() function odd results (PR#14047)

2009-11-07 Thread ted.harding
On 06-Nov-09 23:30:12, g...@fnal.gov wrote:
> Full_Name: Gerald Guglielmo
> Version: 2.8.1 (2008-12-22)
> OS: OSX Leopard
> Submission from: (NULL) (131.225.103.35)
> 
> When I attempt to use the hist() function to bin integers the behavior
> seems
> very odd as the bin boundary seems inconsistent across the various
> bins. For
> some bins the upper boundary includes the next integer value, while in
> others it
> does not. If I add 0.1 to every value, then the hist() binning behavior
> is what
> I would normally expect. 
> 
>> h1<-hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
>> h1$mids
> [1] 1.5 2.5 3.5 4.5
>> h1$counts
> [1] 3 3 4 5
>> h2<-hist(c(1.1,2.1,2.1,3.1,3.1,3.1,4.1,4.1,4.1,4.1,5.1,5.1,5.1,5.1,5.1)
>> )
>> h2$mids
> [1] 1.5 2.5 3.5 4.5 5.5
>> h2$counts
> [1] 1 2 3 4 5
> 
> Naively I would have expected the same distribution of counts in the
> two cases, but clearly that is not happening. This is a simple example
> to illustrate the behavior, originally I noticed this while binning a
> large data sample where I had set the breaks=c(0,24,1).

This is the correct intended behaviour. By default, values which are
exactly on the boundary between two bins are counted in the bin which
is just below the boundary value. Except that the bottom-most break
will count values on it into the bin just above it.

Hence 1,2,2 all go into the [1,2] bin; 3,3,3 into (2,3];
4,4,4,4 into (3,4]; and 5,5,5,5,5 into (4,5]. Hence the counts 3,3,4,5.

Since you did not set breaks in
  h1<-hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)),
they were set using the default method, and you can see what they are
with

  h1$breaks
  [1] 1 2 3 4 5

When you add 0.1 to each value, you push the values on the boundaries
up into the next bin. Now each value is inside its bin, and not on
any boundary. Hence 1.1 is in (1,2]; 2.1,2.1 in (2,3];
3.1,3.1,3.1 in (3,4]; 4.1,4.1,4.1,4.1 in (4,5]; and
5.1,5.1,5.1,5.1,5.1 in (5,6], giving counts 1,2,3,4,5 as you observe.

The default behaviour described above is defined by the default options

  include.lowest = TRUE, right = TRUE

where:

include.lowest: logical; if 'TRUE', an 'x[i]' equal to the 'breaks'
  value will be included in the first (or last, for 'right =
  FALSE') bar.  This will be ignored (with a warning) unless
  'breaks' is a vector.

   right: logical; if 'TRUE', the histograms cells are right-closed
  (left open) intervals.

See '?hist'. You can change this behaviour by changing these options.
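For example (a minimal sketch; with these data the default breaks
are again 1,2,3,4,5, so only the treatment of the boundary values
changes):

  h3 <- hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5), right=FALSE)
  h3$counts   # boundary values now fall into the bin on their right
  # [1] 1 2 3 9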

Hoping this helps,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 07-Nov-09   Time: 13:57:07
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] basename returns "." not in filename (PR#13958)

2009-09-18 Thread Ted Harding
On 18-Sep-09 19:08:15, Jens Oehlschlägel wrote:
> Mmh,
>> > Point is, I gather, that trailing slashes are removed, e.g.,
>> >
>> > viggo:~/>basename foo/
>> > foo
>> >
>> > So, not a bug.
> 
> This unfortunately means that we cannot distinguish between 
> 1) a path with a filename
> 2) a path without a filename
> 
> For example in the next version of the ff-package we allow a user to
> specify a 'pattern' for all files of a ff dataframe which is path
> together with a fileprefix, the above means we cannot specify an empty
> prefix "" for the current working directory, because 
> 
>> dirname("./.")
> [1] "."
>> basename("./.")
> [1] "."
>> dirname("./")
> [1] "."
>> basename("./")
> [1] "."
> 
> Jens Oehlschlägel

I am getting confused by this discussion. At least on Unixoid systems,
and I believe it holds for Windows systems too, "." stands for the
current directory ("working directory").

Moreover, "./" means exactly the same as ".": If you list the files
in "./" you will get exactly the same as if you list the files in ".".

Further, "/." means the same as "/"
and the same as "", so (on the same basis) "./." also
means exactly the same as ".".

Therefore the second "." in "./." is not a filename.

What the above examples of dirname and basename usage are returning
is simply a specific representation of the current working directory.
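This is easy to confirm from within R (a minimal illustration):

  identical(list.files("."), list.files("./"))
  # [1] TRUE
  identical(list.files("."), list.files("./."))
  # [1] TRUE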

Forgive me if I have not seen the point, but what I think I have seen
boils down to the interpretation I have given above.

Best wishes,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 18-Sep-09   Time: 22:35:34
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problem in matrix definition?

2009-08-31 Thread Ted Harding
Does the following help to clarify things?
From your original two different definitions of the matrix 'a':

  mp <- 10
  np <- 5
  a <- matrix(c(1:mp*np),mp,np)
  a
  #   [,1] [,2] [,3] [,4] [,5]
  #  [1,]    5    5    5    5    5
  #  [2,]   10   10   10   10   10
  #  [3,]   15   15   15   15   15
  #  [4,]   20   20   20   20   20
  #  [5,]   25   25   25   25   25
  #  [6,]   30   30   30   30   30
  #  [7,]   35   35   35   35   35
  #  [8,]   40   40   40   40   40
  #  [9,]   45   45   45   45   45
  # [10,]   50   50   50   50   50

  a <- matrix(c(1:50),mp,np)
  a
  #   [,1] [,2] [,3] [,4] [,5]
  #  [1,]    1   11   21   31   41
  #  [2,]    2   12   22   32   42
  #  [3,]    3   13   23   33   43
  #  [4,]    4   14   24   34   44
  #  [5,]    5   15   25   35   45
  #  [6,]    6   16   26   36   46
  #  [7,]    7   17   27   37   47
  #  [8,]    8   18   28   38   48
  #  [9,]    9   19   29   39   49
  # [10,]   10   20   30   40   50

Check what is said about precedence in '?Syntax'. The effect of
"c(1:mp*np)" is (1:mp)*np, i.e. first create (1:mp), and then
multiuply each element by np. So, with mp=10, np=5, you first get
c(1,2,3,4,5,6,7,8,9,10) from (1:10), and then
c(5,10,15,20,25,30,35,40,45,50) after multiplying this by 5.

This vector with 10 elements is then recycled 5 times over
when you ask for "a <- matrix(c(1:mp*np),mp,np)", i.e. the
equivalent of

  a <- matrix(c(5,10,15,20,25,30,35,40,45,50),10,5)

On the other hand, a <- matrix(c(1:50),10,5) does what you expected!

Hoping this helps,
Ted.

On 31-Aug-09 08:21:34, Uwe Ligges wrote:
> Try to learn how to debug.
> The following copied from my R session might give you some hints:
> 
>  > options(error=recover)
>  > geninv(a)
> Error in L[k:n, 1:(r - 1)] %*% (t(L[k, 1:(r - 1)])) :
>non-conformable arguments
> 
> Enter a frame number, or 0 to exit
> 
> 1: geninv(a)
> 
> Selection: 1
> Called from: eval(expr, envir, enclos)
> Browse[1]> (t(L[k, 1:(r - 1)]))
>   [,1]   [,2]
> [1,] 75.68261 29.277
> Browse[1]> L[k:n, 1:(r - 1)]
>[,1][,2]
> [1,]  75.68261 29.2770
> [2,] 103.71320 43.9155
> [3,] 131.74380 58.5540
> 
> 
> 
> Best,
> Uwe Ligges
> 
> 
> 
> 
> Fabio Mathias Corrêa wrote:
>> I'm implementing a function to compute the moore-penrose inverse,
>> using a code from the article: Fast Computation of Moore-Penrose
>> Inverse Matrices. Neural Information Processing - Letters and Reviews.
>> Vol.8, No.2, August 2005
>> 
>> However, the R presents an error message when I use the geninv.
>> 
>> The odd thing is that the error occurs for some arrays, however they
>> have the same size. And the R indicates the lack of compatibility
>> between the matrix!
>> 
>> Below is an example:
>>  
>> Creating the function geninv
>> 
>> geninv <- function(x)
>> {
>>  m <- dim(x)[1]
>>  n <- dim(x)[2]
>>  tr <- 0
>>  if(m < n) {
>>a <- tcrossprod(x)
>>n <- m
>>tr <- 1
>>  }
>>  else  a <- crossprod(x)
>>  dA <- diag(a)
>>  tol=min(dA[dA>0])*1e-9
>>  L = a*0
>>  r = 0
>>  for(k in 1:n){
>>   r = r+1
>>   L[k:n,r] = a[k:n,k]-(L[k:n,1:(r-1)]%*%(t(L[k,1:(r-1)])))
>>   if(L[k,r] > tol){
>> L[k,r] <- sqrt(L[k,r])
>> if (k < n) L[(k+1):n,r] <- L[(k+1):n,r]/L[k,r]
>>   }
>>   else r <- r-1
>>  }
>>  L <- L[,1:r]
>>  M <- solve(crossprod(L))
>>  if (tr == 1) Y <- t(x)%*%L%*%M%*%M%*%t(L)
>>  else Y <- L%*%M%*%M%*%t(L)%*%t(x)
>>  return(Y)
>>  }
>>  
>> # Perfect result! This result is identical of the function ginv!
>> 
>> library(MASS)
>> mp <- 10
>> np <- 5
>> a <- matrix(c(1:mp*np),mp,np)
>> dim(a) # 10,5
>> geninv(a)
>> ginv(a)
>> 
>> # Problem
>> a <- matrix(c(1:50),mp,np) # The difference is the vector (1:50)
>> dim(a) # 10,5
>> 
>> geninv(a)
>> Error in L[k:n, 1:(r - 1)] %*% (t(L[k, 1:(r - 1)])) : 
>>   arguments are not compatible
>> 
>> 
>> The problem this in matrix definition?
>> 
>> Thanks very much!
>> 
>>Fábio Mathias Corrêa
>> Estatística e Experimentação Agropecuária/UFLA
>> 
>> 
>> 
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-09   Time: 10:00:04
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Ted Harding
On 16-Aug-09 14:06:18, Liviu Andronic wrote:
> Hello,
> On 8/16/09, Ted Harding  wrote:
>> I don't know about *compelling* reasons! But (as a general rule)
>>  if the Alternative Hypothesis is stated, then the Null Hypothesis
>>  is simply its negation. So, in your example, you can infer
>>
>>   H0: true tau equals 0
>>   Ha: true tau is not equal to 0.
>>
> Oh, I had a slightly different H0 in mind. In the given example,
> cor.test(..., met="kendall") would test "H0: x and y are independent",
> but cor.test(..., met="pearson") would test: "H0: x and y are not
> correlated (or `are linearly independent')" .

Ah, now you are playing with fire! What the Pearson, Kendall and
Spearman coefficients in cor.test measure is *association*. OK, if
the results clearly indicate association, then the variables are
not independent. But it is possible to have two variables x, y
which are definitely not independent (indeed one is a function of
the other) which yield zero association by any of these measures.

Example:
  x <-  (-10:10) ; y <- x^2 - mean(x^2)
  cor.test(x,y,method="pearson")
  #   Pearson's product-moment correlation
  # t = 0, df = 19, p-value = 1
  # alternative hypothesis: true correlation is not equal to 0 
  # sample estimates: cor 0
  cor.test(x,y,method="kendall")
  #   Kendall's rank correlation tau
  # z = 0, p-value = 1
  # alternative hypothesis: true tau is not equal to 0 
  # sample estimates: tau 0
  # cor.test(x,y,method="spearman")
  #  Spearman's rank correlation rho
  # S = 1540, p-value = 1
  # alternative hypothesis: true rho is not equal to 0 
  # sample estimates: rho 0

If you wanted, for instance, that the "method=kendall" should
announce that it is testing "H0: x and y are independent" then
it would seriously mislead the reader!

> To take a different example, a test of normality.
>> shapiro.test(mtcars$wt)
> 
>   Shapiro-Wilk normality test
> 
> data:  mtcars$wt
> W = 0.9433, p-value = 0.09265
> 
> Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
> least to beginners, these things are not always perfectly clear (even
> after reading the documentation), and when interpreting the results it
> can prove useful to have on-screen information about the null.
> 
> Thank you for answering
> Liviu

This is possibly a more discussable point, in that even if you know
what the Shapiro-Wilk statistic is, it is not obvious what it is
sensitive to, and hence what it might be testing for. But I doubt
that someone would be led to try the Shapiro-Wilk test in the
first place unless they were aware that it was a test for normality,
and indeed this is announced in the first line of the response.
The alternative, therefore, is "non-normality".

As to the contrast between absence of an "Ha" statement for the
Shapiro-Wilk, and its presence in cor.test(), this comes back to
the point I made earlier: cor.test() offers you three alternatives
to choose from: "two-sided" (default), "greater", "less". This
distinction can be important, and when cor.test() reports "Ha" it
tells you which one was used.

On the other hand, as far as Shapiro-Wilk is concerned there is
no choice of alternatives (nor of anything else except the data x).
So there is nothing to tell you! And, further, departure from
normality has so many "dimensions" that alternatives like "two
sided", "greater" or "less" would make no sense. One can think of
tests targeted at specific kinds of alternative such as "Distribution
is excessively skew" or "distribution has excessive kurtosis" or
"distribution is bimodal" or "distribution is multimodal", and so on.
But any of these can be detected by Shapiro-Wilk, so it is not
targeted at any specific alternative.
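
For instance (an illustration with made-up data, not from the
original thread): both a skew and a bimodal alternative are picked
up by the same statistic:

  set.seed(1)
  shapiro.test(rexp(100))                     # skew: tiny p-value
  shapiro.test(c(rnorm(50,-3), rnorm(50,3)))  # bimodal: tiny p-value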

Best wishes,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-09   Time: 16:26:48
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Ted Harding
On 16-Aug-09 10:38:40, Liviu Andronic wrote:
> Dear R developers,
> Currently many (all?) test functions in R describe the alternative
> hypothesis, but not the the null hypothesis being tested. For example,
> cor.test:
>> require(boot)
>> data(mtcars)
>> with(mtcars, cor.test(mpg, wt, met="kendall"))
> 
>   Kendall's rank correlation tau
> 
> data:  mpg and wt
> z = -5.7981, p-value = 6.706e-09
> alternative hypothesis: true tau is not equal to 0
> sample estimates:
>  tau
> -0.72783
> 
> Warning message:
> In cor.test.default(mpg, wt, met = "kendall") :
>   Cannot compute exact p-value with ties
> 
> 
> In this example,
> H0: (not printed)
> Ha: true tau is not equal to 0
> 
> This should be fine for the advanced users and expert statisticians,
> but not for beginners. The help page will also often not explicitely
> state the null hypothesis. Personally, I often find myself in front of
> an htest object guessing what the null should have reasonably sounded
> like.
> 
> Are there compelling reasons for not printing out the null being
> tested, along with the rest of the results? Thank you
> Liviu

I don't know about *compelling* reasons! But (as a general rule)
if the Alternative Hypothesis is stated, then the Null Hypothesis
is simply its negation. So, in your example, you can infer

  H0: true tau equals 0
  Ha: true tau is not equal to 0.

I don't think one needs to be an advanced user or expert statistician
to see this -- it is part of the basic understanding of hypothesis
testing.

Some people might regard the "H0" statement as simply redundant!

The "Ha" statement is, however, essential, since different alterntatives
may ne adopted depending on the application such as

  Ha: true tau is greater than 0
  (implicit: true tau <= 0)

or
  Ha: true tau is less than 0
  (implicit: true tau >= 0)
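
For example (a sketch with made-up data), the chosen alternative
is echoed in the "Ha" line of the output:

  set.seed(42)
  x <- rnorm(20); y <- x + rnorm(20)
  cor.test(x, y, method="kendall", alternative="greater")
  # ...
  # alternative hypothesis: true tau is greater than 0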

Hoping this helps,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-09   Time: 11:55:15
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] identical(0, -0)

2009-08-07 Thread Ted Harding
On 07-Aug-09 11:07:08, Duncan Murdoch wrote:
> Martin Maechler wrote:
>>>>>>> William Dunlap 
>>>>>>> on Thu, 6 Aug 2009 15:06:08 -0700 writes:
>> >> -Original Message- From:
>> >> r-help-boun...@r-project.org
>> >> [mailto:r-help-boun...@r-project.org] On Behalf Of
>> >> Giovanni Petris Sent: Thursday, August 06, 2009 3:00 PM
>> >> To: milton.ru...@gmail.com Cc: r-h...@r-project.org;
>> >> daniel.gerl...@geodecapital.com Subject: Re: [R] Why is 0
>> >> not an integer?
>> >> 
>> >> 
>> >> I ran an instant experiment...
>> >> 
>> >> > typeof(0)
>> >> [1] "double"
>> >> > typeof(-0)
>> >> [1] "double"
>> >> > identical(0, -0)
>> >> [1] TRUE
>> >> 
>> >> Best, Giovanni
>>
>> > But 0.0 and -0.0 have different reciprocals
>>
>> >> 1.0/0.0
>> >[1] Inf
>> >> 1.0/-0.0
>> >[1] -Inf
>>
>> > Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap
>> > tibco.com
>>
>> yes.  {finally something interesting in this boring thread !}
>> ---> diverting to R-devel
>>
>> In April, I've had a private e-mail communication with John
>> Chambers [father of S, notably S4, which also brought identical()]
>> and Bill about the topic,
>> where I had started suggesting that  R  should be changed such
>> that
>> identical(-0. , +0.)
>> would return FALSE.
>> Bill did mention that it does so for (newish versions of) S+
>> and that he'd prefer that, too,
>> and John said
>>
>>  >> I agree on having a preference for a bitwise comparison for
>>  >> identical()---that's what the name means after all.  But since
>>  >> someone implemented the numerical case as the C == it's probably
>>  >> going to be more hassle than it's worth to change it.  But we
>>  >> should make the implementation clear in the documentation.
>>
>> so in principle, we all agreed that R's identical() should be
>> changed here, namely by using something like  memcmp() instead
>> of simple '==' ,  however we haven't bothered to actually 
>> *implement* this change.
>>
>> I am currently testing a patch  which would lead to
>> identical(0, -0)  return FALSE.
>>   
> I don't think that would be a good idea.  Other expressions besides
> "-0" 
> calculate the zero with the negative sign bit, e.g. the following
> sequence:
> 
> pos <- 1
> neg <- -1
> zero <- 0
> y <- zero*pos
> z <- zero*neg
> identical(y, z)
> 
> I think most R users would expect the last expression there to be
> TRUE based on the previous two lines, given that pos and neg both
> have finite values. In a simple case like this y == z would be a
> better test to use, but if those were components of a larger
> structure, identical() is all we've got, and people would waste a
> lot of time tracking down why structures differing only in the
> sign of zero were not identical, even though every element tested
> equal.
> 
> Duncan Murdoch
>> Martin Maechler, ETH Zurich

My own view of this is that there may in certain circumstances be an
interest in distinguishing between 0 and (-0), yet normally most
users will simply want to compare the numerical values.

Therefore I am in favour of revising identical() so that it can so
distinguish; but also of taking the opportunity to give it a parameter,
say

  identical(x,y,sign.bit=FALSE)

so that the default behaviour would be to see 0 and (-0) as identical,
but with sign.bit=TRUE it would see the difference.
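
For numeric values one can already emulate the proposed
sign.bit=TRUE behaviour at user level, exploiting the fact that
1/0 == Inf while 1/-0 == -Inf (a minimal sketch; the function name
is made up, and it is meant only for numeric x and y):

  identical.sign <- function(x, y) {
    identical(x, y) && identical(1/x, 1/y)
  }
  identical.sign(0, -0)  # FALSE
  identical.sign(0,  0)  # TRUE
  identical.sign(1,  1)  # TRUE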

However, I put this forward in ignorance of
a) Any difficulties that this may present in re-coding identical();
b) Any complications that may arise when applying this new form
   to complex objects.

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-09   Time: 14:49:51
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] read.csv

2009-06-14 Thread Ted Harding
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:
> If read.csv's colClasses= argument is NOT used then read.csv accepts
> double quoted numerics:
> 
> 1: > read.csv(stdin())
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
>   A B
> 1 1 1
> 2 2 2
> 
> However, if colClasses is used then it seems that it does not:
> 
>> read.csv(stdin(), colClasses = "numeric")
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings,  :
>   scan() expected 'a real', got '"1"'
> 
> Is this really intended?  I would have expected that a csv file
> in which each field is surrounded with double quotes is acceptable
> in both cases. This may be documented as is yet seems undesirable
> from both a consistency viewpoint and the viewpoint that it should
> be possible to double quote fields in a csv file.

Well, the default for colClasses is NA, for which ?read.csv says:
  [...]
  Possible values are 'NA' (when 'type.convert' is used),
  [...]
and then ?type.convert says:
  This is principally a helper function for 'read.table'. Given a
  character vector, it attempts to convert it to logical, integer,
  numeric or complex, and failing that converts it to factor unless
  'as.is = TRUE'.  The first type that can accept all the non-missing
  values is chosen.

It would seem that type 'logical' won't accept integer (naively one
might expect 1 --> TRUE, but see experiment below), so the first
acceptable type for "1" is integer, and that is what happens.
So it is indeed documented (in the R[ecursive] sense of "documented" :))

However, presumably when colClasses is used then type.convert() is
not called, in which case R sees itself being asked to assign a
character entity to a destination which it has been told shall be
numeric. The default for as.is is
  as.is = !stringsAsFactors
but ?read.csv says that stringsAsFactors "is overridden
bu [sic] 'as.is' and 'colClasses', both of which allow finer
control.", so that wouldn't come to the rescue either.

Experiment:
  X <- logical(10)
  class(X)
  # [1] "logical"
  X[1]<-1
  X
  # [1] 1 0 0 0 0 0 0 0 0 0
  class(X)
  # [1] "numeric"
so R has converted X from class 'logical' to class 'numeric'
on being asked to assign a number to a logical; but in this
case its hands were not tied by colClasses.
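
In the meantime, a possible workaround (a sketch) is to read the
quoted column as character -- to which the quote-handling does
apply -- and convert explicitly afterwards:

  d <- read.csv(textConnection('A,B\n"1",1\n"2",2'),
                colClasses="character")
  d$A <- as.numeric(d$A)
  str(d)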

Or am I missing something?!!

Ted.




E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jun-09   Time: 21:21:22
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Closed-source non-free ParallelR ?

2009-04-26 Thread Ted Harding
On 24-Apr-09 16:53:04, Stavros Macrakis wrote:
> On Thu, Apr 23, 2009 at 8:54 PM, Ted Harding
>  wrote:
> [...]
>> ...inspires someone to incorporate the same language extension
>> into a GPL'd FORTRAN interpreter/compiler. I think I could then
>> be vulnerable, or they could, on the grounds that I/they had pinched
>> the idea from the commercial product.
> 
> Unless you have a confidentiality agreement of some kind, or the idea
> is covered by a patent, you can pinch any ideas you like from other
> products.  Copyright law does not cover ideas.

Well, I'm not so sure about that ... back in 2002/2003, National
Instrument sued the MathWorks (MatLab proprietors) on the grounds
that the MathWorks Simulink graphical development tool infringed
on National Instruments' patented rights in such an idea. NI's
implementation is embodied in their LabVIEW tool.

In both cases, the tool consists of 'data flow diagrams' drawn on
screen under the user's mouse control, using icons, with the ability
to associate data structures and code with the nodes and the links.

On my reading of it, it was the *idea* of using such a graphical
interface itself which National Instruments claimed to have patented,
namely

  The technology of the patents in suit concerns the creation
  of model systems (generally known as "data flow diagrams")
  through building diagrams on a computer screen by pointing
  and clicking with a mouse, rather than writing traditional
  lines of code. ...

The patent claims (long, and hoghly detailed) can be read at

  http://www.freepatentsonline.com/4901221.html
  http://www.freepatentsonline.com/4914568.html
  http://www.freepatentsonline.com/5301336.html

The Mathworks lost, and it went to appeal. Mathworks also lost the
appeal. The Appeal Court's opinion can be read at

  http://cafc.bna.com/03-1540.pdf

And the best of luck ... As I said before, I am not a lawyer and
tend to get bewildered by their use of language; but others may end
up more sure about this topic!

Ted.

>> ...Or maybe the GPL doesn't inhibit you
>> from using *ideas* and *features* of GPL software, provided you
>> implement them yourself and in your own way?
> 
> The GPL does not and cannot restrict reimplementations of ideas and
> features.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 26-Apr-09   Time: 12:24:16
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Closed-source non-free ParallelR ?

2009-04-23 Thread Ted Harding
 the prospective third party developer of an application  
>>> that is
>>> to use R to consult with lawyers to determine what *THEIR*  
>>> obligations are
>>> if they should elect to proceed.
>>
>> Yes, this is true.  But it is also true that if (for example) the R
>> Foundation says officially that it interprets GPL to allow
>> distributing proprietary packages along with R, then that is the
>> interpretation that matters, since the R Foundation (not the FSF) is
>> the copyright holder.
>>
>>> At this level, it is really pretty simple and a lot of these things  
>>> are
>>> covered in the GPL FAQs, including the reporting of violations.
>>
>> The GPL FAQs are the FSF's interpretation.  The R Foundation is not
>> obliged to have the same interpretation, and of course the FSF cannot
>> enforce licenses given by the R Foundation.
> 
> Underlying all of your comments seems to be a presumption that the R  
> Foundation can disentangle themselves from the FSF vis-a-vis the GPL.
> 
> Keep in mind that it is the FSF that is the copyright holder of the
> GPL.
> 
> The R Foundation may be the copyright holder to R, but they are  
> distributing it under a license which they did not write.
> 
> Thus, I cannot envision any reasonable circumstances under which the R 
> Foundation would place themselves in a position of legal risk in  
> deviating from the interpretations of the GPL by the FSF. It would be  
> insane legally to do so.
> 
> The key issue is the lack of case law relative to the GPL and that  
> leaves room for interpretation. One MUST therefore give significant  
> weight to the interpretations of the FSF as it will likely be the FSF  
> that will be involved in any legal disputes over the GPL and its  
> application. You would want them on your side, not be fighting them.
> 
> A parallel here is why most large U.S. public corporations legally  
> incorporate in the state of Delaware, even though they may not have  
> any material physical presence in that state. It is because the  
> overwhelming majority of corporate case law in the U.S. has been  
> decided under the laws of Delaware and the interpretations of said  
> laws. If I were to start a company (which I have done in the past) and 
> feared that I should find myself facing litigation at some future  
> date, I would want that huge database of case law behind me. A small  
> company (such as I had) may be less concerned about this and be  
> comfortable with the laws of their own state, which I was. But if I  
> were to be looking to build a big company with investors, etc. and  
> perhaps look to go public at a future date, you bet I would look to  
> incorporate in Delaware. It would be the right fiduciary decision to  
> make in the interest of all parties.
> 
> Unfortunately, we have no such archive of case law yet of the GPL.  
> Thus at least from a legally enforceable perspective, all is grey and  
> the FSF has to be the presumptive leader here.
> 
> HTH,
> 
> Marc Schwartz
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 24-Apr-09   Time: 01:54:11
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Gamma funtion(s) bug

2009-03-30 Thread Ted Harding
On 30-Mar-09 20:37:51, Duncan Murdoch wrote:
> On 3/30/2009 2:55 PM, (Ted Harding) wrote:
>> On 30-Mar-09 18:40:03, Kjetil Halvorsen wrote:
>>> With R 2.8.1 on ubuntu I get:
>>>> gamma(-1)
>>> [1] NaN
>>> Warning message:
>>> In gamma(-1) : NaNs produced
>>>> lgamma(-1)
>>> [1] Inf
>>> Warning message:
>>> value out of range in 'lgamma'
>>> 
>>> Isn't the first one right, and the second one (lgamma)
>>> should also be NaN?
>>> Kjetil
>> 
>> That is surely correct! Since lim[x->(-1)+] gamma(x) = -Inf,
>> while lim[x->(-1)-] gamma(x) = +Inf, at gamma(-1) one cannot
>> choose between +Inf and -Inf, so surely it is NaN.
> 
> But lgamma(x) is log(abs(gamma(x))), so it looks okay to me.
> 
> Duncan Murdoch

Oops, yes! That's what comes of talking off the top of my head
(I don't think I've ever had occasion to evaluate lgamma(x)
for negative x, so never consciously checked in ?lgamma).
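
For the record, a quick check of the definition given in ?lgamma,
at a negative non-integer x:

  x <- -2.5
  lgamma(x)           # -0.0562437
  log(abs(gamma(x)))  # the same value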

Thanks, Duncan!
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 30-Mar-09   Time: 22:28:52
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Gamma funtion(s) bug

2009-03-30 Thread Ted Harding
On 30-Mar-09 18:40:03, Kjetil Halvorsen wrote:
> With R 2.8.1 on ubuntu I get:
>> gamma(-1)
> [1] NaN
> Warning message:
> In gamma(-1) : NaNs produced
>> lgamma(-1)
> [1] Inf
> Warning message:
> value out of range in 'lgamma'
> 
> Isn't the first one right, and the second one (lgamma)
> should also be NaN?
> Kjetil

That is surely correct! Since lim[x->(-1)+] gamma(x) = -Inf,
while lim[x->(-1)-] gamma(x) = +Inf, at gamma(-1) one cannot
choose between +Inf and -Inf, so surely it is NaN.

Ted.

--------
E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 30-Mar-09   Time: 19:55:33
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error in help file for quantile()

2009-03-29 Thread Ted Harding
On 29-Mar-09 20:43:33, Rob Hyndman wrote:
> For some reason, the help file on quantile() says "Missing values are
> ignored" in the description of the x argument. Yet this is only true
> if na.rm=TRUE. I suggest the help file is amended to remove the words
> "Missing values are ignored".
> Rob

True enough -- in that if, as in the default, na.rm == FALSE,
then applying quantile() to a vector with NAs yields the error
message:

quantile(X1)
# Error in quantile.default(X1) : 
#   missing values and NaN's not allowed if 'na.rm' is FALSE

So either you have na.rm==TRUE, in which case it doesn't need
saying that "Missing values are ignored" (unless you really
want to spell it out in the form "Missing values are ignored if
called with na.rm=TRUE; otherwise an error message is produced"),
or you have na.rm==FALSE, in which case you get the error message
and know where you stand.
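
A minimal illustration of the na.rm=TRUE case:

  X1 <- c(1, 2, 3, NA)
  quantile(X1, na.rm=TRUE)
  #   0%  25%  50%  75% 100%
  #  1.0  1.5  2.0  2.5  3.0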

Ted.

------------
E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 29-Mar-09   Time: 23:08:50
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Semantics of sequences in R

2009-02-24 Thread Ted Harding
On 24-Feb-09 13:14:36, Berwin A Turlach wrote:
> G'day Dimitris,
> 
> On Tue, 24 Feb 2009 11:19:15 +0100
> Dimitris Rizopoulos  wrote:
> 
>> in my opinion the point of the whole discussion could be summarized
>> by the question, what is a design flaw? This is totally subjective,
>> and it happens almost everywhere in life. [...]
> 
> Beautifully summarised and I completely agree.  Not surprisingly,
> others don't.
> 
> [...]
>> To close I'd like to share with you a Greek saying (maybe also a
>> saying in other parts of the world) that goes, for every rule there
>> is an exception. [...]
> 
> As far as I know, the same saying exists in English.  It definitely
> exists in German.  Actually, in German it is "every rule has its
> exception including this rule".

Or, as my mother used to say, "Moderation in all things!"
To which, as I grew up, I adjoined " ... including moderation."
Ted.

> In German there is one grammar rule
> that does not have an exception.  At least there used to be one; I am
> not really sure whether that rule survived the recent reform of the
> German grammar rules.
> 
> Cheers,
>   Berwin


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 24-Feb-09   Time: 14:44:21
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] "open-ended" plot limits?

2009-02-05 Thread Ted Harding
Thanks, everyone, for all the responses!
Ted.

On 05-Feb-09 20:48:33, Ted Harding wrote:
> Hi Folks,
> Maybe I've missed it already being available somehow,
> but if the following isn't available I'd like to suggest it.
> 
> If you're happy to let plot() choose its own limits,
> then of course plot(x,y) will do it.
> 
> If you know what limits you want, then
>   plot(x,y,xlim=c(x0,x1),ylim=c(y0,y1))
> will do it.
> 
> But sometimes one would like to
> a) make sure that (e.g.) the y-axis has a lower limit (say) 0
> b) let plot() choose the upper limit.
> 
> In that case, something like
> 
>   plot(x,y,ylim=c(0,NA))
> 
> would be a natural way of specifying it. But of course that
> does not work.
> 
> I would like to suggest that this possibility should be available.
> What do people think?
> 
> Best wishes,
> Ted.
> 
> 
> E-Mail: (Ted Harding) 
> Fax-to-email: +44 (0)870 094 0861
> Date: 05-Feb-09   Time: 20:48:30
> -- XFMail --
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 05-Feb-09   Time: 23:08:25
---------- XFMail --


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 06-Feb-09   Time: 00:21:44
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] "open-ended" plot limits?

2009-02-05 Thread Ted Harding
Hi Folks,
Maybe I've missed it already being available somehow,
but if the following isn't available I'd like to suggest it.

If you're happy to let plot() choose its own limits,
then of course plot(x,y) will do it.

If you know what limits you want, then
  plot(x,y,xlim=c(x0,x1),ylim=c(y0,y1))
will do it.

But sometimes one would like to
a) make sure that (e.g.) the y-axis has a lower limit (say) 0
b) let plot() choose the upper limit.

In that case, something like

  plot(x,y,ylim=c(0,NA))

would be a natural way of specifying it. But of course that
does not work.

I would like to suggest that this possibility should be available.
What do people think?
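
In the meantime, the nearest workaround seems to be computing the
upper limit oneself (a sketch, assuming y contains finite values):

  ylim <- range(y, finite=TRUE)
  ylim[1] <- 0
  plot(x, y, ylim=ylim)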

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 05-Feb-09   Time: 20:48:30
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] cat: ./R/Copy: No such file or directory

2008-11-01 Thread Ted Harding
Just guessing here, but it looks as though an attempt has been made
to execute the following command:

  cat ./R/Copy of create.fourier.basis.R

This may have arisen because there is indeed a file named

  "Copy of create.fourier.basis.R"

but the command was issued to the OS without quotes; or (perhaps
less likely, because of the presence of the "./R/") the line

  "Copy of create.fourier.basis.R"

was intended as a comment in the file, but somehow got read as
a substantive part of the code.

Good luck!
Ted.

On 01-Nov-08 15:40:33, Spencer Graves wrote:
> Hello: 
> 
>   What do you recommend I do to get past cryptic error messages
> from 
> "R CMD check", complaining "No such file or directory"?  The package is
> under SVN control, and I reverted to a version that passed "R CMD 
> check", without eliminating the error.  The "00install.out" file is 
> short and bitter: 
> 
> 
> cat: ./R/Copy: No such file or directory
> cat: of: No such file or directory
> cat: create.fourier.basis.R: No such file or directory
> make: *** [Rcode0] Error 1
> 
> -- Making package fda 
>   adding build stamp to DESCRIPTION
>   installing NAMESPACE file and metadata
> make[2]: *** No rule to make target `R/Copy', needed by 
> `D:/spencerg/statmtds/splines/fda/RForge/fda/fda.Rcheck/fda/R/fda'. 
> Stop.
> make[1]: *** [all] Error 2
> make: *** [pkg-fda] Error 2
> *** Installation of fda failed ***
> 
> Removing 'D:/spencerg/statmtds/splines/fda/RForge/fda/fda.Rcheck/fda'
> 
>  
>   Thanks for any suggestions.  I have a "*.tar.gz" file that I hope
> may not have this problem, but I'm not sure. 
> 
>   Thanks,
>   Spencer
> 
> __________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Nov-08   Time: 16:19:56
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] non user-friendly error for chol2inv functions

2008-08-29 Thread Ted Harding
On 29-Aug-08 13:00:01, Martin Maechler wrote:
>>>>>> "cd" == christophe dutang <[EMAIL PROTECTED]>
>>>>>> on Fri, 29 Aug 2008 14:28:42 +0200 writes:
> 
> cd> Yes, I do not cast the first argument as a matrix with
> cd> as.matrix function.
> cd> Maybe we could detail the error message if the first
> cd> argument
> cd> is a numeric?
> 
> cd> error(_("'a' is a numeric and must be coerced to a numeric
> cd> matrix"));
> 
> Merci, Christophe.   Yes, we *could* do that.
> Alternatively, I think I will just make it work in that case,
> since I see that 
>   qr(), chol(), svd(), solve()  all 
> treat a numeric (of length 1) as a  1 x 1 matrix automatically.

I was about to draw attention to this "inconsistency"!
While one is about it, might it not be useful also to do
the converse: Treat a 1x1 matrix as a scalar in appropriate
contexts?

E.g.

  a <- 4
  A <- matrix(4,1,1)
  B <- matrix(c(1,2,3,4),2,2)
  a*B
#      [,1] [,2]
# [1,]    4   12
# [2,]    8   16

  a+B
#      [,1] [,2]
# [1,]    5    7
# [2,]    6    8

  A*B
# Error in A * B : non-conformable arrays
  A+B
# Error in A + B : non-conformable arrays
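
A workaround in current R is drop(), which converts a 1x1 matrix
back to a scalar:

  drop(A)*B
  #      [,1] [,2]
  # [1,]    4   12
  # [2,]    8   16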

Ted. 


> 
> cd> Thanks for your answer
> De rien!
> Martin
> 
> cd> 2008/8/29 Martin Maechler <[EMAIL PROTECTED]>
> 
> >> >>>>> "cd" == christophe dutang <[EMAIL PROTECTED]>
> >> >>>>> on Fri, 29 Aug 2008 12:44:18 +0200 writes:
> >> 
> cd> Hi,
> cd> In function chol2inv with the option LINPACK set to false
> (default),
> >> it
> cd> raises an error when the matrix is 1x1 matrix (i.e. just a
> real)
> >> saying
> >> 
> cd> 'a' must be a numeric matrix
> >> 
> >> It is very helpful, but you have to read and understand it.
> >> I'm pretty sure you did not provide a  1 x 1 matrix.
> >> 
> >> Here's an example showing how things works :
> >> 
> >> > m <- matrix(4,1,1)
> >> > cm <- chol(m)
> >> > cm
> >>      [,1]
> >> [1,]    2
> >> > chol2inv(cm)
> >>      [,1]
> >> [1,] 0.25
> >> >
> >> 
> >> Martin Maechler, ETH Zurich
> >> 
> >> 
> cd> This error is raised by the underlying C function
> (modLa_chol2inv in
> cd> function Lapack.c). Everything is normal, but I wonder if we
> could
> >> have
> cd> another behavior when we pass a 1x1 matrix. I spent time this
> >> morning
> cd> finding where was the error, and it was this "problem".
> >> 
> cd> Thanks in advance
> >> 
> cd> Christophe
> >> 
> cd> [[alternative HTML version deleted]]
> >> 
> cd> __
> cd> R-devel@r-project.org mailing list
> cd> https://stat.ethz.ch/mailman/listinfo/r-devel
> >> 
> 
> cd> [[alternative HTML version deleted]]
> 
> cd> __
> cd> R-devel@r-project.org mailing list
> cd> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Aug-08   Time: 14:16:03
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion: add a warning in the help-file of unique()

2008-04-17 Thread Ted Harding
On 17-Apr-08 10:44:32, Matthieu Stigler wrote:
> Hello
> 
> I'm sorry if this suggestion/correction was already made
> but after a search in devel list I did not find any mention
> of it. I would just suggest to add a warning or an exemple
> for the help-file of the function unique() like
> 
> "Note that unique() compares only identical values. Values
> which are printed equally but in fact are not identical
> will be treated as different."
> 
> 
>  > a<-c(0.2, 0.3, 0.2, 0.4-0.1)
>  > a
> [1] 0.2 0.3 0.2 0.3
>  > unique(a)
> [1] 0.2 0.3 0.3
> 
> Well this is just the idea and the sentence could be made better
> (my poor english...). Maybe a reference to RFAQ 7.31 could be made.
> Maybe is this behaviour clear and logical for experienced users,
> but I don't think it is for beginners. I personnaly spent two
> hours to see that the problem in my code came from this.

The above is potentially a useful suggestion, and I would be
inclined to support it. However, for your other suggestion:

> I was thinking about modify the function unique() to introduce
> a "tol" argument which allows to compare with a tolerance level
> (with default value zero to keep unique consistent) like all.equal(),
> but it seemed too complicated with my little understanding.
> 
> Bests regards and many thanks for what you do for R!
> Matthieu Stigler

What is really complicated about it is that the results may
depend on the order of elements. When unique() eliminates only
values which are strictly identical to values which have been
scanned earlier, there is no problem.

But suppose you set "tol=0.11" in

unique(c(20.0, 30.0, 30.1, 30.2, 40.0))
# 20.0, 30.0, 40.0
[30.1 rejected because within 0.11 of previous 30.0;
 30.2 rejected because within 0.11 of previous 30.1]
and compare with

unique(c(20.0, 30.0, 30.2, 30.1, 40.0))
# 20.0, 30.0, 30.2, 40.0
[30.2 accepted because not within 0.11 of any previous;
 30.1 rejected because within 0.11 of previous 30.2 or 30.0]
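
A runnable version of the above (a sketch; uniqueTol is a made-up
function, not part of R), comparing each element against *all*
previously scanned elements, as in the hand-worked examples:

  uniqueTol <- function(x, tol=0) {
    keep <- logical(length(x))
    for (i in seq_along(x)) {
      keep[i] <- !any(abs(x[i] - x[seq_len(i-1)]) <= tol)
    }
    x[keep]
  }
  uniqueTol(c(20.0, 30.0, 30.1, 30.2, 40.0), tol=0.11)
  # [1] 20 30 40
  uniqueTol(c(20.0, 30.0, 30.2, 30.1, 40.0), tol=0.11)
  # [1] 20.0 30.0 30.2 40.0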

This kind of problem is always present in situations where
there are potential "chained tolerances".

You cannot see the difference between the position of the
hour-hand of a clock now, and one minute later.

But you may not chain this logic, for, if you could:

If A is indistinguishable from B, and B is indistinguishable
  from C, then A is indistinguishable from C.

10:00 is indistinguishable from 10:01 (on the hour-hand)
10:[n] is indistinguishable from 10:[n+1]

Hence, by induction, 10:00 is indistinguishable from 11:00

Which you do not want!

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-08   Time: 14:54:19
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Warnings generated by log2()/log10() are really large/t

2008-02-27 Thread Ted Harding
On 27-Feb-08 13:39:47, Gabor Grothendieck wrote:
> On Wed, Feb 27, 2008 at 5:50 AM, Henrik Bengtsson
> <[EMAIL PROTECTED]> wrote:
>> On Wed, Feb 27, 2008 at 12:56 AM, Prof Brian Ripley
>> <[EMAIL PROTECTED]> wrote:
>> > On Wed, 27 Feb 2008, Martin Maechler wrote:
>> >
>> >  > Thank you Henrik,
>> >  >
>> >  >>>>>> "HenrikB" == Henrik Bengtsson <[EMAIL PROTECTED]>
>> >  >>>>>> on Tue, 26 Feb 2008 22:03:24 -0800 writes:
>> >  >
>> >  > {with many superfluous empty statements ( i.e., trailing ";" ):
>> >
>> >  Indeed!
>>
>> I like to add a personal touch to the code I'm writing ;)
>>
>> Seriously, I added them here as a bait in order to get a chance to say
>> that I finally found a good reason for adding the semicolons.  If you
>> cut'n'paste code from certain web pages it may happen that
>> newlines/carriage returns are not transferred and all code is pasted
>> into the same line at the R prompt.  With semicolons you still get a
>> valid syntax.  I cannot remember under what conditions this happened -
> 
> I have seen that too and many others have as well since in some forums
> (not related to R) its common to indent all source lines by two spaces.
> Any line appearing without indentation must have been wrapped.

A not-so-subtle solution to this (subtle or not) problem.

NEVER paste from a browser (or a Word doc, or anything similar)
into the R command interface. Paste only from pure plain text.

Therefore, if you must paste, then paste first into a window
where a pure-plain-text editor is running.

Then you can see what you're getting, and can clean it up.
After that, you can paste from this directly into R, or can
save the file and source() it.

Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Feb-08   Time: 14:36:02
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 0.45<0.45 = TRUE (PR#10744)

2008-02-13 Thread Ted Harding
On 13-Feb-08 12:40:48, Barry Rowlingson wrote:
> hadley wickham wrote:
> 
>> It's more than that as though, as floating point addition is
>> no longer guaranteed to be commutative or associative, and
>> multiplication does not distribute over addition. Many concepts
>> that are clear cut in pure math become fuzzy in floating point
>> math - equality, singularity of matrices etc etc.
> 
>   I've just noticed that R doesn't calculate e^pi - pi as equal to 20:
> 
>   > exp(pi)-pi == 20
>   [1] FALSE
> 
>   See: http://www.xkcd.com/217/
> 
> Barry

Barry,
These things fluctuate. Once upon a time (sometime in 1915 will do)
you could get $[US]4.81 for £1.00 sterling.

One of the rare brief periods when the folks on opposite sides
of the Atlantic saw i^i (to within .Machine$double.eps, which
at the time was about 0.001, if you were lucky and didn't
make a slip of the pen).

R still gets it approximately right:

  1/(1i^1i)
  [1] 4.810477+0i

$1/(i^i) = £1

Best wishes,
Ted.

----
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Feb-08   Time: 15:57:02
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 0.45<0.45 = TRUE (PR#10744)

2008-02-12 Thread Ted Harding
On 12-Feb-08 14:53:19, Gavin Simpson wrote:
> On Tue, 2008-02-12 at 15:35 +0100, [EMAIL PROTECTED] wrote:
>> Dear developer,
>> 
>> in my version of R (2.4.0) as weel as in a more recent version
>> (2.6.0) on different computers, we found this problem :
> 
> No problem in R. This is the FAQ of all FAQs (Type III SS is
> probably up there as well).

I'm thinking (by now quite strongly) that there is a place
in "Introduction to R" (and maybe other basic documentation)
for an account of arithmetic precision in R (and in digital
computation generally).

A section "Arithmetic Precision in R" near the beginning
would alert people to this issue (there is nothing about it in
"Introduction to R", "R Language Definition", or "R internals").

Once upon a time, people who did arithmetic knew about this
from hands-on experience (just when do you break out of the
loop when you are dividing 1 by 3 on a sheet of paper?) -- but
now people press buttons on black boxes, and when they find
that 1/3 calculated in two "mathematically equivalent" ways
comes out with two different values, they believe that there
is a bug in the software.

It would not occur to them, spontaneously, that the computer
is doing the right thing and that they should look in a FAQ
for an explanation of how they do not understand!

I would be willing to contribute to such an explanation;
and probably many others would too. But I feel it should be
coordinated by people who are experts in the internals
of how R handles such things.

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Feb-08   Time: 15:31:26
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] When 1+2 != 3 (PR#9895)

2007-09-03 Thread Ted Harding
On 03-Sep-07 19:25:58, Gabor Grothendieck wrote:
> Not sure if this counts but using the Ryacas package

Gabor, I'm afraid it doesn't count! (Though I didn't
exclude it explicitly). I'm not interested in the behaviour
of the sequence with denominator = 7 particularly.
The system is in fact an example of simulating chaotic
systems on a computer.

For instance, one of the classic illustrations is

  next x = 2*x*(1-x)

for any real x. The question is, how does a finite-length
binary representation behave?

Petr Savicky [privately] sent me a similar example:
Starting with r/K:

nextr <- function(r){ifelse(r<=K/2, 2*r, 2*(K-r))}

  "For K = 7 and r = 3, this yields r = 3,  6,  2,  4,  6, ...
   Dividing this by K=7, one gets the correct period with
   approximately correct numbers."

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Sep-07   Time: 21:02:27
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] When 1+2 != 3 (PR#9895)

2007-09-03 Thread Ted Harding
On 03-Sep-07 15:12:06, Henrik Bengtsson wrote:
> On 9/2/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> [...]
>> If it may be usefull, I have written to small function
>> (Unique and isEqual)
>> which can deal with this problem of the double numbers.
> 
> Quiz: What about utility functions equalsE() and equalsPi()?
> ...together with examples illustrating when they return TRUE and when
> they return FALSE.
> 
> Cheers
> 
> /Henrik

Well, if you guys want a Quiz: ... My favourite example
of something which will probably never work on R (or any
machine which implements fixed-length binary real arithmetic).

An interated function scheme on [0,1] is defined by

  if 0 <= x <= 0.5 then next x = 2*x

  if 0.5 < x <= 1  then next x = 2*(1 - x)

in R:

  nextX <- function(x){ifelse(x<=0.5, 2*x, 2*(1-x))}

and try, e.g.,

 x<-3/7; for(i in (1:60)){x<-nextX(x); print(c(i,x))}

x = 0 is an absorbing state.
x = 1 -> x = 0
x = 1/2 -> 1 -> 0
...
(these work in R)

If K is an odd integer, and 0 < r < K, then

x = r/K ->  ... leads into a periodic set.

E.g. (see above) 3/7 -> 6/7 -> 2/7 -> 4/7 -> 6/7 -> ...

All other numbers x outside these sets generate non-periodic
sequences.

Apart from the case where initial x = 1/2^k, none of the
above is true in R (e.g. the example above).

So can you devise an "isEqual" function which will make this
work?

It's only Monday .. plenty of time!
Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Sep-07   Time: 17:32:38
---------- XFMail --


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Sep-07   Time: 18:50:23
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SWF animation method

2007-08-08 Thread Ted Harding
On 08-Aug-07 13:52:59, Mike Lawrence wrote:
> Hi all,
> 
> Just thought I'd share something I discovered last night. I was  
> interested in creating animations consisting of a series of plots and  
> after finding very little in the usual sources regarding animation in  
> R directly, and disliking the imagemagick method described here 
> (http://tolstoy.newcastle.edu.au/R/help/05/10/13297.html), I  
> discovered that if one exports the plots to a multipage pdf, it is  
> relatively trivial to then use the pdf2swf command in SWFTools  
> (http://www.swftools.org/download.html; mac install instructions  
> here: http://9mmedia.com/blog/?p=7).

Thanks so much for sharing your discovery, Mike! Out of the blue!
(Unexpected bonus for being on the R list).

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Aug-07   Time: 15:25:44
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 14:42:19, Gabor Grothendieck wrote:
> Note that the formula uptake ~. will do the same thing so its
> not clear how useful this facility really is.

Hmmm... Do you mean something like

  lm(uptake ~ . , data=CO2[,i])

where i is a subset of (1:4) as in my code below? In which case
I agree!

Ted.


> 
> On 7/16/07, Ted Harding <[EMAIL PROTECTED]> wrote:
>> On 16-Jul-07 14:16:10, Gabor Grothendieck wrote:
>> > Following up on your comments it seems formula.data.frame just
>> > creates
>> > a formula whose lhs is the first column name and whose rhs is made
>> > up
>> > of the remaining column names.  It ignores the "formula" attribute.
>> >
>> > In fact, CO2 does have a formula attribute but its not extracted by
>> > formula.data.frame:
>> >
>> >> CO2@formula
>> > uptake ~ conc | Plant
>> >> formula(CO2)
>> > Plant ~ Type + Treatment + conc + uptake
>>
>> Indeed! And, following up yet again on my own follow-up comment:
>>
>> library(combinat)
>>
>> for(j in (1:4)){
>>  for(i in combn((1:4),j,simplify=FALSE)){
>>print(formula(CO2[,c(5,i)]))
>>  }
>> }
>> uptake ~ Plant
>> uptake ~ Type
>> uptake ~ Treatment
>> uptake ~ conc
>> uptake ~ Plant + Type
>> uptake ~ Plant + Treatment
>> uptake ~ Plant + conc
>> uptake ~ Type + Treatment
>> uptake ~ Type + conc
>> uptake ~ Treatment + conc
>> uptake ~ Plant + Type + Treatment
>> uptake ~ Plant + Type + conc
>> uptake ~ Plant + Treatment + conc
>> uptake ~ Type + Treatment + conc
>> uptake ~ Plant + Type + Treatment + conc
>>
>> opening the door to automated fitting of all possible models
>> (without interactions)!
>>
>> Now if only I could find out how to do the interactions as well,
>> I would never need to think again!
>>
>> best wishes,
>> Ted.
>>
>> 
>> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 16-Jul-07   Time: 15:40:36
>> -- XFMail --
>>
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 16:13:15
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 14:16:10, Gabor Grothendieck wrote:
> Following up on your comments it seems formula.data.frame just creates
> a formula whose lhs is the first column name and whose rhs is made up
> of the remaining column names.  It ignores the "formula" attribute.
> 
> In fact, CO2 does have a formula attribute but its not extracted by
> formula.data.frame:
> 
>> CO2@formula
> uptake ~ conc | Plant
>> formula(CO2)
> Plant ~ Type + Treatment + conc + uptake

Indeed! And, following up yet again on my own follow-up comment:

library(combinat)

for(j in (1:4)){
  for(i in combn((1:4),j,simplify=FALSE)){
print(formula(CO2[,c(5,i)]))
  }
}
uptake ~ Plant
uptake ~ Type
uptake ~ Treatment
uptake ~ conc
uptake ~ Plant + Type
uptake ~ Plant + Treatment
uptake ~ Plant + conc
uptake ~ Type + Treatment
uptake ~ Type + conc
uptake ~ Treatment + conc
uptake ~ Plant + Type + Treatment
uptake ~ Plant + Type + conc
uptake ~ Plant + Treatment + conc
uptake ~ Type + Treatment + conc
uptake ~ Plant + Type + Treatment + conc

opening the door to automated fitting of all possible models
(without interactions)!

Now if only I could find out how to do the interactions as well,
I would never need to think again!
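
It turns out that terms() will expand both the "." and a "^2" for
the two-way interactions (a sketch):

  tt <- terms(uptake ~ .^2, data=CO2[,c(2,3,4,5)])
  attr(tt, "term.labels")
  # "Type" "Treatment" "conc" "Type:Treatment" "Type:conc" "Treatment:conc"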

best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 15:40:36
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 13:57:56, Ted Harding wrote:
> On 16-Jul-07 13:28:50, Gabor Grothendieck wrote:
>> The formula attribute of the builtin CO2 dataset seems a bit strange:
>> 
>>> formula(CO2)
>> Plant ~ Type + Treatment + conc + uptake
>> 
>> What is one supposed to do with that?  Certainly its not suitable
>> for input to lm and none of the examples in ?CO2 use the above.
> 
> I think one is supposed to ignore it! (Or maybe be inspired to
> write a mail to the list ... ).
> 
> I couldn't find anything that looked like the above formula from
> str(CO2). But I did spot that the order of terms in the formula:
> Plant, Type, treatment, conc, uptake, is the same as the order
> of the "columns" in the dataframe.
> 
> So I tried:
> 
>   D<-data.frame(x=(1:10),y=(1:10))
> 
>   formula(D)
>   x ~ y
> 
> So, lo and behold, D has a formula!
> 
> Or does it? Maybe if you give formula() a dataframe, it simply
> constructs one from the "columns".

Now that I think about it, I can see a use for this phenomenon:

  > formula(CO2)
  Plant ~ Type + Treatment + conc + uptake
  > formula(CO2[,2:5])
  Type ~ Treatment + conc + uptake
  > formula(CO2[,3:5])
  Treatment ~ conc + uptake
  > formula(CO2[,4:5])
  conc ~ uptake
  > formula(CO2[,c(5,1,2,3,4)])
  uptake ~ Plant + Type + Treatment + conc


Could save a lot of typing!

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 15:14:38
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 13:28:50, Gabor Grothendieck wrote:
> The formula attribute of the builtin CO2 dataset seems a bit strange:
> 
>> formula(CO2)
> Plant ~ Type + Treatment + conc + uptake
> 
> What is one supposed to do with that?  Certainly its not suitable
> for input to lm and none of the examples in ?CO2 use the above.

I think one is supposed to ignore it! (Or maybe be inspired to
write a mail to the list ... ).

I couldn't find anything that looked like the above formula from
str(CO2). But I did spot that the order of terms in the formula:
Plant, Type, treatment, conc, uptake, is the same as the order
of the "columns" in the dataframe.

So I tried:

  D<-data.frame(x=(1:10),y=(1:10))

  formula(D)
  x ~ y

So, lo and behold, D has a formula!

Or does it? Maybe if you give formula() a dataframe, it simply
constructs one from the "columns".

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 14:57:28
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] attachments on R mailing lists

2006-10-26 Thread Ted Harding
On 26-Oct-06 Martin Maechler wrote:
> 
>>>>>> "Michael" == Michael Toews <[EMAIL PROTECTED]>
>>>>>> on Wed, 25 Oct 2006 19:00:03 -0700 writes:
> 
> Michael> Okay ... I'll try to attach that patch once more
> Michael> ... (does this list only accept certain extensions
> Michael> for attachments? I used '.patch', but it must have
> Michael> been filtered off, so I'll try '.patch.txt' now
> 
> It's not the file name, but the MIME type of an attachment which
> is important.
>   text/plain  is valid, e.g.,
> 
> The posting guide and even http://www.r-project.org/mail.html
> mention this.
> 
> It seems your e-mail software does not allow to specify the MIME
> type used, but rather determines it from the file extension.
> Other e-mail client programs send all "unknown" attachments" as
> binary; and most ascii texts seem "unknown" in this regard.

Mine (XFMail) always gives me the option to confirm or change the
MIME type it has determined from the extension or, if it does not
recognise the extension, I am asked to assign the MIME type.

> All this feels Micro$oftish and it is sad to see that such a 
> softie--attitude also invades free software developments such
> as a unix-based version Thunderbird which I'm guessing you use.

I have to agree with this. Quite apart from any criticisms one
may have of the quality of technical performance of M$ software,
one of its major annoyances since its early days has been its
tendency to "know better" than the user, and to assume control
over choices which the user would prefer to make judiciously.

One outstanding quality of Unix, and later Linux, is the fact
that the user can consciously choose and control what the software
does. Regrettably, Linux increasingly conceals this beneath
"user-friendly" interfaces which take control, softie-style.
This is undoubtedly an attempt to appeal to a wider user-base.
Fortunately, the underlying controllability is still accessible
in many cases if you lift the lid. But GUI-driven applications
such as web-browsers and web-based mailers tend to work from
the top (GUI) down, making this hard to reach.

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 26-Oct-06   Time: 12:56:22
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bounding box in PostScript

2006-04-17 Thread Ted Harding
On 17-Apr-06 Prof Brian Ripley wrote:
> On Mon, 17 Apr 2006, David Allen wrote:
> 
>> When a graph is saved as PostScript, the bounding box is often too
>> big.
>> A consequence is that when the graph is included in a LaTeX document,
>> the spacing does not look good.
>> Is this a recognized problem? Is someone working on it? Could I help?
> 
> It's not really true.  The bounding box is of the figure region, and it
> is 
> a design feature.  If it is the figure `region' that is too big for your
> purposes, you need to adjust the margins.
> 
> Alternatively, there are lots of ways to set a tight bounding box: I
> tend to use the bbox device in gs.  But that often ends up with
> figures which do not align, depending on whether text has descenders.

Ghostscript's BBox resources tend to suffer from this problem,
since for some reason as Brian says it may not properly allow
for the true sizes of characters. I once tweaked ps2eps to
correct (more or less) for this.

But nowadays, to get it exactly right, I use gv (ghostview
front end) to open the EPS file, and set "watch" mode:

  gv -watch filename.eps

If it doesn't automatically give the "BoundingBox" view, you
can always select this from the top menu. This ensures that
the frame of the gv window is the same as the BoundingBox.

Then open the EPS file for editing, and look for the

  %%BoundingBox: llx lly urx ury

line (it may be near the beginning, or at the end -- which
will be flagged by a line

  %%BoundingBox: atend

near the beginning).

Here, llx, lly, urx, ury are integers giving the coordinates
(in points relative to the page origin) of the lower left,
and upper right, corners of the bounding box.

Now you can change the values of these. Each time you change,
save the edited file. Because gv is in "watch" mode, it will
re-draw the page whenever the file changes. Thus you can adjust
the bounding box until it is just as you want it.
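
And if the figure is produced in R in the first place, the initial
bounding box can be made tight at source, along the margin-adjustment
route mentioned above (a sketch):

  postscript("fig.eps", width=4, height=3, paper="special",
             horizontal=FALSE, onefile=FALSE)
  par(mar=c(4,4,1,1))  # shrink the default margins
  plot(1:10)
  dev.off()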

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-06   Time: 23:05:29
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] floor and ceiling can't handle more than 15 decimal places

2006-02-12 Thread Ted Harding
On 12-Feb-06 [EMAIL PROTECTED] wrote:
> Full_Name: Ben Phalan
> Version: 2.2.1
> OS: Win XP
> Submission from: (NULL) (131.111.111.231)
> 
> 
> I have noticed that floor returns the wrong number when there are more
> than 15
> decimal places:
> 
>> floor(6.999999999999999)
> [1] 6
>> floor(6.9999999999999999)
> [1] 7
> 
> There is a similar problem with ceiling, so this may apply to all/most
> rounding functions?
> 
>> ceiling (2.000000000000001)
> [1] 3
>> ceiling (2.0000000000000001)
> [1] 2

This is not a problem (nor a bug) with 'floor' or 'ceiling'.
The "problem" (in quotes because the real problem is the user's)
is in R, intrinsic to the finite-length floating-point arithmetic
which is used. See:

  > 6.999999999999999 - 7
  [1] -8.881784e-16
  > 6.9999999999999999 - 7
  [1] 0
  > 2.000000000000001 - 2
  [1] 8.881784e-16
  > 2.0000000000000001 - 2
  [1] 0

so, in fact, R cannot "see" the 16th decimal place when you enter
a number to that precision -- it is simply lost. Exactly the same
"problem" would arise at some point whatever the finite precision
to which a floating-point number is stored. The effect is not
confined to functions 'floor' and 'ceiling' or any similar
"rounding" functions. It applies to all functions; it is simply
more obvious with the rounding functions.

Enter

  .Machine

and the first two items in the output are:

  $double.eps
  [1] 2.220446e-16

  $double.neg.eps
  [1] 1.110223e-16

showing that the smallest difference which can be "seen" by R
is greater than 10^(-16).

So, when you type it in, you *think* you have entered

2.0000000000000001

into R, but you have not. So the user has to face the problem of
how to cope with the finite-length representation in any situation
where the distinction between 2 and 2.0000000000000001 really
matters.
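
One way to see what R actually stored (a sketch):

  print(2.0000000000000001, digits=17)
  # [1] 2
  sprintf("%.17f", 2.0000000000000001)
  # [1] "2.00000000000000000"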

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Feb-06   Time: 15:37:53
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] pbinom with size argument 0 (PR#8560)

2006-02-06 Thread Ted Harding
On 05-Feb-06 [EMAIL PROTECTED] wrote:
> Hello all
> 
> A pragmatic argument for allowing size==0 is the situation where
> the size is in itself a random variable (that's how I stumbled over
> the inconsistency, by the way).
> 
> For example, in textbooks on probability it is stated that:
> 
>   If X is Poisson(lambda), and the conditional
>   distribution of Y given X is Binomial(X,p), then
>   Y is Poisson(lambda*p).
> 
> (cf eg Pitman's "Probability", p. 400)
> 
> Clearly this statement requires Binomial(0,p) to be a well-defined
> distribution.
> 
> Such statements would be quite convoluted if we did not define
> Binomial(0,p) as a legal (but degenerate) distribution. The same
> applies to codes where the size parameter may attain the value 0.
> 
> Just my 2 cents.
> 
> Cheers,
> 
> Uffe

Uffe's pragmatic argument is of course convincing at least in
the circumstances he refers to. However, Peter Ehlers' posting
has re-stimulated the underlying ambiguity I feel about this
issue (initially, that the probability of a "non-event" should
be undefined).

Thus I can envisage different circumstances in which one or the
other view could be appropriate.

Uffe observes a Poisson-distributed number of Bernoulli trials
and records the number of "successes", with zero if the Poisson
distribution says "zero trials". In that case no Bernoulli trial
has been carried out, so the issue of what the distribution over
its empty set of outcomes should be is irrelevant. However, he
can encapsulate this process mathematically by assigning P=1
to the outcome r=0 when n=0, and this may well lead to a more
straightforward R program, for instance (which, reading between
the lines, may well be what really happened in his case).
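
In R terms, that thinning statement is easily checked by simulation
-- a sketch only (note that it relies on size=0 being acceptable):

  lambda <- 3; p <- 0.4
  sz <- rpois(1e5, lambda)               # random numbers of trials, often 0
  y  <- rbinom(1e5, size = sz, prob = p)
  c(mean(y), lambda * p)                 # both close to 1.2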

On the other hand, suppose I (and maybe Peter Ehlers too) am
simulating a study in which random numbers (according to some
distribution) of subjects become available, in each "sweep" of the
study, for questionnaire, and the outcome of interest is the
number in the "sweep" answering "Yes" to a question. Part of this
simulation is to create a database of responses along with concomitant
variables. It is possible (and under some circumstances perhaps more
likely) that the number of available subjects in a "sweep" is zero --
these people cannot be contacted, say.

Maybe I'm studying a "missing data" situation.

In that case it would be natural to enter "r=NA" in the
database for those sweeps which produce no responses. This
would denote "missing data". And it would be natural also (initially,
before embarking on, say, an imputation exercise) to attribute
"P=NA" to the probability of "Yes" for such a group since
we do not have any direct information (though may be able to
exploit associations between other variables to obtain indirect
information, under certain assumptions).

So maybe one could need implementations of pbinom and dbinom
which work differently in different circumstances. But what
remains important is that, whichever way they work in given
circumstances, they should be consistent with each other.
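
For what it is worth, a quick check shows that in current versions
of R the two have indeed been made consistent:

  dbinom(0, size = 0, prob = 0.5)  # 1
  pbinom(0, size = 0, prob = 0.5)  # 1, no longer NaN

though that settles the consistency, not the philosophy.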

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Feb-06   Time: 10:10:19
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] pbinom with size argument 0 (PR#8560)

2006-02-03 Thread Ted Harding
On 03-Feb-06 Peter Dalgaard wrote:
> (Ted Harding) <[EMAIL PROTECTED]> writes:
> 
>> On 03-Feb-06 [EMAIL PROTECTED] wrote:
>> > Full_Name: Uffe Høgsbro Thygesen
>> > Version: 2.2.0
>> > OS: linux
>> > Submission from: (NULL) (130.226.135.250)
>> > 
>> > 
>> > Hello all.
>> > 
>> >   pbinom(q=0,size=0,prob=0.5)
>> > 
>> > returns the value NaN. I had expected the result 1. In fact any
>> > value for q seems to give an NaN.
>> 
>> Well, "NaN" can make sense since "q=0" refers to a single sampled
>> value, and there is no value which you can sample from "size=0";
>> i.e. sampling from "size=0" is a non-event. I think the probability
>> of a non-event should be NaN, not 1! (But maybe others might argue
>> that if you try to sample from an empty urn you necessarily get
>> zero "successes", so p should be 1; but I would counter that you
>> also necessarily get zero "failures" so q should be 1. I suppose
>> it may be a matter of whether you regard the "r" of the binomial
>> distribution as referring to the "identities" of the outcomes
>> rather than to how many you get of a particular type. Hmmm.)
>> 
>> > Note that
>> > 
>> >   dbinom(x=0,size=0,prob=0.5)
>> > 
>> > returns the value 1.
>> 
>> That is probably because the .Internal code for pbinom may do
>> a preliminary test for "x >= size". This also makes sense, for
>> the cumulative p for any distribution with a finite range,
>> since the answer must then be 1 and a lot of computation would
>> be saved (likewise returning 0 when x < 0). However, it would
>> make even more sense to have a preceding test for "size<=0"
>> and return NaN in that case since, for the same reasons as
>> above, the result is the probability of a non-event.
> 
> Once you get your coffee, you'll likely realize that you got
> your p's and d's mixed up...

You're right about the mix-up! (I must mend the pipeline.)

> I think Uffe is perfectly right: The result of zero experiments will
> be zero successes (and zero failures) with probability 1, so the
> cumulative distribution function is a step function with one step at
> zero ( == as.numeric(x>=0) ).

I'm perfectly happy with this argument so long as it leads to
dbinom(x=0,size=0,prob=p)=1 and also pbinom(q=0,size=0,prob=p)=1
(which seems to be what you are arguing too). And I think there
are no traps if p=0 or p=1.
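
As a small illustration of the step function Peter describes
(assuming a version of R in which the size=0 case behaves that way):

  q <- c(-1, 0, 1, 2)
  as.numeric(q >= 0)               # 0 1 1 1
  pbinom(q, size = 0, prob = 0.5)  # the same values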

>> (But it depends on your point of view, as above ... However,
>> surely the two functions should be consistent with each other.)

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06   Time: 15:07:49
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] pbinom with size argument 0 (PR#8560)

2006-02-03 Thread Ted Harding
On 03-Feb-06 [EMAIL PROTECTED] wrote:
> Full_Name: Uffe Høgsbro Thygesen
> Version: 2.2.0
> OS: linux
> Submission from: (NULL) (130.226.135.250)
> 
> 
> Hello all.
> 
>   pbinom(q=0,size=0,prob=0.5)
> 
> returns the value NaN. I had expected the result 1. In fact any
> value for q seems to give an NaN.

Well, "NaN" can make sense since "q=0" refers to a single sampled
value, and there is no value which you can sample from "size=0";
i.e. sampling from "size=0" is a non-event. I think the probability
of a non-event should be NaN, not 1! (But maybe others might argue
that if you try to sample from an empty urn you necessarily get
zero "successes", so p should be 1; but I would counter that you
also necessarily get zero "failures" so q should be 1. I suppose
it may be a matter of whether you regard the "r" of the binomial
distribution as referring to the "identities" of the outcomes
rather than to how many you get of a particular type. Hmmm.)

> Note that
> 
>   dbinom(x=0,size=0,prob=0.5)
> 
> returns the value 1.

That is probably because the .Internal code for pbinom may do
a preliminary test for "x >= size". This also makes sense, for
the cumulative p for any distribution with a finite range,
since the answer must then be 1 and a lot of computation would
be saved (likewise returning 0 when x < 0). However, it would
make even more sense to have a preceding test for "size<=0"
and return NaN in that case since, for the same reasons as
above, the result is the probability of a non-event.

(But it depends on your point of view, as above ... However,
surely the two functions should be consistent with each other.)

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06   Time: 14:34:28
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R on the brain

2006-01-30 Thread Ted Harding
On 30-Jan-06 Roger D. Peng wrote:
> Well shared! :)  Maybe better yet,
> 
> "Before I begin this talk, I'd like to 'attach("nano")'".
> 
> -roger

And, at the end of the talk:

   "Save workspace image? [y/n/c]: "

Ted.

> Ben Bolker wrote:
>>I was sitting in the coffee room at work listening to people
>>complain
>> about a recent seminar about nanotechnology using the terms 
>> nanofluidics, nanofactory, nano-this, and nano-that ... I found myself
>> thinking "well the speaker should just
>> have said
>>with(nano,
>>   ...)
>> 
>>Un(?)fortunately there's no-one here I can share that thought with.
>> 
> 
> -- 
> Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jan-06   Time: 20:49:43
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] typo in `eurodist'

2005-12-08 Thread Ted Harding
On 08-Dec-05 Martin Maechler wrote:
>>>>>> "Torsten" == Torsten Hothorn <[EMAIL PROTECTED]>
>>>>>> on Thu, 8 Dec 2005 08:51:57 +0100 (CET) writes:
> 
> Torsten> On Wed, 7 Dec 2005, Prof Brian Ripley wrote:
> >> I've often wondered about that.
> Torsten> and the copy editor did too :-)
> 
> >> I've presumed that the names were
> >> deliberate, so have you checked the stated source?  It's not
> >> readily available to me (as one would expect in Oxford)?
> 
> Torsten> our library doesn't seems to have a copy of `The
> Torsten> Cambridge Encyclopaedia', so I can't check
> Torsten> either. Google has 74.900 hits for `Gibralta'
> Torsten> (more than one would expect for a typo, I think)
> Torsten> and 57.700.000 for `Gibraltar'.
> 
> Torsten> So maybe both spellings are in use.
> 
> Well,  do you expect web authors to have a much lower rate of
> typos than 1:770 ?
> My limited experience on "google voting for spelling correction"
> has rather lowered my expectation on webauthors' education in
> orthography...
> 
> Martin

Hmmm ... Using my Google's "Results ... of about xxx":


  Gibraltar 50,700,000 
  Gibralta  75,200
  Gibraltr 573
  Gibralar 836
  Gibratar   1,020
  Gibrltar 349
  Gibaltar   1,850
  Giraltar 530
  Gbraltar 352
  ibralter 576

I'm not proposing to get exhaustive about this, but a few further
experiments suggest that other specific typos are typically O(500)
in frequency:

  Gibralatar   589
  Gibrlatar618
  Gibrltar 349
  Gobraltar652


So -- can anyone find a typo of "Gibraltar" which googles to
more than 5000 hits (excepting "Gibralta")?

Best wishes,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Dec-05   Time: 18:58:04
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] computing the variance

2005-12-05 Thread Ted Harding
On 05-Dec-05 Duncan Murdoch wrote:
>> The variance of X is (or damn well should be) defined as
>> 
>>   Var(X) = E(X^2) - (E(X))^2
>> 
>> and this comes to (Sum(X^2) - (Sum(X))^2/N)/(N-1).
> 
> I don't follow this.  I agree with the first line (though I prefer to 
> write it differently), but I don't see how it leads to the second.  For
> example, consider a distribution which is equally likely to be +/- 1, 
> and a sample from it consisting of a single 1 and a single -1.  The 
> first formula gives 1 (which is the variance), the second gives 2.
> 
> The second formula is unbiased because in a random sample I am just as 
> likely to get a 0 from the second formula, but I'm curious about what 
> you mean by "this comes to".
> 
> Duncan

Sorry, you're of course right -- I was being a bit hasty and
managed to tangle this with a standard definition of the
"variance" of a finite population which uses the 1/(N-1)
divisor!
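
For the record, Duncan's two-point example checked in R:

  x <- c(1, -1)
  mean(x^2) - mean(x)^2            # 1 : E(X^2) - (E(X))^2
  (sum(x^2) - sum(x)^2/2)/(2 - 1)  # 2 : the 1/(N-1) formula
  var(x)                           # 2 : what var() computes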




E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Dec-05   Time: 20:08:38
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] computing the variance

2005-12-05 Thread Ted Harding
On 05-Dec-05 Martin Maechler wrote:
> UweL> x <- c(1,2,3,4,5)
> UweL> n <- length(x)
> UweL> var(x)*(n-1)/n
> 
> UweL> if you really want it.
> 
> It seems Insightful at some point in time have given in to
> this user request, and S-plus nowadays has
> an argument  "unbiased = TRUE"
> where the user can choose {to shoot (him/her)self in the leg and}
> require 'unbiased = FALSE'.
> {and there's also 'SumSquares = FALSE' which allows to not
> require any division (by N or N-1)}
> 
> Since in some ``schools of statistics'' people are really still
> taught to use a 1/N variance, we could envisage to provide such an
> argument to var() {and cov()} as well.  Otherwise, people define
> their own variance function such as  
>   VAR <- function(x, ...) (length(x)-1)/length(x) * var(x, ...)
> Should we?

If people need to do this, such an option would be a convenience,
but I don't see that it has much further merit than that.

My view of how to calculate a "variance" is based, not directly
on the the "unbiased" issue, but on the following.

Suppose you define a RV X as a single value sampled from a finite
population of values X1,...,XN.

The variance of X is (or damn well should be) defined as

  Var(X) = E(X^2) - (E(X))^2

and this comes to (Sum(X^2) - (Sum(X))^2/N)/(N-1).

So this is the variance of the set of values {X1,...,XN} from
that point of view. Similarly I like to preserve the analogy
by calling the "variance" of a set of sampled values {x1,...,xn}
the quantity calculated by dividing by (n-1).

This of course links with "unbiased" in that when you use the
"1/(n-1)" definition you do have an unbiased estimator of Var(X).

And it ties in nicely with what I call "The Fundamental Formula
of the Analysis of Variance" (coincidentally mentioned recently
on R-help):

  Var[X](X) = E[J](Var[X|J](X|J)) + Var[J](E[X|J](X|J))

for two random variables X, J (where "E[J](...)", "E[X|J](...)"
mean "Expectation using the distribution of J or the conditional
distribution of X|J respectively").

Now, for instance, take a set of samples of sizes n1,...,nk
indexed by j=1,...,k and pick a value X at random from the
N = n1+...+nk sample values. This gives rise also to a random
value of J (sample index) with P(J=j) = nj/N. Now apply the
FFAOV and you get the usual breakdown of the sum of squares.
And so on.
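
A quick numerical check of that breakdown -- a sketch only, with
two made-up samples:

  s1 <- c(1, 2, 3); s2 <- c(10, 12, 14, 16)
  x  <- c(s1, s2)
  j  <- rep(1:2, c(length(s1), length(s2)))
  pop.var <- function(v) mean((v - mean(v))^2)    # the "1/n" variance
  within  <- sum(tapply(x, j, pop.var) * tabulate(j)) / length(x)
  between <- pop.var(tapply(x, j, mean)[j])
  all.equal(pop.var(x), within + between)         # TRUE

(The identity holds exactly with the "1/n" form throughout, since
here X is a value drawn from the finite pooled population.)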

All that sort of thing is too fundamental to be undermined by
making more than the minimum necessary concession (i.e. convenient
for those who must) to the "1/n" view of the matter.

I think the confusion (and it does exist) arises (a) from the
historical notion that "variance" = the mean squared deviation from
the mean (which I prefer to name by its description); together
with (b) that the Mean, Variance and such occur very early on in
even the least mathematical of introductory statistics courses,
and participants in these are invariably puzzled by the somewhat
mysterious appearance on stage of "1/(n-1)" -- it can be easier
to sweep this under the carpet by saying "it doesn't make any
difference in large samples" etc.

> BTW: S+ even has the 'unbiased' argument for cor() where of course
> it really doesn't make any difference (!), and actually I think is
> rather misleading, since the sample correlation is not unbiased
> in almost all cases AFAICS.

Agreed. I wasn't aware of this -- what is the S-plus default?
(not that it matters ... ). A simply silly distinction, and
possibly a carry-over from the "confusion" described above
(e.g. "Since sample correlation is calculated as
 Cov(X,Y)/sqrt(Var(X)*Var(Y)), should we use the unbiased
or the biased versions of the Cov and the Vars?").

Hmmm.
Best wishes,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Dec-05   Time: 19:19:20
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] is.na<- problem

2005-10-19 Thread Ted Harding
On 19-Oct-05 Gabor Grothendieck wrote:
> In the following the first element of xx should have
> been set to 0 but remains NA.  Any comments?
> 
>> xx <- c(NA,1)
>> is.na(xx) <- 0
>> xx
> [1] NA  1
>> R.version.string # Windows XP
> [1] "R version 2.2.0, 2005-09-20"

I wonder, has it ever worked? I get the same as you on

R.version.string #Linux
[1] "R version 1.6.2, 2003-01-10"

R.version.string #Linux
[1] "R version 1.8.0, 2003-10-08"

R.version.string #Linux
[1] "R version 2.1.0, 2005-04-08"

with the exception of

R.version.string #Linux
[1] "R version 1.2.3, 2001-04-26"

which does not know about is.na()<- at all.
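
For what it's worth, the documented behaviour is that the value
supplied to is.na()<- is an index: is.na(x) <- i sets x[i] to NA,
and an existing NA cannot be "un-set" by this route. Index 0
selects no elements, so xx is left unchanged. Compare:

  xx <- c(NA, 1)
  is.na(xx) <- 2   # marks element 2 as NA
  xx               # [1] NA NA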


Hmmm
Ted.







E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Oct-05   Time: 01:11:45
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Typo [Was: Rd and guillemots]

2005-09-17 Thread Ted Harding
On 17-Sep-05 Prof Brian Ripley wrote:
> On Fri, 16 Sep 2005 [EMAIL PROTECTED] wrote:
> 
>> On 16-Sep-05 Duncan Murdoch wrote:
>>> [...]
>>> This seems to happen in Rdconv.pm, around here:
>>>
>>>  ## avoid conversion to guillemots
>>>  $c =~ s/<</<\{\}</;
>>>  $c =~ s/>>/>\{\}>/;
>>
>> The name of the "continental" quotation mark « is "guillemet".
>>
>> The R Development Core Team must have had some bird on the brain
>> at the time ...
> 
> I don't think any authority agrees with Ted here. There are two 
> characters, left and right.

Agreed, I only gave one instance. Either « or » is a guillemet.

As to "any authority", it depends what you mean by "authority".

1. Take any good French dictionary (e.g. Collins "Robert").
   Look up [Fr]"guillemet": --> [En]"quotation mark, inverted comma".
   Look up [En]"quotation mark": --> [Fr]"guillemet".

   There is a phrase "entre guillemets": --> "in quotation marks"
   or "in quotes", and vice versa.

   Look up [Fr]"guillemot": --> [En]"guillemot" and vice versa.

2. Take a good book on printing/typographical matters, e.g. "The
   Chicago Manual of Style" which is very comprehensive.

   Index: "guillemets" [the entry is in the plural]: -> 9.22-26
   "Small angle marks called guillemets («») are used for quotation
   marks ..."

   Index: "guillemot": --> nothing found.

It's not as straightforward as that, however! In French, "guillemet"
is in fact used generically for "quotation mark" and, typographically,
includes not only the marks « and » we are talking about, but also
the marks used for similar purposes in "non-continental" typography.

So the opening double quote `` (e.g. in Times Roman) and closing ''
(sorry, can't make these marks in email) are also "guillemets".
Indeed we have [note the singular] "guillemet anglais ouvrant" (``),
"guillemet anglais fermant" (''), as well as "guillemet français
ouvrant" («), "guillemet français fermant (»); not to mention the
fact that a "guillemet français" e.g. « consists of two "chevrons"
and one can also have a "chevron ouvrant" consisting of just one
of these (can't do this either) which is also called a "guillemet
français simple ouvrant" (in PostScript "guilsinglleft"), etc. And
there is (as in Courier font) the guillemet dactylographique
= typewriter quotation mark ("). And lots of other variants.

Rather than sink in the morass of French-speaking usage, we might
be better off referring to an authority closer to the sort of usage
that concerns us, So I've had a look at the Unicode Standard,
specifically

  http://www.unicode.org/Public/UNIDATA/NamesList.txt

where one can find

  00AB  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
  = LEFT POINTING GUILLEMET
  = chevrons (in typography)
  * usually opening, sometimes closing

  00BB  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
  = RIGHT POINTING GUILLEMET
  * usually closing, sometimes opening

  2039  SINGLE LEFT-POINTING ANGLE QUOTATION MARK
  = LEFT POINTING SINGLE GUILLEMET
  * usually opening, sometimes closing

  203A  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
  = RIGHT POINTING SINGLE GUILLEMET
  * usually closing, sometimes opening

but no guillemots!

> Collectively it seems agreed they are called guillemets, but the
> issue is over the names of the single characters, and the character
> shown is the left guillem[eo]t.

See above ...

> Adobe says these are left and right guillemot.  It seems that the 
> majority opinion does not agree, but there is a substantial usage 
> following Adobe.

That is certainly a matter of fact! And it is certainly thus in
Adobe's PostScript Language Reference Manual (see e.g. Standard
Roman Character Set in Appendix E, "Standard Character Sets and
Encoding Vectors"). So that is what must be used when invoking
them in PostScript. However, I am firmly of the view that Adobe
made an error when they gave these things the names "guillemotleft"
and "guillemotright".

> I had already changed the R source code, so please Ted and others
> follow the advice in the posting guide and
> 
> *** check the current sources before posting alleged bugs ***

Easier said than done ... However, I apologise!

Nevertheless, apart from the issue of a possible "R bug", I think
it is worth putting the record straight on the general issue of
nomenclature.

Best wishes to all,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Sep-05   Time: 10:51:17
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Typo [Was: Rd and guillemots]

2005-09-16 Thread Ted Harding
On 16-Sep-05 Duncan Murdoch wrote:
> [...]
> This seems to happen in Rdconv.pm, around here:
> 
>  ## avoid conversion to guillemots
>  $c =~ s/<</<\{\}</;
>  $c =~ s/>>/>\{\}>/;

The name of the "continental" quotation mark « is "guillemet".

The R Development Core Team must have had some bird on the brain
at the time ...

Ted.



--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Sep-05   Time: 22:51:19
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rd and guillemots

2005-09-16 Thread Ted Harding
On 16-Sep-05 Ted Harding wrote:
> On 16-Sep-05 Duncan Murdoch wrote:
>> Yes, this is the tex that gets output:
>> 
>> \code{mlazy( <{}<...>{}>, <<...>>, <<...>>)}
>> 
>> This seems to happen in Rdconv.pm, around here:
>> 
>>  ## avoid conversion to guillemots
>>  $c =~ s/<</<\{\}</;
>>  $c =~ s/>>/>\{\}>/;
>> 
>> 
>> But I don't know enough Perl syntax to tell it to replace all << by 
>> <{}<, instead of just the first.  (I would have guessed appending a g 
>> would work, but didn't in a quick test, i.e. $c =~ s/<</<\{\}</g didn't work.)
>> 
>> Duncan Murdoch
> 
> Perl is overkill -- by a long way!
> 
> echo "{mlazy( <>, <>, <>)}" |
>   sed 's/<>/>{}>/g'
> 
> {mlazy( <{}{}>, <{}{}>, <{}{}>)}
> 
> Cheers,
> Ted.

Sorry, Duncan -- I misread the role of Perl in your mail.

But the substitution string might also work in Perl ... ?
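
For completeness, the same global substitution can also be done from
within R itself -- a sketch, with made-up placeholder text:

  x <- "{mlazy( <<a>>, <<b>>, <<c>>)}"
  x <- gsub("<<", "<{}<", x, fixed = TRUE)
  x <- gsub(">>", ">{}>", x, fixed = TRUE)
  x   # "{mlazy( <{}<a>{}>, <{}<b>{}>, <{}<c>{}>)}"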

Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Sep-05   Time: 21:03:58
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rd and guillemots

2005-09-16 Thread Ted Harding
On 16-Sep-05 Duncan Murdoch wrote:
> On 9/15/2005 7:51 PM, [EMAIL PROTECTED] wrote:
>> First of all, thanks to those who've set up R to work so smoothly with
>> Miktex-- even a total Latex bunny like me got it to work instantly, so
>> that for the first time I'm able to run my Rd files through the Latex
>> side of RCMD CHECK.
>> 
>> Now the question/buglet. One of my Rd files contains the following:
>> 
>> \code{mlazy( <<...>>, <<...>>, <<...>>)}
>> 
>> When I run the file through RCMD (either RCMD CHECK or Rcmd Rd2dvi
>> --pdf) the first << and >> are left alone, but the second and third
>> pairs are converted to single guillemot characters (i.e. European
>> quotation marks). This inconsistency seems a bit odd.
> 
> Yes, this is the tex that gets output:
> 
> \code{mlazy( <{}<...>{}>, <<...>>, <<...>>)}
> 
> This seems to happen in Rdconv.pm, around here:
> 
>  ## avoid conversion to guillemots
>  $c =~ s/<</<\{\}</;
>  $c =~ s/>>/>\{\}>/;
> 
> 
> But I don't know enough Perl syntax to tell it to replace all << by 
> <{}<, instead of just the first.  (I would have guessed appending a g 
> would work, but didn't in a quick test, i.e. $c =~ s/<</<\{\}</g didn't
> work.)
> 
> Duncan Murdoch

Perl is overkill -- by a long way!

echo "{mlazy( <>, <>, <>)}" |
  sed 's/<>/>{}>/g'

{mlazy( <{}{}>, <{}{}>, <{}{}>)}

Cheers,
Ted.




E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Sep-05   Time: 20:33:40
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Citation for R

2005-06-14 Thread Ted Harding
On 14-Jun-05 A.J. Rossini wrote:
> Fritz - 
> 
> That's silly.  As someone pointed out, the issue is with the
> publisher, not the citation.  If R-Core were a generally well-known
> and regarded publishing house such as Springer or Microsoft, it would
> not be a problem.  But it's still a nebulous entity to MANY people,
> and so many people fail to understand this open source stuff.  It's
> seriously discouraged by most journals to cite technical reports, for
> example.  And perhaps, R could be considered more of a long-ish
> technical report than a book?   Though perhaps Peter D. could be
> considered the "editor"?  (these questions are not those that I need
> to ask, obviously!)
> 
> (just yesterday, I was asked by a reasonably intelligent colleague,
> with respect to corporate packaging of R: "So they (corporate
> packagers) just pick some version and package it, right?"  and my
> flabbergasted response was:
> 
> "and so, what the heck do you think they (corporate packagers) do with
> SAS, S-PLUS, and SPSS, and why do you think it's different...?").
> 
> best,
> -tony

Tony is getting admirably and justifiably tetchy! His comment
about the packaging of SAS, S-PLUS and SPSS is delightfully
to the point (and, had I been there at the time, I'd have
bought him a beverage of his choice in appreciation).

And, to extend his comment, where for instance does that leave
WinBUGS (see my previous mail)? Granted, perhaps, that in the
citation


  Spiegelhalter, D. J., Thomas, A. and Best, N. G. (2000)
WinBUGS Version 1.3 User Manual. Cambridge: Medical Research
Council Biostatistics Unit.
(Available from www.mrc-bsu.cam.ac.uk/bugs.)


one has a prestigious institute (MRC-BSU) as the publisher.
But, basically, it's the same self-justifying ordinance all
over again: the creators of the software produce their own
User Manual and "publish" it. The publisher of S-PLUS and its
documentation is "Insightful Corporation": beyond their
achievement in developing S-PLUS, whence their prestige?
(Granted, again, if you look inside the books, you can find
famous names listed as contributors; but we at R can claim
many of these same names ... ).

The issue of "peer review" of software citations has been
raised. Where is the peer review of the S-PLUS or SAS or
SPSS User Manual? OK, the thing that needs peer review is
the software itself. In the "Review" section of journals
you can find critical reviews of software. But these are
not referee'd and usually represent the writer's own views;
furthermore, such reviews are not often cited in articles
where the software has been used to obtain the results in
the article. People just use the stuff, and cite the User
Manual or the "corporate packager".

Also, from time to time you come across published referee'd
articles which analyse the performance of software for
particular purposes, or compare different software packages
for a given purpose. These could be viewed as peer review,
but, again, are rarely cited by the simple user.

Moreover, suppose that citing User Manual or corporate packager
is considered to justify the use of the software in an article.
I have seen many articles where such-and-such software was used
(I mention no names) and "cited" in these terms, where it is
known to discerning users that the software does not correctly
address the tasks it claims to work for, or gives incorrect
results. Yet the "citation box" had been duly ticked, and the
article has successfully gone through the editorial process.
This is not peer review either -- and could only come close to
it if the journal editors' panel of referees could be assumed
sufficiently knowledgeable to be discriminating about such things.

But in that case you don't really need a citation: the referee
says "Used [e.g.] S-PLUS, yes, that's fine for this job and
I could reproduce the analysis using that software and trust
the results. Passed." And such a referee could say exactly
the same for R.

The true peer review of software is done by discriminating
users of the software. Peer review of R is done *HERE* (as
well as other places).

[I've had the experience of using R for analysis, only to
have the analysis repeated in S-PLUS purely for the purpose
of citing S-PLUS rather than R for publication ... ]

However, the follow-ups in this thread are departing somewhat
from Gordon Smyth's original wish to pay tribute, in citation,
to the growing team of people who have, over the last 10+ years,
contributed centrally to making R what it is.

Thinking about that question, I can't come up with a better
idea than his: an up-to-date "Ihaka and Gentleman", which
does due honour to the greatly enhanced riches of R, and R's

Re: [Rd] Citation for R

2005-06-13 Thread Ted Harding
On 13-Jun-05 Thomas Lumley wrote:
> On Mon, 13 Jun 2005, Gordon K Smyth wrote:
> 
>> This is just a note that R would get a lot more citations if the 
>> recommended citation was an article in a recognised journal or from a 
>> recognised publisher.
>>
> 
> This is unfortunately true, but R is *not* an article or a book, it is
> a 
> piece of software.  I don't think I'm the only person who thinks it is 
> counterproductive in the long run to encourage users to cite an article
> that they probably haven't read instead of citing the software they 
> actually used.
> 
> Jan's suggestion of the Journal of Statistical Software might provide a
> solution, since JSS *does* publish software.
> 
>   -thomas

Is a journal reference necessary? I have seen many articles where
the statistical software (S-Plus, SPSS, SAS, etc.) was "cited" as
the User Manual, usually only available from the supplier of the
software, sometimes with a WWW URL. Such cases provide a precedent
for R, surely. I have also seen cases where the "citation" was
simply the name of the company (with location, version, date etc.)

As an example which, as software, is closer to home, consider the
following quotation, and the corresponding citation:

Quotation:
  "This is essentially a four-level hierarchical model and
   is easily implemented in, say, WinBUGS (Spiegelhalter,
   Thomas and Best, 2000)." [p. 13 of Source reference below]

Citation Reference:
  Spiegelhalter, D. J., Thomas, A. and Best, N. G. (2000)
WinBUGS Version 1.3 User Manual. Cambridge: Medical Research
Council Biostatistics Unit.
(Available from www.mrc-bsu.cam.ac.uk/bugs.)

Source:
  David J. Spiegelhalter, Paul Aylin, Nicola G. Best,
  Stephen J. W. Evans, Gordon D. Murray (2002).
Commissioned analysis of surgical performance by using
routine data: lessons from the Bristol inquiry.
  J. R. Statist. Soc. A (2002) 165, Part 2, pp. 1-31)

Surely this would do? Does R need more justification than
WinBUGS? Are JRSS citations less canonical than other journals?

Best wishes to all,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jun-05   Time: 16:30:35
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel