[R] phyl.RMA error

2020-06-08 Thread Ted Stankowich
Hello,

We're trying to run phylogenetically corrected reduced major axes regression 
analyses and have encountered an error we can't debug. We're using the function 
phyl.RMA in the package 'phytools'. Here is the code we are using and the error 
it returns.



>Model <- phyl.RMA(log(Skull), log(Tusk), tree, h0=1.0)

Error in if (sign(beta1) != sign(h0)) { :

  missing value where TRUE/FALSE needed

We can't seem to figure out which argument is missing, and we've tried 
including all of the T/F based arguments we think are possible. Our species 
dataset and nexus file are printed below.  Any advice would be greatly 
appreciated.

We have the following dataset:
   Binomial                 Skull   Tusk

 1 Tragulus_javanicus        93.7   14.6
 2 Tragulus_kanchil          99.7   13.9
 3 Tragulus_napu             98.1   11.1
 4 Tragulus_nigricans        99.8   13.2
 5 Moschiola_meminna        101.    14.6
 6 Moschus_berezovskii      134.    55.0
 7 Moschus_moschiferus      152.    52.9
 8 Muntiacus_muntjak        193.    26.4
 9 Muntiacus_reevesi        159.    23.4
10 Muntiacus_truongsonensis 184.    27.7
11 Muntiacus_vaginalis      203.    28.6
12 Hydropotes_inermis       162.    48.5
13 Hyemoschus_aquaticus     122.    20.1
14 Elaphodus_cephalophus    186.    17.3

And the following nexus tree:

#NEXUS
[R-package APE, Mon Jun 08 12:20:01 2020]

BEGIN TAXA;
  DIMENSIONS NTAX = 12;
  TAXLABELS
 Tragulus_napu
 Tragulus_kanchil
 Tragulus_javanicus
 Hyemoschus_aquaticus
 Moschiola_meminna
 Muntiacus_reevesi
 Muntiacus_muntjak
 Muntiacus_truongsonensis
 Elaphodus_cephalophus
 Hydropotes_inermis
 Moschus_moschiferus
 Moschus_berezovskii
  ;
END;
BEGIN TREES;
  TRANSLATE
 1  Tragulus_napu,
 2  Tragulus_kanchil,
 3  Tragulus_javanicus,
 4  Hyemoschus_aquaticus,
 5  Moschiola_meminna,
 6  Muntiacus_reevesi,
 7  Muntiacus_muntjak,
 8  Muntiacus_truongsonensis,
 9  Elaphodus_cephalophus,
 10  Hydropotes_inermis,
 11  Moschus_moschiferus,
 12  Moschus_berezovskii
  ;
  TREE * UNTITLED = [] 
((((1:5.540957781,(2:2.978817423,3:2.978817423):2.562139698):10.78911152,4:16.33006601):6.360692368,5:22.69076035):5.725388419,(((((6:1.611149584,7:1.611149848):1.556474893,8:3.167624477):4.130280196,9:7.297904013):1.497063399,10:8.794967413):7.19682079,(11:2.539095678,12:2.539096008):13.45269085):12.42436025);
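
A plausible first check, offered as a guess rather than a verified
diagnosis: the failing test sign(beta1) != sign(h0) means the fitted
slope beta1 came out as NA, and since the data above list 14 species
while the tree has only 12 taxa, a mismatch between trait names and tip
labels is a likely cause. A minimal sketch, assuming the data frame is
loaded as 'dat' and the tree as 'tree':

Skull <- setNames(dat$Skull, dat$Binomial)   # named trait vectors
Tusk  <- setNames(dat$Tusk,  dat$Binomial)
setdiff(names(Skull), tree$tip.label)        # species absent from the tree
setdiff(tree$tip.label, names(Skull))        # tips absent from the data
common <- intersect(names(Skull), tree$tip.label)
tree2  <- ape::drop.tip(tree, setdiff(tree$tip.label, common))
Model  <- phytools::phyl.RMA(log(Skull[common]), log(Tusk[common]), tree2, h0 = 1.0)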



Dr. Ted Stankowich
Associate Professor
Department of Biological Sciences
California State University Long Beach
Long Beach, CA 90840
theodore.stankow...@csulb.edu
562-985-4826
http://www.csulb.edu/mammal-lab/
@CSULBMammalLab






Re: [R] na.omit not omitting rows

2020-06-04 Thread Ted Stankowich
Thanks - a previous response resolved the issue and I'm off and running with 
the analyses. 

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Thursday, June 4, 2020 5:02 PM
To: Ted Stankowich 
Cc: Rui Barradas ; William Dunlap ; 
r-help@r-project.org
Subject: Re: [R] na.omit not omitting rows



Perhaps indexing with rowSums(is.na(dfrm))?
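
A sketch of that indexing, assuming 'dfrm' is the data frame in question:

dfrm_clean <- dfrm[rowSums(is.na(dfrm)) == 0, ]  # keep rows with no NA

Unlike na.omit(), plain subsetting attaches no "na.action" attribute to
the result.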

—
David

Sent from my iPhone

> On Jun 4, 2020, at 4:58 PM, Ted Stankowich  
> wrote:
>
> This worked! Thank you!
>
> -Original Message-
> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
> Sent: Thursday, June 4, 2020 2:49 PM
> To: Ted Stankowich ; William Dunlap 
> 
> Cc: r-help@r-project.org
> Subject: Re: [R] na.omit not omitting rows
>
>
>
> Hello,
>
> If the problem is the "na.action" attribute, here are two ways of solving it.
>
> First, an example data set.
>
> set.seed(2020)  # Make the example reproducible
> UphamComplBinomial <- sprintf("f%003d", 1:356)
> is.na(UphamComplBinomial) <- sample(356, 37)
> DarkEum <- factor(sample(1:2, 356, TRUE))
> Protect1 <- data.frame(UphamComplBinomial = factor(UphamComplBinomial),
>                        DarkEum)
>
>
> 1. Setting the attribute "na.action" to NULL removes it
>
> Protect2 <- na.omit(Protect1)
> attributes(Protect2)
> attr(Protect2, "na.action") <- NULL
> attributes(Protect2)
>
>
> 2. Use an index vector to subset the data
>
> na <- is.na(Protect1$UphamComplBinomial)
> Protect3 <- Protect1[!na, ]
>
>
> The results are identical. But if you have more than one column with NA's, 
> this second way will be more complicated.
>
> identical(Protect2, Protect3)
> #[1] TRUE
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 22:27 on 04/06/20, Ted Stankowich wrote:
>> Thanks, but no that doesn’t work. The na.omit attributes are still in 
>> the dataframe, which you can see in the str outputs from the post. 
>> The problem line is likely:  - attr(*, "na.action")= 'omit' Named int 
>> [1:2] 2 3
>>
>> From: William Dunlap [mailto:wdun...@tibco.com]
>> Sent: Thursday, June 4, 2020 12:39 PM
>> To: Ted Stankowich 
>> Cc: r-help@r-project.org
>> Subject: Re: [R] na.omit not omitting rows
>>
>>
>> Does droplevels() help?
>>
>>> d <- data.frame(size = factor(c("S","M","M","L","L"), 
>>> levels=c("S","M","L")), id=c(101,NA,NA,104,105))
>>> str(d)
>> 'data.frame':   5 obs. of  2 variables:
>>  $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3
>>  $ id  : num  101 NA NA 104 105
>>> str(na.omit(d))
>> 'data.frame':   3 obs. of  2 variables:
>>  $ size: Factor w/ 3 levels "S","M","L": 1 3 3
>>  $ id  : num  101 104 105
>>  - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
>>   ..- attr(*, "names")= chr [1:2] "2" "3"
>>> str(droplevels(na.omit(d)))
>> 'data.frame':   3 obs. of  2 variables:
>>  $ size: Factor w/ 2 levels "S","L": 1 2 2
>>  $ id  : num  101 104 105
>>  - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
>>   ..- attr(*, "names")= chr [1:2] "2" "3"
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com<http://tibco.com>
>>
>>
>> On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich
>> <theodore.stankow...@csulb.edu> wrote:
>> Hello! I'm trying to create a subset of a dataset and then remove all rows 
>> with NAs in them. Ultimately, I am running phylogenetic analyses with trees 
>> that require the tree tiplabels to match exactly with the rows in the 
>> dataframe. But when I use na.omit to delete the rows with NAs, there is 
>> still a trace of those omitted rows in the data.frame, which then causes an 
>> error in the phylogenetic analyses. Is there any way to completely scrub 
>> those omitted rows from the dataframe? The code is below. As you can see 
>> from the result of the final str(Protect1) line, there are attributes with 
>> the omitted features still in the dataframe (356 species names in the 
>> UphamComplBinomial factor, but only 319 observations). These traces are
>> causing errors with the phylo analyses.

Re: [R] na.omit not omitting rows

2020-06-04 Thread Ted Stankowich
This worked! Thank you!

-Original Message-
From: Rui Barradas [mailto:ruipbarra...@sapo.pt] 
Sent: Thursday, June 4, 2020 2:49 PM
To: Ted Stankowich ; William Dunlap 

Cc: r-help@r-project.org
Subject: Re: [R] na.omit not omitting rows



Hello,

If the problem is the "na.action" attribute, here are two ways of solving it.

First, an example data set.

set.seed(2020)  # Make the example reproducible
UphamComplBinomial <- sprintf("f%003d", 1:356)
is.na(UphamComplBinomial) <- sample(356, 37)
DarkEum <- factor(sample(1:2, 356, TRUE))
Protect1 <- data.frame(UphamComplBinomial = factor(UphamComplBinomial),
                       DarkEum)


1. Setting the attribute "na.action" to NULL removes it

Protect2 <- na.omit(Protect1)
attributes(Protect2)
attr(Protect2, "na.action") <- NULL
attributes(Protect2)


2. Use an index vector to subset the data

na <- is.na(Protect1$UphamComplBinomial)
Protect3 <- Protect1[!na, ]


The results are identical. But if you have more than one column with NA's, this 
second way will be more complicated.

identical(Protect2, Protect3)
#[1] TRUE
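
A hedged addendum for the multi-column case: base R's complete.cases()
keeps the index-vector approach simple, whatever the number of columns
holding NA's:

na_any <- !complete.cases(Protect1)   # TRUE for rows with an NA anywhere
Protect4 <- Protect1[!na_any, ]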


Hope this helps,

Rui Barradas

At 22:27 on 04/06/20, Ted Stankowich wrote:
> Thanks, but no that doesn’t work. The na.omit attributes are still in 
> the dataframe, which you can see in the str outputs from the post. The 
> problem line is likely:  - attr(*, "na.action")= 'omit' Named int 
> [1:2] 2 3
>
> From: William Dunlap [mailto:wdun...@tibco.com]
> Sent: Thursday, June 4, 2020 12:39 PM
> To: Ted Stankowich 
> Cc: r-help@r-project.org
> Subject: Re: [R] na.omit not omitting rows
>
>
> Does droplevels() help?
>
>> d <- data.frame(size = factor(c("S","M","M","L","L"), 
>> levels=c("S","M","L")), id=c(101,NA,NA,104,105))
>> str(d)
> 'data.frame':   5 obs. of  2 variables:
>   $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3
>   $ id  : num  101 NA NA 104 105
>> str(na.omit(d))
> 'data.frame':   3 obs. of  2 variables:
>   $ size: Factor w/ 3 levels "S","M","L": 1 3 3
>   $ id  : num  101 104 105
>   - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
>..- attr(*, "names")= chr [1:2] "2" "3"
>> str(droplevels(na.omit(d)))
> 'data.frame':   3 obs. of  2 variables:
>   $ size: Factor w/ 2 levels "S","L": 1 2 2
>   $ id  : num  101 104 105
>   - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
>..- attr(*, "names")= chr [1:2] "2" "3"
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich
> <theodore.stankow...@csulb.edu> wrote:
> Hello! I'm trying to create a subset of a dataset and then remove all rows 
> with NAs in them. Ultimately, I am running phylogenetic analyses with trees 
> that require the tree tiplabels to match exactly with the rows in the 
> dataframe. But when I use na.omit to delete the rows with NAs, there is still 
> a trace of those omitted rows in the data.frame, which then causes an error 
> in the phylogenetic analyses. Is there any way to completely scrub those 
> omitted rows from the dataframe? The code is below. As you can see from the 
> result of the final str(Protect1) line, there are attributes with the omitted 
> features still in the dataframe (356 species names in the UphamComplBinomial 
> factor, but only 319 observations). These traces are causing errors with the 
> phylo analyses.
>
>> Protect1 = as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade))
>> # Create the dataframe with variables of interest from an attached dataset
>> row.names(Protect1) = Protect1$UphamComplBinomial  # assign species names as rownames
>> Protect1 = as.data.frame(na.omit(Protect1))  # drop rows with missing data
>> str(Protect1)
> 'data.frame': 319 obs. of  4 variables:
>   $ UphamComplBinomial: Factor w/ 356 levels 
> "Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 
> 11 12 ...
>   $ DarkEum   : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
>   $ NoctCrep  : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
>   $ Shade : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53
> 17 49 52 52 39 39 41 ...

Re: [R] na.omit not omitting rows

2020-06-04 Thread Ted Stankowich
Thanks, but no that doesn’t work. The na.omit attributes are still in the 
dataframe, which you can see in the str outputs from the post. The problem line 
is likely:  - attr(*, "na.action")= 'omit' Named int [1:2] 2 3

From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Thursday, June 4, 2020 12:39 PM
To: Ted Stankowich 
Cc: r-help@r-project.org
Subject: Re: [R] na.omit not omitting rows


Does droplevels() help?

> d <- data.frame(size = factor(c("S","M","M","L","L"), levels=c("S","M","L")), 
> id=c(101,NA,NA,104,105))
> str(d)
'data.frame':   5 obs. of  2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3
 $ id  : num  101 NA NA 104 105
> str(na.omit(d))
'data.frame':   3 obs. of  2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 3 3
 $ id  : num  101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
  ..- attr(*, "names")= chr [1:2] "2" "3"
> str(droplevels(na.omit(d)))
'data.frame':   3 obs. of  2 variables:
 $ size: Factor w/ 2 levels "S","L": 1 2 2
 $ id  : num  101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
  ..- attr(*, "names")= chr [1:2] "2" "3"

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich
<theodore.stankow...@csulb.edu> wrote:
Hello! I'm trying to create a subset of a dataset and then remove all rows with 
NAs in them. Ultimately, I am running phylogenetic analyses with trees that 
require the tree tiplabels to match exactly with the rows in the dataframe. But 
when I use na.omit to delete the rows with NAs, there is still a trace of those 
omitted rows in the data.frame, which then causes an error in the phylogenetic 
analyses. Is there any way to completely scrub those omitted rows from the 
dataframe? The code is below. As you can see from the result of the final 
str(Protect1) line, there are attributes with the omitted features still in the 
dataframe (356 species names in the UphamComplBinomial factor, but only 319 
observations). These traces are causing errors with the phylo analyses.

> Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade))  
> #Create the dataframe with variables of interest from an attached dataset
> row.names(Protect1)=Protect1$UphamComplBinomial #assign species names as 
> rownames
> Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing data
> str(Protect1)
'data.frame': 319 obs. of  4 variables:
 $ UphamComplBinomial: Factor w/ 356 levels 
"Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 11 
12 ...
 $ DarkEum   : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
 $ NoctCrep  : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
 $ Shade : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53 17 
49 52 52 39 39 41 ...
 - attr(*, "na.action")= 'omit' Named int  6 7 23 36 37 40 42 50 51 60 ...
  ..- attr(*, "names")= chr  "Alouatta_macconnelli_ATELIDAE_PRIMATES" 
"Alouatta_nigerrima_ATELIDAE_PRIMATES" "Ateles_fusciceps_ATELIDAE_PRIMATES" 
"Callicebus_baptista_PITHECIIDAE_PRIMATES" ...

Dr. Ted Stankowich
Associate Professor
Department of Biological Sciences
California State University Long Beach
Long Beach, CA 90840
theodore.stankow...@csulb.edu
562-985-4826
http://www.csulb.edu/mammal-lab/
@CSULBMammalLab






[R] na.omit not omitting rows

2020-06-04 Thread Ted Stankowich
Hello! I'm trying to create a subset of a dataset and then remove all rows with 
NAs in them. Ultimately, I am running phylogenetic analyses with trees that 
require the tree tiplabels to match exactly with the rows in the dataframe. But 
when I use na.omit to delete the rows with NAs, there is still a trace of those 
omitted rows in the data.frame, which then causes an error in the phylogenetic 
analyses. Is there any way to completely scrub those omitted rows from the 
dataframe? The code is below. As you can see from the result of the final 
str(Protect1) line, there are attributes with the omitted features still in the 
dataframe (356 species names in the UphamComplBinomial factor, but only 319 
observations). These traces are causing errors with the phylo analyses.

> Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade))  
> #Create the dataframe with variables of interest from an attached dataset
> row.names(Protect1)=Protect1$UphamComplBinomial #assign species names as 
> rownames
> Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing data
> str(Protect1)
'data.frame': 319 obs. of  4 variables:
 $ UphamComplBinomial: Factor w/ 356 levels 
"Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 11 
12 ...
 $ DarkEum   : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
 $ NoctCrep  : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
 $ Shade : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53 17 
49 52 52 39 39 41 ...
 - attr(*, "na.action")= 'omit' Named int  6 7 23 36 37 40 42 50 51 60 ...
  ..- attr(*, "names")= chr  "Alouatta_macconnelli_ATELIDAE_PRIMATES" 
"Alouatta_nigerrima_ATELIDAE_PRIMATES" "Ateles_fusciceps_ATELIDAE_PRIMATES" 
"Callicebus_baptista_PITHECIIDAE_PRIMATES" ...

Dr. Ted Stankowich
Associate Professor
Department of Biological Sciences
California State University Long Beach
Long Beach, CA 90840
theodore.stankow...@csulb.edu
562-985-4826
http://www.csulb.edu/mammal-lab/
@CSULBMammalLab






Re: [R] ks.test ; impossible to calculate exact exact value with ex-aequos

2018-12-10 Thread Ted Harding
On Mon, 2018-12-10 at 22:17 +0100, Fatma Ell wrote:
> Dear all,
> I'm trying to use ks.test in order to compare two curves. I have 0 values; I
> think this is why I have the following warning: impossible to calculate
> exact value with ex-aequos
> 
> a=c(3.02040816326531, 7.95918367346939, 10.6162790697674, 4.64150943396226,
> 1.86538461538462, 1.125, 1.01020408163265, 1.2093023255814,
> 0.292452830188679,
> 0, 0, 0)
> b=c(2.30769230769231, 4.19252873563218, 5.81924882629108, 6.2248243559719,
> 5.02682926829268, 4.50728862973761, 3.61741424802111, 5.05479452054795,
> 3.68095238095238, 1.875, 5.25, 0)
> 
> ks.test(a,b)
> 
> data:  a and b
> D = 0.58333, p-value = 0.0337
> alternative hypothesis: two-sided
> 
> Warning message:
> In ks.test(a, b) :
> impossible to calculate exact value with ex-aequos
> 
> Does the p-value is correct ? Otherwise, how to solve this issue ?
> Thanks a lot.

The warning arises, not because you have "0" values as such,
but because there are repeated values (which happen to be 0).

The K-S test is designed for continuous random variables, for
which the probability of repeated values is (theoretically) zero:
theoretically, they can't happen.

From the help page ?ks.test :

"The presence of ties always generates a warning, since continuous
distributions do not generate them. If the ties arose from
rounding the tests may be approximately valid, but even modest
amounts of rounding can have a significant effect on the
calculated statistic."
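
A hedged aside, assuming the vectors a and b above: the reported p-value
is already the asymptotic one, since the exact computation is abandoned
when ties are present. Passing

  ks.test(a, b, exact = FALSE)

makes that choice explicit (whether it also silences the warning depends
on the R version).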



The fact that your sample 'a' has three zeros along with
nine other values which are all different from 0 (and all *very*
different from 0 except for 0.292452830188679), along with the fact
that your sample 'b' has 11 values all *very* different from 0
and one final value equal to 0, together with the fact that
in each sample the '0' values occur at the end, strongly suggests
that the data source is not such that a K-S test is suitable.

The K-S test is a non-parametric test for whether
  a) a given sample comes from a given kind of distribution;
or
  b) two samples are drawn from the same distribution.
In either case, it is assumed that the sample values are drawn
independently of each other; if there is some reason why they
may not be independent of each other, the test is not valid.

You say "I'm trying to use ks.test in order to compare two curve".
When I ezecute
  plot(a)
  plot(b)
on your data, I see (approximately) in each case a rise from a
medium vale (~2 or ~3) to a higher vale {~6 or ~10) followed
by a decline down to an exact 0.

This is not the sort of situation that the K-S test is for!
Hoping this helps,
Ted.



Re: [R] Ignoring the domain of RV in punif()

2018-10-23 Thread Ted Harding
Well, as a final (I hope!) clarification: It is not the case that
"the bigger cube does not exists (because the Q is the reference
space)". It does exist! Simply, the probability of the random point
being in the bigger cube, and NOT in the cube Q, is 0.

Hence "the cumulative probability of reaching a point outside the
cube (u or v or w  > A) is 1" is badly phrased. The "cumulative
probability" is not the probability of *reaching* a point, but of
being (in the case of a real random variable) less than or equal
to the given value. If Prob[X <= x1] = 1, then Prob[X > x1] = 0.
Hence if x0 is the minimum value such that Prob[X <= x0] = 1,
then X "can reach" x0. But for any x1 > x0, Prob[x0 < X <= x1] = 0.
Therefore, since X cannot be greater than x0, X *cannot reach* x1!

Best wishes,
Ted.

On Tue, 2018-10-23 at 12:06 +0100, Hamed Ha wrote:
> Hi Ted,
> 
> Thanks for the explanation.
> 
> I am convinced at least more than average by Eric and your answer. But
> still have some shadows of confusion that is definitely because I have
> forgotten some fundamentals in probabilities.
> 
> In your cube example, the cumulative probability of reaching a point
> outside the cube (u or v or w > A) is 1; however, the bigger cube does not
> exist (because Q is the reference space). In other words, I feel that we
> extend the space to accommodate any cube of any size! Looks a bit weird to
> me!
> 
> 
> Hamed.
> 
> On Tue, 23 Oct 2018 at 11:52, Ted Harding  wrote:
> 
> > Sorry -- stupid typos in my definition below!
> > See at ===*** below.
> >
> > On Tue, 2018-10-23 at 11:41 +0100, Ted Harding wrote:
> > Before the ticket finally enters the waste bin, I think it is
> > necessary to explicitly explain what is meant by the "domain"
> > of a random variable. This is not (though in special cases
> > could be) the space of possible values of the random variable.
> >
> > Definition of (real-valued) Random Variable (RV):
> > Let Z be a probability space, i.e. a set {z} of entities z
> > on which a probability distribution is defined. The entities z
> > do not need to be numeric. A real-valued RV X is a function
> > X:Z --> R defined on Z such that, for any z in Z, X(z) is a
> > real number. The set Z, in this context, is (by definition)
> > the *domain* of X, i.e. the space on which X is defined.
> > It may or may not be (and usually is not) the same as the set
> > of possible values of X.
> >
> > Then, given any real value x0, the CDF of X at x0 is Prob[X <= x0].
> > The distribution function of X does not define the domain of X.
> >
> > As a simple example: Suppose Q is a cube of side A, consisting of
> > points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space
> > of points z with a uniform distribution of position within Q.
> > Define the random variable X:Q --> [0,1] as
> > ===***
> >   X(u,v,w) = x/A
> >
> > Wrong! That should have been:
> >
> >   X(u,v,w) = w/A
> > ===***
> > Then X is uniformly distributed on [0,1], the domain of X is Q.
> > Then for x <= 0, Prob[X <= x] = 0; for 0 <= x <= 1, Prob[X <= x] = x;
> > for x >= 1, Prob[X <= x] = 1. These define the CDF. The set of possible
> > values of X is 1-dimensional, and is not the same as the domain of X,
> > which is 3-dimensional.
> >
> > Hoping this helps!
> > Ted.
> >
> > On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote:
> > > > Yes, now it makes more sense.
> > > >
> > > > Okay, I think that I am convinced and we can close this ticket.
> > > >
> > > > Thanks Eric.
> > > > Regards,
> > > > Hamed.
> > > >
> > > > On Tue, 23 Oct 2018 at 10:42, Eric Berger 
> > wrote:
> > > >
> > > > > Hi Hamed,
> > > > > That reference is sloppy. Try looking at
> > > > > https://en.wikipedia.org/wiki/Cumulative_distribution_function
> > > > > and in particular the first example which deals with a Unif[0,1] r.v.
> > > > >
> > > > > Best,
> > > > > Eric
> > > > >
> > > > >
> > > > > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha 
> > wrote:
> > > > >
> > > > >> Hi Eric,
> > > > >>
> > > > >> Thank you for your reply.
> > > > >>
> > > >> I should say that your justification makes sense to me. However, I am in
> > > >> doubt that the CDF is defined by Pr(X <= x) for all x? That is, the
> > > >> domain of the RV is totally ignored in the definition.

Re: [R] Ignoring the domain of RV in punif()

2018-10-23 Thread Ted Harding
Sorry -- stupid typos in my definition below!
See at ===*** below.

On Tue, 2018-10-23 at 11:41 +0100, Ted Harding wrote:
Before the ticket finally enters the waste bin, I think it is
necessary to explicitly explain what is meant by the "domain"
of a random variable. This is not (though in special cases
could be) the space of possible values of the random variable.

Definition of (real-valued) Random Variable (RV):
Let Z be a probability space, i.e. a set {z} of entities z
on which a probability distribution is defined. The entities z
do not need to be numeric. A real-valued RV X is a function
X:Z --> R defined on Z such that, for any z in Z, X(z) is a
real number. The set Z, in this context, is (by definition)
the *domain* of X, i.e. the space on which X is defined.
It may or may not be (and usually is not) the same as the set
of possible values of X.

Then, given any real value x0, the CDF of X at x0 is Prob[X <= x0].
The distribution function of X does not define the domain of X.

As a simple example: Suppose Q is a cube of side A, consisting of
points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space
of points z with a uniform distribution of position within Q.
Define the random variable X:Q --> [0,1] as
===***
  X(u,v,w) = x/A

Wrong! That should have been:

  X(u,v,w) = w/A
===***
Then X is uniformly distributed on [0,1], the domain of X is Q.
Then for x <= 0, Prob[X <= x] = 0; for 0 <= x <= 1, Prob[X <= x] = x;
for x >= 1, Prob[X <= x] = 1. These define the CDF. The set of possible
values of X is 1-dimensional, and is not the same as the domain of X,
which is 3-dimensional.

Hoping this helps!
Ted.

On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote:
> > Yes, now it makes more sense.
> > 
> > Okay, I think that I am convinced and we can close this ticket.
> > 
> > Thanks Eric.
> > Regards,
> > Hamed.
> > 
> > On Tue, 23 Oct 2018 at 10:42, Eric Berger  wrote:
> > 
> > > Hi Hamed,
> > > That reference is sloppy. Try looking at
> > > https://en.wikipedia.org/wiki/Cumulative_distribution_function
> > > and in particular the first example which deals with a Unif[0,1] r.v.
> > >
> > > Best,
> > > Eric
> > >
> > >
> > > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha  wrote:
> > >
> > >> Hi Eric,
> > >>
> > >> Thank you for your reply.
> > >>
> > >> I should say that your justification makes sense to me. However, I am in
> > >> doubt that the CDF is defined by Pr(X <= x) for all x? That is, the
> > >> domain of the RV is totally ignored in the definition.
> > >>
> > >> It makes a conflict between the formula and the theoretical definition.
> > >>
> > >> Please see page 115 in
> > >>
> > >> https://books.google.co.uk/books?id=FEE8D1tRl30C=frontcover=statistical+distribution=en=X=0ahUKEwjp3PGZmJzeAhUQqxoKHV7OBJgQ6AEIKTAA#v=onepage=uniform=false
> > >> The
> > >>
> > >>
> > >> Thanks.
> > >> Hamed.
> > >>
> > >>
> > >>
> > >> On Tue, 23 Oct 2018 at 10:21, Eric Berger  wrote:
> > >>
> > >>> Hi Hamed,
> > >>> I disagree with your criticism.
> > >>> For a random variable X
> > >>> X: D - - - > R
> > >>> its CDF F is defined by
> > >>> F: R - - - > [0,1]
> > >>> F(z) = Prob(X <= z)
> > >>>
> > >>> The fact that you wrote a convenient formula for the CDF
> > >>> F(z) = (z-a)/(b-a)  a <= z <= b
> > >>> in a particular range for z is your decision, and as you noted this
> > >>> formula will give the wrong value for z outside the interval [a,b].
> > >>> But the problem lies in your formula, not the definition of the CDF
> > >>> which would be, in your case:
> > >>>
> > >>> F(z) = 0             if z <= a
> > >>>      = (z-a)/(b-a)   if a <= z <= b
> > >>>      = 1             if b <= z
> > >>>
> > >>> HTH,
> > >>> Eric
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Oct 23, 2018 at 12:05 PM Hamed Ha  wrote:
> > >>>
> > >>>> Hi All,
> > >>>>
> > >>>> I recently discovered an interesting issue with the punif() function.
> > >>>> Let
> > >>>> X~Uniform[a,b] then the CDF is defined by F(x)=(x-a)/(b-a) for (a <= x <=
> > >>>> b).

Re: [R] Ignoring the domain of RV in punif()

2018-10-23 Thread Ted Harding
Before the ticket finally enters the waste bin, I think it is
necessary to explicitly explain what is meant by the "domain"
of a random variable. This is not (though in special cases
could be) the space of possible values of the random variable.

Definition of (real-valued) Random Variable (RV):
Let Z be a probability space, i.e. a set {z} of entities z
on which a probability distribution is defined. The entities z
do not need to be numeric. A real-valued RV X is a function
X:Z --> R defined on Z such that, for any z in Z, X(z) is a
real number. The set Z, in this context, is (by definition)
the *domain* of X, i.e. the space on which X is defined.
It may or may not be (and usually is not) the same as the set
of possible values of X.

Then, given any real value x0, the CDF of X at x0 is Prob[X <= x0].
The distribution function of X does not define the domain of X.

As a simple example: Suppose Q is a cube of side A, consisting of
points z=(u,v,w) with 0 <= u,v,w <= A. Z is the probability space
of points z with a uniform distribution of position within Q.
Define the random variable X:Q --> [0,1] as
  X(u,v,w) = x/A
Then X is uniformly distributed on [0,1], the domain of X is Q.
Then for x <= 0, Prob[X <= x] = 0; for 0 <= x <= 1, Prob[X <= x] = x;
for x >= 1, Prob[X <= x] = 1. These define the CDF. The set of possible
values of X is 1-dimensional, and is not the same as the domain of X,
which is 3-dimensional.

Hoping this helps!
Ted.
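
A small numerical illustration of the same point, using punif() itself
(an addition to Ted's text; the values follow from the definition above):

punif(c(-1, 0.25, 2), min = 0, max = 1)
# [1] 0.00 0.25 1.00  -- 0 below the support, the identity inside, 1 above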

On Tue, 2018-10-23 at 10:54 +0100, Hamed Ha wrote:
> Yes, now it makes more sense.
> 
> Okay, I think that I am convinced and we can close this ticket.
> 
> Thanks Eric.
> Regards,
> Hamed.
> 
> On Tue, 23 Oct 2018 at 10:42, Eric Berger  wrote:
> 
> > Hi Hamed,
> > That reference is sloppy. Try looking at
> > https://en.wikipedia.org/wiki/Cumulative_distribution_function
> > and in particular the first example which deals with a Unif[0,1] r.v.
> >
> > Best,
> > Eric
> >
> >
> > On Tue, Oct 23, 2018 at 12:35 PM Hamed Ha  wrote:
> >
> >> Hi Eric,
> >>
> >> Thank you for your reply.
> >>
> >> I should say that your justification makes sense to me. However, I am in
> >> doubt that the CDF is defined by Pr(X <= x) for all x? That is, the
> >> domain of the RV is totally ignored in the definition.
> >>
> >> It makes a conflict between the formula and the theoretical definition.
> >>
> >> Please see page 115 in
> >>
> >> https://books.google.co.uk/books?id=FEE8D1tRl30C=frontcover=statistical+distribution=en=X=0ahUKEwjp3PGZmJzeAhUQqxoKHV7OBJgQ6AEIKTAA#v=onepage=uniform=false
> >> The
> >>
> >>
> >> Thanks.
> >> Hamed.
> >>
> >>
> >>
> >> On Tue, 23 Oct 2018 at 10:21, Eric Berger  wrote:
> >>
> >>> Hi Hamed,
> >>> I disagree with your criticism.
> >>> For a random variable X
> >>> X: D - - - > R
> >>> its CDF F is defined by
> >>> F: R - - - > [0,1]
> >>> F(z) = Prob(X <= z)
> >>>
> >>> The fact that you wrote a convenient formula for the CDF
> >>> F(z) = (z-a)/(b-a)  a <= z <= b
> >>> in a particular range for z is your decision, and as you noted this
> >>> formula will give the wrong value for z outside the interval [a,b].
> >>> But the problem lies in your formula, not the definition of the CDF
> >>> which would be, in your case:
> >>>
> >>> F(z) = 0             if z <= a
> >>>      = (z-a)/(b-a)   if a <= z <= b
> >>>      = 1             if b <= z
> >>>
> >>> HTH,
> >>> Eric
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Oct 23, 2018 at 12:05 PM Hamed Ha  wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> I recently discovered an interesting issue with the punif() function.
> >>>> Let
> >>>> X~Uniform[a,b] then the CDF is defined by F(x)=(x-a)/(b-a) for (a <= x <=
> >>>> b).
> >>>> The important fact here is the domain of the random variable X. Having
> >>>> said
> >>>> that, R returns CDF for any value in the real domain.
> >>>>
> >>>> I understand that one can justify this by extending the domain of X and
> >>>> assigning zero probabilities to the values outside the domain. However,
> >>>> theoretically, it is not true to return a value for the CDF outside the
> >>>> domain. Then I propose a patch to R function punif() t

Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Ted Harding
I think that one can usefully look at this question from the
point of view of what "NaN" and "NA" are abbreviations for
(at any rate, according to the understanding I have adopted
since many years -- maybe over-simplified).

NaN: Not a Number
NA: Not Available

So NA is typically used for missing values, whereas NaN
represents the results of numerical calculations which
cannot give a result which is a definite number.

Hence 0/0 is not a number, so NaN; similarly Inf/Inf.

Thus, with your x <- c(NA, NA, NA), consider mean(x, na.rm=TRUE):
sum(x, na.rm=TRUE) = 0, and since the set of values of x
with na.rm=TRUE is empty, the number of elements
in x is 0; hence mean = 0/0 = NaN.

But for median(x, na.rm=TRUE), because there are no available
elements in x with na.rm=TRUE, and the median is found by
searching among available elements for the value which
divides the set of values into two halves, the median
is not available, hence NA.

Best wishes to all,
Ted.
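
A minimal console check of that reasoning (a sketch using base R only):

x <- c(NA, NA, NA)
sum(x, na.rm = TRUE)     # 0: the sum over an empty set
sum(!is.na(x))           # 0: no available elements
sum(x, na.rm = TRUE) / sum(!is.na(x))   # 0/0 = NaN, matching mean()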

On Wed, 2018-08-22 at 11:24 -0400, Marc Schwartz via R-help wrote:
> Hi,
> 
> It might even be worthwhile to review this recent thread on R-Devel:
> 
>   https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html
> 
> which touches upon a subtly related topic vis-a-vis NaN handling.
> 
> Regards,
> 
> Marc Schwartz
> 
> 
> > On Aug 22, 2018, at 10:55 AM, Bert Gunter  wrote:
> > 
> > ... And FWIW (not much, I agree), note that if z = numeric(0) and sum(z) =
> > 0, then mean(z) = NaN makes sense, as length(z) = 0, so dividing by 0 gives
> > NaN. So you can see the sorts of issues you may need to consider.
> > 
> > Bert Gunter
> > 
> > "The trouble with having an open mind is that people keep coming along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > 
> > 
> > On Wed, Aug 22, 2018 at 7:47 AM Bert Gunter  wrote:
> > 
> >> Actually, the dissonance is a bit more basic.
> >> 
> >> After xxx(, na.rm=TRUE) with all NA's in ... you have numeric(0). So
> >> what you see is actually:
> >> 
> >>> z <- numeric(0)
> >>> mean(z)
> >> [1] NaN
> >>> median(z)
> >> [1] NA
> >>> sd(z)
> >> [1] NA
> >>> sum(z)
> >> [1] 0
> >> etc.
> >> 
> >> I imagine that there may be more of these little inconsistencies due to
> >> the organic way R evolved over time. What the conventions should be  can be
> >> purely a matter of personal opinion in the absence of accepted standards.
> >> But I would look to see what accepted standards were, if any, first.
> >> 
> >> -- Bert
> >> 
> >> 
> >> On Wed, Aug 22, 2018 at 7:34 AM Ivan Calandra  wrote:
> >> 
> >>> Dear useRs,
> >>> 
> >>> I have just noticed that when input is only NA with na.rm=TRUE, mean()
> >>> results in NaN, whereas median() and sd() produce NA. Shouldn't it all
> >>> be the same? I think NA makes more sense than NaN in that case.
> >>> 
> >>> x <- c(NA, NA, NA)
> >>> mean(x, na.rm=TRUE)
> >>> [1] NaN
> >>> median(x, na.rm=TRUE)
> >>> [1] NA
> >>> sd(x, na.rm=TRUE)
> >>> [1] NA
> >>> 
> >>> Thanks for any feedback.
> >>> 
> >>> Best,
> >>> Ivan
> >>> 
> >>> --
> >>> Dr. Ivan Calandra
> >>> TraCEr, laboratory for Traceology and Controlled Experiments
> >>> MONREPOS Archaeological Research Centre and
> >>> Museum for Human Behavioural Evolution
> >>> Schloss Monrepos
> >>> 56567 Neuwied, Germany
> >>> +49 (0) 2631 9772-243
> >>> https://www.researchgate.net/profile/Ivan_Calandra
> >>> 
> >> 
> > 



Re: [R] Error custom indicator Quantstrat colnames

2018-07-14 Thread Ted Harding
Pietro,
Please post this to r-help@r-project.org
not to r-help-ow...@r-project.org
which is a mailing list concerned with list management, and
does not deal with questions regarding the use of R.
Best wishes,
Ted.

On Sat, 2018-07-14 at 13:04 +, Pietro Fabbro via R-help wrote:
> I will try to be as clear as possible as I have been rebuked by some users. I 
> deleted the last questions and I will try to be sufficiently explicative in 
> this one. I apologize if the data I will insert will not be enough.
> 
> So, I am trying to run a strategy through the package Quantstrat.
> 
> install.packages("quantstrat")
> My problem is that I get the following error
> 
> Error in `colnames<-`(tmp, value = seq(ncol(tmp_val))) :
> attempt to set 'colnames' on an object with less than two dimensions
> 
> when I try to run the following command:
> 
> > out <- applyStrategy(strategy=strategy.st,portfolios=portfolio.st)
> I do not have this problem if I use, as indicator, one or more indicators, 
> which are already defined by the package TTR.
> 
> I have this error only when I try to use a custom indicator. Here is the code 
> for the custom indicator that I use:
> 
> wma <- WMA(Cl(mktdata), 4, wts = c(1:4))
> wmamaxt <- rollmaxr(wma, 30, fill = NA)
> wmamint <- - rollmaxr(- wma, 30, fill = NA)
> CNOwma <- function (mktdata = quote(mktdata), x) {
>   (wma - wmamint) / (wmamaxt - wmamint)
> }
> Please refer to the following code:
> 
> library(devtools)
> library(quantmod)
> library(quantstrat)
> library(TTR)
> library(png)
> library(IKTrading)
> 
> wma <-  WMA(Cl(mktdata), 4, wts=c(1:4)) 
> wmamaxt <- rollmaxr(wma, 30, fill = NA)
> wmamint <- - rollmaxr(- wma, 30, fill = NA)
> CNOwma <- function (mktdata = quote(mktdata), x) {
>   (wma - wmamint) / (wmamaxt - wmamint)
> }
> initdate <- "2010-01-01"
> from <- "2012-01-01" #start of backtest
> to <- "2017-31-12" #end of backtest
> 
> Sys.setenv(TZ= "EST") #Set up environment for timestamps
> 
> currency("USD") #Set up environment for currency to be used
> 
> symbols <- c("RUT", "IXIC") #symbols used in our backtest
> getSymbols(Symbols = symbols, src = "google", from=from, to=to, adjust = 
> TRUE) #receive data from google finance,  adjusted for splits/dividends
> 
> stock(symbols, currency = "USD", multiplier = 1) #tells quanstrat what 
> instruments present and what currency to use
> 
> tradesize <-1 #default trade size
> initeq <- 10 #default initial equity in our portfolio
> 
> strategy.st <- portfolio.st <- account.st <- "firststrat" #naming strategy, 
> portfolio and account
> 
> #removes old portfolio and strategy from environment
> rm.strat(portfolio.st)
> rm.strat(strategy.st) 
> 
> #initialize portfolio, account, orders and strategy objects
> initPortf(portfolio.st, symbols = symbols, initDate = initdate, currency = 
> "USD")
> 
> initAcct(account.st, portfolios = portfolio.st, initDate = initdate, currency 
> = "USD", initEq = initeq)
> 
> initOrders(portfolio.st, initDate = initdate)
> strategy(strategy.st, store=TRUE)
> 
> add.indicator(strategy = strategy.st,
> name = 'CNOwma',
> arguments = list(x = quote(Cl(mktdata)), n=4),
> label = 'CNOwma4')
> 
> 
> 
> 
> 
> add.signal(strategy.st, name = "sigThreshold",
> arguments = list(column = "CNOwma4", threshold = 0.6,
> relationship = "gt", cross = TRUE),
> label = "longthreshold")
> 
> 
> add.signal(strategy.st, name = "sigThreshold",
> arguments = list(column = "CNOwma4", threshold = 0.6,
> relationship = "lt", cross = TRUE),
> label = "shortthreshold")
> 
> 
> 
> 
> add.rule(strategy.st, name = "ruleSignal",
> arguments = list(sigcol = "longthreshold", sigval = TRUE,
> orderqty = "all", ordertype = "market",
> orderside = "long", replace = FALSE,
> prefer = "Open"),
> type = "enter")
> 
> 
> add.rule(strategy.st, name = "ruleSignal",
> arguments = list(sigcol = "shortthreshold", sigval = TRUE,
> orderqty = "all", ordertype = "market",
> orderside = "long", replace = FALSE,
> prefer = "Open"),
> type = "exit")
> 
> add.rule(strategy.st, name = "ruleSignal",
> arguments = list(sigcol = "shortthreshold", sigval = TRUE,
> orderqty = "all", ordertype = "market",
> orderside = "short", replace = FALSE,
> prefer =
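
A hedged footnote on the error above, an interpretation rather than a
tested fix: the error comes from a colnames<-() call on whatever the
indicator returns, so a custom indicator should compute everything from
its own arguments and return a named xts/matrix object, not a plain
vector built from globals. A sketch of a self-contained CNOwma
(argument names are my own):

CNOwma <- function(x, n = 4, lookback = 30) {
  wma    <- TTR::WMA(x, n = n, wts = 1:n)             # weighted moving average
  wmamax <- zoo::rollmaxr(wma, lookback, fill = NA)   # running max of the WMA
  wmamin <- -zoo::rollmaxr(-wma, lookback, fill = NA) # running min via -max(-x)
  out <- (wma - wmamin) / (wmamax - wmamin)           # scale to [0, 1]
  colnames(out) <- "CNOwma"
  out
}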

Re: [R] prod(NaN, NA) vs. prod(NA, NaN)

2018-07-04 Thread Ted Harding
I've been following this thread, and wondering where it might lead.
My (possibly naive) view of these matters is basically logical,
relying on (possibly over-simplified) interpretations of "NA" and "NaN".

These are that:
  "NaN" means "Not a Number", though it can result from a
numerical calculation, e.g. '0/0' or 'Inf/Inf', while:
  "NA" means "Not Available" (e.g. "Missing Value"), but
   should be interpreted as in the appropriate class of its
   context -- so '2 + NA' interprets "NA" as numeric,
   while 'c("abc", NA)' interprets "NA" as character.

On that basis, the result of 'sum(NaN, )' should be
"NaN", since 'sum' presumes that its arguments are numeric,
and the sum of  is not a number.
Likewise 'sum(, NaN)' should be NaN.

And in both of 'sum(NA, NaN) and sum(NaN, NA), the "NA" should
be interpreted as a "not-available number", and because
of the "NaN" the result cannot be a number, hence is "NaN".

So it would seem that Martin Møller Skarbiniks Pedersen's
inconsistency:
  sum(c(NaN,NA))
  [1] NaN
  sum(NaN,NA)
  [1] NA
is not consistent with the above reasoning.

However, in my R version 2.14.0 (2011-10-31):
  sum(NaN,NA)
  [1] NA
  sum(NA,NaN)
  [1] NA
which **is** consistent! Hmmm...
Best wishes to all,
Ted.
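
A quick way to explore this at the console (an addition; as ?NaN warns,
which of NA or NaN is returned is platform-dependent, so results may
differ from those quoted in this thread):

prod(NA, NaN); prod(NaN, NA)   # NA then NaN in the original report
sum(NA, NaN);  sum(NaN, NA)    # both NA on R 3.4.4 and R 2.14.0
sum(c(NaN, NA))                # NaN in Barry's example below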

On Wed, 2018-07-04 at 12:06 +0100, Barry Rowlingson wrote: 
> I'm having deja-vu of a similar discussion on R-devel:
> 
> https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html
> 
> This was the funniest inconsistency I could find:
> 
>  > sum(c(NaN,NA))
>  [1] NaN
>  > sum(NaN,NA)
>  [1] NA
> 
> THEY'RE IN THE SAME ORDER!!!
> 
> The doc in ?NaN has this clause:
> 
>  In R, basically all mathematical functions (including basic
>  ‘Arithmetic’), are supposed to work properly with ‘+/- Inf’ and
>  ‘NaN’ as input or output.
> 
> which doesn't define "properly", but you'd think commutativity was a
> "proper" property of addition. So although they "are supposed to" they
> don't. Naughty mathematical functions!
> 
> And then there's...
> 
>  Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
>  which of those two is not guaranteed and may depend on the R
>  platform (since compilers may re-order computations).
> 
> Which is at least telling you there is vagueness in the system. But
> hey, mathematics is not a precise science... oh wait...
> 
> Barry
> 
> 
> 
> 
> 
> On Tue, Jul 3, 2018 at 10:09 PM, Rolf Turner  wrote:
> >
> > On 04/07/18 00:24, Martin Møller Skarbiniks Pedersen wrote:
> >
> >> Hi,
> >>I am currently using R v3.4.4 and I just discovered this:
> >>
> >>> prod(NA, NaN) ; prod(NaN, NA)
> >> [1] NA
> >> [1] NaN
> >>
> >> ?prod says:
> >>  If ‘na.rm’ is ‘FALSE’ an ‘NA’ value in any of the arguments will
> >>   cause a value of ‘NA’ to be returned, otherwise ‘NA’ values are
> >>   ignored.
> >>
> >> So according to the manual-page for prod() NA should be returned in both
> >> cases?
> >>
> >>
> >> However for sum() is opposite is true:
> >>> sum(NA, NaN) ; sum(NaN, NA)
> >> [1] NA
> >> [1] NA
> >>
> >> ?sum says:
> >>  If ‘na.rm’ is ‘FALSE’ an ‘NA’ or ‘NaN’ value in any of the
> >>   arguments will cause a value of ‘NA’ or ‘NaN’ to be returned,
> >>   otherwise ‘NA’ and ‘NaN’ values are ignored.
> >>
> >>
> >> Maybe the manual for prod() should say the same as sum() that
> >> both NA and NaN can be returned?
> >
> > But:
> >
> >  > sum(NA,NaN)
> > [1] NA
> >  > sum(NaN,NA)
> > [1] NA
> >
> > so sum gives NA "both ways around".  Perhaps a slight inconsistency
> > here?  I doubt that it's worth losing any sleep over, however.
> >
> > Interestingly (???):
> >
> >  > NaN*NA
> > [1] NaN
> >  > NA*NaN
> > [1] NA
> >  > NaN+NA
> > [1] NaN
> >  > NA+NaN
> > [1] NA
> >
> > So we have an instance of non-commutative arithmetic operations.  And
> > sum() is a wee bit inconsistent with "+".
> >
> > Again I doubt that the implications are all that serious.
> >
> > cheers,
> >
> > Rolf Turner
> >
> > --
> > Technical Editor ANZJS
> > Department of Statistics
> > University of Auckland
> > Phone: +64-9-373-7599 ext. 88276
> >

Re: [R] R maintains old values

2018-07-03 Thread Ted Harding
On Tue, Jul 3, 2018 at 9:25 AM, J C Nash  wrote:
> 
> > . . . Now, to add to the controversy, how do you set a computer on fire?
> >
> > JN

Perhaps by exploring the context of this thread,
where new values strike a match with old values???

Ted



Re: [R] OT --- grammar.

2018-06-24 Thread Ted Harding
On Mon, 2018-06-25 at 09:46 +1200, Rolf Turner wrote:
> Does/should one say "the degrees of freedom is defined to be" or "the 
> degrees of freedom are defined to be"?
> 
> Although value of "degrees of freedom" is a single number, the first 
> formulation sounds very odd to my ear.
> 
> I would like to call upon the collective wisdom of the R community to 
> help me decide.
> 
> Thanks, and my apologies for the off-topic post.
> 
> cheers,
> Rolf Turner

Interesting question, Rolf!
From my point of view, I see "degrees of freedom" as a plural noun,
because of "degrees". But in some cases, we have only 1 degree of
freedom. Then the degrees of freedom is 1.

But we do not say, in that case, "the degree of freedom is defined
to be", or "the degree of freedom are 1".

Nor would we say "The degrees of freedom are 19"!

So I think that the solution is to encapsulate the term within
single quotes, so that it becomes a singular entity. Thus:

The 'degrees of freedom' is defined to be ... ; and
The 'degrees of freedom' is 1.
Or
The 'degrees of freedom' is 19.

This is not the same issue as (one of my prime hates) saying
"the data is stored in the dataframe ... ". "Data" is a
plural noun (singular "datum"), and I would insist on
"the data are stored ... ". The French use "une donnee" and
"les donnees"; the Germans use "ein Datum", "die Daten";
so they know what they're doing! English-speakers mostly do not!

Best wishes to all,
Ted.



Re: [R] mysterious rounding digits output

2018-05-31 Thread Ted Harding
Thanks Martin! Good to know that you have made this important
change. And, regarding
  Maybe we should additionally say that this is *not* round()ing,
  and give a link to the help for signif() ?
I think that also would be most useful. In fact, ?signif
leads to a whole survey of "Rounding of Numbers", covering the
functions ceiling(), floor(), trunc(), round(), signif().
Well worth reading!
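
A minimal illustration of the distinction (an added sketch, assuming an
interactive session with default options):

x <- 1234.5678
print(x, digits = 3)   # 1235     -- significant digits, a suggestion only
signif(x, 3)           # 1230     -- explicit significant-digit rounding
round(x, digits = 3)   # 1234.568 -- decimal places
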
Best wishes,
Ted.

On Thu, 2018-05-31 at 08:58 +0200, Martin Maechler wrote:
> >>>>> Ted Harding 
> >>>>> on Thu, 31 May 2018 07:10:32 +0100 writes:
> 
> > Well pointed out, Jim!
> 
> > It is unfortunate that the documentation for options(digits=...)
> > does not mention that these are *significant digits* 
> > and not *decimal places* (which is what Joshua seems to want):
> 
> Since R 3.4.0  the help on  ?options  *does* say significant!
> 
> The change in R's source code was this one,  14 months ago :
> 
> r72380 | maechler | 2017-03-21 11:28:13 +0100 (Tue, 21. Mar 2017) | 2 Zeilen
Geänderte Pfade:
>M /trunk/src/library/base/man/options.Rd
> 
> digits: + "signficant"
> 
> 
> and since then, the text has been
> 
>  ‘digits’: controls the number of significant digits to print when
>   printing numeric values.  It is a suggestion only.
> ...
> 
> whereas what you (Ted) cite is from R 3.3.x and earlier
> 
>  > "‘digits’: controls the number of digits to print when
>  > printing numeric values."
> 
> Maybe we should additionally say that this is *not* round()ing,
> and give a link to the help for signif() ?
> 
> 
> > On the face of it, printing the value "0.517" of 'ccc' looks
> > like printing 4 digits! Joshua's could look even worse if 'ddd'
> > had values in the 1000s!
> 
> > To achieve exactly what Joshua seems to want, use the round()
> > function. Starting with his original assignment of values to
> > the variable itemInfo, the result of round(itemInfo,digits=3) is:
> 
> > aaa   bbb   ccc   dddeee
> > skill 1.396 6.225 0.517 5.775  2.497
> > predict   1.326 5.230 0.462 5.116 -2.673
> > waiting   1.117 4.948NANA NA
> > complex   1.237 4.170 0.220 4.713  5.642
> > novelty   1.054 4.005 0.442 4.260  2.076
> > creative  1.031 3.561 0.362 3.689  2.549
> > evaluated 0.963 3.013NANA NA
> > body  0.748 2.238 0.596 2.019  0.466
> > control   0.620 2.149NANA NA
> > stakes0.541 1.905 0.227 2.020  2.343
> > spont 0.496 1.620NANA NA
> > chatter   0.460 1.563 0.361 1.382  0.540
> > present   0.428 1.236 0.215 1.804  2.194
> > reward0.402 1.101 0.288 1.208  0.890
> > feedback  0.283 0.662NANA NA
> > goal  0.237 0.474NANA NA
> 
> > Best wishes to all,
> > Ted.
> 
> 
> > On Thu, 2018-05-31 at 15:30 +1000, Jim Lemon wrote:
> >> Hi Joshua,
> >> Because there are no values in column ddd less than 1.
> >> 
> >> itemInfo[3,"ddd"]<-0.3645372
> >> itemInfo
> >> aaa   bbb   ccc   dddeee
> >> skill 1.396 6.225 0.517 5.775  2.497
> >> predict   1.326 5.230 0.462 5.116 -2.673
> >> waiting   1.117 4.948NA 0.365 NA
> >> complex   1.237 4.170 0.220 4.713  5.642
> >> novelty   1.054 4.005 0.442 4.260  2.076
> >> creative  1.031 3.561 0.362 3.689  2.549
> >> evaluated 0.963 3.013NANA NA
> >> body  0.748 2.238 0.596 2.019  0.466
> >> control   0.620 2.149NANA NA
> >> stakes0.541 1.905 0.227 2.020  2.343
> >> spont 0.496 1.620NANA NA
> >> chatter   0.460 1.563 0.361 1.382  0.540
> >> present   0.428 1.236 0.215 1.804  2.194
> >> reward0.402 1.101 0.288 1.208  0.890
> >> feedback  0.283 0.662NANA NA
> >> goal  0.237 0.474NANA NA
> >> 
> >> digits specifies the number of significant digits ("It is a suggestion
> >> only"), so when at least one number is padded with a leading zero it
> >> affects the formatting of the other numbers in that column. I suspect
> >> that this is an esthetic consideration to line up the decimal points.
>  

Re: [R] mysterious rounding digits output

2018-05-31 Thread Ted Harding
Well pointed out, Jim!
It is unfortunate that the documentation for options(digits=...)
does not mention that these are *significant digits* and not
*decimal places* (which is what Joshua seems to want):
  "‘digits’: controls the number of digits to print when
  printing numeric values."

On the face of it, printing the value "0.517" of 'ccc' looks
like printing 4 digits! Joshua's could look even worse if 'ddd'
had values in the 1000s!

To achieve exactly what Joshua seems to want, use the round()
function. Starting with his original assignment of values to
the variable itemInfo, the result of round(itemInfo,digits=3) is:

aaa   bbb   ccc   dddeee
skill 1.396 6.225 0.517 5.775  2.497
predict   1.326 5.230 0.462 5.116 -2.673
waiting   1.117 4.948NANA NA
complex   1.237 4.170 0.220 4.713  5.642
novelty   1.054 4.005 0.442 4.260  2.076
creative  1.031 3.561 0.362 3.689  2.549
evaluated 0.963 3.013NANA NA
body  0.748 2.238 0.596 2.019  0.466
control   0.620 2.149NANA NA
stakes0.541 1.905 0.227 2.020  2.343
spont 0.496 1.620NANA NA
chatter   0.460 1.563 0.361 1.382  0.540
present   0.428 1.236 0.215 1.804  2.194
reward0.402 1.101 0.288 1.208  0.890
feedback  0.283 0.662NANA NA
goal  0.237 0.474NANA NA

Best wishes to all,
Ted.


On Thu, 2018-05-31 at 15:30 +1000, Jim Lemon wrote:
> Hi Joshua,
> Because there are no values in column ddd less than 1.
> 
> itemInfo[3,"ddd"]<-0.3645372
> itemInfo
> aaa   bbb   ccc   dddeee
> skill 1.396 6.225 0.517 5.775  2.497
> predict   1.326 5.230 0.462 5.116 -2.673
> waiting   1.117 4.948NA 0.365 NA
> complex   1.237 4.170 0.220 4.713  5.642
> novelty   1.054 4.005 0.442 4.260  2.076
> creative  1.031 3.561 0.362 3.689  2.549
> evaluated 0.963 3.013NANA NA
> body  0.748 2.238 0.596 2.019  0.466
> control   0.620 2.149NANA NA
> stakes0.541 1.905 0.227 2.020  2.343
> spont 0.496 1.620NANA NA
> chatter   0.460 1.563 0.361 1.382  0.540
> present   0.428 1.236 0.215 1.804  2.194
> reward0.402 1.101 0.288 1.208  0.890
> feedback  0.283 0.662NANA NA
> goal  0.237 0.474NANA NA
> 
> digits specifies the number of significant digits ("It is a suggestion
> only"), so when at least one number is padded with a leading zero it
> affects the formatting of the other numbers in that column. I suspect
> that this is an esthetic consideration to line up the decimal points.
> 
> Jim
> 
> 
> On Thu, May 31, 2018 at 11:18 AM, Joshua N Pritikin  
> wrote:
> > R version 3.5.0 (2018-04-23) -- "Joy in Playing"
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > options(digits=3)
> >
> > itemInfo <- structure(list(
> >   "aaa" = c(1.39633732316667, 1.32598263816667, 1.1165832407,
> > 1.23651072616667, 1.0536867998, 1.0310073738, 0.9630728395,
> > 0.7483865045, 0.62008664617, 0.5411017985, 0.49639760783,
> > 0.45952804467, 0.42787704783, 0.40208597967, 0.28316118123,
> > 0.23689627723),
> >   "bbb" = c(6.22533860696667, 5.229736804, 4.94816041772833,
> > 4.17020503255333, 4.00453781427167, 3.56058007398333, 3.0125202404,
> > 2.2378235873, 2.14863910661167, 1.90460903044777, 1.62001089796667,
> > 1.56341257968151, 1.23618558850667, 1.10086688908262,
> > 0.661981500639833, 0.47397754310745),
> >   "ccc" = c(0.5165911355, 0.46220470767, NA, 0.21963592433,
> > 0.44186378083, 0.36150286583, NA, 0.59613794667, NA, 0.22698477157,
> > NA, 0.36092266158, 0.2145347068, 0.28775624948, NA, NA),
> >   "ddd" = c(5.77538400186667, 5.115877113, NA, 4.71294520316667,
> > 4.25952652129833, 3.68879921863167, NA, 2.01942456211145, NA,
> > 2.02032557108, NA, 1.3818108759571, 1.80436759778167,
> > 1.20789851993367, NA, NA),
> >   "eee" = c(2.4972534717, -2.67340172316667, NA, 5.6419520633,
> > 2.0763355523, 2.548949539, NA, 0.465537272243167, NA,
> > 2.34255027516667, NA, 0.5400824922975, 2.1935000655, 0.89000797687,
> > NA, NA)),
> >   row.names = c("skill", "predict", "waiting", "complex", "novelty",
> > "creative", "evaluated", "body", "control", "stakes", "spont",
> > "chatter", "present", "reward", "feedback", "goal"),
> >   class = "data.frame")
> >
> > itemInfo  # examine column ddd
> >
> > When I try this, column ddd has 1 fewer digits than expected. See the
> > attache

[R] TEST message

2018-04-24 Thread Ted Harding
Apologies for disturbance! Just checking that I can
get through to r-help.
Ted.



Re: [R] Hacked

2018-04-18 Thread Ted Harding
Not necessarily. The R-help archives are publicly accessible,
with a "sort by date" option. So if someone sets up a web-page
monitor which reports back when new messages appear there
(at the bottom end), then their email addresses are readily
copied (subject to " at " --> "@").

Once they have the address then anything can happen!

Best wishes,
Ted (eagerly awaiting attempted seduction ... ).

On Wed, 2018-04-18 at 10:36 +, Fowler, Mark wrote:
> Seems it must be the R-list. A horde of ‘solicitation’ emails began arriving 
> about 27 minutes after I posted about not seeing any! Had left work by that 
> time, so did not encounter them until now.
> 
> From: Mark Leeds [mailto:marklee...@gmail.com]
> Sent: April 18, 2018 12:33 AM
> To: Rui Barradas
> Cc: Ding, Yuan Chun; Fowler, Mark; Luis Puerto; Peter Langfelder; R-Help ML 
> R-Project; Neotropical bat risk assessments
> Subject: Re: [R] Hacked
> 
> Hi All: I get a lot more spam-porn type emails lately too, but I don't
> know if they are due to me being on
> the R-list.
> 
> 
> On Tue, Apr 17, 2018 at 5:09 PM, Rui Barradas 
> <ruipbarra...@sapo.pt<mailto:ruipbarra...@sapo.pt>> wrote:
> Hello,
> 
> Nor do I, no gmail, also got spam.
> 
> Rui Barradas
> 
> On 4/17/2018 8:34 PM, Ding, Yuan Chun wrote:
> No, I do not use gmail, still got dirty spam email twice.
> 
> -Original Message-
> From: R-help 
> [mailto:r-help-boun...@r-project.org<mailto:r-help-boun...@r-project.org>] On 
> Behalf Of Fowler, Mark
> Sent: Tuesday, April 17, 2018 12:32 PM
> To: Luis Puerto; Peter Langfelder
> Cc: R-Help ML R-Project; Neotropical bat risk assessments
> Subject: Re: [R] Hacked
> 
> [Attention: This email came from an external source. Do not open attachments 
> or click on links from unknown senders or unexpected emails.]
> 
> 
> 
> 
> 
> Just an observation. I have not seen the spam you are discussing. Possibly it 
> is specific to gmail addresses?
> 
> -Original Message-
> From: R-help 
> [mailto:r-help-boun...@r-project.org<mailto:r-help-boun...@r-project.org>] On 
> Behalf Of Luis Puerto
> Sent: April 17, 2018 4:11 PM
> To: Peter Langfelder
> Cc: R-Help ML R-Project; Neotropical bat risk assessments
> Subject: Re: [R] Hacked
> 
> Hi!
> 
> This happened to me also! I just got a spam email just after posting and then 
> in following days I got obnoxious spam emails in my spam filter. As the 
> others, I think that there is some kind of bot subscribed to the list, but 
> also perhaps a spider or crawler monitoring the R-Help archive and getting 
> email addresses there. Nabble is a possibility too.
> On 17 Apr 2018, at 21:50, Peter Langfelder 
> <peter.langfel...@gmail.com<mailto:peter.langfel...@gmail.com>> wrote:
> 
> I got some spam emails after my last post to the list, and the emails
> did not seem to go through r-help. The spammers may be subscribed to
> the r-help, or they get the poster emails from some of the web copies
> of this list (nabble or similar).
> 
> Peter
> 
> On Tue, Apr 17, 2018 at 11:37 AM, Ulrik Stervbo 
> <ulrik.ster...@gmail.com<mailto:ulrik.ster...@gmail.com>> wrote:
> I asked the moderators about it. This is the reply
> 
> "Other moderators have looked into this a bit and may be able to shed
> more light on it. This is a "new" tactic where the spammers appear to
> reply to the r-help post. They are not, however, going through the r-help 
> server.
> 
> It also seems that this does not happen to everyone.
> 
> I am not sure how you can automatically block the spammers.
> 
> Sorry I cannot be of more help."
> 
> --Ulrik
> 
> Jeff Newmiller <jdnew...@dcn.davis.ca.us<mailto:jdnew...@dcn.davis.ca.us>> 
wrote on Tue., 17 Apr.
> 2018,
> 14:59:
> Likely a spammer has joined the mailing list and is auto-replying to
> posts made to the list. Unlikely that the list itself has been
> "hacked". Agree that it is obnoxious.
> 
> On April 17, 2018 5:01:10 AM PDT, Neotropical bat risk assessments <
> neotropical.b...@gmail.com<mailto:neotropical.b...@gmail.com>> wrote:
> Hi all,
> 
> Site has been hacked?
> Bad SPAM arriving
> 
> __
> R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To 
> UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> --
> Sent from my phone. Please excuse my brevity.
> 

Re: [R] R help

2018-03-31 Thread Ted Harding
On Sat, 2018-03-31 at 15:45 +0200, Henri Moolman wrote:
> Could you please provide help with something from R that I find rather
> puzzling? In the small program below x[1]=1, .  .  .  , x[5]=5. R also
> finds that x[1]<=5 is TRUE. Yet when you attempt to execute while, R does
> not seem to recognize the condition. Any thoughts on why this happens?
> 
> Regards
> 
> Henri Moolman
> 
> > x=c(1,2,3,4,5)
> > x[1]
> [1] 1
> > i=1
> > x[1]<=5
> [1] TRUE
> > while(x[i]<=5){
> + i=i+1
> + }
> Error in while (x[i] <= 5) { : missing value where TRUE/FALSE needed

If you run the following you should understand why (the only
change is to include "print(i)" in the loop, so you can see
what is happening):

  x=c(1,2,3,4,5)
  x[1]
# [1] 1
  i=1
  x[1]<=5
# [1] TRUE
  while(x[i]<=5){
  i = i+1 ; print(i)
  }
# [1] 2
# [1] 3
# [1] 4
# [1] 5
# [1] 6
# Error in while (x[i] <= 5) { : missing value where TRUE/FALSE needed

So everything is fine so long as i <= 5 (i.e. x[i] <= 5),
but then the loop sets i = 6, and then:

  i
# [1] 6
  x[i]
# [1] NA
  x[i] <= 5
# [1] NA
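
One way to avoid the NA comparison altogether is to guard the index
first -- && short-circuits, so x[i] is never evaluated once i has run
past the end (a sketch):

  i <- 1
  while (i <= length(x) && x[i] <= 5) {
    i <- i + 1
  }
  i
  # [1] 6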

Helpful?
Best wishes,
Ted.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] substr gives empty output

2018-01-21 Thread Ted Harding
On Sun, 2018-01-21 at 09:59 +0100, Luigi Marongiu wrote:
> Dear all,
> I have a string, let's say "testing", and I would like to extract in
> sequence each letter (character) from it. But when I use substr() I only
> properly get the first character, the rest is empty (""). What am I getting
> wrong?
> For example, I have this code:
> 
> >>>
> x <- "testing"
> k <- nchar(x)
> for (i in 1:k) {
>   y <- substr(x, i, 1)
>   print(y)
> }

From the help page:
  substr(x, start, stop)
where 'start' is the position in the character vector x at which the
substring starts, and 'stop' is the position at which it stops.

Hence 'stop' must be >= 'start'; and if they are equal then you get
just the single character. That is the case in your code, when i=1;
when i > 1 then stop < start, so you get nothing. Compare with:

  x <- "testing"
  k <- nchar(x)
  for (i in 1:k) { 
y <- substr(x, i, i)  ### was: substr(x, i, 1)
print(y)
  }

[1] "t"
[1] "e"
[1] "s"
[1] "t"
[1] "i"
[1] "n"
[1] "g"

Hoping this helps,
Ted.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How (in general) take out data sets (available in the packages)?

2017-07-31 Thread Ted Beginner via R-help

Suzen, thank you very much for your very useful information (I will try to
understand it)!
And my sincere gratitude to the moderator!
>"Suzen, Mehmet" < msu...@gmail.com >:
>I also suggest Hadley's optimized package for working with xls
>files in R:
>https://github.com/tidyverse/readxl
>https://cran.r-project.org/web/packages/readxl/index.html





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How export data set (available in the package) from R?

2017-07-30 Thread Ted via R-help
"Data set flchain available in the survival  package".  How can I get it (from 
R)  as Excel file? Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R: How to multiplying y-axis ticks value by 100?

2017-07-24 Thread Ted via R-help

Many thanks, Jim!!!

>Jim Lemon < drjimle...@gmail.com >:
>Have a look at axis.mult in the plotrix package.
>Jim

>>iPad via R-help <  r-help@r-project.org > wrote:
>> How to multiplying y-axis ticks value by 100 (without put the % symbol next 
>> to the number) here:
>> plot (CI.overall, curvlab=c("Discharge", "Death"), xlab="Days", las=1)
>> P.S. So instead of 0.00, 0.20 etc., show 0, 20 etc.
 __
>>  R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>  https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide  http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Precision of values > 53 bits

2017-07-20 Thread Ted Harding
On Thu, 2017-07-20 at 14:33 +0200, peter dalgaard wrote:
> > On 10 Jan 2013, at 15:56 , S Ellison <s.elli...@lgcgroup.com> wrote:
> > 
> > 
> > 
> >> I am working with large numbers and identified that R looses 
> >> precision for such high numbers.
> > Yes. R uses standard 32-bit double precision.
> 
> 
> Well, for large values of 32... such as 64.

Hmmm ... Peter, as one of your compatriots (guess who) once solemnly
said to me:

  2 plus 2 is never equal to 5 -- not even for large values of 2.

Best wishes,
Ted.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Logical Operators' inconsistent Behavior

2017-05-19 Thread Ted Harding
[I inadvertently sent my reply below to Jeremie, instead of R-help.
Also, I have had an additional thought which may clarify things
for R users].
[Original reply]:
The point about this is that (as Rolf wrote) FALSE & (anything)
is FALSE, provided logical NA is either TRUE or FALSE but,
because the "NA" says that it is not known which it is,
it could be "anything". And, indeed, if "NA" is given the
"missing" meaning and if we assume that a missing logical value
did indeed have a value (necessarily either TRUE or FALSE),
then it follows logically that FALSE & NA = FALSE.
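
This three-valued logic is easy to verify directly in R:

  TRUE & NA    # [1] NA     (could be TRUE or FALSE)
  FALSE & NA   # [1] FALSE  (FALSE & anything is FALSE)
  TRUE | NA    # [1] TRUE   (TRUE | anything is TRUE)
  FALSE | NA   # [1] NA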

On the other hand, if with the "missing" interpretation of "NA"
we don't even know that it is a logical, then it might be fair
enough to say FALSE & NA = NA.
Ted.

[Additional thought]:
Testing to see what would happen if the NA were not logical,
I put myself (not being logical ... ) on the line, facing up to R:
   X <- "Ted"
   FALSE & X
   Error in FALSE & X : 
   operations are possible only for numeric, logical or complex types
So R will refuse to deal with any variable which cannot partake in
a logical expression.

Ted.

On Fri, 2017-05-19 at 14:24 +0200, Jérémie Juste wrote:
> My apologies if I was not clear enough,
> 
> TRUE & NA could be either TRUE or FALSE and consequently is NA.
> why is   FALSE & NA = FALSE?  NA could be TRUE or FALSE, so FALSE & NA
> should be NA?
> 
> 
> On Fri, May 19, 2017 at 2:13 PM, Rolf Turner <r.tur...@auckland.ac.nz>
> wrote:
> 
> > On 20/05/17 00:01, Jérémie Juste wrote:
> >
> >> Hello,
> >>
> >> Rolf said,
> >>
> >> TRUE & FALSE is FALSE but TRUE & TRUE is TRUE, so TRUE & NA could be
> >> either TRUE or FALSE and consequently is NA.
> >>
> >> OTOH FALSE & (anything) is FALSE so FALSE & NA is FALSE.
> >>
> >>
> >> According to this logic why is
> >>
> >> FALSE & NA
> >>
> >> [1] FALSE
> >>
> >
> > Huh
> >
> >
> > cheers,
> >
> > Rolf Turner
> >
> > --
> > Technical Editor ANZJS
> > Department of Statistics
> > University of Auckland
> > Phone: +64-9-373-7599 ext. 88276
> >
> 
> 
> 
> -- 
> Jérémie Juste
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interesting quirk with fractions and rounding

2017-04-21 Thread Ted Harding
I've been following this thread with interest. A nice
collection of things to watch out for, if you don't
want the small arithmetic errors due to finite-length
digital representations of fractions to cause trouble!

However, as well as these small discrepancies, major
malfunctions can also result.

Back on Dec 22, 2013, I posted a Christmas Greetings
message to R-help:

  Season's Greetings (and great news ... )!

which starts:

  Greetings All!
  With the Festive Season fast approaching, I bring you joy
  with the news (which you will surely wish to celebrate)
  that R cannot do arithmetic!

  Usually, this is manifest in a trivial way when users report
  puzzlement that, for instance,

sqrt(pi)^2 == pi
# [1] FALSE

  which is the result of a (generally trivial) rounding or
  truncation error:

 sqrt(pi)^2 - pi
# [1] -4.440892e-16

  But for some very simple calculations R goes off its head.

And the example given is:

  Consider a sequence generated by the recurrence relation

x[n+1] = 2*x[n] if 0 <= x[n] <= 1/2
x[n+1] = 2*(1 - x[n]) if 1/2 < x[n] <= 1

  (for 0 <= x[n] <= 1).

  This has equilibrium points (x[n+1] = x[n]) at x[n] = 0
  and at x[n] = 2/3:

2/3 -> 2*(1 - 2/3) = 2/3

  It also has periodic points, e.g.

2/5 -> 4/5 -> 2/5 (period 2)
2/9 -> 4/9 -> 8/9 -> 2/9 (period 3)

  The recurrence relation can be implemented as the R function

nextx <- function(x){
  if( (0<=x)&(x<=1/2) ) {x <- 2*x} else {x <- 2*(1 - x)}
}

  Now have a look at what happens when we start at the equilibrium
  point x = 2/3:

N <- 1 ; x <- 2/3
while(x > 0){
  cat(sprintf("%i: %.9f\n",N,x))
  x <- nextx(x) ; N <- N+1
}
cat(sprintf("%i: %.9f\n",N,x))

For a while [run it and see!], this looks as though it's doing what
the arithmetic would lead us to expect: the first 24 results will all
be printed as 0.666666667, which looks fine as 2/3 to 9 places.

But then the "little errors" start to creep in:

  N=25: 0.666666666
  N=28: 0.666666672
  N=46: 0.667968750
  N=47: 0.664062500
  N=48: 0.671875000
  N=49: 0.656250000
  N=50: 0.687500000
  N=51: 0.625000000
  N=52: 0.750000000
  N=53: 0.500000000
  N=54: 1.000000000
  N=55: 0.000000000

  What is happening is that, each time R multiplies by 2, the binary
  representation is shifted up by one and a zero bit is introduced
  at the bottom end.

At N=53, the first binary bit of 'x' is 1, and all the rest are 0,
so now 'x' is exactly 0.5 = 1/2, hence the final two are also exact
results; 53 is the .Machine$double.digits = 53 binary places.
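
The doubling of the error can be watched directly by following two
nearby starting points (a sketch, re-defining nextx as above):

  nextx <- function(x) if (x <= 1/2) 2*x else 2*(1 - x)
  x <- 2/3
  y <- 2/3 + 1e-12   # a tiny perturbation
  for (n in 1:30) { x <- nextx(x); y <- nextx(y) }
  abs(x - y)         # roughly 1e-12 * 2^30, i.e. about 1e-3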

So this normally "almost" trivial feature can, for such a simple
calculation, lead to chaos or catastrophe (in the literal technical
sense).

For more detail, including an extension of the above, look at the
original posting in the R-help archives for Dec 22, 2013:

  From: (Ted Harding) <ted.hard...@wlandres.net>
  Subject: [R] Season's Greetings (and great news ... )!
  Date: Sun, 22 Dec 2013 09:59:16 - (GMT)

(Apologies, but I couldn't track down the URL for this posting
in the R-help archives; there were a few follow-ups).

I gave this as an example to show that the results of the "little"
arithmetic errors (such as have recently been discussed from many
aspects) can, in certain contexts, destroy a computation.

So be careful to consider what can happen in the particular
context you are working with.

There are ways to dodge the issue -- such as using the R interface
to the 'bc' calculator, which computes arithmetic expressions in
a way which is quite different from the fixed-finite-length binary
representation and algorithms used, not only by R, but also by many
other numerical computation software suites.

Best wishes to all,
Ted.

-
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 21-Apr-2017  Time: 22:03:15
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beginner needs help with R

2017-02-07 Thread Ted Harding
Bert, your solution seems to presuppose that the programmer
knows beforehand that the leading digit in the number is "0"
(which in fact is clearly the case in Nabila Arbi's original
query). However, the sequence might arise from some process
outside of the programmer's control, and may then either have
a leading 0 or not. In that case, I think Jim's solution is safer!
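
That is, a fixed-width format covers both cases with the same call
(a small sketch):

  sprintf("DQ%06d", c(60054, 160054))
  # [1] "DQ060054" "DQ160054"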
Best wishes,
Ted.


On 07-Feb-2017 16:02:18 Bert Gunter wrote:
> No need for sprintf(). Simply:
> 
>> paste0("DQ0",seq.int(60054,60060))
> 
> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
> [7] "DQ060060"
> 
> 
> Cheers,
> Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Mon, Feb 6, 2017 at 5:45 AM, jim holtman <jholt...@gmail.com> wrote:
>> You need the leading zeros, and 'numerics' just give the number without
>> leading zeros.  You can use 'sprintf' for create a character string with
>> the leading zeros:
>>
>>> # this is using 'numeric' and drops leading zeros
>>>
>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>> seq1
>> [1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>>>
>>> # use 'sprintf' to create leading zeros
>>> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
>>> seq2
>> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
>> "DQ060060"
>>>
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi <nabilaelarbi1...@gmail.com>
>> wrote:
>>
>>> Dear R-Help Team!
>>>
>>> I have some trouble with R. It's probably nothing big, but I can't find a
>>> solution.
>>> My problem is the following:
>>> I am trying to download some sequences from ncbi using the ape package.
>>>
>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>>
>>> sequences <- read.GenBank(seq1,
>>> seq.names = seq1,
>>> species.names = TRUE,
>>> gene.names = FALSE,
>>> as.character = TRUE)
>>>
>>> write.dna(sequences, "mysequences.fas", format = "fasta")
>>>
>>> My problem is, that R doesn't take the whole sequence number as "060054"
>>> but it puts it as DQ60054 (missing the zero in the beginning, which is
>>> essential).
>>>
>>> Could please tell me, how I can get R to accepting the zero in the
>>> beginning of the accession number?
>>>
>>> Thank you very much in advance and all the best!
>>>
>>> Nabila
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 07-Feb-2017  Time: 16:48:41
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] histogram first bar wrong position

2016-12-22 Thread Ted Harding
William has lifted the lid on the essence of the problem, which is
that in R the way that breaks (and therefore counts) in a histogram
are evaluated is an area of long grass with lurking snakes!

To get a glimpse of this, have a look at
  ?hist
and in the section "Arguments", look at "breaks", "freq", "right".
Also see under "Details".

and, as suggested under "See also", look at
  ?nclass.Sturges

As William suggests, if you know what class intervals you want,
create them yourself! For your example (with N=100), look at:

   hist(y,freq=TRUE, col='red', breaks=0.5+(0:6))

or

   hist(y,freq=TRUE, col='red', breaks=0.25+(0:12)/2)
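
And for genuinely discrete data such as dice scores, a barplot of
table() sidesteps the binning question altogether (a sketch):

   y <- ceiling(runif(100)*6)
   barplot(table(y), col='red')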

Hoping this helps!
Best wishes,
Ted.


On 22-Dec-2016 16:36:34 William Dunlap via R-help wrote:
> Looking at the return value of hist will show you what is happening:
> 
>> x <- rep(1:6,10*(6:1))
>> z <- hist(x, freq=TRUE)
>> z
> $breaks
>  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
> 
> $counts
>  [1] 60 50  0 40  0 30  0 20  0 10
> ...
> 
> The first bin is [1-1.5], including both endpoints, while the other
> bins include only the upper endpoint.  I recommend defining your
> own breakpoints, ones don't include possible data points, as in
> 
>> print(hist(x, breaks=seq(min(x)-0.5, max(x)+0.5, by=1), freq=TRUE))
> $breaks
> [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5
> 
> $counts
> [1] 60 50 40 30 20 10
> ...
> 
> S+ had a 'factor' method for hist() that did this sort of thing, but R does
> not.
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> On Thu, Dec 22, 2016 at 5:17 AM, itpro <itp...@yandex.ru> wrote:
> 
>> Hi, everyone.
>>
>>
>> I stumbled upon weird histogram behaviour.
>>
>> Consider this "dice emulator":
>> Step 1: Generate uniform random array x of size N.
>> Step 2: Multiply each item by six and round to next bigger integer to get
>> numbers 1 to 6.
>> Step 3: Plot histogram.
>>
>> > x<-runif(N)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='orange')
>>
>>
>> Now what I get with N=10
>>
>> > x<-runif(10)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='green')
>>
>> At first glance looks OK.
>>
>> Now try N=100
>>
>> > x<-runif(100)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='red')
>>
>> Now first bar is not where it should be.
>> Hmm. Look again at the N=10 histogram... First bar is not where I want it,
>> it's only less striking due to narrow bars.
>>
>> So, first bar is always in wrong position. How do I fix it to make
>> perfectly spaced bars?
>>
>>
>>
>>
>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 22-Dec-2016  Time: 17:23:26
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] detecting if a variable has changed

2016-06-05 Thread Ted Harding
Ah, perhaps I'm beginning to understand the question!
Presumably the issue is that evaluating X[X<=y] takes O(n) time,
where n = length(X), and similarly X[X>y]. 

So I suppose that one needs to be looking at some procedure for
a "bisecting" search for the largest r such that X[r] <= y, which
would then be O(log2(n)).

Perhaps not altogether straightforward to program, but straightforward
in concept!
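
In fact base R already provides the binary search: findInterval()
locates the insertion point in O(log(n)), though the splice itself
still costs O(n) (a sketch):

  X <- sort(runif(10))
  y <- runif(1)
  pos <- findInterval(y, X)       # binary search for the insertion point
  Y <- append(X, y, after = pos)  # splice y in; this copy is still O(n)
  is.unsorted(Y)
  # [1] FALSE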

Apologies for misunderstanding.
Ted.

On 05-Jun-2016 18:13:15 Bert Gunter wrote:
> Nope, Ted. I asked for  a O(log(n)) solution, not an O(n) one.
> 
> I will check out the data.table package, as suggested.
> 
> -- Bert
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Sun, Jun 5, 2016 at 11:01 AM, Ted Harding <ted.hard...@wlandres.net>
> wrote:
>> Surely it is straightforward, since the vector (say 'X') is already sorted?
>>
>> Example (raw code with explicit example):
>>
>> set.seed(54321)
>> X <- sort(runif(10))
>> # X ## The initial sorted vector
>> # [1] 0.04941009 0.17669234 0.20913493 0.21651016 0.27439354
>> # [6] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110
>>
>> y <- runif(1)
>> # y ## The new value to be inserted
>> # [1] 0.1366424
>>
>> Y <- c(X[X<=y],y,X[X>y]) ## Now insert y into X:
>> Y
>> [1] 0.04941009 0.13664239 0.17669234 0.20913493 0.21651016 0.27439354
>> [7] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110
>>
>> ## And there it is at Y[2]
>>
>> Easy to make such a function!
>> Best wishes to all,
>> Ted.
>>
>> On 05-Jun-2016 17:44:29 Neal H. Walfield wrote:
>>> On Sun, 05 Jun 2016 19:34:38 +0200,
>>> Bert Gunter wrote:
>>>> This help thread suggested a question to me:
>>>>
>>>> Is there a function in some package that efficiently (I.e. O(log(n)) )
>>>> inserts a single new element into the correct location in an
>>>> already-sorted vector? My assumption here is that doing it via sort()
>>>> is inefficient, but maybe that is incorrect. Please correct me if so.
>>>
>>> I think data.table will do this if the the column is marked
>>> appropriately.
>>>
>>>> I realize that it would be straightforward to write such a function,
>>>> but I just wondered if it already exists. My google & rseek searches
>>>> did not succeed, but maybe I used the wrong keywords.
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>>
>>>> On Sun, Jun 5, 2016 at 9:47 AM, William Dunlap via R-help
>>>> <r-help@r-project.org> wrote:
>>>> > I don't know what you mean by "without having to use any special
>>>> > interfaces", but "reference classes" will do what I think you want. 
>>>> > E.g.,
>>>> > the following makes a class called 'SortedNumeric' that only sorts the
>>>> > vector when you want to get its value, not when you append values.  It
>>>> > stores the sorted vector so it does not get resorted each time you ask
>>>> > for
>>>> > it.
>>>> >
>>>> > SortedNumeric <- setRefClass("sortedNumeric",
>>>> > fields = list(
>>>> > fData = "numeric",
>>>> > fIsUnsorted = "logical"),
>>>> > methods = list(
>>>> > initialize = function(Data = numeric(), isUnsorted =
>>>> > TRUE)
>>>> > {
>>>> > fData <<- Data
>>>> > stopifnot(is.logical(isUnsorted),
>>>> >   length(isUnsorted)==1,
>>>> >   !is.na(isUnsorted))
>>>> > fIsUnsorted <<- isUnsorted
>>>> > },
>>>> > getData = function() {
>>>> > if (fIsUnsorted) {
>>>> > fData &

Re: [R] detecting if a variable has changed

2016-06-05 Thread Ted Harding
Surely it is straightforward, since the vector (say 'X') is already sorted?

Example (raw code with explicit example):

set.seed(54321)
X <- sort(runif(10))
# X ## The initial sorted vector
# [1] 0.04941009 0.17669234 0.20913493 0.21651016 0.27439354
# [6] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110

y <- runif(1)
# y ## The new value to be inserted
# [1] 0.1366424

Y <- c(X[X<=y],y,X[X>y]) ## Now insert y into X:
Y
[1] 0.04941009 0.13664239 0.17669234 0.20913493 0.21651016 0.27439354
[7] 0.34161241 0.37165878 0.42900782 0.49843042 0.86636110

## And there it is at Y[2]

Easy to make such a function!
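
For instance (a sketch):

  insert_sorted <- function(X, y) c(X[X <= y], y, X[X > y])
  insert_sorted(c(1, 3, 5), 4)
  # [1] 1 3 4 5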
Best wishes to all,
Ted.

On 05-Jun-2016 17:44:29 Neal H. Walfield wrote:
> On Sun, 05 Jun 2016 19:34:38 +0200,
> Bert Gunter wrote:
>> This help thread suggested a question to me:
>> 
>> Is there a function in some package that efficiently (I.e. O(log(n)) )
>> inserts a single new element into the correct location in an
>> already-sorted vector? My assumption here is that doing it via sort()
>> is inefficient, but maybe that is incorrect. Please correct me if so.
> 
> I think data.table will do this if the the column is marked
> appropriately.
> 
>> I realize that it would be straightforward to write such a function,
>> but I just wondered if it already exists. My google & rseek searches
>> did not succeed, but maybe I used the wrong keywords.
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Sun, Jun 5, 2016 at 9:47 AM, William Dunlap via R-help
>> <r-help@r-project.org> wrote:
>> > I don't know what you mean by "without having to use any special
>> > interfaces", but "reference classes" will do what I think you want.  E.g.,
>> > the following makes a class called 'SortedNumeric' that only sorts the
>> > vector when you want to get its value, not when you append values.  It
>> > stores the sorted vector so it does not get resorted each time you ask for
>> > it.
>> >
>> > SortedNumeric <- setRefClass("sortedNumeric",
>> > fields = list(
>> > fData = "numeric",
>> > fIsUnsorted = "logical"),
>> > methods = list(
>> > initialize = function(Data = numeric(), isUnsorted = TRUE)
>> > {
>> > fData <<- Data
>> > stopifnot(is.logical(isUnsorted),
>> >   length(isUnsorted)==1,
>> >   !is.na(isUnsorted))
>> > fIsUnsorted <<- isUnsorted
>> > },
>> > getData = function() {
>> > if (fIsUnsorted) {
>> > fData <<- sort(fData)
>> > fIsUnsorted <<- FALSE
>> > }
>> > fData
>> > },
>> > appendData = function(newEntries) {
>> > fData <<- c(fData, newEntries)
>> > fIsUnsorted <<- TRUE
>> > }
>> > ))
>> >
>> > Use it as:
>> >
>> >> x <- SortedNumeric$new()
>> >> x$appendData(c(4,2,5))
>> >> x$appendData(c(1,8,9))
>> >> x
>> > Reference class object of class "sortedNumeric"
>> > Field "fData":
>> > [1] 4 2 5 1 8 9
>> > Field "fIsUnsorted":
>> > [1] TRUE
>> >> x$getData()
>> > [1] 1 2 4 5 8 9
>> >> x
>> > Reference class object of class "sortedNumeric"
>> > Field "fData":
>> > [1] 1 2 4 5 8 9
>> > Field "fIsUnsorted":
>> > [1] FALSE
>> >
>> >
>> > Outside of base R, I think the R6 package gives another approach to this.
>> >
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> >
>> > On Sun, Jun 5, 2016 at 6:53 AM, Neal H. Walfield <n...@walfield.org>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I have a huge list.  Normally it is sorted, but I want to be able to
>> >> add elements to it without having to use a

Re: [R] LaplacesDemon package installation

2016-02-04 Thread Ted Harding
See at [***] below.

On 04-Feb-2016 21:23:05 Rolf Turner wrote:
> 
> (1) You might get better mileage asking this on the r-sig-mac list.
> 
> (2) The phenomena you describe are puzzling and are beyond my capacity 
> to explain.  Perhaps someone else will be able to enlighten you.
> 
> (3) Out of idle curiosity I went to the github site and downloaded the
> zip file of the package.  (I could not see a *.tar.gz file, but perhaps
> I just don't understand how github works.  Actually, there's no 
> "perhaps" about it!)
> 
> I unzipped the download and then did
> 
>  R CMD build LaplacesDemon-master
> 
> in the appropriate directory.  This created a file
> 
>  LaplacesDemon_15.03.19.tar.gz
> 
> Note that the version number seems to be as you require.
> 
> I then used your install.packages syntax, and the package installed 
> seamlessly.  It also loaded seamlessly.
> 
> So I don't know why the computer gods are picking on you.
>
[***] 
> Note that I am not working on a Mac, but rather running Linux (as do all 
> civilized human beings! :-) )

Might this be yet another candidate for a Fortune today?

Ted.

> cheers,
> 
> Rolf Turner
> 
> -- 
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 
> On 05/02/16 06:42, lluis.hurt...@uv.es wrote:
>> Dear all,
>>
>> I've recently changed my Mac and I am trying to reinstall my commonly
>> used R-packages. I'm having trouble with a package called
>> LaplacesDemon.
>>
>> This package is no longer on CRAN and the developer's web page
>> (http://www.bayesian-inference.com/software) has been down for more than
>> half a year. Old versions of the package can still be found as tar.gz
>> in
>>
>> https://cran.r-project.org/src/contrib/Archive/LaplacesDemon/
>>
>> and in github
>>
>> https://github.com/ecbrown/LaplacesDemon
>>
>> Last version is LaplacesDemon_13.03.04.tar.gz, but I was able to get
>> version LaplacesDemon_15.03.19.tar.gz time ago (can't find it
>> anymore).
>>
>> I have tried to install this packages from source in my computer
>> using
>>
>>> install.packages("/Users/.../LaplacesDemon_15.03.19.tar.gz", repos
>>> = NULL, type="source")
>>
>> answer:
>>
>>> Warning: invalid package 'Users/.../LaplacesDemon_15.03.19.tar.gz’
>>> Error: ERROR: no packages specified Warning message: In
>>> install.packages("Users/.../LaplacesDemon_15.03.19.tar.gz",  :
>> installation of package ‘Users/.../LaplacesDemon_15.03.19.tar.gz’ had
>> non-zero exit status
>>
>> I also tried the 'Packages & Data' menu in R, selecting the source
>> file or the directory from Finder and I got this message:
>>
>>> * installing *source* package ‘LaplacesDemon’ ... ** R ** data **
>>> inst ** byte-compile and prepare package for lazy loading ** help
>>> *** installing help indices ** building package indices **
>>> installing vignettes ** testing if installed package can be loaded
>>> * DONE (LaplacesDemon)
>>
>> but
>>
>>> library(LaplacesDemon)
>> Error in library(LaplacesDemon) : there is no package called
>> ‘LaplacesDemon’
>>
>> Finally I tried
>>
>>> install.packages("devtools") library(devtools)
>>> install_github("ecbrown/LaplacesDemon")
>>
>> but I am not able to install devtools (for similar reasons). So my
>> questions are:
>>
>> -Do anyone knows how to install this packages in my Mac? -Do anyone
>> knows were can I find the information previously content in
>> http://www.bayesian-inference.com/software?
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 04-Feb-2016  Time: 22:03:31
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R project and the TPP

2016-02-04 Thread Ted Harding
Saludos José!
Could you please give a summary of the relevant parts of TPP
that might affect the use of R? I have looked up TPP on Wikipedia
without beginning to understand what it might imply for the use of R.
Best wishes,
Ted.

On 04-Feb-2016 14:43:29 José Bustos wrote:
> Hi everyone,
> 
> I have a question regarding the use of R software under the new TPP laws
> adopted by some governments in the region. Who knows how these new agreements
> will affect researchers and the R community?
> 
> Hope some of you knows better and can give ideas about it.
> 
> saludos,
> José

-----
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 04-Feb-2016  Time: 22:01:42
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Has R-help changed reply-to policy?

2016-02-04 Thread Ted Harding
Steve, I'm inclined to suspect that something *has* changed at your end
(or along the line between you and R-help). In replying to your message,
selecting "include all recipients" (i.e. reply to all), the result was:
  To: S Ellison <s.elli...@lgcgroup.com>
  Cc: r-help@r-project.org
just as it always has been! So no change that *I* can perceive at the
R-help end.

Hoping this is useful,
Ted.

On 04-Feb-2016 16:33:29 S Ellison wrote:
> Apologies if I've missed a post, but have the default treatment of posts and
> reply-to changed on R-Help of late?
> 
> I ask because as of today, my email client now only lists the OP email when
> replying to an R-help message, even with a reply-all, so the default reply is
> not to the list.
> I also noticed a few months back that I no longer see my own posts, other
> than as a receipt notification, and tinkering with my account defaults
> doesn't change that.
> 
> I don't _think_ things have changed at my end ...
> 
> Steve Ellison

-----
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 04-Feb-2016  Time: 17:03:37
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help mailing list activity / R-not-help?

2016-01-25 Thread Ted Harding
My feelings exactly! (And since quite some time ago).
Ted.

On 25-Jan-2016 12:23:16 Fowler, Mark wrote:
> I'm glad to see the issue of negative feedback addressed. I can especially
> relate to the 'cringe' feeling when reading some authoritarian backhand to a
> new user. We do see a number of obviously inappropriate or overly lazy
> postings, but I encounter far more postings where I don't feel competent to
> judge their merit. It might be better to simply disregard a posting one does
> not like for some reason. It might also be worthwhile to actively counter
> negative feedback when we experience that 'cringing' moment. I'm not thinking
> to foster contention, but simply to provide some tangible reassurance to new
> users, and not just the ones invoking the negative feedback, that a
> particular respondent may not represent the perspective of the list.
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael
> Friendly
> Sent: January 24, 2016 5:43 PM
> To: Jean-Luc Dupouey; r-help@r-project.org
> Subject: Re: [R] R-help mailing list activity / R-not-help?
> 
> 
> On 1/23/2016 7:28 AM, Jean-Luc Dupouey wrote:
>> Dear members,
>>
>> Not a technical question:
> But one worth raising...
>>
>> The number of threads in this mailing list, following a long period of 
>> increase, has been regularly and strongly decreasing since 2010, 
>> passing from more than 40K threads to less than 11K threads last year. 
>> The trend is similar for most of the "ancient" mailing lists of the
>> R-project.
> [snip ...]
>>
>> I hope it is the wright place to ask this question. Thanks in advance,
>>
> 
> In addition to the other replies, there is another trend I've seen that has
> actively worked to suppress discussion on R-help and move it elsewhere. The
> general things:
> - R-help was too unwieldy and so it was a good idea to hive-off specialized
> topics to various sub lists, R-SIG-Mac, R-SIG-Geo, etc.
> - Many people posted badly-formed questions to R-help, and so it was a good
> idea to develop and refer to the posting guide to mitigate the number of
> purely junk postings.
> 
> 
> Yet, the trend I've seen is one of increasing **R-not-help**, in that there
> are many posts, often by new R users who get replies that not infrequently
> range from just mildly off-putting to actively hostile:
> 
> - Is this homework? We don't do homework (sometimes false alarms, where the
> OP has to reply to say it is not)
> - Didn't you bother to do your homework, RTFM, or Google?
> - This is off-topic because XXX (e.g., it is not strictly an R programming
> question).
> - You asked about doing XXX, but this is a stupid thing to want to do.
> - Don't ask here; you need to talk to a statistical consultant.
> 
> I find this sad in a public mailing list sent to all R-help subscribers and I
> sometimes cringe when I read replies to people who were actually trying to
> get help with some R-related problem, but expressed it badly, didn't know
> exactly what to ask for, or how to format it, or somehow motivated a
> frequent-replier to publicly dis the OP.
> 
> On the other hand, I still see a spirit of great generosity among some people
> who frequently reply to R-help, taking a possibly badly posed or
> ill-formatted question, and going to some lengths to provide a a helpful
> answer of some sort.  I applaud those who take the time and effort to do
> this.
> 
> I use R in a number of my courses, and used to advise students to post to
> R-help for general programming questions (not just homework) they couldn't
> solve. I don't do this any more, because several of them reported a negative
> experience.
> 
> In contrast, in the Stackexchange model, there are numerous sublists
> cross-classified by their tags.  If I have a specific knitr, ggplot2, LaTeX,
> or statistical modeling question, I'm now more likely to post it there, and
> the worst that can happen is that no one "upvotes" it or someone (helpfully)
> marks it as a duplicate of a similar question.
> But comments there are not propagated to all subscribers, and those who reply
> helpfully, can see their solutions accepted or not, or commented on in that
> specific topic.
> 
> Perhaps one solution would be to create a new "R-not-help" list where, as in
> a Monty Python skit, people could be directed there to be insulted and all
> these unhelpful replies could be sent.
> 
> A milder alternative is to encourage some R-help subscribers to click the
> "Don't send" or "Save" button and think better of their replies.
> 
> 
> -- 
> Michael Friendly Email: friendly AT yorku DOT ca
&

Re: [R] If else

2015-10-31 Thread Ted Harding
[Apologies if the message below should arrive twice. When first
sent there was apparently something wrong with the email address
to r-help, and it was held for moderation because "Message has
implicit destination" (whatever that means). I have made sure
that this time the email address is correct.]

John Fox has given a neat expression to achieve the desired result!

I would like to comment, however, on the somewhat insistent criticism
of Val's request from several people.

It can make sense to have three "sex"es. Suppose, for example,
that the data are records of street crime reported by victims.
The victim may be able to identify the sex of the perpetrator
as definitely "M", or definitely "F". One of the aims of the
analysis is to investigate whether there is an association
between the gender of the offender and the type of crime.

But in some cases the victim may not have been able to recognise
the offender's sex. Then it would have to go in the record as "NA"
(or equivalent). There can be two kinds of reason why the victim
was unable to recognise the sex. One kind is where the victim
simply did not see the offender (e.g. their purse was stolen
while they were concentrating on something else, and they only
found out later). Another kind is where the offender deliberately
disguises their gender, so that it cannot be determined from their
appearance. This second kind could be associated with a particular
category of crime (and I leave it to people's lurid imaginations
to think of possible examples ... ).

Then one indeed has three "sex"es: Male, Female, and Indeterminate,
for each of which there is a potential association with type of crime.
With most analyses, however, a category of "NA" would be ignored
(at least by R).

If instead that third category is coded as a level in its own right,
one has a variable which is a factor with 3 levels, all of which can
(as above) be meaningful, and the "Indeterminate" cases would not be
ignored.
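
A small sketch of the distinction:

  sex <- factor(c("M", "F", NA, "M"), levels = c("M", "F"))
  table(sex)          # the NA is silently dropped from the counts
  table(addNA(sex))   # NA promoted to a third level, so it is counted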

Hoping this helps to clarify! (And, Val, does the above somehow
correspond to your objectives?)

Best wishes to all,
Ted.

On 31-Oct-2015 17:41:02 Jeff Newmiller wrote:
> Rolf gave you two ways. There are others. They all misrepresent the data
> (there are only two sexes but you are effectively acting as if there are
> three); hence the inquisition in hopes of diverting you to a more correct
> method of analysis. However, this is not the support forum for whatever other
> software you plan to proceed with so never mind.
> ---
> Jeff NewmillerThe .   .  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>Basics: ##.#.   ##.#.  Live Go...
>   Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> --- 
> Sent from my phone. Please excuse my brevity.
> 
> On October 31, 2015 10:15:33 AM PDT, Val <valkr...@gmail.com> wrote:
>>Hi Jeff,
>>
>>I thought I answered. Yes, I was not clear about it.  The further
>>analysis
>>will not be done by R.  It is another software package that will not
>>accept a character response variable.
>>
>>Why is R so complicated for doing that?  If it were SAS I could do it
>>in one
>>statement.
>>
>>
>>On Sat, Oct 31, 2015 at 11:39 AM, Jeff Newmiller
>><jdnew...@dcn.davis.ca.us>
>>wrote:
>>
>>> You haven't actually answered John's question as to the type of
>>analysis
>>> you plan to do. It still looks from here like you should be using
>>factor
>>> data rather than numeric, but since you are not being clear we cannot
>>give
>>> specifics as to how to proceed.
>>>
>>---
>>> Jeff NewmillerThe .   .  Go
>>Live...
>>> DCN:<jdnew...@dcn.davis.ca.us>Basics: ##.#.   ##.#.  Live
>>> Go...
>>>   Live:   OO#.. Dead: OO#.. 
>>Playing
>>> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
>>> /Software/Embedded Controllers)   .OO#.   .OO#. 
>>rocks...1k
>>>
>>---
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On October 31, 2015 8:23:05 AM PDT, Val <valkr...@gmail.com> wrote:
>>> >Hi All,
>>> >
>>> >
>>> >Yes I need  to change to numeric  because I am pre

Re: [R] Subscription request

2015-10-14 Thread Ted Harding
On 14-Oct-2015 15:19:06 Manish Sindagi wrote:
> Hi,
> 
> I have a few R programming related questions that i wanted to ask.
> Can you please accept my subscription request.
> 
> Regards,
> Manish.

Visit the R-help info web page:

  https://stat.ethz.ch/mailman/listinfo/r-help

Towards the bottom of this page is a section "Subscribing to R-help".
Follow the instructions in this section, and it should work!

Best wishes,
Ted.

-----
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 14-Oct-2015  Time: 19:34:55
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beta distribution approximate to Normal distribution

2015-09-15 Thread Ted Harding
Using non-central chi-squared (especially with df=1) is unlikely
to generate random numbers anywhere near a Normal distribution
(see below).

And "rchisq(100, df=1, ncp=u/a)" won't work anyway with u<0,
since ncp must be >= 0 (if < 0 then all are NA).

Better to shoot straight for the target (truncated Normal), though
several shots are likely to be required! For example (code which
spells it out), taking u=3 and a=2:

  n <- 100
  u <- 3 ; a <- 2
  x <- NULL
  N <- length(x)
  while(N < n){
x <- c(x,rnorm(n,mean=u,sd=a))
x <- x[x>0]
N <- length(x)
  }
  x <- x[1:n]
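
(An alternative that needs no rejection loop is the inverse-CDF method,
restricting the uniform draws to the part of the distribution above 0
-- a sketch using the same u, a and n:

  p0 <- pnorm(0, mean=u, sd=a)                 # P(X <= 0)
  x  <- qnorm(runif(n, min=p0, max=1), mean=u, sd=a)
)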

Comparison with non-central chi-squared:

  y <- rchisq(100, df=1, ncp=u/a)
  hist(x)
  hist(y)



On 15-Sep-2015 13:26:44 jlu...@ria.buffalo.edu wrote:
> Your question makes no sense as stated.  However, guessing at what you 
> want, you should  perhaps consider the non-central chi-square density with 
> 1 df and ncp = u/a, i.e,
> 
> rchisq(100, df=1, ncp=u/a)
> 
> Joe
> Joseph F. Lucke, PhD
> Senior Statistician
> Research Institute on Addictions
> University at Buffalo
> State University of New York
> 1021 Main Street
> Buffalo, NY  14203-1016
> 
> Chien-Pang Chin <chienpan...@gmail.com> 
> Sent by: "R-help" <r-help-boun...@r-project.org>
> 09/15/2015 06:58 AM
> 
> To
> "r-help@r-project.org" <r-help@r-project.org>, 
> 
> Subject
> [R] Beta distribution approximate to Normal distribution
> 
> Hi,
> I need to generate 1000 numbers from N(u, a^2), however I don't
> want to include 0 or negative values. How can I use a beta distribution
> to approximate N(u, a^2) in R?
> 
> Thx for help

-
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 15-Sep-2015  Time: 16:12:35
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variance is different in R vs. Excel?

2015-02-09 Thread Ted Harding
[See at end]

On 09-Feb-2015 21:45:11 David L Carlson wrote:
> Time for a new version of Excel? I cannot duplicate your results in Excel
> 2013.
> 
> R:
> apply(dat, 2, var)
> [1] 21290.80 24748.75
> 
> Excel 2013:
> =VAR.S(A2:A21)   =VAR.S(B2:B21)
> 21290.8  24748.74737
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Fetter
> Sent: Monday, February 9, 2015 3:33 PM
> To: r-help@r-project.org
> Subject: [R] Variance is different in R vs. Excel?
> 
> Hello everyone, I have a simple question. When I use the var() function in
> R to find a variance, it differs greatly from the variance found in Excel
> using the =VAR.S function. Any explanations on what those two functions are
> actually doing?
> 
> Here is the data and the results:
> 
> dat <- matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,
>   597,872,476,1026,423,476,419,591,376,640,550,601,588,499,646,693,351,
>   730,632,707,779,838,814,771,533,818),
>   nrow=20, ncol=2, byrow=T)
> 
> var(dat[,1])
> # 21290.8
> 
> var(dat[,2])
> # 24748.75
> 
> # in Excel, the variance of dat[,1] = 44763.91; for dat[,2] = 52034.2
> 
> Thanks,
> Karl

I suspect that something has happened to the reading-in of the
data into Excel. (I don't know much about Excel, and that's because
I don't want to ... ).

The ratio of the variances of the two datasets in R is:

  var(dat[,2])/var(dat[,1])
  # [1] 1.162415

while the ratio of the results from Excel is:

  52034.2/44763.91
  # [1] 1.162414

so they are almost identical. 

So it is as if Excel was evaluating the variances for data which
are

  sqrt(44763.91/var(dat[,1]))
  # [1] 1.45
  sqrt(52034.2/var(dat[,2]))
  # [1] 1.449999

times the data used by R. So maybe there's a nasty lurking somewhere
in the spreadsheet? (Excel is notorious for planting things invisibly
in its spreadsheets which lead to messed-up results for no apparent
reason ... ).

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 09-Feb-2015  Time: 22:15:44
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the less-than-minus gotcha

2015-02-02 Thread Ted Harding
On 02-Feb-2015 11:58:10 Rolf Turner wrote:
 
> On 02/02/15 14:26, Steve Taylor wrote:
> 
>> All the more reason to use = instead of <-
> 
> Couldn't agree less, Steve. The <- should be used for assignment. The
> = sign should be reserved for handling function arguments in the
> name=value form.  Doing anything else invites confusion and
> occasionally chaos.
> 
> Lots of examples have been given in the past involving syntax of the
> form foo(x = y) and foo(x <- y).
> 
> cheers,
> Rolf
> --
> Rolf Turner

As well as agreeing with Rolf, it should also be said that Steve's
advice "All the more reason to use = instead of <-" is not applicable
in this context, which was:

  which( frame$var<4 )   # no problem
  which( frame$var<-4 )  # huge problem: frame$var is destroyed

There is no place for an = here!
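
To see the destruction concretely (a minimal sketch):

  frame <- data.frame(var = c(3, 5, 7))
  try( which( frame$var<-4 ) )  # parsed as frame$var <- 4, so ...
  frame$var                     # ... the column has been overwritten
  # [1] 4 4 4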

What does not seem to have been mentioned so far is that this
kind of thing can be safely wrapped in parentheses:

  which( frame$var<4 )    # no problem, as before
  which( frame$var<(-4) ) # [again no problem]

Personally, I'm prone to using parentheses if I have any feeling
that a gotcha may be lurking -- not only in the distinction
between var<-4 and var< -4, but also to be confident that,
for instance, my intentions are not being over-ridden by operator
precedence rules.

Some people object to code clutter from parentheses that could
be more simply replaced (e.g. var< -4 instead of var<(-4)),
but parentheses ensure that it's right and also make it clear
when one reads it.

Best wishes to all,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 02-Feb-2015  Time: 12:43:51
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate the median age interval

2015-01-12 Thread Ted Harding
Sorry, a typo in my reply below. See at ###.

On 12-Jan-2015 11:12:43 Ted Harding wrote:
 On 12-Jan-2015 10:32:41 Erik B Svensson wrote:
 Hello
 I've got a problem I don't know how to solve. I have got a dataset that
 contains age intervals (age groups) of people and the number of persons in
 each age group each year (y1994-y1996). The number of persons varies each
 year. I only have access to the age intervals, not the age of each person,
 which would make things easier.
 
 I want to know the median age interval (not the median number) for each
 year. Let's say that in y1994 23 corresponds to the median age interval
 45-54, I want to 45-54 as a result. How is that done?
 
 This is the sample dataset:
 
 agegrp <-
 c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74",
   "75-84","84-")
   y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
   y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
   y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)
 
 I look forward to your response
 
 Best regards,
 Erik Svensson
 
 In principle, this is straightforward. But in practice you may
 need to be careful about how to deal with borderline cases -- and
 about what you mean by "median age interval".
 The underlying idea is based on:
 
  cumsum(y1994)/sum(y1994)
  # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336
  # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000
 
 Thus age intervals 1-7 ("<1" - "45-54") contain less than 50%
 (0.49344978...), though "45-54" almost gets there. However,
 age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
 the median age is within 49-64.
### Should be:
  age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
  the median age is within 55-64.

 Implementing the above as a procedure:
 
   agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)]
   # [1] "55-64"
 
 Note that the obvious solution:
 
   agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))]
   # [1] "45-54"
 
 gives an incorrect answer, since with these data it returns a group
 whose maximum age is below the median. This is because the <= is
 satisfied by < also.
 
 Hoping this helps!
 Ted.
 
 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 12-Jan-2015  Time: 11:12:39
 This message was sent by XFMail
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Jan-2015  Time: 11:21:11
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate the median age interval

2015-01-12 Thread Ted Harding
On 12-Jan-2015 10:32:41 Erik B Svensson wrote:
 Hello
 I've got a problem I don't know how to solve. I have got a dataset that
 contains age intervals (age groups) of people and the number of persons in
 each age group each year (y1994-y1996). The number of persons varies each
 year. I only have access to the age intervals, not the age of each person,
 which would make things easier.
 
 I want to know the median age interval (not the median number) for each
 year. Let's say that in y1994 23 corresponds to the median age interval
 45-54, I want to 45-54 as a result. How is that done?
 
 This is the sample dataset:
 
agegrp <-
c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74",
  "75-84","84-")
  y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
  y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
  y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)

 I look forward to your response
 
 Best regards,
 Erik Svensson

In principle, this is straightforward. But in practice you may
need to be careful about how to deal with borderline cases -- and
about what you mean by median age interval.
The underlying idea is based on:

 cumsum(y1994)/sum(y1994)
 # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336
 # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000

Thus age intervals 1-7 ("<1" - "45-54") contain less than 50%
(0.49344978...), though "45-54" almost gets there. However,
age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
the median age is within 49-64.

Implementing the above as a procedure:

  agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)]
  # [1] "55-64"

Note that the obvious solution:

  agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))]
  # [1] "45-54"

gives an incorrect answer, since with these data it returns a group
whose maximum age is below the median. This is because the "<=" is
satisfied by "<" also.
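
A compact wrapper (my own sketch, not part of the original reply; it
assumes the median does not fall in the first group) applies the same
rule to all three years at once:

  median.interval <- function(counts, groups = agegrp)
    groups[max(which(cumsum(counts)/sum(counts) < 0.5)) + 1]
  sapply(list(y1994 = y1994, y1995 = y1995, y1996 = y1996),
         median.interval)
  #   y1994   y1995   y1996
  # "55-64" "45-54" "45-54"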

Hoping this helps!
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Jan-2015  Time: 11:12:39
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] diff question

2015-01-11 Thread Ted Harding
Troels, this is due to the usual tiny difference between numbers
as computed by R and the numbers that you think they are!

  tt  <- seq(0,20,by=0.02)
  dtt <- diff(tt)
  length(dtt)
  # [1] 1000
  r02 <- rep(0.02,1000)
  unique(r02 - dtt)
  # [1]  0.000000e+00  3.469447e-18 -3.469447e-18  1.040834e-17
  # [5] -1.734723e-17  3.816392e-17  9.367507e-17  2.046974e-16
  # [9]  4.267420e-16 -4.614364e-16 -1.349615e-15 -3.125972e-15

Hoping this helps!
Ted.

On 11-Jan-2015 08:29:26 Troels Ring wrote:
 R version 3.1.1 (2014-07-10) -- Sock it to Me
 Copyright (C) 2014 The R Foundation for Statistical Computing
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 
 Dear friends - I have a small problem with diff (I guess)
 I made a sequence with fixed interval between consecutive elements - and 
 hence thought the  diff would be as specified
 but had a vector with apparently identical 12 elements returned from diff
 tt <- seq(0,20,by=0.02)
 unique(diff(tt)) #[1] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 
 0.02 0.02
 Trying to see if these elements in diff were duplicated
 duplicated(diff(tt))
#[1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE ...
 and from
 sum(duplicated(diff(tt)))
 [1] 988
 saw that 12 of the elements in duplicated(diff(tt)) were FALSE. Would it 
 be expected that the first was FALSE and the rest TRUE?
 duplicated() determines which elements of a vector or data frame are 
 duplicates of elements with smaller subscripts, and returns a logical 
 vector indicating which elements (rows) are duplicates.
 
 All best wishes
 Troels
 Aalborg, Denmark
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 11-Jan-2015  Time: 08:48:03
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] diff question

2015-01-11 Thread Ted Harding
I should have added an extra line to the code below, to complete
the picture. Here it is (see below the line marked ##).
Ted.

On 11-Jan-2015 08:48:06 Ted Harding wrote:
 Troels, this is due to the usual tiny difference between numbers
 as computed by R and the numbers that you think they are!

  tt  <- seq(0,20,by=0.02)
  dtt <- diff(tt)
  length(dtt)
  # [1] 1000
  r02 <- rep(0.02,1000)
  unique(r02 - dtt)
  # [1]  0.000000e+00  3.469447e-18 -3.469447e-18  1.040834e-17
  # [5] -1.734723e-17  3.816392e-17  9.367507e-17  2.046974e-16
  # [9]  4.267420e-16 -4.614364e-16 -1.349615e-15 -3.125972e-15
  ##
  sum(dtt != 0.02)
  # [1] 998

So only 2 values among the 1000 in diff(tt) are exactly equal to
[R's representation of] 0.02!
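
The practical moral (my addition): when checking such sequences, compare
with a tolerance rather than testing for exact equality, e.g.

  all(abs(dtt - 0.02) < 1e-12)             # TRUE
  isTRUE(all.equal(dtt, rep(0.02, 1000)))  # TRUE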

 Hoping this helps!
 Ted.
 
 On 11-Jan-2015 08:29:26 Troels Ring wrote:
 R version 3.1.1 (2014-07-10) -- Sock it to Me
 Copyright (C) 2014 The R Foundation for Statistical Computing
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 
 Dear friends - I have a small problem with diff (I guess)
 I made a sequence with fixed interval between consecutive elements - and 
 hence thought the  diff would be as specified
 but had a vector with apparently identical 12 elements returned from diff
 tt <- seq(0,20,by=0.02)
 unique(diff(tt)) #[1] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 
 0.02 0.02
 Trying to see if these elements in diff were duplicated
 duplicated(diff(tt))
#[1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE ...
 and from
 sum(duplicated(diff(tt)))
 [1] 988
 saw that 12 of the elements in duplicated(diff(tt)) were FALSE. Would it 
 be expected that the first was FALSE and the rest TRUE?
 duplicated() determines which elements of a vector or data frame are 
 duplicates of elements with smaller subscripts, and returns a logical 
 vector indicating which elements (rows) are duplicates.
 
 All best wishes
 Troels
 Aalborg, Denmark
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 11-Jan-2015  Time: 08:48:03
 This message was sent by XFMail
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 11-Jan-2015  Time: 11:41:27
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rounding down with as.integer

2015-01-01 Thread Ted Harding
I've been following this little tour round the murkier bistros
in the back-streets of R with interest! Then it occurred to me:
What is wrong with [using example data]:

  x0 <- c(0,1,2,0.325,1.12,1.9,1.003)
  x1 <- as.integer(as.character(1000*x0))
  n1 <- c(0,1000,2000,325,1120,1900,1003)

  x1 - n1
  ## [1] 0 0 0 0 0 0 0

  ## But, of course:
  1000*x0 - n1
  ## [1]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
  ## [5]  0.000000e+00  0.000000e+00 -1.136868e-13

Or am I missing something else in what Mike Miller is seeking to do?
Ted.

On 01-Jan-2015 19:58:02 Mike Miller wrote:
 I'd have to say thanks, but no thanks, to that one!  ;-)  The problem is 
 that it will take a long time and it will give the same answer.
 
 The first time I did this kind of thing, a year or two ago, I manipulated 
 the text data to produce integers before putting the data into R.  The 
 data were a little different -- already zero padded with three digits to 
 the right of the decimal and one to the left, so all I had to do was drop 
 the decimal point.  The as.integer(1000*x+.5) method is very fast and it 
 works great.
 
 I could have done that this time, but I was also saving to other formats, 
 so I had the data already in the format I described.
 
 Mike
 
 
 On Thu, 1 Jan 2015, Richard M. Heiberger wrote:
 
 Interesting.  Following someone on this list today, the goal is to input
 the data correctly.
 My inclination would be to read the file as text, pad each number to
 the right, drop the decimal point,
 and then read it as an integer.
 0 1 2 0.325 1.12 1.9
 0.000 1.000 2.000 0.325 1.120 1.900
 0000 1000 2000 0325 1120 1900

 The pad step is the interesting step.

 ## 0 1 2 0.325 1.12 1.9
 ## 0.000 1.000 2.000 0.325 1.120 1.900
 ## 0000 1000 2000 0325 1120 1900

 x.in <- scan(text="
 0 1 2 0.325 1.12 1.9 1.
 ", what="")

 padding <- c(".000", "000", "00", "0", "")

 x.pad <- paste(x.in, padding[nchar(x.in)], sep="")

 x.nodot <- sub(".", "", x.pad, fixed=TRUE)

 x <- as.integer(x.nodot)


 Rich


 On Thu, Jan 1, 2015 at 1:21 PM, Mike Miller mbmille...@gmail.com wrote:
 On Thu, 1 Jan 2015, Duncan Murdoch wrote:

 On 31/12/2014 8:44 PM, David Winsemius wrote:


 On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:

 This is probably a FAQ, and I don't really have a question about it, but
 I just ran across this in something I was working on:

 as.integer(1000*1.003)

 [1] 1002

 I didn't expect it, but maybe I should have.  I guess it's about the
 machine precision added to the fact that as.integer always rounds down:


 as.integer(1000*1.003 + 255 * .Machine$double.eps)

 [1] 1002

 as.integer(1000*1.003 + 256 * .Machine$double.eps)

 [1] 1003


 This does it right...

 as.integer( round( 1000*1.003 ) )

 [1] 1003

 ...but this seems to always give the same answer and it is a little
 faster in my application:

 as.integer( 1000*1.003 + .1 )

 [1] 1003


 FYI - I'm reading in a long vector of numbers from a text file with no
 more than three digits to the right of the decimal.  I'm converting them
 to
 integers and saving them in binary format.


 So just add 0.0001 or even .001 to all of them and coerce to integer.


 I don't think the original problem was stated clearly, so I'm not sure
 whether this is a solution, but it looks wrong to me.  If you want to
 round
 to the nearest integer, why not use round() (without the as.integer
 afterwards)?  Or if you really do want an integer, why add 0.1 or 0.0001,
 why not add 0.5 before calling as.integer()?  This is the classical way to
 implement round().

 To state the problem clearly, I'd like to know what result is expected for
 any real number x.  Since R's numeric type only approximates the real
 numbers we might not be able to get a perfect match, but at least we could
 quantify how close we get.  Or is the input really character data?  The
 original post mentioned reading numbers from a text file.



 Maybe you'd like to know what I'm really doing.  I have 1600 text files
 each
 with up to 16,000 lines with 3100 numbers per line, delimited by a single
 space.  The numbers are between 0 and 2, inclusive, and they have up to
 three digits to the right of the decimal.  Every possible value in that
 range will occur in the data.  Some examples numbers: 0 1 2 0.325 1.12 1.9.
 I want to multiply by 1000 and store them as 16-bit integers (uint16).
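 
 A minimal sketch of that last step (my illustration, not from the
 thread; the file name is a placeholder, and writeBin() with size = 2
 writes signed 16-bit integers, which easily hold 0..2000):
 
   con <- file("out.bin", "wb")
   writeBin(ints, con, size = 2)  # ints as computed below
   close(con)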

 I've been reading in the data like so:

 data <- scan( file="FILE", what=double(), nmax=3100*16000)


 At first I tried making the integers like so:

 ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm

user  system elapsed
   0.187   0.387   0.574

 I decided I should compare with the result I got using round():

 ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ;
 proc.time()-ptm

user  system elapsed
   1.595   0.757   2.352

 It is a curious fact that only a few of the values from 0 to 2000 disagree
 between the two methods:

 table( ints2[ ints2 != ints ] )


  1001  1003  1005  1007  1009  1011  1013  1015

Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Hi Aoife,
I think that if you simply replace each * in the data file
with NA, then it should work (NA is usually interpreted
as missing for those functions for which missingness is
relevant). How you subsequently deal with records which have
missing values is another question (or many questions ... ).

So your data should look like:

V1   V2  V3       Survival   Event
ann  13  WTHomo   4          1
ben  20  NA       5          1
tom  40  Variant  6          1

Hoping this helps,
Ted.

On 19-Dec-2014 10:12:00 aoife doherty wrote:
 Hi all,
 
 I have a data set like this:
 
 Test.cox file:
 
 V1   V2  V3       Survival   Event
 ann  13  WTHomo   4          1
 ben  20  *        5          1
 tom  40  Variant  6          1
 
 
 where * indicates that I don't know what the value is for V3 for Ben.
 
 I've set up a Cox model to run like this:
 
#!/usr/bin/Rscript
 library(bdsmatrix)
 library(kinship2)
 library(survival)
 library(coxme)
 death.dat <- read.table("Test.cox", header=T)
 deathdat.kmat <- 2*with(death.dat, makekinship(famid,ID,faid,moid))
 sink("Test.cox.R.Output")
 Model <- coxme(Surv(Survival,Event) ~ strata(factor(V1)) +
 strata(factor(V2)) + factor(V3) +
 (1|ID), data=death.dat, varlist=deathdat.kmat)
 Model
 sink()
 
 
 
 As you can see from the Test.cox file, I have a missing value *. How and
 where do I tell the R script to treat * as a missing value? If I can't
 incorporate missing values into the model, I assume the alternative is to
 remove all of the rows with missing data, which will greatly reduce my data
 set, as most rows have at least one missing variable.
 
 Thanks
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 19-Dec-2014  Time: 10:21:23
This message was sent by XFMail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Yes, your basic reasoning is correct. In general, the observed variables
carry information about the variables with missing values, so (in some
way) the missing values can be replaced with estimates (imputations)
and the standard regression method will then work as though the
replacements were there in the first place. To incorporate the inevitable
uncertainty about what the missing values really were, one approach
(multiple imputation) is to do the replacement many times over,
sampling the replacement values from a posterior distribution estimated
from the non-missing data. There are other approaches.

This is where the many questions kick in! I don't have time at the
moment, to go into further detail (there's a lot of it, and several
R packages which deal with missing data in different ways), but I hope
that someone can meanwhile point you in the right direction.
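
For example, a minimal multiple-imputation sketch using the 'mice'
package (my illustration only -- the model formula is schematic, and
note that na.strings = "*" also handles the original "*" problem):

  library(mice)
  library(survival)
  death.dat <- read.table("Test.cox", header = TRUE, na.strings = "*")
  imp  <- mice(death.dat, m = 5, printFlag = FALSE)  # 5 imputed datasets
  fits <- with(imp, coxph(Surv(Survival, Event) ~ factor(V3)))
  summary(pool(fits))  # estimates pooled across the imputations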

With best wishes,
Ted.

On 19-Dec-2014 11:17:27 aoife doherty wrote:
 Many thanks, I appreciate the response.
 
 When I convert the missing values to NA and run the cox model as described
 in previous post,  the cox model seems to remove all of the rows with a
 missing value (as the number of rows n in the cox output after I
 completely remove any row with missing data is the same as the number of
 rows n in the cox output after I change the missing values to NA).
 
 What I had been hoping to do is not to completely remove a row with missing
 data for a co-variable, but rather to somehow censor it or estimate a value
 for the missing entry.
 
 In reality, I have ~600 people with survival data and say 6 variables
 attached to them. After I incorporate a 7th variable (for which the
 information isn't available for every individual), I have 400 people left.
 Since I still have survival data and almost all of the information for the
 other 200 people (the only thing missing is information about that 7th
 variable), it seems a waste to remove all of the survival data for 200
 people over one co-variate. So I was hoping, instead of completely removing
 the rows, to somehow acknowledge in the model that the data for this
 particular co-variate is missing, but not remove the row entirely. Does
 anyone know whether it's possible to incorporate that into the model I
 described above?
 
 Thanks
 
 
 
 On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding ted.hard...@wlandres.net
 wrote:

 Hi Aoife,
 I think that if you simply replace each * in the data file
 with NA, then it should work (NA is usually interpreted
 as missing for those functions for which missingness is
 relevant). How you subsequently deal with records which have
 missing values is another question (or many questions ... ).

 So your data should look like:

 V1   V2  V3       Survival   Event
 ann  13  WTHomo   4          1
 ben  20  NA       5          1
 tom  40  Variant  6          1

 Hoping this helps,
 Ted.

 On 19-Dec-2014 10:12:00 aoife doherty wrote:
  Hi all,
 
  I have a data set like this:
 
  Test.cox file:
 
  V1   V2  V3       Survival   Event
  ann  13  WTHomo   4          1
  ben  20  *        5          1
  tom  40  Variant  6          1
 
 
  where * indicates that I don't know what the value is for V3 for Ben.
 
  I've set up a Cox model to run like this:
 
 #!/usr/bin/Rscript
  library(bdsmatrix)
  library(kinship2)
  library(survival)
  library(coxme)
  death.dat <- read.table("Test.cox", header=T)
  deathdat.kmat <- 2*with(death.dat, makekinship(famid,ID,faid,moid))
  sink("Test.cox.R.Output")
  Model <- coxme(Surv(Survival,Event) ~ strata(factor(V1)) +
  strata(factor(V2)) + factor(V3) +
  (1|ID), data=death.dat, varlist=deathdat.kmat)
  Model
  sink()
 
 
 
  As you can see from the Test.cox file, I have a missing value *. How
 and
  where do I tell the R script to treat * as a missing value? If I can't
  incorporate missing values into the model, I assume the alternative is to
  remove all of the rows with missing data, which will greatly reduce my
 data
  set, as most rows have at least one missing variable.
 
  Thanks
 
[[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 19-Dec-2014  Time: 10:21:23
 This message was sent by XFMail
 -

 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https

Re: [R] Printing/Generating/Outputting a Table (Not Latex)

2014-12-09 Thread Ted Harding
The program 'gv' is installed on just about any linux system.
It has many available options (one, which might be useful,
being -watch, whose effect is that if the file being displayed
is changed, e.g. by being over-written by a new file with the
same name, then 'gv' automatically updates what it is displaying).

And of course many linux users install 'acroread' (Acrobat
Reader), though some object!

Hoping this helps,
Ted.

On 09-Dec-2014 20:47:06 Richard M. Heiberger wrote:
 the last one is wrong.  That is the one for which I don't know the
 right answer on linux.
 
 'xdvi' displays dvi files.  you need to display a pdf file.
 whatever is the right program on linux to display pdf files is what
 belongs there.
 
 On Macintosh we can avoid knowing by using 'open', which means use the
 system standard.
 I don't know what the linux equivalent is, either the exact program or
 the instruction to use the standard.
 
 On Tue, Dec 9, 2014 at 3:36 PM, Kate Ignatius kate.ignat...@gmail.com
 wrote:
 I set these options:

 options(latexcmd='pdflatex')
 options(dviExtension='pdf')
 options(xdvicmd='xdvi')

 Maybe one too many?  I'm running in Linux.



 On Tue, Dec 9, 2014 at 3:24 PM, Richard M. Heiberger r...@temple.edu wrote:
 It looks like you skipped the step of setting the options.
 the latex function doesn't do pdflatex (by default it does regular
 latex) unless you set the options
 as I indicated.

 On Tue, Dec 9, 2014 at 3:11 PM, Kate Ignatius kate.ignat...@gmail.com
 wrote:
 Ah yes, you're right.

 The log has this error:

 ! LaTeX Error: Missing \begin{document}.

 Though can't really find much online on how to resolve it.

 On Tue, Dec 9, 2014 at 1:15 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us
 wrote:
 pdflatex appears to have run, because it exited. You should look at the
 tex log file, the problem is more likely that the latex you sent out to
 pdflatex was incomplete.
 ---------------------------------------------------------------------------
 Jeff Newmiller                        The     .....       .....  Go Live...
 DCN:jdnew...@dcn.davis.ca.us          Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
 /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
 ---------------------------------------------------------------------------
 Sent from my phone. Please excuse my brevity.

 On December 9, 2014 8:43:02 AM PST, Kate Ignatius
 kate.ignat...@gmail.com wrote:
Thanks!  I do get several errors though when running on Linux.

Running your code, I get this:

Error in system(cmd, intern = TRUE, wait = TRUE) :
error in running command

Fiddling around with the code and running this:

tmp <- matrix(1:9,3,3)
tmp.tex <- latex(tmp, file='tmp.tex')
print.default(tmp.tex)
tmp.dvi <- dvi(tmp.tex)
tmp.dvi
tmp.tex
dvips(tmp.dvi)
dvips(tmp.tex)
library(tools)
texi2dvi(file='tmp.tex', pdf=TRUE, clean=TRUE)

I get this:

Error in texi2dvi(file = "tmp.tex", pdf = TRUE, clean = TRUE) :
  Running 'texi2dvi' on 'tmp.tex' failed.
Messages:
/usr/bin/texi2dvi: pdflatex exited with bad status, quitting.

I've read that it may have something to do with the path of pdflatex.

Sys.which('pdflatex')

           pdflatex
"/usr/bin/pdflatex"


Sys.which('texi2dvi')

           texi2dvi
"/usr/bin/texi2dvi"

 file.exists(Sys.which('texi2dvi'))

[1] TRUE

 file.exists(Sys.which('pdflatex'))

[1] TRUE

Is there a specific path I should be giving with pdflatex and/or
'texi2dvi to make this work?

Thanks!

On Mon, Dec 8, 2014 at 11:13 PM, Richard M. Heiberger r...@temple.edu
wrote:
 yes of course, and the answer is latex() in the Hmisc package.
 Why were you excluding it?
 Details follow

 Rich


 The current release of the Hmisc package has this capability on
 Macintosh and Linux.
 For Windows, you need the next release 3.14-7 which is available now
at github.

 ## windows needs these lines until the new Hmisc version is on CRAN
 install.packages("devtools")
 devtools::install_github("Hmisc", "harrelfe")

 ## All operating systems
 options(latexcmd='pdflatex')
 options(dviExtension='pdf')

 ## Macintosh
 options(xdvicmd='open')

 ## Windows, one of the following

options(xdvicmd='c:\\progra~1\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe')
 ## 32-bit windows

options(xdvicmd='c:\\progra~2\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe')
 ## 64 bit windows

 ## Linux
 ## I don't know the xdvicmd value


 ## this works on all R systems
 library(Hmisc)
 tmp <- matrix(1:9,3,3)
 tmp.dvi <- dvi(latex(tmp))
 print.default(tmp.dvi) ## prints filepath of the pdf file
 tmp.dvi  ## displays the pdf file on your screen

 On Mon, Dec 8, 2014 at 9:31 PM, Kate Ignatius
kate.ignat...@gmail.com wrote:
 Hi,

 I have a simple question.  I know there are plenty of packages out
 there that can provide code to generate a table in latex.  But I was
 wondering whether there was one out there where I can generate a
table
 from my data (which ever way I

Re: [R] Inverse Student t-value

2014-09-30 Thread Ted Harding
And, with 1221 degrees of freedom, one cannot be far off a Normal
distribution. So:

  abs(qnorm(0.0000408831/2))
  [1] 4.102431

which is nearer to Duncan's answer (4.117, and a bit smaller, as it
should be) than to yours (4.0891672, which, if accurate, should
be greater than the Normal value 4.102431).

Ted.

On 30-Sep-2014 18:20:39 Duncan Murdoch wrote:
 On 30/09/2014 2:11 PM, Andre wrote:
 Hi Duncan,

 No, that's correct. Actually, I have data set below;
 
 Then it seems Excel is worse than I would have expected.  I confirmed 
 R's value in two other pieces of software,
 OpenOffice and some software I wrote a long time ago based on an 
 algorithm published in 1977 in Applied Statistics.  (They are probably 
 all using the same algorithm.  I wonder what Excel is doing?)
 
 N= 1223
 alpha= 0.05

 Then
 probability= 0.05/1223 = 0.0000408831
 degree of freedom= 1223-2= 1221

 So, TINV(0.0000408831,1221) returns 4.0891672


 Could you show me the manual calculation in more detail? I really appreciate 
 it if you can give more detail.
 
 I already gave you the expression:  abs(qt(0.0000408831/2, df=1221)). 
 For more detail, I suppose you could look at the help page for the qt 
 function, using help(qt).
 
 Duncan Murdoch
 

 Cheers!


 On Wed, Oct 1, 2014 at 1:01 AM, Duncan Murdoch 
 murdoch.dun...@gmail.com mailto:murdoch.dun...@gmail.com wrote:

 On 30/09/2014 1:31 PM, Andre wrote:

 Dear Sir/Madam,

 I am trying to use the calculation for the two-tailed inverse of the
 Student's
 t-distribution function presented by Excel functions like
 =TINV(probability, deg_freedom).

 For instance: The Excel function =TINV(0.0000408831,1221)
 returns
   4.0891672.

 Would you like to show me a manual calculation for this?

 Appreciate your helps in advance.


 That number looks pretty far off the true value.  Have you got a
 typo in your example?

 You can compute the answer to your question as
 abs(qt(0.0000408831/2, df=1221)), but you'll get 4.117.

 Duncan Murdoch



 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 30-Sep-2014  Time: 19:35:45
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data.table/ifelse conditional new variable question

2014-08-17 Thread Ted Harding
On 17-Aug-2014 03:50:33 John McKown wrote:
 On Sat, Aug 16, 2014 at 9:02 PM, Kate Ignatius kate.ignat...@gmail.com
 wrote:
 
 Actually - your code is not wrong... because this is a large file I
 went through the file to see if there was anything wrong with it -
 looks like there are two fathers or three mothers in some families.
 Taking these duplicates out fixed the problem.

 Sorry about the confusion!  And thanks so much for your help!


 Kate,
 I hope you don't mind, but I have a curiosity question on my part.
 Were the families with multiple fathers or mothers a mistake, just
 duplicates (same Family.ID & Sample.ID), or more like an intermixed
 family due to divorce and remarriage. Or even, like in some countries,
 a case of polygamy? Sorry, I just get curious about the strangest
 things sometimes.
 -- 
 There is nothing more pleasant than traveling and meeting new people!
 Genghis Khan
 
 Maranatha! 
 John McKown

When Kate first posted her query, similar thoughts to John's occurred
to me. The potential for convoluted ancestry and kinship is enormous!

For perhaps (or perhaps not) ultimate convolution, try reconstructing
a canine pedigree from a breeding register of thoroughbreds, where
again the primary data for each individual is
  * ID of individual
  * ID of litter the individual was born in (family)
  * ID of male parent
  * ID of female parent
(as, for instance, registered with the UK Kennel Club).

Similar convolutions can be found with race-horses.

But even humans can compete. Here is a little challenge for anyone
who has an R program that will work out a pedigree from data such as
described above. I have used Kate's notation. Individuals are numbered
from 1 up (with a gap): Sample.ID; Families from 101 up: Family.ID.
Relationships are sibling, father, mother.

ID for father/mother may be NA (data not given).

Family.ID Sample.ID Relationship
101   01sibling
101   02father
101   03mother

102   02sibling
102   04father
102   05mother

103   03sibling
103   06father
103   07mother

104   04sibling
104   08father
104   09mother

104   05sibling
104   08father
104   09mother

104   06sibling
104   08father
104   09mother

104   15sibling
104   08father
104   09mother

105   07sibling
105   04father
105   15mother

106   08sibling
106   16father
106   17mother

106   18sibling
106   16father
106   17mother

106   19sibling
106   16father
106   17mother

107   09sibling
107   18father
107   19mother

108   16sibling
108   NAfather
108   NAmother

109   17sibling
109   NAfather
109   NAmother

That's the data. Now a little quiz question: Can you guess the
identity of the person with sample.ID = 01 ?

Best wishes to all,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 17-Aug-2014  Time: 19:41:38
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A basic statistics question

2014-08-13 Thread Ted Harding
On 12-Aug-2014 22:22:13 Ted Harding wrote:
 On 12-Aug-2014 21:41:52 Rolf Turner wrote:
 On 13/08/14 07:57, Ron Michael wrote:
 Hi,

 I would need to get a clarification on a quite fundamental statistics
 property, hope expeRts here would not mind if I post that here.

 I learnt that the variance-covariance matrix of the standardized data is
 equal to the correlation matrix for the unstandardized data. So I
 used following data.
 
 SNIP
 
 (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]

 Point is that I am not getting exact CORR matrix. Can somebody point
 me what I am missing here?
 
 You are using a denominator of n in calculating your covariance 
 matrix for your normalized data.  But these data were normalized using 
 the sd() function which (correctly) uses a denominator of n-1 so as to 
 obtain an unbiased estimator of the population standard deviation.
 
 If you calculated
 
 (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
 
 then you would get the same result as you get from cor(Data) (to within 
 about 1e-15).
 
 cheers,
 Rolf Turner
 
 One could argue about "(correctly)"!
 
From the descriptive statistics point of view, if one is given a single
 number x, then this dataset has no variation, so one could say that
 sd(x) = 0. And this is what one would get with a denominator of n.
 
 But if the single value x is viewed as sampled from a distribution
 (with positive dispersion), then the value of x gives no information
 about the SD of the distribution. If you use denominator (n-1) then
 sd(x) = NA, i.e. is indeterminate (as it should be in this application).
 
 The important thing when using pre-programmed functions is to know
 which is being used. R uses (n-1), and this can be found from
 looking at
 
   ?sd
 
 or (with more detail) at
 
   ?cor
 
 Ron had assumed that the denominator was n, apparently not being aware
 that R uses (n-1).
 
 Just a few thoughts ...
 Ted.
 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 12-Aug-2014  Time: 23:22:09

After some hesitation (not wanting to prolong a thread whose original
query has been effectively settled), nevertheless I think it may be
worth stating the general picture, for the sake of users who might
be confused or misled regarding the use of  the functions var(), cov(),
cor(), sd() etc.

The distinction to become aware of is between summary statistics
and statistics which will be used for estimation/inference.

Given a set of numbers, x[1], x[2], ... , x[N], they have a mean

  MEAN = sum(x)/N

and their variance VAR is the mean of the squared deviations between the x[i]
and MEAN:

  VAR = sum((x - MEAN)^2)/N

If a random value X is drawn from {x[1], x[2], ... , x[N]}, with
uniform probability, then the expectation of X is again

  E(X) = MEAN

and the variance of X is again

  Var(X) = E((X - MEAN)^2) = VAR

with MEAN and VAR as given above. And the R function mean(x) will
return MEAN as its value. However, the R function var(x) will not
return VAR -- it will return (N/(N-1))*VAR

The above definitions of MEAN and VAR use division by N, i.e.
denominator = N. But the R functions var(x), sd(x) etc. divide by
(N-1), i.e. use denominator = N-1, as explained in
  ?var
  ?sd
etc.

The basic reason for this is that, given a random sample X[1], ... ,X[n]
of size n from {x[1], x[2], ... , x[N]}, the expectation of

  sum((X - mean(X))^2)

is (n-1)*VAR, so to obtain an unbiased estimate of VAR from the X
sample one must use the bias-corrected sample variance

  var(X) = sum((X - mean(X))^2)/(n-1)

i.e. denominator = (n-1) as described in the help pages.

So the function var(), with denominator = (n-1), is correct for
obtaining an unbiased estimator. But it will not be correct for the
variance sum((X - mean(X))^2)/n of the numbers X[1], ... ,X[n].

Since sd() also uses denominator = (n-1), sd(X) is the square root
of var(X). But, while var(X) is unbiased for VAR, sd(X) is not
unbiased for SD = sqrt(VAR), since, for a positive ransom variable Y,
in general

  E(sqrt(Y)) < sqrt(E(Y))

i.e. E(sd(X)) = E(sqrt(var(X))) < sqrt(E(var(X))) = sqrt(VAR) = SD.

The R functions var() etc., which use denominator = (n-1), do not
have an option which allows the user to choose denominator = n.

Therefore, in particular, a user who has a set of numbers
  {x[1], x[2], ... , x[N]}
(e.g. a record of a population) and wants the SD of these (e.g. for
use in summary statistics Mean and SD), could inadvertently use
R's sd(x), expecting SD, without being alerted to the fact that it
will give the wrong answer. And the only way round it is to explicitly
write one's own correction, e.g.

  SD <- function(x){n <- length(x); sd(x)*sqrt((n-1)/n)}

Indeed, this topic has got me wondering how many times I may have
blindly used sd(x) in the past, as if it were going to give me the
"standard" sqrt(sum((x - mean(x))^2)/length(x)) result!
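
A quick numerical check of the distinction (my sketch):

  x <- c(2, 4, 4, 4, 5, 5, 7, 9)
  var(x)                 # 4.571429 -- denominator n-1 (sample variance)
  mean((x - mean(x))^2)  # 4        -- denominator n (population variance)
  SD(x)                  # 2        -- sqrt of the population variance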

As I wrote earlier, when there is more than one definition which
might be used

Re: [R] A basic statistics question

2014-08-12 Thread Ted Harding
On 12-Aug-2014 19:57:29 Ron Michael wrote:
 Hi,
 
 I would need to get a clarification on a quite fundamental statistics
 property, hope expeRts here would not mind if I post that here.
 
 I learnt that the variance-covariance matrix of the standardized data is equal to
 the correlation matrix for the unstandardized data. So I used following data.
 
 Data <- structure(c(7L, 5L, 9L, 7L, 8L, 7L, 6L, 6L, 5L, 7L, 8L, 6L, 7L,  7L,
 6L, 7L, 7L, 6L, 8L, 6L, 7L, 7L, 7L, 8L, 7L, 9L, 8L, 7L, 7L,  0L, 10L, 10L,
 10L, 7L, 6L, 8L, 5L, 5L, 6L, 6L, 7L, 11L, 9L, 10L,  0L, 13L, 13L, 10L, 7L,
 7L, 7L, 10L, 7L, 5L, 8L, 7L, 10L, 10L,  10L, 6L, 7L, 6L, 6L, 8L, 8L, 7L, 7L,
 7L, 7L, 8L, 7L, 8L, 6L,  6L, 8L, 7L, 4L, 7L, 7L, 10L, 10L, 6L, 7L, 7L, 12L,
 12L, 8L, 5L,  5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 5L, 4L, 5L, 5L, 5L, 6L,
 7L, 5L, 7L, 5L, 7L, 7L, 7L, 7L, 8L, 7L, 6L, 7L, 7L, 6L, 7L, 7L,  6L, 4L, 4L,
 6L, 6L, 7L, 8L, 7L, 11L, 10L, 8L, 7L, 6L, 6L, 11L,  5L, 4L, 6L, 6L, 6L, 7L,
 8L, 7L, 12L, 4L, 4L, 2L, 5L, 6L, 7L,  6L, 6L, 5L, 6L, 5L, 7L, 7L, 7L, 6L, 5L,
 6L, 6L, 5L, 5L, 6L, 6L,  4L, 4L, 5L, 10L, 10L, 7L, 7L, 6L, 4L, 6L, 10L, 7L,
 4L, 6L, 6L,  6L, 8L, 8L, 8L, 7L, 8L, 9L, 10L, 7L, 6L, 6L, 8L, 6L, 8L, 3L, 
 3L, 4L, 5L, 5L, 6L, 5L, 5L, 6L, 4L, 8L, 7L, 3L, 5L, 6L, 9L, 8L,  9L, 10L, 8L,
 9L, 8L, 9L, 8L, 8L, 9L, 11L, 10L, 9L, 9L, 13L,
  13L,  10L, 7L, 7L, 7L, 9L, 8L, 7L, 6L, 10L, 8L, 7L, 8L, 8L, 3L, 4L,  3L, 7L,
 6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 2L, 5L, 7L, 9L, 8L, 9L,  10L, 8L, 8L, 9L, 9L,
 11L, 11L, 11L, 10L, 9L, 9L, 11L, 2L, 3L,  2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 1L,
 1L, 1L, 3L, 3L, 4L, 6L, 4L,  5L, 2L, 3L, 5L, 4L, 4L, 2L, 4L, 4L, 5L, 4L, 2L,
 7L, 3L, 3L, 10L,  13L, 11L, 9L, 9L, 7L, 8L, 9L, 6L, 7L, 6L, 5L, 3L, 13L, 3L,
 3L,  0L, 1L, 4L, 5L, 3L, 3L, 0L, 2L, 20L, 3L, 2L, 6L, 5L, 5L, 5L,  2L, 2L,
 5L, 5L, 5L, 4L, 3L, 4L, 4L, 3L, 4L, 10L, 10L, 9L, 8L,  4L, 4L, 8L, 7L, 10L,
 3L, 1L, 9L, 5L, 11L, 9L), .Dim = c(45L,  8L), .Dimnames = list(NULL, c("V1",
 "V7", "V13", "V19", "V25", "V31", "V37", "V43")))
 
   
  Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x))) 
 
 (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
 
 
 
 The point is that I am not getting the exact CORR matrix. Can somebody point
 out what I am missing here?
 
 Thanks for your pointer.

Try:
  Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
  (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

and compare the result with

  cor(Data)

And why? Look at

  ?sd

and note that:

  Details:
 Like 'var' this uses denominator n - 1.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Aug-2014  Time: 22:32:26
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A basic statistics question

2014-08-12 Thread Ted Harding
On 12-Aug-2014 21:41:52 Rolf Turner wrote:
 On 13/08/14 07:57, Ron Michael wrote:
 Hi,

 I would need to get a clarification on a quite fundamental statistics
 property, hope expeRts here would not mind if I post that here.

 I learnt that the variance-covariance matrix of the standardized data is equal to
 the correlation matrix for the unstandardized data. So I used following
 data.
 
 SNIP
 
 (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]

 Point is that I am not getting exact CORR matrix. Can somebody point
 me what I am missing here?
 
 You are using a denominator of n in calculating your covariance 
 matrix for your normalized data.  But these data were normalized using 
 the sd() function which (correctly) uses a denominator of n-1 so as to 
 obtain an unbiased estimator of the population standard deviation.
 
 If you calculated
 
 (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
 
 then you would get the same result as you get from cor(Data) (to within 
 about 1e-15).
 
 cheers,
 Rolf Turner

One could argue about "(correctly)"!

From the descriptive statistics point of view, if one is given a single
number x, then this dataset has no variation, so one could say that
sd(x) = 0. And this is what one would get with a denominator of n.

But if the single value x is viewed as sampled from a distribution
(with positive dispersion), then the value of x gives no information
about the SD of the distribution. If you use denominator (n-1) then
sd(x) = NA, i.e. is indeterminate (as it should be in this application).

The important thing when using pre-programmed functions is to know
which is being used. R uses (n-1), and this can be found from
looking at

  ?sd

or (with more detail) at

  ?cor

Ron had assumed that the denominator was n, apparently not being aware
that R uses (n-1).

Just a few thoughts ...
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Aug-2014  Time: 23:22:09
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generate quasi-random positive numbers

2014-08-05 Thread Ted Harding
On 05-Aug-2014 10:27:54 Frederico Mestre wrote:
 Hello all:
 
 Is it possible to generate quasi-random positive numbers, given a standard
 deviation and mean? I need all positive values to have the same probability
 of selection (uniform distribution). Something like:
 
 runif(10, min = 0, max = 100)
 
 This way I'm generating random positive numbers from a uniform
 distribution. However, using runif I can't previously select SD and mean
 (as in rnorm).
 
 Alternatively, I'm able to generate a list of quasi-random numbers given a
 SD and a mean.
 
 b <- (sqrt(SD^2*12)+(MEAN*2))/2
 a <- (MEAN*2) - b
 x1 <- runif(N,a,b)
 
 However, negative values might be included, since a can assume a negative
 value.
 
 Any help?
 
 Thanks,
 Frederico

There is an inevitable constraint on MEAN and SD for a uniform
distribution of positive numbers. Say the parent distribution is
uniform on (a,b) with a >= 0 and b > a.

Then MEAN = (a+b)/2, SD^2 = ((b-a)^2)/12, so

  12*SD^2  = b^2 - 2*a*b + a^2
  4*MEAN^2 = b^2 + 2*a*b + a^2

  4*MEAN^2 - 12*SD^2 = 4*a*b

  MEAN^2 - 3*SD^2 = a*b

Hence for a >= 0 and b > a you must have MEAN^2 >= 3*SD^2.

Once you have MEAN and SD satisfying this constraint, you should
be able to solve the equations for a and b.
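
Explicitly, a = MEAN - sqrt(3)*SD and b = MEAN + sqrt(3)*SD, so a
sketch (assuming MEAN^2 >= 3*SD^2, so that a >= 0) would be:

  runif.meansd <- function(N, MEAN, SD) {
    stopifnot(MEAN^2 >= 3*SD^2)  # otherwise a would be negative
    runif(N, MEAN - sqrt(3)*SD, MEAN + sqrt(3)*SD)
  }
  x <- runif.meansd(1e5, MEAN = 50, SD = 20)
  c(mean(x), sd(x))  # close to 50 and 20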

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 05-Aug-2014  Time: 11:46:52
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple permutation question

2014-06-25 Thread Ted Harding
I think Robert wants deterministic permutations. In the e1071
package -- load with library(e1071) -- there is a function
permutations():

  Description:
  Returns a matrix containing all permutations of the integers
  '1:n' (one permutation per row).

  Usage:
  permutations(n)

  Arguments:
  n: Number of element to permute.

so, starting with
  x <- c("A","B","C","D","E")
  library(e1071)
  P <- permutations(length(x))

then, for say the 27th of these 120 permutations of x,

  x[P[27,]]

will return it.
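
As to Robert's third question (my reading): sapply() simplifies the list
of sub-matrices into one matrix column by column, interleaving their
entries, which is why the rows come out scrambled. Building the result
with lapply() plus rbind() keeps the blocks intact -- a sketch:

  perm.fixed <- function(x) {
    if (length(x) == 1) return(matrix(x, 1, 1))
    do.call(rbind, lapply(seq_along(x), function(i) {
      cbind(x[i], perm.fixed(x[-i]))
    }))
  }
  perm.fixed(c("A", "B", "C"))
  #      [,1] [,2] [,3]
  # [1,] "A"  "B"  "C"
  # [2,] "A"  "C"  "B"
  # [3,] "B"  "A"  "C"
  # [4,] "B"  "C"  "A"
  # [5,] "C"  "A"  "B"
  # [6,] "C"  "B"  "A"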

Ted.

On 25-Jun-2014 20:38:45 Cade, Brian wrote:
 It is called sample(,replace=F), where the default argument is sampling
 without replacement.
 Try
 x <- c("A","B","C","D","E")
 sample(x)
 
 Brian
 
 Brian S. Cade, PhD
 
 U. S. Geological Survey
 Fort Collins Science Center
 2150 Centre Ave., Bldg. C
 Fort Collins, CO  80526-8818
 
 email:  ca...@usgs.gov brian_c...@usgs.gov
 tel:  970 226-9326
 
 
 
 On Wed, Jun 25, 2014 at 2:22 PM, Robert Latest boblat...@gmail.com wrote:
 
 So my company has hired a few young McKinsey guys from overseas for a
 couple of weeks to help us with a production line optimization. They
 probably charge what I make in a year, but that's OK because I just
 never have the time to really dive into one particular time, and I have
 to hand it to the consultants that they came up with one or two really
 clever ideas to model the production line. Of course it's up to me to
 feed them the real data which they then churn through their Excel
 models that they cook up during the nights in their hotel rooms, and
 which I then implement back into my experimental system using live data.

 Anyway, whenever they need something or come up with something I skip
 out of the room, hack it into R, export the CSV and come back in about
 half the time it takes Excel to even read in the data, let alone
 process it. Of course that got them curious, and I showed off a couple
 of scripts that condense their abysmal Excel convolutions in a few
 lean and mean lines of R code.

 Anyway, I'm in my office with this really attractive, clever young
 McKinsey girl (I'm in my mid-forties, married with kids and all, but I
 still enjoyed impressing a woman with computer stuff, of all things!),
 and one of her models involves a simple permutation of five letters --
 A through E.

 And that's when I find out that R doesn't have a permutation function.
 How is that possible? R has EVERYTHING, but not that? I'm
 flabbergasted. Stumped. And now it's up to me to spend the evening at
 home coding that model, and the only thing I really need is that
 permutation.

 So this is my first attempt:

 perm.broken <- function(x) {
     if (length(x) == 1) return(x)
     sapply(1:length(x), function(i) {
         cbind(x[i], perm.broken(x[-i]))
     })
 }

 But it doesn't work:
  perm.broken(c("A", "B", "C"))
       [,1] [,2] [,3]
  [1,] "A"  "B"  "C"
  [2,] "A"  "B"  "C"
  [3,] "B"  "A"  "A"
  [4,] "C"  "C"  "B"
  [5,] "C"  "C"  "B"
  [6,] "B"  "A"  "A"
 

 And I can't figure out for the life of me why. It should work because I
 go through the elements of x in order, use that in the leftmost column,
 and slap the permutation of the remaining elements to the right. What
 strikes me as particularly odd is that there doesn't even seem to be a
 systematic sequence of letters in any of the columns. OK, since I
 really need that function I wrote this piece of crap:

 perm.stupid <- function(x) {
     b <- as.matrix(expand.grid(rep(list(x), length(x))))
     b[!sapply(1:nrow(b), function(r) any(duplicated(b[r,]))),]
 }

 It works, but words cannot describe its ugliness. And it gets really
 slow really fast with growing x.

 So, anyway. My two questions are:
 1. Does R really, really, seriously lack a permutation function?
 2. OK, stop kidding me. So what's it called?
 3. Why doesn't my recursive function work, and what would a
working version look like?

 Thanks,
 robert

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 25-Jun-2014  Time: 21:55:42
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] C: drive memory full

2014-06-17 Thread Ted Harding
Don't forget the possibility of hidden files, which will not be
shown using normal file-listing procedures. It may be that R has
stored those files as hidden files.

See, for example:

http://windows.microsoft.com/en-gb/windows/show-hidden-files

http://www.sevenforums.com/tutorials/394-hidden-files-folders-show-hide.html

[NB: These are the results of a google search. I am no expert on
Windows myself ... ]

Hoping this helps,
Ted.

On 17-Jun-2014 12:48:54 Hiyoshi, Ayako wrote:
 Dear Martyn and Professor Ripley,
 
 Thank you so much for your help. I used Windows' large file search (it was
 useful! thank you), but no big files were detected on the C: drive.
 Perhaps I will have to reinstall Windows...
 
 Thank you so much for your replies.
 
 Best wishes,
 Ayako
 
 From: Martyn Byng martyn.b...@nag.co.uk
 Sent: 17 June 2014 12:10
 To: Hiyoshi, Ayako; Prof Brian Ripley; r-help@R-project.org
 Subject: RE: [R] C: drive memory full
 
 Hi,
 
 Try
 
 http://social.technet.microsoft.com/wiki/contents/articles/19295.windows-8-how
 -to-search-for-large-files.aspx
 
 Martyn
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Hiyoshi, Ayako
 Sent: 17 June 2014 11:40
 To: Prof Brian Ripley; r-help@R-project.org
 Subject: Re: [R] C: drive memory full
 
 Dear Professor Ripley,
 
 Thank you so much for your quick reply.
 
 I tried 'dirname(tempdir())' and removed several 'Rtmp' and all other files.
 Unfortunately it did not change C: drive space much.
 
 I am really sorry, but could there be other things stored somewhere on the C:
 drive?
 
 I called IT support, but they could not spot either.
 
 Kind regards,
 Ayako
 
 
 From: Prof Brian Ripley rip...@stats.ox.ac.uk
 Sent: 17 June 2014 10:37
 To: Hiyoshi, Ayako; r-help@R-project.org
 Subject: Re: [R] C: drive memory full
 
 tempdir() is specific to an R session.
 
 Start up R and run
 
 dirname(tempdir())
 
 and look at that directory.  Shut down R, then remove all old files/folders
 in that directory, especially those beginning with 'Rtmp'.
 
 An R process tries to clean up after itself but
 
 - it cannot do so if it segfaults 
 - Windows has rules which no other OS has which can result in files being
 locked and hence not deletable.
 
 
 On 17/06/2014 08:49, Hiyoshi, Ayako wrote:
 Dear R users,

 Hello, I am new to R and confused about my PC's memory space after
 using R a while.

 My PC is Windows 8, RAM 8GB. I have run some analyses using commands
 like vglm, aalen(Surv()), lm()... some regressions.

 I also use Stata, and when I tried to run Stata (big file), Stata
 could not do something which used to do because of the lack of memory
 space. I suspect R's something because R is the only new activity I
 did recently.

 I tried to google and found 'tempdir()'.

 I checked my temporary file but it was empty.

 Just in case, after running 'unlink(tempdir(),recursive=TRUE)', I
 restarted my computer, but memory space did not change. But there
 still seems to be something big in my C: drive storage and nearly 12GB
 is eaten.

 Could it be possible that R saved something somewhere?

 As I finished analyses, all I need is to erase everything stored
 so that I can get my memory space back.
 
 Ayako
   [[alternative HTML version deleted]]

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 17-Jun-2014  Time: 14:14:40
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix column division by vector

2014-05-14 Thread Ted Harding
Maybe I am missing the point -- but what is wrong with line 3 of:

  m = rbind(c(6,4,2),c(3,2,1))
  v = c(3,2,1)
  m %*% diag(1/v)
  #      [,1] [,2] [,3]
  # [1,]    2    2    2
  # [2,]    1    1    1
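
Equivalent one-liners (sweep() being the idiom Frede points to below):

  sweep(m, 2, v, "/")  # divide each column by the matching element of v
  t(t(m)/v)            # recycling across the rows of the transpose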

Ted.

On 14-May-2014 15:03:36 Frede Aakmann Tøgersen wrote:
 Have a look at ?sweep
 
 Br. Frede
 
 
 Sent from Samsung mobile
  Original message 
 Fra: carol white
 Dato:14/05/2014 16.53 (GMT+01:00)
 Til: r-h...@stat.math.ethz.ch
 Emne: [R] matrix column division by vector
 
 Hi,
 What is the elegant script to divide the columns of a matrix by the
 corresponding elements of a vector?
 
 m=rbind(c(6,4,2),c(3,2,1))
 
 v= c(3,2,1)
 
 res= 6/3   4/2   2/1
      3/3   2/2   1/1
 
 
 this is correct
 mat2 = NULL
 
 for (i in 1: ncol(m))
 
 mat2 = cbind(mat2, m[,i]/ v[i])
 
 
 but how to do it more compactly and elegantly, e.g. with do.call?
 
 Many thanks
 
 Carol
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 14-May-2014  Time: 18:16:12
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Precedence and parentheses

2014-05-06 Thread Ted Harding
On 06-May-2014 18:09:12 Göran Broström wrote:
 A thread on r-devel (Historical NA question) went (finally) off-topic, 
 heading towards Precedence. This triggered a question that I think is 
 better put on this list:
 
 I have been more or less regularly been writing programs since the 
 seventies (Fortran, later C) and I early got the habit of using 
 parentheses almost everywhere, for two reasons. The first is the 
 obvious, to avoid mistakes with precedences, but the second is almost
 as important: Readability.
 
 Now, I think I have seen somewhere that unnecessary parentheses in  R 
 functions may slow down execution time considerably. Is this really 
 true, and should I consequently get rid of my faiblesse for parentheses?
 Or are there rules for when it matters and doesn't matter?
 
 Thanks, Göran

I have much sympathy with your motives for using parentheses!
(And I have a similar computing history).

My general encouragement would be: continue using them when you
feel that each usage brings a benefit.

Of course, you would avoid silliness like
  x <- (1*(2*(3*(4*(5)))))

and indeed, that does slow it down somewhat (it takes about twice
as long as a <- 1*2*3*4*5):

  system.time(for(i in (1:100000)) 1*2*3*4*5)
#  user  system elapsed 
# 0.032   0.000   0.032 

  system.time(for(i in (1:100000)) (1*(2*(3*(4*(5))))))
#  user  system elapsed 
# 0.056   0.000   0.054 

The main reason, I suppose, is that a ( forces a new level
on a stack, which is not popped until the matching ) arrives.

Interestingly, the above silliness takes a little longer when
the nesting is the other way round:

  system.time(for(i in (1:100000)) 1*2*3*4*5)
#  user  system elapsed 
# 0.028   0.000   0.029 

  system.time(for(i in (1:100000)) ((((1)*2)*3)*4)*5 )
#  user  system elapsed 
#  0.052   0.000   0.081 

(though in fact the times are somewhat variable in both cases,
so I'm not sure of the value of the relationship).

Best wishes,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 06-May-2014  Time: 19:41:13
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with products in R ?

2014-05-04 Thread Ted Harding
On 04-May-2014 14:13:27 Jorge I Velez wrote:
 Try
 
 options(digits = 22)
 168988580159 * 36662978
# [1] 6195624596620653568
 
 HTH,
 Jorge.-

Err, not quite ... !
I hitch my horses to my plough (with help from R):

options(digits=22)
168988580159*8 = 1351908641272  (copy down)
168988580159*7 = 1182920061113  (    "    )
168988580159*9 = 1520897221431  (    "    )
168988580159*2 =  337977160318  (    "    )
168988580159*6 = 1013931480954  (    "    ) x3
168988580159*3 =  506965740477  (    "    )

        1351908641272
       11829200611130
      152089722143100
      337977160318000
    10139314809540000
   101393148095400000
  1013931480954000000
  5069657404770000000
  ===================
  6195624596620653502
[after adding up mentally]

compared with Jorge's:
   6195624596620653568

(02 vs 68 in the final two digits).

Alternatively, if using a unixoid system with 'bc' present,
one can try interfacing R with 'bc'. 'bc' is a calculating
engine which works to arbitrary precision.

There certainly used to be a utility in which R can evoke 'bc',
into which one can enter a 'bc' command and get the result
returned as a string, but I can't seem to find it on CRAN now.
In any case, the raw UNIX command line for this calculation
with 'bc' (with result) is:

$ bc -l
[...]
168988580159 * 36662978
6195624596620653502
quit

which agrees with my horse-drawn working.
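
The horse-drawn working can also be mechanised in base R with digit
vectors, so that no intermediate value ever strains double precision.
A sketch (bigmul is a made-up name; it assumes non-negative integers
given as strings, and a non-zero product):

  bigmul <- function(a, b) {
    x <- as.integer(rev(strsplit(a, "")[[1]]))  # least-significant first
    y <- as.integer(rev(strsplit(b, "")[[1]]))
    res <- integer(length(x) + length(y))
    for (i in seq_along(x))  # schoolbook partial products
      res[i:(i+length(y)-1)] <- res[i:(i+length(y)-1)] + x[i]*y
    for (i in seq_along(res))  # propagate the carries
      if (res[i] >= 10) {
        res[i+1] <- res[i+1] + res[i] %/% 10
        res[i]   <- res[i] %% 10
      }
    paste(res[max(which(res != 0)):1], collapse = "")
  }
  bigmul("168988580159", "36662978")
  # [1] "6195624596620653502"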

Best wishes to all,
Ted.

 On Sun, May 4, 2014 at 10:44 PM, ARTENTOR Diego Tentor 
 diegotento...@gmail.com wrote:
 
 Trying an algorithm for products with large numbers I encountered a difference
 between the result of 168988580159 * 36662978 in my algorithm and R's product.
 The Microsoft calculator confirms my number.

 Thanks.
 --
 *Gráfica ARTENTOR  *

 de Diego L. Tentor
 Echagüe 558
 Tel.:0343 4310119
 Paraná - Entre Ríos

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 04-May-2014  Time: 17:50:54
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with products in R ?

2014-05-04 Thread Ted Harding
Thanks for this link, Gabor (and especially for the link
therein to your posting on Thu May 7 14:10:53 CEST 2009).
This confirms that the R 'bc' package is not on CRAN and
points to where it can be sourced from. I used to have bc.R
installed on an old machine, which has gone into terminal
coma now.

Best wishes,
Ted.


On 04-May-2014 17:10:00 Gabor Grothendieck wrote:
 Checking this with the bc R package (https://code.google.com/p/r-bc/),
 the Ryacas package (CRAN), the gmp package (CRAN) and the Windows 8.1
 calculator all four give the same result:
 
 library(bc)
 bc("168988580159 * 36662978")
 [1] 6195624596620653502
 
 library(Ryacas)
 yacas("168988580159 * 36662978", retclass = "character")
 6195624596620653502
 
 library(gmp)
 as.bigz(168988580159) * as.bigz(36662978)
 Big Integer ('bigz') :
 [1] 6195624596620653502
 
 
 On Sun, May 4, 2014 at 12:50 PM, Ted Harding ted.hard...@wlandres.net
 wrote:
 On 04-May-2014 14:13:27 Jorge I Velez wrote:
 Try

 options(digits = 22)
 168988580159 * 36662978
# [1] 6195624596620653568

 HTH,
 Jorge.-

 Err, not quite ... !
 I hitch my horses to my plough (with help from R):

 options(digits=22)
 168988580159*8 = 1351908641272  (copy down)
 168988580159*7 = 1182920061113  (  "  )
 168988580159*9 = 1520897221431  (  "  )
 168988580159*2 =  337977160318  (  "  )
 168988580159*6 = 1013931480954  (  "  )x3
 168988580159*3 =  506965740477  (  "  )

         1351908641272
        11829200611130
       152089722143100
       337977160318000
     10139314809540000
    101393148095400000
   1013931480954000000
   5069657404770000000
  ====================
   6195624596620653502
 [after adding up mentally]

 compared with Jorge's:
6195624596620653568

 (02 vs 68 in the final two digits).

 Alternatively, if using a unixoid system with 'bc' present,
 one can try interfacing R with 'bc'. 'bc' is a calculating
 engine which works to arbitrary precision.

 There certainly used to be a utility by which R can invoke 'bc',
 into which one can enter a 'bc' command and get the result
 returned as a string, but I can't seem to find it on CRAN now.
 In any case, the raw UNIX command line for this calculation
 with 'bc' (with result) is:

 $ bc -l
 [...]
 168988580159 * 36662978
 6195624596620653502
 quit

 which agrees with my horse-drawn working.

 Best wishes to all,
 Ted.

 On Sun, May 4, 2014 at 10:44 PM, ARTENTOR Diego Tentor 
 diegotento...@gmail.com wrote:

 Trying algorithm for products with large numbers i encountered a
 difference
 between result of 168988580159 * 36662978 in my algorithm and r product.
 The Microsoft calculator confirms my number.

 Thanks.
 --
 *Gráfica ARTENTOR  *

 de Diego L. Tentor
 Echagüe 558
 Tel.:0343 4310119
 Paraná - Entre Ríos

 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 04-May-2014  Time: 17:50:54
 This message was sent by XFMail

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -- 
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 04-May-2014  Time: 18:28:23
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Getting a particular weekday for a given month

2014-04-07 Thread Ted Harding
On 07-Apr-2014 17:49:09 Christofer Bogaso wrote:
 Hi,
 Given a month name, I am looking for some script to figure out,
 what is the date for 3rd Wednesday. For example let say I have
 following month:
 
 library(zoo)
 Month <- as.yearmon(as.Date(Sys.time()))
 
 I need to answer: What is the date for 3rd Wednesday of 'Month'?
 
 Really appreciate for any pointer.
 
 Thanks for your time.

The following may not suit you, but it is the sort of approach I tend
to adopt myself, using things I know about rather than getting lost
in R documentation! (Outline of general method, not details). And it
also assumes you are using a Unixoid system (e.g. Linux or Mac OS X).

Your two commands currently give:

  library(zoo)
  Month <- as.yearmon(as.Date(Sys.time()))
  Month
  # [1] Apr 2014

and it is straightforward to extract "Apr" and "2014" from Month.

This is the point at which I attach my horses to my wooden plough ...

In Unixoid systems there is a command 'cal' which, for 2014,
yields output:

$ cal 2014
                            2014

      January               February               March
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
       1  2  3  4  5                  1  2                  1  2
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   3  4  5  6  7  8  9
13 14 15 16 17 18 19  10 11 12 13 14 15 16  10 11 12 13 14 15 16
20 21 22 23 24 25 26  17 18 19 20 21 22 23  17 18 19 20 21 22 23
27 28 29 30 31        24 25 26 27 28        24 25 26 27 28 29 30
                                            31

       April                  May                   June
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6            1  2  3  4                     1
 7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                            30

        July                 August              September
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6               1  2  3   1  2  3  4  5  6  7
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   8  9 10 11 12 13 14
14 15 16 17 18 19 20  11 12 13 14 15 16 17  15 16 17 18 19 20 21
21 22 23 24 25 26 27  18 19 20 21 22 23 24  22 23 24 25 26 27 28
28 29 30 31           25 26 27 28 29 30 31  29 30

      October               November              December
Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su  Mo Tu We Th Fr Sa Su
       1  2  3  4  5                  1  2   1  2  3  4  5  6  7
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   8  9 10 11 12 13 14
13 14 15 16 17 18 19  10 11 12 13 14 15 16  15 16 17 18 19 20 21
20 21 22 23 24 25 26  17 18 19 20 21 22 23  22 23 24 25 26 27 28
27 28 29 30 31        24 25 26 27 28 29 30  29 30 31

After the first two lines, this consists of 4 blocks, each with 8 rows,
each covering 3 months where each month consists of 7 columns, one
for each day of the week (Mo - Su). Each column occupies 3 character
spaces (except for the last -- only 2).

From April you can readily identify that this is the 4th month,
so you need to go to Month 1 of the 2nd block of rows. The "We"
column is the 3rd in that month, and you are looking for the
date of the 3rd Wednesday. So count down to the 3rd non-blank
entry[*] in this 3rd column, and you will find 16. Done.

[*] Some months, e.g. November above, have an initial blank
entry because those days belong to the previous month.

Quite how you would program this efficiently in R is another matter!
But the principle is simple. To give R a text file to work on, at the
shell prompt use a command like:

$ cal 2014 > cal2014.txt

and then cal2014.txt is accessible as a plain text file.

Even simpler (if it is only one particular month you want,
as in your example) is:

$ cal April 2014

which yields:

     April 2014
Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30

and now just count down the 3rd column (as before).
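
For completeness, the same computation can be done entirely in R with
base Date handling (a sketch; third_wednesday is just my throwaway name):

  third_wednesday <- function(year, month){
    days <- seq(as.Date(sprintf("%04d-%02d-01", year, month)),
                by = "day", length.out = 28)
    wed <- days[format(days, "%u") == "3"]  # ISO weekday: Monday = 1
    wed[3]
  }
  third_wednesday(2014, 4)
  # [1] "2014-04-16"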

Maybe this helps ...
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 07-Apr-2014  Time: 19:29:41
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] about lm()

2014-03-30 Thread Ted Harding
I suspect the problem may be with the structure of the data.

Si Qi L wrote:
  [...]
  acc1 <- lm(data$acc ~ dummy + data$Reqloanamount + data$Code +
  data$Code.1 + data$EmpNetMonthlyPay + data$Customer_Age + data$RTI)
  [...]
  These attributes are all numerical except the acc (a factor)
  [...]

If, as he implies, the acc variable in data is a factor,
then lm() will not enjoy fitting an lm where the dependent
variable (the response) is a factor!
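
If so, a minimal sketch of the usual fix -- assuming the factor's
levels really are numeric strings, and that 'dummy' is available
alongside 'data':

  data$acc <- as.numeric(as.character(data$acc))  # factor -> numeric
  acc1 <- lm(acc ~ dummy + Reqloanamount + Code + Code.1 +
               EmpNetMonthlyPay + Customer_Age + RTI, data = data)
  summary(acc1)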

Just a shot in the dark ...
Ted.

On 30-Mar-2014 18:46:27 Bert Gunter wrote:
 1. Post in plain text, not HTML.
 
 2. Read ?lm and note the data argument. Use it in your lm call instead
 of all the $data extractions.
 
 3. Your problem is with the summary() call, so read ?summary.lm. Learn
 about S3 methods if you do not know where the .lm part is coming
 from by reading the Introduction to R  tutorial that ships with R,
 which you should have read already anyway.
 
 Cheers,
 Bert
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374
 
 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch
 
 
 
 
 On Sun, Mar 30, 2014 at 9:43 AM, Frank Schwidom schwi...@gmx.net wrote:
 Please provide some data from your variable data.

 Show the output of dput(data) or a subset
 of data which leads to the specific error.

 Regards


 On Sun, Mar 30, 2014 at 02:23:09PM +0100, Si Qi L. wrote:
 Hi

 I have a problem with linear regression. This is my codes:

 acc1 <- lm(data$acc ~ dummy + data$Reqloanamount + data$Code + data$Code.1 +
 data$EmpNetMonthlyPay + data$Customer_Age + data$RTI)
 summary(acc1)

 These attributes are all numerical except the acc (a factor), so how can I
 fix the problem R showed me? can anyone please help me out? Many thanks!

 But the R shows:

 Residuals:
 Error in quantile.default(resid) : factors are not allowed
 In addition: Warning message:
 In Ops.factor(r, 2) : ^ not meaningful for factors

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 30-Mar-2014  Time: 21:12:16
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] points with-in boundaries of a map

2014-03-23 Thread Ted Harding
On 23-Mar-2014 22:50:50 Jim Lemon wrote:
 On 03/23/2014 10:29 PM, eliza botto wrote:
 Thankyou very much jim. it worked! but regarding second part of my
 question, isn't there a way to read the coordinates of intersecting
 lines with the premises of the map?
 
 Hi Eliza,
 I think you want the locator function, which will return the 
 coordinates to the accuracy that you can click on a point in the plot.
 
 Jim

There is a possible analytical approach to this. If the boundary of
a simple closed region is given as a series of straight lines which
join successive points as one proceeds continuously round the boundary
of the region, and one has the data of the (x,y) coordinates of the points
in that order (with the last point the same as the first), then one
can proceed as follows:

Let the points be P1=(x1,y1), P2=(x2,y2), ... , PN=(xN,yN)=P1
in the order specified above.

Since the grid lines have been drawn at specified intervals, you
can work through every point (x0,y0) which is an intersection.

Let (x0,y0) be a point of intersection of the grid of lines. Now you
know the coordinates of (x0,y0) already, so the question is whether
this point is inside or outside the region.

Consider a line joining (x0,y0) to (x1,y1). Now work out the angle
through which the line rotates when it is moved so that it joins
(x0,y0) to (x2,y2); say this is phi2 (clockwise positive, anticlockwise
negative). Similarly work out phi3, the angle through which the line
next rotates when it is further moved so that it joins (x0,y0) to (x3,y3);
and so on.

Now add up phi2 + phi3 + ... + phiN

If the point (x0,y0) is inside the region, and you have proceeded
consecutively along the boundary, then this sum of angles will be
2*pi or -2*pi. If (x0,y0) is outside the region, then this sum will be zero.

So you can work through all the intersections of the grid lines,
and for each intersection you can determine whether it is inside
or outside; and, if it is inside, you already know its coordinates.
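
A bare-bones sketch of that test for a single grid point, assuming the
boundary vertices (px,py) are supplied in consecutive order and closed
(the last point repeating the first):

  inside <- function(x0, y0, px, py){
    phi <- diff(atan2(py - y0, px - x0))  # rotation between successive vertices
    phi <- (phi + pi) %% (2*pi) - pi      # wrap each step into (-pi, pi]
    abs(sum(phi)) > pi                    # about 2*pi inside, about 0 outside
  }
  px <- c(0, 1, 1, 0, 0) ; py <- c(0, 0, 1, 1, 0)  # unit square, closed
  inside(0.5, 0.5, px, py)   # TRUE
  inside(1.5, 0.5, px, py)   # FALSE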

It can get more complicated if the region is not simply-connected.
E.g. if you do not want to count points (x0,y0) which are within
the boundary coastline but also are within the boundary of an
inland lake; or, further, you do want to count points which are
on an island in the lake; and so on.

The essential for following a procedure of this kind is that you
can extract from the data from which the map is drawn the coordinates
of a series of points which are in consecutive order as one proceeds
along the boundary of a region. Not all geographic data have the
coordinates in such an order -- one can find datasets such that the
boundary is drawn as a set of separate partial boundaries which
are in no particular order as a whole; and in some datasets the
different separate parts of the boundary do not exactly match up
at the points where they should exactly join.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 23-Mar-2014  Time: 23:47:15
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rounding to whole number

2014-03-20 Thread Ted Harding
On 20-Mar-2014 19:42:35 Kristi Glover wrote:
 Hi R User, 
 I was trying to convert a decimal value into integral (whole number). I used
 round function but some of the cells have the value less than 0.5 and it
 converted into 0. But I wanted these cell to have 1 instead of 0. Another
 way, I could multiply by 10. But l did not want it because it affected my
 results later.
 I was wondering about how I can convert 0.2 to 1. For example 
 data-structure(list(site1 = structure(1:4, .Label = c(s1, s2, 
 s3, s4), class = factor), A = c(0, 2.3, 2.6, 1.3), B = c(0.5, 
 0.17, 2.9, 3)), .Names = c(site1, A, B), class = data.frame,
 row.names = c(NA, 
 -4L))
 output-structure(list(site1 = structure(1:4, .Label = c(s1, s2, 
 s3, s4), class = factor), A = c(0L, 3L, 3L, 2L), B = c(1L, 
 1L, 3L, 3L)), .Names = c(site1, A, B), class = data.frame, row.names
 = c(NA, 
 -4L))
 
 Thanks for your suggestions in advance.
 cheers,
 KG

Hi Kristi,
the function round() has peculiarities (because of the IEEE standard).
E.g.

  round(1+1/2)
  # [1] 2
  round(0+1/2)
  # [1] 0

i.e. in equipoised cases like these, it rounds to the nearest *even*
number; otherwise, if it is nearer to one integer value (above or below)
than to the other (below or above), then it rounds to the nearest integer
value. There are two unambiguous functions you could use for equipoised
cases, depending on which way you want to go:

  floor(1+1/2)
  # [1] 1
  floor(0+1/2)
  # [1] 0

  ceiling(1+1/2)
  # [1] 2
  ceiling(0+1/2)
  # [1] 1

The latter would certainly meet your request for "how I can convert
0.2 to 1", since

  ceiling(0.2)
  # [1] 1

But it will not round, say, (1+1/4) to the nearest integer.

On the other hand, if you want a function which rounds to the
nearest integer when not exactly halfway between, and rounds
either always up or always down when the fractional part is exactly 1/2,
then I think (but others will probably correct me) that you may have
to write your own -- say roundup() or rounddown():

  roundup <- function(x){
    if((x-floor(x))==1/2){
      ceiling(x)
    } else {round(x)}
  }

  rounddown <- function(x){
    if((x-floor(x))==1/2){
      floor(x)
    } else {round(x)}
  }
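
As written, those handle one value at a time; a vectorised sketch of
the same idea:

  roundup   <- function(x) ifelse(x - floor(x) == 1/2, ceiling(x), round(x))
  rounddown <- function(x) ifelse(x - floor(x) == 1/2, floor(x), round(x))
  roundup(c(0.5, 1.5, 1.25))
  # [1] 1 2 1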

Also have a look at the help page

  ?round

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 20-Mar-2014  Time: 20:04:20
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pattern Matching

2014-03-02 Thread Ted Harding
On 02-Mar-2014 20:12:57 Benno Pütz wrote:
 Try
 
 as.numeric(sub(".*\\(", "", sub('\\)', '', aa)))
 
 You may also want to look at regexec/regmatches for a more general approach
 ...
 
 On 02 Mar 2014, at 20:55, Doran, Harold hdo...@air.org wrote:
 
 "1 (472)" "2 (445)" "3 (431)" "3 (431)" "5 (415)" "6 (405)" "7 (1)"
 
 Benno Pütz

Another formulation, which breaks it into steps and may therefore
be easier to adopt to similar but different cases, is

  aa0 <- gsub("^[0-9]+ ", "", aa)
  aa0
  # [1] "(472)" "(445)" "(431)" "(431)" "(415)" "(405)" "(1)"

  as.numeric(gsub("[()]", "", aa0))
  # [1] 472 445 431 431 415 405   1
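
And, for the record, the regexec/regmatches route that Benno mentions
(a sketch):

  m <- regmatches(aa, regexec("\\(([0-9]+)\\)", aa))
  as.numeric(sapply(m, `[`, 2))
  # [1] 472 445 431 431 415 405   1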

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 02-Mar-2014  Time: 20:57:07
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] geneation

2014-02-20 Thread Ted Harding
[see at end]
On 20-Feb-2014 10:47:50 Rui Barradas wrote:
 Hello,
 
 I'm not sure I understand the question. When you use set.seed, it will 
 have effect in all calls to the random number generator following it. So 
 the value for y is also fixed.
 As for your code, you don't need the second set.seed. And though it is 
 not syntactically incorrect, the way you are coding it is not very usual. 
 Try instead
 
 set.seed(100)
 x <- 10*runif(10)
 x
 
 y <- rnorm(10)
 y
 
 y <- rnorm(10)  # different y
 y
 
 
 Hope this helps,
 
 Rui Barradas
 
 Em 20-02-2014 07:05, IZHAK shabsogh escreveu:
 how do i use set.seed? for example i want to generate a fixed x with
 different values of y each time, i.e.

 generate x <- rnorm(10)
 generate y <- rnorm(10)
 i want to have x fixed but y changing at each iteration. this is what i
 tried but it is not working


 {
 set.seed(100)

 x <- 10*runif(10)
 }
 x
 set.seed(y <- rnorm(10))
 y

It seems clear that Izhak seeks to detach the random generation of y
from the random generation of x after using set.seed(). On my reading of
  ?RNG
once set.seed has been used, as Rui says, it affects all subsequent
calls to the generator. Initially, however:

  Note:
  Initially, there is no seed; a new one is created from the current
  time when one is required.  Hence, different sessions started at
  (sufficiently) different times will give different simulation
  results, by default.

But, even so, it still seems (perhaps) that using a RNG without the
call to set.seed() will still establish a seed (and its consequences)
for that session (in effect there is an implicit call to set.seed()).

This leads me to suggest that a useful innovation could be to add a
feature to set.seed() so that

  set.seed(NULL)

(which currently generates an error) would undo the effect of any
previous (explicit or implicit) call to set.seed() so that, for instance,

  set.seed(100)
  x <- 10*runif(10)

  set.seed(NULL)
  y <- rnorm(10)

would result in y being generated from a seed which was set from the
system clock. There is no form of argument to set.seed() which instructs
it to take its value from the system clock (or other source of external
random events); and indeed it seems that only the system clock is available.

On Linux/UNIX systems, at least, there is a possible accessible source
of external randomness, namely '/dev/random', and its companion
'/dev/urandom'. This accumulates random noise from high-resolution timings
of system events like key-presses, mouse-clicks, disk accesses, etc.,
which take place under external influences and are by nature irregular in
timings.

The difference between /dev/random and /dev/urandom is that one read
from /dev/random effectively resets it, and further reads may be blocked
until sufficient new noise has accumulated; while repeated reads from
/dev/urandom are always possible (though with short time-lapses there
may not be much difference between successive reads).

The basic mechanism for this is via the command 'dd', on the lines of

  dd if=/dev/urandom of=newseed count=1 bs=32

which makes one (count=1) read from the input file ("if") /dev/urandom
of 32 bytes ("bs=32") into the output file ("of") newseed.

When I ran the above command on my machine just now, and inspected the
results in hex notation ('od -x newseed') I got (on-screen):

od -x newseed
0000000 4be9 7634 41cf 5e17 b068 7898 879e 8b5f
0000020 fb4f 52e6 59ef 0b58 5258 4a3a df04 c18d
0000040

where the initial 0000000 etc. denote byte-counts to the beginning
of the current line (expressed also in hex); so the actual byte content
of newseed is:

  4b e9 76 34 41 cf 5e 17 b0 68 78 98 87 9e 8b 5f
  fb 4f 52 e6 59 ef 0b 58 52 58 4a 3a df 04 c1 8d

This could be achieved via a system() call from R; and the contents
of newseed would then need to be converted into a format suitable
for use as argument to set.seed().

For the time being (not having time just now) I leave the details
to others ...
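
For the record, one minimal way of doing it from R (a sketch,
Unix-only, with all error handling left out):

  con <- file("/dev/urandom", "rb")
  seed <- readBin(con, "integer", n = 1)  # one 4-byte signed integer
  close(con)
  set.seed(seed)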
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 20-Feb-2014  Time: 12:00:07
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate probability of union of independent events

2014-02-18 Thread Ted Harding
On 18-Feb-2014 22:08:38 Meinfelder, Florian wrote:
 Dear all,
 
 I am looking for a way to calculate the probability of the union of k
 independent events in R. Is there a function that takes a k-dimensional
 probability vector as input and returns p(A_1 U A_2 U...U A_k)?
 
 Thanks,
 Florian

I don't know (off-hand); but it is very easy to write one's own!

Since

  P(A1 U A2 U ... U Ak )

= 1 - P(not(A1 U A2 U ... U Ak))

= 1 - P((not A1) & (not A2) & ... & (not Ak))

= 1 - P(not A1)*P(not A2)* ... *P(not Ak)  [by independence]

= 1 - (1-p1)*(1-p2)* ... *(1-pk)

where pj is P(Aj). Hence

  punion <- function(p){1 - prod(1-p)}

should do it!
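
A quick numerical check, with made-up probabilities
(1 - 0.9*0.8*0.7 = 0.496):

  punion(c(0.1, 0.2, 0.3))
  # [1] 0.496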

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 18-Feb-2014  Time: 23:51:31
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to plot a shifted Gamma distribution

2014-02-13 Thread Ted Harding
On 13-Feb-2014 15:30:43 Rodrigo Cesar Silva wrote:
 I have the parameters of a gamma distribution that I am trying to plot.
 The parameters are shape = 2, scale = 5.390275 and the minimum value
 x0 is 65.44945.
 
 Since the gamma is shifted by x0, we have
 
Mean = shape*scale + x0 = 76.23
 
 My question is, how can I do it in r?
 
 I was trying to use dgamma function, but I saw no options to offset
 the gamma distribution. I can only use the shape and scale parameters,
 like this:
 
 x <- seq(from=0, to=100, length.out=100)
 
 plot(x, dgamma(x, shape=2, scale=5.390275),
   main="Gamma", type='l')

If all that is happening is that the distribution is the same as that
of a Gamma distribution with origin at 0, but simply shifted to the
right by an amount x0 (though I am wondering what context this could
arise in), then the commands

  x <- seq(from=0, to=100, length.out=100)
  x0 <- 65.44945
  plot(x+x0, dgamma(x, shape=2, scale=5.390275),
   main="Gamma", type='l')

will produce such a plot. However, I wonder if you have correctly
expressed the problem!
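
Equivalently, one can keep the x-axis as it is and shift the argument
instead (a sketch; the range 0 to 150 is my arbitrary choice):

  x  <- seq(from=0, to=150, length.out=300)
  x0 <- 65.44945
  plot(x, dgamma(x - x0, shape=2, scale=5.390275),
       main="Shifted Gamma", type='l')  # density is 0 below x0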

Ted.

 This generates a distribution with origin equal to zero, but I want the
 origin to be x0
 
 How can I handle shifted gamma distribution in r?
 
 Thanks a lot!
 Rodrigo.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 13-Feb-2014  Time: 17:27:29
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating an equivalent of r-help on r.stackexchange.com ?

2014-02-03 Thread Ted Harding
Ditto. And ditto. And (by the way -- no-one seems to have mentioned it)
what are the possibilities, for mail appearing on something like Stack
Exchange, of having the mail sent to oneself so that it can be stored
locally, on one's own machine? That is the only way I would want to
work -- anything interesting is sitting in my disk, I can edit it if
I wish, I can make local copies, etc. etc. etc. etc. Anything which is
not interesting gets deleted (though I can always dig into R-help
archives if need be).

Best wishes,
Ted.
 
On 03-Feb-2014 21:36:21 Rolf Turner wrote:
 
 For what it's worth, I would like to say that I concur completely with 
 Don and Bert.  (Also I would like to second Bert's vote of thanks to Don 
 for expressing the position so clearly.)
 
 cheers,
 
 Rolf Turner
 
 On 04/02/14 09:56, Bert Gunter wrote:
 Don:

 First, I apologize if this is off topic, but I thought I should reply
 publicly.

 I would only like to say thank you for so eloquently and elegantly
 summarizing my views, also. Maybe that makes me a dinosaur. If so, I
 happily accept the label.

 I find SO's voting for posting business especially irritating. I wish
 merely to post or to read the posts of others without being subjected
 to some kind of online pseudo game and ratings competition. That alone
 keeps me away. But Don said it better.

 I realize that I may be out of step with the masses here, and the
 masses should certainly decide. Hopefully I won't be around if/when
 they decide that R-help should go.

 Best,
 Bert

 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch




 On Mon, Feb 3, 2014 at 12:42 PM, MacQueen, Don macque...@llnl.gov wrote:
 Every browser-based interface I've ever seen has a number of features that
 I find to be huge deterrents. To mention just two:

 - They waste copious amounts of screen space on irrelevant things such as
 votes, the number of views, the elapsed time since something or other
 happened, fancy web-page headers, and so on. Oh, and advertisements. The
 Mathematica stackexchange example given in a link in one of the emails
 below (http://mathematica.stackexchange.com/) illustrates these
 shortcomings -- and it's not the worst such example.

 - In most if not all cases, one has to login before posting. I have too
 many usernames and passwords as it is.

 Right now, at this very moment, in my email client's window I can see and
 browse the subject lines of 20 threads in r-help. And that's using only
 about half of my screens vertical space. In contrast, in the Mathematica
 stackexchange example, I can see at most 10, and that only by using the
 entire vertical space of my screen. The From column in my email client
 shows the names of several of the people contributing to the thread, which
 the browser interface does not. In the email client, I can move through
 messages, and between messages in a thread using my keyboard. In a
 browser, I have to do lots of mousing and clicking, which is much less
 efficient.

 As it is now, r-help messages come to me. I don't have to start up a
 browser. So it's much easier to go take a quick look at what's new at any
 time.

 True, I had to subscribe to the mailing list, which involves a username
 and password. But once it's done, it's done. I don't have to login before
 posting, which means I don't have to remember yet another username and
 password.

 What "...duplicated efforts of monitoring multiple mailing lists"? I have
 no duplicated effort...in fact, I have almost no effort at all, since the
 messages come to me. There was some initial setup, i.e., to filter
 different r-* messages to different mailboxes in my email client, but now
 that that's done, it's as simple as clicking on the correct mailbox.

 In other words, in every way that's important to me, the mailing list
 approach is superior. I do not support abandoning the mailing list system
 for any alternative.

 -Don

 --
 Don MacQueen

 Lawrence Livermore National Laboratory
 7000 East Ave., L-627
 Livermore, CA 94550
 925-423-1062





 On 2/2/14 1:49 PM, Liviu Andronic landronim...@gmail.com wrote:

 Dear Duncan,
 I discovered something interesting wrt to the licensing and mirroring
 of user-contributed material on StackExchange.  Please read below.


 On Sun, Nov 24, 2013 at 9:00 PM, Duncan Murdoch
 murdoch.dun...@gmail.com wrote:
 I'm not aware of a discussion on this, but I would say no.
 Fragmentation is bad. Further fragmentation is worse.

 TL;DR
 =

 Actually I'd say all mailing lists except r-devel should be moving to
 StackOverlow in the future (disclaimer: I'm not affiliated with it).


 I would generally agree with you, except for a few points.

 1.  I avoid StackOverflow, because they claim copyright on the
 compilation.
 As I read their terms of service, it would be illegal for anyone to
 download
 and duplicate all postings about R.  So

Re: [R] How to subscribe this mailing list

2014-01-12 Thread Ted Harding
[Also inline]
On 12-Jan-2014 17:45:03 Rui Barradas wrote:
 Hello,
 
 Inline.
 
 Em 12-01-2014 10:48, gj1989lh escreveu:
 Hi,


 How can I subscribe this mailing list?
 
 Apparently, you already have.
 Welcome.

Well, apparently not. gj1989lh is not listed in the R-help
subscriber list, and the original posting was moderator approved,
so had been held for moderation (presumably because the address
was not subscribed).

The way to subscribe is to visit the web-page:

  https://stat.ethz.ch/mailman/listinfo/r-help

and follow the instructions which are set out in the section
Subscribing to R-help
(including the instructions relating to the follow-up confirmation
email which the subscriber will subsequently receive).

Hoping this helps (and replying to the R-help list so that
everyone else can see that it has been dealt with).

The best address for enquiries about subscribing to/using/posting
to R-help is

  r-help-ow...@r-project.org

Ted.

 thx
  [[alternative HTML version deleted]]
 
 Don't post in html, please.
 
 Rui Barradas

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Jan-2014  Time: 18:01:25
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tracking what R actually executes

2014-01-02 Thread Ted Harding
On 02-Jan-2014 23:55:28 Duncan Murdoch wrote:
 On 14-01-02 6:05 PM, Fisher Dennis wrote:
 R 3.0.2
 All platforms

 Colleagues

 This question is probably conceptual rather than technical and I have not
 thought out all of the issues yet.  Let’s say I have an extensive list of
 functions and some lines of code that call the functions.  I would like to
 have a record of all the commands that were actually executed.  I realize
 that this could be voluminous but it might be necessary.
 For example, the code and function might be:

 ###
 INPUT:
 COUNTER <- function(N)
  for (i in 1:N)  cat("count", i, "\n")
 COUNTER(10)

 ###
 OUTPUT:
 cat("count", 1, "\n")
 cat("count", 2, "\n")
 cat("count", 3, "\n")
 cat("count", 4, "\n")
 cat("count", 5, "\n")
 cat("count", 6, "\n")
 cat("count", 7, "\n")
 cat("count", 8, "\n")
 cat("count", 9, "\n")
 cat("count", 10, "\n")

 #
 If I have formulated the question poorly, please do your best to understand
 the intent.

 Dennis
 
 As far as I know, R doesn't have exactly this built in, but the Rprof() 
 function gives an approximation.  It will interrupt the execution at a 
 regular time interval (1/50 sec is the default, I think), and record all 
 functions that are currently active on the execution stack.  So tiny 
 little functions could be missed, but bigger ones probably won't be.
 
 There are also options to Rprof to give other profiling information, 
 including more detail on execution position (down to the line number), 
 and various measures of memory use.
 
 Duncan Murdoch

Also have a look at

  ?trace

which you may be able to use for what you want.
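
For a rough illustration in this context (a sketch; the default tracer
simply prints each call to the traced function as it is entered):

  trace(cat)   # prints "trace: cat(...)" on each entry to cat()
  COUNTER(3)
  untrace(cat)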
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 03-Jan-2014  Time: 00:14:51
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Season's Greetings (and great news ... )!

2013-12-22 Thread Ted Harding
 machine precision ...

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 22-Dec-2013  Time: 09:59:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Season's Greetings (and great news ... )!

2013-12-22 Thread Ted Harding
Thanks for the comments, Bert and Mehmet! It is of course a serious
and interesting area to explore (and I'm aware of the chaos context;
I initially got into this areas year ago when I was exploring the
possibilities for chaos in fish population dynamics -- and they're
certainly there)!

But, before anyone takes my posting *too* seriously, let me say that
it was written tongue-in-cheek (or whatever the keyboard analogue of
that may be). I'm certainly not blaming R.

Have fun anyway!
Ted.

On 22-Dec-2013 17:35:56 Bert Gunter wrote:
 Yes.
 
 See also Feigenbaum's constant and chaos theory for the general context.
 
 Cheers,
 Bert
 
 On Sun, Dec 22, 2013 at 8:54 AM, Suzen, Mehmet msu...@gmail.com wrote:
 I wouldn't blame R for floating-point arithmetic and our personal
 feeling of what 'zero' should be.

 options(digits=20)
 pi
 [1] 3.141592653589793116
 sqrt(pi)^2
 [1] 3.1415926535897926719
 (pi - sqrt(pi)^2) < 1e-15
 [1] TRUE

 There was a similar post before, for example see:
 http://r.789695.n4.nabble.com/Why-does-sin-pi-not-return-0-td4676963.html

 There is an example by Martin Maechler (author of Rmpfr) on how to use
 arbitrary precision
 with your arithmetic.
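
 A minimal sketch of that approach, assuming the Rmpfr package is
 installed:

   library(Rmpfr)
   piL <- Const("pi", prec = 200)  # pi to 200 bits
   sin(piL)   # of the order of 1e-61: zero to working precision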

 On 22 December 2013 10:59, Ted Harding ted.hard...@wlandres.net wrote:
 Greetings All!
 With the Festive Season fast approaching, I bring you joy
 with the news (which you will surely wish to celebrate)
 that R cannot do arithmetic!

 Usually, this is manifest in a trivial way when users report
 puzzlement that, for instance,

   sqrt(pi)^2 == pi
   # [1] FALSE

 which is the result of a (generally trivial) rounding or
 truncation error:

   sqrt(pi)^2 - pi
   [1] -4.440892e-16

 But for some very simple calculations R goes off its head.

 I had originally posted this example some years ago, but I
 have since generalised it, and the generalisation is even
 more entertaining than the original.

 The Original:
 Consider a sequence generated by the recurrence relation

   x[n+1] = 2*x[n]       if 0 <= x[n] <= 1/2
   x[n+1] = 2*(1 - x[n]) if 1/2 < x[n] <= 1

 (for 0 <= x[n] <= 1).

 This has equilibrium points (x[n+1] = x[n]) at x[n] = 0
 and at x[n] = 2/3:

   2/3 -> 2*(1 - 2/3) = 2/3

 It also has periodic points, e.g.

   2/5 -> 4/5 -> 2/5 (period 2)
   2/9 -> 4/9 -> 8/9 -> 2/9 (period 3)

 The recurrence relation can be implemented as the R function

   nextx <- function(x){
     if( (0<=x)&(x<=1/2) ) {x <- 2*x} else {x <- 2*(1 - x)}
   }

 Now have a look at what happens when we start at the equilibrium
 point x = 2/3:

   N <- 1 ; x <- 2/3
   while(x > 0){
     cat(sprintf("%i: %.9f\n", N, x))
     x <- nextx(x) ; N <- N+1
   }
   cat(sprintf("%i: %.9f\n", N, x))

 Run that, and you will see that successive values of x collapse
 towards zero. Things look fine to start with:

   1: 0.666666667
   2: 0.666666667
   3: 0.666666667
   4: 0.666666667
   5: 0.666666667
   ...

 but, later on,

   24: 0.666666667
   25: 0.666666666
   26: 0.666666668
   27: 0.666666664
   28: 0.666666672
   ...

   46: 0.667968750
   47: 0.664062500
   48: 0.671875000
   49: 0.656250000
   50: 0.687500000
   51: 0.625000000
   52: 0.750000000
   53: 0.500000000
   54: 1.000000000
   55: 0.000000000

 What is happening is that, each time R multiplies by 2, the binary
 representation is shifted up by one and a zero bit is introduced
 at the bottom end. To illustrate this, do the calculation in
 7-bit arithmetic where 2/3 = 0.1010101, so:

0.1010101 = x[1] > 1/2, so subtract from 1: 1.0000000 - 0.1010101 = 0.0101011,
and then multiply by 2 to get x[2] = 0.1010110. Hence

0.1010101 = x[1] -> 2*(1 - 0.1010101) = 2*0.0101011 ->
0.1010110 = x[2] -> 2*(1 - 0.1010110) = 2*0.0101010 ->
0.1010100 = x[3] -> 2*(1 - 0.1010100) = 2*0.0101100 ->
0.1011000 = x[4] -> 2*(1 - 0.1011000) = 2*0.0101000 ->
0.1010000 = x[5] -> 2*(1 - 0.1010000) = 2*0.0110000 ->
0.1100000 = x[6] -> 2*(1 - 0.1100000) = 2*0.0100000 ->
0.1000000 = x[7] -> 2*0.1000000 = 1.0000000 ->
1.0000000 = x[8] -> 2*(1 - 1.0000000) = 2*0 ->
0.0000000 = x[9] and the end of the line.

 The final index of x[i] is i=9, 2 more than the number of binary
 places (7) in this arithmetic, since 8 successive zeros have to
 be introduced. It is the same with the real R calculation since
 this is working to .Machine$double.digits = 53 binary places;
 it just takes longer (we reach 0 at x[55])! The above collapse
 to 0 occurs for any starting value in this simple example (except
 for multiples of 1/(2^k), when it works properly).

 Generalisation:
 This is basically the same, except that everything is multiplied
by a scale factor S, so instead of being on the interval [0,1],
 it is on [0,S], and

   x[n+1] = 2*x[n]       if 0 <= x[n] <= S/2
   x[n+1] = 2*(S - x[n]) if S/2 < x[n] <= S
 (for 0 <= x[n] <= S).

Again, x[n] = 2*S/3 is an equilibrium point. 2*S/3 > S/2, so

   x[n] -> 2*(S - 2*S/3) = 2*(S/3) = 2*S/3

 Functions to implement this:

   nxtS <- function(x,S){
     if((x >= 0)&(x <= S/2)){ x <- 2*x } else {x <- 2*(S-x)}
   }

   S <- 6 ##  Or some other value of S
   Nits <- 100
   x <- 2*S/3
   N <- 1 ; print(c(N,x

Re: [R] Converting decimal to binary in R

2013-12-14 Thread Ted Harding
 On Fri, Dec 13, 2013 at 10:11 PM, 水静流深 1248283...@qq.com wrote:
 i have written a function to convert a decimal number into a binary number in R.

 dectobin <- function(x){
   as.numeric(intToBits(x)) -> x1
   paste(x1, collapse="") -> x2
   as.numeric(gsub("0+$", "", x2)) -> x3
   return(as.character(x3))}

 dectobin can get the right result, but it is so long; is there a built-in
 function to do this?

On 14-Dec-2013 06:17:30 Richard M. Heiberger wrote:
 I recommend
 ?sprintf
 
 (4^(1/3))^3 != 4
 (4^(1/3))^3 == 4
 (4^(1/3))^3 - 4
 format(c((4^(1/3))^3 , 4), digits=17)
 sprintf("%+13.13a", c((4^(1/3))^3 , 4))

The above generates a hexadecimal representation, not binary!
So it needs further substitutions to get the binary representation.

Can I add a tip which I have very often found useful for this kind
of global substitution? Not just binary -- I first cooked it up
in text-editing when faced with replacing European numbers by
Anglo-Saxon numbers -- e.g. 1.234.567,89 needs to be converted
into 1,234,567.89, therefore swapping . and ,. But you don't
want to do . -- , and then , -- . since you would then
end up with  1.234.567,89 -- 1,234,567,89 -- 1.234.567.89

There, the trick was to use a character such as "#", which does
not appear in the text, as a marker for the first substitution while
the second is being made. Then substitute the desired character for "#":
1.234.567,89 -> 1#234#567,89 -> 1#234#567.89 -> 1,234,567.89
(first replacing "." by "#", then "," by ".", then finally "#" by ",").

You need to replace, for instance, "0" in hex by "0000" in binary,
"1" by "0001", "2" by "0010", ... , "A" by "1010", and so on.
However, you need to avoid replacing already-replaced symbols.

So I suggest using, in a first round, "U" for 1 and "Z" for 0
(or whatever you prefer, provided it avoids 0 and 1).
So 0 -> ZZZZ, 1 -> ZZZU, ... , A -> UZUZ, etc.
Then, finally, replace each Z by 0 and each U by 1.
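
A sketch of the whole scheme, using a lookup table instead of sixteen
sequential substitutions (the Z/U indirection is kept only to mirror
the description above):

  hex  <- "C5BF891B"
  bin4 <- c("ZZZZ","ZZZU","ZZUZ","ZZUU","ZUZZ","ZUZU","ZUUZ","ZUUU",
            "UZZZ","UZZU","UZUZ","UZUU","UUZZ","UUZU","UUUZ","UUUU")
  names(bin4) <- c(0:9, LETTERS[1:6])
  out <- paste(bin4[strsplit(hex, "")[[1]]], collapse="")
  chartr("ZU", "01", out)
  # [1] "11000101101111111000100100011011"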

Hence (using a truncated representation), sqrt(pi) in hex is:

  sprintf("%+8.8A", sqrt(pi))
  # [1] +0X1.C5BF891BP+0

Then the successive substitutions (which can of course be programmed)
would be:

+0X1.C5BF891BP+0

0: +ZZZZX1.C5BF891BP+ZZZZ
1: +ZZZZXZZZU.C5BF89ZZZUBP+ZZZZ
2: +ZZZZXZZZU.C5BF89ZZZUBP+ZZZZ
3: +ZZZZXZZZU.C5BF89ZZZUBP+ZZZZ
4: +ZZZZXZZZU.C5BF89ZZZUBP+ZZZZ
5: +ZZZZXZZZU.CZUZUBF89ZZZUBP+ZZZZ
6: +ZZZZXZZZU.CZUZUBF89ZZZUBP+ZZZZ
7: +ZZZZXZZZU.CZUZUBF89ZZZUBP+ZZZZ
8: +ZZZZXZZZU.CZUZUBFUZZZ9ZZZUBP+ZZZZ
9: +ZZZZXZZZU.CZUZUBFUZZZUZZUZZZUBP+ZZZZ
A: +ZZZZXZZZU.CZUZUBFUZZZUZZUZZZUBP+ZZZZ
B: +ZZZZXZZZU.CZUZUUZUUFUZZZUZZUZZZUUZUUP+ZZZZ
C: +ZZZZXZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+ZZZZ
D: +ZZZZXZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+ZZZZ
E: +ZZZZXZZZU.UUZZZUZUUZUUFUZZZUZZUZZZUUZUUP+ZZZZ
F: +ZZZZXZZZU.UUZZZUZUUZUUUUUUUZZZUZZUZZZUUZUUP+ZZZZ

Z: +0000X000U.UU000U0UU0UUUUUUU000U00U000UU0UUP+0000
U: +0000X0001.11000101101111111000100100011011P+0000

The final result probably needs tidying up in accordance with
the needs of subsequent uses!

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 14-Dec-2013  Time: 10:50:03
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] iterated sum

2013-12-14 Thread Ted Harding
On 14-Dec-2013 10:46:10 水静流深 wrote:
 x <- c(1,4,9,20,3,7)
 i want to get a series c(5,13,29,23,10).
  y <- c()
  for (i in 2:length(x)){
  y[i-1] <- x[i-1]+x[i]}
 
 is there a simpler way to get it?

  x <- c(1,4,9,20,3,7)
  N <- length(x)

  x[1:(N-1)] + x[2:N]
  # [1]  5 13 29 23 10
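
Or, the same thing with head() and tail():

  head(x, -1) + tail(x, -1)
  # [1]  5 13 29 23 10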

Best wishes,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 14-Dec-2013  Time: 10:54:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fortune? [was: Re: quotation marks and scan]

2013-11-17 Thread Ted Harding
[See in-line below]
On 17-Nov-2013 22:38:30 Rolf Turner wrote:
 
 (1) The backslashes are not really there; they are an artefact of the R 
 print() function.
Try cat(u, "\n").  I think this might be an FAQ.
 
(2) Is not your problem the fact that you are setting replacement 
 equal to the
 thing you are trying to get rid of?  I.e. don't you want
 
 v <- gsub(pattern='\"', replacement='', x=u) ???
 


 Either I am misunderstanding your intent or you need another cup of coffee.

Is the above line a Fortune?


  cheers,
 
  Rolf
 
 On 11/18/13 11:07, Erin Hodgess wrote:
 Dear R People:

 I'm sure that this is a very simple problem, but I have been wrestling with
 it for some time.

 I have the following file that has the following one line:

   CRS("+init=epsg:28992")

 Fair enough.  I scan it into R and get the following:

 u
 [1] "CRS(\"+init=epsg:28992\")"
 gsub(pattern='\"', replacement='\"', x=u)
 [1] "CRS(\"+init=epsg:28992\")"

 I need to get rid of the extra quotation marks and slashes.  I've tried all
 sorts of things, including gsub, as you see,  but no good.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 17-Nov-2013  Time: 23:55:48
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] all combinations with replacement not ordered

2013-11-07 Thread Ted Harding
On 07-Nov-2013 13:38:29 Konstantin Tretiakov wrote:
 Hello!
 
 I need to obtain all possible combinations with replacement when
 order is not important.
 E.g. I have a population x = {1,2,3}.
 So I can get (choose(3+3-1,3)=) 10 combinations from this population
 with 'size=3'.
 How can I get a list of all that combinations?
 
 I have tried 'expand.grid()' and managed to get only samples where
 order is important.
 'combn()' gave me samples without replacement.
 
 Best regards,
 Konstantin Tretyakov.

From your description I infer that, from {1,2,3}, you want the result:

  1 1 1
  1 1 2
  1 1 3
  1 2 2
  1 2 3
  1 3 3
  2 2 2
  2 2 3
  2 3 3
  3 3 3

The following will do that:

u <- c(1,2,3)
unique(t(unique(apply(expand.grid(u,u,u), 1, sort), margin=1)))

#       [,1] [,2] [,3]
#  [1,]    1    1    1
#  [2,]    1    1    2
#  [3,]    1    1    3
#  [4,]    1    2    2
#  [5,]    1    2    3
#  [6,]    1    3    3
#  [7,]    2    2    2
#  [8,]    2    2    3
#  [9,]    2    3    3
# [10,]    3    3    3

There may be a simpler way!
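
One slightly simpler alternative (a sketch): generate all ordered
triples and keep only the non-decreasing rows (expand.grid names the
columns Var1, Var2, Var3):

  g <- expand.grid(u, u, u)
  g[g$Var1 <= g$Var2 & g$Var2 <= g$Var3, ]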
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 07-Nov-2013  Time: 17:04:50
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there something wrong with R version 3.0.2 (2013-09-25) -- Frisbee Sailing?

2013-10-16 Thread Ted Harding
On 16-Oct-2013 14:54:00 tom soyer wrote:
 Hi,
 
 pnorm(-1.53,0,1) under version 3.0.2 gives 0.05155075. I am pretty sure it
 should be 0.063. Is there something wrong with this version of R?
 
 I am using:
 R version 3.0.2 (2013-09-25) -- Frisbee Sailing
 Copyright (C) 2013 The R Foundation for Statistical Computing
 Platform: i686-pc-linux-gnu (32-bit)
 -- 
 Tom

If you did exactly as you describe, then something is indeed wrong:

  pnorm(-1.53,0,1)
  # [1] 0.06300836
  [R version 2.11.0 (2010-04-22)]

as you expected.

However:

  qnorm(0.05155075)
  [1] -1.63

so maybe you mistyped 1.63 instead of 1.53?

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 16-Oct-2013  Time: 16:12:56
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help: concurrent R sessions for different settings of simulations

2013-09-29 Thread Ted Harding
[See at end]
On 29-Sep-2013 17:22:24 Chee Chen wrote:
 Dear All, 
 I have spent almost 2 days on this but have not succeeded yet.
 
 Problem description:  I have 3 parameters, p1, p2 and p3, for which
 p1 takes 1 of 5 possible distributions (e.g., normal, laplace),
 p2 takes 1 of 3 possible distributions, and p3 takes 1 of 5 possible
 distributions. These 3 parameters create 75 settings, and these 3
 parameters are arguments of a function F; and F is part of simulation
 codes. To summarize: different value of the ordered triple (p1,p2,p3)
 means different setting and this is the only difference in the
 simulation codes. 
 
 Target to achieve: instead of looping through each of the 75 settings
 one after another, I would like to concurrently run all 75 settings
 on the cluster.
 
 My attempts: via loops, I used Perl to create 75 files, each for a
 different triple (p1,p2,p3), and Perl uses system("R ...") to execute
 this setting once it is created. The Perl codes are submitted to
 cluster correctly. But when I looked into the log file, the cluster
 still executes it one setting after another setting. 
 
 Request: any help is appreciated!  Is it because the Perl loop
 executes each setting as soon as it is created?
 
 Have a nice day!
 Chee

Just a simple comment (which does not consider the technicalities
of using Perl, using a cluster, etc.).

From your description, it looks as though the system waits for one
item in the loop to finish before it starts the next one.

If that is the case, and *if* you are using UNIX/Linux (or other
UNIX-like OS), then you could try appending " &" to each submitted
command. An outline exemplar:

  for( s in settings ){
    system("R <something depending on s> &")
  }

The " &" has the effect, in a UNIX command line, of detaching the
command from the executing shell. So the program can continue to
run (and take as long as it likes) while the system command-shell
is immediately freed up for the next command.

Therefore, with the above exemplar, if there were, say, 75 settings,
then that loop would complete in a very short time, after which
you would have 75 copies of R executing simulations, and your
original R command-line would be available.

Just a suggestion (which may have missed the essential point of
your query, but worth a try ... ).
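
A related alternative, staying inside a single R session, is the
'parallel' package (a sketch, Unix-only; run_one is a hypothetical
wrapper that runs the simulation for one setting):

  library(parallel)
  results <- mclapply(settings, run_one, mc.cores = 25)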

I have no idea how to achieve a similar effect in Windows ...

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 29-Sep-2013  Time: 19:31:29
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why does sin(pi) not return 0?

2013-09-26 Thread Ted Harding
On 26-Sep-2013 07:55:38 Rolf Turner wrote:
 On 09/26/13 19:31, Babak Bastan wrote:
 Hi experts,

 If I test sin(pi) in r, it returns me 1.224606e-16

 Why doesn't return me 0?
 
 If you think that 1.224606e-16 is different from 0, you should probably not
 be using computers.

Is that a Fortune? And, if so, should R be using computers?

  sin(pi)
  # [1] 1.224606e-16
  sin(pi)==0
  # [1] FALSE

 See FAQ 7.31 (which is in a way about the inverse of
 your question, but it should provide the necessary insight).
 
  cheers,
  Rolf Turner

Though, mind you, FAQ 7.31 does also offer some consolation to R:

  all.equal(0,sin(pi))
  # [1] TRUE

So it depends on what you mean by "different from". Computers
have their own fuzzy concept of this ... Babak has too fussy
a concept.

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 26-Sep-2013  Time: 09:13:33
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Coordinate scales for pairs plot

2013-08-21 Thread Ted Harding
Greetings all.

I suspect this question has already been asked. Apologies
for not having traced it ...

In the default pairs plot produced by the function pairs(),
the coordinate scales alternate between top and bottom and
right and left sides.

For example, in a 5x5 plot for variables X1, X2, X3, X4, X5
the coordinate scales for X2, X4 appear beside rows 2 and 4
on the left, and the scales for X1, X3, X5 appear beside rows
1, 3, 5 on the right.

Similarly, the scales for X2 and X4 appear above columns 2 and 4,
and the scales for X1, X3, X5 appear below columns 1, 3, 5.

Is there a parameter lurking somewhere in the depths of this
function which can be set so that the scales for all the variables
X1,X2,X3,X4,X5 appear both above and below  columns 1,2,3,4,5;
and both to the left and to the right of rows 1,2,3,4,5?

With thanks,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 21-Aug-2013  Time: 18:30:44
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coordinate scales for pairs plot

2013-08-21 Thread Ted Harding
On 21-Aug-2013 19:08:29 David Winsemius wrote:
 
 On Aug 21, 2013, at 10:30 AM, (Ted Harding) wrote:
 
 Greetings all.
 
 I suspect this question has already been asked. Apologies
 for not having traced it ...
 
 In the default pairs plot produced by the function pairs(),
 the coordinate scales alternate between top and bottom and
 right and left sides.
 
 For example, in a 5x5 plot for variables X1, X2, X3, X4, X5
 the coordinate scales for X2, X4 appear beside rows 2 and 4
 on the left, and the scales for X1, X3, X5 appear beside rows
 1, 3, 5 on the right.
 
 Similarly, the scales for X2 and X4 appear above columns 2 and 4,
 and the scales for X1, X3, X5 appear below columns 1, 3, 5.
 
 Is there a parameter lurking somewhere in the depths of this
 function which can be set so that the scales for all the variables
 X1,X2,X3,X4,X5 appear both above and below  columns 1,2,3,4,5;
 and both to the left and to the right of rows 1,2,3,4,5?
 
 I've searched for a parameter and come up empty. Hacking the code for
 pairs.default is not that difficult. I stripped out the conditionals that
 were driving the Axis calls to alternating sides: 
 Search for `box()` to start this surgery and replace everything to the 'mfg'
 assignment to get uniform axis locations on sides 1 and 2.
 
 pairs.12 <- function(x, <... arglist same as pairs.default>)
    {<upper portion of code>
 box()
 if (i == nc ) 
 localAxis(1L , x[, j], x[, i], 
   ...)
 if (j == 1 ) 
 localAxis(2L, x[, j], x[, i], ...)
 
 mfg - par(mfg)
 <lower portion of code> }
 
 Oooops,  that wasn't what you asked for ... Use this instead:
 
 
 box()  # begin surgery
 if (i == 1 ) 
 localAxis(3, x[, j], x[, i],  ...)
 if (i == nc ) 
 localAxis(1, x[, j], x[, i],  ...)
 if (j == 1 ) 
 localAxis(2L, x[, j], x[, i], ...)
 if (j == nc ) 
 localAxis(4L, x[, j], x[, i], ...)
 # end anastomosis
  mfg - par(mfg)
 ..
 --
 David Winsemius
 Alameda, CA, USA

Thanks very much, David! I'll give it a try. It looks promising.

Good surgery, steady hand!
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 21-Aug-2013  Time: 21:23:39
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odds ratio per standard deviation

2013-06-12 Thread Ted Harding
[Replies transposed so as to achieve bottom-posting ... ]

On 12-Jun-2013 14:53:02 Greg Snow wrote:
 
 On Tue, Jun 11, 2013 at 7:38 PM, vinhnguyen04x imvi...@yahoo.com.vn wrote:
 
 Hi all
 i have a question:

 why and when do we use odds ratio per standard deviation instead of odds
 ratio?
 --
 View this message in context:
 http://r.789695.n4.nabble.com/odds-ratio-per-standard-deviation-tp4669315.html
 Sent from the R help mailing list archive at Nabble.com.
 __
 
 Without context this is a shot in the dark, but my guess is this is
 referring to something like a logistic regression where the odds ratio
 (exponential of the coefficient) refers to the change in odds for the
 outcome for a 1 unit change in x.  Now often a 1 unit change in x is very
 meaningful, but other times it is not that meaningful, e.g. if x is
 measuring the size of diamonds in carats and the data does not span an
 entire carat, or x is measured in days and our x variable spans years.  In
 these cases it can make more sense to talk about the change in the odds
 relative to a different step size than just a 1 unit change in x, a
 reasonable thing to use is the change in odds for a one standard deviation
 change in x (much smaller than 1 for the diamonds, much larger for the
 days), this may be the odds ratio per standard deviation that you mention.
 -- 
 Gregory (Greg) L. Snow Ph.D.
 538...@gmail.com

I think there is one comment that needs to be made about this!

The odds ratio per unit change in x means exactly what it says,
and can be converted into the odds ratio per any other amount of
change in x very easily. With x originally in (say) days, and
with estimated logistic regression logodds = a + b*x, if you
change your unit of x to, say weeks, so that x' = x/7, then this
is equivalent to changing b to b' = 7*b. Now just take your sliderule;
no need for R (oops, now off-topic ... ).

Another comment: I do not favour blindly standardising a variable
relative to its standard deviation in the data. The SD may be what
it is for any number of reasons, ranging from x being randomly sampled
from a population to x being assigned specific values in a designed
experiment.

Since, for exactly the same meanings of the variables in the regression,
the standard deviation may change from one set of data to another of
exactly the same kind, the odds per standard deviation could vary
from dataset to dataset of the same investigation, and even vary
dramatically. That looks like a situation to avoid, unless it is very
carefully discussed!

The one argument that might give some sense to odds ratio per standard
deviation could apply when x has been sampled from a population in
which the variation of x can be approximately described by a Normal
distribution. Then odds ratio per standard deviation refers to
a change from, say, the mean/median of the population to the 84th
percentile, or from the 31st percentile to the 69th percentile,
or from the 69th percentile to the 93rd percentile, etc.
But these steps cover somewhat different proportions of the population:
50th to 84th = 34%; 31st to 69th = 38%; 69th to 93rd = 24%. So you are
still facing issues of what you mean, or what you want to mean.
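
The percentile spans quoted above can be checked directly:

  pnorm(1) - pnorm(0)        # 50th to 84th: about 34%
  pnorm(0.5) - pnorm(-0.5)   # 31st to 69th: about 38%
  pnorm(1.5) - pnorm(0.5)    # 69th to 93rd: about 24%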

Simpler to stick to the original odds per unit of x and then apply
it to whatever multiple of the unit you happen to be interested in
as a change (along with the reasons for that interest).

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Jun-2013  Time: 17:14:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: Your message to R-help awaits moderator approval

2013-06-01 Thread Ted Harding
[See at end]

On 01-Jun-2013 17:52:01 Janh Anni wrote:
 Hello,
 
 I don't understand why my mails are being held up.  What could be the
 problem?
 
 Thanks
 Janh
 -- Forwarded message --
 From: r-help-boun...@r-project.org
 Date: Sat, Jun 1, 2013 at 1:48 PM
 Subject: Your message to R-help awaits moderator approval
 To: annij...@gmail.com
 
 
 Your mail to 'R-help' with the subject
 
 Re: [R] wilcox_test function in coin package
 
 Is being held until the list moderator can review it for approval.
 
 The reason it is being held:
 
 The message headers matched a filter rule
 
 Either the message will get posted to the list, or you will receive
 notification of the moderator's decision.  If you would like to cancel
 this posting, please visit the following URL:
 
 https://stat.ethz.ch/mailman/confirm/r-help/067afe28f7ead30dfea844b8a34449526cd665d8

This can happen to anyone, depending on the current sensitivity
of the mail-server's spam-detection filter to potential
triggers in the message.

It is particularly likely to arise with mails posted from a gmail
account, as yours was. This is because gmail is a major source of
spam emails, and the spam filter is alert to these.

I have had a look at your message (which was duly approved), and
I see that it is in reply to a message which itself is in a thread
that includes several messages sent via gmail. From the headers
of your message:

In-Reply-To: 51a9291c.70...@ucalgary.ca
References:
cafcodddp64mfyvttb6b_o6kgotkyr41mcb2spcc4c6anqyw...@mail.gmail.com
cafeqcdy6+nhk2hgcwqalymyxx1a0trqjthybxx-fub9oyff...@mail.gmail.com
 CAFCoDdBcm4B1tVW7BarkHXAgpqMpTCimmjhrLc3N7=yvsse...@mail.gmail.com
 CAFEqCdz_=YBeeDYLfpDYyTAwjr6a4n2OKTbQaBsb9UC=g9s...@mail.gmail.com
 CAFCoDdC1asW6JGK5huY_kLgCQQCduw8s=yjhhdp0pta_9ya...@mail.gmail.com
 51a9291c.70...@ucalgary.ca

so that's a total of 6 references to gmail (including your own message)
which is probably why the spam filter felt a bit twitchy!

Don't worry about it. As I say, it can happen to anyone (though more
often to some than to others). If it is a proper message to R-help,
one of the moderators will approve it (though quite possibly not
immediately).

Hoping this helps,
Ted (one of the moderators)

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 01-Jun-2013  Time: 20:11:00
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Ted Harding
On 14-May-2013 09:46:32 Duncan Murdoch wrote:
 On 13-05-14 4:52 AM, Luca Nanetti wrote:
 Dear experts,

 I wanted to signal a peculiar, unexpected behaviour of 'apply'.
 It is not a bug, it is per spec, but it is so counterintuitive
 that I thought it could be interesting.

 I have an array, let's say test, dim=c(7,5).

 test <- array(1:35, dim=c(7, 5))
 test

       [,1] [,2] [,3] [,4] [,5]
  [1,]    1    8   15   22   29
  [2,]    2    9   16   23   30
  [3,]    3   10   17   24   31
  [4,]    4   11   18   25   32
  [5,]    5   12   19   26   33
  [6,]    6   13   20   27   34
  [7,]    7   14   21   28   35

 I want a new array where the content of the rows (columns) are
 permuted, differently per row (per column)

 Let's start with the columns, i.e. the second MARGIN of the array:
 test.m2 <- apply(test, 2, sample)
 test.m2

       [,1] [,2] [,3] [,4] [,5]
  [1,]    1   10   18   23   32
  [2,]    7    9   16   25   30
  [3,]    6   14   17   22   33
  [4,]    4   11   15   24   34
  [5,]    2   12   21   28   31
  [6,]    5    8   20   26   29
  [7,]    3   13   19   27   35

 perfect. That was exactly what I wanted: the content of each column is
 shuffled, and differently for each column.
 However, if I use the same with the rows (MARGIIN = 1), the output is
 transposed!

 test.m1 <- apply(test, 1, sample)
 test.m1

       [,1] [,2] [,3] [,4] [,5] [,6] [,7]
  [1,]    1    2    3    4    5   13   21
  [2,]   22   30   17   18   19   20   35
  [3,]   15   23   24   32   26   27   14
  [4,]   29   16   31   25   33   34   28
  [5,]    8    9   10   11   12    6    7

 In other words, I wanted to permute the content of the rows of test, and
 I expected to see in the output, well, the shuffled rows as rows, not as
 column!

 I would respectfully suggest to make this behavior more explicit in the
 documentation.
 
 It is already very explicit: "If each call to FUN returns a vector of
 length n, then apply returns an array of dimension c(n, dim(X)[MARGIN])
 if n > 1."  In your first case, sample is applied to columns, and
 returns length 7 results, so the shape of the final result is c(7, 5). 
 In the second case it is applied to rows, and returns length 5 results, 
 so the shape is c(5, 7).
 
 Duncan Murdoch

And the (quite simple) practical implication of what Duncan points out is:

  test <- array(1:35, dim=c(7, 5))
  test
  #      [,1] [,2] [,3] [,4] [,5]
  # [1,]    1    8   15   22   29
  # [2,]    2    9   16   23   30
  # [3,]    3   10   17   24   31
  # [4,]    4   11   18   25   32
  # [5,]    5   12   19   26   33
  # [6,]    6   13   20   27   34
  # [7,]    7   14   21   28   35

# To permute the rows:
  t(apply(t(test), 2, sample))
  #      [,1] [,2] [,3] [,4] [,5]
  # [1,]   22   29    8   15    1
  # [2,]   30   16   23    2    9
  # [3,]   10   31   24    3   17
  # [4,]   11    4   25   32   18
  # [5,]   26    5   12   33   19
  # [6,]   27   34   20   13    6
  # [7,]   35   28   14    7   21

which looks right!
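
Equivalently (a sketch using the same test matrix), one can let
apply() work on the rows directly and transpose its result, since
apply() always returns each call's result as a column:

  t(apply(test, 1, sample))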
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 14-May-2013  Time: 11:07:46
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Choice of statistical test (in R) of two apparently different distributions

2013-05-09 Thread Ted Harding
On 09-May-2013 01:42:07 Pascal Oettli wrote:
 On 05/09/2013 10:29 AM, Gundala Viswanath wrote:
 I have the following list of data each has 10 samples.
 The values indicate binding strength of a particular molecule.

 What I want to show is that 'x' is statistically different from
 'y', 'z' and 'w', which it appears to be: 'x' has more values well
 above zero (2.8, 1.00, 5.4, etc.) than the others.

 I tried t-tests, but all of them show insignificant differences
 with high P-values.

 What's the appropriate test for that?

 Below is my code:

 x <- c(2.852672123, 0.076840264, 1.009542943, 0.430716968, 5.4016,
        0.084281843, 0.065654548, 0.971907344, 3.325405405, 0.606504718)
 y <- c(0.122615039, 0.844203734, 0.002128992, 0.628740077, 0.87752229,
        0.888600425, 0.728667099, 0.000375047, 0.911153571, 0.553786408)
 z <- c(0.766445916, 0.726801899, 0.389718652, 0.978733927, 0.405585807,
        0.408554832, 0.799010791, 0.737676439, 0.433279599, 0.947906524)
 w <- c(0.000124984, 1.486637663, 0.979713013, 0.917105894, 0.660855127,
        0.338574774, 0.211689885, 0.434050179, 0.955522972, 0.014195184)

 t.test(x,y)
 t.test(x,z)

 --END--

 G.V.
 
 Hello,
 
 1) Why should 'x' be statistically different from the others?
 2) 'y' looks to be bimodal. The mean is not an appropriate measurement 
 for this kind of distribution.
 
 Regards,
 Pascal

Running the commands:

  plot(x, pch="+", col="red", ylim=c(0,6))
  points(y, pch="+", col="green")
  points(z, pch="+", col="blue")
  points(w, pch="+", col="black")
  lines(x, col="red")
  lines(y, col="green")
  lines(z, col="blue")
  lines(w, col="black")

indicates that y, z and w are similar to each other (with some
suggestion of a serial structure).

However, while part of x is also similar to y, z and w, it is
clear that 3 values of x are outliers (well above the range
of all other values, including those of x). [And I think Pascal
meant x when he wrote 'y' looks to be bimodal.]

And it may be of interest that these exceptional values of x
occur at x[1], x[5], x[9] (i.e. every 4th observation).

Taken together, these facts suggest that an examination of the
procedure giving rise to the data may be relevant. As one
example of the sort of thing to look for: were the 3 outlying
observations obtained by the same worker/laboratory/apparatus
as the others (or a similar question for x as opposed to y, z, w,
raising issues of reliability). There are many similar questions
one could think of raising, but knowledge of the background
is essential for appropriate choice!

I would agree with Pascal that a routine t-test is not
appropriate.

One thing that can be directly looked at statistically
is, taking as given that there are 3 outliers somewhere
in all 40 data, what is the probability that all three
occur in one of the 4 groups (x,y,z,w) of data?

This is 4 times the probability that they occur in a specific
group (say x). The chance of all 3 being in x is the number
of ways of choosing the remaining 7 out of the remaining 37,
divided by the number of ways of choosing any 10 out of 40,
i.e. (in R-speak)

  choose(37,7)/choose(40,10)
  # [1] 0.01214575

so the chance of all 3 being in some one of the 4 groups is

  4*choose(37,7)/choose(40,10)
  # [1] 0.048583

which, if you are addicted to P-values, is just significant
at the 5% (P = 0.05) level. So this gives some indication
that the x group of data is not on the same footing as the
other (y, z, w) groups. However, such a test does not
address any question of why such outliers should be there
in the first place; this needs to be addressed differently
(see above).

And one must not forget that the above P-value has been
obtained by a method which was prompted by looking at the data
in the first place.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 09-May-2013  Time: 09:35:05
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Decomposing a List

2013-04-25 Thread Ted Harding
Greetings!
For some reason I am not managing to work out how to do this
(in principle) simple task!

As a result of applying strsplit() to a vector of character strings,
I have a long list L (N elements), where each element is a vector
of two character strings, like:

  L[1] = c("A1","B1")
  L[2] = c("A2","B2")
  L[3] = c("A3","B3")
  [etc.]

From L, I wish to obtain (as directly as possible, e.g. avoiding
a loop) two vectors each of length N where one contains the strings
that are first in the pair, and the other contains the strings
which are second, i.e. from L (as above) I would want to extract:

  V1 = c("A1","A2","A3",...)
  V2 = c("B1","B2","B3",...)

Suggestions?

With thanks,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 25-Apr-2013  Time: 11:16:46
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Decomposing a List

2013-04-25 Thread Ted Harding
Thanks, Jorge, that seems to work beautifully!
(Now to try to understand why ... but that's for later).
Ted.

On 25-Apr-2013 10:21:29 Jorge I Velez wrote:
 Dear Dr. Harding,
 
 Try
 
 sapply(L, "[", 1)
 sapply(L, "[", 2)
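 
 Why this works, in brief: "[" is itself a function, so
 sapply(L, "[", 1) in effect calls L[[i]][1] for each element of L.
 A minimal self-contained sketch:
 
   L <- strsplit(c("A1 B1", "A2 B2", "A3 B3"), " ")
   sapply(L, "[", 1)   # "A1" "A2" "A3"
   sapply(L, "[", 2)   # "B1" "B2" "B3"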
 
 HTH,
 Jorge.-
 
 
 
 On Thu, Apr 25, 2013 at 8:16 PM, Ted Harding ted.hard...@wlandres.netwrote:
 
 Greetings!
 For some reason I am not managing to work out how to do this
 (in principle) simple task!

 As a result of applying strsplit() to a vector of character strings,
 I have a long list L (N elements), where each element is a vector
 of two character strings, like:

   L[1] = c("A1","B1")
   L[2] = c("A2","B2")
   L[3] = c("A3","B3")
   [etc.]

 From L, I wish to obtain (as directly as possible, e.g. avoiding
 a loop) two vectors each of length N where one contains the strings
 that are first in the pair, and the other contains the strings
 which are second, i.e. from L (as above) I would want to extract:

   V1 = c("A1","A2","A3",...)
   V2 = c("B1","B2","B3",...)

 Suggestions?

 With thanks,
 Ted.

 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 25-Apr-2013  Time: 11:16:46
 This message was sent by XFMail

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 25-Apr-2013  Time: 11:31:57
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting a large number into smaller numbers and find the largest product

2013-04-18 Thread Ted Harding

On 18-Apr-2013 08:47:18 Janesh Devkota wrote:
 Hello,
 
 I have a big number lets say of around hundred digits. I want to subset
 that big number into consecutive number of 5 digits and find the product of
 those 5 digits. For example my first 5 digit number would be 73167. I need
 to check the product of the individual numbers in 73167 and so on.
 
 The sample number is as follows:
 
 
 731671765313306249192251196744265747423553491949349698352031277450632623957831
 801698480186947885184385861560789112949495459501737958331952853208805511125406
 98747158523863050715693290963295227443043557
 
 I have a problem subsetting the small numbers out of the big number.
 
 Any help is highly appreciated.
 
 Best Regards,
 Janesh Devkota

Since the number is too large to be stored in any numerical format in R,
it needs to be stored as a character string:

X <- "731671765313306249...63295227443043557"

Then you can easily access successive 5-digit blocks as, e.g. for
block i (i = 1:N, where 5*N is the length of the number):

  block <- substr(X, 1+5*(i-1), 5*i)

You now have a 5-character string from which you can extract the
individual digits. And then on to whatever you want to do ...
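
For instance, a minimal sketch of the whole task (assuming X holds the
full digit string):

  N <- nchar(X) %/% 5
  prods <- sapply(seq_len(N), function(i) {
    block <- substr(X, 1 + 5*(i-1), 5*i)
    prod(as.numeric(strsplit(block, "")[[1]]))
  })
  max(prods)   # the largest product over the consecutive 5-digit blocks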

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 18-Apr-2013  Time: 10:06:43
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I don't understand the 'order' function

2013-04-16 Thread Ted Harding
[See in-line below[

On 16-Apr-2013 17:51:41 Julio Sergio wrote:
 I thought I've understood the 'order' function, using simple examples like:
 
order(c(5,4,-2))
[1] 3 2 1
 
 However, I arrived to the following example:
 
order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) 
[1]  8  9 10  7 11  6  5  4  3  2  1
 
 and I was completely perplexed!
 Shouldn't the output vector be  11 10 9 8 7 6 4 1 2 3 5 ?
 Do I have a damaged version of R?

I think the simplest explanation can be given as:

S <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)
cbind(Index=1:length(S), S, Order=order(S), Sort=sort(S))
      Index    S Order Sort
 [1,]     1 2465     8  210
 [2,]     2 2255     9  210
 [3,]     3 2085    10  505
 [4,]     4 1545     7  920
 [5,]     5 1335    11 1045
 [6,]     6 1210     6 1210
 [7,]     7  920     5 1335
 [8,]     8  210     4 1545
 [9,]     9  210     3 2085
[10,]    10  505     2 2255
[11,]    11 1045     1 2465

showing that the value of 'order' for any one of the numbers
is the Index (position) of that number in the original series,
placed in the position that number occupies in the sorted series.
(With a tie for S[8] = S[9] = 210).

For example: which one of S occurs in 5th position in the sorted
series? It is the 11th of S (1045).

 I became still more astonished when I used the sort function and got the 
 right answer: 
 
 sort(c(2465, 2255, 2085, 1545, 1335, 1210,  920,  210,  210,  505, 1045))
 [1]  210  210  505  920 1045 1210 1335 1545 2085 2255 2465
 since 'sort' documentation claims to be using 'order' to establish the right 
 order.

Indeed, once you have order(S), you know which element of S to put in
each position of the sorted order:

  S[order(S)]
  [1]  210  210  505  920 1045 1210 1335 1545 2085 2255 2465

Does this help to explain it?
Ted.

 Please help me to understand all this!
 
   Thanks,
 
   -Sergio.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 16-Apr-2013  Time: 19:12:21
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] %*%

2013-04-11 Thread Ted Harding
On 11-Apr-2013 10:25:17 Shane Carey wrote:
 What does these operators do: %*%
 
 Thanks
 -- 
 Shane

Enter the command

  ?"%*%"

and you will see:

  Matrix Multiplication
  Description:
 Multiplies two matrices, if they are conformable.  If one argument
 is a vector, it will be promoted to either a row or column matrix
 to make the two arguments conformable.  If both are vectors it
 will return the inner product (as a matrix).
  Usage:
 x %*% y
  [etc.]
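
A quick illustration (a minimal sketch):

  A <- matrix(1:6, nrow=2)   # 2 x 3 matrix
  B <- matrix(1:6, nrow=3)   # 3 x 2 matrix
  A %*% B                    # 2 x 2 matrix product
  c(1,2) %*% c(3,4)          # inner product: 1 x 1 matrix, value 11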

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 11-Apr-2013  Time: 11:41:22
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rep() fails at times=0.29*100

2013-04-09 Thread Ted Harding
[See at end]
On 09-Apr-2013 16:11:18 Jorge Fernando Saraiva de Menezes wrote:
 Dear list,
 
 I have found an unusual behavior and would like to check if it is a
 possible bug, and if updating R would fix it. I am not sure if should post
 it in this mail list but I don't where is R bug tracker. The only mention I
 found that might relate to this is "If times is a computed quantity it is
 prudent to add a small fuzz." in rep() help, but not sure if it is related
 to this particular problem
 
 Here it goes:
 
 rep(TRUE,29)
  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [28] TRUE TRUE
 rep(TRUE,0.29*100)
  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [28] TRUE
 length(rep(TRUE,29))
 [1] 29
 length(rep(TRUE,0.29*100))
 [1] 28
 
 Just to make sure:
 0.29*100
 [1] 29
 
 This behavior seems to be independent of what is being repeated (rep()'s
 first argument)
 length(rep(1,0.29*100))
 [1] 28
 
 Also it occurs only with the 0.29.
 length(rep(1,0.291*100))
 [1] 29
 for(a in seq(0,1,0.01)) {print(sum(rep(TRUE,a*100)))} #also shows correct
 values in values from 0 to 1 except for 0.29.
 
 I have confirmed that this behavior happens in more than one machine
 (though I only have session info of this one)
 
 
 sessionInfo()
 R version 2.15.3 (2013-03-01)
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 [1]  LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252
  LC_MONETARY=Portuguese_Brazil.1252
 [4] LC_NUMERIC=C   LC_TIME=Portuguese_Brazil.1252
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 other attached packages:
 [1] spatstat_1.31-1 deldir_0.0-21   mgcv_1.7-22
 
 loaded via a namespace (and not attached):
 [1] grid_2.15.3 lattice_0.20-13 Matrix_1.0-11   nlme_3.1-108
  tools_2.15.3

The basic issue is, believe it or not, that despite appearances:
  0.29*100
  # [1] 29

in reality:
  0.29*100 == 29
  # [1] FALSE

In other words, as computed by R, 0.29*100 is not exactly equal to 29:

  29 - 0.29*100
  # [1] 3.552714e-15

The difference is tiny, but it is sufficient to make 0.29*100 slightly
smaller than 29, so rep(TRUE,0.29*100) uses the largest integer compatible
with times = 0.29*100, i.e. 28. Hence the recommendation to add a
little fuzz.

On the other hand, when you use rep(1,0.291*100) you will be OK:
This is because:

  29 - 0.291*100
  # [1] -0.1

so 0.291*100 is comfortably greater than 29 (but well clear of 30).

The reason for the small inaccuracy (compared with mathematical
truth) is that R performs numerical calculations using binary
representations of numbers, and there is no exact binary representation
of 0.29, so the result of 0.29*100 will be slightly inaccurate.

If you do need to do this sort of thing (e.g. the value of times
will be the result of a calculation) then one useful precaution
could be to round the result:

  round(0.29*100)
  # [1] 29
  29-round(0.29*100)
  # [1] 0
  length(rep(TRUE,0.29*100))
  # [1] 28
  length(rep(TRUE,round(0.29*100)))
  # [1] 29

(The default for round() is 0 decimal places, i.e. it rounds to
an integer).

So, compared with:
  0.29*100 == 29
  # [1] FALSE

we have:
  round(0.29*100) == 29
  # [1] TRUE

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 09-Apr-2013  Time: 17:56:33
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Console display buffer size

2013-04-01 Thread Ted Harding
Greetings All.
This is a somewhat generic query (I'm really asking on behalf
of a friend who uses R on Windows, whereas I'm on Linux, but
the same phenomenon appears on both).

Say one has a largish dataframe -- call it G -- which in the
case under discussion has 592 rows and 41 columns. The intention
is to display the data by simply entering its name (G) as command.

Say the display console has been set wide enough to display 15
columns (determined by lengths of column names). Then R will output
successively to the console, in continuous flow:
  Chunk 1: rows 1:592 of columns 1:15
  Chunk 2: rows 1:592 of columns 16:30
  Chunk 3: rows 1:592 of columns 31:41
If the number of rows that can be displayed on the screen is, say, 60,
then only rows 533:592 of Chunk 3 will be visible in the first instance.

However, on my Linux system at any rate, I can use Shift+PgUp to
scroll back up through what has been output to the console. It seems
that my friend proceeds similarly.

But now, after a certain number of (Shift+PgUps), one runs out
of road before getting to Row 1 of Chunk 1 (in my friend's case,
only Rows 468-592 of Chunk 1 can be seen).

The explanation which occurs to me is that the console has a buffer
in which such an output is stored, and if the dataframe is too big
then lines 1:N (for some N) of the output are dropped from the start
of the buffer, and it is impossible to go further back than line (N+1)
of Chunk 1 where in this case N=467 (of course one may not even be
able to go further back than Chunk K, for some K > 1, for a bigger
dataframe).

The query I have is: In the light of the above, is there a way to
change the size of the buffer so that one can scroll all the way
back to the very first row of Chunk 1? (The size-change may perhaps
have to be determined empirically).

With thanks,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 01-Apr-2013  Time: 21:37:17
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Console display buffer size

2013-04-01 Thread Ted Harding
On 01-Apr-2013 21:26:07 Robert Baer wrote:
 On 4/1/2013 4:08 PM, Peter Ehlers wrote:
 On 2013-04-01 13:37, Ted Harding wrote:
 Greetings All.
 This is a somewhat generic query (I'm really asking on behalf
 of a friend who uses R on Windows, whereas I'm on Linux, but
 the same phenomenon appears on both).

 Say one has a largish dataframe -- call it G -- which in the
 case under discussion has 592 rows and 41 columns. The intention
 is to display the data by simply entering its name (G) as command.

 Say the display console has been set wide enough to display 15
 columns (determined by lengths of column names). Then R will output
 successively to the console, in continuous flow:
Chunk 1: rows 1:592 of columns 1:15
Chunk 2: rows 1:592 of columns 16:30
Chunk 3: rows 1:592 of columns 31:41
 If the number of rows that can be displayed on the screen is, say, 60,
 then only rows 533:592 of Chunk 3 will be visible in the first instance.

 However, on my Linux system at any rate, I can use Shift+PgUp to
 scroll back up through what has been output to the console. It seems
 that my friend proceeds similarly.

 But now, after a certain number of (Shift+PgUps), one runs out
 of road before getting to Row 1 of Chunk 1 (in my friend's case,
 only Rows 468-592 of Chunk 1 can be seen).

 The explanation which occurs to me is that the console has a buffer
 in which such an output is stored, and if the dataframe is too big
 then lines 1:N (for some N) of the output are dropped from the start
 of the buffer, and it is impossible to go further back than line (N+1)
 of Chunk 1 where in this case N=467 (of course one may not even be
 able to go further back than Chunk K, for some K > 1, for a bigger
 dataframe).

 The query I have is: In the light of the above, is there a way to
 change the size of the buffer so that one can scroll all the way
 back to the very first row of Chunk 1? (The size-change may perhaps
 have to be determined empirically).

 Isn't this set by the 'bufbytes' and 'buflines' specifications in the
 Rconsole file?
 Anyway, it's probably best to use 'View' to inspect data.

 Although I have not tried them, Windows RGUI  [ -- Edit | GUI 
 Preferences --] has settings for both buffer and lines which on my 
 64-bit machine default to 25 and 8000 respectively.
 
 Rob Baer
 
 Peter Ehlers

Thanks for the replies, Rob and Peter. Rob's reply in particular
corresponds to what my friend has himself just found out in his
Windows installation of R. It seems, however, that this setting
is once-and-for-all in an R session (I can't check that myself).

Re Peter's comment: I don't seem to have an Rconsole file
anywhere in my Linux installation. Is it unique to Windows?

Once again, thanks. Very helpful.
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 01-Apr-2013  Time: 23:06:21
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prop.test correct true and false gives same answer

2013-03-27 Thread Ted Harding
On 27-Mar-2013 21:04:51 David Arnold wrote:
 All,
 How come both of these are the same?  Both say "1-sample proportions
 test without continuity correction". I would expect one to say
 "without" and the other to say "with".
 
 
 prop.test(118,236,.5,correct=FALSE,conf.level=0.95)
 
   1-sample proportions test without continuity correction
 
 data:  118 out of 236, null probability 0.5 
 X-squared = 0, df = 1, p-value = 1
 alternative hypothesis: true p is not equal to 0.5 
 95 percent confidence interval:
  0.4367215 0.5632785 
 sample estimates:
   p 
 0.5 
 
 prop.test(118,236,.5,correct=TRUE,conf.level=0.95)
 
   1-sample proportions test without continuity correction
 
 data:  118 out of 236, null probability 0.5 
 X-squared = 0, df = 1, p-value = 1
 alternative hypothesis: true p is not equal to 0.5 
 95 percent confidence interval:
  0.4367215 0.5632785 
 sample estimates:
   p 
 0.5

Note what is said (admittedly somewhat deeply tucked away)
under Details in ?prop.test:

 "Continuity correction is used only if it does not exceed
  the difference between sample and null proportions
  in absolute value."

In your example, the sample proportion exactly matches the
null-hypothesis proportion (0.5).
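
A sketch of that rule with these numbers: the 0.5 continuity
correction (on the count scale) is capped at |x - n*p0|, so here
there is nothing to correct:

  n <- 236; p0 <- 0.5
  abs(118 - n*p0)   # 0   -> the correction is reduced to 0
  abs(117 - n*p0)   # 0.5 -> the full 0.5 correction can be applied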

Confirmation:
[A] Your same example:
  prop.test(118,236,.5,correct=TRUE,conf.level=0.95)
  # 1-sample proportions test without continuity correction
  # data:  118 out of 236, null probability 0.5 
  # X-squared = 0, df = 1, p-value = 1
  # alternative hypothesis: true p is not equal to 0.5 
  # 95 percent confidence interval:
  #  0.4367215 0.5632785 
  # sample estimates:
  #   p 
  # 0.5 

[B1] Slightly change x, but keep correct=TRUE:
  prop.test(117,236,.5,correct=TRUE,conf.level=0.95)
  # 1-sample proportions test with continuity correction
  # data:  117 out of 236, null probability 0.5 
  # X-squared = 0.0042, df = 1, p-value = 0.9481
  # alternative hypothesis: true p is not equal to 0.5 
  # 95 percent confidence interval:
  #  0.4304724 0.5611932 
  # sample estimates:
  # p 
  # 0.4957627 

[B2] Slightly change x, but now correct=FALSE:
  prop.test(117,236,.5,correct=FALSE,conf.level=0.95)
  # 1-sample proportions test without continuity correction
  # data:  117 out of 236, null probability 0.5 
  # X-squared = 0.0169, df = 1, p-value = 0.8964
  # alternative hypothesis: true p is not equal to 0.5 
  # 95 percent confidence interval:
  #  0.4325543 0.5591068 
  # sample estimates:
  # p 
  # 0.4957627 

So it doesn't do the requested continuity correction in [A] because
there is no need to. But in [B1] it makes a difference (compare
with [B2]), so it does it.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 27-Mar-2013  Time: 21:21:39
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] edit.data() read-only?

2013-03-26 Thread Ted Harding
Greetings All.

The function edit.data() allows a convenient spreadsheet-like
view of a dataframe with too many rows/columns to fit on the
screen (especially when there are many columns). Very useful
when scanning through a dataset (row & column are conveniently
identified by the labels at the side and above).

However, there seems to be no option to set it read-only on
start-up, with the consequence that a clumsy key-press or
mouse-click could cause a change in the data which would then
be stored after quitting edit.data().

Is there a possibility of a read-only option? Or some other
function which could offer similar viewing capability without
the risk of data change?

With thanks,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 26-Mar-2013  Time: 10:08:58
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] edit.data() read-only?

2013-03-26 Thread Ted Harding
Sorry, I meant data.entry(), not edit.data() (the latter due
to mental cross-wiring with edit.data.frame()).

I think that Nello Blaser's suggestion of View may be what I
seek (when I can persuade it to find the font it seeks ... )!

With thanks, Barry.
Ted.

On 26-Mar-2013 10:20:59 Barry Rowlingson wrote:
 On Tue, Mar 26, 2013 at 10:09 AM, Ted Harding ted.hard...@wlandres.net
 wrote:
 Greetings All.

 The function edit.data() allows a convenient spreadsheet-like
 view of a dataframe with too many rows/columns to fit on the
 screen (especially when there are many columns). Very useful
 when scanning through a dataset (row  column are conveniently
 identified by the labels at the side and above).
 
  Do you mean:
 
 d=data.frame(x=1:10,y=runif(10))
 edit(d)
 
 ? Because I don't have an edit.data function (maybe it's Windows only).
 
 However, there seens to be no option to set it read-only on
 start-up, with the consequence that a clumsy key-press or
 mouse-click could cause a change in the data which would then
 be stored after quitting edit.data().

 Is there a possibility of a read-only option? Or some other
 function which could offer similar viewing capability without
 the risk of data change?
 
  If you just want to view the data, don't assign it back. The edit
 function only updates the data if you assign it back, as in:
 
 d=edit(d)
 
 so a 'read only' version would be:
 
 invisible(edit(d))
 
 except the user here can change the values in the cells, they just
 don't go anywhere. Except into .Last.value if you decide you really
 did want to get the values...
 
 Barry
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 26-Mar-2013  Time: 10:36:32
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] edit.data() read-only?

2013-03-26 Thread Ted Harding
Thanks! ?View does indeed state "The object is then viewed
in a spreadsheet-like data viewer, a read-only version of
'data.entry'", which is what I was looking for!
Ted.

On 26-Mar-2013 10:23:59 Blaser Nello wrote:
 Try ?View()
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Ted Harding
 Sent: Tuesday, 26 March 2013 11:09
 To: r-help@r-project.org
 Subject: [R] edit.data() read-only?
 
 Greetings All.
 
 The function edit.data() allows a convenient spreadsheet-like view of a
 dataframe with too many rows/columns to fit on the screen (especially when
 there are many columns). Very useful when scanning through a dataset (row 
 column are conveniently identified by the labels at the side and above).
 
 However, there seems to be no option to set it read-only on start-up, with
 the consequence that a clumsy key-press or mouse-click could cause a change
 in the data which would then be stored after quitting edit.data().
 
 Is there a possibility of a read-only option? Or some other function which
 could offer similar viewing capability without the risk of data change?
 
 With thanks,
 Ted.
 
 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 26-Mar-2013  Time: 10:08:58
 This message was sent by XFMail
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 26-Mar-2013  Time: 10:38:34
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] order statistic of multivariate normal

2013-03-22 Thread Ted Harding
On 22-Mar-2013 13:02:25 li li wrote:
 Thank you all for the reply.
 
 One example of my question is as follows.
 
 Suppose X1, ..., X10 has multivariate normal distribution
 and X(1), ..., X(10) are the corresponding order statistics.
 
 My question is that whether there is a R function that would
 help compute the c which satisfies
 P(X(4) < c) = beta.
 Here beta is a known constant between 0 and 1.
 
 Thank you.
Hanna

The basic question which needs to be answered (which has been hinted
at in earlier replies) is: How do you define an order statistic for
multivariate observations?

For example, here is a sample of 10 (to 3 d.p.) from a bivariate
normal distribution:

          [,1]   [,2]
   [1,]  1.143 -0.396
   [2,] -0.359 -0.217
   [3,] -0.391 -0.601
   [4,] -0.416 -1.093
   [5,] -1.810 -1.499
   [6,] -0.367 -0.636
   [7,] -2.238  0.563
   [8,]  0.811  1.230
   [9,]  0.082  0.174
  [10,] -1.359 -0.364

Which one of these 10 rows is X(4)?

There is an alternative interpretation of your question:

  Suppose X1, ..., X10 has multivariate normal distribution
  and X(1), ..., X(10) are the corresponding order statistics.

This could mean that the vector (X1,...,X10) has a multivariate
normal distribution with 10 dimensions, and, for a single vector
(X1,...,X10) drawn from this distribution, (X(1), ..., X(10))
is a vector consisting of these same values (X1,...,X10), but
in increasing order.

Is that what you mean?
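
Under that second interpretation, and taking independent N(0,1)
components purely for illustration, c can be estimated by simulation.
A sketch, with a hypothetical beta = 0.95:

  set.seed(1)
  x4 <- replicate(1e5, sort(rnorm(10))[4])
  quantile(x4, 0.95)   # approximate c such that P(X(4) < c) = 0.95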

Hoping this helps,
Ted.


-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 22-Mar-2013  Time: 13:31:31
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random Sample with constraints

2013-03-03 Thread Ted Harding
On 03-Mar-2013 16:29:05 Angelo Scozzarella Tiscali wrote:
 For example,  I want to simulate different populations with same mean and
 standard deviation but different distribution.
 
 Il giorno 03/mar/2013, alle ore 17:14, Angelo Scozzarella Tiscali ha scritto:
 Dear R friends,
 
 I'd like to generate random sample (variable size and range) without a
 specified distribution but with given mean and standard deviation.
 
 Could you help me?
 
 thanks in advance
 Angelo

As Ben Bolker said, any random sample must come from some distribution,
so you cannot generate one without specifying some distribution.

Insofar as your question can be interpreted, it will be satisfied
if, given the desired mean, M, and SD, S, you take any two available
distributions with, respectively, known means M1 and M2 and known
SDs S1 and S2. Let X1 denote a sample from the first, and X2 a
sample from the second.

Then M + (X1 - M1)*(S/S1) is a sample from the first distribution
re-scaled to have mean M and SD S, as required.

Similarly, M + (X2 - M2)*(S/S2) is a sample from the second distribution
re-scaled to have mean M and SD S, as required.

As for the first distribution that you sample from, and the second,
that can be at your own choice -- for example, the first could be
the Standard Normal (M1 = 0, S1 = 1); use rnorm().
The second could be the uniform on (0,1) (M2 = 0.5, S2 = 1/sqrt(12));
use runif().

Similar for other arbitrary choices of first and second distribution
(so long as each has at least a second moment, hence excluding, for
example, the Cauchy distribution).
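
A minimal sketch, with hypothetical targets M = 10 and S = 2:

  M <- 10; S <- 2
  x1 <- rnorm(1000)                         # M1 = 0,   S1 = 1
  y1 <- M + (x1 - 0)*(S/1)                  # rescaled normal sample
  x2 <- runif(1000)                         # M2 = 0.5, S2 = 1/sqrt(12)
  y2 <- M + (x2 - 0.5)*(S/(1/sqrt(12)))     # rescaled uniform sample
  c(mean(y1), sd(y1))                       # close to 10 and 2
  c(mean(y2), sd(y2))                       # likewise

(The rescaling fixes the population mean and SD; the sample statistics
will only be close to M and S. To match them exactly, standardise with
the sample mean and SD instead.)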

That's about as far as one can go with your question!

Hoping it helps, however.
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 03-Mar-2013  Time: 17:12:50
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] arithmetic and logical operators

2013-01-30 Thread Ted Harding
On 30-Jan-2013 20:39:34 Berend Hasselman wrote:
 
 On 30-01-2013, at 21:32, Dave Mitchell dmmtc...@gmail.com wrote:
 
 Why, in R, does (0.1 + 0.05) > 0.15 evaluate to TRUE?  What am I missing
 here?  How can I ensure this (ostensibly incorrect) behavior doesn't
 introduce bugs into my code?  Thanks for your time.
 
 
 R-FAQ 7.31:
 http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
 
 Berend

And, to put Dave's specific query into the context of that FAQ:

  (0.1 + 0.05) > 0.15
  # [1] TRUE
  (0.1 + 0.05) - 0.15
  # [1] 2.775558e-17

so that tiny 2.775558e-17 is the inexactitude (due to finite
binary representation).

As an interesting variant:

  (1.0 + 0.5) > 1.5
  # [1] FALSE
  (1.0 + 0.5) - 1.5
  # [1] 0

and that is because 1.0 and 0.5, and also 1.5, have exact finite
binary representations, e.g.:

  1.0 == 1.0000...  (in binary)
  0.5 == 0.1000...  (in binary)
  1.5 == 1.1000...  (in binary)

whereas 0.1, 0.05 and 0.15 are these numbers divided by 10 = 2*5;
and while you can exactly do the /2 part (just shift right by
one binary place), you can't exactly divide by 5 in finite binary
arithmetic (any more than you can exactly divide by 3 in decimal),
because 5 is not a factor of the base (2) of the binary representation
(whereas, in decimal, both 2 and 5 are factors of 10; but 3 isn't).
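
One can see the inexactness directly by printing more decimal places:

  sprintf("%.20f", 0.1 + 0.05)
  # [1] "0.15000000000000002220"
  sprintf("%.20f", 0.15)
  # [1] "0.14999999999999999445"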

Whereas R has the function all.equal() to give the right answer
for most things like

  (0.1 + 0.05) == 0.15
  # [1] FALSE
  all.equal((0.1 + 0.05), 0.15)
  # [1] TRUE

(because it tests for equality to within a very small tolerance,
by default the square root of the binary precision of a double
precision binary real number), R does not have a straightforward
method for testing the truth of (0.1 + 0.05) > 0.15 (and that
is because the question is not clearly discernible from the
expression, when imprecision underlies it).

You could emulate all.equal() on the lines of

  (0.1 + 0.05) > (0.15 + .Machine$double.eps^0.5)
  # [1] FALSE
  (0.1 + 0.05) < (0.15 - .Machine$double.eps^0.5)
  # [1] FALSE

(or similar). Observe that

  .Machine$double.eps^0.5
  # [1] 1.490116e-08
  .Machine$double.eps
  # [1] 2.220446e-16

  (0.1 + 0.05) - 0.15
  # [1] 2.775558e-17

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 30-Jan-2013  Time: 23:22:53
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

