Re: [R] Rounding of problem with sum command in R

2017-08-22 Thread John Kane via R-help
Try options(digits=12)
sum(v)
BTW I don't get the same value as you did when you calculated manually.


On Tuesday, August 22, 2017, 10:39:26 AM EDT, Bert Gunter 
 wrote:

... and following up on Spencer's answer, see the "digits" argument of
?options and ?print.default.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 7:30 AM, Spencer Graves
 wrote:
>
>
> On 2017-08-22 9:26 AM, niharika singhal wrote:
>>
>> Hello I have a vector
>>
>> v=c(0.0886,0.1744455,0.1379778,0.1209769,0.1573065,0.1134463,0.2074027)
>> when i do
>> sum(v)
>> or
>> 0.0886+0.1744455+0.1379778+0.1209769+0.1573065+0.1134463+0.2074027
>> i am getting output as 1
>
>
>
> No:  That's only the display:
>
>
>> sum(v)-1
> [1] 1.6e-07
>
>
>      hope this helps.  Spencer
>
>
>> But if i add them manually i get
>> 1.0026
>> I do not want to round of my value since it effect my code further
>> Can anyone suggest how can i avoid this.
>>
>> Thanks & Regards
>> Niharika Singhal
>>
>>        [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Flummoxed by gsub().

2017-08-22 Thread Rolf Turner


I have a vector (say "x") of the form

[1] "mung5"  "mung10" "mung20" "gorp5"  "gorp10" "gorp20"

I want to extract just the numbers (strings of digits) that appear at 
the end of the strings in "x".


My reading of ?regex led me to believe that

gsub("[:alpha:]","",x)

should give the result that I want.  However it returns

[1] "mung5"  "mung10" "mung20" "gor5"   "gor10"  "gor20"

i.e. it chops the last letter out of the "gorp" string, but nothing else.

I am completely bewildered by this behaviour and can see no rationale 
for it nor any way to adjust my syntax to get what I want.


A bit of Googling led me to the information that

gsub("\\D","",x)

should work, and indeed it does, giving:

[1] "5"  "10" "20" "5"  "10" "20"

OM!  (Apparently "\D" means *not* a digit.)

So I have *a* solution to my problem.  However I would really like to 
know why the  the first idea I tried did not work and

what it is actually *doing*!

Anybody?

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] likert Package

2017-08-22 Thread Jeff Reichman
R- Help Forum

 

Working with the "likert" package and find that my "bar" graphs are
backwards (see attached)

 

> summary(results)

  Item low neutral high meansd

4   Q4   5  15   80 2.75 0.5501196

5   Q5  20  40   40 2.20 0.7677719

1   Q1  65  305 1.40 0.5982430

3   Q3   5  905 2.00 0.3244428

2   Q2  90  100 1.10 0.3077935

 

> results <- likert(data[,2:6], grouping = data$Group)
> plot(results, type = "bar", centered = FALSE, group.order = c("Band 3",
"Band 4"))
 
In the attached figure the percentages appear correct but the bars are
backwards (or appear to be backwards)

 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] splitting a dataframe in R based on multiple gene names in a specific column

2017-08-22 Thread Jim Lemon
Hi Bogdan,
Messy, and very specific to your problem:

df.sample.gene<-read.table(
 text="Chr Start   End Ref Alt Func.refGene  Gene.refGene
 284 chr2  16080996  16080996   C   T ncRNA_exonic  GACAT3
 448 chr2 113979920 113979920   C   T ncRNA_exonic  LINC01191,LOC100499194
 465 chr2 131279347 131279347   C   G ncRNA_exonic  LOC440910
 525 chr2 22358 22358   T   A   exonic  AP1S3
 626 chr3  99794575  99794575   G   A   exonic  COL8A1
 643 chr3 132601066 132601066   A   G   exonic  ACKR4
 655 chr3 132601999 132601999   A   G   exonic  BCDF5,CDFG6",
 header=TRUE,stringsAsFactors=FALSE)

multgenes<-grep(",",df.sample.gene$Gene.refGene)
rep_genes<-strsplit(df.sample.gene$Gene.refGene[multgenes],",")
ngenes<-unlist(lapply(rep_genes,length))
dup_row<-function(x) {
 newrows<-x
 lastcol<-dim(x)[2]
 rep_genes<-unlist(strsplit(x[,lastcol],","))
 for(i in 2:length(rep_genes)) newrows<-rbind(newrows,x)
 newrows$Gene.refGene<-rep_genes
 return(newrows)
}
for(multgene in multgenes)
 df.sample.gene<-rbind(df.sample.gene,dup_row(df.sample.gene[multgene,]))
df.sample.gene<-df.sample.gene[-multgenes,]
df.sample.gene

I added a second line with multiple genes to make sure that it would
work with more than one line.

Jim


On Wed, Aug 23, 2017 at 9:57 AM, Bogdan Tanasa  wrote:
> I would appreciate please a suggestion on how to do the following :
>
> i'm working with a dataframe in R that contains in a specific column
> multiple gene names, eg :
>
>> df.sample.gene[15:20,2:8]
>  Chr Start   End Ref Alt Func.refGene
> Gene.refGene284 chr2  16080996  16080996   C   T ncRNA_exonic
>GACAT3448 chr2 113979920 113979920   C   T ncRNA_exonic
> LINC01191,LOC100499194465 chr2 131279347 131279347   C   G
> ncRNA_exonic  LOC440910525 chr2 22358 22358   T
> A   exonic  AP1S3626 chr3  99794575  99794575   G
>  A   exonic COL8A1643 chr3 132601066 132601066   A
>   G   exonic  ACKR4
>
> How could I obtain a dataframe where each line that has multiple gene names
> (in the field Gene.refGene) is replicated with only one gene name ? i.e.
>
> for the second row :
>
>   448 chr2 113979920 113979920   C   T ncRNA_exonic LINC01191,LOC100499194
>
> we shall get in the final output (that contains all the rows) :
>
>   448 chr2 113979920 113979920   C   T ncRNA_exonic LINC01191
>   448 chr2 113979920 113979920   C   T ncRNA_exonic LOC100499194
>
> thanks a lot !
>
> -- bogdan
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] splitting a dataframe in R based on multiple gene names in a specific column

2017-08-22 Thread Bogdan Tanasa
I would appreciate please a suggestion on how to do the following :

i'm working with a dataframe in R that contains in a specific column
multiple gene names, eg :

> df.sample.gene[15:20,2:8]
 Chr Start   End Ref Alt Func.refGene
Gene.refGene284 chr2  16080996  16080996   C   T ncRNA_exonic
   GACAT3448 chr2 113979920 113979920   C   T ncRNA_exonic
LINC01191,LOC100499194465 chr2 131279347 131279347   C   G
ncRNA_exonic  LOC440910525 chr2 22358 22358   T
A   exonic  AP1S3626 chr3  99794575  99794575   G
 A   exonic COL8A1643 chr3 132601066 132601066   A
  G   exonic  ACKR4

How could I obtain a dataframe where each line that has multiple gene names
(in the field Gene.refGene) is replicated with only one gene name ? i.e.

for the second row :

  448 chr2 113979920 113979920   C   T ncRNA_exonic LINC01191,LOC100499194

we shall get in the final output (that contains all the rows) :

  448 chr2 113979920 113979920   C   T ncRNA_exonic LINC01191
  448 chr2 113979920 113979920   C   T ncRNA_exonic LOC100499194

thanks a lot !

-- bogdan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boot.stepAIC fails with computed formula

2017-08-22 Thread Bert Gunter
OK, here's the problem. Continuing with your example:

strt1 <- lm(y1 ~1, dat)
strt2 <- lm(frm1,dat)


> strt1

Call:
lm(formula = y1 ~ 1, data = dat)

Coefficients:
(Intercept)
  41.73

> strt2

Call:
lm(formula = frm1, data = dat)

Coefficients:
(Intercept)
  41.73


Note that the formula objects of the lm object are different: strt2
does not evaluate the formula. So presumably boot.step.AIC does no
evaluation and therefore gets confused with the errors you saw. So you
need to get the evaluated formula into the lm object. This can be
done, e.g. via:

> strt2 <- eval(substitute(lm(form,data = dat), list(form = frm1)))

## yielding

> strt2

Call:
lm(formula = y1 ~ 1, data = dat)

Coefficients:
(Intercept)
  41.73

So this looks like it should fix the problem, but alas no, the
boot.stepAIC call still fails with the same error message. Here's why:

> identical(strt$call, strt2$call)
[1] FALSE

So one might rightfully ask, what the heck is going on here?! Further digging:

> str(strt$call)
 language lm(formula = y1 ~ 1, data = dat)

> str(strt2$call)
 language lm(formula = y1 ~ 1, data = dat)

These certainly look identical! -- but of course they're not:

> names(strt$call)
[1] """formula" "data"
> names(strt2$call)
[1] """formula" "data"

So the difference must lie in the formula component, right? ...

> strt$call$formula
y1 ~ 1
> strt2$call$formula
y1 ~ 1

So, thus far, huhh? But..

> class(strt2$call$formula)
[1] "formula"

> class(strt$call$formula)
[1] "call"

So I think therein lies the critical difference that is screwing
things up. NOTE: If I am wrong about this someone **PLEASE** correct
me.

I see no clear workaround for this other than to explicitly avoid
passing a formula in the lm() call with y~1 or y ~ .   I think the
real fix is to make the  boot.stepAIC function smarter in how it
handles its formula argument, and that is above my paygrade (and
degree of interest) . You should probably email the maintainer, who
may not monitor this list. But give it a day or so to give someone
else a chance to correct me if I'm wrong.


HTH.

Cheers,

Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan
 wrote:
> I'm trying to use boot.stepAIC for feature selection; I need to be able to 
> specify the name of the dependent variable programmatically, but this appear 
> to fail:
>
> In R-Studio with MS R Open 3.4:
>
> library(bootStepAIC)
>
> #Fake data
> n<-200
>
> x1 <- runif(n, -3, 3)
> x2 <- runif(n, -3, 3)
> x3 <- runif(n, -3, 3)
> x4 <- runif(n, -3, 3)
> x5 <- runif(n, -3, 3)
> x6 <- runif(n, -3, 3)
> x7 <- runif(n, -3, 3)
> x8 <- runif(n, -3, 3)
> y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)
>
> dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
> #the real data won't have these names...
>
> cn <- names(dat)
> trg <- "y1"
> xvars <- cn[cn!=trg]
>
> frm1<-as.formula(paste(trg,"~1"))
> frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))
>
> strt=lm(y1~1,dat) # boot.stepAIC Works fine
>
> #strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##
>
> #strt=lm(frm1,dat) ## boot.stepAIC FAILS ##
>
> limit<-5
>
>
> stp=stepAIC(strt,direction='forward',steps=limit,
> scope=list(lower=frm1,upper=frm2))
>
> bst <- boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
> scope=list(lower=frm1,upper=frm2))
>
> b1 <- bst$Covariates
> ball <- data.frame(b1)
> names(ball)=unlist(trg)
>
> Any ideas?
>
> Cheers,
> SOH
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert Factor to Date

2017-08-22 Thread Spencer Graves


On 2017-08-22 2:04 PM, Patrick Casimir wrote:
>
> This is large data set Spencer. What about when the dates change as below:
>


 � Have you tried what I suggested?� What were the results? Spencer

>
> COL1  COL2
> Jan-141-Aug-16
> Feb-141-Aug-16
> Mar-141-Aug-16
> Apr-141-Aug-16
> May-141-Aug-16
> Jun-141-Aug-16
> Jul-141-Aug-16
> Aug-141-Aug-16
> Sep-141-Aug-16
> Oct-141-Aug-16
> Nov-141-Aug-16
> Dec-141-Aug-16
> Jan-151-Aug-16
> Feb-151-Aug-16
> Mar-151-Aug-16
> Apr-151-Aug-16
> May-151-Aug-16
> Jun-151-Aug-16
> Jul-151-Aug-16
> Aug-151-Aug-16
> Sep-151-Aug-16
> Oct-151-Aug-16
> Nov-151-Aug-16
> Dec-151-Aug-16
> Jan-161-Aug-16
> Feb-161-Aug-16
> Mar-161-Aug-16
> Apr-161-Aug-16
> May-161-Aug-16
> Jun-161-Aug-16
> Jul-161-Aug-16
> Aug-161-Aug-16
> Sep-161-Aug-16
> Oct-161-Aug-16
>
>
>
>
>
>
>
> 
> *From:* R-help  on behalf of Spencer 
> Graves 
> *Sent:* Tuesday, August 22, 2017 2:49 PM
> *To:* r-help@r-project.org
> *Subject:* Re: [R] Convert Factor to Date
>
>
> On 2017-08-22 1:30 PM, Patrick Casimir wrote:
> > Dear R Fellows,
> >
> >
> > I Have a dataset( data1) with 2 columns of date showing a class of 
> factor. How to convert them to date? Then compare them, keep the 
> greater date only in a new column. Using as.Date to change the class 
> to Date but the data becomes NA.
>
>
> �� When I specified a format with the second date, I got the desired
> behavior:
>
>
> �> as.Date(factor('1-Nov-16'), '%d-%b-%y')
> [1] "2016-11-01"
> �> as.Date('Nov-16', '%b-%y')
> [1] NA
> �> as.Date(factor('Nov-16'), '%b-%y')
> [1] NA
> �> as.Date('Nov-16', '%b-%y')
> [1] NA
>
>
> �� To convert the first column, I pasted "1-" in front:
>
>
> as.Date(paste0('1-', factor('Nov-16')), '%d-%b-%y')
>
>
> �� Hope this helps.� Spencer
>
> > Much Thanks
> >
> >
> > COL1��� COL2
> > Apr-16� 1-Nov-16
> > May-16� 1-Nov-16
> > Jun-16� 1-Nov-16
> > Jul-16� 1-Nov-16
> > Aug-16� 1-Nov-16
> > Sep-16� 1-Nov-16
> > Oct-16� 1-Nov-16
> > Nov-16� 1-Nov-16
> > Dec-16� 1-Nov-16
> > Jan-17� 1-Nov-16
> > Feb-17� 1-Nov-16
> > Mar-17� 1-Nov-16
> > Apr-17� 1-Nov-16
> > May-17� 1-Nov-16
> > Jun-17� 1-Nov-16
> > Jul-17� 1-Nov-16
> > Aug-17� 1-Nov-16
> > Sep-17� 1-Nov-16
> >
> >
> >��� [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=jwTeb%2BvH0bbkXdckgzE6PJZ3gDl9d1%2F3t9K%2BxDtjyls%3D=0
> > PLEASE do read the posting guide 
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=GUAR582xxtA88KLkQC1oPnvyNecfUyXjV9MrIziJicU%3D=0
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=jwTeb%2BvH0bbkXdckgzE6PJZ3gDl9d1%2F3t9K%2BxDtjyls%3D=0
> PLEASE do read the posting guide 
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=GUAR582xxtA88KLkQC1oPnvyNecfUyXjV9MrIziJicU%3D=0
> and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Convert Factor to Date

2017-08-22 Thread Patrick Casimir
This is large data set Spencer. What about when the dates change as below:


COL1COL2
Jan-14  1-Aug-16
Feb-14  1-Aug-16
Mar-14  1-Aug-16
Apr-14  1-Aug-16
May-14  1-Aug-16
Jun-14  1-Aug-16
Jul-14  1-Aug-16
Aug-14  1-Aug-16
Sep-14  1-Aug-16
Oct-14  1-Aug-16
Nov-14  1-Aug-16
Dec-14  1-Aug-16
Jan-15  1-Aug-16
Feb-15  1-Aug-16
Mar-15  1-Aug-16
Apr-15  1-Aug-16
May-15  1-Aug-16
Jun-15  1-Aug-16
Jul-15  1-Aug-16
Aug-15  1-Aug-16
Sep-15  1-Aug-16
Oct-15  1-Aug-16
Nov-15  1-Aug-16
Dec-15  1-Aug-16
Jan-16  1-Aug-16
Feb-16  1-Aug-16
Mar-16  1-Aug-16
Apr-16  1-Aug-16
May-16  1-Aug-16
Jun-16  1-Aug-16
Jul-16  1-Aug-16
Aug-16  1-Aug-16
Sep-16  1-Aug-16
Oct-16  1-Aug-16







From: R-help  on behalf of Spencer Graves 

Sent: Tuesday, August 22, 2017 2:49 PM
To: r-help@r-project.org
Subject: Re: [R] Convert Factor to Date



On 2017-08-22 1:30 PM, Patrick Casimir wrote:
> Dear R Fellows,
>
>
> I Have a dataset( data1) with 2 columns of date showing a class of factor. 
> How to convert them to date? Then compare them, keep the greater date only in 
> a new column. Using as.Date to change the class to Date but the data becomes 
> NA.


   When I specified a format with the second date, I got the desired
behavior:


 > as.Date(factor('1-Nov-16'), '%d-%b-%y')
[1] "2016-11-01"
 > as.Date('Nov-16', '%b-%y')
[1] NA
 > as.Date(factor('Nov-16'), '%b-%y')
[1] NA
 > as.Date('Nov-16', '%b-%y')
[1] NA


   To convert the first column, I pasted "1-" in front:


as.Date(paste0('1-', factor('Nov-16')), '%d-%b-%y')


   Hope this helps.  Spencer

> Much Thanks
>
>
> COL1COL2
> Apr-16  1-Nov-16
> May-16  1-Nov-16
> Jun-16  1-Nov-16
> Jul-16  1-Nov-16
> Aug-16  1-Nov-16
> Sep-16  1-Nov-16
> Oct-16  1-Nov-16
> Nov-16  1-Nov-16
> Dec-16  1-Nov-16
> Jan-17  1-Nov-16
> Feb-17  1-Nov-16
> Mar-17  1-Nov-16
> Apr-17  1-Nov-16
> May-17  1-Nov-16
> Jun-17  1-Nov-16
> Jul-17  1-Nov-16
> Aug-17  1-Nov-16
> Sep-17  1-Nov-16
>
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=jwTeb%2BvH0bbkXdckgzE6PJZ3gDl9d1%2F3t9K%2BxDtjyls%3D=0
> PLEASE do read the posting guide 
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=GUAR582xxtA88KLkQC1oPnvyNecfUyXjV9MrIziJicU%3D=0
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=jwTeb%2BvH0bbkXdckgzE6PJZ3gDl9d1%2F3t9K%2BxDtjyls%3D=0
PLEASE do read the posting guide 
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html=02%7C01%7Cpatrcasi%40nova.edu%7C6abf3517ab5f407427d308d4e98e9efd%7C2c2b2d312e3e4df1b571fb37c042ff1b%7C0%7C0%7C636390246143633480=GUAR582xxtA88KLkQC1oPnvyNecfUyXjV9MrIziJicU%3D=0
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in .jnew(“java/io/FileOutputStream”, jFile)

2017-08-22 Thread Martin Møller Skarbiniks Pedersen
On 22 August 2017 at 17:51, Александр Мащенко  wrote:
>
>
> I don't know R absolutely, but I have to do this work for my diploma. So
I'm sorry for strange message below. Please, help me anybody decides this
issue.

[...]

> > write.xlsx(ab_ret,
"C:\\Users\\Sapl\\Desktop\\NATA\\code\\Results\\Created by
R\\10.05.2006\\ab_ret_banks_short_form_10.05.2006.xlsx")
> >
> >Error in .jnew("java/io/FileOutputStream", jFile) :
> >java.io.FileNotFoundException:
> >C:\Users\Sapl\Desktop\NATA\code\Results\Created by
R\10.05.2006\ab_ret_banks_short_form_10.05.2006.xlsx (unreadablesymbols)

I see that you have also posted this answer at Stackoverflow:
https://stackoverflow.com/questions/45782728/error-in-jnewjava-io-fileoutputstream-jfile

My best guess is that the directory:
C:\Users\Sapl\Desktop\NATA\code\Results\Created by R\10.05.2006\
doesn't exist.

Regards
Martin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Convert Factor to Date

2017-08-22 Thread Spencer Graves



On 2017-08-22 1:30 PM, Patrick Casimir wrote:

Dear R Fellows,


I Have a dataset( data1) with 2 columns of date showing a class of factor. How 
to convert them to date? Then compare them, keep the greater date only in a new 
column. Using as.Date to change the class to Date but the data becomes NA.



  When I specified a format with the second date, I got the desired 
behavior:



> as.Date(factor('1-Nov-16'), '%d-%b-%y')
[1] "2016-11-01"
> as.Date('Nov-16', '%b-%y')
[1] NA
> as.Date(factor('Nov-16'), '%b-%y')
[1] NA
> as.Date('Nov-16', '%b-%y')
[1] NA


  To convert the first column, I pasted "1-" in front:


as.Date(paste0('1-', factor('Nov-16')), '%d-%b-%y')


  Hope this helps.  Spencer


Much Thanks


COL1COL2
Apr-16  1-Nov-16
May-16  1-Nov-16
Jun-16  1-Nov-16
Jul-16  1-Nov-16
Aug-16  1-Nov-16
Sep-16  1-Nov-16
Oct-16  1-Nov-16
Nov-16  1-Nov-16
Dec-16  1-Nov-16
Jan-17  1-Nov-16
Feb-17  1-Nov-16
Mar-17  1-Nov-16
Apr-17  1-Nov-16
May-17  1-Nov-16
Jun-17  1-Nov-16
Jul-17  1-Nov-16
Aug-17  1-Nov-16
Sep-17  1-Nov-16


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Convert Factor to Date

2017-08-22 Thread Patrick Casimir
Dear R Fellows,


I Have a dataset( data1) with 2 columns of date showing a class of factor. How 
to convert them to date? Then compare them, keep the greater date only in a new 
column. Using as.Date to change the class to Date but the data becomes NA.


Much Thanks


COL1COL2
Apr-16  1-Nov-16
May-16  1-Nov-16
Jun-16  1-Nov-16
Jul-16  1-Nov-16
Aug-16  1-Nov-16
Sep-16  1-Nov-16
Oct-16  1-Nov-16
Nov-16  1-Nov-16
Dec-16  1-Nov-16
Jan-17  1-Nov-16
Feb-17  1-Nov-16
Mar-17  1-Nov-16
Apr-17  1-Nov-16
May-17  1-Nov-16
Jun-17  1-Nov-16
Jul-17  1-Nov-16
Aug-17  1-Nov-16
Sep-17  1-Nov-16


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boot.stepAIC fails with computed formula

2017-08-22 Thread Bert Gunter
SImplify your call to lm using the "." argument instead of
manipulating formulas.
> strt <- lm(y1 ~ ., data = dat)

and you do not need to explicitly specify the "1+" on the rhs for lm, so

> frm2<-as.formula(paste(trg," ~  ", paste(xvars,collapse = "+")))
works fine, too.

Anyway, doing this gives (but see end of output)"


bst <- boot.stepAIC(strt,data =
dat,B=50,alpha=0.05,direction='forward',steps=limit,
  scope=list(lower=frm1,upper=frm2))

> bst

Summary of Bootstrapping the 'stepAIC()' procedure for

Call:
lm(formula = y1 ~ ., data = dat)

Bootstrap samples: 50
Direction: forward
Penalty: 2 * df

Covariates selected
   (%)
x1 100
x2 100
x3 100
x4 100
x5 100
x6 100
x7 100
x8 100

Coefficients Sign
   + (%) - (%)
x3   100 0
x6   100 0
x8   100 0
x19010
x28614
x76238
x44456
x5 0   100

Stat Significance
   (%)
x3 100
x6 100
x8 100
x5  34
x2  20
x1  16
x4  10
x7   4


The stepAIC() for the original data-set gave

Call:
lm(formula = y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = dat)

Coefficients:
(Intercept)   x1   x2   x3   x4   x5
  42.008824 0.012304 0.010925 0.976469-0.005183-0.021041
 x6   x7   x8
   2.000649 0.004461 3.007071


Stepwise Model Path
Analysis of Deviance Table

Initial Model:
y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8

Final Model:
y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8


  Step Df Deviance Resid. Df Resid. Dev AIC
1191   16.14592 -485.33


HOWEVER, I do not know why your failed calls failed. In view of the
above, it appears to be a bug in the formula interface, so if you do
not get a satisfactory answer here (i.e. I am wrong about this), you
should contact the maintainer.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 10:08 AM, Steve O'Hagan
 wrote:
> The error is "the model fit failed in 50 bootstrap samples
> Error: non-character argument"
>
> Cheers,
> SOH.
>
>
> On 22/08/2017 17:52, Bert Gunter wrote:
>>
>> Failed?  What was the error message?
>>
>> Cheers,
>>
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan
>>  wrote:
>>>
>>> I'm trying to use boot.stepAIC for feature selection; I need to be able
>>> to specify the name of the dependent variable programmatically, but this
>>> appear to fail:
>>>
>>> In R-Studio with MS R Open 3.4:
>>>
>>> library(bootStepAIC)
>>>
>>> #Fake data
>>> n<-200
>>>
>>> x1 <- runif(n, -3, 3)
>>> x2 <- runif(n, -3, 3)
>>> x3 <- runif(n, -3, 3)
>>> x4 <- runif(n, -3, 3)
>>> x5 <- runif(n, -3, 3)
>>> x6 <- runif(n, -3, 3)
>>> x7 <- runif(n, -3, 3)
>>> x8 <- runif(n, -3, 3)
>>> y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)
>>>
>>> dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
>>> #the real data won't have these names...
>>>
>>> cn <- names(dat)
>>> trg <- "y1"
>>> xvars <- cn[cn!=trg]
>>>
>>> frm1<-as.formula(paste(trg,"~1"))
>>> frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))
>>>
>>> strt=lm(y1~1,dat) # boot.stepAIC Works fine
>>>
>>> #strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##
>>>
>>> #strt=lm(frm1,dat) ## boot.stepAIC FAILS ##
>>>
>>> limit<-5
>>>
>>>
>>> stp=stepAIC(strt,direction='forward',steps=limit,
>>>  scope=list(lower=frm1,upper=frm2))
>>>
>>> bst <-
>>> boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
>>>  scope=list(lower=frm1,upper=frm2))
>>>
>>> b1 <- bst$Covariates
>>> ball <- data.frame(b1)
>>> names(ball)=unlist(trg)
>>>
>>> Any ideas?
>>>
>>> Cheers,
>>> SOH
>>>
>>>
>>>  [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boot.stepAIC fails with computed formula

2017-08-22 Thread Bert Gunter
Failed?  What was the error message?

Cheers,

Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan
 wrote:
> I'm trying to use boot.stepAIC for feature selection; I need to be able to 
> specify the name of the dependent variable programmatically, but this appear 
> to fail:
>
> In R-Studio with MS R Open 3.4:
>
> library(bootStepAIC)
>
> #Fake data
> n<-200
>
> x1 <- runif(n, -3, 3)
> x2 <- runif(n, -3, 3)
> x3 <- runif(n, -3, 3)
> x4 <- runif(n, -3, 3)
> x5 <- runif(n, -3, 3)
> x6 <- runif(n, -3, 3)
> x7 <- runif(n, -3, 3)
> x8 <- runif(n, -3, 3)
> y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)
>
> dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
> #the real data won't have these names...
>
> cn <- names(dat)
> trg <- "y1"
> xvars <- cn[cn!=trg]
>
> frm1<-as.formula(paste(trg,"~1"))
> frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))
>
> strt=lm(y1~1,dat) # boot.stepAIC Works fine
>
> #strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##
>
> #strt=lm(frm1,dat) ## boot.stepAIC FAILS ##
>
> limit<-5
>
>
> stp=stepAIC(strt,direction='forward',steps=limit,
> scope=list(lower=frm1,upper=frm2))
>
> bst <- boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
> scope=list(lower=frm1,upper=frm2))
>
> b1 <- bst$Covariates
> ball <- data.frame(b1)
> names(ball)=unlist(trg)
>
> Any ideas?
>
> Cheers,
> SOH
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to benchmark speed of load/readRDS correctly

2017-08-22 Thread William Dunlap via R-help
Note that if you force a garbage collection each iteration the times are
more stable.  However, on the average it is faster to let the garbage
collector decide when to leap into action.

mb_gc <- microbenchmark::microbenchmark(gc(), { x <- as.list(sin(1:5e5)); x
<- unlist(x) / cos(1:5e5) ; sum(x) }, times=1000,
control=list(order="inorder"))
with(mb_gc, plot(time[expr!="gc()"]))
with(mb_gc, quantile(1e-6*time[expr!="gc()"], c(0, .5, .75, .9, .95, .99,
1)))
#   0%   50%   75%   90%   95%   99%  100%
# 59.33450  61.33954  63.43457  66.23331  68.93746  74.45629 158.09799



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 22, 2017 at 9:26 AM, William Dunlap  wrote:

> The large value for maximum time may be due to garbage collection, which
> happens periodically.   E.g., try the following, where the
> unlist(as.list()) creates a lot of garbage.  I get a very large time every
> 102 or 51 iterations and a moderately large time more often
>
> mb <- microbenchmark::microbenchmark({ x <- as.list(sin(1:5e5)); x <-
> unlist(x) / cos(1:5e5) ; sum(x) }, times=1000)
> plot(mb$time)
> quantile(mb$time * 1e-6, c(0, .5, .75, .90, .95, .99, 1))
> #   0%   50%   75%   90%   95%   99%  100%
> # 59.04446  82.15453 102.17522 180.36986 187.52667 233.42062 249.33970
> diff(which(mb$time > quantile(mb$time, .99)))
> # [1] 102  51 102 102 102 102 102 102  51
> diff(which(mb$time > quantile(mb$time, .95)))
> # [1]  6 41  4 47  4 40  7  4 47  4 33 14  4 47  4 47  4 47  4 47  4 47  4
>  6 41
> #[26]  4  6  7  9 25  4 47  4 47  4 47  4 22 25  4 33 14  4  6 41  4 47  4
> 22
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Aug 22, 2017 at 5:53 AM, 
> wrote:
>
>> Dear all
>>
>> I was thinking about efficient reading data into R and tried several ways
>> to test if load(file.Rdata) or readRDS(file.rds) is faster. The files
>> file.Rdata and file.rds contain the same data, the first created with
>> save(d, ' file.Rdata', compress=F) and the second with saveRDS(d, '
>> file.rds', compress=F).
>>
>> First I used the function microbenchmark() and was a astonished about the
>> max value of the output.
>>
>> FIRST TEST:
>> > library(microbenchmark)
>> > microbenchmark(
>> +   n <- readRDS('file.rds'),
>> +   load('file.Rdata')
>> + )
>> Unit: milliseconds
>>   expr minlq
>>  meanmedianuq
>>  max  neval
>> n <- readRDS(fl1)106.5956  109.6457 237.3844
>> 117.8956  141.9921  10934.162   100
>>  load(fl2)  295.0654  301.8162
>> 335.6266  308.3757  319.6965  1915.706
>> 100
>>
>> It looks like the max value is an outlier.
>>
>> So I tried:
>> SECOND TEST:
>> > sapply(1:10, function(x) system.time(n <- readRDS('file.rds'))[3])
>> elapsed   elapsed   elapsed
>>  elapsed   elapsed   elapsed   elapsed
>>  elapsed elapsed   elapsed
>>   10.50   0.11   0.11
>>0.11   0.10   0.11
>>  0.11   0.11   0.12
>>0.12
>> > sapply(1:10, function(x) system.time(load'flie.Rdata'))[3])
>> elapsed   elapsed   elapsed
>>  elapsed   elapsed   elapsed   elapsed
>>  elapsed elapsed   elapsed
>>1.860.29   0.31
>>0.30   0.30   0.31
>>  0.30   0.29   0.31
>>0.30
>>
>> Which confirmed my suspicion; the first time loading the data takes much
>> longer than the following times. I suspect that this has something to do
>> how the data is assigned and that R doesn't has to 'fully' read the data,
>> if it is read the second time.
>>
>> So the question remains, how can I make a realistic benchmark test? From
>> the first test I would conclude that reading the *.rds file is faster. But
>> this holds only for a large number of neval. If I set times = 1 then
>> reading the *.Rdata would be faster (as also indicated by the second test).
>>
>> Thanks for any help or comments.
>>
>> Kind regards
>>
>> Raphael
>> 
>> 
>> Raphael Felber, PhD
>> Scientific Officer, Climate & Air Pollution
>>
>> Federal Department of Economic Affairs,
>> Education and Research EAER
>> Agroscope
>> Research Division, Agroecology and Environment
>>
>> Reckenholzstrasse 191, CH-8046 Zürich
>> Phone +41 58 468 75 11 

Re: [R] boot.stepAIC fails with computed formula

2017-08-22 Thread Steve O'Hagan

The error is "the model fit failed in 50 bootstrap samples
Error: non-character argument"

Cheers,
SOH.

On 22/08/2017 17:52, Bert Gunter wrote:

Failed?  What was the error message?

Cheers,

Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan
 wrote:

I'm trying to use boot.stepAIC for feature selection; I need to be able to 
specify the name of the dependent variable programmatically, but this appear to 
fail:

In R-Studio with MS R Open 3.4:

library(bootStepAIC)

#Fake data
n<-200

x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)
x3 <- runif(n, -3, 3)
x4 <- runif(n, -3, 3)
x5 <- runif(n, -3, 3)
x6 <- runif(n, -3, 3)
x7 <- runif(n, -3, 3)
x8 <- runif(n, -3, 3)
y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)

dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
#the real data won't have these names...

cn <- names(dat)
trg <- "y1"
xvars <- cn[cn!=trg]

frm1<-as.formula(paste(trg,"~1"))
frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))

strt=lm(y1~1,dat) # boot.stepAIC Works fine

#strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##

#strt=lm(frm1,dat) ## boot.stepAIC FAILS ##

limit<-5


stp=stepAIC(strt,direction='forward',steps=limit,
 scope=list(lower=frm1,upper=frm2))

bst <- boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
 scope=list(lower=frm1,upper=frm2))

b1 <- bst$Covariates
ball <- data.frame(b1)
names(ball)=unlist(trg)

Any ideas?

Cheers,
SOH


 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed-rank test

2017-08-22 Thread Bert Gunter
This query is offtopic for this list, as it is about statistics, not R
programming. stats.stackexchange.com is a good venue for statistics
questions.

However, you are confused. Wilcoxon does NOT test for differences in
population means. e.g.

Consider the 2 samples:

A: 5,6,7
B: 1,2, 50


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 9:20 AM, Karolis Uziela
 wrote:
> Hi,
>
> I am using wilcox.test function to test the difference between the means of
> two samples. The data points are paired, so I am using a paired test.
>
> There is one strange case. Sample A has a higher mean than a sample B.
> However, wilcox.test function says that sample B has a significantly higher
> "mean rank" than sample A. How is it possible?
>
> Here is the code (data file is attached):
> df <- read.table("wilcox_data.txt", head=TRUE)
> mean(df$A)
> [1] 0.7987849
> mean(df$B)
> [1] 0.7977966
> mean(df$C)
> [1] 0.6350737
>
> wilcox.test(df$B, df$A, paired=TRUE, alternative="greater")
> Wilcoxon signed rank test with continuity correction
>
> data:  df$B and df$A
> V = 134300, p-value = 3.299e-05
> alternative hypothesis: true location shift is greater than 0
>
> wilcox.test(df$C, df$A, paired=TRUE, alternative="greater")
> Wilcoxon signed rank test with continuity correction
>
> data:  df$C and df$A
> V = 41423, p-value = 1
> alternative hypothesis: true location shift is greater than 0
>
> The p-value of the first test is rather low (3.299e-05), which indicates
> that the alternative hypothesis is true - sample B has a higher "mean rank"
> than sample A. Just to make sure I am not doing a dumb mistake, I added a
> third variable C to this example, which is much smaller than A or B. As
> expected, the second test has p-value = 1, which means that "mean rank" of
> C is lower than A (null hypothesis is true).
>
> I am afraid, I am not very strong in statistics, but I would very much
> appreciate if someone could explain me in simple words:
> 1) Wikipedia says that Wilcoxon signed-rank test is used to test whether
> population "mean ranks" differ. What is exactly the definition of "mean
> rank"? https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
> 2) How can the mean of a variable A be bigger than the mean of variable B,
> but the "mean rank" of variable B is significantly bigger than "mean rank"
> of variable A.
>
> There is a small chance that this is because of a bug in wilcox.test
> function, but it is probably more likely that this paradox is because of
> some statistics phenomena that I don't understand.
>
> Best regards,
> Karolis Uziela
>
> P. S. I have another strange example, where the difference between A and B
> is much smaller than the difference between A and C, but the significance
> of the "mean rank" difference between A and B is much larger then the
> significance of mean rank difference between A and C. For simplicity
> reasons, I didn't add that example here, but I guess that the answer to the
> above question will be related to this one.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Wilcoxon signed-rank test

2017-08-22 Thread Karolis Uziela
Hi,

I am using wilcox.test function to test the difference between the means of
two samples. The data points are paired, so I am using a paired test.

There is one strange case. Sample A has a higher mean than a sample B.
However, wilcox.test function says that sample B has a significantly higher
"mean rank" than sample A. How is it possible?

Here is the code (data file is attached):
df <- read.table("wilcox_data.txt", head=TRUE)
mean(df$A)
[1] 0.7987849
mean(df$B)
[1] 0.7977966
mean(df$C)
[1] 0.6350737

wilcox.test(df$B, df$A, paired=TRUE, alternative="greater")
Wilcoxon signed rank test with continuity correction

data:  df$B and df$A
V = 134300, p-value = 3.299e-05
alternative hypothesis: true location shift is greater than 0

wilcox.test(df$C, df$A, paired=TRUE, alternative="greater")
Wilcoxon signed rank test with continuity correction

data:  df$C and df$A
V = 41423, p-value = 1
alternative hypothesis: true location shift is greater than 0

The p-value of the first test is rather low (3.299e-05), which indicates
that the alternative hypothesis is true - sample B has a higher "mean rank"
than sample A. Just to make sure I am not doing a dumb mistake, I added a
third variable C to this example, which is much smaller than A or B. As
expected, the second test has p-value = 1, which means that "mean rank" of
C is lower than A (null hypothesis is true).

I am afraid, I am not very strong in statistics, but I would very much
appreciate if someone could explain me in simple words:
1) Wikipedia says that Wilcoxon signed-rank test is used to test whether
population "mean ranks" differ. What is exactly the definition of "mean
rank"? https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
2) How can the mean of a variable A be bigger than the mean of variable B,
but the "mean rank" of variable B is significantly bigger than "mean rank"
of variable A.

There is a small chance that this is because of a bug in wilcox.test
function, but it is probably more likely that this paradox is because of
some statistics phenomena that I don't understand.

Best regards,
Karolis Uziela

P. S. I have another strange example, where the difference between A and B
is much smaller than the difference between A and C, but the significance
of the "mean rank" difference between A and B is much larger then the
significance of mean rank difference between A and C. For simplicity
reasons, I didn't add that example here, but I guess that the answer to the
above question will be related to this one.
A B C
0.493579934773664 0.511836168474316 0.520749710683309
0.630184025170146 0.58786433109054 0.458946430077942
0.633512186023312 0.493175938647652 0.66308680696517
0.491989020132133 0.520090183585035 0.420844896432205
0.0399860725369978 -0.0301629556552143 -0.161169807765403
0.598248869116991 0.637706466762332 0.722797128117201
0.954025476347433 0.977016137944116 0.750624074377985
0.862950985422539 0.868732350639475 0.730996449953475
0.659015458019471 0.631657908519014 0.649268814371901
0.96851459376 0.951663741829449 0.887377555251177
0.791126281396265 0.806829715682826 0.520987050223108
0.790981668919217 0.832554187306653 0.816906470482764
0.952134063327928 0.950267697811188 0.888111039865367
0.979447757413485 0.965082926636581 0.978260925492216
0.767165875729762 0.809954221337577 0.598870382402464
0.913408312148682 0.952665220683963 0.943094656757241
0.972608015661106 0.979393087897226 0.93167912128025
0.92368233901018 0.884309958031427 0.857603379672018
0.898019141975969 0.902422790997998 0.972835691590958
0.583425531942068 0.60080296136487 0.0762982452965866
0.971925274787888 0.97627957685407 0.937911525197627
0.809897828917177 0.800664937235237 0.799131043225983
0.85205447905511 0.913651623435268 0.947490858434091
0.673979778229831 0.757789915726126 0.785004269972942
0.92791391929192 0.952431131239962 0.958965163180968
0.888104211563345 0.896293265932192 0.757021764767304
0.754923516313538 0.743209910646708 0.400590699921333
0.790444023531614 0.839202269416027 0.728543527605082
0.489700648107734 0.546352461567751 0.246692598690384
0.919047447196611 0.937637379169673 0.854373325181927
0.746567053460642 0.732674872823925 0.696672515669091
0.82109253132961 0.805084424922876 0.795858547687794
0.931033788637529 0.94668898248564 0.852845771346312
0.789707695658712 0.804674058488827 0.917422903218963
0.948317951802274 0.974637687624406 0.98965193862357
0.710108950455433 0.863972180005134 0.693696835692262
0.812826034207032 0.848422587445403 0.577844837616433
0.833755877836278 0.762137912181951 0.744863154045316
0.754586240931889 0.818206106263108 0.785493914964923
0.881927578630931 0.852124198530008 0.862181321497686
0.815417604208001 0.82311408988169 0.59316052481171
0.913033642409478 0.909933623735767 0.740059472208528
0.928085804060841 0.9223273915188 0.316043076521959
0.543637616376008 0.589592007011875 0.485877283274438
0.874049458665115 0.909943060701501 0.913855131935667
0.959075608346417 0.957696030631906 

Re: [R] How to benchmark speed of load/readRDS correctly

2017-08-22 Thread Jeff Newmiller
Caching happens, both within the operating system and within the C standard 
library. Ostensibly the intent for those caches is to help performance, but you 
are right that different low-level caching algorithms can be a poor match for 
specific application level use cases such as copying files or parsing text 
syntax. However, the OS and even the specific file system drivers (e.g. ext4 on 
flash disk or FAT32 on magnetic media) can behave quite differently for the 
same application level use case, so a generic discussion at the R language 
level (this mailing list) can be almost impossible to sort out intelligently. 
-- 
Sent from my phone. Please excuse my brevity.

On August 22, 2017 7:11:39 AM PDT, J C Nash  wrote:
>Not convinced Jeff is completely right about this not concerning R,
>since I've found that the application language (R, 
>perl, etc.) makes a difference in how files are accessed by/to OS. He
>is certainly correct that OS (and versions) are 
>where the actual reading and writing happens, but sometimes the call to
>those can be inefficient. (Sorry, I've not got 
>examples specifically for file reads, but had a case in computation
>where there was an 800% i.e., 8 fold difference 
>in timing with R, which rather took my breath away. That's probably
>been sorted now.) The difficulty in making general 
>statements is that a rather full set of comparisons over different
>commands, datasets, OS and version variants is needed 
>before the general picture can emerge. Using microbenchmark when you
>need to find the bottlenecks is how I'd proceed, 
>which OP is doing.
>
>About 30 years ago, I did write up some preliminary work, never
>published, on estimating the two halves of a copy, that 
>is, the reading from file and storing to "memory" or a different
>storage location. This was via regression with a 
>singular design matrix, but one can get a minimal length least squares
>solution via svd. Possibly relevant today to try 
>to get at slow links on a network.
>
>JN
>
>On 2017-08-22 09:07 AM, Jeff Newmiller wrote:
>> You need to study how reading files works in your operating system.
>This question is not about R.
>> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to benchmark speed of load/readRDS correctly

2017-08-22 Thread William Dunlap via R-help
The large value for maximum time may be due to garbage collection, which
happens periodically.   E.g., try the following, where the
unlist(as.list()) creates a lot of garbage.  I get a very large time every
102 or 51 iterations and a moderately large time more often

mb <- microbenchmark::microbenchmark({ x <- as.list(sin(1:5e5)); x <-
unlist(x) / cos(1:5e5) ; sum(x) }, times=1000)
plot(mb$time)
quantile(mb$time * 1e-6, c(0, .5, .75, .90, .95, .99, 1))
#   0%   50%   75%   90%   95%   99%  100%
# 59.04446  82.15453 102.17522 180.36986 187.52667 233.42062 249.33970
diff(which(mb$time > quantile(mb$time, .99)))
# [1] 102  51 102 102 102 102 102 102  51
diff(which(mb$time > quantile(mb$time, .95)))
# [1]  6 41  4 47  4 40  7  4 47  4 33 14  4 47  4 47  4 47  4 47  4 47  4
 6 41
#[26]  4  6  7  9 25  4 47  4 47  4 47  4 22 25  4 33 14  4  6 41  4 47  4
22



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 22, 2017 at 5:53 AM,  wrote:

> Dear all
>
> I was thinking about efficient reading data into R and tried several ways
> to test if load(file.Rdata) or readRDS(file.rds) is faster. The files
> file.Rdata and file.rds contain the same data, the first created with
> save(d, ' file.Rdata', compress=F) and the second with saveRDS(d, '
> file.rds', compress=F).
>
> First I used the function microbenchmark() and was a astonished about the
> max value of the output.
>
> FIRST TEST:
> > library(microbenchmark)
> > microbenchmark(
> +   n <- readRDS('file.rds'),
> +   load('file.Rdata')
> + )
> Unit: milliseconds
>   expr minlq
>  meanmedianuq
>  max  neval
> n <- readRDS(fl1)106.5956  109.6457 237.3844
> 117.8956  141.9921  10934.162   100
>  load(fl2)  295.0654  301.8162
> 335.6266  308.3757  319.6965  1915.706
> 100
>
> It looks like the max value is an outlier.
>
> So I tried:
> SECOND TEST:
> > sapply(1:10, function(x) system.time(n <- readRDS('file.rds'))[3])
> elapsed   elapsed   elapsed   elapsed
>  elapsed   elapsed   elapsed
>elapsed elapsed   elapsed
>   10.50   0.11   0.11
>  0.11   0.10   0.11
>0.11   0.11   0.12
>  0.12
> > sapply(1:10, function(x) system.time(load'flie.Rdata'))[3])
> elapsed   elapsed   elapsed   elapsed
>  elapsed   elapsed   elapsed
>elapsed elapsed   elapsed
>1.860.29   0.31
>0.30   0.30   0.31
>  0.30   0.29   0.31
>0.30
>
> Which confirmed my suspicion; the first time loading the data takes much
> longer than the following times. I suspect that this has something to do
> how the data is assigned and that R doesn't has to 'fully' read the data,
> if it is read the second time.
>
> So the question remains, how can I make a realistic benchmark test? From
> the first test I would conclude that reading the *.rds file is faster. But
> this holds only for a large number of neval. If I set times = 1 then
> reading the *.Rdata would be faster (as also indicated by the second test).
>
> Thanks for any help or comments.
>
> Kind regards
>
> Raphael
> 
> 
> Raphael Felber, PhD
> Scientific Officer, Climate & Air Pollution
>
> Federal Department of Economic Affairs,
> Education and Research EAER
> Agroscope
> Research Division, Agroecology and Environment
>
> Reckenholzstrasse 191, CH-8046 Zürich
> Phone +41 58 468 75 11
> Fax +41 58 468 72 01
> raphael.fel...@agroscope.admin.ch >
> www.agroscope.ch
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Error in .jnew(“java/io/FileOutputStream”, jFile)

2017-08-22 Thread Александр Мащенко

I don't know R absolutely, but I have to do this work for my diploma. So I'm 
sorry for strange message below. Please, help me anybody decides this issue.
If you need any information, I'll show you all what you need.
**error**:
> #save
> #csv
>file.create("C:\\Users\\Sapl\\Desktop\\NATA\\code\\Results\\Created by 
>R\\ab_ret_banks_short_form_10.05.2006.csv")
[1] TRUE
> write.csv(ab_ret, "C:\\Users\\Sapl\\Desktop\\NATA\\code\\Results\\Created by 
> R\\ab_ret_banks_short_form_10.05.2006.csv")
> #xlsx
> file.create("C:\\Users\\Sapl\\Desktop\\NATA\\code\\Results\\Created by 
> R\\ab_ret_banks_short_form_10.05.2006.xlsx")
[1] TRUE
> write.xlsx(ab_ret, "C:\\Users\\Sapl\\Desktop\\NATA\\code\\Results\\Created by 
> R\\10.05.2006\\ab_ret_banks_short_form_10.05.2006.xlsx")
> 
>Error in .jnew("java/io/FileOutputStream", jFile) : 
>java.io.FileNotFoundException: 
>C:\Users\Sapl\Desktop\NATA\code\Results\Created by 
>R\10.05.2006\ab_ret_banks_short_form_10.05.2006.xlsx (unreadablesymbols)
**EDIT:**
write.xlsx(abn_ret_all,"C:\\Users\\Sapl\\Desktop\\NATA\\code\\Results\\Created 
by R\\10.05.2006\\ab_ret_banks_long_form_13.07.10.05.2006.xlsx")
Error in .jnew("java/io/FileOutputStream", jFile) : 
java.io.FileNotFoundException: C:\Users\Sapl\Desktop\NATA\code\Results\Created 
by R\10.05.2006\ab_ret_banks_long_form_13.07.10.05.2006.xlsx (Системе 
не удается найти указанный путь)
Also, I've got one more error. It's just following the previous one. Sorry, I'm 
totally noob here ..
In addition: Warning message:
In data.row.names(row.names, rowsi, i) :
some row.names duplicated: 
2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133
 --> row.names NOT used
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] boot.stepAIC fails with computed formula

2017-08-22 Thread Stephen O'hagan
I'm trying to use boot.stepAIC for feature selection; I need to be able to 
specify the name of the dependent variable programmatically, but this appear to 
fail:

In R-Studio with MS R Open 3.4:

library(bootStepAIC)

#Fake data
n<-200

x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)
x3 <- runif(n, -3, 3)
x4 <- runif(n, -3, 3)
x5 <- runif(n, -3, 3)
x6 <- runif(n, -3, 3)
x7 <- runif(n, -3, 3)
x8 <- runif(n, -3, 3)
y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)

dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
#the real data won't have these names...

cn <- names(dat)
trg <- "y1"
xvars <- cn[cn!=trg]

frm1<-as.formula(paste(trg,"~1"))
frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))

strt=lm(y1~1,dat) # boot.stepAIC Works fine

#strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##

#strt=lm(frm1,dat) ## boot.stepAIC FAILS ##

limit<-5


stp=stepAIC(strt,direction='forward',steps=limit,
scope=list(lower=frm1,upper=frm2))

bst <- boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
scope=list(lower=frm1,upper=frm2))

b1 <- bst$Covariates
ball <- data.frame(b1)
names(ball)=unlist(trg)

Any ideas?

Cheers,
SOH


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R-es] Interpretación modelo ZINB

2017-08-22 Thread miriam . alzate
Buenas tardes,

He ejecutado un modelo binomial negativo de ceros inflados y me gustaría
que me ayudarais en la interpretación del modelo. Me gustaría saber
también como validarlo y probar la robustez. Os explico un poco que estoy
modelizando.
Te explico un poco mis datos y lo que quiero modelizar con el ZINB.


La variable dependiente del modelo es el nº de votos de utilidad que
recibe una reseña en una página web.  Cuando el consumidor la lee tiene 3
opciones, votarla como "útil", como "no útil" o no votar.  Para este
estudio estamos interesados en los votos de "util", incluyendo aquellas
que tienen 0 votos "util".

Nosotros proponemos un proceso que el consumidor sigue hasta que toma la
decisión de votar la reseña o no que consta de 3 etapas.

La idea con el modelo ZINB es ver como cada variable de cada etapa afecta
a la "utilidad" de las reseñas, que es la variable dependiente. La idea es
meter las variables como variables independientes del modelo y ver como
cada una afecta a la dependiente.
El modelo ZINB nos interesa porque distingue entre ceros verdaderos y
falsos. En nuestro caso, una reseña puede tener 0 votos porque realmente
no es de calidad y no es util o porque no ha sido leída por el consumidor
y por lo tanto, no ha podido ser votada. En nuestro contexto, es un cero
verdadero aquel de la reseña que tiene 0 votos porque aunque ha sido leída
no ha sido votada. Cero falso es aquel de la reseña que tiene 0 votos
porque no ha sido leída y por lo tanto no ha podido ser votada.

El modelo zero-inflation del ZINB, mide la probabilidad de que un cero sea
falso, es decir de que la reseña no se vote porque no se lee. En esta
parte del modelo entran en juego las variables de las dos primeras etapas
del proceso de voto. Tanto la probabilidad de considerar los productos
como la de considerar la reseña van a influir en que una reseña se lea o
no se lea, es decir en que los ceros sean falsos. Está claro que si no se
lee ( no es vista por el consumidor), las variables de la etapa de voto
(las propias de la reseña y el emisor) no pueden ser consideradas porque
no se ha leido la reseña y por ello  no afectan a esta parte del modelo.
Si la reseña se lee, es cuando se puede votar o no votar, y esta parte la
mide ya el modelo de conteo del modelo ZINB.

En el modelo de conteo, que tiene en cuenta tanto las reseñas con 0 votos
(ceros verdaderos) como con más votos, entran en juego las variables de la
reseña y el emisor. Una vez que el consumidor lee la reseña, la decisión
de votarla como util o no votarla va  a depender de los factores de la
reseña.


Por lo tanto, el modelo que tenemos quedaría así en R:

Call:
zeroinfl(formula = Evolucion.Yesvotes ~ Average.Rating.Inconsistency.abs +
Review.Quicktake.Dummy.y + WC.y + Title_wc + Quicktake_wc + Tone.y +
Authentic.y + Analytic.y + Clout.y + Physical.Information.Sum +
Average.Reviewer.Reviews + Reviewer.Expenditure.Group.2017 |
Average.Product.Consideration.Bestselling +
Average.Product.Consideration.New +
Average.Product.Consideration.TopRated +
Average.Review.Consideration.MostHelpful +
Average.Review.Consideration.Newest +
Average.Review.Consideration.TopContributor, data = Comunes, dist =
"negbin")

Pearson residuals:
 Min   1Q   Median   3Q  Max
-0.68126 -0.07371 -0.04387 -0.02596 28.45862

Count model coefficients (negbin with log link):
Estimate Std. Error z
value Pr(>|z|)
(Intercept)   -5.4727521  0.3075131
-17.797  < 2e-16 ***
Average.Rating.Inconsistency.abs   0.1439045  0.0578238  
2.489  0.01282 *
Review.Quicktake.Dummy.y   2.6065756  0.2233601 
11.670  < 2e-16 ***
WC.y   0.0049510  0.0005680  
8.717  < 2e-16 ***
Title_wc  -0.0175549  0.0178333 
-0.984  0.32493
Quicktake_wc  -0.0190379  0.0136268 
-1.397  0.16239
Tone.y 0.0040236  0.0017236  
2.334  0.01957 *
Authentic.y   -0.0039767  0.0013314 
-2.987  0.00282 **
Analytic.y 0.0039323  0.0014093  
2.790  0.00527 **
Clout.y   -0.0050619  0.0019687 
-2.571  0.01013 *
Physical.Information.Sum   0.2725491  0.0366530  
7.436 1.04e-13 ***
Average.Reviewer.Reviews  -0.0002728  0.0001539 
-1.772  0.07634 .
Reviewer.Expenditure.Group.2017BEAUTY INSIDER -0.3127326  0.1517806 
-2.060  0.03936 *
Reviewer.Expenditure.Group.2017VIB 0.1779505  0.1630176  
1.092  0.27501
Reviewer.Expenditure.Group.2017VIB ROUGE   0.1971850  0.1589578  
1.240  0.21480
Log(theta)-0.5719754  0.1038317 
-5.509 3.62e-08 ***

Zero-inflation model coefficients (binomial with logit link):
 Estimate Std. Error z value
Pr(>|z|)

Re: [R] Rounding of problem with sum command in R

2017-08-22 Thread Bert Gunter
... and following up on Spencer's answer, see the "digits" argument of
?options and ?print.default.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 7:30 AM, Spencer Graves
 wrote:
>
>
> On 2017-08-22 9:26 AM, niharika singhal wrote:
>>
>> Hello I have a vector
>>
>> v=c(0.0886,0.1744455,0.1379778,0.1209769,0.1573065,0.1134463,0.2074027)
>> when i do
>> sum(v)
>> or
>> 0.0886+0.1744455+0.1379778+0.1209769+0.1573065+0.1134463+0.2074027
>> i am getting output as 1
>
>
>
> No:  That's only the display:
>
>
>> sum(v)-1
> [1] 1.6e-07
>
>
>   hope this helps.  Spencer
>
>
>> But if i add them manually i get
>> 1.0026
>> I do not want to round of my value since it effect my code further
>> Can anyone suggest how can i avoid this.
>>
>> Thanks & Regards
>> Niharika Singhal
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rounding of problem with sum command in R

2017-08-22 Thread Spencer Graves



On 2017-08-22 9:26 AM, niharika singhal wrote:

Hello I have a vector
v=c(0.0886,0.1744455,0.1379778,0.1209769,0.1573065,0.1134463,0.2074027)
when i do
sum(v)
or
0.0886+0.1744455+0.1379778+0.1209769+0.1573065+0.1134463+0.2074027
i am getting output as 1



No:  That's only the display:


> sum(v)-1
[1] 1.6e-07


  hope this helps.  Spencer


But if i add them manually i get
1.0026
I do not want to round of my value since it effect my code further
Can anyone suggest how can i avoid this.

Thanks & Regards
Niharika Singhal

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Rounding of problem with sum command in R

2017-08-22 Thread niharika singhal
Hello I have a vector
v=c(0.0886,0.1744455,0.1379778,0.1209769,0.1573065,0.1134463,0.2074027)
when i do
sum(v)
or
0.0886+0.1744455+0.1379778+0.1209769+0.1573065+0.1134463+0.2074027
i am getting output as 1
But if i add them manually i get
1.0026
I do not want to round of my value since it effect my code further
Can anyone suggest how can i avoid this.

Thanks & Regards
Niharika Singhal

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to benchmark speed of load/readRDS correctly

2017-08-22 Thread J C Nash
Not convinced Jeff is completely right about this not concerning R, since I've found that the application language (R, 
perl, etc.) makes a difference in how files are accessed by/to OS. He is certainly correct that OS (and versions) are 
where the actual reading and writing happens, but sometimes the call to those can be inefficient. (Sorry, I've not got 
examples specifically for file reads, but had a case in computation where there was an 800% i.e., 8 fold difference 
in timing with R, which rather took my breath away. That's probably been sorted now.) The difficulty in making general 
statements is that a rather full set of comparisons over different commands, datasets, OS and version variants is needed 
before the general picture can emerge. Using microbenchmark when you need to find the bottlenecks is how I'd proceed, 
which OP is doing.


About 30 years ago, I did write up some preliminary work, never published, on estimating the two halves of a copy, that 
is, the reading from file and storing to "memory" or a different storage location. This was via regression with a 
singular design matrix, but one can get a minimal length least squares solution via svd. Possibly relevant today to try 
to get at slow links on a network.


JN

On 2017-08-22 09:07 AM, Jeff Newmiller wrote:

You need to study how reading files works in your operating system. This 
question is not about R.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to benchmark speed of load/readRDS correctly

2017-08-22 Thread Jeff Newmiller
You need to study how reading files works in your operating system. This 
question is not about R.
-- 
Sent from my phone. Please excuse my brevity.

On August 22, 2017 5:53:09 AM PDT, raphael.fel...@agroscope.admin.ch wrote:
>Dear all
>
>I was thinking about efficient reading data into R and tried several
>ways to test if load(file.Rdata) or readRDS(file.rds) is faster. The
>files file.Rdata and file.rds contain the same data, the first created
>with save(d, ' file.Rdata', compress=F) and the second with saveRDS(d,
>' file.rds', compress=F).
>
>First I used the function microbenchmark() and was a astonished about
>the max value of the output.
>
>FIRST TEST:
>> library(microbenchmark)
>> microbenchmark(
>+   n <- readRDS('file.rds'),
>+   load('file.Rdata')
>+ )
>Unit: milliseconds
>expr minlq  
>meanmedianuq   
>   max  neval
>n <- readRDS(fl1)106.5956  109.6457 237.3844   
>117.8956  141.9921  10934.162   100
>load(fl2)  295.0654  301.8162335.6266  
>  308.3757  319.6965  1915.706  100
>
>It looks like the max value is an outlier.
>
>So I tried:
>SECOND TEST:
>> sapply(1:10, function(x) system.time(n <- readRDS('file.rds'))[3])
>elapsed   elapsed   elapsed  
>elapsed   elapsed   elapsed  
>elapsed   elapsed elapsed  
>elapsed
>10.50   0.11   0.11
>0.11   0.10   0.11 
>0.11   0.11   0.12 
> 0.12
>> sapply(1:10, function(x) system.time(load'flie.Rdata'))[3])
>elapsed   elapsed   elapsed  
>elapsed   elapsed   elapsed  
>elapsed   elapsed elapsed  
>elapsed
>1.860.29   0.31
>0.30   0.30   0.31 
>0.30   0.29   0.31 
> 0.30
>
>Which confirmed my suspicion; the first time loading the data takes
>much longer than the following times. I suspect that this has something
>to do how the data is assigned and that R doesn't has to 'fully' read
>the data, if it is read the second time.
>
>So the question remains, how can I make a realistic benchmark test?
>From the first test I would conclude that reading the *.rds file is
>faster. But this holds only for a large number of neval. If I set times
>= 1 then reading the *.Rdata would be faster (as also indicated by the
>second test).
>
>Thanks for any help or comments.
>
>Kind regards
>
>Raphael
>
>Raphael Felber, PhD
>Scientific Officer, Climate & Air Pollution
>
>Federal Department of Economic Affairs,
>Education and Research EAER
>Agroscope
>Research Division, Agroecology and Environment
>
>Reckenholzstrasse 191, CH-8046 Z�rich
>Phone +41 58 468 75 11
>Fax +41 58 468 72 01
>raphael.fel...@agroscope.admin.ch
>www.agroscope.ch
>
>
>   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to benchmark speed of load/readRDS correctly

2017-08-22 Thread raphael.felber
Dear all

I was thinking about efficient reading data into R and tried several ways to 
test if load(file.Rdata) or readRDS(file.rds) is faster. The files file.Rdata 
and file.rds contain the same data, the first created with save(d, ' 
file.Rdata', compress=F) and the second with saveRDS(d, ' file.rds', 
compress=F).

First I used the function microbenchmark() and was a astonished about the max 
value of the output.

FIRST TEST:
> library(microbenchmark)
> microbenchmark(
+   n <- readRDS('file.rds'),
+   load('file.Rdata')
+ )
Unit: milliseconds
  expr minlq
   meanmedianuq   
max  neval
n <- readRDS(fl1)106.5956  109.6457 237.3844  
117.8956  141.9921  10934.162   100
 load(fl2)  295.0654  301.8162335.6266  
308.3757  319.6965  1915.706  100

It looks like the max value is an outlier.

So I tried:
SECOND TEST:
> sapply(1:10, function(x) system.time(n <- readRDS('file.rds'))[3])
elapsed   elapsed   elapsed   elapsed   
elapsed   elapsed   elapsed   
elapsed elapsed   elapsed
  10.50   0.11   0.11   
0.11   0.10   0.11  
 0.11   0.11   0.12 
  0.12
> sapply(1:10, function(x) system.time(load'flie.Rdata'))[3])
elapsed   elapsed   elapsed   elapsed   
elapsed   elapsed   elapsed   
elapsed elapsed   elapsed
   1.860.29   0.31  
 0.30   0.30   0.31 
  0.30   0.29   0.31
   0.30

Which confirmed my suspicion; the first time loading the data takes much longer 
than the following times. I suspect that this has something to do how the data 
is assigned and that R doesn't has to 'fully' read the data, if it is read the 
second time.

So the question remains, how can I make a realistic benchmark test? From the 
first test I would conclude that reading the *.rds file is faster. But this 
holds only for a large number of neval. If I set times = 1 then reading the 
*.Rdata would be faster (as also indicated by the second test).

Thanks for any help or comments.

Kind regards

Raphael

Raphael Felber, PhD
Scientific Officer, Climate & Air Pollution

Federal Department of Economic Affairs,
Education and Research EAER
Agroscope
Research Division, Agroecology and Environment

Reckenholzstrasse 191, CH-8046 Z�rich
Phone +41 58 468 75 11
Fax +41 58 468 72 01
raphael.fel...@agroscope.admin.ch
www.agroscope.ch


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] My very first loop!! I failed. May I have some start-up aid?

2017-08-22 Thread Dagmar

Dear Jeff,

Thank you! You helped me a lot!

Tagmarie


Am 19.08.2017 um 08:11 schrieb Jeff Newmiller:
Thank you for providing the example code... for the request of running 
it multiple times it would have helped if you could have confirmed 
that the example ran through without errors... there were a lot of 
mistakes in it. Look into using the reprex package to check your 
example next time.


I don't do this kind of analysis... I really don't know what to expect 
from the functions. The final step in your sequence produces a vector 
of sixteen values, not 1 value, nor 4 values, so I don't know how to 
obtain the 4x3 result you said you expected... I got 16x3.


myframe <- data.frame( ID = c( "Ernie", 
"Ernie","Ernie","Ernie","Ernie","Ernie")
 , Timestamp = c( "24.09.2012 08:00", "24.09.2012 
09:00"
    , "24.09.2012 10:00", "25.09.2012 
10:00"
    , "26.09.2012 10:00", "27.09.2012 
10:00"

    )
 , Longitude = c( 8.481, 8.482, 8.483, 8.481, 
8.483, 8.481 )
 , Latitude = c( 54.753, 54.753, 54.752, 54.751, 
54.752, 54.751 )

 , stringsAsFactors = FALSE
 )
myframe
#>  ID    Timestamp Longitude Latitude
#> 1 Ernie 24.09.2012 08:00 8.481   54.753
#> 2 Ernie 24.09.2012 09:00 8.482   54.753
#> 3 Ernie 24.09.2012 10:00 8.483   54.752
#> 4 Ernie 25.09.2012 10:00 8.481   54.751
#> 5 Ernie 26.09.2012 10:00 8.483   54.752
#> 6 Ernie 27.09.2012 10:00 8.481   54.751

# Now this is where my loop is supposed to start. In this example I want
#to run the following functions 3 times. (In real life more often.) How
#do I do that?

library(adehabitatHR)
#> Loading required package: sp
#> Loading required package: deldir
#> deldir 0.1-14
#> Loading required package: ade4
#> Loading required package: adehabitatMA
#> Loading required package: adehabitatLT
#> Loading required package: CircStats
#> Loading required package: MASS
#> Loading required package: boot
library(rgdal)
#> rgdal: version: 1.2-8, (SVN revision 663)
#>  Geospatial Data Abstraction Library extensions to R successfully 
loaded

#>  Loaded GDAL runtime: GDAL 1.11.3, released 2015/09/16
#>  Path to GDAL shared files: /usr/share/gdal/1.11
#>  Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 
492]

#>  Path to PROJ.4 shared files: (autodetected)
#>  Linking to sp version: 1.2-5

myframe$mysampletime <- as.POSIXct( myframe$Timestamp
  , format = "%d.%m.%Y %H:%M"
  , tz="GMT"
  )

set.seed( 42 )
N <- 3
result <- matrix( NA, nrow = 16, ncol = N )
for ( i in seq.int( N ) ) {
  mysample <- myframe[ sample( seq_along( myframe[[ 1 ]] )
 , 3
 , replace=FALSE
 )
    ,
    ]
  mysamplexy <- project( as.matrix( mysample[ , c( "Longitude"
 , "Latitude"
 )
    ]
  )
   , "+proj=utm +zone=32 +ellps=WGS84"
   )
  colnames( mysamplexy ) <- c( "xCord", "yCord" )
  datltraj <- as.ltraj( mysamplexy, mysample$mysampletime, 
id=mysample$ID )

  Ddat <- BRB.D( datltraj, Tmax=21600, Lmin=36 )
  BRBdat <- BRB( datltraj, D = Ddat, type = "UD", Tmax = 21600, Lmin = 
36, hmin = 100 )

  result[ , i ] <- kernel.area( BRBdat, unout = "km2" )
}
#> Warning in kernel.area(BRBdat, unout = "km2"): The grid is too 
small to allow the estimation of home-range
#> for the following value of percent: 70,75,80,85,90,95. You should 
rerun kernelUD with a larger extent parameter
#> Warning in kernel.area(BRBdat, unout = "km2"): The grid is too 
small to allow the estimation of home-range
#> for the following value of percent: 
30,35,40,45,50,55,60,65,70,75,80,85,90,95. You should rerun kernelUD 
with a larger extent parameter
#> Error in getvolumeUD(x, standardize = standardize): NA/NaN/Inf in 
foreign function call (arg 1)

result
#> [,1]   [,2] [,3]
#>  [1,] 0.02271428 0.01841934   NA
#>  [2,] 0.02916494 0.02374206   NA
#>  [3,] 0.03599915 0.02943263   NA
#>  [4,] 0.04326174 0.03558258   NA
#>  [5,] 0.05102744 0.04230549   NA
#>  [6,] 0.05937594 0.04978273   NA
#>  [7,] 0.06842678 0.05844826   NA
#>  [8,] 0.07832443 0.06780540   NA
#>  [9,] 0.08940262 0.06780540   NA
#> [10,] 0.10219933 0.06780540   NA
#> [11,] 0.11765102 0.06780540   NA
#> [12,] 0.13991202 0.06780540   NA
#> [13,] 0.19924810 0.06780540   NA
#> [14,] 0.19924810 0.06780540   NA
#> [15,] 0.19924810 0.06780540   NA
#> [16,] 0.19924810 0.06780540   NA

 On Sat, 19 Aug 2017, Dagmar wrote:


Dear all,

I have a data similar to this:

myframe<- data.frame (ID=c("Ernie",