Re: [R] 2SLS with Fixed Effects and Control Variables

2024-01-28 Thread Achim Zeileis

Kelis,

thanks for your interest. It's hard to say what exactly goes wrong based 
on the information you provide. However, I would recommend that you first 
preprocess the data:


- Store all variables as the appropriate types (numeric, factor, etc.) 
in the data frame. Then you don't have to put these things into the model 
formula.


- Employ variable names without spaces, then you don't have to use 
backquotes.
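
For instance, a minimal sketch of such preprocessing (the data frame and
variable names below are assumptions taken from your model description):

dataset$Region <- factor(dataset$Region)
names(dataset) <- make.names(names(dataset))  ## "Woman Percentage" -> "Woman.Percentage" etc.
dataset$EffectiveMinimum2 <- dataset$EffectiveMinimum^2

library("ivreg")
m <- ivreg(WageInequality ~ EffectiveMinimum + EffectiveMinimum2 + Region |
  Log.Real.Minimum.Wage + Log.Real.Minimum.Wage2 + IVinteration + Region,
  data = dataset)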


Hopefully, this helps you to resolve the problem.

If the problem persists and you believe that this is an error within 
ivreg, please provide a minimal self-contained and reproducible example.


Best regards,
Achim


On Sun, 28 Jan 2024, Kelis Wong wrote:


Dear John Fox, Christian Kleiber, and Achim Zeileis,

I am attempting to run models with several independent-variable
specifications to assess their suitability. Unfortunately, I hit a snag and
could not get them to run properly. When I used ivreg, I got the error
message: "Error in eval(predvars, data, env) : object 'WageInequality' not
found." Can you please help?
Model: numeric(WageInequality) = numeric(EffectiveMinimum) +
numeric(EffectiveMinimum2)
(where EffectiveMinimum2 is the quadratic form of EffectiveMinimum)

Instruments: numeric(`Log Real Minimum Wage`) + numeric(`Log Real Minimum
Wage2`) + numeric(IVinteration)
(where `Log Real Minimum Wage2` is the quadratic form of `Log Real Minimum Wage`)

Time and Region Fixed Effects: factor(Region) + numeric(Year) + Region:Year
(where Region:Year is an interaction term of Region and Year)

Control Variables: factor(Region), numeric(`Woman Percentage`),
numeric(`Noneducated Percentage`)

Parameter 1:
ivreg(WageInequality ~ EffectiveMinimum + EffectiveMinimum2 + Region | `Log
Real Minimum Wage` + `Log Real Minimum Wage2` + IVinteration + Region,
dataset)

Parameter 2:
ivreg(WageInequality ~ EffectiveMinimum + EffectiveMinimum2 + Region + Year
+ Region:Year | `Log Real Minimum Wage` + `Log Real Minimum Wage2` +
IVinteration + Region + Year + Region:Year, dataset)

Parameter 3:
ivreg(WageInequality ~ EffectiveMinimum + EffectiveMinimum2 + Region + Year
+ Region:Year + `Woman Percentage` + `Noneducated Percentage` | `Log Real
Minimum Wage` + `Log Real Minimum Wage2` + IVinteration + Region + Year +
Region:Year + `Woman Percentage` + `Noneducated Percentage`, dataset)

--
Peace,

Kelis Wong



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate formula - differing results

2023-09-04 Thread Achim Zeileis

On Mon, 4 Sep 2023, Ivan Calandra wrote:


Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() completely 
unexpected and confusing? Especially considering that dplyr::summarise() and 
doBy::summaryBy() deal with NAs differently, even though they all use 
mean(na.rm = TRUE) to calculate the group stats.


I agree with Rui that this behaves as documented but I also agree with 
Ivan that the behavior is potentially confusing. Not so much because other 
packages behave differently but mostly because the handling of missing 
values differs between the different aggregate() methods.


Based on my teaching experience, I feel that a default of 
na.action=na.pass would be less confusing, especially in the case with 
multivariate "response".


In the univariate case the discrepancy can be surprising - in the default 
method you need na.rm=TRUE but in the formula method you get the same 
result without additional arguments (due to na.action=na.omit). But in the 
multivariate case the discrepancy is not obvious, especially for 
beginners, because the results in other variables without NAs are affected 
as well.


A minimal toy example is the following data with two groups (x = A vs. B) 
and two "responses" (y without NAs and z with NA):


d <- data.frame(x = c("A", "A", "B", "B"), y = 1:4, z = c(1:3, NA))
d
##   x y  z
## 1 A 1  1
## 2 A 2  2
## 3 B 3  3
## 4 B 4 NA

Except for naming of the columns, both of the following summaries for y by 
x (without NAs) yield the same result:


aggregate(d$y, list(d$x), FUN = mean)
aggregate(y ~ x, data = d, FUN = mean)
##   x   y
## 1 A 1.5
## 2 B 3.5

For a single variable _with_ NAs, the default method needs the na.rm = 
TRUE argument, the formula method does not. Again, except for naming of the 
columns:


aggregate(d$z, list(d$x), FUN = mean, na.rm = TRUE)
aggregate(z ~ x, data = d, FUN = mean)
##   x   z
## 1 A 1.5
## 2 B 3.0

Conversely, if you do want the NAs in the groups, the following two are 
the same (except for naming):


aggregate(d$z, list(d$x), FUN = mean)
aggregate(z ~ x, data = d, FUN = mean, na.action = na.pass)
##   x   z
## 1 A 1.5
## 2 B  NA

But in the multivariate case, it is not so obvious why the following two 
commands differ in their results for y (!), the variable without NAs, in 
group B:


aggregate(d[, c("y", "z")], list(d$x), FUN = mean, na.rm = TRUE)
##   Group.1   y   z
## 1   A 1.5 1.5
## 2   B 3.5 3.0
 ^^^

aggregate(cbind(y, z) ~ x, data = d, FUN = mean)
##   x   y   z
## 1 A 1.5 1.5
## 2 B 3.0 3.0
   ^^^

Hence, in my programming courses I tell students to use na.action=na.pass 
in the formula method and to handle NAs in the FUN argument themselves.


I guess that this is not important enough to change the default in 
aggregate.formula. Or are there R core members who also find that this 
inconsistency between the different methods is worth addressing?


If not, maybe an explicit example could be added on the help page? Showing 
something like this might help:


## default: omit entire row 4 where z=NA
aggregate(cbind(y, z) ~ x, data = d, FUN = mean)
##   x   y   z
## 1 A 1.5 1.5
## 2 B 3.0 3.0

## alternatively: omit row 4 only for z result but not for y result
aggregate(cbind(y, z) ~ x, data = d, FUN = mean, na.action = na.pass, na.rm = 
TRUE)
##   x   y   z
## 1 A 1.5 1.5
## 2 B 3.5 3.0

Best wishes,
Achim


On 04/09/2023 13:46, Rui Barradas wrote:

Às 10:44 de 04/09/2023, Ivan Calandra escreveu:

Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot 
explain. Any help would be appreciated!


Sample data:
my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103", 
"HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41, 121.37, 
70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83, 1656.46, 
1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2), Length = 
c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854, 32.075, 21.337, 
35.459), Width = c(45.982, 67.303, 52.679, 26.42, 25.149, 33.427, 20.683, 
62.783, 26.417, 35.297), PLATWIDTH = c(38.84, NA, 15.33, 30.37, 11.44, 
14.88, 13.86, NA, NA, 26.71), PLATTHICK = c(8.67, NA, 7.99, 11.69, 3.3, 
16.52, 4.58, NA, NA, 9.35), EPA = c(78, NA, 78, 54, 72, 49, 56, NA, NA, 
56), THICKNESS = c(10.97, NA, 9.36, 6.4, 5.89, 11.05, 4.9, NA, NA, 10.08), 
WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9, 29.5, 4.5, NA, NA, 23), RAWMAT = 
c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT", "HORNFELS", "HORNFELS", 
"HORNFELS", "HORNFELS", "HORNFELS")), row.names = c(1L, 2L, 3L, 4L, 5L, 
111L, 112L, 113L, 114L, 115L), class = "data.frame")


1) Simple aggregation with 2 variables:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean, na.rm 
= TRUE)


2) Using the dot notation - different results:
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using 

Re: [R] col2rgb() function

2023-07-23 Thread Achim Zeileis
Just one addition which may or may not be useful: The color palette you 
use is also known as "Okabe-Ito" and it is the default set of colors in 
the palette.colors() function. This function also has an optional alpha 
argument. So if you want to generate these colors with an alpha of 0.3 you 
can also do:


palette.colors(8, alpha = 0.3)

or more explicitly

palette.colors(8, palette = "Okabe-Ito", alpha = 0.3)
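
For completeness, the col2rgb() route from the original question below also
works, as long as the values stay on the 0-255 scale via maxColorValue (a
small sketch):

x <- col2rgb("#D55E00")
rgb(x[1], x[2], x[3], alpha = 0.3 * 255, maxColorValue = 255)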

On Sun, 23 Jul 2023, Nick Wray wrote:


Thanks That works nicely  Nick

On Sun, 23 Jul 2023 at 19:26, Ben Bolker  wrote:


   Does adjustcolor() help?

cb8<- c("#00", "#E69F00", "#56B4E9", "#009E73","#F0E442", "#0072B2",
"#D55E00", "#CC79A7")
plot(0,0,xlim=c(1,8),ylim=c(0,1))
points(1:8,rep(0.5,8),col=cb8,pch=19,cex=2)
points(1:8,rep(0.75,8),col=adjustcolor(cb8, alpha.f = 0.3), pch=19,cex=2)

On 2023-07-23 2:15 p.m., Nick Wray wrote:

Hello  I have a palette vector of colour blind colours (in hexadecimal)
which I’m using for plots, but they are not see-through, and as I wanted to
overlay some histograms I wanted to convert these colours to rgb, where you
can set the opacity.

I have found the function col2rgb(), which works in the sense that it gives
a vector of numbers, but these don’t work directly in rgb because they are
too big.  If I divide through to make them all less than 1 I don’t get the
corresponding colour-blind hue, but something somewhat off.

Here is the colour-blind palette in a plot:


*cb8<- c("#00", "#E69F00", "#56B4E9", "#009E73","#F0E442", "#0072B2",
"#D55E00", "#CC79A7")*

*plot(0,0,xlim=c(1,8),ylim=c(0,1))*

*points(1:8,rep(0.5,8),col=cb8,pch=19,cex=2)*



so if I try to convert the red dot ("#D55E00") (number 7) I get

*col2rgb("#D55E00"*

   [,1]

red213

green   94

blue 0

*points(7,0.25,col=rgb(rgb(213,94,0)),pch=19,cex=2)*

gives me an error message, and if I divide through

points(7,0.25,col=rgb(213/307,94/307,0),pch=19,cex=2)

I get a reddish dot, but not the same as in the colour-blind palette.



Somewhat mystified.  Can anyone help?? Thanks Nick Wray

  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grDevices::hcl.colors using two colours: Bug or Feature?

2023-04-28 Thread Achim Zeileis

On Fri, 28 Apr 2023, Duncan Murdoch wrote:


On 28/04/2023 6:18 a.m., Achim Zeileis wrote:

On Fri, 28 Apr 2023, Achim Zeileis wrote:


This was introduced in 4.3.0 (hence Rui cannot reproduce it in 4.2.3).

It's a bug and was introduced when fixing this other bug:

https://bugs.R-project.org/show_bug.cgi?id=18476
https://hypatia.math.ethz.ch/pipermail/r-help/2023-February/476960.html

Apparently, it only affects the case with n = 2 for diverging and divergingx
palettes. The culprit is this line:

i <- if(n2 == 1L) 0 else seq.int(1, by = -2/(n - 1), length.out = n2)

I think n2 == 1L is not the right condition and we need to distinguish n = 1
and n = 2.


I think the solution is simply to use n == 1L instead of n2 == 1L, both in
"diverging" (line 188 in hcl.colors.R) and "divergingx" (line 197).

Duncan, maybe you can have a look at this as well?


I agree that the two changes solve this problem.  I couldn't spot any other 
places that would need fixing.


Great, thanks for checking. I reported the problem with the corresponding 
patch at: https://bugs.R-project.org/show_bug.cgi?id=18523


Thanks & best wishes,
Achim


Will have a closer look...

Thanks for reporting this!
Achim

On Fri, 28 Apr 2023, Rui Barradas wrote:


Às 06:01 de 28/04/2023, Stevie Pederson escreveu:

Hi,

I'm not sure if this is a bug or a feature, but after updating to R 4.3, if
requesting two colours from hcl.colors() you now get the same colour twice.
This occurs for all palettes I've tried. My reprex:

hcl.colors(2, "Vik")
[1] "#F1F1F1" "#F1F1F1"

As I have multiple workflows I run repeatedly with A vs B comparisons, this
has just broken the visualisations in many of them. Obviously a workaround
is hcl.colors(3, "Vik")[c(1, 3)] but this seems rather unintuitive.

Thanks in advance,

Stevie

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
   [1] LC_CTYPE=en_AU.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_AU.UTF-8LC_COLLATE=en_AU.UTF-8
   [5] LC_MONETARY=en_AU.UTF-8LC_MESSAGES=en_AU.UTF-8
   [7] LC_PAPER=en_AU.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

time zone: Australia/Adelaide
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I cannot reproduce this on Windows.


hcl.colors(2, "Vik")
# [1] "#002E60" "#3E2000"

clrs <- sapply(hcl.pals(), \(p) hcl.colors(2, p))
any(apply(clrs, 2, \(x) x[1] == x[2]))
# [1] FALSE

sessionInfo()
# R version 4.2.3 (2023-03-15 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 22621)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
# [3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
# [5] LC_TIME=Portuguese_Portugal.utf8
#
# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base
#
# loaded via a namespace (and not attached):
# [1] compiler_4.2.3


Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grDevices::hcl.colors using two colours: Bug or Feature?

2023-04-28 Thread Achim Zeileis

On Fri, 28 Apr 2023, Achim Zeileis wrote:


This was introduced in 4.3.0 (hence Rui cannot reproduce it in 4.2.3).

It's a bug and was introduced when fixing this other bug:

https://bugs.R-project.org/show_bug.cgi?id=18476
https://hypatia.math.ethz.ch/pipermail/r-help/2023-February/476960.html

Apparently, it only affects the case with n = 2 for diverging and divergingx 
palettes. The culprit is this line:


i <- if(n2 == 1L) 0 else seq.int(1, by = -2/(n - 1), length.out = n2)

I think n2 == 1L is not the right condition and we need to distinguish n = 1 
and n = 2.


I think the solution is simply to use n == 1L instead of n2 == 1L, both in 
"diverging" (line 188 in hcl.colors.R) and "divergingx" (line 197).


Duncan, maybe you can have a look at this as well?


Will have a closer look...

Thanks for reporting this!
Achim

On Fri, 28 Apr 2023, Rui Barradas wrote:


Às 06:01 de 28/04/2023, Stevie Pederson escreveu:

Hi,

I'm not sure if this is a bug or a feature, but after updating to R 4.3, if
requesting two colours from hcl.colors() you now get the same colour twice.

This occurs for all palettes I've tried. My reprex:

hcl.colors(2, "Vik")
[1] "#F1F1F1" "#F1F1F1"

As I have multiple workflows I run repeatedly with A vs B comparisons, this
has just broken the visualisations in many of them. Obviously a workaround
is hcl.colors(3, "Vik")[c(1, 3)] but this seems rather unintuitive.

Thanks in advance,

Stevie

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
  [1] LC_CTYPE=en_AU.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_AU.UTF-8LC_COLLATE=en_AU.UTF-8
  [5] LC_MONETARY=en_AU.UTF-8LC_MESSAGES=en_AU.UTF-8
  [7] LC_PAPER=en_AU.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

time zone: Australia/Adelaide
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

I cannot reproduce this on Windows.


hcl.colors(2, "Vik")
# [1] "#002E60" "#3E2000"

clrs <- sapply(hcl.pals(), \(p) hcl.colors(2, p))
any(apply(clrs, 2, \(x) x[1] == x[2]))
# [1] FALSE

sessionInfo()
# R version 4.2.3 (2023-03-15 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 22621)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
# [3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
# [5] LC_TIME=Portuguese_Portugal.utf8
#
# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base
#
# loaded via a namespace (and not attached):
# [1] compiler_4.2.3


Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grDevices::hcl.colors using two colours: Bug or Feature?

2023-04-28 Thread Achim Zeileis

This was introduced in 4.3.0 (hence Rui cannot reproduce it in 4.2.3).

It's a bug and was introduced when fixing this other bug:

https://bugs.R-project.org/show_bug.cgi?id=18476
https://hypatia.math.ethz.ch/pipermail/r-help/2023-February/476960.html

Apparently, it only affects the case with n = 2 for diverging and 
divergingx palettes. The culprit is this line:


i <- if(n2 == 1L) 0 else seq.int(1, by = -2/(n - 1), length.out = n2)

I think n2 == 1L is not the right condition and we need to distinguish n = 
1 and n = 2.


Will have a closer look...

Thanks for reporting this!
Achim

On Fri, 28 Apr 2023, Rui Barradas wrote:


Às 06:01 de 28/04/2023, Stevie Pederson escreveu:

Hi,

I'm not sure if this is a bug or a feature, but after updating to R 4.3, if
requesting two colours from hcl.colors() you now get the same colour twice.
This occurs for all palettes I've tried. My reprex:

hcl.colors(2, "Vik")
[1] "#F1F1F1" "#F1F1F1"

As I have multiple workflows I run repeatedly with A vs B comparisons, this
has just broken the visualisations in many of them. Obviously a
workaround is hcl.colors(3, "Vik")[c(1, 3)] but this seems rather
unintuitive.

Thanks in advance,

Stevie

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
  [1] LC_CTYPE=en_AU.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_AU.UTF-8LC_COLLATE=en_AU.UTF-8
  [5] LC_MONETARY=en_AU.UTF-8LC_MESSAGES=en_AU.UTF-8
  [7] LC_PAPER=en_AU.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

time zone: Australia/Adelaide
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

I cannot reproduce this on Windows.


hcl.colors(2, "Vik")
# [1] "#002E60" "#3E2000"

clrs <- sapply(hcl.pals(), \(p) hcl.colors(2, p))
any(apply(clrs, 2, \(x) x[1] == x[2]))
# [1] FALSE

sessionInfo()
# R version 4.2.3 (2023-03-15 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 22621)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
# [3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
# [5] LC_TIME=Portuguese_Portugal.utf8
#
# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base
#
# loaded via a namespace (and not attached):
# [1] compiler_4.2.3


Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Palettes {grDevices} - wrong number of colors returned?

2023-02-28 Thread Achim Zeileis
Just for the record: Duncan and Martin (Maechler) improved my proposed 
patch and Martin committed it to R-devel now.


Thank you, Sigbert, for reporting the issue!

On Thu, 23 Feb 2023, Achim Zeileis wrote:


On Thu, 23 Feb 2023, Duncan Murdoch wrote:


On 23/02/2023 6:09 a.m., Achim Zeileis wrote:

Duncan,

thanks for your feedback. I just received your response after sending out
mine. You came to the same conclusion. Should I prepare a patch and send
it to you so that you can also have a look? Or view Bugzilla?


Copying Sigbert's message to Bugzilla is a good idea so it can be referred 
to from the NEWS entry.


Good point. I've written a new description, though, because there are even more 
problems with the alpha vector handling than suggested by Sigbert.


The bug report is now at: https://bugs.R-project.org/show_bug.cgi?id=18476

If you want to submit a patch along with it, let me know and I'll take a 
look at it.


There is both a patch and a test script in the attachments of the bug report.

Let me know if something needs further fixing.

Best wishes,
Achim




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Palettes {grDevices} - wrong number of colors returned?

2023-02-23 Thread Achim Zeileis

On Thu, 23 Feb 2023, Duncan Murdoch wrote:


On 23/02/2023 6:09 a.m., Achim Zeileis wrote:

Duncan,

thanks for your feedback. I just received your response after sending out
mine. You came to the same conclusion. Should I prepare a patch and send
it to you so that you can also have a look? Or view Bugzilla?


Copying Sigbert's message to Bugzilla is a good idea so it can be 
referred to from the NEWS entry.


Good point. I've written a new description, though, because there are even 
more problems with the alpha vector handling than suggested by Sigbert.


The bug report is now at: https://bugs.R-project.org/show_bug.cgi?id=18476

If you want to submit a patch along with it, let me know and I'll take a 
look at it.


There is both a patch and a test script in the attachments of the bug 
report.


Let me know if something needs further fixing.

Best wishes,
Achim

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Palettes {grDevices} - wrong number of colors returned?

2023-02-23 Thread Achim Zeileis

Thanks, Sigbert, good catch.

The reason that this happens is the following: rainbow(), heat.colors(), 
terrain.colors(), cm.colors(), and topo.colors() are all just front-ends 
to calls to hsv() for Hue-Saturation-Value colors. (BTW: This is also the 
main reason why they yield palettes with rather poor perceptual 
properties.)


All of the palette functions just pass on the alpha argument to all 
hsv(..., alpha=alpha) calls. And in each case if the length of alpha and 
the number of colors does not match the shorter element is recycled to the 
length of the longer one.


Thus, if n == length(alpha), as in your example, then those functions that 
can generate all colors with a single call of hsv() do the right thing 
(rainbow and heat.colors). Those, that need two hsv() calls yield 2 * n 
colors (cm.colors and terrain.colors). And in topo.colors() where three 
hsv() calls are used we get 3 * n colors.


My suggested solution would be to do something like alpha <- 
rep_len(alpha, n) early on in the palette functions and then assure that 
the correct elements of alpha are passed to hsv().
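
For illustration, a minimal sketch of that strategy (just the idea, not an
actual patch):

## inside a palette function with arguments n and alpha:
alpha <- rep_len(alpha, n)
## ...then pass only the matching chunk to each hsv() call, e.g.,
## alpha[seq_len(k)] to the first call and alpha[seq_len(n - k) + k] to the
## second, so that exactly n colors are returned.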


The same should be done in those functions that work correctly for n = 
length(alpha) because it will assure that we always get n colors and not 
more. For example, the case with n < length(alpha) is also unexpected:


hcl.colors(2, alpha = c(0, 0.25, 0.5, 0.75, 1))
## [1] "#4B005500" "#FDE33340" "#4B005580" "#FDE333BF" "#4B0055FF"

Or even worse:

gray.colors(2, alpha = c(0, 0.25, 0.5, 0.75, 1))
## Error in gray(seq.int(from = start^gamma, to = end^gamma, length.out = 
n)^(1/gamma),  :
##   attempt to set index 2/2 in SET_STRING_ELT

If someone from R Core thinks that my proposed strategy is a good way 
forward, I'm happy to work on a patch.


Sigbert, if you are looking for a solution: (1) Don't use heat.colors, 
terrain.colors, topo.colors, cm.colors, they are all quite terrible ;-)
(2) If you really want to use them, put on the alpha afterwards, e.g., 
using colorspace::adjust_transparency:


colorspace::adjust_transparency(topo.colors(3), alpha = c(0, 0.5, 1))
## [1] "#4C00FF00" "#00FF4D80" "#00FF"

hth,
Achim


On Thu, 23 Feb 2023, Sigbert Klinke wrote:


Hi,

I would have expected to always get 3 colors as the result, which is not true:

hcl.colors(3, alpha=c(0, 0.5, 1)) # 3 colors

rainbow(3, alpha=c(0, 0.5, 1))# 3 colors

heat.colors(3, alpha=c(0, 0.5, 1))# 3 colors

terrain.colors(3, alpha=c(0, 0.5, 1)) # 6 colors

cm.colors(3, alpha=c(0, 0.5, 1))  # 6 colors

topo.colors(3, alpha=c(0, 0.5, 1))# 9 colors

R-Version and platform:
R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"

Copyright (C) 2022 The R Foundation for Statistical Computing

Platform: x86_64-pc-linux-gnu (64-bit)

Bug or feature?

Sigbert

--
https://hu.berlin/sk
https://www.stat.de/faqs
https://hu.berlin/mmstat
https://hu.berlin/mmstat-ar

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Palettes {grDevices} - wrong number of colors returned?

2023-02-23 Thread Achim Zeileis

Duncan,

thanks for your feedback. I just received your response after sending out 
mine. You came to the same conclusion. Should I prepare a patch and send 
it to you so that you can also have a look? Or view Bugzilla?


Best wishes,
Achim

On Thu, 23 Feb 2023, Duncan Murdoch wrote:


On 23/02/2023 4:36 a.m., Sigbert Klinke wrote:

Hi,

I would have expected to always get 3 colors as the result, which is not
true:

hcl.colors(3, alpha=c(0, 0.5, 1)) # 3 colors

rainbow(3, alpha=c(0, 0.5, 1))# 3 colors

heat.colors(3, alpha=c(0, 0.5, 1))# 3 colors

terrain.colors(3, alpha=c(0, 0.5, 1)) # 6 colors

cm.colors(3, alpha=c(0, 0.5, 1))  # 6 colors

topo.colors(3, alpha=c(0, 0.5, 1))# 9 colors

R-Version and platform:
R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"

Copyright (C) 2022 The R Foundation for Statistical Computing

Platform: x86_64-pc-linux-gnu (64-bit)

Bug or feature?


Looks like a bug to me, they should all be length 3.  The reason it happens 
in terrain.colors (I didn't look at the others) is that it breaks the range 
into two parts, and relies on the number of values of the parameters in each 
part to get the length, but passes the full alpha vector in:


terrain.colors <- function (n, alpha, rev = FALSE)
{
    if ((n <- as.integer(n[1L])) > 0) {
        k <- n%/%2
        h <- c(4/12, 2/12, 0/12)
        s <- c(1, 1, 0)
        v <- c(0.65, 0.9, 0.95)
        cols <- c(hsv(h = seq.int(h[1L], h[2L], length.out = k),
                      s = seq.int(s[1L], s[2L], length.out = k),
                      v = seq.int(v[1L], v[2L], length.out = k),
                      alpha = alpha),
                  hsv(h = seq.int(h[2L], h[3L], length.out = n - k + 1)[-1L],
                      s = seq.int(s[2L], s[3L], length.out = n - k + 1)[-1L],
                      v = seq.int(v[2L], v[3L], length.out = n - k + 1)[-1L],
                      alpha = alpha))
        if(rev) rev(cols) else cols
    } else character()
}


A bug fix would be to recycle alpha to the right length and only pass in 
portions of it, e.g.


terrain.colors <- function (n, alpha, rev = FALSE)
{
    if ((n <- as.integer(n[1L])) > 0) {
        alpha <- rep_len(alpha, length.out = n)
        k <- n%/%2
        h <- c(4/12, 2/12, 0/12)
        s <- c(1, 1, 0)
        v <- c(0.65, 0.9, 0.95)
        cols <- c(hsv(h = seq.int(h[1L], h[2L], length.out = k),
                      s = seq.int(s[1L], s[2L], length.out = k),
                      v = seq.int(v[1L], v[2L], length.out = k),
                      alpha = alpha[seq_len(k)]),
                  hsv(h = seq.int(h[2L], h[3L], length.out = n - k + 1)[-1L],
                      s = seq.int(s[2L], s[3L], length.out = n - k + 1)[-1L],
                      v = seq.int(v[2L], v[3L], length.out = n - k + 1)[-1L],
                      alpha = alpha[seq_len(n - k) + k]))
        if(rev) rev(cols) else cols
    } else character()
}

I'd guess the same sort of approach would fix cm.colors and topo.colors.

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strucchange on zoo time series

2022-05-01 Thread Achim Zeileis

On Sun, 1 May 2022, Eric Berger wrote:


Hi Achim,
My point was that tsbox (apparently) provides tools to convert zoo -->
ts which should help the OP.


Not necessarily, because ts can only represent regular and plain numeric 
time indexes whereas zoo (and also xts and tsibble) can represent 
irregular time indexes of different classes as well. Also, zoo (and also 
xts and tsibble) can convert to many other time series classes (including 
ts) directly, there is no need to go via tsbox for that.


In this particular case it would be possible, though, to convert back and 
forth between ts and zoo because the data is simply monthly. This can be 
done with as.ts() and as.zoo(), respectively.


dd.ocus <- efp(dd ~ dd.lag1 + dd.lag12, data = as.ts(na.trim(dd.z)),
    type = "OLS-CUSUM")

Because as.ts() preserves the regular monthly index here, plot(dd.ocus)
should then show calendar time rather than an index from 0 to 180.



On Sun, May 1, 2022 at 5:56 PM Achim Zeileis  wrote:


On Sun, 1 May 2022, Eric Berger wrote:


Hi Naresh,
The tsbox package on CRAN -
https://cran.r-project.org/web/packages/tsbox/index.html - has the
following description:

tsbox: Class-Agnostic Time Series

Time series toolkit with identical behavior for all time series
classes: 'ts','xts', 'data.frame', 'data.table', 'tibble', 'zoo',
'timeSeries', 'tsibble', 'tis' or 'irts'. Also converts reliably
between these classes.

Hopefully this will provide you the necessary tools to solve your problem.


Not really because the code inside strucchange::efp does not use tsbox but
just ts directly.

Best,
Achim


Good luck,
Eric



On Sun, May 1, 2022 at 3:37 PM Naresh Gurbuxani
 wrote:


I am trying to replicate empirical fluctuation process fit (efp) described in the book 
"Applied Econometrics with R".  This fit works when data input is an object of 
class ts, but not when data input is object of class zoo.  I prefer to use zoo because it 
provides better housekeeping with dates.  Is it possible to achieve the fit with zoo?

library(AER)
library(strucchange)

data(UKDriverDeaths)
dd <- log(UKDriverDeaths)
dd.z <- zoo(dd, order.by = as.yearmon(time(dd)))
dd.z <- merge(dd = dd.z, dd.lag1 = lag(dd.z, k = -1),
  dd.lag12 = lag(dd.z, k = -12))

# Does not work
dd.ocus <- efp(dd ~ dd.lag1 + dd.lag12, data = na.trim(dd.z),
   type = "OLS-CUSUM")
# Error message
# Error in eval(attr(mt, "variables")[[2]], data, env) :
# numeric 'envir' arg not of length one

# Works
dd.ocus <- efp(dd ~ dd.lag1 + dd.lag12, data = ts(na.trim(dd.z)),
   type = "OLS-CUSUM")

# But time stamps are lost
plot(dd.ocus)
# Time indexed from 0 to 180

Thanks,
Naresh
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A glitch (???) in tools::texi2pdf.

2021-08-29 Thread Achim Zeileis

On Sat, 28 Aug 2021, Rolf Turner wrote:



On Sat, 28 Aug 2021 12:49:04 +0300
Eric Berger  wrote:


As Achim wrote in point (2), Makefile is your friend.


Well, all I can say is that Makefile is *not* my friend; I have never
made its acquaintance and wouldn't know where to begin looking.

Would it be possible for you to provide a template/prototype Makefile
and give me some idea what to *do* with it?


In a Makefile you can declare "rules" for producing certain "target" files 
from "prerequisite" files. The rules are only applied if any of the 
prerequisite files have changed since the target was last produced.


In your case the .bib is a prerequisite for the target .bbl which in turn 
is a prerequisite (along with the .tex) for the .pdf.
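
For instance, a minimal sketch of such a Makefile (the file names are
placeholders; recipe lines must be indented with a tab):

paper.pdf: paper.tex paper.bbl
	Rscript -e 'tools::texi2pdf("paper.tex")'

paper.bbl: paper.tex paper.bib
	pdflatex paper
	bibtex paper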


A nice introduction to GNU Make for data analysis workflows is this JSS 
paper by Peter Baker (http://petebaker.id.au/):


https://doi.org/10.18637/jss.v094.c01

Best wishes,
Z

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A glitch (???) in tools::texi2pdf.

2021-08-28 Thread Achim Zeileis

On Sat, 28 Aug 2021, Rolf Turner wrote:



On Sat, 28 Aug 2021 09:47:03 +0200
Achim Zeileis  wrote:


On Sat, 28 Aug 2021, Rolf Turner wrote:


I have found that tools::texi2pdf() ignores changes to the *.bib file
unless the *.bbl file is removed prior to re-running
tools::texi2pdf().


This is how texi2pdf (or actually texi2dvi) from texinfo behaves.
This is likely what tools::texi2pdf calls in your setup anyway. In
short, texi2dvi considers the .bbl file as an input to the .tex and
does not remove it if it is available prior to calling texi2dvi.

Alternatives:

(1) You can always re-run everything. Then simply start with a clean
directory and always use tools::texi2pdf(..., clean = TRUE). This
cleans up all the files it has produced (except the .pdf). But it
will preserve files left in the directory from previous runs.

(2) You can detect upstream changes, e.g., based on timestamps etc.
Then the traditional approach would be to use a Makefile.

Best,
Z


Thanks.  I guess you're saying that it's a feature, not a bug. :-)


Also. But the main point is that it's a feature of texi2dvi from texinfo 
which is called by tools::texi2dvi(). Hence, if you want to raise this 
somewhere it would have to be with the texinfo maintainers.



Well it's a feature that I intensely dislike, but that cuts no ice I'm
sure, and I'll just have to cope with it. I can easily build a local
function that will remove *.bbl before invoking tools::texi2pdf(),
and use that, rather than calling tools::texi2pdf() directly.
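
For instance, a sketch of such a wrapper (the function name is hypothetical):

texi2pdf2 <- function(file) {
  bbl <- Sys.glob("*.bbl")            ## any .bbl files left over in the directory
  if (length(bbl)) file.remove(bbl)   ## remove them so bibtex is re-run
  tools::texi2pdf(file)
}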

However I *really* believe that this is a bad feature, and is a Trap For 
Young Players.  Hardly anyone knows what a *.bbl is or is for. Users 
would expect that changing the *.bib file would change the reference 
list in the output.  (I.e. that texi2pdf() would re-run bibtex "under 
the bonnet", as the user would do if processing from the OS command line 
rather than applying texi2pdf() from R.)


The problem is that currently texi2dvi has no concept of judging which of 
the .bib and .bbl was modified last.



I wonder how many papers in the R Journal have errors in their
reference lists due to the fact that authors corrected the *.bib file,
reprocessed using texi2pdf() and did not notice that the error they
corrected had *not* been corrected in the *.pdf output?


I don't know what the R Journal editors are doing but for JSS I'm cleaning 
such files up (repeatedly actually) before compiling the final PDFs.


Best,
Z

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A glitch (???) in tools::texi2pdf.

2021-08-28 Thread Achim Zeileis

On Sat, 28 Aug 2021, Rolf Turner wrote:


I have found that tools::texi2pdf() ignores changes to the *.bib file
unless the *.bbl file is removed prior to re-running tools::texi2pdf().


This is how texi2pdf (or actually texi2dvi) from texinfo behaves. This is 
likely what tools::texi2pdf calls in your setup anyway. In short, 
texi2dvi considers the .bbl file as an input to the .tex and does not 
remove it if it is available prior to calling texi2dvi.


Alternatives:

(1) You can always re-run everything. Then simply start with a clean 
directory and always use tools::texi2pdf(..., clean = TRUE). This cleans 
up all the files it has produced (except the .pdf). But it will preserve 
files left in the directory from previous runs.


(2) You can detect upstream changes, e.g., based on timestamps etc. Then 
the traditional approach would be to use a Makefile.
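
The timestamp check in (2) can also be done directly in R, e.g., a small
sketch using the file names from the example below:

if (file.exists("mreprex.bbl") &&
    file.mtime("mreprex.bib") > file.mtime("mreprex.bbl")) {
  file.remove("mreprex.bbl")
}
tools::texi2pdf("mreprex.tex")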


Best,
Z


I have constructed a minimal reproducible example which is contained
in the attached file "mreprex.txt".

This *.txt file should be split into two files, mreprex.tex and
mreprex.bib.  (I wasn't sure whether *.tex and *.bib files would make
it through to the list.)

After doing the splitting, start R, execute

tools::texi2pf("mreprex.tex")

and view the resulting mreprex.pdf file.  Then edit mreprex.bib
and change (e.g.) the version number from "0.0-21" to "0.0-22".

Re-run tools::texi2pdf("mreprex.tex") and view mreprex.pdf.  You will
see that it hasn't changed; the version number cited is still given as
0.0-21.  Now remove mreprex.bbl, re-run tools::texi2pdf("mreprex.tex")
and view mreprex.pdf.  You will see that the version number is given as
0.0-22 as it should be.

Should this be considered a bug?  If so, what is the appropriate way of
drawing this to the attention of those who have the power to fix it?

Thanks.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] colourlovers error

2021-06-28 Thread Achim Zeileis

On Mon, 28 Jun 2021, Monica Palaseanu-Lovejoy wrote:


Hi,

Thanks for letting me know. I will go back to RColorBrewer and monitor that
issue on GitHub from time to time as well to see if it gets solved.


See also

example(palette.colors)
example(hcl.colors)

in base R which provide a wide range of useful palettes introduced in:

https://developer.R-project.org/Blog/public/2019/11/21/a-new-palette-for-r/
https://developer.R-project.org/Blog/public/2019/04/01/hcl-based-color-palettes-in-grdevices/


Thanks,
Monica

On Mon, Jun 28, 2021 at 10:11 AM PIKAL Petr wrote:
  Hi Monica

  In the meantime you could try RColorBrewer

  https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html

  Cheers
  Petr

  > -----Original Message-----
  > From: R-help On Behalf Of Monica Palaseanu-Lovejoy
  > Sent: Monday, June 28, 2021 2:58 PM
  > To: r-help
  > Subject: [R] colourlovers error
  >
  > Hi,
  >
  > In an older version of R i was using the package colourlovers to get
  > nice color palettes. I just updated my R to the latest version and now
  > i am getting the following error:
  >
  > id='292482'
  > pl1 <- clpalette(id)
  >
  > Opening and ending tag mismatch: input line 114 and form
  > Opening and ending tag mismatch: input line 113 and div
  > Opening and ending tag mismatch: input line 112 and div
  > Opening and ending tag mismatch: form line 98 and div
  > Entity 'bull' not defined
  > Entity 'bull' not defined
  > Opening and ending tag mismatch: div line 92 and body
  > Opening and ending tag mismatch: div line 84 and html
  > Premature end of data in tag div line 82
  > Premature end of data in tag body line 81
  > Premature end of data in tag html line 5
  > Error: 1: Opening and ending tag mismatch: input line 114 and form
  > 2: Opening and ending tag mismatch: input line 113 and div
  > 3: Opening and ending tag mismatch: input line 112 and div
  > 4: Opening and ending tag mismatch: form line 98 and div
  > 5: Entity 'bull' not defined
  > 6: Entity 'bull' not defined
  > 7: Opening and ending tag mismatch: div line 92 and body
  > 8: Opening and ending tag mismatch: div line 84 and html
  > 9: Premature end of data in tag div line 82
  > 10: Premature end of data in tag body line 81
  > 11: Premature end of data in tag html line 5
  >
  > My session is as follows on a Windows 10 64 bit machine:
  >
  > sessionInfo(package = NULL)
  > R version 4.1.0 (2021-05-18)
  > Platform: x86_64-w64-mingw32/x64 (64-bit)
  > Running under: Windows 10 x64 (build 18363)
  >
  > Matrix products: default
  >
  > locale:
  > [1] LC_COLLATE=English_United States.1252
  > [2] LC_CTYPE=English_United States.1252
  > [3] LC_MONETARY=English_United States.1252
  > [4] LC_NUMERIC=C
  > [5] LC_TIME=English_United States.1252
  >
  > attached base packages:
  > [1] stats     graphics  grDevices utils     datasets  methods   base
  >
  > other attached packages:
  > [1] colourlovers_0.3.6 reshape2_1.4.4     forcats_0.5.1      stringr_1.4.0
  > [5] dplyr_1.0.7        purrr_0.3.4        readr_1.4.0        tidyr_1.1.3
  > [9] tibble_3.1.2       ggplot2_3.3.5      tidyverse_1.3.1    Rcpp_1.0.6
  >
  > Maybe i need a separate package to install that i didn't?
  >
  > Thanks for any help,
  > Monica
  >
  > __
  > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  > https://stat.ethz.ch/mailman/listinfo/r-help
  > PLEASE do read the posting guide
  > http://www.R-project.org/posting-guide.html
  > and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] colourlovers error

2021-06-28 Thread Achim Zeileis

On Mon, 28 Jun 2021, Monica Palaseanu-Lovejoy wrote:


Hi,

In an older version of R i was using the package colourlovers to get nice
color palettes. I just updated my R to the latest version and now i am
getting the following error:


Presumably, there was a change in the API of the colourlovers web site. 
Issues like this are best raised in the issue tracker of the corresponding 
GitHub project (as also linked under "BugReports" on CRAN). In fact, virtually 
the same problem has already been reported there last month but it has 
not yet been fixed: https://github.com/andrewheiss/colourlovers/issues/14



id='292482'
pl1 <- clpalette(id)

Opening and ending tag mismatch: input line 114 and form
Opening and ending tag mismatch: input line 113 and div
Opening and ending tag mismatch: input line 112 and div
Opening and ending tag mismatch: form line 98 and div
Entity 'bull' not defined
Entity 'bull' not defined
Opening and ending tag mismatch: div line 92 and body
Opening and ending tag mismatch: div line 84 and html
Premature end of data in tag div line 82
Premature end of data in tag body line 81
Premature end of data in tag html line 5
Error: 1: Opening and ending tag mismatch: input line 114 and form
2: Opening and ending tag mismatch: input line 113 and div
3: Opening and ending tag mismatch: input line 112 and div
4: Opening and ending tag mismatch: form line 98 and div
5: Entity 'bull' not defined
6: Entity 'bull' not defined
7: Opening and ending tag mismatch: div line 92 and body
8: Opening and ending tag mismatch: div line 84 and html
9: Premature end of data in tag div line 82
10: Premature end of data in tag body line 81
11: Premature end of data in tag html line 5

My session is as follows on  Windows 10 64 bit machine:

sessionInfo(package = NULL)
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] colourlovers_0.3.6 reshape2_1.4.4 forcats_0.5.1  stringr_1.4.0

[5] dplyr_1.0.7purrr_0.3.4readr_1.4.0tidyr_1.1.3

[9] tibble_3.1.2   ggplot2_3.3.5  tidyverse_1.3.1Rcpp_1.0.6


Maybe i need a separate package to install that i didn't?

Thanks for any help,
Monica

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] newdata for predict.lm() ??

2020-11-04 Thread Achim Zeileis

On Wed, 4 Nov 2020, peter dalgaard wrote:


Don't use $ notation in lm() formulas. Use lm(w ~ h, data=DAT).


...or in any other formula for that matter!

Let me expand a bit on Peter's comment because this is really a pet peeve 
of mine:


The idea is that the formula "w ~ h" describes the relationship between 
the variables involved, independent of the data set it should be applied 
to. In contrast, "DAT$w ~ DAT$h" hard-wires the data into the formula and 
prevents you from applying the formula to another data set.
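
A minimal sketch of the recommended pattern, reusing the DAT object from the
example below:

myFit <- lm(w ~ h, data = DAT)
myNew <- data.frame(h = seq(min(DAT$h), max(DAT$h), length.out = 50))
pc <- predict(myFit, newdata = myNew, interval = "confidence")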


Hope that helps,
Achim



On 4 Nov 2020, at 10:50 , Boris Steipe  wrote:

Can't get data from a data frame into predict() without a detour that seems 
quite unnecessary ...

Reprex:

# Data frame with simulated data in columns "h" (independent) and "w" 
(dependent)
DAT <- structure(list(h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118,
   1.755, 2.060, 2.136, 2.126, 1.792, 1.574,
   2.117, 1.741, 2.295, 1.526, 1.666, 1.581,
   1.522, 1.995),
 w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429,
   87.176, 90.318, 76.873, 84.183, 57.890, 62.005,
   84.258, 78.317,101.304, 64.982, 71.237, 77.124,
   65.010, 81.413)),
row.names = c( "1",  "2",  "3",  "4",  "5",  "6",  "7",
   "8",  "9", "10", "11", "12", "13", "14",
  "15", "16", "17", "18", "19", "20"),
class = "data.frame")


myFit <- lm(DAT$w ~ DAT$h)
coef(myFit)

# (Intercept)       DAT$h
#    11.76475    35.92002


# Create 50 x-values with seq() to plot confidence intervals
myNew <- data.frame(seq(min(DAT$h), max(DAT$h), length.out = 50))

pc <- predict(myFit, newdata = myNew, interval = "confidence")

# Warning message:
# 'newdata' had 50 rows but variables found have 20 rows

# Problem: predict() was not able to take the single column in myNew
# as the independent variable.

# Ugly workaround: but with that everything works as expected.
xx <- DAT$h
yy <- DAT$w
myFit <- lm(yy ~ xx)
coef(myFit)

myNew <- data.frame(seq(min(DAT$h), max(DAT$h), length.out = 50))
colnames(myNew) <- "xx"  # This fixes it!

pc <- predict(myFit, newdata = myNew, interval = "confidence")
str(pc)

# So: specifying the column in newdata to have same name as the coefficient
# name should work, right?
# Back to the original ...

myFit <- lm(DAT$w ~ DAT$h)
colnames(myNew) <- "`DAT$h`"
# ... same error

colnames(myNew) <- "h"
# ... same error again.

Bottom line: how can I properly specify newdata? The documentation is opaque. 
It seems the algorithm is trying to EXACTLY match the text of the RHS of the 
formula, which is unlikely to result in a useful column name, unless I assign 
to an intermediate variable. There must be a better way ...



Thanks!
Boris

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package 'coin' - How to extract rho

2020-04-29 Thread Achim Zeileis

On Wed, 29 Apr 2020, Blume Christine wrote:


Dear Achim,

Many thanks indeed. Yes, I had thought about this too and, fortunately, 
the bootstrapping confirms the results from the "normal" test. However, 
I was wondering whether it is mathematically acceptable to report the 
pvals from bootstrapping, but non-bootstrapped correlation coefficients. 
Does someone have an opinion on this?


First, coin doesn't do bootstrapping tests, it does permutation tests.

Second, reporting the rho coefficient from the original sample is surely 
ok, no matter whether you used a bootstrap test, unconditional asymptotic 
test, or permutation test. At least I wouldn't see any reason not to do 
so.


Best,
Z


-----Original Message-----
Von: Achim Zeileis 
Gesendet: Mittwoch, 29. April 2020 13:56
An: Blume Christine 
Cc: Jim Lemon ; torsten.hoth...@uzh.ch; 
r-help@r-project.org
Betreff: Re: [R] Package 'coin' - How to extract rho

Christine,

thanks for the example. As far as I can see, the "coin" package does not explicitly compute 
Spearman's rho. This is probably also the reason why it isn't reported in the test output. Thus, you would 
have to do this "by hand" using cor(..., method = "spearman").

Hope that helps,
Achim


On Wed, 29 Apr 2020, Blume Christine wrote:


Dear Jim,



Many thanks for following up on this. Sure, I can provide a sample code. At the 
end of the code you see that I extract the p-value and the test statistic. 
However, I cannot find the correlation coefficient rho anywhere in the object 
“r”.



Best,

Christine



if(!require(pacman)) install.packages("pacman")
pacman::p_load(sn, fGarch, coin)

# Happiness
central_tendency_sim <- 0
dispersion_sim <- 1
skewness_sim <- 1.5

N_sim <- 1

Happiness <- seq(from = 0, to = 10, length.out = N_sim)

# City size
central_tendency_sim <- 3
dispersion_sim <- 1
skewness_sim <- 1.5

Citysize <- seq(from = 1, to = 5, length.out = N_sim)

# create dataframe
datastat <- data.frame(Happiness, Citysize)

# Bootstrapped correlation
r <- spearman_test(Happiness ~ Citysize, data = datastat,
                   distribution = "approximate", alternative = c("two.sided"))
r
pvalue(r)
statistic(r)





-----Original Message-----
Von: Jim Lemon 
Gesendet: Mittwoch, 29. April 2020 05:53
An: Blume Christine 
Cc: r-help@r-project.org
Betreff: Re: [R] Package 'coin' - How to extract rho



Hi Christine,

I noticed that your question did not receive a reply. As I don't know exactly 
what you have tried, it is a bit difficult to suggest a solution. If you are 
still unable to get this to work, could you provide an example of your present 
code and data if necessary?



Jim



On Mon, Apr 27, 2020 at 3:09 AM Blume Christine wrote:






I am using the 'coin' package to compute bootstrapped correlations. I am able 
to extract the p-value with confidence intervals as well as the test statistic 
Z. However, I am unable to find rho, i.e. the correlation coefficient. Can 
someone help?







Kind regards,



Christine











[[alternative HTML version deleted]]







__



R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see



https://stat.ethz.ch/mailman/listinfo/r-help



PLEASE do read the posting guide



http://www.R-project.org/posting-guide.html



and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Complete month name from as.yearmon()

2019-04-13 Thread Achim Zeileis

On Sat, 13 Apr 2019, Christofer Bogaso wrote:


Hi,

I am wondering if there is any way to get the full name from as.yearmon()
function. Please consider below example:

library(quantmod)
as.yearmon(Sys.Date())

This gives: [1] "Apr 2019".

How can I extract the full name ie. 'April 2019'


Internally, printing/formatting of yearmon objects uses the format method 
for Date objects, see ?format.Date. The default is "%b %Y" where %b is the 
abbreviated month name in the current locale. %B gives you the full month 
name, both for Date directly and for yearmon:


R> format(Sys.Date(), "%B %Y")
[1] "April 2019"
R> format(as.yearmon(Sys.Date()), "%B %Y")
[1] "April 2019"



Appreciate your pointer. Thanks,

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strucchange Graph By Week and xts error

2019-03-06 Thread Achim Zeileis

On Thu, 7 Mar 2019, Sparks, John wrote:


Thanks to Achim's direction I now have a re-producible example.

The code below creates a ts object.  The x scale of the last graph runs 
from 0 to 700.


Yes, and hence breakpoints() re-uses that scaling. As I wrote in my 
previous mail you either have to squeeze your data into a regular grid of 
52 weekly observations per year or you have to keep track of the time 
index yourself. See below.
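
For the first option, a minimal sketch (assuming the series really starts 
in the first week of 2012; frequency = 52 is only an approximation for 
weekly data):

tsNile52 <- ts(as.numeric(Nile), start = c(2012, 1), frequency = 52)
bp52 <- breakpoints(tsNile52 ~ 1)
plot(tsNile52)
lines(confint(bp52, breaks = 1))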


So just need a way to get that scale to show the weeks (or some summary 
of them).


Thanks a bunch.
--JJS


library(strucchange)
library(xts)
library(lubridate)

#rm(list=ls())


data("Nile")
class(Nile)
plot(Nile)
bp.nile <- breakpoints(Nile ~ 1)
ci.nile <- confint(bp.nile, breaks = 1)
lines(ci.nile)


dfNile<-data.frame(as.numeric(Nile))
dfNile$week<-seq(ymd('2012-01-01'),ymd('2013-11-30'),by='weeks')
tsNile <- as.xts(x = dfNile[, -2], order.by = dfNile$week)
tsNile<-as.ts(tsNile)


plot(tsNile)
bp.tsNile <- breakpoints(tsNile ~ 1)
ci.tsNile <- confint(bp.tsNile, breaks = 1)
lines(ci.tsNile)


If you want to use your own non-ts time scale, you can use "xts" (as you 
do above) or "zoo" (as I do below) or keep thing in a plain "data.frame" 
(or similar). Then you just have to index the times with the breakpoints 
or their confidence intervals respectively:


## zoo series
x <- zoo(Nile, seq(ymd('2012-01-01'),ymd('2013-11-30'),by='weeks'))

## breakpoints and confidence intervals
bp <- breakpoints(x ~ 1)
ci <- confint(bp, breaks = 1)

## map time index
cix <- time(x)[ci$confint]

## visualize
plot(x)
abline(v = cix[2], lty = 2)
arrows(cix[1], min(x), cix[3], min(x),
  col = 2, angle = 90, length = 0.05, code = 3)

Above, the $confint is a vector. If it is a matrix (due to more than one 
breakpoint) the code needs to be tweaked to make cix also a matrix and 
then use cix[,i] rather than cix[i] for i = 1, 2, 3.
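
Something along these lines (an untested sketch, continuing the zoo 
example above):

## reshape to a matrix with one row per breakpoint
cix <- matrix(time(x)[ci$confint], ncol = 3)
plot(x)
abline(v = cix[, 2], lty = 2)
arrows(cix[, 1], min(x), cix[, 3], min(x),
  col = 2, angle = 90, length = 0.05, code = 3)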








From: Achim Zeileis 
Sent: Wednesday, March 6, 2019 6:11 PM
To: Sparks, John
Cc: r-help@r-project.org
Subject: Re: [R] strucchange Graph By Week and xts error

On Thu, 7 Mar 2019, Sparks, John wrote:


Hi R Helpers,

I am doing some work at identifying change points in time series data.
A very nice example is given in the R Bloggers post

https://www.r-bloggers.com/a-look-at-strucchange-and-segmented/

The data for the aswan dam in that example is yearly.  My data is
weekly.  I ran the code switching the data for the analysis to my data
and it worked, but the scale of the line chart is not sensible.  I have
225 weekly observations and the x-axis of the line graph shows numbers
from 0 to over 1500.  The information on the ts object is


Unfortunately, breakpoints() can only deal automatically with "ts" time
series not with zoo/xts/... So either you can squeeze your data onto a
regular "ts" grid which may work in the case of weekly data. Or you need
to handle the time index "by hand". See

https://stackoverflow.com/questions/43243548/strucchange-not-reporting-breakdates/43267082#43267082

for an example for this.

As for the as.xts() error below. This is because dfNile[, -2] is still a
"ts" object and then as.xts() sets up "order.by" for you.

Either you use xts() rather than as.xts() or you make the first column in
the data.frame "numeric" rather than "ts", e.g., by starting the
transformation with:

dfNile<-data.frame(as.numeric(Nile))



Start=1
End=1569
Frequency=0.1428...

I can't share the data because it is proprietary.

Wanting to be a good member of the list, I attempted to put weekly
increments on the Nile data so I could reproduce the x axis of the chart
with the axis scale that I am seeing.  Unfortunately, in doing so I got
another error that I don't understand.



library(strucchange)
library(lubridate)
library(xts)

# example from R-Blog runs fine
data("Nile")
plot(Nile)
bp.nile <- breakpoints(Nile ~ 1)
ci.nile <- confint(bp.nile, breaks = 1)
lines(ci.nile)

#problem comes in here
dfNile<-data.frame(Nile)
dfNile$week<-seq(ymd('2012-01-01'),ymd('2013-11-30'),by='weeks')
tsNile<-as.xts(x=dfNile[,-2],order.by=dfNile$week)

Error in xts(x.mat, order.by = order.by, frequency = frequency(x), ...) :
 formal argument "order.by" matched by multiple actual arguments


Can somebody help me to put together the ts object with weeks so that I can 
demonstrate the problem with the scale on the x-axis and then try to get some 
help with that original problem?

Much appreciated.
--John Sparks








   [[alternative HTML version deleted]]




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] getting 21 very different colours

2018-09-11 Thread Achim Zeileis

Have a look at the Polychrome package by Kevin Coombes and Guy Brock:
https://CRAN.R-project.org/package=Polychrome

This employs the LUV space (with HCL = polar LUV) to get many distinct 
distinguishable colors. For a few first steps, see:

https://CRAN.R-project.org/web/packages/Polychrome/vignettes/polychrome.html
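
A minimal sketch (createPalette() with seed colors as in the vignette 
linked above; the palette construction is randomized, hence the seed):

library("Polychrome")
set.seed(9497)
pal21 <- createPalette(21, c("#ff0000", "#00ff00", "#0000ff"))
swatch(pal21)  ## visual check of the 21 colors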

On Wed, 12 Sep 2018, Zach Simpson wrote:


Hi Federico

For a possible alternative, the scico package provides a nice
collection of color palettes that are designed to be both color-blind
friendly and differentiable:

https://www.data-imaginist.com/2018/scico-and-the-colour-conundrum/

You could generate a vector of 21 colors (spaced as far apart as
possible on the palette) to pass to your plot arguments with something
like:

library(scico)
scico(21, palette = 'oleron')

Not sure if this works for your case though. But maybe another feature
(shape?) could help differentiate the 21 points.

Hope this helps,
Zach Simpson


Message: 11
Date: Tue, 11 Sep 2018 07:34:51 +
From: Federico Calboli 
To: "r-help@r-project.org" 
Subject: [R] getting 21 very different colours
Message-ID: <08a6397b-4d67-4195-a53a-1fd394f72...@kuleuven.be>
Content-Type: text/plain; charset="utf-8"

Hi All,

I am plotting a scatterplot of 21 populations, and I am using 
rainbow(21)[pops.col] to generate 21 colours for the plot (which works).  Maybe 
it is because I can really process few colours at a time, but the differences 
between the colours are not as strong as I’d like.  I can specify start and end 
for rainbow(), but if anything that looks worse if I do not just stick to 0 and 
1.

Is there a way of getting a set of 21 colours that maximises the differences 
between them?

I could pick them by hand, but that is about 15 colours more than I know (I 
have a detailed colourchart, but the visual differences between ’skyblue’ and 
’slategrey’ elude me when plotted as dots on a plot).

Cheers

F
--
Federico Calboli
LBEG - Laboratory of Biodiversity and Evolutionary Genomics
Charles Deberiotstraat 32 box 2439
3000 Leuven
+32 16 32 87 67


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] F-test where the coefficients in the H_0 is nonzero

2018-08-09 Thread Achim Zeileis

On Thu, 9 Aug 2018, John wrote:


Hi,

  I try to run the same f-test by lm (with summary) and the function
"linearHypothesis" in car package. Why are the results (p-values for the
f-test) different?


The standard F test in the summary output tests the hypothesis that all 
coefficients _except the intercept_ are zero. Thus, all of these are the 
same:


summary(lm1)
## ...
## F-statistic: 0.3333 on 1 and 1 DF,  p-value: 0.6667

linearHypothesis(lm1, "x = 0")
## ...
##   Res.Df RSS Df Sum of Sq  F Pr(>F)
## 1  2 2.0
## 2  1 1.5  1   0.5 0.3333 0.6667

lm0 <- lm(y ~ 1, data = df1)
anova(lm0, lm1)
## ...
##   Res.Df RSS Df Sum of Sq  F Pr(>F)
## 1  2 2.0
## 2  1 1.5  1   0.5 0.3333 0.6667
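
And for the nonzero hypothesis (beta_0, beta_1) = (0, 1) from the start of 
this thread, a small sketch with the same toy data: the reparametrization 
y - x = beta_0 + beta_1' x suggested in the original posting yields the 
same F test as linearHypothesis() on the original model.

library("car")
## test (Intercept, x) = (0, 1) directly ...
linearHypothesis(lm1, c("(Intercept)=0", "x=1"))
## ... or equivalently test (0, 0) after reparametrizing
lm2 <- lm(I(y - x) ~ x, data = df1)
linearHypothesis(lm2, c("(Intercept)=0", "x=0"))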





df1<-data.frame(x=c(2,3,4), y=c(7,6,8))
lm1<-lm(y~x, df1)
lm1


Call:
lm(formula = y ~ x, data = df1)

Coefficients:
(Intercept)            x
        5.5          0.5


summary(lm1)


Call:
lm(formula = y ~ x, data = df1)

Residuals:
   1    2    3
 0.5 -1.0  0.5

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    5.500      2.693   2.043    0.290
x              0.500      0.866   0.577    0.667

Residual standard error: 1.225 on 1 degrees of freedom
Multiple R-squared:   0.25, Adjusted R-squared:   -0.5
F-statistic: 0.3333 on 1 and 1 DF,  p-value: 0.6667


linearHypothesis(lm1, c("(Intercept)=0", "x=0"))

Linear hypothesis test

Hypothesis:
(Intercept) = 0
x = 0

Model 1: restricted model
Model 2: y ~ x

 Res.Df   RSS Df Sum of Sq  F Pr(>F)
1  3 149.0
2  1   1.5  2 147.5 49.167 0.1003

2018-08-03 13:54 GMT+08:00 Annaert Jan :


You can easily test linear restrictions using the function
linearHypothesis() from the car package.
There are several ways to set up the null hypothesis, but a
straightforward one here is:


library(car)
x <- rnorm(10)
y <- x+rnorm(10)
linearHypothesis(lm(y~x), c("(Intercept)=0", "x=1"))

Linear hypothesis test

Hypothesis:
(Intercept) = 0
x = 1

Model 1: restricted model
Model 2: y ~ x

  Res.Df RSS Df Sum of Sq  F Pr(>F)
1 10 10.6218
2  8  9.0001  2    1.6217 0.7207 0.5155


Jan

From: R-help  on behalf of John <
miao...@gmail.com>
Date: Thursday, 2 August 2018 at 10:44
To: r-help 
Subject: [R] F-test where the coefficients in the H_0 is nonzero

Hi,

   I try to run the regression
   y = beta_0 + beta_1 x
   and test H_0: (beta_0, beta_1) =(0,1) against H_1: H_0 is false
   I believe I can run the regression
   (y-x) = beta_0 + beta_1' x
   and do the regular F-test (using lm functio) where the hypothesized
coefficients are all zero.

   Is there any function in R that deal with the case where the
coefficients are nonzero?

John

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package colorspace and .WhitePoint question

2018-06-21 Thread Achim Zeileis

As a follow-up to this issue:

A revised version of the "colorspace" package is now available on R-Forge 
at https://R-Forge.R-project.org/R/?group_id=20


This provides a function whitepoint() that can query and/or modify the 
whitepoint used in all color conversions within the package. To try it you 
can do:


install.packages("colorspace", repos="http://R-Forge.R-project.org;)
example("whitepoint", package = "colorspace")

Glenn, it would be great if you could try this and let us know if any 
problems remain.



On Mon, 4 Jun 2018, Achim Zeileis wrote:


Glenn,

currently, this is not exposed in "colorspace" AFAICS. You can 
modify it by changing .WhitePoint inside colorspace's NAMESPACE, though:


R> assignInNamespace(".WhitePoint", rbind(c(95, 100, 105)),
+ns = "colorspace")
R> as(XYZ(100, 100, 100), "LAB")
       L        A        B
[1,] 100 8.622384 3.226371

I'll have another look whether this could be exposed easily (cc also Paul).

Best,
Z

On Mon, 4 Jun 2018, Glenn Davis wrote:


In colorspace.R  I see:

   setAs("color", "LAB", function(from)
 LAB(.Call("as_LAB", from@coords, class(from), .WhitePoint, PACKAGE = "
colorspace"),
 names = dimnames(from@coords)[[1]]))
   ...
   .WhitePoint = NULL

and then in colorspace.c and the function CheckWhite(),
I see that .WhitePoint = NULL is converted to D65.

I would like to pass a different .WhitePoint to
   as( XYZ( 100,100,100)  , "LAB" )


I have tried 3 methods:
   as( XYZ( 100,100,100)  , "LAB", .WhitePoint=XYZ(95,100,105) )
   .WhitePoint = XYZ(95,100,105)
   assign( ".WhitePoint", XYZ(95,100,105), env=as.environment('package:
colorspace') )
but all fail, for different reasons.

How can I transform XYZ to LAB using a whitepoint different than D65 ?

Thanks,
Glenn Davis
gda...@gluonics.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Rolling window difference for zoo time series

2018-04-24 Thread Achim Zeileis

On Tue, 24 Apr 2018, Eric Berger wrote:


Zoo_TS/lag(Zoo_TS, -1) - 1


Or:

diff(Zoo_TS, arithmetic = FALSE) - 1
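
Both align the relative changes at the later time point; a quick check 
with the data from the question:

library("zoo")
Zoo_TS <- zoo(5:1, as.Date(Sys.time()) + 0:4)
diff(Zoo_TS, arithmetic = FALSE) - 1
## first element: 4/5 - 1 = -0.2 = (Zoo_TS[2] - Zoo_TS[1]) / Zoo_TS[1]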



On Tue, Apr 24, 2018 at 9:29 PM, Christofer Bogaso <
bogaso.christo...@gmail.com> wrote:


Hi,

I have a 'zoo' time series as below :

Zoo_TS = zoo(5:1, as.Date(Sys.time())+0:4)

Now I want to calculate First order difference of order 1, rolling
window basis i.e.

(Zoo_TS[2] - Zoo_TS[1] ) / Zoo_TS[1]
(Zoo_TS[3] - Zoo_TS[2] ) / Zoo_TS[2]
.

Is there any direct function available to achieve this?

Thanks,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/
posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Understanding TS objects

2018-03-13 Thread Achim Zeileis

On Tue, 13 Mar 2018, JEFFERY REICHMAN wrote:


R Help Community

I'm trying to understand time series (TS) objects.  Thought I understood 
but recently have run into a series of error messages that I'm not sure 
how to handle.  I have 15 years of quarterly data and I typically create 
a TS object via something like...


data.ts <- ts(mydata, start = 2002, frequency = 4)

this creates a matrix as opposed to a vector object


This depends on what "mydata" is which you haven't shown...

If "mydata" is a univariate vector, everything works ok:

mydata <- rnorm(15 * 4)
data.ts <- ts(mydata, start = 2002, frequency = 4)
data.stl <- stl(data.ts, "periodic")

However, if "mydata" is a matrix, e.g., a 3-column matrix in the example 
below:


mydata <- matrix(rnorm(15 * 4 * 3), ncol = 3)

then the error occurs.

Furthermore, the same problem will occur if mydata is 1-column matrix or a 
1-column data frame.


as I receive a univariate error when I try to decompose the data using 
the STL function


data.stl <- stl(data.ts, "periodic")
Error in stl(data.ts, "periodic") : only univariate series are allowed

ok so

is.vector(data.ts)
[1] FALSE


This is always FALSE for a "ts" object, even if it is univariate, because 
it has further attributes, namely the time-series properties (tsp).



so to convert to a vector I'll use
data.ts <- as.vector(data.ts)


This will drop the "ts" class.

The cleanest way is probably to create a vector "mydata" and then the 
univariate "data.ts".


hth,
Z


but then I lose the frequency and the periods, as the data becomes frequency = 1
data.ts <- stl <- stl(data.ts, "periodic")
Error in stl(data.ts, "periodic") :
  series is not periodic or has less than two periods.

So am I missing a parameter or is there a more general/proper way to create a 
time series object? First time I've run into this problem. I can always 
decompose via alternative methods, so there are workarounds. But just 
trying to understand what I'm not doing programmatically at this point.

Jeff Reichman

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] hurdle model - count and response predictions

2018-02-16 Thread Achim Zeileis
I answered the question on SO. In short the differences come from 
truncated vs. untruncated models and conditional vs. unconditional 
expectations. Feel free to follow-up on SO or here on the list...
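
In a nutshell, a sketch of the relationship shown there (assuming, as in 
the SO answer, that predict(..., type = "zero") for hurdle models yields 
the ratio of the probabilities of a positive outcome under the zero hurdle 
and under the untruncated count model):

library("pscl")
data("RecreationDemand", package = "AER")
m <- hurdle(trips ~ quality | ski, data = RecreationDemand, dist = "negbin")
nd <- data.frame(quality = 0:5, ski = "no")
## E(y) = P(y > 0) * E(y | y > 0) = "zero" ratio * untruncated count mean
all.equal(
  predict(m, newdata = nd, type = "response"),
  predict(m, newdata = nd, type = "zero") *
    predict(m, newdata = nd, type = "count")
)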


On Fri, 16 Feb 2018, John Wilson wrote:


Hello,

I'm using pscl to run a hurdle model. Everything works great until I get to
the point of making predictions. All of my "count" predictions are lower
than my actual data, and lower than the "response" predictions, similar to
the issue described here (
https://stat.ethz.ch/pipermail/r-help/2012-August/320426.html) and here (
https://stackoverflow.com/questions/48794622/hurdle-model-prediction-count-vs-response
).

Since the issue is the same (and not resolved), I'll just use the example
from the second link:

library("pscl")
data("RecreationDemand", package = "AER")

## model
m <- hurdle(trips ~ quality | ski, data = RecreationDemand, dist = "negbin")
nd <- data.frame(quality = 0:5, ski = "no")
predict(m, newdata = nd, type = "count")
predict(m, newdata = nd, type = "response")

The presence/absence part of the model gives identical estimates to a
logistic model run on the data. However, I thought that the negbin part of
the hurdle should give identical estimates to a separate, glm.nb model of
the positive data. But I get completely different values...

library(MASS)
m.nb <- glm.nb(trips ~ quality, data =
RecreationDemand[RecreationDemand$trips > 0,])
predict(m, newdata = nd, type = "count") ## hurdle
predict(m.nb, newdata = nd, type = "response") ## positive counts only

Any help would be appreciated.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting quarterly time series

2018-01-28 Thread Achim Zeileis

On Sun, 28 Jan 2018, p...@philipsmith.ca wrote:

I have a data set with quarterly time series for several variables. The 
time index is recorded in column 1 of the dataframe as a character 
vector "Q1 1961", "Q2 1961","Q3 1961", "Q4 1961", "Q1 1962", etc. I want 
to produce line plots with ggplot2, but it seems I need to convert the 
time index from character to date class. Is that right? If so, how do I 
make the conversion?


You can use the yearqtr class in the zoo package, converting with 
as.yearqtr(..., format = "Q%q %Y"). zoo also provides an autoplot() method 
for ggplot2-based time series visualizations. See ?autoplot.zoo for 
various examples.


## example data similar to your description
d <- data.frame(sin = sin(1:8), cos = cos(1:8))
d$time <- c("Q1 1961", "Q2 1961", "Q3 1961", "Q4 1961", "Q1 1962",
  "Q2 1962", "Q3 1962", "Q4 1962")

## convert to zoo series
library("zoo")
z <- zoo(as.matrix(d[, 1:2]), as.yearqtr(d$time, "Q%q %Y"))

## ggplot2 display
library("ggplot2")
autoplot(z)

## with nicer axis scaling
autoplot(z) + scale_x_yearqtr()

## some variations
autoplot(z, facets = Series ~ .) + scale_x_yearqtr() + geom_point()
autoplot(z, facets = NULL) + scale_x_yearqtr() + geom_point()



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Discrete valued time series data sets.

2018-01-02 Thread Achim Zeileis
The "tscount" package (see http://doi.org/10.18637/jss.v082.i05) comes 
with several count data time series. Maybe this is the kind of discrete 
data you were interested in?
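
One example, if I remember the dataset name correctly, is the 
campylobacterosis series used in the paper:

data("campy", package = "tscount")
plot(campy)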


hth,
Z

On Tue, 2 Jan 2018, Eric Berger wrote:


Hi Rolf,
I looked at
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-public-data-sets

One of the first sets in the list is the airline time series (I think it is
also used in dplyr examples).

https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp

You might find other possibilities in that list.

HTH,
Eric


On Tue, Jan 2, 2018 at 12:44 AM, Rolf Turner 
wrote:



I am looking for (publicly available) examples of discrete valued time
series data sets.  I have googled around a bit and have found lots of
articles and books on discrete valued time series, but have had no success
in locating sites at which data are available.

Can anyone make any useful suggestions?

Thanks.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posti
ng-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fortune candidate

2017-12-17 Thread Achim Zeileis

On Sun, 17 Dec 2017, J C Nash wrote:


From Dirk E. on the R-devel list


Many things a developer / power-user would do are very difficult on
Windows. It is one of the charms of the platform. On the other hand you do
get a few solitaire games so I guess everybody is happy.


Thanks, added to the devel version of "fortunes" on R-Forge.
Z



JN

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Graph f(x) = 1/x

2017-09-19 Thread Achim Zeileis



On Tue, 19 Sep 2017, AbouEl-Makarim Aboueissa wrote:


Dear All: good morning

I am trying to graph the function y=f(x)=1/x over the interval (-5,5). But
I am getting an error message. Please see below.

I am getting the error message: Error in xy.coords(x, y, xlabel, ylabel,
log) :
  'x' and 'y' lengths differ


You have "y < 1/x" rather than "y <- 1/x"! So "y" is not assigned and 
presumably you have some old "y" variable in your global environment that 
is used and does not match the length of "x".





x <- seq(-5, 5, 0.01)
y < 1/x

plot(x,y, type='l', xlim=c(-5, 5), ylim=c(-5, 5), xlab = "x", ylab = "f(x)
= 1/x", lwd = 2, col ="red")

abline(h=0, lty=2, col = "blue")
abline(v=0, lty=2, col = "blue")
axis(1)
axis(2)
title(main="The Graph of f(x) = 1/x")


any help will be highly appreciated.


with thanks
abou
__
AbouEl-Makarim Aboueissa, PhD
Professor of Statistics
Department of Mathematics and Statistics
University of Southern Maine

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Precision error in time index of ts objects

2017-09-02 Thread Achim Zeileis

On Sat, 2 Sep 2017, Andrea Altomani wrote:


Thanks for the very detailed explanation.
I did not create the series using structure(), that was the result of dump()
on an intermediate object created within tsdisagg::ta(),


There is no tsdisagg package on CRAN, just tsdisagg2. But this does not 
have a function ta(). So I guess it's tempdisagg you are using?


which is where I found the error in the first place. ta() indeed 
manipulates .Tsp directly, rather than using ts. I guess this is a bug 
in tsdisagg then.


I just grabbed the latest version of tempdisagg from CRAN and this does 
not seem to have ".Tsp" anywhere in the code. It employs ts() in a couple 
of places so I'm not sure which part of the code you are referring to 
exactly.



On Sat, Sep 2, 2017 at 12:31 AM Achim Zeileis <achim.zeil...@uibk.ac.at>
wrote:
  On Fri, 1 Sep 2017, Andrea Altomani wrote:

  > I should have formulated my question in a more specific way.
  >
  > 1. I suspect this is a floating point precision issue. I am
  not very
  > knowledgeable about R internals, can someone else confirm it?

  Yes. If you represent a series with increment 1/12 it depends on
  how you
  do it. As a simple example consider the following two
  descriptions of the
  same time point:

  2 - 1/12
  ## [1] 1.916667

  1 + 11/12
  ## [1] 1.916667

  However, both are not identical:

  (2 - 1/12) == (1 + 11/12)
  ## [1] FALSE

  The difference is just the .Machine$double.eps:

  (2 - 1/12) - (1 + 11/12)
  ## [1] 2.220446e-16

  > 2. Should this be considered a bug or not, because it is "just
  a
  > precision issue"? Should I report it?

  I don't think it is a bug because of the (non-standard) way how
  you
  created the time series.

  > 3. How can it happen? From a quick review of ts.R, it looks
  like the values
  > of the time index are never modified, but only possibly
  removed. In my case:
  >   - x and y have the same index.
  >   - the subtraction operator recognizes this, and create a new
  ts with one
  > entry
  >   - the result of the subtraction has an index which is
  different from the
  > input.
  >  This is very surprising to me, and I am curious to understand
  the problem.

  The object 'x' and hence the object 'y' have the same time
  index. But in
  'z' a new time index is created which is subtly different from
  that of
  'x'. The reason for this is that R doesn't expect an object like
  'x' to
  exist.

  You should create a "ts" object with ts(), e.g.,

   x <- ts(2017, start = c(2017, 6), frequency = 12)

  But you created something close to the internal
  representation...but not
  close enough:

  y <- structure(2017, .Tsp = c(2017.416667, 2017.416667, 12),
  class = "ts")

  The print function prints both print(x) and print(y) as

         Jun
  2017 2017

  However, aligning the two time indexes in x - y or
  ts.intersect(x, y) does
  not work...because they are not the same

  as.numeric(time(x)) - as.numeric(time(y))
  ## [1] -3.32e-07

  The "ts" code tries to avoid these situations by making many
  time index
  comparisons only up to a precision of getOption("ts.eps") (1e-5
  by
  default) but this is not used everywhere. See ?options:

       'ts.eps': the relative tolerance for certain time series
  ('ts')
             computations.  Default '1e-05'.

  Of course, you could ask for this being used in more places,
  e.g., in
  stats:::.cbind.ts() where (st > en) is used rather than ((st -
  en) >
  getOption("ts.eps")). But it's probably safer to just use ts()
  rather than
  structure(). Or if you use the latter make sure that you do at a
  high
  enough precision.

  hth,
  Z


  > On Fri, Sep 1, 2017 at 5:53 PM Jeff Newmiller
  <jdnew...@dcn.davis.ca.us>
  > wrote:
  >
  >> You already know the answer. Why ask?
  >> --
  >> Sent from my phone. Please excuse my brevity.
  >>
  >> On September 1, 2017 7:23:24 AM PDT, Andrea Altomani <
  >> altomani.and...@gmail.com> wrote:
  >>> I have a time series x, and two other series obtained from
  it:
  >>>
  >>> x <- structure(2017, .Tsp = c(2017.417,
  2017.417, 12),
  >>> class = "ts")
  >>> y <- floor(x)
  >>> z <- x-y
  >>>
  >>> I would expect the three series to have exactly the same
  index.
  >>> However I get the following
  >>>

Re: [R] Precision error in time index of ts objects

2017-09-01 Thread Achim Zeileis

On Fri, 1 Sep 2017, Andrea Altomani wrote:


I should have formulated my question in a more specific way.

1. I suspect this is a floating point precision issue. I am not very 
knowledgeable about R internals, can someone else confirm it?


Yes. If you represent a series with increment 1/12 it depends on how you 
do it. As a simple example consider the following two descriptions of the 
same time point:


2 - 1/12
## [1] 1.916667

1 + 11/12
## [1] 1.916667

However, both are not identical:

(2 - 1/12) == (1 + 11/12)
## [1] FALSE

The difference is just the .Machine$double.eps:

(2 - 1/12) - (1 + 11/12)
## [1] 2.220446e-16

2. Should this be considered a bug or not, because it is "just a 
precision issue"? Should I report it?


I don't think it is a bug because of the (non-standard) way how you 
created the time series.



3. How can it happen? From a quick review of ts.R, it looks like the values
of the time index are never modified, but only possibly removed. In my case:
  - x and y have the same index.
  - the subtraction operator recognizes this, and create a new ts with one
entry
  - the result of the subtraction has an index which is different from the
input.
 This is very surprising to me, and I am curious to understand the problem.


The object 'x' and hence the object 'y' have the same time index. But in 
'z' a new time index is created which is subtly different from that of 
'x'. The reason for this is that R doesn't expect an object like 'x' to 
exist.


You should create a "ts" object with ts(), e.g.,

x <- ts(2017, start = c(2017, 6), frequency = 12)

But you created something close to the internal representation...but not 
close enough:


y <- structure(2017, .Tsp = c(2017.416667, 2017.416667, 12), class = "ts")

The print function prints both print(x) and print(y) as

  Jun
2017 2017

However, aligning the two time indexes in x - y or ts.intersect(x, y) does 
not work...because they are not the same


as.numeric(time(x)) - as.numeric(time(y))
## [1] -3.32e-07

The "ts" code tries to avoid these situations by making many time index 
comparisons only up to a precision of getOption("ts.eps") (1e-5 by 
default) but this is not used everywhere. See ?options:


'ts.eps': the relative tolerance for certain time series ('ts')
  computations.  Default '1e-05'.

Of course, you could ask for this being used in more places, e.g., in 
stats:::.cbind.ts() where (st > en) is used rather than ((st - en) > 
getOption("ts.eps")). But it's probably safer to just use ts() rather than 
structure(). Or if you use the latter make sure that you do at a high 
enough precision.


hth,
Z



On Fri, Sep 1, 2017 at 5:53 PM Jeff Newmiller 
wrote:


You already know the answer. Why ask?
--
Sent from my phone. Please excuse my brevity.

On September 1, 2017 7:23:24 AM PDT, Andrea Altomani <
altomani.and...@gmail.com> wrote:

I have a time series x, and two other series obtained from it:

x <- structure(2017, .Tsp = c(2017.417, 2017.417, 12),
class = "ts")
y <- floor(x)
z <- x-y

I would expect the three series to have exactly the same index.
However I get the following


time(x)-time(y)

Jun
2017   0

as expected, but


time(x)-time(z)

integer(0)
Warning message:
In .cbind.ts(list(e1, e2), c(deparse(substitute(e1))[1L],
deparse(substitute(e2))[1L]),  :
 non-intersecting series

and indeed, comparing the indices gives:


time(x)[1]-time(z)[1]

[1] 3.183231e-12

Is this a bug in R, or is it one of the expected precision errors due
to the use of limited precision floats?

I am using R 3.4.0 (2017-04-21) on Windows (64-bit).

Thaks!

Andrea Altomani

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-Package for Recursive Partitioning without Classification or Regression

2017-07-31 Thread Achim Zeileis



On Fri, 28 Jul 2017, Tom D. Harray wrote:


Hello,

I have a question related to recursive partitioning, but I cannot find
an answer, likely because I don't know how to properly word my Google
search query.


I think you are looking for "divisive hierarchical clustering" which is 
the more commonly used term for clustering based on recursive 
partitioning. The classical implementation for this would be diana() from 
the "cluster" package. The "Cluster" task view at 
https://CRAN.R-project.org/view=Cluster also lists "isopam", and a search at
http://search.r-project.org/cgi-bin/namazu.cgi?query=divisive+clustering=functions=views
gives a few further leads.
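
A minimal sketch of the divisive route with cluster::diana():

library("cluster")
dv <- diana(iris[, 1:4])            ## dissimilarities computed internally
plot(dv, which.plots = 2)           ## dendrogram
head(cutree(as.hclust(dv), k = 3))  ## cut into 3 groups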


All recursive partitioning examples, which I can find, are used for
either classification or regression trees like

  library(tree)
  data(iris)
  tree(Species ~ Sepal.Width + Petal.Width, data = iris)

which implies building a model. However, I would like to split data
like clustering similar to decision tree methods, because I have
nothing to predict.


But do you want to split along observed variables or not? If not, then 
you're in a unsupervised clustering situation (see my comment above). But 
if you want to split along covariates, then this is supervised and you're 
possibly looking for a multivariate regression tree?


Hope that helps,
Z


My question is: Is there a package, which I can use to partition my
data without classification or regression so that it resembles
clustering methods?


Thanks and regards,

Dirk

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Classification and Regression Tree for Survival Analysis

2017-06-13 Thread Achim Zeileis

On Tue, 13 Jun 2017, Dimitrie Siriopol via R-help wrote:


I am trying to use the CART in a survival analysis. I have three variables of 
interest (all 3 ordinal - x, y and z, each of them with 5 categories) from 
which I want to make smaller groups (just an example 1st category from X 
variable with the 2nd and 3rd categories from the Y category and 2, 3 and 4 
categories from the Z category etc) based on their, let's say, association with 
mortality.
Now I would also want that this analysis to be adjusted for a number of 
variables (that I don't want to incorporate in the decision tree, just to take 
them into consideration in the relationship between the 3 variables and the 
outcome; I would also want to mention that for this confounders I have missing 
values - how should this be deal with?), this survival analysis to be 
stratified and also to use clusters.
I have tried party and rpart packages, but I don't seem to get how to properly 
do what I want.


I don't think that such an analysis is available "out of the box". In 
principle, you can iterate between (a) estimating a survival regression 
with the confounders - given the groups from the tree, and (b) estimating 
the tree - given an offset in the survival regression for the confounders. 
Such a strategy is implemented in the palmtree() function from the 
"partykit" package - however only for lm() and glm() models, not for 
survreg(). But the same idea could be applied in that case as well, e.g., 
using a Weibull distribution.


For incorporating stratification/clustering one could either use clustered 
inference in the variable selection or add some random effect. For lm/glm 
this is provided in the package "glmertree" but I don't think there are 
readily available code blocks to do the same for a survival response.


And as for the missing values in the confounders: I can't think of a good 
strategy for this. One could try generic imputation strategies but it's 
rather unlikely that this does not affect the subsequent regression plus 
tree selection process.


References for palmtree and glmertree:
http://arxiv.org/abs/1612.07498
http://EconPapers.RePEc.org/RePEc:inn:wpaper:2015-10


Thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Augmented Dickey Fuller test

2017-04-28 Thread Achim Zeileis

On Fri, 28 Apr 2017, T.Riedle wrote:


Dear all,

I am trying to run an ADF test using the adf.test() function in the 
tseries package and the ur.df() function in the urca package. The 
results I get contrast sharply. Whilst the adf.test() indicates 
stationarity which is in line with the corresponding graph, the ur.df() 
indicates non-stationarity.


Why does this happen?


This is likely due to different settings for the deterministic part of the 
model and/or the number of lags tested. The defaults of ur.df() are often 
not suitable for many practical applications, which might lead to spurious 
significant results.


Could anybody explain the adf.test() function in more detail? How does 
adf.test() select the number of lags is it AIC or BIC and how does it 
take an intercept and/or a trend into account?


There is a deterministic trend and the default number of lags is selected 
by a heuristic.
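
Concretely, a sketch of aligning the two functions (adf.test()'s default 
lag order is trunc((n - 1)^(1/3)) and its test regression includes both 
constant and trend):

library("tseries")
library("urca")
set.seed(1)
x <- cumsum(rnorm(100))            ## example random walk
k <- trunc((length(x) - 1)^(1/3))  ## adf.test's default lag order
adf.test(x, k = k)
summary(ur.df(x, type = "trend", lags = k))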


At

https://stats.stackexchange.com/questions/168332/r-augmented-dickey-fuller-adf-test/168355#168355

I've summarized an overview that I had written for my students. It might 
also be helpful for you.


hth,
Z

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon Test

2017-04-21 Thread Achim Zeileis

On Fri, 21 Apr 2017, peter dalgaard wrote:

Also, as far as I know just for historical consistency, the test 
statistic in R is the rank sum of the first group MINUS its minimum 
possible value: W = 110.5 - sum(1:13) = 19.5


Ah, yes, I meant to add that remark. And coin::wilcox_test always computes 
a standardized test statistic as opposed to the (adjusted) rank sum. But 
these are all "simple" transformations of the test statistic and hence do 
not influence the p-values.


See also the "Note" in ?wilcox.test on the difference between so-called 
Wilcoxon and Mann-Whitney statistics.



On 21 Apr 2017, at 14:54 , Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:

On Fri, 21 Apr 2017, Tripoli Massimiliano wrote:


Dear R users,
Why is the result of the Wilcoxon rank sum test in R different from SAS?

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_npar1way_sect022.htm

The code is next:

sampleA <- c(1.94, 1.94, 2.92, 2.92, 2.92, 2.92, 3.27, 3.27, 3.27, 3.27,
  3.7, 3.7, 3.74)

sampleB <- c(3.27, 3.27, 3.27, 3.7, 3.7, 3.74)
wilcox.test(sampleA, sampleB, paired = FALSE)


There are different ways how to compute or approximate the asymptotic or exact 
conditional distribution of the test statistic:

SAS reports an asymptotic normal approximation (apparently without continuity 
correction) along with an asymptotic t approximation and the exact conditional 
distribution.

Base R's stats::wilcox.test can either report the exact conditional 
distribution (but only if there are no ties) or the asymptotic normal 
distribution (with or without continuity correction). In small samples the 
default is to use the former but a warning is issued when there are ties (as in 
your case).

Furthermore, coin::wilcox_test can report either the asymptotic normal 
distribution (without continuity correction) or the exact conditional 
distribution (even in the presence of ties).

Thus:

## collect data in data.frame
d <- data.frame(
 y = c(sampleA, sampleB),
  x = factor(rep(0:1, c(length(sampleA), length(sampleB))))
)

## asymptotic normal distribution without continuity correction
## (p = 0.0764)
stats::wilcox.test(y ~ x, data = d, exact = FALSE, correct = FALSE)
coin::wilcox_test(y ~ x, data = d, distribution = "asymptotic")

## exact conditional distribution (p = 0.1054)
coin::wilcox_test(y ~ x, data = d, distribution = "exact")

These match SAS's results. The default result of stats::wilcox.test is 
different as explained by the warning issued.

hth,
Z

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com












__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon Test

2017-04-21 Thread Achim Zeileis

On Fri, 21 Apr 2017, Tripoli Massimiliano wrote:


Dear R users,
Why is the result of the Wilcoxon rank sum test in R different from SAS?

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_npar1way_sect022.htm

The code is next:

sampleA <- c(1.94, 1.94, 2.92, 2.92, 2.92, 2.92, 3.27, 3.27, 3.27, 3.27,
   3.7, 3.7, 3.74)

sampleB <- c(3.27, 3.27, 3.27, 3.7, 3.7, 3.74)
wilcox.test(sampleA, sampleB, paired = FALSE)


There are different ways how to compute or approximate the asymptotic or 
exact conditional distribution of the test statistic:


SAS reports an asymptotic normal approximation (apparently without 
continuity correction) along with an asymptotic t approximation and the 
exact conditional distribution.


Base R's stats::wilcox.test can either report the exact conditional 
distribution (but only if there are no ties) or the asymptotic normal 
distribution (with or without continuity correction). In small samples the 
default is to use the former but a warning is issued when there are ties 
(as in your case).


Furthermore, coin::wilcox_test can report either the asymptotic normal 
distribution (without continuity correction) or the exact conditional 
distribution (even in the presence of ties).


Thus:

## collect data in data.frame
d <- data.frame(
  y = c(sampleA, sampleB),
   x = factor(rep(0:1, c(length(sampleA), length(sampleB))))
)

## asymptotic normal distribution without continuity correction
## (p = 0.0764)
stats::wilcox.test(y ~ x, data = d, exact = FALSE, correct = FALSE)
coin::wilcox_test(y ~ x, data = d, distribution = "asymptotic")

## exact conditional distribution (p = 0.1054)
coin::wilcox_test(y ~ x, data = d, distribution = "exact")

These match SAS's results. The default result of stats::wilcox.test is 
different as explained by the warning issued.


hth,
Z

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Piecewise continuous Poisson regression

2017-04-07 Thread Achim Zeileis

On Fri, 7 Apr 2017, Sorkin, John wrote:

Is there an R package that will perform a piecewise continuous Poisson 
regression? I want to model two linear segments that intersect at a 
common knot.


The "segmented" package implements such broken stick regressions based on 
either "lm" or "glm" models. The latter include Poisson regression.



Thank you,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using betareg package to fit beta mixture with given initial parameters

2017-03-22 Thread Achim Zeileis

On Wed, 22 Mar 2017, Michael Dayan wrote:


The method of setting the initial values given lambda, alpha1, etc. should
not depend on the exact values of lambda, alpha1, etc. in my situation,
i.e. it does not depend on my data.


Presently, flexmix() that betamix() is built on cannot take the parameters 
directly for initialization. However, it is possible to pass a matrix with 
initial 'cluster' probabilities. This can be easily generated using 
dbeta().


For a concrete example consider the data generated in this discussion on 
SO:


http://stats.stackexchange.com/questions/114959/mixture-of-beta-distributions-full-example

Using that data with random starting values requires 42 iterations until 
convergence:


set.seed(0)
m1 <- betamix(y ~ 1 | 1, data = d, k = 2)
m1

## Call:
## betamix(formula = y ~ 1 | 1, data = d, k = 2)
## 
## Cluster sizes:

##   1   2
##  50 100
## 
## convergence after 42 iterations


Instead we could initialize with the posterior probabilities obtained at 
the observed data (d$y), the true alpha/beta parameters (10; 30 vs. 30; 
10) and the true cluster proportions (2/3 vs. 1/3):


p <- cbind(2/3 * dbeta(d$y, 10, 30), 1/3 * dbeta(d$y, 30, 10))
p <- p/rowSums(p)

This converges after only 2 iterations:

set.seed(0)
m2 <- betamix(y ~ 1 | 1, data = d, k = 2, cluster = p)
m2

## Call:
## betamix(formula = y ~ 1 | 1, data = d, k = 2, cluster = p)
## 
## Cluster sizes:

##   1   2
## 100  50
## 
## convergence after 2 iterations


Up to label switching and small numerical differences, the parameter 
estimates agree. (Of course, these are on the mu/phi scale and not 
alpha/beta as explained in the SO post linked above.)


coef(m1)
##        (Intercept) (phi)_(Intercept)
## Comp.1    1.196286          3.867808
## Comp.2   -1.096487          3.898976
coef(m2)
##        (Intercept) (phi)_(Intercept)
## Comp.1   -1.096487          3.898976
## Comp.2    1.196286          3.867808
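
For completeness, a small sketch of that conversion (assuming the default
logit link for mu and log link for phi):

cf <- coef(m2)
mu    <- plogis(cf[, "(Intercept)"])       ## inverse logit link
phi   <- exp(cf[, "(phi)_(Intercept)"])    ## inverse log link
alpha <- mu * phi
beta  <- (1 - mu) * phi
cbind(alpha, beta)  ## close to (10, 30) and (30, 10) up to label switching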



On Mar 22, 2017 04:30, "David Winsemius"  wrote:



On Mar 21, 2017, at 5:04 AM, Michael Dayan wrote:


Hi,

I would like to fit a mixture of two beta distributions with parameters
(alpha1, beta1) for the first component, (alpha2, beta2) for the second
component, and lambda for the mixing parameter. I also would like to set a
maximum of 200 iterations and a tolerance of 1e-08.

My question is: how to use the betareg package to run the fit with initial
values for the parameters alpha1, beta1, alpha2, beta2 and lambda? I saw in
the documentation that I would need to use the 'start' option of the
betareg function, with start described as "an optional vector with starting
values for all parameters (including phi)". However I could not find how to
define this list given my alpha1, beta1, alpha2, beta2 and lambda.

The current code I have is:
mydata$y <- 
bmix <- betamix(y ~ 1 | 1, data = mydata, k = 2, fstol = 1e-08, maxit =
200, fsmaxit = 200)


And I suspect I would need to do something along the lines:

initial.vals <- c(?, ?, ?, ?, ?)
bmix <- betamix(y ~ 1 | 1, data = mydata, k = 2, fstol = 1e-08, maxit =
200, fsmaxit = 200, control=betareg.control(start=initial.vals))

But I do not know what to use for initial.vals.


If there were sensitivity to data, then wouldn't  that depend on your
(unprovided) data?




Best wishes,

Michael

  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/

posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


David Winsemius
Alameda, CA, USA

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coeftest with covariance matrix

2017-03-16 Thread Achim Zeileis


On Thu, 16 Mar 2017, alfonso.carf...@uniparthenope.it wrote:


Hi all,


I want to ask what the difference is between specifying and not 
specifying the covariance matrix of the estimated coefficients when 
performing the coeftest command.


coeftest(object, ...) computes Wald statistics for all coefficients. Hence 
coef(object) is used to extract the coefficients and then, by default, 
vcov(object) is used to extract the variance-covariance matrix. For lm() 
models this computes the "usual" covariance matrix estimate assuming 
homoskedastic and uncorrelated errors.


When you supply coeftest(object, vcov = vcovHC) then a 
heteroscedasticity-consistent covariance matrix estimate is used (HC3 by 
default).


See vignette("sandwich", package = "sandwich") for more details.
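
For illustration, a minimal sketch with a simple lm() fit (the 'cars' data
are just a stand-in for your application):

library("lmtest")
library("sandwich")
fm <- lm(dist ~ speed, data = cars)
coeftest(fm)                 ## uses vcov(fm), the "usual" covariances
coeftest(fm, vcov = vcovHC)  ## HC3 sandwich covariances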

I'm estimating a VECM model and I want to test the significance of the 
short-run casual effects of the explanatory variables:


mod<-cajorls(ca.jo(data[,4:6], ecdet = "const", type="eigen", K=2, 
spec="longrun"))$rlm


The command:

coeftest(mod)

give me different results with respect to this one:

V<-vcovHC(mod)
coeftest(mod,V)




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmtest package - Difference between vcov and vcov.

2017-02-19 Thread Achim Zeileis

On Sun, 19 Feb 2017, T.Riedle wrote:


Dear all,

I want to run a regression using coeftest() in combination with the 
waldtest() function from the lmtest package. I am confused about the 
argument vcov. The coeftest uses vcov. whereas according to the manual 
waldtest uses vcov and I am not sure about the difference between vcov. 
in coeftest() and vcov in waldtest().


In both cases this is intended to be used as "vcov = ..." by the end-user, 
e.g., "vcov = sandwich" etc. The reason why "vcov." rather than "vcov" was 
used was to avoid name confusions in pre-NAMESPACE times.


The 'trick' with vcov. (rather than vcov) could not be used in waldtest() 
because that always requires exact matching due to the preceding '...' 
argument.


If I use vcov. and vcov in the waldtest, I get different results for the 
F-test and the p-value. In addition, vcov. returns an error message that 
for numeric model specifications all values have to be >=1.


Yes, because then the specification goes into '...' rather than 'vcov' and 
is interpreted as a rule for model updating. This cannot work, though.
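
So the intended usage is, e.g. (a sketch with a stand-in lm() fit):

library("lmtest")
library("sandwich")
fm <- lm(dist ~ speed, data = cars)
coeftest(fm, vcov. = vcovHC)  ## 'vcov.' in coeftest()
waldtest(fm, vcov = vcovHC)   ## 'vcov' in waldtest()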


The sandwich package vignette (e.g. p. 10) uses vcov = ... as argument 
in the coeftest() function.



Hence, my question is which argument to use in the two functions coeftest() 
and waldtest(). Shall I use vcov. in coeftest() and vcov in waldtest() or 
should I use vcov in both functions?



I kindly ask for your help.



Thanks in advance.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coeftest() R squared

2017-02-15 Thread Achim Zeileis

On Wed, 15 Feb 2017, T.Riedle wrote:



Dear all,
I want to run a regression using lm() with Newey West corrected standard errors.

This is the code

Reg<-lm(g~sent + liquidity + Cape, data=dataUsa)
CoefNW<-coeftest(Reg, vcov.=NeweyWest)
CoefNW

In contrast to summary(Reg) the output of CoefNW neither returns the 
adjusted R squared nor the F-statistic. How can I obtain the R squared 
for coeftest? Alternatively, how do I get robust standard errors and the 
R squared of the regression?


The adjusted analogue to the F statistic can be obtained by
waldtest(Reg, vcov = NeweyWest)

For the R-squared there is no appropriate quantity with analogous
properties. Hence nothing is provided in the package.


Thanks for your help.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Date Column error: 'origin' must be supplied

2016-12-30 Thread Achim Zeileis

On Fri, 30 Dec 2016, Frederic Ntirenganya wrote:


Hi All,

I am creating date column on my data but getting the following error:

#add date column
dat1$Date = paste(as.Date(dat1$Year, dat1$Month, dat1$Day, sep="-"))
Error in as.Date.numeric(origin, ...) : 'origin' must be supplied


You first need to paste() the character string and then coerce it with 
as.Date() - not the other way around:


as.Date(paste(dat1$Year, dat1$Month, dat1$Day, sep="-"))




I will appreciate any help from you guys. Thanks.
Here is the data.


dput(head(dat1))structure(list(Year = c(1984L, 1984L, 1984L, 1984L, 1984L, 1984L
), Month = c(1L, 1L, 1L, 1L, 1L, 1L), Day = 1:6, WindSpeed = c(5L,
4L, 4L, 3L, 5L, 6L), Sunshine = c(6.3, 4.8, 0.6, 8.2, 7.3, 1.7
), Tmax = c(27.4, 26.3, 22.9, 27.7, 28.5, 25.5), Tmin = c(14.5,
16, 14.4, 14.8, 16.6, 15.4), Hmax = c(100L, 95L, 97L, 100L, 97L,
99L), Hmin = c(45L, 62L, 72L, 55L, 54L, 63L), Station.Name = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "KIGALI AERO", class = "factor"),
   Elevation = c(1490L, 1490L, 1490L, 1490L, 1490L, 1490L),
   Longitude = c(30.11, 30.11, 30.11, 30.11, 30.11, 30.11),
   Latitude = c(-1.95, -1.95, -1.95, -1.95, -1.95, -1.95)), .Names = c("Year",
"Month", "Day", "WindSpeed", "Sunshine", "Tmax", "Tmin", "Hmax",
"Hmin", "Station.Name", "Elevation", "Longitude", "Latitude"),
row.names = c(NA,
6L), class = "data.frame")


Best Regards,

Fredo


Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tests for significance on conditional inference trees from party package

2016-12-13 Thread Achim Zeileis

Adrian,

thanks for your interest.

On Tue, 13 Dec 2016, Adrian Johnson wrote:


Dear group,
Please allow me to ask a naive question, and pardon me if it comes across
as a stupid question.

I am using party package to classify covariates and predict distribution 
of survival times for the classified variables. Typically I have a 
matrix of covariates (columns) including outcome data (overall survival 
in months, censor status) and other covariates I want to split in tree 
(such as treatment dose etc. ) . Rows are patients (~1000 patients).


Now, similarly, I have many such matrices (4,000) with completely different 
sets of covariates but identical outcome data and patients (in rows). I 
cannot combine all data into a giant matrix, because these covariates are 
totally independent.


If the response variable is the same and the patients are the same, then I 
don't see why - conceptionally - you couldn't combine "totally 
independent" variables in the same tree. Or maybe I misunderstand what 
"totally independent" is.


Practically - however, choosing a tree from 4,000 regressor variables will 
be challenging, especially if you want to adjust in some way for the 
multiple testing. So maybe some additional structure would help here.



Currently I am running this model in a loop and storing the tree and
parsing the tree structure.


Parsing the tree structure is quite cumbersome in the old "party" 
implementation. This was one of the main motivations to establish the 
reimplementation in "partykit". This has a much better and more accessible 
tree infrastructure. See the vignettes in the "partykit" package for more 
details - especially vignette("partykit", package = "partykit") gives a 
good overview of the building blocks.
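
As a small illustration of that infrastructure (iris is just a stand-in
for your data):

library("partykit")
ct <- ctree(Species ~ ., data = iris)
nodeids(ct)                   ## ids of all nodes
nodeids(ct, terminal = TRUE)  ## ids of the terminal nodes only
width(ct)                     ## number of terminal nodes
## apply a function to every node to extract the stored information
nodeapply(ct, ids = nodeids(ct), FUN = info_node)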


Additionally, over at StackOverflow you can find various additional 
bits and pieces that may be helpful. Look for the "party" tag.


Finally, there is also a partykit support forum on R-Forge.

My question is: is there some testing method to choose or rank these 4,000 
trees such that I can select trees from top to bottom? I know each 
tree is important in its own way.


It is not clear to me what/how you want to rank the results. However, 
looking at the sources of information listed above might take you a few 
steps further.


If selection based on significance is required, then is there any other 
way instead of conditional inference tree , that partitions data but 
will also carry some significance to choose from.


The MOB (model-based recursive partitioning) algorithm is also based on 
significance tests and implemented in the "partykit" package. It uses 
parametric asymptotic inference rather than nonparametric conditional 
inference. Otherwise the two approaches are very similar in many respects.


Hope that helps,
Z

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Three-component Negative Binomial Mixture: R code

2016-11-10 Thread Achim Zeileis

On Thu, 10 Nov 2016, danilo.car...@uniparthenope.it wrote:

Thank you for your hints; now the goodness-of-fit test provides good 
results, but surprisingly the three-component model turns out to be 
worse than the two-component one (indeed, I focused on the three-component 
mixture because the two-component one exhibits a low p-value).


In addition, I have noticed that for some data the function fails to find 
good starting values, as you have mentioned in your previuous answer. The 
problem is that the driver FLXMRnegbin() allows to specify only the theta 
parameter (and only one value, even in the event of mixtures of two or more 
components).


I have read the description of the flexmix() function too, but it seems that 
it does not allow setting starting values for the parameters of the model. 
Am I right? Or is there a way to do it?


No, I don't think so. You could look at the FLXMRnegbin() driver code and 
tweak this directly to use better starting values etc. But I think that 
the main issue is that the "theta" parameter often diverges to infinity 
(i.e., a Poisson distribution) if theta is too large in (at least) one of 
the components.


Given that the negbin distribution is a continuous mixture of Poisson 
distributions, I'm not sure whether approaching such data with a finite 
mixture of such continuous mixtures is a good strategy.


What to do with this situation, certainly depends on the data you have and 
the questions you have about it. And I concur with Bert's advice of 
contacting a local statistics expert for discussion of such issues.


hth,
Z





Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:


On Tue, 8 Nov 2016, danilo.car...@uniparthenope.it wrote:

I tried the function flexmix() with the driver FLXMRnegbin() with two 
components first, in order to compare its results with those provided by 
my function mixnbinom(). In particular, I ran the following code:




fm0 <- flexmix(y ~ 1, data = data.frame(y), k = 2, model = FLXMRnegbin())



where "y" is my vector of counts. The previous function provided me the 
following parameters:




                    Comp.1    Comp.2
coef.(Intercept) 1.2746536  1.788578
theta            0.1418201  5.028766



with priors 0.342874 and 0.657126, respectively. I assume that the 
coefficients "Intercept" represent the two means of the model (mu1 and 
mu2),


No, a log link is employed, i.e., exp(1.2746536) and exp(1.788578) are the 
means.



while the "theta" coefficients are the size parameters (size1 and size2).


Yes.

Unfortunately, unlike my function mixnbinom(), the model computed with 
flexmix() did not provide a good fit to my data (p-value ~0).


Is there something wrong in the process above?


Hard to say without a reproducible example. Using parameter values similar 
to the ones you cite above, the following seems to do a reasonable job:


## packages
library("countreg")
library("flexmix")

## artificial data from two NB distributions:
## 1/3 is NB(mu = 3.5, theta = 0.2) and
## 2/3 is NB(mu = 6.0, theta = 5.0)
set.seed(1)
y <- c(rnbinom(200, mu = 3.5, size = 0.2), rnbinom(400, mu = 6, size = 5))

## fit 2-component mixture model
set.seed(1)
fm <- flexmix(y ~ 1, k = 2, model = FLXMRnegbin())

## inspect estimated parameters -> look acceptable
parameters(fm)
exp(parameters(fm)[1,])

My experience was that finding good starting values may be a problem for 
flexmix(). So maybe setting these in some better way would be beneficial.




-
Danilo Carità

PhD Candidate
University of Naples "Parthenope"
Dipartimento di Studi Aziendali e Quantitativi
via G. Parisi, 13, 80132 Napoli - Italy
-


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Three-component Negative Binomial Mixture: R code

2016-11-08 Thread Achim Zeileis

On Tue, 8 Nov 2016, danilo.car...@uniparthenope.it wrote:

I tried the function flexmix() with the driver FLXMRnegbin() with two 
components first, in order to compare its results with those provided by my 
function mixnbinom(). In particular, I ran the following code:




fm0 <- flexmix(y ~ 1, data = data.frame(y), k = 2, model = FLXMRnegbin())



where "y" is my vector of counts. The previous function provided me the 
following parameters:




                    Comp.1    Comp.2
coef.(Intercept) 1.2746536  1.788578
theta            0.1418201  5.028766



with priors 0.342874 and 0.657126, respectively. I assume that the 
coefficients "Intercept" represent the two means of the model (mu1 and mu2),


No, a log link is employed, i.e., exp(1.2746536) and exp(1.788578) are the 
means.



while the "theta" coefficients are the size parameters (size1 and size2).


Yes.

Unfortunately, unlike my function mixnbinom(), the model computed with 
flexmix() did not provide a good fit to my data (p-value ~0).


Is there something wrong in the process above?


Hard to say without a reproducible example. Using parameter values similar 
to the ones you cite above, the following seems to do a reasonable job:


## packages
library("countreg")
library("flexmix")

## artificial data from two NB distributions:
## 1/3 is NB(mu = 3.5, theta = 0.2) and
## 2/3 is NB(mu = 6.0, theta = 5.0)
set.seed(1)
y <- c(rnbinom(200, mu = 3.5, size = 0.2), rnbinom(400, mu = 6, size = 5))

## fit 2-component mixture model
set.seed(1)
fm <- flexmix(y ~ 1, k = 2, model = FLXMRnegbin())

## inspect estimated parameters -> look acceptable
parameters(fm)
exp(parameters(fm)[1,])

My experience was that finding good starting values may be a problem for 
flexmix(). So maybe setting these in some better way would be beneficial.



Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:


On Mon, 7 Nov 2016, danilo.car...@uniparthenope.it wrote:


I need a function for R software which computes a mixture of Negative
Binomial distributions with at least three components.


The package "countreg" on R-Forge provides a driver FLXMRnegbin() that can 
be combined with the "flexmix" package (i.e., functions flexmix() and 
stepFlexmix()). The manual page provides some worked illustrations in 
example("FLXMRnegbin", package = "countreg").


Note that the driver is mainly designed for negative binomial _regression_ 
models. But if you just regress on a constant (y ~ 1) you can also get 
negative binomial mixture distributions without covariates.



-
Danilo Carità

PhD Candidate
University of Naples "Parthenope"
Dipartimento di Studi Aziendali e Quantitativi
via G. Parisi, 13, 80132 Napoli - Italy
-


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Three-component Negative Binomial Mixture: R code

2016-11-07 Thread Achim Zeileis

On Mon, 7 Nov 2016, danilo.car...@uniparthenope.it wrote:


I need a function for R software which computes a mixture of Negative
Binomial distributions with at least three components.


The package "countreg" on R-Forge provides a driver FLXMRnegbin() that can 
be combined with the "flexmix" package (i.e., functions flexmix() and 
stepFlexmix()). The manual page provides some worked illustrations in 
example("FLXMRnegbin", package = "countreg").


Note that the driver is mainly designed for negative binomial _regression_ 
models. But if you just regress on a constant (y ~ 1) you can also get 
negative binomial mixture distributions without covariates.



I found on another site the following function "mixnbinom". It works very
well, but it computes a mixture of only two components:



mixnbinom=function(y,k1,mu1,k2,mu2,prob,eps=1/10)
{
 new.parms=c(k1,mu1,k2,mu2,prob)
 err=1
 iter=1
 maxiter=100
 hist(y,probability=T,nclass=30,col="lightgrey",main="The EM algorithm")
 xvals=seq(min(y),max(y),1)
 lines(xvals,prob*dnbinom(xvals,size=k1,mu=mu1)+
   (1-prob)*dnbinom(xvals,size=k2,mu=mu2),col="green")
 while(err>eps){
 if(iter<=maxiter){
 lines(xvals,prob*dnbinom(xvals,size=k1,mu=mu1)+
   (1-prob)*dnbinom(xvals,size=k2,mu=mu2),lty=3)
 }
 bayes=(prob*dnbinom(y,size=k1,mu=mu1))/((prob*
 dnbinom(y,size=k1,mu=mu1))+((1-prob)*dnbinom(y,size=k2,mu=mu2)))
 mu1=sum(bayes*y)/sum(bayes)
 mu2=sum((1-bayes)*y)/sum((1-bayes))
 var1=sum(bayes*(y-mu1)^2)/sum(bayes)
 var2=sum((1-bayes)*(y-mu2)^2)/sum((1-bayes))
 k1=abs(mu1/((var1/mu1)-1))
 k2=abs(mu2/((var2/mu2)-1))
 prob=mean(bayes)
 old.parms=new.parms
 new.parms=c(k1,mu1,k2,mu2,prob)
 err=max(abs((old.parms-new.parms)/new.parms))
 iter=iter+1
 }
 lines(xvals,prob*dnbinom(xvals,size=k1,mu=mu1)+
   (1-prob)*dnbinom(xvals,size=k2,mu=mu2),col="red")
 print(list(k1=k1,mu1=mu1,k2=k2,mu2=mu2,p=prob,iter=iter,err=err))
}



I would be grateful if someone can modify the previous function to
model a three-component mixture instead of a two-component one.

I also tried to look for a package which does the same job: I have
used the package "mixdist", but I am not able to set up a suitable set
of starting parameters for the function mix (they always converge to
zero). Hereafter, I found the package "DEXUS", but the related function
does not provide good estimates for the model, even in the event that
I already know what results I have to expect.

Any help is highly appreciated.


Danilo Carità

-
Danilo Carità

PhD Candidate
University of Naples "Parthenope"
Dipartimento di Studi Aziendali e Quantitativi
via G. Parisi, 13, 80132 Napoli - Italy

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Hausman Test

2016-09-13 Thread Achim Zeileis

On Mon, 12 Sep 2016, Ding, Jie Ding (NIH/NIA/ERP) [F] wrote:


Dear Achim,

Sorry to have disturbed you. I have encountered a problem when computing 
Hausman test statistics (i.e., p values) in R to compare OLS and 2SLS models.

The problem is a discrepancy between the two p-value outputs from the 
"manual approach (by hand)" and the "diagnostics argument" in the "AER" 
library, respectively.

With respect to manual approach, I used the following codes:

cf_diff<-coef(ivreg)-coef(olsreg)
vc_diff<-vcov(ivreg)-vcov(olsreg)
x2_diff<-as.vector(t(cf_diff)%*% solve(vc_diff)%*%cf_diff)
pchisq(x2_diff,df=2,lower.tail=FALSE)


For diagnostic approach, I applied the following:

summary(ivreg, vcov = sandwich, df = Inf, diagnostics = TRUE)


However, p-value from the manual approach is always much larger than the 
diagnostic approach, e.g.  0.329 vs. 0.138


I would expect the values should be the same. Your advice would be 
highly appreciated.


The Wu-Hausman test in ivreg() follows the augmented regression approach 
that is also used by Stata. This regresses the endogenous variable on the 
instruments and includes the fitted values in an OLS regression. The test 
is then a simple Wald test, see:

http://www.stata.com/support/faqs/statistics/durbin-wu-hausman-test/
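
A minimal sketch of that augmented-regression idea, with hypothetical
variable names in a data frame 'mydata' (outcome y, endogenous regressor x,
exogenous regressor w, instrument z):

## first stage: endogenous variable on all instruments
first <- lm(x ~ z + w, data = mydata)
## augmented OLS regression including the first-stage fitted values
aug <- lm(y ~ x + w + fitted(first), data = mydata)
## Wald test on the added term = Wu-Hausman test
summary(aug)$coefficients["fitted(first)", ]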


With very best wishes,
Jennifer



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chaid Decision Tree

2016-08-22 Thread Achim Zeileis

On Mon, 22 Aug 2016, MIKE DE LA HOZ wrote:



Hi,


I am running a chaid tree using titanic dataset (see attachment)



setwd("C:/Users/miguel")

titanic <- read.csv("train.csv")
titanic.s <- subset( titanic, select = -c(PassengerId, Name ) )

ctrl <- chaid_control(minsplit = 20, minbucket = 5, minprob = 0)
chaidTitanic <- chaid(Survived ~ ., data = titanic, control = ctrl)



It looks like I get the following error

Error: is.factor(x) is not TRUE



Can you please help me here? I am not able to follow this type of error. If 
you can rephrase the error message for me, it will be much appreciated.


To be able to apply the chaid() function all variables (both response and 
predictor) need to be categorical variables, i.e., in R of class "factor".


It is not clear which variables are the culprits here because your example 
is not reproducible. I guess that there are at least some numeric 
regressor variables. Maybe the "Survived" response is also in numeric 
dummy coding rather than the appropriate "factor" variable.


In any case, I would recommend to use a tree model that can deal with both 
kinds of regressor variables. If you want something that selects split 
variables and split points based on statistical tests, ctree() from 
package "partykit" would be the obvious candidate.
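
A hedged sketch of both routes, using the 'titanic.s' subset from your code
(which columns are numeric is an assumption about your CSV):

## route 1: make the response (and any numeric regressors) categorical
titanic.s$Survived <- factor(titanic.s$Survived)
## numeric regressors would additionally need to be discretized, e.g. via cut()

## route 2: ctree() handles numeric regressors directly
library("partykit")
ct <- ctree(Survived ~ ., data = titanic.s)
plot(ct)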




Thanks


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Likelihood ratio test in porl (MASS)

2016-07-27 Thread Achim Zeileis

On Wed, 27 Jul 2016, Faradj Koliev wrote:

Dear all, 

A quick question: Let's say I have a full and a restricted model that looks something like this: 

Full <- polr(Y ~ X1+X2+X3+X4, data=data, Hess = TRUE, method="logistic") # ordered logistic regression 

Restricted <- polr(Y ~ X1+X2+X3, data=data, Hess = TRUE, method="logistic") # ordered logistic regression 

I wanted to conduct the F-test (using the aov command) in order to determine whether the information from the X4 variable statistically improves our understanding of Y. 
However, I've been told that the likelihood ratio test is a better alternative. So, I would like to conduct the LR test. In the rms package this is easy -- lrtest(Full, Restricted) -- I'm just curious how to perform the same using polr. Thanks!


One generic possibility to conduct the likelihood ratio test is the 
lrtest() function in package "lmtest", i.e.,


library("lmtest")
lrtest(Restricted, Full)


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ocr

2016-07-26 Thread Achim Zeileis

On Wed, 27 Jul 2016, Shane Carey wrote:


Cool, thanks Jim!!
I would love to be able to write my own script for this as I have many
images/ pdf's in a folder and would like to batch process them using an R
script!!


The underlying engine is "tesseract" which is also available as a 
command-line tool and on other OSs. In principle, it is not hard to call 
it with a system() command and then readLines() the resulting text. 
However, it might be useful to play with the available options in the GUI 
first to see what works best for your images.
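
A rough sketch of such a batch loop (assuming the tesseract binary is
installed and on the PATH, and that the images live in a folder 'scans'):

imgs <- list.files("scans", pattern = "\\.(png|tif)$", full.names = TRUE)
txts <- lapply(imgs, function(f) {
  out <- tempfile()
  system(paste("tesseract", shQuote(f), shQuote(out)))  ## writes <out>.txt
  readLines(paste0(out, ".txt"))
})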



Thanks

On Tuesday, July 26, 2016, Jim Lemon wrote:


Hi Shane,
FreeOCR is a really good place to start.

http://www.paperfile.net/

Jim


On Wed, Jul 27, 2016 at 6:11 AM, Shane Carey wrote:

Hi,

Has anyone ever done any ocr in R?? I have some scanned images that I

would

like to convert to text!!
Thanks


--
Le gach dea ghui,
Shane

[[alternative HTML version deleted]]






--
Le gach dea ghui,
Shane

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Achim Zeileis

Jeremiah,

for this purpose there are the "roll" and "RcppRoll" packages. Both use 
Rcpp and the former also provides rolling lm models. The latter has a 
generic interface that lets you define your own function.
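
For illustration, a sketch of the roll interface (assuming its roll_lm()
function; the data mimic the zoo example at the end of your mail):

library("roll")
library("zoo")
set.seed(1)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")
## rolling regression of width 4, computed in compiled code
fm <- roll_lm(x = coredata(z)[, "x", drop = FALSE],
              y = coredata(z)[, "y", drop = FALSE], width = 4)
fm$coefficients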


One thing to pay attention to, though, is the numerical reliability. 
Especially on large time series with relatively short windows there is a 
good chance of encountering numerically challenging situations. The QR 
decomposition used by lm is fairly robust while other more straightforward 
matrix multiplications may not be. This should be kept in mind when 
writing your own Rcpp code for plugging it into RcppRoll.


But I haven't checked what the roll package does and how reliable that is...

hth,
Z

On Thu, 21 Jul 2016, jeremiah rounds wrote:


Hi,

A not unusual task is performing a multiple regression in a rolling window
on a time series. A standard piece of advice for doing this in R is something
like the code that follows at the end of the email. I am currently using
an "embed" variant of that code and that piece of advice is out there too.

But, it occurs to me that for such an easily specified matrix operation
standard R code is really slow.   rollapply constantly returns to R
interpreter at each window step for a new lm. All lm is at its heart is
(X^t X)^(-1) X^t y, and if you think about doing that with Rcpp in a rolling
window you are just incrementing a counter and peeling off rows (or columns
of X and y) of a particular window size, and following that up with some
matrix multiplication in a loop.   The psuedo-code for that Rcpp
practically writes itself and you might want a wrapper of something like:
rolling_lm (y=y, x=x, width=4).

My question is this: has any of the thousands of R packages out there
published anything like that? Rolling window multiple regressions that
stay in C/C++ until the rolling window completes? No sense in writing it
if it exists.


Thanks,
Jeremiah

Standard (slow) advice for "rolling window regression" follows:


set.seed(1)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
rollapply(z, width = 4,
  function(x) coef(lm(y ~ x, data = as.data.frame(x))),
  by.column = FALSE, align = "right")

## result is identical to
coef(lm(y ~ x, data = z[1:4,]))
coef(lm(y ~ x, data = z[2:5,]))

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Achim Zeileis

On Fri, 1 Jul 2016, Faradj Koliev wrote:


Dear Achim Zeileis, 
Many thanks for your quick and informative answer. 

I'm sure that the vcovCL should work, however, I experience some problems. 


> coeftest(model, vcov=vcovCL(model, cluster=mydata$ID))

First I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : 
  length of 'cluster' does not match number of observations


This probably means that not all rows of "mydata" have been used for the 
estimation of the model, e.g., due to missing values or something like 
that. Hence, the mismatch seems to occur.
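
One way to check and fix this is to restrict the cluster variable to the
rows that actually entered the fit (a sketch, assuming 'mydata' keeps its
original row names):

used <- rownames(model.frame(model))
cl   <- mydata[used, "ID"]
coeftest(model, vcov = vcovCL(model, cluster = cl))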



After checking the observations I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : object 'tets' not found
Called from: vcovCL(model, cluster = mydata$ID)
Browse[1]> 


The variable "tets" is presumably one of the regressors in your "model" 
and apparently it cannot be found when extracting the model scores. Maybe 
you haven't stored the model frame and deleted the data.


Hard to say without a simple reproducible example.
Z


What can I do to fix it? What am I doing wrong now? 





      On 1 Jul 2016, at 14:57, Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:

On Fri, 1 Jul 2016, Faradj Koliev wrote:

  Dear all,

  I use the 'polr' command (library: MASS) to estimate an
  ordered logistic regression.

  My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2
  ,data=mydata, Hess = TRUE))

  But how do I get robust clustered standard errors?
  I've tried coeftest(resA, vcov=vcovHC(resA,
  cluster=lipton$ID))


The vcovHC() function currently does not (yet) have a "cluster"
argument. We are working on it but it's not finished yet.

As an alternative I include the vcovCL() function below that computes
the usual simple clustered sandwich estimator. This can be applied to
"polr" objects and plugged into coeftest(). So

coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))

should work.

  and summary(a <- robcov(model,mydata$ID)).


The robcov() function does in principle what you want but I'm not sure
whether it works with polr(). But for sure it works with lrm() from
the "rms" package.

Hope that helps,
Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
 stopifnot(require("sandwich"))

 ## cluster specification
 if(is.null(cluster)) cluster <- attr(object, "cluster")
 if(is.null(cluster)) stop("no 'cluster' specification found")
 cluster <- factor(cluster)

 ## estimating functions and dimensions
 ef <- estfun(object)
 n <- NROW(ef)
 k <- NCOL(ef)
 if(n != length(cluster))
   stop("length of 'cluster' does not match number of observations")
 m <- length(levels(cluster))

 ## aggregate estimating functions by cluster and compute meat
 ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
   drop = FALSE]))
 ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
 mt <- crossprod(ef)/n

 ## bread
 br <- try(bread(object), silent = TRUE)
 if(inherits(br, "try-error")) br <- vcov(object) * n

 ## put together sandwich
 vc <- 1/n * (br %*% mt %*% br)

 ## adjustment
 if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
 adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

 ## return
 return(adj * vc)
}


  Neither works for me. So I wonder what I am doing wrong
   here?


   All suggestions are welcome -- thank you!
  [[alternative HTML version deleted]]






__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Achim Zeileis

On Fri, 1 Jul 2016, Faradj Koliev wrote:

Dear all, 



I use the 'polr' command (library: MASS) to estimate an ordered logistic regression.

My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = 
TRUE))

But how do I get robust clustered standard errors? 


I've tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID))


The vcovHC() function currently does not (yet) have a "cluster" argument. 
We are working on it but it's not finished yet.


As an alternative I include the vcovCL() function below that computes the 
usual simple clustered sandwich estimator. This can be applied to "polr" 
objects and plugged into coeftest(). So


coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))

should work.


and summary(a <- robcov(model,mydata$ID)).


The robcov() function does in principle what you want but I'm not sure 
whether it works with polr(). But for sure it works with lrm() from the 
"rms" package.


Hope that helps,
Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
  stopifnot(require("sandwich"))

  ## cluster specification
  if(is.null(cluster)) cluster <- attr(object, "cluster")
  if(is.null(cluster)) stop("no 'cluster' specification found")
  cluster <- factor(cluster)

  ## estimating functions and dimensions
  ef <- estfun(object)
  n <- NROW(ef)
  k <- NCOL(ef)
  if(n != length(cluster))
stop("length of 'cluster' does not match number of observations")
  m <- length(levels(cluster))

  ## aggregate estimating functions by cluster and compute meat
  ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
drop = FALSE]))
  ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
  mt <- crossprod(ef)/n

  ## bread
  br <- try(bread(object), silent = TRUE)
  if(inherits(br, "try-error")) br <- vcov(object) * n

  ## put together sandwich
  vc <- 1/n * (br %*% mt %*% br)

  ## adjustment
  if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
  adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

  ## return
  return(adj * vc)
}



Neither works for me. So I wonder what I am doing wrong here?


All suggestions are welcome -- thank you!
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rpart plot produces no text

2016-06-29 Thread Achim Zeileis
I don't think this is an RStudio issue. The plot() method for "rpart" 
objects draws no labels. The text() method has to be called additionally:


library("rpart")
rp <- rpart(Species ~ ., data = iris)
plot(rp)
text(rp)

As these plots produced by rpart itself are not very appealing, there are 
various approaches that allow drawing other displays, e.g.,


library("rpart.plot")
prp(rp)

library("rattle")
fancyRpartPlot(rp)

library("partykit")
plot(as.party(rp))

which show different amounts of detail and use different visual means to 
display the available information.


On Wed, 29 Jun 2016, John Kane wrote:


What happens if you run the code in a terminal rather than RStudio? My 
experience is that very, very occasionally RStudio does something a bit funny 
with plots.

And while this may sound funny, just shut down RStudio, reload it and try again.

John Kane
Kingston ON Canada



-Original Message-
From: jcthomp...@redlobster.com
Sent: Tue, 28 Jun 2016 20:26:59 +
To: r-help@r-project.org
Subject: [R] Rpart plot produces no text

I am using R Studio and am able to fit a tree with RPlot, however, the
tree in the viewer has no text (see image attached).

Jim Thompson
This e-mail message is for the sole use of the intende...{{dropped:21}}


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Re: Rattle

2016-06-25 Thread Achim Zeileis



On Sat, 25 Jun 2016, Rolf Turner wrote:


On 25/06/16 09:13, Jeff Newmiller wrote:
This is like asking, "My car doesn't work. Can anyone tell me what is 
wrong?"




Fortune nomination!


On R-Forge now.
Z


cheers,

Rolf


--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] No reply from CRAN Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

2016-06-20 Thread Achim Zeileis

On Mon, 20 Jun 2016, Joseph Gama wrote:


Hi all,

I emailed a suggestion to Nicholas Lewin-Koh, the maintainer of the CRAN 
Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & 
Visualization. I got no reply, so I wonder, is he still maintaining that 
view? If not, then who else does or will maintain it?


To the best of my knowledge he is still maintaining it. I cc'ed Nicholas 
in this reply.



BR,

José Gama

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using weights with truncreg

2016-06-13 Thread Achim Zeileis

On Mon, 13 Jun 2016, Ganz, Carl wrote:


Hello everyone,

I would like to use truncreg on survey data with weights, but so far as I can 
tell the weight parameter in truncreg is not working. I find that using weights 
does not actually change the output.

Here is a small reproducible example using the example in the documentation:

## simulate a data.frame
set.seed(1071)
n <- 10000
sigma <- 4
alpha <- 2
beta <- 1
x <- rnorm(n, mean = 0, sd = 2)
eps <- rnorm(n, sd = sigma)
y <- alpha + beta * x + eps
d <- data.frame(y = y, x = x)

## truncated response
d$yt <- ifelse(d$y > 1, d$y, NA)

## binary threshold response
d$yb <- factor(d$y > 0)

## censored response
d$yc <- pmax(1, d$y)

## random weight
wgt <- runif(n, 500, 1500)
## unweighted
fm_trunc <- truncreg(yt ~ x, data = d, point = 1, direction = "left")
## weighted
fm_trunc_weighted <- truncreg(yt ~ x, data = d, weights = wgt, point = 1, direction = 
"left")

coef(fm_trunc_weighted)==coef(fm_trunc) # all equal

Am I misunderstanding how the weight parameter works for truncreg or is 
the weight parameter not working?


I think you are right and the 'weights' argument in truncreg() is 
currently not used.


As an alternative you can use the "crch" package for censored (and 
truncated) regression with conditional heteroskedasticity. The models 
above can be fitted via:


library("crch")
fm_trunc2 <- trch(yt ~ x, data = d, left = 1)
fm_trunc_weighted2 <- trch(yt ~ x, data = d, weights = wgt, left = 1)

Note that by default a log-link is used for the sigma/scale parameter to 
assure positivity. But you can also set link.scale="identity" to obtain 
the same results as in truncreg().



Kind Regards,
Carl

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sandwich package: HAC estimators

2016-06-01 Thread Achim Zeileis

On Wed, 1 Jun 2016, T.Riedle wrote:

Thank you very much. I have applied the example to my case and get the 
following results:


crisis_bubble4<-glm(stock.market.crash~crash.MA+bubble.MA+MP.MA+UTS.MA+UPR.MA+PPI.MA+RV.MA,family=binomial("logit"),data=Data_logitregression_movingaverage)

summary(crisis_bubble4)


Call:
glm(formula = stock.market.crash ~ crash.MA + bubble.MA + MP.MA +
   UTS.MA + UPR.MA + PPI.MA + RV.MA, family = binomial("logit"),
   data = Data_logitregression_movingaverage)

Deviance Residuals:
   Min   1Q   Median   3Q  Max
-1.7828  -0.6686  -0.3186   0.6497   2.4298

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -5.2609     0.8927  -5.893 3.79e-09 ***
crash.MA        0.4922     0.4966   0.991  0.32165
bubble.MA      12.1287     1.3736   8.830  < 2e-16 ***
MP.MA         -20.0724    96.9576  -0.207  0.83599
UTS.MA        -58.1814    19.3533  -3.006  0.00264 **
UPR.MA       -337.5798    64.3078  -5.249 1.53e-07 ***
PPI.MA        729.3769    73.0529   9.984  < 2e-16 ***
RV.MA         116.0011    16.5456   7.011 2.37e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 869.54  on 705  degrees of freedom
Residual deviance: 606.91  on 698  degrees of freedom
AIC: 622.91

Number of Fisher Scoring iterations: 5


coeftest(crisis_bubble4)


z test of coefficients:

              Estimate  Std. Error z value  Pr(>|z|)
(Intercept)   -5.26088     0.89269 -5.8933 3.786e-09 ***
crash.MA       0.49219     0.49662  0.9911  0.321652
bubble.MA     12.12868     1.37357  8.8300 < 2.2e-16 ***
MP.MA        -20.07238    96.95755 -0.2070  0.835992
UTS.MA       -58.18142    19.35330 -3.0063  0.002645 **
UPR.MA      -337.57985    64.30779 -5.2494 1.526e-07 ***
PPI.MA       729.37693    73.05288  9.9842 < 2.2e-16 ***
RV.MA        116.00106    16.54560  7.0110 2.366e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


coeftest(crisis_bubble4,vcov=NeweyWest)


z test of coefficients:

              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -5.26088    5.01706 -1.0486  0.29436
crash.MA       0.49219    2.41688  0.2036  0.83863
bubble.MA     12.12868    5.85228  2.0725  0.03822 *
MP.MA        -20.07238  499.37589 -0.0402  0.96794
UTS.MA       -58.18142   77.08409 -0.7548  0.45038
UPR.MA      -337.57985  395.35639 -0.8539  0.39318
PPI.MA       729.37693  358.60868  2.0339  0.04196 *
RV.MA        116.00106   79.52421  1.4587  0.14465
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


waldtest(crisis_bubble4, vcov = NeweyWest,test="F")

Wald test

Model 1: stock.market.crash ~ crash.MA + bubble.MA + MP.MA + UTS.MA +
   UPR.MA + PPI.MA + RV.MA
Model 2: stock.market.crash ~ 1
  Res.Df Df      F  Pr(>F)
1    698
2    705 -7 2.3302 0.02351 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


waldtest(crisis_bubble4, vcov = NeweyWest,test="Chisq")

Wald test

Model 1: stock.market.crash ~ crash.MA + bubble.MA + MP.MA + UTS.MA +
   UPR.MA + PPI.MA + RV.MA
Model 2: stock.market.crash ~ 1
  Res.Df Df  Chisq Pr(>Chisq)
1    698
2    705 -7 16.311    0.02242 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Do you agree with the methodology?


Well, this is how you _can_ do what you _wanted_ to do. I already 
expressed my doubts about several aspects. First, some coefficients and 
their standard errors are very large which may (or may not) hint at 
problems that are close to separation. Second, given the increase in the 
standard errors, the autocorrelation appears to be substantial and it 
might be good to try to capture these autocorrelations explicitly rather 
than just correcting the standard errors.


I read in a book that it is also possible to use vcov=vcovHAC in the 
coeftest() function.


Yes. (I also mentioned that in my e-mail yesterday, see below.)

Nevertheless, I am not sure what kind of HAC I generate with this 
command. Which weights does this command apply, which bandwidth and which 
kernel?


Please consult vignette("sandwich", package = "sandwich") for the details. 
In short: both vcovHAC and kernHAC use the quadratic spectral kernel with 
Andrews' parametric bandwidth selection; kernHAC uses prewhitening by 
default while vcovHAC does not. In contrast, NeweyWest uses a Bartlett 
kernel with Newey & West's nonparametric lag/bandwidth selection and 
prewhitening by default.
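
In code, with your model (see the vignette for the arguments that control
kernel, bandwidth, and prewhitening):

coeftest(crisis_bubble4, vcov = vcovHAC)    ## QS kernel, Andrews bandwidth, no prewhitening
coeftest(crisis_bubble4, vcov = kernHAC)    ## QS kernel, Andrews bandwidth, prewhitening
coeftest(crisis_bubble4, vcov = NeweyWest)  ## Bartlett kernel, NW bandwidth, prewhitening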



Kind regards

From: Achim Zeileis <achim.zeil...@uibk.ac.at>
Sent: 31 May 2016 17:19
To: T.Riedle
Cc: r-help@r-project.org
Subject: Re: [R] sandwich package: HAC estimators

On Tue, 31 May 2016, T.Riedle wrote:


Many thanks for your feedback.

If I get the code for the waldtest right I can calculate the Chi2 and
the F statistic using waldtest().


Yes. In a logit model you would usually use the chi-squared statistic.

Re: [R] Fortune candidate: Re: Whether statistical background is must to learn R language

2016-05-31 Thread Achim Zeileis

Thanks, Sarah, added now in the devel-package on R-Forge.
Z

On Tue, 31 May 2016, Sarah Goslee wrote:


On Tue, May 31, 2016 at 11:09 AM, Jeff Newmiller wrote:



However, please don't apply R like a magic answers box, because you can mislead 
others and cause harm.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sandwich package: HAC estimators

2016-05-31 Thread Achim Zeileis

On Tue, 31 May 2016, T.Riedle wrote:


Many thanks for your feedback.

If I get the code for the waldtest right I can calculate the Chi2 and 
the F statistic using waldtest().


Yes. In a logit model you would usually use the chi-squared statistic.

Can I use the waldtest() without using bread()/ estfun()? That is, I 
estimate the logit regression using glm() e.g. logit<-glm(...) and 
insert logit into the waldtest() function.


Does that work to get chi2 under HAC standard errors?


I'm not sure what you mean here but I include a worked example. Caveat: 
The data I use are cross-section data with an overly simplified set of 
regressors. So none of this makes sense for the application - but it shows 
how to use the commands.


## load AER package which provides the example data
## and automatically loads "lmtest" and "sandwich"
library("AER")
data("PSID1976", package = "AER")

## fit a simple logit model and obtain marginal Wald tests
## for the coefficients and an overall chi-squared statistic
m <- glm(participation ~ education, data = PSID1976, family = binomial)
summary(m)
anova(m, test = "Chisq")

## replicate the same statistics with coeftest() and lrtest()
coeftest(m)
lrtest(m)

## the likelihood ratio test is asymptotically equivalent
## to the Wald test leading to a similar chi-squared test here
waldtest(m)

## obtain HAC-corrected (Newey-West) versions of the Wald tests
coeftest(m, vcov = NeweyWest)
waldtest(m, vcov = NeweyWest)

Instead of NeweyWest other covariance estimators (e.g., vcovHAC, kernHAC, 
etc.) can also be plugged in.


hth,
Z



From: Achim Zeileis <achim.zeil...@uibk.ac.at>
Sent: 31 May 2016 13:18
To: T.Riedle
Cc: r-help@r-project.org
Subject: Re: [R] sandwich package: HAC estimators

On Tue, 31 May 2016, T.Riedle wrote:


I understood. But how do I get the R2 and Chi2 of my logistic regression
under HAC standard errors? I would like to create a table with HAC SE
via e.g. stargazer().

Do I get these information by using the functions

bread.lrm <- function(x, ...) vcov(x) * nobs(x)
estfun.lrm <- function(x, ...) residuals(x, "score")?

Do I need to use the coeftest() in this case?


The bread()/estfun() methods enable application of vcovHAC(), kernHAC(),
NeweyWest(). This in turn enables the application of coeftest(),
waldtest(), or linearHypothesis() with a suitable vcov argument.

All of these give you different kinds of Wald tests with HAC covariances
including marginal tests of individual coefficients (coeftest) or global
tests of nested models (waldtest/linearHypothesis). The latter can serve
as replacement for the "chi-squared test". For pseudo-R-squared values I'm
not familiar with HAC-adjusted variants.

And I'm not sure whether there is a LaTeX export solution that encompasses
all of these aspects simultaneously.


________
From: R-help <r-help-boun...@r-project.org> on behalf of Achim Zeileis 
<achim.zeil...@uibk.ac.at>
Sent: 31 May 2016 08:36
To: Leonardo Ferreira Fontenelle
Cc: r-help@r-project.org
Subject: Re: [R] sandwich package: HAC estimators

On Mon, 30 May 2016, Leonardo Ferreira Fontenelle wrote:


On Sat, 28 May 2016, at 15:50, Achim Zeileis wrote:

On Sat, 28 May 2016, T.Riedle wrote:

I thought it would be useful to incorporate the HAC consistent
covariance matrix into the logistic regression directly and generate an
output of coefficients and the corresponding standard errors. Is there
such a function in R?


Not with HAC standard errors, I think.


Don't glmrob() and summary.glmrob(), from robustbase, do that?


No, they implement a different concept of robustness. See also
https://CRAN.R-project.org/view=Robust

glmrob() implements GLMs that are "robust" or rather "resistant" to
outliers and other observations that do not come from the main model
equation. Instead of maximum likelihood (ML) estimation other estimation
techniques (along with corresponding covariances/standard errors) are
used.

In contrast, the OP asked for HAC standard errors. The motivation for
these is that the main model equation does hold for all observations but
that the observations might be heteroskedastic and/or autocorrelated. In
this situation, ML estimation is still consistent (albeit not efficient)
but the covariance matrix estimate needs to be adjusted.



Leonardo Ferreira Fontenelle, MD, MPH

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] zelig package: robust SE

2016-05-29 Thread Achim Zeileis

On Sun, 29 May 2016, T.Riedle wrote:


Dear R users,

I am trying to run a logistic regression using zelig. The simple 
logistic regression works well but now I want to have HAC robust 
standard errors. I have read in the manual that there is an option 
called "robust" and that zelig() computes robust SE via the sandwich 
package. However, it doesn't work. My code looks as follows:


crisis_bubble4<-zelig(stock.market.crash~crash.MA+bubble.MA+MP.MA+UTS.MA+UPR.MA+PPI.MA+RV.MA,robust=TRUE,model="logit",data=Data_logitregression_movingaverage)

Error in glm.control(robust = TRUE) : unused argument (robust = TRUE)


Possibly this changed in recent versions of "Zelig". The documentation 
for relogit on the official Zelig web page shows the same error:

http://docs.zeligproject.org/en/latest/zelig-relogit.html#example-2-one-tau-with-weighting-robust-standard-errors-and-bias-correction

In any case, Zelig only interfaced the so-called "robust" standard errors 
(sandwich or HC0) and not the ones with HAC correction.


I took this code from the zelig manual and don't understand why I get an 
error. What is more, I want to calculate NeweyWest SE or the SE using 
the weights by Andrews via the kernHAC function. How can I do that?


Why do you want to use Zelig? Apparently, the solution using plain glm() 
along with the sandwich package and coeftest() did work for you. And in my 
other e-mail I showed you how you can use lrm().



Thanks for your support.

Kind regards

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] sandwich package: HAC estimators

2016-05-28 Thread Achim Zeileis

On Sat, 28 May 2016, T.Riedle wrote:


Dear R users,

I am running a logistic regression using the rms package and the code 
looks as follows:


crisis_bubble4<-lrm(stock.market.crash~crash.MA+bubble.MA+MP.MA+UTS.MA+UPR.MA+PPI.MA+RV.MA,data=Data_logitregression_movingaverage)

Now, I would like to calculate HAC robust standard errors using the 
sandwich package assuming the NeweyWest estimator which looks as 
follows:


coeftest(crisis_bubble4,df=Inf,vcov=NeweyWest)

Error in match.arg(type) :

 'arg' should be one of "li.shepherd", "ordinary", "score", 
"score.binary", "pearson", "deviance", "pseudo.dep", "partial", 
"dfbeta", "dfbetas", "dffit", "dffits", "hat", "gof", "lp1"


As you can see, it doesn't work.


Yes. The "sandwich" package relies on two methods being available: bread() 
and estfun(). See vignette("sandwich-OOP", package = "sandwich") for the 
background details.


For objects of class "lrm" no such methods are available. But as "lrm" 
objects inherit from "glm" the corresponding methods are called. However, 
"lrm" objects are actually too different from "glm" objects (despite the 
inheritance) resulting in the error.


It is easy to add these methods, though, because "lrm" brings all the 
necessary information:


bread.lrm <- function(x, ...) vcov(x) * nobs(x)
estfun.lrm <- function(x, ...) residuals(x, "score")


Therefore, I did the same using the glm() instead of lrm():

crisis_bubble4<-glm(stock.market.crash~crash.MA+bubble.MA+MP.MA+UTS.MA+UPR.MA+PPI.MA+RV.MA,family=binomial("logit"),data=Data_logitregression_movingaverage)

If I use the coeftest() function, I get following results.

coeftest(crisis_bubble4,df=Inf,vcov=NeweyWest)

z test of coefficients:

              Estimate  Std. Error z value Pr(>|z|)
(Intercept)   -5.26088     5.01706 -1.0486  0.29436
crash.MA       0.49219     2.41688  0.2036  0.83863
bubble.MA     12.12868     5.85228  2.0725  0.03822 *
MP.MA        -20.07238   499.37589 -0.0402  0.96794
UTS.MA       -58.18142    77.08409 -0.7548  0.45038
UPR.MA      -337.57985   395.35639 -0.8539  0.39318
PPI.MA       729.37693   358.60868  2.0339  0.04196 *
RV.MA        116.00106    79.52421  1.4587  0.14465
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Some of these coefficients and standard errors are suspiciously large. It 
might make sense to check for quasi-complete separation.
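
A quick heuristic first check (just a sketch) is to see whether the fitted 
probabilities pile up at 0 or 1:

summary(fitted(crisis_bubble4))
mean(fitted(crisis_bubble4) < 1e-06 | fitted(crisis_bubble4) > 1 - 1e-06)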


I am unsure whether the coeftest from the lmtest package is appropriate 
in case of a logistic regression.


Yes, this is ok. (Whether or not the application of HAC standard errors is 
the best way to go is a different matter though.)


Is there another function for logistic regressions? Furthermore, I would 
like to present the regression coefficients, the F-statistic and the HAC 
estimators in one single table. How can I do that?


Running first coeftest() and then lrtest() should get you close to what 
you want - even though it is not a single table.


I thought it would be useful to incorporate the HAC consistent 
covariance matrix into the logistic regression directly and generate an 
output of coefficients and the corresponding standard errors. Is there 
such a function in R?


Not with HAC standard errors, I think.


Thanks for your support.

Kind regards

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Fortune candidate, was Re: Shaded areas in R

2016-05-26 Thread Achim Zeileis

On Thu, 26 May 2016, Michael Dewey wrote:


In a reply Duncan said


Nice, thanks, added to the "fortunes" package on R-Forge now.
Best,
Z


On 26/05/2016 16:12, Duncan Murdoch wrote:

On 26/05/2016 11:03 AM, Óscar Jiménez wrote:

You should try things; R won't break.







Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Michael
http://www.dewey.myzen.co.uk/home.html



Re: [R] Truncreg package help

2016-05-07 Thread Achim Zeileis

On Fri, 6 May 2016, Philipp Schaper wrote:


Dear R userers,

I am running truncated regressions with the 'truncreg' package. My sample
is large (6,000 observations), the data is left-truncated at 1 and the
left tail of the data is heavily centered at 1. When I am running the
regression I receive the following error message:

 Error in optim(par = start[!fixed], fn = logLikFunc, control = control,
:
 initial value in 'vmmin' is not finite


From a previous discussion (

http://r.789695.n4.nabble.com/betareg-help-td3350129.html) on a similar
issue in the betareg function I assume that the error message stems from
that the estimate of the starting value of the precision parameter is
negative. However, I do not know how I can take care of this. Thus I would
be very thankful for any help.


As in the thread you quote above, it is hard to say what is going on 
without a reproducible example. It is possible that the problem is caused 
by some problem of (almost) degenerate data (as was the case with the 
betareg example). With a reproducible example we might be able to say 
more.


As an alternative to truncreg() you might try the function trch() from 
package "crch": trch(y ~ x1 + x2 + ... | 1, data = mydata, left = 0). This 
is a different implementation of the same model so possibly this avoids 
the problem - but maybe not.



Best regards,
Philipp
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Accessing terminal datasets in Ctree()

2016-05-02 Thread Achim Zeileis

On Mon, 2 May 2016, Preetam Pal wrote:


Great, thank you so much Achim. But one issue: in case I do not know how many
terminal nodes there would be, what do I do? Note that I do not need the
datasets corresponding to the intermediate nodes - I only need the terminal
datasets.


With predict(ct, type = "node") you can set up a new variable, e.g.,

iris$node <- factor(predict(ct, type = "node"))

and then use this to obtain the subset corresponding to each of the 
terminal nodes.
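
For instance (a small sketch, with ct the tree from my earlier example 
below; the node-wise regression is just illustrative):

iris$node <- factor(predict(ct, type = "node"))
node_data <- split(iris, iris$node)
fits <- lapply(node_data, function(d) lm(Sepal.Length ~ Sepal.Width, data = d))
lapply(fits, coef)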



Regards,
Preetam 

On Tue, May 3, 2016 at 3:08 AM, Achim Zeileis <achim.zeil...@uibk.ac.at>
wrote:
  On Mon, 2 May 2016, Preetam Pal wrote:

Hi guys,

If I am applying ctree() on a data set (specifying some control parameters
like maxdepth), is there a way I can programmatically access the (smaller)
datasets corresponding to the terminal nodes in the tree? Say, if there
are 7 terminal nodes, I need those 7 datasets (of course, I can look at
the respective node-splitting attributes and write out a filtering
function - but clearly too much to ask for if I have a large number of
terminal nodes). Intention is to perform regression on each of these
terminal datasets.


  If you use the "partykit" implementation you can do:

  library("partykit")
  ct <- ctree(Species ~ ., data = iris)
  data_party(ct, id = 6)

  to obtain the data associated with node 6 for example. You can
  also use ct[6] to obtain the subtree and ct[6]$data for its
  associated data.

  For setting up a factor with the terminal node IDs, you can also
  use predict(ct, type = "node") and then use that in lm() etc.

  Finally, note that there is also lmtree() and glmtree() for
  trees with (generalized) linear models in their nodes.

Regards,
Preetam

--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.

        [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE
and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,
reproducible code.




--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Accessing terminal datasets in Ctree()

2016-05-02 Thread Achim Zeileis

On Mon, 2 May 2016, Preetam Pal wrote:


Hi guys,

If I am applying ctree() on a data (specifying some control parameters like
maxdepth), is there a way I can programmatically access the (smaller)
datasets corresponding to the terminal nodes in the tree? Say, if there are
7 terminal nodes, I need those 7 datasets (of course, I can look at the
respective node-splitting attributes and write out a filtering function -
but clearly too much to ask for if I have a large number of terminal
nodes). Intention is to perform regression on each of these terminal
datasets.


If you use the "partykit" implementation you can do:

library("partykit")
ct <- ctree(Species ~ ., data = iris)
data_party(ct, id = 6)

to obtain the data associated with node 6 for example. You can also use 
ct[6] to obtain the subtree and ct[6]$data for its associated data.


For setting up a factor with the terminal node IDs, you can also use 
predict(ct, type = "node") and then use that in lm() etc.


Finally, note that there is also lmtree() and glmtree() for trees with 
(generalized) linear models in their nodes.



Regards,
Preetam

--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Retrieving response variable in the probit

2016-05-02 Thread Achim Zeileis

On Mon, 2 May 2016, Steven Yen wrote:


Can anyone tell me how to retrieve the response (dependent) variable
from a probit regression object (as much as model.matrix(obj)
retrieves the data matrix). Below is a self-runnable set of codes.
Thank you!

library(sampleSelection)
data <- read.csv("https://dl.dropboxusercontent.com/u/220037024/Yen/data/pta.csv")
eq1<-d~sex+age+educ
p1<-probit(eq1,data=data)
summary(p1)
attributes(p1)


model.frame(p1)

recovers the (possibly transformed) data employed for fitting the model 
and


model.response(model.frame(p1))

extracts the response variable from that model frame.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] simple interactions

2016-04-15 Thread Achim Zeileis

Try using
  ~ group/age
or even
  ~ 0 + group/age

Both have all three group-specific slopes but differ with respect to the 
intercept codings. The latter has three group-specific intercepts as well. 
But the former has an intercept corresponding to the reference group A and 
then the usual treatment contrasts for group B and C (i.e., intercept 
differences).
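
A small simulated illustration of the two codings (all names made up):

set.seed(1)
d <- data.frame(age = runif(90, 20, 70),
  group = gl(3, 30, labels = c("A", "B", "C")))
d$y <- rnorm(90)
coef(lm(y ~ group/age, data = d))      ## intercept, B/C contrasts, 3 slopes
coef(lm(y ~ 0 + group/age, data = d))  ## 3 intercepts, 3 slopes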


IIRC then I found the discussion of these contrasts and nested codings in 
the MASS book very useful.


On Fri, 15 Apr 2016, Therneau, Terry M., Ph.D. wrote:

I'd like to get interaction terms in a model to be in another form.  Namely, 
suppose I had variables age and group, the latter a factor with levels A, B, 
C, with  age * group in the model.  What I would like are the variables 
"age:group=A", "age:group=B" and "age:group=C"  (and group itself of course). 
The coefficients of the model will then be the age effect in group A, the age 
effect in group B and the age effect in C rather than the standard ones of an 
overall age effect followed by contrasts.  This is often a better format for 
tables in a publication.


Yes, I can reconstruct these from the original fit, but I have a lot of 
variables for several models and it would be easier to have an automatic 
form.  I suspect that there is an easy answer, but I don't see it.


Terry Therneau

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Decision Tree and Random Forest

2016-04-14 Thread Achim Zeileis

On Thu, 14 Apr 2016, Michael Artz wrote:


Ah yes I will have to use the predict function.  But the predict function
will not get me there really.  If I can take the example that I have a
model predicting whether or not I will play golf (this is the dependent
value), and there are three independent variables Humidity(High, Medium,
Low), Pending_Chores(Taxes, None, Laundry, Car Maintenance) and Wind (High,
Low).  I would like rules like where any record that follows these rules
(IF humidity = high AND pending_chores = None AND Wind = High THEN there
is a 77% probability that play_golf is YES).


Although I think that this toy example is not overly useful for practical 
illustrations we have included the standard dataset in the "partykit" 
package:


## data
data("WeatherPlay", package = "partykit")

I was thinking that random forest would weight the rules somehow on the 
collection of trees and give a probability.  But if that doesn't make 
sense, then can you just tell me how to get the decision rules with one 
tree and I will work from that.


Then you can learn one tree on this data, e.g., with rpart() or ctree():

## trees
library("rpart")
rp <- rpart(play ~ ., data = WeatherPlay,
  control = rpart.control(minsplit = 5))

library("partykit")
ct <- ctree(play ~ ., data = WeatherPlay,
  minsplit = 5, mincriterion = 0.1)

## visualize via partykit
pr <- as.party(rp)
plot(pr)
plot(ct)

And the partykit package also includes a function to generate a text 
representation of the rules although this is currently not exported:


partykit:::.list.rules.party(pr)
##"outlook %in% c(\"overcast\")"
## 4
##  "outlook %in% c(\"sunny\", \"rainy\") & humidity < 82.5"
## 5
## "outlook %in% c(\"sunny\", \"rainy\") & humidity >= 82.5"

partykit:::.list.rules.party(ct)
##23
## "humidity <= 80"  "humidity > 80"

If you do not want a text representation but something else you can 
compute on, then look at the source code of partykit:::.list.rules.party() 
and try to adapt it to your needs.



On Wed, Apr 13, 2016 at 4:30 PM, Bert Gunter  wrote:


I think you are missing the point of random forests. But if you just
want to predict using the forest, there is a predict() method that you
can use. Other than that, I certainly don't understand what you mean.
Maybe someone else might.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Apr 13, 2016 at 2:11 PM, Michael Artz 
wrote:

Ok is there a way to do it with a decision tree?  I just need to make the
decision rules. Perhaps I can pick one of the trees used with Random
Forest.  I am somewhat familiar already with Random Forest with respect
to bagging and feature sampling and getting the mode from the leaf nodes
and it being an ensemble technique of many trees.  I am just working from
the perspective that I need decision rules, and I am working backward from
that, and I need to do it in R.

On Wed, Apr 13, 2016 at 4:08 PM, Bert Gunter wrote:


Nope.

Random forests are not decision trees -- they are ensembles (forests)
of trees. You need to go back and read up on them so you understand
how they work. The Hastie/Tibshirani/Friedman "The Elements of
Statistical Learning" has a nice explanation, but I'm sure there are
lots of good web resources, too.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Apr 13, 2016 at 1:40 PM, Michael Artz 
wrote:

Hi I'm trying to get the top decision rules from a decision tree.
Eventually I would like to do this with R and Random Forest.  There has to
be a way to output the decision rules of each leaf node in an easily
readable way. I am looking at the randomForest and rpart packages and I
don't see anything yet.
Mike

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Test for Homoscedasticity in R Without BP Test

2016-04-04 Thread Achim Zeileis

On Mon, 4 Apr 2016, varin sacha via R-help wrote:


Hi Deepak,

In econometrics there is another test very often used : the white test. 
The white test is based on the comparison of the estimated variances of 
residuals when the model is estimated by OLS under the assumption of 
homoscedasticity and when the model is estimated by OLS under the 
assumption of heteroscedastic.


The White test is a special case of the Breusch-Pagan test using a 
particular specification of the auxiliary regressors: namely all 
regressors, their squares and their cross-products. As this specification 
makes only sense if all regressors are continuous, many implementations 
have problems if there are already dummy variables, interactions, etc. in 
the regressor matrix. This is also the reason why bptest() from "lmtest" 
uses a different specification by default. However, you can utilize the 
function to carry out the White test as illustrated in:


example("CigarettesB", package = "AER")

(Of course, the AER package needs to be installed first.)
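
For reference, the corresponding call has this shape (a sketch with 
illustrative regressor names x1 and x2):

m <- lm(y ~ x1 + x2, data = mydata)
bptest(m, ~ x1 * x2 + I(x1^2) + I(x2^2), data = mydata)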


The White test with R

install.packages("bstats")
library(bstats)
white.test(LinearModel)


That package is no longer on CRAN as it took the code from bptest() 
without crediting its original authors and released it in a package that 
conflicted with the original license. Also, the implementation did not 
check for potential problems with dummy variables or interactions 
mentioned above.


So the bptest() implementation from "lmtest" is really recommended. Or 
alternatively ncvTest() from package "car".



Hope this helps.

Sacha






From: Deepak Singh
To: r-help@r-project.org
Sent: Monday, 4 April 2016, 10:40

Subject: [R] Test for Homoscedasticity in R Without BP Test


Respected Sir,
I am doing a project on multiple linear model fitting and in that project I
have to test homoscedasticity of errors. I have googled for this and found
bptest, but in R version 3.2.4 the bp test is not available. So please
suggest a test of homoscedasticity ASAP as we have to submit our report on
7-04-2016.

P.S.: I have plotted residuals against fitted values and it is more or
less random.

Thank You !

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test for Homoscedasticity in R Without BP Test

2016-04-04 Thread Achim Zeileis

On Mon, 4 Apr 2016, Deepak Singh wrote:


Respected Sir,
I am doing a project on multiple linear model fitting and in that project I
have to test homoscedasticity of errors. I have googled for this and found
bptest, but in R version 3.2.4 the bp test is not available.


The function is called bptest() and is implemented in package "lmtest" 
which is available for current versions of R, see

https://CRAN.R-project.org/package=lmtest

To install it, run:
install.packages("lmtest")

And then to load the package and try the function:
library("lmtest")
example("bptest")


So please suggest a test of homoscedasticity ASAP as we have to submit
our report on 7-04-2016.

P.S.: I have plotted residuals against fitted values and it is more or
less random.

Thank You !

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Compute the Gini coefficient

2016-03-30 Thread Achim Zeileis

On Wed, 30 Mar 2016, Erich Neuwirth wrote:




On 30 Mar 2016, at 02:53, Marine Regis  wrote:

Hello,

I would like to build a Lorenz curve and calculate a Gini coefficient in
order to find how many parasites the top 20% most infected hosts support.

Here is my data set:

Number of parasites per host:
parasites = c(0,1,2,3,4,5,6,7,8,9,10)

Number of hosts associated with each number of parasites given above:
hosts = c(18,20,28,19,16,10,3,1,0,0,0)

To represent the Lorenz curve:
I manually calculated the cumulative percentage of parasites and hosts:

cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
plot(cumul_hosts, cumul_parasites, type = "l")



Your values in hosts are frequencies. So you need to calculate

cumul_hosts = cumsum(hosts)/sum(hosts)
cumul_parasites = cumsum(hosts*parasites)/sum(parasites)


That's what I thought as well but Marine explicitly said that the 'hosts'
are _not_ weights. Hence I was confused what this would actually mean.


Using the "ineq" package you can also do
plot(Lc(parasites, hosts))


The Lorenz curves starts at (0,0), so to draw it, you need to extend these 
vectors

cumul_hosts = c(0,cumul_hosts)
cumul_parasites = c(0,cumul_parasites)

plot(cumul_hosts, cumul_parasites, type = "l")


The Gini coefficient can be calculated as
library(reldist)
gini(parasites,hosts)


If you want to check, you can "recreate" the original data (number of
parasites for each host) with

num_parasites = rep(parasites,hosts)

and
gini(num_parasites)

will also give you the Gini coefficient you want.








From this Lorenz curve, how can I calculate the Gini coefficient with the function "gini" 
in R (package reldist) given that the vector "hosts" is not a vector of weights ?


Thank you very much for your help.
Have a nice day
Marine


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







Re: [R] package Formula????

2016-03-11 Thread Achim Zeileis

On Fri, 11 Mar 2016, CHIRIBOGA Xavier wrote:


Dear all, again me


I have already updated my version of R. But still appears this message:


plot(ctree(Surv(hours,state)~soil+volatile, data=data))
Error in loadNamespace(name) : there is no package called 'Formula'


I cannot find a package "Formula" in my list of packages. What to do in that 
case?


As you were instructed yesterday: Simply install the package "Formula". 
You can do this in R on the command line


install.packages("Formula")

or using the graphical user interface if you are using one...



Thank you,

Xavier

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] FUNCTION ctree

2016-03-10 Thread Achim Zeileis
Thanks to Erich and Sarah for the clarifications. For those wondering 
whether ctree() from "party" or from "partykit" should be used: the latter 
is the newer and improved implementation.


On Thu, 10 Mar 2016, Erich Neuwirth wrote:


If you do
??ctree
and the package partykit is installed, you will see that this function is 
defined in this package.
So, you should run
library(partykit)
before running your function call

If partykit is not installed, you need to install it.






On Mar 10, 2016, at 15:58, CHIRIBOGA Xavier  wrote:

Dear all,


I am using Rstudio. What to do when you get this message?

Error in plot(ctree(Surv(hours, state) ~ soil + volatile, data = data)) :
 could not find function "ctree"

Thank you,


Xavier

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







Re: [R] ivreg with two endogenous variables and 1 instrument respectively

2016-03-09 Thread Achim Zeileis

On Tue, 8 Mar 2016, stefania innocenti wrote:


Hello,
I am trying to use the ivregress function to estimate a second stage 
model which looks like the following:


LogGDP=GEO+RULE+OPENNESS

I would like to instrument Rule (RULE) with Settler mortality (SETTLER) 
and Openness (OPENNESS) with logFrankelRomer (FR). I thus have one 
instrument per each endogenous variable.


Thus the first-stage equations look like:
RULE=GEO+logFR+SETTLER
OPENNESS=GEO+logFR+SETTLER

Can I do this simultaneously with the ivregress function?


Which package is this?


Maybe I got something wrong with the syntax but if I do:
ivregress(logGDP ~RULE+GEO+OPENNESS, c(RULE~SETTLER+FR+GEO, 
OPENNESS~SETTLER+FR+GEO), mydata), the function bugs and I cannot 
proceed.


Of course, I could run the two first stages separately, but the 
coefficient of the first stage are then wrong.


The command ivreg(logGDP ~ RULE + GEO +OPEN| GEO + SETTLER+FR) gives me 
correct second stage coefficients (I checked with what I obtained doing 
the 2sls by hand).


That's package "AER", I presume?

But how do I retrieve the first stage coefficients for 
RULE=GEO+logFR+SETTLER and OPENNESS=GEO+logFR+SETTLER in this case?


The ivreg() function from "AER" does not store the first stage results but 
just returns the IV estimates from the second stage. If you want the first 
stage you need to run lm() manually:


lm(cbind(RULE, OPENNESS) ~ GEO + logFR + SETTLER, ...)


Any advice is more than appreciated.
Thanks a lot in advance.
Best regards,
Stefania

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] error in vcovNW

2015-12-19 Thread Achim Zeileis

On Sat, 19 Dec 2015, Saba Sehrish wrote:


Thank you. The issue is resolved by scaling the data in millions.


That solves the numerical problem but the second issue (inappropriateness 
of the Newey-West estimator for an autoregressive model) persists.



Saba


On Saturday, 19 December 2015, 15:06, Achim Zeileis
<achim.zeil...@uibk.ac.at> wrote:


On Sat, 19 Dec 2015, Saba Sehrish via R-help wrote:

> Hi I am using NeweyWest standard errors to correct lm( ) output. For
example:
> lm(A~A1+A2+A3+A4+A5+B1+B2+B3+B4+B5)
> vcovNW<-NeweyWest(lm(A~A1+A2+A3+A4+A5+B1+B2+B3+B4+B5))
>
> I am using package(sandwich) for NeweyWest. Now when I run this command,
it gives following error:
> Error in solve.default(diag(ncol(umat)) - apply(var.fit$ar, 2:3, sum))
:system is computationally singular: reciprocal condition number =
7.49468e-18
>
> Attached herewith is data for A, A1,A2,A3,A4,A5,B1,B2,B3,B4,B5 are
> simply lag variables. Can you help me removing this error please?

Without trying to replicate the error, there are at least two issues:

(1) You should scale your data to use more reasonable orders of magnitude,
e.g., in millions. This will help avoiding numerical problems.

(2) More importantly, you should not employ HAC/Newey-West standard errors
in autoregressive models. If you use an autoregressive specification, you
should capture all relevant autocorrelations - and then no HAC estimator
is necessary. Alternatively, one may treat autocorrelation as a nuisance
parameter and not model it - but instead capture it in HAC standard
errors. Naturally, the former strategy will typically perform better if
the autocorrelations are more substantial.

> Saba





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] error in vcovNW

2015-12-18 Thread Achim Zeileis

On Sat, 19 Dec 2015, Saba Sehrish via R-help wrote:


Hi I am using NeweyWest standard errors to correct lm( ) output. For example:
lm(A~A1+A2+A3+A4+A5+B1+B2+B3+B4+B5)
vcovNW<-NeweyWest(lm(A~A1+A2+A3+A4+A5+B1+B2+B3+B4+B5))

I am using package(sandwich) for NeweyWest. Now when I run this command, it 
gives following error:
Error in solve.default(diag(ncol(umat)) - apply(var.fit$ar, 2:3, sum)) :system 
is computationally singular: reciprocal condition number = 7.49468e-18

Attached herewith is data for A, A1,A2,A3,A4,A5,B1,B2,B3,B4,B5 are 
simply lag variables. Can you help me removing this error please?


Without trying to replicate the error, there are at least two issues:

(1) You should scale your data to use more reasonable orders of magnitude, 
e.g., in millions. This will help avoiding numerical problems.


(2) More importantly, you should not employ HAC/Newey-West standard errors 
in autoregressive models. If you use an autoregressive specification, you 
should capture all relevant autocorrelations - and then no HAC estimator 
is necessary. Alternatively, one may treat autocorrelation as a nuisance 
parameter and not model it - but instead capture it in HAC standard 
errors. Naturally, the former strategy will typically perform better if 
the autocorrelations are more substantial.



Saba

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Clustered Standard Errors?

2015-11-16 Thread Achim Zeileis

On Mon, 16 Nov 2015, Ignacio Martinez wrote:


Hi,

I found this post from 2011 for doing clustered standard errors in R.

Is there any update to the lm package that allows doing this without
having to write your own function? In Stata it is as simple as adding
"cluster" to the reg command.


For "lm" objects, the easiest solution is to use the "multiwayvcov" 
package on CRAN.


Additionally, the "plm" package on CRAN offers further options beyond the 
simple clustered standard errors.
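
A minimal sketch (the data set and cluster variable are illustrative):

library("multiwayvcov")
library("lmtest")
m <- lm(y ~ x, data = mydata)
vc <- cluster.vcov(m, mydata$firm)
coeftest(m, vcov = vc)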



Thanks a lot for the help!

Ignacio

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] ​ use of PPML (Poisson Pseudo Maximum Likelihood)

2015-10-08 Thread Achim Zeileis

On Thu, 8 Oct 2015, ?? ?. wrote:


Dear Colleagues!


We use R for research. We would like to perform calculations (gravity model
of trade) with the use of PPML (Poisson Pseudo Maximum Likelihood).

Please explain:

1. What package in R enables us to make calculations in accordance with the
PPML?

We used the command
glm()


Yes.


with the parameter

family=quasipoisson(link="log").


That is one option. In econometrics, the quasi-Poisson version is less 
commonly used than in statistics, though. Alternatively, so-called 
"robust" sandwich standard errors can be applied, i.e.,


m <- glm(..., family = poisson)

library("lmtest")
library("sandwich")
coeftest(m, vcov = sandwich)


Does this command procedure PPML? If not, then what command allows us to
use PPML?

2. For calculations using a linear model the quality factor is R^2. What 
is the analogue of this indicator for PPML in system R?


That is not really an R question. Sometimes people use pseudo-R-squared 
measures, e.g., available in package "pscl". Whether or not this is really 
useful is a different question, though.
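
For example, with the fitted Poisson model m from above:

library("pscl")
pR2(m)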



Thank you very much in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] c(1:n, 1:(n-1), 1:(n-2), ... , 1)

2015-09-17 Thread Achim Zeileis

On Thu, 17 Sep 2015, Peter Langfelder wrote:


Not sure if this is slicker or easier to follow than your solution,
but it is shorter :)

do.call(c, lapply(n:1, function(n1) 1:n1))


Also not sure about efficiency but somewhat shorter...
unlist(lapply(5:1, seq))
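
Base R's sequence() also produces exactly this pattern:

sequence(5:1)
## [1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1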


Peter

On Thu, Sep 17, 2015 at 11:19 AM, Dan D  wrote:

Can anyone think of a slick way to create an array that looks like c(1:n,
1:(n-1), 1:(n-2), ... , 1)?

The following works, but it's inefficient and a little hard to follow:
n<-5
junk<-array(1:n,dim=c(n,n))
junk[((lower.tri(t(junk),diag=T)))[n:1,]]

Any help would be greatly appreciated!

-Dan



--
View this message in context: 
http://r.789695.n4.nabble.com/c-1-n-1-n-1-1-n-2-1-tp4712390.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Trees (and Forests) with packages 'party' vs. 'partykit': Different results

2015-09-14 Thread Achim Zeileis

Christopher,

thanks for you interest.

I'm currently exploring a dataset with the help of conditional inference 
trees (still very much a beginner with this technique & log. reg. 
methods as a whole t.b.h.), since they explained more variation in my 
dataset than a binary logistic regression with glm(). I started out with 
the party package, but after a while I ran into the 'updated' partykit 
package and tried this out, too.


If you want to use individual trees (as opposed to forests), then the 
"partykit" package is recommended because it contains much improved 
re-implementations of ctree() and mob() as well as the mob() convenience 
interfaces lmtree() and glmtree(). For forests see below.


Now, the strange thing is that both trees look quite different - 
actually even the very first split is different.


This might be due to several partitioning variables being associated with 
tiny p-values in the root node. The re-implementation in partykit 
internally computes with log-p-values and hence should be numerically more 
stable. In the old implementation it could happen that from several highly 
significant variables, always the first is chosen because the p-values 
were essentially indistinguishable for the computer.


If you think that this is not the problem, then please contact the package 
maintainer with a reproducible example.


Except for bug fixes like the one above, the trees grown by 
partykit::ctree and party::ctree should be the same.


So I did some research and came across the 'forest' concept. However, it 
seems that the varImp function does not yet work in the partykit 
implementation,


Correct. While the ctree() implementation in partykit is better than that 
in party, the same is _not_ true for cforest(). The new partykit::cforest 
is currently still a basic implementation which doesn't offer as many 
features as the party::cforest implementation. More work is needed 
especially for variable importance measures and different kinds of 
predictions.


which raises the question for me how I should evaluate the partykit 
forest - how can I find out whether the variables are as important in the 
forest as in my partykit tree? Is there some way to do this or some other 
solution for this problem? I'd prefer to stay with the partykit 
implementation of ctree, since it allows more settings for the final 
plot, which I'd need to get the final (large) plot into a readable form.


Related to this project, I'd also like to give statistics for the overall
model, e.g. overall significance, Nagelkerke's R², a C-value. After a
'regular' binary log. reg., I would use the lrm function to get these
values, but I am unsure whether it would be correct to also apply this
method to my tree data.


Overall significance is difficult because you have done model selection 
when growing the tree. As for pseudo R-squared or information criteria 
etc., it is relatively easy to compute these "by hand" based on the 
observed and fitted responses. An example for this is provided at:

http://stackoverflow.com/questions/29524670/how-to-find-the-the-deviance-of-an-as-party-object-converted-from-rpart-tree-in/29693223#29693223
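
A minimal sketch for a binary response (assuming a fitted tree ct, a data 
frame dat, and a binary factor response y - all names illustrative, and 
fitted probabilities of exactly 0 or 1 would need extra care):

p <- predict(ct, newdata = dat, type = "prob")[, 2]   ## P(second class)
yy <- as.numeric(dat$y) - 1                           ## 0/1 coding
ll <- sum(yy * log(p) + (1 - yy) * log(1 - p))        ## model log-likelihood
ll0 <- sum(yy * log(mean(yy)) + (1 - yy) * log(1 - mean(yy)))  ## null model
1 - ll/ll0                                            ## McFadden's pseudo-R-squared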

Any help would be greatly appreciated! 


-- Christopher



--
View this message in context: 
http://r.789695.n4.nabble.com/Trees-and-Forests-with-packages-party-vs-partykit-Different-results-tp4712214.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Re: Help with Binning Data

2015-09-10 Thread Achim Zeileis

On Fri, 11 Sep 2015, Rolf Turner wrote:


On 11/09/15 11:57, David Winsemius wrote:




The urge to imitate other statistical package that rely on profusion
of dummies should be resisted. R repression functions can handle
factor variables 




Fortune? :-)


Nice! Should I include the "repression" typo? :-)

Best,
Z


cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Task Views

2015-09-09 Thread Achim Zeileis

On Wed, 9 Sep 2015, Partha Sinha wrote:


Dear All
I am using R version 3.2.1 (2015-06-18) on Windows 7.
I have installed few task views like Timeseries and Graphics in R.

1. Is it possible to find out which "TASK Views" are installed in the 
system ?


Not out of the box. The task view installation just triggers a normal 
installation of packages - once these are installed there is no way to 
know from which task views the packages came.


But you can easily run update.views("TimeSeries") which will check first 
whether all packages from the task view are already installed. And then it 
only installs those that are not installed in a current version.



2. Can in install multiple Task views using single command ?


Yes, just do install.views(c("TimeSeries", "Graphics")) or analogously for 
update.views().



Regards
Partha

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] hurdle control and optim

2015-08-17 Thread Achim Zeileis
Please refrain from cross-posting. The same request was sent to the author 
of hurdle(), R-help, and StackOverflow (where it was already answered). 
Also, do provide self-contained and reproducible code as would be 
appropriate in any of the three cases.


On Mon, 17 Aug 2015, Matt Dicken wrote:



I was hoping someone may be able to help with the following.

I fit the model below using the pscl package. I am modelling catch data (about 
17,000 entry points) so lots of zero's

fit.hurdle.bin <- hurdle(Catch ~ Beach + Region + Year +
  Decade + Month + Season + Whale + Sex + Size + meantemp +
  meanviz + offset(log(Length.nets..km.)),
  dist = "poisson", zero.dist = "binomial", link = "logit", trace = TRUE)

The model output tells me that:
Warning message: In sqrt(diag(object$vcov)) : NaNs produced (against year)

I then use hurdle control with L-BFGS-B to set some parameter controls to 
solve this issue, but get the warning message:
L-BFGS-B needs finite values of 'fn'
In addition: Warning message:
In optim(fn = countDist, gr = countGrad, par = c(start$count, if (dist ==  :
 method L-BFGS-B uses 'factr' (and 'pgtol') instead of 'reltol' and 'abstol'

How do I write the script for Hurdle control to solve these issues?
Any help would be really appreciated
All the best
Matt





Dr. Matt Dicken
Senior Scientist
Telephone: 0315660400 | Fax: 0315660493 | Email: m...@shark.co.za
Physical Address: 1a Herrwood Drive, Umhlanga Rocks, 4320 | www.shark.co.za


Connect with us on social media: KZNSB Facebook (https://www.facebook.com/kznsb) 
and KZNSB Twitter (https://twitter.com/KznSharks?lang=en)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] CHAID Usage in R

2015-06-22 Thread Achim Zeileis

On Mon, 22 Jun 2015, My List wrote:


All:

I am trying to implement a CHAID decision tree. Can anyone please help me
find out which package can be used for it, or point me to the relevant
documentation?


A CHAID implementation is available on R-Forge at:
https://R-Forge.R-project.org/R/?group_id=343

In recent versions of R you should be able to install the package from 
within R via: install.packages("CHAID", repos = "http://R-Forge.R-project.org")


The package contains manual pages.

As an alternative to CHAID you might want to use some more recently 
developed statistical algorithms for classification trees, see e.g.,

http://CRAN.R-project.org/view=MachineLearning

In particular, the ctree() function from package partykit is based on 
similar ideas but is more flexible with respect to the distribution of 
dependent and explanatory variables.
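
For instance, a minimal ctree() sketch on the built-in iris data 
(illustrative only, unrelated to your application):

library("partykit")
ct <- ctree(Species ~ ., data = iris)  ## conditional inference tree
plot(ct)                               ## tree structure and node distributions
predict(ct, newdata = head(iris))      ## predicted classes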



I read a web page saying that only the source for the CHAID package is
available. Kindly advise on the package to be used.

Thanks in advance,
Harmeet


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Views installation

2015-06-12 Thread Achim Zeileis

On Fri, 12 Jun 2015, Partha Sinha wrote:


I want to install views like MachineLearning and Bayesian using code
(write the code once, run it once, and have it install all packages
in those views, rather than doing it twice).
Please help


I'm not sure what exactly you are looking for but

ctv::update.views(c("Bayesian", "MachineLearning"))

should get everything installed.
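
If only the core packages of each view are wanted, install.views() has 
(if I remember the ctv interface correctly) a coreOnly argument:

ctv::install.views(c("Bayesian", "MachineLearning"), coreOnly = TRUE)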


Parth

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Mean error message missing

2015-06-08 Thread Achim Zeileis

On Mon, 8 Jun 2015, Christian Brandstätter wrote:


Dear list,

I found an odd behavior of the mean() function; it is allowed to do something 
that you probably shouldn't:
If you calculate mean() of a sequence of numbers (without combining them into 
a vector), mean() just computes the mean of the first element. Is there a 
reason why there is no warning, as there is for sd(), for example?


mean() - unlike sd() - is a generic function that has a '...' argument 
that is passed on to its methods. The default method which is called in 
your example also has a '...' argument (because the generic has it) but 
doesn't use it.
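
To illustrate (assuming a standard R session):

mean(1, 2, 3, 4)     ## 1   -- only the first argument is the data;
                     ##        2, 3, 4 are matched to trim, na.rm, and '...'
mean(c(1, 2, 3, 4))  ## 2.5 -- one vector of four numbers
sd(1, 2, 3, 4)       ## Error: unused arguments -- sd() has no '...' argument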



Example code:
mean(1,2,3,4)
sd(1,2,3,4)

Best regards
Christian

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Mean of dates

2015-04-24 Thread Achim Zeileis



On Fri, 24 Apr 2015, Jue Lin-Ye wrote:


Dear fellow R-help members,

If my data is

YYYY MM DD HH
2015 04 24 01
2015 04 24 02
2015 04 24 06

Where

YYYY: year
MM:month
DD:day
HH: hour

How could I calculate the mean of the ISOdatetime(YYYY, MM, DD, HH, 0, 0) of
these?


With the mean() method? On my machine:

R> mean(ISOdatetime(YYYY, MM, DD, HH, 0, 0))
[1] "2015-04-24 03:00:00 CEST"
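
A self-contained version of the same computation (column names as in the 
posted table; the printed time zone depends on the local machine):

d <- data.frame(YYYY = 2015, MM = 4, DD = 24, HH = c(1, 2, 6))
mean(ISOdatetime(d$YYYY, d$MM, d$DD, d$HH, 0, 0))
## [1] "2015-04-24 03:00:00 CEST"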

hth,
Z


Note: I set minutes and seconds to 0, as I don't have data for them.

Thank you in advance!

--
Jue Lin-Ye

---
Civil Engineering PhD candidate
Maritime Engineering Laboratory (LIM)
Universitat Politècnica de Catalunya (UPC)
C/Jordi Girona 1-3, Barcelona 08034 (Spain)
-


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exhaustive CHAID package

2015-04-22 Thread Achim Zeileis

On Wed, 22 Apr 2015, Michael Grant wrote:


Many thanks for your response, sir.

Here are two of the references to which I referred. I've also 
personally explored several data sets in which the outcomes are 'known' 
and have seen high variability in the topology of the trees being 
produced; but typically, Exhaustive CHAID predictions match the 'known' 
results better than any of the others, using default settings.


http://www.hindawi.com/journals/jam/2014/929768/
http://interstat.statjournals.net/YEAR/2010/articles/1007001.pdf


Thanks for the references, I wasn't aware of these. Both of these appear 
to use the SPSS implementations of CHAID, exhaustive CHAID, CART, and 
QUEST with default settings (as you did above). As I have never used these 
myself in SPSS, I cannot say how the implementations compare, but it is quite 
possible that these are different from other implementations. E.g., for 
CART the pruning rule may make a difference (with or without 
crossvalidation; with 1-SE or 0-SE rule etc.). Similarly, for QUEST I 
think that Loh's own implementation uses somewhat different default 
settings.


So it may be advisable to go beyond defaults.


By inference, many research papers are choosing Exhaustive CHAID.


My experience is that this is determined to a good degree by the software 
available and what others in the same literature use.


My concern is not that these procedures produce mildly variant trees but 
dramatically variant, with not even the same set of variables included.


Yes, instability of the tree structure is one of the drawbacks of 
tree-based procedures. Of course, the tree structure can be very different 
while producing very similar predictions.



Is CHAID available for use as an R package?


Yes.


I thought R-FORGE was solely for developers?


See: https://R-Forge.R-project.org/R/?group_id=343

You can easily install the package from R-Forge and also check out the 
entire source code anonymously.
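
A hedged installation/usage sketch (the USvote data and its vote3 response 
are assumptions based on the package's example data; all chaid() inputs 
must be categorical):

install.packages("CHAID", repos = "http://R-Forge.R-project.org")
library("CHAID")
data("USvote", package = "CHAID")       ## assumed example data set
fit <- chaid(vote3 ~ ., data = USvote)  ## CHAID requires factor predictors
plot(fit)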



Again, many thanks.

MCG

-Original Message-
From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at]
Sent: Wednesday, April 22, 2015 3:30 AM
To: Michael Grant
Cc: r-help@R-project.org
Subject: Re: [R] Exhaustive CHAID package

On Tue, 21 Apr 2015, Michael Grant wrote:


Dear R-Help:

From multiple sources comparing methods of tree classification and
tree regressions on various data sets, it seems that Exhaustive CHAID
(distinct from CHAID), most commonly generates the most useful tree
results and, in particular, is more effective than ctree or rpart
which are implemented in R.


I searched a bit on the web for "exhaustive CHAID" and didn't find any convincing 
evidence that this method is "most commonly the most useful".
I doubt that such evidence exists because the methods are applicable to so many 
different situations that uniformly better results are essentially never 
obtained. Nevertheless, if you have references of comparison studies, I would 
still be interested. Possibly these provide insight in which situations 
exhaustive CHAID performs particularly well.


I see that CHAID, but not Exhaustive CHAID, is in the R-forge, and I
write to ask if there are plans to create a package which employs the
Exhaustive CHAID strategy.


I wouldn't know of any such plans. But if you want to adapt/extend the code 
from the CHAID package, this is freely available.


Right now the best source I can find is in SPSS-IBM and I feel a bit
disloyal to R using it.


I wouldn't be concerned about disloyalty. If you feel that exhaustive CHAID is the most 
appropriate tool for your problem and you have access to it in SPSS, why not use it? 
Possibly you can also export it from SPSS and import it into R using PMML. The 
partykit package has an example with an imported QUEST tree from SPSS.


Michael Grant
Professor
University of Colorado Boulder


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







Re: [R] Exhaustive CHAID package

2015-04-22 Thread Achim Zeileis

On Tue, 21 Apr 2015, Michael Grant wrote:


Dear R-Help:

From multiple sources comparing methods of tree classification and tree 
regressions on various data sets, it seems that Exhaustive CHAID 
(distinct from CHAID), most commonly generates the most useful tree 
results and, in particular, is more effective than ctree or rpart which 
are implemented in R.


I searched a bit on the web for "exhaustive CHAID" and didn't find any 
convincing evidence that this method is "most commonly the most useful". 
I doubt that such evidence exists because the methods are applicable to so 
many different situations that uniformly better results are essentially 
never obtained. Nevertheless, if you have references of comparison 
studies, I would still be interested. Possibly these provide insight in 
which situations exhaustive CHAID performs particularly well.


I see that CHAID, but not Exhaustive CHAID, is in the R-forge, and I 
write to ask if there are plans to create a package which employs the 
Exhaustive CHAID strategy.


I wouldn't know of any such plans. But if you want to adapt/extend the 
code from the CHAID package, this is freely available.


Right now the best source I can find is in SPSS-IBM and I feel a bit 
disloyal to R using it.


I wouldn't be concerned about disloyalty. If you feel that exhaustive 
CHAID is the most appropriate tool for your problem and you have access to 
it in SPSS, why not use it? Possibly you can also export it from SPSS and 
import it into R using PMML. The partykit package has an example with an 
imported QUEST tree from SPSS.



Michael Grant
Professor
University of Colorado Boulder


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] multiple break in univariate series

2015-03-29 Thread Achim Zeileis

On Sat, 28 Mar 2015, Bert Gunter wrote:


Once you have looked at the data and chosen change points to test
based on the data, the tests for change points are invalid (unless you
make appropriate adjustments for post hoc tests).

And no, I am not making this up. Consult any competent statistician.


Yes, you must not select a breakpoint by eyeballing a time series and then 
conduct a structural break test for this given breakpoint as if it were 
exogenously given. This will be far too liberal.


But most practitioners would consider it still OK to conduct a structural 
break test with _unknown_ breakpoint. Ideally, though, the hypothesis to be 
tested and the significance level should be formulated prior to the 
collection of the data.


To add to Uwe's comments: maxstat_test() in package coin would be a 
non-parametric permutation test. sctest() in package strucchange 
provides a wide range of tests for general parametric models, especially 
linear regressions.
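
A minimal sketch with the built-in Nile series (annual flow measurements 
with a well-documented level shift around 1898):

library("strucchange")
data("Nile")
sctest(efp(Nile ~ 1, type = "OLS-CUSUM"))  ## CUSUM test, breakpoint unknown
bp <- breakpoints(Nile ~ 1)                ## dating (possibly multiple) breaks
summary(bp)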


Best,
Z


Cheers,
Bert





Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Sat, Mar 28, 2015 at 5:52 PM, Uwe Ligges
lig...@statistik.tu-dortmund.de wrote:



On 29.03.2015 00:09, Temel İspanyolca wrote:



DR. UWE LIGGES


I have sent Turkish real GDP data (1998-2013) in the annex.
Turkey experienced two crises in this period, in 2001 and 2008. You can
see them in the data.
My problem is to identify these dates with a statistical test, as a
structural change or change point.



Have you tried CUSUM? Or some permutation test? You do not have much data.


Best,
Uwe Ligges





Sincerely
Engin

2015-03-28 23:04 GMT+01:00 Uwe Ligges lig...@statistik.tu-dortmund.de:

 Forwarded Message 
Subject: [R] multiple break in univariate series
Date: Sat, 28 Mar 2015 00:41:35 +0100
From: Temel İspanyolca ispanyol...@gmail.com
To: R-help@r-project.org

Hello
Does anyone know a multiple-break test for univariate series?



Which kind of breaks? Shifts?

You may want to look for CUSUM or MOSUM tests or even permutation
tests.

Packages: strucchange, changepoint

http://cran.r-project.org/web/packages/changepoint/index.html

http://cran.r-project.org/web/packages/strucchange/index.html

Best,
Uwe Ligges




--
*Thanks*
Engin YILMAZ








--
*Regards*
Engin YILMAZ







__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] #library(party) - Compare predicted results for ctree

2015-02-16 Thread Achim Zeileis

On Mon, 16 Feb 2015, Rodica Coderie via R-help wrote:


Hello,

I've created a ctree model called fit2 using 15 input variables for a factor 
response variable Response (YES/NO).
When I run the following:
table(predict(fit2), training_data$response)
I get the following result:

        NO    YES
  NO  48694   480
  YES     0     0

It appears that the NO responses are predicted with 100% accuracy and 
the YES responses are predicted with 0% accuracy.


Why is this happening? Is it because of my data or something in the 
ctree algorithm?


Your data has less than 1% of YES observations and I would guess that the 
tree cannot separate these in a way such that majority voting gives a YES 
prediction. You might consider a different cutoff (other than 50%) or 
downsampling the NO observations.
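
A hedged sketch of the cutoff idea, assuming the tree is (re)fitted with 
partykit::ctree() so that predict(..., type = "prob") returns a matrix of 
class probabilities, and reusing the fit2/training_data names from the 
question:

p_yes <- predict(fit2, type = "prob")[, "YES"]  ## estimated P(YES) per case
table(ifelse(p_yes > 0.1, "YES", "NO"),         ## 10% cutoff, not majority vote
      training_data$response)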



Thanks!
Rodica

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] R2BayesX

2015-02-05 Thread Achim Zeileis

Shanley,

thanks for the example.


Thank you for your help.
Here is my R code:


library("R2BayesX")
data("MunichBnd")



N <- length(MunichBnd); n <- N * 5
dat <- data.frame(x1 = runif(n, -3, 3), id = as.factor(rep(names(MunichBnd), 
length.out = n)))
dat$sp <- with(dat, sort(runif(N, -2, 2), decreasing = TRUE)[id])
dat$re_poi <- with(dat, rpois(N, 10)[id])
dat$y_poi <- with(dat, 1.5 + sin(x1) + sp + re_poi + rpois(n, 10))
b_poi <- bayesx(y_poi ~ sx(x1) + sx(id, bs = "mrf", map = MunichBnd) + 
sx(id, bs = "re"), method = "MCMC", data = dat, family = "poisson")


Warning message:
running command 
'C:/Users/ChongS/Documents/R/win-library/3.1/BayesXsrc/libs/x64/BayesX.exe 
C:/Users/ChongS/AppData/Local/Temp/RtmpqMhw3Z/bayesx/bayesx.estim.input.prg' had status 3

If I run the model using family = "gaussian", it doesn't have the problem:
b_poi <- bayesx(y_poi ~ sx(x1) + sx(id, bs = "mrf", map = MunichBnd) + 
sx(id, bs = "re"), method = "MCMC", data = dat, family = "gaussian")


This appears to be a problem in BayesX for the combination of bs = "re", 
method = "MCMC", and family = "poisson". If at least one of these is changed, 
everything works. An even simpler example is:


set.seed(1)
d <- data.frame(y = rpois(1000, 5))
d$x <- rep(1:100, each = 10)

Then:

bayesx(y ~ sx(x, bs = "re"), data = d, method = "MCMC", family = "poisson")
## -> problem processing BayesX!

whereas

bayesx(y ~ sx(x), data = d, method = "MCMC", family = "poisson")
bayesx(y ~ sx(x, bs = "re"), data = d, method = "REML", family = "poisson")
bayesx(y ~ sx(x, bs = "re"), data = d, method = "MCMC", family = "gaussian")

are all OK. Niki has contacted the original BayesX authors, who might be 
able to give more insight. For the moment, I would recommend using 
method = "REML".


Best,
Z


Kind regards,
Shanley


-Original Message-
From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at]
Sent: Wednesday, 4 February 2015 8:20 PM
To: Shanley Chong
Cc: r-help@r-project.org; nikolaus.uml...@uibk.ac.at
Subject: Re: [R] R2BayesX

Shanley:


I am trying to run a STAR logistic/binomial model using R2BayesX. A
warning message comes up every time I include a random effect,
bs = "re" (unstructured spatial effect), in the model. The warning
message is: Warning message: running command
'C:/Users/ChongS/Documents/R/win-library/3.1/BayesXsrc/libs/x64/BayesX.exe
C:/Users/ChongS/AppData/Local/Temp/RtmpCW8QY9/bayesx33/bayesx.estim.input.prg'
had status 5


I haven't seen this problem before. Maybe you can provide commented, minimal, 
self-contained, reproducible code (as the posting guide asks)?

I'm also cc'ing the package maintainer (Nikolaus Umlauf) as he might be able to 
give more insight.

Best,
Z


The model is fine when I just run the model using mrf, bs = "mrf" (structured 
spatial effect).

And when I try to run STAR using family = "gaussian" (the response is normally 
distributed) with structured and unstructured spatial effects, it is fine too.

Can anyone please help?

Regards,

Shanley







__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] R2BayesX

2015-02-04 Thread Achim Zeileis

Shanley:

I am trying to run a STAR logistic/binomial model using R2BayesX. A 
warning message comes up every time I include a random effect, 
bs = "re" (unstructured spatial effect), in the model. The warning 
message is: Warning message: running command 
'C:/Users/ChongS/Documents/R/win-library/3.1/BayesXsrc/libs/x64/BayesX.exe 
C:/Users/ChongS/AppData/Local/Temp/RtmpCW8QY9/bayesx33/bayesx.estim.input.prg' 
had status 5


I haven't seen this problem before. Maybe you can provide commented, 
minimal, self-contained, reproducible code (as the posting guide asks)?


I'm also cc'ing the package maintainer (Nikolaus Umlauf) as he might be 
able to give more insight.


Best,
Z


The model is fine when I just run the model using mrf, bs = "mrf" (structured 
spatial effect).

And when I try to run STAR using family = "gaussian" (the response is normally 
distributed) with structured and unstructured spatial effects, it is fine too.

Can anyone please help?

Regards,

Shanley



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




