[R] Read .xlsx and convert date-column value into Dataframe

2015-07-22 Thread R_Antony
Hi,

Here I have an .xlsx file that contains various columns, including a date
column [mm/dd/yy] which is not stored in date format. I need to read this
Excel [.xlsx] file and get it into a data frame. I used the xlsx library and
reading the data worked fine, but the values in the date column are converted
to something else.

For example:

FF DATE
---
3/31/2016
2/26/2016
--
1/2/2016

[Values like "--" appear in the column to indicate that no date is given.]

Instead, I get a result like this:

FF DATE
---
42460
42426

42125

This is the code I am using:

theData <- data.frame(read.xlsx2(InputFilePath, sheetIndex,
    sheetName = "Workflow_Report", startRow = 3, colIndex = NULL, endRow = NULL,
    as.data.frame = TRUE, header = TRUE))

Aim: get the actual date-column values into the data frame from the .xlsx
file.

I have tried many approaches. Could someone please help?

Thanks in advance,
Antony.




--
View this message in context: 
http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sensitivity,Specificity and Youden Index

2015-07-22 Thread Sarah Goslee
Please reply to the list, not me. The list doesn't allow most
attachments (and most of the list recipients are highly unlikely to
open unsolicited binary files anyway). Do see the link I provided for
the appropriate way to create reproducible examples, including
providing data (hint: use dput(), and include your code).
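
A minimal sketch of that suggestion (the object name mydata is just a placeholder):

# Cut the data down to a small, representative piece, then paste the dput()
# output into the email so readers can reconstruct the object exactly.
small <- head(mydata, 20)
dput(small)          # paste this output, together with the code you ran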

Some evidence that you've looked at the packages suggested by rseek
would also be useful. Nobody's going to do your work for you.

Sarah

On Wed, Jul 22, 2015 at 12:34 PM, Jomy Jose infoj...@gmail.com wrote:
 Thank you for your reply

 I have attached my dataset. Here 'malnut' is the outcome; the sensitivity,
 specificity and Youden index of each factor from chew to cc (18 factors), and
 of combinations of these factors (any 6 out of 18), have to be generated
 against the outcome measure 'malnut'.

 On Wed, Jul 22, 2015 at 7:30 PM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 On Wed, Jul 22, 2015 at 9:50 AM, Jomy Jose infoj...@gmail.com wrote:
  How can I calculate the sensitivity, specificity and Youden index for 18 factors
  and their combinations (6 factors in each) with an outcome measure?

 www.rseek.org turns up a bunch of references to Youden index,
 including packages.

 Without a reproducible example that includes some sample data (fake is
 fine), the code you used, and some clear idea of what output you
 expect, it's impossible to figure out how to help you. Here are some
 suggestions for creating a good reproducible example:

 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

 Sarah

 --
 Sarah Goslee
 http://www.functionaldiversity.org

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Langelier-Ludwig Plots

2015-07-22 Thread Boris Steipe
According to  ...
   http://info.ngwa.org/gwol/pdf/721000139.PDF (Graphical Interpretation of 
Water Quality Data)
... a Langelier-Ludwig plot is simply a scatterplot of cations vs. anions (in 
percent).
Surely that would be beyond trivial to produce in R. Or am I missing a subtle 
something?
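
A minimal base-graphics sketch of such a plot (the data frame 'ions' and its
percentage columns are made up for illustration):

# Hypothetical cation/anion percentages for a handful of water samples.
ions <- data.frame(cation_pct = runif(20, 0, 100),
                   anion_pct  = runif(20, 0, 100))
plot(ions$cation_pct, ions$anion_pct,
     xlab = "Cations (%)", ylab = "Anions (%)",
     xlim = c(0, 100), ylim = c(0, 100),
     main = "Langelier-Ludwig style scatterplot")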


Cheers,
B.



On Jul 22, 2015, at 11:50 AM, Rich Shepard rshep...@appl-ecosys.com wrote:

  My web search for an R package producing Langelier-Ludwig plots found no
 hits. Has this been implemented in base graphics, lattice, ggplot2, or
 another package?
 
  The reference:
 
 Langelier, W., and Ludwig, H., 1942, Graphical methods for indicating the
 mineral character of natural waters: J. Am. Water Ass., 34, p. 335-352.
 
 Rich
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to simulate informative censoring in a Cox PH model?

2015-07-22 Thread Daniel Meddings
I wish to simulate event times where the censoring is informative, and to
compare the quality of parameter estimates from a Cox PH model with estimates
obtained from event times generated with non-informative censoring. However, I
am struggling to do this, and I conclude that the issue is not a technical flaw
in my code but rather that I do not understand what is meant by informative and
non-informative censoring.

My approach is to simulate an event time T dependent on a vector of
covariates x having hazard function h(t|x) = lambda*exp(beta'*x)*v*t^{v-1}.
This corresponds to T ~ Weibull(lambda(x), v), where the scale parameter
lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is
fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}),
lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume
the regression coefficients are p-dimensional.

I generate informative censoring times C_i ~ Weibull(lambda(x_i), v_C), with
lambda(x_i) = lambda_C*exp(beta_C'*x_i), and compute Y_inf_i = min(T_i, C_i) and
a censoring flag delta_inf_i = 1 if T_i <= C_i (an observed event) and
delta_inf_i = 0 if T_i > C_i (informatively censored: event not observed). I am
convinced this is informative censoring because, as long as beta_T != 0 and
beta_C != 0, for each subject the data-generating processes for T and C both
depend on x.

In contrast, I generate non-informative censoring times
D_i ~ Weibull(lambda_D*exp(beta_D), v_D), and compute Y_ninf_i = min(T_i, D_i)
and a censoring flag delta_ninf_i = 1 if T_i <= D_i (an observed event) and
delta_ninf_i = 0 if T_i > D_i (non-informatively censored: event not observed).
Here beta_D is a scalar. I scale the simulation by choosing the lambda_T,
lambda_C and lambda_D parameters such that on average T_i < C_i and T_i < D_i,
to achieve X% censored subjects for both Y_inf_i and Y_ninf_i.
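
A rough sketch of this data-generating process (all numeric settings below are
arbitrary illustrations, not the actual values used):

library(survival)
set.seed(1)
N      <- 2000
x      <- rnorm(N)                       # a single covariate for simplicity
beta_T <- 1;    beta_C <- -1             # event and censoring coefficients
lam_T  <- 0.10; lam_C <- 0.03; v <- 1.5  # baseline scales, common shape

# If h(t|x) = lambda*exp(beta*x)*v*t^(v-1), then S(t|x) = exp(-lambda*exp(beta*x)*t^v),
# so times can be drawn by inversion:
rweib_ph <- function(n, lambda, v) (-log(runif(n)) / lambda)^(1 / v)

T_ev  <- rweib_ph(N, lam_T * exp(beta_T * x), v)   # event times
C_inf <- rweib_ph(N, lam_C * exp(beta_C * x), v)   # informative censoring times

y     <- pmin(T_ev, C_inf)
delta <- as.numeric(T_ev <= C_inf)
mean(delta == 0)                                   # achieved censoring fraction
coxph(Surv(y, delta) ~ x)                          # compare the estimate with beta_T = 1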

The problem is that even for say 30% censoring (which I think is high), the
Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when I
expected the estimates using Y_inf to be biased, and I think I see why:
however different beta_C is from beta_T, a censored subject can presumably
influence the estimation of beta_T only by affecting the set of subjects at
risk at any time t, but this does not change the fact that every single
Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus
I do not see how my simulation can possibly produce biased estimates for
beta_T using Y_inf.

But then what is informative censoring if not based on this approach?

Any help would be greatly appreciated.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R-es] ERROR: subscript out of bounds

2015-07-22 Thread Carlos Ortega
Hello,

Here is a new, commented version of the code. It works, and it includes
explanations of why you were getting that error:

#---------------------------------------------------------------
library(ape)
my.phylo  <- read.tree("150_BootstrapConsensusTree_Cantabrian_ML.nwk")
my.sample <- read.delim("my.sample.txt")
my.sample <- read.table("my.sample.txt", sep = "\t", header = TRUE, row.names = 1)

# Convert my.sample to a matrix
my.sampleMat <- as.matrix(my.sample)

# Rename the columns so that they have the same format as those of dist.mat
# (dots replaced by underscores).
library(stringr)
colnames(my.sampleMat) <- str_replace_all(colnames(my.sampleMat), "\\.", "_")

# dist.mat is the reference distance matrix against which my.sample is compared.
dist.mat <- cophenetic(my.phylo)

# Modify the row and column names of dist.mat: remove the apostrophes/accents.
library(stringr)
colnames(dist.mat) <- str_replace_all(colnames(dist.mat), "'", "")
rownames(dist.mat) <- str_replace_all(rownames(dist.mat), "'", "")

# Keep only those columns of my.sample whose names are present in dist.mat.
# If a species name is not in dist.mat, its distance cannot be computed and
# the error appears.
my.sampleGood <- my.sampleMat[, intersect(colnames(my.sampleMat),
                                          colnames(dist.mat))]

# my.sample goes from 54 columns down to 13.
# my.sampleGood is now ready for the function that computes the distances.

library(SDMTools)

sntd.a.function <- function(x){
  com.names <- names(x[x > 0])                  # get the names of the species
                                                # present in a community
  my.com.dist <- dist.mat[com.names, com.names]
  diag(my.com.dist) <- NA                       # diagonal values to NA -
                                                # symmetric matrix, zero diagonal
  wt.sd(apply(my.com.dist, 1, min, na.rm = TRUE), x[x > 0])
}

# Apply the function.
proba <- apply(my.sampleGood, MARGIN = 1, sntd.a.function)
# Result:
#  La Preste      Toran  Els Ports
# 0.07185927 0.07129029 0.04948652

#---------------------------------------------------------------


Regards,
Carlos Ortega
www.qualityexcellence.es


2015-07-22 10:40 GMT+02:00 nuria bs lapt...@hotmail.com:

 Hi,

 When I try to run the following code, I get the error "Error in
 dist.mat[com.names, com.names] : subscript out of bounds". Why do I get
 this error? And how can I fix it?
 Thanks!!!

 *CONSOLE:*

  > my.phylo <- read.tree("150_BootstrapConsensusTree_Cantabrian_ML.nwk")
  > my.sample <- read.delim("C:/Filogenia/my.sample.txt")
  > my.sample <- read.table("my.sample.txt", sep = "\t", header = TRUE, row.names = 1)

  > library(SDMTools)
  > sntd.a.function <- function(x){
      com.names <- names(x[x > 0])                  # Get the names of the species present in a community
      my.com.dist <- dist.mat[com.names, com.names] # Distance matrix
      diag(my.com.dist) <- NA                       # Diagonal values to NA - symmetric matrix, zero diagonal
      wt.sd(apply(my.com.dist, 1, min, na.rm = TRUE), x[x > 0])
    }

  > dist.mat <- cophenetic(my.phylo)
  > proba <- apply(my.sample, MARGIN = 1, sntd.a.function)

 Error in dist.mat[com.names, com.names] : subscript out of bounds


 ___
 R-help-es mailing list
 R-help-es@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-help-es




-- 
Regards,
Carlos Ortega
www.qualityexcellence.es

[[alternative HTML version deleted]]

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] Read .xlsx and convert date-column value into Dataframe

2015-07-22 Thread Ivan Calandra

Hi Antony,

I am not sure it could work easily with package xlsx. Try using the 
function read_excel() from package readxl. This function allows for 
Dates to be read.
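
A minimal sketch of that suggestion (untested against the actual file; the
sheet name, skip and na values simply mirror the read.xlsx2() call in the
original post):

library(readxl)
theData <- read_excel(InputFilePath,
                      sheet = "Workflow_Report",
                      skip  = 2,      # startRow = 3 in the read.xlsx2() call
                      na    = "--")   # "--" marks missing dates
theData <- as.data.frame(theData)
str(theData)   # the date column should now be a date/time, not a serial number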


HTH,
Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra

On 22/07/15 14:09, R_Antony wrote:

Hi,

Here I have an .xlsx file that contains various columns, including a date
column [mm/dd/yy] which is not stored in date format. I need to read this
Excel [.xlsx] file and get it into a data frame. I used the xlsx library and
reading the data worked fine, but the values in the date column are converted
to something else.

For example:

FF DATE
---
3/31/2016
2/26/2016
--
1/2/2016

[Values like "--" appear in the column to indicate that no date is given.]

Instead, I get a result like this:

FF DATE
---
42460
42426

42125

This is the code I am using:

theData <- data.frame(read.xlsx2(InputFilePath, sheetIndex,
    sheetName = "Workflow_Report", startRow = 3, colIndex = NULL, endRow = NULL,
    as.data.frame = TRUE, header = TRUE))

Aim: get the actual date-column values into the data frame from the .xlsx
file.

I have tried many approaches. Could someone please help?

Thanks in advance,
Antony.




--
View this message in context: 
http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Langelier-Ludwig Plots

2015-07-22 Thread Rich Shepard

  My web search for an R package producing Langelier-Ludwig plots found no
hits. Has this been implemented in base graphics, lattice, ggplot2, or
another package?

  The reference:

Langelier, W., and Ludwig, H., 1942, Graphical methods for indicating the
mineral character of natural waters: J. Am. Water Ass., 34, p. 335-352.

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in rownames

2015-07-22 Thread JIALU YAO
Hello,

When I type the code rownames(gene_exp_matrix) <- levels(geneidfactor), this
error happens: Error in `rownames<-`(`*tmp*`, value = character(0)) : can not
set object to 'rownames'.

I am using Mac OS X10.10.2.

How can I fix this problem?

Thanks a lot.

Yao



--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-rownames-tp4710183.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in rownames

2015-07-22 Thread Sarah Goslee
Without a reproducible example that includes some sample data (fake is
fine), the code you used, and some clear idea of what output you
expect, it's impossible to figure out how to help you. Here are some
suggestions for creating a good reproducible example:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example



On Wed, Jul 22, 2015 at 3:50 AM, JIALU YAO yaoji...@foxmail.com wrote:
 Hello,

 When I type the code rownames(gene_exp_matrix) <- levels(geneidfactor), this
 error happens: Error in `rownames<-`(`*tmp*`, value = character(0)) : can not
 set object to 'rownames'.

 I am using Mac OS X10.10.2.

 How can I fix this problem?

 Thanks a lot.

 Yao



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Error-in-rownames-tp4710183.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read .xlsx and convert date-column value into Dataframe

2015-07-22 Thread John McKown
Those numbers are serial day counts: a value of 1 maps to Jan 1,
1900. ref:
https://support.office.com/en-za/article/DATE-function-e36c0c8c-4104-49da-ab83-82328b832349

A formula such as: as.Date('1900-01-01')+excel_date-1 should convert the
serial value to a date value.
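
A quick check of the serial-number idea, using the 1899-12-30 origin that
appears later in this thread (that origin already folds in the off-by-one
shift and Excel's fictitious 1900-02-29):

# 42460 is the serial number shown in the original post for 3/31/2016.
as.Date(42460, origin = "1899-12-30")
# [1] "2016-03-31"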

On Wed, Jul 22, 2015 at 7:09 AM, R_Antony antony.akk...@ge.com wrote:

 Hi,

 Here I have an .xlsx file that contains various columns, including a date
 column [mm/dd/yy] which is not stored in date format. I need to read this
 Excel [.xlsx] file and get it into a data frame. I used the xlsx library and
 reading the data worked fine, but the values in the date column are converted
 to something else.

 For example:

 FF DATE
 ---
 3/31/2016
 2/26/2016
 --
 1/2/2016

 [Values like "--" appear in the column to indicate that no date is given.]

 Instead, I get a result like this:

 FF DATE
 ---
 42460
 42426

 42125

 This is the code I am using:

 theData <- data.frame(read.xlsx2(InputFilePath, sheetIndex,
     sheetName = "Workflow_Report", startRow = 3, colIndex = NULL, endRow = NULL,
     as.data.frame = TRUE, header = TRUE))

 Aim: get the actual date-column values into the data frame from the .xlsx
 file.

 I have tried many approaches. Could someone please help?

 Thanks in advance,
 Antony.




 --
 View this message in context:
 http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 

Schrodinger's backup: The condition of any backup is unknown until a
restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] SDR package for Subgroup Discovery on CRAN

2015-07-22 Thread amgv0009

Package SDR for subgroup discovery data mining is available on CRAN now. 





More info of the package on the CRAN page:

https://cran.r-project.org/web/packages/SDR/index.html


Ángel.
[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error in rownames

2015-07-22 Thread David Winsemius

On Jul 22, 2015, at 12:50 AM, JIALU YAO wrote:

 Hello,
 
 When I type the code rownames(gene_exp_matrix) <- levels(geneidfactor), this
 error happens: Error in `rownames<-`(`*tmp*`, value = character(0)) : can not
 set object to 'rownames'.
 
 I am using Mac OS X10.10.2.
 
 How can I fix this problem?
 

The error suggests that levels(geneidfactor) is returning a vector of length 0.

How do you fix that? The first thing I would do is check to see if spelling 
might be an issue and then see what str(geneidfactor) returns. If this process 
makes any sense (noting that the order of a dataframe factor variable might not 
be the same as the level-names), then my guess is that you failed to include 
the dataframe name where the factor vector resides.
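
A small sketch of that kind of check (the object names are stand-ins for the
poster's):

geneidfactor <- factor(character(0))   # an empty factor has no levels
levels(geneidfactor)                   # character(0): reproduces value = character(0)
str(geneidfactor)                      # inspect what the object really is
# If the factor lives inside a data frame, refer to it there, e.g.
# rownames(gene_exp_matrix) <- levels(mydata$geneidfactor)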
 

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read .xlsx and convert date-column value into Dataframe

2015-07-22 Thread jim holtman
forgot the reply to all:

These are serial dates within EXCEL.  Here is a way of converting them:

> as.Date(c(42460, 42426), origin = '1899-12-30')
[1] "2016-03-31" "2016-02-26"
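
The same conversion also copes with the "--" placeholders from the original
post if the column comes in as character (a sketch; the vector below is made
up from the values shown there):

ff <- c("42460", "42426", "--")
as.Date(as.numeric(ff), origin = "1899-12-30")   # "--" becomes NA (with a coercion warning)
# [1] "2016-03-31" "2016-02-26" NA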



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Jul 22, 2015 at 11:51 AM, Ivan Calandra ivan.calan...@univ-reims.fr
 wrote:

 Hi Antony,

 I am not sure it could work easily with package xlsx. Try using the
 function read_excel() from package readxl. This function allows for Dates
 to be read.

 HTH,
 Ivan

 --
 Ivan Calandra, ATER
 University of Reims Champagne-Ardenne
 GEGENAA - EA 3795
 CREA - 2 esplanade Roland Garros
 51100 Reims, France
 +33(0)3 26 77 36 89
 ivan.calan...@univ-reims.fr
 https://www.researchgate.net/profile/Ivan_Calandra

 On 22/07/15 14:09, R_Antony wrote:

 Hi,

 Here I have an .xlsx file that contains various columns, including a date
 column [mm/dd/yy] which is not stored in date format. I need to read this
 Excel [.xlsx] file and get it into a data frame. I used the xlsx library and
 reading the data worked fine, but the values in the date column are converted
 to something else.

 For example:

 FF DATE
 ---
 3/31/2016
 2/26/2016
 --
 1/2/2016

 [Values like "--" appear in the column to indicate that no date is given.]

 Instead, I get a result like this:

 FF DATE
 ---
 42460
 42426

 42125

 This is the code I am using:

 theData <- data.frame(read.xlsx2(InputFilePath, sheetIndex,
     sheetName = "Workflow_Report", startRow = 3, colIndex = NULL, endRow = NULL,
     as.data.frame = TRUE, header = TRUE))

 Aim: get the actual date-column values into the data frame from the .xlsx
 file.

 I have tried many approaches. Could someone please help?

 Thanks in advance,
 Antony.




 --
 View this message in context:
 http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html

 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Langelier-Ludwig Plots

2015-07-22 Thread Rich Shepard

On Wed, 22 Jul 2015, Boris Steipe wrote:


According to  ...
  http://info.ngwa.org/gwol/pdf/721000139.PDF (Graphical Interpretation of 
Water Quality Data)
... a Langelier-Ludwig plot is simply a scatterplot of cations vs. anions (in 
percent).
Surely that would be beyond trivial to produce in R. Or am I missing a subtle 
something?


Boris,

  Yes, that's what it is. I'll see just how trivial it is to produce it in
the four-quadrant format I've seen used. It's a different layout from the
basic two-variable scatterplot.

  Thanks for the URL. I worked from a reference in a 2003 paper to the
original 1942 paper.

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Statistical distribution not fitting

2015-07-22 Thread Boris Steipe
Try:

qqnorm(log(mydat))

That doesn't look too bad, does it? Now: where is the problem?

Cheers,
B.


On Jul 22, 2015, at 12:41 PM, Amelia Marsh amelia_mars...@yahoo.com wrote:

 Hello! 
 
 (I don't know if I can raise this query on this forum; I had already raised
 it on the finance forum but have not received any suggestion, so I am now
 raising it on this list. Sorry for that. The query is about what to do if no
 statistical distribution fits the data.)
 
 I work in risk management and deal with operational risk. As part of the
 BASEL II guidelines, we need to arrive at the capital charge that banks must
 set aside to counter any operational risk, should it happen. As part of the
 Loss Distribution Approach (LDA), we need to collate past loss events and use
 these loss amounts. The usual process as practised in the industry is as
 follows:
 
 Using these historical loss amounts and various statistical tests such as the
 KS test, AD test, PP plot, QQ plot etc., we try to identify the best
 (continuous) statistical distribution fitting this historical loss data.
 Then, using the estimated parameters of that distribution, we simulate say
 1 million loss amounts and, taking an appropriate percentile (say 99.9%),
 arrive at the capital charge.
 
 However, many times the loss data are such that fitting a distribution is not
 possible. The loss data may be multimodal or have significant variability,
 making the fit impossible. Can someone guide me on how to deal with such data
 and what can be done to simulate losses from this historical loss data in R.
 
 My data is as follows - 
 
 mydat - 
 c(829.53,4000,6000,1000,1063904,102400,22000,4000,4200,2000,1,400, 
 459006, 7276,4000,100,4000,1,613803.36, 
 825,1000,5000,4000,3000,84500,200, 2000,68000,97400,6267.8, 
 49500,27000,2100,10489.92,2200,2000,2000,1000,1900, 
 6000,5600,100,4000,14300,100,94100,1200,7000,2000,3000,1100,6900,1000,18500,6000,2000,4000,8400,11200,1000,15100,23300,4000,13100,4500,200,2000,5,3900,3200,2000,2000,67000,2000,500,2000,1000,1900,10400,1900,2000,3200,6500,1,2900,1000,14300,1000,2700,1500,12000,4,25000,2800,5000,15000,4000,1000,21000,15000,16000,54000,1500,19200,2000,2000,1000,39000,5000,1100,18000,1,3500,1000,1,5000,14000,1800,4000,1000,300,4000,1000,100,1000,4400,2000,2000,12000,200,100,1000,1000,2000,1600,2000,4000,14000,4000,13500,1000,200,200,1000,18000,23000,41400,6,500,3000,21000,6900,14600,1900,4000,4500,1000,2000,2000,1000,4100,2000,1000,2000,8000,3000,1500,2000,2000,3500,2000,2000,1000,3800,3,55000,500,1000,1000,2000,62400,2000,3000,200,2!
 00!
 ! 
 0,3500,2000,2000,500,3000,4500,1000,1,2000,3000,3600,1000,2000,2000,5000,23000,2000,1900,2000,6,2000,6,2,2000,2000,4600,1000,2000,1000,18000,6000,62000,68000,26800,5,45900,16900,21500,2000,22700,2000,2000,32000,1,5000,138000,159700,13000,2000,17619,2000,1000,4000,2000,1500,4000,2,158900,74100,6000,24900,6,500,1000,4,1,5,800,4000,4900,6500,5000,400,500,3000,32300,24000,300,11500,2000,5000,1000,500,5000,5500,17450,56800,2000,1000,21400,22000,6,3000,7500,3000,1000,1000,2000,1500,83700,2000,4000,170005,7,6700,1500,3500,2000,10563.97,1500,25000,2000,2000,2267.57,1100,3100,2000,3500,1,2000,6000,1500,200,2,4000,46400,296900,15,3700,7500,2,48500,3500,12000,2500,4000,8500,1000,14500,1000,11000,2000,2000,12,2,7600,3000,2000,8000,1600,4,2000,5000,34187.67,279100,9900,31300,814000,43500,5100,49500,4500,6262.38,100,10400,2400,1500,5000,2500,15000,4,32500,41100,358600,109600,514300,258200,225900,402700,2!
 7! 
 4300,75000,1000,56000,1,4100,1000,15000,100,4,7900,5000,105000 
 ,15100,2000,1100,2900,1500,600,500,1300,100,5000,5000,1,10100,7000,4,10500,5000,9500,1000,15200,2000,1,1,100,7800,3500,189900,58000,345000,151700,11000,6000,7000,15700,6000,3000,5000,1,2000,1000,36000,1000,500,8000,9000,6000,2000,26500,6000,5000,97200,2000,5100,17000,2500,25500,24000,5400,9,41500,6200,7500,5000,7000,41000,25000,1500,4,5000,1,21500,100,32000,32500,7,500,66400,21000,5000,5000,12600,3000,6200,38900,1,1000,6,41100,1200,31300,2500,58000,4100,58000,42500)
  
 
 Sorry for the inconvenience. I understand that fitting a distribution to
 such data is not a foolproof method, but this is the procedure that has been
 followed in the risk management industry. Please note that my question is not
 about operational risk as such. My question is: if distributions do not fit a
 particular data set, how do we proceed to simulate data based on it?
 
 Regards 
 
 Amelia Marsh
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, 

Re: [R] Statistical distribution not fitting

2015-07-22 Thread Ben Bolker
Amelia Marsh amelia_marsh08 at yahoo.com writes:

 
 Hello! (I don't know if I can raise this query on this forum; I had already
 raised it on the finance forum but have not received any suggestion, so I am
 now raising it on this list. Sorry for that. The query is about what to do if
 no statistical distribution fits the data.)
 
 I work in risk management and deal with operational risk. As part of the
 BASEL II guidelines, we need to arrive at the capital charge that banks must
 set aside to counter any operational risk, should it happen. As part of the
 Loss Distribution Approach (LDA), we need to collate past loss events and use
 these loss amounts. The usual process as practised in the industry is as
 follows:
 
 Using these historical loss amounts and various statistical tests such as the
 KS test, AD test, PP plot, QQ plot etc., we try to identify the best
 (continuous) statistical distribution fitting this historical loss data.
 Then, using the estimated parameters of that distribution, we simulate say
 1 million loss amounts and, taking an appropriate percentile (say 99.9%),
 arrive at the capital charge.
 
 However, many times the loss data are such that fitting a distribution is not
 possible. The loss data may be multimodal or have significant variability,
 making the fit impossible. Can someone guide me on how to deal with such data
 and what can be done to simulate losses from this historical loss data in R.
 
A skew-(log)-normal fit doesn't look too bad ... (whenever you
have positive data that are this strongly skewed, log-transforming
is a good step)

hist(log10(mydat), col="gray", breaks="FD", freq=FALSE)
## default breaks are much coarser:
## hist(log10(mydat), col="gray", breaks="Sturges", freq=FALSE)
lines(density(log10(mydat)), col=2, lwd=2)
library(fGarch)
ss <- snormFit(log10(mydat))
xvec <- seq(2, 6.5, length=101)
lines(xvec, do.call(dsnorm, c(list(x=xvec), as.list(ss$par))),
      col="blue", lwd=2)
## or try a skew-Student-t: not very different:
ss2 <- sstdFit(log10(mydat))
lines(xvec, do.call(dsstd, c(list(x=xvec), as.list(ss2$estimate))),
      col="purple", lwd=2)

There are more flexible distributional families (Johnson,
log-spline ...)

Multimodal data are a different can of worms -- consider
fitting a finite mixture model ...

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Statistical distribution not fitting

2015-07-22 Thread Amelia Marsh
Hello! 

(I don't know if I can raise this query on this forum; I had already raised it
on the finance forum but have not received any suggestion, so I am now raising
it on this list. Sorry for that. The query is about what to do if no
statistical distribution fits the data.)

I work in risk management and deal with operational risk. As part of the BASEL
II guidelines, we need to arrive at the capital charge that banks must set
aside to counter any operational risk, should it happen. As part of the Loss
Distribution Approach (LDA), we need to collate past loss events and use these
loss amounts. The usual process as practised in the industry is as follows:

Using these historical loss amounts and various statistical tests such as the
KS test, AD test, PP plot, QQ plot etc., we try to identify the best
(continuous) statistical distribution fitting this historical loss data. Then,
using the estimated parameters of that distribution, we simulate say 1 million
loss amounts and, taking an appropriate percentile (say 99.9%), arrive at the
capital charge.

However, many times the loss data are such that fitting a distribution is not
possible. The loss data may be multimodal or have significant variability,
making the fit impossible. Can someone guide me on how to deal with such data
and what can be done to simulate losses from this historical loss data in R?

My data is as follows - 

mydat - c(829.53,4000,6000,1000,1063904,102400,22000,4000,4200,2000,1,400, 
459006, 7276,4000,100,4000,1,613803.36, 825,1000,5000,4000,3000,84500,200, 
2000,68000,97400,6267.8, 49500,27000,2100,10489.92,2200,2000,2000,1000,1900, 
6000,5600,100,4000,14300,100,94100,1200,7000,2000,3000,1100,6900,1000,18500,6000,2000,4000,8400,11200,1000,15100,23300,4000,13100,4500,200,2000,5,3900,3200,2000,2000,67000,2000,500,2000,1000,1900,10400,1900,2000,3200,6500,1,2900,1000,14300,1000,2700,1500,12000,4,25000,2800,5000,15000,4000,1000,21000,15000,16000,54000,1500,19200,2000,2000,1000,39000,5000,1100,18000,1,3500,1000,1,5000,14000,1800,4000,1000,300,4000,1000,100,1000,4400,2000,2000,12000,200,100,1000,1000,2000,1600,2000,4000,14000,4000,13500,1000,200,200,1000,18000,23000,41400,6,500,3000,21000,6900,14600,1900,4000,4500,1000,2000,2000,1000,4100,2000,1000,2000,8000,3000,1500,2000,2000,3500,2000,2000,1000,3800,3,55000,500,1000,1000,2000,62400,2000,3000,200,200!
 ! 
0,3500,2000,2000,500,3000,4500,1000,1,2000,3000,3600,1000,2000,2000,5000,23000,2000,1900,2000,6,2000,6,2,2000,2000,4600,1000,2000,1000,18000,6000,62000,68000,26800,5,45900,16900,21500,2000,22700,2000,2000,32000,1,5000,138000,159700,13000,2000,17619,2000,1000,4000,2000,1500,4000,2,158900,74100,6000,24900,6,500,1000,4,1,5,800,4000,4900,6500,5000,400,500,3000,32300,24000,300,11500,2000,5000,1000,500,5000,5500,17450,56800,2000,1000,21400,22000,6,3000,7500,3000,1000,1000,2000,1500,83700,2000,4000,170005,7,6700,1500,3500,2000,10563.97,1500,25000,2000,2000,2267.57,1100,3100,2000,3500,1,2000,6000,1500,200,2,4000,46400,296900,15,3700,7500,2,48500,3500,12000,2500,4000,8500,1000,14500,1000,11000,2000,2000,12,2,7600,3000,2000,8000,1600,4,2000,5000,34187.67,279100,9900,31300,814000,43500,5100,49500,4500,6262.38,100,10400,2400,1500,5000,2500,15000,4,32500,41100,358600,109600,514300,258200,225900,402700,27!
 
4300,75000,1000,56000,1,4100,1000,15000,100,4,7900,5000,105000 
,15100,2000,1100,2900,1500,600,500,1300,100,5000,5000,1,10100,7000,4,10500,5000,9500,1000,15200,2000,1,1,100,7800,3500,189900,58000,345000,151700,11000,6000,7000,15700,6000,3000,5000,1,2000,1000,36000,1000,500,8000,9000,6000,2000,26500,6000,5000,97200,2000,5100,17000,2500,25500,24000,5400,9,41500,6200,7500,5000,7000,41000,25000,1500,4,5000,1,21500,100,32000,32500,7,500,66400,21000,5000,5000,12600,3000,6200,38900,1,1000,6,41100,1200,31300,2500,58000,4100,58000,42500)
 

Sorry for the inconvenience. I understand that fitting a distribution to such
data is not a foolproof method, but this is the procedure that has been
followed in the risk management industry. Please note that my question is not
about operational risk as such. My question is: if distributions do not fit a
particular data set, how do we proceed to simulate data based on it?

Regards 

Amelia Marsh

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2 - Specifying Colors Manually

2015-07-22 Thread Abiram Srivatsa
Hi,

Given a data frame, I'm trying to graph multiple lines on one graph, each
line being a different color and each colored line corresponding to a
specific name in the legend. Here is a very basic data sample to work with:

 x <- seq(0, 40, 10)
 y1 <- sample(1:50, 5)
 y2 <- sample(1:50, 5)

 mydf <- data.frame(x, y1, y2)

 p <- ggplot(mydf, aes(x = mydf$x)) +
   geom_line(aes(y = mydf$y1, colour = "green4")) +
   geom_line(aes(y = mydf$y2, colour = "blue2")) +
   scale_color_manual(name = "legend", values = c("y1" = "green4", "y2" = "blue2"))

 p


When I run this, the entire plot is blank. What I WANT to show up is two
lines, one being the color of green4 and the other being blue2. Besides
that, I'm trying to associate the colors with the names y1 and y2 in
the legend, but my codes don't seem to be working.

I'm very new to R/ggplot2, and I really appreciate any and all help I can
get.

Thank you!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 - Specifying Colors Manually

2015-07-22 Thread Hadley Wickham
Try this:

ggplot(mydf, aes(x)) +
  geom_line(aes(y = y1, colour = "y1")) +
  geom_line(aes(y = y2, colour = "y2")) +
  scale_color_manual(values = c(y1 = "green4", y2 = "blue2"))

Note that you don't need to use `mydf$` inside aes(), and the names in the
manual scale should match the colour values given in the aes() calls.
Alsoit'smucheasiertoreadyourcodeifyouusespaces;)
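
For comparison, a sketch of the same plot built from long-format data
(assumes the reshape2 package is available):

library(reshape2)
mydf_long <- melt(mydf, id.vars = "x", variable.name = "series")
ggplot(mydf_long, aes(x, value, colour = series)) +
  geom_line() +
  scale_color_manual(values = c(y1 = "green4", y2 = "blue2"))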

Hadley

On Wed, Jul 22, 2015 at 1:13 PM, Abiram Srivatsa avsriva...@gmail.com wrote:
 Hi,

 Given a data frame, I'm trying to graph multiple lines on one graph, each
 line being a different color and each colored line corresponding to a
 specific name in the legend. Here is a very basic data sample to work with:

  x <- seq(0, 40, 10)
  y1 <- sample(1:50, 5)
  y2 <- sample(1:50, 5)

  mydf <- data.frame(x, y1, y2)

  p <- ggplot(mydf, aes(x = mydf$x)) +
    geom_line(aes(y = mydf$y1, colour = "green4")) +
    geom_line(aes(y = mydf$y2, colour = "blue2")) +
    scale_color_manual(name = "legend", values = c("y1" = "green4", "y2" = "blue2"))

  p


 When I run this, the entire plot is blank. What I WANT to show up is two
 lines, one being the color of green4 and the other being blue2. Besides
 that, I'm trying to associate the colors with the names y1 and y2 in
 the legend, but my codes don't seem to be working.

 I'm very new to R/ggplot2, and I really appreciate any and all help I can
 get.

 Thank you!

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
http://had.co.nz/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Ordering in Sankey diagram using R and googleVis

2015-07-22 Thread Angela via R-help
Hello,

I am trying to figure out if there is a way to order the left side of a Sankey
diagram from most frequent to least frequent. I am using R version 3.2.1 and
googleVis version 0.5.9 for the Sankey. I've tried sorting, but that does not
work. Is there any way to force it to arrange the left (before) side in
decreasing frequency? Is there something I am missing? It does not have to use
googleVis. Thank you!

-Angela

Example of the data I have, in a csv file:

before    after
A    B
A    B
A    B
A    C
A    A
A    A
A    B
D    E
F    B
F    B
F    F
G    H
G    A

I reformat the data in R so it looks like this:

before    after    count
A    B    4
A    C    1
A    A    2
D    E    1
F    B    2
F    F    1
G    H    1
G    A    1

Then plot using this:
plot(gvisSankey(data, from = "before", to = "after", weight = "freq",
     options = list(width = 600, height = 800,
                    sankey = "{iterations: 2}")))

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Same results in arules after changing support and confidence

2015-07-22 Thread jckhmmr
Hello, everyone!

I'm new to R, so I'm sorry in advance if it's something obvious, but I still
can't figure it out.

I've started experimenting with apriori and everything was working fine,
i.e. I changed support and confidence and results were different depending
on combination.

But on the next run of the same code on the same dataset I'm getting 0
rules, no matter which combination I'm using.

My dataset looks like this (foodmarket data):

Trans_Id,Product
3694728,Washington Berry Juice
3779788,Washington Berry Juice
414,Washington Berry Juice
4405313,Washington Berry Juice
etc.

My code is:

library('arules')
transactions <- read.transactions(file = "transactions.csv", format = "single",
                                  sep = ",", cols = c(1, 2), rm.duplicates = FALSE)

basket_rules <- apriori(transactions, parameter = list(sup = 0.05, conf = 0.01,
                                                       target = "rules", minlen = 2))

With following results:
http://postimg.org/image/8qw4e1hw7/

And another one with different parameters but same result:
http://postimg.org/image/b9k7xfq91/


So, I have no idea how is that possible. Any input will be appreciated.
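
One first diagnostic (a sketch, not a fix) is to confirm what was actually
read before tuning support and confidence:

summary(transactions)                        # how many transactions and items were read
itemFrequencyPlot(transactions, topN = 20)   # support of the most frequent products
# If even the top items fall well below sup = 0.05, apriori() will return 0 rules.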

Thank you.



--
View this message in context: 
http://r.789695.n4.nabble.com/Same-results-in-arules-after-changing-support-and-confidence-tp4710218.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] model non-integer count outcomes

2015-07-22 Thread Don McKenzie
Sorry. Central limit theorem. Enough averaging and you get a normal 
distribution (simply stated, perhaps too simply). If so others will correct me 
before long.  :-(

Sent from my iPad

 On Jul 21, 2015, at 8:52 PM, Wensui Liu liuwen...@gmail.com wrote:
 
 what does CLT stand for?
 
 On Tue, Jul 21, 2015 at 11:41 PM, Don McKenzie d...@u.washington.edu wrote:
 Or if there are enough averages of enough counts, the CLT provides another 
 option.
 
 On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote:
 
 
 On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote:
 
 Dear Lister
 When the count outcomes are integers, we could use either Poisson or
 NB regression to model them. However, there are cases that the count
 outcomes are non-integers, e.g. average counts.
 I am wondering if it still makes sense to use Poisson or NB regression
 to model these non-integer outcomes.
 
 There is a quasi-binomial error model that accepts non-integer outcomes.
 
 -- 
 
 David Winsemius
 Alameda, CA, USA
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 
 
 -- 
 WenSui Liu
 https://statcompute.wordpress.com/
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] model non-integer count outcomes

2015-07-22 Thread Don McKenzie
Or if there are enough averages of enough counts, the CLT provides another 
option.

 On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote:
 
 
 On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote:
 
 Dear Lister
 When the count outcomes are integers, we could use either Poisson or
 NB regression to model them. However, there are cases that the count
 outcomes are non-integers, e.g. average counts.
 I am wondering if it still makes sense to use Poisson or NB regression
 to model these non-integer outcomes.
 
 There is a quasi-binomial error model that accepts non-integer outcomes.
 
 -- 
 
 David Winsemius
 Alameda, CA, USA
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] model non-integer count outcomes

2015-07-22 Thread Thierry Onkelinx
If you know the number of counts (n) used to calculate the average then you
can still use a poisson distribution.

Total = average * n
glm(total ~ offset(log(n)), family = poisson)
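
A small simulated sketch of this (all values are arbitrary illustrations):

set.seed(1)
n       <- sample(5:20, 100, replace = TRUE)   # number of counts behind each average
lambda  <- exp(0.5)                            # true per-observation rate
average <- rpois(100, lambda * n) / n          # the non-integer outcome
total   <- round(average * n)                  # recover the integer totals
fit <- glm(total ~ 1 + offset(log(n)), family = poisson)
exp(coef(fit))                                 # should be close to lambda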

​
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
Op 22 jul. 2015 08:38 schreef Don McKenzie d...@u.washington.edu:

 Or if there are enough averages of enough counts, the CLT provides another
 option.

  On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net
 wrote:
 
 
  On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote:
 
  Dear Lister
  When the count outcomes are integers, we could use either Poisson or
  NB regression to model them. However, there are cases that the count
  outcomes are non-integers, e.g. average counts.
  I am wondering if it still makes sense to use Poisson or NB regression
  to model these non-integer outcomes.
 
  There is a quasi-binomial error model that accepts non-integer outcomes.
 
  --
 
  David Winsemius
  Alameda, CA, USA
 
  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] model non-integer count outcomes

2015-07-22 Thread Wensui Liu
Thanks Thierry
What if I don't know the n in the offset term?

On Wednesday, July 22, 2015, Thierry Onkelinx thierry.onkel...@inbo.be
wrote:

 If you know the number of counts (n) used to calculate the average then you
 can still use a poisson distribution.

 Total = average * n
 glm(total ~ offset(log(n)), family = poisson)

 ​
 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
 Forest
 team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
 Kliniekstraat 25
 1070 Anderlecht
 Belgium

 To call in the statistician after the experiment is done may be no more
 than asking him to perform a post-mortem examination: he may be able to say
 what the experiment died of. ~ Sir Ronald Aylmer Fisher
 The plural of anecdote is not data. ~ Roger Brinner
 The combination of some data and an aching desire for an answer does not
 ensure that a reasonable answer can be extracted from a given body of data.
 ~ John Tukey
 Op 22 jul. 2015 08:38 schreef Don McKenzie d...@u.washington.edu
 javascript:;:

  Or if there are enough averages of enough counts, the CLT provides
 another
  option.
 
   On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net
 javascript:;
  wrote:
  
  
   On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote:
  
   Dear Lister
   When the count outcomes are integers, we could use either Poisson or
   NB regression to model them. However, there are cases that the count
   outcomes are non-integers, e.g. average counts.
   I am wondering if it still makes sense to use Poisson or NB regression
   to model these non-integer outcomes.
  
   There is a quasi-binomial error model that accepts non-integer
 outcomes.
  
   --
  
   David Winsemius
   Alameda, CA, USA
  
   __
   R-help@r-project.org javascript:; mailing list -- To UNSUBSCRIBE
 and more, see
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
 
 
 
  __
  R-help@r-project.org javascript:; mailing list -- To UNSUBSCRIBE and
 more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org javascript:; mailing list -- To UNSUBSCRIBE and
 more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
WenSui Liu
https://statcompute.wordpress.com/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to make a persistence landscape in R?

2015-07-22 Thread Jim Lemon
Hi lstat,
The problem may be that as you are adding your connector triangles
there is a fill color. Try reprogramming with col=NA if you are using
polygon to draw the connectors.

Jim


On Wed, Jul 22, 2015 at 12:49 PM, David Winsemius
dwinsem...@comcast.net wrote:

 On Jul 21, 2015, at 10:39 AM, lstat wrote:

 Hi there,
 I am trying to construct a persistence landscape in R that shows all of the
 overlapping triangles -- not just the overall silhouette-- how can you do
 this?
 Every time I plug in the code I get one big triangle that corresponds to the
 largest barcode, but I cannot see any of the smaller barcodes/ overlapping
 isosceles trianges.

 You need to get out more. Talk to people who don't share your somewhat 
 constricted view of the world. Learn to talk to a more diverse group of human 
 beings,... say statisticians who might never have heard of a persistent 
 landscape?

 It all sounds quite fascinating ... but there was no "the code" included. Perhaps this
 is due to the Nabble interface, which masquerades as R-help but is only a
 feeble imitation.


 Do read the fine posting guide:  http://www.R-project.org/posting-guide.html

 --

 David Winsemius
 Alameda, CA, USA

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] model non-integer count outcomes

2015-07-22 Thread peter dalgaard

 On 22 Jul 2015, at 06:48 , Don McKenzie d...@uw.edu wrote:
 
 Sorry. Central limit theorem.

Or some sort of vegetarian sandwich. Celery, Lettuce, Tomato sounds almost 
edible with sufficient mayo. ;-)

 Enough averaging and you get a normal distribution (simply stated, perhaps 
 too simply). If so others will correct me before long.  :-(

Well, your punctuation doesn't quite work -- ')' comes too early. Otherwise it 
is close enough for jazz, although there are distributions that you can average 
forever and still not get a normal, and some might want to stress that it is 
the parameter estimators that become approximately normal.  (Students sometimes 
get confused and believe that the original data magically become normally 
distributed when you have a lot of them.) 

In practice, one should ensure that one has many data for all the relevant 
averages (996 males and 4 females is no good), and also that one gets the 
variance structure at least roughly right.

-pd



 
 Sent from my iPad
 
 On Jul 21, 2015, at 8:52 PM, Wensui Liu liuwen...@gmail.com wrote:
 
 what does CLT stand for?
 
 On Tue, Jul 21, 2015 at 11:41 PM, Don McKenzie d...@u.washington.edu 
 wrote:
 Or if there are enough averages of enough counts, the CLT provides another 
 option.
 
 On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net 
 wrote:
 
 
 On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote:
 
 Dear Lister
 When the count outcomes are integers, we could use either Poisson or
 NB regression to model them. However, there are cases that the count
 outcomes are non-integers, e.g. average counts.
 I am wondering if it still makes sense to use Poisson or NB regression
 to model these non-integer outcomes.
 
 There is a quasi-binomial error model that accepts non-integer outcomes.
 
 -- 
 
 David Winsemius
 Alameda, CA, USA
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 
 
 -- 
 WenSui Liu
 https://statcompute.wordpress.com/
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot -issues with axis and labels not appearing

2015-07-22 Thread Jim Lemon
Hi Pierre,
I get a reasonable plot using the following code:

par(mar=c(6,4,4,2))
barpos <- barplot(unlist(GEP.data2),
 main="Global Portfolio Weights",
 col.main="gray", col=blues9,
 cex.axis=1, ylim=c(-1,1), las=2,
 cex.lab=1, cex=0.8)
axis(1, at=barpos, labels=rep("",8))

For one thing, you don't need the beside=TRUE argument as there is
only one vector of values to display. The small value for cex.axis
made the tick labels unreadable on my display. If you would like to
have the bar labels horizontal, have a look at the staxlab function
in the plotrix package.
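For instance (a sketch; the labels are just taken from the column names):

library(plotrix)
staxlab(1, at = barpos, labels = colnames(GEP.data2))  # staggers long labels horizontally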

Jim


On Tue, Jul 21, 2015 at 10:38 PM, Pierre Micallef
micallefpie...@hotmail.com wrote:
 Hi

 I am experiencing a few issues with the barplot function.

 I have written the following code;

   barplot(as.matrix(GEP.data2), beside=TRUE, main="Global Portfolio Weights",
           col.main="gray", col=blues9,
           cex.axis=0.1, ylim=c(-1,1), las=2, cex.lab=1, cex=0.8)


 where:
 GEP.data2 =
   VGSIX.equity VUSTX.equity VGTSX.equity VFISX.equity VTSMX.equity VFITX.equity VEIEX.equity VIPSX.equity
 1  -0.08645095   0.08991793   0.03548216         0.45         0.45         0.45   -0.1689109   -0.2200382

 However I am having the following issues:
 (1) neither the x nor the y axis appears on my graph (the bars appear as if they are floating);
 (2) no axis labels are showing up;
 (3) the chart title sits too high on the graph and is cut off from view.

 Please can you help solve these issues?
 Thanks
 Pierre
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sensitivity,Specificity and Youden Index

2015-07-22 Thread Jomy Jose
How to calculate the sensitivity,specificity,Youden index for 18 factors
and their combination (6 factors in each) with an outcome measure.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R-es] ERROR: subscript out of bounds

2015-07-22 Thread nuria bs
Hello,

When I try to run the following code, I get the error "Error in
dist.mat[com.names, com.names] : subscript out of bounds". Why do I get this
error, and how can I fix it?
Thanks!!!

CONSOLE:
 library(ape)        # for read.tree()
 my.phylo <- read.tree("150_BootstrapConsensusTree_Cantabrian_ML.nwk")
 my.sample <- read.delim("C:/Filogenia/my.sample.txt")
 my.sample <- read.table("my.sample.txt", sep="\t", header=TRUE, row.names=1)

 library(SDMTools)   # for wt.sd()

 sntd.a.function <- function(x){
   com.names   <- names(x[x > 0])                 # names of the species present in a community
   my.com.dist <- dist.mat[com.names, com.names]  # distance matrix for those species
   diag(my.com.dist) <- NA                        # set the diagonal to NA (symmetric matrix, zero diagonal)
   wt.sd(apply(my.com.dist, 1, min, na.rm=TRUE), x[x > 0])
 }

 dist.mat <- cophenetic(my.phylo)
 proba <- apply(my.sample, MARGIN = 1, sntd.a.function)

Error in dist.mat[com.names, com.names] : subscript out of bounds   
  

[Attachments: 150_BootstrapConsensusTree_Cantabrian_ML.nwk (binary Newick tree) and the
my.sample data -- a site-by-species abundance table whose rows are the sites La Preste,
Toran and Els Ports and whose columns are ~52 bryophyte species (Amblystegium serpens,
Anomodon attenuatus, ..., Tortella tortuosa).]
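
A possible diagnostic (added here, not part of the original message): this error usually
means that some species names in my.sample do not match the row/column names of the
cophenetic distance matrix, i.e. the tip labels of the phylogeny (note that read.table()
turns spaces in column names into dots, while tip labels often use underscores):

## species named in the community table but absent from the distance matrix
setdiff(colnames(my.sample), rownames(dist.mat))

## one possible workaround: keep only the species present in both objects
common.sp <- intersect(colnames(my.sample), rownames(dist.mat))
my.sample.ok <- my.sample[, common.sp, drop = FALSE]
proba <- apply(my.sample.ok, MARGIN = 1, sntd.a.function)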
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] R version update

2015-07-22 Thread Ista Zahn
On Jul 21, 2015 9:30 PM, kle...@sxmail.de wrote:

 Dear Ladies and Gentlemen,
 as a beginner in R, I encountered what seem to be natural limits concerning
 the process of extending R with packages.
 I apparently own version 3.0.2 and I principally decided to use R via
 the RKWard GUI on Linux Kubuntu Trusty Tahr; the R version installed on
 my computer is admittedly not far from the basics provided by my
 distribution's package management (also throughout multiple rounds of
 updates/ upgrades of the system).
 What would I have to do to finally update/ upgrade beyond that R
 version?

https://cran.r-project.org/bin/linux/ubuntu/

Would that process (generally?) affect any/ my GUI?

possibly.

 This problem appeared to me as I was busy installing packages an
 received the error message package ‘ATLAS’ is not available (for R
 version 3.0.2) .

I don't think there is any package on CRAN with that name. What makes you
think there is?

 I am quite desperate and would look forward to be indicated a path to a
 sustainable solution or be told how to mitigate/ circumvent (these/
 such) problems.

I'm not actually sure exactly what the problem is...

 Thanks a lot!

 Best regards,
 Markus Hofstetter


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fwd: Warning message with maxLik()

2015-07-22 Thread marammagdysalem


Sent from my iPhone

Begin forwarded message:

 From: Maram SAlem marammagdysa...@gmail.com
 Date: July 21, 2015 at 11:40:56 PM GMT+2
 To: Arne Henningsen arne.henning...@gmail.com
 Cc: r-help@r-project.org r-help@r-project.org
 Subject: Re: [R] Warning message with maxLik()
 
 Dear Arne,
 
 The elements of the theta vector are indeed strictly positive. I've just 
 tried instead to use lamda = log(theta), which means that theta = exp(lamda), 
 so as to get rid of the log() function that appears in the log-likelihood and 
 is causing the 50 warnings. But still, the estimates I got for lamda, and then 
 those I got for theta (using theta = exp(lamda)), are irrelevant and their 
 standard errors are infinite, which means that there is still a problem that I 
 can't yet figure out.
 
 Thanks,
 Maram
 
 On 18 July 2015 at 08:01, Arne Henningsen arne.henning...@gmail.com wrote:
 Dear Maram
 
 - Please do not start a new thread for the same issue but reply to
 previous messages in this thread [1].
 
 - Please read my previous responses [1] more carefully, e.g. to use
 theta <- exp( param ), which guarantees that all elements of theta
 are always positive.
 
 [1] 
 http://r.789695.n4.nabble.com/NaN-produced-from-log-with-positive-input-td4709463.html
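
As an illustration of that suggestion (a sketch added here, not part of Arne's message),
the log-likelihood from the post below can be maximized on an unconstrained scale and the
estimates transformed back; whether the resulting estimates are sensible is a separate
question about the model itself:

library(maxLik)

r <- 17; n <- 30
T <- c(7.048,0.743,2.404,1.374,2.233,1.52,23.531,5.182,4.502,
       1.362,1.15,1.86,1.692,11.659,1.631,2.212,5.451)
C <- c(0.562,5.69,12.603,3.999,6.156,4.004,5.248,4.878,7.122,
       17.069,23.996,1.538,7.792)

## log-likelihood written in terms of theta (all elements must be > 0)
loglik_theta <- function(theta) {
  r*log(theta[3]) + r*log(theta[1]+theta[2]) +
    n*theta[3]*log(theta[1]) + n*theta[3]*log(theta[2]) -
    (theta[3]+1)*sum(log(T*(theta[1]+theta[2]) + theta[1]*theta[2])) -
    theta[3]    *sum(log(C*(theta[1]+theta[2]) + theta[1]*theta[2]))
}

## unconstrained parameterization: theta = exp(param), so log() never sees a
## non-positive argument while the optimizer searches
loglik_param <- function(param) loglik_theta(exp(param))

fit <- maxLik(loglik_param, start = log(c(40, 50, 2)))
summary(fit)
exp(coef(fit))   # back on the theta scale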
 
 Best regards,
 Arne
 
 
 
 2015-07-18 2:46 GMT+02:00 Maram SAlem marammagdysa...@gmail.com:
  Dear All,
  I'm trying to get the MLE for a certain distribution using the maxLik()
  function. I wrote the log-likelihood function as follows:
  theta <- vector(mode = "numeric", length = 3)
  r <- 17
  n <- 30

  T <- c(7.048,0.743,2.404,1.374,2.233,1.52,23.531,5.182,4.502,1.362,1.15,1.86,1.692,11.659,1.631,2.212,5.451)
  C <- c(0.562,5.69,12.603,3.999,6.156,4.004,5.248,4.878,7.122,17.069,23.996,1.538,7.792)

  # The log-likelihood function
  loglik <- function(param) {
    theta[1] <- param[1]
    theta[2] <- param[2]
    theta[3] <- param[3]

    l <- (r*log(theta[3])) + (r*log(theta[1]+theta[2])) +
      (n*theta[3]*log(theta[1])) + (n*theta[3]*log(theta[2])) +
      (-1*(theta[3]+1))*sum(log((T*(theta[1]+theta[2]))+(theta[1]*theta[2]))) +
      (-1*theta[3])*sum(log((C*(theta[1]+theta[2]))+(theta[1]*theta[2])))
    return(l)
  }
 
  then, I evaluated it at theta <- c(40,50,2)

  v <- loglik(param=theta)
  v
  [1] -56.66653
 
  I used this same log-likelihood function, once with analytic gradient and
  another time with numerical one, with the maxLik function, and in both
  cases I got the same 50 warning messages and an MLE which is completely
  unrealistic as per my applied example.
 
  a <- maxLik(loglik, gradlik, hesslik, start=c(40,50,2))
 
  where gradlik and hesslik are the analytic gradient and Hessian matrix,
  respectively, given by:
 
  U <- vector(mode="numeric", length=3)
  gradlik <- function(param = theta, n, T, C)
  {
    U <- vector(mode="numeric", length=3)
    theta[1] <- param[1]
    theta[2] <- param[2]
    theta[3] <- param[3]
    r <- 17
    n <- 30
    T <- c(7.048,0.743,2.404,1.374,2.233,1.52,23.531,5.182,4.502,1.362,1.15,1.86,1.692,11.659,1.631,2.212,5.451)
    C <- c(0.562,5.69,12.603,3.999,6.156,4.004,5.248,4.878,7.122,17.069,23.996,1.538,7.792)
    U[1] <- (r/(theta[1]+theta[2]))+((n*theta[3])/theta[1])+
      (-1*(theta[3]+1))*sum((T+theta[2])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))+
      (-1*(theta[3]))*sum((C+theta[2])/((theta[1]+theta[2])*C+(theta[1]*theta[2])))
    U[2] <- (r/(theta[1]+theta[2]))+((n*theta[3])/theta[2])+
      (-1*(theta[3]+1))*sum((T+theta[1])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))+
      (-1*(theta[3]))*sum((C+theta[1])/((theta[1]+theta[2])*C+(theta[1]*theta[2])))
    U[3] <- (r/theta[3])+(n*log(theta[1]*theta[2]))+
      (-1)*sum(log((T*(theta[1]+theta[2]))+(theta[1]*theta[2])))+
      (-1)*sum(log((C*(theta[1]+theta[2]))+(theta[1]*theta[2])))
    return(U)
  }
  hesslik <- function(param = theta, n, T, C)
  {
    theta[1] <- param[1]
    theta[2] <- param[2]
    theta[3] <- param[3]
    r <- 17
    n <- 30
    T <- c(7.048,0.743,2.404,1.374,2.233,1.52,23.531,5.182,4.502,1.362,1.15,1.86,1.692,11.659,1.631,2.212,5.451)
    C <- c(0.562,5.69,12.603,3.999,6.156,4.004,5.248,4.878,7.122,17.069,23.996,1.538,7.792)
    G <- matrix(nrow=3, ncol=3)
    G[1,1] <- ((-1*r)/((theta[1]+theta[2])^2))+((-1*n*theta[3])/(theta[1])^2)+
      (theta[3]+1)*sum(((T+theta[2])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))^2)+
      (theta[3])*sum(((C+theta[2])/((theta[1]+theta[2])*C+(theta[1]*theta[2])))^2)
    G[1,2] <- ((-1*r)/((theta[1]+theta[2])^2))+
      (theta[3]+1)*sum(((T)/((theta[1]+theta[2])*T+(theta[1]*theta[2])))^2)+
      (theta[3])*sum(((C)/((theta[1]+theta[2])*C+(theta[1]*theta[2])))^2)
    G[2,1] <- G[1,2]
    G[1,3] <- (n/theta[1])+
      (-1)*sum((T+theta[2])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))+
      (-1)*sum((C+theta[2])/((theta[1]+theta[2])*C+(theta[1]*theta[2])))
    G[3,1] <- G[1,3]
    G[2,2] <- ((-1*r)/((theta[1]+theta[2])^2))+((-1*n*theta[3])/(theta[2])^2)+
      (theta[3]+1)*sum(((T+theta[1])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))^2)+
      (theta[3])*sum(((C+theta[1])/((theta[1]+theta[2])*C+(theta[1]*theta[2])))^2)
  

Re: [R] Differences in output of lme() when introducing interactions

2015-07-22 Thread Therneau, Terry M., Ph.D.
Type III is a peculiarity of SAS, which has taken root in the world.  There are 3 main 
questions wrt it:


1. How to compute it (outside of SAS).  There is a trick using contr.treatment coding that 
works if the design has no missing factor combinations; your post has a link to such a 
description.  The SAS documentation is very obtuse, thus almost no one knows how to 
compute the general case.


2. What is it?  It is a population average: the predicted average treatment effect in a 
balanced population -- one where all the factor combinations appear the same number of 
times.  One way to compute 'type 3' is to create such a data set, get all the predicted 
values, and then take the average prediction for treatment A, the average for treatment B, 
the average for C, ..., and test whether these averages are the same (a minimal sketch of 
this calculation appears after point 3).  The algorithm of #1 above leads to another 
explanation, which is a false trail, in my opinion.


3. Should you ever use it?  No.  There is a very strong inverse correlation between 
understanding what it really is and recommending its use.  Stephen Senn has written very 
intelligently on the issues.
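
For illustration only (this code is not from the original post), one way the
balanced-population averaging described in point 2 can be carried out for a hypothetical
two-factor linear model:

## hypothetical unbalanced two-factor data
set.seed(1)
d <- data.frame(trt = sample(LETTERS[1:3], 60, replace = TRUE),
                grp = sample(c("low", "high"), 60, replace = TRUE))
d$y <- rnorm(60, mean = as.numeric(factor(d$trt)))

fit <- lm(y ~ trt * grp, data = d)

## a perfectly balanced population: every factor combination exactly once
bal <- expand.grid(trt = levels(factor(d$trt)),
                   grp = levels(factor(d$grp)))
bal$pred <- predict(fit, newdata = bal)

## the 'type 3' treatment question: are these population-average predictions equal?
tapply(bal$pred, bal$trt, mean)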


Terry Therneau


On 07/22/2015 05:00 AM, r-help-requ...@r-project.org wrote:

Dear Michael,
thanks a lot. I am studying the marginality and I came across to this post:

http://www.ats.ucla.edu/stat/r/faq/type3.htm

Do you think that the procedure there described is the right one to solve my 
problem?

Would you have any other online resources to suggest especially dealing with R?

My department does not have a statistician, so I have to find a solution with
my own capacities.

Thanks in advance

Angelo


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm help - final predictor variable NA

2015-07-22 Thread Ben Bolker
matthewjones43 matthew.jones at kellogg.ox.ac.uk writes:

 
 Hi, I am not a statistician and so I am sure whatever it is I am doing wrong
 must be an obvious error for those who are... Basically I cannot understand
 why I get NA for variable 'CDSTotal' when running a glm? Does anyone have an
 idea of why this might be happening?

It might be useful to look at
http://stackoverflow.com/questions/7337761/linear-regression-na-estimate-just-for-last-coefficient/7341074#7341074

You are overfitting the model by including
a predictor that can be expressed as a linear combination of
other predictors, and R is trying to handle it automatically.
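
A small illustration of that situation (hypothetical data, not the poster's model): the
last predictor is an exact linear combination of the first two, so its coefficient cannot
be estimated and comes back as NA:

set.seed(1)
d <- data.frame(a = rnorm(30), b = rnorm(30))
d$CDSTotal <- d$a + d$b                      # exact linear combination of a and b
d$y <- rbinom(30, 1, plogis(d$a - d$b))

coef(glm(y ~ a + b + CDSTotal, family = binomial, data = d))
## the CDSTotal coefficient is reported as NA (rank-deficient design matrix)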

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R-es] web RSelenium

2015-07-22 Thread javier.ruben.marcuzzi
Dear all, I am sending you some code that gives me an error, although I doubt
whether it is written correctly, since many examples are failing for me.


I would appreciate it if someone could tell me whether they get the same error
that my computer reports. I don't think the problem comes from the R build I
installed on Windows 8.1; I have gotten lost looking for my mistake.


Revolution R Open 8.0.3
Using CRAN snapshot taken on 2015-04-01




library(RSelenium)
checkForServer()
startServer()

remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://www.r-project.org")

webElems <- remDr$findElement(using = 'css selector',
  "body > div > div.row > div.col-xs-12.col-sm-offset-1.col-sm-2.sidebar > div > div:nth-child(1) > ul")

resHeaders <- unlist(lapply(webElems, function(x){x$getElementText()}))

# Error in x$getElementText : object of type 'closure' is not subsettable
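
One possible explanation (an assumption on my part, not confirmed in the thread):
findElement() returns a single webElement object, so lapply() iterates over that object's
internal fields, some of which are functions, hence the 'closure' error. findElements()
returns a list of elements that can be looped over; assuming the remDr session opened
above:

# sketch: fetch all matching elements, then extract the text of each one
webElems <- remDr$findElements(using = "css selector",
  "body > div > div.row > div.col-xs-12.col-sm-offset-1.col-sm-2.sidebar > div > div:nth-child(1) > ul")
resHeaders <- unlist(lapply(webElems, function(x) x$getElementText()))
resHeaders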







Javier Rubén Marcuzzi
Dairy Industry Technician
Veterinarian
[[alternative HTML version deleted]]

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] Statistical distribution not fitting

2015-07-22 Thread Boris Steipe
So - as you can see, your data can be modelled.

Now the interesting question is: what do you do with that knowledge? I know
nearly nothing about your domain, but given that the data looks log-normal, I
am curious about the following:

 - Most of the events are in the small-loss category. But most of the damage is 
done by the rare large losses. Is it even meaningful to guard against a single 
1/1000 event? Shouldn't you be saying: my contingency funds need to be large 
enough to allow survival of, say, a fiscal year with 99.9 % probability? This 
is a very different question.

 - If a loss occurs, in what time do the funds need to be replenished? Do you 
need to take series of events into account?

 - The model assumes that the data are independent. This is probably a poor 
(and dangerous) assumption.

Cheers,
B.





On Jul 22, 2015, at 3:56 PM, Ben Bolker bbol...@gmail.com wrote:

 Amelia Marsh amelia_marsh08 at yahoo.com writes:
 
 
 Hello!  (I don't know if I can raise this query here on this forum,
 but I had already raised it on the finance forum and have not received
 any suggestion, so I am now raising it on this list. Sorry for the same. The
 query is about what to do if no statistical distribution fits
 the data).
 
 I am into risk management and deal with Operational risk. As a part
 of BASEL II guidelines, we need to arrive at the capital charge the
 banks must set aside to counter any operational risk, if it
 happens. As a part of Loss Distribution Approach (LDA), we need to
 collate past loss events and use these loss amounts. The usual
 process as being practised in the industry is as follows -
 
 Using these historical loss amounts and the various
 statistical tests like the KS test, AD test, PP plot, QQ plot etc., we
 try to identify the best statistical (continuous) distribution fitting
 this historical loss data. Then, using the estimated parameters
 of that distribution, we simulate say 1 million loss
 amounts, and then, taking an appropriate percentile (say 99.9%), we
 arrive at the capital charge.
 
 However, many times the loss data is such that fitting a
 distribution to it is not possible. Maybe the loss data is
 multimodal or has significant variability, making the fitting of a
 distribution impossible.  Can someone guide me on how to deal with such
 data and what can be done to simulate losses using this historical
 loss data in R?
 
 A skew-(log)-normal fit doesn't look too bad ... (whenever you
 have positive data that are this strongly skewed, log-transforming
 is a good step)
 
 hist(log10(mydat), col="gray", breaks="FD", freq=FALSE)
 ## default breaks are much coarser:
 ## hist(log10(mydat), col="gray", breaks="Sturges", freq=FALSE)
 lines(density(log10(mydat)), col=2, lwd=2)
 library(fGarch)
 ss <- snormFit(log10(mydat))
 xvec <- seq(2, 6.5, length=101)
 lines(xvec, do.call(dsnorm, c(list(x=xvec), as.list(ss$par))),
   col="blue", lwd=2)
 ## or try a skew-Student-t: not very different:
 ss2 <- sstdFit(log10(mydat))
 lines(xvec, do.call(dsstd, c(list(x=xvec), as.list(ss2$estimate))),
   col="purple", lwd=2)
 
 There are more flexible distributional families (Johnson,
 log-spline ...)
 
 Multimodal data are a different can of worms -- consider
 fitting a finite mixture model ...
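
For instance (an illustration added here, not part of the original exchange), a
two-component normal mixture on the log scale could be fit with the mixtools package;
the data below are fake bimodal losses standing in for mydat:

library(mixtools)
set.seed(1)
mydat <- exp(c(rnorm(400, 8, 1), rnorm(100, 11, 0.5)))   # fake bimodal loss data

mix <- normalmixEM(log10(mydat), k = 2)   # EM fit of a 2-component Gaussian mixture
mix$lambda   # estimated mixing proportions
mix$mu       # component means (log10 scale)
mix$sigma    # component standard deviations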
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] interactive Map: Popups

2015-07-22 Thread MacQueen, Don
I would start by looking at the Graphics and Web Technologies entries
in Task Views on CRAN. In addition, I suspect there might be some
packages listed in the Spatial task view that could do this.
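
As one concrete option (my suggestion, not named in the reply above), the leaflet package
can attach popup text to clickable map features; a minimal sketch with made-up
coordinates:

library(leaflet)

## markers with popups; for administrative regions, addPolygons() takes a 'popup'
## argument in the same way (given suitable polygon data for the country)
leaflet() %>%
  addTiles() %>%
  addMarkers(lng = c(13.40, 2.35), lat = c(52.52, 48.86),
             popup = c("Region A: some information", "Region B: more information"))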

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 7/22/15, 4:35 PM, R-help on behalf of Marie-Louise
r-help-boun...@r-project.org on behalf of tim...@hotmail.de wrote:

Hello,
I am trying to build a map of a country which shows information about its
regions in a popup window as soon as someone clicks on a region.
Thank you



--
View this message in context:
http://r.789695.n4.nabble.com/interactive-Map-Popups-tp4710226.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to simulate informative censoring in a Cox PH model?

2015-07-22 Thread Greg Snow
I think that the Cox model still works well when the only information
in the censoring is conditional on variables in the model.  What you
describe could be called non-informative conditional on x.

To really see the difference you need informative censoring that
depends on something not included in the model.  One option would be
to use copulas to generate dependent data and then transform the
values using your Weibull.  Or you could generate your event times and
censoring times based on x1 and x2, but then only include x1 in the
model.
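
A minimal sketch of that second option (made-up parameter values, not from the original
exchange; the survival package is assumed):

library(survival)

set.seed(42)
n  <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)

## Weibull proportional-hazards simulation: h(t|x) = lambda * exp(lp) * v * t^(v-1)
rweib_ph <- function(lambda, lp, v) {
  u <- runif(length(lp))
  (-log(u) / (lambda * exp(lp)))^(1 / v)
}

Tm <- rweib_ph(0.1, 1.0 * x1 + 1.0 * x2, v = 1.5)   # event times depend on x1 and x2
Cn <- rweib_ph(0.1, 1.0 * x1 + 3.0 * x2, v = 1.5)   # censoring times also depend on x2

y     <- pmin(Tm, Cn)
delta <- as.numeric(Tm <= Cn)

## x2 is deliberately left out, so the censoring is informative w.r.t. the fitted model
coxph(Surv(y, delta) ~ x1)        # the x1 coefficient no longer targets its true value of 1
coxph(Surv(y, delta) ~ x1 + x2)   # conditioning on x2 brings both coefficients back near 1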

On Wed, Jul 22, 2015 at 2:20 AM, Daniel Meddings dpmeddi...@gmail.com wrote:
 I wish to simulate event times where the censoring is informative, and to
 compare parameter estimator quality from a Cox PH model with estimates
 obtained from event times generated with non-informative censoring. However
 I am struggling to do this, and I conclude that, rather than there being a
 technical flaw in my code, I instead do not understand what is meant by
 informative and un-informative censoring.

 My approach is to simulate an event time T dependent on a vector of
 covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}.
 This corresponds to T~ Weibull(lambda(x),v), where the scale parameter
 lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is
 fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}),
 lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume
 the regression coefficients are p-dimensional.

 I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C),
 lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i) and
 a censored flag delta_inf_i=1 if Y_inf_i = C_i (an observed event), and
 delta_inf_i=0 if Y_inf_i  C_i (informatively censored: event not
 observed). I am convinced this is informative censoring because as long as
 beta_T~=0 and beta_C~=0 then for each subject the data generating process
 for T and C both depend on x.

 In contrast I generate non-informative censoring times
 D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i)
 and a censored flag delta_ninf_i=1 if Y_ninf_i = D_i (an observed event),
 and delta_ninf_i=0 if Y_ninf_i  D_i (non-informatively censored: event not
 observed). Here beta_D is a scalar. I scale the simulation by choosing
 the lambda_T, lambda_C and lambda_D parameters such that on average T_i < C_i
 and T_i < D_i, to achieve X% of censored subjects for both Y_inf_i and
 Y_ninf_i.

 The problem is that even for say 30% censoring (which I think is high), the
 Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when I
 expected the estimates using Y_inf to be biased, and I think I see why:
 however different beta_C is from beta_T, a censored subject can presumably
 influence the estimation of beta_T only by affecting the set of subjects at
 risk at any time t, but this does not change the fact that every single
 Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus
 I do not see how my simulation can possibly produce biased estimates for
 beta_T using Y_inf.

 But then what is informative censoring if not based on this approach?

 Any help would be greatly appreciated.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Differences in output of lme() when introducing interactions

2015-07-22 Thread Rolf Turner

On 23/07/15 01:15, Therneau, Terry M., Ph.D. wrote:

SNIP


3. Should you ever use it [i.e. Type III SS]?  No.  There is a very strong 
inverse
correlation between understand what it really is and recommend its
use.   Stephen Senn has written very intellgently on the issues.


Terry --- can you please supply an explicit citation?  Ta.

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] interactive Map: Popups

2015-07-22 Thread Marie-Louise
Hello,
I am trying to build a map of a country which shows informations to its
regions in a popup window as soon as someone clicks on a region. 
Thank you



--
View this message in context: 
http://r.789695.n4.nabble.com/interactive-Map-Popups-tp4710226.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R: Re: Differences in output of lme() when introducing interactions

2015-07-22 Thread angelo.arc...@virgilio.it
Dear Terry,
I am very grateful to you for such a detailed and helpful answer.
Following your recommendation then I will skip the method presented at 
http://www.ats.ucla.edu/stat/r/faq/type3.htm

So far, based on my understanding of R, I arrived at the conclusion that the
correct way to see if there is a correlation between my dependent variable
(the spectral centroid of a sound) and the weight, height, and interaction
between weight and height of the participants asked to create those sounds
(in a repeated measures design) is:


lme_centroid <- lme(Centroid ~ Weight*Height*Shoe_Size, data = My_data,
                    random = ~1 | Subject)

anova.lme(lme_centroid, type = "marginal")


Can anyone please confirm that those formulas are actually correct and give
the significant or non-significant p-values for the main effects and their
interactions? I would prefer to use lme(), not lmer().

I am making the assumption of course that the model I am using (Centroid ~ 
Weight*Height*Shoe_Size) is 
the best fit for my data.

Thanks in advance

Angelo




Messaggio originale
Da: thern...@mayo.edu
Data: 22-lug-2015 15.15
A: r-help@r-project.org, angelo.arc...@virgilio.it
Ogg: Re:  Differences in output of lme() when introducing interactions

Type III is a peculiarity of SAS, which has taken root in the world.  There 
are 3 main 
questions wrt to it:

1. How to compute it (outside of SAS).  There is a trick using contr.treatment 
coding that 
works if the design has no missing factor combinations, your post has a link to 
such a 
description.  The SAS documentation is very obtuse, thus almost no one knows 
how to 
compute the general case.

2. What is it?  It is a population average.  The predicted average treatment 
effect in a 
balanced population-- one where all the factor combinations appeared the same 
number of 
times.  One way to compute 'type 3' is to create such a data set, get all the 
predicted 
values, and then take the average prediction for treatment A, average for 
treatment B, 
average for C, ...  and test are these averages the same.   The algorithm of 
#1 above 
leads to another explanation which is a false trail, in my opinion.

3. Should you ever use it?  No.  There is a very strong inverse correlation 
between 
understand what it really is and recommend its use.   Stephen Senn has 
written very 
intellgently on the issues.

Terry Therneau


On 07/22/2015 05:00 AM, r-help-requ...@r-project.org wrote:
 Dear Michael,
 thanks a lot. I am studying the marginality and I came across to this post:

 http://www.ats.ucla.edu/stat/r/faq/type3.htm

 Do you think that the procedure there described is the right one to solve my 
 problem?

 Would you have any other online resources to suggest especially dealing with 
 R?

  My department does not have a statistician, so I have to find a solution with
  my own capacities.

 Thanks in advance

 Angelo



   
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Differences in output of lme() when introducing interactions

2015-07-22 Thread Marc Schwartz
Hi,

In addition to Terry’s great comments below, as this subject has come up 
frequently over the years, there is also a great document by Bill Venables that 
is valuable reading:

  Exegeses on Linear Models
  http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf


Regards,

Marc Schwartz


 On Jul 22, 2015, at 8:15 AM, Therneau, Terry M., Ph.D. thern...@mayo.edu 
 wrote:
 
 Type III is a peculiarity of SAS, which has taken root in the world.  There 
 are 3 main questions wrt to it:
 
 1. How to compute it (outside of SAS).  There is a trick using 
 contr.treatment coding that works if the design has no missing factor 
 combinations, your post has a link to such a description.  The SAS 
 documentation is very obtuse, thus almost no one knows how to compute the 
 general case.
 
 2. What is it?  It is a population average.  The predicted average treatment 
 effect in a balanced population-- one where all the factor combinations 
 appeared the same number of times.  One way to compute 'type 3' is to create 
 such a data set, get all the predicted values, and then take the average 
 prediction for treatment A, average for treatment B, average for C, ...  and 
 test are these averages the same.   The algorithm of #1 above leads to 
 another explanation which is a false trail, in my opinion.
 
 3. Should you ever use it?  No.  There is a very strong inverse correlation 
 between understand what it really is and recommend its use.   Stephen 
 Senn has written very intellgently on the issues.
 
 Terry Therneau
 
 
 On 07/22/2015 05:00 AM, r-help-requ...@r-project.org wrote:
 Dear Michael,
 thanks a lot. I am studying the marginality and I came across to this post:
 
 http://www.ats.ucla.edu/stat/r/faq/type3.htm
 
 Do you think that the procedure there described is the right one to solve my 
 problem?
 
 Would you have any other online resources to suggest especially dealing with 
 R?
 
  My department does not have a statistician, so I have to find a solution with
  my own capacities.
 
 Thanks in advance
 
 Angelo

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Sensitivity,Specificity and Youden Index

2015-07-22 Thread Sarah Goslee
On Wed, Jul 22, 2015 at 9:50 AM, Jomy Jose infoj...@gmail.com wrote:
 How to calculate the sensitivity,specificity,Youden index for 18 factors
 and their combination (6 factors in each) with an outcome measure.

www.rseek.org turns up a bunch of references to Youden index,
including packages.

Without a reproducible example that includes some sample data (fake is
fine), the code you used, and some clear idea of what output you
expect, it's impossible to figure out how to help you. Here are some
suggestions for creating a good reproducible example:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate adjacent log odds for a table

2015-07-22 Thread Michael Friendly

On 7/21/2015 11:14 AM, Michael Friendly wrote:

More generally, for an I x J x K table, where the last factor is the
response, my desired result
is a data frame of IJ(K-1) rows, with adjacent log odds in a 'logodds'
column, and ideally, I'd like
to have a general function to do this.

Note that if T is the 10 x 3 matrix of frequencies shown by ftable(),
the calculation is essentially

log(T) %*% matrix(c(1, -1, 0,
 0,  1, -1))
followed by reshaping and labeling.


No one answered, but as I often find, writing it out as a MWE was 
helpful.  Here is a simpler approach for my sample case, that should be

easier to generalize

# reshape to matrix
T <- matrix(mice.tab, nrow=prod(dim(mice.tab)[1:2]), ncol=dim(mice.tab)[3])
colnames(T) <- dimnames(mice.tab)[[3]]
rn <- expand.grid(litter=factor(7:11), treatment=c("A","B"))
rownames(T) <- apply(rn, 1, paste, collapse=":")
T
# calculate log odds as contrasts
C <- matrix(c(1, -1, 0,
              0,  1, -1), nrow=3)
lodds <- log(T) %*% C
colnames(lodds) <- c("0:1", "1:2+")
lodds

 lodds
0:1   1:2+
7:A   1.6625477  0.7884574
8:A   1.2527630  0.3364722
9:A   0.6061358  0.1823216
10:A  0.1431008 -0.1431008
11:A -1.0986123 -0.3483067
7:B   1.3730491  0.9985288
8:B   1.2272297  0.7537718
9:B   0.7156200  0.7884574
10:B  0.5725192  0.2006707
11:B -1.0986123  0.6286087


# make a data frame
DF2 <- data.frame(expand.grid(litter=factor(7:11), treatment=c("A","B"),
                              deaths=c("0:1", "1:2+")),
                  logodds=c(lodds))
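
One way the same calculation might be wrapped into a general function for an I x J x K
table (a sketch, assuming the last margin is the response and the dimnames are named; not
tested beyond the example above):

adj_logodds <- function(tab) {
  stopifnot(length(dim(tab)) == 3)
  dn <- dimnames(tab)
  K  <- dim(tab)[3]
  Tm <- matrix(tab, nrow = prod(dim(tab)[1:2]), ncol = K)
  Cm <- -t(diff(diag(K)))            # adjacent contrasts: level k minus level k+1
  lodds <- c(log(Tm) %*% Cm)
  out <- expand.grid(dn[[1]], dn[[2]],
                     paste(dn[[3]][-K], dn[[3]][-1], sep = "/"))
  if (!is.null(names(dn))) names(out)[1:2] <- names(dn)[1:2]
  names(out)[3] <- "response"
  out$logodds <- lodds
  out
}
## e.g. adj_logodds(mice.tab) should reproduce DF2 up to labelling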

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to make a persistence landscape in R?

2015-07-22 Thread John Kane
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 and http://adv-r.had.co.nz/Reproducibility.html


John Kane
Kingston ON Canada


 -Original Message-
 From: l.shul...@hotmail.com
 Sent: Tue, 21 Jul 2015 10:39:45 -0700 (PDT)
 To: r-help@r-project.org
 Subject: [R] How to make a persistence landscape in R?
 
 Hi there,
 I am trying to construct a persistence landscape in R that shows all of the
 overlapping triangles -- not just the overall silhouette -- how can you do
 this?
 Every time I plug in the code I get one big triangle that corresponds to the
 largest barcode, but I cannot see any of the smaller barcodes/overlapping
 isosceles triangles.
 Thanks!
 
 
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/How-to-make-a-persistence-landscape-in-R-tp4710159.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.