[R] Read .xlsx and convert date-column value into Dataframe
Hi, Here i am having a .xlsx file and it contains various columns including date-column[mm/dd/yy]-but it is not in the date format. I have to read this excel[.xlsx] file and need to get in dataframe. So i used xlsx-liabrary and it was fine to read data. But the problem is, values in the date column is converting to some other value. for eg:- FF DATE --- 3/31/2016 2/26/2016 -- 1/2/2016 [Values like -- will come in the column to indicate that there is no date mentioned ] and i getting result like this, FF DATE --- 42460 42426 42125 this is the code i am using for it, theData-data.frame(read.xlsx2(InputFilePath, sheetIndex, sheetName=Workflow_Report, startRow=3,colIndex=NULL, endRow=NULL, as.data.frame=TRUE, header=TRUE)) Aim :- I have to get actual date-column values in dataframe from xlsx file. I tried many ways, Could someone please help ? Thanks in advance, Antony. -- View this message in context: http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sensitivity,Specificity and Youden Index
Please reply to the list, not me. The list doesn't allow most attachments (and most of the list recipients are highly unlikely to open unsolicited binary files anyway). Do see the link I provided for the appropriate way to create reproducible examples, including providing data (hint: use dput(), and include your code). Some evidence that you've looked at the packages suggested by rseek would also be useful. Nobody's going to do your work for you. Sarah On Wed, Jul 22, 2015 at 12:34 PM, Jomy Jose infoj...@gmail.com wrote: Thank you for your reply I have attached my dataset,here 'malnut' is the outcome,.sensitivity,specificity and youden index of each factor from chew to cc(18 factors) and combination of these factors (any 6 out of 18) with the outcome measure 'malnut' has to be generated On Wed, Jul 22, 2015 at 7:30 PM, Sarah Goslee sarah.gos...@gmail.com wrote: On Wed, Jul 22, 2015 at 9:50 AM, Jomy Jose infoj...@gmail.com wrote: How to calculate the sensitivity,specificity,Youden index for 18 factors and their combination (6 factors in each) with an outcome measure. www.rseek.org turns up a bunch of references to Youden index, including packages. Without a reproducible example that includes some sample data (fake is fine), the code you used, and some clear idea of what output you expect, it's impossible to figure out how to help you. Here are some suggestions for creating a good reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Sarah -- Sarah Goslee http://www.functionaldiversity.org -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Langelier-Ludwig Plots
According to ... http://info.ngwa.org/gwol/pdf/721000139.PDF (Graphical Interpretation of Water Quality Data) ... a Langelier-Ludwig plot is simply a scatterplot of cations vs. anions (in percent). Surely that would be beyond trivial to produce in R. Or am I missing a subtle something? Cheers, B. On Jul 22, 2015, at 11:50 AM, Rich Shepard rshep...@appl-ecosys.com wrote: My web search for an R package producing Langelier-Ludwig plots found no hits. Has this been implemented in base graphics, lattice, ggplot2, or another package? The reference: Langelier, W., and Ludwig, H., 1942, Graphical methods for indicating the mineral character of natural waters: J. Am. Water Ass., 34, p. 335-352. Rich __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to simulate informative censoring in a Cox PH model?
I wish to simulate event times where the censoring is informative, and to compare parameter estimator quality from a Cox PH model with estimates obtained from event times generated with non-informative censoring. However I am struggling to do this, and I conclude rather than a technical flaw in my code I instead do not understand what is meant by informative and un-informative censoring. My approach is to simulate an event time T dependent on a vector of covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}. This corresponds to T~ Weibull(lambda(x),v), where the scale parameter lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}), lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume the regression coefficients are p-dimensional. I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C), lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i) and a censored flag delta_inf_i=1 if Y_inf_i = C_i (an observed event), and delta_inf_i=0 if Y_inf_i C_i (informatively censored: event not observed). I am convinced this is informative censoring because as long as beta_T~=0 and beta_C~=0 then for each subject the data generating process for T and C both depend on x. In contrast I generate non-informative censoring times D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i) and a censored flag delta_ninf_i=1 if Y_ninf_i = D_i (an observed event), and delta_ninf_i=0 if Y_ninf_i D_i (non-informatively censored: event not observed). Here beta_D is a scalar. I scale the simulation by choosing the lambda_T, lambda_C and lambda_D parameters such that on average T_iC_i and T_iD_i to achieve X% of censored subjects for both Y_inf_i and Y_ninf_i. The problem is that even for say 30% censoring (which I think is high), the Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when I expected the estimates using Y_inf to be biased, and I think I see why: however different beta_C is from beta_T, a censored subject can presumably influence the estimation of beta_T only by affecting the set of subjects at risk at any time t, but this does not change the fact that every single Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus I do not see how my simulation can possibly produce biased estimates for beta_T using Y_inf. But then what is informative censoring if not based on this approach? Any help would be greatly appreciated. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R-es] ERROR: subscript out of bounds
Hola, Te adjunto un nuevo código comentado, que funciona y que incluye explicaciones de porqué el error que obtenías: #- library(ape) my.phylo-read.tree(150_BootstrapConsensusTree_Cantabrian_ML.nwk) my.sample - read.delim(my.sample.txt) my.sample-read.table(my.sample.txt,sep=\t, header=T, row.names=1) #Transformo a matriz my.sample my.sampleMat - as.matrix(my.sample) #Cambio los nombres de las columnas para que tengan el mismo tipo de formato que las de dist.mat #Los puntos por guion bajo. library(stringr) colnames(my.sampleMat) - str_replace_all(colnames(my.sampleMat), \\.,_) #dist.mat es la matriz de distancias de referencia sobre la que comparo my.sample. dist.mat-cophenetic(my.phylo) #Modifico nombres de filas y columnas de dist.mat quito acento library(stringr) colnames(dist.mat) - str_replace_all(colnames(dist.mat), \',) rownames(dist.mat) - str_replace_all(rownames(dist.mat), \',) #Solo me quedo de my.sample con las columnas con nombres que estan en dist.mat #Si hay un nombre de una especie que no esta en dist.mat no se puede #calcular la distancia y aparece el error. my.sampleGood - my.sampleMat[, intersect(colnames(my.sampleMat), colnames(dist.mat))] #De 54 columnas paso a 13 en my.sample. #Ahora my.sampleGood está ya listo para poder aplicarle la funcion y calcular distancias. library(SDMTools) sntd.a.function - function(x){ com.names - names(x[x 0])# Get de names of the species present in a community my.com.dist - dist.mat[com.names, com.names] diag(my.com.dist) = NA # Diagonal values to NA - Matriz sim, diagonal zero wt.sd(apply(my.com.dist,1,min,na.rm=T), x[x0]) } #Aplico funcion. proba-apply(my.sampleGood, MARGIN = 1, sntd.a.function) #Resultado: #La Preste Toran Els Ports #0.07185927 0.07129029 0.04948652 #- Saludos, Carlos Ortega www.qualityexcellence.es 2015-07-22 10:40 GMT+02:00 nuria bs lapt...@hotmail.com: Buenas, Cuando intento ejecutar el siguiente codigo, me aparece el error Error in dist.mat[com.names, com.names] : subscript out of bounds. Porque me aparece este error? Y como puedo solucionarlo? Gracias!!! *CONSOLE:* my.phylo-read.tree(150_BootstrapConsensusTree_Cantabrian_ML.nwk) my.sample - read.delim(C:/Filogenia/my.sample.txt) my.sample-read.table(my.sample.txt,sep=\t, header=T, row.names=1) library(SDMTools) sntd.a.function - function(x){ com.names - names(x[x 0]) # Get de names of the species present in a community my.com.dist - dist.mat[com.names, com.names] # Distance matrix diag(my.com.dist) = NA # Diagonal values to NA - Matriz sim, diagonal zero wt.sd(apply(my.com.dist,1,min,na.rm=T), x[x0]) } dist.mat-cophenetic(my.phylo) proba-apply(my.sample, MARGIN = 1, sntd.a.function) Error in dist.mat[com.names, com.names] : subscript out of bounds ___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es -- Saludos, Carlos Ortega www.qualityexcellence.es [[alternative HTML version deleted]] ___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
Re: [R] Read .xlsx and convert date-column value into Dataframe
Hi Antony, I am not sure it could work easily with package xlsx. Try using the function read_excel() from package readxl. This function allows for Dates to be read. HTH, Ivan -- Ivan Calandra, ATER University of Reims Champagne-Ardenne GEGENAA - EA 3795 CREA - 2 esplanade Roland Garros 51100 Reims, France +33(0)3 26 77 36 89 ivan.calan...@univ-reims.fr https://www.researchgate.net/profile/Ivan_Calandra Le 22/07/15 14:09, R_Antony a écrit : Hi, Here i am having a .xlsx file and it contains various columns including date-column[mm/dd/yy]-but it is not in the date format. I have to read this excel[.xlsx] file and need to get in dataframe. So i used xlsx-liabrary and it was fine to read data. But the problem is, values in the date column is converting to some other value. for eg:- FF DATE --- 3/31/2016 2/26/2016 -- 1/2/2016 [Values like -- will come in the column to indicate that there is no date mentioned ] and i getting result like this, FF DATE --- 42460 42426 42125 this is the code i am using for it, theData-data.frame(read.xlsx2(InputFilePath, sheetIndex, sheetName=Workflow_Report, startRow=3,colIndex=NULL, endRow=NULL, as.data.frame=TRUE, header=TRUE)) Aim :- I have to get actual date-column values in dataframe from xlsx file. I tried many ways, Could someone please help ? Thanks in advance, Antony. -- View this message in context: http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Langelier-Ludwig Plots
My web search for an R package producing Langelier-Ludwig plots found no hits. Has this been implemented in base graphics, lattice, ggplot2, or another package? The reference: Langelier, W., and Ludwig, H., 1942, Graphical methods for indicating the mineral character of natural waters: J. Am. Water Ass., 34, p. 335-352. Rich __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in rownames
Hello, When I type the code *rownames(gene_exp_matrix)-levels(geneidfactor)*,the error happens: Error in rownames-(*tmp*,value=character(0)):can not set object to 'rownames'. I am using Mac OS X10.10.2. How can I fix this problem? Thank a lot. Yao -- View this message in context: http://r.789695.n4.nabble.com/Error-in-rownames-tp4710183.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in rownames
Without a reproducible example that includes some sample data (fake is fine), the code you used, and some clear idea of what output you expect, it's impossible to figure out how to help you. Here are some suggestions for creating a good reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example On Wed, Jul 22, 2015 at 3:50 AM, JIALU YAO yaoji...@foxmail.com wrote: Hello, When I type the code *rownames(gene_exp_matrix)-levels(geneidfactor)*,the error happens: Error in rownames-(*tmp*,value=character(0)):can not set object to 'rownames'. I am using Mac OS X10.10.2. How can I fix this problem? Thank a lot. Yao -- View this message in context: http://r.789695.n4.nabble.com/Error-in-rownames-tp4710183.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Read .xlsx and convert date-column value into Dataframe
Those numbers are a serial number of days. A value of 1 maps to Jan 1, 1900. ref: https://support.office.com/en-za/article/DATE-function-e36c0c8c-4104-49da-ab83-82328b832349 A formula such as: as.Date('1900-01-01')+excel_date-1 should convert the serial value to a date value. On Wed, Jul 22, 2015 at 7:09 AM, R_Antony antony.akk...@ge.com wrote: Hi, Here i am having a .xlsx file and it contains various columns including date-column[mm/dd/yy]-but it is not in the date format. I have to read this excel[.xlsx] file and need to get in dataframe. So i used xlsx-liabrary and it was fine to read data. But the problem is, values in the date column is converting to some other value. for eg:- FF DATE --- 3/31/2016 2/26/2016 -- 1/2/2016 [Values like -- will come in the column to indicate that there is no date mentioned ] and i getting result like this, FF DATE --- 42460 42426 42125 this is the code i am using for it, theData-data.frame(read.xlsx2(InputFilePath, sheetIndex, sheetName=Workflow_Report, startRow=3,colIndex=NULL, endRow=NULL, as.data.frame=TRUE, header=TRUE)) Aim :- I have to get actual date-column values in dataframe from xlsx file. I tried many ways, Could someone please help ? Thanks in advance, Antony. -- View this message in context: http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Schrodinger's backup: The condition of any backup is unknown until a restore is attempted. Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be. He's about as useful as a wax frying pan. 10 to the 12th power microphones = 1 Megaphone Maranatha! John McKown [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] SDR package for Subgroup Discovery on CRAN
Package SDR for subgroup discovery data mining is available on CRAN now. More info of the package on the CRAN page: https://cran.r-project.org/web/packages/SDR/index.html Ángel. [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in rownames
On Jul 22, 2015, at 12:50 AM, JIALU YAO wrote: Hello, When I type the code *rownames(gene_exp_matrix)-levels(geneidfactor)*,the error happens: Error in rownames-(*tmp*,value=character(0)):can not set object to 'rownames'. I am using Mac OS X10.10.2. How can I fix this problem? The error suggests that levels(geneidfactor) is returning a vector of length 0. How do you fix that? The first thing I would do is check to see if spelling might be an issue and then see what str(geneidfactor) returns. If this process makes any sense (noting that the order of a dataframe factor variable might not be the same as the level-names), then my guess is that you failed to include the dataframe name where the factor vector resides. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Read .xlsx and convert date-column value into Dataframe
forgot the reply to all: These are serial dates within EXCEL. Here is a way of converting them: as.Date(c(42460, 42426), origin = '1899-12-30') [1] 2016-03-31 2016-02-26 Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Wed, Jul 22, 2015 at 11:51 AM, Ivan Calandra ivan.calan...@univ-reims.fr wrote: Hi Antony, I am not sure it could work easily with package xlsx. Try using the function read_excel() from package readxl. This function allows for Dates to be read. HTH, Ivan -- Ivan Calandra, ATER University of Reims Champagne-Ardenne GEGENAA - EA 3795 CREA - 2 esplanade Roland Garros 51100 Reims, France +33(0)3 26 77 36 89 ivan.calan...@univ-reims.fr https://www.researchgate.net/profile/Ivan_Calandra Le 22/07/15 14:09, R_Antony a écrit : Hi, Here i am having a .xlsx file and it contains various columns including date-column[mm/dd/yy]-but it is not in the date format. I have to read this excel[.xlsx] file and need to get in dataframe. So i used xlsx-liabrary and it was fine to read data. But the problem is, values in the date column is converting to some other value. for eg:- FF DATE --- 3/31/2016 2/26/2016 -- 1/2/2016 [Values like -- will come in the column to indicate that there is no date mentioned ] and i getting result like this, FF DATE --- 42460 42426 42125 this is the code i am using for it, theData-data.frame(read.xlsx2(InputFilePath, sheetIndex, sheetName=Workflow_Report, startRow=3,colIndex=NULL, endRow=NULL, as.data.frame=TRUE, header=TRUE)) Aim :- I have to get actual date-column values in dataframe from xlsx file. I tried many ways, Could someone please help ? Thanks in advance, Antony. -- View this message in context: http://r.789695.n4.nabble.com/Read-xlsx-and-convert-date-column-value-into-Dataframe-tp4710192.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Langelier-Ludwig Plots
On Wed, 22 Jul 2015, Boris Steipe wrote: According to ... http://info.ngwa.org/gwol/pdf/721000139.PDF (Graphical Interpretation of Water Quality Data) ... a Langelier-Ludwig plot is simply a scatterplot of cations vs. anions (in percent). Surely that would be beyond trivial to produce in R. Or am I missing a subtle something? Boris, Yes, that's what it is. I'll see just how trivial it is to produce it in the four-quadrant format I've seen used. It's a different layout from the basic two-variable scatterplot. Thanks for the URL. I worked from a reference in a 2003 paper to the original 1942 paper. Rich __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Statistical distribution not fitting
Try: qqnorm(log(mydat)) That doesn't look too bad, does it? Now: where is the problem? Cheers, B. On Jul 22, 2015, at 12:41 PM, Amelia Marsh amelia_mars...@yahoo.com wrote: Hello! (I dont know if I can raise this query here on this forum, but I had already raised on teh finance forum, but have not received any sugegstion, so now raising on this list. Sorry for the same. The query is about what to do, if no statistical distribution is fitting to data). I am into risk management and deal with Operatioanl risk. As a part of BASEL II guidelines, we need to arrive at the capital charge the banks must set aside to counter any operational risk, if it happens. As a part of Loss Distribution Approach (LDA), we need to collate past loss events and use these loss amounts. The usual process as being practised in the industry is as follows - Using these historical loss amounts and using the various statistical tests like KS test, AD test, PP plot, QQ plot etc, we try to identify best statistical (continuous) distribution fitting this historical loss data. Then using these estimated parameters w.r.t. the statistical distribution, we simulate say 1 miliion loss anounts and then taking appropriate percentile (say 99.9%), we arrive at the capital charge. However, many a times, loss data is such that fitting of distribution to loss data is not possible. May be loss data is multimodal or has significant variability, making the fitting of distribution impossible. Can someone guide me how to deal with such data and what can be done to simulate losses using this historical loss data in R. My data is as follows - mydat - c(829.53,4000,6000,1000,1063904,102400,22000,4000,4200,2000,1,400, 459006, 7276,4000,100,4000,1,613803.36, 825,1000,5000,4000,3000,84500,200, 2000,68000,97400,6267.8, 49500,27000,2100,10489.92,2200,2000,2000,1000,1900, 6000,5600,100,4000,14300,100,94100,1200,7000,2000,3000,1100,6900,1000,18500,6000,2000,4000,8400,11200,1000,15100,23300,4000,13100,4500,200,2000,5,3900,3200,2000,2000,67000,2000,500,2000,1000,1900,10400,1900,2000,3200,6500,1,2900,1000,14300,1000,2700,1500,12000,4,25000,2800,5000,15000,4000,1000,21000,15000,16000,54000,1500,19200,2000,2000,1000,39000,5000,1100,18000,1,3500,1000,1,5000,14000,1800,4000,1000,300,4000,1000,100,1000,4400,2000,2000,12000,200,100,1000,1000,2000,1600,2000,4000,14000,4000,13500,1000,200,200,1000,18000,23000,41400,6,500,3000,21000,6900,14600,1900,4000,4500,1000,2000,2000,1000,4100,2000,1000,2000,8000,3000,1500,2000,2000,3500,2000,2000,1000,3800,3,55000,500,1000,1000,2000,62400,2000,3000,200,2! 00! ! 0,3500,2000,2000,500,3000,4500,1000,1,2000,3000,3600,1000,2000,2000,5000,23000,2000,1900,2000,6,2000,6,2,2000,2000,4600,1000,2000,1000,18000,6000,62000,68000,26800,5,45900,16900,21500,2000,22700,2000,2000,32000,1,5000,138000,159700,13000,2000,17619,2000,1000,4000,2000,1500,4000,2,158900,74100,6000,24900,6,500,1000,4,1,5,800,4000,4900,6500,5000,400,500,3000,32300,24000,300,11500,2000,5000,1000,500,5000,5500,17450,56800,2000,1000,21400,22000,6,3000,7500,3000,1000,1000,2000,1500,83700,2000,4000,170005,7,6700,1500,3500,2000,10563.97,1500,25000,2000,2000,2267.57,1100,3100,2000,3500,1,2000,6000,1500,200,2,4000,46400,296900,15,3700,7500,2,48500,3500,12000,2500,4000,8500,1000,14500,1000,11000,2000,2000,12,2,7600,3000,2000,8000,1600,4,2000,5000,34187.67,279100,9900,31300,814000,43500,5100,49500,4500,6262.38,100,10400,2400,1500,5000,2500,15000,4,32500,41100,358600,109600,514300,258200,225900,402700,2! 7! 4300,75000,1000,56000,1,4100,1000,15000,100,4,7900,5000,105000 ,15100,2000,1100,2900,1500,600,500,1300,100,5000,5000,1,10100,7000,4,10500,5000,9500,1000,15200,2000,1,1,100,7800,3500,189900,58000,345000,151700,11000,6000,7000,15700,6000,3000,5000,1,2000,1000,36000,1000,500,8000,9000,6000,2000,26500,6000,5000,97200,2000,5100,17000,2500,25500,24000,5400,9,41500,6200,7500,5000,7000,41000,25000,1500,4,5000,1,21500,100,32000,32500,7,500,66400,21000,5000,5000,12600,3000,6200,38900,1,1000,6,41100,1200,31300,2500,58000,4100,58000,42500) Sorry for the inconvenience. I do understand fitting of distribution to such data is not a full proof method, but this is what is the procedure that has been followed in the risk management risk industry. Please note that my question is not pertaining to operational risk. My question is if distributions are not fitting to a particular data, how do we proceed further to simualte data based on this data. Regards Amelia Marsh __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal,
Re: [R] Statistical distribution not fitting
Amelia Marsh amelia_marsh08 at yahoo.com writes: Hello! (I dont know if I can raise this query here on this forum, but I had already raised on teh finance forum, but have not received any sugegstion, so now raising on this list. Sorry for the same. The query is about what to do, if no statistical distribution is fitting to data). I am into risk management and deal with Operatioanl risk. As a part of BASEL II guidelines, we need to arrive at the capital charge the banks must set aside to counter any operational risk, if it happens. As a part of Loss Distribution Approach (LDA), we need to collate past loss events and use these loss amounts. The usual process as being practised in the industry is as follows - Using these historical loss amounts and using the various statistical tests like KS test, AD test, PP plot, QQ plot etc, we try to identify best statistical (continuous) distribution fitting this historical loss data. Then using these estimated parameters w.r.t. the statistical distribution, we simulate say 1 miliion loss anounts and then taking appropriate percentile (say 99.9%), we arrive at the capital charge. However, many a times, loss data is such that fitting of distribution to loss data is not possible. May be loss data is multimodal or has significant variability, making the fitting of distribution impossible. Can someone guide me how to deal with such data and what can be done to simulate losses using this historical loss data in R. A skew-(log)-normal fit doesn't look too bad ... (whenever you have positive data that are this strongly skewed, log-transforming is a good step) hist(log10(mydat),col=gray,breaks=FD,freq=FALSE) ## default breaks are much coarser: ## hist(log10(mydat),col=gray,breaks=Sturges,freq=FALSE) lines(density(log10(mydat)),col=2,lwd=2) library(fGarch) ss - snormFit(log10(mydat)) xvec - seq(2,6.5,length=101) lines(xvec,do.call(dsnorm,c(list(x=xvec),as.list(ss$par))), col=blue,lwd=2) ## or try a skew-Student-t: not very different: ss2 - sstdFit(log10(mydat)) lines(xvec,do.call(dsstd,c(list(x=xvec),as.list(ss2$estimate))), col=purple,lwd=2) There are more flexible distributional families (Johnson, log-spline ...) Multimodal data are a different can of worms -- consider fitting a finite mixture model ... __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Statistical distribution not fitting
Hello! (I dont know if I can raise this query here on this forum, but I had already raised on teh finance forum, but have not received any sugegstion, so now raising on this list. Sorry for the same. The query is about what to do, if no statistical distribution is fitting to data). I am into risk management and deal with Operatioanl risk. As a part of BASEL II guidelines, we need to arrive at the capital charge the banks must set aside to counter any operational risk, if it happens. As a part of Loss Distribution Approach (LDA), we need to collate past loss events and use these loss amounts. The usual process as being practised in the industry is as follows - Using these historical loss amounts and using the various statistical tests like KS test, AD test, PP plot, QQ plot etc, we try to identify best statistical (continuous) distribution fitting this historical loss data. Then using these estimated parameters w.r.t. the statistical distribution, we simulate say 1 miliion loss anounts and then taking appropriate percentile (say 99.9%), we arrive at the capital charge. However, many a times, loss data is such that fitting of distribution to loss data is not possible. May be loss data is multimodal or has significant variability, making the fitting of distribution impossible. Can someone guide me how to deal with such data and what can be done to simulate losses using this historical loss data in R. My data is as follows - mydat - c(829.53,4000,6000,1000,1063904,102400,22000,4000,4200,2000,1,400, 459006, 7276,4000,100,4000,1,613803.36, 825,1000,5000,4000,3000,84500,200, 2000,68000,97400,6267.8, 49500,27000,2100,10489.92,2200,2000,2000,1000,1900, 6000,5600,100,4000,14300,100,94100,1200,7000,2000,3000,1100,6900,1000,18500,6000,2000,4000,8400,11200,1000,15100,23300,4000,13100,4500,200,2000,5,3900,3200,2000,2000,67000,2000,500,2000,1000,1900,10400,1900,2000,3200,6500,1,2900,1000,14300,1000,2700,1500,12000,4,25000,2800,5000,15000,4000,1000,21000,15000,16000,54000,1500,19200,2000,2000,1000,39000,5000,1100,18000,1,3500,1000,1,5000,14000,1800,4000,1000,300,4000,1000,100,1000,4400,2000,2000,12000,200,100,1000,1000,2000,1600,2000,4000,14000,4000,13500,1000,200,200,1000,18000,23000,41400,6,500,3000,21000,6900,14600,1900,4000,4500,1000,2000,2000,1000,4100,2000,1000,2000,8000,3000,1500,2000,2000,3500,2000,2000,1000,3800,3,55000,500,1000,1000,2000,62400,2000,3000,200,200! ! 0,3500,2000,2000,500,3000,4500,1000,1,2000,3000,3600,1000,2000,2000,5000,23000,2000,1900,2000,6,2000,6,2,2000,2000,4600,1000,2000,1000,18000,6000,62000,68000,26800,5,45900,16900,21500,2000,22700,2000,2000,32000,1,5000,138000,159700,13000,2000,17619,2000,1000,4000,2000,1500,4000,2,158900,74100,6000,24900,6,500,1000,4,1,5,800,4000,4900,6500,5000,400,500,3000,32300,24000,300,11500,2000,5000,1000,500,5000,5500,17450,56800,2000,1000,21400,22000,6,3000,7500,3000,1000,1000,2000,1500,83700,2000,4000,170005,7,6700,1500,3500,2000,10563.97,1500,25000,2000,2000,2267.57,1100,3100,2000,3500,1,2000,6000,1500,200,2,4000,46400,296900,15,3700,7500,2,48500,3500,12000,2500,4000,8500,1000,14500,1000,11000,2000,2000,12,2,7600,3000,2000,8000,1600,4,2000,5000,34187.67,279100,9900,31300,814000,43500,5100,49500,4500,6262.38,100,10400,2400,1500,5000,2500,15000,4,32500,41100,358600,109600,514300,258200,225900,402700,27! 4300,75000,1000,56000,1,4100,1000,15000,100,4,7900,5000,105000 ,15100,2000,1100,2900,1500,600,500,1300,100,5000,5000,1,10100,7000,4,10500,5000,9500,1000,15200,2000,1,1,100,7800,3500,189900,58000,345000,151700,11000,6000,7000,15700,6000,3000,5000,1,2000,1000,36000,1000,500,8000,9000,6000,2000,26500,6000,5000,97200,2000,5100,17000,2500,25500,24000,5400,9,41500,6200,7500,5000,7000,41000,25000,1500,4,5000,1,21500,100,32000,32500,7,500,66400,21000,5000,5000,12600,3000,6200,38900,1,1000,6,41100,1200,31300,2500,58000,4100,58000,42500) Sorry for the inconvenience. I do understand fitting of distribution to such data is not a full proof method, but this is what is the procedure that has been followed in the risk management risk industry. Please note that my question is not pertaining to operational risk. My question is if distributions are not fitting to a particular data, how do we proceed further to simualte data based on this data. Regards Amelia Marsh __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2 - Specifying Colors Manually
Hi, Given a data frame, I'm trying to graph multiple lines on one graph, each line being a different color and each colored line corresponding to a specific name in the legend. Here is a very basic data sample to work with: x - seq(0,40,10) y1 - sample(1:50,5) y2 - sample(1:50,5) mydf - data.frame(x,y1,y2) p - ggplot(mydf,aes(x=mydf$x)) + geom_line(aes(y=mydf$y1,colour=green4)) + geom_line(aes(y=mydf$y2,colour=blue2)) + scale_color_manual(name=legend,values=c(y1=green4,y2=blue2)) p When I run this, the entire plot is blank. What I WANT to show up is two lines, one being the color of green4 and the other being blue2. Besides that, I'm trying to associate the colors with the names y1 and y2 in the legend, but my codes don't seem to be working. I'm very new to R/ggplot2, and I really appreciate any and all help I can get. Thank you! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2 - Specifying Colors Manually
Try this: ggplot(mydf,aes(x)) + geom_line(aes(y = y1, colour = y1)) + geom_line(aes(y = y2, colour = y2)) + scale_color_manual(values = c(y1 = green4, y2 = blue2)) Note that you don't need to use `mydf` and names in the manual scale should match the values in the aes() calls. Alsoit'smucheasiertoreadyourcodeifyouusespaces;) Hadley On Wed, Jul 22, 2015 at 1:13 PM, Abiram Srivatsa avsriva...@gmail.com wrote: Hi, Given a data frame, I'm trying to graph multiple lines on one graph, each line being a different color and each colored line corresponding to a specific name in the legend. Here is a very basic data sample to work with: x - seq(0,40,10) y1 - sample(1:50,5) y2 - sample(1:50,5) mydf - data.frame(x,y1,y2) p - ggplot(mydf,aes(x=mydf$x)) + geom_line(aes(y=mydf$y1,colour=green4)) + geom_line(aes(y=mydf$y2,colour=blue2)) + scale_color_manual(name=legend,values=c(y1=green4,y2=blue2)) p When I run this, the entire plot is blank. What I WANT to show up is two lines, one being the color of green4 and the other being blue2. Besides that, I'm trying to associate the colors with the names y1 and y2 in the legend, but my codes don't seem to be working. I'm very new to R/ggplot2, and I really appreciate any and all help I can get. Thank you! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- http://had.co.nz/ __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Ordering in Sankey diagram using R and googleVis
Hello, I am trying to figure out if there is a way to order the left side of a Sankey diagram from most frequent to least frequent. I am using R version 3.2.1 and using googleVis version 0.5.9 for the Sankey. I've tried sorting, but that does not work. Is there anyway to force it to arrange the left (before) side in decreasing frequency? Something I am missing? Does not have to be using googleVis. Thank you! -Angela Example of the data I have, in a csv file: before after A B A B A B A C A A A A A B D E F B F B F F G H G A I reformat the data in R so it looks like this: before after count A B 4 A C 1 A A 2 D E 1 F B 2 F F 1 G H 1 G A 1 Then plot using this: plot( gvisSankey (data, from=before, to=after, weight=freq, options=list(width=600, height=800, sankey={iterations: 2}))) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Same results in arules after changing support and confidence
Hello, everyone! I'm new to R, so I'm sorry in advance if it's something obvious, but I still can't figure it out. I've started experimenting with apriori and everything was working fine, i.e. I changed support and confidence and results were different depending on combination. But on the next after running the same code and same dataset I'm getting 0 rules no matter which combination I'm using. My dataset looks like this (foodmarket data): Trans_Id,Product 3694728,Washington Berry Juice 3779788,Washington Berry Juice 414,Washington Berry Juice 4405313,Washington Berry Juice etc. My code is: library('arules') transactions - read.transactions(file=transactions.csv,format=single,sep=,,cols=c(1,2),rm.duplicates=false) basket_rules - apriori(transactions, parameter = list(sup = 0.05, conf = 0.01, target=rules,minlen=2)) With following results: http://postimg.org/image/8qw4e1hw7/ And another one with different parameters but same result: http://postimg.org/image/b9k7xfq91/ So, I have no idea how is that possible. Any input will be appreciated. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/Same-results-in-arules-after-changing-support-and-confidence-tp4710218.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model non-integer count outcomes
Sorry. Central limit theorem. Enough averaging and you get a normal distribution (simply stated, perhaps too simply). If so others will correct me before long. :-( Sent from my iPad On Jul 21, 2015, at 8:52 PM, Wensui Liu liuwen...@gmail.com wrote: what does CLT stand for? On Tue, Jul 21, 2015 at 11:41 PM, Don McKenzie d...@u.washington.edu wrote: Or if there are enough averages of enough counts, the CLT provides another option. On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote: On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote: Dear Lister When the count outcomes are integers, we could use either Poisson or NB regression to model them. However, there are cases that the count outcomes are non-integers, e.g. average counts. I am wondering if it still makes sense to use Poisson or NB regression to model these non-integer outcomes. There is a quasi-binomial error model that accepts non-integer outcomes. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- WenSui Liu https://statcompute.wordpress.com/ __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model non-integer count outcomes
Or if there are enough averages of enough counts, the CLT provides another option. On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote: On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote: Dear Lister When the count outcomes are integers, we could use either Poisson or NB regression to model them. However, there are cases that the count outcomes are non-integers, e.g. average counts. I am wondering if it still makes sense to use Poisson or NB regression to model these non-integer outcomes. There is a quasi-binomial error model that accepts non-integer outcomes. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model non-integer count outcomes
If you know the number of counts (n) used to calculate the average then you can still use a poisson distribution. Total = average * n glm(total ~ offset(n), family = poisson) ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey Op 22 jul. 2015 08:38 schreef Don McKenzie d...@u.washington.edu: Or if there are enough averages of enough counts, the CLT provides another option. On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote: On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote: Dear Lister When the count outcomes are integers, we could use either Poisson or NB regression to model them. However, there are cases that the count outcomes are non-integers, e.g. average counts. I am wondering if it still makes sense to use Poisson or NB regression to model these non-integer outcomes. There is a quasi-binomial error model that accepts non-integer outcomes. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model non-integer count outcomes
Thanks Thierry What if I don't know the n in the offset term? On Wednesday, July 22, 2015, Thierry Onkelinx thierry.onkel...@inbo.be wrote: If you know the number of counts (n) used to calculate the average then you can still use a poisson distribution. Total = average * n glm(total ~ offset(n), family = poisson) ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey Op 22 jul. 2015 08:38 schreef Don McKenzie d...@u.washington.edu javascript:;: Or if there are enough averages of enough counts, the CLT provides another option. On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net javascript:; wrote: On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote: Dear Lister When the count outcomes are integers, we could use either Poisson or NB regression to model them. However, there are cases that the count outcomes are non-integers, e.g. average counts. I am wondering if it still makes sense to use Poisson or NB regression to model these non-integer outcomes. There is a quasi-binomial error model that accepts non-integer outcomes. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org javascript:; mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org javascript:; mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org javascript:; mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- WenSui Liu https://statcompute.wordpress.com/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make a persistence landscape in R?
Hi lstat, The problem may be that as you are adding your connector triangles there is a fill color. Try reprogramming with col=NA if you are using polygon to draw the connectors. Jim On Wed, Jul 22, 2015 at 12:49 PM, David Winsemius dwinsem...@comcast.net wrote: On Jul 21, 2015, at 10:39 AM, lstat wrote: Hi there, I am trying to construct a persistence landscape in R that shows all of the overlapping triangles -- not just the overall silhouette-- how can you do this? Everytime I plug in the code I get one big triangle that corresponds to the largest barcode, but I cannot see any of the smaller barcodes/ overlapping isosceles trianges. You need to get out more. Talk to people who don't share your somewhat constricted view of the world. Learn to talk to a more diverse group of human beings,... say statisticians who might never have heard of a persistent landscape? It all sounds quite fascinating ... but there was no the code. Perhaps this is due to the Nabble interface which masquerades as R-help but is only a feeble imitation. Do read the fine posting guide: http://www.R-project.org/posting-guide.html -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model non-integer count outcomes
On 22 Jul 2015, at 06:48 , Don McKenzie d...@uw.edu wrote: Sorry. Central limit theorem. Or some sort of vegetarian sandwich. Celery, Lettuce, Tomato sounds almost edible with sufficient mayo. ;-) Enough averaging and you get a normal distribution (simply stated, perhaps too simply). If so others will correct me before long. :-( Well, your punctuation doesn't quite work -- ')' comes too early. Otherwise it is close enough for jazz, although there are distributions that you can average forever and still not get a normal, and some might want to stress that it is the parameter estimators that become approximately normal. (Students sometimes get confused and believe that the original data magically become normally distributed when you have a lot of them.) In practice, one should ensure that one has many data for all the relevant averages (996 males and 4 females is no good), and also that one gets the variance structure at least roughly right. -pd Sent from my iPad On Jul 21, 2015, at 8:52 PM, Wensui Liu liuwen...@gmail.com wrote: what does CLT stand for? On Tue, Jul 21, 2015 at 11:41 PM, Don McKenzie d...@u.washington.edu wrote: Or if there are enough averages of enough counts, the CLT provides another option. On Jul 21, 2015, at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote: On Jul 21, 2015, at 8:21 PM, Wensui Liu wrote: Dear Lister When the count outcomes are integers, we could use either Poisson or NB regression to model them. However, there are cases that the count outcomes are non-integers, e.g. average counts. I am wondering if it still makes sense to use Poisson or NB regression to model these non-integer outcomes. There is a quasi-binomial error model that accepts non-integer outcomes. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- WenSui Liu https://statcompute.wordpress.com/ __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] barplot -issues with axis and labels not appearing
Hi Pierre, I get a reasonable plot using the following code: par(mar=c(6,4,4,2)) barpos-barplot(unlist(GEP.data2), main=Global Portfolio Weights, col.main=gray, col=blues9, cex.axis=1, ylim=c(-1,1), las=2, cex.lab=1, cex=0.8) axis(1,at=barpos,labels=rep(,8)) For one thing, you don't need the beside=TRUE argument as there is only one vector of values to display. The small value for cex.axis made the tick labels unreadable on my display. If you would like to have the bar labels horizontal, have a look at the staxlab function in the plotrix package. Jim On Tue, Jul 21, 2015 at 10:38 PM, Pierre Micallef micallefpie...@hotmail.com wrote: Hi I am experiencing a few issues with the barplot function. I have written the following code; barplot(as.matrix(GEP.data2), beside=TRUE, main=Global Portfolio Weights, col.main=gray, col=blues9, cex.axis=0.1, ylim=c(-1,1), las=2, cex.lab=1, cex=0.8) where; GEP.data2 = VGSIX.equity VUSTX.equity VGTSX.equity VFISX.equity VTSMX.equity VFITX.equity VEIEX.equity VIPSX.equity 1 -0.08645095 0.08991793 0.03548216 0.45 0.45 0.45 -0.1689109 -0.2200382 However i am having the following issues; (1) neither x or y axis appear on my graph (bars appear as if they are floating)(2) no axis labels are showing up (3) chart title is too high on graph and is being cut off from view Please can you help solve these issues? Thanks Pierre __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sensitivity,Specificity and Youden Index
How to calculate the sensitivity,specificity,Youden index for 18 factors and their combination (6 factors in each) with an outcome measure. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R-es] ERROR: subscript out of bounds
Buenas, Cuando intento ejecutar el siguiente codigo, me aparece el error Error in dist.mat[com.names, com.names] : subscript out of bounds. Porque me aparece este error? Y como puedo solucionarlo? Gracias!!! CONSOLE: my.phylo-read.tree(150_BootstrapConsensusTree_Cantabrian_ML.nwk) my.sample - read.delim(C:/Filogenia/my.sample.txt) my.sample-read.table(my.sample.txt,sep=\t, header=T, row.names=1) library(SDMTools) sntd.a.function - function(x){ com.names - names(x[x 0])# Get de names of the species present in a community my.com.dist - dist.mat[com.names, com.names] # Distance matrix diag(my.com.dist) = NA # Diagonal values to NA - Matriz sim, diagonal zero wt.sd(apply(my.com.dist,1,min,na.rm=T), x[x0]) } dist.mat-cophenetic(my.phylo) proba-apply(my.sample, MARGIN = 1, sntd.a.function) Error in dist.mat[com.names, com.names] : subscript out of bounds 150_BootstrapConsensusTree_Cantabrian_ML.nwk Description: Binary data LOC Amblystegium serpensAmblystegium subtileAnomodon attenuatus Anomodon viticulosusApometzgeria pubescens Brachythecium populeum Brachythecium velutinum Bryum moravicum Cf. Hypnum sp. Cryphaea heteromalla Scleropodium cespitans Eurhynchium sp. Kindbergia praelonga Frullania dilatata Frullania tamarisci Homalothecium sericeum Hypnum cupressiforme var. cupressiforme Hypnum cupressiforme var. filiforme Hypnum recurvatum Hypnum revolutumIsothecium alopecuroides Jungermannia sp.Leptodon smithiiLeucodon sciuroides Lejeunea cavifolia Lophocolea bidentataMetzgeria furcata Mnium stellare Neckera besseri Neckera complanata Neckera crispa Orthotrichum affine Orthotrichum lyelliiOrthotrichum speciosum Orthotricum rupestre Orthotrichum sp.Orthotrichum pallensOrthotrichum stramineum Orthotrichum striatum Othotrichum aff. schimperi Orthotrichum tenellum Plagiochila porelloides Plagiothecium nemorale Porella arboris-vitae Porella platyphylla Pseudoleskeella nervosa Pterigynandrum filiforme Radula complanata Rhynchostegiella curviseta Scleropodium cespitans Thuidium delicatulumTortella flavovirens var. flavovirens Ulota crispa Tortella tortuosa La Preste 0.5 0 0 0 12 0 0 0.5 0 0 0 6 2 377 0 0 0 23.5 18 0 0 1 0 3 0 0 8 0 0 0 0 65 0 76 52 107 41 0 139.5 21 0 5 0 0 157 250 157.5 35.5 0 8 17 0 0 1.5 Toran 30 46 305 148 0 8 6 1.5 0 0 0 8 0 451 49 491 28 27 102 0 669 0 0 163 1 0 171 1 0 1052.5 0 5 0 0 13 58.50 25 24 0 3.5 5 30 0 701 6 504.5 332 24 0 0 0 41.50 Els Ports 0 60 0 0 0 0 10 0 20 8 0 0 0 465.5 12 126 0 65 45 6 0 2 181 0 30 3 538 0 37.52434.5 84 44.53 0 0 46.56 5 45 0 0 21.50 100 56 0 82 180 0 0 0 0 0 0 ___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
Re: [R] R version update
On Jul 21, 2015 9:30 PM, kle...@sxmail.de wrote: Dear Ladies and Gentlemen, as a beginner in R, I encountered a sort of naturally given limitsconcerning the process of amending R with packages. I apparently own version 3.0.2 and I principally decided to use R via the RKWard GUI on Linux Kubuntu Trusty Tahr; the R version installed on my computer is admittedly not far from the basics provided by my distribution's package management (also throughout multiple rounds of updates/ upgrades of the system). What would I have to do to finally update/ upgrade beyond that R version? https://cran.r-project.org/bin/linux/ubuntu/ Would that process (generally?) affect any/ my GUI? possibly. This problem appeared to me as I was busy installing packages an received the error message package ‘ATLAS’ is not available (for R version 3.0.2) . I don't think there is any package on CRAN with that name. What makes you think there is? I am quite desperate and would look forward to be indicated a path to a sustainable solution or be told how to mitigate/ circumvent (these/ such) problems. I'm not actually sure exactly what the problem is... Thanks a lot! Best regards,Markus Hofstetter [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: Warning message with maxLik()
Sent from my iPhone Begin forwarded message: From: Maram SAlem marammagdysa...@gmail.com Date: July 21, 2015 at 11:40:56 PM GMT+2 To: Arne Henningsen arne.henning...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Subject: Re: [R] Warning message with maxLik() Dear Arne, The elements of the theta vector are indeed strictly positive. I've just tried to use instead : lamda = log (theta), which means that theta = exp (lamda), so as to get rid of the log() function that appears in the log-likelihood and is causing the 50 warnings, but still the estimates I got for lamda and then those I got for theta (using theta=exp(lamda)) are irrelvant and their standard errors are infinite, which means that therer is still a problem that I can't yet figure out. Thanks, Maram On 18 July 2015 at 08:01, Arne Henningsen arne.henning...@gmail.com wrote: Dear Maram - Please do not start a new thread for the same issue but reply to previous messages in this thread [1]. - Please read my previous responses [1] more carefully, e.g. to use theta - exp( param ) which guarantees that all elements of theta are always positive. [1] http://r.789695.n4.nabble.com/NaN-produced-from-log-with-positive-input-td4709463.html Best regards, Arne 2015-07-18 2:46 GMT+02:00 Maram SAlem marammagdysa...@gmail.com: Dear All, I'm trying to get the MLe for a certain distribution using maxLik () function. I wrote the log-likelihood function as follows: theta -vector(mode = numeric, length = 3) r- 17 n -30 T-c(7.048,0.743,2.404,1.374,2.233,1.52,23.531,5.182,4.502,1.362,1.15,1.86,1.692,11.659,1.631,2.212,5.451) C- c(0.562,5.69,12.603,3.999,6.156,4.004,5.248,4.878,7.122,17.069,23.996,1.538,7.792) # The loglik. func. loglik - function(param) { theta[1]- param[1] theta[2]- param[2] theta[3]- param[3] l-(r*log(theta[3]))+(r*log(theta[1]+theta[2]))+(n*theta[3]*log(theta[1]))+(n*theta[3]*log(theta[2]))+ (-1*(theta[3]+1))*sum(log((T*(theta[1]+theta[2]))+(theta[1]*theta[2])))+ (-1*theta[3]*sum(log((C*(theta[1]+theta[2]))+(theta[1]*theta[2] return(l) } then, I evaluated it at theta- c(40,50,2) v-loglik(param=theta) v [1] -56.66653 I used this same log-likelihood function, once with analytic gradient and another time with numerical one, with the maxLik function, and in both cases I got the same 50 warning messages and an MLE which is completely unrealistic as per my applied example. a - maxLik(loglik, gradlik, hesslik, start=c(40,50,2)) where gradlik and hesslik are the analytic gradient and Hessian matrix, respectively, given by: U - vector(mode=numeric,length=3) gradlik-function(param = theta,n, T,C) { U - vector(mode=numeric,length=3) theta[1] - param[1] theta[2] - param[2] theta[3] - param[3] r- 17 n -30 T-c(7.048,0.743,2.404,1.374,2.233,1.52,23.531,5.182,4.502,1.362,1.15,1.86,1.692,11.659,1.631,2.212,5.451) C- c(0.562,5.69,12.603,3.999,6.156,4.004,5.248,4.878,7.122,17.069,23.996,1.538,7.792) U[1]- (r/(theta[1]+theta[2]))+((n*theta[3])/theta[1])+( -1*(theta[3]+1))*sum((T+theta[2])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))+ (-1*(theta[3]))*sum((C+theta[2])/((theta[1]+theta[2])*C+(theta[1]*theta[2]))) U[2]-(r/(theta[1]+theta[2]))+((n*theta[3])/theta[2])+ (-1*(theta[3]+1))*sum((T+theta[1])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))+ (-1*(theta[3]))*sum((C+theta[1])/((theta[1]+theta[2])*C+(theta[1]*theta[2]))) U[3]-(r/theta[3])+(n*log(theta[1]*theta[2]))+ (-1)*sum(log((T*(theta[1]+theta[2]))+(theta[1]*theta[2])))+(-1)*sum(log((C*(theta[1]+theta[2]))+(theta[1]*theta[2]))) return(U) } hesslik-function(param=theta,n,T,C) { theta[1] - param[1] theta[2] - param[2] theta[3] - param[3] r- 17 n -30 T-c(7.048,0.743,2.404,1.374,2.233,1.52,23.531,5.182,4.502,1.362,1.15,1.86,1.692,11.659,1.631,2.212,5.451) C- c(0.562,5.69,12.603,3.999,6.156,4.004,5.248,4.878,7.122,17.069,23.996,1.538,7.792) G- matrix(nrow=3,ncol=3) G[1,1]-((-1*r)/((theta[1]+theta[2])^2))+((-1*n*theta[3])/(theta[1])^2)+ (theta[3]+1)*sum(((T+theta[2])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))^2)+( theta[3])*sum(((C+theta[2])/((theta[1]+theta[2])*C+(theta[1]*theta[2])))^2) G[1,2]-((-1*r)/((theta[1]+theta[2])^2))+ (theta[3]+1)*sum(((T)/((theta[1]+theta[2])*T+(theta[1]*theta[2])))^2)+ (theta[3])*sum(((C)/((theta[1]+theta[2])*C+(theta[1]*theta[2])))^2) G[2,1]-G[1,2] G[1,3]-(n/theta[1])+(-1)*sum( (T+theta[2])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))+(-1)*sum((C+theta[2])/((theta[1]+theta[2])*C+(theta[1]*theta[2]))) G[3,1]-G[1,3] G[2,2]-((-1*r)/((theta[1]+theta[2])^2))+((-1*n*theta[3])/(theta[2])^2)+ (theta[3]+1)*sum(((T+theta[1])/((theta[1]+theta[2])*T+(theta[1]*theta[2])))^2)+( theta[3])*sum(((C+theta[1])/((theta[1]+theta[2])*C+(theta[1]*theta[2])))^2)
Re: [R] Differences in output of lme() when introducing interactions
Type III is a peculiarity of SAS, which has taken root in the world. There are 3 main questions wrt to it: 1. How to compute it (outside of SAS). There is a trick using contr.treatment coding that works if the design has no missing factor combinations, your post has a link to such a description. The SAS documentation is very obtuse, thus almost no one knows how to compute the general case. 2. What is it? It is a population average. The predicted average treatment effect in a balanced population-- one where all the factor combinations appeared the same number of times. One way to compute 'type 3' is to create such a data set, get all the predicted values, and then take the average prediction for treatment A, average for treatment B, average for C, ... and test are these averages the same. The algorithm of #1 above leads to another explanation which is a false trail, in my opinion. 3. Should you ever use it? No. There is a very strong inverse correlation between understand what it really is and recommend its use. Stephen Senn has written very intellgently on the issues. Terry Therneau On 07/22/2015 05:00 AM, r-help-requ...@r-project.org wrote: Dear Michael, thanks a lot. I am studying the marginality and I came across to this post: http://www.ats.ucla.edu/stat/r/faq/type3.htm Do you think that the procedure there described is the right one to solve my problem? Would you have any other online resources to suggest especially dealing with R? My department does not have a statician, so I have to find a solution with my own capacities. Thanks in advance Angelo __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm help - final predictor variable NA
matthewjones43 matthew.jones at kellogg.ox.ac.uk writes: Hi, I am not a statistician and so I am sure whatever it is I am doing wrong must be an obvious error for those who are...Basically I can not understand why I get NA for variable 'CDSTotal' when running a glm? Does anyone have an idea of why this might be happening? It might be useful to look at http://stackoverflow.com/questions/7337761/ linear-regression-na-estimate-just-for-last-coefficient/7341074#7341074 (broken URL). You are overfitting the model by including a predictor that can be expressed as a linear combination of other predictors, and R is trying to handle it automatically. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R-es] web RSelenium
Estimados, les envío un código, el cuál me da error, aunque dudo si está escrito correctamente puesto que muchos ejemplos me están fallando. Agradezco si alguno me informa un error al igual que el informado por mi computadora, no creo que el inconveniente sea por la opción de R que instalé en Windows 8.1, ya me perdí buscando mi error. Revolution R Open 8.0.3 Using CRAN snapshot taken on 2015-04-01 library(RSelenium) checkForServer() startServer() remDr - remoteDriver() remDr$open() remDr$navigate(http://www.r-project.org;) webElems - remDr$findElement(using = 'css selector', body div div.row div.col-xs-12.col-sm-offset-1.col-sm-2.sidebar div div:nth-child(1) ul) resHeaders - unlist(lapply(webElems, function(x){x$getElementText()})) #Error in x$getElementText : object of type 'closure' is not subsettable Javier Rubén Marcuzzi Técnico en Industrias Lácteas Veterinario [[alternative HTML version deleted]] ___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
Re: [R] Statistical distribution not fitting
So - as you can see, your data can be modelled. Now the interesting question is: what do you do with that knowledge. I know nearly nothing about your domain, but given that the data looks log-normal, I am curious abut the following: - Most of the events are in the small-loss category. But most of the damage is done by the rare large losses. Is it even meaningful to guard against a single 1/1000 event? Shouldn't you be saying: my contingency funds need to be large enough to allow survival of, say, a fiscal year with 99.9 % probability? This is a very different question. - If a loss occurs, in what time do the funds need to be replenished? Do you need to take series of events into account? - The model assumes that the data are independent. This is probably a poor (and dangerous) assumption. Cheers, B. On Jul 22, 2015, at 3:56 PM, Ben Bolker bbol...@gmail.com wrote: Amelia Marsh amelia_marsh08 at yahoo.com writes: Hello! (I dont know if I can raise this query here on this forum, but I had already raised on teh finance forum, but have not received any sugegstion, so now raising on this list. Sorry for the same. The query is about what to do, if no statistical distribution is fitting to data). I am into risk management and deal with Operatioanl risk. As a part of BASEL II guidelines, we need to arrive at the capital charge the banks must set aside to counter any operational risk, if it happens. As a part of Loss Distribution Approach (LDA), we need to collate past loss events and use these loss amounts. The usual process as being practised in the industry is as follows - Using these historical loss amounts and using the various statistical tests like KS test, AD test, PP plot, QQ plot etc, we try to identify best statistical (continuous) distribution fitting this historical loss data. Then using these estimated parameters w.r.t. the statistical distribution, we simulate say 1 miliion loss anounts and then taking appropriate percentile (say 99.9%), we arrive at the capital charge. However, many a times, loss data is such that fitting of distribution to loss data is not possible. May be loss data is multimodal or has significant variability, making the fitting of distribution impossible. Can someone guide me how to deal with such data and what can be done to simulate losses using this historical loss data in R. A skew-(log)-normal fit doesn't look too bad ... (whenever you have positive data that are this strongly skewed, log-transforming is a good step) hist(log10(mydat),col=gray,breaks=FD,freq=FALSE) ## default breaks are much coarser: ## hist(log10(mydat),col=gray,breaks=Sturges,freq=FALSE) lines(density(log10(mydat)),col=2,lwd=2) library(fGarch) ss - snormFit(log10(mydat)) xvec - seq(2,6.5,length=101) lines(xvec,do.call(dsnorm,c(list(x=xvec),as.list(ss$par))), col=blue,lwd=2) ## or try a skew-Student-t: not very different: ss2 - sstdFit(log10(mydat)) lines(xvec,do.call(dsstd,c(list(x=xvec),as.list(ss2$estimate))), col=purple,lwd=2) There are more flexible distributional families (Johnson, log-spline ...) Multimodal data are a different can of worms -- consider fitting a finite mixture model ... __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] interactive Map: Popups
I would start by looking at the Graphics and Web Technologies entries in Task Views on CRAN. In addition, I suspect there might be some packages listed in the Spatial task view that could do this. -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 7/22/15, 4:35 PM, R-help on behalf of Marie-Louise r-help-boun...@r-project.org on behalf of tim...@hotmail.de wrote: Hello, I am trying to build a map of a country which shows informations to its regions in a popup window as soon as someone clicks on a region. Thank you -- View this message in context: http://r.789695.n4.nabble.com/interactive-Map-Popups-tp4710226.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to simulate informative censoring in a Cox PH model?
I think that the Cox model still works well when the only information in the censoring is conditional on variables in the model. What you describe could be called non-informative conditional on x. To really see the difference you need informative censoring that depends on something not included in the model. One option would be to use copulas to generate dependent data and then transform the values using your Weibul. Or you could generate your event times and censoring times based on x1 and x2, but then only include x1 in the model. On Wed, Jul 22, 2015 at 2:20 AM, Daniel Meddings dpmeddi...@gmail.com wrote: I wish to simulate event times where the censoring is informative, and to compare parameter estimator quality from a Cox PH model with estimates obtained from event times generated with non-informative censoring. However I am struggling to do this, and I conclude rather than a technical flaw in my code I instead do not understand what is meant by informative and un-informative censoring. My approach is to simulate an event time T dependent on a vector of covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}. This corresponds to T~ Weibull(lambda(x),v), where the scale parameter lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}), lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume the regression coefficients are p-dimensional. I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C), lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i) and a censored flag delta_inf_i=1 if Y_inf_i = C_i (an observed event), and delta_inf_i=0 if Y_inf_i C_i (informatively censored: event not observed). I am convinced this is informative censoring because as long as beta_T~=0 and beta_C~=0 then for each subject the data generating process for T and C both depend on x. In contrast I generate non-informative censoring times D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i) and a censored flag delta_ninf_i=1 if Y_ninf_i = D_i (an observed event), and delta_ninf_i=0 if Y_ninf_i D_i (non-informatively censored: event not observed). Here beta_D is a scalar. I scale the simulation by choosing the lambda_T, lambda_C and lambda_D parameters such that on average T_iC_i and T_iD_i to achieve X% of censored subjects for both Y_inf_i and Y_ninf_i. The problem is that even for say 30% censoring (which I think is high), the Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when I expected the estimates using Y_inf to be biased, and I think I see why: however different beta_C is from beta_T, a censored subject can presumably influence the estimation of beta_T only by affecting the set of subjects at risk at any time t, but this does not change the fact that every single Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus I do not see how my simulation can possibly produce biased estimates for beta_T using Y_inf. But then what is informative censoring if not based on this approach? Any help would be greatly appreciated. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Differences in output of lme() when introducing interactions
On 23/07/15 01:15, Therneau, Terry M., Ph.D. wrote: SNIP 3. Should you ever use it [i.e. Type III SS]? No. There is a very strong inverse correlation between understand what it really is and recommend its use. Stephen Senn has written very intellgently on the issues. Terry --- can you please supply an explicit citation? Ta. cheers, Rolf -- Technical Editor ANZJS Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] interactive Map: Popups
Hello, I am trying to build a map of a country which shows informations to its regions in a popup window as soon as someone clicks on a region. Thank you -- View this message in context: http://r.789695.n4.nabble.com/interactive-Map-Popups-tp4710226.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R: Re: Differences in output of lme() when introducing interactions
Dear Terry, I am very grateful to you for such a detailed and helpful answer. Following your recommendation then I will skip the method presented at http://www.ats.ucla.edu/stat/r/faq/type3.htm So far, based on my understanding of R I arrived to the conclusion that the correct way to see if there is a correlation between my dependent variable (spectral centroid of a sound) and weight, height, and interaction between weight and height of participants asked to create those sounds (in a repeated measure design) is: lme_centroid - lme(Centroid ~ Weight*Height*Shoe_Size, data = My_data, random = ~1 | Subject) anova.lme(lme_centroid,type = marginal) Can anyone please confirm me that those formulas are actually correct and give the significant or non significant p-values for the main effects and their interactions? I would prefer to use lme(), not lmer(). I am making the assumption of course that the model I am using (Centroid ~ Weight*Height*Shoe_Size) is the best fit for my data. Thanks in advance Angelo Messaggio originale Da: thern...@mayo.edu Data: 22-lug-2015 15.15 A: r-help@r-project.org, angelo.arc...@virgilio.it Ogg: Re: Differences in output of lme() when introducing interactions Type III is a peculiarity of SAS, which has taken root in the world. There are 3 main questions wrt to it: 1. How to compute it (outside of SAS). There is a trick using contr.treatment coding that works if the design has no missing factor combinations, your post has a link to such a description. The SAS documentation is very obtuse, thus almost no one knows how to compute the general case. 2. What is it? It is a population average. The predicted average treatment effect in a balanced population-- one where all the factor combinations appeared the same number of times. One way to compute 'type 3' is to create such a data set, get all the predicted values, and then take the average prediction for treatment A, average for treatment B, average for C, ... and test are these averages the same. The algorithm of #1 above leads to another explanation which is a false trail, in my opinion. 3. Should you ever use it? No. There is a very strong inverse correlation between understand what it really is and recommend its use. Stephen Senn has written very intellgently on the issues. Terry Therneau On 07/22/2015 05:00 AM, r-help-requ...@r-project.org wrote: Dear Michael, thanks a lot. I am studying the marginality and I came across to this post: http://www.ats.ucla.edu/stat/r/faq/type3.htm Do you think that the procedure there described is the right one to solve my problem? Would you have any other online resources to suggest especially dealing with R? My department does not have a statician, so I have to find a solution with my own capacities. Thanks in advance Angelo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Differences in output of lme() when introducing interactions
Hi, In addition to Terry’s great comments below, as this subject has come up frequently over the years, there is also a great document by Bill Venables that is valuable reading: Exegeses on Linear Models http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf Regards, Marc Schwartz On Jul 22, 2015, at 8:15 AM, Therneau, Terry M., Ph.D. thern...@mayo.edu wrote: Type III is a peculiarity of SAS, which has taken root in the world. There are 3 main questions wrt to it: 1. How to compute it (outside of SAS). There is a trick using contr.treatment coding that works if the design has no missing factor combinations, your post has a link to such a description. The SAS documentation is very obtuse, thus almost no one knows how to compute the general case. 2. What is it? It is a population average. The predicted average treatment effect in a balanced population-- one where all the factor combinations appeared the same number of times. One way to compute 'type 3' is to create such a data set, get all the predicted values, and then take the average prediction for treatment A, average for treatment B, average for C, ... and test are these averages the same. The algorithm of #1 above leads to another explanation which is a false trail, in my opinion. 3. Should you ever use it? No. There is a very strong inverse correlation between understand what it really is and recommend its use. Stephen Senn has written very intellgently on the issues. Terry Therneau On 07/22/2015 05:00 AM, r-help-requ...@r-project.org wrote: Dear Michael, thanks a lot. I am studying the marginality and I came across to this post: http://www.ats.ucla.edu/stat/r/faq/type3.htm Do you think that the procedure there described is the right one to solve my problem? Would you have any other online resources to suggest especially dealing with R? My department does not have a statician, so I have to find a solution with my own capacities. Thanks in advance Angelo __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sensitivity,Specificity and Youden Index
On Wed, Jul 22, 2015 at 9:50 AM, Jomy Jose infoj...@gmail.com wrote: How to calculate the sensitivity,specificity,Youden index for 18 factors and their combination (6 factors in each) with an outcome measure. www.rseek.org turns up a bunch of references to Youden index, including packages. Without a reproducible example that includes some sample data (fake is fine), the code you used, and some clear idea of what output you expect, it's impossible to figure out how to help you. Here are some suggestions for creating a good reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculate adjacent log odds for a table
On 7/21/2015 11:14 AM, Michael Friendly wrote: More generally, for an I x J x K table, where the last factor is the response, my desired result is a data frame of IJ(K-1) rows, with adjacent log odds in a 'logodds' column, and ideally, I'd like to have a general function to do this. Note that if T is the 10 x 3 matrix of frequencies shown by ftable(), the calculation is essentially log(T) %*% matrix(c(1, -1, 0, 0, 1, -1)) followed by reshaping and labeling. No one answered, but as I often find, writing it out as a MWE was helpful. Here is a simpler approach for my sample case, that should be easier to generalize # reshape to matrix T - matrix(mice.tab, nrow=prod(dim(mice.tab)[1:2]), ncol=dim(mice.tab)[3]) colnames(T) - dimnames(mice.tab)[[3]] rn - expand.grid(litter=factor(7:11), treatment=c(A,B)) rownames(T) - apply(rn, 1, paste, collapse=:) T # calculate log odds as contrasts C - matrix(c(1, -1, 0, 0, 1, -1), nrow=3) lodds - log(T) %*% C colnames(lodds) - c(0:1, 1:2+) lodds lodds 0:1 1:2+ 7:A 1.6625477 0.7884574 8:A 1.2527630 0.3364722 9:A 0.6061358 0.1823216 10:A 0.1431008 -0.1431008 11:A -1.0986123 -0.3483067 7:B 1.3730491 0.9985288 8:B 1.2272297 0.7537718 9:B 0.7156200 0.7884574 10:B 0.5725192 0.2006707 11:B -1.0986123 0.6286087 # make a data frame DF2 - data.frame(expand.grid(litter=factor(7:11), treatment=c(A,B), deaths=c(0:1, 1:2+)), logodds=c(lodds)) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make a persistence landscape in R?
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and http://adv-r.had.co.nz/Reproducibility.html John Kane Kingston ON Canada -Original Message- From: l.shul...@hotmail.com Sent: Tue, 21 Jul 2015 10:39:45 -0700 (PDT) To: r-help@r-project.org Subject: [R] How to make a persistence landscape in R? Hi there, I am trying to construct a persistence landscape in R that shows all of the overlapping triangles -- not just the overall silhouette-- how can you do this? Everytime I plug in the code I get one big triangle that corresponds to the largest barcode, but I cannot see any of the smaller barcodes/ overlapping isosceles trianges. Thanks! -- View this message in context: http://r.789695.n4.nabble.com/How-to-make-a-persistence-landscape-in-R-tp4710159.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop! __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.