[R] [R-pkgs] New version of catspec package
I've uploaded a new version of catspec to CRAN. Catspec is for estimating certain special categorical models. It also contains ctab, a function for creating one-way, two-way, and multi-way percentage tables (nothing special there really). Ctab can now print more than one percentage type, as well as table marginals. The first special model in catspec is mclgen. Mclgen restructures a data frame so a multinomial logistic model can be estimated using a condition logit program. Doing so provides much more flexibility for imposing restrictions on the response variable. The second special (set of) models is sqtab, for estimating loglinear models for square tables (aka mobility models) such as symmetry, quasi-independence. One application can be to specify such models as multinomial logistic models with covariates, using mclgen. John Hendrickx ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] [R-pkgs] perturb package for evaluating collinearity
I've uploaded the R package perturb to CRAN. Perturb contains two programs for evaluating collinearity. Colldiag calculates condition indexes and variance decomposition proportions to detect and track down collinear sets of variables. Perturb takes a different approach. It re-estimates the model a specified number of times, adding random noise (perturbations) to selected variables at each iteration. Categorical variables are randomly reclassified. A summary then displays the impact of these perturbations on the stability of the coefficients. Perturb is not limited to linear regression and can handle interaction effects and transformations of independent variables. Version 1 of perturb was available from my website. This new version makes it much easier to reclassify categorical variables. In addition, colldiag now has a fuzz option to suppress printing of small variance decomposition values. Comments and criticisms invited, John Hendrickx ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Ever see a stata import problem like this?
I've had a similar problem once. What may have caused the problem then was a variate for which value lables had been defined for the highest and lowest values. What complicates things is that the file had been originally converted from SPSS to Stata. A workaround was to set convert.factor=FALSE and that seems to work here too (using R 1.91 and the latest update for foreign): m2-read.dta(morgen.dta,convert.factors=FALSE) summary(m2) CASEID yearidhrs1 Min. : 19721 Min. :1972 Min. : 1 Min. :0.00 1st Qu.: 1983475 1st Qu.:1978 1st Qu.: 445 1st Qu.: 37.00 Median : 1996808 Median :1987 Median : 905 Median : 40.00 Mean : 9963040 Mean :1986 Mean : 990 Mean : 41.05 3rd Qu.:19872187 3rd Qu.:1994 3rd Qu.:1358 3rd Qu.: 48.00 Max. :20002817 Max. :2000 Max. :3247 Max. : 89.00 NA's :17654.00 hrs2 prestigeagewed age Min. :0.00 Min. : 12.00 Min. : 12.00 Min. : 18.00 1st Qu.: 38.00 1st Qu.: 30.00 1st Qu.: 19.00 1st Qu.: 30.00 Median : 40.00 Median : 39.00 Median : 21.00 Median : 42.00 Mean : 39.79 Mean : 39.36 Mean : 22.10 Mean : 45.15 3rd Qu.: 45.00 3rd Qu.: 48.00 3rd Qu.: 24.00 3rd Qu.: 58.00 Max. : 89.00 Max. : 82.00 Max. : 73.00 Max. : 89.00 NA's :40159.00 NA's :1.00 NA's :15551.00 NA's :143.00 educpaeduc maeducspeduc Min. : 0.00 Min. :0.00 Min. : 0.00 Min. : 0.00 1st Qu.: 11.00 1st Qu.:8.00 1st Qu.: 8.00 1st Qu.: 12.00 Median : 12.00 Median : 11.00 Median : 12.00 Median : 12.00 Mean : 12.48 Mean : 10.21 Mean : 10.41 Mean : 12.53 3rd Qu.: 14.00 3rd Qu.: 12.00 3rd Qu.: 12.00 3rd Qu.: 14.00 Max. : 20.00 Max. : 20.00 Max. : 20.00 Max. : 20.00 NA's :127.00 NA's :11586.00 NA's :6782.00 NA's :18153.00 income Min. : 1.000 1st Qu.: 9.000 Median : 11.000 Mean : 9.756 3rd Qu.: 12.000 Max. : 13.000 NA's :3453.000 --- Paul Johnson [EMAIL PROTECTED] wrote: Greetings Everybody: I generated a 1.2MB dta file based on the general social survey with Stata8 for linux. The file can be re-opened with Stata, but when I bring it into R, it says all the values are missing for most of the variables. This dataset is called morgen.dta and I dropped a copy online in case you are interested http://www.ku.edu/~pauljohn/R/morgen.dta [snip] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] mutlicollinearity and MM-regression
--- [EMAIL PROTECTED] wrote: Dear R users, Usually the variance-inflation factor, which is based on R^2, is used as a measure for multicollinearity. But, in contrast to OLS regression there is no robust R^2 available for MM-regressions in R. Do you know if an equivalent or an alternative nmeasure of multicollinearity is available for MM-regression in R? I'm not sure what MM-regression is. But I've just put a general purpose tool for evaluating collinearity on my website. See http://www.xs4all.nl/~jhckx/R/perturb/ The perturb programs works by adding small random changes (perturbations) to selected variables. Categorical variables are randomly misclassified. This process is repeated a specified number of times, after which the impact of the perturbations on parameter stability can be evaluated. It should work with any R-procedure that has a formula. The package also contains colldiag, for calculating condition indexes and variance decomposition proportions. Since this only works on the independent variables, it should work for your problem as well. Feedback welcomed. I plan to submit the package to CRAN in a few days, after I get the help files updated __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] converting character strings to eval
--- Deepayan Sarkar [EMAIL PROTECTED] wrote: On Monday 19 July 2004 09:37, Wayne Jones wrote: Hi there fellow R-users, I'm stuck on this seemingly trivial problem. All I want to coerce a character string into a command. For example: x-rnorm(20) y-rnorm(20) str-lm(y~x) I want to evaluate the str command. I have tried eval(as.expression(str)) eval(parse(text = str)) seems to work. Couldn't eval be modified to automatically parse arguments if they're not expressions? Something like: eval2-function(arg) { if (!is.expression(arg)) arg-parse(text=arg) eval(arg) } Would a construction like eval2 have a downside or cause problems down the line? __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] indexing a parameter in nls
I'm trying to estimate a diagonal reference model in nls. In its basic form, the model consists of two factors with equal categories and a dependent variable. The model fits the main effects of the two factors such that they are proportional to each other. So the model I want to fit is: Y=p*m[f1]+(1-p)*m[f2] The m-parameters indicate the average main effect of factors f1 and f2 (with no intercept). The p-parameter indicates the relative impact of the two factors on Y. If p=.5, f1 and f2 have equal impact, for p.5, f1 has greater impact, for p.5 f2 has greater impact. I can estimate the model as follows for f1=row, f2=col, Y=nkids: rd-model.matrix(~as.factor(row)-1) cd-model.matrix(~as.factor(col)-1) strt-list(m1=2,m2=2,m3=2,m4=2,m5=3,p=.5) nls(nkids~p*(rd[,1]*m1+rd[,2]*m2+rd[,3]*m3+rd[,4]*m4+rd[,5]*m5) + (1-p)*(cd[,1]*m1+cd[,2]*m2+cd[,3]*m3+cd[,4]*m4+cd[,5]*m5), start=strt) --- What I want however, is a more compact specification, where row and col are used to index which m-parameter should be used, something like: nls(nkids~p*strt[row]+(1-p)*strt[col],start=strt) This specification produces an error: Error in nlsModel(formula, mf, start) : singular gradient matrix at initial parameter estimates. I suspect the problem is that strt[row] evaluates to a list of starting values rather than the object names m1, m2, etc. I've tried nls(nkids~p*get(names(strt[row]))+(1-p)*get(names(strt[col])),start=strt) But this produces an error as well: Error in qr.qty(QR, resid) : qr and y must have the same number of rows Can anyone point me in the right direction? How can I use row and col to index the appropriate m parameter in this model? __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] best editor for .R files
An extensive review of text editors for different platforms is at http://fmwww.bc.edu/repec/bocode/t/textEditors.html It's geared to Stata users but contains relevant information for R users as well. Personally, I use TextPad for Windows. It has syntax highlighting, regular expression search and replace, selection of rectangular blocks of text. You can run programs from within TextPad and if the output file has been opened it's automatically updated if you submit the syntax again, a very useful feature I found. Good luck, John Hendrickx --- [EMAIL PROTECTED] wrote: On the Windows platform, I use SourceEdit which also has R syntax highlighting (and much more) (http://www.brixoft.com/). However, as an organization, we do have plans to look at the IDE Eclipse (http://www.eclipse.org/ ) Partha Angel [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 11/20/2003 04:27 PM To: [EMAIL PROTECTED] cc: Subject:[R] best editor for .R files Which is the best editor for .R files? I currently use kate on my linux as it has R highlighting and allows me to split the window into two: in one I edit the .R file and in the other I have a shell so I run R and can easily copy and paste the code. There are some features that I don't like and I am having a look on some alternatives. I've heard wonders of emacs with ess but I am a little bit frightened of the steep learning curve. What do the R experts use or would recommend using? Both linux and/or windows alternatives are welcomed. I guess it would much depend on the particular needs/preferences of each user but I would like to know which are the most commonly used editors. Thanks, Angel __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ Free Pop-Up Blocker - Get it now __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] (multiway) percentage tables
R has amazing capabilities, but percentage tables are a weak spot IMHO. There's prop.table but that's rather unwieldly, especially for multiway tables. CrossTable by Marc Schwartz in the gregmisc library makes percentage tables a breeze but is limited to two-way tables. So I decided to try my own hand at writing an R-function that would make it easy to produce nicely formatted percentage tables for one-way, two-way, or multi-way tables. The first argument for ctab can be either a table object or one or more factors. The first variable is assumed to be the row variable, the second the column variable, subsequent variables are grouping variables. The type option can be used to specify percentage type (n, row, column, or total), digits to specify the number of decimal points, percentage=FALSE can be used to print proportions rather than percentages. row.vars and col.vars are passed on to ftables for formatting multiway tables. I'd like to see something like ctab in R-base at some point in the future. Perhaps it could be integrated in ftable? Perhaps I'll try that myself as a next project. I'm still learning R so comments on ctab are most welcome. Best, John Hendrickx examples of usage source(ctab.R) data(Titanic) ctab(Titanic) ctab(Titanic,type=r) ctab(Titanic,row.vars=1:3,type=r) ctab(Titanic,col.vars=c(2,4),type=r) ctab(Titanic,row.vars=c(1,3),type=c) ctab(Titanic,col.vars=c(Sex,Survived),type=r) ctab(Titanic,col.vars=c(Sex,Survived),type=t) -- ctab -- # ctab: oneway, twoway, multiway percentage tables # first argument must consist of one or more factors # or a table object (class table, xtabs, or ftable) # digits: number of digits after the decimal (default 2) # type: n for counts, row, column or total # for percentages (default n) # row.vars: # col.vars: same usage as ftable, ignored for one- and # two-way tables # percentages: FALSE== proportions are presented rather # than percentages (default TRUE) # comments to John Hendrickx [EMAIL PROTECTED] ctab-function(...,digits=2, type=c(n, row, column, total), row.vars=NULL, col.vars=NULL, percentages=TRUE) { if (attributes(...)$class==factor) { # create a table if the arguments are factors tbl-table(...) } else if (table %in% class(...) || class(...)==ftable) { # the argument is a table object (table, xtabs, ftable) tbl-eval(...) } else { stop(first argument must be either factors or a table object) } type-match.arg(type) # one dimensional table,restrict choices to n and total if (length(dim(tbl))==1) { type-ifelse(type==n,n,total) } # if the object is an ftable, use the row.vars and col.vars # use numeric indices to avoid finding the omitted # the object must be converted to a table to get the dimensions right if (class(tbl)==ftable) { nrowvar-length(names(attr(tbl,row.vars))) row.vars-1:nrowvar col.vars-(1:length(names(attr(tbl,col.vars+nrowvar tbl-as.table(tbl) } # marginals to exclude assuming first factor is the row vaariable, # second factor is the column variable # is overridden by row.vars or col.vars mrg2drop-0 if (type==column) {mrg2drop-1} if (type==row) {mrg2drop-2} if (type==total length(dim(tbl)) 1) {mrg2drop-c(1,2)} # use row.vars and col.vars to determine the # marginals to use when calculating percentages # start by translating names to variable positions nms-names(dimnames(tbl)) if (!is.null(row.vars) !is.numeric(row.vars)) { row.vars-order(match(nms,row.vars),na.last=NA) } if (!is.null(col.vars) !is.numeric(col.vars)) { col.vars-order(match(nms,col.vars),na.last=NA) } # calculate the other if only one is given if (!is.null(row.vars) is.null(col.vars)) { col.vars-(1:length(dim(tbl)))[-row.vars] } if (!is.null(col.vars) is.null(row.vars)) { row.vars-(1:length(dim(tbl)))[-col.vars] } # now determine the margin as the last element if (type==row !is.null(col.vars)) { mrg2drop-col.vars[length(col.vars)] } if (type==column !is.null(row.vars)) { mrg2drop-row.vars[length(row.vars)] } # if row.vars is given, col.vars has been determined if (type==total !is.null(row.vars)) { mrg2drop-c(col.vars[length(col.vars)],row.vars[length(row.vars)]) } marg-(1:length(dim(tbl)))[(-mrg2drop)] # create percentages if (type==n) { digits-0 } else
Re: [R] models for square tables
Labels can be helpful but can also cause messy output. In some of the other functions, I added an option to suppress printing labels, especially for symmetrical interactions. A problem (in my view) with your approach is that it sorts the categories according to the level names. I'd prefer the following, which would make labels optional but otherwise maintain the original order. mob.qi - function (rowvar,colvar,constrained=FALSE,print.labels=FALSE) { check.square(rowvar,colvar) if (constrained) { qi - ifelse(rowvar==colvar, 1, 0) nms-c(diagonal) } else { qi - ifelse(rowvar==colvar, rowvar, 0) nms-levels(rowvar) } qi-factor(qi) qi-C(qi,contr.treatment,base=1) if (print.labels) { levels(qi)-c(offdiag,nms) } qi } Thanks for your suggestions and for pointing out relevel to me! John Hendrickx --- David Firth [EMAIL PROTECTED] wrote: Nice. I guess I normally do things a little bit differently, to get coefficient names that look a bit more meaningful (eg avoiding numeric codes for factor levels). For example one possible adjustment to your code for the qi models would be mob.qi - function(rowvar, colvar, constrained=FALSE) { check.square(rowvar, colvar) if (!constrained) { qi - ifelse(rowvar==colvar, as.character(rowvar), offdiag) relevel(factor(qi), offdiag) } else qi - ifelse(rowvar==colvar, 1, 0) } immobile - mob.qi(OccFather, OccSon) glm.qi2-glm(Freq ~ OccFather + OccSon + immobile, family=poisson()) immobile - mob.qi(OccFather, OccSon, constrained=T) glm.q02 -glm(Freq ~ OccFather + OccSon + immobile, family=poisson()) Cheers, David On Wednesday, Feb 12, 2003, at 13:01 Europe/London, John Hendrickx wrote: I've posted a sample file for estimating loglinear models for square tables (mobility models) at http://www.xs4all.nl/~jhckx/mcl/R/ Comments and suggestions are welcome. John Hendrickx __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] multinomial conditional logit models
A little while ago, I asked for some help in estimating multinomial conditional logit models. If you restructure your data, you can estimate a multinomial logit as a conditional logit model. This gives flexibility in imposing constraints on your dependent variable. One application is to estimate a loglinear model for square tables such as quasi-independence or quasi-symmetry with covariates. Anyhow, I think I've sorted out most of the problems and I've posted a sample program at http://www.xs4all.nl/~jhckx/mcl/R/ There's also a sample program there on how to estimate models for square tables, aka mobility models. Comments and suggestions on how to do things more efficiently or elegantly in R are most welcome. One problem that remains is that clogit in R doesn't produce the same estimates as other programs like Stata. I've estimated this sample model in several packages and they all have the same estimates to 3 decimal places accuracy. I've tried different convergence settings in clogit but to no effect. I'd appreciate it if anyone could clarify this. Best regards, John Hendrickx __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] models for square tables
I've posted a sample file for estimating loglinear models for square tables (mobility models) at http://www.xs4all.nl/~jhckx/mcl/R/ Comments and suggestions are welcome. John Hendrickx __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] multinomial conditional logit models
Thanks, that did the trick. My mclgen function can be written as mclgen - function (datamat,catvar) { ncat - nlevels(catvar) perschoice-as.data.frame(rep(datamat,ncat)) perschoice-reshape(perschoice,direction=long, varying=lapply(names(datamat),rep,ncat), timevar=newy) perschoice-perschoice[sort.list(perschoice$id),] dep-parse(text=paste(perschoice$,substitute(catvar),sep=)) perschoice$depvar-as.numeric(eval(dep)==perschoice$newy) perschoice } I'd still appreciate the help of R-listers on how to use the value of catvar on the left-hand side of an expression within a function (perschoice$valueof(catvar)-perschoice$newy), and how to get R to drop the reference category of a factor in an interaction effect without the main effect (in clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)). But many thanks for the help so far! John Hendrickx --- Thomas Lumley [EMAIL PROTECTED] wrote: On Thu, 30 Jan 2003, John Hendrickx wrote: The only problem though is the last step, specifying a list of varying variables. It should be possible to generate this from names(dset) but I can't get it right. as.list(rep(names(dset),2)) produces a list with four elements rather than a list with 2 components each containing two elements. dim() gives a matrix as a result. Could someone show me how to create list(x1=c(X1,X1),x2=c(X2,X2)) from names(dset)? lapply(names(dset),rep,2) gives [[1]] [1] X1 X1 [[2]] [1] X2 X2 -thomas __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] multinomial conditional logit models
A multinomial logit model can be specified as a conditional logit model after restructuring the data. Doing so gives flexibility in imposing restrictions on the dependent variable. One application is to specify a loglinear model for square tables, e.g. quasi-symmetry or quasi-independence, as a multinomial logit model with covariates. Further details on this technique and examples with several packages are on my homepage at http://www.xs4all.nl/~jhckx/mcl/ I've been trying to implement an MCL model in R and have been mostly succesful. However I'm still stuck on a few points and I'm hoping someone can point out how to do fix them. To estimate an MCL model, the data must be restructured into a person-choice file. * Each case must be duplicated ncat times (ncat is the number of categories of the dependent variable) * Each case must be identified by a strata variable (id) * Each duplicate must be identified by a variable indexing the choices (newy) * A dichotomous variable must indicate which duplicate corresponds with the respondent's choice (didep) I've done this as follows: mclgen - function (datamat,catvar) { ncat - nlevels(catvar) id-1:length(catvar) datamat-cbind(id,datamat) perschoice -NULL for (i in 1:ncat) { perschoice-rbind(perschoice,datamat) } perschoice-perschoice[sort.list(perschoice$id),] perschoice$newy-gl(ncat,1,length=nrow(perschoice)) dep-parse(text=paste(perschoice$,substitute(catvar),sep=)) perschoice$depvar-as.numeric(eval(dep)==perschoice$newy) perschoice } This works but I wonder if it's the most efficient method. I tried using rep rather than rbind in a loop but that replicated the dataset horizontally, not vertically. Is this the best solution? I also finally figure out how to include the argument catvar in a transformation involving another dataset but this solution seems very complicated (eval+parse+substitute). Is there a simpler solution? What I haven't figured out is how to use catvar once again but on the righthand side of the transformation. In the second to last line, I'd like to do perschoice$catvar-perschoice$newy. I've tried several possibities but I can't figure out how to let R substitute the value of catvar in that expression. eval(dep) doesn't work as it does for the lefthand side but how should it be done? My solution for now is to do it afterward. My program continues as follows: pc-mclgen(logan,occ) pc$occ-factor(pc$newy) library(survival) cl.lr-clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc) summary(cl.lr) The dataset logan is available from the website. occ is the dependent variable, educ and black are independents. Once mclgen has transformed the data into a person-choice file, a multinomial logit model can be estimated using the dichotomous dependent variable, the main effects of occ for the intercept term and interactions of educ and black with occ for the effects of these variables. This works more or less. What's unexpected though is that the reference category of occ isn't dropped in the interactions with educ and black. The highest category is dropped due to collineairity but that's not quite what I want. Could someone explain why this happens and how to let R drop the first category? A final problem is that the results of clogit are only approximate (2 digits) whereas in other packages the cl model produces exactly the same estimates as an mnl model. Is this because terms are dropped due to collinearity or are there options I should examine. clogit issues the following warnings which seem to be related to the collinearity but perhaps point to something else: Warning messages: 1: Loglik converged before variable 10 ; beta may be infinite. in: fitter(X, Y, strats, offset, init, control, weights = weights, 2: X matrix deemed to be singular; variable 9 14 in: coxph(formula = Surv(rep(1, 4190), depvar) ~ occ + occ:educ + Apologies for the length of this post. Hopefully someone will have time to read it through and help me out. As always, advance thanks for any assistance. John Hendrickx __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help