[R] Check a list of genes for a specific GO term
Hello everyone I have a dataframe with rows as probeset ID and columns as samples I want to check the rownames and find which are those probes are transcription factors. (GO:0006355 ) Any suggestions? Thanks! -- *Chirag Gupta* Department of Crop, Soil, and Environmental Sciences, 115 Plant Sciences Building, Fayetteville, Arkansas 72701 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bayesian estimate of prevalence with an imperfect test
Dear Lian, You might be interested to hear that a new package is available on CRAN to perform Bayesian estimation of true prevalence from apparent prevalence: http://cran.r-project.org/package=prevalence http://cran.r-project.org/package=prevalence The function 'truePrev()', that estimates true prevalence from individual samples, is also implemented as an online Shiny application: http://users.ugent.be/~bdvleess/R/prevalence/shiny/ http://users.ugent.be/~bdvleess/R/prevalence/shiny/ Best wishes, Brecht LianD wrote Thanks Bert I've been through my variables again and have managed to get the code working - shouldn't have tried to deal with it at the end of the day yesterday! all the best Lian -- View this message in context: http://r.789695.n4.nabble.com/Bayesian-estimate-of-prevalence-with-an-imperfect-test-tp4265595p4671014.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to turn off buffered output in R mac os?
hi everyone, I am writing R with textmate2.0 on my mac book. Today when I want to add a progress bar in a loop, i find the textmate bundle fails to print or cat anything until the loop is completed. I search for this problem in Google and some says that this is because the buffers output setting of R and can be solved by unchoosing Rconsole-Misc-buffered output. However, doesn't anyone notice that the Mac OS R even doesn't have this buffered output option? I have tried the flush.console() but it doesn't work, neither. Anyone has get into such a problem and can give me a help? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (lme4) p-values for single terms in mixed models involved in sig interactions
I am using lme4 to fit a mixed effects model to my data. I have a significant interaction between two variables. My question is what is the correct way to get p-values for single terms involved in that interaction. I have been using stepwise backwards deletion and model comparisons to get p-values,and refitting the model using a REML approach to get estimates.However, presumably to get the p values for single terms, I also have to remove the interaction as well, and therefore inaccurate. I have confused myself with this now, as to whether in this case you should compare a model with the interaction and the single term of interest removed to the minimum adequate model (in which case the p values are over inflated for the single terms), or whether to remove the interaction from the minimum adequate model, and then compare this to an updated model, with the single term removed. This is an example of what the model would look like: library(lme4) minadequatemodel-lmer(sq_rate~(day+temp+brood_size+weight+weight:brood_size+(1|ident),data=prov,REML=FALSE) ##to get p values for e.g. temp pvalmodtemp-update(minadequatemodel,~.+temp) anova(modelfin,modeltemp) ###but what's the correct way to get p value for brood_size or weight? Your help would be greatly appreciated...thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to turn off buffered output in R mac os?
On 06-07-2013, at 16:26, qixiang qixiang...@gmail.com wrote: hi everyone, I am writing R with textmate2.0 on my mac book. Today when I want to add a progress bar in a loop, i find the textmate bundle fails to print or cat anything until the loop is completed. I search for this problem in Google and some says that this is because the buffers output setting of R and can be solved by unchoosing Rconsole-Misc-buffered output. However, doesn't anyone notice that the Mac OS R even doesn't have this buffered output option? I have tried the flush.console() but it doesn't work, neither. Anyone has get into such a problem and can give me a help? This is not a matter for the R-help list. It belongs on the TextMate-Users mailing list. Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] The logistf() function from the logistf package. (Was: Data Package Query)
On 06/07/2013 23:01, Rolf Turner wrote: On 06/07/13 01:35, Yasmine Refai wrote: Hello, When I run the below syntax: *Trial-read.table(Trial.txt,header=TRUE)* *Trial* *save.image(file=Trial.RData)* *load(Trial.RData) fit-logistf(data=Trial, y~x1+x2) summary(fit) AIC(fit)* I am getting the below error: * AIC(fit) Error in UseMethod(logLik) : no applicable method for 'logLik' applied to an object of class logistf * Can you please help with that? You need to learn to crawl before you start trying to learn to walk. (1) Your convoluted code is filled with redundancies and circularity. Acquire some understanding of how R works before you plunge into a modelling exercise. (2) The logistf() function comes from the logistf package. This needs to be mentioned when you are asking for help with the function. (3) The error message seems to me to be quite clear. To calculate the AIC one needs the log likelihood and this cannot be calculated. This could be due to the fact that log likelihood is not applicable to the model used by logistf(), Small clarifcation: one needs the maximized log-likelihood. More abstractly: AIC is only applicable to models fitted by maximum likelihood. Biased-reduced fits (by Firth or anyone else) are not. This is one reason why there are many extensions to AIC to compare other sorts of fits. or simply to the fact that the package maintainer has not (yet?) written a method logLik.logistf(). I don't know, not being familiar with Firth's bias reduced logistic regression which is what the logistf package is about. It is incumbent upon ***you*** to know since you are invoking the logistf package. If you don't know, do some study and find out. You could also try to contact the package maintainer. (4) Your subject line is misleading. You are simply replying to replies to a previous query (which has really nothing to do with the current one) and are apparently too lazy to start a new thread. cheers, Rolf Turner [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Check a list of genes for a specific GO term
Hello, Your question is not very clear, maybe if you post a data example. To do so, use ?dput. If your data frame is named 'dat', use the following. dput(head(dat, 50)) # paste the output of this in a post If you want to get the rownames matching a certain pattern, maybe something like the following. idx - grep(GO:0006355, rownames(dat)) dat[idx, ] Hope this helps, Rui Barradas Em 07-07-2013 07:01, Chirag Gupta escreveu: Hello everyone I have a dataframe with rows as probeset ID and columns as samples I want to check the rownames and find which are those probes are transcription factors. (GO:0006355 ) Any suggestions? Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] spatstat output
Dear R users, Is there a possibility to extract only the r, CI's envelope and L function from the output of spatstat? I use this code E - alltypes(df1, Kest, nsim = 100, envelope = TRUE,savepatterns=TRUE,correction=isotropic) And second question, is there a possibility to modify the margin of plot in spatstat? plot(E, sqrt(./pi) - r ~ r, ylab = L(r)-r,main=NULL,sub=NULL,las=1) Thank you very much for your help! CR -- --- Catalin-Constantin ROIBU Lecturer PhD, Forestry engineer Forestry Faculty of Suceava Str. Universitatii no. 13, Suceava, 720229, Romania office phone +4 0230 52 29 78, ext. 531 mobile phone +4 0745 53 18 01 +4 0766 71 76 58 FAX:+4 0230 52 16 64 silvic.usv.ro [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] spatstat output
On 07/07/13 22:12, catalin roibu wrote: Dear R users, Is there a possibility to extract only the r, CI's envelope and L function from the output of spatstat? I use this code E - alltypes(df1, Kest, nsim = 100, envelope = TRUE,savepatterns=TRUE,correction=isotropic) And second question, is there a possibility to modify the margin of plot in spatstat? plot(E, sqrt(./pi) - r ~ r, ylab = L(r)-r,main=NULL,sub=NULL,las=1) Thank you very much for your help! (1) This is a question about the spatstat package and as such should be in the first instance directed to the authors of this package. (2) Your example is *not reproducible* since we have no access to df1. (3) Your use of the expression CI's envelope indicates that you are confused. The abbreviation CI would appear to denote confidence interval. The envelopes do ***NOT*** consist of confidence intervals. They are ***critical envelopes***. (Repeat after me, 50 times: Critical envelope, critical envelope, .). See the help for envelope(). (4) The use of nsim=100, while not incorrect, is bizarre. The envelopes yield p-values equal to fractions whose denominator is (nsim+1). Hence nsim=99 (the default) is usually a much better choice than nsim=100. (5) The object E returned by your code is of class fasp (function array for spatial processes). It is thus a list whose first entry is names fns. In turn fns is a list whose entries correspond to the marks of the pattern to which you applied alltypes(). Each entry is an object of class fv. See the help for fv.object. If you examine the names of these objects --- which are data frames with extra attributes --- you should easily be able to see how to extract the items that you desire. (6) In respect of changing the margins, see the help for plot.fasp. Does the argument mar.panel do what you want? cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Check a list of genes for a specific GO term
In Bioconductor, install the annotation package http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData corresponding to your chip, e.g., source(http://bioconductor.org/biocLite.R;) biocLite(hgu95av2.db) then load it and select the GO terms corresponding to your probes library(hgu95av2.db) lkup - select(hgu95av2.db, rownames(dat), GO) then use standard R commands to find the probesets that have the GO id you're interested in keep = lkup$GO %in% GO:0006355 unique(lkup$PROBEID[keep]) Ask follow-up questions about Bioconductor packages on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/mailform/ Martin - Rui Barradas ruipbarra...@sapo.pt wrote: Hello, Your question is not very clear, maybe if you post a data example. To do so, use ?dput. If your data frame is named 'dat', use the following. dput(head(dat, 50)) # paste the output of this in a post If you want to get the rownames matching a certain pattern, maybe something like the following. idx - grep(GO:0006355, rownames(dat)) dat[idx, ] Hope this helps, Rui Barradas Em 07-07-2013 07:01, Chirag Gupta escreveu: Hello everyone I have a dataframe with rows as probeset ID and columns as samples I want to check the rownames and find which are those probes are transcription factors. (GO:0006355 ) Any suggestions? Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subset and order
Hi, You could also try ?data.table() x- read.table(text=a b c 1 2 3 3 3 4 2 4 5 1 3 4 ,sep=,header=TRUE) library(data.table) xt- data.table(xt) setkey(xt,a) subset(xt,b==3) # a b c #1: 1 3 4 #2: 3 3 4 iord - order(x$a) subset(x[iord, ], b == 3) # a b c #4 1 3 4 #2 3 3 4 Speed comparison: set.seed(12345) dat1- as.data.frame(matrix(sample(1:10,3*1e7,replace=TRUE),ncol=3)) colnames(dat1)-letters[1:3] system.time({ iord - order(dat1$a) res1-subset(dat1[iord, ], b == 3) }) # user system elapsed # 6.888 0.296 7.202 dt1- data.table(dat1) system.time({setkey(dt1,a) resdt1-subset(dt1,b==3)}) # user system elapsed # 0.72 0.06 0.78 head(resdt1) # a b c #1: 1 3 6 #2: 1 3 4 #3: 1 3 10 #4: 1 3 2 #5: 1 3 9 #6: 1 3 8 head(res1) # a b c #75 1 3 6 #93 1 3 4 #300 1 3 10 #301 1 3 2 #437 1 3 9 #672 1 3 8 A.K. - Original Message - From: Rui Barradas ruipbarra...@sapo.pt To: Noah Silverman noahsilver...@ucla.edu Cc: R-help@r-project.org r-help@r-project.org Sent: Friday, July 5, 2013 3:51 PM Subject: Re: [R] Subset and order Hello, If time is one of the problems, precompute an ordered index, and use it every time you want the df sorted. But that would mean you can't do it in a single operation. iord - order(x$a) subset(x[iord, ], b == 3) Rui Barradas Em 05-07-2013 20:47, Noah Silverman escreveu: That would work, but is painfully slow. It forces a new sort of the data with every query. I have 200,000 rows and need almost a hundred queries. Thanks, -N On Jul 5, 2013, at 12:43 PM, Rui Barradas ruipbarra...@sapo.pt wrote: Hello, Maybe like this? subset(x[order(x$a), ], b == 3) Hope this helps, Rui Barradas Em 05-07-2013 20:33, Noah Silverman escreveu: Hello, I have a data frame with several columns. I'd like to select some subset *and* order by another field at the same time. Example: a b c 1 2 3 3 3 4 2 4 5 1 3 4 etc… I want to select all rows where b=3 and then order by a. To subset is easy: x[x$b==3,] To order is easy: x[order(x$a),] Is there a way to do both in a single efficient statement? Thanks, -- Noah Silverman, M.S., C.Phil UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] coxph won't converge when including categorical (factor) variables
Hello all, I have posted a reply to Mark and Jeff out of my own fault sent it to them personally and not the group. I will re-post the question and Mark's answer so that the community can benefit from his comments. It seems that my posts are not in accordance with the guidelines and I will try to remedy that as well. I apologize for the mess I have made !!! I am afraid I can't post the actual data (as it is Protected Health Information). In any case here are the question, code, output and Mark's comment (which can be summarized that there is very little information about the pec package within the community and that the best solution would be to try and contact the author of the package - which I will and then repost the answer): I am attempting to evaluate the prediction error of a coxph model that was built after feature selection with glmnet. In the preprocessing stage I used na.omit (dataset) to remove NAs. I reconstructed all my factor variables into binary variables with dummies (using model.matrix) I then used glmnet lasso to fit a cox model and select the best performing features. Then I fit a coxph model using only these feature. When I try to evaluate the model using pec and a bootstrap I get an error that the prediction matrix has wrong dimensions. In the original dataset there are 313 variables. In the coxph model (glmnet.cox in my code) there are 313 df. However when I run pec the prediction matrix is noted as having 318 variables and therefore doesn't fit the testset which has 313 variables (+the status and time variables). It seems that pec does something to the variables while retraining the coxph model on the bootstrap samples. I tried setting singular.ok=FALSE in the coxph model without avail. Here is a summary of key variables (prior to restructuring for glmnet) Followed by the code I ran and the errors: summary (trainSet) time status Age_at_Dx CREATININE Min. : 0.1429 Min. :0. Min. :14.81 Min. :0.400 1st Qu.: 22.0714 1st Qu.:1. 1st Qu.:52.51 1st Qu.:0.800 Median : 52.9286 Median :1. Median :63.79 Median :0.900 Mean : 96.5415 Mean :0.7826 Mean :61.01 Mean :1.042 3rd Qu.:154.6071 3rd Qu.:1. 3rd Qu.:73.01 3rd Qu.:1.125 Max. :437.7143 Max. :1. Max. :87.23 Max. :6.000 Performance_StatusALBUMIN Cyto WBC Min. :0. Min. :0.70 Min. : 1.000 Min. : 0.60 1st Qu.:1. 1st Qu.:3.00 1st Qu.: 4.000 1st Qu.: 3.20 Median :1. Median :3.60 Median : 4.000 Median : 9.20 Mean :0.9728 Mean :3.48 Mean : 6.802 Mean : 28.35 3rd Qu.:1. 3rd Qu.:4.00 3rd Qu.:11.000 3rd Qu.: 33.10 Max. :4. Max. :5.20 Max. :17.000 Max. :373.00 PRKCD_pT507 HGBMaxblast CD19 Min. :-2.429605 Min. : 5.400 Min. : 0.00 Min. : 0.000 1st Qu.:-0.627005 1st Qu.: 8.500 1st Qu.:24.00 1st Qu.: 1.000 Median :-0.013117 Median : 9.550 Median :40.00 Median : 2.000 Mean : 0.006432 Mean : 9.689 Mean :43.74 Mean : 6.312 3rd Qu.: 0.559782 3rd Qu.:10.800 3rd Qu.:65.25 3rd Qu.: 5.000 Max. : 2.963658 Max. :14.300 Max. :98.00 Max. :98.000 GAPDHCD74 TP53 Fli1 Min. :-2.391021 Min. :-1.7902 Min. :-1.64244 Min. :-3.723143 1st Qu.:-0.662164 1st Qu.:-0.5502 1st Qu.:-0.53685 1st Qu.:-0.559783 Median : 0.003236 Median :-0.1546 Median :-0.10928 Median :-0.002154 Mean : 0.015759 Mean : 0.0235 Mean :-0.02285 Mean :-0.016555 3rd Qu.: 0.632472 3rd Qu.: 0.4759 3rd Qu.: 0.29869 3rd Qu.: 0.554521 Max. : 3.512741 Max. : 3.7226 Max. : 5.39242 Max. : 4.415324 LDH SQSTM0BM_BLAST CCND3 Min. :8.4 Min. :-1.56639 Min. : 0.00 Min. :-2.44772 1st Qu.: 562.8 1st Qu.:-0.48762 1st Qu.:31.00 1st Qu.:-0.59042 Median : 952.5 Median :-0.05746 Median :46.50 Median :-0.08412 Mean : 1617.9 Mean :-0.02272 Mean :50.64 Mean :-0.02506 3rd Qu.: 1596.8 3rd Qu.: 0.33718 3rd Qu.:72.00 3rd Qu.: 0.48353 Max. :36708.0 Max. : 3.59456 Max. :98.00 Max. : 3.58761 GAB2BAX ABS_BLST PRKAA1_2 Min. :-3.201564 Min. :-3.230737 Min. : 0.0 Min. :-1.90671 1st Qu.:-0.640279 1st Qu.:-0.593174 1st Qu.:99.8 1st Qu.:-0.66896 Median : 0.007018 Median :-0.106097 Median : 1836.5 Median :-0.10586 Mean :-0.013284 Mean :-0.007051 Mean : 15024.1 Mean :-0.03531 3rd Qu.: 0.635609 3rd Qu.: 0.588098 3rd Qu.: 11178.0 3rd Qu.: 0.44906 Max. : 2.396419 Max. : 3.439942 Max. :358080.0 Max. : 3.60037 RPS6KB1
Re: [R] Transferring commas in character vector to expression
x.lab - gsub(,,*symbol(\54\)*, x.lab) Wouldn't using just *\,\* instead of *symbol(\54\)* as the replacement do the same thing? To me it is simpler to understand. Note that this fails if the comma is the first or last character in the input because '*something*' is not a valid expression. Another problem is that '**' is parsed the same as ^, so a**d is displayed as a Delta superscript Delta d. One way to deal with that problem is to strip possible '*'s from the ends of x.lab and convert '**'s to '*'s before giving it to the parser. x.lab - gsub(^\\*|\\*$, , x.lab) x.lab - gsub(\\*\\*, *, x.lab) as in f1 - function (x.lab) { x.lab - gsub(\\*, *Delta*, x.lab) x.lab - gsub(,, *\,\*, x.lab) x.lab - gsub(^\\*|\\*$, , x.lab) x.lab - gsub(\\*\\*, *, x.lab) parse(text = x.lab, keep.source = FALSE) } where the code in your mail corresponds to the function f0: f0 - function (x.lab) { x.lab - gsub(\\*, *Delta*, x.lab) x.lab - gsub(,, *symbol(\54\)*, x.lab) parse(text = x.lab, keep.source = FALSE) } Another approach is not to turn the commas into strings, but turn anything that is not a '*' into a string. Then you don't have to change your code when you discover that the inputs might contain semicolons or something else. f2 - function (x.lab) { x.lab - gsub(([^*]+), \\\1\ , x.lab) # I put spaces around things to make it a little more readable; # they may not be very readable in some fonts. x.lab - gsub(\\*, * Delta * , x.lab) x.lab - gsub(^ \\*|\\* $, , x.lab) x.lab - gsub(\\* \\*, *, x.lab) parse(text = x.lab, keep.source = FALSE) } Use it as in dotchart(1:4, labels=f2(c(***d, ,,c*, a,*d, ))) This code gets ugly pretty quickly so you should bury it in a function. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eric Archer - NOAA Federal Sent: Saturday, July 06, 2013 10:55 PM To: Duncan Mackay; r-help-r-project.org Subject: Re: [R] Transferring commas in character vector to expression Duncan, Thanks! That was the tip I needed. With that, I was able to get this to work perfectly: x.lab - c(a*a, bbb, c,cc*c, d,dd) x.lab - gsub(\\*, *Delta*, x.lab) x.lab - gsub(,, *symbol(\54\)*, x.lab) dotchart(1:length(x.lab), labels = parse(text = x.lab)) On Sat, Jul 6, 2013 at 9:38 PM, Duncan Mackay mac...@northnet.com.auwrote: Eric How does this look - (you might have to add a few symbol(\54) where needed for me this give a comma on windows7 ver 3.1 xyplot(1:4 ~ 1:4, scales = list(x=list(at = 1:4, labels = c(expression(a*a),expression(bbb),expression(c*Delta*cc*c),expression(d*Delta*symbol( \54)*dd) ) ) ) ) I mostly use lattice and have forgotten how to do labels in basic plot so have used lattice. should be similar and you can modify to suit. It was trial and error in going up through the numbers from about 38 Duncan At 11:57 7/07/2013, you wrote: Duncan, Thanks for the suggestion, but that won't work for my situation. I'm trying to use a character vector to label some axis ticks. There are some elements in the vector that have either a comma, or both Greek symbols and a comma, like the the third and fourth elements in x.lab below: x - 1:4 x.lab - c(a*a, bbb, c,cc*c, d,dd) x.lab - gsub(\\*, *Delta*, x.lab) x.lab - parse(text = x.lab) Error in parse(text = x.lab) : text:3:2: unexpected ',' 2: bbb 3: c, ^ dotchart(x, labels = x.lab) The root problem that I'm stumped on is how to either: 1) insert a comma into an expression and have it be read as a valid character, or 2) replace the comma in the character string with 'list(a, b, c)' as in the help for plotmath and have it interpreted correctly. Cheers, eric On Sat, Jul 6, 2013 at 3:33 PM, Duncan Mackay mac...@northnet.com.au wrote: Hi Eric I have not been following the thread but following on what David has said on previous occasions try for example plot(1,1, ylab = expression(aa aaa,aa bb*Delta*b *Delta*cc, c) ) Below is from a partly saved previous post of David's several months ago which may give you some ideas DATA_names-c( A mg kg, B mg kg, C mg kg, D mg kg, E mg kg, F mg kg, G mg kg, H mg kg) pos - barplot(1:length(DATA_names)) text(x=pos,y=-1, xpd=TRUE, srt=45, labels= sapply( gsub(mg kg, (mg kg)^-1, DATA_names), as.expression)) HTH Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au At 07:47 6/07/2013, you wrote: I'm trying to format a given character vector as an expression with Greek symbols to be used in labeling axis ticks. Thanks to some help from David Winsemius,
[R] Hierarchical multi-level model with lmer: why are the highest-level random adjustments 0?
Hi all I have a hopefully not too stupid question about multi-level / mixed-effects modeling. I was trying to test a strategy from Crawley's 2013 R Book on a data set with the following structure: - dependent variable: CONSTRUCTION (a factor with 2 levels) - independent fixed effect: LENGTH (an integer in the interval [1, 61]) - random effects with the following hierarchical structure: MODE REGISTER SUBREGISTER FILE. Specifically: MODE: S REGISTER: monolog SUBREGISTER: scripted SUBREGISTER: unscripted REGISTER: dialog SUBREGISTER: private SUBREGISTER: public REGISTER: mix SUBREGISTER: broadcast MODE: W REGISTER: printed SUBREGISTER: academic SUBREGISTER: creative SUBREGISTER: instructional SUBREGISTER: nonacademic SUBREGISTER: persuasive SUBREGISTER: reportage REGISTER: nonprinted SUBREGISTER: letters SUBREGISTER: nonprofessional with various levels of FILE in each level of SUBREGISTER. Here's the head of the relevant data frame (best viewed with a non-proportional font): CASE MODE REGISTER SUBREGISTERFILE CONSTRUCTION LENGTH 11Wprinted reportage W2C-002 V_P_DO 11 22Wprinted nonacademic W2B-035 V_P_DO8 33W nonprinted nonprofessional W1A-014 V_P_DO8 44Wprinted reportage W2C-005 V_P_DO6 55S dialog private S1A-073 V_DO_P2 66S dialog private S1A-073 V_DO_P2 And here's the unique-types distribution of FILE in the design: tapply(FILE, list(SUBREGISTER, REGISTER, MODE), function (qwe) length(unique(qwe))) , , S dialog mix monolog nonprinted printed academic . . . . . broadcast. 20 . . . creative . . . . . instructional. . . . . letters . . . . . nonacademic . . . . . nonprofessional . . . . . persuasive . . . . . private 96 . . . . public 77 . . . . reportage. . . . . scripted . . 25 . . unscripted . . 66 . . , , W dialog mix monolog nonprinted printed academic . . . . 26 broadcast. . . . . creative . . . . 20 instructional. . . . 19 letters . . . 28 . nonacademic . . . . 37 nonprofessional . . . 17 . persuasive . . . . 9 private . . . . . public . . . . . reportage. . . . 20 scripted . . . . . unscripted . . . . . # I would usually have done this (using lme4) model.1.tog - lmer(CONSTRUCTION ~ LENGTH + (1|MODE/REGISTER/SUBREGISTER), family=binomial) # but Crawley (2013:692ff.) suggests this: LEVEL2 - MODE:REGISTER LEVEL3 - MODE:REGISTER:SUBREGISTER model.1.sep - lmer(CONSTRUCTION ~ LENGTH + (1|MODE) + (1|LEVEL2) + (1|LEVEL3), family=binomial) The results are the same for fixed and random effects, ok, but what I don't understand in this case is why the random adjustments to intercepts at the highest level of hierarchical organization (MODE) are 0: ranef(model.1.sep) $LEVEL3 (Intercept) S:dialog:private -0.9482442 S:dialog:public0.1216021 [...] $LEVEL2 (Intercept) S:dialog -0.4746389 [...] $MODE (Intercept) S 0 W 0 I am guessing that's got something to do with the fact that what the random adjustments to intercepts at the level of MODE would do is already taken care of by the random adjustments to intercepts on the lower levels of the hierarchical organization of the data - but when I run the code from Crawley on his data (a linear mixed-effects model, not a generalized linear mixed-effects model as mine), the highest hierarchical level of organization does not return 0 as random adjustment. What am I missing? Thanks for any input you may have! STG __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hierarchical multi-level model with lmer: why are the highest-level random adjustments 0?
I think this is much better posted on r-sig-mixed-models rather than r-help. -- Bert On Sun, Jul 7, 2013 at 1:14 PM, Stefan Th. Gries stgr...@gmail.com wrote: Hi all I have a hopefully not too stupid question about multi-level / mixed-effects modeling. I was trying to test a strategy from Crawley's 2013 R Book on a data set with the following structure: - dependent variable: CONSTRUCTION (a factor with 2 levels) - independent fixed effect: LENGTH (an integer in the interval [1, 61]) - random effects with the following hierarchical structure: MODE REGISTER SUBREGISTER FILE. Specifically: MODE: S REGISTER: monolog SUBREGISTER: scripted SUBREGISTER: unscripted REGISTER: dialog SUBREGISTER: private SUBREGISTER: public REGISTER: mix SUBREGISTER: broadcast MODE: W REGISTER: printed SUBREGISTER: academic SUBREGISTER: creative SUBREGISTER: instructional SUBREGISTER: nonacademic SUBREGISTER: persuasive SUBREGISTER: reportage REGISTER: nonprinted SUBREGISTER: letters SUBREGISTER: nonprofessional with various levels of FILE in each level of SUBREGISTER. Here's the head of the relevant data frame (best viewed with a non-proportional font): CASE MODE REGISTER SUBREGISTERFILE CONSTRUCTION LENGTH 11Wprinted reportage W2C-002 V_P_DO 11 22Wprinted nonacademic W2B-035 V_P_DO8 33W nonprinted nonprofessional W1A-014 V_P_DO8 44Wprinted reportage W2C-005 V_P_DO6 55S dialog private S1A-073 V_DO_P2 66S dialog private S1A-073 V_DO_P2 And here's the unique-types distribution of FILE in the design: tapply(FILE, list(SUBREGISTER, REGISTER, MODE), function (qwe) length(unique(qwe))) , , S dialog mix monolog nonprinted printed academic . . . . . broadcast. 20 . . . creative . . . . . instructional. . . . . letters . . . . . nonacademic . . . . . nonprofessional . . . . . persuasive . . . . . private 96 . . . . public 77 . . . . reportage. . . . . scripted . . 25 . . unscripted . . 66 . . , , W dialog mix monolog nonprinted printed academic . . . . 26 broadcast. . . . . creative . . . . 20 instructional. . . . 19 letters . . . 28 . nonacademic . . . . 37 nonprofessional . . . 17 . persuasive . . . . 9 private . . . . . public . . . . . reportage. . . . 20 scripted . . . . . unscripted . . . . . # I would usually have done this (using lme4) model.1.tog - lmer(CONSTRUCTION ~ LENGTH + (1|MODE/REGISTER/SUBREGISTER), family=binomial) # but Crawley (2013:692ff.) suggests this: LEVEL2 - MODE:REGISTER LEVEL3 - MODE:REGISTER:SUBREGISTER model.1.sep - lmer(CONSTRUCTION ~ LENGTH + (1|MODE) + (1|LEVEL2) + (1|LEVEL3), family=binomial) The results are the same for fixed and random effects, ok, but what I don't understand in this case is why the random adjustments to intercepts at the highest level of hierarchical organization (MODE) are 0: ranef(model.1.sep) $LEVEL3 (Intercept) S:dialog:private -0.9482442 S:dialog:public0.1216021 [...] $LEVEL2 (Intercept) S:dialog -0.4746389 [...] $MODE (Intercept) S 0 W 0 I am guessing that's got something to do with the fact that what the random adjustments to intercepts at the level of MODE would do is already taken care of by the random adjustments to intercepts on the lower levels of the hierarchical organization of the data - but when I run the code from Crawley on his data (a linear mixed-effects model, not a generalized linear mixed-effects model as mine), the highest hierarchical level of organization does not return 0 as random adjustment. What am I missing? Thanks for any input you may have! STG __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] Need hep for converting date data in POSIXct
Hi, I am not sure how your dataset looks like. If it is like the one below: (otherwise, please provide a reproducible example using ?dput()) dat1- read.table(text= datetime 10/02/2010 02:30 11/02/2010 04:00 14/02/2010 06:30 ,sep=,header=TRUE,stringsAsFactors=FALSE) lst1-split(dat1,(seq_along(dat1$datetime)-1)%%2+1) dat2- data.frame(datetime=as.POSIXct(paste(lst1[[1]][,1],lst1[[2]][,1]),format=%d/%m/%Y %H:%M)) str(dat2) #'data.frame': 3 obs. of 1 variable: # $ datetime: POSIXct, format: 2010-02-10 02:30:00 2010-02-11 04:00:00 ... dat2 # datetime #1 2010-02-10 02:30:00 #2 2010-02-11 04:00:00 #3 2010-02-14 06:30:00 #or data.frame(datetime=as.POSIXct(paste(dat1[seq(1,nrow(dat1),by=2),1], dat1[seq(2,nrow(dat1),by=2),1]),format=%d/%m/%Y %H:%M)) # datetime #1 2010-02-10 02:30:00 #2 2010-02-11 04:00:00 #3 2010-02-14 06:30:00 A.K. Hey everybody, I am a new user of R software. I don't know how I can merge two rows in one. In fact, I have one row with the date(dd/mm/) and another with the time (hh:mm) and I would like to get one row with date time in order to convert to POSIXct. How can I do it?? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Check a list of genes for a specific GO term
Hi I think I asked the wrong question. Apologies. Actually I want all the GO BP annotations for my organism and from them I want to retain only those annotations which annotate less than a specified number of genes. (say 1000 genes) I hope I have put it clearly. sorry again. Thanks! On Sun, Jul 7, 2013 at 6:55 AM, Martin Morgan mtmor...@fhcrc.org wrote: In Bioconductor, install the annotation package http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData corresponding to your chip, e.g., source(http://bioconductor.org/biocLite.R;) biocLite(hgu95av2.db) then load it and select the GO terms corresponding to your probes library(hgu95av2.db) lkup - select(hgu95av2.db, rownames(dat), GO) then use standard R commands to find the probesets that have the GO id you're interested in keep = lkup$GO %in% GO:0006355 unique(lkup$PROBEID[keep]) Ask follow-up questions about Bioconductor packages on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/mailform/ Martin - Rui Barradas ruipbarra...@sapo.pt wrote: Hello, Your question is not very clear, maybe if you post a data example. To do so, use ?dput. If your data frame is named 'dat', use the following. dput(head(dat, 50)) # paste the output of this in a post If you want to get the rownames matching a certain pattern, maybe something like the following. idx - grep(GO:0006355, rownames(dat)) dat[idx, ] Hope this helps, Rui Barradas Em 07-07-2013 07:01, Chirag Gupta escreveu: Hello everyone I have a dataframe with rows as probeset ID and columns as samples I want to check the rownames and find which are those probes are transcription factors. (GO:0006355 ) Any suggestions? Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- *Chirag Gupta* Department of Crop, Soil, and Environmental Sciences, 115 Plant Sciences Building, Fayetteville, Arkansas 72701 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] (lme4) p-values for single terms in mixed models involved in sig interactions
sarah hoffmann s.hoffmann85 at outlook.com writes: I am using lme4 to fit a mixed effects model to my data. I have a significant interaction between two variables. My question is what is the correct way to get p-values for single terms involved in that interaction. I have been using stepwise backwards deletion and model comparisons to get p-values,and refitting the model using a REML approach to get estimates.However, presumably to get the p values for single terms, I also have to remove the interaction as well, and therefore inaccurate. I have confused myself with this now, as to whether in this case you should compare a model with the interaction and the single term of interest removed to the minimum adequate model (in which case the p values are over inflated for the single terms), or whether to remove the interaction from the minimum adequate model, and then compare this to an updated model, with the single term removed. This is an example of what the model would look like: library(lme4) minadequatemodel-lmer(sq_rate~(day+temp+ brood_size+weight+weight:brood_size+(1|ident),data=prov,REML=FALSE) ##to get p values for e.g. temp pvalmodtemp-update(minadequatemodel,~.+temp) anova(modelfin,modeltemp) ###but what's the correct way to get p value for brood_size or weight? Your help would be greatly appreciated...thanks! There are a variety of issues involved here, and most of them are not lme4-related. In fact, you'll have an even bigger problem with lme4 since by default it doesn't give p-values at all (see http://glmm.wikidot.com/faq for a description of why not, and some things you can do about that). * stepwards backwards deletion is almost always a bad idea (see e.g. Harrell _Regression Modeling Strategies_ 2001, or google 'stepwise regression problems') * violating marginality (i.e. testing the significance of main effects in the presence of an interaction containing the main effect) is almost always a bad idea: e.g. google Venables exegesis. There are _very_ occasionally reasons you would want to consider a model with an interaction term but with one of the main effects missing/removed (e.g. if you know based on an experimental design that all the treatments in an experiment start at the same time, you might want to set the intercepts the same, which would give you (time + treatment:time). It's hard to specify this case for two categorical variables; you pretty much have to construct the dummy variables yourself. Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.