[R] Check a list of genes for a specific GO term

2013-07-07 Thread Chirag Gupta
Hello everyone

I have a dataframe with rows as probeset ID and columns as samples
I want to check the rownames and find which are those probes are
transcription factors. (GO:0006355 )

Any suggestions?

Thanks!

-- 
*Chirag Gupta*
Department of Crop, Soil, and Environmental Sciences,
115 Plant Sciences Building, Fayetteville, Arkansas 72701

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bayesian estimate of prevalence with an imperfect test

2013-07-07 Thread brechtdv
Dear Lian,

You might be interested to hear that a new package is available on CRAN to
perform Bayesian estimation of true prevalence from apparent prevalence: 
http://cran.r-project.org/package=prevalence
http://cran.r-project.org/package=prevalence  

The function 'truePrev()', that estimates true prevalence from individual
samples, is also implemented as an online Shiny application: 
http://users.ugent.be/~bdvleess/R/prevalence/shiny/
http://users.ugent.be/~bdvleess/R/prevalence/shiny/  

Best wishes,
Brecht


LianD wrote
 Thanks Bert 
 
 I've been through my variables again and have managed to get the code
 working - shouldn't have tried to deal with it at the end of the day
 yesterday!
 
 all the best
 
 Lian





--
View this message in context: 
http://r.789695.n4.nabble.com/Bayesian-estimate-of-prevalence-with-an-imperfect-test-tp4265595p4671014.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to turn off buffered output in R mac os?

2013-07-07 Thread qixiang
hi everyone,
I am writing R with textmate2.0 on my mac book. Today when I want to add a 
progress bar in a loop, i find the textmate bundle fails to  print or cat 
anything until the loop is completed. 
I search for this problem in Google and some says that this is because the 
buffers output setting of R and can be solved by unchoosing 
Rconsole-Misc-buffered output. However, doesn't anyone notice that the Mac OS 
R even doesn't have this buffered output option?
I have tried the flush.console() but it doesn't work, neither.
Anyone has get into such a problem and can give me a help?
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (lme4) p-values for single terms in mixed models involved in sig interactions

2013-07-07 Thread sarah hoffmann
I am using lme4 to fit a mixed effects model to my data. I have a significant 
interaction between two variables. My question is what is the correct way to 
get p-values for single terms involved in that interaction. 
I have been using stepwise backwards deletion and model comparisons to get 
p-values,and refitting the model using a REML approach to get 
estimates.However, presumably to get the p values for single terms, I also have 
to remove the interaction as well, and therefore inaccurate. 
I have confused myself with this now, as to whether in this case you should 
compare a model with the interaction and the single term of interest removed to 
the minimum adequate model (in which case the p values are over inflated for 
the single terms), or whether to remove the interaction from the minimum 
adequate model, and then compare this to an updated model, with the single term 
removed.
This is an example of what the model would look like:
library(lme4)
minadequatemodel-lmer(sq_rate~(day+temp+brood_size+weight+weight:brood_size+(1|ident),data=prov,REML=FALSE)

##to get p values for e.g. temp
pvalmodtemp-update(minadequatemodel,~.+temp)
anova(modelfin,modeltemp)

###but what's the correct way to get p value for brood_size or weight?

Your help would be greatly appreciated...thanks!
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to turn off buffered output in R mac os?

2013-07-07 Thread Berend Hasselman

On 06-07-2013, at 16:26, qixiang qixiang...@gmail.com wrote:

 hi everyone,
 I am writing R with textmate2.0 on my mac book. Today when I want to add a 
 progress bar in a loop, i find the textmate bundle fails to  print or cat 
 anything until the loop is completed. 
 I search for this problem in Google and some says that this is because the 
 buffers output setting of R and can be solved by unchoosing 
 Rconsole-Misc-buffered output. However, doesn't anyone notice that the Mac 
 OS R even doesn't have this buffered output option?
 I have tried the flush.console() but it doesn't work, neither.
 Anyone has get into such a problem and can give me a help?

This is not a matter for the R-help list.
It belongs on the TextMate-Users mailing list.

Berend


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The logistf() function from the logistf package. (Was: Data Package Query)

2013-07-07 Thread Prof Brian Ripley

On 06/07/2013 23:01, Rolf Turner wrote:

On 06/07/13 01:35, Yasmine Refai wrote:

Hello,

When I run the below syntax:
*Trial-read.table(Trial.txt,header=TRUE)*
*Trial*
*save.image(file=Trial.RData)*
*load(Trial.RData)
fit-logistf(data=Trial, y~x1+x2)
summary(fit)
AIC(fit)*

I am getting the below error:
* AIC(fit)
Error in UseMethod(logLik) :
   no applicable method for 'logLik' applied to an object of class
logistf
*
Can you please help with that?


You need to learn to crawl before you start trying to learn to walk.

(1) Your convoluted code is filled with redundancies and circularity.
Acquire some understanding of how R works before you plunge into
a modelling exercise.

(2) The logistf() function comes from the logistf package.  This needs to
be mentioned when you are asking for help with the function.

(3) The error message seems to me to be quite clear.  To calculate the AIC
one needs the log likelihood and this cannot be calculated.  This could
be due
to the fact that log likelihood is not applicable to the model used by
logistf(),


Small clarifcation: one needs the maximized log-likelihood.

More abstractly: AIC is only applicable to models fitted by maximum 
likelihood.  Biased-reduced fits (by Firth or anyone else) are not. 
This is one reason why there are many extensions to AIC to compare other 
sorts of fits.



or simply to the fact that the package maintainer has not (yet?) written a
method logLik.logistf().  I don't know, not being familiar with Firth's
bias
reduced logistic regression which is what the logistf package is about.

It is incumbent upon ***you*** to know since you are invoking the logistf
package.  If you don't know, do some study and find out.  You could also
try to contact the package maintainer.

(4) Your subject line is misleading.  You are simply replying to replies to
a previous query (which has really nothing to do with the current one)
and are apparently too lazy to start a new thread.

  cheers,

  Rolf Turner

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Check a list of genes for a specific GO term

2013-07-07 Thread Rui Barradas

Hello,

Your question is not very clear, maybe if you post a data example.
To do so, use ?dput. If your data frame is named 'dat', use the following.

dput(head(dat, 50))  # paste the output of this in a post


If you want to get the rownames matching a certain pattern, maybe 
something like the following.



idx - grep(GO:0006355, rownames(dat))
dat[idx, ]


Hope this helps,

Rui Barradas


Em 07-07-2013 07:01, Chirag Gupta escreveu:

Hello everyone

I have a dataframe with rows as probeset ID and columns as samples
I want to check the rownames and find which are those probes are
transcription factors. (GO:0006355 )

Any suggestions?

Thanks!



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] spatstat output

2013-07-07 Thread catalin roibu
Dear R users,
Is there a possibility to extract only the r, CI's envelope and L function
from the output of spatstat?
I use this code
E - alltypes(df1, Kest, nsim = 100, envelope =
TRUE,savepatterns=TRUE,correction=isotropic)
And second question, is there a possibility to modify the margin of plot in
spatstat?

plot(E, sqrt(./pi) - r ~ r, ylab = L(r)-r,main=NULL,sub=NULL,las=1)

Thank you very much for your help!

CR

-- 
---
Catalin-Constantin ROIBU
Lecturer PhD, Forestry engineer
Forestry Faculty of Suceava
Str. Universitatii no. 13, Suceava, 720229, Romania
office phone +4 0230 52 29 78, ext. 531
mobile phone   +4 0745 53 18 01
   +4 0766 71 76 58
FAX:+4 0230 52 16 64
silvic.usv.ro

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] spatstat output

2013-07-07 Thread Rolf Turner

On 07/07/13 22:12, catalin roibu wrote:

Dear R users,
Is there a possibility to extract only the r, CI's envelope and L function
from the output of spatstat?
I use this code
E - alltypes(df1, Kest, nsim = 100, envelope =
TRUE,savepatterns=TRUE,correction=isotropic)
And second question, is there a possibility to modify the margin of plot in
spatstat?

plot(E, sqrt(./pi) - r ~ r, ylab = L(r)-r,main=NULL,sub=NULL,las=1)

Thank you very much for your help!

(1) This is a question about the spatstat package and as such should be
in the first instance directed to the authors of this package.

(2) Your example is *not reproducible* since we have no access to df1.

(3) Your use of the expression CI's envelope indicates that you are 
confused.
The abbreviation CI would appear to denote confidence interval. The 
envelopes
do ***NOT*** consist of confidence intervals.  They are ***critical 
envelopes***.
(Repeat after me, 50 times: Critical envelope, critical envelope, 
.).   See the help

for envelope().

(4)  The use of nsim=100, while not incorrect, is bizarre.  The 
envelopes yield

p-values equal to fractions whose denominator is (nsim+1).  Hence nsim=99
(the default) is usually a much better choice than nsim=100.

(5) The object E returned by your code is of class fasp (function 
array for
spatial processes).  It is thus a list whose first entry is names 
fns.  In turn
fns is a list whose entries correspond to the marks of the pattern to 
which
you applied alltypes().  Each entry is an object of class fv.  See the 
help for
fv.object.  If you examine the names of these objects --- which are data 
frames
with extra attributes --- you should easily be able to see how to 
extract the

items that you desire.

(6) In respect of changing the margins, see the help for plot.fasp. Does
the argument mar.panel do what you want?

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Check a list of genes for a specific GO term

2013-07-07 Thread Martin Morgan
In Bioconductor, install the annotation package

  http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData

corresponding to your chip, e.g.,

  source(http://bioconductor.org/biocLite.R;)
  biocLite(hgu95av2.db)

then load it and select the GO terms corresponding to your probes

  library(hgu95av2.db)
  lkup - select(hgu95av2.db, rownames(dat), GO)
  
then use standard R commands to find the probesets that have the GO id you're 
interested in

  keep = lkup$GO %in% GO:0006355
  unique(lkup$PROBEID[keep])

Ask follow-up questions about Bioconductor packages on the Bioconductor mailing 
list

  http://bioconductor.org/help/mailing-list/mailform/

Martin
- Rui Barradas ruipbarra...@sapo.pt wrote:
 Hello,
 
 Your question is not very clear, maybe if you post a data example.
 To do so, use ?dput. If your data frame is named 'dat', use the following.
 
 dput(head(dat, 50))  # paste the output of this in a post
 
 
 If you want to get the rownames matching a certain pattern, maybe 
 something like the following.
 
 
 idx - grep(GO:0006355, rownames(dat))
 dat[idx, ]
 
 
 Hope this helps,
 
 Rui Barradas
 
 
 Em 07-07-2013 07:01, Chirag Gupta escreveu:
  Hello everyone
 
  I have a dataframe with rows as probeset ID and columns as samples
  I want to check the rownames and find which are those probes are
  transcription factors. (GO:0006355 )
 
  Any suggestions?
 
  Thanks!
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subset and order

2013-07-07 Thread arun
Hi,
You could also try ?data.table()
x- read.table(text=a    b    c
1    2    3
3    3    4
2    4    5
1    3    4
,sep=,header=TRUE)

library(data.table)

xt- data.table(xt)
 setkey(xt,a)
 subset(xt,b==3)
#   a b c
#1: 1 3 4
#2: 3 3 4



 iord - order(x$a)
 subset(x[iord, ], b == 3) 
#  a b c
#4 1 3 4
#2 3 3 4


Speed comparison:
set.seed(12345)
dat1- as.data.frame(matrix(sample(1:10,3*1e7,replace=TRUE),ncol=3))
colnames(dat1)-letters[1:3]
system.time({
iord - order(dat1$a)
res1-subset(dat1[iord, ], b == 3)
})
#  user  system elapsed 
#  6.888   0.296   7.202 

dt1- data.table(dat1)
system.time({setkey(dt1,a)
    resdt1-subset(dt1,b==3)})
# user  system elapsed 
#   0.72    0.06    0.78 

head(resdt1)
#   a b  c
#1: 1 3  6
#2: 1 3  4
#3: 1 3 10
#4: 1 3  2
#5: 1 3  9
#6: 1 3  8
 head(res1)
#    a b  c
#75  1 3  6
#93  1 3  4
#300 1 3 10
#301 1 3  2
#437 1 3  9
#672 1 3  8

A.K.
- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: Noah Silverman noahsilver...@ucla.edu
Cc: R-help@r-project.org r-help@r-project.org
Sent: Friday, July 5, 2013 3:51 PM
Subject: Re: [R] Subset and order

Hello,

If time is one of the problems, precompute an ordered index, and use it 
every time you want the df sorted. But that would mean you can't do it 
in a single operation.

iord - order(x$a)
subset(x[iord, ], b == 3)


Rui Barradas

Em 05-07-2013 20:47, Noah Silverman escreveu:
 That would work, but is painfully slow.  It forces a new sort of the data 
 with every query.  I have 200,000 rows and need almost a hundred queries.

 Thanks,

 -N


 On Jul 5, 2013, at 12:43 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

 Hello,

 Maybe like this?

 subset(x[order(x$a), ], b == 3)


 Hope this helps,

 Rui Barradas

 Em 05-07-2013 20:33, Noah Silverman escreveu:
 Hello,

 I have a data frame with several columns.

 I'd like to select some subset *and* order by another field at the same 
 time.

 Example:

 a    b    c
 1    2    3
 3    3    4
 2    4    5
 1    3    4
 etc…


 I want to select all rows where b=3 and then order by a.

 To subset is easy:  x[x$b==3,]
 To order is easy: x[order(x$a),]

 Is there a way to do both in a single efficient statement?

 Thanks,



 --
 Noah Silverman, M.S., C.Phil
 UCLA Department of Statistics
 8117 Math Sciences Building
 Los Angeles, CA 90095




     [[alternative HTML version deleted]]



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coxph won't converge when including categorical (factor) variables

2013-07-07 Thread E Joffe
Hello all,

I have posted a reply to Mark and Jeff out of my own fault sent it to them 
personally and not the group.
I will re-post the question and Mark's answer so that the community can benefit 
from his comments.
It seems that my posts are not in accordance with the guidelines and I will try 
to remedy that as well.
I apologize for the mess I have made !!!
I am afraid I can't post the actual data (as it is Protected Health 
Information).


In any case here are the question, code, output and Mark's comment (which can 
be summarized that there is very little information about the pec package 
within the community and that the best solution would be to try and contact the 
author of the package - which I will and then repost the answer):


I am attempting to evaluate the prediction error of a coxph model that was 
built after feature selection with glmnet.

In the preprocessing stage I used na.omit (dataset) to remove NAs. I 
reconstructed all my factor variables into binary variables with dummies (using 
model.matrix) I then used glmnet lasso to fit a cox model and select the best 
performing features. Then I fit a coxph model using only these feature.

When I try to evaluate the model using pec and a bootstrap I get an error that 
the prediction matrix has wrong dimensions.

In the original dataset there are 313 variables. In the coxph model (glmnet.cox 
in my code) there are 313 df. However when I run pec the prediction matrix is 
noted as having 318 variables and therefore doesn't fit the testset which has 
313 variables (+the status and time variables).

It seems that pec does something to the variables while retraining the coxph 
model on the bootstrap samples. I tried setting singular.ok=FALSE in the coxph 
model without avail.

Here is a summary of key variables (prior to restructuring for glmnet) Followed 
by the code I ran and the errors:

 summary (trainSet)
 time  status Age_at_Dx   CREATININE   
 Min.   :  0.1429   Min.   :0.   Min.   :14.81   Min.   :0.400  
 1st Qu.: 22.0714   1st Qu.:1.   1st Qu.:52.51   1st Qu.:0.800  
 Median : 52.9286   Median :1.   Median :63.79   Median :0.900  
 Mean   : 96.5415   Mean   :0.7826   Mean   :61.01   Mean   :1.042  
 3rd Qu.:154.6071   3rd Qu.:1.   3rd Qu.:73.01   3rd Qu.:1.125  
 Max.   :437.7143   Max.   :1.   Max.   :87.23   Max.   :6.000  
 Performance_StatusALBUMIN  Cyto WBC
 Min.   :0. Min.   :0.70   Min.   : 1.000   Min.   :  0.60  
 1st Qu.:1. 1st Qu.:3.00   1st Qu.: 4.000   1st Qu.:  3.20  
 Median :1. Median :3.60   Median : 4.000   Median :  9.20  
 Mean   :0.9728 Mean   :3.48   Mean   : 6.802   Mean   : 28.35  
 3rd Qu.:1. 3rd Qu.:4.00   3rd Qu.:11.000   3rd Qu.: 33.10  
 Max.   :4. Max.   :5.20   Max.   :17.000   Max.   :373.00  
  PRKCD_pT507 HGBMaxblast  CD19   
 Min.   :-2.429605   Min.   : 5.400   Min.   : 0.00   Min.   : 0.000  
  1st Qu.:-0.627005   1st Qu.: 8.500   1st Qu.:24.00   1st Qu.: 1.000  
 Median :-0.013117   Median : 9.550   Median :40.00   Median : 2.000  
 Mean   : 0.006432   Mean   : 9.689   Mean   :43.74   Mean   : 6.312  
 3rd Qu.: 0.559782   3rd Qu.:10.800   3rd Qu.:65.25   3rd Qu.: 5.000  
 Max.   : 2.963658   Max.   :14.300   Max.   :98.00   Max.   :98.000  
 GAPDHCD74  TP53   Fli1  
 Min.   :-2.391021   Min.   :-1.7902   Min.   :-1.64244   Min.   :-3.723143  
 1st Qu.:-0.662164   1st Qu.:-0.5502   1st Qu.:-0.53685   1st Qu.:-0.559783  
 Median : 0.003236   Median :-0.1546   Median :-0.10928   Median :-0.002154  
 Mean   : 0.015759   Mean   : 0.0235   Mean   :-0.02285   Mean   :-0.016555  
 3rd Qu.: 0.632472   3rd Qu.: 0.4759   3rd Qu.: 0.29869   3rd Qu.: 0.554521  
 Max.   : 3.512741   Max.   : 3.7226   Max.   : 5.39242   Max.   : 4.415324  
  LDH  SQSTM0BM_BLAST CCND3 
 Min.   :8.4   Min.   :-1.56639   Min.   : 0.00   Min.   :-2.44772  
 1st Qu.:  562.8   1st Qu.:-0.48762   1st Qu.:31.00   1st Qu.:-0.59042  
 Median :  952.5   Median :-0.05746   Median :46.50   Median :-0.08412  
 Mean   : 1617.9   Mean   :-0.02272   Mean   :50.64   Mean   :-0.02506  
 3rd Qu.: 1596.8   3rd Qu.: 0.33718   3rd Qu.:72.00   3rd Qu.: 0.48353  
 Max.   :36708.0   Max.   : 3.59456   Max.   :98.00   Max.   : 3.58761  
  GAB2BAX   ABS_BLST   PRKAA1_2   
 Min.   :-3.201564   Min.   :-3.230737   Min.   : 0.0   Min.   :-1.90671  
 1st Qu.:-0.640279   1st Qu.:-0.593174   1st Qu.:99.8   1st Qu.:-0.66896  
 Median : 0.007018   Median :-0.106097   Median :  1836.5   Median :-0.10586  
 Mean   :-0.013284   Mean   :-0.007051   Mean   : 15024.1   Mean   :-0.03531  
  3rd Qu.: 0.635609   3rd Qu.: 0.588098   3rd Qu.: 11178.0   3rd Qu.: 0.44906  
 Max.   : 2.396419   Max.   : 3.439942   Max.   :358080.0   Max.   : 3.60037  
RPS6KB1  

Re: [R] Transferring commas in character vector to expression

2013-07-07 Thread William Dunlap
 x.lab - gsub(,,*symbol(\54\)*, x.lab)

Wouldn't using just
   *\,\*
instead of
   *symbol(\54\)*
as the replacement do the same thing?
To me it is simpler to understand.

Note that this fails if the comma is the first or last
character in the input because '*something*' is
not a valid expression.  Another problem is that '**'
is parsed the same as ^, so a**d is displayed as
a Delta superscript Delta d.  One way to deal with that
problem is to strip possible '*'s from the ends of
x.lab and convert '**'s to '*'s before giving it to the parser.
   x.lab - gsub(^\\*|\\*$, , x.lab)
   x.lab - gsub(\\*\\*, *, x.lab)
as in
f1 - function (x.lab) 
{
x.lab - gsub(\\*, *Delta*, x.lab)
x.lab - gsub(,, *\,\*, x.lab)
x.lab - gsub(^\\*|\\*$, , x.lab)
x.lab - gsub(\\*\\*, *, x.lab)
parse(text = x.lab, keep.source = FALSE)
}
where the code in your mail corresponds to the function f0:
f0 - function (x.lab) 
{
x.lab - gsub(\\*, *Delta*, x.lab)
x.lab - gsub(,, *symbol(\54\)*, x.lab)
parse(text = x.lab, keep.source = FALSE)
}

Another approach is not to turn the commas into strings, but
turn anything that is not a '*' into a string.  Then you don't have
to change your code when you discover that the inputs might
contain semicolons or something else.
f2 - function (x.lab) 
{
x.lab - gsub(([^*]+),  \\\1\ , x.lab)
# I put spaces around things to make it a little more readable;
# they may not be very readable in some fonts.
x.lab - gsub(\\*,  * Delta * , x.lab)
x.lab - gsub(^ \\*|\\* $, , x.lab)
x.lab - gsub(\\*  \\*, *, x.lab)
parse(text = x.lab, keep.source = FALSE)
}

Use it as in
   dotchart(1:4, labels=f2(c(***d, ,,c*, a,*d, )))
This code gets ugly pretty quickly so you should bury it in a function.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Eric Archer - NOAA Federal
 Sent: Saturday, July 06, 2013 10:55 PM
 To: Duncan Mackay; r-help-r-project.org
 Subject: Re: [R] Transferring commas in character vector to expression
 
 Duncan,
 
 Thanks! That was the tip I needed. With that, I was able to get this to
 work perfectly:
 
 x.lab - c(a*a, bbb, c,cc*c, d,dd)
 x.lab - gsub(\\*, *Delta*, x.lab)
 x.lab - gsub(,, *symbol(\54\)*, x.lab)
 dotchart(1:length(x.lab), labels = parse(text = x.lab))
 
 
 
 On Sat, Jul 6, 2013 at 9:38 PM, Duncan Mackay mac...@northnet.com.auwrote:
 
   Eric
 
  How does this look - (you might have to add a few symbol(\54)  where
  needed for me this give a comma on windows7  ver 3.1
 
   xyplot(1:4 ~ 1:4, scales = list(x=list(at = 1:4, labels =
 
 c(expression(a*a),expression(bbb),expression(c*Delta*cc*c),expression(d*Delta*symbol(
 \54)*dd)
  ) ) ) )
 
  I mostly use lattice and have forgotten how to do labels in basic plot so
  have used lattice. should be similar and you can modify to suit.
 
  It was trial and error in going up through the numbers  from about 38
 
 
  Duncan
 
  At 11:57 7/07/2013, you wrote:
 
  Duncan,
 
  Thanks for the suggestion, but that won't work for my situation. I'm
  trying to use a character vector to label some axis ticks. There are some
  elements in the vector that have either a comma, or both Greek symbols and
  a comma, like the the third and fourth elements in x.lab below:
 
   x - 1:4
   x.lab - c(a*a, bbb, c,cc*c, d,dd)
   x.lab - gsub(\\*, *Delta*, x.lab)
   x.lab - parse(text = x.lab)
  Error in parse(text = x.lab) : text:3:2: unexpected ','
  2: bbb
  3: c,
 ^
   dotchart(x, labels = x.lab)
 
  The root problem that I'm stumped on is how to either:
  1) insert a comma into an expression and have it be read as a valid
  character, or
  2) replace the comma in the character string with 'list(a, b, c)' as in
  the help for plotmath and have it interpreted correctly.
 
  Cheers,
  eric
 
 
  On Sat, Jul 6, 2013 at 3:33 PM, Duncan Mackay mac...@northnet.com.au 
  wrote:
   Hi Eric
 
  I have not been following the thread but following on what David has said
  on previous occasions
 
  try for example
 
  plot(1,1, ylab =  expression(aa aaa,aa bb*Delta*b *Delta*cc, c) )
 
  Below is from a partly saved previous post of David's several months ago
  which may give you some ideas
 
  DATA_names-c(
  A mg kg,
  B mg kg,
  C mg kg,
  D mg kg,
  E mg kg,
  F mg kg,
  G mg kg,
  H mg kg)
 
  pos - barplot(1:length(DATA_names))
  text(x=pos,y=-1, xpd=TRUE, srt=45,
  labels= sapply( gsub(mg kg, (mg kg)^-1,
  DATA_names),
  as.expression))
 
  HTH
 
  Duncan
 
 
  Duncan Mackay
  Department of Agronomy and Soil Science
  University of New England
  Armidale NSW 2351
  Email: home: mac...@northnet.com.au
 
 
 
 
  At 07:47 6/07/2013, you wrote:
   I'm trying to format a given character vector as an expression with Greek
  symbols to be used in labeling axis ticks. Thanks to some help from David
  Winsemius, 

[R] Hierarchical multi-level model with lmer: why are the highest-level random adjustments 0?

2013-07-07 Thread Stefan Th. Gries
Hi all

I have a hopefully not too stupid question about multi-level /
mixed-effects modeling. I was trying to test a strategy from Crawley's
2013 R Book on a data set with the following structure:

- dependent variable: CONSTRUCTION (a factor with 2 levels)
- independent fixed effect: LENGTH (an integer in the interval [1, 61])
- random effects with the following hierarchical structure: MODE 
REGISTER  SUBREGISTER  FILE. Specifically:

MODE: S
  REGISTER: monolog
SUBREGISTER: scripted
SUBREGISTER: unscripted
  REGISTER: dialog
SUBREGISTER: private
SUBREGISTER: public
  REGISTER: mix
SUBREGISTER: broadcast
MODE: W
  REGISTER: printed
SUBREGISTER: academic
SUBREGISTER: creative
SUBREGISTER: instructional
SUBREGISTER: nonacademic
SUBREGISTER: persuasive
SUBREGISTER: reportage
  REGISTER: nonprinted
SUBREGISTER: letters
SUBREGISTER: nonprofessional

with various levels of FILE in each level of SUBREGISTER. Here's the
head of the relevant data frame (best viewed with a non-proportional
font):

  CASE MODE   REGISTER SUBREGISTERFILE CONSTRUCTION LENGTH
11Wprinted   reportage W2C-002 V_P_DO   11
22Wprinted nonacademic W2B-035 V_P_DO8
33W nonprinted nonprofessional W1A-014 V_P_DO8
44Wprinted   reportage W2C-005 V_P_DO6
55S dialog private S1A-073 V_DO_P2
66S dialog private S1A-073 V_DO_P2

And here's the unique-types distribution of FILE in the design:
tapply(FILE, list(SUBREGISTER, REGISTER, MODE), function (qwe)
length(unique(qwe)))

, , S
dialog mix monolog nonprinted printed
academic .   .   .  .   .
broadcast.  20   .  .   .
creative .   .   .  .   .
instructional.   .   .  .   .
letters  .   .   .  .   .
nonacademic  .   .   .  .   .
nonprofessional  .   .   .  .   .
persuasive   .   .   .  .   .
private 96   .   .  .   .
public  77   .   .  .   .
reportage.   .   .  .   .
scripted .   .  25  .   .
unscripted   .   .  66  .   .

, , W
dialog mix monolog nonprinted printed
academic .   .   .  .  26
broadcast.   .   .  .   .
creative .   .   .  .  20
instructional.   .   .  .  19
letters  .   .   . 28   .
nonacademic  .   .   .  .  37
nonprofessional  .   .   . 17   .
persuasive   .   .   .  .   9
private  .   .   .  .   .
public   .   .   .  .   .
reportage.   .   .  .  20
scripted .   .   .  .   .
unscripted   .   .   .  .   .

# I would usually have done this (using lme4)
model.1.tog - lmer(CONSTRUCTION ~ LENGTH +
(1|MODE/REGISTER/SUBREGISTER), family=binomial)

# but Crawley (2013:692ff.) suggests this:
LEVEL2 - MODE:REGISTER
LEVEL3 - MODE:REGISTER:SUBREGISTER
model.1.sep - lmer(CONSTRUCTION ~ LENGTH + (1|MODE) + (1|LEVEL2) +
(1|LEVEL3), family=binomial)

The results are the same for fixed and random effects, ok, but what I
don't understand in this case is why the random adjustments to
intercepts at the highest level of hierarchical organization (MODE)
are 0:

ranef(model.1.sep)
$LEVEL3
 (Intercept)
S:dialog:private  -0.9482442
S:dialog:public0.1216021
[...]

$LEVEL2
 (Intercept)
S:dialog  -0.4746389
[...]

$MODE
  (Intercept)
S   0
W   0

I am guessing that's got something to do with the fact that what the
random adjustments to intercepts at the level of MODE would do is
already taken care of by the random adjustments to intercepts on the
lower levels of the hierarchical organization of the data - but when I
run the code from Crawley on his data (a linear mixed-effects model,
not a generalized linear mixed-effects model as mine), the highest
hierarchical level of organization does not return 0 as random
adjustment. What am I missing?

Thanks for any input you may have!
STG

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hierarchical multi-level model with lmer: why are the highest-level random adjustments 0?

2013-07-07 Thread Bert Gunter
I think this is much better posted on r-sig-mixed-models rather than r-help.

-- Bert

On Sun, Jul 7, 2013 at 1:14 PM, Stefan Th. Gries stgr...@gmail.com wrote:
 Hi all

 I have a hopefully not too stupid question about multi-level /
 mixed-effects modeling. I was trying to test a strategy from Crawley's
 2013 R Book on a data set with the following structure:

 - dependent variable: CONSTRUCTION (a factor with 2 levels)
 - independent fixed effect: LENGTH (an integer in the interval [1, 61])
 - random effects with the following hierarchical structure: MODE 
 REGISTER  SUBREGISTER  FILE. Specifically:

 MODE: S
   REGISTER: monolog
 SUBREGISTER: scripted
 SUBREGISTER: unscripted
   REGISTER: dialog
 SUBREGISTER: private
 SUBREGISTER: public
   REGISTER: mix
 SUBREGISTER: broadcast
 MODE: W
   REGISTER: printed
 SUBREGISTER: academic
 SUBREGISTER: creative
 SUBREGISTER: instructional
 SUBREGISTER: nonacademic
 SUBREGISTER: persuasive
 SUBREGISTER: reportage
   REGISTER: nonprinted
 SUBREGISTER: letters
 SUBREGISTER: nonprofessional

 with various levels of FILE in each level of SUBREGISTER. Here's the
 head of the relevant data frame (best viewed with a non-proportional
 font):

   CASE MODE   REGISTER SUBREGISTERFILE CONSTRUCTION LENGTH
 11Wprinted   reportage W2C-002 V_P_DO   11
 22Wprinted nonacademic W2B-035 V_P_DO8
 33W nonprinted nonprofessional W1A-014 V_P_DO8
 44Wprinted   reportage W2C-005 V_P_DO6
 55S dialog private S1A-073 V_DO_P2
 66S dialog private S1A-073 V_DO_P2

 And here's the unique-types distribution of FILE in the design:
 tapply(FILE, list(SUBREGISTER, REGISTER, MODE), function (qwe)
 length(unique(qwe)))

 , , S
 dialog mix monolog nonprinted printed
 academic .   .   .  .   .
 broadcast.  20   .  .   .
 creative .   .   .  .   .
 instructional.   .   .  .   .
 letters  .   .   .  .   .
 nonacademic  .   .   .  .   .
 nonprofessional  .   .   .  .   .
 persuasive   .   .   .  .   .
 private 96   .   .  .   .
 public  77   .   .  .   .
 reportage.   .   .  .   .
 scripted .   .  25  .   .
 unscripted   .   .  66  .   .

 , , W
 dialog mix monolog nonprinted printed
 academic .   .   .  .  26
 broadcast.   .   .  .   .
 creative .   .   .  .  20
 instructional.   .   .  .  19
 letters  .   .   . 28   .
 nonacademic  .   .   .  .  37
 nonprofessional  .   .   . 17   .
 persuasive   .   .   .  .   9
 private  .   .   .  .   .
 public   .   .   .  .   .
 reportage.   .   .  .  20
 scripted .   .   .  .   .
 unscripted   .   .   .  .   .

 # I would usually have done this (using lme4)
 model.1.tog - lmer(CONSTRUCTION ~ LENGTH +
 (1|MODE/REGISTER/SUBREGISTER), family=binomial)

 # but Crawley (2013:692ff.) suggests this:
 LEVEL2 - MODE:REGISTER
 LEVEL3 - MODE:REGISTER:SUBREGISTER
 model.1.sep - lmer(CONSTRUCTION ~ LENGTH + (1|MODE) + (1|LEVEL2) +
 (1|LEVEL3), family=binomial)

 The results are the same for fixed and random effects, ok, but what I
 don't understand in this case is why the random adjustments to
 intercepts at the highest level of hierarchical organization (MODE)
 are 0:

 ranef(model.1.sep)
 $LEVEL3
  (Intercept)
 S:dialog:private  -0.9482442
 S:dialog:public0.1216021
 [...]

 $LEVEL2
  (Intercept)
 S:dialog  -0.4746389
 [...]

 $MODE
   (Intercept)
 S   0
 W   0

 I am guessing that's got something to do with the fact that what the
 random adjustments to intercepts at the level of MODE would do is
 already taken care of by the random adjustments to intercepts on the
 lower levels of the hierarchical organization of the data - but when I
 run the code from Crawley on his data (a linear mixed-effects model,
 not a generalized linear mixed-effects model as mine), the highest
 hierarchical level of organization does not return 0 as random
 adjustment. What am I missing?

 Thanks for any input you may have!
 STG

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 

Re: [R] Need hep for converting date data in POSIXct

2013-07-07 Thread arun
Hi,
I am not sure how your dataset looks like.  If it is like the one below: 
(otherwise, please provide a reproducible example using ?dput())

dat1- read.table(text=
datetime
10/02/2010
02:30
11/02/2010
04:00
14/02/2010
06:30
,sep=,header=TRUE,stringsAsFactors=FALSE)

lst1-split(dat1,(seq_along(dat1$datetime)-1)%%2+1)
 dat2- 
data.frame(datetime=as.POSIXct(paste(lst1[[1]][,1],lst1[[2]][,1]),format=%d/%m/%Y
 %H:%M))
 str(dat2)
#'data.frame':    3 obs. of  1 variable:
# $ datetime: POSIXct, format: 2010-02-10 02:30:00 2010-02-11 04:00:00 ...
 dat2
# datetime
#1 2010-02-10 02:30:00
#2 2010-02-11 04:00:00
#3 2010-02-14 06:30:00


#or
data.frame(datetime=as.POSIXct(paste(dat1[seq(1,nrow(dat1),by=2),1],  
dat1[seq(2,nrow(dat1),by=2),1]),format=%d/%m/%Y %H:%M))
# datetime
#1 2010-02-10 02:30:00
#2 2010-02-11 04:00:00
#3 2010-02-14 06:30:00



A.K.



Hey everybody, 

I am a new user of R software. I don't know how I can merge two rows in 
one. In fact, I have one row with the date(dd/mm/) and another with the 
time (hh:mm) and I would like to get one row with date time in order to 
convert to POSIXct. How can I do it??

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Check a list of genes for a specific GO term

2013-07-07 Thread Chirag Gupta
Hi
I think I asked the wrong question. Apologies.

Actually I want all the GO BP annotations for my organism and from them I
want to retain only those annotations which annotate less than a specified
number of genes. (say 1000 genes)

I hope I have put it clearly.

sorry again.

Thanks!


On Sun, Jul 7, 2013 at 6:55 AM, Martin Morgan mtmor...@fhcrc.org wrote:

 In Bioconductor, install the annotation package


 http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData

 corresponding to your chip, e.g.,

   source(http://bioconductor.org/biocLite.R;)
   biocLite(hgu95av2.db)

 then load it and select the GO terms corresponding to your probes

   library(hgu95av2.db)
   lkup - select(hgu95av2.db, rownames(dat), GO)

 then use standard R commands to find the probesets that have the GO id
 you're interested in

   keep = lkup$GO %in% GO:0006355
   unique(lkup$PROBEID[keep])

 Ask follow-up questions about Bioconductor packages on the Bioconductor
 mailing list

   http://bioconductor.org/help/mailing-list/mailform/

 Martin
 - Rui Barradas ruipbarra...@sapo.pt wrote:
  Hello,
 
  Your question is not very clear, maybe if you post a data example.
  To do so, use ?dput. If your data frame is named 'dat', use the
 following.
 
  dput(head(dat, 50))  # paste the output of this in a post
 
 
  If you want to get the rownames matching a certain pattern, maybe
  something like the following.
 
 
  idx - grep(GO:0006355, rownames(dat))
  dat[idx, ]
 
 
  Hope this helps,
 
  Rui Barradas
 
 
  Em 07-07-2013 07:01, Chirag Gupta escreveu:
   Hello everyone
  
   I have a dataframe with rows as probeset ID and columns as samples
   I want to check the rownames and find which are those probes are
   transcription factors. (GO:0006355 )
  
   Any suggestions?
  
   Thanks!
  
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.




-- 
*Chirag Gupta*
Department of Crop, Soil, and Environmental Sciences,
115 Plant Sciences Building, Fayetteville, Arkansas 72701

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (lme4) p-values for single terms in mixed models involved in sig interactions

2013-07-07 Thread Ben Bolker
sarah hoffmann s.hoffmann85 at outlook.com writes:

 I am using lme4 to fit a mixed effects model to my data. I have a
 significant interaction between two variables. My question is what
 is the correct way to get p-values for single terms involved in that
 interaction.  I have been using stepwise backwards deletion and
 model comparisons to get p-values,and refitting the model using a
 REML approach to get estimates.However, presumably to get the p
 values for single terms, I also have to remove the interaction as
 well, and therefore inaccurate.  I have confused myself with this
 now, as to whether in this case you should compare a model with the
 interaction and the single term of interest removed to the minimum
 adequate model (in which case the p values are over inflated for the
 single terms), or whether to remove the interaction from the minimum
 adequate model, and then compare this to an updated model, with the
 single term removed.

 This is an example of what the model would look like:
 library(lme4)
 minadequatemodel-lmer(sq_rate~(day+temp+
  brood_size+weight+weight:brood_size+(1|ident),data=prov,REML=FALSE)
 
 ##to get p values for e.g. temp
 pvalmodtemp-update(minadequatemodel,~.+temp)
 anova(modelfin,modeltemp)
 
 ###but what's the correct way to get p value for brood_size or weight?
 
 Your help would be greatly appreciated...thanks! 

  There are a variety of issues involved here, and most of  them are
not lme4-related.  In fact, you'll have an even bigger problem with
lme4 since by default it doesn't give p-values at all (see
http://glmm.wikidot.com/faq for a description of why not, and some
things you can do about that).

 * stepwards backwards deletion is almost always a bad idea
(see e.g. Harrell _Regression Modeling Strategies_ 2001, or google
'stepwise regression problems')

* violating marginality (i.e. testing the significance of main effects
in the presence of an interaction containing the main effect) is
almost always a bad idea: e.g. google Venables exegesis. There are
_very_ occasionally reasons you would want to consider a model with
an interaction term but with one of the main effects missing/removed (e.g.
if you know based on an experimental design that all the treatments
in an experiment start at the same time, you might want to set the
intercepts the same, which would give you (time + treatment:time).
It's hard to specify this case for two categorical variables; you
pretty much have to construct the dummy variables yourself.

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.