[R] How to mean, min lists and numbers
I would like to sum/mean/min a list of lists and numbers to return the related lists. -1+2*c(1,1,0)+2+c(-1,10,-1) returns c(2,13,0), but sum(1,2*c(1,1,0),2,c(-1,10,-1)) returns 15, not a list. Using the suggestion of Gabor Grothendieck, Reduce('+', list(-1, 2*c(1,1,0), 2, c(-1,10,-1))) returns what we want, c(2,13,0). However, this approach does not seem to work for mean/min. So, how to mean/min a list of lists and numbers to return a list? Thanks, -james __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] make an model object (e.g. nlme) available in a user defined function (xyplot related)
Dear Deepayan,

Thank you for taking the time to look into this issue. I have a data object called Data; please find it at the end of the message. Then I can run the code below separately in the console.

#Construct the nlme object
mod.nlme <- nlme(RESP ~ E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma),
                 data=Data, method='REML',
                 fixed=E0+Emax+gamma+EC50~1, random=EC50~1, groups=~ID,
                 start=list(fixed=c(E0=1,Emax=100,gamma=1,EC50=50)))

#Plotting
xyplot(RESP~CP, Data, groups=ID, panel=panel.superpose,
       panel.groups=function(x,y,subscripts,...){
         panel.xyplot(x,y,...)
         subjectData <- Data[subscripts,]
         ind.pred <- predict(mod.nlme, newdata=subjectData)
         panel.xyplot(x, ind.pred, type='l', lty=2)
       })

## Then I constructed a test function to put the two tasks together and it seems OK.
## Strangely I don't even need to print() the xyplot; it is just automatically shown on the screen.
test.function <- function(Data=Data){
  mod.nlme <- nlme(RESP ~ E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma),
                   data=Data, method='REML',
                   fixed=E0+Emax+gamma+EC50~1, random=EC50~1, groups=~ID,
                   start=list(fixed=c(E0=1,Emax=100,gamma=1,EC50=50)))
  xyplot(RESP~CP, data=Data, groups=ID, panel=panel.superpose,
         panel.groups=function(x,y,subscripts,...){
           panel.xyplot(x,y,...)
           subjectData <- Data[subscripts,]
           ind.pred <- predict(mod.nlme, newdata=subjectData)
           panel.xyplot(x, ind.pred, type='l', lty=2)
         })
}

Then I have my real function as follows. If I run the code as compare.curves(Data=Data), the analytical part is working but not the plotting part (Error using packet 1, object 'model' not found):

==
compare.curves <- function(curve='ascending', Data=stop('A data object must be specified'),
                           parameter='EC50', random.pdDiag=FALSE,
                           start.values=c(Emax=100,E0=1,EC50=50,gamma=2), ...){
  if (curve=='ascending')
    model <- as.formula('RESP ~ E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma)')
  if (curve=='descending')
    model <- as.formula('RESP ~ E0+(Emax-E0)*(1-CP**gamma/(EC50**gamma+CP**gamma))')
  mod.nlme <- nlme(model=model, data=Data, method='REML',
                   fixed=Emax+E0+EC50+gamma~1,
                   random= if (length(parameter)==1)
                       eval(substitute(variable~1, list(variable=as.name(parameter))))
                     else {
                       variable <- as.name(parameter[1])
                       for (i in 2:length(parameter))
                         variable <- paste(variable, '+', as.name(parameter[i]))
                       formula <- as.formula(paste(variable, '~1'))
                       if (random.pdDiag) list(pdDiag(formula)) else formula
                     },
                   groups=~ID, start=list(fixed=start.values))
  mod.nlme.RSS <- sum(resid(mod.nlme)^2)
  df.mod.nlme <- dim(Data)[1]-(4+length(parameter)) # 4 fixed effects plus the number of random effects
  constrained.fit.parameters <- coef(mod.nlme)
  mod.nls.ind <- lapply(split(Data, Data$ID), function(x){
    nls(formula=model, data=x, start=start.values)
  })
  mod.nls.ind.RSS <- do.call(sum, lapply(mod.nls.ind, function(x) resid(x)^2))
  df.mod.nls.ind <- dim(Data)[1]-4*length(unique(Data$ID))
  ind.fit.parameters <- do.call(rbind, lapply(mod.nls.ind, coef))
  F.statistic <- mod.nlme.RSS/mod.nls.ind.RSS
  F.test.p.value <- pf(F.statistic, df.mod.nlme, df.mod.nls.ind, lower.tail=FALSE)
  print(
    xyplot(RESP~CP, data=Data, groups=ID, panel=panel.superpose,
           panel.groups=function(x,y,subscripts,...){
             panel.xyplot(x,y,...)
             subjectData <- Data[subscripts,]
             ind.pred <- predict(mod.nlme, newdata=subjectData)
             panel.xyplot(x, ind.pred, type='l', lty=2)
           })
  )
  return(list(F_test_statistic=F.statistic, F_test_p_value=F.test.p.value,
              Individual_fit=ind.fit.parameters, Constrained_fit=constrained.fit.parameters))
}
=

The data object Data (truncated in this message):

structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), CP = c(1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30,
Re: [R] How to mean, min lists and numbers
On 12/07/2010 11:10 AM, g...@ucalgary.ca wrote: I would like to sum/mean/min a list of lists and numbers to return the related lists. -1+2*c(1,1,0)+2+c(-1,10,-1) returns c(2,13,0) but sum(1,2*c(1,1,0),2,c(-1,10,-1)) returns 15, not a list. Using the suggestions of Gabor Grothendieck, Reduce('+', list(-1, 2*c(1,1,0), 2, c(-1,10,-1))) returns what we want, c(2,13,0). However, it seems that this way does not work for mean/min. So, how to mean/min a list of lists and numbers to return a list? Thanks,

You need to be careful of terminology: c(1,1,0) is not a list, it's a vector. What you want is to apply functions componentwise to lists of vectors. One way to do that is to bind them into a matrix and use apply. For example:

M <- cbind(-1, c(1,1,0), c(-1,10,-1))
apply(M, 1, mean)

Duncan Murdoch
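[Editor's note: the matrix-plus-apply approach above extends directly to the original four-term expression and to min; a minimal sketch (not part of the original message — cbind recycles the scalar arguments to the common length):

```r
# Bind the four terms column-wise; scalars -1 and 2 are recycled to length 3
M <- cbind(-1, 2*c(1,1,0), 2, c(-1,10,-1))
apply(M, 1, sum)   # componentwise sum:  2 13 0
apply(M, 1, mean)  # componentwise mean: 0.5 3.25 0
apply(M, 1, min)   # componentwise min:  -1 -1 -1
```

Any componentwise summary function can be substituted for the third argument of apply.]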
Re: [R] How to mean, min lists and numbers
On Jul 12, 2010, at 11:10 AM, g...@ucalgary.ca wrote: I would like to sum/mean/min a list of lists and numbers to return the related lists.

You will advance in your understanding faster if you adopt the correct terminology:

-1+2*c(1,1,0)+2+c(-1,10,-1) returns c(2,13,0) but ...

... which is NOT a list, it is a vector.

sum(1,2*c(1,1,0),2,c(-1,10,-1)) returns 15, not a list. Using the suggestions of Gabor Grothendieck, Reduce('+', list(-1, 2*c(1,1,0), 2, c(-1,10,-1))) returns what we want, c(2,13,0). However, it seems that this way does not work for mean/min.

If you want a running cumulative mean of a vector, i.e., c( mean(vec[1]), mean(vec[1:2]), ..., mean(vec) ):

vec <- sample(1:20)
sapply(1:length(vec), function(x) mean(vec[1:x]))

So, how to mean/min a list of lists and numbers to return a list?

Not a list, and not working on a list of lists. A vector.

-- David Winsemius, MD West Hartford, CT
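[Editor's note: for long vectors, the running mean above can also be computed without sapply, using cumsum — a vectorized equivalent, not from the original post:

```r
# Running mean via cumulative sum divided by position
vec <- c(4, 2, 6)
cummean <- cumsum(vec) / seq_along(vec)
cummean  # 4 3 4
```

This avoids recomputing each prefix mean from scratch, so it is O(n) rather than O(n^2).]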
[R] Calculating Gwet's AC1 statistic
Hi, After searching the archives and Google and not turning up anything, I thought I'd ask here. Has anyone done an R package for calculating Gwet's AC1 statistic and variance? K.L. Gwet. Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008 May;61(Pt 1):29-48. http://www.ncbi.nlm.nih.gov/pubmed/18482474 Thanks, --pete
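[Editor's note: for reference, a hand-rolled sketch of the AC1 point estimate (no variance) for two raters, following the chance-agreement formula in Gwet (2008). Treat this as an illustration, not a validated implementation:

```r
# Gwet's AC1 point estimate for two raters, from a KxK agreement table.
# pe uses Gwet's chance agreement: sum of pi_k*(1-pi_k) / (K-1),
# where pi_k averages the two raters' marginal proportions.
gwet_ac1 <- function(tab) {
  tab <- tab / sum(tab)                          # cell proportions
  pa  <- sum(diag(tab))                          # observed agreement
  pik <- (rowSums(tab) + colSums(tab)) / 2       # average marginal proportions
  pe  <- sum(pik * (1 - pik)) / (nrow(tab) - 1)  # chance agreement
  (pa - pe) / (1 - pe)
}
# Example: 2x2 cross-tabulation of rater A vs rater B
gwet_ac1(matrix(c(40, 3, 2, 5), nrow = 2))
```

The variance estimator in the paper is considerably more involved and is not attempted here.]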
[R] Quantmod Error Message
I am trying to create a model using the quantmod package in R. I am using the following string of commands:

ema <- read.csv(file="ESU0 Jul 7 1 sec data.csv")
Bid <- ema$Bid
twentysell <- EMA(Bid, n=1200)
fortysell <- EMA(Bid, n=2400)
sigup <- ifelse(twentysell > fortysell, 1, 0)
sigdn <- ifelse(twentysell < fortysell, -1, 0)
specifyModel(Next(sigup)~lag(sigup,1) + Next(sigdn)~lag(sigdn,1), 1:31624)

After this last command, I get this error message:

Error in as.Date.default(x, origin = "1970-01-01") :
  do not know how to convert 'x' to class "Date"

I've thought it was a time series issue, but I have tried converting sigup and sigdn to time series using

sigup_ts <- ts(sigup)
sigdn_ts <- ts(sigdn)

But the error still comes up. Any help on this issue would be greatly appreciated. Thanks, Tyler Campbell tyler.campb...@tradeforecaster.com
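[Editor's note: the error is consistent with specifyModel finding no date index it can convert — plain vectors and ts objects carry no dates. A hedged sketch of supplying one via the xts package; the Time column name and its format are assumptions, not from the original post:

```r
library(quantmod)  # loads xts/zoo as well

# Hypothetical: the CSV is assumed to have a timestamp column named Time
times   <- as.POSIXct(ema$Time, format = "%Y-%m-%d %H:%M:%S")
signals <- xts(cbind(sigup = sigup, sigdn = sigdn), order.by = times)

# Columns referenced in the formula now carry a convertible time index
m <- specifyModel(Next(signals$sigup) ~ Lag(signals$sigdn, 1))
```

Note also that specifyModel takes a single formula; an expression with two `~` operators joined by `+`, as in the original post, is not a valid model formula.]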
Re: [R] print.trellis draw.in - plaintext (gmail mishap)
require(grid)
require(lattice)
fred <- data.frame(x=1:5, y=runif(5))
vplayout <- function(x,y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
p <- xyplot(y~x, fred)
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)

On Mon, Jul 12, 2010 at 8:58 AM, Felix Andrews fe...@nfrac.org wrote: Yes, please, reproducible code. On 10 July 2010 00:49, Mark Connolly wmcon...@ncsu.edu wrote: I am attempting to plot a trellis object on a grid.

vplayout <- function(x,y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
g1 <- ggplot() ...
g2 <- ggplot() ...
g3 <- ggplot() ...
p <- xyplot() ...
# works as expected
print(g1, vp=vplayout(1,1))
print(g2, vp=vplayout(1,2))
print(g3, vp=vplayout(2,1))
# does not work
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)
Error in grid.Call.graphics(L_downviewport, name$name, strict) :
  Viewport 'GRID.VP.112' was not found

What am I doing wrong? Thanks!

-- Felix Andrews / 安福立 http://www.neurofractal.org/felix/
Re: [R] How to mean, min lists and numbers
On Jul 12, 2010, at 11:19 AM, Duncan Murdoch wrote: On 12/07/2010 11:10 AM, g...@ucalgary.ca wrote: I would like to sum/mean/min a list of lists and numbers to return the related lists. [...] You need to be careful of terminology: c(1,1,0) is not a list, it's a vector. What you want is to apply functions componentwise to lists of vectors. One way to do that is to bind them into a matrix and use apply. For example: M <- cbind(-1, c(1,1,0), c(-1,10,-1)); apply(M, 1, mean)

As usual Duncan's understanding is better than mine. Just so you know, there are also row-oriented utility functions which are considerably faster when they are the correct solution:

?rowSums
?rowMeans

rowMeans(cbind(-1, c(1,1,0), c(-1,10,-1)))
[1] -0.333  3.333 -0.667
apply(cbind(-1, c(1,1,0), c(-1,10,-1)), 1, mean)
[1] -0.333  3.333 -0.667

... and there is a parallel version of the minimum function, pmin, that would have given you results for arguments that are just the vectors of varying length you were working with:

pmin(2, c(2,2,0), -1, c(-1,10,-1))
#[1] -1 -1 -1
#(Done with argument recycling.)

David Winsemius, MD West Hartford, CT
Re: [R] long to wide on larger data set
Juliet, I've been corrected off list. I did not read properly that you are on 64bit. The calculation should be: 53860858 * 4 * 8 / 1024^3 = 1.6GB, since pointers are 8 bytes on 64bit. Also, data.table is an add-on package so I should have included:

install.packages("data.table")
require(data.table)

data.table is available on all platforms, both 32bit and 64bit. Please forgive mistakes: 'someoone' should be 'someone', 'percieved' should be 'perceived' and 'testDate' should be 'testData' at the end. The rest still applies, and you might have a much easier time than I thought since you are on 64bit. I was working on the basis of squeezing into 32bit. Matthew

Matthew Dowle mdo...@mdowle.plus.com wrote in message news:i1faj2$lv...@dough.gmane.org...

Hi Juliet, Thanks for the info. It is very slow because of the == in

testData[testData$V2==one_ind,]

Why? Imagine someone looks for 10 people in the phone directory. Would they search the entire phone directory for the first person's phone number, starting on page 1, looking at every single name, even continuing to the end of the book after they had found them? Then would they start again from page 1 for the 2nd person, and then the 3rd, searching the entire phone directory from start to finish for each and every person? That code using == does that. Some of us call that a 'vector scan', and it is a common reason for R being perceived as slow. To do that more efficiently try this:

testData <- as.data.table(testData)
setkey(testData, V2)   # sorts data by V2
for (one_ind in mysamples) {
  one_sample <- testData[one_ind,]
  reshape(one_sample)
}

or just this:

testData <- as.data.table(testData)
setkey(testData, V2)
testData[, reshape(.SD, ...), by=V2]

That should solve the vector scanning problem, and get you on to the memory problems which will need to be tackled. Since the 4 columns are character, the object size should be roughly: 53860858 * 4 * 4 / 1024^3 = 0.8GB. That is more promising to work with in 32bit, so there is hope.
[ That 0.8GB ignores the (likely small) size of the unique strings in the global string hash (depending on your data). ] It's likely that the as.data.table() fails with out of memory. That is not data.table but unique. There is a change in unique.c in R 2.12 which makes unique more efficient, and since factor calls unique, it may be necessary to use R 2.12. If that still doesn't work, then there are several more tricks (and we will need further information), and there may be some tweaks needed to that code as I didn't test it, but I think it should be possible in 32bit using R 2.12. Is it an option to just keep it in long format and use a data.table?

testData[, somecomplexrfunction(onecolumn, anothercolumn), by=list(V2)]

Why do you need to reshape from long to wide? HTH, Matthew

Juliet Hannah juliet.han...@gmail.com wrote in message news:aanlktinyvgmrvdp0svc-fylgogn2ro0omnugqbxx_...@mail.gmail.com...

Hi Jim, Thanks for responding. Here is the info I should have included before. I should be able to access 4 GB.

str(myData)
'data.frame': 53860857 obs. of 4 variables:
 $ V1: chr "23" "26" "200047" "200050" ...
 $ V2: chr "cv0001" "cv0001" "cv0001" "cv0001" ...
 $ V3: chr "A" "A" "A" "B" ...
 $ V4: chr "B" "B" "A" "B" ...

sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

On Mon, Jul 12, 2010 at 7:54 AM, jim holtman jholt...@gmail.com wrote: What is the configuration you are running on (OS, memory, etc.)? What does your object consist of? Is it numeric, factors, etc.? Provide a 'str' of it. If it is numeric, then the size of the object is probably about 1.8GB.
Doing the long to wide you will probably need at least that much additional memory to hold the copy, if not more. This would be impossible on a 32-bit version of R. On Mon, Jul 12, 2010 at 1:25 AM, Juliet Hannah juliet.han...@gmail.com wrote: I have a data set that has 4 columns and 53860858 rows. I was able to read this into R with:

cc <- rep("character", 4)
myData <- read.table("myData.csv", header=FALSE, skip=1, colClasses=cc, nrow=53860858, sep=",")

I need to reshape this data from long to wide. On a small data set the following lines work. But on the real data set, it didn't finish even when I took a sample of two (rows in new data). I didn't receive an error; I just stopped it because it was taking too long. Any suggestions for improvements? Thanks.

# start example
# i have commented out the write.table statement below
testData <-
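[Editor's note: for the long-to-wide step itself, a minimal sketch on a toy version of this data, using data.table's dcast (available in current data.table releases, not in the 2010 version discussed in the thread; column roles are assumptions):

```r
library(data.table)
# Toy long data: V1 = marker id, V2 = sample id, V3 = value (assumed roles)
long <- data.table(V1 = c("23", "26", "23", "26"),
                   V2 = c("cv0001", "cv0001", "cv0002", "cv0002"),
                   V3 = c("A", "A", "B", "A"))
# One row per sample (V2), one column per marker (V1)
wide <- dcast(long, V2 ~ V1, value.var = "V3")
wide
```

dcast works by reference-friendly grouping internally, so it scales far better than row-at-a-time reshape on data of this size.]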
Re: [R] [R-pkgs] New package list for analyzing list survey experiments
I encourage all authors and maintainers of packages to use findFn in the sos package to search for other uses of a name they want to use. The findFn function searches for matches in the help pages of contributed packages, including all of CRAN plus some elsewhere. The grepFn function can identify help pages whose name contains a particular term. The R Journal from last December contains an article describing this: http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Graves~et~al.pdf. Hope this helps. Spencer Graves

On 7/12/2010 7:08 AM, Jeffrey J. Hallman wrote: I know nothing about your package, but "list" is a terrible name for it, as list is also the name of a data type in R.

-- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567
[R] Need help on date calculation
If you want to use the mondate package you will need to specify the day of the month. Dates March-2010 and May-2010 are ambiguous. I recommend you choose the last day of both months as representative days. Then those days will be an integer number of months apart.

a <- mondate("March 31, 2010", displayFormat="%B %d, %Y")
b <- mondate("May 31, 2010", displayFormat="%B %d, %Y")
print(a)
[1] March 31, 2010
print(b)
[1] May 31, 2010
b-a
[1] 2
attr(,"timeunits")
[1] "months"
c(b-a)  ## strip away the attribute
[1] 2

Technically speaking, since mondates are fundamentally numeric, the result of b-a is *numeric*, not *integer*, but it is as close to the integer 2 as an R *numeric* can be:

is.integer(c(b-a))
[1] FALSE
is.integer(2)
[1] FALSE
identical(c(b-a), 2)
[1] TRUE

Even easier, use the mondate.ymd function, which assumes the last day of the month if not provided, and you won't have to worry about the number of days in a month or leap years. You can also retain your Month-Year format when printed if that is a requirement:

a <- mondate.ymd(2010, 3, displayFormat="%B-%Y")
b <- mondate.ymd(2010, 5, displayFormat="%B-%Y")
print(a)
[1] March-2010
print(b)
[1] May-2010
b-a
[1] 2
attr(,"timeunits")
[1] "months"
identical(c(b-a), 2)
[1] TRUE

This works for any last-days-of-the-month because mondate represents them as numerics with zero fractional part. Hope that helps, Dan Murphy

=
Message: 26 Date: Sat, 10 Jul 2010 15:17:07 -0400 From: Gabor Grothendieck ggrothendi...@gmail.com To: Bogaso Christofer bogaso.christo...@gmail.com Cc: r-help@r-project.org Subject: Re: [R] Need help on date calculation

On Sat, Jul 10, 2010 at 3:34 PM, Bogaso Christofer bogaso.christo...@gmail.com wrote: Thanks Gabor for your input. However my question is, is your solution general for any value of a and b? #1 and #2 are general.
For #3 I think there is a possibility you might in general have to do the same tricks as #1 and #2, but I am not sure. I suggest you discuss it with the author of the mondate package.
[R] Extract Clusters from Biclust Object
Dear all, I share the problem Linda Garcia and Ram Kumar Basnet described; I have a biclust object containing several clusters. For drawing a heatmap, it is possible to specify the cluster to be plotted. However, I'd like to extract the clusters in this manner:

        Cond.1   Cond.2
Gene  - value  - value

just like drawHeatmap specifies each cluster. Is there a way to extract single clusters? E.g. like saying obj...@object3, meaning cluster no. 3 of my biclust object? Unfortunately, the answers I found in older posts couldn't help me out... Any help is strongly appreciated! Best regards, Christine
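[Editor's note: one route is via the logical membership slots of the Biclust class — a sketch, assuming res is the result of biclust() and data_matrix is the matrix it was run on (both names hypothetical):

```r
library(biclust)
# res@RowxNumber: rows x clusters logical matrix; res@NumberxCol: clusters x cols.
# Extract the members and values of cluster no. 3:
rows3 <- which(res@RowxNumber[, 3])      # genes in cluster 3
cols3 <- which(res@NumberxCol[3, ])      # conditions in cluster 3
cluster3 <- data_matrix[rows3, cols3]    # the same submatrix drawHeatmap plots
```

This yields a plain genes-by-conditions matrix of values for the chosen cluster.]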
[R] in continuation with the earlier R puzzle
When I just run a for loop it works. But if I am going to run a for loop every time for large vectors, I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated. Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- 'Raghu'
[R] a small puzzle?
I know the following may sound too basic, but I thought the mailing list is for the benefit of all levels of people. I ran a simple if statement on two numeric vectors (news1o and s2o) which are of equal length; I have done an str on both of them for your kind perusal below. I am trying to compare the numbers in both and initiate a new vector s as 1 or 0 depending on whether the elements in the vectors are greater or lesser than each other. When I do a simple s=(news1o>s2o) I get the values of s as a string of TRUEs and FALSEs, but when I try to override using the if statement this cribs. I get only one element in s, and that is a puzzle. Any ideas on this please? Many thanks.

> if(news1o>s2o)(s<-1) else
+ (s<- -1)
[1] -1
Warning message:
In if (news1o > s2o) (s <- 1) else (s <- -1) :
  the condition has length > 1 and only the first element will be used
> s
[1] -1
> length(s)
[1] 1
> str(news1o)
 num [1:3588] 891 890 890 888 886 ...
> str(s2o)
 num [1:3588] 895 892 890 888 885 ...

-- 'Raghu'
[R] How do I move axis labels closer to plot box?
Hi there, I place a vector of strings as labels at the tick points by using

axis(1, at=seq(0.1,0.7,by=0.1), labels=paste(seq(10,70,by=10), "%", sep=""), tick=FALSE)

However, there is a large space between those labels and the boundary of the plot box. I want to reduce this space so that the labels appear just next to the boundary of the plot box. How do I do that? Thanks. Best, Jia

-- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/
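[Editor's note: one way, not from a reply in this thread — axis() accepts the mgp graphical parameter, whose second element sets the distance of the labels from the axis (default c(3, 1, 0)):

```r
# Pull tick labels closer to the plot box by shrinking mgp[2]
plot(seq(0.1, 0.7, by = 0.1), 1:7, xaxt = "n")
axis(1, at = seq(0.1, 0.7, by = 0.1),
     labels = paste(seq(10, 70, by = 10), "%", sep = ""),
     tick = FALSE, mgp = c(3, 0.2, 0))
```

The line argument of axis() (negative values move labels inward) is an alternative knob for the same adjustment.]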
[R] How do I convert a XML file to a data.frame?
Hi, I have a problem converting an XML file, via the XML package, to a data.frame. The XML file looks like this:

<Transaction>
  <ID value="0044"/>
  <Var1 value="XYZ159"/>
  <Var2 value="_"/>
  <Var3 value="AMR1.0-INT-1005"/>
  <Var4 value="2010-05-25 10:44:16:673"/>
  <Var5 value="1"/>
  <Var6 value="0"/>
</Transaction>
<Transaction>
  <ID value="0046"/>
  <Var1 value="XBC254"/>
  <Var2 value="GLOBAL"/>
  <Var3 value="AMR2.0-INT-9997"/>
  <Var4 value="2010-05-25 11:22:50:803"/>
  <Var5 value="2"/>
  <Var6 value="0"/>
</Transaction>
<Transaction>
  <ID value="unknown"/>
  <Var1 value="HGF358"/>
  <Var2 value="REGION_A"/>
  <Var3 value="AMR2.5-INT-1154"/>
  <Var4 value="2010-05-24 10:08:26:711"/>
  <Var5 value="3"/>
  <Var6 value="0"/>
</Transaction>

I don't usually use XML files, but I have searched for an answer for quite a while. I have tried xmlToDataFrame but it demands a structure similar to this:

<top>
  <obs>
    <var1>value</var1>
    <var2>value</var2>
    <var3>value</var3>
  </obs>
  <obs>
    <var1>value</var1>
    <var2>value</var2>
    <var3>value</var3>
  </obs>
</top>

The top node <top> could in my case maybe be added to the XML file directly (or via an R command?), but the main issue is to use the children structure in my file (which is different from the one that can be used with xmlToDataFrame), <var1 value=""/>, to convert the XML file to a meaningful data.frame with both categorical and quantitative data. Any tips or tricks? They are highly appreciated. Thanks, Magnus
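[Editor's note: since the values live in attributes rather than text nodes, one approach is XPath plus xmlGetAttr from the XML package — a sketch; the file name is hypothetical, and the file is assumed to be wrapped in a single root element so it parses:

```r
library(XML)
doc   <- xmlParse("transactions.xml")  # hypothetical file, with a root element added
attrs <- c("ID", "Var1", "Var2", "Var3", "Var4", "Var5", "Var6")
# For each child tag, collect its value= attribute across all Transaction nodes
cols <- lapply(attrs, function(a)
  xpathSApply(doc, paste("//Transaction/", a, sep = ""), xmlGetAttr, "value"))
df <- data.frame(setNames(cols, attrs), stringsAsFactors = FALSE)
# Quantitative columns can then be converted, e.g. df$Var5 <- as.numeric(df$Var5)
```

Each xpathSApply call returns one character vector per variable, which data.frame binds column-wise into one row per Transaction.]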
Re: [R] in continuation with the earlier R puzzle
I don't know what is wrong with your code, but I believe you should use ifelse instead of a for loop:

s <- ifelse(news1o > s2o, 1, -1)

Alain

On 12-Jul-10 16:09, Raghu wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated. Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- Alain Guillet Statistician and Computer Scientist SMCS - IMMAQ - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50
Re: [R] a small puzzle?
You probably want to use ifelse:

s <- ifelse(news1o > s2o, 1, -1)

'if' only handles a single logical value.

On Mon, Jul 12, 2010 at 10:02 AM, Raghu r.raghura...@gmail.com wrote: I know the following may sound too basic but I thought the mailing list is for the benefit of all levels of people. I ran a simple if statement on two numeric vectors (news1o and s2o) which are of equal length. I have done an str on both of them for your kind perusal below. I am trying to compare the numbers in both and initiate a new vector s as 1 or 0 depending on if the elements in the arrays are greater or lesser than each other. When I do a simple s=(news1o>s2o) I get the values of s as a string of TRUEs and FALSEs, but when I try to override using the if statements this cribs. I get only one element in s and that is a puzzle. Any ideas on this please? Many thanks.

if(news1o>s2o)(s<-1) else (s<- -1)
[1] -1
Warning message:
In if (news1o > s2o) (s <- 1) else (s <- -1) :
  the condition has length > 1 and only the first element will be used
s
[1] -1
length(s)
[1] 1
str(news1o)
 num [1:3588] 891 890 890 888 886 ...
str(s2o)
 num [1:3588] 895 892 890 888 885 ...

-- 'Raghu'

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] in continuation with the earlier R puzzle
The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

You might give ifelse() a shot here.

s <- ifelse(news1o > s2o, 1, -1)

Learning to think in vectors is important in R, just like thinking in sets is important for SQL, or thinking in rows and steps is important in SAS.

cur
-- Curt Seeliger, Data Ranger Raytheon Information Services - Contractor to ORD seeliger.c...@epa.gov 541/754-4638
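[Editor's note: a sign()-based variant (my addition, not from the thread) is equivalent except on ties, where sign() returns 0 rather than -1:

```r
news1o <- c(891, 890, 890)
s2o    <- c(895, 892, 890)
ifelse(news1o > s2o, 1, -1)  # -1 -1 -1
sign(news1o - s2o)           # -1 -1  0   (note the tie in the last element)
```

Both are single vectorized expressions, so either avoids the element-by-element loop.]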
Re: [R] a small puzzle?
In an if statement, the condition must be a single logical value. In your example, news1o and s2o are vectors, so there is a warning saying the condition has length greater than one. If you don't send two messages about the same problem within two minutes, you can see what people answer you... For example, I advised you to use ifelse, which works on vectors.

Alain

On 12-Jul-10 16:02, Raghu wrote: I know the following may sound too basic but I thought the mailing list is for the benefit of all levels of people. I ran a simple if statement on two numeric vectors (news1o and s2o) which are of equal length. I have done an str on both of them for your kind perusal below. I am trying to compare the numbers in both and initiate a new vector s as 1 or 0 depending on if the elements in the arrays are greater or lesser than each other. When I do a simple s=(news1o>s2o) I get the values of s as a string of TRUEs and FALSEs, but when I try to override using the if statements this cribs. I get only one element in s and that is a puzzle. Any ideas on this please? Many thanks.

if(news1o>s2o)(s<-1) else (s<- -1)
[1] -1
Warning message:
In if (news1o > s2o) (s <- 1) else (s <- -1) :
  the condition has length > 1 and only the first element will be used
s
[1] -1
length(s)
[1] 1
str(news1o)
 num [1:3588] 891 890 890 888 886 ...
str(s2o)
 num [1:3588] 895 892 890 888 885 ...

-- Alain Guillet Statistician and Computer Scientist SMCS - IMMAQ - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50
Re: [R] Can anybody help me understand AIC and BIC and devise a new metric?
Hi, one comment: Claeskens and Hjort define AIC as 2*log L - 2*p for a model with likelihood L and p parameters; consequently, they look for models with *maximum* AIC in model selection and averaging. This differs from the vast majority of authors (and R), who define AIC as -2*log L + 2*p and search for the model with *minimum* AIC. Their definition of BIC is similarly the negative of the usual BIC. I would compare this to defining \pi as the base of the natural logarithm and e as the ratio of a circle's circumference to its diameter: of course, you can do perfectly valid mathematics with your own definitions, but it is a recipe for confusion. Anyone who only reads Claeskens and Hjort, fires up R and selects the model with the maximum AIC from the candidate models is in for some *nasty* surprises. Worse, as far as I can see, Claeskens and Hjort nowhere mention that they are using a definition that is diametrically opposed to what is (overwhelmingly) common, and they do not comment on this. However, Claeskens and Hjort managed to publish a book, which I have yet to do, so it is quite possible that there is a major flaw in my thinking. If so, I haven't found it yet, and I would be very grateful if somebody pointed out what I misunderstand. Otherwise, I would be *very* careful indeed about basing my analysis strategy on their book, although the rest of the content is very helpful indeed - you only need to remember where to switch signs and change maximize to minimize etc. For AIC and BIC novices, I would recommend going with Burnham & Anderson, which Kjetil cited below. Best, Stephan

Kjetil Halvorsen schrieb: You should have a look at: Model Selection and Model Averaging, Gerda Claeskens (K.U. Leuven) and Nils Lid Hjort (University of Oslo). Among other things this will explain that AIC and BIC really aim at different goals.
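[Editor's note: the sign convention is easy to check in R itself, which follows the minimize form. A quick verification of the definitions using only standard functions:

```r
# R's AIC is -2*logLik + 2*df; the preferred model has the *smaller* value
fit <- lm(dist ~ speed, data = cars)
ll  <- logLik(fit)
manual_aic <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
all.equal(manual_aic, AIC(fit))  # TRUE

# BIC replaces the 2*df penalty with log(n)*df
manual_bic <- -2 * as.numeric(ll) + log(nobs(fit)) * attr(ll, "df")
all.equal(manual_bic, BIC(fit))  # TRUE
```

Note that attr(ll, "df") for an lm fit counts the error variance as a parameter, so it is one more than the number of regression coefficients.]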
On Mon, Jul 5, 2010 at 4:20 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: On Mon, Jul 5, 2010 at 7:35 AM, LosemindL comtech@gmail.com wrote: Hi all, Could anybody please help me understand AIC and BIC, and especially why they make sense? Any good text that discusses model selection in detail will have some discussion of AIC and BIC. Frank Harrell's book 'Regression Modeling Strategies' comes immediately to mind, along with Hastie, Tibshirani and Friedman (Elements of Statistical Learning) and Burnham and Anderson's book (Model Selection and Multi-Model Inference), but there are many other worthy texts that cover the topic. The gist is that AIC and BIC penalize the log likelihood of a model by subtracting different functions of its number of parameters. David's suggestion of Wikipedia is also on target. Furthermore, I am trying to devise a new metric related to model selection in the financial asset management industry. As you know, the industry uses the Sharpe Ratio as the main performance benchmark, which is the annualized mean of returns divided by the annualized standard deviation of returns. I didn't know, but thank you for the information. Isn't this simply a signal-to-noise ratio quantified on an annual basis? In model selection, we would like to choose a model that yields the highest Sharpe Ratio. However, the more parameters you use, the higher the Sharpe Ratio you might potentially get, and the higher the risk that your model is overfitted. I am trying to think of an AIC or BIC version of the Sharpe Ratio that facilitates the model selection... You might be able to make some progress if you can express the (penalized) log likelihood as a function of the Sharpe ratio. But if you have several years of data in your model and the ratio is computed annually, then isn't it a random variable rather than a parameter? If so, it changes the nature of the problem, no?
(Being unfamiliar with the Sharpe ratio, I fully recognize that I may be completely off-base in this suggestion, but I'll put it out there anyway :) BTW, you might find the R-sig-finance list to be a more productive resource for this problem than R-help, due to the specialized nature of the question. HTH, Dennis Anybody could you please give me some pointers? Thanks a lot!
Re: [R] in continuation with the earlier R puzzle
On Jul 12, 2010, at 10:09 AM, Raghu wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

Perhaps:

s <- 2*( news1o > s2o[1:length(news1o)] ) - 1

...which I think will throw errors under pretty much the same conditions that would cause errors in that loop. -- David Winsemius, MD, West Hartford, CT
Re: [R] in continuation with the earlier R puzzle
On 12-Jul-10 14:09:30, Raghu wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- 'Raghu'

Many operations over the whole length of vectors can be done in vectorised form, in which an entire vector is changed in one operation based on the values of the separate elements of other vectors, likewise all taken into account in a single operation. What happens behind the scenes is that the element-by-element operations are performed by a function in a precompiled (usually C) library. Hence R already does what you are suggesting as a "might as well" alternative! Below is an example, using long vectors. The first case is a copy of your R loop above (with some additional initialisation of the vectors). The second achieves the same result in vectorised form.

news1o <- runif(10^6)
s2o    <- runif(10^6)
s <- numeric(length(news1o))
proc.time()
#   user  system elapsed
#  1.728   0.680 450.257

for(i in 1:length(news1o)){       ### Using a loop
  if(news1o[i] > s2o[i]) s[i] <- 1 else s[i] <- (-1)
}
proc.time()
#   user  system elapsed
# 11.184   0.756 460.340

s2 <- 2*(news1o > s2o) - 1        ### Vectorised
proc.time()
#   user  system elapsed
# 11.348   0.852 460.663

sum(s2 != s)
# [1] 0                           ### Results identical

Result: the loop took (11.184 - 1.728) = 9.456 seconds; vectorised, it took (11.348 - 11.184) = 0.164 seconds. Loop/Vector = 9.456/0.164 = 57.66, i.e. nearly 60 times as long. Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 12-Jul-10 Time: 17:36:07 -- XFMail --
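One more equivalent idiom, noting an edge case the thread does not discuss: sign() produces the same +1/-1 coding, except that ties map to 0 rather than -1 (the toy vectors below are made up):

```r
a <- c(1, 2, 3)
b <- c(3, 2, 1)

s_cmp  <- 2 * (a > b) - 1   # ties coded as -1, matching the loop
s_sign <- sign(a - b)       # ties coded as 0

s_cmp
# [1] -1 -1  1
s_sign
# [1] -1  0  1
```

Which coding is right depends on what the tied case should mean in the application.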
Re: [R] Compress string memCompress/Decompress
On Mon, Jul 12, 2010 at 9:17 AM, Erik Wright eswri...@wisc.edu wrote: Hi Seth, Can you recreate the example below using dbWriteTable? Not sure if that is possible with the current dbWriteTable code (I don't have time to explore that right now). You are welcome to poke around. You could wrap the example in a helper function to provide your own BLOB-respecting write-table function if you can't get dbWriteTable to work for your case. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/
Re: [R] Multiple Plotting Colors
Richardson, Patrick <Patrick.Richardson at vai.org> writes: I'm trying to use multiple plotting colors in my code. My first ifelse statement successfully does what I want. However, now I want anything less than -4.5 to be green and the rest black. I want another col argument but can only use one. How could I go about getting separate colors for anything above 4.5 and less than -4.5?

plot(three, type="h", col=ifelse(three > 4.5, "red", "black"), xlim=c(0,500),
     ylim=range(three), lwd=2, xlab="Chromosome", ylab="Z-Score",
     font.lab=2, font=2, main="Upregulated Genes in Patient Sample")

Thanks in advance, Patrick
[R] How to use mpi.allreduce() in Rmpi?
Hi everybody! I have the next code, which makes a reduction of the *a* variable on two slaves, using the Rmpi package.

library(Rmpi)
mpi.spawn.Rslaves(nslaves=2)
reduc <- function(){
  a <- mpi.comm.rank() + 2
  mpi.reduce(a, type=2, op="prod")
  return(paste("a =", a))
}
mpi.bcast.Robj2slave(reduc)
mpi.remote.exec(reduc())
cat("Product: ")
mpi.reduce(1, op="prod")
mpi.close.Rslaves()

I want to use the function mpi.allreduce() instead of mpi.reduce(), so that the slaves also receive the value of the reduction. I don't know how to do it; the available documentation is very small and I'm just starting with Rmpi. I also tried the next two changes, but nothing:

library(Rmpi)
mpi.spawn.Rslaves(nslaves=2)
reduc <- function(){
  a <- mpi.comm.rank() + 2
  mpi.reduce(a, type=2, op="prod")
  return(paste("a =", a))
}
mpi.bcast.Robj2slave(reduc)
mpi.remote.exec(reduc())
cat("Product: ")
mpi.allreduce(1, op="prod")
mpi.close.Rslaves()

and

library(Rmpi)
mpi.spawn.Rslaves(nslaves=2)
reduc <- function(){
  a <- mpi.comm.rank() + 2
  mpi.allreduce(a, type=2, op="prod")
  return(paste("a =", a))
}
mpi.bcast.Robj2slave(reduc)
mpi.remote.exec(reduc())
cat("Product: ")
mpi.allreduce(1, op="prod")
mpi.close.Rslaves()

Could somebody help me? Thanks a lot in advance for your help!
Re: [R] Multiple Plotting Colors
Hi, On Mon, Jul 12, 2010 at 12:02 PM, Richardson, Patrick patrick.richard...@vai.org wrote: I'm trying to use multiple plotting colors in my code. My first ifelse statement successfully does what I want. However, now I want anything less than -4.5 to be green and the rest black. I want another col argument but can only use one. How could I go about getting separate colors for anything above 4.5 and less than -4.5?

plot(three, type="h", col=ifelse(three > 4.5, "red", "black"), xlim=c(0,500),
     ylim=range(three), lwd=2, xlab="Chromosome", ylab="Z-Score",
     font.lab=2, font=2, main="Upregulated Genes in Patient Sample")

How about:

my.colors <- ifelse(three > 4.5, "red", "black")
my.colors[three < -4.5] <- 'green'
plot(three, type='h', col=my.colors, ...)

-- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Multiple Plotting Colors
One more thing: On Mon, Jul 12, 2010 at 12:55 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: [...] How about:

my.colors <- ifelse(three > 4.5, "red", "black")
my.colors[three < -4.5] <- 'green'
plot(three, type='h', col=my.colors, ...)

Depending on what you want the plot for, perhaps you might consider changing your color palette from green - black - red to something like blue - black - yellow, since many folks who are color can not differentiate green from red all that well. -- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] densities greater than 1 for values within a (0,1) interval
Hello, I used the command kdensity in order to calculate the density of fractions/ratios (e.g. number of long-term unemployed over total unemployment). Thus I try to calculate the density of values less than 1. However, the values of the kernel density R provided (y-scale) are all greater than 1. Where is the problem and how may I solve it? Does R have problems calculating distributions of variables within an interval of 0 and 1? Best, Katja
Re: [R] How do I move axis labels closer to plot box?
chen jia <chen_1002 at fisher.osu.edu> writes: Check out ?par, specifically the mgp parameter. HTH, Ken
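For reference, a minimal sketch of what mgp controls in base graphics: its three values set the margin lines for the axis title, the tick labels, and the axis line (default c(3, 1, 0)), so lowering the first two pulls titles and labels toward the plot box:

```r
# Default spacing: title on margin line 3, labels on line 1
plot(1:10, xlab = "Index", ylab = "Value")

# Tighter: title on line 2, tick labels on line 0.6
plot(1:10, xlab = "Index", ylab = "Value", mgp = c(2, 0.6, 0))
```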
Re: [R] Multiple Plotting Colors
Ugh: Depending on what you want the plot for, perhaps you might consider changing your color palette from green - black - red to something like blue - black - yellow, since many folks who are color can not differentiate green from red all that well. ... folks who are color *blind* can not differentiate green from red ... -- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Multiple Plotting Colors
Steve, That worked perfectly. Thank you! Best regards, Patrick

-----Original Message----- From: Steve Lianoglou [mailto:mailinglist.honey...@gmail.com] Sent: Monday, July 12, 2010 12:55 PM To: Richardson, Patrick Cc: r-help@r-project.org Subject: Re: [R] Multiple Plotting Colors

[...] How about:

my.colors <- ifelse(three > 4.5, "red", "black")
my.colors[three < -4.5] <- 'green'
plot(three, type='h', col=my.colors, ...)

-- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] densities greater than 1 for values within a (0, 1) interval
There is no constraint on the magnitude of probability density values, though the area under the curve must equal one. You may be thinking of cumulative probability distributions? If so, take a look at smoothed.df() in library(cwhmisc). Katja Hillmann katja.hillm...@wiso.uni-hamburg.de wrote: [...] --- Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us, Research Engineer (Solar/Batteries with /Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity.
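A small demonstration that densities above 1 are perfectly normal for data confined to (0,1), as long as the area under the curve still integrates to 1 (the Beta sample below is an arbitrary illustration, not the poster's unemployment data):

```r
set.seed(42)
x <- rbeta(10000, 2, 20)   # values in (0,1), concentrated near 0.1
d <- density(x)

max(d$y)                   # well above 1: not an error

# The area under the estimated curve is still approximately 1
area <- sum(d$y) * diff(d$x[1:2])
area
```

Since the data are squeezed into a narrow sub-interval, the density must exceed 1 somewhere for the total area to reach 1.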
[R] Calculating the Weibull distribution's mean, standard deviation, and variance
Dear R community: Sorry if this question has a simple answer, but I am a new user of R. Do you know a command or package that can estimate the Weibull distribution's mean, standard deviation and variance, or can you direct me to where to find it? Thanks in advance, Oscar Rodriguez Gonzalez, Mobile: 519.823.3409, PhD Student, Canadian Research Institute for Food Safety
Re: [R] Compress string memCompress/Decompress
Hi Seth, Can you recreate the example below using dbWriteTable? Thanks!, Erik On Jul 11, 2010, at 6:13 PM, Seth Falcon wrote: On Sun, Jul 11, 2010 at 11:31 AM, Matt Shotwell shotw...@musc.edu wrote: On Fri, 2010-07-09 at 20:02 -0400, Erik Wright wrote: Hi Matt, This works great, thanks! At first I got an error message saying BLOB is not implemented in RSQLite. When I updated to the latest version it worked. SQLite began to support BLOBs from version 3.0, and RSQLite began supporting BLOBs only just recently :-) See the NEWS file for details. Below is a minimal example of how you might use BLOBs:

db <- dbConnect(SQLite(), dbname = ":memory:")
dbGetQuery(db, "CREATE TABLE t1 (name TEXT, data BLOB)")
z <- paste("hello", 1:10)
df <- data.frame(a = letters[1:10], z = I(lapply(z, charToRaw)))
dbGetPreparedQuery(db, "insert into t1 values (:a, :z)", df)
a <- dbGetQuery(db, "select name from t1")
checkEquals(10, nrow(a))
a <- dbGetQuery(db, "select data from t1")
checkEquals(10, nrow(a))
a <- dbGetQuery(db, "select * from t1")
checkEquals(10, nrow(a))
checkEquals(2, ncol(a))
checkEquals(z, sapply(a$data, rawToChar))
dbDisconnect(db)

-- Seth Falcon | @sfalcon | http://userprimary.net/
Re: [R] in continuation with the earlier R puzzle
Using Ted Harding's example:

news1o <- runif(10^6)
s2o    <- runif(10^6)
pt1 <- proc.time()
s <- numeric(length(news1o)) - 1   # Set all of s to -1
s[news1o > s2o] <- 1               # Change to 1 only those values of s
                                   # for which news1o > s2o
pt2 <- proc.time()
pt2 - pt1
# Takes even less time...
#   user  system elapsed
#   0.04    0.00    0.05

Please note: I will be out of the office and out of email contact from 7/11-7/25/2010. Manuela Huso, Consulting Statistician, 201H Richardson Hall, Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR 97331, ph: 541-737-6232, fx: 541-737-1393

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ted Harding Sent: Monday, July 12, 2010 9:36 AM To: r-help@r-project.org Cc: Raghu Subject: Re: [R] in continuation with the earlier R puzzle [...]
Re: [R] in continuation with the earlier R puzzle
Thanks to you all. I stand corrected, Ted and Manuela :) I am just an end user and trying to pick up from such forums. Many thanks, sirs. On Mon, Jul 12, 2010 at 5:45 PM, Huso, Manuela manuela.h...@oregonstate.edu wrote: Using Ted Harding's example: [...] -- 'Raghu'
[R] Calculate confidence interval of the mean based on ANOVA
I am trying to recreate an analysis that has been done by another group (in SAS, I believe). I'm stuck on one part, I think because my stats knowledge is lacking, and while it's OT, I'm hoping someone here can help. Given this dataframe:

foo <- structure(list(OBS = structure(1:18, .Label = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52", "53", "54"), class = "factor"), NOM =
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("0.05", "0.1", "1"), class = "factor"),
RUN = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 5L, 6L, 6L, 6L), .Label = c("1", "2", "3", "4", "5", "6"),
class = "factor"), CALC = c(0.04989, 0.04872, 0.04544, 0.05645,
0.06516, 0.0622, 0.04868, 0.05006, 0.04746, 0.05574, 0.04442,
0.04742, 0.05508, 0.0593, 0.04898, 0.06373, 0.05537, 0.04674)),
.Names = c("OBS", "NOM", "RUN", "CALC"), row.names = c(NA, 18L),
class = "data.frame")

I want to perform an ANOVA on CALC~RUN and, based on that, calculate the 95% confidence interval. However, the interval produced by the earlier analysis is [0.04741, 0.05824]. Is there some way to calculate a confidence interval based on an ANOVA that I'm completely missing?

> nrow(foo)
[1] 18
> mean(foo$CALC)
[1] 0.05282444
> fooaov <- aov(CALC~RUN, data=foo)
> print(fooaov)
Call: aov(formula = CALC ~ RUN, data = foo)
Terms:
                         RUN    Residuals
Sum of Squares  0.0003991420 0.0003202277
Deg. of Freedom            5           12
Residual standard error: 0.005165814
Estimated effects may be unbalanced
> print(summary(fooaov))
            Df     Sum Sq    Mean Sq F value  Pr(>F)
RUN          5 0.00039914 7.9828e-05  2.9914 0.05565 .
Residuals   12 0.00032023 2.6686e-05
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.tables(fooaov, type="means", se=TRUE)
Tables of means
Grand mean 0.05282444
RUN
      1       2       3       4       5       6
0.04802 0.06127 0.04873 0.04919 0.05445 0.05528
Standard errors for differences of means
         RUN
    0.004218
replic.    3
> t.test(foo$CALC, conf.level=0.95)
One Sample t-test
data: foo$CALC
t = 34.4524, df = 17, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.04958955 0.05605934
sample estimates:
mean of x 0.05282444

Thanks, Paul.
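One interval that does reproduce the reported [0.04741, 0.05824] treats RUN as a random effect: in a balanced one-way design, the standard error of the grand mean is estimated by sqrt(MS(RUN)/N), with the between-run degrees of freedom for the t quantile. A sketch (hedged: this is a reconstruction from the numbers above, not a documented description of the other group's SAS code):

```r
# Values taken from the aov() output above
grand.mean <- 0.05282444
ms.run     <- 7.9828e-05   # Mean Sq for RUN
N          <- 18           # total observations
df.run     <- 5            # RUN degrees of freedom

se <- sqrt(ms.run / N)
ci <- grand.mean + c(-1, 1) * qt(0.975, df.run) * se
round(ci, 5)
# [1] 0.04741 0.05824
```

Note this is wider per unit of standard error than the t.test() interval because it uses only 5 degrees of freedom and the between-run mean square, reflecting run-to-run variability.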
[R] How to create sequence in month
Hi all, can anyone please guide me on how to create a sequence of months? Here is what I tried, without success:

library(zoo)
seq(as.yearmon("2010-01-01"), as.yearmon("2010-03-01"), by="1 month")
Error in del/by : non-numeric argument to binary operator

What is the correct way to do that? Thanks for your time.
Re: [R] How to create sequence in month
As in this example:

seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by="mon")

On 7/12/10 11:25 AM, Bogaso Christofer bogaso.christo...@gmail.com wrote: [...] -- Don MacQueen, Environmental Protection Department, Lawrence Livermore National Laboratory, 925 423-1062
Re: [R] Calculating the Weibull distribution's mean, standard deviation, and variance
Try fitdistr() in pkg MASS. -Peter Ehlers On 2010-07-12 11:17, Oscar Rodriguez wrote: [...]
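Once shape and scale are estimated (e.g. via fitdistr), the moments follow from closed forms: mean = scale * gamma(1 + 1/shape) and var = scale^2 * (gamma(1 + 2/shape) - gamma(1 + 1/shape)^2). A small helper illustrating this (the function name is mine, not from any package):

```r
weibull_moments <- function(shape, scale) {
  m <- scale * gamma(1 + 1/shape)
  v <- scale^2 * (gamma(1 + 2/shape) - gamma(1 + 1/shape)^2)
  c(mean = m, var = v, sd = sqrt(v))
}

# Sanity check: shape = 1 reduces to an exponential with
# mean = scale and var = scale^2
weibull_moments(shape = 1, scale = 2)
# mean = 2, var = 4, sd = 2
```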
Re: [R] eliminating constant variables
What was the question and answer here?

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of pdb Sent: Sunday, July 11, 2010 5:23 AM To: r-help@r-project.org Subject: Re: [R] eliminating constant variables

Awesome! It made sense once I realised SD = standard deviation! pdb
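For the archive, the idea the SD remark points at (dropping columns whose standard deviation is zero, i.e. constant variables) can be sketched as follows; the data frame is illustrative, not from the original thread:

```r
df <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c(2, 4, 6))

# Keep only columns whose standard deviation is non-zero
keep <- sapply(df, function(x) sd(x) > 0)
df[, keep, drop = FALSE]
#   a c
# 1 1 2
# 2 2 4
# 3 3 6
```

For non-numeric columns, a check like length(unique(x)) > 1 is the analogous test.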
Re: [R] How to create sequence in month
On 12.07.2010 20:25, Bogaso Christofer wrote: library(zoo) seq(as.yearmon("2010-01-01"), as.yearmon("2010-03-01"), by="1 month") Try instead: seq(as.Date("2010-01-01"), as.Date("2010-03-01"), by="1 month") hth Stefan
[R] Error in storage.mode(test) <- "logical"
Hi There, I get the following error from the code pasted below: Error in storage.mode(test) <- "logical" : object 'HGBmt12_Natl_Ave_or_Facility' not found

library(RODBC)
library(car)
setwd("c://temp//cms")
a07.connect <- odbcConnectAccess2007("DFC.accdb")
sqlTables(a07.connect)  ## provides list of tables ##
dataset <- sqlFetch(a07.connect, 'Analysis File 2007-2009')  # puts dfc data into table mydata
str(dataset)  # this works and gives correct values
HGlt102009 = dataset[,6]
HGBL10_F_2007 = dataset[,11]
HGmt122009 = dataset[,7]
HGBL12_F_2007 = dataset[,16]
URRmt65Perc2009 = dataset[,3]
URRG65_F_2007 = dataset[,22]
yes1 = HGlt102009 - HGBL10_F_2007
no1 = HGlt102009 - 2
yes2 = HGmt122009 - HGBL12_F_2007
no2 = HGmt122009 - 26
yes3 = URRG65_F_2007 - URRmt65Perc2009
no3 = 96 - URRmt65Perc2009
Analysis2009 <- transform(dataset
  , HGBlt10_Natl_Ave_or_Facility = recode(HGBL10_F_2007, "0:2='National'; else='Facility'")
  , HGBmt12_Natl_Ave_or_Facility = recode(HGBL12_F_2007, "0:26='National'; else='Facility'")
  , URRmt65_Natl_Ave_or_Facility = recode(URRG65_F_2007, "96:100='National'; else='Facility'")
  , HGlt10RawPerc = ifelse(HGBlt10_Natl_Ave_or_Facility == "Facility", yes1, no1)
  , HGmt12RawPerc = ifelse(HGBmt12_Natl_Ave_or_Facility == "Facility", yes2, no2)
  , URRmt65Perc = ifelse(URRmt65_Natl_Ave_or_Facility == "Facility", yes3, no3)
  , HGBlt10Points = recode(HGlt10RawPerc, "-1001:0=10; 0:1=8; 1:2=6; 2:3=4; 3:4=2; else=0")
  , HGBlt12Points = recode(HGlt12RawPerc, "-1001:2=6; 2:3=4; 3:4=2; else=0")
  , URRmt65Points = recode(HGlt10RawPerc, "-1001:0=10; 0:1=8; 1:2=6; 2:3=4; 3:4=2; else=0")
)

Any ideas on what it means and why? I'd really appreciate it!! Thank you!! Sincerely, tom
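Two things worth checking, offered as guesses since the data aren't available: car::recode() expects its recode specification as a single quoted string (e.g. recode(x, "0:2='National'; else='Facility'")), and base R's transform() cannot see columns created earlier in the same transform() call, which would explain why 'HGBmt12_Natl_Ave_or_Facility' is not found when a later argument refers to it. For the recoding step itself, a base-R stand-in that avoids car entirely (illustrative values only, not the real DFC data):

```r
# Base-R stand-in for car::recode(x, "0:2='National'; else='Facility'"):
# values in [0, 2] map to "National", everything else to "Facility".
HGBL10_F_2007 <- c(0, 1.5, 2, 3, 10)  # hypothetical sample values
lab <- ifelse(HGBL10_F_2007 >= 0 & HGBL10_F_2007 <= 2, "National", "Facility")
lab
```

Building the recoded columns one at a time (plain assignments rather than one big transform() call) also sidesteps the visibility problem.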
Re: [R] in continuation with the earlier R puzzle
I wanted to point out one thing that Ted said, about initializing the vectors ('s' in your example). This can make a dramatic speed difference if you are using a for loop (the difference is negligible with vectorized computations). Also, a lot of benchmarks have been flying around, each from a different system and using random numbers without identical seeds. So to provide an overall comparison of all the methods I saw here, plus demonstrate the speed difference from initializing a vector (if you know its desired length in advance), I ran these benchmarks. Notes: I did not want to interfere with your objects so I used different names. The equivalencies are: news1o = x; s2o = y; s = z. system.time() automatically calculates the time difference from proc.time() between start and finish.

##R version info
sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-mingw32
#snipped

##Some Sample Data
set.seed(10)
x <- rnorm(10^6)
set.seed(15)
y <- rnorm(10^6)

##Benchmark 1
z.1 <- NULL
system.time(for(i in 1:length(x)) {
  if(x[i] > y[i]) {
    z.1[i] <- 1
  } else {
    z.1[i] <- -1}
  }
)
   user  system elapsed
1303.83  174.24 1483.74

##Benchmark 2
#initialize 'z' at length
z.2 <- vector("numeric", length = 10^6)
system.time(for(i in 1:length(x)) {
  if(x[i] > y[i]) {
    z.2[i] <- 1
  } else {
    z.2[i] <- -1}
  }
)
   user  system elapsed
   3.77    0.00    3.77

##Benchmark 3
z.3 <- NULL
system.time(z.3 <- ifelse(x > y, 1, -1))
   user  system elapsed
   0.38    0.00    0.38

##Benchmark 4
z.4 <- vector("numeric", length = 10^6)
system.time(z.4 <- ifelse(x > y, 1, -1))
   user  system elapsed
   0.31    0.00    0.31

##Benchmark 5
system.time(z.5 <- 2*(x > y) - 1)
   user  system elapsed
   0.01    0.00    0.01

##Benchmark 6
system.time(z.6 <- numeric(length(x)) - 1)
   user  system elapsed
      0       0       0
system.time(z.6[x > y] <- 1)
   user  system elapsed
   0.03    0.00    0.03

##Show that all results are identical
identical(z.1, z.2)
[1] TRUE
identical(z.1, z.3)
[1] TRUE
identical(z.1, z.4)
[1] TRUE
identical(z.1, z.5)
[1] TRUE
identical(z.1, z.6)
[1] TRUE

I have not replicated these on other systems, but tentatively, it appears that loops are significantly slower than ifelse(), which in turn is slower than options 5 and 6. However, when using the same test data and the same system, I did not find an appreciable difference between options 5 and 6 speed-wise. Cheers, Josh

On Mon, Jul 12, 2010 at 7:09 AM, Raghu r.raghura...@gmail.com wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- 'Raghu' -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
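To make the coercion trick behind the fastest variants concrete: TRUE/FALSE coerce to 1/0 in arithmetic, so 2*(x > y) - 1 reproduces the loop's +1/-1 result exactly. A small self-contained check (much smaller n than the benchmarks above):

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)

# Loop version with a preallocated result vector
z_loop <- numeric(length(x))
for (i in seq_along(x)) z_loop[i] <- if (x[i] > y[i]) 1 else -1

# Vectorized: (x > y) is logical, coerced to 1/0, so 2*(x > y) - 1 is +1/-1
z_vec <- 2 * (x > y) - 1

identical(z_loop, z_vec)
# [1] TRUE
```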
[R] exercise in frustration: applying a function to subsamples
From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply, would allow me to have a really short R script to analyze all my data (I have reduced it to a couple hundred thousand records with about half a dozen fields). I get the same result from ddply and split/lapply:

ddply(moreinfo, c("m_id", "sale_year", "sale_week"),
  function(df) data.frame(res = fitdist(df$elapsed_time, "exp"), est = res$estimate, sd = res$sd))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1

and

lapply(split(moreinfo, list(moreinfo$m_id, moreinfo$sale_year, moreinfo$sale_week)),
  function(df) fitdist(df$elapsed_time, "exp"))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1

Now, in retrospect, unless I misunderstood the properties of a data.frame, I suppose a data.frame might not have been entirely appropriate as the m_id samples start and end on very different dates, but I would have thought a list data structure should have been able to handle that. It would seem that split is making groups that have the same start and end dates (or that if, for example, I have sale data for precisely the last year, split would insist on both 2009 and 2010 having weeks from 0 through 52 instead of just the weeks in each year that actually have data: 26 through 52 for last year and 1 through 25 for this year). I don't see how else the data passed to fitdist could have a sample size of 0. I'd appreciate understanding how to resolve this. However, it isn't a show stopper as it now seems trivial to just break it out into a loop (followed by a lapply/split combo using only sale year and sale month). While I am asking, is there a better way to split such temporally ordered data into weekly samples that respects the year in which the sample is taken as well as the week in which it is taken?
Thanks Ted
[R] Subsetting Lists
I am looking for a way to create a vector which contains the second element of every vector in the list. However, not every vector has two components, so I need to generate an NA for those missing. For example, I have created the following list:

lst <- list(c("a", "b"), c("c"), c("d", "e"), c("f", "g"))
lst
[[1]]
[1] "a" "b"
[[2]]
[1] "c"
[[3]]
[1] "d" "e"
[[4]]
[1] "f" "g"

I would like the output to be the following:

output
[1] "b" NA  "e" "g"

I know I can accomplish this using a for loop, but I am wondering if there's a simple and neat way of getting this done in one step? Thanks in advance. - Andrew
Re: [R] Subsetting Lists
Hi Andrew, Try sapply(lst, "[", 2) HTH, Jorge On Mon, Jul 12, 2010 at 3:12 PM, Andrew Leeser wrote: I am looking for a way to create a vector which contains the second element of every vector in the list. However, not every vector has two components, so I need to generate an NA for those missing. [...]
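For anyone puzzled by why this one-liner works: "[" is an ordinary function in R, so sapply() can call it on each list element with the extra argument 2, and indexing past the end of a vector returns NA. A quick check with Andrew's list:

```r
lst <- list(c("a", "b"), c("c"), c("d", "e"), c("f", "g"))

# "[" is a function, so sapply calls lst[[i]][2] for each i;
# lst[[2]][2] is out of range and therefore yields NA.
output <- sapply(lst, "[", 2)
output
# [1] "b" NA  "e" "g"
```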
Re: [R] exercise in frustration: applying a function to subsamples
Your code is not reproducible. Can you come up with a small example showing the crux of your data structures/problem, that we can all run in our R sessions? You're likely to get much higher quality responses this way. Ted Byers wrote: From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply, would allow me to have a really short R script to analyze all my data [...]
[R] What is the degrees of freedom in an nlme model
Dear all, I want to do an F test, which involves calculation of the degrees of freedom for the residuals. Now say I have an nlme object, mod.nlme. I have two questions: 1. How do I extract the degrees of freedom? 2. How is this degrees of freedom calculated in an nlme model? Thanks. Jun Shen

Some sample code and data =

mod.nlme <- nlme(RESP~E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma), data=Data,
  fixed=E0+Emax+gamma+EC50~1,
  random=list(pdDiag(EC50+E0+gamma~1)),
  groups=~ID,
  start=list(fixed=c(E0=1,Emax=100,gamma=1,EC50=50))
)

The Data object

structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), CP = c(1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90), RESP = c(3.19, 2.52, 2.89, 3.28, 3.82, 7.15, 11.2, 16.25, 30.32, 55.25, 73.56, 82.07, 89.08, 95.86, 97.97, 99.03, 3.49, 4.4, 3.54, 4.99, 3.81, 10.12, 21.59, 24.93, 40.18, 61.01, 78.65, 88.81, 93.1, 94.61, 98.83, 97.86, 0.42, 0, 2.58, 5.67, 3.64, 8.01, 12.75, 13.27, 24.65, 46.1, 65.16, 77.74, 87.99, 94.4, 96.05, 100.4, 2.43, 0, 6.32, 5.59, 8.48, 12.32, 26.4, 28.36, 43.38, 69.56, 82.53, 91.36, 95.37,
98.36, 98.66, 98.8, 5.16, 2, 5.65, 3.48, 5.78, 5.5, 11.55, 8.53, 18.02, 38.11, 58.93, 70.93, 85.62, 89.53, 96.19, 96.19, 2.76, 2.99, 3.75, 3.02, 5.44, 3.08, 8.31, 10.85, 13.79, 32.06, 50.22, 63.7, 81.34, 89.59, 93.06, 92.47, 3.32, 1.14, 2.43, 2.75, 3.02, 5.4, 8.49, 7.91, 15.17, 35.01, 53.91, 68.51, 83.12, 86.85, 92.17, 95.72, 3.58, 0.02, 3.69, 4.34, 6.32, 5.15, 9.7, 11.39, 23.38, 42.9, 61.91, 71.82, 87.83)), .Names = c("ID", "CP", "RESP"), class = "data.frame", row.names = c(NA, -125L))
[R] Robust regression error: Too many singular resamples
Hello. I've got a dataset that may have outliers in both x and y. While I am not at all familiar with robust regression, it looked like the function lmrob in package robustbase should handle this situation. When I try to use it, I get: Too many singular resamples. Aborting fast_s_w_mem() Looking into it further, it appears that for an indicator variable in one of my interaction terms, 98% of the data have value 1 and only 2% have value 0. I believe this is the cause of the problem, but am confused as to why the algorithm cannot handle this situation. The probability of actually getting a singular sample ought to be fairly low, unless the sample sizes are fairly tiny. Is there some parameter I can tweak to increase the sample size, or is something else going on? You can easily reproduce this by running the following. Any advice would be appreciated. Thank you.

library(robustbase)
x <- rnorm(10000)
isZ <- c(rep(1,9800), rep(0,200))
y <- rnorm(10000)
model <- lmrob(y ~ x*isZ)
Re: [R] ggplot2: How to change font of labels in geom_text
I have the same problem and I wonder if there is any answer from the community. Thanks.
Re: [R] How to create sequence in month
On Mon, Jul 12, 2010 at 2:25 PM, Bogaso Christofer bogaso.christo...@gmail.com wrote: Hi all, can anyone please guide me how to create a sequence of months? Here I have tried the following but couldn't get it to work: library(zoo) seq(as.yearmon("2010-01-01"), as.yearmon("2010-03-01"), by="1 month")

There currently is no seq method (we will make a note to add one) for yearmon but you can add the appropriate sequence to the starting yearmon object, use zooreg, or convert to numeric or Date, perform the seq and then convert back:

# yearmon + seq
as.yearmon("2010-01-01") + 0:2/12
as.yearmon("2010-01") + 0:2/12

# zooreg
time(zooreg(1:3, as.yearmon("2010-01"), freq = 12))

# seq.default
as.yearmon(seq(as.numeric(as.yearmon("2010-01")), as.numeric(as.yearmon("2010-03")), 1/12))

# seq.Date
as.yearmon(seq(as.Date("2010-01-01"), as.Date("2010-03-01"), by = "month"))

Also note that if the reason you are doing this is to create a zooreg object then it's not necessary to explicitly form the sequence in the first place, since zooreg only requires the starting time and a frequency, as shown above.
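For completeness, the missing quotes are what tripped up the original attempt; base R alone can already build a month sequence via the Date method of seq(), no zoo required:

```r
# First-of-month sequence in base R; note the quoted dates and by string
months <- seq(as.Date("2010-01-01"), as.Date("2010-03-01"), by = "1 month")
format(months, "%Y-%m")
# [1] "2010-01" "2010-02" "2010-03"
```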
Re: [R] exercise in frustration: applying a function to subsamples
try 'drop=TRUE' on the split function call. This will prevent the NULL set from being sent to the function. On Mon, Jul 12, 2010 at 3:10 PM, Ted Byers r.ted.by...@gmail.com wrote: From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply, would allow me to have a really short R script to analyze all my data [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
[R] ed50
I am using a semiparametric model:

library(mgcv)
sm1 <- gam(y ~ x1 + s(x2), family=binomial, data=f)

How should I find the standard error for ED50 for the above model?

ED50 = (-sm1$coef[1] - f(x2)) / sm1$coef[2]

f(x2) is the estimated value for the nonparametric term. Thanks
Re: [R] exercise in frustration: applying a function to subsamples
OK, here is a stripped down variant of my code. I can run it here unchanged (apart from the credentials for connecting to my DB).

Sys.setenv(MYSQL_HOME='C:/Program Files/MySQL/MySQL Server 5.0')
library(TSMySQL)
library(plyr)
library(fitdistrplus)
con <- dbConnect(MySQL(), user="rejbyers", password="jesakos", dbname="merchants2")
x <- sprintf("SELECT m_id, sale_date, YEAR(sale_date) AS sale_year, WEEK(sale_date) AS sale_week, return_type, 0.0001 + DATEDIFF(return_date,sale_date) AS elapsed_time FROM `risk_input` WHERE DATEDIFF(return_date,sale_date) IS NOT NULL")
x
moreinfo <- dbGetQuery(con, x)
str(moreinfo)
#moreinfo
#print(moreinfo)
dbDisconnect(con)
f1 <- fitdist(moreinfo$elapsed_time, "exp")
summary(f1)
lapply(split(moreinfo, list(moreinfo$m_id, moreinfo$sale_year, moreinfo$sale_week), drop = TRUE),
  function(df) fitdist(df$elapsed_time, "exp"))

I guess that for others to run this script, it is just necessary to create some sample data, consisting of two or more m_id values (I have several hundred), and temporally ordered data for each. I am not familiar enough with R to know how to do that using R. Usually, if I need dummy data, I make it with my favourite rng using either C++ or Perl. I am still trying to get used to R. Each record in my data has one random variate and a MySQL TIMESTAMP (nn-nn- nn:nn:nn), anywhere from hundreds to thousands each week for anywhere from a few months to several years. My SQL actually produces the random variate by taking the difference between the sale date and return date, and is structured as it is because I know how to group by year and week from a timestamp field using SQL but didn't know how to accomplish the same thing in R. The statement 'x' by itself always shows me the correct SQL statement to get the data (I can execute it unchanged in the mysql commandline client). 'str(moreinfo)' always gives me the data structure I expect. E.g.:

str(moreinfo)
'data.frame': 177837 obs. of 6 variables:
 $ m_id        : num 171 206 206 206 206 206 206 218 224 224 ...
 $ sale_date   : chr "2008-04-25 07:41:09" "2008-05-09 20:58:12" "2008-09-06 19:51:52" "2008-05-01 21:26:40" ...
 $ sale_year   : int 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
 $ sale_week   : int 16 18 35 17 31 21 19 52 44 35 ...
 $ return_type : num 1 1 1 1 1 1 1 1 1 1 ...
 $ elapsed_time: num 0.0001 0.0001 3.0001 4.0001 21.0001 ...

'summary(f1)' shows me the results I expect from the aggregate data. E.g.:

summary(f1)
FITTING OF THE DISTRIBUTION 'exp' BY MAXIMUM LIKELIHOOD
PARAMETERS
     estimate  Std. Error
rate 0.0652917 0.0001547907
Loglikelihood: -663134.7  AIC: 1326271  BIC: 1326281
-- GOODNESS-OF-FIT STATISTICS
_ Chi-squared _
Chi-squared statistic: 400277239
Degree of freedom of the Chi-squared distribution: 56
Chi-squared p-value: 0
!!! the p-value may be wrong with some theoretical counts < 5 !!!
!!! For continuous distributions, Kolmogorov-Smirnov and Anderson-Darling statistics should be preferred !!!
_ Kolmogorov-Smirnov _
Kolmogorov-Smirnov statistic: 0.1660987
Kolmogorov-Smirnov test: rejected
!!! The result of this test may be too conservative as it assumes that the distribution parameters are known !!!
_ Anderson-Darling _
Anderson-Darling statistic: Inf
Anderson-Darling test: rejected

And at the end, I get the error I mentioned. NB: In this variant, I added drop = TRUE as Jim suggested.

lapply(split(all_samples, list(all_samples$m_id, all_samples$sale_year, all_samples$sale_week), drop = TRUE),
  function(df) fitdist(df$elapsed_time, "exp"))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1

If, then, drop = TRUE results in all empty combinations of m_id, year and week being excluded, then (noticing the requirement is actually that the sample size be greater than 1), I can only conclude that at least one of the samples has only 1 record. But that is too small. Is there a way to allow the above code to apply fitdist only if the sample size of a given subsample is greater than, say, 100? Even better, is there a way to make the split more dynamic, so that it groups a given m_id's data by month if the average weekly subsample size is less than 100, or by day if the average weekly subsample is greater than 1000?

Thanks Ted

On Mon, Jul 12, 2010 at 3:20 PM, Erik Iverson er...@ccbr.umn.edu wrote: Your code is not reproducible. Can you come up with a small example showing the crux of your data structures/problem, that we can all run in our R sessions? You're likely to get much higher quality responses this way. [...]
Re: [R] exercise in frustration: applying a function to subsamples
Thanks Jim, I acted on your suggestion and found the result unchanged. :-( Then I noticed that fitdist doesn't like a sample size of 1 either. If, then, drop = TRUE results in all empty combinations of m_id, year and week being excluded, then (noticing the requirement is actually that the sample size be greater than 1), I can only conclude that at least one of the samples has only 1 record. I hadn't realized that some of the subsamples were that small. In my reply to Erik, I wrote: But that is too small. Is there a way to allow the above code to apply fitdist only if the sample size of a given subsample is greater than, say, 100? Even better, is there a way to make the split more dynamic, so that it groups a given m_id's data by month if the average weekly subsample size is less than 100, or by day if the average weekly subsample is greater than 1000?

Thanks Ted

On Mon, Jul 12, 2010 at 4:02 PM, jim holtman jholt...@gmail.com wrote: try 'drop=TRUE' on the split function call. This will prevent the NULL set from being sent to the function. [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? -- R.E.(Ted) Byers, Ph.D., Ed.D. t...@merchantservicecorp.com CTO Merchant Services Corp. 350 Harry Walker Parkway North, Suite 8 Newmarket, Ontario L3Y 8L3
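Regarding Ted's question about skipping undersized subsamples: one simple approach is to filter the list returned by split() before calling lapply(). A sketch with toy data, using mean() as a stand-in for fitdist() and an arbitrary illustrative threshold (the 100 suggested in the thread would work the same way):

```r
# Toy grouped data: group "b" has only a single record
d <- data.frame(g = c("a", "a", "a", "b"), elapsed = c(1.2, 0.7, 2.5, 3.1))

pieces <- split(d, d$g, drop = TRUE)

# Keep only subsamples with more than n_min rows
n_min <- 2
big_enough <- Filter(function(df) nrow(df) > n_min, pieces)

# fitdist(df$elapsed, "exp") would go here; mean() stands in for brevity
fits <- lapply(big_enough, function(df) mean(df$elapsed))
names(fits)
# [1] "a"
```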
[R] Statistical Learning and Datamining Course October 2010 Washington DC
Short course: Statistical Learning and Data Mining III: Ten Hot Ideas for Learning from Data Trevor Hastie and Robert Tibshirani, Stanford University Georgetown University Conference Center Washington DC, October 11-12, 2010. This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With the rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we rely increasingly more on data analysis and statistical models to exploit the vast amounts of data at our fingertips. In this course we emphasize the tools useful for tackling modern-day data analysis problems. From the vast array of tools available, we have selected what we consider are the most relevant and exciting. Our top-ten list of topics are: * Regression and Logistic Regression (two golden oldies), * Lasso and Related Methods, * Support Vector and Kernel Methodology, * Principal Components (SVD) and Variations: sparse SVD, supervised PCA, Nonnegative Matrix Factorization, * Boosting, Random Forests and Ensemble Methods, * Rule based methods (PRIM), * Graphical Models, * Cross-Validation, * Bootstrap, * Feature Selection, False Discovery Rates and Permutation Tests. Our earlier courses are not a prerequisite for this new course. Although there is some overlap with past courses, our new course contains many topics not covered by us before. The material is based on recent papers by the authors and other researchers, as well as the new second edition of our best selling book: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, Tibshirani and Friedman, Springer-Verlag, 2009 http://www-stat.stanford.edu/ElemStatLearn/ A copy of this book will be given to all attendees. The lectures will consist of video-projected presentations and discussion. Go to the site http://www-stat.stanford.edu/~hastie/sldm.html for more information and online registration.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What is the degrees of freedom in an nlme model
Jun: Short answer: There is no such thing as df for a nonlinear model (whether or not mixed effects). Longer answer: df is the dimension of the null space when the data are projected on the linear subspace of the model matrix of a **linear model**. So, strictly speaking, no linear model, no df. HOWEVER... nonlinear models are usually (always??) fit by successive linear approximations, and approximate df are obtained from these approximating subspaces. However, the problem with this is that there is no guarantee that the relevant residual distributions are sufficiently chisq with the approximate df to give reasonable answers. In fact, lots of people much smarter than I have spent lots of time trying to figure out what sorts of approximations one should use to get trustworthy results. The thing is, in nonlinear models, it can DEPEND on the exact form of the model -- indeed, that's what distinguishes nonlinear models from linear ones! So this turns out to be really hard, and afaik these smart people don't agree on what should be done. To see what one of the smartest people has to say about this, search the archives for Doug Bates's comments on this w.r.t. lmer (he won't compute such distributions nor provide P values because he doesn't know how to do it reliably. Doug -- please correct me if I have it wrong). A stock way to extricate oneself from this dilemma is: bootstrap! Unfortunately, this is also probably too facile: for one thing, with a nondiagonal covariance matrix (as in mixed effects models), how do you resample to preserve the covariance structure? I believe this is an area of active research in the time series literature, for example. For another, this may be too computationally demanding to be practicable, due to convergence issues. Bottom line: there may be no good way to do what you want. Note to experts: Please view this post as an invitation to correct my errors and provide authoritative info.
Cheers to all, Bert
Bert Gunter Genentech Nonclinical Biostatistics
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jun Shen Sent: Monday, July 12, 2010 12:34 PM To: R-help Subject: [R] What is the degrees of freedom in an nlme model
Dear all, I want to do an F test, which involves calculation of the degrees of freedom for the residuals. Now say I have an nlme object, mod.nlme. I have two questions: 1. How do I extract the degrees of freedom? 2. How is this degrees of freedom calculated in an nlme model? Thanks. Jun Shen
Some sample code and data =========
mod.nlme <- nlme(RESP ~ E0 + (Emax - E0)*CP**gamma/(EC50**gamma + CP**gamma),
                 data = Data,
                 fixed = E0 + Emax + gamma + EC50 ~ 1,
                 random = list(pdDiag(EC50 + E0 + gamma ~ 1)),
                 groups = ~ID,
                 start = list(fixed = c(E0 = 1, Emax = 100, gamma = 1, EC50 = 50)))
The Data object
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), CP = c(1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90), RESP = c(3.19, 2.52, 2.89, 3.28, 3.82, 7.15, 11.2, 16.25, 30.32, 55.25, 73.56, 82.07, 89.08, 95.86, 97.97,
99.03, 3.49, 4.4, 3.54, 4.99, 3.81, 10.12, 21.59, 24.93, 40.18, 61.01, 78.65, 88.81, 93.1, 94.61, 98.83, 97.86, 0.42, 0, 2.58, 5.67, 3.64, 8.01, 12.75, 13.27, 24.65, 46.1, 65.16, 77.74, 87.99, 94.4, 96.05, 100.4, 2.43, 0, 6.32, 5.59, 8.48, 12.32, 26.4, 28.36, 43.38, 69.56, 82.53, 91.36, 95.37, 98.36, 98.66, 98.8, 5.16, 2, 5.65, 3.48, 5.78, 5.5, 11.55, 8.53, 18.02, 38.11, 58.93, 70.93, 85.62, 89.53, 96.19, 96.19, 2.76, 2.99, 3.75, 3.02, 5.44, 3.08, 8.31, 10.85, 13.79, 32.06, 50.22, 63.7, 81.34, 89.59, 93.06, 92.47, 3.32, 1.14, 2.43, 2.75, 3.02, 5.4, 8.49, 7.91, 15.17, 35.01, 53.91, 68.51, 83.12, 86.85, 92.17, 95.72, 3.58, 0.02, 3.69, 4.34, 6.32, 5.15, 9.7, 11.39, 23.38, 42.9, 61.91, 71.82, 87.83)), .Names = c("ID", "CP", "RESP"), class = "data.frame", row.names = c(NA, -125L))
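On Jun's first question (how to extract the df), the approximate df that nlme reports can be pulled straight out of the fitted object. The sketch below uses the built-in Loblolly data and the self-starting model from the ?nlme example rather than the Emax fit above, purely so it runs standalone; the same extraction applies to mod.nlme. As Bert explains, these df come from a linear approximation, so treat them as approximate.

```r
library(nlme)

# Toy fit on a data set shipped with nlme (this is the documented ?nlme example).
fm <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
           data = Loblolly,
           fixed = Asym + R0 + lrc ~ 1,
           random = Asym ~ 1,
           start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

fm$fixDF$X   # per-coefficient denominator df used for the t-tests
anova(fm)    # conditional F-tests with numDF and denDF columns
```

summary(fm) shows the same df alongside the t-values; for mod.nlme, replace fm accordingly.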
[R] SAS to R
Hi everyone, I don't know how to code in SAS but I do know how to code in R. Can someone please be kind enough to translate this into R code for me:
proc mixed data = small method = reml;
  class id day;
  model weight = day / solution ddfm = bw;
  repeated day / subject = id type = unstructured;
run;
=== so far I think it is gls(weight ~ day, corr = corSymm(???), method = "REML", data = small); my main problem is I don't know how to get the unstructured covariance matrix to work. Thank you
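A hedged sketch of one common translation: PROC MIXED's repeated/type=unstructured can be approximated in nlme::gls by combining corSymm (general correlations between days) with varIdent (a separate variance per day). The data set small below is invented stand-in data, and corSymm additionally needs an integer time index within each subject, so this is a template rather than a drop-in answer.

```r
library(nlme)
set.seed(1)

# Toy stand-in for the poster's 'small' data set: 10 subjects, 3 days each.
small <- data.frame(id     = factor(rep(1:10, each = 3)),
                    day    = factor(rep(1:3, times = 10)),
                    weight = rnorm(30, mean = 70, sd = 5))
small$day.int <- as.integer(small$day)   # corSymm wants an integer position index

fit <- gls(weight ~ day,
           data        = small,
           correlation = corSymm(form = ~ day.int | id),  # unstructured correlations
           weights     = varIdent(form = ~ 1 | day),      # separate variance per day
           method      = "REML")
summary(fit)
```

corSymm plus varIdent together give the full unstructured within-subject covariance; ddfm=bw (between-within df) has no exact gls counterpart, so the reported df will differ from SAS.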
[R] How to select the column header with \Sexpr{}
Hi: Since I work with a few different fish runs, my column headers change every time I start a new year. I have been using \Sexpr{} for my rows and columns and now I am trying to use it with my report column headers. \Sexpr{1,1} is row 1 column 1; what can I use for headers? I tried \Sexpr{0,1} but Sweave didn't like it. Thanks in advance for any hints. Felipe D. Carrillo, Supervisory Fishery Biologist, Department of the Interior, US Fish & Wildlife Service, California, USA
Re: [R] How to select the column header with \Sexpr{}
On 12/07/2010 5:10 PM, Felipe Carrillo wrote: ... \Sexpr{1,1} is row 1 column 1, what can I use for headers? I tried \Sexpr{0,1} but Sweave didn't like it ...
\Sexpr takes an R expression, and inserts the first element of the result into your text. Using just "0,1" (not including the quotes) is not a valid R expression. You need to use paste() or some other function to construct the label you want to put in place, e.g. \Sexpr{paste("0", "1", sep = ",")} will give you "0,1". Duncan Murdoch
Re: [R] findInterval and data resolution
On 12/07/2010 5:25 PM, Bryan Hanson wrote: Hello Wise Ones... I need a clever way around a problem with findInterval. Consider:
vec1 <- 1:10
vec2 <- seq(1, 10, by = 0.1)
x1 <- c(2:3)
a1 <- findInterval(x1, vec1); a1 # example 1
a2 <- findInterval(x1, vec2); a2 # example 2
In the problem I'm working on, vec* may be either integer or numeric, like vec1 and vec2. I need to remove one or more sections of this vector; for instance if I ask to remove values 2:3 I want to remove all values between 2 and 3 regardless of the resolution of the data (in my thinking, vec2 is more dense or has better resolution than vec1). So example 1 above works fine because the values 2 and 3 are the end points of a range that includes no values in-between (a1). But, in example 2 the answer is, correctly, also the end points, but now there are values in between these end points. Hence a2 doesn't include the indices of the values in-between the end points. I have looked at cut, but it doesn't quite behave the way I want, since if I set x1 <- c(2:4) I get more intervals than I really want and cleaning it up will be laborious. I think I can construct the full set of indices I want with a2[1]:a2[2], but is there a more clever way to do this? I'm thinking there might be a function out there that I am not aware of.
I'm not sure I understand what you want. If you know x1 will always be an increasing vector, you could use something like a2[1]:a2[length(a2)] to select the full range of indices that it covers. If x1 is not necessarily in increasing order, you'll have to do min(a2):max(a2) (which might be clearer in any case). If you're more interested in the range of values in vec*, maybe range(vec2[min(a2):max(a2)]) will give you what you want. Duncan Murdoch
[R] Question about food sampling analysis
Greetings to all, and my apologies for a question that is mostly about statistics and secondarily about R. I have just started a new job that (this week, apparently) requires statistical knowledge beyond my training (as an epidemiologist). The problem: - We have 57 food production facilities in three categories - Samples of 4-6 different foods were tested for listeria at each facility - I need to describe the presence of listeria in food (1) overall and (2) by facility category. I know that samples within each facility cannot be treated as independent, so I need an approach that accounts for (1) clustering within facilities and (2) the different number of samples taken at each facility. If someone could kindly point me towards the right type of analysis for this and/or its associated R functions/packages, I would greatly appreciate it. Many thanks, Sarah
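For what it is worth, one standard way to respect the clustering is a logistic mixed model with a random intercept per facility (e.g. lme4::glmer); the unequal number of samples per facility is then handled automatically. Everything below, column names included, is simulated stand-in data, not Sarah's, so read it as a template.

```r
library(lme4)
set.seed(42)

# Simulate 57 facilities, 4-6 samples each, in one of three categories.
n_i   <- sample(4:6, 57, replace = TRUE)                # samples per facility
cat_i <- sample(c("A", "B", "C"), 57, replace = TRUE)   # facility category
dat <- data.frame(facility = factor(rep(seq_len(57), n_i)),
                  category = factor(rep(cat_i, n_i)),
                  positive = rbinom(sum(n_i), 1, 0.15)) # listeria detected?

# Random intercept per facility absorbs the within-facility correlation;
# the fixed effect compares categories.
fit <- glmer(positive ~ category + (1 | facility),
             data = dat, family = binomial)
summary(fit)
```

Alternatives with the same flavor: GEE (geepack::geeglm with id = facility) for population-averaged estimates, or the survey package (svydesign with facilities as clusters) for design-based overall prevalence.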
Re: [R] findInterval and data resolution
How about this:
these <- which(vec2 < x1[1] | vec2 > x1[2])
vec2[these]
# Or using logical indexing directly:
vec2[vec2 < x1[1] | vec2 > x1[2]]
From: Bryan Hanson han...@depauw.edu To: R Help r-h...@stat.math.ethz.ch Date: 13/Jul/2010 9:28a Subject: [R] findInterval and data resolution [original message snipped]
* Bryan Hanson, Acting Chair, Professor of Chemistry & Biochemistry, DePauw University, Greencastle IN USA
Re: [R] How to select the column header with \Sexpr{}
Thanks for the quick reply Duncan. I don't think I have explained myself well. I have a dataset named report and my column headers are run1, run2, run3, run4 and so on. I know how to access the data below those columns with \Sexpr{report[1,1]}, \Sexpr{report[1,2]} and so on, but I can't access my column headers with \Sexpr{} because I can't find the way to reference run1, run2, run3 and run4. Sorry if I am not explaining myself really well.
----- Original Message ----- From: Duncan Murdoch murdoch.dun...@gmail.com To: Felipe Carrillo mazatlanmex...@yahoo.com Sent: Mon, July 12, 2010 2:18:15 PM Subject: Re: [R] How to select the column header with \Sexpr{} [quoted reply snipped]
Re: [R] findInterval and data resolution
Thanks Duncan... More appended at the bottom...
On 7/12/10 5:38 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: [exchange snipped] ... If x1 is not necessarily in increasing order, you'll have to do min(a2):max(a2) (which might be clearer in any case). If you're more interested in the range of values in vec*, maybe range(vec2[min(a2):max(a2)]) ...
min(a2):max(a2) is very helpful, as it fixes another problem that I did not post about.
More generally, I want to pass a vector of pairs of values to be removed, like this:
x1 <- c(2:3, 8:9)
a3 <- findInterval(x1, vec2)
a3 # which turns out to be 11 21 71 81
where I want my function to remove all values between 2 and 3, and between 8 and 9, regardless of how many values are between these indices. So in the example of a3, I want to remove everything between 11 and 21, and everything between 71 and 81, keeping everything else. I think I can put together a function pretty quickly that takes x1 in sequential pairs and returns all the intervening indices, which can then be used to clean up the original vector. Thanks again, and if anyone has another idea, do tell! Bryan
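The helper Bryan describes is short enough to sketch directly. removeRanges below (the name is invented) treats x1 as successive lo/hi pairs and drops every element of the vector that falls inside any of those closed intervals, without needing findInterval at all:

```r
# Drop all elements of vec lying inside any closed interval given by
# successive pairs of x1, e.g. x1 = c(2, 3, 8, 9) removes [2,3] and [8,9].
removeRanges <- function(vec, x1) {
  stopifnot(length(x1) %% 2 == 0)
  lo <- x1[seq(1, length(x1), by = 2)]
  hi <- x1[seq(2, length(x1), by = 2)]
  # OR together one logical mask per interval, then keep the complement.
  drop <- Reduce(`|`, Map(function(l, h) vec >= l & vec <= h, lo, hi))
  vec[!drop]
}

vec2 <- seq(1, 10, by = 0.1)
out <- removeRanges(vec2, c(2, 3, 8, 9))
any(out >= 2 & out <= 3)   # FALSE: the whole [2,3] section is gone
any(out >= 8 & out <= 9)   # FALSE
```

This works the same for integer and fine-grained numeric vectors, since it compares values rather than indices.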
[R] Comparison of two very large strings
Hi, I have a function in R that compares two very large strings for about 1 million records. The strings are very large URLs like:- http://query.nytimes.com/gst/sitesearch_selector.html?query=US+Visa+Laws&type=nyt&x=25&y=8... or of larger lengths. The data-frame looks like:-
id url
1 http://query.nytimes.com/gst/sitesearch_selector.html?query=US+Visa+Laws&type=nyt&x=25&y=8...
2 http://query.nytimes.com/search/sitesearch?query=US+Visa+Laws&srchst=cse
3 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N...
4 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N
5 http://www.google.com/url?sa=U&start=11&q=http://app1.chinadaily.com.cn/star/2004/0610/fo4-1.html&ei=uUKwSe7XN9CCt
and so on for about 1 million records. Here is the function that I am using to compare the two strings:-
stringCompare <- function(currentURL, currentId){
  j <- currentId - 1
  while(j >= 1){
    previousURL <- urlDataFrame[j, "url"]
    previousURLLength <- nchar(previousURL)
    # Compare smaller with bigger
    if(nchar(currentURL) <= previousURLLength){
      matchPhrase <- substr(previousURL, 1, nchar(currentURL))
      if(matchPhrase == currentURL){
        return(TRUE)
      }
    }else{
      matchPhrase <- substr(currentURL, 1, previousURLLength)
      if(matchPhrase == previousURL){
        return(TRUE)
      }
    }
    j <- j - 1
  }
  return(FALSE)
}
Here, I compare the URL at a given row with all the previous URLs in the data-frame. I compare the smaller of the two given URLs with the larger one (up to the length of the smaller). When I run the above function for about 1 million records, the execution becomes really slow, which otherwise is fast if I remove the string comparison step. Any ideas how it can be implemented in a fast and efficient way.
Thanks and Regards, Harsh Yadav
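A hedged alternative to the quadratic loop: if the goal is to flag URLs that merely extend a shorter URL already in the data, a single byte-wise sort places every such URL after one of its prefixes, so one linear pass suffices. Note this answers a slightly different question from the original row-order-dependent version (it checks against the whole column, not just earlier ids); the urls vector below is invented toy data.

```r
urls <- c("http://a.com/x", "http://a.com/x?q=1", "http://b.com/",
          "http://a.com/x?q=1&r=2", "http://c.com/y")

ord <- order(urls, method = "radix")  # byte-wise order, so a prefix sorts first
s   <- urls[ord]

# One pass: 'cand' holds the shortest current prefix candidate; anything that
# starts with it is an extension, anything else becomes the new candidate.
flagged <- logical(length(s))
cand <- s[1]
for (i in seq_along(s)[-1]) {
  if (startsWith(s[i], cand)) {
    flagged[i] <- TRUE        # s[i] extends a shorter URL in the set
  } else {
    cand <- s[i]
  }
}

hasPrefix <- logical(length(urls))
hasPrefix[ord] <- flagged     # map flags back to the original row order
hasPrefix                     # rows 2 and 4 extend row 1 here
```

Sorting is O(n log n) and the pass is O(n), versus the O(n^2) pairwise loop, which matters at a million rows; method = "radix" is used because locale-collated sorting does not guarantee that a prefix sorts immediately before its extensions.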
[R] Matrix Column Names
Hi, Is there a way to create a matrix in which the column names are not checked to see if they are valid variable names? I'm looking for something similar to the check.names argument to data.frame. If so, would such an approach work for the sparse matrix classes in the Matrix package? Many thanks! Cheers, Dave
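For what it is worth, base matrix dimnames are never checked for syntactic validity the way data.frame names are, so arbitrary strings already work with no check.names equivalent needed; a quick demonstration (the Matrix part is left commented since that package may not be installed):

```r
# Column names with spaces and punctuation, invalid as variable names, are fine:
m <- matrix(1:4, nrow = 2,
            dimnames = list(NULL, c("a b", "2nd-col!")))
colnames(m)   # "a b" "2nd-col!"

# Contrast with data.frame, where suppression must be explicit:
d <- data.frame(`a b` = 1, check.names = FALSE)

# Matrix's sparse classes also accept dimnames directly (untested sketch):
# library(Matrix)
# M <- Matrix(0, 2, 2, sparse = TRUE, dimnames = list(NULL, c("a b", "2nd-col!")))
```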
Re: [R] How to select the column header with \Sexpr{}
On Jul 12, 2010, at 5:45 PM, Felipe Carrillo wrote: ... I can't access my column headers with \Sexpr{} because I can't find the way to reference run1, run2, run3 and run4 ...
Wouldn't this just be: \Sexpr{names(report)} # ? or perhaps you want specific items in that vector? \Sexpr{names(report)[1]}, \Sexpr{names(report)[2]}, etc. -- David.
David Winsemius, MD West Hartford, CT
Re: [R] Comparison of two very large strings
On Jul 12, 2010, at 6:03 PM, harsh yadav wrote: [message quoted in full above snipped]
Couldn't you just store the url vector after running through nchar and then do the comparison in a vectorized manner?
test <- rd.txt('id url
1 http://query.nytimes.com/gst/sitesearch_selector.html?query=US+Visa+Laws&type=nyt&x=25&y=8
2 http://query.nytimes.com/search/sitesearch?query=US+Visa+Laws&srchst=cse
3 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N
4 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N
5 http://www.google.com/url?sa=U&start=11&q=http://app1.chinadaily.com.cn/star/2004/0610/fo4-1.html&ei=uUKwSe7XN9CCt
', stringsAsFactors = FALSE)
copyUrls <- test[, "url"]
sizeUrls <- nchar(copyUrls)
lengU <- length(sizeUrls)
sizidx <- pmax(sizeUrls[1:(lengU-1)], sizeUrls[2:lengU])
substr(copyUrls[2:lengU], 1, sizidx) == substr(copyUrls[1:(lengU-1)], 1, sizidx)
#[1] FALSE FALSE TRUE FALSE
[remainder of quoted message snipped]
David Winsemius, MD West Hartford, CT
[R] Xyplot or Tin-R problem?
I ran the following script from xyplot Examples using Tin-R on Windows and saw no plot produced.
EE <- equal.count(ethanol$E, number = 9, overlap = 1/4)
xyplot(NOx ~ C | EE, data = ethanol,
       prepanel = function(x, y) prepanel.loess(x, y, span = 1),
       xlab = "Compression Ratio", ylab = "NOx (micrograms/J)",
       panel = function(x, y) {
         panel.grid(h = -1, v = 2)
         panel.xyplot(x, y)
         panel.loess(x, y, span = 1)
       },
       aspect = "xy")
The Rgui showed
source(.trPaths[5])
without any error msg. Did I miss anything? Please enlighten me. Richard
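A likely culprit (hedged, since I cannot test Tin-R here): source() does not auto-print lattice objects the way the interactive prompt does, so the script runs but draws nothing. Explicitly print()ing the trellis object fixes it:

```r
library(lattice)

# Minimal version of the script: assign the trellis object, then print it.
EE <- equal.count(ethanol$E, number = 9, overlap = 1/4)
p <- xyplot(NOx ~ C | EE, data = ethanol,
            xlab = "Compression Ratio", ylab = "NOx (micrograms/J)")
print(p)   # required inside source(), functions, and loops
```

source(file, echo = TRUE) also triggers auto-printing; this is the classic lattice gotcha covered in R FAQ 7.22.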
[R] Continuing on with a loop when there's a failure
Hi R sages, Here is my latest problem. Consider the following toy example:
x <- read.table(textConnection("y1 y2 y3 x1 x2
indv.1 bagels donuts bagels 4 6
indv.2 donuts donuts donuts 5 1
indv.3 donuts donuts donuts 1 10
indv.4 donuts donuts donuts 10 9
indv.5 bagels donuts bagels 0 2
indv.6 bagels donuts bagels 2 9
indv.7 bagels donuts bagels 8 5
indv.8 bagels donuts bagels 4 1
indv.9 donuts donuts donuts 3 3
indv.10 bagels donuts bagels 5 9
indv.11 bagels donuts bagels 9 10
indv.12 bagels donuts bagels 3 1
indv.13 donuts donuts donuts 7 10
indv.14 bagels donuts bagels 2 10
indv.15 bagels donuts bagels 9 6"), header = TRUE)
I want to fit a logistic regression of y1 on x1 and x2. Then I want to run a logistic regression of y2 on x1 and x2. Then I want to run a logistic regression of y3 on x1 and x2. In reality I have many more Y columns than simply y1, y2, and y3, so I must design a loop. Notice that y2 is invariant and thus it will fail. In reality, some y columns will fail for much more subtle reasons. Simply screening my data to eliminate invariant columns will not eliminate the problem. What I want to do is output a piece of the results from each run of the loop to a matrix. I want it to try each of my y columns, and not give up and stop running simply because a particular y column is bad. I want it to give me NA or something similar in my results matrix for the bad y columns, but I want it to keep going and give me good data for the good y columns. For instance:
library(rms)  # provides lrm() and pol()
results <- matrix(nrow = 1, ncol = 3)
colnames(results) <- c("y1", "y2", "y3")
for (i in 1:3) {
  mod.poly3 <- lrm(x[,i] ~ pol(x1, 3) + pol(x2, 3), data = x)
  results[1,i] <- anova(mod.poly3)[1,3]
}
If I run this code, it gives up when fitting y2 because the y2 is bad. It doesn't even try to fit y3. Here's what my console shows:
results
          y1 y2 y3
[1,] 0.6976063 NA NA
As you can see, it gave up before fitting y3, which would have worked.
How do I force my code to keep going through the loop, despite the rotten apples it encounters along the way? Exact code that gets the job done is what I am interested in. I am a post-doc -- I am not taking any classes. I promise this is not a homework assignment! Thanks in advance, --- Josh Banta, Ph.D Center for Genomics and Systems Biology New York University 100 Washington Square East New York, NY 10003 Tel: (212) 998-8465 http://plantevolutionaryecology.org
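The standard remedy is to wrap the model fit in tryCatch(), so a failing column yields NA instead of aborting the loop. To keep the sketch below self-contained, base glm() stands in for rms::lrm(), and an explicit invariance check (commented) forces the failure that lrm would raise on y2; the tryCatch pattern is the part that transfers directly to the lrm code above.

```r
# Numeric toy version of Josh's data: y2 is invariant and should come back NA.
x <- data.frame(y1 = c(1,0,0,0,1,1,1,1,0,1,1,1,0,1,1),
                y2 = rep(1, 15),
                y3 = c(1,0,0,0,1,1,1,1,0,1,1,1,0,1,1),
                x1 = c(4,5,1,10,0,2,8,4,3,5,9,3,7,2,9),
                x2 = c(6,1,10,9,2,9,5,1,3,9,10,1,10,10,6))

results <- setNames(rep(NA_real_, 3), c("y1", "y2", "y3"))
for (i in 1:3) {
  results[i] <- tryCatch({
    y <- x[[i]]
    if (length(unique(y)) < 2) stop("invariant response")  # mimics lrm's failure
    mod <- glm(y ~ x1 + x2, data = x, family = binomial)
    coef(summary(mod))["x1", "Pr(>|z|)"]                   # keep one statistic
  }, error = function(e) NA_real_)                          # any error -> NA, loop continues
}
results   # y2 is NA; y1 and y3 hold values; the loop never aborts
```

The error handler catches whatever stop() or the fitting routine throws, subtle failures included, which is exactly what column screening cannot anticipate; try() with inherits(fit, "try-error") is an older spelling of the same idea.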
Re: [R] Comparison of two very large strings
On Jul 12, 2010, at 6:46 PM, David Winsemius wrote: [earlier messages quoted in full above snipped]
[quoted code snipped] Let me hasten to admit that when I tried to fix what I thought was an error in that program, I got the same result. It seemed as though I should have been getting errors by choosing the maximum string length. Changing the pmax to pmin did not alter the results ... to my puzzlement ... until I further noticed that urls #3 and #4 were of the same length. When I extend the lengths, then only the version using pmin works properly. -- David.
David Winsemius, MD West Hartford, CT
Re: [R] print.trellis draw.in - plaintext (gmail mishap)
The problem is that you have not pushed your viewport, so it doesn't exist in the plot. (You only pushed the layout viewport.)

grid.ls(viewports = TRUE)
#ROOT
#  GRID.VP.82

Try this:

vp <- vplayout(2,2)
pushViewport(vp)
upViewport()
grid.ls(viewports = TRUE)
#ROOT
#  GRID.VP.82
#  GRID.VP.86
print(p, newpage = FALSE, draw.in = vp$name)

-Felix

On 13 July 2010 01:22, Mark Connolly wmcon...@ncsu.edu wrote:

require(grid)
require(lattice)
fred <- data.frame(x=1:5, y=runif(5))
vplayout <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
p <- xyplot(y~x, fred)
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)

On Mon, Jul 12, 2010 at 8:58 AM, Felix Andrews fe...@nfrac.org wrote:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Yes, please, reproducible code.

On 10 July 2010 00:49, Mark Connolly wmcon...@ncsu.edu wrote:

I am attempting to plot a trellis object on a grid.

vplayout <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
g1 <- ggplot() ...
g2 <- ggplot() ...
g3 <- ggplot() ...
p <- xyplot() ...

# works as expected
print(g1, vp=vplayout(1,1))
print(g2, vp=vplayout(1,2))
print(g3, vp=vplayout(2,1))

# does not work
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)
Error in grid.Call.graphics(L_downviewport, name$name, strict) :
  Viewport 'GRID.VP.112' was not found

What am I doing wrong? Thanks!
-- Felix Andrews / 安福立 http://www.neurofractal.org/felix/
Re: [R] Continuing on with a loop when there's a failure
On Jul 12, 2010, at 6:18 PM, Josh B wrote:

Hi R sages,

Here is my latest problem. Consider the following toy example:

x <- read.table(textConnection("y1 y2 y3 x1 x2
indv.1 bagels donuts bagels 4 6
indv.2 donuts donuts donuts 5 1
indv.3 donuts donuts donuts 1 10
indv.4 donuts donuts donuts 10 9
indv.5 bagels donuts bagels 0 2
indv.6 bagels donuts bagels 2 9
indv.7 bagels donuts bagels 8 5
indv.8 bagels donuts bagels 4 1
indv.9 donuts donuts donuts 3 3
indv.10 bagels donuts bagels 5 9
indv.11 bagels donuts bagels 9 10
indv.12 bagels donuts bagels 3 1
indv.13 donuts donuts donuts 7 10
indv.14 bagels donuts bagels 2 10
indv.15 bagels donuts bagels 9 6"), header = TRUE)

I want to fit a logistic regression of y1 on x1 and x2. Then I want to run a logistic regression of y2 on x1 and x2. Then I want to run a logistic regression of y3 on x1 and x2. In reality I have many more Y columns than simply y1, y2, and y3, so I must design a loop. Notice that y2 is invariant and thus it will fail. In reality, some y columns will fail for much more subtle reasons. Simply screening my data to eliminate invariant columns will not eliminate the problem.

What I want to do is output a piece of the results from each run of the loop to a matrix. I want the loop to try each of my y columns, and not give up and stop running simply because a particular y column is bad. I want it to give me NA or something similar in my results matrix for the bad y columns, but I want it to keep going and give me good data for the good y columns. For instance:

results <- matrix(nrow = 1, ncol = 3)
colnames(results) <- c("y1", "y2", "y3")
for (i in 1:3) {
  mod.poly3 <- lrm(x[,i] ~ pol(x1, 3) + pol(x2, 3), data=x)
  results[1,i] <- anova(mod.poly3)[1,3]
}

If I run this code, it gives up when fitting y2 because the y2 is bad. It doesn't even try to fit y3. Here's what my console shows:

results
            y1 y2 y3
[1,] 0.6976063 NA NA

As you can see, it gave up before fitting y3, which would have worked.
How do I force my code to keep going through the loop, despite the rotten apples it encounters along the way?

?try

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f

(Doesn't only apply to simulations.)

Exact code that gets the job done is what I am interested in. I am a post-doc -- I am not taking any classes. I promise this is not a homework assignment!

-- David Winsemius, MD West Hartford, CT
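A minimal sketch of the pattern ?try points at, with tryCatch() wrapped around each fit. This is not the poster's code: glm() stands in for rms::lrm() so the example is self-contained, and a column of out-of-range values stands in for a "bad" y column:

```r
# Loop over toy response columns; a failure in one fit yields NA and the
# loop keeps going instead of aborting.
ys <- list(
  y1 = c(1, 0, 1, 1, 0, 1),
  y2 = rep(2, 6),               # deliberately bad: binomial glm() errors on y > 1
  y3 = c(1, 0, 0, 1, 0, 1)
)
x1 <- c(4, 5, 1, 0, 3, 2)

results <- sapply(ys, function(y) {
  tryCatch(coef(glm(y ~ x1, family = binomial))[["x1"]],
           error = function(e) NA_real_)   # record NA and continue
})
results
```

The same tryCatch() wrapper can go around the lrm()/anova() pair inside the original for loop, assigning NA into the results matrix in the error handler.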
Re: [R] Xyplot or Tin-R problem?
On Jul 12, 2010, at 6:26 PM, YANG, Richard ChunHSi wrote:

I ran the following script from the xyplot Examples using Tin-R on Windows and saw no plot produced.

EE <- equal.count(ethanol$E, number=9, overlap=1/4)
xyplot(NOx ~ C | EE, data=ethanol,
       prepanel = function(x,y) prepanel.loess(x, y, span=1),
       xlab="Compression Ratio", ylab="NOx (micrograms/J)",
       panel = function(x,y) {
         panel.grid(h = -1, v = 2)
         panel.xyplot(x,y)
         panel.loess(x,y, span=1)
       },
       aspect = "xy")

The Rgui showed

source(.trPaths[5])

without any error msg. Did I miss anything? Please enlighten me.

I got the example to work fine but had no plotting with your version and cannot see the difference in the code. I assigned them to t1 and t2 and ...

all.equal(t1, t2)
[1] "Component 5: target, current do not match when deparsed"
[2] "Component 29: target, current do not match when deparsed"

Looking at str() applied to both does not illuminate me. I have seen problems on my Mac with examples copied from the help page, and I suspect there is some invisible character sitting in a copy-pasted version that our mail clients are not displaying. What happens if you try:

example(xyplot) # ???

-- David Winsemius, MD West Hartford, CT
[R] Correct function name for text display of S4 object
Hello,

I am working on an R package for storing order book data. I currently have a display method that has the following output (ob is an S4 object):

display(ob)
Current time is 09:35:02

           Price   Ask Size
           ----------------
           11.42        900
           11.41      1,400
           11.40      1,205
           11.39      1,600
           11.38        400
           ----------------
    2,700  11.36
    1,100  11.35
    1,100  11.34
    1,600  11.33
      700  11.32
           ----------------
 Bid Size  Price

The package already has show, summary, and plot methods. Is there a more conventional name than display for the above output, or is display as good as any other name?

Thanks, Andrew
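One common S4 convention the question gestures at is that the default text rendering belongs in the show() method, so that simply typing the object's name prints it. A hedged sketch (class, slots, and formatting are invented here, not from Andrew's package):

```r
library(methods)

# Toy stand-in for an order book object
setClass("OrderBookDemo",
         representation(time = "character", ask = "numeric", bid = "numeric"))

# show() is what R calls when the object is auto-printed at the prompt
setMethod("show", "OrderBookDemo", function(object) {
  cat("Current time is", object@time, "\n")
  cat("Best ask:", object@ask, "  Best bid:", object@bid, "\n")
})

ob <- new("OrderBookDemo", time = "09:35:02", ask = 11.38, bid = 11.36)
ob   # auto-printing now dispatches to show()
```

With this arrangement a separate display() generic is unnecessary, though keeping display() as a verbose alternative alongside a terse show() is also a defensible design.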
[R] do the standard R analysis functions handle spatial grid data?
Hi everyone,

I'm doing a resource function analysis with radio-collared dingos and GIS info. The ecologist I'm working with wants to send me the data in a 'grid format'... straight out of ARCVIEW GIS.

I want to model the data using a GLM and maybe a LOGISTIC model as well, and I was planning on using the glm and logistic functions in R. Now I'm pretty sure that these functions require the data to be in a 2-D spreadsheet format, and for me to call the responses and predictors as columns from a data.frame (or 2-D matrix). However, I'm being told they can handle the data in a 'grid' format. So I'm pretty sure this would mean I would be calling the responses and predictors as 2-D matrices... and I don't think these functions can do that?

Can anyone enlighten me? Am I right in thinking these functions cannot handle data in a 3-D 'grid' format and require data to be entered as a 2-D data.frame or matrix? Are there other special functions out there that can handle this type of data, and should I be using these instead?

Thanks for your help

Chris Howden Founding Partner Tricky Solutions Tricky Solutions 4 Tricky Problems Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training (mobile) 0410 689 945 (fax / office) (+618) 8952 7878 ch...@trickysolutions.com.au
Re: [R] do the standard R analysis functions handle spatial grid data?
Have a look at the Task View for spatial data... http://cran.ms.unimelb.edu.au/web/views/Spatial.html

From: chris howden tall.chr...@yahoo.com.au
To: r-help@r-project.org, r-sig-ecology-requ...@r-project.org
Date: 13/Jul/2010 2:01p
Subject: [R] do the standard R analysis functions handle spatial grid data?
[R] Accessing files on password-protected FTP sites
Hello everyone,

Is it possible to download data from password-protected ftp sites? I saw another thread with instructions for uploading files using RCurl, but I could not find information for downloading them in the RCurl documentation. I am using R 2.11 on a Windows XP 32-bit machine.

Thanks in advance, Cliff
[R] SAS Proc summary/means as a R function
Hi,

I am new to R. I am trying to create an R function to do a SAS proc means/summary:

proc means data=bsebal;
  class team year;
  var ab h;
  output out=BseBalAvg mean=;
run;

I have a solution if I quote the argument. The working code to produce BseBalAvg is very elegant:

normalize <- melt(bsebal, id=c("team", "year"))              # normalize data
transpose <- cast(normalize, team + year ~ variable, mean)   # team year h ab (means)

Here is the problem. In SAS we have the option parmbuff, which puts all the 'macro arguments' text into one string, i.e.

%macro procmeans(text)/parmbuff;
  %put text;
%mend procmeans;

%procmeans(This is a sentence);

result: This is a sentence

Here is my R code:

# This works
proc.means <- function(...) { sapply(match.call()[-1], deparse) }
proc.means(thisisasentence)

Result: thisisasentence

Note: sapply allows for multiple arguments and is not needed, but is more robust.

# However this does not work
proc.means(this is a sentence)
# unexpected symbol in "proc.means(this is"

It appears that the second space causes the error. I have had some luck using formulas:

# This works in spite of the spaces
proc.means <- function(formula) {
  parmbuff <- deparse(substitute(formula))
  print(parmbuff)
}
proc.means(team + year + variable)

# This does not work - same issue as above
proc.means(team year variable)
# unexpected symbol in "proc.means(team year"
Re: [R] SAS Proc summary/means as a R function
On 07/12/2010 07:16 PM, Roger Deangelis wrote:

Hi, I am new to R. I am trying to create an R function to do a SAS proc means/summary [...]

So you're actually trying to have R generate the SAS code? R is not, in general, a macro language, and attempts to use it as such are fighting against the current. Since you're writing your own function, you can make it accept as many arguments as you want, even an arbitrary number. For instance:

test <- data.frame(a = rnorm(100), b = rnorm(100, 10), c = rnorm(100, 20))
summarize <- function(...) {
  dots <- list(...)
  lapply(dots, summary)
}
summarize(test$a, test$b, test$c)

Is that what you'd like?
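For the specific proc means step (class team year; var ab h; mean), base R's aggregate() gets the same grouped means without reshape's melt/cast. A sketch with made-up data (the real bsebal columns are only assumed from the thread):

```r
# Toy stand-in for the bsebal data set: team/year classes, ab/h measures
bsebal <- data.frame(
  team = rep(c("NYY", "BOS"), each = 4),
  year = rep(c(2008, 2009), times = 4),
  ab   = c(500, 520, 480, 510, 450, 470, 460, 440),
  h    = c(150, 160, 140, 155, 120, 130, 125, 118)
)

# Equivalent of: proc means; class team year; var ab h; output mean=;
BseBalAvg <- aggregate(cbind(ab, h) ~ team + year, data = bsebal, FUN = mean)
BseBalAvg
```

Each row of BseBalAvg is one team/year cell with the mean ab and h, which matches what the cast() call in the question produces.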
Re: [R] Accessing files on password-protected FTP sites
There is a standard notation for passwords in urls ... see for example http://www.devx.com/tips/Tip/5604

Cliff Clive cliffcl...@gmail.com wrote:

Hello everyone, Is it possible to download data from password-protected ftp sites? [...]

---
Jeff Newmiller jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.
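A hedged sketch of that notation in R (host, user, and password are hypothetical; the download.file() and getURL() calls are left commented out since they need a live server):

```r
# user:password@host notation, built up so the pieces are visible
creds <- "myuser:mypass"
url   <- paste0("ftp://", creds, "@ftp.example.com/pub/data.csv")
url

# Base R understands the embedded credentials directly:
# download.file(url, destfile = "data.csv")

# RCurl alternative that keeps the password out of the URL string:
# library(RCurl)
# txt <- getURL("ftp://ftp.example.com/pub/data.csv", userpwd = creds)
```

The RCurl userpwd option is generally preferable when the URL might end up in logs or error messages.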
Re: [R] How to select the column header with \Sexpr{}
I had tried that earlier and it didn't work either; I probably have \Sexpr in the wrong place. See example: the column-one header gets blank:

\documentclass[11pt]{article}
\usepackage{longtable,verbatim,ctable}
\usepackage{longtable,pdflscape}
\usepackage{fmtcount,hyperref}
\usepackage{fullpage}
\title{United States}
\begin{document}
\setkeys{Gin}{width=1\textwidth}
\maketitle

<<echo=F, results=hide>>=
report <- structure(list(
  Date = c("3/12/2010", "3/13/2010", "3/14/2010", "3/15/2010"),
  Run1 = c("33 (119 ? 119)", "n (0 ? 0)", "893 (110 ? 146)", "140 (111 ? 150)"),
  Run2 = c("33 (71 ? 71)", "n (0 ? 0)", "337 (67 ? 74)", "140 (68 ? 84)"),
  Run3 = c("890 (32 ? 47)", "n (0 ? 0)", "10,602 (32 ? 52)", "2,635 (34 ? 66)"),
  Run4 = c("0 ( ? )", "n (0 ? 0)", "0 ( ? )", "0 ( ? )"),
  Run4 = c("0 ( ? )", "n (0 ? 0)", "0 ( ? )", "0 ( ? )")),
  .Names = c("ID_Date", "Run1", "Run2", "Run3", "Run4", "Run5"),
  row.names = c(NA, 4L), class = "data.frame")
require(stringr)
report <- t(apply(report, 1, function(x) {str_replace(x, "\\?", "-")}))
#report
#latex(report, file="")
@

\begin{landscape}
\begin{table}[!tbp]
\begin{center}
\begin{tabular}{lllllll}\hline\hline
\multicolumn{1}{c}{\Sexpr{names(report)[1]}} & % using \Sexpr here
\multicolumn{1}{c}{Run1} &
\multicolumn{1}{c}{Run2} &
\multicolumn{1}{c}{Run3} &
\multicolumn{1}{c}{Run4} &
\multicolumn{1}{c}{Run5}\tabularnewline
\hline
1 & 3/12/2010 & 33 (119 ? 119) & 33 (71 ? 71) & 890 (32 ? 47) & 0 ( ? ) & 0 ( ? )\tabularnewline
2 & 3/13/2010 & n (0 ? 0) & n (0 ? 0) & n (0 ? 0) & n (0 ? 0) & n (0 ? 0)\tabularnewline
3 & 3/14/2010 & 893 (110 ? 146) & 337 (67 ? 74) & 10,602 (32 ? 52) & 0 ( ? ) & 0 ( ? )\tabularnewline
4 & 3/15/2010 & 140 (111 ? 150) & 140 (68 ? 84) & 2,635 (34 ? 66) & 0 ( ? ) & 0 ( ? )\tabularnewline
\hline
\end{tabular}
\end{center}
\end{table}
\end{landscape}
\end{document}

Felipe D.
Carrillo Supervisory Fishery Biologist Department of the Interior US Fish & Wildlife Service California, USA

----- Original Message ----- From: David Winsemius dwinsem...@comcast.net To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: Duncan Murdoch murdoch.dun...@gmail.com; r-h...@stat.math.ethz.ch Sent: Mon, July 12, 2010 3:14:49 PM Subject: Re: [R] How to select the column header with \Sexpr{}

On Jul 12, 2010, at 5:45 PM, Felipe Carrillo wrote:

Thanks for the quick reply Duncan. I don't think I have explained myself well. I have a dataset named report and my column headers are run1, run2, run3, run4 and so on. I know how to access the data below those columns with \Sexpr{report[1,1]}, \Sexpr{report[1,2]} and so on, but I can't access my column headers with \Sexpr{} because I can't find the way to reference run1, run2, run3 and run4. Sorry if I am not explaining myself really well.

Wouldn't this just be:

\Sexpr{names(report)} # ?

or perhaps you want specific items in that vector? \Sexpr{names(report)[1]}, \Sexpr{names(report)[2]}, etc.

--David.

----- Original Message ----- From: Duncan Murdoch murdoch.dun...@gmail.com To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: r-h...@stat.math.ethz.ch Sent: Mon, July 12, 2010 2:18:15 PM Subject: Re: [R] How to select the column header with \Sexpr{}

On 12/07/2010 5:10 PM, Felipe Carrillo wrote:

Hi: Since I work with a few different fish runs, my column headers change every time I start a new year. I have been using \Sexpr{} for my rows and columns, and now I am trying to use it with my report column headers. \Sexpr{1,1} is row 1 column 1; what can I use for headers? I tried \Sexpr{0,1} but Sweave didn't like it. Thanks in advance for any hints.

\Sexpr takes an R expression, and inserts the first element of the result into your text. Using just "0,1" (not including the quotes) is not a valid R expression. You need to use paste() or some other function to construct the label you want to put in place, e.g. \Sexpr{paste("0", "1", sep=",")} will give you 0,1.
Duncan Murdoch

David Winsemius, MD West Hartford, CT
Re: [R] SAS Proc summary/means as a R function
Please get a copy of "R for SAS and SPSS Users" by Robert A. Muenchen: http://www.springer.com/statistics/computanional+statistics/book/978-0-387-09417-5
Re: [R] Xyplot or Tin-R problem?
You missed FAQ 7.22:

7.22 Why do lattice/trellis graphics not work?

The most likely reason is that you forgot to tell R to display the graph. Lattice functions such as xyplot() create a graph object, but do not display it (the same is true of ggplot2 graphics, and Trellis graphics in S-Plus). The print() method for the graph object produces the actual display. When you use these functions interactively at the command line, the result is automatically printed, but in source() or inside your own functions you will need an explicit print() statement.

The FAQ on R and the separate FAQ on R for Windows are both accessible from the Help menu item on the R Console on the R GUI.
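A minimal illustration of FAQ 7.22 (a sketch using built-in data, not the poster's ethanol script):

```r
library(lattice)

# Inside a function (or source()), the trellis object is created but the
# plot never appears, because nothing print()s it:
silently_lost <- function() {
  xyplot(mpg ~ wt, data = mtcars)
}

# Wrapping the call in an explicit print() makes the plot appear:
displayed <- function() {
  print(xyplot(mpg ~ wt, data = mtcars))
}
```

This is exactly why source(.trPaths[5]) in Tin-R produced no plot: source() suppresses the auto-printing that happens at the interactive prompt.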
Re: [R] Fast string comparison
I am asking this question because string comparison in R seems to be awfully slow (based on profiling results) and I wonder if perhaps '==' alone is not the best one can do. I did not ask for anything particular and I don't think I need to provide a self-contained source example for the question. So, to re-phrase my question: are there more (runtime-)effective ways to find out if two strings (about 100-150 characters long) are equal?

Ralf

On Sun, Jul 11, 2010 at 2:37 PM, Sharpie ch...@sharpsteen.net wrote:

Ralf B wrote: What is the fastest way to compare two strings in R? Ralf

Which way is not fast enough? In other words, are you asking this question because profiling showed one of R's string comparison operations is causing a massive bottleneck in your code? If so, which one and how are you using it?

-Charlie

-----
Charlie Sharpsteen Undergraduate -- Environmental Resources Engineering Humboldt State University
Re: [R] Fast string comparison
strings <- replicate(1e5, paste(sample(letters, 100, rep = TRUE), collapse = ""))
system.time(strings[-1] == strings[-1e5])
#   user  system elapsed
#  0.016   0.000   0.017

So it takes ~1/100 of a second to do ~100,000 string comparisons. You need to provide a reproducible example that illustrates why you think string comparisons are slow.

Hadley

On Tue, Jul 13, 2010 at 6:52 AM, Ralf B ralf.bie...@gmail.com wrote:

I am asking this question because string comparison in R seems to be awfully slow (based on profiling results) [...]
-- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
[R] cbind in for loops
I have 30 files in the current directory; I would like to perform cbind(file1, file2, file3, file4, ..., file30). How could I do this in a for loop? Something like:

file2 <- list.files(pattern = ".out3$")
for (j in file2) {
  cbind(j)   # ...how to implement cbind here?
}

Thanks.
Re: [R] make an model object (e.g. nlme) available in a user defined function (xyplot related)
On Mon, Jul 12, 2010 at 2:51 AM, Jun Shen jun.shen...@gmail.com wrote:

Dear all,

When I construct an nlme model object by calling nlme(...) -> mod.nlme, this object can be used in xyplot(). Something like

xyplot(x, y, ..
  ..
  ind.predict <- predict(mod.nlme)
  ..
)

is working fine in the console environment. But the same structure is not working in a user-defined function. It seems the mod.nlme created in a user-defined function cannot be called in xyplot(). Why is that? Appreciate any comment. (The error message says: Error in using packet 1, object "model" not found)

Quoting from the footer: PLEASE [...] provide commented, minimal, self-contained, reproducible code.

-Deepayan

Thanks. Jun Shen
[R] boxplot on all the columns
How do I use boxplot on all the columns from the data frame, instead of manually entering the columns like below?

bhtest1 <- read.table("bhtest1.txt", header=TRUE)
boxplot(bhtest1[,2], bhtest1[,3], bhtest1[,4], bhtest1[,5], bhtest1[,6], bhtest1[,7])

Please help. Thanks,
Re: [R] problem with comparisons for vectors
This is also mentioned in FAQ 7.31: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Also, if you search the R-help archives for 'precision' you can find a lot of threads discussing the issue in further depth.

On Sun, Jul 11, 2010 at 9:02 PM, Wu Gong w...@mtmail.mtsu.edu wrote:

I don't know the real reason, but help("==") gives some clues:

For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable. See the examples.

x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1
x1 == x2                            # FALSE on most machines
identical(all.equal(x1, x2), TRUE)  # TRUE everywhere

- A R learner.

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
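A hedged expansion of the FAQ advice (the helper name `near` is my own, not from the thread): the portable idioms are a tolerance comparison or isTRUE(all.equal(...)).

```r
# Tolerance-based equality for floating-point numbers
near <- function(x, y, tol = sqrt(.Machine$double.eps)) abs(x - y) < tol

x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1
x1 == x2                      # FALSE on most machines
near(x1, x2)                  # tolerance comparison succeeds
isTRUE(all.equal(x1, x2))     # the idiom the FAQ recommends
```

Note that identical(all.equal(x1, x2), TRUE) and isTRUE(all.equal(x1, x2)) are equivalent here; the wrapper matters because all.equal() returns a character description, not FALSE, when the values differ.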
Re: [R] boxplot on all the columns
Actually,

boxplot(bhtest1)

should do what you want...

Tal

Contact details: tal.gal...@gmail.com | 972-52-7275845 | www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
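Tal's one-liner works when every column is numeric. Since the question indexes columns 2-7, the first column may be a non-numeric ID; a sketch with made-up data showing how to drop it first:

```r
# Toy stand-in for bhtest1: an id column plus numeric measurement columns
bhtest1 <- data.frame(id = letters[1:10],
                      a = rnorm(10), b = rnorm(10, 2), c = rnorm(10, 4))

boxplot(bhtest1[ , -1])    # one box per numeric column, no manual listing

# or, without counting columns:
# boxplot(bhtest1[sapply(bhtest1, is.numeric)])
```

boxplot() on a data frame (or list) draws one box per column, which is exactly what the six separate arguments in the question were doing by hand.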
Re: [R] cbind in for loops
Hi,

Assuming that you have read the files into R, and that their names (in R) are held in some object (e.g., 'file2'), then this works:

do.call(what = cbind, args = mget(x = file2, envir = .GlobalEnv))

Here is a reproducible example:

x1 <- data.frame(x = 1:10)
x2 <- data.frame(y = 1:10)
file.names <- c("x1", "x2")
do.call(cbind, mget(file.names, envir = .GlobalEnv))

Best regards, Josh

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
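Josh's answer assumes the files are already read in. A hedged sketch of that missing step (toy ".out3" files are written to a temp directory so the example is self-contained; the read.table() arguments are assumptions about the real files):

```r
# Create two small stand-in files matching the thread's pattern
old <- setwd(tempdir())
write.table(data.frame(a = 1:3), "f1.out3", row.names = FALSE)
write.table(data.frame(b = 4:6), "f2.out3", row.names = FALSE)

# Read every matching file, then bind the pieces in one call -- no for loop
files    <- list.files(pattern = "\\.out3$")
tabs     <- lapply(files, read.table, header = TRUE)
combined <- do.call(cbind, tabs)

setwd(old)
```

This keeps the data in a list instead of 30 separate global variables, which avoids the mget()/.GlobalEnv step entirely.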
Re: [R] Interrupt R?
On Sun, 11 Jul 2010, Spencer Graves wrote:

Hi, Richard and Duncan: Thank you both very much. You provided different but workable solutions.

1. With Rgui 2.11.1 on Vista x64, the escape worked, but neither ctrl-c nor ctrl-C worked for me.

Why did you expect them to? Ctrl-C is documented to implement 'Copy' (the standard Windows shortcut). (Did you mean Ctrl-Shift-C by 'Ctrl-C' as distinct from 'Ctrl-c'? I don't think that works anywhere.) As Duncan said, Ctrl-C works in Rterm, and in almost all other R implementations (the Mac R.app GUI is the only other exception I know: it also uses Escape). This is documented in the README (called something like README.R-2.11.1 in the binary distribution) and in the rw-FAQ Q5.1. Maybe it would be a good idea to refresh your memory of the basic documentation?

2. The TCLTK version works but seems to require either more skill from the programmer or more user training than using escape under Rgui or ctrl-g/c under Emacs.

Best Wishes, Spencer

On 7/11/2010 12:02 PM, Duncan Murdoch wrote: On 11/07/2010 2:29 PM, Spencer Graves wrote:

How can one interrupt the following gracefully?

while(TRUE){ Sys.sleep(1) }

In R 2.11.1 under Emacs+ESS, some sequence of ctrl-g, ctrl-c eventually worked for me. Under Rgui 2.11.1, the only way I've found was to kill R. Suggestions on something more graceful?

This is an Emacs+ESS bug. In the Windows GUI or using Rterm, the standard methods (ESC or Ctrl-C resp.) work fine. Duncan Murdoch

Beyond this, what would you suggest to update a real-time report when new data arrives in a certain directory? A generalization of the above works, but I'd like something more graceful.
Thanks, Spencer Graves

sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] SIM_0.5-0 RCurl_1.4-2 bitops_1.0-4.1 R2HTML_2.1 oce_0.1-80

--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] apply is slower than for loop?
On Fri, Jul 9, 2010 at 9:11 PM, Gene Leynes gleyne...@gmail.com wrote: I thought the apply functions are faster than for loops, but my most recent test shows that apply actually takes significantly longer than a for loop. Am I missing something?

Check Rnews for an article discussing proper usage of apply and for. Liviu
Re: [R] apply is slower than for loop?
On 12/07/10 08:16, Liviu Andronic wrote: On Fri, Jul 9, 2010 at 9:11 PM, Gene Leynes gleyne...@gmail.com wrote: [...] Check Rnews for an article discussing proper usage of apply and for. Liviu

I am guessing you are referencing (http://cran.r-project.org/doc/Rnews/bib/Rnews.html#Rnews:Ligges+Fox:2008):

@article{Rnews:Ligges+Fox:2008,
  author  = {Uwe Ligges and John Fox},
  title   = {{{R} {H}elp {D}esk}: {H}ow Can {I} Avoid This Loop or Make It Faster?},
  journal = {R News},
  year    = 2008,
  volume  = 8,
  number  = 1,
  pages   = {46--50},
  month   = {May},
  url     = {http://CRAN.R-project.org/doc/Rnews/},
  pdf     = {http://CRAN.R-project.org/doc/Rnews/Rnews_2008-1.pdf}
}

Allan
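The point of this thread (and of the Ligges & Fox article) can be seen with a small timing sketch: apply() still iterates at the R level, so it is not inherently faster than a for loop; the real gains come from genuinely vectorized functions such as rowSums(). The matrix size here is arbitrary.

```r
m <- matrix(rnorm(1e6), nrow = 1e4)

# for loop over rows
t.for <- system.time({
  out.for <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) out.for[i] <- sum(m[i, ])
})

# apply() over rows: also an R-level loop internally
t.apply <- system.time(out.apply <- apply(m, 1, sum))

# truly vectorized alternative
t.vec <- system.time(out.vec <- rowSums(m))

all.equal(out.for, out.apply)  # results agree; timings typically favour rowSums
```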
Re: [R] interpretation of svm models with the e1071 package
Thanks a lot for the reply; some comments below.

On 07/10/2010 04:11 AM, Steve Lianoglou wrote: Hi, On Fri, Jul 9, 2010 at 12:15 PM, manuel.martin manuel.mar...@orleans.inra.fr wrote: Dear all, after having calibrated a svm model through the svm() command of the e1071 package, is there a way to i) represent the modeled relationships between the y and X variables (response variable vs. predictors)?

Can you explain a bit more ... how do you want them represented?

I was thinking of a simple ŷ = fi(Xi) plot, fi resulting from the fitted svm model, where Xi is the predictor, among the whole set of predictors X, whose relationship with the response one wishes to see. For boosted regression trees, which I am more familiar with, this fi function is estimated by averaging out the effects of all predictors but Xi, and plotting how ŷ varies as Xi does. Hope this is a bit clearer, Manuel

ii) rank the influence of the predictors used in the model?

One technique that's often/sometimes used is to calculate the SVM's W vector by using the support vectors along with their learned weights/alphas. This comes up every now and again. Here's an older post explaining how you might do that with the svm model from e1071: http://article.gmane.org/gmane.comp.lang.r.general/158272/match=w+b+vector+svr Hope that helps.

--
INRA - InfoSol
Centre de recherche d'Orléans
2163 Avenue de la Pomme de Pin CS 40001 ARDON
45075 ORLEANS Cedex 2
tel : (33) (0)2 38 41 48 21
fax : (33) (0)2 38 41 78 69
http://www.gissol.fr http://bdat.orleans.inra.fr
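Following the approach in the linked post, the primal weight vector of a linear-kernel svm() fit can be recovered as w = t(coefs) %*% SV, and the absolute weights used as a rough influence ranking. A sketch on built-in data (iris here is only an illustration, not the poster's data, and this interpretation only makes sense for a linear kernel):

```r
library(e1071)

# Two-class subset of iris, linear kernel
X   <- as.matrix(iris[1:100, 1:4])
y   <- factor(iris$Species[1:100])
fit <- svm(X, y, kernel = "linear", scale = TRUE)

# Primal weights from the support vectors and their learned coefficients
w <- drop(t(fit$coefs) %*% fit$SV)

# Rank predictors by absolute weight
sort(abs(w), decreasing = TRUE)
```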