Re: [R] Suggestion for big files [was: Re: A comment about R:]
I found reservoir-sampling algorithms of time complexity O(n(1+log(N/n))) in a paper by Kim-Hung Li, ACM Transactions on Mathematical Software, Vol. 20, No. 4, Dec. 1994, pp. 481-492. He mentions algorithms Z and K and proposes two improved versions, algorithms L and M. Algorithm L is really easy to implement but relatively slow; M doesn't look very difficult and is the fastest.

Heberto Ghezzo
McGill University
Montreal - Canada

Quoting François Pinard [EMAIL PROTECTED]:

> [Martin Maechler]
>
> > FrPi> Suppose the file (or tape) holds N records (N is not known in advance), from which we want a sample of M records at most. [...] If the algorithm is carefully designed, when the last (N'th) record of the file has been processed this way, we then have M records randomly selected from N records, in such a way that each of the N records had an equal probability of ending up in the selection of M records. I may seek out the details if needed.
>
> > [...] I'm also intrigued about the details of the algorithm you outline above.
>
> I went into my old SPSS books and related references to find it for you, to no avail (yet I confess I did not try very hard). I vaguely remember it was related to Spearman's correlation computation: I did find notes about the severe memory limitation of this computation, but nothing about the implemented workaround. I did find other sampling devices, but not the very one I remember having read about, many years ago.
>
> On the other hand, Googling shows that this topic has been much studied, and that Vitter's algorithm Z seems to be popular nowadays (even if not the simplest) because it is more efficient than others. Google found a copy of the paper:
>
> http://www.cs.duke.edu/~jsv/Papers/Vit85.Reservoir.pdf
>
> Here is an implementation for Postgres:
>
> http://svr5.postgresql.org/pgsql-patches/2004-05/msg00319.php
>
> yet I do not find it very readable -- but this is only an opinion: I'm rather demanding in the area of legibility, while many or most people are more courageous than me! :-).
>
> -- François Pinard http://pinard.progiciels-bpi.ca

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
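For readers who want to see the idea quoted above in R, here is a minimal sketch of the simplest reservoir scheme (usually called Algorithm R). It is not Vitter's algorithm Z nor Li's algorithms L or M, which skip over records instead of drawing a random number for each one. The names reservoir_sample, next_record and make_stream are made up for this illustration.

```r
# Simple reservoir sampling ("Algorithm R"): keep the first m records, then
# let the k-th record (k > m) replace a random slot with probability m/k.
# `next_record` is a hypothetical function that returns the next record,
# or NULL once the file (or tape) is exhausted.
reservoir_sample <- function(next_record, m) {
  reservoir <- vector("list", m)
  n_seen <- 0
  repeat {
    rec <- next_record()
    if (is.null(rec)) break
    n_seen <- n_seen + 1
    if (n_seen <= m) {
      reservoir[[n_seen]] <- rec          # fill the reservoir first
    } else {
      j <- sample.int(n_seen, 1)          # uniform on 1..n_seen
      if (j <= m) reservoir[[j]] <- rec   # happens with probability m/n_seen
    }
  }
  # fewer than m records were available if n_seen < m
  reservoir[seq_len(min(n_seen, m))]
}

# Hypothetical record source built from an in-memory vector
make_stream <- function(x) {
  i <- 0
  function() {
    if (i >= length(x)) return(NULL)
    i <<- i + 1
    x[[i]]
  }
}

set.seed(1)
s <- unlist(reservoir_sample(make_stream(1:1000), 10))
short <- reservoir_sample(make_stream(1:3), 10)
```

The total number of records never has to be known in advance; only the reservoir of m records is held in memory.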
[R] (no subject)
R friends,

I am using R 2.1.0 on Win XP. I have a problem working with lists; probably I do not understand how to use them.

Let's suppose that a set of patients visit a clinic once a year for 4 years, and on each visit a test, say 'eib', is performed, with result 0 or 1. The patients do not all visit the clinic all 4 times; they miss a lot of visits. The test is considered positive if it is positive at the last 2 visits of that patient or, under a more lenient definition, if it is positive at the last visit and never before. Otherwise it is Negative (always negative) or a YoYo (unstable: changes from positive to negative).

So, if I codify the visits with codes 1, 2, 4, 8 if present at year 1, 2, 3, 4, and similarly the positive tests, I get the last2 list codifying the test codes corresponding to the possible visit patterns, and similarly the last1 list. 20 here means NULL.

nobs <- 400
# visits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
last1 <- list((20),(1),(2),c(3,2),(4),c(5,4),c(6,4),c(7,6,4),(8),c(9,8),
# visits: 10, 11, 12, 13, 14, 15
  c(10,8),c(11,10,8),c(12,8),c(13,12,8),c(14,12,8),c(15,14,12,8))
# visits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
last2 <- list((20),(20),(20),(3),(20),(5),(6),c(7,6),(20),(9),
# visits: 10, 11, 12, 13, 14, 15
  (10),c(11,10),(12),c(13,12),c(14,12),c(15,14,12))
#
# simulate the visits
#
visit <- rbinom(nobs,1,0.7)
eib <- visit
#
# simulate a positive test at a given visit
# (the comparison operator was lost in the archived copy; '<' is assumed)
#
eib <- ifelse(runif(nobs) < 0.7,visit,0)
#
# create the codes
#
viskode <- matrix(visit,ncol=4) %*% c(1,2,4,8)
eibkode <- matrix(eib,ncol=4) %*% c(1,2,4,8)
#
# this is the brute force method, slow, of computing the Results according to
# the 2 definitions above. Add 16 to the test kode to signify YoYos, Exactly
# 16 will be the negatives
#
eibnoyoyo <- eibkode+16
eiblast2 <- eibkode+16
for(i in 1:nobs){
  if(eibkode[i] %in% last1[[viskode[i]+1]]) eibnoyoyo[i] <- eibkode[i]
  if(eibkode[i] %in% last2[[viskode[i]+1]]) eiblast2[i] <- eibkode[i]
}
#
# why is it that these statements do not work?
#
eeibnoyoyo <- eeiblast2 <- rep(0,nobs)
eeibnoyoyo <- ifelse(eibkode %in% last1[viskode+1],eibkode,eibkode+16)
eeiblast2 <- ifelse(eibkode %in% last2[viskode+1],eibkode,eibkode+16)
#
table(viskode,eibkode)
table(viskode,eibnoyoyo)
table(viskode,eiblast2)
#
# these two tables must be diagonal!!
#
table(eibnoyoyo,eeibnoyoyo)
table(eiblast2,eeiblast2)
#

Thanks for any help

Heberto Ghezzo
McGill University
Canada
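A possible explanation, offered as a sketch rather than a definitive answer: `%in%` is not element-wise in its second argument. `x %in% table` asks whether each element of x appears anywhere in the whole table, and when the table is a list, `match` converts its elements to character strings first, so a list entry such as c(3,2) is not searched as a set of codes. The loop, by contrast, tests eibkode[i] against the i-th patient's own entry. The toy objects below (lookup, vis, tst) are made up and much shorter than last1; the pairing of each code with its own list entry is done with mapply.

```r
# Hypothetical, shortened lookup list in the spirit of last1
lookup <- list(20, 1, 2, c(3, 2))
vis <- c(3, 3, 1, 2)   # visit codes for four patients
tst <- c(2, 1, 1, 0)   # test codes for the same patients

# Element-wise membership: is tst[i] among the codes in lookup[[vis[i] + 1]]?
hit <- mapply(function(code, allowed) code %in% allowed,
              tst, lookup[vis + 1])

# Same coding as in the post: keep the code if allowed, otherwise add 16
res <- ifelse(hit, tst, tst + 16)

# For comparison, the whole-table test used in the failing statements
whole_table <- tst %in% lookup[vis + 1]
```

With the objects from the post, the same pattern would be mapply(function(code, allowed) code %in% allowed, eibkode, last1[viskode + 1]); the result is a plain vector, not a one-column matrix.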
[R] Re: automatic updating
Hello,

I am running R 2.1.0 on Win XP. I use the snippet that was presented to the list some time back to automatically update my libraries on Tuesdays. It worked with no problem for R 2.0.1, but now that I have installed R 2.1.0 and copied the old Rprofile to the new r/etc, I get an error. This is my Rprofile:

- - - - - - - - - - - - - - - - - - - -
# Things you might want to change
# options(papersize="a4")
# options(editor="notepad")
# options(pager="internal")

# to prefer Compiled HTML help
options(chmhelp=TRUE)
# to prefer HTML help
# options(htmlhelp=TRUE)
# to prefer Windows help
# options(winhelp=TRUE)

.libPaths(c("c:/r/r_cran/library","c:/r/r_src/library",
            "c:/r/r_jl/library","c:/r/r_bdr/library",
            "c:/r/r_bio/library"))
#
# This script gets all the packages I don't already have
# Run this once a week - say Tuesdays
#
if (interactive() ) { library(utils)}
is.tuesday <- as.POSIXlt(Sys.time())$wday == 2
if (is.tuesday == T)
{
    cat("Running a package check...\nOccurs once a week, on Tuesdays\n")
    cat("Upgrade existing packages and check for new packages (y/N)? ")
    check.new <- as.character(readLines(n = 1))
    if (any(check.new == "y", check.new == "Y"))
    {
        options(CRAN = "http://cran.us.r-project.org/")
        cat("This can take a few seconds...\n")
        x <- packageStatus(repositories = getOption("repositories")()[[1]])
        print(x)
        install.packages(x$avail$Package[x$avail$Status == "not installed"])
        cat("Upgrading to new versions if available\n")
        upgrade(x)
    }
}
#
- - - - - - - - - - - - -

When I start R 2.1.0 I get:

R : Copyright 2005
Type 'q()' to quit R

Running a package check...
Occurs once a week, on Tuesdays
Upgrade existing packages and check for new packages (y/N)? y
This can take a few seconds...
Error in packageStatus(repositories = getOption("repositories")()[[1]]) :
        attempt to apply non-function

Where do I have to modify the snippet so that it works with R 2.1? It was perfect for 2.0.1.

Thanks for any help

Heberto Ghezzo
McGill University
Montreal - Canada
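The error message suggests that getOption("repositories") no longer returns a function in R 2.1.0, so calling it fails. One way around this, sketched here under assumptions and not tested on R 2.1.0, is to avoid packageStatus() altogether and pass the repository explicitly to new.packages(), install.packages() and update.packages(). The function name weekly_update, its arguments, and the mirror URL are choices made for this illustration; the calls are written against the signatures in current versions of R.

```r
# Sketch of a weekly updater for Rprofile; all names here are illustrative.
# `today` and `confirm` are arguments only so the logic can be tried out
# without waiting for a Tuesday or typing at the console.
weekly_update <- function(repos = "https://cloud.r-project.org",  # assumed mirror
                          today = Sys.time(),
                          confirm = readline) {
  is_tuesday <- as.POSIXlt(today)$wday == 2
  if (!is_tuesday) return(invisible(FALSE))

  cat("Running a package check...\nOccurs once a week, on Tuesdays\n")
  ans <- confirm("Upgrade existing packages and check for new packages (y/N)? ")
  if (!ans %in% c("y", "Y")) return(invisible(FALSE))

  # packages available in the repository but not installed locally
  missing_pkgs <- new.packages(repos = repos)
  if (length(missing_pkgs)) install.packages(missing_pkgs, repos = repos)

  # upgrade installed packages without prompting for each one
  update.packages(repos = repos, ask = FALSE)
  invisible(TRUE)
}

# Nothing happens on a Wednesday, or when the answer is not y/Y
wed <- weekly_update(today = as.POSIXct("2005-05-04 12:00:00", tz = "UTC"))
tue_no <- weekly_update(today = as.POSIXct("2005-05-03 12:00:00", tz = "UTC"),
                        confirm = function(prompt) "n")
```

In an Rprofile one would end with something like if (interactive()) weekly_update(). Note that, like the original snippet, this installs every package in the repository that is not yet installed, which can be a very long list.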
[R] find source code
I am using R 2.0.2 on WinXP. I am trying to get the code of the Kruskal-Wallis test, but:

> kruskal.test
function (x, ...)
UseMethod("kruskal.test")
<environment: namespace:stats>
> ls(3)
  [1] "acf"          "acf2AR"       "add.scope"    ..
[181] "kruskal.test" "ks.test"      "ksmooth"      ...
[475] "window<-"     "write.ftable" "xtabs"
> class(kruskal.test)
[1] "function"
> getS3method("kruskal.test","function")
Error in getS3method("kruskal.test", "function") :
        S3 method kruskal.test.function not found
> getS3method("stats::kruskal.test","function")
Error in getS3method("stats::kruskal.test", "function") :
        no function 'stats::kruskal.test' could be found

I searched the archives and the answer was 'use getS3method'. The help for getS3method says getS3method(f,class,optional=FALSE), so I am lost. Can somebody tell me how to get the source listing of kruskal.test, or of any other hidden function?

Thanks

Heberto Ghezzo
Meakins-Christie Labs
Canada
Re: [R] find source code
Thanks to all who answered my query. I completely forgot to call methods() first to check the full name of the function.

Heberto Ghezzo

Quoting Uwe Ligges [EMAIL PROTECTED]:

> Simon Wood wrote:
>
> > stats:::kruskal.test.default
>
> and how to get there:
>
> methods(kruskal.test)
> # note, you probably want the default method!
> getS3method("kruskal.test", "default")
>
> Uwe
>
> > On Mon, 17 Jan 2005 [EMAIL PROTECTED] wrote:
> >
> > > I am using R 2.0.2 on WinXP. I am trying to get the code of the Kruskal-Wallis test [...] Can somebody tell me how to get the source listing of kruskal.test, or of any other hidden function?
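Putting the two answers together, the lookup goes from the generic to its methods and then to one method's source. This is a short sketch using only functions from base R; the object names km and src are made up for the example.

```r
# 1. The generic only dispatches; list the methods registered for it
methods(kruskal.test)

# 2. Fetch one method by generic name and class name (both as strings)
km <- getS3method("kruskal.test", "default")

# 3. The same function is reachable with ':::' even though it is not exported
same <- identical(km, stats:::kruskal.test.default)

# 4. Printing the function shows its source; deparse() gives it as text
src <- deparse(km)
```

The second argument of getS3method is the class the method is written for ("default", "formula", ...), not the class of the generic itself, which is why "function" was not found.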
[R] help with limma
A follow-up to my previous e-mail: I am using Affy chips. nzwC etc. are single-column vectors of length 12000, so nzw, akr and bas are 12000 by 6 matrices.

Thanks again for any help. I am now resending the e-mail to Gordon with the correct address, I hope.

Heberto Ghezzo
McGill - Canada
[R] problems with limma
I tried to send this message to Gordon Smyth at [EMAIL PROTECTED],edu.au but it bounced back, so here it is for r-help.

I am trying to use limma, which I just downloaded from CRAN. I use R 2.0.1 on Win XP. See the following:

library(RODBC)
chan1 <- odbcConnectExcel("D:/Data/mgc/Chips/Chips4.xls")
dd <- sqlFetch(chan1,"Raw")   # all data 12000
#
nzw <- cbind(dd$NZW1C,dd$NZW2C,dd$NZW3C,dd$NZW1T,dd$NZW2T,dd$NZW3T)
akr <- cbind(dd$AKR1C,dd$AKR2C,dd$AKR3C,dd$AKR1T,dd$AKR2T,dd$AKR3T)
bas <- cbind(dd$NZW1C,dd$NZW2C,dd$NZW3C,dd$AKR1C,dd$AKR2C,dd$AKR3C)
#
design<-matrix(c(1,1,1,1,1,1,0,0,0,1,1,1),ncol=2)
fit1 <- lmFit(nzw,design)
fit1 <- eBayes(fit1)
topTable(fit1,adjust="fdr",number=5)
              M         t      P.Value         B
1      3679.480 121.24612 7.828493e-06 -4.508864
1903   3012.405 118.32859 7.828493e-06 -4.508866
9068   1850.232  92.70893 1.178902e-05 -4.508889
10635  2843.534  91.99336 1.178902e-05 -4.508890
561   18727.858  90.17085 1.178902e-05 -4.508893
#
fit2 <- lmFit(akr,design)
fit2 <- eBayes(fit2)
topTable(fit2,adjust="fdr",number=5)
              M        t      P.Value         B
88     1426.738 80.48058 5.839462e-05 -4.510845
1964  36774.167 73.05580 5.839462e-05 -4.510861
5854   7422.578 68.60316 5.839462e-05 -4.510874
11890  1975.316 66.54480 5.839462e-05 -4.510880
9088   2696.952 64.16343 5.839462e-05 -4.510889
#
fit3 <- lmFit(bas,design)
fit3 <- eBayes(fit3)
topTable(fit3,adjust="fdr",number=5)
             M         t      P.Value         B
6262  1415.088 100.78933 2.109822e-05 -4.521016
5660  1913.479  96.40903 2.109822e-05 -4.521020
11900 4458.489  94.30738 2.109822e-05 -4.521022
9358  1522.330  80.46641 3.346749e-05 -4.521041
11773 1784.483  73.76620 3.346749e-05 -4.521053
#
# Now let's do all together in Anova
#
all <- cbind(nzw,akr)
ts <- c(1,1,1,2,2,2,3,3,3,4,4,4)
ts <- as.factor(ts)
levels(ts) <- c("nzwC","nzwT","akrC","akrT")
design <- model.matrix(~0+ts)
colnames(design) <- levels(ts)
fit4 <- lmFit(all,design)
cont.matrix <- makeContrasts(
    Baseline = akrC - nzwC,
    NZW_Smk = nzwT - nzwC,
    AKR_Smk = akrT - akrC,
    Diff = (akrT - akrC) - (nzwT - nzwC),
    levels=design)
fit42 <- contrasts.fit(fit4,cont.matrix)
fit42 <- eBayes(fit42)
#
topTable(fit42,coef="Baseline",adjust="fdr",number=5)
               M         t     P.Value         B
3189942.0993      13.57485 0.004062283 -4.528799
8607   2634.1826  11.23476 0.006913442 -4.530338
10242  -942.2860 -10.99253 0.006913442 -4.530551
283    -609.0831 -10.79354 0.006913442 -4.530735
3224  -1564.2572 -10.19429 0.008089034 -4.531351

<- Shouldn't this be equal to fit1 above?

topTable(fit42,coef="NZW_Smk",adjust="fdr",number=5)
             M         t   P.Value         B
7724 -246.5956 -8.687324 0.1615395 -4.591133
1403 -307.8660 -7.063312 0.4066814 -4.591363
3865 -253.4899 -6.585582 0.4598217 -4.591457
3032 -509.2413 -5.841901 0.8294166 -4.591640
2490 -240.3259 -5.338679 0.9997975 -4.591795

<- Shouldn't this be equal to fit2 above?
<- The P.Values are unreal!!

topTable(fit42,coef="AKR_Smk",adjust="fdr",number=5)
             M        t  P.Value         B
11547 151.6622 6.380978 0.917470 -4.595085
12064 324.0851 6.337235 0.917470 -4.595085
6752  964.5478 5.858994 0.952782 -4.595086
10251 152.7587 5.339843 0.952782 -4.595087
1440  189.6056 4.933151 0.952782 -4.595089

<- Shouldn't this be equal to fit3 above?
<- The P.Values are unreal!!

topTable(fit42,coef="Diff",adjust="fdr",number=5)
              M         t   P.Value         B
7724   302.6892  7.540195 0.4102211 -4.593201
1403   419.4962  6.805495 0.4102211 -4.593265
10251  270.5269  6.686796 0.4102211 -4.593277
3270   409.8391  6.414966 0.4192042 -4.593307
10960 -511.4711 -5.469247 0.9652171 -4.593435
#

So the results I get from the pairwise comparisons alone are very significant, but when I try the Anova way the significance completely disappears. Am I doing something completely wrong? This is data from Affymetrix mouse chips.

Thanks for any help

Heberto Ghezzo Ph.D.
Meakins-Christie Labs
McGill University
Montreal - Canada
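One thing that may be worth checking, offered as a sketch and not as a diagnosis: the two-column design used for the pairwise fits has an intercept column plus a treatment indicator, so its first coefficient is the control-group mean and only its second coefficient is the treated minus control difference. If topTable was reporting the first coefficient for fit1 to fit3 (I have not verified the default of this limma version), those tables would be testing the mean expression against zero and would not be comparable to the contrasts. The base R code below uses simulated values for a single hypothetical gene; y, design2, design_cm and the other names are made up, and limma itself is not used.

```r
set.seed(42)
# Hypothetical expression values for one gene: 3 controls, then 3 treated
y <- c(rnorm(3, mean = 1000, sd = 50), rnorm(3, mean = 1100, sd = 50))

# Pairwise design as in the post: intercept column + treatment indicator
design2 <- matrix(c(1,1,1,1,1,1, 0,0,0,1,1,1), ncol = 2)
b2 <- qr.solve(design2, y)          # least-squares coefficients
control_mean <- b2[1]               # first coefficient: mean of the controls
treat_effect <- b2[2]               # second coefficient: treated - control

# Cell-means design, as produced by model.matrix(~ 0 + factor)
grp <- factor(rep(c("C", "T"), each = 3))
design_cm <- model.matrix(~ 0 + grp)
b_cm <- qr.solve(design_cm, y)      # one mean per group
contrast_T_minus_C <- sum(c(-1, 1) * b_cm)
```

Here treat_effect and contrast_T_minus_C agree, while control_mean is simply the average control intensity. In the pairwise limma fits, asking for the second coefficient explicitly (coef=2) would give the quantity that corresponds to the NZW_Smk and AKR_Smk contrasts.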
[R] problems with compiling a package
Hello,

I am trying to compile packages for R 2.0.0 patched on a Win XP machine. Most of the packages compile without problems, with C or FTG or only R. Now some packages give the following error, which I do not understand how to correct:

...
preparing package xxx for lazy loading
Error in "names<-.default"('*tmp*', value = c("R", "Platform", "Date", :
        names attribute [4] must be the same length as the vector [3]
Execution halted
make: *** [lazyload] Error 1

Can somebody tell me how I can correct this error?

One other question: this "preparing package for lazy loading" step does not occur for all packages, although their DESCRIPTION files and folders are similar. When does a package go through lazy loading and when does it not?

Thanks

Heberto Ghezzo
McGill U
Montreal - Canada
[R] problems compiling packages in R 2.0.0
Hello,

I am trying to get my old packages to work in R 2.0.0 on Windows XP. Here is what I did.

Etc is a package of pure R functions:

Rcmd INSTALL -l c:/R/R_Src/library C:/R/R_Src/src/Etc

---------- Making package Etc ------------
  adding build stamp to DESCRIPTION
  installing R files
  installing man source files
  installing indices
cat: c:/r/rw2000/library/*/CONTENTS: No such file or directory
make[2]: ***[indices] Error 1
make[1]: ***[all] Error 2
make: *** [pkg-Etc] Error 2
*** Installation of Etc failed ***
Removing 'c:/R/R_Src/library/Etc'

Dunnett is a package that computes the p value from Dunnett's t-test; it has source code in Fortran:

Rcmd INSTALL -l c:/R/R_Src/library C:/R/R_Src/src/Dunnett

---------- Making package Dunnett ------------
  adding build stamp to DESCRIPTION
  making DLL ...
  ... DLL made
  installing R files
  installing man source files
  installing indices
cat: c:/r/rw2000/library/*/CONTENTS: No such file or directory
make[2]: ***[indices] Error 1
make[1]: ***[all] Error 2
make: *** [pkg-Etc] Error 2
*** Installation of Etc failed ***
Removing 'c:/R/R_Src/library/Etc'

Can somebody help me with that 'CONTENTS' file that does not exist?

Thanks for any help.

Heberto Ghezzo
McGill University
Canada
[R] Re: problems installing package in R 2.0.0
Hello,

I just installed R 2.0.0 on a Win XP machine. As old programs do not work, I tried to re-install them with:

C:\R\RW2000\bin>Rcmd INSTALL c:\r\r_src\src\autologi

---------- Making package autologi ------------
  adding build stamp to DESCRIPTION
  installing R files
  installing data files
  installing man source files
  installing indices
  not zipping data
  installing help
 >>> Building/Updating help pages for package 'autologi'
     Formats: text html latex example chm
  autologi                          text    html    latex   example
wc: C:/R/rw2000/library/autologi/R/autologi: No such file or directory
  adding MD5 sums

* DONE autologit

Then in R:

> library()
Packages in library 'C:/R/rw2000/library':

autologi        ** No title available (pre-2.0.0 install?)  **
base            The R Base Package
...

My directory for autologi has the following structure:

c:\r\r_src\src\autologi\DESCRIPTION
                        TITLE
                        \R\autologi.r
                        \man\autologi.RD
                        \data\ex.dat

I could not find anything relevant in the latest version of Writing R Extensions that came with R 2.0.0.

Another question: I did a full install of the packages from CRAN, but then, comparing the list of packages downloaded and installed with those in CRAN/windows/contrib/2.0/, I found packages like moc, multidim, multiv, netCDF, serialize, yags and xgobi. Can these packages be downloaded and installed, or is there something broken in them?

Thanks for any help, and thanks to the R-Team.

Heberto Ghezzo - McGill University
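Since the package directory carries a separate TITLE file and library() reports "No title available", one thing that may be worth checking is whether DESCRIPTION itself has a Title: field. The sketch below is only a guess at a useful check, not a confirmed cause of the problem. The helper name check_description is made up, and the list of required fields is taken from the current Writing R Extensions manual, so it may be stricter than what R 2.0.0 demanded.

```r
# Report which expected fields are absent from a DESCRIPTION file.
# `required` follows the current "Writing R Extensions" manual (an assumption
# for R 2.0.0); read.dcf() parses the "Field: value" format of DESCRIPTION.
check_description <- function(path,
                              required = c("Package", "Version", "Title",
                                           "Description", "Author",
                                           "Maintainer", "License")) {
  fields <- colnames(read.dcf(path))
  setdiff(required, fields)
}

# Usage with a throwaway DESCRIPTION that has no Title field
tmp <- tempfile()
writeLines(c("Package: autologi",
             "Version: 0.1",
             "License: GPL"), tmp)
missing_fields <- check_description(tmp)
```

For the real package, the call would be check_description("c:/r/r_src/src/autologi/DESCRIPTION"); an empty result means every expected field is present.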