Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-09 Thread r . ghezzo
I found reservoir-sampling algorithms of time complexity O(n(1+log(N/n))) in
Kim-Hung Li, ACM Transactions on Mathematical Software, Vol 20 No 4, Dec 94,
p481-492.
He mentions algorithms Z and K and proposes two improved versions, algorithms L
and M. Algorithm L is really easy to implement but relatively slow; M doesn't
look very difficult and is the fastest.
Heberto Ghezzo
McGill University
Montreal - Canada
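
For readers who want to try the idea, here is a minimal sketch of algorithm L as it is commonly described (keep a reservoir of k records, then skip ahead by geometrically distributed gaps). It is not taken from Li's paper verbatim. The function name reservoir_sample_L is made up, k plays the role of the sample size (n in the complexity bound above), and an in-memory vector stands in for the file or tape; a real version would read records one at a time.

```r
# Sketch of reservoir sampling, algorithm L (hypothetical helper name).
# x: records in arrival order (a vector stands in for the stream)
# k: number of records to keep
reservoir_sample_L <- function(x, k) {
  n <- length(x)
  if (n <= k) return(x)                 # fewer records than requested
  res <- x[seq_len(k)]                  # fill the reservoir with the first k
  w <- exp(log(runif(1)) / k)
  i <- k
  repeat {
    # jump over the records that would not have been selected anyway
    i <- i + floor(log(runif(1)) / log(1 - w)) + 1
    if (i > n) break
    res[sample.int(k, 1)] <- x[i]       # replace a random reservoir slot
    w <- w * exp(log(runif(1)) / k)
  }
  res
}

set.seed(1)
s <- reservoir_sample_L(1:100000, 10)
print(s)
```

Only the records actually placed in the reservoir are touched, which is where the O(n(1+log(N/n))) behaviour comes from.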

Quoting François Pinard [EMAIL PROTECTED]:

 [Martin Maechler]

 FrPi Suppose the file (or tape) holds N records (N is not known
 FrPi in advance), from which we want a sample of M records at
 FrPi most. [...] If the algorithm is carefully designed, when
 FrPi the last (N'th) record of the file will have been processed
 FrPi this way, we may then have M records randomly selected from
 FrPi N records, in such a way that each of the N records had an
 FrPi equal probability to end up in the selection of M records.  I
 FrPi may seek out for details if needed.

 [...] I'm also intrigued about the details of the algorithm you
 outline above.

 I went into my old SPSS books and related references to find it for you,
 to no avail (yet I confess I did not try very hard).  I vaguely remember
 it was related to Spearman's correlation computation: I did find notes
 about the severe memory limitation of this computation, but nothing
 about the implemented workaround.  I did find other sampling devices,
 but not the very one I remember having read about, many years ago.

 On the other hand, Googling tells that this topic has been much studied,
 and that Vitter's algorithm Z seems to be popular nowadays (even if not
 the simplest) because it is more efficient than others.  Google found
 a copy of the paper:

http://www.cs.duke.edu/~jsv/Papers/Vit85.Reservoir.pdf

 Here is an implementation for Postgres:

http://svr5.postgresql.org/pgsql-patches/2004-05/msg00319.php

 yet I do not find it very readable -- but this is only an opinion: I'm
 rather demanding in the area of legibility, while many or most people
 are more courageous than me! :-).

 --
 François Pinard   http://pinard.progiciels-bpi.ca

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html




[R] (no subject)

2005-06-20 Thread r . ghezzo

R friends,
I am using R 2.1.0 on Win XP. I have a problem working with lists; probably I
do not understand how to use them.

Let's suppose that a set of patients visit a clinic once a year for 4 years.
On each visit a test, say 'eib', is performed, with result 0 or 1.
The patients do not all visit the clinic all 4 times; they miss a lot
of visits.
The test is considered positive if it is positive at the last 2 visits of that
patient or, under a more lenient definition, if it is positive at the last
visit and never before.
Otherwise it is Negative (always negative) or a YoYo (unstable: changes
from positive to negative).
So, if I code the visits as 1,2,4,8 when present at year 1,2,3,4, and code the
positive tests similarly, I get the last2 list, which gives the test codes
that count as positive for each possible visit pattern; similarly for the
last1 list. 20 here means NULL.

nobs <- 400
#  visits:  0, 1, 2, 3, 4, 5, 6, 7, 8, 9
last1 <- list((20),(1),(2),c(3,2),(4),c(5,4),c(6,4),c(7,6,4),(8),c(9,8),
#  visits:  10, 11, 12, 13, 14, 15
 c(10,8),c(11,10,8),c(12,8),c(13,12,8),c(14,12,8),c(15,14,12,8))
#  visits:  0, 1, 2, 3, 4, 5, 6, 7, 8, 9
last2 <- list((20),(20),(20),(3),(20),(5),(6),c(7,6),(20),(9),
#  visits:  10, 11, 12, 13, 14, 15
  (10),c(11,10),(12),c(13,12),c(14,12),c(15,14,12))
#
# simulate the visits
#
visit <- rbinom(nobs,1,0.7)
eib <- visit
#
# simulate a positive test at a given visit
# (the comparison operator was lost in the archive; '<' is assumed)
#
eib <- ifelse(runif(nobs) < 0.7,visit,0)
#
# create the codes
#
viskode <- matrix(visit,ncol=4) %*% c(1,2,4,8)
eibkode <- matrix(eib,ncol=4) %*% c(1,2,4,8)
#
#this is the brute force method, slow, of computing the Results according to
#the 2 definitions above. Add 16 to the test kode to signify YoYos, Exactly
#16 will be the negatives
#
 eibnoyoyo <- eibkode+16
 eiblast2 <- eibkode+16
 for(i in seq_along(eibkode)){   # one code per patient, nobs/4 patients
   if(eibkode[i] %in% last1[[viskode[i]+1]])
      eibnoyoyo[i] <- eibkode[i]
   if(eibkode[i] %in% last2[[viskode[i]+1]])
      eiblast2[i] <- eibkode[i]
 }
#
#why is it that these statements do not work?
#
eeibnoyoyo <- eeiblast2 <- rep(0,nobs)
eeibnoyoyo <- ifelse(eibkode %in% last1[viskode+1],eibkode,eibkode+16)
eeiblast2   <- ifelse(eibkode %in% last2[viskode+1],eibkode,eibkode+16)
#
table(viskode,eibkode)
table(viskode,eibnoyoyo)
table(viskode,eiblast2)
#
#  these two tables must be diagonal!!
#
table(eibnoyoyo,eeibnoyoyo)
table(eiblast2,eeiblast2)
#
Thanks for any help
Heberto Ghezzo
McGill University
Canada
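
A hedged note on the vectorised statements: as far as I can tell, `eibkode %in% last1[viskode+1]` checks each code against the whole sub-list at once, instead of pairing the i-th code with the i-th list element as the loop does. One way to keep the pairing without an explicit loop is mapply(). The sketch below reuses the last1 list from the post; the five viskode/eibkode values are made up for illustration.

```r
last1 <- list(20, 1, 2, c(3,2), 4, c(5,4), c(6,4), c(7,6,4), 8, c(9,8),
              c(10,8), c(11,10,8), c(12,8), c(13,12,8), c(14,12,8),
              c(15,14,12,8))

# Made-up example codes (not from the simulation in the post)
viskode <- c(3, 5, 7, 15, 0)
eibkode <- c(2, 1, 6, 3, 0)

# Pair the i-th test code with the i-th allowed set
ok <- mapply(function(e, v) e %in% last1[[v + 1]], eibkode, viskode)

# Same coding rule as the loop: keep the code if allowed, else add 16
eibnoyoyo <- ifelse(ok, eibkode, eibkode + 16)
print(eibnoyoyo)
```

The same call with last2 in place of last1 would give the stricter definition.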



[R] Re: automatic updating

2005-04-19 Thread r . ghezzo
Hello,
Running R 2.1.0 on Win XP.
I use the snippet that was presented to the list some time back to
automatically update my libraries on Tuesdays. It worked with no problem in
R 2.0.1, but now that I have installed R 2.1.0 and copied the old Rprofile to
the new r/etc, I get an error.
This is my Rprofile:
- - - - - - - - - - - - - - - - - - - -
# Things you might want to change

# options(papersize="a4")
# options(editor="notepad")
# options(pager="internal")

# to prefer Compiled HTML help
 options(chmhelp=TRUE)

# to prefer HTML help
# options(htmlhelp=TRUE)

# to prefer Windows help
# options(winhelp=TRUE)

.libPaths(c("c:/r/r_cran/library","c:/r/r_src/library",
"c:/r/r_jl/library","c:/r/r_bdr/library",
"c:/r/r_bio/library"))
#
# This script gets all the packages I don't already have
# Run this once a week - say Tuesdays
#
if (interactive() ) { library(utils)}
 is.tuesday <- as.POSIXlt(Sys.time())$wday == 2
 if (is.tuesday == T)
 {
    cat("Running a package check...\nOccurs once a week, on Tuesdays\n")
    cat("Upgrade existing packages and check for new packages (y/N)? ")
    check.new <- as.character(readLines(n = 1))
    if (any(check.new == "y", check.new == "Y"))
    {
        options(CRAN = "http://cran.us.r-project.org/")
        cat("This can take a few seconds...\n")
        x <- packageStatus(repositories = getOption("repositories")()[[1]])
        print(x)
        install.packages(x$avail$Package[x$avail$Status == "not installed"])
        cat("Upgrading to new versions if available\n")
        upgrade(x)
    }
 }
#
- - - - - - - - - - - - -
when I start R 2.1.0 I get:

R : Copyright 2005

Type 'q()' to quit R

Running a package check...
Occurs once a week, on Tuesdays
Upgrade existing packages and check for new packages (y/N)? y
This can take a few seconds...
Error in packageStatus(repositories = getOption("repositories")()[[1]]) :
        attempt to apply non-function

Where do I have to modify the snippet so that it works with R 2.1? It was
perfect for 2.0.1.
Thanks for any help

Heberto Ghezzo
McGill University
Montreal - Canada
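
A possible adaptation, offered as an untested sketch rather than a confirmed fix: "attempt to apply non-function" is what one would see if getOption("repositories") returned NULL, i.e. if that option no longer exists. The sketch assumes the repository option is called `repos` (true in current R; whether this matches 2.1.0 exactly is an assumption) and uses update.packages() in place of the packageStatus()/upgrade() pair. The names is_tuesday and weekly_update are made up for illustration.

```r
# TRUE when the given date/time falls on a Tuesday (wday: 0 = Sunday)
is_tuesday <- function(when = Sys.time()) as.POSIXlt(when)$wday == 2

# Upgrade installed packages, then install whatever the repository offers
# that is not installed yet. Needs network access when actually called.
weekly_update <- function(repos = "http://cran.us.r-project.org/") {
  options(repos = c(CRAN = repos))
  utils::update.packages(ask = FALSE)
  avail <- rownames(utils::available.packages())
  have  <- rownames(utils::installed.packages())
  new_pkgs <- setdiff(avail, have)
  if (length(new_pkgs) > 0) utils::install.packages(new_pkgs)
  invisible(new_pkgs)
}

# Only prompt in interactive sessions, and only on Tuesdays
if (interactive() && is_tuesday()) {
  cat("Running a package check...\nOccurs once a week, on Tuesdays\n")
  answer <- readline("Upgrade existing packages and check for new packages (y/N)? ")
  if (answer %in% c("y", "Y")) weekly_update()
}
```

The utils:: prefixes are there because the utils package may not be attached yet when the Rprofile is read.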



[R] find source code

2005-01-17 Thread r . ghezzo
I am using R 2.0.2 on WinXP.
I am trying to get the code of the Kruskal-Wallis test, but
> kruskal.test
function (x, ...)
UseMethod("kruskal.test")
<environment: namespace:stats>

> ls(3)
  [1] "acf"          "acf2AR"       "add.scope"
..
[181] "kruskal.test" "ks.test"      "ksmooth"
...
[475] "window<-"     "write.ftable" "xtabs"

> class(kruskal.test)
[1] "function"

> getS3method("kruskal.test","function")
Error in getS3method("kruskal.test", "function") :
        S3 method kruskal.test.function not found

> getS3method("stats::kruskal.test","function")
Error in getS3method("stats::kruskal.test", "function") :
        no function 'stats::kruskal.test' could be found

I searched the archives and the answer was 'use getS3method'. The help for
getS3method gives getS3method(f, class, optional=FALSE), so I am lost.
Can somebody tell me how to get the source listing of kruskal.test or of any
other hidden function?
Thanks
Heberto Ghezzo
Meakins-Christie Labs
Canada



Re: [R] find source code

2005-01-17 Thread r . ghezzo
Thanks to all who answered my query, I forgot completely to call methods() first
to check the true whole name of the function.
Heberto Ghezzo

Quoting Uwe Ligges [EMAIL PROTECTED]:

 Simon Wood wrote:

  stats:::kruskal.test.default


 and how to get there:

 methods(kruskal.test) # note, you probably want the default method!
 getS3method("kruskal.test", "default")


 Uwe
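
Put together, the lookup suggested in the replies can be sketched as follows. The variable names are illustrative, and getAnywhere() is an additional base utility not mentioned in the thread.

```r
# List the S3 methods registered for the generic
print(methods("kruskal.test"))

# Fetch the default method; the second argument is the class the method
# dispatches on ("default"), not the class of the generic itself
kw_default <- getS3method("kruskal.test", "default")

# The same function reached through the namespace with the ::: operator
kw_direct <- stats:::kruskal.test.default

# getAnywhere() searches attached packages and loaded namespaces
found <- getAnywhere("kruskal.test.default")
print(found$where)
```

Printing kw_default at the prompt then shows the source listing.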





[R] help with limma

2004-12-21 Thread r . ghezzo
Follow-up on my previous e-mail:
I am using Affy chips. nzwC etc. are single-column vectors of length 12000,
so nzw, akr and bas are 12000 by 6 matrices.
Thanks again for any help. I am now resending the e-mail to Gordon with the
correct address, I hope.
Heberto Ghezzo
McGill - Canada



[R] problems with limma

2004-12-20 Thread r . ghezzo
I tried to send this message to Gordon Smyth at [EMAIL PROTECTED],edu.au but it
bounced back, so here it is to r-help.

I am trying to use limma, just downloaded from CRAN. I use R 2.0.1 on Win XP.
See the following:
> library(RODBC)
> chan1 <- odbcConnectExcel("D:/Data/mgc/Chips/Chips4.xls")
> dd <- sqlFetch(chan1,"Raw")   # all data  12000
> #
> nzw <- cbind(dd$NZW1C,dd$NZW2C,dd$NZW3C,dd$NZW1T,dd$NZW2T,dd$NZW3T)
> akr <- cbind(dd$AKR1C,dd$AKR2C,dd$AKR3C,dd$AKR1T,dd$AKR2T,dd$AKR3T)
> bas <- cbind(dd$NZW1C,dd$NZW2C,dd$NZW3C,dd$AKR1C,dd$AKR2C,dd$AKR3C)
> #
> design <- matrix(c(1,1,1,1,1,1,0,0,0,1,1,1),ncol=2)
> fit1 <- lmFit(nzw,design)
> fit1 <- eBayes(fit1)
> topTable(fit1,adjust="fdr",number=5)
              M         t      P.Value         B
1      3679.480 121.24612 7.828493e-06 -4.508864
1903   3012.405 118.32859 7.828493e-06 -4.508866
9068   1850.232  92.70893 1.178902e-05 -4.508889
10635  2843.534  91.99336 1.178902e-05 -4.508890
561   18727.858  90.17085 1.178902e-05 -4.508893
> #
> fit2 <- lmFit(akr,design)
> fit2 <- eBayes(fit2)
> topTable(fit2,adjust="fdr",number=5)
              M        t      P.Value         B
88     1426.738 80.48058 5.839462e-05 -4.510845
1964  36774.167 73.05580 5.839462e-05 -4.510861
5854   7422.578 68.60316 5.839462e-05 -4.510874
11890  1975.316 66.54480 5.839462e-05 -4.510880
9088   2696.952 64.16343 5.839462e-05 -4.510889
> #
> fit3 <- lmFit(bas,design)
> fit3 <- eBayes(fit3)
> topTable(fit3,adjust="fdr",number=5)
             M         t      P.Value         B
6262  1415.088 100.78933 2.109822e-05 -4.521016
5660  1913.479  96.40903 2.109822e-05 -4.521020
11900 4458.489  94.30738 2.109822e-05 -4.521022
9358  1522.330  80.46641 3.346749e-05 -4.521041
11773 1784.483  73.76620 3.346749e-05 -4.521053
> #Now let's do all together in Anova
> #
> all <- cbind(nzw,akr)
> ts <- c(1,1,1,2,2,2,3,3,3,4,4,4)
> ts <- as.factor(ts)
> levels(ts) <- c("nzwC","nzwT","akrC","akrT")
> design <- model.matrix(~0+ts)
> colnames(design) <- levels(ts)
> fit4 <- lmFit(all,design)
> cont.matrix <- makeContrasts(
+  Baseline = akrC - nzwC,
+  NZW_Smk = nzwT - nzwC,
+  AKR_Smk = akrT - akrC,
+  Diff = (akrT - akrC) - (nzwT - nzwC),
+  levels=design)
> fit42 <- contrasts.fit(fit4,cont.matrix)
> fit42 <- eBayes(fit42)
> #
> topTable(fit42,coef="Baseline",adjust="fdr",number=5)
               M         t     P.Value         B
3189    942.0993  13.57485 0.004062283 -4.528799
8607   2634.1826  11.23476 0.006913442 -4.530338
10242  -942.2860 -10.99253 0.006913442 -4.530551
283    -609.0831 -10.79354 0.006913442 -4.530735
3224  -1564.2572 -10.19429 0.008089034 -4.531351

- Shouldn't this be equal to fit1 above?

> topTable(fit42,coef="NZW_Smk",adjust="fdr",number=5)
             M         t   P.Value         B
7724 -246.5956 -8.687324 0.1615395 -4.591133
1403 -307.8660 -7.063312 0.4066814 -4.591363
3865 -253.4899 -6.585582 0.4598217 -4.591457
3032 -509.2413 -5.841901 0.8294166 -4.591640
2490 -240.3259 -5.338679 0.9997975 -4.591795

- Shouldn't this be equal to fit2 above?
- The P.Value are unreal!!

> topTable(fit42,coef="AKR_Smk",adjust="fdr",number=5)
             M        t  P.Value         B
11547 151.6622 6.380978 0.917470 -4.595085
12064 324.0851 6.337235 0.917470 -4.595085
6752  964.5478 5.858994 0.952782 -4.595086
10251 152.7587 5.339843 0.952782 -4.595087
1440  189.6056 4.933151 0.952782 -4.595089

- Shouldn't this be equal to fit3 above?
- The P.Value are unreal!!

> topTable(fit42,coef="Diff",adjust="fdr",number=5)
              M         t   P.Value         B
7724   302.6892  7.540195 0.4102211 -4.593201
1403   419.4962  6.805495 0.4102211 -4.593265
10251  270.5269  6.686796 0.4102211 -4.593277
3270   409.8391  6.414966 0.4192042 -4.593307
10960 -511.4711 -5.469247 0.9652171 -4.593435
> #

So the results I get from the pairwise comparisons alone are very significant,
but when I try the Anova way, the significance completely disappears.
Am I doing something completely wrong?
This is data from Affymetrix mouse chips.
Thanks for any help
Heberto Ghezzo
Ph.D.
Meakins-Christie Labs
McGill University
Montreal - Canada
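
One possible reading of the numbers above, offered as a hedged sketch and not as a diagnosis: in the two-column design the first column is all ones, so the first coefficient is the mean of the first three arrays, not a treatment effect. If topTable() reported that first coefficient (assumed here to be what happens when coef is not given), the large t values for fit1 to fit3 would test "mean intensity differs from zero", which raw intensities in the thousands satisfy trivially. The example uses base R lm() on simulated data in place of limma, so every number in it is made up.

```r
set.seed(42)

# Same layout as the design in the post: column 1 all ones,
# column 2 marks the last three arrays
design <- matrix(c(1,1,1,1,1,1, 0,0,0,1,1,1), ncol = 2,
                 dimnames = list(NULL, c("Intercept", "TvsC")))

# One simulated gene: intensity near 3000 on all six arrays,
# with no difference between the two groups
y <- 3000 + rnorm(6, sd = 50)

fit <- lm(y ~ 0 + design)
tvals <- coef(summary(fit))[, "t value"]
print(tvals)
```

The first t value is large only because the intensities are far from zero; the second is the one that compares the two groups.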



[R] problems with compiling a package

2004-11-16 Thread r . ghezzo

Hello, I am trying to compile packages for R 2.0.0 patched on a Win XP machine.
Most of the packages compile without problems, with C or FTG or only R.
Now some packages give the following error, which I do not know how to
correct:
...
preparing package xxx for lazy loading
Error in "names<-.default"('*tmp*',value=c("R","Platform","Date",   :
  names attribute[4] must be the same length as the vector [3]
Execution halted
make: *** [lazyload] Error 1

Can somebody tell me how I can correct this error?

One other question: this 'preparing package for lazy loading' step does not
occur for all packages, although their DESCRIPTION files and folders are
similar. When does a package go to lazy loading and when does it not?
Thanks
Heberto Ghezzo
McGill U
Montreal - Canada



[R] problems compiling packages in R 2.0.0

2004-10-14 Thread r . ghezzo
Hello, I am trying to get my old packages to work in R 2.0.0
on Windows XP. Here is what I did.
Etc is a package of pure R functions:

Rcmd INSTALL -l c:/R/R_Src/library C:/R/R_Src/src/Etc

-Making package Etc -
  adding build stamp to DESCRIPTION
  installing R files
  installing man source files
  installing indices
cat: c:/r/rw2000/library/*/CONTENTS: No such file or directory
make[2]: ***[indices] Error 1
make[1]: ***[all] Error 2
make: *** [pkg-Etc] Error 2
*** Installation of Etc failed ***

Removing 'c:/R/R_Src/library/Etc'

Dunnett is a package that computes the p-value for Dunnett's t-test; it
has source code in Fortran:

Rcmd INSTALL -l c:/R/R_Src/library C:/R/R_Src/src/Dunnett

-Making package Dunnett -
  adding build stamp to DESCRIPTION
  making DLL ...
  ... DLL made
  installing R files
  installing man source files
  installing indices
cat: c:/r/rw2000/library/*/CONTENTS: No such file or directory
make[2]: ***[indices] Error 1
make[1]: ***[all] Error 2
make: *** [pkg-Etc] Error 2
*** Installation of Etc failed ***

Removing 'c:/R/R_Src/library/Etc'

Can somebody help me with that 'CONTENTS' file that does not exist?
thanks for any help.
Heberto Ghezzo
McGill University
Canada



[R] Re: problems installing package in R 2.0.0

2004-10-07 Thread r . ghezzo
Hello,
I just installed R 2.0.0 on a Win XP machine. As old programs do not work, I
tried to re-install them with:

C:\R\RW2000\bin>Rcmd INSTALL c:\r\r_src\src\autologi

--Making package autologi --
  adding build stamp to DESCRIPTION
  installing R files
  installing data files
  installing man source files
  installing indices
  not zipping data
  installing help
 Building/Updating help pages for package 'autologi'
  Formats: text html latex example chm
  autologi text  html   latex   example
wc: C:/R/rw2000/library/autologi/R/autologi: No such file or directory
  adding MD5 sums

* DONE autologit

then in R

> library()
Packages in library 'C:/R/rw2000/library':

autologi        ** No title available (pre-2.0.0 install?) **
base            The R Base Package
 ...

my directory for autologi has the following structure:

c:\r\r_src\src\autologi\DESCRIPTION
                        TITLE
                        \R\autologi.r
                        \man\autologi.RD
                        \data\ex.dat

I could not find anything relevant in the last version of Writing R Extensions
that came with R 2.0.0.
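
A hedged guess at what to check: the "** No title available (pre-2.0.0 install?) **" entry suggests that R did not find a Title field in the installed DESCRIPTION. Assuming the separate TITLE file in the listing above is an older convention and that a Title: field inside DESCRIPTION is what R 2.0.0 looks for (an assumption, not confirmed in this thread), the sketch below writes and reads back such a file with write.dcf() and read.dcf(). Every field value except the package name is a placeholder.

```r
# Placeholder field values; only the package name comes from the post
desc <- data.frame(
  Package     = "autologi",
  Version     = "0.1",
  Title       = "Placeholder title",
  Description = "Placeholder description.",
  License     = "GPL",
  stringsAsFactors = FALSE
)

# Write to a temporary location so nothing in a real package is overwritten
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = desc_file)

# Read it back and show the Title field
title_read <- read.dcf(desc_file)[1, "Title"]
print(title_read)
```

The resulting file is plain text with one "Field: value" line per entry, so it can equally well be edited by hand.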

Another question: I did a full install of packages from CRAN, but comparing
the list of packages downloaded and installed with those in
CRAN/windows/contrib/2.0/, I found packages like moc, multidim, multiv, netCDF,
serialize, yags and xgobi. Can these packages be downloaded and installed, or
is there something broken in them?
Thanks for any help and thanks to the R-Team.
Heberto Ghezzo - McGill University
