Hi Hadley,
Here is a more detailed report of what I did and what went wrong. The
example is not reproducible because the dataset is too large. A smaller
dummy dataset is not an option, as cast() works fine with smaller
datasets. I'm willing to run the code again with a development version
of reshape.
Cheers,
Thierry
library(RODBC)
library(reshape)
Loading required package: plyr
setwd("d:/wouter")
Sys.info()
                     sysname                      release
                   "Windows"                         "XP"
                     version                     nodename
"build 2600, Service Pack 2"                 "LHPA000838"
                     machine                        login
                       "x86"           "thierry_onkelinx"
                        user
          "thierry_onkelinx"
sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252
attached base packages:
[1] stats graphics grDevices datasets tcltk utils methods
[8] base
other attached packages:
[1] reshape_0.8.1  plyr_0.1       RODBC_1.2-3    svSocket_0.9-5 svIO_0.9-5
[6] R2HTML_1.59    svMisc_0.9-5   svIDE_0.9-5
loaded via a namespace (and not attached):
[1] tools_2.7.2
channel <- odbcConnectAccess("db1.mdb")
km <- sqlQuery(channel = channel, query = "SELECT KMhokcode AS
Location, TaxonFK AS Species FROM kmhok_periode2_selectie ORDER BY
KMhokcode, TaxonFK", as.is = TRUE)
odbcCloseAll()
km$value <- 1
dim(km)
[1] 1157024 3
length(unique(km$Location))
[1] 6354
length(unique(km$Species))
[1] 1381
system.time(tmp <- cast(Location ~ Species, data = km[1:1000, ], fill = 0))
   user  system elapsed
   0.11    0.00    0.17
system.time(tmp <- cast(Location ~ Species, data = km[1:10000, ], fill = 0))
   user  system elapsed
    1.7     0.0     1.7
system.time(tmp <- cast(Location ~ Species, data = km[1:100000, ], fill = 0))
   user  system elapsed
  46.42    0.45   47.02
system.time(tmp <- cast(Location ~ Species, data = km, fill = 0))
Error: cannot allocate vector of size 33.5 Mb
Timing stopped at: 322.95 3.43 327.4
system.time(tmp <- table(km$Location, km$Species))
   user  system elapsed
   1.10    0.00    1.11
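
For anyone without access to the database, the table() approach can be tried on simulated data. This is only a minimal sketch: the object name km_sim, the location codes and the sizes are assumptions chosen for illustration, and it does not reproduce the memory problem with cast(), which only shows up on the full data.

```r
# Simulated presence-only records; names and sizes are illustrative
# assumptions, not the real data set.
set.seed(1)
n_loc <- 200
n_sp <- 50
km_sim <- unique(data.frame(
  Location = sample(sprintf("L%03d", seq_len(n_loc)), 5000, replace = TRUE),
  Species = sample(seq_len(n_sp), 5000, replace = TRUE),
  stringsAsFactors = FALSE
))

# table() counts each Location x Species combination. Because the
# records are unique, every cell is either 0 (absent) or 1 (present).
tab <- table(km_sim$Location, km_sim$Species)

# Drop the "table" class to get a plain integer matrix with dimnames,
# for functions that expect a matrix.
m <- unclass(tab)
```

Since each record is unique, the cell counts can be read directly as presence/absence, which is why no value column or fill argument is needed here.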
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
-----Original message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On behalf of hadley wickham
Sent: Friday 10 October 2008 14:40
To: ONKELINX, Thierry
CC: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Clustering large data
> Thanks for your responses. The biggest problem seems to be cast() from
> the reshape package, which could not handle the dataset. Peter's
> solution using the mefa package worked fine. I found another solution:
> table(), which works fine to crosstabulate presence-only data.
Exactly what error did you get? Or did it just take a very long time
and then you gave up? I have an experimental rewrite of the reshape
package that is more memory efficient and much faster (10 - 20x) -
however, it's still some time from being ready for production use.
Hadley
--
http://had.co.nz/
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.