Re: [Rd] serialize() to via temporary file is heaps faster than doing it directly (on Windows)

2008-08-29 Thread Henrik Bengtsson
I just want to re-post this thread in case it slipped through the
summer sieve of someone who might be interested and/or has a real
solution beyond my serialize2() patch.

Cheers

Henrik

On Thu, Jul 24, 2008 at 8:10 PM, Henrik Bengtsson [EMAIL PROTECTED] wrote:
 Hi,

 FYI, I just noticed that on Windows (but not Linux) it is orders of
 magnitude (50x in the example below) faster to serialize() an object to a
 temporary file and then read it back than to serialize it directly to a
 raw vector (connection=NULL).  This affects, for instance, how fast
 digest::digest() can provide a checksum.
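
 When only a checksum of the serialized bytes is needed, one possible way to
 avoid building the raw vector in memory is to serialize to a temporary file
 and checksum the file. The following is a minimal sketch, not how
 digest::digest() works internally; the helper name checksum_via_file() is
 hypothetical, and the resulting MD5 is not claimed to match what
 digest::digest() returns for the same object.

```r
# Hypothetical helper: MD5 checksum of an object's serialized form,
# computed from a temporary file instead of an in-memory raw vector.
checksum_via_file <- function(object) {
  pathname <- tempfile()
  # Always remove the temporary file when the function exits
  on.exit(unlink(pathname), add = TRUE)
  con <- file(pathname, open = "wb")
  # Close the connection even if serialize() fails
  tryCatch(serialize(object, connection = con), finally = close(con))
  # tools::md5sum() returns a named character vector; drop the file name
  unname(tools::md5sum(pathname))
}

x <- sample.int(1e5)
print(checksum_via_file(x))
```

 Serializing the same object twice should give the same checksum, since the
 temporary file name is not part of the serialized data.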

 Example:
 x <- 1:1e7;
 t1 <- system.time(raw1 <- serialize(x, connection=NULL));
 print(t1);
 #    user  system elapsed
 #  174.23  129.35  304.70  ## 5 minutes
 t2 <- system.time(raw2 <- serialize2(x, connection=NULL));
 print(t2);
 #    user  system elapsed
 #    2.19    0.18    5.72  ## 5 seconds
 print(t1/t2);
 #      user   system   elapsed
 #  79.55708    718.6  53.26923
 stopifnot(identical(raw1, raw2));

 where serialize2() serializes to a temporary file and reads the result back:

 serialize2 <- function(object, connection, ...) {
   if (is.null(connection)) {
     # It is faster to serialize to a temporary file and read it back
     pathname <- tempfile();
     con <- file(pathname, open="wb");
     # Make sure the connection is closed and the file removed on exit
     on.exit({
       if (!is.null(con))
         close(con);
       if (file.exists(pathname))
         file.remove(pathname);
     });
     base::serialize(object, connection=con, ...);
     close(con);
     con <- NULL;  # tell on.exit() the connection is already closed
     fileSize <- file.info(pathname)$size;
     readBin(pathname, what="raw", n=fileSize);
   } else {
     base::serialize(object, connection=connection, ...);
   }
 } # serialize2()
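
 To see whether the gap grows with object size on a given machine, the two
 paths can be timed at several vector lengths. The sketch below is only an
 illustration: serialize_via_file() is a hypothetical, condensed variant of
 serialize2(), the sizes are arbitrary, and sample.int() is used on the
 assumption that newer R versions may store 1:n compactly, which would make
 the serialized form too small to time meaningfully.

```r
# Hypothetical helper: same idea as serialize2(), condensed for benchmarking.
serialize_via_file <- function(object) {
  pathname <- tempfile()
  on.exit(unlink(pathname), add = TRUE)
  con <- file(pathname, open = "wb")
  # Close the connection even if serialize() fails
  tryCatch(serialize(object, connection = con), finally = close(con))
  readBin(pathname, what = "raw", n = file.info(pathname)$size)
}

# Vector lengths to test; kept small so the direct path finishes quickly
# even on an affected system.
sizes <- c(1e4, 1e5, 1e6)

timings <- t(sapply(sizes, function(n) {
  x <- sample.int(n)  # a fully materialized integer vector
  direct  <- system.time(raw1 <- serialize(x, connection = NULL))[["elapsed"]]
  viaFile <- system.time(raw2 <- serialize_via_file(x))[["elapsed"]]
  # Both paths must produce byte-identical output
  stopifnot(identical(raw1, raw2))
  c(n = n, direct = direct, viaFile = viaFile)
}))
print(timings)
```

 If the "direct" column grows much faster than "viaFile" as n increases, the
 machine shows the behaviour reported here; on the Linux machine below the
 two columns would be expected to stay close.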

 The above benchmarking was done in a fresh R v2.7.1 session on WinXP Pro:

 > sessionInfo()
 R version 2.7.1 Patched (2008-06-27 r46012)
 i386-pc-mingw32

 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base


 When I do the same on a Linux machine there is no difference:

 > sessionInfo()
 R version 2.7.1 (2008-06-23)
 x86_64-unknown-linux-gnu

 locale:
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 Is there an obvious reason (and an obvious fix) for this?

 Cheers

 Henrik


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

