Sameh,

If it's a matrix, that's easy: you can write it directly, which is the fastest 
possible way without compression. A quick proof of concept:

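n <- 20000^2
A <- matrix(runif(n), ncol = sqrt(n))

## write (dim + payload)
con <- file(description = "matrix_file", open = "wb")
system.time({
  writeBin(d <- dim(A), con)  # header: two integers (nrow, ncol)
  dim(A) <- NULL              # writeBin() wants a plain vector, so drop dim
  writeBin(A, con)            # payload: the doubles, in column-major order
  dim(A) <- d                 # restore the matrix shape
})
close(con)

## read
con <- file(description = "matrix_file", open = "rb")
system.time({
  d <- readBin(con, what = integer(), n = 2L)
  ## prod() returns a double, so this cannot overflow for larger matrices
  A1 <- readBin(con, what = double(), n = prod(d))
  dim(A1) <- d
})
close(con)
identical(A, A1)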

   user  system elapsed 
  0.931   2.713   3.644 
   user  system elapsed 
  0.089   1.360   1.451 
[1] TRUE

So it's really just limited by the speed of your disk, and parallelization 
won't help here.
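
To put the timings above in perspective, here is a rough back-of-the-envelope 
calculation. It is only a sketch: it assumes 8-byte doubles and 4-byte 
integers, and the read figure may well be flattered by the OS page cache, 
since the file had just been written.

```r
## Payload: 20000 x 20000 doubles at 8 bytes each,
## plus two 4-byte integers for the dim header
bytes <- 20000^2 * 8 + 2 * 4
gb <- bytes / 1e9                 # about 3.2 GB

## Elapsed times taken from the system.time() output above
write_gbps <- gb / 3.644
read_gbps  <- gb / 1.451

## Roughly 0.88 GB/s for the write and 2.2 GB/s for the read
round(c(size_GB = gb, write = write_gbps, read = read_gbps), 2)
```

Those rates are in the range of what a single fast disk delivers, which is 
why adding more CPU cores to the uncompressed case is unlikely to buy anything.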

Note that in general you get faster read times by using compression, as most 
data is reasonably compressible, so that's where parallelization can be useful. 
There are plenty of packages with more tricks, like mmapping the files, but 
the above is just base R.
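
As a minimal sketch of the compression point, still in base R: the same 
dim + payload layout can be written through a gzfile() connection instead of 
file(). The helper names write_matrix() and read_matrix() are made up for 
this illustration, and the test matrix is deliberately rounded so that it 
compresses well (pure runif() output would barely compress at all).

```r
## Hypothetical helpers (names are illustrative, not from any package).
## They work on any binary connection: file(), gzfile(), ...
write_matrix <- function(A, con) {
  stopifnot(is.matrix(A), is.double(A))
  writeBin(dim(A), con)          # header: nrow, ncol as integers
  writeBin(as.vector(A), con)    # payload; note as.vector() makes a copy
  invisible(NULL)
}

read_matrix <- function(con) {
  d <- readBin(con, what = integer(), n = 2L)
  x <- readBin(con, what = double(), n = prod(d))
  dim(x) <- d
  x
}

set.seed(1)
A <- matrix(round(runif(1e6), 1), ncol = 1000)   # few distinct values

f_raw <- tempfile()
f_gz  <- tempfile(fileext = ".gz")

con <- file(f_raw, open = "wb");  write_matrix(A, con); close(con)
con <- gzfile(f_gz, open = "wb"); write_matrix(A, con); close(con)

con <- gzfile(f_gz, open = "rb"); A1 <- read_matrix(con); close(con)

identical(A, A1)                        # round trip is lossless
file.size(f_gz) / file.size(f_raw)      # compression ratio (smaller is better)
```

Whether this is actually faster than the uncompressed version depends on how 
compressible the data is and on how the disk speed compares with the CPU cost 
of (de)compression, so it is worth timing on the real data.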

Cheers,
Simon



> On 9/05/2024, at 3:20 PM, Sameh Abdulah <sameh.abdu...@kaust.edu.sa> wrote:
> 
> Hi,
> 
> I need to serialize and save a 20K x 20K matrix as a binary file. This 
> process is significantly slower in R compared to Python (4X slower).
> 
> I'm not sure about the best approach to optimize the below code. Is it 
> possible to parallelize the serialization function to enhance performance?
> 
> 
>  n <- 20000^2
>  cat("Generating matrices ... ")
>  INI.TIME <- proc.time()
>  A <- matrix(runif(n), ncol = sqrt(n))
>  END_GEN.TIME <- proc.time()
>  arg_ser <- serialize(object = A, connection = NULL)
> 
>  END_SER.TIME <- proc.time()
>  con <- file(description = "matrix_file", open = "wb")
>  writeBin(object = arg_ser, con = con)
>  close(con)
>  END_WRITE.TIME <- proc.time()
>  con <- file(description = "matrix_file", open = "rb")
>  par_raw <- readBin(con, what = raw(), n = file.info("matrix_file")$size)
>  END_READ.TIME <- proc.time()
>  B <- unserialize(connection = par_raw)
>  close(con)
>  END_DES.TIME <- proc.time()
>  TIME <- END_GEN.TIME - INI.TIME
>  cat("Generation time", TIME[3], " seconds.")
> 
>  TIME <- END_SER.TIME - END_GEN.TIME
>  cat("Serialization time", TIME[3], " seconds.")
> 
>  TIME <- END_WRITE.TIME - END_SER.TIME
>  cat("Writing time", TIME[3], " seconds.")
> 
>  TIME <- END_READ.TIME - END_WRITE.TIME
>  cat("Read time", TIME[3], " seconds.")
> 
>  TIME <- END_DES.TIME - END_READ.TIME
>  cat("Deserialize time", TIME[3], " seconds.")
> 
> 
> 
> 
> Best,
> --Sameh
> 
> ______________________________________________
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> 
