[ 
https://issues.apache.org/jira/browse/ARROW-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-7520.
------------------------------------
    Fix Version/s: 0.17.0
         Assignee: Neal Richardson
       Resolution: Fixed

This has been addressed in ARROW-5501; RecordBatch*Writer now requires that you 
pass an {{OutputStream}} so that you can manage the file connection. The 
previously supported behavior would let you open connections you couldn't close.
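
As a rough illustration of the pattern described above, the sketch below opens a {{FileOutputStream}} explicitly, hands it to {{RecordBatchFileWriter$create()}}, and closes both the writer and the stream when done. The helper name {{write_batches}}, its arguments, and the way rows are split into chunks are assumptions made for this example and are not taken from the ARROW-5501 change itself.

```r
library(arrow)

# Hypothetical helper (not part of arrow): write `df` to `path` as an Arrow
# file made of roughly `n_batches` record batches.
write_batches <- function(df, path, n_batches) {
  # Assign each row to a contiguous chunk numbered 1..n_batches
  chunk_id <- ceiling(seq_len(nrow(df)) * n_batches / nrow(df))
  row_groups <- split(seq_len(nrow(df)), chunk_id)

  # Open the output stream ourselves so that we control when it is closed
  sink <- FileOutputStream$create(path)
  on.exit(sink$close(), add = TRUE)

  # Take the schema from the first chunk; all chunks share the same columns
  first <- record_batch(df[row_groups[[1]], , drop = FALSE])
  writer <- RecordBatchFileWriter$create(sink, first$schema)

  for (rows in row_groups) {
    writer$write_batch(record_batch(df[rows, , drop = FALSE]))
  }

  # Finish the file footer before on.exit() closes the stream
  writer$close()
  invisible(length(row_groups))
}
```

Calling {{write_batches(data.frame(A = 1:100000, B = 1:100000), tempfile(fileext = ".arrow"), 3000)}} would exercise the same scenario as the report below, with one explicitly managed connection for all batches.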

> [R] Writing many batches causes a crash
> ---------------------------------------
>
>                 Key: ARROW-7520
>                 URL: https://issues.apache.org/jira/browse/ARROW-7520
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 0.15.1
>         Environment: - Session info
> ----------------------------------------------------------------------
>  setting  value
>  version  R version 3.6.1 (2019-07-05)
>  os       Windows 10 x64
>  system   x86_64, mingw32
>  ui       RStudio
>  language (EN)
>  collate  English_United States.1252
>  ctype    English_United States.1252
>  tz       America/New_York
>  date     2020-01-08
>
> - Packages
> ----------------------------------------------------------------------
>  ! package      * version     date       lib source
>    acepack        1.4.1       2016-10-29 [1] CRAN (R 3.6.1)
>    arrow        * 0.15.1.1    2019-11-05 [1] CRAN (R 3.6.2)
>    askpass        1.1         2019-01-13 [1] CRAN (R 3.6.1)
>    assertthat     0.2.1       2019-03-21 [1] CRAN (R 3.6.1)
>    backports      1.1.5       2019-10-02 [1] CRAN (R 3.6.1)
>    base64enc      0.1-3       2015-07-28 [1] CRAN (R 3.6.0)
>    bit            1.1-14      2018-05-29 [1] CRAN (R 3.6.0)
>    bit64          0.9-7       2017-05-08 [1] CRAN (R 3.6.0)
>    blob           1.2.0       2019-07-09 [1] CRAN (R 3.6.1)
>    callr          3.3.1       2019-07-18 [1] CRAN (R 3.6.1)
>    cellranger     1.1.0       2016-07-27 [1] CRAN (R 3.6.1)
>    checkmate      1.9.4       2019-07-04 [1] CRAN (R 3.6.1)
>    cli            1.1.0       2019-03-19 [1] CRAN (R 3.6.1)
>    cluster        2.1.0       2019-06-19 [2] CRAN (R 3.6.1)
>    codetools      0.2-16      2018-12-24 [2] CRAN (R 3.6.1)
>    colorspace     1.4-1       2019-03-18 [1] CRAN (R 3.6.1)
>    commonmark     1.7         2018-12-01 [1] CRAN (R 3.6.1)
>    crayon         1.3.4       2017-09-16 [1] CRAN (R 3.6.1)
>    credentials    1.1         2019-03-12 [1] CRAN (R 3.6.2)
>    curl         * 4.2         2019-09-24 [1] CRAN (R 3.6.1)
>    data.table     1.12.2      2019-04-07 [1] CRAN (R 3.6.1)
>    DBI          * 1.0.0       2018-05-02 [1] CRAN (R 3.6.1)
>    desc           1.2.0       2018-05-01 [1] CRAN (R 3.6.1)
>    devtools     * 2.2.0       2019-09-07 [1] CRAN (R 3.6.1)
>    digest         0.6.23      2019-11-23 [1] CRAN (R 3.6.1)
>    dplyr        * 0.8.3       2019-07-04 [1] CRAN (R 3.6.1)
>    DT             0.9         2019-09-17 [1] CRAN (R 3.6.1)
>    ellipsis       0.3.0       2019-09-20 [1] CRAN (R 3.6.1)
>    evaluate       0.14        2019-05-28 [1] CRAN (R 3.6.1)
>    foreign        0.8-71      2018-07-20 [2] CRAN (R 3.6.1)
>    Formula      * 1.2-3       2018-05-03 [1] CRAN (R 3.6.0)
>    fs             1.3.1       2019-05-06 [1] CRAN (R 3.6.1)
>    fst          * 0.9.0       2019-04-09 [1] CRAN (R 3.6.1)
>    future       * 1.15.0-9000 2019-11-19 [1] Github (HenrikBengtsson/future@bc241c7)
>    ggplot2      * 3.2.1       2019-08-10 [1] CRAN (R 3.6.1)
>    globals        0.12.4      2018-10-11 [1] CRAN (R 3.6.0)
>    glue         * 1.3.1       2019-03-12 [1] CRAN (R 3.6.1)
>    gridExtra      2.3         2017-09-09 [1] CRAN (R 3.6.1)
>    gt           * 0.1.0       2019-11-27 [1] Github (rstudio/gt@284bbe5)
>    gtable         0.3.0       2019-03-25 [1] CRAN (R 3.6.1)
>    Hmisc        * 4.3-0       2019-11-07 [1] CRAN (R 3.6.1)
>    htmlTable      1.13.2      2019-09-22 [1] CRAN (R 3.6.1)
>  D htmltools      0.3.6.9004  2019-09-20 [1] Github (rstudio/htmltools@c49b29c)
>    htmlwidgets    1.3         2018-09-30 [1] CRAN (R 3.6.1)
>    jsonlite     * 1.6         2018-12-07 [1] CRAN (R 3.6.1)
>    knitr          1.25        2019-09-18 [1] CRAN (R 3.6.1)
>    lattice      * 0.20-38     2018-11-04 [2] CRAN (R 3.6.1)
>    latticeExtra   0.6-28      2016-02-09 [1] CRAN (R 3.6.1)
>    lazyeval       0.2.2       2019-03-15 [1] CRAN (R 3.6.1)
>    lifecycle      0.1.0       2019-08-01 [1] CRAN (R 3.6.1)
>    listenv        0.7.0       2018-01-21 [1] CRAN (R 3.6.1)
>    lubridate    * 1.7.4       2018-04-11 [1] CRAN (R 3.6.1)
>    magrittr     * 1.5         2014-11-22 [1] CRAN (R 3.6.1)
>    Matrix         1.2-17      2019-03-22 [2] CRAN (R 3.6.1)
>    memoise        1.1.0       2017-04-21 [1] CRAN (R 3.6.1)
>    munsell        0.5.0       2018-06-12 [1] CRAN (R 3.6.1)
>    nnet           7.3-12      2016-02-02 [2] CRAN (R 3.6.1)
>    openssl        1.4.1       2019-07-18 [1] CRAN (R 3.6.1)
>    outliers     * 0.14        2011-01-24 [1] CRAN (R 3.6.0)
>    pillar         1.4.2       2019-06-29 [1] CRAN (R 3.6.1)
>    pkgbuild       1.0.5       2019-08-26 [1] CRAN (R 3.6.1)
>    pkgconfig      2.0.2       2018-08-16 [1] CRAN (R 3.6.1)
>    pkgload        1.0.2       2018-10-29 [1] CRAN (R 3.6.1)
>    plyr         * 1.8.4       2016-06-08 [1] CRAN (R 3.6.1)
>    prettyunits    1.0.2       2015-07-13 [1] CRAN (R 3.6.1)
>    processx       3.4.1       2019-07-18 [1] CRAN (R 3.6.1)
>    pryr         * 0.1.4       2018-02-18 [1] CRAN (R 3.6.1)
>    ps             1.3.0       2018-12-21 [1] CRAN (R 3.6.1)
>    purrr        * 0.3.2       2019-03-15 [1] CRAN (R 3.6.1)
>    R6           * 2.4.1       2019-11-12 [1] CRAN (R 3.6.1)
>    RColorBrewer   1.1-2       2014-12-07 [1] CRAN (R 3.6.0)
>    Rcpp           1.0.3       2019-11-08 [1] CRAN (R 3.6.1)
>    readxl       * 1.3.1       2019-03-13 [1] CRAN (R 3.6.1)
>    remotes        2.1.0       2019-06-24 [1] CRAN (R 3.6.1)
>    rlang        * 0.4.2       2019-11-23 [1] CRAN (R 3.6.1)
>    rmarkdown    * 2.0.3       2019-12-19 [1] Github (rstudio/rmarkdown@26cc3b1)
>    RODBC        * 1.3-16      2019-09-03 [1] CRAN (R 3.6.1)
>    roxygen2     * 6.1.1       2018-11-07 [1] CRAN (R 3.6.1)
>    rpart          4.1-15      2019-04-12 [2] CRAN (R 3.6.1)
>    rprojroot      1.3-2       2018-01-03 [1] CRAN (R 3.6.1)
>    RSQLite      * 2.1.2       2019-07-24 [1] CRAN (R 3.6.1)
>    rstudioapi     0.10        2019-03-19 [1] CRAN (R 3.6.1)
>    scales         1.0.0       2018-08-09 [1] CRAN (R 3.6.1)
>    sessioninfo    1.1.1       2018-11-05 [1] CRAN (R 3.6.1)
>    slide        * 0.0.0.9002  2019-11-27 [1] Github (DavisVaughan/slide@92e8e02)
>    ssh            0.6         2019-04-09 [1] CRAN (R 3.6.2)
>    stringi        1.4.3       2019-03-12 [1] CRAN (R 3.6.0)
>    stringr      * 1.4.0       2019-02-10 [1] CRAN (R 3.6.1)
>    survival     * 2.44-1.1    2019-04-01 [2] CRAN (R 3.6.1)
>    testthat       2.2.1       2019-07-25 [1] CRAN (R 3.6.1)
>    tibble         2.1.3       2019-06-06 [1] CRAN (R 3.6.1)
>    tidyr        * 1.0.0       2019-09-11 [1] CRAN (R 3.6.1)
>    tidyselect     0.2.5       2018-10-11 [1] CRAN (R 3.6.1)
>    usethis      * 1.5.1       2019-07-04 [1] CRAN (R 3.6.1)
>    varhandle    * 2.0.3       2018-07-04 [1] CRAN (R 3.6.0)
>    vctrs          0.2.0.9007  2019-11-27 [1] Github (r-lib/vctrs@945809e)
>    withr          2.1.2       2018-03-15 [1] CRAN (R 3.6.1)
>    xfun           0.9         2019-08-21 [1] CRAN (R 3.6.1)
>    xml2         * 1.2.2       2019-08-09 [1] CRAN (R 3.6.1)
>    xts          * 0.11-2      2018-11-05 [1] CRAN (R 3.6.1)
>    zoo          * 1.8-6       2019-05-28 [1] CRAN (R 3.6.1)
>
> [1] C:/Users/cklar/Desktop/R packages
> [2] C:/Program Files/R/R-3.6.1/library
>
> P -- Loaded and on-disk path mismatch.
> D -- DLL MD5 mismatch, broken installation.
>            Reporter: Christian
>            Assignee: Neal Richardson
>            Priority: Trivial
>             Fix For: 0.17.0
>
>
> Hi,
> When creating more than roughly 200-300 batches, writing to the Arrow file 
> crashes R without even showing an error message. RStudio just aborts.
> I have the feeling that each batch may become its own stream and that R then 
> has issues with the connections, but that's a total guess.
> Any help would be appreciated.
>  
> ##
>  
> Here is the function. Running it with 3000 batches crashes immediately.
> Before that I ran it with 100 batches and increased the number slowly, and 
> then it randomly crashed again.
>  
> ##
> Now I received this error message after writing 30 batches.
> Error in ipc___RecordBatchWriter__WriteRecordBatch(self, batch) : 
>  Invalid: Invalid operation on closed file
>  Error in ipc___RecordBatchWriter__WriteRecordBatch(self, batch) : 
>  Invalid: Invalid operation on closed file
> ##
> write_arrow_custom(data.frame(A=c(1:100000),B=c(1:100000)),'C:/Temp/test.arrow',3000)
>  
> write_arrow_custom <- function(df,targetarrow,nrbatches) {
>   ct <- nrbatches
>   idxs <- c(0:ct)/ct*nrow(df)
>   idxs <- round(idxs,0) %>% as.integer()
>   idxs[length(idxs)] <- nrow(df)
>   df_nav <- idxs %>% as.data.frame() %>% rename(colfrom=1) %>%
>     mutate(colto=lead(colfrom)) %>% mutate(colfrom=colfrom+1) %>%
>     filter(!is.na(colto)) %>% mutate(R=row_number())
>   stopifnot(df_nav %>% mutate(chk=colto-colfrom+1) %>% '$'('chk') %>%
>     sum()==nrow(df))
>   table_df <- Table$create(name=rownames(df[1,]),df[1,])
>   writer <- RecordBatchFileWriter$create(targetarrow,table_df$schema)
>   df_nav %>% dlply(c('R'),function(df_nav) {
>     catl(glue('{df_nav$colfrom[1]}:{df_nav$colto[1]} / {df_nav$R[1]}...'))
>     tmp <- df[df_nav$colfrom[1]:df_nav$colto[1],]
>     writer$write_batch(record_batch(name = rownames(tmp), tmp))
>     NULL
>   }) -> batch_lst
>   writer$close()
>   rm(batch_lst)
>   gc()
> }
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
