[ https://issues.apache.org/jira/browse/ARROW-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney reopened ARROW-6230:
---------------------------------

> [R] Reading in Parquet files are 20x slower than reading fst files in R
> -----------------------------------------------------------------------
>
>                 Key: ARROW-6230
>                 URL: https://issues.apache.org/jira/browse/ARROW-6230
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 0.14.0
>         Environment: Windows 10 Pro and Ubuntu
>            Reporter: Zhuo Jia Dai
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.15.0
>
>         Attachments: image-2019-08-14-10-04-56-834.png
>
>
> *Problem*
> Reading any of the data sets mentioned below from Parquet is about 20x
> slower than reading the same data from the fst format in R.
>
> *How to get the data*
> [https://loanperformancedata.fanniemae.com/lppub/index.html]
> Register and download any of these files. I can't provide the data to you,
> and I think it's best you register.
>
> !image-2019-08-14-10-04-56-834.png!
>
> *Code*
> ```r
> path = "data/Performance_2016Q4.txt"
> library(data.table)
> library(arrow)
> a = data.table::fread(path, header = FALSE)
> fst::write_fst(a, "data/a.fst")
> arrow::write_parquet(a, "data/a.parquet")
> rm(a); gc()
> # read in test (fst)
> system.time(a <- fst::read_fst("data/a.fst")) # 4.61 seconds
> rm(a); gc()
> # read in test (Parquet)
> system.time(a <- arrow::read_parquet("data/a.parquet")) # 99.19 seconds
> ```

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)