[jira] [Updated] (ARROW-6230) [R] Reading in Parquet files are 20x slower than reading fst files in R

2019-08-15 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6230:
---
Affects Version/s: 0.14.0

> [R] Reading in Parquet files are 20x slower than reading fst files in R
> ---
>
> Key: ARROW-6230
> URL: https://issues.apache.org/jira/browse/ARROW-6230
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.14.0
> Environment: Windows 10 Pro and Ubuntu 
>Reporter: Zhuo Jia Dai
>Priority: Major
>  Labels: parquet
> Attachments: image-2019-08-14-10-04-56-834.png
>
>
> *Problem*
> Reading any of the data sets mentioned below from Parquet with arrow is about 
> 20x slower than reading the same data from the fst format in R.
>  
> *How to get the data*
> [https://loanperformancedata.fanniemae.com/lppub/index.html]
> Register and download any of the files. I can't redistribute the data, so 
> it's best that you register yourself.
>  
> !image-2019-08-14-10-04-56-834.png!
>  
> *Code*
> ```r
> path = "data/Performance_2016Q4.txt"
> library(data.table)
> library(arrow)
> a = data.table::fread(path, header = FALSE)
> fst::write_fst(a, "data/a.fst")
> arrow::write_parquet(a, "data/a.parquet")
> rm(a); gc()
> # read-in test: fst
> system.time(a <- fst::read_fst("data/a.fst")) # 4.61 seconds
> rm(a); gc()
> # read-in test: Parquet
> system.time(a <- arrow::read_parquet("data/a.parquet")) # 99.19 seconds
> ```



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (ARROW-6230) [R] Reading in Parquet files are 20x slower than reading fst files in R

2019-08-15 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6230:
---
Labels: parquet  (was: paragraph)

> [R] Reading in Parquet files are 20x slower than reading fst files in R
> ---
>
> Key: ARROW-6230
> URL: https://issues.apache.org/jira/browse/ARROW-6230
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
> Environment: Windows 10 Pro and Ubuntu 
>Reporter: Zhuo Jia Dai
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.1
>
> Attachments: image-2019-08-14-10-04-56-834.png


[jira] [Updated] (ARROW-6230) [R] Reading in Parquet files are 20x slower than reading fst files in R

2019-08-15 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6230:
---
Fix Version/s: (was: 0.14.1)

> [R] Reading in Parquet files are 20x slower than reading fst files in R
> ---
>
> Key: ARROW-6230
> URL: https://issues.apache.org/jira/browse/ARROW-6230
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
> Environment: Windows 10 Pro and Ubuntu 
>Reporter: Zhuo Jia Dai
>Priority: Major
>  Labels: parquet
> Attachments: image-2019-08-14-10-04-56-834.png


[jira] [Updated] (ARROW-6230) [R] Reading in Parquet files are 20x slower than reading fst files in R

2019-08-15 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6230:
---
Labels: paragraph  (was: )

> [R] Reading in Parquet files are 20x slower than reading fst files in R
> ---
>
> Key: ARROW-6230
> URL: https://issues.apache.org/jira/browse/ARROW-6230
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
> Environment: Windows 10 Pro and Ubuntu 
>Reporter: Zhuo Jia Dai
>Priority: Major
>  Labels: paragraph
> Fix For: 0.14.1
>
> Attachments: image-2019-08-14-10-04-56-834.png


[jira] [Updated] (ARROW-6230) [R] Reading in Parquet files are 20x slower than reading fst files in R

2019-08-15 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6230:
---
Summary: [R] Reading in Parquet files are 20x slower than reading fst files 
in R  (was: [R] Reading in parquent files are 20x slower than reading fst files 
in R)

> [R] Reading in Parquet files are 20x slower than reading fst files in R
> ---
>
> Key: ARROW-6230
> URL: https://issues.apache.org/jira/browse/ARROW-6230
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
> Environment: Windows 10 Pro and Ubuntu 
>Reporter: Zhuo Jia Dai
>Priority: Major
> Fix For: 0.14.1
>
> Attachments: image-2019-08-14-10-04-56-834.png