[jira] [Updated] (ARROW-6876) Reading parquet file becomes really slow for 0.15.0

2019-10-14 Thread Bob (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob updated ARROW-6876:
---
Description: 
Hi,

I just noticed that reading a parquet file became really slow after I upgraded 
from 0.14.1 to 0.15.0 when using pandas.

Example:

*With 0.14.1*
 In [4]: %timeit df = pd.read_parquet(path)
 2.02 s ± 47.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

*With 0.15.0*
 In [5]: %timeit df = pd.read_parquet(path)
 22.9 s ± 478 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

 

The file is about 15MB in size. I am testing on the same machine using the same 
version of python and pandas.
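
For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch that roughly mirrors what %timeit reports (mean and standard deviation over 7 runs) and computes the slowdown from the two numbers above. The helper names `time_call` and `slowdown` are made up for illustration, and the `pd.read_parquet(path)` call is only shown in a comment because it needs pandas and a local file.

```python
import statistics
import time


def time_call(fn, runs=7):
    """Call fn() `runs` times and return (mean, std dev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # Population standard deviation over the collected runs.
    return statistics.mean(samples), statistics.pstdev(samples)


def slowdown(baseline_s, regressed_s):
    """Ratio of the regressed timing to the baseline timing."""
    return regressed_s / baseline_s


# Hypothetical usage with pandas installed and `path` pointing at the file:
#   import pandas as pd
#   mean_s, std_s = time_call(lambda: pd.read_parquet(path))

# Slowdown implied by the timings reported above (2.02 s vs 22.9 s).
print(f"{slowdown(2.02, 22.9):.1f}x slower")  # prints "11.3x slower"
```

Running the same helper once per installed version, on the same file, gives two means that can be passed to `slowdown` directly.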

 

Have you received similar complaints? What could be the issue here?

Thanks a lot.

Edit1:

Some profiling I did:

0.14.1:

!image-2019-10-14-18-12-07-652.png!

 

0.15.0:

!image-2019-10-14-18-10-42-850.png!
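
The tool behind the screenshots is not stated. As one possible way to get a comparable text profile, here is a small sketch using the standard library's cProfile and pstats. The helper name `profile_call` is hypothetical, and the `pd.read_parquet(path)` call is only shown in a comment since it needs pandas and the file.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=15):
    """Run fn() under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the slowest call chains appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


# Hypothetical usage with pandas installed and `path` pointing at the file:
#   import pandas as pd
#   df, report = profile_call(lambda: pd.read_parquet(path))
#   print(report)

# Stand-in workload so the sketch runs on its own.
total, report = profile_call(lambda: sum(range(1000)))
print(report)
```

Comparing the reports from 0.14.1 and 0.15.0 side by side should show which calls account for the extra time.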

 


> Reading parquet file becomes really slow for 0.15.0
> ---
>
> Key: ARROW-6876
> URL: https://issues.apache.org/jira/browse/ARROW-6876
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.0
> Environment: python3.7
> Reporter: Bob
> Priority: Major
> Attachments: image-2019-10-14-18-10-42-850.png, image-2019-10-14-18-12-07-652.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6876) Reading parquet file becomes really slow for 0.15.0

2019-10-14 Thread Bob (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob updated ARROW-6876:
---
Attachment: image-2019-10-14-18-12-07-652.png



[jira] [Updated] (ARROW-6876) Reading parquet file becomes really slow for 0.15.0

2019-10-14 Thread Bob (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob updated ARROW-6876:
---
Attachment: image-2019-10-14-18-10-42-850.png
