[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5207:
-
Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Improve Parquet scan pipelining
> ---
>
> Key: DRILL-5207
> URL: https://issues.apache.org/jira/browse/DRILL-5207
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> The parquet reader's async page reader is not quite efficiently pipelined. 
> The default size of the disk read buffer is 4MB while the page reader reads 
> ~1MB at a time. The Parquet decode is also processing 1MB at a time. This 
> means the disk is idle while the data is being processed. Reducing the buffer 
> to 1MB will reduce the time the processing thread waits for the disk read 
> thread.
> Additionally, since the data to process a page may be more or less than 1MB, 
> a queue of pages will help so that the disk scan does not block (until the 
> queue is full), waiting for the processing thread.
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon 
> as it is initialized. Since this is called at setup time, this increases the 
> setup time for the query and query execution does not begin until this is 
> completed.
> There are a few other inefficiencies - options are read every time a page 
> reader is created. Reading options can be expensive.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining

2017-02-03 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra updated DRILL-5207:
-
Labels: doc-impacting  (was: ready-to-commit)

> Improve Parquet scan pipelining
> ---
>
> Key: DRILL-5207
> URL: https://issues.apache.org/jira/browse/DRILL-5207
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> The parquet reader's async page reader is not quite efficiently pipelined. 
> The default size of the disk read buffer is 4MB while the page reader reads 
> ~1MB at a time. The Parquet decode is also processing 1MB at a time. This 
> means the disk is idle while the data is being processed. Reducing the buffer 
> to 1MB will reduce the time the processing thread waits for the disk read 
> thread.
> Additionally, since the data to process a page may be more or less than 1MB, 
> a queue of pages will help so that the disk scan does not block (until the 
> queue is full), waiting for the processing thread.
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon 
> as it is initialized. Since this is called at setup time, this increases the 
> setup time for the query and query execution does not begin until this is 
> completed.
> There are a few other inefficiencies - options are read every time a page 
> reader is created. Reading options can be expensive.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining

2017-02-02 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-5207:
---
Labels: ready-to-commit  (was: )

> Improve Parquet scan pipelining
> ---
>
> Key: DRILL-5207
> URL: https://issues.apache.org/jira/browse/DRILL-5207
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The parquet reader's async page reader is not quite efficiently pipelined. 
> The default size of the disk read buffer is 4MB while the page reader reads 
> ~1MB at a time. The Parquet decode is also processing 1MB at a time. This 
> means the disk is idle while the data is being processed. Reducing the buffer 
> to 1MB will reduce the time the processing thread waits for the disk read 
> thread.
> Additionally, since the data to process a page may be more or less than 1MB, 
> a queue of pages will help so that the disk scan does not block (until the 
> queue is full), waiting for the processing thread.
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon 
> as it is initialized. Since this is called at setup time, this increases the 
> setup time for the query and query execution does not begin until this is 
> completed.
> There are a few other inefficiencies - options are read every time a page 
> reader is created. Reading options can be expensive.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining

2017-02-01 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5207:

Fix Version/s: (was: 1.10)
   1.10.0

> Improve Parquet scan pipelining
> ---
>
> Key: DRILL-5207
> URL: https://issues.apache.org/jira/browse/DRILL-5207
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
> Fix For: 1.10.0
>
>
> The parquet reader's async page reader is not quite efficiently pipelined. 
> The default size of the disk read buffer is 4MB while the page reader reads 
> ~1MB at a time. The Parquet decode is also processing 1MB at a time. This 
> means the disk is idle while the data is being processed. Reducing the buffer 
> to 1MB will reduce the time the processing thread waits for the disk read 
> thread.
> Additionally, since the data to process a page may be more or less than 1MB, 
> a queue of pages will help so that the disk scan does not block (until the 
> queue is full), waiting for the processing thread.
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon 
> as it is initialized. Since this is called at setup time, this increases the 
> setup time for the query and query execution does not begin until this is 
> completed.
> There are a few other inefficiencies - options are read every time a page 
> reader is created. Reading options can be expensive.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining

2017-01-20 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5207:

Reviewer: Sudheesh Katkam

Assigned Reviewer to [~sudheeshkatkam]

> Improve Parquet scan pipelining
> ---
>
> Key: DRILL-5207
> URL: https://issues.apache.org/jira/browse/DRILL-5207
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
> Fix For: 1.10
>
>
> The parquet reader's async page reader is not quite efficiently pipelined. 
> The default size of the disk read buffer is 4MB while the page reader reads 
> ~1MB at a time. The Parquet decode is also processing 1MB at a time. This 
> means the disk is idle while the data is being processed. Reducing the buffer 
> to 1MB will reduce the time the processing thread waits for the disk read 
> thread.
> Additionally, since the data to process a page may be more or less than 1MB, 
> a queue of pages will help so that the disk scan does not block (until the 
> queue is full), waiting for the processing thread.
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon 
> as it is initialized. Since this is called at setup time, this increases the 
> setup time for the query and query execution does not begin until this is 
> completed.
> There are a few other inefficiencies - options are read every time a page 
> reader is created. Reading options can be expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)