[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Attachment: benchmark-hos.pdf

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: performance
> Attachments: benchmark-hos.pdf, design-doc-nested-column-pruning.pdf
>
>
> Some columnar file formats, such as Parquet, also store the fields of a
> struct type column by column, using the encoding described in the Google
> Dremel paper. It is very common in big data workloads for data to be stored
> in structs while queries need only a subset of the fields in those structs.
> However, Hive currently still needs to read the whole struct regardless of
> whether all of its fields are selected. Therefore, pruning unwanted
> sub-fields of structs (nested fields) at file reading time would be a big
> performance boost for such scenarios.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
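To make the idea in the description concrete, here is a minimal sketch of nested-field pruning on a toy schema. It is not Hive's or Parquet's implementation: the nested-dict schema representation and the helper names leaf_columns and prune_schema are assumptions made up for this illustration. It only shows that each leaf of a struct corresponds to its own column, so a reader that knows which dotted paths a query selects can skip the others.

```python
from typing import Any, Dict, List

# Hypothetical schema model: a struct is a dict, a leaf is a type name.
Schema = Dict[str, Any]


def leaf_columns(schema: Schema, prefix: str = "") -> Dict[str, str]:
    """Map each leaf field's dotted path to its type.

    In a Dremel-style columnar layout every leaf is stored as its own
    column, so these paths stand in for the physical columns of a file.
    """
    columns: Dict[str, str] = {}
    for name, node in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(node, dict):
            columns.update(leaf_columns(node, path))
        else:
            columns[path] = node
    return columns


def prune_schema(schema: Schema, selected: List[str]) -> Schema:
    """Return a schema holding only the leaves under the selected paths.

    Selecting a struct path (for example "s.b") keeps all of its leaves.
    """
    pruned: Schema = {}
    for path, type_name in leaf_columns(schema).items():
        if any(path == s or path.startswith(s + ".") for s in selected):
            *parents, leaf = path.split(".")
            node = pruned
            for parent in parents:
                # Recreate the enclosing structs on the way down.
                node = node.setdefault(parent, {})
            node[leaf] = type_name
    return pruned


if __name__ == "__main__":
    table = {"id": "bigint", "s": {"a": "int", "b": {"c": "string", "d": "double"}}}
    # A query that only touches s.b.c needs one of the four leaf columns.
    print(list(leaf_columns(table)))
    print(prune_schema(table, ["s.b.c"]))
```

With a pruned schema like this, a reader would request only the column for s.b.c instead of materializing all of s, which is the saving the issue is after.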
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Attachment: (was: benchmark-hos.pdf)

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: performance
> Attachments: benchmark-hos.pdf, design-doc-nested-column-pruning.pdf
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Attachment: benchmark-hos.pdf

[~Ferd], sure - updated.

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: performance
> Attachments: benchmark-hos.pdf, design-doc-nested-column-pruning.pdf
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Attachment: (was: benchmark-hos.pdf)

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: performance
> Attachments: design-doc-nested-column-pruning.pdf
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Attachment: benchmark-hos.pdf

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: performance
> Attachments: benchmark-hos.pdf, design-doc-nested-column-pruning.pdf
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Attachment: design-doc-nested-column-pruning.pdf

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: performance
> Attachments: design-doc-nested-column-pruning.pdf
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Labels: performance (was: )

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: performance
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
    Component/s: Serializers/Deserializers

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
> Reporter: Chao Sun
> Assignee: Chao Sun
[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet
[ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu updated HIVE-15055:
    Issue Type: New Feature (was: Improvement)

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer, Physical Optimizer
> Reporter: Chao Sun
> Assignee: Chao Sun