[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-10940:
    Assignee:     (was: Sergey Shelukhin)

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

    Key: HIVE-10940
    URL: https://issues.apache.org/jira/browse/HIVE-10940
    Project: Hive
    Issue Type: Bug
    Components: File Formats
    Affects Versions: 1.2.0
    Reporter: Gopal V
    Fix For: 2.0.0
    Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch

{code}
String filterText = filterExpr.getExprString();
String filterExprSerialized = Utilities.serializeExpression(filterExpr);
{code}

the serializeExpression initializes Kryo and produces a new packed object for every split.

HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.

And Kryo is very slow to do this for a large filter clause.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
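The cost reported in HIVE-10940 comes from repeating an expensive serialization on every getRecordReader call, even though the filter expression is the same for every split. One general way to avoid that kind of repeated work is to serialize once and reuse the result. The following is only a minimal sketch of that idea, not the attached patches and not Hive code: SplitConf, ScanDesc, FilterPusher and the configuration keys are hypothetical names, and Base64 stands in for the Kryo step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-split job configuration (not Hive's JobConf). */
final class SplitConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    String get(String key) { return values.get(key); }
}

/** Hypothetical stand-in for a table scan that carries a filter expression. */
final class ScanDesc {
    /** Counts how often the expensive step runs, for demonstration only. */
    static final AtomicInteger SERIALIZE_CALLS = new AtomicInteger();

    private final String filterExpr;
    private String serializedFilter; // cached; computed at most once per ScanDesc

    ScanDesc(String filterExpr) { this.filterExpr = filterExpr; }

    String getFilterText() { return filterExpr; }

    /** Placeholder for the expensive step (Kryo in the report, Base64 here). */
    private static String expensiveSerialize(String expr) {
        SERIALIZE_CALLS.incrementAndGet();
        return Base64.getEncoder().encodeToString(expr.getBytes(StandardCharsets.UTF_8));
    }

    /** Serializes on first use and returns the cached string afterwards. */
    synchronized String getSerializedFilter() {
        if (serializedFilter == null) {
            serializedFilter = expensiveSerialize(filterExpr);
        }
        return serializedFilter;
    }
}

final class FilterPusher {
    /** Called once per split; only copies strings that are already computed. */
    static void pushFilters(SplitConf conf, ScanDesc scan) {
        conf.set("filter.text", scan.getFilterText());                 // hypothetical key
        conf.set("filter.expr.serialized", scan.getSerializedFilter()); // hypothetical key
    }
}
```

With this shape, calling FilterPusher.pushFilters for any number of splits against the same ScanDesc runs the expensive step once. Whether the cached value lives on the scan descriptor or is computed earlier at plan time is a design choice this sketch does not settle.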
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
Gopal V updated HIVE-10940:
    Assignee: Gunther Hagleitner

    Fix For: 2.0.0
    Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
Gunther Hagleitner updated HIVE-10940:
    Attachment: HIVE-10940.02.patch

.02 sets the expr as a phys opt. This should avoid the overheads and only do it after dpp is done. I'm wondering if I can unset the filter altogether then (in table scan).

    Assignee: Sergey Shelukhin
    Fix For: 2.0.0
    Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.patch
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
Gunther Hagleitner updated HIVE-10940:
    Attachment: HIVE-10940.03.patch

Thanks [~sershe]. Addressed comments in 03. I forgot to handle the filter object. Other than that, I've added the requested comment and deleted the copy/paste comment.

    Assignee: Sergey Shelukhin
    Fix For: 2.0.0
    Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
Sergey Shelukhin updated HIVE-10940:
    Attachment: HIVE-10940.01.patch

    Assignee: Sergey Shelukhin
    Attachments: HIVE-10940.01.patch, HIVE-10940.patch
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
Sergey Shelukhin updated HIVE-10940:
    Attachment: HIVE-10940.patch

trunk patch

    Assignee: Sergey Shelukhin
    Attachments: HIVE-10940.patch