[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615615#comment-14615615 ]

Gopal V commented on HIVE-10940:

[~hagleitn]: this fixes the leak, but reintroduces the performance issue. Added log lines and it showed for query27

{code}
2015-07-06 13:08:31,521 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hasObj = false, hasExpr=true
2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.ids=0,6
2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.names=d_date_sk,d_year
{code}

so it hits the serialize codepath still

{code}
if (!hasObj) {
  serializedFilterObj = Utilities.serializeObject(filterObject);
}
{code}

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
Fix For: 2.0.0
Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch

{code}
String filterText = filterExpr.getExprString();
String filterExprSerialized = Utilities.serializeExpression(filterExpr);
{code}

the serializeExpression initializes Kryo and produces a new packed object for every split.

HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.

And Kryo is very slow to do this for a large filter clause.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
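The per-split cost described in this issue can be illustrated with a small, self-contained sketch. It is not the Hive patch: `ScanDesc`, `pushFilters`, the config key `sketch.filter.expr.serialized`, and the call counter are all hypothetical names, and Java's built-in object serialization stands in for Kryo. The only point it shows is that memoizing the serialized form on the scan descriptor turns one serialization per split into one per scan.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class CachedFilterSketch {
    // Counts how often the expensive serialization step actually runs.
    static int serializeCalls = 0;

    // Stand-in for the expensive serializer (Kryo in the real code path).
    static String serialize(Serializable expr) {
        serializeCalls++;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(expr);
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical scan descriptor that caches its serialized filter.
    static final class ScanDesc {
        private final Serializable filterExpr;
        private String serializedFilterExpr; // null until first requested

        ScanDesc(Serializable filterExpr) {
            this.filterExpr = filterExpr;
        }

        synchronized String getSerializedFilterExpr() {
            if (serializedFilterExpr == null) {
                serializedFilterExpr = serialize(filterExpr);
            }
            return serializedFilterExpr;
        }
    }

    // Called once per split: copies the cached string instead of re-serializing.
    static void pushFilters(Map<String, String> jobConf, ScanDesc scan) {
        jobConf.put("sketch.filter.expr.serialized", scan.getSerializedFilterExpr());
    }

    public static void main(String[] args) {
        ScanDesc scan = new ScanDesc("(l_orderkey = 121201)");
        for (int split = 0; split < 1000; split++) {
            pushFilters(new HashMap<>(), scan);
        }
        System.out.println("serialize calls: " + serializeCalls);
    }
}
```

With the cache in place, the 1000 simulated splits trigger a single `serialize` call; without the null check in `getSerializedFilterExpr`, each split would pay the full cost.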
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609582#comment-14609582 ]

Hive QA commented on HIVE-10940:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742955/HIVE-10940.03.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9132 tests executed

*Failed tests:*
{noformat}
TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4454/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4454/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4454/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742955 - PreCommit-HIVE-TRUNK-Build

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: 2.0.0
Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609091#comment-14609091 ]

Gunther Hagleitner commented on HIVE-10940:

[~gopalv]/[~sershe]/[~prasanth_j] can you take a look?

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: 2.0.0
Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609199#comment-14609199 ]

Prasanth Jayachandran commented on HIVE-10940:

evaluateMapWork and evaluateReduceWork do the same thing. Call evaluateOperators directly instead?

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: 2.0.0
Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609116#comment-14609116 ]

Sergey Shelukhin commented on HIVE-10940:

{noformat}
// lets take a look at the operator memory requirements.
{noformat}
this comment seems like it was c/p-ed.

Can you add a comment where the new optimizer is added, indicating that it should run last?

serializedFilterObject is never set anymore. Set or remove?

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: 2.0.0
Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609085#comment-14609085 ]

Gunther Hagleitner commented on HIVE-10940:

This patch doesn't really work for 2 reasons:

* It serializes the same or similar objects unnecessarily multiple times during planning.
* It OOMs in DPP cases, because the expr references a reduce sink.

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: 2.0.0
Attachments: HIVE-10940.01.patch, HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586996#comment-14586996 ]

Prasanth Jayachandran commented on HIVE-10940:

Patch mostly looks good, although it would be good to add some debug logging after each of the null checks.

Also, from a simple reference lookup, we don't seem to be using the textual representation of the filter expression anywhere. I don't think we need to set the text representation of the filter expression. If we need a text representation, we have methods in PlanUtils to do so.

[~ashutoshc]/[~gopalv] Any idea why we set the filter expression in text form to job conf?

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587018#comment-14587018 ]

Sergey Shelukhin commented on HIVE-10940:

text representation is preserved for backward compat (if you mean the original one we used to serialize). Will add logging

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587060#comment-14587060 ]

Prasanth Jayachandran commented on HIVE-10940:

+1

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.01.patch, HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585367#comment-14585367 ]

Gopal V commented on HIVE-10940:

With more logging, it becomes slightly clearer

{code}
2015-06-14 19:00:40,473 INFO [TezChild] io.HiveInputFormat: push down initiated with filterText = (l_orderkey = 121201) filterExpr = GenericUDFOPEqual(Column[l_orderkey], Const bigint 121201) serializedFilterObj = null serializedFilterExpr = AQEAamF2YS51dGlsLkFycmF5TGlz9AECAQFvcmcuYXBhY2hlLmhhZG9vcC5oaXZlLnFsLnBsYW4uRXhwck5vZGVDb2x1bW5EZXPjAQFsX29yZGVya2X5AAABbGluZWl0Ze0BAm9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLnR5cGVpbmZvLlByaW1pdGl2ZVR5cGVJbmbvAQFiaWdpbvQBA29yZy5hcGFjaGUuaGFkb29wLmhpdmUucWwucGxhbi5FeHByTm9kZUNvbnN0YW50RGVz4wEBAgcJgpztgwkBBG9yZy5hcGFjaGUuaGFkb29wLmhpdmUucWwudWRmLmdlbmVyaWMuR2VuZXJpY1VERk9QRXF1YewBAAABgj0BRVFVQcwBBW9yZy5hcGFjaGUuaGFkb29wLmlvLkJvb2xlYW5Xcml0YWJs5QEAAAECAQFib29sZWHu filterObject = null
{code}

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
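For readers wondering what the serializedFilterExpr value in the log holds: it appears to be Base64 text wrapping the serialized expression tree. The sketch below decodes only the first 32 characters of that value and pulls out the first readable class name. The helper name `firstAsciiRun` is hypothetical, and reading a byte with the high bit set as the end of an ASCII string is an assumption about the serializer's encoding, not something the thread states.

```java
import java.util.Base64;

public class SerializedExprPeek {
    /**
     * Decodes Base64 text and returns the first run of printable ASCII bytes.
     * Assumption: a byte with the high bit set terminates the string and
     * carries its last character in the low seven bits.
     */
    static String firstAsciiRun(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        StringBuilder run = new StringBuilder();
        for (byte b : raw) {
            int v = b & 0xff;
            if (v >= 0x20 && v < 0x7f) {
                run.append((char) v);
            } else if (run.length() > 0) {
                int last = v & 0x7f;
                if ((v & 0x80) != 0 && last >= 0x20 && last < 0x7f) {
                    run.append((char) last); // terminating character
                }
                break;
            }
            // Non-printable bytes before the run starts are skipped.
        }
        return run.toString();
    }

    public static void main(String[] args) {
        // First 32 characters of the serializedFilterExpr value from the log.
        String prefix = "AQEAamF2YS51dGlsLkFycmF5TGlz9AEC";
        System.out.println(firstAsciiRun(prefix)); // prints java.util.ArrayList
    }
}
```

The same approach applied further into the string would surface the other class names embedded in it, which gives a sense of how much is packed and re-packed when this value is rebuilt for every split.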
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585333#comment-14585333 ]

Gopal V commented on HIVE-10940:

That was a kryo messup, the patch looks like it works exactly as expected on trunk.

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584049#comment-14584049 ]

Sergey Shelukhin commented on HIVE-10940:

Failures are unrelated. [~prasanth_j] can you take a look? Otherwise, tell me who is familiar with this code.

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584064#comment-14584064 ]

Prasanth Jayachandran commented on HIVE-10940:

I will take a look.

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584041#comment-14584041 ]

Hive QA commented on HIVE-10940:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739303/HIVE-10940.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9007 tests executed

*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4258/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4258/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4258/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12739303 - PreCommit-HIVE-TRUNK-Build

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584088#comment-14584088 ]

Gopal V commented on HIVE-10940:

Doesn't make sense, but let me re-test the patch on trunk build instead of LLAP.

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584082#comment-14584082 ]

Sergey Shelukhin commented on HIVE-10940:

Why would it be null always?

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584084#comment-14584084 ]

Sergey Shelukhin commented on HIVE-10940:

See setFilterExpr in desc.

HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

Key: HIVE-10940
URL: https://issues.apache.org/jira/browse/HIVE-10940
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
Attachments: HIVE-10940.patch