[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-07-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615615#comment-14615615
 ] 

Gopal V commented on HIVE-10940:


[~hagleitn]: this fixes the leak, but reintroduces the performance issue. I added log lines, and for query27 they showed:

{code}
2015-07-06 13:08:31,521 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hasObj = false, hasExpr=true
2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.ids=0,6
2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.names=d_date_sk,d_year
{code}

So it still hits the serialization code path:

{code}
 if (!hasObj) {
   serializedFilterObj = Utilities.serializeObject(filterObject);
 }
{code}
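The guard quoted above can be illustrated with a minimal sketch. It assumes the following: java.io serialization stands in for Kryo, a plain Map stands in for the job conf, and `PushFiltersGuardSketch`, `pushFilters(...)` and the key `example.filter.object` are hypothetical names rather than Hive's API. The sketch shows that when planning never supplies the pre-serialized object (`hasObj = false`, as in the log), every call takes the slow path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "reuse the pre-serialized filter if present, else serialize".
// java.io serialization stands in for Kryo; names are illustrative, not Hive's API.
class PushFiltersGuardSketch {
    // Counts how often the expensive serialization path runs.
    static int serializeCalls = 0;

    static String serializeObject(Serializable obj) {
        serializeCalls++;
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // In this sketch, called once per record reader (i.e. once per split).
    static void pushFilters(Map<String, String> conf, String preSerialized,
                            Serializable filterObject) {
        boolean hasObj = preSerialized != null;
        String serializedFilterObj = preSerialized;
        if (!hasObj) {
            // Slow path: taken on every call if planning never set the object.
            serializedFilterObj = serializeObject(filterObject);
        }
        conf.put("example.filter.object", serializedFilterObj);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (int split = 0; split < 4; split++) {
            pushFilters(conf, null, "d_year = 2001");
        }
        System.out.println("serializations without pre-serialized object: " + serializeCalls);
    }
}
```

If the planner plants the serialized form once, the `if (!hasObj)` branch is skipped for every split and the serialization cost disappears from the per-split path.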

 HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

 Key: HIVE-10940
 URL: https://issues.apache.org/jira/browse/HIVE-10940
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
 Fix For: 2.0.0

 Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch


 {code}
 String filterText = filterExpr.getExprString();
 String filterExprSerialized = Utilities.serializeExpression(filterExpr);
 {code}
 serializeExpression initializes Kryo and produces a new packed object for every split, along the call path
 HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
 Kryo is very slow to do this for a large filter clause.
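The cost described here scales with the number of splits. The sketch below contrasts serializing inside the per-split path with serializing once and sharing the result. It assumes java.io serialization as a stand-in for Kryo, and `SerializeOnceSketch` and its methods are hypothetical names. This is not the approach taken by the attached patches.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical sketch: hoist filter serialization out of the per-split path.
// java.io serialization stands in for Kryo; names are illustrative only.
class SerializeOnceSketch {
    // Counts calls to the expensive serializer.
    static int serializations = 0;

    static String serialize(Serializable obj) {
        serializations++;
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Problem pattern: one serialization per split (per getRecordReader call).
    static List<String> serializePerSplit(Serializable filter, int splits) {
        List<String> perSplit = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            perSplit.add(serialize(filter));
        }
        return perSplit;
    }

    // Alternative: serialize once up front and share the result across splits.
    static List<String> serializeOnce(Serializable filter, int splits) {
        String shared = serialize(filter);
        List<String> perSplit = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            perSplit.add(shared);
        }
        return perSplit;
    }

    public static void main(String[] args) {
        String filter = "l_orderkey = 121201";
        serializePerSplit(filter, 100);
        System.out.println("per-split serializations: " + serializations);
        serializations = 0;
        serializeOnce(filter, 100);
        System.out.println("hoisted serializations: " + serializations);
    }
}
```

Both variants hand every split the same payload. Only the number of serializer invocations differs.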



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609582#comment-14609582
 ] 

Hive QA commented on HIVE-10940:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742955/HIVE-10940.03.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9132 tests executed
*Failed tests:*
{noformat}
TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4454/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4454/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4454/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742955 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609091#comment-14609091
 ] 

Gunther Hagleitner commented on HIVE-10940:
---

[~gopalv]/[~sershe]/[~prasanth_j] can you take a look?



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-30 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609199#comment-14609199
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

evaluateMapWork and evaluateReduceWork do the same thing. Why not call evaluateOperators directly instead?



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609116#comment-14609116
 ] 

Sergey Shelukhin commented on HIVE-10940:
-

{noformat}
// lets take a look at the operator memory requirements.
{noformat}
This comment looks like it was copy-pasted.

Can you add a comment where the new optimizer is added, noting that it should run last?

serializedFilterObject is no longer set anywhere. Should it be set or removed?



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609085#comment-14609085
 ] 

Gunther Hagleitner commented on HIVE-10940:
---

This patch doesn't really work, for two reasons:

   * It serializes the same or similar objects unnecessarily multiple times during planning.
   * It runs out of memory (OOMs) in dynamic partition pruning (DPP) cases, because the expression references a reduce sink.
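Here is one plausible reading of the DPP OOM. Serializing an expression also serializes every object it references through non-transient fields. An expression that points at a reduce sink therefore drags that operator's state into each serialized copy. The sketch below illustrates this effect. It uses java.io serialization as a stand-in for Kryo, and `HeavyOperator` and the expression classes are hypothetical stand-ins, not Hive types. The `transient` variant only makes the size difference visible. It is not the fix the patch adopted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: a serialized expression carries everything it references.
// java.io serialization stands in for Kryo; these are not Hive classes.
class ReferenceGraphSketch {
    // Stand-in for an operator such as a reduce sink holding large state.
    static class HeavyOperator implements Serializable {
        final byte[] state = new byte[1 << 20]; // 1 MiB of illustrative state
    }

    // Expression with an ordinary field reference: the operator is serialized too.
    static class ExprWithReference implements Serializable {
        final String column = "d_date_sk";
        final HeavyOperator source;
        ExprWithReference(HeavyOperator source) { this.source = source; }
    }

    // Same expression, but the reference is transient and skipped on write.
    static class ExprWithTransientReference implements Serializable {
        final String column = "d_date_sk";
        final transient HeavyOperator source;
        ExprWithTransientReference(HeavyOperator source) { this.source = source; }
    }

    static int serializedSize(Serializable obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        HeavyOperator op = new HeavyOperator();
        System.out.println("with reference: " + serializedSize(new ExprWithReference(op)) + " bytes");
        System.out.println("transient reference: " + serializedSize(new ExprWithTransientReference(op)) + " bytes");
    }
}
```

In this sketch, the first size exceeds 1 MiB while the second is a few hundred bytes. Repeating the first case once per serialized copy is the kind of growth that can end in an OOM.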



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586996#comment-14586996
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

The patch mostly looks good, although it would be good to add some debug logging after each null check. Also, from a simple reference lookup, we don't seem to use the textual representation of the filter expression anywhere, so I don't think we need to set it. If we do need the text representation, PlanUtils has methods to produce it.

[~ashutoshc]/[~gopalv] Any idea why we set the filter expression in text form in the job conf?



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587018#comment-14587018
 ] 

Sergey Shelukhin commented on HIVE-10940:
-

The text representation is preserved for backward compatibility (if you mean the original one we used to serialize). Will add logging.



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587060#comment-14587060
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

+1



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14585367#comment-14585367
 ] 

Gopal V commented on HIVE-10940:


With more logging, it becomes slightly clearer:

{code}
2015-06-14 19:00:40,473 INFO [TezChild] io.HiveInputFormat: push down initiated 
with  filterText = (l_orderkey = 121201) filterExpr = 
GenericUDFOPEqual(Column[l_orderkey], Const bigint 121201) 
serializedFilterObj = null serializedFilterExpr = 
AQEAamF2YS51dGlsLkFycmF5TGlz9AECAQFvcmcuYXBhY2hlLmhhZG9vcC5oaXZlLnFsLnBsYW4uRXhwck5vZGVDb2x1bW5EZXPjAQFsX29yZGVya2X5AAABbGluZWl0Ze0BAm9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLnR5cGVpbmZvLlByaW1pdGl2ZVR5cGVJbmbvAQFiaWdpbvQBA29yZy5hcGFjaGUuaGFkb29wLmhpdmUucWwucGxhbi5FeHByTm9kZUNvbnN0YW50RGVz4wEBAgcJgpztgwkBBG9yZy5hcGFjaGUuaGFkb29wLmhpdmUucWwudWRmLmdlbmVyaWMuR2VuZXJpY1VERk9QRXF1YewBAAABgj0BRVFVQcwBBW9yZy5hcGFjaGUuaGFkb29wLmlvLkJvb2xlYW5Xcml0YWJs5QEAAAECAQFib29sZWHu
 filterObject = null
{code}



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14585333#comment-14585333
 ] 

Gopal V commented on HIVE-10940:


That was a Kryo mess-up; the patch appears to work exactly as expected on trunk.



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584049#comment-14584049
 ] 

Sergey Shelukhin commented on HIVE-10940:
-

Failures are unrelated. [~prasanth_j], can you take a look, or otherwise tell me who is familiar with this code?



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584064#comment-14584064
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

I will take a look. 



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584041#comment-14584041
 ] 

Hive QA commented on HIVE-10940:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739303/HIVE-10940.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9007 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4258/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4258/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4258/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12739303 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584088#comment-14584088
 ] 

Gopal V commented on HIVE-10940:


That doesn't make sense, but let me re-test the patch on a trunk build instead of LLAP.



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584082#comment-14584082
 ] 

Sergey Shelukhin commented on HIVE-10940:
-

Why would it always be null?



[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584084#comment-14584084
 ] 

Sergey Shelukhin commented on HIVE-10940:
-

See setFilterExpr in desc.
