[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-04-24 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4105:
---

Fix Version/s: (was: 0.12.0)
   0.11.0

 Hive MapJoinOperator unnecessarily deserializes values for all join-keys
 

 Key: HIVE-4105
 URL: https://issues.apache.org/jira/browse/HIVE-4105
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.11.0

 Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt, 
 HIVE-4105-20130415.txt, HIVE-4105.patch


 We can avoid this for inner joins. Hive does an explicit value 
 de-serialization up front, even for those rows which won't emit output. In 
 these cases, we can make do with just key de-serialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-04-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4105:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Vinod!

 Hive MapJoinOperator unnecessarily deserializes values for all join-keys
 

 Key: HIVE-4105
 URL: https://issues.apache.org/jira/browse/HIVE-4105
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.12.0

 Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt, 
 HIVE-4105-20130415.txt, HIVE-4105.patch


 We can avoid this for inner joins. Hive does an explicit value 
 de-serialization up front, even for those rows which won't emit output. In 
 these cases, we can make do with just key de-serialization.



[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Attachment: HIVE-4105-20130415.txt

Yes, the clearing of the row should happen independently of row generation. 
Attaching an updated patch addressing the review comment.

 Hive MapJoinOperator unnecessarily deserializes values for all join-keys
 

 Key: HIVE-4105
 URL: https://issues.apache.org/jira/browse/HIVE-4105
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt, 
 HIVE-4105-20130415.txt, HIVE-4105.patch


 We can avoid this for inner joins. Hive does an explicit value 
 de-serialization up front, even for those rows which won't emit output. In 
 these cases, we can make do with just key de-serialization.



[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-04-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Assignee: Vinod Kumar Vavilapalli
  Status: Patch Available  (was: Open)

 Hive MapJoinOperator unnecessarily deserializes values for all join-keys
 

 Key: HIVE-4105
 URL: https://issues.apache.org/jira/browse/HIVE-4105
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt, 
 HIVE-4105.patch


 We can avoid this for inner joins. Hive does an explicit value 
 de-serialization up front, even for those rows which won't emit output. In 
 these cases, we can make do with just key de-serialization.



[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-03-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Attachment: HIVE-4105-20130301.txt

Here's a patch to avoid value de-serialization where not needed in case of 
inner join.
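
For readers without the patch at hand, the idea can be sketched roughly as 
follows. This is only an illustration under my own assumptions, not the code 
in the attached patch: the class LazyValueJoinSketch, the SerializedRow type, 
and the string-based stand-ins for key and value decoding are all 
hypothetical and do not correspond to Hive's MapJoinOperator internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hive or the patch.
class LazyValueJoinSketch {

    /** A big-table row whose key and value are still in serialized form. */
    static final class SerializedRow {
        final String keyBytes;
        final String valueBytes;

        SerializedRow(String keyBytes, String valueBytes) {
            this.keyBytes = keyBytes;
            this.valueBytes = valueBytes;
        }
    }

    /** Counts how often the (assumed expensive) value decoder runs. */
    static int valueDeserializations = 0;

    /** Stand-in for real value de-serialization work. */
    static String deserializeValue(String valueBytes) {
        valueDeserializations++;
        return valueBytes.toUpperCase();
    }

    /**
     * Inner join of a streamed big table against an in-memory small table.
     * The key is always decoded; the value is decoded only when the key
     * has a match, because unmatched rows emit no output in an inner join.
     */
    static List<String> innerJoin(List<SerializedRow> bigTable,
                                  Map<String, String> smallTable) {
        List<String> out = new ArrayList<>();
        for (SerializedRow row : bigTable) {
            String key = row.keyBytes.trim(); // stand-in for key decoding
            String match = smallTable.get(key);
            if (match == null) {
                continue; // no match: skip value de-serialization entirely
            }
            String value = deserializeValue(row.valueBytes);
            out.add(key + ":" + value + ":" + match);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> small = new HashMap<>();
        small.put("k2", "dim");
        List<SerializedRow> big = Arrays.asList(
                new SerializedRow("k1", "a"),
                new SerializedRow("k2", "b"),
                new SerializedRow("k3", "c"));
        System.out.println(innerJoin(big, small));
        System.out.println("value de-serializations: " + valueDeserializations);
    }
}
```

The more selective the join, the more rows take the early `continue` and the 
more value decoding is skipped; if most rows match, little is saved.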

In my microbenchmark, where I was map-joining a big table with a small table, 
this brought the task execution time down from 15 seconds to 10 seconds with 
about 3 million records in the big table; the second table is very small and 
the output is small too. Note that you won't see this much of an improvement 
for non-selective inner joins.

If folks are interested, I'll try productionizing the benchmark.

 Hive MapJoinOperator unnecessarily deserializes values for all join-keys
 

 Key: HIVE-4105
 URL: https://issues.apache.org/jira/browse/HIVE-4105
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: HIVE-4105-20130301.txt


 We can avoid this for inner joins. Hive does an explicit value 
 de-serialization up front, even for those rows which won't emit output. In 
 these cases, we can make do with just key de-serialization.



[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-03-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Attachment: HIVE-4105-20130301.1.txt

Patch upmerged to the latest trunk.

 Hive MapJoinOperator unnecessarily deserializes values for all join-keys
 

 Key: HIVE-4105
 URL: https://issues.apache.org/jira/browse/HIVE-4105
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt


 We can avoid this for inner joins. Hive does an explicit value 
 de-serialization up front, even for those rows which won't emit output. In 
 these cases, we can make do with just key de-serialization.
