[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003698#comment-15003698 ]

Lefty Leverenz commented on HIVE-11587:
---------------------------------------

Changing the doc label from TODOC2.0 to TODOC1.3.

> Fix memory estimates for mapjoin hashtable
> ------------------------------------------
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0, 1.2.1
> Reporter: Sergey Shelukhin
> Assignee: Wei Zheng
> Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
> Due to the legacy in-memory mapjoin and conservative planning, the memory estimation code for the mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking the data into account, because unlike the probe the data is free to resize, so it's better for perf to allocate a big probe and hope for the best with regard to future data size. This is not true for the hybrid case.
> There's code to cap the initial allocation based on available memory (the memUsage argument), but due to some code rot the memory estimates from planning are not even passed to the hashtable anymore (there used to be two config settings: hashjoin size fraction by itself, or hashjoin size fraction for the group-by case), so it never caps the memory below 1 GB anymore.
> Initial capacity is estimated from the input key count, and in the hybrid join case it can exceed Java memory due to the number of segments.
> There needs to be a review and fix of all this code. Suggested improvements:
> 1) Make sure the "initialCapacity" argument for the hybrid case is correct given the number of segments. See how it's calculated from keys for the regular case; it needs to be adjusted accordingly for the hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for the probe (in longs) is the row count (assuming one key per row, i.e. the maximum possible number of keys) divided by the load factor, plus some very small number to round up. That is for the flat case. For the hybrid case it may be more complex due to skew, but it is still a good upper bound for the total probe capacity of all segments.
> 2) Rename memUsage to maxProbeSize (or something like that), and make sure it's passed correctly based on estimates that take into account both probe and data size, especially in the hybrid case.
> 3) Make sure that memory estimation for the hybrid case also doesn't come up with numbers that are too small, like a 1-byte hashtable. I am not very familiar with that code, but it has happened in the past.
> Other issues we have seen:
> 4) Cap the single write buffer size to 8-16 MB. The whole point of WBs (write buffers) is that you should not allocate a large array in advance. Even if some estimate passes 500 MB or 40 MB or whatever, it doesn't make sense to allocate that much.
> 5) For hybrid, don't pre-allocate WBs; only allocate on write.
> 6) Change rounding up to a power of two to rounding down, everywhere it is used, at least for the hybrid case (?).
> I wanted to put all of these items in a single JIRA so we could keep track of fixing all of them. I think there are JIRAs for some of these already; feel free to link them to this one.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
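The upper bound in item 1.5 can be illustrated with a short sketch. This is a hedged illustration, not Hive's actual code; the class and method names (ProbeCapacityEstimate, maxProbeCapacity, nextPowerOfTwo) are hypothetical:

```java
public class ProbeCapacityEstimate {
    // Upper bound from item 1.5: with at most one key per row, the probe
    // never needs more than ceil(rowCount / loadFactor) slots, no matter
    // how the keys are skewed across hybrid segments.
    static long maxProbeCapacity(long rowCount, float loadFactor) {
        long slots = (long) Math.ceil(rowCount / (double) loadFactor);
        // Open-addressing tables typically size the probe to a power of two.
        return nextPowerOfTwo(slots);
    }

    static long nextPowerOfTwo(long v) {
        long p = 1;
        while (p < v) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // 1,000,000 rows at a 0.75 load factor: ceil -> 1,333,334 slots,
        // rounded up to the next power of two -> 2,097,152.
        System.out.println(maxProbeCapacity(1_000_000L, 0.75f));
    }
}
```

For the hybrid case, the same bound applies to the sum of the per-segment probe capacities, which is why the rounding direction (item 6) matters once the estimate is split across segments.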
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002671#comment-15002671 ]

Sergey Shelukhin commented on HIVE-11587:
-----------------------------------------

Which subtasks should be backported?
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002674#comment-15002674 ]

Wei Zheng commented on HIVE-11587:
----------------------------------

HIVE-11467 HIVE-10793 HIVE-11449
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002677#comment-15002677 ]

Wei Zheng commented on HIVE-11587:
----------------------------------

Oh, looks like we only need to backport HIVE-10793, since the other two are already in branch-1.
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002726#comment-15002726 ]

Sergey Shelukhin commented on HIVE-11587:
-----------------------------------------

Backported this one. HIVE-10793 has a conflict in cherry-pick, can you backport it?
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002727#comment-15002727 ]

Sergey Shelukhin commented on HIVE-11587:
-----------------------------------------

Actually nm, looks like it's already committed to 1.3.
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002730#comment-15002730 ]

Wei Zheng commented on HIVE-11587:
----------------------------------

Great, I was expecting there would be some conflicts. Thank you!
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002600#comment-15002600 ]

Wei Zheng commented on HIVE-11587:
----------------------------------

Sounds right. In that case the few subtasks of this JIRA also need to be backported. I will get those patches ready for branch-1, and you can do the honors and commit them :)
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740362#comment-14740362 ] Lefty Leverenz commented on HIVE-11587: --- Doc note: This adds configuration parameter *hive.mapjoin.optimized.hashtable.probe.percent* to HiveConf.java, so it will need to be documented in the wiki for release 2.0.0. It could go at the end of the Query and DDL Execution section (before SerDes and I/O) or after *hive.mapjoin.optimized.hashtable* and *hive.mapjoin.optimized.hashtable.wbsize*. * [Configuration Properties -- Query and DDL Execution -- hive.optimize.distinct.rewrite (end of the section) | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.distinct.rewrite] * [Configuration Properties -- Query and DDL Execution -- hive.mapjoin.optimized.hashtable & hive.mapjoin.optimized.hashtable.wbsize | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.optimized.hashtable] The default of 0.5 seems to indicate that the value isn't really a percentage but rather a fraction. That should be clarified in the wiki. > Fix memory estimates for mapjoin hashtable > -- > > Key: HIVE-11587 > URL: https://issues.apache.org/jira/browse/HIVE-11587 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0, 1.2.1 >Reporter: Sergey Shelukhin >Assignee: Wei Zheng > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, > HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, > HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch > > > Due to the legacy in in-memory mapjoin and conservative planning, the memory > estimation code for mapjoin hashtable is currently not very good. 
It > allocates the probe erring on the side of more memory, not taking data into > account because unlike the probe, it's free to resize, so it's better for > perf to allocate big probe and hope for the best with regard to future data > size. It is not true for hybrid case. > There's code to cap the initial allocation based on memory available > (memUsage argument), but due to some code rot, the memory estimates from > planning are not even passed to hashtable anymore (there used to be two > config settings, hashjoin size fraction by itself, or hashjoin size fraction > for group by case), so it never caps the memory anymore below 1 Gb. > Initial capacity is estimated from input key count, and in hybrid join cache > can exceed Java memory due to number of segments. > There needs to be a review and fix of all this code. > Suggested improvements: > 1) Make sure "initialCapacity" argument from Hybrid case is correct given the > number of segments. See how it's calculated from keys for regular case; it > needs to be adjusted accordingly for hybrid case if not done already. > 1.5) Note that, knowing the number of rows, the maximum capacity one will > ever need for probe size (in longs) is row count (assuming key per row, i.e. > maximum possible number of keys) divided by load factor, plus some very small > number to round up. That is for flat case. For hybrid case it may be more > complex due to skew, but that is still a good upper bound for the total probe > capacity of all segments. > 2) Rename memUsage to maxProbeSize, or something, make sure it's passed > correctly based on estimates that take into account both probe and data size, > esp. in hybrid case. > 3) Make sure that memory estimation for hybrid case also doesn't come up with > numbers that are too small, like 1-byte hashtable. I am not very familiar > with that code but it has happened in the past. > Other issues we have seen: > 4) Cap single write buffer size to 8-16Mb. 
The whole point of WBs is that you > should not allocate a large array in advance. Even if some estimate passes in > 500 MB or 40 MB or whatever, it doesn't make sense to allocate that much. > 5) For hybrid, don't pre-allocate WBs - only allocate on write. > 6) Wherever rounding up to a power of two is used, change it to rounding down, at > least for the hybrid case (?) > I wanted to put all of these items in a single JIRA so we could keep track of > fixing all of them. > I think there are JIRAs for some of these already; feel free to link them to > this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
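As an illustration of items 1.5 and 6 above, here is a minimal sketch of the suggested bound: probe slots capped at row count divided by load factor, further capped by a memory budget, and rounded down to a power of two. The class and method names are hypothetical, chosen for illustration only, and this is not Hive's actual MapJoin hashtable code:

```java
// Hypothetical sketch of the probe-capacity upper bound (items 1.5 and 6).
// The probe is a long[] array, so each slot costs 8 bytes.
public class ProbeCapacityEstimate {

    /** Max probe slots ever needed for keyCount keys at the given load factor
     *  (row count / load factor, plus a small number to round up). */
    static long maxProbeSlots(long keyCount, float loadFactor) {
        return (long) Math.ceil(keyCount / (double) loadFactor) + 1;
    }

    /** Largest power of two <= n (rounding down, as item 6 suggests). */
    static long powerOfTwoFloor(long n) {
        return Long.highestOneBit(n);
    }

    /** Capacity bounded by both the key-count estimate and the memory budget. */
    static long cappedCapacity(long keyCount, float loadFactor, long memBudgetBytes) {
        long slots = maxProbeSlots(keyCount, loadFactor);
        long budgetSlots = memBudgetBytes / 8;   // 8 bytes per long slot
        return powerOfTwoFloor(Math.min(slots, budgetSlots));
    }

    public static void main(String[] args) {
        // 1M keys at load factor 0.75 would want ~1.33M slots, but a 4 MB
        // budget only allows 524288 slots, so the memory cap wins here.
        System.out.println(cappedCapacity(1_000_000L, 0.75f, 4L * 1024 * 1024));
    }
}
```

For the hybrid case, the same bound would apply to the sum of all segments' probe capacities rather than to a single array.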
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735498#comment-14735498 ] Wei Zheng commented on HIVE-11587: -- The only thing that is left alone is the "memUsage" param passed to MapJoinBytesTableContainer. I didn't change that since the regular join doesn't have any problem with the ballpark max probe space. I'm afraid it may cause some potential issues if I adjust this number. If we do want to change this for the regular join case too, then we'd better create a separate JIRA to track that. Let me know your opinion. Thanks!
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735732#comment-14735732 ] Sergey Shelukhin commented on HIVE-11587: - Should be ok for regular join for now I guess
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735774#comment-14735774 ] Wei Zheng commented on HIVE-11587: -- Agree. If anything pops up for regular mapjoin in the future, we can always adjust that param. Can you please commit the patch to master? Thanks!
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735346#comment-14735346 ] Sergey Shelukhin commented on HIVE-11587: - +1... should we file separate JIRA for items that are not done, from the description? If any
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732062#comment-14732062 ] Wei Zheng commented on HIVE-11587: -- The above mismatches run clean locally.
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731324#comment-14731324 ] Sergey Shelukhin commented on HIVE-11587: - I actually have a comment :) other than that should be good
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731418#comment-14731418 ] Wei Zheng commented on HIVE-11587: -- [~sershe] Answered review comments :)
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731762#comment-14731762 ] Hive QA commented on HIVE-11587: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12754293/HIVE-11587.08.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9393 tests executed *Failed tests:* {noformat} TestContribNegativeCliDriver - did not produce a TEST-*.xml file org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5185/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5185/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5185/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12754293 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731130#comment-14731130 ]

Wei Zheng commented on HIVE-11587:
----------------------------------

Both TestPigHBaseStorageHandler and TestStreaming pass locally. [~sershe] Could you review the latest patch and commit it if possible? Thanks!

> Fix memory estimates for mapjoin hashtable
> ------------------------------------------
>
>                 Key: HIVE-11587
>                 URL: https://issues.apache.org/jira/browse/HIVE-11587
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Wei Zheng
>         Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, HIVE-11587.06.patch, HIVE-11587.07.patch
>
> Due to the legacy of the in-memory mapjoin and conservative planning, the
> memory estimation code for the mapjoin hashtable is currently not very good.
> It allocates the probe array erring on the side of more memory, not taking
> data size into account, because the probe (unlike the data) is free to
> resize; so it's better for perf to allocate a big probe and hope for the
> best with regard to future data size. That assumption does not hold for the
> hybrid case.
> There's code to cap the initial allocation based on available memory (the
> memUsage argument), but due to code rot the memory estimates from planning
> are no longer passed to the hashtable at all (there used to be two config
> settings: hashjoin size fraction by itself, or hashjoin size fraction for
> the group-by case), so it never caps the memory below 1 GB anymore.
> Initial capacity is estimated from the input key count, and in the hybrid
> join case it can exceed Java heap memory due to the number of segments.
> All of this code needs a review and fix. Suggested improvements:
> 1) Make sure the "initialCapacity" argument in the hybrid case is correct
> given the number of segments. See how it's calculated from keys for the
> regular case; it needs to be adjusted accordingly for the hybrid case if not
> done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will
> ever need for the probe (in longs) is the row count (assuming one key per
> row, i.e. the maximum possible number of keys) divided by the load factor,
> plus a very small number to round up. That is for the flat case. The hybrid
> case may be more complex due to skew, but that is still a good upper bound
> for the total probe capacity across all segments.
> 2) Rename memUsage to maxProbeSize (or similar), and make sure it's passed
> correctly based on estimates that take into account both probe and data
> size, especially in the hybrid case.
> 3) Make sure that the memory estimation for the hybrid case also doesn't
> come up with numbers that are too small, like a 1-byte hashtable. I am not
> very familiar with that code, but it has happened in the past.
> Other issues we have seen:
> 4) Cap the single write buffer size to 8-16 MB. The whole point of WBs is
> that you should not allocate a large array in advance. Even if some estimate
> says 500 MB or 40 MB or whatever, it doesn't make sense to allocate that up
> front.
> 5) For hybrid, don't pre-allocate WBs; only allocate on write.
> 6) Change rounding up to a power of two to rounding down everywhere it's
> used, at least for the hybrid case (?)
> I wanted to put all of these items in a single JIRA so we could keep track
> of fixing all of them. I think there are JIRAs for some of these already;
> feel free to link them to this one.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
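The sizing arithmetic suggested in the issue description above (items 1.5, 4, and 6) can be sketched as follows. This is a minimal illustration under the stated assumptions, not Hive's actual implementation; the class and method names are hypothetical.

```java
// Hypothetical sketch of the suggested sizing rules; not Hive's actual code.
public class HashTableSizing {

  // Item 4: cap a single write buffer at 8 MB, regardless of the estimate.
  static final long MAX_WB_SIZE = 8L * 1024 * 1024;

  // Item 1.5: the most probe slots (longs) ever needed is
  // rowCount / loadFactor (at most one key per row), rounded up.
  static long maxProbeCapacity(long rowCount, float loadFactor) {
    long needed = (long) Math.ceil(rowCount / (double) loadFactor);
    // Power-of-two capacity so slot lookup can use a bit mask.
    return nextPowerOfTwo(needed);
  }

  // Smallest power of two >= v.
  static long nextPowerOfTwo(long v) {
    return v <= 1 ? 1 : Long.highestOneBit(v - 1) << 1;
  }

  // Item 6: for the hybrid case, rounding *down* keeps the sum of all
  // segments' probe arrays within the estimate instead of overshooting it.
  static long floorPowerOfTwo(long v) {
    return Long.highestOneBit(v);
  }

  // Item 4 again: clamp whatever size the planner estimated.
  static long writeBufferSize(long estimatedDataSize) {
    return Math.min(estimatedDataSize, MAX_WB_SIZE);
  }
}
```

For example, 1,000,000 rows at load factor 0.75 need at most ceil(1,333,333.3) = 1,333,334 slots, which rounds up to 2^21 = 2,097,152; a 500 MB data-size estimate still yields only an 8 MB write buffer.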
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729989#comment-14729989 ]

Hive QA commented on HIVE-11587:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754051/HIVE-11587.06.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9392 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_left_outer_join2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_nullsafe_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join5
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5170/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5170/console
Test logs:
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5170/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754051 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730279#comment-14730279 ]

Hive QA commented on HIVE-11587:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754109/HIVE-11587.07.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9390 tests executed

*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5173/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5173/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5173/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754109 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725854#comment-14725854 ]

Wei Zheng commented on HIVE-11587:
----------------------------------

index_auto_mult_tables.q runs clean locally. I'm planning to run some tests against TPC-DS tables. Will let you know once done. Thanks!
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725804#comment-14725804 ]

Sergey Shelukhin commented on HIVE-11587:
-----------------------------------------

+1 when tests pass, or if the failures are unrelated; can you take a look?
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723791#comment-14723791 ]

Sergey Shelukhin commented on HIVE-11587:
-----------------------------------------

Thanks, some minor comments on RB.
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724153#comment-14724153 ]

Hive QA commented on HIVE-11587:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753348/HIVE-11587.05.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9377 tests executed

*Failed tests:*
{noformat}
TestSchedulerQueue - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5124/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5124/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5124/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753348 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721412#comment-14721412 ]

Hive QA commented on HIVE-11587:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753165/HIVE-11587.03.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 9380 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_empty
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_runtime_skewjoin_mapjoin_spark
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5116/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5116/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5116/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12753165 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721683#comment-14721683 ]

Hive QA commented on HIVE-11587:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753199/HIVE-11587.04.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9381 tests executed

*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5118/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5118/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5118/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753199 - PreCommit-HIVE-TRUNK-Build
There is code to cap the initial allocation based on available memory (the memUsage argument), but due to code rot the memory estimates from planning are no longer passed to the hashtable at all (there used to be two config settings: hashjoin size fraction by itself, or hashjoin size fraction for the group-by case), so it never caps the memory below 1 GB anymore. Initial capacity is estimated from the input key count, and in the hybrid join case it can exceed Java memory due to the number of segments. All of this code needs a review and fix.

Suggested improvements:
1) Make sure the initialCapacity argument for the hybrid case is correct given the number of segments. See how it is calculated from keys for the regular case; it needs to be adjusted accordingly for the hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for the probe (in longs) is the row count (assuming one key per row, i.e. the maximum possible number of keys) divided by the load factor, plus a very small number to round up. That is for the flat case. The hybrid case may be more complex due to skew, but this is still a good upper bound for the total probe capacity of all segments.
2) Rename memUsage to maxProbeSize, or similar, and make sure it is passed correctly based on estimates that take into account both probe and data size, especially in the hybrid case.
3) Make sure the memory estimation for the hybrid case also doesn't come up with numbers that are too small, such as a 1-byte hashtable. I am not very familiar with that code, but it has happened in the past.
Other issues we have seen:
4) Cap the single write buffer size to 8-16 MB. The whole point of WBs (write buffers) is that you should not allocate a large array in advance. Even if some estimate says 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only allocate on write.
6) Wherever rounding up to a power of two is used, change it to rounding down, at least for the hybrid case (?)
I wanted to put all of these items in a single JIRA so we could keep track of fixing all of them. I think there are JIRAs for some of these already; feel free to link them to this one.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
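The upper bound in item 1.5 can be sketched as follows. This is an illustrative computation under the stated assumptions (one key per row, power-of-two capacities), not Hive's actual code; the class and method names are hypothetical:

```java
// Hypothetical helper illustrating item 1.5: an upper bound on probe
// capacity (in long slots) derived from the planner's row-count estimate.
public final class ProbeSizeEstimator {
    private ProbeSizeEstimator() {}

    /**
     * Worst case, every row contributes a distinct key, so the probe never
     * needs more than ceil(rowCount / loadFactor) slots, rounded up to the
     * next power of two (the usual hashtable capacity shape).
     */
    public static long maxProbeCapacity(long rowCount, float loadFactor) {
        if (rowCount <= 0) {
            return 1;
        }
        long needed = (long) Math.ceil(rowCount / (double) loadFactor);
        long cap = Long.highestOneBit(needed);
        return (cap == needed) ? cap : cap << 1;
    }

    public static void main(String[] args) {
        // 1,000,000 rows at load factor 0.75 -> 1,333,334 slots needed,
        // rounded up to 2,097,152 (2^21).
        System.out.println(maxProbeCapacity(1_000_000L, 0.75f));
    }
}
```

For the hybrid case, the same number bounds the *total* capacity across all segments, so dividing it among segments (however skewed) can never exceed it.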
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721033#comment-14721033 ] Hive QA commented on HIVE-11587:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753092/HIVE-11587.02.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed

*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5112/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5112/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5112/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753092 - PreCommit-HIVE-TRUNK-Build
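Items 4 and 5 of the issue description, capping each write buffer and allocating only on first write, can be sketched together. This is a minimal illustration of the idea, assuming a simple chunked byte store; the names are hypothetical and this is not Hive's actual WriteBuffers implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of items 4 and 5: write buffers are capped at 8 MB
// apiece (item 4) and allocated lazily on the first write (item 5).
public final class LazyWriteBuffers {
    // Item 4: never allocate a single buffer larger than this,
    // regardless of the planner's size estimate.
    private static final int MAX_WB_SIZE = 8 << 20; // 8 MB

    private final int wbSize;
    private final List<byte[]> buffers = new ArrayList<>();
    private int offsetInCurrent = 0;

    public LazyWriteBuffers(long estimatedBytes) {
        // Clamp the estimate: a 500 MB estimate still yields 8 MB chunks.
        this.wbSize = (int) Math.min(Math.max(estimatedBytes, 1L), MAX_WB_SIZE);
    }

    // Item 5: no buffer exists until the first write arrives.
    public void write(byte[] data) {
        int written = 0;
        while (written < data.length) {
            if (buffers.isEmpty() || offsetInCurrent == wbSize) {
                buffers.add(new byte[wbSize]);
                offsetInCurrent = 0;
            }
            byte[] current = buffers.get(buffers.size() - 1);
            int toCopy = Math.min(data.length - written, wbSize - offsetInCurrent);
            System.arraycopy(data, written, current, offsetInCurrent, toCopy);
            offsetInCurrent += toCopy;
            written += toCopy;
        }
    }

    public int allocatedBuffers() {
        return buffers.size();
    }
}
```

With this shape, an oversized estimate costs nothing up front: constructing the object allocates no buffers, and memory grows in small fixed chunks only as data actually arrives.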
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720380#comment-14720380 ] Hive QA commented on HIVE-11587:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752866/HIVE-11587.01.patch

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9380 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_nullsafe_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join0
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5102/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5102/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5102/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752866 - PreCommit-HIVE-TRUNK-Build
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720789#comment-14720789 ] Sergey Shelukhin commented on HIVE-11587:

Left some feedback. Mostly, such internals as WBs should not be exposed externally.
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717823#comment-14717823 ] Sergey Shelukhin commented on HIVE-11587:

Can you please post an RB?
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700257#comment-14700257 ] Sergey Shelukhin commented on HIVE-11587:

[~mmokhtar] [~mmccline] [~gopalv] fyi